### Business Document Processing: Summarization, Keyword Extraction, and Sentiment Analysis

- Loads business documents in PDF, HTML, and CSV formats for analysis.
- Summarizes document content with GPT-3.5 (`gpt-3.5-turbo`) for quick business insights.
- Extracts the top five keywords from each document to aid decision-making.
- Runs sentiment analysis on customer feedback (the CSV file) to gauge satisfaction.
- Automates these steps with LangChain prompt chains; only the first chunk of each document is sent to the model.
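
The steps listed above all follow one pattern: fill a prompt template with document text, then pass the result to a model. The sketch below illustrates that pattern without any network call. The names `PROMPTS`, `run_tasks`, and `echo_llm` are hypothetical and exist only for this illustration; `echo_llm` is a stand-in for a real model, not part of LangChain or OpenAI.

```python
from typing import Callable, Dict

# Prompt templates for the three tasks covered in this notebook.
PROMPTS: Dict[str, str] = {
    "summary": "Summarize the following business document:\n{text}",
    "keywords": "Extract the top 5 keywords from the following document:\n{text}",
    "sentiment": (
        "Analyze the sentiment of the following customer feedback "
        "and rate as Positive, Neutral, or Negative:\n{text}"
    ),
}


def run_tasks(text: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Fill every template with `text` and pass the prompt to `llm`."""
    return {task: llm(template.format(text=text)) for task, template in PROMPTS.items()}


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: returns the prompt's first line."""
    return prompt.splitlines()[0]


results = run_tasks("Great service, fast delivery.", echo_llm)
for task, output in results.items():
    print(f"{task}: {output}")
```

Swapping `echo_llm` for a function that calls a real chat model would leave the rest of the sketch unchanged, which is the main reason to keep the prompts separate from the model call.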

In [5]:
# Read the OpenAI API key from a local text file
with open('C:\\Users\\Shailendra Kadre\\Desktop\\OPEN_AI_KEY.txt') as f:
    api_key = f.read().strip()  # strip the trailing newline, if any
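
A hard-coded path to a key file is easy to break when the notebook moves to another machine. As a minimal sketch, assuming the key may instead be supplied through the `OPENAI_API_KEY` environment variable, a small helper can check the environment first and fall back to a file. The function name `load_api_key` and its parameters are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(path: Optional[str] = None, env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, else from a text file."""
    # Prefer the environment variable so the key never has to live in the notebook.
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to a plain-text file that contains only the key.
    if path is not None:
        key = Path(path).read_text(encoding="utf-8").strip()
        if key:
            return key
    raise RuntimeError(f"No API key found in ${env_var} or in the file {path!r}")
```

A call such as `api_key = load_api_key("OPEN_AI_KEY.txt")` would then work with either source, and the `.strip()` guards against a trailing newline in the file.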

In [11]:
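# Document loaders now live in langchain_community; the langchain.document_loaders path is deprecated
from langchain_community.document_loaders import PyPDFLoader, UnstructuredHTMLLoader, CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import ChatOpenAI  # chat model wrapper from the langchain-openai package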
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Load and parse documents
pdf_loader = PyPDFLoader("sample_pdf.pdf")
pdf_docs = pdf_loader.load()

html_loader = UnstructuredHTMLLoader("sample_html.html")
html_docs = html_loader.load()

csv_loader = CSVLoader("sample_csv.csv")
csv_docs = csv_loader.load()

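# Split documents into chunks of about 500 characters with a 50-character overlap.
# CharacterTextSplitter splits on "\n\n" by default, so a chunk can exceed
# chunk_size when the text contains no blank lines.
splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)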
pdf_chunks = splitter.split_documents(pdf_docs)
html_chunks = splitter.split_documents(html_docs)
csv_chunks = splitter.split_documents(csv_docs)

# Initialize the ChatOpenAI model with API key
llm = ChatOpenAI(model="gpt-3.5-turbo", openai_api_key=api_key)

# Summarization using OpenAI (Using LLMChain directly)
summary_prompt = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following business document:\n{text}"
)
summary_chain = LLMChain(llm=llm, prompt=summary_prompt)

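# .invoke() replaces the deprecated .run() and returns a dict; the model output is under the 'text' key.
# Only the first chunk of each document is summarized here.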
pdf_summary = summary_chain.invoke({"text": pdf_chunks[0].page_content})
html_summary = summary_chain.invoke({"text": html_chunks[0].page_content})
csv_summary = summary_chain.invoke({"text": csv_chunks[0].page_content})

print("\n--- PDF Summary ---\n", pdf_summary)
print("\n--- HTML Summary ---\n", html_summary)
print("\n--- CSV Summary ---\n", csv_summary)

# Keyword Extraction using OpenAI (Using LLMChain directly)
keyword_prompt = PromptTemplate(
    input_variables=["text"],
    template="Extract the top 5 keywords from the following document:\n{text}"
)
keyword_chain = LLMChain(llm=llm, prompt=keyword_prompt)

# Run the keyword chain on the first chunk of each document
pdf_keywords = keyword_chain.invoke({"text": pdf_chunks[0].page_content})
html_keywords = keyword_chain.invoke({"text": html_chunks[0].page_content})
csv_keywords = keyword_chain.invoke({"text": csv_chunks[0].page_content})

print("\n--- PDF Keywords ---\n", pdf_keywords)
print("\n--- HTML Keywords ---\n", html_keywords)
print("\n--- CSV Keywords ---\n", csv_keywords)

# Sentiment Analysis for Business Insights (Using LLMChain directly)
sentiment_prompt = PromptTemplate(
    input_variables=["text"],
    template="Analyze the sentiment of the following customer feedback and rate as Positive, Neutral, or Negative:\n{text}"
)
sentiment_chain = LLMChain(llm=llm, prompt=sentiment_prompt)

csv_sentiment = sentiment_chain.invoke({"text": csv_chunks[0].page_content})

# Print only the model's answer by indexing the 'text' key of the result dict
print("\n--- CSV Sentiment Analysis ---\n", csv_sentiment['text'])



--- PDF Summary ---
 {'text': 'The document discusses Lorem ipsum dolor sit amet, consectetur adipiscing elit. It mentions various business activities such as marketing, sales, and customer service. It also includes a table with rows and columns for data representation.'}

--- HTML Summary ---
 {'text': 'The document discusses various topics, such as comparing medicine and governance to wisdom, the importance of speaking in a customary manner when discussing something, and whether enduring suffering increases happiness. It also raises questions about whether jokes, secrets, and hidden truths should be shared with everyone. The document concludes with a question about whether prolonged suffering ultimately leads to greater happiness.'}

--- CSV Summary ---
 {'text': 'The document likely discusses information related to the accounting and finance industry, which may include financial reporting, taxation, auditing, and other financial services. It may also cover industry trends, regulato