Mini Agent: a small multi-step agent pipeline built with Python, LangChain, and OpenAI (or a compatible LLM).

1. 🏗️ Setup

In [15]:
pip install langchain openai pypdf faiss-cpu



In [16]:
!pip install --upgrade langchain



In [17]:
!pip install --upgrade langchain langchain-community



In [18]:
!pip install unstructured langchain-unstructured




In [19]:
!pip install "unstructured[all-docs]" pdfminer.six



In [20]:
import os
os.environ["OPENAI_API_KEY"] = ""      # Get from https://platform.openai.com/account/api-keys
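
Leaving the key as an empty string makes every later OpenAI call fail with an authentication error. One way to avoid hard-coding the key in the notebook is to prompt for it only when the environment variable is unset. The sketch below is an illustration, not part of the original notebook: the helper name `ensure_api_key` and its `prompt_fn` parameter are hypothetical.

```python
import os
from getpass import getpass


def ensure_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, prompting only when it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value so the key never appears in cell output
        key = prompt_fn(f"Enter {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} is required but was left empty")
        os.environ[env_var] = key
    return key
```

In the notebook, calling `ensure_api_key()` once before building any chain would replace the assignment above.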

2. 📄 Load PDF & Chunk

In [21]:
# 1. Install required packages
!pip install -qU langchain-community "unstructured[all-docs]" pdfminer.six

# ⚠️ After install, restart the runtime (in Colab: Runtime → Restart runtime)

# 2. Download PDF using the **actual raw URL**
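import logging
import requests

# Without an INFO-level handler the logging.info() calls below print nothing;
# force=True replaces any handler the notebook runtime installed earlier
logging.basicConfig(level=logging.INFO, force=True)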

PDF_URL = "https://raw.githubusercontent.com/baheldeepti/AgenticAI/main/Sample%20Business%20Report.pdf"
PDF_FILE = "Sample_Business_Report.pdf"

resp = requests.get(PDF_URL, timeout=30)
resp.raise_for_status()
with open(PDF_FILE, "wb") as f:
    f.write(resp.content)
logging.info(f"✅ PDF downloaded to '{PDF_FILE}'")

# 3. Load & split using UnstructuredPDFLoader
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def load_and_split_pdf(file_path, chunk_size=1000, chunk_overlap=200):
    loader = UnstructuredPDFLoader(file_path, mode="elements")
    docs = list(loader.lazy_load())

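    # Add metadata. In "elements" mode each doc is one element, not one page,
    # so the element index is not a page number. Unstructured normally records
    # the real page under "page_number"; copy it when present.
    for doc in docs:
        doc.metadata.setdefault("source", file_path)
        doc.metadata["page"] = doc.metadata.get("page_number")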

    logging.info(f"✅ Loaded {len(docs)} elements from '{file_path}'")

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len
    )
    chunks = splitter.split_documents(docs)
    logging.info(f"✅ Split into {len(chunks)} chunks")
    return chunks

# Execute loading and splitting
chunks = load_and_split_pdf(PDF_FILE)

# 4. Inspect a sample chunk
sample = chunks[0]
print(f"--- Chunk (page {sample.metadata['page']}) preview ---")
print(sample.page_content[:500] + "…")

--- Chunk (page 1) preview ---
Format your paper according to your assignment instructions: APA, MLA, Chicago Style…
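
The preview above shows template text rather than report content, so it can help to look at how the chunks are distributed before indexing them. The following is a minimal sketch, assuming each chunk exposes `page_content` and a `metadata` dict with a `"page"` key as in the loader above. The helper name `summarize_chunks` is hypothetical, and the demo objects are placeholders so the block runs without LangChain.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_chunks(chunks):
    """Return chunk count, mean length, and chunks per page for Document-like objects."""
    lengths = [len(c.page_content) for c in chunks]
    pages = Counter(c.metadata.get("page") for c in chunks)
    return {
        "count": len(chunks),
        "mean_chars": sum(lengths) / len(lengths) if lengths else 0.0,
        "per_page": dict(pages),
    }


# Placeholder objects standing in for LangChain Documents (not real report data)
demo = [
    SimpleNamespace(page_content="aaaa", metadata={"page": 1}),
    SimpleNamespace(page_content="bb", metadata={"page": 1}),
    SimpleNamespace(page_content="cccccc", metadata={"page": 2}),
]
print(summarize_chunks(demo))
```

In the notebook, `summarize_chunks(chunks)` would report on the real chunks. A very low mean length suggests the index is dominated by short elements such as headings.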


3. 🔍 ResearchAgent: Extract Key Metrics

In [26]:
!pip install -qU sentence-transformers langchain langchain-openai langchain-huggingface faiss-cpu



In [29]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI  # current package for ChatOpenAI

# Local embeddings (no OpenAI API calls needed for indexing)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)
# The retriever reads k from search_kwargs; a bare k= argument is not applied
retriever = db.as_retriever(search_kwargs={"k": 4})

# Use ChatOpenAI for chat-based models
research_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
    chain_type="stuff",
    retriever=retriever,
    input_key="query",
)

# run() is deprecated; invoke() returns a dict with the answer under "result"
metrics = research_chain.invoke(
    {"query": "What are the main metrics or numbers reported?"}
)["result"]
print("📊 Metrics:", metrics)




📊 Metrics: The main metrics or numbers reported in a business report can vary depending on the specific focus or purpose of the report. Common metrics that are often included in business reports are financial metrics like revenue, profit margins, expenses, and cash flow. Other common metrics can include sales figures, market share, customer satisfaction scores, employee productivity metrics, and any other key performance indicators relevant to the business being analyzed.
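
The answer above is generic and cites no figures from the report, which usually means the retrieved chunks did not contain any. Printing what the retriever returns for the same query is a quick way to check. This is a sketch under assumptions: the helper name `preview_context` is hypothetical, and it only relies on each document having `page_content` and `metadata`.

```python
from types import SimpleNamespace


def preview_context(docs, width=120):
    """Format each retrieved Document-like object as '[rank] page N: text'."""
    lines = []
    for rank, doc in enumerate(docs, start=1):
        text = " ".join(doc.page_content.split())  # collapse runs of whitespace
        if len(text) > width:
            text = text[:width] + "..."
        page = doc.metadata.get("page", "?")
        lines.append(f"[{rank}] page {page}: {text}")
    return "\n".join(lines)


# Placeholder document so the sketch runs on its own (not real report data)
print(preview_context([SimpleNamespace(page_content="a  b\nc", metadata={"page": 3})]))
```

In the notebook, and assuming a LangChain version where retrievers support `invoke`, `print(preview_context(retriever.invoke("What are the main metrics or numbers reported?")))` would show the chunks the chain actually saw.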


4. 💡 InsightAgent: Derive Observations

In [31]:
# ✅ Imports
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# 1️⃣ Define the prompt
insight_prompt = PromptTemplate.from_template(
    "Here are the metrics:\n{metrics}\nWhat are the key insights and trends you can derive?"
)

# 2️⃣ Choose an accessible chat model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

# 3️⃣ Build a RunnableSequence replacing deprecated LLMChain
chain = insight_prompt | llm | StrOutputParser()

# 4️⃣ Run the chain with the `metrics` string produced by the ResearchAgent step
result = chain.invoke({"metrics": metrics})

print("🔍 Insights:\n", result)


🔍 Insights:
 1. Revenue growth: By analyzing revenue figures over time, you can identify trends in the company's financial performance and determine if the business is growing or declining.

2. Profit margins: Monitoring profit margins can help you understand the efficiency of the company's operations and assess its profitability.

3. Sales figures: Tracking sales figures can provide insights into the demand for the company's products or services and help identify potential growth opportunities.

4. Market share: Comparing the company's market share to competitors can help assess its competitive position in the industry and identify areas for improvement.

5. Customer satisfaction scores: Monitoring customer satisfaction scores can help identify areas where the company is excelling and areas where it may need to make improvements to better meet customer needs.

6. Employee productivity metrics: Analyzing employee productivity metrics can help identify areas where the company may need t

5. 🎯 StrategyAgent: Recommend Next Steps

In [34]:
# 1️⃣ Insights step using ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

insight_prompt = PromptTemplate.from_template(
    "Here are the metrics:\n{metrics}\nWhat are the key insights and trends you can derive?"
)
insight_chain = insight_prompt | ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7) | StrOutputParser()

# Run insights chain and store result in insights_text
insights_text = insight_chain.invoke({ "metrics": metrics })  # make sure `metrics` exists
print("🔍 Insights:\n", insights_text)


# 2️⃣ Strategy step using the stored insights_text
strategy_prompt = PromptTemplate.from_template(
    "Insights:\n{insights}\nBased on these, suggest 3 strategic next steps the business should take."
)
strategy_chain = strategy_prompt | ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7) | StrOutputParser()

# Run strategy chain using insights_text
strategies_text = strategy_chain.invoke({ "insights": insights_text })
print("🚀 Strategy Suggestions:\n", strategies_text)


🔍 Insights:
 Key insights and trends that can be derived from these metrics include:

1. Financial performance: By analyzing revenue, profit margins, expenses, and cash flow, businesses can determine their overall financial health and identify any areas for improvement.

2. Sales trends: Sales figures can provide insights into customer demand and market trends, helping businesses make informed decisions about their product offerings and marketing strategies.

3. Market share: Monitoring market share can help businesses understand their competitive position and track their growth relative to industry peers.

4. Customer satisfaction: Customer satisfaction scores can indicate how well a business is meeting the needs and expectations of its customers, leading to insights on areas for improvement and opportunities for growth.

5. Employee productivity: Tracking employee productivity metrics can help businesses identify opportunities to streamline operations, improve efficiency, and boost o

6. 🧩 Compose Full Pipeline and Logging

In [35]:
# 1️⃣ Install dependencies
!pip install -qU langchain langchain-community langchain-openai langchain-huggingface sentence-transformers "unstructured[all-docs]" pdfminer.six faiss-cpu

# 2️⃣ Imports and logging setup
import logging
from urllib.parse import unquote

import requests
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# force=True replaces any handler the notebook runtime installed earlier
logging.basicConfig(level=logging.INFO, force=True)

# 3️⃣ Loader function (download + split)
def load_and_split_pdf(pdf_url: str, chunk_size=1000, overlap=200):
    resp = requests.get(pdf_url, timeout=30)
    resp.raise_for_status()
    # Decode percent-escapes so "%20" in the URL does not end up in the file name
    local_file = unquote(pdf_url.split("/")[-1])
    with open(local_file, "wb") as f:
        f.write(resp.content)
    logging.info(f"✅ Downloaded PDF to {local_file}")

    loader = UnstructuredPDFLoader(local_file, mode="elements")
    docs = list(loader.lazy_load())
    # Each doc is one element, not one page; use Unstructured's page_number when present
    for d in docs:
        d.metadata["source"] = local_file
        d.metadata["page"] = d.metadata.get("page_number")

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=overlap, length_function=len
    )
    chunks = splitter.split_documents(docs)
    logging.info(f"📄 Created {len(chunks)} chunks")
    return chunks

# 4️⃣ Prompt templates
insight_prompt = PromptTemplate.from_template(
    "Here are the metrics:\n{metrics}\nWhat are the key insights and trends you can derive?"
)
strategy_prompt = PromptTemplate.from_template(
    "Insights:\n{insights}\nBased on these, suggest 3 strategic next steps the business should take."
)

# 5️⃣ Build reusable pipelines
def research_agent(chunks, k=4):
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    db = FAISS.from_documents(chunks, embeddings)
    # The retriever reads k from search_kwargs; a bare k= argument is not applied
    retriever = db.as_retriever(search_kwargs={"k": k})
    qa = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model="gpt-3.5-turbo"),
        chain_type="stuff",
        retriever=retriever,
    )
    # run() is deprecated; invoke() returns a dict with the answer under "result"
    metrics = qa.invoke({"query": "List key metrics or numbers in this report."})["result"]
    logging.info("📊 Research completed")
    return metrics

def insight_agent(metrics):
    chain = insight_prompt | ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7) | StrOutputParser()
    insights = chain.invoke({"metrics": metrics})
    logging.info("🔍 Insight generated")
    return insights

def strategy_agent(insights):
    chain = strategy_prompt | ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7) | StrOutputParser()
    strategies = chain.invoke({"insights": insights})
    logging.info("🚀 Strategy suggested")
    return strategies

# 6️⃣ Orchestrator
def run_pipeline(url):
    chunks = load_and_split_pdf(url)
    metrics = research_agent(chunks)
    insights = insight_agent(metrics)
    strategies = strategy_agent(insights)
    return metrics, insights, strategies

# 7️⃣ Execution block
if __name__ == "__main__":
    URL = "https://raw.githubusercontent.com/baheldeepti/AgenticAI/main/Sample%20Business%20Report.pdf"
    metrics, insights, strategies = run_pipeline(URL)

    print("\n📊 Metrics:\n", metrics)
    print("\n🔍 Insights:\n", insights)
    print("\n🚀 Strategies:\n", strategies)

    with open("all_results.txt", "w", encoding="utf-8") as f:
        f.write("METRICS:\n" + metrics + "\n\n")
        f.write("INSIGHTS:\n" + insights + "\n\n")
        f.write("STRATEGIES:\n" + strategies)
    logging.info("✅ All results saved to 'all_results.txt'")



📊 Metrics:
 I don't have the specific key metrics or numbers from the report as they are not provided in the context.

🔍 Insights:
 Without specific key metrics or numbers, it is difficult to provide specific insights and trends. However, some general insights and trends that could potentially be derived from a report could include:

- Overall performance: Assessing whether performance has improved or declined over a certain period of time.
- Market trends: Identifying any shifts or changes in the market that could impact the business.
- Customer behavior: Understanding how customers are interacting with the business and any changes in their preferences.
- Operational efficiency: Evaluating how efficiently the business is operating and if there are any areas for improvement.
- Financial health: Assessing the financial stability and profitability of the business.
- Competitive landscape: Understanding how the business is positioned relative to its competitors and any emerging threats o
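
In the run above the ResearchAgent found no metrics, yet the insight and strategy steps still ran and produced generic text. A simple guard between the steps can stop the pipeline early in that case. The sketch below is a heuristic only: the function name `looks_grounded` and the phrases in `REFUSAL_MARKERS` are assumptions to be tuned for the model in use.

```python
import re

# Hypothetical refusal phrases; adjust them to the wording your model produces
REFUSAL_MARKERS = ("i don't have", "i do not have", "not provided in the context")


def looks_grounded(answer):
    """Heuristic: a metrics answer should contain a digit and no refusal phrase."""
    lowered = answer.lower().replace("\u2019", "'")  # normalize curly apostrophes
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        return False
    return re.search(r"\d", answer) is not None
```

Inside `run_pipeline`, a check such as `if not looks_grounded(metrics): raise ValueError("No metrics retrieved")` placed right after `research_agent(chunks)` would keep the later agents from elaborating on an empty result.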