# Agentic RAG with the ReAct Pattern

Overview
This notebook demonstrates how to build an agentic Retrieval-Augmented Generation (RAG) pipeline that follows the ReAct pattern (Reasoning + Acting). The agent coordinates multiple tools to collect evidence and compose answers to complex, multi-part questions. The notebook is structured as a hands-on tutorial: setup → data ingestion → vector store & reranking → custom tools → ReAct agent construction → execution and cost reporting.

Learning goals
- Understand how to prepare and chunk source documents for embeddings.
- Build a FAISS-backed vector store and add a reranker to improve retrieval quality.
- Implement custom tools (web search, company director extraction, vector reranker).
- Compose a ReAct agent that reasons about which tools to call.
- Track API token usage and estimated cost for agent runs.

Prerequisites
- Python 3.10+ and pip (the LangChain 1.x releases used in this notebook require Python 3.10 or newer)
- OpenAI API access (for embeddings and LLMs)
- SerpAPI account (for optional web search)
- Optional: LangSmith account for LangChain tracing

Notebook structure
1. Installation & environment setup
2. Document ingestion and chunking
3. Vector store creation and reranking
4. Tool definitions and agent assembly
5. Running the agent on a complex query
6. Token usage and cost reporting

## Summary & Key Findings

Token usage and cost (example)
- Total Tokens Used: (value reported from run)
- Estimated Cost: (reported estimate)

Key takeaways
- Instrumenting LangChain runs with `get_openai_callback` helps quantify LLM costs.
- Combining vector retrieval with reranking improves answer relevance while controlling LLM context size.
- Use caching for repeated external lookups to reduce API calls and cost.

Next steps
- Add a per-tool token logger to measure how much each tool contributes to token usage.
- Experiment with smaller embedding or LLM models for non-critical steps to save costs.
- Add unit tests for tool outputs and a small sample dataset for fast, reproducible demos.


In [None]:
!pip install langchain langchain_core langchain_community langchain_openai langchain_classic langchain_text_splitters faiss-cpu openai flashrank google-search-results python-dotenv -U



In [None]:
import langchain
import langchain_core
import langchain_community
import openai

print(f"langchain version: {langchain.__version__}")
print(f"langchain_core version: {langchain_core.__version__}")
print(f"langchain_community version: {langchain_community.__version__}")
print(f"openai version: {openai.__version__}")

langchain version: 1.0.5
langchain_core version: 1.0.5
langchain_community version: 0.4.1
openai version: 2.8.1


## API Keys & Secrets

This notebook requires API keys for the following services:
- OpenAI: used for embeddings and LLMs.
- SerpAPI: optional, used for web search results.
- LangChain / LangSmith: optional, used for tracing and visualization.

Options for providing keys
- Colab/Notebook secret manager (recommended for shared notebooks).
- A local `.env` file (for local runs). Example `.env` content:
    OPENAI_API_KEY=sk-...
    SERPAPI_API_KEY=...
    LANGCHAIN_API_KEY=...

Security best practices
- Never hard-code keys in the notebook.
- Use environment variables or secret stores.
- Rotate keys regularly and restrict access whenever possible.
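
A minimal sketch of a presence check, assuming the key names from the `.env` example above. `missing_keys` is a hypothetical helper, not part of the notebook; it reports only which names are missing and never prints a key's value.

```python
import os
from typing import Iterable, List, Mapping


def missing_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# An explicit mapping is used here so the check can be tried without real keys.
example_env = {"OPENAI_API_KEY": "sk-example", "SERPAPI_API_KEY": ""}
print(missing_keys(["OPENAI_API_KEY", "SERPAPI_API_KEY", "LANGCHAIN_API_KEY"], example_env))
# prints ['SERPAPI_API_KEY', 'LANGCHAIN_API_KEY']
```

Calling `missing_keys([...])` without the second argument checks the real process environment.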

In [None]:
import os

# Check if we're in a Colab environment
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False


if IN_COLAB:
    from google.colab import userdata
    # Set environment variables from Colab secrets
    os.environ["OPENAI_API_KEY"] = userdata.get('OPEN_AI_KEY')
    os.environ["SERPAPI_API_KEY"] = userdata.get('SERP_KEY')
    os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANG_KEY')
else:
    # For local execution, load variables from a .env file
    from dotenv import load_dotenv
    load_dotenv()  # Looks for a .env file starting from the current directory and loads it into os.environ

    # Validate that the expected keys are present (names match the .env example above)
    for key_name in ("OPENAI_API_KEY", "SERPAPI_API_KEY", "LANGCHAIN_API_KEY"):
        if not os.getenv(key_name):
            print(f"Warning: {key_name} is not set; features that depend on it will fail.")


## LangChain Tracing Configuration

LangChain can emit execution traces to LangSmith (or other tracing backends). Tracing helps you inspect the agent's decisions, which tools were invoked, and how the final answer was composed.

Why enable tracing?
- Debug complex agent workflows.
- Visualize tool calls and intermediate reasoning steps.
- Collect telemetry for prompt engineering and cost optimization.

How to use:
- Set `LANGCHAIN_TRACING_V2=true` and configure `LANGCHAIN_ENDPOINT` and `LANGCHAIN_PROJECT`.
- Log into LangSmith (or the configured tracing UI) to view runs once the agent executes.

In [None]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "agentic_rag_complex"

## Imports and Dependencies — What and Why

This cell imports libraries used across the notebook:
- LangChain family: core agent and prompt handling.
- LangChain community modules: loaders, vectorstore integrations, rerankers.
- Embeddings & LLM clients (OpenAI).
- Utilities for text splitting and reranking.

Notes
- If a package is unavailable or updated, check its package name/version and adapt imports.
- Installing via pip may require `--upgrade` to match API expectations.

In [None]:
# Standard library imports
import re
from typing import List, Dict, Any, Tuple
from pprint import pprint

# LangChain core imports
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document
from langchain_core.tools import BaseTool

# LangChain community imports
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# LangChain OpenAI imports
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

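# Flashrank for reranking
from flashrank import Ranker, RerankRequest
import langchain_community.document_compressors.flashrank_rerank as fr_mod
# Workaround: expose RerankRequest on the LangChain wrapper module, which may
# otherwise fail to resolve the name at runtime with some package versions.
fr_mod.RerankRequest = RerankRequest
from langchain_community.document_compressors import FlashrankRerank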

# For retriever with compression
from langchain_classic.retrievers import ContextualCompressionRetriever

# Agent imports
from langchain_classic.agents import create_react_agent, AgentExecutor
from langchain_community.utilities import SerpAPIWrapper
from langchain_core.output_parsers import CommaSeparatedListOutputParser



# Configuration Dictionary — Purpose & How to Customize

The `defaultConfig` dictionary centralizes all runtime parameters.
Key configuration groups:
- Document processing: `chunkSize`, `chunkOverlap`
- Embedding model: `embeddingModelName`
- Retrieval: `numRetrievedDocuments`, `numSelectedDocuments`
- Reranking: `rerankerModel`, `numRerankedDocuments`
- LLM behavior: model names and temperatures for ReAct and name extraction
- Tools: toggles and descriptions for director, web, and retriever tools

How to tune
- Reduce `chunkSize` to shrink each retrieved chunk, which lowers the amount of LLM context consumed per document.
- Increase `numRetrievedDocuments` for broader recall, then rely on the reranker to keep only the best `numRerankedDocuments`.
- Lower the LLM temperature (toward 0) for more deterministic outputs when extracting structured data; the name-extraction default here is 0.4.
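
One way to apply such tuning without editing the defaults in place is sketched below. `with_overrides` is a hypothetical helper and `base_config` is a trimmed stand-in for the notebook's dictionary, with placeholder URLs. The sketch uses a deep copy because a plain `dict.copy()` is shallow: nested lists such as `companyFilingUrls` would still be shared with the original.

```python
import copy
from typing import Any, Dict


def with_overrides(base: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a deep copy of `base` with `overrides` applied; reject unknown keys."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    cfg = copy.deepcopy(base)
    cfg.update(overrides)
    return cfg


# Trimmed stand-in for the notebook's configuration (placeholder URL).
base_config = {
    "chunkSize": 500,
    "chunkOverlap": 50,
    "numRetrievedDocuments": 12,
    "companyFilingUrls": [("Tesla", "https://example.com/tsla")],
}

tuned = with_overrides(base_config, chunkSize=300, numRetrievedDocuments=20)
tuned["companyFilingUrls"].append(("Other", "https://example.com/other"))

# The base config is untouched, including its nested list.
print(tuned["chunkSize"], len(base_config["companyFilingUrls"]))  # prints 300 1
```

Rejecting unknown keys catches typos such as `chunksize`, which would otherwise be silently ignored by code that reads `chunkSize`.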

In [None]:
defaultConfig = {
    # Document processing settings
    "chunkSize": 500,
    "chunkOverlap": 50,
    "userAgentHeader": "YourCompany-ResearchBot/1.0 (your@email.com)",

    # Embedding model (OpenAI)
    "embeddingModelName": "text-embedding-3-small",

    # Vector store settings
    "numRetrievedDocuments": 12,

    # Document formatter settings
    "numSelectedDocuments": 12,

    # Reranker settings (Flashrank)
    "rerankerModel": "ms-marco-TinyBERT-L-2-v2",
    "numRerankedDocuments": 5,

    # Model settings for answer generation
    "ragAnswerModel": "gpt-4o",
    "ragAnswerModelTemperature": 0.7,

    # URLs to process - Multiple 10-K filings
    "companyFilingUrls": [
        ("Tesla", "https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm"),
        ("General Motors", "https://www.sec.gov/Archives/edgar/data/1467858/000146785824000031/gm-20231231.htm")
    ],

    # RAG prompt template
    "ragPromptTemplate": """
    Give an answer for the Question using only the given Context. Use information relevant to the query from the entire context.
    Provide a detailed answer with thorough explanations, avoiding summaries.

    Question: {question}

    Context: {context}

    Answer:
    """,

    # ReAct Agent settings
    "reactModelName": "gpt-4o",
    "reactModelTemperature": 0,

    "reactPromptTemplate": """Your task is to gather relevant information to build context for the question. Focus on collecting details related to the question.
    Gather as much context as possible before formulating your answer.

    You have access to the following tools:

    {tools}

    Use the following format:

    Question: the input question you must answer

    Thought: you should always think about what to do

    Action: the action to take, should be one of [{tool_names}]

    Action Input: the input to the action

    Observation: the result of the action

    ... (this Thought/Action/Action Input/Observation can repeat N times)

    Thought: I now know the final answer

    Final Answer: the final answer to the question.

    Follow these steps:

    Begin!

    Question: {input}

    Thought:{agent_scratchpad}
    """,

    "reactVerbosity": True,

    # Name Extraction settings
    "nameExtractionModel": "gpt-4o-mini",
    "nameExtractionModelTemperature": 0.4,
    "nameExtractionPrompt": """
    Extract and list the names of all individuals with the title 'Director' from the following text, excluding any additional information such as dates or signatures.
    Present the names as a simple, comma-separated list.

    {text}
    """,

    # Tool settings
    "useDirectorTool": True,
    "directorToolName": "Company Directors Information",
    "directorToolDescription": "Retrieve the names of company directors for a chosen company. Optionally, their LinkedIn handles can also be included. Use the format: company_name, true/false.",

    "useWebTool": True,
    "webToolName": "WebSearch",
    "webToolDescription": "Performs a web search on the query.",
    "numWebToolResults": 3,

    "useRetrieverTool": True,
    "retrieverToolName": "Vector Reranker Search",
    "retrieverToolDescription": "Retrieves information from an embedding based vector DB containing financial data and company information. Structure query as a sentence.",
    "numRetrieverToolResults": 3
}

In [None]:
config = defaultConfig.copy()

# Document Processing Functions — Rationale & Behavior

This section defines functions to:
- Fetch company filings (SEC 10-Ks) using a web loader.
- Preserve a short snippet expected to contain board/director information.
- Split large documents into overlapping chunks suitable for embeddings.

Why chunking matters
- LLMs and embedding models operate on limited context, so documents are split into chunks.
- Overlap ensures important sentences spanning boundaries are retained.
- Metadata (e.g., company name) is attached to each chunk for later attribution in search results.
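
The effect of chunk size and overlap can be seen in a small, dependency-free sketch. This is a simplified fixed-window splitter written for illustration (`chunk_with_overlap` is a hypothetical name); `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, so its boundaries will differ.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split `text` into fixed-size windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for piece in chunk_with_overlap(sample, chunk_size=10, overlap=3):
    print(piece)
# prints:
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each window repeats the last three characters of the previous one, which is how a sentence that straddles a boundary stays intact in at least one chunk.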

In [None]:
def load_and_process_filings(urls: List[Tuple[str, str]], config: Dict[str, Any]) -> Tuple[List[Document], Dict[str, str]]:
    """
    Load and process company filings from URLs.

    Args:
        urls: List of tuples (company_name, url)
        config: Configuration dictionary

    Returns:
        Tuple of (processed_chunks, director_sections)
    """
    processed_chunks = []
    director_sections = {}

    # Create text splitter
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=config["chunkSize"],
        chunk_overlap=config["chunkOverlap"]
    )

    for company, url in urls:
        try:
            print(f"Loading {company} filing from {url}")

            # Load document using WebBaseLoader
            loader = WebBaseLoader(
                url,
                header_template={'User-Agent': config["userAgentHeader"]}
            )
            docs = loader.load()

            # Store the last 1000 characters for director extraction
            if docs:
                director_sections[company] = docs[0].page_content[-1000:]

            # Split documents into chunks
            chunks = splitter.transform_documents(docs)

            # Add company metadata to each chunk
            for chunk in chunks:
                chunk.metadata["company"] = company

            processed_chunks.extend(chunks)
            print(f"Processed {len(chunks)} chunks from {company}")

        except Exception as e:
            print(f"Error processing {company} from {url}: {str(e)}")
            continue

    print(f"Total processed chunks: {len(processed_chunks)}")
    return processed_chunks, director_sections

# Vector Store & Retriever — Embeddings, FAISS, and Reranking

Overview
- Create embeddings for each chunk using an embedding model (OpenAI).
- Store vectors in FAISS for efficient similarity search.
- Use a reranker (Flashrank) to re-score the top-k search results for higher relevance.

Why rerank?
- Initial vector similarity finds semantically close chunks, but results can include noisy matches.
- A lightweight reranker (supervised or distilled model) improves the ordering of candidate documents before passing them as context to the LLM.
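
The two-stage idea can be sketched without any external services. Everything below is illustrative: the 2-dimensional vectors are made up, and `word_overlap` is a toy stand-in for a learned reranker such as Flashrank, not how Flashrank scores passages.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    query_text: str,
    corpus: List[Tuple[str, Sequence[float]]],
    k: int,
    top_n: int,
    rerank_score: Callable[[str, str], float],
) -> List[str]:
    # Stage 1: cheap vector similarity selects k candidates.
    candidates = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]
    # Stage 2: a scorer that reads the text reorders the candidates; keep top_n.
    reranked = sorted(candidates, key=lambda item: rerank_score(query_text, item[0]), reverse=True)
    return [text for text, _ in reranked[:top_n]]


def word_overlap(query: str, passage: str) -> float:
    """Toy reranker: fraction of query words that appear in the passage."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


corpus = [
    ("Revenue grew in the automotive segment", [0.9, 0.1]),
    ("The board of directors met in June", [0.8, 0.3]),
    ("Weather was mild this year", [0.1, 0.9]),
]
print(retrieve_then_rerank([1.0, 0.0], "board of directors", corpus,
                           k=2, top_n=1, rerank_score=word_overlap))
# prints ['The board of directors met in June']
```

In this example the vector stage ranks the revenue passage first, and the second stage promotes the passage that actually mentions the board.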

In [None]:
def create_vector_store(chunks: List[Document], config: Dict[str, Any]) -> FAISS:
    """
    Create a FAISS vector store with OpenAI embeddings.

    Args:
        chunks: List of document chunks
        config: Configuration dictionary

    Returns:
        FAISS vector store
    """
    print("Creating vector store with OpenAI embeddings...")

    # Create embedding function
    embedding_function = OpenAIEmbeddings(
        model=config["embeddingModelName"]
    )

    # Create FAISS vector store
    vectorstore = FAISS.from_documents(chunks, embedding_function)

    print("Vector store created successfully")
    print(f"Number of vectors: {vectorstore.index.ntotal}")
    print(f"Vector dimension: {vectorstore.index.d}")

    return vectorstore

In [None]:
def create_retriever_with_reranking(vectorstore: FAISS, config: Dict[str, Any]):
    """
    Create a retriever with Flashrank reranking.

    Args:
        vectorstore: FAISS vector store
        config: Configuration dictionary

    Returns:
        ContextualCompressionRetriever with reranking
    """
    print("Creating retriever with Flashrank reranking...")

    # Create base retriever
    base_retriever = vectorstore.as_retriever(
        search_kwargs={"k": config["numRetrievedDocuments"]}
    )

    # Create Flashrank reranker
    model_name = config["rerankerModel"]
    top_n = config["numRerankedDocuments"]

    ranker_client = Ranker(model_name=model_name)
    reranker = FlashrankRerank(client=ranker_client, model=model_name, top_n=top_n)

    # Create compression retriever with reranker
    compression_retriever = ContextualCompressionRetriever(
        base_retriever=base_retriever,
        base_compressor=reranker
    )

    print("Retriever with reranking created successfully")
    return compression_retriever

# Custom Tools for the ReAct Agent

We implement three tools used by the agent:
1. CompanyDirectorsTool — extracts board/director names from filings and optionally finds LinkedIn profiles.
2. WebSearchTool — performs web searches (SerpAPI) to gather current information.
3. VectorRerankerSearchTool — queries the vector store and returns reranked document snippets.

Design considerations
- Each tool should return concise, machine- and human-readable output.
- Tools are wrapped to keep the agent's core LLM stateless and auditable.
- Cache external lookups (e.g., LinkedIn) to reduce API usage and cost.
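
A minimal sketch of the caching idea, assuming a lookup function that takes a name and a company. `LookupCache` and `fake_profile_lookup` are hypothetical names, and the lookup returns a placeholder URL instead of calling a search API.

```python
from typing import Callable, Dict, Tuple


class LookupCache:
    """Memoize an expensive lookup keyed by (name, company)."""

    def __init__(self, lookup: Callable[[str, str], str]):
        self._lookup = lookup
        self._store: Dict[Tuple[str, str], str] = {}
        self.misses = 0  # number of times the underlying lookup was called

    def get(self, name: str, company: str) -> str:
        # Normalize the key so trivial spelling differences hit the same entry.
        key = (name.strip().lower(), company.strip().lower())
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._lookup(name, company)
        return self._store[key]


def fake_profile_lookup(name: str, company: str) -> str:
    # Hypothetical stand-in for a paid search API call.
    return f"https://example.com/profiles/{name.strip().lower().replace(' ', '-')}"


cache = LookupCache(fake_profile_lookup)
cache.get("Jane Doe", "Acme")
cache.get("jane doe ", "ACME")  # served from the cache
print(cache.misses)  # prints 1
```

A tuple key avoids the ambiguity of a joined string key when a name or company itself contains the separator character.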

In [None]:
# Cache for LinkedIn lookups to avoid repeated API calls
linkedin_cache = {}



### Company Directors Tool — Input & Output

Purpose
- Extracts names of company directors from a provided text snippet.
- Optionally searches for LinkedIn profile URLs for each director.

Input format
- `company_name, true|false` (true to include LinkedIn handles).

Output
- A comma- or semicolon-separated list of names, optionally with LinkedIn URLs.

Privacy & legal note
- Scraping or programmatically collecting personal profile URLs may be subject to terms of service and privacy laws. Use responsibly and respect rate limits and robots.txt policies.
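
The input format described above can be parsed and validated in isolation. This is a sketch, not the tool's code: `parse_director_query` is a hypothetical helper, and unlike the tool it rejects flags other than `true` or `false`. The default of `True` when the flag is omitted mirrors the tool's behavior.

```python
from typing import Tuple


def parse_director_query(query: str, default_linkedin: bool = True) -> Tuple[str, bool]:
    """Parse 'company_name, true|false' into (company_name, include_linkedin)."""
    company, _, flag = query.partition(",")
    company = company.strip()
    if not company:
        raise ValueError("company name is required")
    flag = flag.strip().lower()
    if flag == "":
        return company, default_linkedin  # flag omitted
    if flag not in {"true", "false"}:
        raise ValueError(f"expected 'true' or 'false', got {flag!r}")
    return company, flag == "true"


print(parse_director_query("Tesla, false"))    # prints ('Tesla', False)
print(parse_director_query("General Motors"))  # prints ('General Motors', True)
```

Failing loudly on a malformed flag gives the agent an explicit error to react to instead of a silent fallback.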

In [None]:
class CompanyDirectorsTool(BaseTool):
    """Tool to retrieve company director names and LinkedIn profiles."""

    name: str = "Company Directors Information"
    description: str = "Retrieve the names of company directors for a chosen company. Optionally, their LinkedIn handles can also be included. Use the format: company_name, true/false."

    # Custom attributes
    director_sections: Dict[str, str] = {}
    config: Dict[str, Any] = {}

    def __init__(self, director_sections: Dict[str, str], config: Dict[str, Any]):
        """Initialize the tool with director sections and config."""
        # Update description with available companies
        available_companies = list(director_sections.keys())
        updated_description = f"{config['directorToolDescription']} Available companies: {', '.join(available_companies)}"

        super().__init__(
            director_sections=director_sections,
            config=config,
            description=updated_description
        )

    def _run(self, query: str) -> str:
        """Execute the tool to get director information."""
        try:
            # Parse input
            parts = query.split(',')
            company_name = parts[0].strip()
            include_linkedin = parts[1].strip().lower() == 'true' if len(parts) > 1 else True

            # Get director section
            company_snippet = self.director_sections.get(company_name)
            if not company_snippet:
                return f"No director information found for {company_name}"

            # Extract director names using LLM
            director_names = self._extract_director_names(company_snippet)

            if not director_names:
                return f"Could not extract director names for {company_name}"

            # Get LinkedIn handles if requested
            if include_linkedin:
                director_handles = []
                for name in director_names:
                    linkedin_handle = self._get_linkedin_handle(name, company_name)
                    director_handles.append(f"{name} (LinkedIn: {linkedin_handle})")

                return f"Directors of {company_name}: {'; '.join(director_handles)}"
            else:
                return f"Directors of {company_name}: {', '.join(director_names)}"

        except Exception as e:
            return f"Error retrieving director information: {str(e)}"

    def _extract_director_names(self, text: str) -> List[str]:
        """Extract director names from text using LLM."""
        try:
            llm = ChatOpenAI(
                model=self.config["nameExtractionModel"],
                temperature=self.config["nameExtractionModelTemperature"]
            )
            parser = CommaSeparatedListOutputParser()
            prompt = PromptTemplate.from_template(self.config["nameExtractionPrompt"])

            chain = prompt | llm | parser
            names = chain.invoke({"text": text})
            return names

        except Exception as e:
            print(f"Error extracting names: {str(e)}")
            return []

    def _get_linkedin_handle(self, name: str, company: str) -> str:
        """Get LinkedIn handle for a director."""
        cache_key = f"{name}_{company}"

        # Check cache first
        if cache_key in linkedin_cache:
            return linkedin_cache[cache_key]

        try:
            # Use SerpAPI to search LinkedIn
            search = SerpAPIWrapper()
            results = search.results(f'"{name}" {company} site:linkedin.com/in/')

            # Extract the first organic result, if any (the list can be missing or empty)
            organic_results = results.get("organic_results") or [{}]
            link = organic_results[0].get("link", "Profile not found")

            # Cache the result
            linkedin_cache[cache_key] = link
            return link

        except Exception as e:
            return f"Error finding LinkedIn profile: {str(e)}"

In [None]:
class WebSearchTool(BaseTool):
    """Tool to perform web searches using SerpAPI."""

    name: str = "WebSearch"
    description: str = "Performs a web search on the query."

    # Custom attributes
    config: Dict[str, Any] = {}

    def __init__(self, config: Dict[str, Any]):
        """Initialize the tool with configuration."""
        super().__init__(config=config)

    def _run(self, query: str) -> str:
        """Execute the web search."""
        try:
            search = SerpAPIWrapper()
            results = search.results(query)
            return self._format_results(results)

        except Exception as e:
            return f"Error performing web search: {str(e)}"

    def _format_results(self, results: Dict) -> str:
        """Format search results into readable text."""
        formatted = []
        num_results = self.config["numWebToolResults"]

        for result in results.get("organic_results", [])[:num_results]:
            formatted.append(
                f"Title: {result.get('title', 'N/A')}\n"
                f"Snippet: {result.get('snippet', 'N/A')}\n"
                f"Link: {result.get('link', 'N/A')}"
            )

        return "\n\n".join(formatted)

In [None]:
class VectorRerankerSearchTool(BaseTool):
    """Tool to search the vector database with reranking."""

    name: str = "Vector Reranker Search"
    description: str = "Retrieves information from an embedding based vector DB containing financial data and company information. Structure query as a sentence."

    # Custom attributes
    retriever: Any = None
    config: Dict[str, Any] = {}

    def __init__(self, retriever: Any, config: Dict[str, Any]):
        """Initialize the tool with retriever and config."""
        super().__init__(retriever=retriever, config=config)

    def _run(self, query: str) -> str:
        """Execute the vector search with reranking."""
        try:
            # Retrieve documents
            docs = self.retriever.invoke(query)

            # Format and return top results
            num_results = self.config["numRetrieverToolResults"]
            formatted_docs = []

            for doc in docs[:num_results]:
                company = doc.metadata.get('company', '')
                content = doc.page_content
                formatted_docs.append(f"{company}\n{content}")

            return "\n\n".join(formatted_docs)

        except Exception as e:
            return f"Error retrieving documents: {str(e)}"

# Agent Setup Functions — Composition & Lifecycle

This section defines helper functions to:
- Create tool instances based on configuration and available resources.
- Assemble those tools into the ReAct agent.
- Return an `AgentExecutor` that orchestrates tool calls and LLM reasoning.

Guidance
- Tools should be tested individually before composing into the agent.
- Keep tool interfaces simple and predictable to make agent reasoning clearer.
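
Testing a tool individually can be as simple as feeding its formatting logic a stubbed response. The sketch below reproduces the web search tool's result formatting as a standalone function (`format_search_results` is a hypothetical name) and checks it against made-up data, so no API key or network access is needed.

```python
from typing import Any, Dict


def format_search_results(results: Dict[str, Any], limit: int) -> str:
    """Format the first `limit` organic results as Title/Snippet/Link blocks."""
    blocks = []
    for item in results.get("organic_results", [])[:limit]:
        blocks.append(
            f"Title: {item.get('title', 'N/A')}\n"
            f"Snippet: {item.get('snippet', 'N/A')}\n"
            f"Link: {item.get('link', 'N/A')}"
        )
    return "\n\n".join(blocks)


# Stubbed response with a missing snippet and more items than the limit.
stub = {"organic_results": [
    {"title": "A", "snippet": "first", "link": "https://example.com/a"},
    {"title": "B", "link": "https://example.com/b"},
    {"title": "C", "snippet": "third", "link": "https://example.com/c"},
]}

out = format_search_results(stub, limit=2)
assert out.count("Title:") == 2                   # limit is respected
assert "Snippet: N/A" in out                      # missing fields fall back to N/A
assert format_search_results({}, limit=3) == ""   # empty response yields empty string
print(out)
```

The same pattern applies to the other tools: replace the external call with a stub and assert on the string the agent would see.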

In [None]:
def create_tools(config: Dict[str, Any], director_sections: Dict[str, str], retriever) -> List[BaseTool]:
    """
    Create all tools for the ReAct agent.

    Args:
        config: Configuration dictionary
        director_sections: Dictionary mapping company names to director text sections
        retriever: Retriever with reranking

    Returns:
        List of tools
    """
    tools = []

    # Add Company Directors Tool
    if config.get("useDirectorTool", False):
        director_tool = CompanyDirectorsTool(director_sections, config)
        tools.append(director_tool)
        print(f"Added tool: {director_tool.name}")

    # Add Web Search Tool
    if config.get("useWebTool", False):
        web_tool = WebSearchTool(config)
        tools.append(web_tool)
        print(f"Added tool: {web_tool.name}")

    # Add Vector Reranker Search Tool
    if config.get("useRetrieverTool", False):
        retriever_tool = VectorRerankerSearchTool(retriever, config)
        tools.append(retriever_tool)
        print(f"Added tool: {retriever_tool.name}")

    print(f"Total tools created: {len(tools)}")
    return tools

In [None]:
def create_react_agent_executor(tools: List[BaseTool], config: Dict[str, Any]) -> AgentExecutor:
    """
    Create a ReAct agent executor with the specified tools.

    Args:
        tools: List of tools for the agent
        config: Configuration dictionary

    Returns:
        AgentExecutor
    """
    print("Creating ReAct agent...")

    # Create LLM for the agent
    llm = ChatOpenAI(
        model=config["reactModelName"],
        temperature=config["reactModelTemperature"]
    )

    # Create prompt template
    prompt = PromptTemplate.from_template(config["reactPromptTemplate"])

    # Create ReAct agent
    agent = create_react_agent(llm, tools, prompt)

    # Create agent executor
    agent_executor = AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=config.get("reactVerbosity", True),
        handle_parsing_errors=True
    )

    print("ReAct agent created successfully")
    return agent_executor

# Main Execution Flow — Step-by-step pipeline

This cell runs the full pipeline in a reproducible order:
1. Load and process source documents (SEC filings).
2. Build a FAISS vector store from processed chunks.
3. Create a retriever with reranking capability.
4. Instantiate tools (directors, web search, vector reranker).
5. Create a ReAct agent executor with the tools and a prompt template.
6. Invoke the agent on a complex, multi-part query.

Notes for running
- Running end-to-end requires network access and valid API keys.
- For debugging, run earlier steps (e.g., vector store creation) independently and inspect outputs.

In [None]:
# Step 1: Load and process documents
print("=" * 60)
print("STEP 1: Loading and Processing Documents")
print("=" * 60)
processed_chunks, director_sections = load_and_process_filings(
    config["companyFilingUrls"],
    config
)

# Step 2: Create vector store
print("\n" + "=" * 60)
print("STEP 2: Creating Vector Store")
print("=" * 60)
vectorstore = create_vector_store(processed_chunks, config)

# Step 3: Create retriever with reranking
print("\n" + "=" * 60)
print("STEP 3: Creating Retriever with Reranking")
print("=" * 60)
retriever = create_retriever_with_reranking(vectorstore, config)

# Step 4: Create tools
print("\n" + "=" * 60)
print("STEP 4: Creating Tools for Agent")
print("=" * 60)
tools = create_tools(config, director_sections, retriever)

# Step 5: Create ReAct agent
print("\n" + "=" * 60)
print("STEP 5: Creating ReAct Agent")
print("=" * 60)
agent_executor = create_react_agent_executor(tools, config)

# Step 6: Ask a complex question
print("\n" + "=" * 60)
print("STEP 6: Running Agent with Complex Query")
print("=" * 60)

question = "Who are the directors of Tesla. What are their linkedin handles? What are the financial goals of tesla this year. What is the next auto show that Tesla will participate in."

print(f"\nQuestion: {question}\n")
print("Agent is working...\n")

response = agent_executor.invoke({"input": question})

print("\n" + "=" * 60)
print("FINAL ANSWER")
print("=" * 60)
print(response['output'])

STEP 1: Loading and Processing Documents
Loading Tesla filing from https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm
Processed 942 chunks from Tesla
Loading General Motors filing from https://www.sec.gov/Archives/edgar/data/1467858/000146785824000031/gm-20231231.htm
Processed 1085 chunks from General Motors
Total processed chunks: 2027

STEP 2: Creating Vector Store
Creating vector store with OpenAI embeddings...
Vector store created successfully
Number of vectors: 2027
Vector dimension: 1536

STEP 3: Creating Retriever with Reranking
Creating retriever with Flashrank reranking...
Retriever with reranking created successfully

STEP 4: Creating Tools for Agent
Added tool: Company Directors Information
Added tool: WebSearch
Added tool: Vector Reranker Search
Total tools created: 3

STEP 5: Creating ReAct Agent
Creating ReAct agent...
ReAct agent created successfully

STEP 6: Running Agent with Complex Query

Question: Who are the directors of Tesla. Wha

Tools, Memory, Planning — core agent components

- **Tools**: External actions the agent can call to gather evidence (search, retrieval, special parsers).
- **Memory**: Long- or short-term state the agent may use to persist context across turns (not heavily used here).
- **Planning**: The agent's internal reasoning that decides which tools to call and in what order (ReAct's Thought/Action loop).

Why these matter
- Combining tools with internal planning enables the agent to mix retrieval, external data, and chain-of-thought to solve complex tasks.
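
How the three components interact can be sketched as a toy loop. This is not LangChain's `AgentExecutor`: `run_react` and `scripted_llm` are hypothetical, and the "LLM" is a scripted function so the example runs offline. The scratchpad string plays the role of short-term memory, the regex parsing stands in for planning output, and the `tools` dictionary holds the callable tools.

```python
import re
from typing import Callable, Dict

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)


def run_react(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
              question: str, max_steps: int = 5) -> str:
    """Alternate between model output and tool calls until a final answer appears."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(scratchpad)
        scratchpad += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if not action:
            scratchpad += "Observation: could not parse an action\n"
            continue
        name, arg = action.group(1), action.group(2).strip()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        scratchpad += f"Observation: {observation}\n"
    return "stopped: step limit reached"


def scripted_llm(prompt: str) -> str:
    # Scripted stand-in for a model: call a tool once, then answer.
    if "Observation:" not in prompt:
        return ("Thought: I should look this up.\n"
                "Action: lookup\nAction Input: capital of France")
    return "Thought: I now know the final answer\nFinal Answer: Paris"


tools = {"lookup": lambda query: "Paris is the capital of France."}
print(run_react(scripted_llm, tools, "What is the capital of France?"))  # prints Paris
```

The step limit matters in practice: without it, a model that never emits a final answer would loop indefinitely.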

# Task: Implement token and cost tracking for agent runs

Goal
- Add a helper function `calculate_agent_tokens_and_cost(agent_executor, question)` that uses `get_openai_callback` from `langchain_community.callbacks` to capture token usage and estimated cost for a single agent invocation.

Expected behavior
- Wrap `agent_executor.invoke()` with `get_openai_callback()` context manager.
- Return or print the `total_tokens` and `total_cost` captured by the callback.
- Provide a short example invocation and print the final report.

## Implement Token and Cost Calculation Function — Details

Subtasks
1. Import `get_openai_callback` from `langchain_community.callbacks`.
2. Define `calculate_agent_tokens_and_cost`:
    - Use `with get_openai_callback() as cb: agent_executor.invoke({"input": question})`
    - Retrieve `cb.total_tokens` and `cb.total_cost`.
3. Call the function with a sample question and print a formatted report.

Caveats
- The callback only tracks OpenAI API calls made through LangChain's OpenAI wrappers.
- If the agent triggers other non-OpenAI network calls (e.g., SerpAPI), these are not counted in this token estimate.
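
The cost figure is plain arithmetic over token counts, which is useful to reproduce by hand when the callback's built-in price table does not cover a model. The sketch below assumes per-million-token pricing; the prices are placeholders, not actual OpenAI rates, and `Pricing` and `estimate_cost` are hypothetical names. The token counts would typically come from the callback's `prompt_tokens` and `completion_tokens` attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in USD per one million tokens (placeholder values are used below)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(prompt_tokens: int, completion_tokens: int, pricing: Pricing) -> float:
    """Estimated cost in USD for one run, pricing input and output tokens separately."""
    return (prompt_tokens * pricing.input_per_million
            + completion_tokens * pricing.output_per_million) / 1_000_000


# Placeholder prices; check the provider's current price list before relying on them.
placeholder = Pricing(input_per_million=2.0, output_per_million=8.0)
print(f"${estimate_cost(12_000, 1_500, placeholder):.6f}")  # prints $0.036000
```

Because ReAct resends the growing scratchpad on every step, prompt tokens usually dominate the total for multi-step runs.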

**Reasoning**

When running token/cost measurement, ensure the `agent_executor` and its dependent resources (vector store, retriever, tools) are available in the current runtime. In interactive environments the notebook kernel may have lost prior state; re-run setup cells or instantiate components in the same cell before measurement.


In [None]:
from langchain_community.callbacks import get_openai_callback

def calculate_agent_tokens_and_cost(agent_executor, question: str):
    """
    Calculates the total tokens used and estimated cost for an agent's execution.

    Args:
        agent_executor: The LangChain AgentExecutor instance.
        question: The input question for the agent.

    Returns:
        A tuple containing (total_tokens, estimated_cost).
    """
    total_tokens = 0
    estimated_cost = 0.0

    with get_openai_callback() as cb:
        agent_executor.invoke({"input": question})
        total_tokens = cb.total_tokens
        estimated_cost = cb.total_cost

    return total_tokens, estimated_cost

print("Function `calculate_agent_tokens_and_cost` defined.")

Function `calculate_agent_tokens_and_cost` defined.


**Reasoning — reinitialization approach**

If a previous run failed due to missing objects, re-create the pipeline in the same cell:
- Rebuild processed chunks, vectorstore, retriever, tools, and agent executor.
- Then call `calculate_agent_tokens_and_cost` to capture token usage reliably.


In [None]:
config = defaultConfig.copy()

processed_chunks, director_sections = load_and_process_filings(
    config["companyFilingUrls"],
    config
)

vectorstore = create_vector_store(processed_chunks, config)

retriever = create_retriever_with_reranking(vectorstore, config)

tools = create_tools(config, director_sections, retriever)

agent_executor = create_react_agent_executor(tools, config)

question_for_cost_calculation = "Who are the directors of Tesla. What are their linkedin handles? What are the financial goals of tesla this year. What is the next auto show that Tesla will participate in."

total_tokens, estimated_cost = calculate_agent_tokens_and_cost(agent_executor, question_for_cost_calculation)

print(f"\n{'=' * 60}")
print("TOKEN USAGE AND COST REPORT")
print(f"{'=' * 60}")
print(f"Total Tokens Used: {total_tokens}")
print(f"Estimated Cost: ${estimated_cost:.6f}")
print(f"{'=' * 60}")

Loading Tesla filing from https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm
Processed 942 chunks from Tesla
Loading General Motors filing from https://www.sec.gov/Archives/edgar/data/1467858/000146785824000031/gm-20231231.htm
Processed 1085 chunks from General Motors
Total processed chunks: 2027
Creating vector store with OpenAI embeddings...
Vector store created successfully
Number of vectors: 2027
Vector dimension: 1536
Creating retriever with Flashrank reranking...




Retriever with reranking created successfully
Added tool: Company Directors Information
Added tool: WebSearch
Added tool: Vector Reranker Search
Total tools created: 3
Creating ReAct agent...
ReAct agent created successfully


> Entering new AgentExecutor chain...
To answer the question, I need to gather information on the directors of Tesla, their LinkedIn handles, Tesla's financial goals for this year, and the next auto show Tesla will participate in.

Action: Company Directors Information

Action Input: Tesla, true
Directors of Tesla: Elon Musk (LinkedIn: https://www.linkedin.com/in/elon-musk-tesla-company-b7b00879); Robyn Denholm (LinkedIn: https://au.linkedin.com/in/robyn-denholm-a807795); Ira Ehrenpreis (LinkedIn: https://www.linkedin.com/in/iraehrenpreis); Joseph Gebbia (LinkedIn: Profile not found); James Murdoch (LinkedIn: https://www.linkedin.com/in/jamesrmurdoch); Kimbal Musk (LinkedIn: https://www.linkedin.com/in/kimbal); JB Straubel (L

**Reasoning — verification**

After running the token/cost function:
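- Check that the printed totals are plausible: non-zero, and consistent with the number of LLM calls visible in the agent trace.
- Record the prompt versus completion token breakdown if the callback reports it.
- Use these figures to guide prompt engineering and tool usage changes that reduce cost.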
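
The breakdown mentioned above could be captured with a small helper like the one below. It is a sketch that assumes the callback object exposes `prompt_tokens`, `completion_tokens`, `total_tokens`, and `total_cost` attributes, as LangChain's OpenAI callback handler is understood to do; `summarize_usage` is a hypothetical name, and the demo values are placeholders, not results from this notebook's run.

```python
from types import SimpleNamespace
from typing import Any, Dict

def summarize_usage(cb: Any) -> Dict[str, float]:
    """Collect token counts and cost from a callback-like object."""
    prompt = getattr(cb, "prompt_tokens", 0)
    completion = getattr(cb, "completion_tokens", 0)
    total = getattr(cb, "total_tokens", prompt + completion)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
        # Share of tokens spent on prompts; guards against division by zero.
        "prompt_share": prompt / total if total else 0.0,
        "total_cost": getattr(cb, "total_cost", 0.0),
    }

# Placeholder object with illustrative numbers only.
demo_cb = SimpleNamespace(
    prompt_tokens=800, completion_tokens=200, total_tokens=1000, total_cost=0.003
)
report = summarize_usage(demo_cb)
for key, value in report.items():
    print(f"{key}: {value}")
```

Inside `calculate_agent_tokens_and_cost`, the same helper could be called on `cb` before the `with` block exits. A high prompt share usually points at long tool observations or retrieved context as the main cost driver.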



## Final Task

### Subtask:
Provide the final token usage and cost report for the agent's execution.


## Summary:

### Q&A
*   **Token usage and cost report for the agent's execution:**
    *   Total Tokens Used: 5438
    *   Estimated Cost: \$0.016059

### Data Analysis Key Findings
*   The `calculate_agent_tokens_and_cost` function tracked token usage and cost for the full agent run.
*   The agent used 5438 tokens for the given query, at an estimated cost of \$0.016059.
*   To answer the multi-part question, the agent drew on multiple tools, including `Company Directors Information`, `Vector Reranker Search`, and `WebSearch`.

### Insights or Next Steps
*   Monitoring token usage and cost matters most for LangChain agent deployments with high interaction volumes or complex queries, where per-query costs add up quickly.
*   Future work could break down token usage per tool or per agent step to identify where prompt engineering or tool changes would reduce cost.
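
To put the reported figures in context for higher volumes, the arithmetic can be sketched as follows. The token count and cost are the values reported above; the query volume of 1,000 is an arbitrary assumption, and the projection assumes every query costs about the same as this one, which will not hold for simpler or more complex questions.

```python
# Values taken from the report above.
REPORTED_TOKENS = 5438
REPORTED_COST_USD = 0.016059

def cost_per_1k_tokens(tokens: int, cost_usd: float) -> float:
    """Blended cost per 1,000 tokens (prompt and completion combined)."""
    return cost_usd / tokens * 1000

def project_cost(cost_per_query_usd: float, n_queries: int) -> float:
    """Linear projection, assuming each query costs the same as the sample."""
    return cost_per_query_usd * n_queries

rate = cost_per_1k_tokens(REPORTED_TOKENS, REPORTED_COST_USD)
projected = project_cost(REPORTED_COST_USD, 1000)  # 1,000 queries is an assumed volume

print(f"Blended cost per 1K tokens: ${rate:.6f}")
print(f"Projected cost for 1,000 similar queries: ${projected:.2f}")
```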
