<a href="https://colab.research.google.com/github/koyeliaghosh/GenAI_Agents_Competitive-Intel/blob/main/Competitive_intel_agent_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. Install Required Libraries

In [29]:
%pip install langchain langchain-community faiss-cpu chromadb streamlit langgraph pypdf sentence-transformers



2. Data Ingestion and Indexing

In [31]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import SentenceTransformerEmbeddings
import os

# Path to the financial documents
documents_path = r"/content/sample_data"

# Load and preprocess financial statements
financial_statements = []
for filename in os.listdir(documents_path):
    if filename.endswith(".pdf"):
        loader = PyPDFLoader(os.path.join(documents_path, filename))
        documents = loader.load()
        financial_statements.extend(documents)

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(financial_statements)

# Create embeddings and index
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
index = FAISS.from_documents(texts, embeddings)

# Save the index
index.save_local('financial_statements_index')
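
To make the `chunk_size=1000, chunk_overlap=200` settings concrete, here is a minimal sketch of fixed-size chunking with overlap on a plain string. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks). The function name `sliding_chunks` and the sample text are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 2500  # stand-in for extracted PDF text
print([len(chunk) for chunk in sliding_chunks(sample)])  # [1000, 1000, 900]
```

Each window starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. That overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.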


3. Vector Storage in ChromaDB


In [35]:
import chromadb
from chromadb.utils import embedding_functions

# Initialize ChromaDB client
client = chromadb.Client()

# Create or get the collection
collection_name = "financial_statements"
collection = client.get_or_create_collection(collection_name)

# Add documents to the collection
for i, doc in enumerate(texts):
    collection.add(
        embeddings=[embeddings.embed_query(doc.page_content)],
        metadatas=[{"source": "NVIDIA Q4'24 report"}],
        documents=[doc.page_content],
        ids=[f"chunk-{i}"]  # Position-based ID, unique for every chunk
    )
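
Chunk IDs need to be unique within a collection, and prefixes of the text can easily repeat across chunks (a recurring page header, for example). Below is a minimal sketch of one way to derive stable IDs that stay the same across re-runs. The helper name `make_chunk_id`, the `"report"` source label, and the sample strings are all hypothetical.

```python
import hashlib


def make_chunk_id(source: str, position: int, text: str) -> str:
    """Build an ID from the source name, the chunk position, and a short hash of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"{source}-{position}-{digest}"


# Made-up chunk texts; the first two are identical on purpose.
chunk_texts = ["Revenue grew in Q4.", "Revenue grew in Q4.", "Gross margin was flat."]
ids = [make_chunk_id("report", i, text) for i, text in enumerate(chunk_texts)]

# The position keeps IDs distinct even when two chunks contain the same text.
print(len(set(ids)) == len(ids))
```

Including the hash means a changed chunk gets a new ID, which can help spot stale entries when the source documents are re-indexed.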



4. Mathematical Calculation Tool

In [36]:
def calculate_growth(previous_value, current_value):
    return (current_value - previous_value) / previous_value

def calculate_variance(values):
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)
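
As a quick check of the two helpers, the sketch below restates them with input guards and runs them on small made-up numbers (illustrative values, not figures from any report). The guards and the `safe_` names are assumptions added for illustration. Note that `calculate_variance` divides by `len(values)`, so it computes the population variance; the sketch cross-checks that against `statistics.pvariance`.

```python
import statistics


def safe_growth(previous_value: float, current_value: float) -> float:
    """Relative change from previous_value to current_value (0.25 means +25%)."""
    if previous_value == 0:
        raise ValueError("growth is undefined when previous_value is 0")
    return (current_value - previous_value) / previous_value


def safe_variance(values: list[float]) -> float:
    """Population variance: mean squared deviation from the mean."""
    if not values:
        raise ValueError("variance needs at least one value")
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


print(safe_growth(100.0, 125.0))  # 0.25

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(safe_variance(data))  # 4.0
print(statistics.pvariance(data))  # same quantity from the standard library
```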

5. Agents and LangChain Orchestration

In [41]:
import re  # used by CustomOutputParser below

from langchain import LLMChain, PromptTemplate
from langchain.llms import Ollama
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import StringPromptTemplate
from langchain.schema import AgentAction, AgentFinish

# Define the LLM using Ollama to host LLAMA 3.1
llm = Ollama(model="llama3.1")

# Define the tools
tools = [
    Tool(
        name="RAG Agent",
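        # Return the chunk text so the agent sees a plain string, not Document objects
        func=lambda query: "\n\n".join(doc.page_content for doc in index.similarity_search(query, k=4)),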
        description="Useful for retrieving financial statements."
    ),
    Tool(
        name="Mathematical Calculation",
        func=lambda values: str(calculate_growth(*map(float, values.split(',')))),
        description="Useful for calculating growth between two values. Input must be 'previous_value,current_value', for example '100,125'."
    ),
    Tool(
        name="Inference Agent",
        func=lambda input: llm(input),
        description="Useful for summarizing information."
    )
]

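# Define the prompt template. It lists the tools and spells out the
# Action / Action Input / Final Answer format that the output parser expects.
prompt_template = """
You are a financial analyst. Your task is to analyze financial statements and provide insights.

You have access to the following tools:
{tools}

Use the following format:
Question: the input question you must answer
Thought: what you should do next
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the result of the tool
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Question: {input}
{agent_scratchpad}
"""

# LLMSingleActionAgent passes `intermediate_steps` to the chain, so the prompt
# builds `agent_scratchpad` and the tool listing from it.
class CustomPromptTemplate(StringPromptTemplate):
    template: str
    tools: list

    def format(self, **kwargs) -> str:
        steps = kwargs.pop("intermediate_steps")
        kwargs["agent_scratchpad"] = "".join(
            f"{action.log}\nObservation: {observation}\nThought: "
            for action, observation in steps
        )
        kwargs["tools"] = "\n".join(f"{t.name}: {t.description}" for t in self.tools)
        kwargs["tool_names"] = ", ".join(t.name for t in self.tools)
        return self.template.format(**kwargs)

prompt = CustomPromptTemplate(
    template=prompt_template, tools=tools, input_variables=["input", "intermediate_steps"]
)

# Define the output parser
class CustomOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> AgentAction | AgentFinish:
        if "Final Answer:" in llm_output:
            return AgentFinish(
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)

output_parser = CustomOutputParser()

# Define the agent
agent = LLMSingleActionAgent(
    llm_chain=LLMChain(llm=llm, prompt=prompt),
    output_parser=output_parser,
    stop=["\nObservation:"],
    allowed_tools=[tool.name for tool in tools]
)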

# Define the agent executor
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
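
The parsing logic in `CustomOutputParser` can be exercised without LangChain or a running model. The sketch below reuses the same regular expression on two made-up LLM replies; `parse_step` is a hypothetical stand-in that returns plain tuples instead of `AgentAction` and `AgentFinish` objects.

```python
import re

# Same pattern as in CustomOutputParser.parse
ACTION_PATTERN = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"


def parse_step(llm_output: str) -> tuple[str, str]:
    """Return ("finish", answer) or (tool_name, tool_input) for one LLM reply."""
    if "Final Answer:" in llm_output:
        return "finish", llm_output.split("Final Answer:")[-1].strip()
    match = re.search(ACTION_PATTERN, llm_output, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    tool = match.group(1).strip()
    tool_input = match.group(2).strip(" ").strip('"')
    return tool, tool_input


# Made-up replies in the Action / Action Input / Final Answer format
tool_reply = "Thought: I need the growth rate.\nAction: Mathematical Calculation\nAction Input: 100,125"
final_reply = "Thought: done.\nFinal Answer: Revenue grew 25%."

print(parse_step(tool_reply))   # ('Mathematical Calculation', '100,125')
print(parse_step(final_reply))  # ('finish', 'Revenue grew 25%.')
```

A reply that contains neither an `Action:` block nor `Final Answer:` raises `ValueError`, which is why the prompt has to tell the model to use this format.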



6. Define the Workflow


In [42]:
def execute_workflow(query):
    return agent_executor.run(query)

7. Front End with Streamlit

In [57]:
import streamlit as st

st.title("Competitive Intelligence Analysis")

query = st.text_input("Enter your query:")

if st.button("Analyze"):
    result = execute_workflow(query)
    st.write(result)



**Running the Application**

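Run the Streamlit app. The command below expects the code from step 7 to be saved as `app.py`, along with the definitions it relies on (`execute_workflow` and the agent setup from step 5). Running it before that file exists fails with the error shown in the output.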

In [58]:
!streamlit run app.py

Usage: streamlit run [OPTIONS] TARGET [ARGS]...
Try 'streamlit run --help' for help.

Error: Invalid value: File does not exist: app.py


**Notes**

**Ollama Setup:** Ensure that Ollama is installed and serving Llama 3.1 (the `llama3.1` model used in step 5), and that the Ollama server is reachable from the machine running the application.

**Customization:** You may need to customize the prompt templates, output parsers, and other components based on your specific use case.

**Error Handling:** Add error handling and logging as needed for production use.
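
One possible shape for that error handling is a thin wrapper around the workflow call that logs failures and returns a readable message to the UI. This is a minimal sketch under stated assumptions: `run_safely`, the logger name, and the messages are hypothetical, and `failing_workflow` is a stand-in for `execute_workflow` when the LLM server is down.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("competitive_intel")  # hypothetical logger name


def run_safely(workflow: Callable[[str], str], query: str) -> str:
    """Run workflow(query); log any failure and return a readable message instead of raising."""
    if not query.strip():
        return "Please enter a query."
    try:
        return workflow(query)
    except Exception:
        # logger.exception records the full traceback alongside the message
        logger.exception("Workflow failed for query %r", query)
        return "The analysis failed. Check the logs for details."


def failing_workflow(query: str) -> str:
    """Stand-in for execute_workflow that simulates an unreachable LLM server."""
    raise RuntimeError("LLM server unreachable")


print(run_safely(lambda q: f"echo: {q}", "NVIDIA revenue growth"))
print(run_safely(failing_workflow, "NVIDIA revenue growth"))
```

In the Streamlit cell, the button handler could then call `run_safely(execute_workflow, query)` so a failed agent run shows a message instead of a stack trace.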

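This implementation uses **sentence-transformers** for embeddings and **FAISS** for the vector index that the agent searches. The same chunks are also stored in a **ChromaDB** collection, which the agent does not query. The workflow is managed with the **LangChain framework** directly (`langgraph` is installed but not used): the agent selects a tool, the executor runs it, and the result is fed back to the model until it produces a final answer. You can expand and refine the workflow based on your own requirements and data.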