<h2>Importing Required Libraries</h2>

In [1]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import SystemMessage, HumanMessage
from langchain.tools import Tool
from typing import List, Optional
from sklearn.metrics.pairwise import cosine_similarity
from langchain_community.tools import DuckDuckGoSearchRun
import numpy as np
from datetime import datetime
from sentence_transformers import SentenceTransformer
from langchain.agents import initialize_agent, AgentType
from langchain.document_loaders import UnstructuredURLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain.vectorstores import FAISS
import os



In [2]:
# Keys are redacted here; supply real values through the environment rather than hard-coding them.
os.environ.setdefault("GOOGLE_API_KEY", "<your-google-api-key>")
os.environ.setdefault("HUGGINGFACE_API_KEY", "<your-huggingface-api-key>")

<h2>Data to test email search</h2>

In [3]:
# Sample data; in practice these emails could be fetched with the Gmail API.

emails = [
    {'timestamp': '2025-02-21 09:15:32', 'sender': 'rahul@company.com', 'content': 'Pull Request #10 has been raised for the latest bug fix on validation error for react forms. Please review and approve.'},
    {'timestamp': '2025-02-21 10:05:14', 'sender': 'rajesh@company.com', 'content': 'There is an issue with the API response time. I have raised ticket #5678 for investigation.'},
    {'timestamp': '2025-02-20 11:32:45', 'sender': 'ram@company.com', 'content': 'Assigning task #11 to Rahul for UI improvements. Deadline is Friday.'},
    {'timestamp': '2025-02-20 12:47:21', 'sender': 'raghu@company.com', 'content': 'Pull Request #12 is ready for review. This includes performance optimizations.'},
    {'timestamp': '2025-02-20 14:03:09', 'sender': 'raghav@company.com', 'content': 'Created a new task #13 for backend refactoring. Let me know who can take it.'},
    {'timestamp': '2025-02-20 15:25:34', 'sender': 'raju@company.com', 'content': 'The security vulnerability reported last week has been patched. Pull Request(PR) #14 raised.'},
    {'timestamp': '2025-02-20 16:12:50', 'sender': 'ramu@company.com', 'content': 'The deployment pipeline failed due to a misconfiguration. Issue #100 has been logged.'},
    {'timestamp': '2025-02-19 17:44:12', 'sender': 'roshan@company.com', 'content': 'Assigning ticket #15 to the DevOps team for infrastructure scaling.'},
    {'timestamp': '2025-02-19 18:56:40', 'sender': 'rohit@company.com', 'content': 'Pull Request #16 contains the latest API enhancements. Please review before EOD.'},
    {'timestamp': '2025-02-19 19:30:00', 'sender': 'rao@company.com', 'content': 'Raised issue #101 regarding broken openai api. Need urgent fix.'},
    {'timestamp': '2025-02-18 19:30:15', 'sender': 'ramesh@company.com', 'content': 'Get to know obstacle detection in autonomous cars, we need to gain a deeper understanding of the concept in order to implement it in upcoming project'}
]

<h2>Email Search Tool</h2>

<p>At least one argument (sender, email_content, or timestamp) is required for the search.</p>
<p>The function combines three ways of searching the email list; when several arguments are given, the filters are applied in sequence:</p>
<ol>
    <li>If the sender parameter is provided, the function keeps only emails whose sender matches it exactly (ignoring case and surrounding whitespace).</li>
    <li>If email_content is provided, the function uses semantic similarity over vector embeddings to find relevant emails.</li>
    <li>If timestamp is provided, the function performs a linear search for emails whose timestamp equals the given value.</li>
</ol>
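<p>The semantic-similarity step in item 2 comes down to comparing a query vector with each email vector and keeping those above a cutoff. The following is a minimal sketch of that idea only; the three-dimensional vectors are made up for illustration and are not real model embeddings, and <code>filter_by_similarity</code> is a hypothetical helper name.</p>

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_by_similarity(query_vec, items, threshold):
    """Keep the labels of (label, vector) items whose similarity meets the threshold."""
    return [label for label, vec in items if cosine(query_vec, vec) >= threshold]


# Toy "embeddings" (assumed values, chosen so the similarities are easy to check by hand).
query_vec = [1.0, 0.0, 0.0]
items = [
    ("same direction", [2.0, 0.0, 0.0]),  # cosine = 1.0
    ("partly related", [1.0, 1.0, 0.0]),  # cosine is about 0.707
    ("unrelated", [0.0, 0.0, 1.0]),       # cosine = 0.0
]

kept = filter_by_similarity(query_vec, items, threshold=0.25)
print(kept)
```

<p>Raising the threshold makes the filter stricter: with a cutoff of 0.8, only the first item would remain.</p>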

In [4]:

def email_search(
    email_content: Optional[str] = None,
    sender: Optional[str] = None,
    timestamp: Optional[str] = None,
    similarity_threshold: float = 0.25  # cosine-similarity cutoff, chosen experimentally
) -> List[dict]:
    """
    Searches through emails based on content similarity, sender, or timestamp.
    """
    if not any([email_content, sender, timestamp]):
        raise ValueError("At least one parameter (email_content, sender, or timestamp) must be provided.")  # ensure at least one parameter is present
    
    results = emails  

    # If the sender argument is given, keep exact (case-insensitive) matches
    
    if sender:
        sender = sender.strip().lower()
        results = [email for email in emails if email['sender'].strip().lower() == sender]

    # If the timestamp argument is given, keep emails with exactly that timestamp
    
    if timestamp:
        try:
            target_timestamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
            results = [
                email for email in results
                if datetime.strptime(email['timestamp'], "%Y-%m-%d %H:%M:%S") == target_timestamp
            ]
        except ValueError:
            raise ValueError("Timestamp must be in the format 'YYYY-MM-DD HH:MM:SS'.")

    # If the email_content argument is given, filter the remaining emails by semantic similarity
    
    if email_content and results:
        model = SentenceTransformer('all-MiniLM-L6-v2')
        query_embedding = model.encode(email_content, convert_to_numpy=True)
        email_embeddings = model.encode([email['content'] for email in results], convert_to_numpy=True)
        
        # cosine similarity
        similarities = cosine_similarity(query_embedding.reshape(1, -1), email_embeddings).flatten()
        
        results = [email for email, sim in zip(results, similarities) if sim >= similarity_threshold]
    
    return results

<h2>Web Search Tool</h2>

<p>Both DuckDuckGo search and Google Serper were tried out; the code below uses DuckDuckGo.</p>

In [5]:
def web_search(query: str) -> str:
    """
    Executes a search query on the internet to retrieve relevant web pages, articles, or resources.
    """
    search = DuckDuckGoSearchRun()
    result = search.run(query)
    return result

<h2>Company Website Search Tool</h2>

<p>Implemented as a RAG-style retrieval pipeline: the tool loads the pages of a website, splits them into chunks, runs a semantic search to find the best-matching chunks, and returns them to the LLM.</p>
<ul>
    <li>FAISS for embedding similarity search</li>
    <li>Hugging Face vector embeddings</li>
    <li>UnstructuredURLLoader to fetch content from the website</li>
</ul>
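<p>The split, embed, and retrieve steps can be illustrated without any external services. The sketch below is a toy stand-in and not the pipeline used in this notebook: word counts take the place of Hugging Face embeddings, a sorted list takes the place of a FAISS index, and the page text and helper names (<code>split_into_chunks</code>, <code>embed</code>, <code>top_k</code>) are invented for the example.</p>

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size):
    """Pack whole sentences into chunks of roughly chunk_size characters.

    A single sentence longer than chunk_size is kept whole.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy embedding: counts of lowercase words (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k):
    """Return the k chunks most similar to the query (stand-in for a vector index lookup)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


# Made-up page text for illustration only.
page_text = (
    "We build computer vision products. "
    "Our office is open on weekdays. "
    "We also offer research consulting services."
)
chunks = split_into_chunks(page_text, chunk_size=50)
best = top_k("research services", chunks, k=1)
print(best)
```

<p>A real embedding model also matches text that shares meaning but not exact words, which this word-count stand-in cannot do.</p>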


In [6]:
def company_website_search(query: str) -> str:
    """
    Searches the company website for relevant information based on the query.
    """

    print("QUERY1",query)
    urls = ["https://carnotresearch.com/"]  
    loader = UnstructuredURLLoader(urls=urls)
    data = loader.load()
    
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        separators=['\n\n', '\n', '.', ',']
    )
    
    docs = text_splitter.split_documents(data)
    
    model = "sentence-transformers/all-mpnet-base-v2"
    hf_token = os.getenv("HUGGINGFACE_API_KEY")
    embeddings = HuggingFaceHubEmbeddings(
        model=model,
        task="feature-extraction",
        huggingfacehub_api_token=hf_token,
    )

    
    vectorindices = FAISS.from_documents(docs, embeddings)
    
    similar_docs = vectorindices.similarity_search(query, k=3)
    
    return "\n\n".join([doc.page_content for doc in similar_docs])


<h2>Generating Response</h2>

In [7]:
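def generate_response(emails, query):
    # NOTE: `emails` is accepted but never forwarded; email_search reads the module-level `emails` list.
    print("QUERY", query)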
    # Initializing tools

    email_search_tool = Tool.from_function(
        func=email_search,
        name="email_search",
        description="Searches through emails to locate messages based on content, sender, or timestamp."
    )

    web_search_tool = Tool.from_function(
        func=web_search,
        name="web_search",
        description="Executes a search query on the internet to retrieve relevant web pages, articles, or resources."
    )

    company_website_search_tool = Tool.from_function(
        func=company_website_search,
        name="company_website_search",
        description="Searches the company website for relevant information based on the query."
    )


    system_prompt = """
        You are an intelligent assistant that helps users find emails, search the web, and search the company website. 
        Use the following guidelines:
        1. First, analyze the user's query and break it down into its components.
        2. Explain which tool will be used and why.
        3. Then, use the appropriate tool to retrieve the information.
        4. Finally, provide a clear and concise response to the user, including the breakdown of the query.
        
        For example:
        - If the query is about emails, use the `email_search` tool.
        - If the query is about general knowledge or web-related topics, use the `web_search` tool.
        - If the query is about the company, use the `company_website_search` tool.
    """


    # llm initialization

    llm = ChatGoogleGenerativeAI(
        model="gemini-2.0-flash",
        temperature=0.5,
        system_message=SystemMessage(content=system_prompt)
    )

    tools = [email_search_tool, web_search_tool, company_website_search_tool]

    
    agent = initialize_agent(
        tools=tools,
        llm=llm,
        agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION, 
        verbose=True ,
        handle_parsing_errors=True
    )

    
    
    response = agent.run(
        f"""
        Analyze the following query and break it down into its components. 
        Explain which tool will be used and why. Then, proceed with the search. Include the breakdown in the final response.
    
        Query: {query}
        """
    )

    # print("FINAL RESPONSE ",response)

    return response
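
<p>In the function above, the <code>emails</code> argument is not passed on to <code>email_search</code>, which reads the module-level <code>emails</code> list instead. One possible way to bind a specific list to a tool function is a closure. This is a sketch under assumptions: <code>make_sender_search</code> and the sample mailbox are hypothetical, and only sender matching is shown.</p>

```python
def make_sender_search(mailbox):
    """Return a single-string search function bound to the given list of emails."""

    def search_by_sender(sender: str):
        # Same normalisation as email_search: ignore case and surrounding whitespace.
        wanted = sender.strip().lower()
        return [mail for mail in mailbox if mail["sender"].strip().lower() == wanted]

    return search_by_sender


# Hypothetical sample data for illustration.
sample_mailbox = [
    {"sender": "a@example.com", "content": "first message"},
    {"sender": "b@example.com", "content": "second message"},
]

search = make_sender_search(sample_mailbox)
matches = search("  A@Example.com ")
print(matches)
```

<p>The returned function takes a single string, so it could be supplied as the <code>func</code> argument of <code>Tool.from_function</code> in place of a function that depends on a global.</p>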
    


In [8]:
def main():
    
    # Dummy data for testing the email search; such data could be obtained with the Gmail API.
    emails = [
    {'timestamp': '2025-02-21 09:15:32', 'sender': 'rahul@company.com', 'content': 'Pull Request #342 has been raised for the latest bug fix. Please review and approve.'},
    {'timestamp': '2025-02-21 10:05:14', 'sender': 'rajesh@company.com', 'content': 'There is an issue with the API response time. I have raised ticket #5678 for investigation.'},
    {'timestamp': '2025-02-21 11:32:45', 'sender': 'ram@company.com', 'content': 'Assigning task #1254 to John for UI improvements. Deadline is Friday.'},
    {'timestamp': '2025-02-21 12:47:21', 'sender': 'raghu@company.com', 'content': 'Pull Request #345 is ready for review. This includes performance optimizations.'},
    {'timestamp': '2025-02-21 14:03:09', 'sender': 'raghav@company.com', 'content': 'Created a new task #567 for backend refactoring. Let me know who can take it.'},
    {'timestamp': '2025-02-21 15:25:34', 'sender': 'raju@company.com', 'content': 'The security vulnerability reported last week has been patched. Pull Request(PR) #349 raised.'},
    {'timestamp': '2025-02-21 16:12:50', 'sender': 'ramu@company.com', 'content': 'The deployment pipeline failed due to a misconfiguration. Issue #7890 has been logged.'},
    {'timestamp': '2025-02-21 17:44:12', 'sender': 'roshan@company.com', 'content': 'Assigning ticket #982 to the DevOps team for infrastructure scaling.'},
    {'timestamp': '2025-02-21 18:56:40', 'sender': 'rohit@company.com', 'content': 'Pull Request #350 contains the latest API enhancements. Please review before EOD.'},
    {'timestamp': '2025-02-21 19:30:15', 'sender': 'rao@company.com', 'content': 'Raised issue #4567 regarding broken openai api. Need urgent fix.'},
    {'timestamp': '2025-02-21 19:30:15', 'sender': 'ramesh@company.com', 'content': 'Get to know obstacle detection in autonomous cars, we need to gain a deeper understanding of the concept in order to implement it in upcoming project'}
    ]

    query = "get emails related to pull requests"

    
    
    result = generate_response(emails,query)


    print("FINAL RESULT: ")
    print(result)
    

<h2>Results</h2>

In [9]:
if __name__ == "__main__":
    main()

QUERY get emails related to pull requests


> Entering new AgentExecutor chain...

Okay, I can analyze the query and determine the best tool to use.

**Query Breakdown:**

*   **Topic:** Pull requests
*   **Type:** Emails
*   **Goal:** Retrieve emails specifically related to pull requests.

**Tool Selection:**

The most appropriate tool for this query is `email_search`. This is because the query explicitly asks for emails, and `email_search` is designed to search through email content.

**Action:**
```json
{
  "action": "email_search",
  "action_input": "pull requests"
}
```

Observation: [{'timestamp': '2025-02-20 12:47:21', 'sender': 'raghu@company.com', 'content': 'Pull Request #12 is ready for review. This includes performance optimizations.'}, {'timestamp': '2025-02-20 15:25:34', 'sender': 'raju@company.com', 'content': 'The security vulnerability reported last week has been patched. Pull Request(PR) #14 raised.'}, {'timestamp': '2025-02-19 18:56:40', 'sender': 'rohit@company.com', 'content': 'Pull Request #16 contains the latest AP

In [10]:
emails = [
    {'timestamp': '2025-02-21 09:15:32', 'sender': 'rahul@company.com', 'content': 'Pull Request #342 has been raised for the latest bug fix. Please review and approve.'},
    {'timestamp': '2025-02-21 10:05:14', 'sender': 'rajesh@company.com', 'content': 'There is an issue with the API response time. I have raised ticket #5678 for investigation.'},
    {'timestamp': '2025-02-21 11:32:45', 'sender': 'ram@company.com', 'content': 'Assigning task #1254 to John for UI improvements. Deadline is Friday.'},
    {'timestamp': '2025-02-21 12:47:21', 'sender': 'raghu@company.com', 'content': 'Pull Request #345 is ready for review. This includes performance optimizations.'},
    {'timestamp': '2025-02-21 14:03:09', 'sender': 'raghav@company.com', 'content': 'Created a new task #567 for backend refactoring. Let me know who can take it.'},
    {'timestamp': '2025-02-21 15:25:34', 'sender': 'raju@company.com', 'content': 'The security vulnerability reported last week has been patched. Pull Request(PR) #349 raised.'},
    {'timestamp': '2025-02-21 16:12:50', 'sender': 'ramu@company.com', 'content': 'The deployment pipeline failed due to a misconfiguration. Issue #7890 has been logged.'},
    {'timestamp': '2025-02-21 17:44:12', 'sender': 'roshan@company.com', 'content': 'Assigning ticket #982 to the DevOps team for infrastructure scaling.'},
    {'timestamp': '2025-02-21 18:56:40', 'sender': 'rohit@company.com', 'content': 'Pull Request #350 contains the latest API enhancements. Please review before EOD.'},
    {'timestamp': '2025-02-21 19:30:15', 'sender': 'rao@company.com', 'content': 'Raised issue #4567 regarding broken openai api. Need urgent fix.'},
    {'timestamp': '2025-02-21 19:30:15', 'sender': 'ramesh@company.com', 'content': 'Get to know obstacle detection in autonomous cars, we need to gain a deeper understanding of the concept in order to implement it in upcoming project'}
    ]

query2 = "find the emails related bugs and seach online on how to solve them"

    
    
result = generate_response(emails,query2)
print(result)

QUERY find the emails related bugs and seach online on how to solve them


> Entering new AgentExecutor chain...
Okay, I can analyze the query and break it down.

**Query Breakdown:**

*   **Part 1:** "find the emails related bugs" - This part focuses on locating emails that discuss or mention bugs.
*   **Part 2:** "search online on how to solve them" - This part aims to find solutions to the bugs identified (or mentioned) in the emails. "Them" refers to the bugs.

**Tool Selection:**

1.  **Part 1:** To find emails, I will use the `email_search` tool. This tool is specifically designed to search through emails based on keywords or content.
2.  **Part 2:** To find solutions to the bugs online, I will use the `web_search` tool. This tool allows me to search the internet for relevant articles, forum discussions, or documentation that might offer solutions.

**Execution Plan:**

1.  First, use `email_search` to find emails related to "bugs".
2.  Then, use `web_search`

In [11]:
print(result)

The query was analyzed and broken down into two parts: finding emails related to bugs and searching online for solutions. The `email_search` tool was used to find emails containing the keyword 'bugs'. The `web_search` tool was then used to search for solutions related to 'security vulnerability patch' and 'software bug fix', based on the content of the emails found. The search results provided information on the purpose of software patches, the importance of testing, patch management, and scheduling deployments.


In [19]:
query3 = "What services does carnotreseach pvt ltd provide"
result = generate_response(emails,query3)

QUERY What services does carnotreseach pvt ltd provide


> Entering new AgentExecutor chain...
Okay, I will analyze the query, determine the appropriate tool, and execute the search.

**Analysis:**

*   **Entity:** carnotreseach pvt ltd
*   **Intent:** To discover the services offered by the specified company.
*   **Type:** Fact-seeking.

**Tool Selection:**

The most suitable tool is `company_website_search` because the query explicitly asks about the services of a specific company. This tool is designed to search the company's website for relevant information.

**Search Execution:**
```json
{
  "action": "company_website_search",
  "action_input": "What services does carnotreseach pvt ltd provide"
}
```
QUERY1 What services does carnotreseach pvt ltd provide

Observation: What we Do.

Carnot Research develops cutting-edge technology products and provides research services and industry skilling in niche and deep tech areas, namely AI, Computer

In [20]:
print(result)

Okay, here's the breakdown and the answer to your query:

**Analysis:**
*   **Entity:** carnotreseach pvt ltd
*   **Intent:** To discover the services offered by the specified company.
*   **Type:** Fact-seeking.

**Tool Selection:**
The most suitable tool is `company_website_search` because the query explicitly asks about the services of a specific company. This tool is designed to search the company's website for relevant information.

**Services Provided by Carnot Research Pvt Ltd:**

Carnot Research develops cutting-edge technology products and provides research services and industry skilling in niche and deep tech areas, namely AI, Computer Vision, AR/VR, NLP and IoT. They offer research consultancy, industry skilling, and IP development services across various industries including Healthcare, Retail, Manufacturing, Telecom, Media, IoT, Security, Cyber Security, Finance and Sports. They conduct research primarily in Computer Vision and NLP.
