<a href="https://colab.research.google.com/github/ABSatpute/Deep_Research_AI_Agent/blob/main/Deep_Research_AI_Agent_final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install langgraph langsmith langchain_groq langchain_community tavily-python

Collecting langgraph
  Downloading langgraph-0.3.2-py3-none-any.whl.metadata (17 kB)
Collecting langchain_groq
  Downloading langchain_groq-0.2.4-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain_community
  Downloading langchain_community-0.3.18-py3-none-any.whl.metadata (2.4 kB)
Collecting tavily-python
  Downloading tavily_python-0.5.1-py3-none-any.whl.metadata (91 kB)
Collecting langgraph-checkpoint<3.0.0,>=2.0.10 (from langgraph)
  Downloading langgraph_checkpoint-2.0.16-py3-none-any.whl.metadata (4.6 kB)
Collecting langgraph-prebuilt<0.2,>=0.1.1 (from langgraph)
  Downloading langgraph_prebuilt-0.1.1-py3-none-any.whl.metadata (5.0 kB)
Collecting langgraph-sdk<0.2.0,>=0.1.42 (from langgraph)
  Downloading langgraph_sdk-0.1.53-py3-none-any.whl.metadata (1.8 kB)
Collecting groq<1,>=0.4.1 (from langchain_groq)
  Downloading groq-0.18.0-py3-none-any.whl.metadat

In [3]:
import json
import re
import os
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain_groq import ChatGroq
from langchain_core.documents import Document
from langchain_core.messages import AIMessage
from langgraph.graph import StateGraph
from pydantic import BaseModel, Field
from google.colab import userdata


In [4]:

# 🔹 Set API Keys
groq_api_key = userdata.get("groq_api_key")
tavily_api_key = userdata.get("tavily_api_key")

os.environ["TAVILY_API_KEY"] = tavily_api_key
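The cell above depends on `google.colab.userdata`, which is only available inside a Colab runtime. Below is a minimal sketch of a fallback for running the notebook elsewhere. `get_secret` is a hypothetical helper, not part of the original notebook, and it assumes the same keys are exported as environment variables when Colab secrets are unavailable.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's secret store, else from the environment.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    try:
        # Only importable (and usable) inside a Google Colab runtime.
        from google.colab import userdata
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is not defined there.
        pass

    # Fall back to environment variables: the name as given, then upper-cased.
    value = os.environ.get(name) or os.environ.get(name.upper())
    if not value:
        raise RuntimeError(f"Missing secret: {name}")
    return value
```

With this in place, `groq_api_key = get_secret("groq_api_key")` would behave the same in Colab and pick up `GROQ_API_KEY` from the shell elsewhere.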



In [5]:
# 🔹 Initialize LLMs
llm_summarize = ChatGroq(groq_api_key=groq_api_key, model_name="mixtral-8x7b-32768")
llm_answer = ChatGroq(groq_api_key=groq_api_key, model_name="deepseek-r1-distill-llama-70b")



In [6]:
# 🔹 Define AI Agent State
class State(BaseModel):
    """State of the AI agent workflow: the query, raw search results, structured results, and final answer."""
    query: str = Field(default="")
    search_results: list[Document] = Field(default_factory=list)
    structured_results: list = Field(default_factory=list)
    final_answer: str = Field(default="")




In [7]:
# 🔹 STEP 1: Fetch Search Results from Tavily API
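def fetch_tavily_results(state: State):
    """ Fetches top search results from Tavily API. """
    # The tool reads its key from the TAVILY_API_KEY environment variable set above.
    tavily = TavilySearchResults()
    docs = tavily.invoke(state.query)
    # On failure the tool may return an error string instead of a list of dicts.
    if not isinstance(docs, list):
        raise RuntimeError(f"Tavily search failed: {docs}")
    state.search_results = [Document(page_content=d["content"], metadata=d) for d in docs]
    return state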


In [8]:
# 🔹 STEP 2: Clean & Preprocess Text
def clean_text(text):
    """ Cleans text by removing URLs, special characters, and extra spaces. """
    text = re.sub(r"http\S+", "", text)  # Remove URLs
    text = re.sub(r"[^\w\s]", "", text)  # Remove special characters
    text = re.sub(r"\s+", " ", text).strip()  # Remove extra spaces
    return text
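To see what this cleaning step does to a typical snippet, here is a small self-contained run. It repeats `clean_text` as defined above and adds `light_clean`, a hypothetical gentler variant (not in the original notebook) that keeps punctuation. The sample sentence and URL are made up for illustration.

```python
import re


def clean_text(text: str) -> str:
    """Same logic as the notebook's clean_text."""
    text = re.sub(r"http\S+", "", text)        # Remove URLs
    text = re.sub(r"[^\w\s]", "", text)        # Remove punctuation and symbols
    text = re.sub(r"\s+", " ", text).strip()   # Collapse whitespace
    return text


def light_clean(text: str) -> str:
    """Hypothetical variant: strip URLs and extra spaces, keep punctuation."""
    text = re.sub(r"http\S+", "", text)
    return re.sub(r"\s+", " ", text).strip()


sample = "AI cuts water use by 30%!  See https://example.com/report (2024)."
print(clean_text(sample))   # AI cuts water use by 30 See 2024
print(light_clean(sample))  # AI cuts water use by 30%! See (2024).
```

Note that the original function also removes periods and percent signs, so sentence boundaries and units are lost before the text reaches the summarizer. Whether that trade-off is acceptable depends on the sources being summarized.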



In [9]:
# 🔹 STEP 3: Summarize Search Results
def summarize_text(text):
    """ Summarizes text with the Mixtral model.
        Splits the text into overlapping chunks and condenses them with a map-reduce chain."""
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    texts = text_splitter.split_text(text)
    docs = [Document(page_content=t) for t in texts]

    summarize_chain = load_summarize_chain(llm_summarize, chain_type="map_reduce")
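    # .run() is deprecated; .invoke() returns a dict whose "output_text" key holds the summary
    result = summarize_chain.invoke({"input_documents": docs})
    return result["output_text"]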
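The `chunk_size=1000, chunk_overlap=100` settings above mean consecutive chunks share about 100 characters. The sketch below shows that idea with a plain fixed-offset chunker. It is a simplified stand-in and not how `RecursiveCharacterTextSplitter` works internally, since that splitter prefers to break on separators such as paragraphs, newlines, and spaces. `chunk_with_overlap` is a hypothetical name.

```python
def chunk_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_with_overlap(text)
print([len(c) for c in chunks])            # [1000, 1000, 700]
print(chunks[0][-100:] == chunks[1][:100])  # True
```

In a map-reduce summary each chunk is summarized on its own, so the overlap gives neighboring chunks a little shared context at the boundaries.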



In [10]:
# 🔹 STEP 4: Process & Structure Search Results
def process_search_results(state: State):
    """ Cleans and summarizes each Tavily result, then collects title, URL, and summary into JSON-serializable dicts. """
    structured_results = []

    for doc in state.search_results:
        cleaned_content = clean_text(doc.page_content)
        summary = summarize_text(cleaned_content)

        structured_results.append({
            "title": doc.metadata.get("title", "No Title Found"),
            "url": doc.metadata.get("url", ""),
            "summary": summary
        })

    state.structured_results = structured_results
    return state
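Because the step above makes one LLM call per search result, a single failed call aborts the whole run. The sketch below shows one possible way to isolate failures per document. It is an assumption-laden variant, not the notebook's method: `structure_results` is a hypothetical name, it works on plain dicts rather than `Document` objects, and the summarizer is passed in so the logic can be tried without any API key.

```python
from typing import Callable


def structure_results(docs: list[dict], summarize: Callable[[str], str]) -> list[dict]:
    """Build title/url/summary records, tolerating empty or failing documents."""
    structured = []
    for doc in docs:
        content = (doc.get("content") or "").strip()
        if not content:
            continue  # nothing to summarize, skip this result

        try:
            summary = summarize(content)
        except Exception as exc:
            # Keep the source in the output, but mark the summary as unavailable.
            summary = f"[summary unavailable: {exc}]"

        structured.append({
            "title": doc.get("title", "No Title Found"),
            "url": doc.get("url", ""),
            "summary": summary,
        })
    return structured


demo_docs = [
    {"url": "https://example.com/a", "content": "Drones map crop stress."},
    {"url": "https://example.com/b", "content": ""},
]
demo = structure_results(demo_docs, summarize=str.upper)
print(demo)
```

In the notebook, `summarize` would be something like `lambda text: summarize_text(clean_text(text))`.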



In [11]:
# 🔹 STEP 5: Generate Final Answer
def generate_final_answer(state: State):
    """ Generates a final, well-structured answer based on the processed research data using DeepSeek. """
    prompt = f"Summarize the following research findings:\n\n{json.dumps(state.structured_results, indent=4)}"
    response = llm_answer.invoke(prompt)

    # final_answer is declared as str, so coerce any non-message response to text
    state.final_answer = response.content if isinstance(response, AIMessage) else str(response)
    return state
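DeepSeek-R1 style models often emit their reasoning inside `<think>...</think>` tags before the actual answer. Whether those tags show up in `response.content` depends on the provider and its settings, so treat the following as a sketch under that assumption. `strip_reasoning` is a hypothetical helper, not part of the original notebook.

```python
import re

# Matches a <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe user wants a summary...\n</think>\n\nAI helps farmers monitor crops."
print(strip_reasoning(raw))  # AI helps farmers monitor crops.
```

Text without such tags passes through unchanged apart from trimmed whitespace, so the helper could be applied to `state.final_answer` unconditionally.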



In [12]:
# 🔹 STEP 6: Define Multi-Step AI Workflow
workflow = StateGraph(State)

# 🔹 Add nodes representing different processing steps in the AI pipeline
workflow.add_node("fetch_results", fetch_tavily_results)  # Fetch search results using Tavily API
workflow.add_node("process_results", process_search_results)  # Clean, summarize, and structure results
workflow.add_node("generate_answer", generate_final_answer)  # Generate final answer using DeepSeek model

# 🔹 Define the sequence of execution (edges between nodes)
workflow.add_edge("fetch_results", "process_results")  # After fetching, process the search results
workflow.add_edge("process_results", "generate_answer")  # After processing, generate a final answer

# 🔹 Set entry and exit points for the workflow
workflow.set_entry_point("fetch_results")  # First step of execution
workflow.set_finish_point("generate_answer")  # Final output after processing

# 🔹 Compile the AI Research System
research_ai_system = workflow.compile()  # Converts the workflow into an executable system
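Since the graph above has no branches, it behaves like a straight pipeline: each node receives the state and hands an updated state to the next. The sketch below mimics that flow in plain Python with stub nodes. It is a conceptual illustration only, not how LangGraph is implemented, and every name in it is hypothetical.

```python
from typing import Callable


def run_linear_pipeline(state: dict, steps: list[Callable[[dict], dict]]) -> dict:
    """Apply each step to the state in order and return the final state."""
    for step in steps:
        state = step(state)
    return state


# Stub nodes standing in for fetch_results, process_results, generate_answer.
def fetch(state: dict) -> dict:
    return {**state, "search_results": [f"page about {state['query']}"]}


def process(state: dict) -> dict:
    summaries = [{"summary": r.upper()} for r in state["search_results"]]
    return {**state, "structured_results": summaries}


def answer(state: dict) -> dict:
    count = len(state["structured_results"])
    return {**state, "final_answer": f"{count} source(s) summarized"}


final = run_linear_pipeline({"query": "soil sensors"}, [fetch, process, answer])
print(final["final_answer"])  # 1 source(s) summarized
```

A graph framework earns its keep once the flow needs conditional edges, loops, or checkpoints. For a fixed three-step chain the two approaches produce the same sequence of calls.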


In [13]:
# 🔹 Run the AI Agent for a Single Query
def run_agent(query):
    """
    Runs the AI Agent System for a given query.

    Steps:
    1. Initializes the system state with the user's query.
    2. Executes the research workflow using LangGraph.
    3. Extracts structured research findings and a final answer.
    4. Returns the structured results and the AI-generated response.
    """

    # Initialize the AI system state with the user's query
    initial_state = State(query=query)

    # Invoke the compiled AI research system to process the query
    final_state = research_ai_system.invoke(initial_state)

    # Retrieve the structured search results and final generated answer
    structured_results = final_state.get("structured_results", [])  # List of processed results
    final_answer = final_state.get("final_answer", "")  # Final summarized answer

    # Return the collected information in a structured format
    return {
        "query": query,  # Original user query
        "structured_results": structured_results,  # List of search results with summaries
        "final_answer": final_answer  # AI-generated response based on research findings
    }


In [14]:

# Real-Time Interactive Chat Loop


while True:
    # Prompt the user for a research query
    user_input = input("\nEnter your query (type 'quit' to exit): ")

    # Check for exit command
    if user_input.lower() in ["quit", "q", "exit"]:
        print("Session ended. Have a great day!")
        break

    # Process the query using the AI agent
    response = run_agent(user_input)


    # Display Research Summary

    print("\n--- Research Summary ---")
    for idx, res in enumerate(response["structured_results"], start=1):
        print(f"\n{idx}. Title: {res['title']}")
        print(f"URL: {res['url']}")
        print(f"Summary: {res['summary']}")


    # Display Final AI-Generated Answer

    print("\n--- Final Answer ---")
    print(response["final_answer"])



Enter your query (type 'quit' to exit): Use of AI in Agricultural field



--- Research Summary ---

1. Title: No Title Found
URL: https://intellias.com/artificial-intelligence-in-agriculture/
Summary: Intellias is implementing AI in agriculture through various technologies such as driverless tractors, smart irrigation/fertilization systems, agricultural drones, smart spraying, vertical farming software, and AI-based greenhouse robots. These tools provide farmers with real-time crop insights, enabling them to make informed decisions regarding irrigation, fertilization, and pesticide treatment. To increase awareness and implementation of AI in agriculture, technology providers need to focus on data analytics, cloud services, AI automation tools, and location intelligence. These efforts can improve agricultural practices, ROI, and the lives of farmers.

2. Title: No Title Found
URL: https://www.mckinsey.com/industries/agriculture/our-insights/from-bytes-to-bushels-how-gen-ai-can-shape-the-future-of-agriculture
Summary: Agriculture is poised for disruption by A

Token indices sequence length is longer than the specified maximum sequence length for this model (2062 > 1024). Running this sequence through the model will result in indexing errors



--- Research Summary ---

1. Title: No Title Found
URL: https://techcommunity.microsoft.com/blog/aiplatformblog/compare-and-select-models-with-new-benchmarking-tools-in-azure-ai-foundry/4292308
Summary: The text consists of data from a software system, possibly related to user interfaces, page resources, and policies. It includes timestamps, IDs, and references to cached assets in various namespaces for components such as message subjects, revisions, and user information. The locale is set to "en-US." The data appears to be related to user registration, node information, tags, and page resources. There are also references to Microsoft products and services, community hubs, and a public sector community information center. The text includes a GraphQL query result with CategoryPolicies and CachedAsset information. Additionally, there are lists of links or categories, possibly for a website, and a data object representing a comment or message. The text concludes with a collection of obje