<a href="https://colab.research.google.com/github/RAHUL2002-k/AI-Research-Pipeline-with-CrewAI-and-LangChain/blob/main/research%20agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building an AI Research Pipeline with CrewAI and LangChain

Many research tasks follow the same loop: gather sources, analyze them, and write up the findings. In this article, I'll walk you through building an AI research pipeline with CrewAI and LangChain in which multiple specialized agents handle those steps in turn.

## What I Have Built

I have created a complete research system with three AI agents:
1. A **Research Specialist** that gathers information
2. A **Data Analyst** that processes and extracts insights
3. A **Content Writer** that produces the final report

The agents run in sequence (research, then analysis, then writing), with each stage meant to build on the one before it.

## Setting Up the Environment

First, we need to set up our environment with the necessary dependencies:

In [None]:
# Import necessary libraries
import os

from crewai import Agent, Crew, Process, Task
from dotenv import load_dotenv
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

load_dotenv(override=True)

# Set up API keys for OpenAI and Tavily
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")

You'll need API keys from both OpenAI and Tavily for this project. Store them in a `.env` file and keep that file out of version control.
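Because `os.getenv` returns `None` for a missing variable, a typo in the `.env` file would only surface later as an authentication error. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (it is not part of CrewAI, LangChain, or python-dotenv):

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, failing fast if any is unset or empty.

    `require_env` is a hypothetical helper for this article, not a library function.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env("OPENAI_API_KEY", "TAVILY_API_KEY")` right after `load_dotenv(...)` would stop the notebook at the first cell if either key is absent.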

## Building a Web Search Tool

Next, let's create a search function using Tavily's API that our agents can use to find information:

In [None]:
# Define a compact search function using Tavily's API
def search_internet(query: str, max_results: int = 5) -> str:
    """Search the internet using Tavily's API and return formatted results."""
    search = TavilySearchResults(api_key=TAVILY_API_KEY, max_results=max_results)
    try:
        results = search.invoke(query)

        formatted_results = ""
        for res in results:
            formatted_results += f"Source: {res.get('url', 'No URL')}\n"
            formatted_results += f"Title: {res.get('title', 'No Title')}\n"
            formatted_results += f"Content: {res.get('content', 'No Content')}\n\n"

        return formatted_results if formatted_results else "No results found."
    except Exception as e:
        return f"Error searching the internet: {str(e)}"

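This function takes a query string, searches the web using Tavily, and returns formatted results with sources, titles, and content. If the request fails, it returns the error message as a string instead of raising, so the caller always gets text back.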
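The formatting step can be tried without an API key by separating it from the network call. The sketch below assumes a hypothetical helper `format_results` and made-up sample data. It follows the same "Source / Title / Content" layout and the same fallbacks as the loop above:

```python
from typing import Any


def format_results(results: list[dict[str, Any]]) -> str:
    """Render search hits as 'Source / Title / Content' blocks, one per hit."""
    blocks = []
    for res in results:
        blocks.append(
            f"Source: {res.get('url', 'No URL')}\n"
            f"Title: {res.get('title', 'No Title')}\n"
            f"Content: {res.get('content', 'No Content')}\n"
        )
    # Blocks are separated by a blank line; an empty list yields the fallback text.
    return "\n".join(blocks) if blocks else "No results found."


# Made-up sample data for illustration; this is not real Tavily output.
sample = [{"url": "https://example.com", "content": "Example text."}]
print(format_results(sample))
```

The sample hit has no `title` key, so the output shows the `No Title` fallback in place.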

## Creating Specialized AI Agents

Now we'll define our three AI agents, each with specific roles and objectives:

In [None]:
# Initialize the OpenAI language model
llm = ChatOpenAI(
    model="gpt-4o-mini",  # Replace with your desired model
    api_key=OPENAI_API_KEY,
    temperature=0.7  # Adjust creativity level
)

# Define the Research Specialist agent
researcher = Agent(
    role="Research Specialist",
    goal="Find accurate and up-to-date information on any given topic",
    backstory="You are an expert researcher with a talent for finding reliable information quickly.",
    verbose=True,
    llm=llm
)

# Define the Data Analyst agent
analyst = Agent(
    role="Data Analyst",
    goal="Analyze information and extract key insights",
    backstory="You are an expert at analyzing data and identifying patterns and insights.",
    verbose=True,
    llm=llm
)

# Define the Content Writer agent
writer = Agent(
    role="Content Writer",
    goal="Create well-structured, informative content based on research and analysis",
    backstory="You are a skilled writer who excels at creating clear, concise, and engaging content.",
    verbose=True,
    llm=llm
)

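Each agent has a defined role, goal, and backstory to guide its behavior. We're using GPT-4o-mini in this example, but you can substitute any compatible model. Note that none of the agents is given a `tools` list, so as written they rely on the model's own knowledge; for the Research Specialist to actually search the web, `search_internet` would need to be wrapped as a CrewAI tool and passed through the agent's `tools` argument.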

## Defining Agent Tasks

With our agents created, we need to define specific tasks for each one:

In [None]:
# Define a research task for the Research Specialist agent
research_task = Task(
    description="Research the following topic thoroughly: 'Latest advancements in AI research'. Find the most relevant and up-to-date information.",
    agent=researcher,
    expected_output="A comprehensive research summary with sources"
)

# Define an analysis task for the Data Analyst agent
analysis_task = Task(
    description="Analyze the research findings and identify key insights, trends, and important points about 'Latest advancements in AI research'.",
    agent=analyst,
    expected_output="A detailed analysis with key insights highlighted"
)

# Define a writing task for the Content Writer agent
writing_task = Task(
    description="Create a well-structured report on 'Latest advancements in AI research' based on the research and analysis findings.",
    agent=writer,
    expected_output="A comprehensive, well-written report on the topic"
)

Each task includes a detailed description and expected output to guide the agent.
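The topic string is repeated in all three descriptions, so changing it means editing three places. A small sketch of one way to centralize it, assuming hypothetical names `TOPIC`, `TEMPLATES`, and `build_descriptions` (the template text mirrors the descriptions above):

```python
TOPIC = "Latest advancements in AI research"

# Hypothetical templates mirroring the three task descriptions used in this article.
TEMPLATES = {
    "research": (
        "Research the following topic thoroughly: '{topic}'. "
        "Find the most relevant and up-to-date information."
    ),
    "analysis": (
        "Analyze the research findings and identify key insights, trends, "
        "and important points about '{topic}'."
    ),
    "writing": (
        "Create a well-structured report on '{topic}' "
        "based on the research and analysis findings."
    ),
}


def build_descriptions(topic: str) -> dict[str, str]:
    """Fill the topic into each phase's description template."""
    return {phase: template.format(topic=topic) for phase, template in TEMPLATES.items()}


descriptions = build_descriptions(TOPIC)
print(descriptions["research"])
```

Each `Task` could then take `descriptions["research"]`, `descriptions["analysis"]`, or `descriptions["writing"]` as its `description`.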

## Executing the Research Pipeline

Now we can run the research pipeline:

In [None]:
# Create a crew for the research phase
research_crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    verbose=True,
    process=Process.sequential
)

# Execute the research phase
print("Executing research phase...")
research_output = research_crew.kickoff()
print("Research phase completed.")

# Create a crew for the analysis phase
analysis_crew = Crew(
    agents=[analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential
)

# Execute the analysis phase
print("\nExecuting analysis phase...")
analysis_output = analysis_crew.kickoff()
print("Analysis phase completed.")

# Create a crew for the writing phase
writing_crew = Crew(
    agents=[writer],
    tasks=[writing_task],
    verbose=True,
    process=Process.sequential
)

# Execute the writing phase
print("\nExecuting writing phase...")
writing_output = writing_crew.kickoff()
print("Writing phase completed.")

Executing research phase...
Agent: Research Specialist
Task: Research the following topic thoroughly: 'Latest advancements in AI research'. Find the most relevant and up-to-date information.

Agent: Research Specialist
Final Answer:
The field of artificial intelligence (AI) has seen remarkable advancements in recent years, particularly in 2023. This summary presents the latest developments across various subfields of AI, including generative models, reinforcement learning, natural language processing (NLP), and ethical considerations.

1. **Generative AI Models**: 
   - In 2023, the rise of generative AI reached new heights with models such as ChatGPT-4 and DALL-E 3, developed by OpenAI. These models exhibit improved contextual understanding and creativity in generating text and images. ChatGPT-4, for instance, has shown enhanced abilities to maintain context over longer conversations and generate more coherent and contextually relevant responses (OpenAI, 2023).
   - Google has also made strides with its PaLM 2 model, which focuses on advancing conversational AI and multi-modal capabilities, showcasing the ability to unders

Agent: Data Analyst
Final Answer:
The research findings on the latest advancements in AI research reveal several key insights, trends, and important points that are shaping the future of artificial intelligence. The analysis highlights the following major areas:

1. **Transformative Models and Architectures**: The emergence of large pre-trained models, such as GPT-4 and BERT, has revolutionized natural language processing (NLP). These models leverage unsupervised learning on vast datasets, enabling them to understand context better and generate coherent, contextually appropriate text. Additionally, advancements in transformer architecture are facilitating breakthroughs in other domains, including computer vision and reinforcement learning.

2. **Ethical AI and Bias Mitigation**: As AI systems become more pervasive, there is an increasing focus on ethical considerations. Researchers are working on frameworks to identify and mitigate bia

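We create three separate crews, each responsible for one phase of the pipeline. As written, though, the crews run independently: `research_output` is never passed to the analysis task, and `analysis_output` is never passed to the writing task, so each agent works only from its own task description. For the output of one phase to become the input of the next, it has to be included in the next task's description, or all three tasks have to run in a single crew.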
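One way to hand a phase's result to the following phase is to append it to the next task's description before that task is created. A minimal sketch, assuming a hypothetical helper `with_previous_output` and an arbitrary 4000-character cap (neither comes from CrewAI):

```python
def with_previous_output(description: str, previous_output: str, max_chars: int = 4000) -> str:
    """Append an earlier phase's output to a task description, truncating long text.

    Hypothetical helper for this article; `max_chars` is an arbitrary cap
    chosen to keep the prompt short.
    """
    text = str(previous_output).strip()
    if len(text) > max_chars:
        text = text[:max_chars] + "\n[truncated]"
    return f"{description}\n\nFindings from the previous phase:\n{text}"


# Placeholder text standing in for a real research result.
print(with_previous_output("Analyze the research findings.", "Finding A. Finding B."))
```

With this approach, `analysis_task` would be created only after `research_crew.kickoff()` returns, passing `str(research_output)` as `previous_output`, and likewise for `writing_task`. CrewAI's `Task` also accepts a `context` argument for tasks that run within a single crew; check the current documentation for its exact behavior.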

## Processing and Storing Results

Finally, let's handle the results from our research pipeline:

In [None]:
# Store the outputs from each phase in variables
research_result = str(research_output)
analysis_result = str(analysis_output)
final_report = str(writing_output)

# Display the results in a structured format
print("\n--- Research Results ---")
print(research_result)

print("\n--- Analysis Results ---")
print(analysis_result)

print("\n--- Final Report ---")
print(final_report)

# Save the final report to a file for future reference
with open("final_report.txt", "w", encoding="utf-8") as file:
    file.write(final_report)

print("\nFinal report saved as 'final_report.txt'.")


--- Research Results ---
The field of artificial intelligence (AI) has seen remarkable advancements in recent years, particularly in 2023. This summary presents the latest developments across various subfields of AI, including generative models, reinforcement learning, natural language processing (NLP), and ethical considerations.

1. **Generative AI Models**: 
   - In 2023, the rise of generative AI reached new heights with models such as ChatGPT-4 and DALL-E 3, developed by OpenAI. These models exhibit improved contextual understanding and creativity in generating text and images. ChatGPT-4, for instance, has shown enhanced abilities to maintain context over longer conversations and generate more coherent and contextually relevant responses (OpenAI, 2023).
   - Google has also made strides with its PaLM 2 model, which focuses on advancing conversational AI and multi-modal capabilities, showcasing the ability to understand and generate text, images, and more (Google AI Blog, 2023).



This code captures the output from each phase, displays it, and saves the final report to a text file.
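Writing to a fixed filename overwrites the previous report on every run. A sketch of a variant that keeps each run, assuming a hypothetical helper `save_report` and a timestamped naming scheme of my own choosing:

```python
from datetime import datetime
from pathlib import Path


def save_report(text: str, directory: str = ".", stem: str = "final_report") -> Path:
    """Write the report as UTF-8 to a timestamped file and return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = Path(directory) / f"{stem}_{stamp}.txt"
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Calling `save_report(final_report, directory="reports")` would produce a file such as `reports/final_report_<timestamp>.txt`; the explicit UTF-8 encoding avoids platform-dependent defaults when the report contains non-ASCII characters.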

## Benefits and Applications

This AI research pipeline offers several advantages:

1. **Efficiency**: Automates the entire research process from data gathering to final report
2. **Specialization**: Each agent focuses on what it does best
3. **Scalability**: Easily modify the topic or add more specialized agents
4. **Quality**: Sequential refinement can produce higher-quality output than a single-agent approach, provided each stage actually receives the previous stage's results

Potential applications include:
- Market research and competitive analysis
- Academic literature reviews
- News and trend monitoring
- Product research and development
- Content creation for blogs and publications

## Conclusion

By combining CrewAI with LangChain, we've created a powerful, modular AI research system that mimics human research workflows. The system demonstrates how specialized AI agents can collaborate effectively to tackle complex research tasks.

This approach opens up exciting possibilities for automated knowledge work, allowing humans to focus on higher-level decision-making while AI handles the information gathering and processing tasks.

Try implementing this pipeline yourself and experiment with different topics, agent configurations, and LLMs to see what works best for your specific needs.