# Agentic RAG Using CrewAI & LangChain

We are going to see how agents can be involved in the RAG system to rtrieve the most relevant information.

In [9]:
!pip install crewai==0.28.8 crewai_tools==0.1.6 langchain_community==0.0.29 sentence-transformers langchain-groq python-dotenv --quiet

In [10]:
from langchain_openai import ChatOpenAI
import os
from crewai_tools import PDFSearchTool
from langchain_community.tools.tavily_search import TavilySearchResults
from crewai_tools  import tool
from crewai import Crew
from crewai import Task
from crewai import Agent

## Mention the Groq API Key

In [11]:

# Set the API key
from dotenv import load_dotenv
load_dotenv()
GROQ_API_KEY = os.getenv("GROQAPI")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")

## Mention the LLM being used

In [12]:
llm = ChatOpenAI(
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key= GROQ_API_KEY,
    model_name="llama3-8b-8192",
    temperature=0.1,
    max_tokens=1000,
)

In [13]:
import requests

pdf_url = 'https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf'
response = requests.get(pdf_url)

with open('attention_is_all_you_need.pdf', 'wb') as file:
    file.write(response.content)

## Create a RAG tool variable to pass our PDF

In [16]:
rag_tool = PDFSearchTool(pdf='attention_is_all_you_need.pdf',
    config=dict(
        llm=dict(
            provider="openai", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama3-8b-8192",
                # temperature=0.5,
                # top_p=1,
                # stream=true,
            ),
        ),
        embedder=dict(
            provider="huggingface", # or openai, ollama, ...
            config=dict(
                model="BAAI/bge-small-en-v1.5",
                #task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)

  from tqdm.autonotebook import tqdm, trange
Inserting batches in chromadb: 100%|██████████| 1/1 [00:04<00:00,  4.82s/it]


In [15]:
pip install langchain_huggingface

Collecting langchain_huggingface
  Using cached langchain_huggingface-0.0.3-py3-none-any.whl.metadata (1.2 kB)
Using cached langchain_huggingface-0.0.3-py3-none-any.whl (17 kB)
Installing collected packages: langchain_huggingface
Successfully installed langchain_huggingface-0.0.3
Note: you may need to restart the kernel to use updated packages.


In [17]:
rag_tool.run("How did self-attention mechanism evolve in large language models?")

Using Tool: Search a PDF's content


'Relevant Content:\nThis makes it more difﬁcult to learn dependencies between distant positions [ 11]. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2. Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 22, 23, 19]. End-to-end memory networks are based on a recurrent attention mechanism instead of sequence- aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks [28]. To the best of our knowledge, howev

## Mention the Tavily API Key

In [51]:
# import os

# # Set the Tavily API key
# os.environ['TAVILY_API_KEY'] = 'Add Your Tavily API Key'

In [18]:
web_search_tool = TavilySearchResults(k=3)

In [19]:
web_search_tool.run("What is self-attention mechansim in large language models?")

[{'url': 'https://towardsdatascience.com/contextual-transformer-embeddings-using-self-attention-explained-with-diagrams-and-python-code-d7a9f0f4d94e',
  'content': 'The self-attention mechanism generates rich, context-aware transformer embeddings for text by processing each token in an input sequence simultaneously. These embeddings build on the foundations of static word embeddings (such as word2vec) and enable more capable language models such as BERT and GPT.'},
 {'url': 'https://cyrilzakka.github.io/llm-playbook/attention.html',
  'content': "Self-Attention. Self-attention is a specific type of attention mechanism where the Query (Q), Key (K), and Value (V) all come from the same place, usually the output of the previous layer in your network. In layman's terms, self-attention enables tokens in the input sequence to look at other tokens in the same sequence to gather contextual clues."},
 {'url': 'https://medium.com/data-science-community-srm/exploring-attention-mechanisms-in-large

## Define a Tool

In [20]:
@tool
def router_tool(question):
  """Router Function"""
  if 'self-attention' in question:
    return 'vectorstore'
  else:
    return 'web_search'

## Create Agents to work with

In [21]:
Router_Agent = Agent(
  role='Router',
  goal='Route user question to a vectorstore or web search',
  backstory=(
    "You are an expert at routing a user question to a vectorstore or web search."
    "Use the vectorstore for questions on concept related to Retrieval-Augmented Generation."
    "You do not need to be stringent with the keywords in the question related to these topics. Otherwise, use web-search."
  ),
  verbose=True,
  allow_delegation=False,
  llm=llm,
)

In [22]:
Retriever_Agent = Agent(
role="Retriever",
goal="Use the information retrieved from the vectorstore to answer the question",
backstory=(
    "You are an assistant for question-answering tasks."
    "Use the information present in the retrieved context to answer the question."
    "You have to provide a clear concise answer."
),
verbose=True,
allow_delegation=False,
llm=llm,
)

In [23]:
Grader_agent =  Agent(
  role='Answer Grader',
  goal='Filter out erroneous retrievals',
  backstory=(
    "You are a grader assessing relevance of a retrieved document to a user question."
    "If the document contains keywords related to the user question, grade it as relevant."
    "It does not need to be a stringent test.You have to make sure that the answer is relevant to the question."
  ),
  verbose=True,
  allow_delegation=False,
  llm=llm,
)

In [24]:
hallucination_grader = Agent(
    role="Hallucination Grader",
    goal="Filter out hallucination",
    backstory=(
        "You are a hallucination grader assessing whether an answer is grounded in / supported by a set of facts."
        "Make sure you meticulously review the answer and check if the response provided is in alignmnet with the question asked"
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

In [25]:
answer_grader = Agent(
    role="Answer Grader",
    goal="Filter out hallucination from the answer.",
    backstory=(
        "You are a grader assessing whether an answer is useful to resolve a question."
        "Make sure you meticulously review the answer and check if it makes sense for the question asked"
        "If the answer is relevant generate a clear and concise response."
        "If the answer gnerated is not relevant then perform a websearch using 'web_search_tool'"
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

## Defining Tasks

In [26]:
router_task = Task(
    description=("Analyse the keywords in the question {question}"
    "Based on the keywords decide whether it is eligible for a vectorstore search or a web search."
    "Return a single word 'vectorstore' if it is eligible for vectorstore search."
    "Return a single word 'websearch' if it is eligible for web search."
    "Do not provide any other premable or explaination."
    ),
    expected_output=("Give a binary choice 'websearch' or 'vectorstore' based on the question"
    "Do not provide any other premable or explaination."),
    agent=Router_Agent,
    tools=[router_tool],
)

In [27]:
retriever_task = Task(
    description=("Based on the response from the router task extract information for the question {question} with the help of the respective tool."
    "Use the web_serach_tool to retrieve information from the web in case the router task output is 'websearch'."
    "Use the rag_tool to retrieve information from the vectorstore in case the router task output is 'vectorstore'."
    ),
    expected_output=("You should analyse the output of the 'router_task'"
    "If the response is 'websearch' then use the web_search_tool to retrieve information from the web."
    "If the response is 'vectorstore' then use the rag_tool to retrieve information from the vectorstore."
    "Return a claer and consise text as response."),
    agent=Retriever_Agent,
    context=[router_task],
   #tools=[retriever_tool],
)

In [28]:
grader_task = Task(
    description=("Based on the response from the retriever task for the quetion {question} evaluate whether the retrieved content is relevant to the question."
    ),
    expected_output=("Binary score 'yes' or 'no' score to indicate whether the document is relevant to the question"
    "You must answer 'yes' if the response from the 'retriever_task' is in alignment with the question asked."
    "You must answer 'no' if the response from the 'retriever_task' is not in alignment with the question asked."
    "Do not provide any preamble or explanations except for 'yes' or 'no'."),
    agent=Grader_agent,
    context=[retriever_task],
)

In [29]:
hallucination_task = Task(
    description=("Based on the response from the grader task for the quetion {question} evaluate whether the answer is grounded in / supported by a set of facts."),
    expected_output=("Binary score 'yes' or 'no' score to indicate whether the answer is sync with the question asked"
    "Respond 'yes' if the answer is in useful and contains fact about the question asked."
    "Respond 'no' if the answer is not useful and does not contains fact about the question asked."
    "Do not provide any preamble or explanations except for 'yes' or 'no'."),
    agent=hallucination_grader,
    context=[grader_task],
)

answer_task = Task(
    description=("Based on the response from the hallucination task for the quetion {question} evaluate whether the answer is useful to resolve the question."
    "If the answer is 'yes' return a clear and concise answer."
    "If the answer is 'no' then perform a 'websearch' and return the response"),
    expected_output=("Return a clear and concise response if the response from 'hallucination_task' is 'yes'."
    "Perform a web search using 'web_search_tool' and return ta clear and concise response only if the response from 'hallucination_task' is 'no'."
    "Otherwise respond as 'Sorry! unable to find a valid response'."),
    context=[hallucination_task],
    agent=answer_grader,
    #tools=[answer_grader_tool],
)

## Define a flow for the use case

In [30]:
rag_crew = Crew(
    agents=[Router_Agent, Retriever_Agent, Grader_agent, hallucination_grader, answer_grader],
    tasks=[router_task, retriever_task, grader_task, hallucination_task, answer_task],
    verbose=True,

)

## Ask any query

In [37]:
inputs ={"question":"what is difference between RAG and Agents"}

## Kick off the agent pipeline

In [38]:
result = rag_crew.kickoff(inputs=inputs)

[1m[95m [DEBUG]: == Working Agent: Router[00m
[1m[95m [INFO]: == Starting Task: Analyse the keywords in the question what is difference between RAG and AgentsBased on the keywords decide whether it is eligible for a vectorstore search or a web search.Return a single word 'vectorstore' if it is eligible for vectorstore search.Return a single word 'websearch' if it is eligible for web search.Do not provide any other premable or explaination.[00m


[1m> Entering new CrewAgentExecutor chain...[0m
[32;1m[1;3mThought: I need to analyze the keywords in the question to determine whether it's eligible for a vectorstore search or a web search.

Action: router_tool

Action Input: {"question": "what is difference between RAG and Agents"}
[0m[95m 

web_search
[00m
[32;1m[1;3mThought: I now know the final answer

Final Answer: websearch[0m

[1m> Finished chain.[0m
[1m[92m [DEBUG]: == [Router] Task output: websearch

[00m
[1m[95m [DEBUG]: == Working Agent: Retriever[00m
[1m[

## Obtain the final response

In [39]:
print(result)

RAG (Return on Assets to Gross) and Agent are two different concepts. RAG is a financial metric used to evaluate the performance of a company by comparing its return on assets to its gross assets. It helps to determine the efficiency of a company's asset utilization.

On the other hand, Agents are entities that act on behalf of another entity, such as a company or an individual, to perform specific tasks or make decisions. They can be human or artificial, such as software agents or robots.

In summary, RAG is a financial metric, while Agents are entities that act on behalf of others.
