# **Light Corrective RAG Using LangChain and CrewAI**

In [23]:
!pip install crewai crewai_tools langchain_community langchain-openai sentence-transformers langchain-groq langchain_huggingface --quiet


In [24]:
import os

from crewai import Agent, Crew, Task
from crewai_tools import PDFSearchTool, tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

# **Groq API Key**

In [25]:
import os

# Replace the placeholder with your own key; never commit a real key to a shared notebook
os.environ['GROQ_API_KEY'] = '<YOUR_GROQ_API_KEY>'
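
A minimal sketch of an alternative to hardcoding the key, assuming the notebook runs interactively: read the key from the environment and prompt for it only when it is missing. The helper name `ensure_api_key` is hypothetical and not part of any library used here.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name], asking the user for it once if it is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Hypothetical fallback: ask interactively instead of storing the key in the notebook
        value = prompt(f"Enter {name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return value
```

Calling `ensure_api_key("GROQ_API_KEY")` before building the LLM would then stand in for the assignment above.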

# **LLM being used**

In [26]:
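# ChatOpenAI reaches Groq through its OpenAI-compatible endpoint,
# so the base URL must point at Groq rather than the OpenAI default.
llm = ChatOpenAI(
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key=os.environ['GROQ_API_KEY'],
    model_name="llama3-8b-8192",
    temperature=0.1,
    max_tokens=1000,
)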

In [27]:
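import requests

# Download the "Attention Is All You Need" paper (NeurIPS 2017)
pdf_url = 'https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf'
response = requests.get(pdf_url, timeout=60)
response.raise_for_status()  # stop here if the download failed

with open('attention_is_all_you_need.pdf', 'wb') as file:
    file.write(response.content)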

# **RAG tool to pass the PDF document**

In [53]:
rag_tool = PDFSearchTool(
    pdf='attention_is_all_you_need.pdf',  # same relative path the download cell wrote to
    config=dict(
        llm=dict(
            provider="groq",
            config=dict(
                model="llama3-8b-8192",
                temperature=0.5,
                top_p=1,
                stream=True,
            ),
        ),
        embedder=dict(
            provider="huggingface",
            config=dict(
                model="BAAI/bge-small-en-v1.5",
            ),
        ),
    )
)

In [54]:
rag_tool.run("What is multi-head attention?")

Using Tool: Search a PDF's content


'Relevant Content:\nas depicted in Figure 2. Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions. With a single attention head, averaging inhibits this. 4To illustrate why the dot products get large, assume that the components of qandkare independent random variables with mean 0and variance 1. Then their dot product, q·k=∑dk i=1qiki, has mean 0and variance dk. 4\n\nMultiHead( Q,K,V ) = Concat(head 1,.,head h)WO where head i= Attention( QWQ i,KWK i,VWV i) Where the projections are parameter matrices WQ i∈Rdmodel×dk,WK i∈Rdmodel×dk,WV i∈Rdmodel×dv andWO∈Rhdv×dmodel. In this work we employ h= 8 parallel attention layers, or heads. For each of these we use dk=dv=dmodel/h= 64 . Due to the reduced dimension of each head, the total computational cost is similar to that of single-head attention with full dimensionality. 3.2.3 Applications of Attention in our Model The Transformer uses multi-head attention in thre
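
Conceptually, the tool returns the chunks of the PDF that are most similar to the query. The sketch below illustrates that idea only; it is not how `PDFSearchTool` is implemented. It assumes a toy bag-of-words vector in place of the `BAAI/bge-small-en-v1.5` embeddings, and the names `embed`, `cosine`, and `top_k` are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

A real retriever swaps `embed` for a dense embedding model and the sort for a vector index, but the ranking step has the same shape.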

# **Web Search API key**

In [55]:
import os

# Replace the placeholder with your own key; never commit a real key to a shared notebook
os.environ['TAVILY_API_KEY'] = '<YOUR_TAVILY_API_KEY>'

In [56]:
# max_results caps the number of search hits returned
web_search_tool = TavilySearchResults(max_results=3)

In [57]:
web_search_tool.run("What is multi-head attention?")

[{'url': 'https://medium.com/@digvijay.qi/multi-head-attention-4eb2ec3d3a3f',
  'content': 'Multi-head attention is a powerful… | by Digvijay Y | Medium Multi-head attention is a powerful extension of the self-attention mechanism that significantly improves its ability to capture diverse aspects of relationships within a sequence. Multi-head attention addresses this by using multiple independent attention heads, each with its own set of learned weight matrices for Query, Key, and Value vectors. Independent Calculations: Each head performs the core self-attention steps (queries, keys, values, attention scores, softmax, context vectors) but with its own learned weight matrices. Multi-head attention works in the following steps:'},
 {'url': 'https://medium.com/@hassaanidrees7/exploring-multi-head-attention-why-more-heads-are-better-than-one-006a5823372b',
  'content': 'Understanding the Power and Benefits of Multi-Head Attention in Transformer Models For students and practitioners in AI a

# **Router tool to decide vector store or web search**

In [58]:
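@tool
def router_tool(question: str) -> str:
    """Return 'vectorstore' if the question mentions self-attention, otherwise 'web_search'."""
    # Case-insensitive keyword match against the topic covered by the indexed PDF
    if 'self-attention' in question.lower():
        return 'vectorstore'
    return 'web_search'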
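
The router above keys on a single phrase, so a question about multi-head attention goes to web search even though the PDF covers it. A minimal sketch of a more forgiving variant follows, written as a plain function so it can be tried without CrewAI. The function name `route_question` and the keyword list are assumptions chosen for illustration, not part of the original notebook.

```python
# Hypothetical keyword list; adjust it to whatever the indexed PDF actually covers
VECTORSTORE_KEYWORDS = ("self-attention", "multi-head attention", "transformer")


def route_question(question: str, keywords: tuple = VECTORSTORE_KEYWORDS) -> str:
    """Return 'vectorstore' when any keyword occurs in the question, else 'web_search'."""
    text = question.lower()
    if any(keyword in text for keyword in keywords):
        return "vectorstore"
    return "web_search"
```

Swapping this logic into `router_tool` would change which questions reach the vectorstore, so the recorded run further down would route differently.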

# **CrewAI Agents**

In [59]:
Router_Agent = Agent(
    role='Router',
    goal='Route user question to a vectorstore or web search',
    backstory=(
        "You are an expert at routing a user question to a vectorstore or web search. "
        "Use the vectorstore for questions on concepts covered by the indexed paper, "
        "such as the Transformer architecture and attention mechanisms. "
        "You do not need to be stringent with the keywords in the question related to these topics. "
        "Otherwise, use web search."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

In [60]:
Retriever_Agent = Agent(
    role="Retriever",
    goal="Use the information retrieved from the vectorstore to answer the question",
    backstory=(
        "You are an assistant for question-answering tasks. "
        "Use the information present in the retrieved context to answer the question. "
        "You have to provide a clear, concise answer."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

In [61]:
Grader_agent = Agent(
    role='Relevance Grader',
    goal='Filter out erroneous retrievals',
    backstory=(
        "You are a grader assessing the relevance of a retrieved document to a user question. "
        "If the document contains keywords related to the user question, grade it as relevant. "
        "It does not need to be a stringent test. "
        "You have to make sure that the answer is relevant to the question."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

In [62]:
hallucination_grader = Agent(
    role="Hallucination Grader",
    goal="Filter out hallucination",
    backstory=(
        "You are a hallucination grader assessing whether an answer is grounded in / supported by a set of facts. "
        "Make sure you meticulously review the answer and check that the response provided is in alignment with the question asked."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

In [63]:
answer_grader = Agent(
    role="Answer Grader",
    goal="Filter out hallucination from the answer.",
    backstory=(
        "You are a grader assessing whether an answer is useful to resolve a question. "
        "Make sure you meticulously review the answer and check that it makes sense for the question asked. "
        "If the answer is relevant, generate a clear and concise response. "
        "If the answer generated is not relevant, perform a web search using 'web_search_tool'."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

In [64]:
router_task = Task(
    description=(
        "Analyse the keywords in the question {question}. "
        "Based on the keywords, decide whether it is eligible for a vectorstore search or a web search. "
        "Return the single word 'vectorstore' if it is eligible for vectorstore search. "
        "Return the single word 'websearch' if it is eligible for web search. "
        "Do not provide any other preamble or explanation."
    ),
    expected_output=(
        "A binary choice, 'websearch' or 'vectorstore', based on the question. "
        "Do not provide any other preamble or explanation."
    ),
    agent=Router_Agent,
    tools=[router_tool],
)

In [65]:
retriever_task = Task(
    description=(
        "Based on the response from the router task, extract information for the question {question} with the help of the respective tool. "
        "Use the web_search_tool to retrieve information from the web if the router task output is 'websearch'. "
        "Use the rag_tool to retrieve information from the vectorstore if the router task output is 'vectorstore'."
    ),
    expected_output=(
        "You should analyse the output of the 'router_task'. "
        "If the response is 'websearch', use the web_search_tool to retrieve information from the web. "
        "If the response is 'vectorstore', use the rag_tool to retrieve information from the vectorstore. "
        "Return a clear and concise text as the response."
    ),
    agent=Retriever_Agent,
    context=[router_task],
    # The description refers to both tools, so the task needs access to them
    tools=[rag_tool, web_search_tool],
)

In [66]:
grader_task = Task(
    description=(
        "Based on the response from the retriever task for the question {question}, "
        "evaluate whether the retrieved content is relevant to the question."
    ),
    expected_output=(
        "A binary score, 'yes' or 'no', indicating whether the document is relevant to the question. "
        "You must answer 'yes' if the response from the 'retriever_task' is in alignment with the question asked. "
        "You must answer 'no' if the response from the 'retriever_task' is not in alignment with the question asked. "
        "Do not provide any preamble or explanations except for 'yes' or 'no'."
    ),
    agent=Grader_agent,
    context=[retriever_task],
)
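
The graders are asked for a bare 'yes' or 'no', but an LLM may still wrap the verdict in punctuation or extra words. If the verdict is ever consumed by ordinary Python code, a small normalizer helps. This is a minimal sketch under that assumption; `parse_binary_grade` is a hypothetical helper, not a CrewAI function.

```python
import re


def parse_binary_grade(raw: str) -> bool:
    """Map a grader reply to True ('yes') or False ('no'); raise if neither word appears."""
    # Take the first standalone 'yes' or 'no', ignoring case and surrounding text
    match = re.search(r"\b(yes|no)\b", raw.lower())
    if match is None:
        raise ValueError(f"No yes/no verdict found in: {raw!r}")
    return match.group(1) == "yes"
```

Raising on an unparseable reply, rather than defaulting to 'no', keeps a malformed grade from silently triggering the web-search fallback.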

In [67]:
hallucination_task = Task(
    description=(
        "Based on the response from the grader task for the question {question}, "
        "evaluate whether the answer is grounded in / supported by a set of facts."
    ),
    expected_output=(
        "A binary score, 'yes' or 'no', indicating whether the answer is in sync with the question asked. "
        "Respond 'yes' if the answer is useful and contains facts about the question asked. "
        "Respond 'no' if the answer is not useful and does not contain facts about the question asked. "
        "Do not provide any preamble or explanations except for 'yes' or 'no'."
    ),
    agent=hallucination_grader,
    context=[grader_task],
)

In [68]:
answer_task = Task(
    description=(
        "Based on the response from the hallucination task for the question {question}, "
        "evaluate whether the answer is useful to resolve the question. "
        "If the answer is 'yes', return a clear and concise answer. "
        "If the answer is 'no', perform a 'websearch' and return the response."
    ),
    expected_output=(
        "Return a clear and concise response if the response from 'hallucination_task' is 'yes'. "
        "Perform a web search using 'web_search_tool' and return a clear and concise response only if the response from 'hallucination_task' is 'no'. "
        "Otherwise respond with 'Sorry! unable to find a valid response'."
    ),
    context=[hallucination_task],
    agent=answer_grader,
    # The fallback web search is only possible if the task can call the tool
    tools=[web_search_tool],
)

In [69]:
rag_crew = Crew(
    agents=[Router_Agent, Retriever_Agent, Grader_agent, hallucination_grader, answer_grader],
    tasks=[router_task, retriever_task, grader_task, hallucination_task, answer_task],
    verbose=True,
)
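
The five tasks run in sequence: route, retrieve, grade relevance, check grounding, then answer or fall back to web search. The control flow can be sketched in plain Python with each step passed in as a callable. This is a simplified illustration, not what CrewAI executes internally: `corrective_rag` and its parameters are hypothetical names, and the two checks are applied to the retrieved context rather than to a generated answer.

```python
from typing import Callable

NO_ANSWER = "Sorry! unable to find a valid response"


def corrective_rag(
    question: str,
    route: Callable[[str], str],
    retrieve_pdf: Callable[[str], str],
    search_web: Callable[[str], str],
    is_relevant: Callable[[str, str], bool],
    is_grounded: Callable[[str, str], bool],
) -> str:
    """Route, retrieve, grade, and fall back to web search when a check fails."""
    # Step 1-2: pick a source and retrieve from it
    if route(question) == "vectorstore":
        context = retrieve_pdf(question)
    else:
        context = search_web(question)

    # Step 3-4: keep the retrieval only if both graders accept it
    if context and is_relevant(question, context) and is_grounded(question, context):
        return context

    # Step 5: corrective step, retry with web search
    fallback = search_web(question)
    return fallback if fallback else NO_ANSWER
```

In the crew, each callable corresponds to an agent and task pair, and the LLM decides the branch from the previous task's output instead of an `if` statement.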



In [44]:
inputs = {"question": "How does multi-head attention influence large language models?"}


In [70]:
result = rag_crew.kickoff(inputs=inputs)

[1m[95m [DEBUG]: == Working Agent: Router[00m
[1m[95m [INFO]: == Starting Task: Analyse the keywords in the question How does multi-head attention influence large language models?Based on the keywords decide whether it is eligible for a vectorstore search or a web search.Return a single word 'vectorstore' if it is eligible for vectorstore search.Return a single word 'websearch' if it is eligible for web search.Do not provide any other premable or explaination.[00m


[1m> Entering new CrewAgentExecutor chain...[0m
[32;1m[1;3mThought: The question seems to be related to the concept of multi-head attention in large language models, which is a topic in the field of Retrieval-Augmented Generation.
Action: router_tool
Action Input: {"question": "How does multi-head attention influence large language models?"}[0m[95m 

web_search
[00m
[32;1m[1;3mFinal Answer: websearch[0m

[1m> Finished chain.[0m
[1m[92m [DEBUG]: == [Router] Task output: websearch

[00m
[1m[95m [DEBUG]:

In [71]:
print(result)

Multi-head attention is a mechanism used in large language models to allow the model to focus on different aspects of the input data simultaneously. It does this by applying different weights to different heads, which are essentially different attention mechanisms. This allows the model to capture complex relationships between different parts of the input data, which can improve its ability to generate coherent and relevant text.
