# PDF Chatbot Using CrewAI and LlamaIndex

### Install required dependencies

In [1]:
!pip install llama-index --quiet
!pip install llama-index-llms-groq --quiet
!pip install llama-index-core --quiet
!pip install llama-index-readers-file --quiet
!pip install llama-index-tools-wolfram-alpha --quiet
!pip install llama-index-embeddings-huggingface --quiet
!pip install 'crewai[tools]' --quiet

### Set up the LLM with Groq

Sign up at Groq (it is free), then create an API key in the Groq console.

In [2]:
from llama_index.llms.groq import Groq

In [3]:
import os

# Read the key from the environment instead of hard-coding it in the notebook
llm = Groq(model="llama3-70b-8192", api_key=os.environ["GROQ_API_KEY"])

In [4]:
response = llm.complete("Explain the importance of LLMs")
print(response)

Large Language Models (LLMs) have revolutionized the field of Natural Language Processing (NLP) and have numerous applications across various industries. The importance of LLMs can be summarized as follows:

1. **Improved Language Understanding**: LLMs have demonstrated remarkable capabilities in understanding human language, including nuances, context, and subtleties. This enables them to generate more accurate and informative responses, making them valuable for applications like chatbots, virtual assistants, and language translation.
2. **Enhanced Text Generation**: LLMs can generate high-quality, coherent, and context-specific text, which has numerous applications in content creation, writing assistance, and automated reporting.
3. **Advancements in AI Research**: LLMs have pushed the boundaries of AI research, enabling the development of more sophisticated models and techniques. They have also sparked new areas of research, such as multimodal language processing and explainability.

### CrewAI requires a chat-based model for binding

In [5]:
from langchain_openai import ChatOpenAI

import os

# Read the API key from the environment rather than hard-coding it
groq_api_key = os.environ["GROQ_API_KEY"]

chat_llm = ChatOpenAI(
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key=groq_api_key,
    model="llama3-70b-8192",
    temperature=0,
    max_tokens=1000,
)
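Both the `Groq` client and the `ChatOpenAI` wrapper above need the same Groq key. One way to keep it out of the notebook is to read it from an environment variable and fail early when it is missing. The sketch below is a suggestion rather than part of the original notebook: the helper name `require_env` and the variable name `GROQ_API_KEY` are assumptions.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Failing here gives a readable message instead of an opaque auth error later
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before starting the notebook."
        )
    return value
```

With this in place, `groq_api_key = require_env("GROQ_API_KEY")` can feed both clients.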

### Download data

### Parse the content of the data

In [6]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
import os

In [50]:
reader = SimpleDirectoryReader(input_files=["/content/Chagouti El Houcine.pdf"])
docs = reader.load_data()
docs[0]

Document(id_='83da3ceb-ee24-4453-8024-4b4c24e15bdb', embedding=None, metadata={'page_label': '1', 'file_name': 'Chagouti El Houcine.pdf', 'file_path': '/content/Chagouti El Houcine.pdf', 'file_type': 'application/pdf', 'file_size': 102839, 'creation_date': '2024-07-29', 'last_modified_date': '2024-07-29'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text="Skills: Projects: \nLanguagesAbout me: \nEducation: Internship Experience:  \nChagouti El Houcine\n● English Preparatory classes, MP at Lycée lissane Eddine Ibn Al-Khatib - Laayoune -\n       Data Engineering degree at INPT \n ● French +212636413791 | chagouti.el2002@gmail.com  |  Github   |   Linkedin \nMachine Learning, Deep Learning, Natural Language Processing (NLP), Data Mining,TensorFlow,\nPyTorch, 

### Setup the embedding model

In [11]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [12]:
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

### Build Index

In [51]:
index = VectorStoreIndex.from_documents(docs,
                                        embed_model=embed_model,
                                        )

### Build the Query Engine

In [52]:
query_engine = index.as_query_engine(similarity_top_k=5, llm=llm)
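The `similarity_top_k=5` argument asks the retriever for the five chunks whose embeddings are closest to the query embedding. The sketch below illustrates that idea with toy three-dimensional vectors and plain cosine similarity. It is a conceptual illustration only, not LlamaIndex's implementation; the function names and the vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=5):
    """Return (chunk_id, score) pairs for the k chunks most similar to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones come from the embedding model.
chunks = {
    "skills": [0.9, 0.1, 0.0],
    "projects": [0.1, 0.9, 0.1],
    "education": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.0]  # pretend embedding of a question about projects
ranked = top_k_chunks(query, chunks, k=2)
print([cid for cid, _ in ranked])  # ['projects', 'skills']
```

A larger `k` gives the LLM more context to answer from, at the cost of a longer prompt.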

### Instantiate the query engine as a Tool

In [53]:
from crewai_tools import LlamaIndexTool

# Use the existing LlamaIndexTool for document analysis
document_analysis_tool = LlamaIndexTool.from_query_engine(
    query_engine,
    name="Document Analysis Tool",
    description="Use this tool to analyze documents and extract relevant information",
)


In [54]:
document_analysis_tool.args_schema.schema()

{'title': 'QueryToolSchema',
 'description': 'Schema for query tool.',
 'type': 'object',
 'properties': {'query': {'title': 'Query',
   'description': 'Search query for the query tool.',
   'type': 'string'}},
 'required': ['query']}
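The schema says the tool accepts a single required string field, `query`, which is why the agent's `Action Input` later takes the form `{"query": "..."}`. As a rough illustration, the sketch below checks a payload against that schema. The helper `validate_tool_input` is hypothetical and only covers required fields and string types; it is not a full JSON Schema validator and not how CrewAI validates input.

```python
def validate_tool_input(payload: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the payload fits the schema."""
    problems = []
    for field in schema.get("required", []):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        is_string_field = spec.get("type") == "string"
        if field in payload and is_string_field and not isinstance(payload[field], str):
            problems.append(f"field {field} must be a string")
    return problems


# Schema copied from the output above.
query_tool_schema = {
    "title": "QueryToolSchema",
    "description": "Schema for query tool.",
    "type": "object",
    "properties": {
        "query": {
            "title": "Query",
            "description": "Search query for the query tool.",
            "type": "string",
        }
    },
    "required": ["query"],
}

print(validate_tool_input({"query": "Chagouti El Houcine"}, query_tool_schema))  # []
print(validate_tool_input({"question": "projects"}, query_tool_schema))
```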

### Instantiate Document Analyst and Writer Agents

In [55]:
import os
from crewai import Agent, Task, Crew, Process

In [56]:
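# Define a new agent for document analysis
document_analyst = Agent(
    role="Document Analyst",
    goal="Analyze documents to give a summary",
    backstory="""You are an expert in analyzing various documents and providing comprehensive reports.""",
    llm=chat_llm,
    verbose=True,
    allow_delegation=False,
    tools=[document_analysis_tool],
)

# Assumed writer agent for task2; the original notebook never defines it, so these fields are illustrative
writer = Agent(
    role="Content Writer",
    goal="Turn the document analysis into an engaging blog post",
    backstory="""You write clear, casual blog posts based on research notes.""",
    llm=chat_llm,
    verbose=True,
    allow_delegation=False,
)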

### Define respective tasks

In [57]:
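# Adapt the tasks for the new context
# Note: the description names 'Morocco-1-3.pdf', but the file indexed above is 'Chagouti El Houcine.pdf'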
task1 = Task(
    description="""give information about projects that chagouti el houcine has worked at as presented in the document 'Morocco-1-3.pdf'.""",
    expected_output="Full analysis report in bullet points",
    agent=document_analyst,
)

task2 = Task(
    description="""Using the insights provided, develop an engaging blog post that highlights projects that chagouti el houcine worked on. Your post should be informative yet accessible, catering to a casual audience. Make it sound cool, avoid complex words.""",
    expected_output="Full blog post of at least 4 paragraphs",
    agent=writer,
)

### Instantiate the Crew with a sequential process

In [58]:
crew = Crew(
    agents=[document_analyst, writer],  # must match the agents defined above
    tasks=[task1, task2],
    process=Process.sequential,  # run task1, then task2
    verbose=2,  # Set to 1 or 2 for different logging levels
)
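In a sequential process the tasks run in the order they are listed, and the output of one task is available as context for the next. The sketch below mimics that flow with plain functions. It is a conceptual model under that assumption, not CrewAI's internals; `run_sequentially`, `analyze`, and `write_post` are hypothetical stand-ins for the crew and its two tasks.

```python
from typing import Callable, List


def run_sequentially(steps: List[Callable[[str], str]], initial_context: str = "") -> str:
    """Run each step in order, feeding the previous output in as context."""
    context = initial_context
    for step in steps:
        context = step(context)
    return context


def analyze(context: str) -> str:
    # Stand-in for the analysis task; a real agent would call the LLM and tools.
    return "- project A\n- project B"


def write_post(context: str) -> str:
    # Stand-in for the writing task; it only sees what the analysis produced.
    bullets = [line[2:] for line in context.splitlines() if line.startswith("- ")]
    return "Blog post covering: " + ", ".join(bullets)


final_output = run_sequentially([analyze, write_post])
print(final_output)  # Blog post covering: project A, project B
```

The order of `tasks=[task1, task2]` therefore matters: the writing task relies on the analysis having run first.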



### Initiate the agent workflow: Let the magic begin!

In [59]:
result = crew.kickoff()

print("######################")
print(result)

[2024-07-29 18:46:29][DEBUG]: == Working Agent: Document Analyst
[2024-07-29 18:46:29][INFO]: == Starting Task: give information about projects that chagouti el houcine has worked at as presented in the document 'Morocco-1-3.pdf'.


> Entering new CrewAgentExecutor chain...
Thought: I need to analyze the document 'Morocco-1-3.pdf' to extract information about projects that Chagouti El Houcine has worked on.

Action: Document Analysis Tool
Action Input: {"query": "Chagouti El Houcine"}

Chagouti El Houcine is a data science junior currently studying for a data engineering degree, proficient in languages such as French and Arabic, and skilled in various technologies including Machine Learning, Deep Learning, Natural Language Processing, and more.

Thought: I need to extract more information about the projects Chagouti El Houcine has worked on.

Action: Document Analysis Tool
Action Input: {"query": "Chagouti E