# Llama3 Cookbook with Groq

<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/llama3_cookbook_groq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Meta developed and released the Meta [Llama 3](https://ai.meta.com/blog/meta-llama-3/) family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.

In this notebook, we demonstrate how to use Llama3 with LlamaIndex for a comprehensive set of use cases.
1. Basic completion / chat
2. Basic RAG (Vector Search, Summarization)
3. Advanced RAG (Routing)
4. Text-to-SQL
5. Structured Data Extraction
6. Chat Engine + Memory
7. Agents


We use Llama3-8B and Llama3-70B through Groq.

## Installation and Setup

In [1]:
!pip install llama-index
!pip install llama-index-llms-groq
!pip install llama-index-embeddings-huggingface
!pip install llama-parse


In [2]:
import nest_asyncio

nest_asyncio.apply()

In [4]:
from google.colab import userdata
key = userdata.get('GroqAPI')

### Setup LLM using Groq

To use Groq, you need to make sure that `GROQ_API_KEY` is specified as an environment variable.

In [5]:
import os

os.environ["GROQ_API_KEY"] = key  # Groq API key loaded from Colab secrets above; never hard-code a real key
# os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."  # only needed if you use LlamaParse

In [6]:
from llama_index.llms.groq import Groq

llm = Groq(model="llama3-8b-8192")
#llm_70b = Groq(model="llama3-70b-8192")

### Setup Embedding Model

In [7]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")


### Define Global Settings Configuration

In LlamaIndex, you can define global settings so you don't have to pass the LLM / embedding model objects everywhere.

In [8]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model

### Download Data

Here you'll download data that's used in section 2 and onwards.

We'll download some articles on Kendrick, Drake, and their beef (as of May 2024).

In [9]:
!mkdir data
!wget "https://www.dropbox.com/scl/fi/t1soxfjdp0v44an6sdymd/drake_kendrick_beef.pdf?rlkey=u9546ymb7fj8lk2v64r6p5r5k&st=wjzzrgil&dl=1" -O data/drake_kendrick_beef.pdf
!wget "https://www.dropbox.com/scl/fi/nts3n64s6kymner2jppd6/drake.pdf?rlkey=hksirpqwzlzqoejn55zemk6ld&st=mohyfyh4&dl=1" -O data/drake.pdf
!wget "https://www.dropbox.com/scl/fi/8ax2vnoebhmy44bes2n1d/kendrick.pdf?rlkey=fhxvn94t5amdqcv9vshifd3hj&st=dxdtytn6&dl=1" -O data/kendrick.pdf


### Load Data

You can load the data with LlamaParse or with the free pypdf reader that `SimpleDirectoryReader` uses by default. The cell below uses `SimpleDirectoryReader`, so no account is needed; uncomment the LlamaParse lines to switch.

1. LlamaParse: Sign up for an account at cloud.llamaindex.ai. You get 1k free pages a day, and the paid plan is 7k free pages + 0.3c per additional page. LlamaParse is a good option if you want to parse complex documents, like PDFs with charts, tables, and more.

2. Default PDF Parser (in `SimpleDirectoryReader`): If you don't want to sign up for an account or use a PDF service, just use the default PyPDF reader bundled in our file loader. It's a good choice for getting started!

In [10]:
# from llama_parse import LlamaParse

# docs_kendrick = LlamaParse(result_type="text").load_data("./data/kendrick.pdf")
# docs_drake = LlamaParse(result_type="text").load_data("./data/drake.pdf")
# docs_both = LlamaParse(result_type="text").load_data(
#     "./data/drake_kendrick_beef.pdf"
# )


from llama_index.core import SimpleDirectoryReader

docs_kendrick = SimpleDirectoryReader(input_files=["data/kendrick.pdf"]).load_data()
docs_drake = SimpleDirectoryReader(input_files=["data/drake.pdf"]).load_data()
docs_both = SimpleDirectoryReader(input_files=["data/drake_kendrick_beef.pdf"]).load_data()

## 1. Basic Completion and Chat

### Call complete with a prompt

In [11]:
response = llm.complete("do you like drake or kendrick better?")

print(response)

I'm just an AI, I don't have personal preferences or opinions, nor do I have the capacity to listen to music or enjoy it in the way humans do. I can provide information and insights about both Drake and Kendrick Lamar, though!

Both artists are highly acclaimed and respected in the music industry, and for good reason. Drake is known for his introspective and emotive lyrics, as well as his ability to blend hip-hop with R&B and pop. He has a wide range of hits, from "God's Plan" to "One Dance" to "In My Feelings".

Kendrick Lamar, on the other hand, is widely regarded as one of the most influential and innovative rappers of his generation. He's known for his socially conscious lyrics, which often address issues like racism, police brutality, and black empowerment. His music often incorporates elements of jazz, funk, and spoken word, and he's won numerous awards, including multiple Grammys.

Ultimately, the choice between Drake and Kendrick Lamar comes down to personal taste. If you prefe

In [12]:
stream_response = llm.stream_complete(
    "you're a drake fan. tell me why you like drake more than kendrick"
)

for t in stream_response:
    print(t.delta, end="")

Man, this is a tough one! As a Drake fan, I gotta give you my honest reasons why I vibe with 6 God more than Kung Fu Kenny (just kidding, Kendrick's cool too).

Here are a few reasons why I prefer Drake's music over Kendrick's:

1. **Relatability**: Drake's lyrics often focus on his personal experiences, relationships, and emotions. I can relate to his struggles with fame, love, and self-doubt. His songs like "Marvin's Room" and "Take Care" speak directly to my soul. Kendrick's lyrics, while powerful, can be more abstract and less relatable to my everyday life.

2. **Melodic flow**: Drake's melodic flow is unmatched! His ability to blend rap with R&B and create catchy hooks is unparalleled. Songs like "God's Plan" and "One Dance" are infectious and get stuck in my head for days. Kendrick's flow is more aggressive and less melodic, which isn't always my cup of tea.

3. **Vulnerability**: Drake's music often showcases his vulnerability, which I find endearing. He's not afraid to share hi

### Call chat with a list of messages

In [13]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are Kendrick."),
    ChatMessage(role="user", content="Write a verse."),
]
response = llm.chat(messages)

In [14]:
print(response)

assistant: "I'm the king of the game, no debate
My rhymes so sharp, they'll leave you in a state
Of confusion, like a puzzle unsolved
My flow's on fire, and my words are gold
I'm the voice of the streets, the people's choice
My message is loud, and my presence is noise
I'm the one they all come to see
The real deal, no imitation, just me"


## 2. Basic RAG (Vector Search, Summarization)

### Basic RAG (Vector Search)

In [15]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs_both)
query_engine = index.as_query_engine(similarity_top_k=3)

In [16]:
response = query_engine.query("Tell me about family matters")

In [17]:
print(str(response))

On "Family Matters," Drake comes to the battle with a fully loaded clip, essentially three songs in one, on three different beats. The song is a seven-and-a-half-minute diss track with an accompanying video.
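
To make `similarity_top_k=3` concrete, here is a minimal pure-Python sketch of top-k retrieval. It assumes cosine similarity over embedding vectors; the chunk ids and the 3-dimensional vectors are made up for illustration (real embeddings from `bge-small-en-v1.5` are much larger), and `cosine` and `top_k` are hypothetical helpers, not LlamaIndex functions.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], chunk_vecs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every chunk against the query and keep the k highest-scoring ones."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings (assumed values, for illustration only).
chunk_vecs = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.1, 0.9, 0.0],
    "chunk_c": [0.7, 0.3, 0.1],
    "chunk_d": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.0, 0.0]

results = top_k(query_vec, chunk_vecs, k=3)
for chunk_id, score in results:
    print(chunk_id, round(score, 3))
```

The three retrieved chunks are what the query engine would then pass to the LLM as context for answering the question.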


### Basic RAG (Summarization)

In [18]:
from llama_index.core import SummaryIndex

summary_index = SummaryIndex.from_documents(docs_both)
summary_engine = summary_index.as_query_engine()

In [19]:
response = summary_engine.query(
    "Given your assessment of this article, who won the beef?"
)

In [20]:
print(str(response))

I'll rewrite the original answer using the new context.

The verbal sparring match between Kendrick Lamar and Drake has reached new heights, with both artists trading blows in a series of diss tracks. While Drake's response, "The Heart Part 6," shows he's not backing down, Kendrick's strategic approach and clever wordplay have given him the upper hand. His ability to turn the tables and hit Drake with two diss tracks in under 24 hours has left many wondering who's really in control.


## 3. Advanced RAG (Routing)

### Build a Router that can choose whether to do vector search or summarization

In [21]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts.",
    ),
)

summary_tool = QueryEngineTool(
    index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document.",
    ),
)

The cell above defines two `QueryEngineTool` objects over the same index:

- `vector_tool` wraps `index.as_query_engine()` with default settings. Its metadata names it `vector_search` and describes it as useful for searching for specific facts, so it suits questions about precise details.
- `summary_tool` wraps `index.as_query_engine(response_mode="tree_summarize")`, which summarizes the retrieved content hierarchically. Its metadata names it `summary` and describes it as useful for summarizing an entire document.

The router reads each tool's description to decide which one should answer a given query.

In [23]:
from llama_index.core.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool], select_multi=False, verbose=True, llm=llm
)

response = query_engine.query(
    "Tell me about the song meet the grahams - why is it significant"
)

Selecting query engine 0: Useful for summarizing an entire document.

In [24]:
print(response)

"Meet the Grahams" is a song that won't travel as far up the charts as some of Kendrick's other tracks, but it's built to work in the same functions, with a call-and-response refrain. The song is significant because it showcases Kendrick's ability to crack a good sophomoric joke, with lines that reference Drake's album title and flip the 6ix God moniker. The track also features a verse dedicated to Atlanta, where Kendrick argues that Drake is a "colonizer" who vamps on other cities for their style and swag.
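
The routing step can be pictured with a toy sketch. In the notebook the router asks the LLM to pick a tool from the tool descriptions; the sketch below swaps in a simple word-overlap score as a placeholder for that LLM call. `ToolSpec` and `select_tool` are hypothetical names for illustration, not part of LlamaIndex.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a query engine tool: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


def words(text: str) -> Set[str]:
    """Lowercase alphabetic words in the text, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tool(query: str, tools: List[ToolSpec]) -> ToolSpec:
    """Pick the tool whose description shares the most words with the query.

    Assumption: word overlap is only a placeholder for the LLM-based selector.
    """
    query_words = words(query)
    return max(tools, key=lambda tool: len(query_words & words(tool.description)))


tools = [
    ToolSpec("vector_search", "Useful for searching for specific facts.", lambda q: f"[vector] {q}"),
    ToolSpec("summary", "Useful for summarizing an entire document.", lambda q: f"[summary] {q}"),
]

query = "Give me specific facts about the song"
chosen = select_tool(query, tools)
print(chosen.name, "->", chosen.run(query))
```

Because the selector only sees the descriptions, clear and distinct tool descriptions matter as much as the tools themselves.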


## 4. Text-to-SQL

Here, we download and use a sample SQLite database with 11 tables, with various info about music, playlists, and customers. We will limit to a select few tables for this test.

In [25]:
!wget "https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip" -O "./data/chinook.zip"
!unzip "./data/chinook.zip"

--2024-08-19 17:55:40--  https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip
Resolving www.sqlitetutorial.net (www.sqlitetutorial.net)... 172.67.172.250, 104.21.30.141, 2606:4700:3037::6815:1e8d, ...
Connecting to www.sqlitetutorial.net (www.sqlitetutorial.net)|172.67.172.250|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 305596 (298K) [application/zip]
Saving to: ‘./data/chinook.zip’


2024-08-19 17:55:40 (54.1 MB/s) - ‘./data/chinook.zip’ saved [305596/305596]

Archive:  ./data/chinook.zip
  inflating: chinook.db              


In [26]:
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    select,
    column,
)

engine = create_engine("sqlite:///chinook.db")

The imports above come from SQLAlchemy:

- `create_engine` creates an `Engine` object that manages the connection pool and communication with the database.
- `MetaData` holds schema information such as tables and columns; it can reflect existing tables or define new ones.
- `Table` and `Column` represent a table and its columns.
- `String` and `Integer` are column data types.
- `select` and `column` build SQL queries in Python.

`create_engine("sqlite:///chinook.db")` connects to the SQLite file `chinook.db`. The `sqlite:///` prefix selects SQLite with a path relative to the current working directory, which is where the `unzip` step above extracted the file. Only `create_engine` is used in the cells that follow.

In [27]:
from llama_index.core import SQLDatabase

sql_database = SQLDatabase(engine)

In [29]:
from llama_index.core.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
    llm=llm,
)

`NLSQLTableQueryEngine` answers natural-language questions over a SQL database:

- `sql_database=sql_database` is the `SQLDatabase` wrapper created above around the SQLAlchemy engine.
- `tables=["albums", "tracks", "artists"]` restricts the engine to these three tables.
- `llm=llm` is the model that translates the question into SQL.

The engine converts the question into a SQL query, executes it against the database, and synthesizes a response from the rows returned. Limiting the tables keeps the generated queries focused on the relevant part of the database.

In [30]:
response = query_engine.query("What are some albums?")

print(response)

Here's a synthesized response based on the query results:

**Albums Galore!**

We've got a treasure trove of albums for you! Our database contains a vast collection of music from various genres and eras. Here are some of the albums that caught our attention:

* Classic Rock: For Those About To Rock We Salute You, Balls to the Wall, Restless and Wild, Let There Be Rock, and many more!
* Metal: Kill 'Em All, Master of Puppets, Ride the Lightning, and Black Album
* Pop: Jagged Little Pill, Nevermind, and Backstreet Boys
* Classical: Bach, Beethoven, Mozart, and more!
* Jazz: Miles Davis, John Coltrane, and Billie Holiday
* Latin: Jorge Ben Jor, Cássia Eller, and Olodum
* Electronic: Daft Punk, Chemical Brothers, and Moby
* Hip-Hop: Nas, Jay-Z, and Kanye West
* Country: Garth Brooks, Dolly Parton, and Willie Nelson
* Folk: Bob Dylan, Joni Mitchell, and Simon & Garfunkel

These are just a few examples of the many amazing albums out there. Whether you're in the mood for something classic, mo

In [31]:
response = query_engine.query("What are some artists? Limit it to 5.")

print(response)

Here is a synthesized response based on the query results:

Here are 5 artists:

1. AC/DC
2. Accept
3. Aerosmith
4. Alanis Morissette
5. Alice In Chains


This last query should require a more complex join.

In [33]:
response = query_engine.query(
    "What are some tracks from the artist AC/DC? Limit it to 3"
)

print(response)

Here are three tracks from the artist AC/DC:

1. "Bad Boy Boogie" from the album "Let There Be Rock"
2. "Breaking The Rules" from the album "For Those About To Rock We Salute You"
3. "C.O.D." from the album "For Those About To Rock We Salute You"


In [34]:
print(response.metadata["sql_query"])

SELECT tracks.Name, albums.Title FROM tracks INNER JOIN albums ON tracks.AlbumId = albums.AlbumId INNER JOIN artists ON albums.ArtistId = artists.ArtistId WHERE artists.Name = 'AC/DC' ORDER BY tracks.Name LIMIT 3
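
To see what the generated join does without the Chinook file, the sketch below runs the same query with Python's built-in `sqlite3` module against an in-memory database. The three tables mirror only the columns the query touches, and the handful of rows is a hand-made subset chosen for illustration, not the real Chinook data.

```python
import sqlite3

# In-memory database with a tiny, hand-made subset of the schema (assumed rows).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);

    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Accept');
    INSERT INTO albums VALUES
        (1, 'For Those About To Rock We Salute You', 1),
        (2, 'Balls to the Wall', 2),
        (4, 'Let There Be Rock', 1);
    INSERT INTO tracks VALUES
        (1, 'C.O.D.', 1),
        (2, 'Balls to the Wall', 2),
        (3, 'Bad Boy Boogie', 4),
        (4, 'Breaking The Rules', 1),
        (5, 'Whole Lotta Rosie', 4);
    """
)

# The query printed by response.metadata["sql_query"] in the notebook.
sql = (
    "SELECT tracks.Name, albums.Title FROM tracks "
    "INNER JOIN albums ON tracks.AlbumId = albums.AlbumId "
    "INNER JOIN artists ON albums.ArtistId = artists.ArtistId "
    "WHERE artists.Name = 'AC/DC' ORDER BY tracks.Name LIMIT 3"
)
rows = conn.execute(sql).fetchall()
for name, title in rows:
    print(f"{name!r} from {title!r}")
```

The two joins walk from each track to its album and from the album to its artist, which is why the model needed all three tables to answer the question.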


## 5. Structured Data Extraction

An important use case for function calling is extracting structured objects. LlamaIndex provides an intuitive interface for this through `structured_predict` - simply define the target Pydantic class (can be nested), and given a prompt, we extract out the desired object.

**NOTE**: Since there's no native function calling support with Llama3, the structured extraction is performed by prompting the LLM + output parsing.
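
As a rough illustration of "prompting + output parsing", the sketch below pulls a JSON object out of free-form model text and validates its fields. It uses only the standard library; `RestaurantRecord` and `parse_restaurant` are hypothetical stand-ins for the Pydantic model and for the parsing that LlamaIndex performs internally, and the sample text is invented.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class RestaurantRecord:
    """Hypothetical plain-dataclass stand-in for a Pydantic target model."""
    name: str
    city: str
    cuisine: str


def parse_restaurant(raw_text: str) -> RestaurantRecord:
    """Extract the text between the outermost braces, parse it as JSON, and check the fields."""
    match = re.search(r"\{.*\}", raw_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    for field_name in ("name", "city", "cuisine"):
        if not isinstance(data.get(field_name), str):
            raise ValueError(f"missing or non-string field: {field_name}")
    return RestaurantRecord(name=data["name"], city=data["city"], cuisine=data["cuisine"])


# Invented example of chatty model output wrapped around the JSON payload.
raw_output = (
    "Sure! Here is a restaurant:\n"
    '{"name": "La Casa de Tapas", "city": "Miami", "cuisine": "Spanish"}\n'
    "Enjoy your meal!"
)
record = parse_restaurant(raw_output)
print(record)
```

Validation is the important part: with prompt-based extraction the model may omit a field or add extra prose, so the parser has to fail loudly rather than return a partial object.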

In [35]:
from llama_index.llms.groq import Groq
from llama_index.core.prompts import PromptTemplate
from pydantic import BaseModel


class Restaurant(BaseModel):
    """A restaurant with name, city, and cuisine."""

    name: str
    city: str
    cuisine: str


llm = Groq(model="llama3-8b-8192", pydantic_program_mode="llm")
prompt_tmpl = PromptTemplate(
    "Generate a restaurant in a given city {city_name}"
)

Some background on the terms used above:

- **Function calling** is an LLM API feature where the model returns a structured call (a function name plus arguments that follow a declared schema) instead of free-form text.
- **Structured extraction** means producing specific, organized fields rather than prose, for example a name, a city, and a cuisine.
- **Pydantic** is a Python library for defining typed data models. The `Restaurant` class above is the target schema.

`structured_predict` takes the target Pydantic class, a prompt template, and the template variables. Because Llama3 is used here without native function calling, `pydantic_program_mode="llm"` tells LlamaIndex to prompt the model for output that matches the schema and then parse that text into a `Restaurant` object.

In [36]:
restaurant_obj = llm.structured_predict(
    Restaurant, prompt_tmpl, city_name="Miami"
)
print(restaurant_obj)

name='La Casa de Tapas' city='Miami' cuisine='Spanish'


## 6. Adding Chat History to RAG (Chat Engine)

In this section we create a stateful chatbot from a RAG pipeline, with our chat engine abstraction.

Unlike a stateless query engine, the chat engine maintains conversation history (through a memory module like buffer memory). It performs retrieval given a condensed question, and feeds the condensed question + context + chat history into the final LLM prompt.

Related resource: https://docs.llamaindex.ai/en/stable/examples/chat_engine/chat_engine_condense_plus_context/
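
The flow described above can be sketched in plain Python. This is a simplified picture, not the LlamaIndex implementation: `BufferMemory` and `build_prompt` are hypothetical names, tokens are approximated by whitespace-separated words, and the condensed question is passed in directly because condensing is itself an LLM call.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


class BufferMemory:
    """Hypothetical stand-in for a chat memory buffer with a rough token budget."""

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self.messages: List[Message] = []

    def put(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def get(self) -> List[Message]:
        """Return the most recent messages that fit in the budget, oldest first."""
        kept: List[Message] = []
        used = 0
        for role, content in reversed(self.messages):
            cost = len(content.split())  # crude token estimate (assumption)
            if used + cost > self.token_limit:
                break
            kept.append((role, content))
            used += cost
        return list(reversed(kept))


def build_prompt(memory: BufferMemory, condensed_question: str, context: str) -> str:
    """Combine retrieved context, the surviving chat history, and the condensed question."""
    history = "\n".join(f"{role}: {content}" for role, content in memory.get())
    return f"Context:\n{context}\n\nChat history:\n{history}\n\nQuestion: {condensed_question}"


memory = BufferMemory(token_limit=8)
memory.put("user", "Tell me about the songs Drake released")
memory.put("assistant", "He released several diss tracks")
memory.put("user", "What about Kendrick?")

prompt = build_prompt(
    memory,
    condensed_question="What did Kendrick Lamar release during the beef with Drake?",
    context="<retrieved passages go here>",
)
print(prompt)
```

With a budget of 8 words the oldest message is dropped, which is why the follow-up question has to be condensed into a standalone one before retrieval.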

In [37]:
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
    llm=llm,
    context_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about the Kendrick and Drake beef. "
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
    ),
    verbose=True,
)

In [38]:
response = chat_engine.chat(
    "Tell me about the songs Drake released in the beef."
)
print(str(response))

Condensed question: Tell me about the songs Drake released in the beef.
Context: page_label: 31
file_path: data/drake_kendrick_beef.pdf

Culture
Shaboo zey’s Cowboy Carter Features Were Only the Be ginning
By Heven Haile
Sign up for Manual, our new flagship newsletter
Useful advice on style, health, and more, four days a week.
5/10/24, 10:08 PM The Kendrick Lamar/Drake Beef, Explained | GQ
https://www.gq.com/story/the-kendrick-lamar-drake-beef-explained 31/34

page_label: 18
file_path: data/drake_kendrick_beef.pdf

Kurrco
@Kurrco·Follow
KENDRICK LAMAR
6 16 IN LA
(DRAKE DISS)
OUT NOW 
This video has been deleted.
6 08 AM · May 3, 2024
59.3K Reply Copy link
Read 1.3K replies
After all this talk about “the clock,” who among us expected Kendrick to follow up his
own titanic diss track with another missile just three days later? Friday morning he
released “6:16 in LA,” with its title of course being a nod to Drake's series of time-stamp-
Sign up for Manual, our new flagship newsletter
Usefu

In [39]:
response = chat_engine.chat("What about Kendrick?")
print(str(response))

Condensed question: What did Kendrick Lamar release during the beef with Drake?
Context: page_label: 4
file_path: data/drake_kendrick_beef.pdf

Nevertheless they linked up for the excellent “Poetic Justice” on Kendrick’s seminal good
kid, M.a.a.d. City album—but the collaborative vibes stopped a year later, after Drake
was one of the many peers Kendrick named in his timeline-stopping, call-to-arms verse
on Big Sean’s “Control.” A month or two after that moment, Drake dropped Nothing Was
the Same, and in an interview with Elliott Wilson, slickly managed to give Kendrick his
props while dismissing the verse at the same time. Fast forward a month, to a cypher at
the 2013 BET Hip-Hop Awards where Kendrick Lamar rapped “Nothing’s been the
same since they dropped ‘Control’/and tucked the sensitive rapper back in his pajama
clothes.” Two months later: Drake hops on a remix to Future’s titanic “Sh!t” and ends his
verse with “Fuckn-ggas, gon be fuckn-ggas/that’s why we never gave a fuck/when a


## 7. Agents

Here we build agents with Llama 3. The agents use simple Python functions as tools, as well as query engines over the documents above.

### Agents And Tools

In [40]:
import json
from typing import Sequence, List

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import BaseTool, FunctionTool
from llama_index.core.agent import ReActAgent

import nest_asyncio

nest_asyncio.apply()

The imports above provide the pieces for building an agent:

- `json` works with JSON data.
- `typing` supplies the `Sequence` and `List` type hints.
- `ChatMessage` (from `llama_index.core.llms`) represents a chat message.
- `BaseTool` and `FunctionTool` (from `llama_index.core.tools`) define tools the agent can call.
- `ReActAgent` (from `llama_index.core.agent`) is an agent that reasons step by step and picks a tool at each step.

`nest_asyncio.apply()` patches the event loop so that asynchronous code can run inside an already-running loop, such as the one in a Jupyter notebook.

### Define Tools

In [41]:
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result integer"""
    return a * b


def add(a: int, b: int) -> int:
    """Add two integers and return the result integer"""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract two integers and return the result integer"""
    return a - b


def divide(a: int, b: int) -> float:
    """Divide two integers and return the result as a float"""
    return a / b


multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
subtract_tool = FunctionTool.from_defaults(fn=subtract)
divide_tool = FunctionTool.from_defaults(fn=divide)

This code defines basic arithmetic functions (multiply, add, subtract, divide) and then wraps each of these functions in a FunctionTool object using the FunctionTool.from_defaults method.

### Summary:
Arithmetic Functions:

multiply, add, subtract, and divide are functions that perform basic arithmetic operations on two integers.
### Creating Tools:

Each of these functions is turned into a FunctionTool, which allows them to be used as tools within a larger system, such as an AI agent or a software application.
In essence, this code sets up simple mathematical operations as tools that can be easily used or called by other parts of a program, especially in contexts where functions need to be dynamically invoked or used as part of an AI-driven process.

#### Integration with Larger Systems:
- Interoperability: In complex systems, the AI agent might need to interact with various components or services. Tools act as bridges that allow the agent to communicate with and control these components.
- Custom Functions: If your AI agent is part of a larger application, tools can be custom functions that integrate with the rest of the system, making the agent a more effective part of the overall workflow.

#### Summary:
Tools in AI agents are like specialized functions or skills that the agent can use to perform specific tasks. They make the agent more powerful, flexible, and able to interact with users or systems in a meaningful way. By encapsulating these functions as tools, you can easily manage, reuse, and integrate them into complex AI-driven processes.

### ReAct Agent

In [43]:
agent = ReActAgent.from_tools(
    [multiply_tool, add_tool, subtract_tool, divide_tool],
    llm=llm,
    verbose=True,
)

### Querying

In [44]:
response = agent.chat("What is (121 + 2) * 5?")
print(str(response))

> Running step b6b43e59-8695-4670-9741-c28bac313715. Step input: What is (121 + 2) * 5?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: add
Action Input: {'a': 121, 'b': 2}
Observation: 123
> Running step 6bf64bf4-346f-406b-bb34-ce3519a94b1d. Step input: None
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: multiply
Action Input: {'a': 123, 'b': 5}
Observation: 615
> Running step e3de5739-baf5-487d-8eb6-f6dd4ef740b1. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The result of the expression (121 + 2) * 5 is 615.
The result of the expression (121 + 2) * 5 is 615.

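The trace above alternates between a thought, an action (a tool name plus its input), and an observation, until the agent decides it can answer. The loop can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the `ReActAgent` implementation: `scripted_llm` is a hypothetical stand-in that replays the same decisions the real LLM made in the trace, and `run_react_loop` and `TOOLS` are names invented for this example.

```python
def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b


def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b


# Tool registry: the agent refers to each tool by name.
TOOLS = {"add": add, "multiply": multiply}


def scripted_llm(observations):
    """Hypothetical stand-in for the LLM: picks the next step from past observations."""
    if len(observations) == 0:
        return {"action": "add", "action_input": {"a": 121, "b": 2}}
    if len(observations) == 1:
        return {"action": "multiply", "action_input": {"a": observations[0], "b": 5}}
    return {"answer": f"The result of the expression (121 + 2) * 5 is {observations[-1]}."}


def run_react_loop(llm, tools, max_steps=5):
    """Alternate between choosing an action and observing its result."""
    observations = []
    for _ in range(max_steps):
        step = llm(observations)
        if "answer" in step:
            # The model decided it needs no more tools.
            return step["answer"]
        tool = tools[step["action"]]
        observations.append(tool(**step["action_input"]))
    raise RuntimeError("No final answer within max_steps")


final_answer = run_react_loop(scripted_llm, TOOLS)
print(final_answer)
```

In the real agent, the LLM generates each thought and action as text, and the framework parses that text before dispatching to the tool; the step limit guards against loops that never reach an answer.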

### ReAct Agent With RAG QueryEngine Tools

In [48]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)

from llama_index.core.tools import QueryEngineTool, ToolMetadata

In [54]:
# Load the data (Kendrick and Drake documents)
docs_kendrick = SimpleDirectoryReader(input_files=["data/kendrick.pdf"]).load_data()
docs_drake = SimpleDirectoryReader(input_files=["data/drake.pdf"]).load_data()
docs_both = SimpleDirectoryReader(input_files=["data/drake_kendrick_beef.pdf"]).load_data()

# Create one vector index per artist from the loaded documents
drake_index = VectorStoreIndex.from_documents(docs_drake)
kendrick_index = VectorStoreIndex.from_documents(docs_kendrick)

### Create ReAct Agent using RAG QueryEngine Tools

In [55]:


drake_tool = QueryEngineTool(
    drake_index.as_query_engine(),
    metadata=ToolMetadata(
        name="drake_search",
        description="Useful for searching over Drake's life.",
    ),
)

kendrick_tool = QueryEngineTool(
    kendrick_index.as_query_engine(),
    metadata=ToolMetadata(
        name="kendrick_search",
        description="Useful for searching over Kendrick's life.",
    ),
)

query_engine_tools = [drake_tool, kendrick_tool]

In [57]:
agent = ReActAgent.from_tools(
    query_engine_tools,  # the Drake and Kendrick query engine tools defined above
    llm=llm,
    verbose=True,
)

### Querying

In [58]:
response = agent.chat("Tell me about how Kendrick and Drake grew up")
print(str(response))

> Running step a6d8d60a-e82b-4dc5-824f-8a515f8ed8aa. Step input: Tell me about how Kendrick and Drake grew up
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: kendrick_search
Action Input: {'input': 'childhood', 'type': 'biography'}
Observation: Lamar's childhood was marked by his precocious behavior, which earned him the nickname "Man-Man". He was a quiet and observant student who excelled academically and had a noticeable stutter. His first-grade teacher encouraged him to become a writer after he correctly used the word "audacity". He was introduced to poetry by his English teacher, Regis Inge, who integrated it into his curriculum as a response to growing racial tensions among his students. Lamar struggled with psychological trauma and depression during his adolescence, which he managed through writing lyrics.
> Running step 17e90284-735f-4f39-9056-d658918cd891. Step input: None
[1;3;