# 🔬 Virtual Lab 4: Running Llama Models and DeepSeek on Replicate  

<div style="border: 2px solid #4CAF50; padding: 15px; border-radius: 10px; background-color: #f4f4f4;">

### 🚀 **Platform**  
**Replicate**  

### 🏷️ **Models Used**  
- **meta/meta-llama-3-70b-instruct**  
- **meta/meta-llama-3.1-405b-instruct**  
- **meta/meta-llama-3-8b-instruct**  
- **deepseek-ai/deepseek-r1**  

### 🛠️ **Frameworks Used**  
- **LlamaIndex**  
- **LangChain / LangGraph**  

</div>

## Installation and Setup

In [1]:
!pip install llama-index
!pip install llama-index-llms-ollama
!pip install llama-index-llms-replicate
!pip install llama-index-embeddings-huggingface
!pip install llama-parse
!pip install replicate



In [1]:
import nest_asyncio

nest_asyncio.apply()

### Setup LLM using Replicate

Make sure you have REPLICATE_API_TOKEN specified!

In [2]:
import os
import replicate
from Constants import Constants  # local Constants.py exposing REPLICATE_API_KEY

# Set the REPLICATE_API_TOKEN environment variable.
# Alternatively, assign the key directly:
# os.environ["REPLICATE_API_TOKEN"] = "<YOUR_API_KEY>"
os.environ["REPLICATE_API_TOKEN"] = Constants.REPLICATE_API_KEY
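
If the token is missing, the failure usually surfaces later, at the first model call. A minimal sketch of a fail-fast check is shown below; `require_env` and `mask` are hypothetical helper names introduced here for illustration and are not part of the `replicate` or LlamaIndex packages.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or assign it before creating the LLM client."
        )
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value can be printed safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo with a made-up value (not a real token).
print(mask("r8_example_token_1234"))
```

In this notebook you could call `require_env("REPLICATE_API_TOKEN")` right after the cell above and print only the masked value when debugging.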



In [3]:
from llama_index.llms.replicate import Replicate

# llm_replicate = Replicate(model="deepseek-ai/deepseek-r1", api_token=os.environ["REPLICATE_API_TOKEN"])
llm_replicate = Replicate(model="meta/meta-llama-3-70b-instruct", api_token=os.environ["REPLICATE_API_TOKEN"])

### Setup Embedding Model

In [4]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

### Define Global Settings Configuration

In LlamaIndex, you can define global settings so you don't have to pass the LLM / embedding model objects everywhere.

In [5]:
from llama_index.core import Settings

Settings.llm = llm_replicate
Settings.embed_model = embed_model

### Download Data

Here you'll download data that's used in section 2 and onwards.

We'll download some articles on Kendrick, Drake, and their beef (as of May 2024).

In [6]:
!mkdir -p data
!curl -L "https://www.dropbox.com/scl/fi/t1soxfjdp0v44an6sdymd/drake_kendrick_beef.pdf?rlkey=u9546ymb7fj8lk2v64r6p5r5k&st=wjzzrgil&dl=1" -o data/drake_kendrick_beef.pdf
!curl -L "https://www.dropbox.com/scl/fi/nts3n64s6kymner2jppd6/drake.pdf?rlkey=hksirpqwzlzqoejn55zemk6ld&st=mohyfyh4&dl=1" -o data/drake.pdf
!curl -L "https://www.dropbox.com/scl/fi/8ax2vnoebhmy44bes2n1d/kendrick.pdf?rlkey=fhxvn94t5amdqcv9vshifd3hj&st=dxdtytn6&dl=1" -o data/kendrick.pdf


### Load Data

You can load the data with LlamaParse or with our free pypdf reader, which `SimpleDirectoryReader` uses by default. The cell below uses `SimpleDirectoryReader`, so no account is needed.

1. LlamaParse: Sign up for an account here: cloud.llamaindex.ai. You get 1k free pages a day, and the paid plan is 7k free pages + 0.3c per additional page. LlamaParse is a good option if you want to parse complex documents, like PDFs with charts, tables, and more.

2. Default PDF Parser (in `SimpleDirectoryReader`): If you don't want to sign up for an account or use a PDF service, just use the default PyPDF reader bundled in our file loader. It's a good choice for getting started!

In [7]:
from llama_index.core import SimpleDirectoryReader

docs_kendrick = SimpleDirectoryReader(input_files=["data/kendrick.pdf"]).load_data()
docs_drake = SimpleDirectoryReader(input_files=["data/drake.pdf"]).load_data()
docs_both = SimpleDirectoryReader(input_files=["data/drake_kendrick_beef.pdf"]).load_data()

## 1. Basic Completion and Chat

### Call complete with a prompt

In [8]:
response = llm_replicate.complete("do you like drake or kendrick better?")

print(response)



As a helpful assistant, I don't have personal preferences or opinions, nor do I have the ability to enjoy or dislike music. My purpose is to provide information and assist with tasks to the best of my abilities.

However, I can provide you with some information about Drake and Kendrick Lamar, two highly acclaimed and popular rappers.

Drake is a Canadian rapper, singer, and songwriter known for his emotive and introspective lyrics, as well as his blend of hip-hop and R&B styles. He has released several successful albums, including "Take Care," "Nothing Was the Same," and "Views," and has collaborated with numerous artists, including Lil Wayne, Chris Brown, and Rihanna.

Kendrick Lamar, on the other hand, is an American rapper, songwriter, and record producer from Compton, California. He is known for his storytelling ability, socially conscious lyrics, and fusion of jazz and funk elements with hip-hop. He has released several critically acclaimed albums, including "Good Kid, M.A.A.D C

In [9]:
stream_response = llm_replicate.stream_complete(
    "you're a drake fan. tell me why you like drake more than kendrick"
)

for t in stream_response:
    print(t.delta, end="")



Man, I gotta be real with you, I'm a die-hard Drake fan, and I'm not ashamed to admit it! Now, I know Kendrick is a genius and all, but for me, Drake just resonates on a different level. Here's why:

First off, Drake's ability to blend introspection with catchy, radio-friendly production is unmatched. He's got this uncanny talent for crafting songs that are both deeply personal and ridiculously infectious. I mean, who else can make you nod your head to a song about relationships and vulnerability like "Marvin's Room" or "Jungle"?

Another reason I prefer Drake is his versatility. He's not afraid to experiment with different sounds and styles, from the atmospheric, ambient vibes of "If You're Reading This It's Too Late" to the more upbeat, dancehall-infused tracks like "One Dance" and "God's Plan". He's always pushing the boundaries of what hip-hop can be, and that's something I really admire.

Now, I know some people might say Kendrick is more "conscious" or "lyrically dense", and th

### Call chat with a list of messages

In [10]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are Kendrick."),
    ChatMessage(role="user", content="Write a verse."),
]
response = llm_replicate.chat(messages)

In [11]:
print(response)

assistant: 

"Listen close, I got a message for the masses,
Ain't no sugarcoatin' the truth, it's time to face it,
Systemic oppression, it's a deadly virus,
Spreadin' hate and fear, it's a modern-day crisis,
We need a cure, we need a change of pace,
Can't keep on livin' in a world that's stuck in this place,
I'm talkin' freedom, equality, and justice too,
We need it now, ain't nothin' we can't do"


## 2. Basic RAG (Vector Search, Summarization)

### Basic RAG (Vector Search)

In [12]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs_both)
query_engine = index.as_query_engine(similarity_top_k=3)

In [13]:
response = query_engine.query("Tell me about family matters")

In [14]:
print(str(response))



Based on the provided context, "Family Matters" appears to be a diss track by Drake, specifically aimed at Kendrick Lamar. The track is approximately 7.5 minutes long and features three different beats. In the song, Drake accuses Kendrick of various things, including rampant infidelity, not spending time with his family, and physically abusing the mother of his children. Additionally, Drake claims that Kendrick's children may not all be his, suggesting that one of them was fathered by his close friend.
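
The `similarity_top_k=3` setting above controls how many chunks the retriever hands to the LLM. As a rough illustration of what "top-k by similarity" means, here is a stdlib-only sketch using cosine similarity over tiny made-up vectors. The vectors and function names are assumptions for illustration; the real index uses the embeddings produced by `embed_model`, and its internals may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k most similar chunks, best first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors have hundreds of dimensions.
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]
best = top_k(query, chunks, k=2)
print(best)
```

Only the chunks selected this way are placed in the prompt, which is why a larger `similarity_top_k` gives the model more context at the cost of a longer prompt.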


### Basic RAG (Summarization)

In [15]:
from llama_index.core import SummaryIndex

summary_index = SummaryIndex.from_documents(docs_both)
summary_engine = summary_index.as_query_engine()

In [16]:
response = summary_engine.query(
    "Given your assessment of this article, who won the beef?"
)

In [17]:
print(str(response))



Based on the article, it still seems that the author is leaning towards Kendrick Lamar as the winner of the beef. The new context provided further reinforces this assessment, as Kendrick's second diss track "Not Like Us" is described as an "impressive strategical feat" that applies back-to-back pressure on Drake and showcases Kendrick's range. Additionally, the article notes that "Not Like Us" is a much-needed exhale after the intense "Meet the Grahams" and finds Kendrick dancing over a Mustard beat, which suggests that he is able to adapt and switch up his style to stay ahead of Drake.

The article also highlights the impact of "Not Like Us" on the beef, stating that it turns the tide in Kendrick's favor and puts the ball in Drake's court to hit back. The author notes that while both tracks are uncomfortable to listen to, Kendrick's "Not Like Us" is seen as a more enjoyable and well-executed track.

Drake's response, "The Heart Part 6", is seen as a reaction to everything that's tra

## 3. Advanced RAG (Routing, Sub-Questions)

### Build a Router that can choose whether to do vector search or summarization

In [18]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts.",
    ),
)

summary_tool = QueryEngineTool(
    index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document.",
    ),
)

In [19]:
from llama_index.core.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool], select_multi=False, verbose=True
)

response = query_engine.query(
    "Tell me about the song meet the grahams - why is it significant"
)

Selecting query engine 0: The question asks for specific facts about the song 'meet the grahams', such as its significance, which makes it suitable for searching for specific facts..

In [20]:
print(response)



Based on the provided context, "Meet the Grahams" is a song by Kendrick Lamar that is part of his beef with Drake. The song is significant because it was seen as a diss track aimed at Drake, and it marked a turning point in their feud. However, the song was perceived as "joyless" and didn't quite resonate with fans. It was also seen as Kendrick taking things too far, and he was in danger of losing his audience.

The significance of "Meet the Grahams" lies in its role as a precursor to "Not Like Us", another song by Kendrick Lamar that is also part of the beef with Drake. "Not Like Us" is seen as a more strategic and effective diss track, which applies pressure on Drake and showcases Kendrick's range. The contrast between the two songs highlights Kendrick's ability to adapt and respond to the situation, making "Meet the Grahams" a significant step in the evolution of their feud.
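
The router above picks one tool by comparing the question with each tool's description. The sketch below shows that selection step in isolation. The selector here is a plain keyword-overlap function standing in for the LLM call, and `Tool`, `select_tool`, and `keyword_chooser` are hypothetical names, not LlamaIndex classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def keyword_chooser(question: str, descriptions: List[str]) -> int:
    """Stand-in for the LLM selector: pick the description sharing the most words."""
    q_words = set(question.lower().split())
    scores = [len(q_words & set(d.lower().rstrip(".").split())) for d in descriptions]
    return scores.index(max(scores))  # ties resolve to the first tool


def select_tool(question: str, tools: List[Tool], choose: Callable[[str, List[str]], int]) -> Tool:
    """Ask the chooser for an index and return the matching tool."""
    index = choose(question, [t.description for t in tools])
    if not 0 <= index < len(tools):
        raise ValueError(f"selector returned invalid index {index}")
    return tools[index]


tools = [
    Tool("vector_search", "Useful for searching for specific facts.", lambda q: f"facts for: {q}"),
    Tool("summary", "Useful for summarizing an entire document.", lambda q: f"summary for: {q}"),
]

chosen = select_tool("Give me specific facts about the song", tools, keyword_chooser)
print(chosen.name, "->", chosen.run("Give me specific facts about the song"))
```

Because the choice depends only on the descriptions, writing clear and distinct tool descriptions is the main lever for getting good routing.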


### Break Complex Questions down into Sub-Questions

Our Sub-Question Query Engine breaks complex questions down into sub-questions.
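
As a rough mental model, the engine generates one sub-question per relevant tool, answers each with that tool, and then combines the answers. The sketch below mimics that flow without an LLM: the decomposition is a simple keyword match, the per-tool engines are placeholder functions, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def decompose(question: str, subjects: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (tool_name, sub_question) pairs for every tool whose subject is mentioned.

    `subjects` maps a tool name to the keyword it covers. A real engine would ask
    the LLM to write the sub-questions instead of matching keywords.
    """
    pairs = []
    for tool_name, subject in subjects.items():
        if subject.lower() in question.lower():
            pairs.append((tool_name, f"{question} (focus on {subject})"))
    return pairs


def answer(question: str, subjects: Dict[str, str], engines: Dict[str, Callable[[str], str]]) -> str:
    """Answer each sub-question with its tool and join the partial answers."""
    parts = []
    for tool_name, sub_question in decompose(question, subjects):
        parts.append(f"[{tool_name}] {engines[tool_name](sub_question)}")
    return "\n".join(parts)


subjects = {"drake_search": "Drake", "kendrick_search": "Kendrick"}
engines = {
    "drake_search": lambda q: "placeholder answer from the Drake index",
    "kendrick_search": lambda q: "placeholder answer from the Kendrick index",
}

print(answer("Compare the early careers of Drake and Kendrick", subjects, engines))
```

A question that mentions only one artist produces a single sub-question, which matches the "Generated 1 sub questions." line in the run below.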


In [21]:
drake_index = VectorStoreIndex.from_documents(docs_drake)
drake_query_engine = drake_index.as_query_engine(similarity_top_k=3)

kendrick_index = VectorStoreIndex.from_documents(docs_kendrick)
kendrick_query_engine = kendrick_index.as_query_engine(similarity_top_k=3)

In [22]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

drake_tool = QueryEngineTool(
    drake_index.as_query_engine(),
    metadata=ToolMetadata(
        name="drake_search",
        description="Useful for searching over Drake's life.",
    ),
)

kendrick_tool = QueryEngineTool(
    kendrick_index.as_query_engine(),
    metadata=ToolMetadata(
        name="kendrick_search",
        description="Useful for searching over Kendrick's life.",
    ),
)

In [23]:
from llama_index.core.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [drake_tool, kendrick_tool],
    llm=llm_replicate,  
    verbose=True,
)

response = query_engine.query("Which albums did Drake release in his career?")

print(response)

Generated 1 sub questions.
[drake_search] Q: What are the albums released by Drake
[drake_search] A: 

Based on the provided context, the albums released by Drake mentioned are:

1. So Far Gone (mixtape, re-released in 2019)
2. Care Package (compilation album, released in 2019)
3. Dark Lane Demo Tapes (commercial mixtape, released in 2020)
4. Certified Lover Boy (studio album, released in 2021)
5. For All The Dogs (studio album, released in 2024)

Note that this list only includes the albums mentioned in the provided context and may not be an exhaustive list of Drake's discography.

Based on the provided context, Drake released the following albums in his career:

1. So Far Gone (mixtape, re-released in 2019)
2. Care Package (compilation album, released in 2019)
3. Dark Lane Demo Tapes (commercial mixtape, released in 2020)
4. Certified Lover Boy (studio album, released in 2021)
5. For All The Dogs (studio album, released in 2024)

N

## 4. Text-to-SQL 

Here, we download and use a sample SQLite database with 11 tables, with various info about music, playlists, and customers. We will limit to a select few tables for this test.

In [24]:
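import os
import requests


# Function to download files
def download_file(url, filepath):
    """Stream the file at `url` to `filepath` in 1 KB chunks."""
    response = requests.get(url, stream=True)
    response.raise_for_status()  # fail on HTTP errors instead of saving an error page
    with open(filepath, "wb") as file:
        for chunk in response.iter_content(chunk_size=1024):
            file.write(chunk)

# Create data directory
os.makedirs("data", exist_ok=True)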

In [25]:
# Download and extract SQLite database
import requests
import zipfile
download_file("https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip", "data/chinook.zip")
with zipfile.ZipFile("data/chinook.zip", "r") as zip_ref:
    zip_ref.extractall("data/")

In [26]:
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    select,
    column,
)

engine = create_engine("sqlite:///data/chinook.db")

In [27]:
from llama_index.core import SQLDatabase

sql_database = SQLDatabase(engine)

In [28]:
from llama_index.core.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
    llm=llm_replicate,
)

In [29]:
response = query_engine.query("What are some albums?")

print(response)



Here are some albums:

1. "For Those About To Rock We Salute You" by Artist 1
2. "Balls to the Wall" by Artist 2
3. "Restless and Wild" by Artist 2
4. "Let There Be Rock" by Artist 1
5. "Big Ones" by Artist 3
6. "Jagged Little Pill" by Artist 4
7. "Facelift" by Artist 5
8. "Warner 25 Anos" by Artist 6
9. "Plays Metallica By Four Cellos" by Artist 7
10. "Audioslave" by Artist 8


In [30]:
response = query_engine.query("What are some artists? Limit it to 5.")

print(response)



Here are 5 artists: AC/DC, Accept, Aerosmith, Alanis Morissette, and Alice In Chains.


In [31]:
response = query_engine.query(
    "What are some tracks from the artist AC/DC? Limit it to 3"
)

print(response)



Here are three tracks from the legendary Australian rock band AC/DC: "For Those About To Rock (We Salute You)", "Put The Finger On You", and "Let's Get It Up".


In [32]:
print(response.metadata["sql_query"])

SELECT tracks.Name FROM tracks JOIN albums ON tracks.AlbumId = albums.AlbumId JOIN artists ON albums.ArtistId = artists.ArtistId WHERE artists.Name = 'AC/DC' LIMIT 3;
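
It can help to run the generated SQL yourself to confirm it returns what the natural-language answer claims. The sketch below executes the same join with Python's built-in `sqlite3` module against a small in-memory stand-in for the three tables. The schema is reduced to the columns the query needs and the rows are illustrative, so it is not a copy of `chinook.db`; the artist name is also passed as a bound parameter rather than inlined.

```python
import sqlite3

# In-memory stand-in for the albums / tracks / artists tables (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Accept');
    INSERT INTO albums VALUES
        (1, 'For Those About To Rock We Salute You', 1),
        (2, 'Balls to the Wall', 2);
    INSERT INTO tracks VALUES
        (1, 'For Those About To Rock (We Salute You)', 1),
        (2, 'Balls to the Wall', 2),
        (3, 'Put The Finger On You', 1),
        (4, 'Let''s Get It Up', 1);
    """
)

# Same join as the generated query, with the artist name as a bound parameter.
sql = (
    "SELECT tracks.Name FROM tracks "
    "JOIN albums ON tracks.AlbumId = albums.AlbumId "
    "JOIN artists ON albums.ArtistId = artists.ArtistId "
    "WHERE artists.Name = ? LIMIT 3"
)
rows = [name for (name,) in conn.execute(sql, ("AC/DC",))]
print(rows)
```

To check against the real data, point `sqlite3.connect` at `data/chinook.db` instead of `:memory:` and skip the `executescript` call.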


## 5. Structured Data Extraction

An important use case for function calling is extracting structured objects. LlamaIndex provides an intuitive interface for this through `structured_predict`: define the target Pydantic class (which can be nested), pass a prompt, and get back the desired object.

**NOTE**: Since there's no native function calling support in this Llama 3 setup, the structured extraction is performed by prompting the LLM and parsing its output.
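
To make "prompting + output parsing" concrete, here is a minimal sketch of the parsing half: it pulls a JSON object out of free-form model text and checks that the expected fields are present. It uses a stdlib `dataclass` in place of Pydantic so it runs without extra packages, the `raw` string is a hypothetical model reply, and `parse_restaurant` is an illustrative helper rather than what `structured_predict` does internally.

```python
import json
import re
from dataclasses import dataclass, fields


@dataclass
class Restaurant:
    """A restaurant with name, city, and cuisine."""

    name: str
    city: str
    cuisine: str


def parse_restaurant(llm_text: str) -> Restaurant:
    """Extract the JSON object spanning the first '{' to the last '}' and validate it."""
    match = re.search(r"\{.*\}", llm_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM output")
    data = json.loads(match.group(0))
    expected = [f.name for f in fields(Restaurant)]
    missing = [key for key in expected if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Restaurant(**{key: str(data[key]) for key in expected})


# Hypothetical reply: models often wrap the JSON in extra chatter.
raw = 'Sure! Here is a restaurant:\n{"name": "Bayside Bistro", "city": "Miami", "cuisine": "Seafood"}'
print(parse_restaurant(raw))
```

Parsing can fail when the model returns malformed JSON, so in practice this step is often wrapped in a retry that re-prompts the model.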

In [33]:
from llama_index.core.prompts import PromptTemplate
from pydantic import BaseModel


class Restaurant(BaseModel):
    """A restaurant with name, city, and cuisine."""

    name: str
    city: str
    cuisine: str

llm = llm_replicate


prompt_tmpl = PromptTemplate(
    "Generate a restaurant in a given city {city_name}"
)

In [34]:
restaurant_obj = llm.structured_predict(
    Restaurant, prompt_tmpl, city_name="Miami"
)
print(restaurant_obj)

name='Bayside Bistro' city='Miami' cuisine='Seafood'


## 6. Adding Chat History to RAG (Chat Engine)

In this section we create a stateful chatbot from a RAG pipeline, with our chat engine abstraction.

Unlike a stateless query engine, the chat engine maintains conversation history (through a memory module like buffer memory). It performs retrieval given a condensed question, and feeds the condensed question + context + chat history into the final LLM prompt.

Related resource: https://docs.llamaindex.ai/en/stable/examples/chat_engine/chat_engine_condense_plus_context/
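
The memory module decides how much of the conversation is replayed to the model on each turn. The sketch below shows one plausible policy for a token-limited buffer: keep the most recent messages that fit in the budget and drop the oldest. It is an illustration only; `BufferMemory` is a hypothetical class, and token counting is approximated by whitespace splitting rather than a real tokenizer.

```python
from collections import deque
from typing import Deque, List, Tuple


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class BufferMemory:
    """Stores (role, content) messages and returns the newest ones that fit a budget."""

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self._messages: Deque[Tuple[str, str]] = deque()

    def put(self, role: str, content: str) -> None:
        self._messages.append((role, content))

    def get(self) -> List[Tuple[str, str]]:
        """Walk backwards from the newest message until the budget is used up."""
        kept: List[Tuple[str, str]] = []
        used = 0
        for role, content in reversed(self._messages):
            cost = count_tokens(content)
            if used + cost > self.token_limit:
                break
            kept.append((role, content))
            used += cost
        return list(reversed(kept))  # oldest first, as a prompt expects


memory = BufferMemory(token_limit=5)
memory.put("user", "one two three")
memory.put("assistant", "four five")
memory.put("user", "six seven eight")
print(memory.get())
```

With a limit of 5 "tokens", the first message no longer fits, which is the same trade-off the `token_limit` argument controls in the cell below.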

In [35]:
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
    llm=llm_replicate,
    context_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about the Kendrick and Drake beef. "
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
    ),
    verbose=True,
)

In [36]:
response = chat_engine.chat(
    "Tell me about the songs Drake released in the beef."
)
print(str(response))

Condensed question: Tell me about the songs Drake released in the beef.


So, in the midst of the Kendrick Lamar and Drake beef, Drake released a song called "The Heart Part 6". This track was a direct response to Kendrick's diss tracks, and it's clear that Drake wrote and recorded it within a 24-hour timeframe. 

From what I understand, "The Heart Part 6" is a reaction to everything that went down over those three days, including direct rebuttals to Kendrick's "Not Like Us". Drake even took a page out of Kendrick's book by incorporating an Aretha Franklin sample, which adds a soulful touch to the track.

It's worth noting that this is Drake's second track in a row where he explicitly states that he'd rather be on the low, but still hints that things are about to get intense. Do you want to know more about the beef or the context surrounding these songs?


In [37]:
response = chat_engine.chat("What about Kendrick?")
print(str(response))

Condensed question: 

What songs did Kendrick Lamar release in response to Drake during their beef?


Kendrick Lamar! He definitely didn't hold back in this beef. One of the most notable tracks from his side is the verse on "Like That" from Future and Metro Boomin's album "We Don't Trust You". This verse is often referred to as a "Control" sequel, and it's clear that Kendrick is taking direct shots at Drake.

But what really shook the rap world was Kendrick's response to Drake's "The Heart Part 6". He dropped a scathing diss track called "Not Like Us", which many consider one of the most brutal diss tracks in rap history. The way Kendrick dissects Drake's persona, career, and even his personal life is nothing short of ruthless.

It's worth noting that Kendrick's approach was very calculated and strategic. He took his time to respond, and when he did, it was with a track that left no stone unturned. The lyrics are dense, and Kendrick's flow is as sharp as ever. Do you want to know more 

## 7. Agents

Here we build agents with Llama 3. The agents first call simple functions as tools, then query the documents above through RAG query engine tools.

### Agents And Tools

In [38]:
import json
from typing import Sequence, List

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import BaseTool, FunctionTool
from llama_index.core.agent import ReActAgent

import nest_asyncio

nest_asyncio.apply()

### Define Tools

In [39]:
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result integer"""
    return a * b


def add(a: int, b: int) -> int:
    """Add two integers and return the result integer"""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract two integers and return the result integer"""
    return a - b


def divide(a: int, b: int) -> float:
    """Divide two integers and return the result as a float"""
    return a / b


multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
subtract_tool = FunctionTool.from_defaults(fn=subtract)
divide_tool = FunctionTool.from_defaults(fn=divide)

### ReAct Agent

In [40]:
agent = ReActAgent.from_tools(
    [multiply_tool, add_tool, subtract_tool, divide_tool],
    llm=llm_replicate,
    verbose=True,
)
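
A ReAct agent alternates between a Thought, an Action (a tool call with an Action Input), and an Observation until it emits an Answer. The sketch below replays that loop with scripted model outputs so it runs without an LLM. In the real agent each step is generated by the model after it sees the previous observation; `run_react` is a hypothetical helper, and the step text is an abridged version of the kind of trace the agent prints.

```python
import ast
from typing import Any, Callable, Dict, List, Tuple

TOOLS: Dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def run_react(steps: List[str], tools: Dict[str, Callable[..., Any]]) -> Tuple[str, List[Any]]:
    """Replay scripted model outputs: run each Action, stop at the first Answer."""
    observations: List[Any] = []
    for step in steps:
        parsed: Dict[str, str] = {}
        for line in step.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                parsed[key] = value
        if "Answer" in parsed:
            return parsed["Answer"], observations
        # Action Input is printed as a Python dict literal, so parse it safely.
        args = ast.literal_eval(parsed["Action Input"])
        observations.append(tools[parsed["Action"]](**args))
    raise RuntimeError("no Answer step reached")


steps = [
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: add\nAction Input: {'a': 121, 'b': 2}",
    "Thought: I have the result of the addition, now I need to multiply it by 5.\n"
    "Action: multiply\nAction Input: {'a': 123, 'b': 5}",
    "Thought: I can answer without using any more tools.\nAnswer: The result is 615.",
]

final_answer, observations = run_react(steps, TOOLS)
print(observations, final_answer)
```

The tool docstrings and type hints matter because they are what the model reads when deciding which Action to emit.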

### Querying

In [41]:
response = agent.chat("What is (121 + 2) * 5?")
print(str(response))

> Running step 2eefb751-be04-47c5-ba57-bd1e82122b4c. Step input: What is (121 + 2) * 5?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: add
Action Input: {'a': 121, 'b': 2}
Observation: 123
> Running step 80185764-84a3-4275-ae13-8434fcbc98d7. Step input: None
Thought: I have the result of the addition, now I need to multiply it by 5.
Action: multiply
Action Input: {'a': 123, 'b': 5}
Observation: 615
> Running step b2389a62-1276-49dd-bdb5-f33c0cc5feeb. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The result is 615.
The result is 615.


### ReAct Agent With RAG QueryEngine Tools

In [42]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)

from llama_index.core.tools import QueryEngineTool, ToolMetadata

### Create ReAct Agent using RAG QueryEngine Tools

In [43]:
drake_tool = QueryEngineTool(
    drake_index.as_query_engine(),
    metadata=ToolMetadata(
        name="drake_search",
        description="Useful for searching over Drake's life.",
    ),
)

kendrick_tool = QueryEngineTool(
    kendrick_index.as_query_engine(),
    metadata=ToolMetadata(
        name="kendrick_search",
        description="Useful for searching over Kendrick's life.",
    ),
)

query_engine_tools = [drake_tool, kendrick_tool]

In [44]:
agent = ReActAgent.from_tools(
    query_engine_tools,  # the Drake and Kendrick query engine tools defined above
    llm=llm_replicate,
    verbose=True,
)

### Querying

In [45]:
response = agent.chat("Tell me about how Kendrick and Drake grew up")
print(str(response))

> Running step 6afa2961-726a-4f64-b10b-4a1bd897290c. Step input: Tell me about how Kendrick and Drake grew up
Thought: The current language of the user is: English. I need to use tools to help me answer the question about how Kendrick and Drake grew up.
Action: kendrick_search
Action Input: {'input': "Kendrick Lamar's childhood"}
Observation: 

Based on the provided context, here's what I found about Kendrick Lamar's childhood:

Kendrick Lamar was born on June 17, 1987, in Compton, California. His parents, Kenneth "Kenny" Duckworth and Paula Oliver, were African Americans from the South Side of Chicago who relocated to Compton in 1984 due to his father's affiliation with the Gangster Disciples. Lamar was named after singer-songwriter Eddie Kendricks of the Temptations.

As a child, Lamar lived in Section 8 housing, relied on welfare and food stamps, and experienced homelessness. He grew up with close affiliates of the Westside Pirus, but he is not a member o