## What is RAG?

RAG is a technique where a language model first looks up (retrieves) useful information from a database or documents, and then uses that information to give a better answer.

## Why and when do we prefer RAG over fine-tuning?

We **prefer RAG over fine-tuning** when we want the model to give **up-to-date or specific answers** from **our own data** without changing the model itself.

---

**Why prefer RAG?**

* **Cheaper & faster** – No need to train the model again.
* **Easier to update** – Just change the documents, not the model.
* **Better for private or large data** – Your data stays in your own store instead of being baked into the model weights.

---

**When to use RAG?**

* When your data **changes often** (like news, product lists).
* When you want the model to **answer from your documents**.
* When **training a model is too costly or slow**.

---

Think of RAG like **giving the model a library to read** instead of teaching it everything from scratch.
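The retrieve-then-answer idea can be sketched in a few lines of plain Python. This is only a toy illustration: the documents, the word-overlap scoring, and the helper names (`tokenize`, `retrieve`, `build_prompt`) are assumptions made for this example, and a real pipeline would use embeddings and a vector database as shown later in this notebook.

```python
import re

# Hypothetical mini "library": in a real system these would be document chunks.
DOCUMENTS = [
    "The store opens at 9 am and closes at 6 pm on weekdays.",
    "Refunds are accepted within 30 days with a receipt.",
    "Shipping to Europe takes 5 to 7 business days.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    """Place the retrieved text in front of the question."""
    joined = "\n".join(contexts)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

question = "How many days do refunds take?"
top_docs = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, top_docs)
print(prompt)
```

The prompt printed here is what would be sent to the language model: the model itself is unchanged, only its input is enriched.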


# Install Libraries

* **`langchain`** – Core framework to build LLM-powered applications.
* **`langchain-community`** – Extra integrations like tools, APIs, and vector stores.
* **`langchain-pinecone`** – Connects LangChain with Pinecone for vector storage and retrieval.
* **`langchain_groq`** – Enables LangChain to use Groq's ultra-fast language models.
* **`datasets`** – Provides ready-to-use NLP/ML datasets from Hugging Face.

In [None]:
# Install all required packages
%pip install --upgrade "pinecone>=5,<6" langchain langchain-community langchain-pinecone langchain_groq datasets==3.5.0 sentence-transformers tqdm

## Load API Keys from .env file

### What is env file?

* A `.env` file is a simple text file that stores environment variables (like API keys and secrets) in `key=value` format.
* Example content of a `.env` file:
  * `PINECONE_API_KEY=your_pinecone_api_key_here`
  * `GROQ_API_KEY=your_groq_api_key_here`
* It helps keep sensitive information out of your code and makes it easier to manage secrets securely.
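To make the `key=value` format concrete, here is a minimal sketch of how such a file could be parsed. The `parse_env_text` helper is hypothetical and deliberately simplified; it is not how `python-dotenv` is implemented, and it ignores features such as variable expansion or multi-line values.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching quotes.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values

sample = """
# secrets for the notebook (placeholder values)
PINECONE_API_KEY=your_pinecone_api_key_here
GROQ_API_KEY="your_groq_api_key_here"
"""
env = parse_env_text(sample)
print(sorted(env))
```

In practice the `.env` file is usually kept out of version control so the keys never reach a shared repository.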

**Imports tools** to:

* Use environment variables (`os`)
* Load values from a `.env` file (`load_dotenv`)
* **os** is a Python built-in module that lets your code interact with the operating system (like Windows, macOS, Linux).
---
* **Loads the `.env` file** so Python can use the secret keys stored in it (like API keys).
* **Gets the values** of `PINECONE_API_KEY` and `GROQ_API_KEY` from the `.env` file.
* **In Short:** This code **reads your secret keys from a `.env` file** so you don’t have to write them directly in your code.


In [66]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Get the keys
pinecone_key = os.getenv("PINECONE_API_KEY")
groq_key = os.getenv("GROQ_API_KEY")

# Check if keys are loaded properly
if pinecone_key:
    print("✅ Pinecone API Key Loaded Successfully")
else:
    print("❌ Pinecone API Key NOT Loaded")

if groq_key:
    print("✅ Groq API Key Loaded Successfully")
else:
    print("❌ Groq API Key NOT Loaded")


✅ Pinecone API Key Loaded Successfully
✅ Groq API Key Loaded Successfully


## What is `langchain_groq`?

`langchain_groq` is a **LangChain integration** that lets you **connect to Groq’s LLMs** (like LLaMA3) easily.

Think of it as a **bridge between LangChain and Groq’s fast language models**.

---

## What is `ChatGroq`?

`ChatGroq` is a **class (tool)** inside `langchain_groq`.

It lets you:

* **Send prompts** to Groq-hosted models
* **Receive responses** from those models
* Use these models in your **LangChain app**, like chatbots, RAG, agents, etc.

---

**Why do we use this?**

Instead of manually setting up HTTP requests to Groq’s API, `ChatGroq` makes it **super easy**:

* Lets you talk to a specific Groq model (such as `llama-3.3-70b-versatile`)
* Works smoothly with LangChain tools (retrievers, chains, memory, etc.)
* Connects securely with your `groq_api_key`

---

#### In Simple Words

* `langchain_groq` lets LangChain talk to Groq.
* `ChatGroq` is the tool that helps you **chat with Groq’s AI model** using your API key.


In [67]:
from langchain_groq import ChatGroq

# "llama3-70b-8192" has been deprecated, using "llama-3.3-70b-versatile"

chat = ChatGroq(
    groq_api_key=groq_key,
    model_name="llama-3.3-70b-versatile"  # Correct model name used by Groq
)

Groq uses the **same chat structure as OpenAI** because its API is **OpenAI-compatible**, even though it serves open models such as Llama and Mixtral.
So just like OpenAI, chats with Groq **typically look like this in plain text**:

```
System: You are a helpful assistant.
User: Hi, how are you?
Assistant: I'm doing well! How can I assist you today?
User: What is quantum computing?
Assistant:
```

The final `"Assistant:"` without a response is what would prompt the model to continue the conversation. When calling an OpenAI-compatible chat completions endpoint, these turns are passed as a list of role/content objects rather than as plain text.

---

**In Code (OpenAI/Groq-compatible format):**

When using the API (like with `ChatGroq` or `ChatOpenAI` in LangChain), you use this structure:

```
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing well! How can I assist you today?"},
    {"role": "user", "content": "What is quantum computing?"}
]
```
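The two views of a conversation are equivalent, which a small sketch can show by rendering the role/content list back into the plain-text transcript. The `to_transcript` helper and `ROLE_LABELS` mapping are hypothetical names for this illustration; real models apply their own chat templates with model-specific special tokens, so this is a reading aid rather than what is sent over the wire.

```python
ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}

def to_transcript(messages: list[dict[str, str]]) -> str:
    """Render role/content dicts as the plain-text transcript form."""
    lines = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    # The trailing empty "Assistant:" marks where the model continues.
    lines.append("Assistant:")
    return "\n".join(lines)

chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing well! How can I assist you today?"},
    {"role": "user", "content": "What is quantum computing?"},
]
transcript = to_transcript(chat_messages)
print(transcript)
```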

---

**In LangChain (message objects):**

LangChain wraps those into **message classes**, like:

```
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi, how are you?"),
    AIMessage(content="I'm doing well! How can I assist you today?"),
    HumanMessage(content="What is quantum computing?")
]
```

The format is very similar; we've just swapped the role of `"user"` for `HumanMessage`, and the role of `"assistant"` for `AIMessage`.

Then you pass them to the model:

```
response = chat.invoke(messages)
print(response.content)
```

In [68]:
# Message classes live in langchain_core; langchain.schema is a legacy import path
from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are an expert physics tutor."),
    HumanMessage(content="Hello AI, can you help me learn about quantum mechanics?"),
    AIMessage(content="Absolutely! Quantum mechanics is a fundamental theory in physics describing nature at the smallest scales. What would you like to know?"),
    HumanMessage(content="What is the uncertainty principle?")
]

We generate the next response from the AI by passing these messages to the `ChatGroq` object.

Like saying to the AI:

“Here’s what has been said so far — now tell me what the AI should say next.”

LangChain then handles formatting and sending this to the LLM backend, and `res` stores the AI's next reply.

**In Short:**
* You define a conversation (via `messages`).
* Call the LLM using `chat.invoke(messages)`.
* Get a response back, stored in `res`.

In [69]:
# res = chat(messages)

res = chat.invoke(messages)

In [70]:
res

AIMessage(content='The uncertainty principle is a fundamental concept in quantum mechanics, introduced by Werner Heisenberg in 1927. It states that it\'s impossible to know certain properties of a subatomic particle, such as its position (x) and momentum (p), simultaneously with infinite precision.\n\nMathematically, this is expressed as:\n\nΔx \\* Δp >= h/4π\n\nwhere Δx is the uncertainty in position, Δp is the uncertainty in momentum, and h is the Planck constant.\n\nIn simpler terms, the more precisely you try to measure a particle\'s position, the less precisely you can know its momentum, and vice versa. This is not due to limitations in measurement technology, but rather a fundamental property of the quantum world.\n\nFor example, if you try to measure the position of an electron very precisely, you would need to use a high-energy photon to "illuminate" it. However, this photon would disturb the electron\'s momentum, making it impossible to know its momentum precisely at the same 

To see the model's reply:

In [71]:
print(res.content)

The uncertainty principle is a fundamental concept in quantum mechanics, introduced by Werner Heisenberg in 1927. It states that it's impossible to know certain properties of a subatomic particle, such as its position (x) and momentum (p), simultaneously with infinite precision.

Mathematically, this is expressed as:

Δx \* Δp >= h/4π

where Δx is the uncertainty in position, Δp is the uncertainty in momentum, and h is the Planck constant.

In simpler terms, the more precisely you try to measure a particle's position, the less precisely you can know its momentum, and vice versa. This is not due to limitations in measurement technology, but rather a fundamental property of the quantum world.

For example, if you try to measure the position of an electron very precisely, you would need to use a high-energy photon to "illuminate" it. However, this photon would disturb the electron's momentum, making it impossible to know its momentum precisely at the same time.

The uncertainty principle 

Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

In [72]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you explain quantum entanglement?"
)
# add to messages
messages.append(prompt)

# send to the Groq chat model
res = chat.invoke(messages)

print(res.content)

Quantum entanglement is a fascinating phenomenon in quantum mechanics. It's a fundamental aspect of the quantum world, and it has been extensively experimentally confirmed.

**What is quantum entanglement?**

Quantum entanglement occurs when two or more particles become correlated in such a way that their properties, such as spin, momentum, or energy, are connected, even when they are separated by large distances. This means that measuring the state of one particle instantly affects the state of the other entangled particles, regardless of the distance between them.

**Key features of entanglement:**

1. **Correlation**: Entangled particles are correlated, meaning that the state of one particle is dependent on the state of the other.
2. **Non-locality**: Entangled particles can be separated by arbitrary distances, and the correlation between them remains.
3. **Instantaneous interaction**: Measuring the state of one particle instantly affects the state of the other entangled particles, 

## Dealing with Hallucinations

We have our chatbot, but as mentioned — the knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the _parametric knowledge_ of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, such as GPT-5.

In [73]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Who founded GPT-5?"
)
# add to messages
messages.append(prompt)

# send to the Groq chat model
res = chat.invoke(messages)

print(res.content)

There is no GPT-5. The current models in the GPT series are:

1. GPT-1 (2018): Developed by OpenAI, a research organization founded by Elon Musk, Sam Altman, and others.
2. GPT-2 (2019): Also developed by OpenAI, with improvements over the original GPT model.
3. GPT-3 (2020): Developed by OpenAI, with significant advancements in natural language processing and generation capabilities.
4. GPT-4 (2023): The latest model in the series, also developed by OpenAI, with further improvements in performance and capabilities.

Note that I'm an AI designed to assist and communicate with users, but I'm not affiliated with OpenAI or any specific organization. I exist to provide information and answer questions to the best of my abilities, based on my training and knowledge.


In [74]:
# print(res.content)

Our chatbot can no longer help us because it doesn't contain the information we need to answer the question. It was very clear from this answer that the LLM doesn't know the information, but sometimes an LLM may respond like it _does_ know the answer, and this can be very hard to detect.

## Alternate Way : Source Knowledge

There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt. We can try that with the GPT-5 question by writing a short passage about GPT-5 and placing it in the prompt.

In [75]:
# Adjacent string literals inside parentheses are joined automatically
source_knowledge = (
    " GPT-5 was founded by a team of researchers and engineers at OpenAI, building upon the advancements of previous models like GPT-4. The development focused on enhancing natural language understanding and generation capabilities, aiming to create a more versatile and powerful AI system."
    " The team included experts in machine learning, neuroscience, and computational linguistics, working collaboratively to push the boundaries of AI technology. "
    "Their goal was to create a model that could not only understand and generate human-like text but also assist in a wide range of applications, from creative writing to complex problem-solving."
    " GPT-5's development was marked by significant advancements in model architecture, training techniques, and data utilization, making it one of the most sophisticated AI models available today."
)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [76]:
query = "What is so special about GPT-5?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this into our chatbot as we were before.

In [77]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to the Groq chat model
res = chat.invoke(messages)

In [78]:
print(res.content)

GPT-5 is special because it represents a significant advancement in natural language understanding and generation capabilities. Its development focused on creating a more versatile and powerful AI system that can not only understand and generate human-like text but also assist in a wide range of applications, from creative writing to complex problem-solving. The model's architecture, training techniques, and data utilization have been greatly improved, making it one of the most sophisticated AI models available today. This allows GPT-5 to be highly effective in various tasks, setting it apart from its predecessors and making it a cutting-edge technology in the field of artificial intelligence.


## How do we get this information in the first place?

The quality of this answer comes from augmenting our query with external knowledge (source knowledge). Here we typed that knowledge in by hand; a real chatbot needs to retrieve it automatically. We'll now build a vector index from "jamescalam/llama-2-arxiv-papers-chunked" so the chatbot can ground its answers in those papers.

This is where Pinecone and vector databases come into play. But first, we'll need a dataset from our AI arXiv corpus.

## Importing the Data

We'll import our knowledge base from the Hugging Face Datasets library. For this assignment we will use the "jamescalam/llama-2-arxiv-papers-chunked" dataset. This dataset contains chunked (~300 tokens) extracts from the Llama 2 research paper and closely related papers (identified via references). It has ~4.84k rows and includes fields like `doi`, `chunk-id`, `chunk`, `title`, `summary`, `authors`, `categories`, `source`, and more — ideal for Llama 2-focused RAG experiments.

In [79]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

In [80]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

## Dataset Overview

The dataset we are using is sourced from Llama 2 and related arXiv papers. Each entry is a chunk (~300 tokens) with helpful fields such as `doi`, `chunk-id`, `chunk` (text), `title`, `summary`, `authors`, `categories`, and `source` (PDF URL). This makes it well-suited to build a RAG knowledge base for answering questions about Llama 2 model architecture, training, safety, and benchmarking.
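To give a feel for what "chunked" means, here is a minimal sketch of splitting a long text into fixed-size pieces. It is an illustration under assumptions: whitespace-separated words stand in for model tokens, the `chunk_words` helper and its `overlap` parameter are hypothetical, and this is not necessarily how the dataset above was produced.

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# 700 placeholder words, chunked with a 50-word overlap between neighbours
sample_text = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(sample_text, chunk_size=300, overlap=50)
print([len(c.split()) for c in chunks])
```

A small overlap is a common choice so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.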

## Building the Knowledge Base

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.

We begin by initializing our Pinecone client, which requires a [free API key](https://app.pinecone.io).

In [81]:
from pinecone import Pinecone

# initialize client
pc = Pinecone(api_key=pinecone_key)

Delete any existing index with the same name to free up resources:

In [82]:
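index_name = "rag-chunked"

# Delete the index only if it exists, so unrelated errors (e.g. a bad API key) are not hidden
if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)
    print(f"Deleted existing index: {index_name}")
else:
    print("No existing index to delete")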

No existing index to delete


In [83]:
from pinecone import ServerlessSpec

pc.create_index(
    name=index_name,
    dimension=384,
    metric="dotproduct",             
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    ),
)

{
    "name": "rag-chunked",
    "metric": "dotproduct",
    "host": "rag-chunked-izpvf0z.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 384,
    "deletion_protection": "disabled",
    "tags": null
}

Our index is now ready but it's empty. It is a vector index, so it needs vectors. To create these vector embeddings we will use Hugging Face's `sentence-transformers/all-MiniLM-L6-v2` model, whose 384-dimensional output matches the `dimension=384` we set on the index. We can access it via LangChain like so:

In [84]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

Using this model we can create embeddings like so:

In [85]:
texts = [
    "Quantum entanglement is a physical phenomenon that occurs when pairs or groups of particles are generated, interact, or share spatial proximity in ways such that the quantum state of each particle cannot be described independently of the state of the others, even when the particles are separated by a large distance.",
    "The uncertainty principle, formulated by Werner Heisenberg, states that certain pairs of physical properties, like position and momentum, cannot both be known to arbitrary precision. The more precisely one property is known, the less precisely the other can be known.",
    "String theory is a theoretical framework in which the point-like particles of particle physics are replaced by one-dimensional objects called strings. It aims to reconcile quantum mechanics and general relativity, potentially providing a unified description of all fundamental forces and forms of matter."
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(3, 384)

From this we get three 384-dimensional embeddings, one for each of our three chunks of text.
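The index was created with `metric="dotproduct"`, so it helps to see how the dot product relates to cosine similarity. The sketch below uses tiny hand-made vectors instead of real 384-dimensional embeddings. It assumes nothing about the embedding model; whether a given model returns unit-length vectors is worth checking, because only then do the two scores coincide.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]

print("raw dot product:", dot(a, b))
print("cosine similarity:", cosine(a, b))
# After normalization the dot product matches the cosine similarity
print("dot of unit vectors:", dot(normalize(a), normalize(b)))
```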

We're now ready to embed and index all of our data! We do this by looping through our dataset and embedding and inserting everything in batches.

In [86]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

# Get the Pinecone index object
index = pc.Index(index_name)

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'], 'source': x['source']} for _, x in batch.iterrows()
    ]
    # add to Pinecone 
    vectors = [
        {
            'id': _id,
            'values': vec,
            'metadata': meta
        }
        for _id, vec, meta in zip(ids, embeds, metadata)
    ]
    index.upsert(vectors=vectors)

100%|██████████| 49/49 [06:55<00:00,  8.48s/it]
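The 49 iterations in the progress bar follow directly from the batch arithmetic: 4838 rows split into batches of 100 gives 48 full batches plus one final batch of 38. A small sketch, using a hypothetical `batch_ranges` helper that mirrors the `i` / `i_end` computation in the loop:

```python
import math

def batch_ranges(total: int, batch_size: int) -> list[tuple[int, int]]:
    """Return (start, end) index pairs covering range(total) in order."""
    return [
        (start, min(total, start + batch_size))
        for start in range(0, total, batch_size)
    ]

ranges = batch_ranges(4838, 100)
expected_batches = math.ceil(4838 / 100)

print(len(ranges), "batches; last batch covers rows", ranges[-1])
```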



We can check that the vector index has been populated using `describe_index_stats`:

In [87]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.0,
 'metric': 'dotproduct',
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838,
 'vector_type': 'dense'}

# Retrieval Augmented Generation

We've built a fully-fledged knowledge base. Now it's time to link that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

To use LangChain here we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object.

In [88]:
from langchain_pinecone import PineconeVectorStore

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = PineconeVectorStore(
    index=index,
    embedding=embed_model,
    text_key=text_field
)
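Before querying, it can help to picture what a similarity search does conceptually: score every stored vector against the query vector and keep the `k` best. The sketch below does this by brute force over a toy in-memory dictionary. The `top_k` helper and the two-dimensional vectors are assumptions for illustration; a production vector database typically relies on approximate nearest-neighbour indexes rather than scanning everything.

```python
import heapq

def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3):
    """Return the k (score, id) pairs with the highest dot product."""
    scored = (
        (sum(q * v for q, v in zip(query_vec, vec)), vec_id)
        for vec_id, vec in index.items()
    )
    return heapq.nlargest(k, scored)

# Toy stand-in for the vector index: id -> embedding
toy_index = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [0.5, 0.5],
    "doc-d": [-1.0, 0.0],
}

results = top_k([1.0, 0.0], toy_index, k=2)
print(results)
```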

Using this `vectorstore` we can already query the index and see if we have any relevant information given our question about Llama 2.

In [89]:
query = "What are the key design choices and safety considerations in Llama 2?"

vectorstore.similarity_search(query, k=3)

[Document(id='2307.09288-199', metadata={'source': 'http://arxiv.org/pdf/2307.09288'}, page_content='Ricardo Lopez-Barquilla, Marc Shedroﬀ, Kelly Michelena, Allie Feinstein, Amit Sangani, Geeta\nChauhan,ChesterHu,CharltonGholson,AnjaKomlenovic,EissaJamil,BrandonSpence,Azadeh\nYazdan, Elisa Garcia Anzano, and Natascha Parks.\n•ChrisMarra,ChayaNayak,JacquelinePan,GeorgeOrlin,EdwardDowling,EstebanArcaute,Philomena Lobo, Eleonora Presani, and Logan Kerr, who provided helpful product and technical organization support.\n46\n•Armand Joulin, Edouard Grave, Guillaume Lample, and Timothee Lacroix, members of the original\nLlama team who helped get this work started.\n•Drew Hamlin, Chantal Mora, and Aran Mun, who gave us some design input on the ﬁgures in the\npaper.\n•Vijai Mohan for the discussions about RLHF that inspired our Figure 20, and his contribution to the\ninternal demo.\n•Earlyreviewersofthispaper,whohelpedusimproveitsquality,includingMikeLewis,JoellePineau,\nLaurens van der Maaten,

We return a lot of text here and it's not that clear what we need or what is relevant. Fortunately, our LLM will be able to parse this information much faster than us. All we need is to link the output from our `vectorstore` to our `chat` chatbot. To do that we can use the same logic as we used earlier.

In [90]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query concisely and cite which paper(s) you used when possible.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

Using this we produce an augmented prompt:

In [91]:
print(augment_prompt(query))

Using the contexts below, answer the query concisely and cite which paper(s) you used when possible.

    Contexts:
    Ricardo Lopez-Barquilla, Marc Shedroﬀ, Kelly Michelena, Allie Feinstein, Amit Sangani, Geeta
Chauhan,ChesterHu,CharltonGholson,AnjaKomlenovic,EissaJamil,BrandonSpence,Azadeh
Yazdan, Elisa Garcia Anzano, and Natascha Parks.
•ChrisMarra,ChayaNayak,JacquelinePan,GeorgeOrlin,EdwardDowling,EstebanArcaute,Philomena Lobo, Eleonora Presani, and Logan Kerr, who provided helpful product and technical organization support.
46
•Armand Joulin, Edouard Grave, Guillaume Lample, and Timothee Lacroix, members of the original
Llama team who helped get this work started.
•Drew Hamlin, Chantal Mora, and Aran Mun, who gave us some design input on the ﬁgures in the
paper.
•Vijai Mohan for the discussions about RLHF that inspired our Figure 20, and his contribution to the
internal demo.
•Earlyreviewersofthispaper,whohelpedusimproveitsquality,includingMikeLewis,JoellePineau,
Laurens van der 
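One detail in the printed prompt: `Contexts:` and `Query:` are indented by four spaces, because the triple-quoted f-string inherits the indentation of the function body. This is usually harmless, but if a cleaner prompt is preferred, the pieces can be joined explicitly. The sketch below uses a hypothetical `build_augmented_prompt` that takes already-retrieved texts as a list, so it runs without the vector store.

```python
def build_augmented_prompt(query: str, contexts: list[str]) -> str:
    """Assemble the prompt from parts so no code indentation leaks in."""
    parts = [
        "Using the contexts below, answer the query concisely and cite "
        "which paper(s) you used when possible.",
        "",
        "Contexts:",
        "\n\n".join(contexts),  # blank line between retrieved chunks
        "",
        f"Query: {query}",
    ]
    return "\n".join(parts)

demo_prompt = build_augmented_prompt(
    "What is Llama 2?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(demo_prompt)
```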

There is still a lot of text here, so let's pass it onto our chat model to see how it performs.

In [92]:
# create a new user prompt using the new dataset
prompt = HumanMessage(
    content=augment_prompt("Summarize the core contributions and benchmarking results of Llama 2.")
)
# add to messages
messages.append(prompt)

res = chat.invoke(messages)

print(res.content)

The core contributions of Llama 2 include the development and release of a family of pretrained and fine-tuned large language models (LLMs) at scales up to 70B parameters. The benchmarking results show that the Llama 2 models, specifically L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, generally perform better than existing open-source models and are on par with some closed-source models, such as ChatGPT, BARD, and Claude, in terms of helpfulness and safety (as evaluated in Section 5 of the paper).


We can continue with another Llama 2 question:

In [93]:
prompt = HumanMessage(
    content=augment_prompt(
        "What safety mitigations and usage restrictions are recommended in the Llama 2 paper?"
    )
)

res = chat.invoke(messages + [prompt])
print(res.content)

The Llama 2 paper recommends several safety mitigations and usage restrictions, including:

1. **Safety testing and tuning**: Developers should perform safety testing and tuning tailored to their specific applications of the model before deploying it (Section 5.2).
2. **Responsible Use Guide**: Users are encouraged to follow the Responsible Use Guide available at https://ai.meta.com/llama/responsible-user-guide.
3. **License and Acceptable Use Policy compliance**: Users must comply with the terms of the provided license and the Acceptable Use Policy, which prohibit uses that would violate applicable policies, laws, rules, and regulations (Section 5.3).
4. **Cautious use of pretrained models**: Users of the pretrained models should be particularly cautious and take extra steps in tuning and deployment (Section 5.2).
5. **Red teaming**: The release of the 34B model is delayed due to a lack of time to sufficiently red team, highlighting the importance of thorough testing and evaluation (S
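Note that this last call passes `messages + [prompt]` instead of appending to `messages`. List concatenation builds a new list for that one request, so the long augmented prompt is not stored in the running history. A minimal sketch with placeholder strings standing in for message objects:

```python
# Placeholder strings stand in for SystemMessage / HumanMessage / AIMessage objects
history = ["system", "user-1", "ai-1"]

# Concatenation creates a new list; the history itself is untouched
one_off_request = history + ["user-2 (augmented)"]
length_after_concat = len(history)

# append() mutates the history in place, so later calls will include this turn
history.append("user-2 (augmented)")
length_after_append = len(history)

print(length_after_concat, length_after_append)
```

Keeping large retrieved contexts out of the history is one way to stop the conversation from growing with every augmented question.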

You can continue asking questions about Llama 2, but once you're done you can delete the index to save resources:

In [94]:
pc.delete_index(index_name)