# Building Chatbots with the OpenAI API and Pinecone

## By: Vatsal Vinay Parikh

In this project, we aim to explore the fascinating world of AI chatbots. We will be using LangChain, OpenAI, and the Pinecone vector DB to build a chatbot capable of learning from the external world using **R**etrieval **A**ugmented **G**eneration (RAG).

We will be using a dataset sourced from the Llama 2 ArXiv paper and other related papers to help our chatbot answer questions about the latest and greatest in the world of GenAI.

This project is designed for learners who have a basic understanding of the OpenAI API and Pinecone, as covered in our previous projects. It's a great opportunity for those interested in AI, machine learning, and NLP to get hands-on experience with building a chatbot with RAG.

![rag](rag.png)

By the end of this project, you will have a functioning chatbot and RAG pipeline that can hold a conversation and provide informative responses based on a knowledge base. This project is a stepping stone towards understanding and building more complex AI systems in the future.

## Setup

Before we start building our chatbot, we need to install some Python libraries. Here's a brief overview of what each library does:

- **openai**: This is the official OpenAI Python client. We'll use it to interact with the GPT large language model.
- **pinecone-client**: This is the official Pinecone Python client. We'll use it to interact with the Pinecone vector DB where we will store our chatbot's knowledge base.
- **langchain**, **langchain-openai**, **langchain-pinecone**: LangChain is a framework for building GenAI applications, and the two integration packages connect it to OpenAI and Pinecone. We'll use them to chain together different language models and components for our chatbot.
- **tiktoken**: This is a library from OpenAI that allows you to count the number of tokens in a text string without making an API call.
- **datasets**: This library provides a vast array of datasets for machine learning. We'll use it to load our knowledge base for the chatbot.

You can install these libraries using pip like so:

In [6]:
# Install the openai package, locked to version 1.27
!pip install openai==1.27

# Install the pinecone-client package, locked to version 4.0.0
!pip install pinecone-client==4.0.0

# Install the langchain package, locked to version 0.1.19
!pip install langchain==0.1.19

# Install the langchain-openai package, locked to version 0.1.6
!pip install langchain-openai==0.1.6

# Update the langchain-pinecone package, locked to version 0.1.0
!pip install langchain-pinecone==0.1.0

# Update the tiktoken package, locked to version 0.7.0
!pip install tiktoken==0.7.0

# Install the datasets package, locked to version 2.19.1
!pip install datasets==2.19.1

# Update the typing_extensions package, locked to version 4.11.0
!pip install typing_extensions==4.11.0

Defaulting to user installation because normal site-packages is not writeable
Collecting openai==1.27
  Downloading openai-1.27.0-py3-none-any.whl.metadata (21 kB)
Downloading openai-1.27.0-py3-none-any.whl (314 kB)
Installing collected packages: openai
Successfully installed openai-1.27.0
Defaulting to user installation because normal site-packages is not writeable
Collecting pinecone-client==4.0.0
  Downloading pinecone_client-4.0.0-py3-none-any.whl.metadata (16 kB)
Downloading pinecone_client-4.0.0-py3-none-any.whl (214 kB)
Installing collected packages: pinecone-client
  Attempting uninstall: pinecone-client
    Found existing installation: pinecone-client 6.0.0
    Uninstalling pinecone-client-6.0.0:
      Successfully uninstalled pinecone-client-6.0.0
Successfully installed pinecone-client-4.0.0
Defaulting to user installation because normal site-packages is not writeable
Collecting langchain==0.1.19
  Downloading langchain-0.1.19-py3-none-any.whl.metadata (13 kB)
Downloading

## Task 1: Building a Chatbot

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To get more familiar with the library, let's first create a chatbot _without_ RAG.


### Instructions

Initialize the chat model object.

- *Make sure you have defined the `OPENAI_API_KEY` environment variable and connected it. See the 'Setting up DataLab Integrations' section of getting-started.ipynb.*
- From the `langchain_openai` package, import `ChatOpenAI`.
- Initialize a `ChatOpenAI` object with the `gpt-3.5-turbo` model. Assign to `chat`.

### How are chats structured?

Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically structured (in plain text) like this:

```
System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:
```
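
To make the plain-text layout concrete, here is a minimal sketch that renders a list of `(role, content)` pairs into that transcript. The helper `render_transcript` and the `demo_turns` data are illustrative assumptions; the chat API itself receives structured messages rather than one long string, so this only mirrors the layout shown above.

```python
def render_transcript(turns):
    """Render (role, content) pairs as a plain-text chat transcript.

    The transcript ends with an empty "Assistant:" line, which is the cue
    for the model to write the next reply.
    """
    lines = [f"{role}: {content}" for role, content in turns]
    lines.append("Assistant:")
    return "\n\n".join(lines)

demo_turns = [
    ("System", "You are a helpful assistant."),
    ("User", "Hi AI, how are you today?"),
    ("Assistant", "I'm great thank you. How can I help you?"),
    ("User", "I'd like to understand string theory."),
]

print(render_transcript(demo_turns))
```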

The final `"Assistant:"` without a response is what prompts the model to continue the conversation. In the official OpenAI Chat Completions endpoint, these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
```

In LangChain there is a slightly different format. We use three types of _message_ object like so:

```python
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]
```

The format is very similar; we've just swapped the role of `"system"` for `SystemMessage`, `"user"` for `HumanMessage`, and `"assistant"` for `AIMessage`.
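
The role mapping can be sketched as a small converter. This is a hedged illustration, not LangChain's own conversion code: `DemoMessage` is a hypothetical stand-in that mimics only the `type` and `content` attributes of LangChain messages (whose `type` is `"system"`, `"human"`, or `"ai"`), so the sketch runs without LangChain installed.

```python
from dataclasses import dataclass

@dataclass
class DemoMessage:
    type: str      # "system", "human", or "ai" (mimics LangChain's .type)
    content: str

# LangChain message type -> OpenAI Chat Completions role
ROLE_MAP = {"system": "system", "human": "user", "ai": "assistant"}

def to_openai_format(messages):
    """Convert message objects into OpenAI-style {"role", "content"} dicts."""
    return [{"role": ROLE_MAP[m.type], "content": m.content} for m in messages]

demo_messages = [
    DemoMessage("system", "You are a helpful assistant."),
    DemoMessage("human", "Hi AI, how are you today?"),
    DemoMessage("ai", "I'm great thank you. How can I help you?"),
    DemoMessage("human", "I'd like to understand string theory."),
]

for entry in to_openai_format(demo_messages):
    print(entry)
```

In practice you don't need to do this yourself: the LangChain chat model performs an equivalent conversion when you call it.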

### Instructions

Create a conversation.

- From langchain's schema module, import the three message types: `SystemMessage`, `HumanMessage`, and `AIMessage`.
- Create a conversation as a list of messages. Assign to `messages`.
    1. A system message with content `"You are a helpful assistant."`
    2. A human message with content `"Hi AI, how are you today?"`
    3. An AI message with content `"I'm great thank you. How can I help you?"`
    4. A human message with content `"I'd like to understand string theory."`


In [8]:
# From the langchain.schema module, import SystemMessage, HumanMessage, AIMessage
from langchain.schema import SystemMessage, HumanMessage, AIMessage

# Create a conversation as a list of messages. Assign to messages.
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

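We generate the next response from the AI by passing these messages to the `.invoke()` method of the `ChatOpenAI` object. (Calling `chat(messages)` directly, as though it were a function, also works in this LangChain version, but it is deprecated in favor of `.invoke()`.)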

### Instructions

Chat with GPT.

- Invoke a chat with GPT, passing the messages, and get a response. Assign to `res`.
- Print the response.

<details>
<summary>Code hints</summary>
<p>
    
Invoke a chat with the `.invoke()` method of the `ChatOpenAI()` object, passing the list of messages.

</p>
</details>

In [9]:
# First, ensure you have the necessary Azure SDK packages installed
# You can install them using pip in your Jupyter Notebook
!pip install azure-ai-inference azure-core

import os
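from azure.ai.inference import ChatCompletionsClient
# Alias Azure's SystemMessage so it does not shadow LangChain's SystemMessage used elsewhere
from azure.ai.inference.models import SystemMessage as AzureSystemMessage, UserMessage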
from azure.core.credentials import AzureKeyCredential

# Set the environment variable dynamically (instead of defining it in DataCamp Workspace settings)
os.environ["OPENAI_API_KEY"] = "<API_KEY>"

# Retrieve the API key from the environment variable
api_key = os.getenv("OPENAI_API_KEY")

# Azure OpenAI endpoint and model
endpoint = "<Azure endpoint>"
model_name = "gpt-3.5-turbo"

# Initialize the client using the API key from the environment variable
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(api_key),
)

print("Azure OpenAI client initialized successfully.")


Defaulting to user installation because normal site-packages is not writeable
Collecting azure-ai-inference
  Downloading azure_ai_inference-1.0.0b9-py3-none-any.whl.metadata (34 kB)
Collecting azure-core
  Downloading azure_core-1.32.0-py3-none-any.whl.metadata (39 kB)
Downloading azure_ai_inference-1.0.0b9-py3-none-any.whl (124 kB)
Downloading azure_core-1.32.0-py3-none-any.whl (198 kB)
Installing collected packages: azure-core, azure-ai-inference
Successfully installed azure-ai-inference-1.0.0b9 azure-core-1.32.0
Azure OpenAI client initialized successfully.


In [10]:
from langchain.chat_models import AzureChatOpenAI

# Retrieve the API key from the environment variable; fill in your own endpoint and deployment name
api_key = os.getenv("OPENAI_API_KEY")
endpoint = "<azure-endpoint>"
deployment_name = "gpt-3.5-turbo"  # Replace with your actual deployment name

# Ensure the API key exists
if not api_key:
    raise ValueError("Azure OpenAI API key is missing.")

# Initialize the Azure OpenAI model for LangChain
chat = AzureChatOpenAI(
    azure_endpoint=endpoint,
    deployment_name=deployment_name,
    api_key=api_key,
    api_version="2023-05-15",  # Use the correct API version from Azure
)

print("Azure OpenAI model is initialized.")


Azure OpenAI model is initialized.



The class `AzureChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 0.3.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run `pip install -U langchain-openai` and import as `from langchain_openai import AzureChatOpenAI`.



In [11]:
# Invoke a chat with GPT, passing the messages
res = chat.invoke(messages)

# Print the response
res

AIMessage(content="String theory is a theoretical framework in physics that attempts to reconcile general relativity and quantum mechanics. It posits that the fundamental building blocks of the universe are not particles but rather tiny, vibrating strings. These strings can exist in multiple dimensions, and their vibrations determine the properties of particles we observe in the universe.\n\nString theory suggests that there are multiple dimensions beyond the familiar three spatial dimensions and one time dimension. This idea can potentially explain phenomena that are beyond the reach of current theories, such as the existence of dark matter and dark energy.\n\nHowever, it's important to note that string theory is still a highly speculative and complex area of physics, and it has not yet been experimentally verified. Researchers continue to work on developing the theory further and exploring its implications for our understanding of the universe.", response_metadata={'token_usage': {'c

Notice that the `AIMessage` object looks a bit like a dictionary. The most important element is `content`, which contains the chat text.

### Instructions

Print only the contents of the response.

In [12]:
# Print the contents of the response
print(res.content)

String theory is a theoretical framework in physics that attempts to reconcile general relativity and quantum mechanics. It posits that the fundamental building blocks of the universe are not particles but rather tiny, vibrating strings. These strings can exist in multiple dimensions, and their vibrations determine the properties of particles we observe in the universe.

String theory suggests that there are multiple dimensions beyond the familiar three spatial dimensions and one time dimension. This idea can potentially explain phenomena that are beyond the reach of current theories, such as the existence of dark matter and dark energy.

However, it's important to note that string theory is still a highly speculative and complex area of physics, and it has not yet been experimentally verified. Researchers continue to work on developing the theory further and exploring its implications for our understanding of the universe.


Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.
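
The append-then-ask pattern can be sketched without calling a real model. `fake_chat` below is a hypothetical stand-in for `chat.invoke()` that simply echoes the latest human message, and `DemoMessage` mimics only the `type` and `content` attributes of LangChain messages. The point is how the history grows by one AI message and one human message per turn, so each call sees the whole conversation.

```python
from dataclasses import dataclass

@dataclass
class DemoMessage:
    type: str      # "system", "human", or "ai" (mimics LangChain's .type)
    content: str

def fake_chat(history):
    """Hypothetical stand-in for chat.invoke(): echo the latest human message."""
    last_human = next(m for m in reversed(history) if m.type == "human")
    return DemoMessage("ai", f"(reply to: {last_human.content})")

history = [
    DemoMessage("system", "You are a helpful assistant."),
    DemoMessage("human", "I'd like to understand string theory."),
]

# One turn: keep the model's reply, then add the follow-up question.
reply = fake_chat(history)
history.append(reply)
history.append(DemoMessage("human", "Why do physicists believe it can produce a 'unified theory'?"))

# The next call sees the whole conversation, including the earlier reply.
next_reply = fake_chat(history)
print(len(history))  # 4 messages: system, human, ai, human
print(next_reply.content)
```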

### Instructions

Continue the conversation with GPT.

- Append the latest AI response to `messages`.
- Create a new human message. Assign to `prompt`.
    - Use the content `"Why do physicists believe it can produce a 'unified theory'?"`
- Append the prompt to messages.

In [13]:
# Append the latest AI response to messages
messages.append(res)

In [14]:
# Create a new human message. Assign to prompt.
prompt = HumanMessage(content = "Why do physicists believe it can produce a 'unified theory'?")

# Append the prompt to messages
messages.append(prompt)

In [15]:
# Call the chat model directly (the older, deprecated style that triggers the warning below; prefer chat.invoke())
res = chat(messages)
print(res.content)


The method `BaseChatModel.__call__` was deprecated in langchain-core 0.1.7 and will be removed in 0.3.0. Use invoke instead.



Physicists believe that string theory has the potential to produce a unified theory because it aims to describe all fundamental forces and particles in a single framework. In traditional physics, there are separate theories to describe different forces, such as general relativity for gravity and the Standard Model for the other three fundamental forces (electromagnetism, weak nuclear force, and strong nuclear force).

One of the main goals of a unified theory, often referred to as the "Theory of Everything," is to provide a single framework that can explain all known phenomena in the universe. By positing that all particles are made up of vibrating strings, string theory attempts to unify gravity with the other fundamental forces in a consistent manner.

If successful, a unified theory like string theory could provide a deeper understanding of the fundamental nature of reality and potentially resolve some of the long-standing puzzles in physics, such as the incompatibility between gene

### Instructions

- Invoke the chat again to send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [16]:
# Invoke the chat again to send the messages to GPT. Assign to res.
res = chat.invoke(messages)

# Print the contents of the response
print(res.content)

Physicists believe that string theory has the potential to produce a unified theory because it aims to describe all fundamental forces and particles in a single framework. Currently, there are two main pillars of modern physics: general relativity, which describes gravity on a large scale, and quantum mechanics, which describes the behavior of particles on a small scale.

One of the main challenges in physics is to reconcile these two theories into a single, coherent framework known as a "unified theory" or "theory of everything." String theory offers a promising approach to achieving this goal by providing a theoretical framework that can potentially unify all fundamental forces and particles into a single, consistent description.

By treating particles as vibrating strings instead of point-like objects, string theory naturally incorporates both quantum mechanics and general relativity. This suggests that it could be the key to resolving the inconsistencies between these two theories 

## Task 2: Hallucinations

We have our chatbot, but the knowledge of LLMs is limited. The reason is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the _parametric knowledge_ of the model.



By default, LLMs have no access to the external world.

![langchain-no-access-to-world](langchain-no-access-to-world.png)

This means that GPT (or any other LLM) will perform badly on some types of questions.

* The chatbot doesn't know about recent events. How does it respond if you ask about the weather in your city today?
* It can't answer questions about recent code or recent products. Try asking it `"Can you tell me about the LLMChain in LangChain?"` or `"What was the latest course released on DataCamp?"`
* It can't answer questions about confidential corporate information that hasn't been released to the internet.

### Instructions

Append the AI response to the list of messages.

- Print the number of messages in the conversation.
- Append the latest AI response to `messages`.
- Print the number of messages in the conversation again.

In [17]:
# Print the number of messages in the conversation
print(len(messages))

# Append the response to the list of messages
messages.append(res)

# Print the number of messages in the conversation again
print(len(messages))

6
7


### Instructions

Ask GPT about Llama 3.

- Create a new human message. Assign to `prompt`.
    - Use the content `"What is so special about Llama 3?"`.
- Append the prompt to `messages`.
- Invoke the chat to send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [18]:
# Create a new human message about Llama 3. Assign to prompt.
prompt = HumanMessage(content="What is so special about Llama 3?")

# Append this message to the conversation
messages.append(prompt)

# Invoke the chat with the latest list of messages
res = chat.invoke(messages)

# Print the contents of the response
print(res.content)

I'm not certain what you are referring to with "Llama 3." Could you provide more context or clarify your question so I can better assist you?


### Confidently wrong: hallucinations from LLMs

Our chatbot can no longer help us: it doesn't contain the information we need to answer the question. This answer makes it clear that the LLM doesn't know the information, but sometimes an LLM may respond as if it _does_ know the answer, and this can be very hard to detect. See this example from the earliest version of GPT-4 in the OpenAI Playground:

![llm-chain-hallucination](llm-chain-hallucination.png)

OpenAI has since adjusted the behavior for this particular example, as we can see below:


### Instructions

Ask GPT about LangChain.

- Append the latest AI response to `messages`.
- Create a new human message. Assign to `prompt`.
    - Use the content `"Can you tell me about the LLMChain in LangChain?"`.
- Append the prompt to `messages`.
- Send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [19]:
# Append the latest AI response to messages
messages.append(res)

# Create a new human message. Assign to prompt.
prompt = HumanMessage(content = "Can you tell me about the LLMChain in LangChain?")

# Append the latest prompt to messages
messages.append(prompt)

# Invoke the chat with the latest list of messages
res = chat.invoke(messages)

# Print the contents of the response
print(res.content)

I apologize, but I am not familiar with the specific terms "LLMChain" and "LangChain." It is possible that they refer to specialized concepts, technologies, or projects that are not widely known. If you can provide more context or details about LLMChain in LangChain, I may be able to offer more assistance.


There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt. We can try that with the LLMChain question. We can take a description of this object from the LangChain documentation.

### Instructions

Create a string of knowledge about chains.

- *Read the descriptions of LLMChains, Chains, and LangChain given in `llmchain_information`.*
- Combine the list of description strings into a single string. Assign to `source_knowledge`.

In [20]:
# Run these descriptions of LLMChains, Chains, and LangChain 
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]
len(llmchain_information)

3

In [21]:
# Run this to join the definitions, separated by newlines
source_knowledge = "\n".join(llmchain_information)
source_knowledge

'A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.\nChains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.\nLangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

### Instructions

- Define a question. Assign to `query`.
    - Use the text `"Can you tell me about the LLMChain in LangChain?"`
- Create an augmented prompt containing the context and query. Assign to `augmented_prompt`.

        augmented_prompt = f"""Using the contexts below, answer the query. If some information is not provided within the contexts below, do not include, and if the query cannot be answered with the below information, say "I don't know".

        Contexts:
        {source_knowledge}

        Query: {query}"""
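
The same template can be wrapped in a reusable function. `build_augmented_prompt` is a hypothetical helper, not part of LangChain: it joins the context strings with newlines exactly as `source_knowledge` does in this notebook and, unlike the indented f-string, leaves no leading spaces before `Contexts:` and `Query:`.

```python
def build_augmented_prompt(contexts, query):
    """Combine context strings and a query into one instruction-style prompt."""
    source_knowledge = "\n".join(contexts)
    instructions = (
        "Using the contexts below, answer the query. If some information is not "
        "provided within the contexts below, do not include, and if the query "
        "cannot be answered with the below information, say \"I don't know\"."
    )
    return f"{instructions}\n\nContexts:\n{source_knowledge}\n\nQuery: {query}"

print(build_augmented_prompt(
    ["A LLMChain is the most common type of chain."],
    "Can you tell me about the LLMChain in LangChain?",
))
```

Keeping the template in one function makes it easy to reuse later, when the contexts come from a vector search instead of a hand-written list.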

In [22]:
# Define a question. Assign to query
query = "Can you tell me about the LLMChain in LangChain?"

# Create an augmented prompt containing the context and query. Assign to augmented_prompt
augmented_prompt = f"""Using the contexts below, answer the query. If some information is not provided within the contexts below, do not include, and if the query cannot be answered with the below information, say "I don't know".

  Contexts:
  {source_knowledge}

  Query: {query}"""

In [23]:
# Print the augmented prompt
print(augmented_prompt)

Using the contexts below, answer the query. If some information is not provided within the contexts below, do not include, and if the query cannot be answered with the below information, say "I don't know".

  Contexts:
  A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.
Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.
LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data

Now we feed this into our chatbot as we did before.

Don't append the previous AI message, since it wasn't a good answer.

### Instructions

Include the augmented prompt in the conversation.

- Print the last message in the list.
- Replace the last message with a human message containing the augmented prompt.

<details>
<summary>Code hints</summary>
<p>
    
The last element of a list can be accessed with the position `-1`.
    
```py
# Get the last element of a list
lst[-1]

# Replace the last element of a list
lst[-1] = new_value
```

</p>
</details>

In [24]:
# Print the last message in the conversation
messages[-1]

HumanMessage(content='Can you tell me about the LLMChain in LangChain?')

In [25]:
# Replace the last message with a human message containing the augmented prompt
messages[-1] = HumanMessage(content = augmented_prompt)

### Instructions

Ask GPT about LangChain again, this time providing source knowledge.

- Send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [26]:
# Invoke the chat with the list of messages
res = chat.invoke(messages)

# Print the contents of the response
print(res.content)

LLMChain in LangChain is the most common type of chain that consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, formats them into a prompt using the PromptTemplate, passes that to the model, and then uses the OutputParser (if provided) to parse the output of the LLM into a final format. It is designed to enable applications powered by language models to be data-aware and agentic, connecting language models to other data sources and allowing them to interact with their environment.


The quality of this answer is phenomenal! This is made possible by augmenting our query with external knowledge (source knowledge). There's just one problem: how do we get this information in the first place?

We learned in the previous code-alongs about Pinecone and vector databases. Well, they can help us here too. But first, we'll need a dataset.
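
Before bringing in Pinecone, it may help to see the retrieval step in miniature. The sketch below is a toy: it swaps real embeddings for simple word-count vectors (an assumption made only so it runs offline) and ranks documents by cosine similarity, the same metric we will give our Pinecone index. In a RAG pipeline, the retrieved text would then be fed into the augmented prompt as source knowledge.

```python
import math
import re
from collections import Counter

def bow_vector(text):
    """Toy 'embedding': lowercase word counts (a stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = bow_vector(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bow_vector(d)), reverse=True)
    return ranked[:k]

docs = [
    "Llama 2 is a collection of pretrained and fine-tuned large language models.",
    "String theory models particles as vibrating strings.",
    "Pinecone is a vector database for similarity search.",
]

print(retrieve("What is Llama 2?", docs, k=1))
```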

## Task 3: Importing the Data

In this task, we will be importing our data. We will be using the Hugging Face Datasets library and [the `"jamescalam/llama-2-arxiv-papers-chunked"` dataset](https://huggingface.co/datasets/jamescalam/llama-2-arxiv-papers-chunked). This dataset contains a collection of ArXiv papers which will serve as the external knowledge base for our chatbot.

### Instructions

Load the ArXiv papers dataset.

- From the *datasets* package, import `load_dataset`.
- Load the train split of the `jamescalam/llama-2-arxiv-papers-chunked` dataset. Assign to `data`.
- Print the dataset object to see the structure of the data.
- *Look at the structure. Which fields should we keep?*

<details>
<summary>Code hints</summary>
<p>
    
To load the training part of a Hugging Face dataset, call `load_dataset()`, passing the dataset name, and setting `split` to `"train"`.

</p>
</details>

In [27]:
# From the datasets package, import load_dataset
from datasets import load_dataset

# Load the arxiv dataset, training set only
data = load_dataset("jamescalam/llama-2-arxiv-papers-chunked", split= "train")

# Print the dataset object
data


Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

### Instructions

Print a record of the dataset to get a feel for what each record contains.

In [28]:
# Print a record of dataset
data[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

### Dataset Summary

The dataset we are using is built from the Llama 2 ArXiv paper and related papers. ArXiv is a repository of electronic preprints that are approved for posting after moderation. Each entry in the dataset represents a "chunk" of text from one of these papers.

Because most **L**arge **L**anguage **M**odels (LLMs) only contain knowledge of the world as it was during training, they cannot answer our questions about Llama 2, at least not without this data.
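
Each row will eventually become one vector record. A minimal sketch of that mapping follows, under stated assumptions: the record ID joins `doi` and `chunk-id` (so chunks from the same paper stay unique), and only `chunk`, `source`, and `title` are kept as metadata. The helper `to_record` and the placeholder values in `example_row` are hypothetical; only the `doi` and `chunk-id` values come from the record printed above.

```python
def to_record(row):
    """Map one dataset row to (id, text, metadata) for a vector index."""
    record_id = f"{row['doi']}-{row['chunk-id']}"  # unique per chunk
    text = row["chunk"]
    # Keep the raw text in metadata so it can be returned at query time.
    metadata = {"text": text, "source": row["source"], "title": row["title"]}
    return record_id, text, metadata

example_row = {
    "doi": "1102.0183",
    "chunk-id": "0",
    "chunk": "<chunk text>",    # placeholder
    "source": "<source url>",   # placeholder
    "title": "<paper title>",   # placeholder
    "authors": ["<author>"],    # placeholder; dropped by to_record
}

record_id, text, metadata = to_record(example_row)
print(record_id)  # 1102.0183-0
print(sorted(metadata))
```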

## Task 4: Building the Knowledge Base

We now have a dataset that can serve as our chatbot's knowledge base. Our next task is to transform that dataset into a knowledge base that our chatbot can use. To do this, we must use an embedding model and a vector database.

### Workflow

The workflow for setting up a chatbot is much the same as for setting up semantic search and retrieval augmented generation:

- Initialize your connection to the Pinecone vector DB.
- Create an index (remember to consider the dimensionality of `text-embedding-ada-002`).
- Initialize OpenAI's `text-embedding-ada-002` model with LangChain.
- Populate the index with records (in this case from the Llama 2 dataset).
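
Populating the index usually means embedding and upserting records in batches rather than all 4,838 at once. Below is a minimal batching sketch; the helper name `batched` and the batch size of 100 are assumptions. In the real pipeline, each batch would be embedded with `text-embedding-ada-002` and then upserted into Pinecone.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 4,838 rows (the size of the train split) in batches of 100.
batch_sizes = [len(batch) for batch in batched(list(range(4838)), 100)]
print(len(batch_sizes), batch_sizes[-1])  # 49 38
```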

### Instructions

Initialize Pinecone, getting setup details from Workspace environment variables.

- Import the os package.
- Import the pinecone package.
- Initialize pinecone, setting the API key. Assign to `pc`.

<details>
<summary>Code hints</summary>
<p>
    
The Pinecone environment variable is usually called `PINECONE_API_KEY`, but check what you called it!
    
---
    
To initialize Pinecone, call `pinecone.Pinecone()`, setting `api_key` to the API key.

</p>
</details>

In [29]:
# Import the os and pinecone packages
!pip install pinecone
import os
import pinecone

# Create a Pinecone object, reading the key from the PINECONE_API_KEY environment variable. Assign to pc.
pc = pinecone.Pinecone(api_key=os.environ["PINECONE_API_KEY"])

Defaulting to user installation because normal site-packages is not writeable
Collecting pinecone
  Downloading pinecone-6.0.2-py3-none-any.whl.metadata (9.0 kB)
Downloading pinecone-6.0.2-py3-none-any.whl (421 kB)
Installing collected packages: pinecone
Successfully installed pinecone-6.0.2


Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`.
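
A mismatch between the embedding size and the index dimension is a common cause of failed upserts, so a quick guard can help. This is a hedged sketch: `check_dimensions` is a hypothetical helper, and it assumes vectors arrive as plain Python lists of floats.

```python
EMBED_DIM = 1536  # dimension used when creating the index (text-embedding-ada-002)

def check_dimensions(vectors, expected=EMBED_DIM):
    """Raise ValueError if any vector's length differs from the index dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(f"vector {i} has {len(vec)} dimensions, expected {expected}")

check_dimensions([[0.0] * 1536, [0.1] * 1536])  # passes silently

try:
    check_dimensions([[0.0] * 768])
except ValueError as err:
    message = str(err)
    print(message)  # vector 0 has 768 dimensions, expected 1536
```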

### Instructions

Create a vector index in the Pinecone database.

- Import the time package.
- Choose a name for the vector index. Assign to `index_name`.
- Check if index_name is not in Pinecone's list of existing indexes.
    -  Create an index named index_name, dimension 1536, cosine similarity as its metric.
    -  While the index status is not ready, sleep for one second.

<details>
<summary>Code hints</summary>
<p>
    
Get the list of available indexes with `pc.list_indexes()`. The code pattern to get all available index names is as follows.
    
```py
[idx.name for idx in pc.list_indexes().indexes]
```
    
---
    
Create an index with `pc.create_index()`, passing the index name, and setting the dimension, metric, and spec. In theory, you can specify where in the cloud Pinecone runs. Currently, Pinecone Serverless only runs on AWS in limited locations. Try `us-east-1` first and `us-west-2` as a backup. The code pattern to create an index is as follows.
    
```py
pc.create_index(
    index_name,
    dimension=n_dims,
    metric="cosine|dotproduct|euclidean",
    spec=pinecone.ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)
```
    
---
    
Get the index details with `pc.describe_index(index_name)`. The code pattern to check that the index is ready is as follows.
    
```py
pc.describe_index(index_name).status["ready"]
```

---
    
The code pattern for sleeping until a condition is met is as follows.
    
```py
while not condition:
    time.sleep(n)
```
    
</p>
</details>

In [31]:
# Import the time package
import time

# Define the index name
index_name = "llama-3-rag"

# List the names of available indexes. Assign to existing_index_names.
existing_index_names = [idx.name for idx in pc.list_indexes().indexes]

# Check if index_name is not in the list of available indexes
if index_name not in existing_index_names:
    # Create the index with index_name, a dimension of 1536, and the metric "cosine"
    pc.create_index(
        index_name,
        dimension=1536,
        metric="cosine",
        spec=pinecone.ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )
    # While the index status is not ready, sleep for 1 second
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)
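The polling loop above has no upper bound: if the index never becomes ready, the cell hangs forever. Below is a minimal sketch of a polling helper with a timeout. `wait_until` and its parameters are hypothetical names for illustration, not part of the Pinecone client.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0
) -> None:
    """Poll `condition` every `interval` seconds until it returns True.

    Raises TimeoutError if the condition is still False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical usage with the Pinecone client from this notebook:
# wait_until(lambda: pc.describe_index(index_name).status["ready"], timeout=120)
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline correct even if the system clock is adjusted while waiting.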

### Instructions

- Connect to the index using its name. Assign to `index`.
- View the index stats.

In [32]:
# Connect to the index using its name. Assign to index.
index = pc.Index(index_name)

# View the index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {},
 'total_vector_count': 0,
 'vector_type': 'dense'}

Our index is now ready, but it's empty. It is a vector index, so it needs vectors. As mentioned, we will create these vector embeddings with OpenAI's `text-embedding-ada-002` model, which we can access via LangChain.

### Instructions

Create an embeddings model.

- From the `langchain_openai` package, import `OpenAIEmbeddings` (this notebook uses Azure OpenAI, so it imports `AzureOpenAIEmbeddings` instead).
- Create an embeddings model object for `text-embedding-ada-002`. Assign to `embed_model`.

In [33]:
import os
from langchain_openai import AzureOpenAIEmbeddings

# Read the Azure OpenAI API key from an environment variable and set the endpoint
api_key = os.environ["OPENAI_API_KEY"]
endpoint = "<endpoint>"

# Create an embeddings model object for text-embedding-ada-002
embed_model = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2023-05-15",  # Use an API version supported by your Azure resource
)

Using this model we can create embeddings like so:

In [34]:
# Run this to see an example of the embeddings code pattern
texts = [
    "this is a sentence",
    "this is another sentence"
]

res = embed_model.embed_documents(texts=texts)
len(res), len(res[0])

(2, 1536)

This gives us two 1536-dimensional embeddings, one for each of our two texts.
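If the embedding size and the index's `dimension` ever disagree (for example after switching embedding models), upserts will fail. A small sketch of a pre-flight check follows; `check_embedding_dims` is a hypothetical helper, and `1536` is the dimension used for the index in this notebook.

```python
from typing import Sequence


def check_embedding_dims(
    embeddings: Sequence[Sequence[float]], expected_dim: int
) -> None:
    """Raise ValueError if any embedding's length differs from expected_dim."""
    for i, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"Embedding {i} has {len(vector)} dimensions, expected {expected_dim}"
            )


# Hypothetical usage, reusing `res` from the cell above:
# check_embedding_dims(res, 1536)
```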

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding and inserting everything in batches.

### Instructions

Prepare the data for upserting to Pinecone.

- From *tqdm*, import `tqdm` (a progress bar).
- Select these columns: `doi`, `chunk-id`, `chunk`, `title`, `source`. Assign to `data_selected`.
- Convert `data_selected` to a pandas DataFrame in batch sizes of `100`. Assign to `data_batched`.

<details>
<summary>Code hints</summary>
<p>
    
Select columns from a Hugging Face dataset with the `.select_columns()` method, passing a list of column names.
    
```py
data.select_columns(["column1", "column2"])
```

---

Convert a Hugging Face dataset to a pandas DataFrame with the `.to_pandas()` method. Set `batched` to `True` and `batch_size` to a positive integer to create a batched version of the dataset.
    
```py
data.to_pandas(batched=True, batch_size=n)
```

Note that this returns a generator of DataFrames rather than a single DataFrame. That means you can't access the contents until you iterate over it, for example in a `for` loop.
    
</p>
</details>
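To make the generator behaviour concrete, here is a pure-Python stand-in for batched iteration. It is only an illustration of the idea, not the Hugging Face implementation; the name `batched` is our own (Python 3.12 also ships `itertools.batched`, which yields tuples). The key point is that a generator can be consumed only once.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of up to batch_size items (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = batched(range(250), 100)
print([len(b) for b in batches])  # [100, 100, 50]
print(list(batches))              # [] because the generator is already exhausted
```

The same applies to `data_batched`: once the upsert loop has run, re-create it with `.to_pandas(batched=True, ...)` before looping again.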

In [35]:
# From the tqdm package, import tqdm
from tqdm import tqdm

# Select these columns: doi, chunk-id, chunk, title, source. Assign to data_selected.
data_selected = data.select_columns(["doi", "chunk-id", "chunk", "title", "source"])

# Convert data_selected to a pandas DataFrame in batch sizes of 100. Assign to data_batched.
data_batched = data_selected.to_pandas(batched=True, batch_size=100)

### Instructions

Split the dataset into batches and add it to the vector index.

- Loop over each batch in `data_batched`, adding a progress bar.
    - Concatenate the `doi` and `chunk-id` columns separated by `-`, then convert to a list. Assign to `ids`.
    - Get the `chunk` column and convert to a list. Assign to `texts`.
    - Use the embedding model to embed the texts. Assign to `embeds`.
    - Get the metadata from the batch. Assign to `metadata`.
        - Select the `chunk`, `title`, and `source` columns.
        - Apply the `dict` function to the columns axis.
        - Convert to a list.
    - Combine IDs, embeddings, and metadata as list of tuples. Assign to `to_upsert`.
    - Upsert to Pinecone.

<details>
<summary>Code hints</summary>
<p>
    
The code pattern to loop over the batched DataFrame generator is as follows.
    
```py
for batch in data_batched:
    # batch is now a DataFrame
    # do something with it
```
    
The variation of this with a progress bar is as follows.
   
```py
for batch in tqdm(data_batched):
    # batch is now a DataFrame
    # do something with it
```
    
---
    
Concatenate text columns in a DataFrame with `+`.
    
```py
df["col1"] + "-" + df["col2"] 
```
    
Convert a pandas Series to a list with `.to_list()`. You'll need an extra pair of parentheses here.
    
```py
(df["col1"] + "-" + df["col2"]).to_list()
```   
    
---
  
Embed documents with the `.embed_documents()` method of the embeddings model, passing the text as a list.
    
```py
embed_model.embed_documents(list_of_documents)
```
    
---

Pinecone wants the metadata as a list of dictionaries, not a DataFrame. 
    
```
[
  {"chunk": "chunk0", "title": "title0", "source": "source0"},
  {"chunk": "chunk1", "title": "title1", "source": "source1"},
  {"chunk": "chunk2", "title": "title2", "source": "source2"},
  ...
]
```

There are many ways of performing this conversion (though none of them are especially elegant). Use the method you are most comfortable with.
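One option worth knowing is pandas' `DataFrame.to_dict(orient="records")`, which returns exactly one dictionary per row. The sketch below uses a tiny made-up DataFrame as a stand-in for one batch; the values are placeholders, not real dataset rows.

```python
import pandas as pd

# A tiny stand-in for one batch of the ArXiv dataset (made-up values)
example_batch = pd.DataFrame({
    "doi": ["1234.5678", "1234.5678"],
    "chunk": ["chunk0", "chunk1"],
    "title": ["title0", "title1"],
    "source": ["source0", "source1"],
})

# Keep only the metadata columns, then emit one dict per row
metadata = example_batch[["chunk", "title", "source"]].to_dict(orient="records")
print(metadata[0])  # {'chunk': 'chunk0', 'title': 'title0', 'source': 'source0'}
```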
    
---

Combine separate lists into a list of tuples using `zip()`.
    
```py
zip(list0, list1, list2)
```
    
---
    
Upsert to a Pinecone index using the index's `.upsert()` method, setting `vectors` to the list of (id, embedding, metadata) tuples.
    
```py
index.upsert(vectors=to_upsert)
```
    
</p>
</details>
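Two details of the `zip()` step are easy to miss: `zip()` returns a one-shot iterator, and it silently stops at the shortest input, so a mismatch between ids, embeddings, and metadata would drop records without any error. The sketch below adds those checks and also rejects repeated ids, since upserting an id that already exists overwrites the earlier record. `build_upsert_records` is a hypothetical helper, not part of the Pinecone client.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_upsert_records(
    ids: Sequence[str],
    embeds: Sequence[Sequence[float]],
    metadata: Sequence[Dict[str, Any]],
) -> List[Tuple[str, Sequence[float], Dict[str, Any]]]:
    """Pair ids, embeddings, and metadata into (id, values, metadata) tuples."""
    # zip() would silently truncate to the shortest input, so check lengths first
    if not (len(ids) == len(embeds) == len(metadata)):
        raise ValueError(
            f"Length mismatch: {len(ids)} ids, {len(embeds)} embeddings, "
            f"{len(metadata)} metadata entries"
        )
    # A repeated id means one record would overwrite another
    if len(set(ids)) != len(ids):
        raise ValueError("Duplicate ids in batch")
    # list() materializes the one-shot zip iterator so it can be reused or inspected
    return list(zip(ids, embeds, metadata))


# Hypothetical usage inside the upsert loop:
# index.upsert(vectors=build_upsert_records(ids, embeds, metadata))
```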

In [36]:
# Loop over each batch in data_batched, adding a progress bar
for batch in tqdm(data_batched):
    # Concatenate the doi and chunk-id columns separated by -, then convert to a list. 
    # Assign to ids.
    ids = (batch["doi"] + "-" + batch["chunk-id"]).to_list()
    
    # Get the chunk column and convert to a list. Assign to texts.
    texts = batch["chunk"].to_list()
    
    # Use the embeddings model to embed the texts. Assign to embeds.
    embeds = embed_model.embed_documents(texts)
    
    # Get the metadata from the batch. Assign to metadata.
    # Select the chunk, title, source columns
    # Apply the dict function to the columns axis
    # Convert to a list
    metadata = (
        batch[["chunk", "title", "source"]]
        .apply(dict, axis="columns")
        .to_list()
    )
    
    # Combine IDs, embeddings, and metadata as list of tuples. Assign to to_upsert.
    to_upsert = list(zip(ids, embeds, metadata))
    
    # Upsert to Pinecone
    index.upsert(vectors=to_upsert)

49it [10:16, 12.57s/it]


We can check that the vector index has been populated using `describe_index_stats` like before:

### Instructions

Check on updates to the vector index now that it contains the ArXiv dataset.

- View the index stats again.
- *What has changed since you last looked?*

In [40]:
# Get the index's descriptive statistics
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838,
 'vector_type': 'dense'}

## Task 5: Retrieval Augmented Generation

In the previous task we built a fully-fledged knowledge base. Now it's time to connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

### Workflow

* Create a LangChain `vectorstore` object using our `index` and `embed_model`.
* Try searching for relevant information about Llama 2.
* Create a function (`augment_prompt`) that can take our query, retrieve information using the `vectorstore`, and merge them all into a single retrieval-augmented prompt.
* Try asking the chatbot Llama 2 questions with and without RAG, comparing the differences.

To use LangChain's RAG pipeline we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We initialize it with our vector `index`, the embeddings model, and the name of the metadata field that holds the text.

### Instructions

Initialize the vector store object.

- From the `langchain_pinecone` package, import `PineconeVectorStore`.
- State the metadata field that contains our text (`"chunk"`). Assign to `text_field`.
- Create a `PineconeVectorStore` from the index, the embedding model, and the text field. Assign to `vectorstore`.

In [38]:
!pip install langchain-pinecone langchain-openai langchain



In [41]:
# From the langchain_pinecone package, import PineconeVectorStore
from langchain_pinecone import PineconeVectorStore

# Define the metadata field that contains our text ("chunk"). Assign to `text_field`.
text_field = "chunk"

# Create a PineconeVectorStore from the index, the embedding model, and the text field.
# Assign to vectorstore.
vectorstore = PineconeVectorStore(
    index=index, embedding=embed_model, text_key=text_field
)

Using this `vectorstore` we can already query the index and see if we have any relevant information given our question about Llama 2.

### Instructions

Perform similarity search against a question.

- Define a question. Assign to `query`.
    - Use the text `"What is so special about Llama 2?"`.
- Perform a similarity search for the query, returning the top 3 results.

<details>
<summary>Code hints</summary>
<p>
    
To perform a similarity search, call the vectorstore's `.similarity_search()` method, passing the query and setting `k` to the number of results to return.

</p>
</details>
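Conceptually, `similarity_search()` embeds the query with the same embeddings model and asks the index for the stored vectors closest to it under the index metric (cosine here). The brute-force sketch below illustrates only that ranking idea on toy 2-D vectors; Pinecone does this at scale with its own indexing, so treat it as an illustration, not how the service is implemented. `cosine_similarity` and `top_k` are our own helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print([i for i, _ in top_k([1.0, 0.1], docs, k=2)])  # [0, 2]
```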

In [42]:
# Define a new question
query = "What is so special about Llama 2?"

# Use similarity search, returning the top 3 results
vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun

We return a lot of text here and it's not that clear what we need or what is relevant. Fortunately, our LLM can parse this information much faster than we can. All we need is to connect the output from our `vectorstore` to our `chat` chatbot. To do that we can use the same logic as we used earlier.

### Instructions

Run the code to define a function to augment a prompt with knowledge base results.

In [43]:
# Define this function to augment the prompt with data from the vector database
def augment_prompt(query: str):
    results = vectorstore.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augmented_prompt = f"""Using the contexts below, answer the query. If some information is not provided within
the contexts below, do not include, and if the query cannot be answered with the below information, say "I don't know".

Contexts:
{source_knowledge}

Query: {query}"""
    return augmented_prompt
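`augment_prompt()` both retrieves and formats, which means you need a live Pinecone index just to check the prompt template. One possible refactor, sketched below under our own naming (`build_augmented_prompt` is hypothetical), separates the formatting step so it can be tested with plain strings. The template text is copied from the function above.

```python
from typing import Sequence


def build_augmented_prompt(query: str, contexts: Sequence[str]) -> str:
    """Format a RAG prompt from a query and already-retrieved context strings."""
    source_knowledge = "\n".join(contexts)
    return f"""Using the contexts below, answer the query. If some information is not provided within
the contexts below, do not include, and if the query cannot be answered with the below information, say "I don't know".

Contexts:
{source_knowledge}

Query: {query}"""


# Hypothetical version of augment_prompt() built on the helper:
# def augment_prompt(query: str) -> str:
#     results = vectorstore.similarity_search(query, k=3)
#     return build_augmented_prompt(query, [x.page_content for x in results])
```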

Using this we produce an augmented prompt:

In [45]:
# Display the augmented prompt for the query about Llama 2
augment_prompt(query)

'Using the contexts below, answer the query. If some information is not provided within\nthe contexts below, do not include, and if the query cannot be answered with the below information, say "I don\'t know".\n\nContexts:\nAlan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstitutefor

There is still a lot of text here, so let's pass it on to our chat model to see how it performs.

### Instructions

Ask GPT about Llama 2, augmenting the prompt with source knowledge from the Pinecone vector index.

- Create a new human message. Assign to `prompt`.
    - Call `augment_prompt()` on the query and use this as the content.
- Append the prompt to `messages`.
- Send the messages to GPT. Assign to `res`.
- Print the contents of the response.

In [46]:
# Define the augmented prompt as a human message. Assign to prompt.
prompt = HumanMessage(content=augment_prompt(query))

# Append the prompt to the list of messages
messages.append(prompt)

# Invoke a chat with the list of messages
res = chat.invoke(messages)

# Print the contents of the response
print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) with parameter scales ranging from 7 billion to 70 billion parameters. These fine-tuned LLMs, such as L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases. They have been shown to outperform open-source chat models on most benchmarks tested and have demonstrated strong performance in terms of helpfulness and safety based on human evaluations. This suggests that Llama 2 models may serve as suitable substitutes for closed-source models in certain applications. The development and release of Llama 2 represent advancements in AI alignment research and provide a transparent and reproducible approach to fine-tuning and safety considerations in language models.


We can continue with more Llama 2 questions. Let's try _without_ RAG first:

### Instructions

Ask GPT about Llama 2.

- Create a new human message. Assign to `prompt`.
    - Use the content `"What safety measures were used in the development of llama 2?"`.
- Invoke a chat with GPT sending the messages plus the prompt. Assign to `res`.
    - *Don't use `.append()` here, as we don't want to store the latest message in the conversation. Wrap the prompt in a list so it can be concatenated with the existing list of messages.*
- Print the contents of the response.
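The difference between `+` and `.append()` is plain Python list behaviour. In this sketch, strings stand in for LangChain message objects, and the names `history` and `prompt_text` are placeholders chosen so they don't clash with the notebook's `messages` and `prompt`.

```python
# Plain strings stand in for LangChain message objects here
history = ["system: You are a helpful assistant.", "human: Hi!"]
prompt_text = "human: What safety measures were used?"

# `+` builds a new list; `history` itself is left unchanged
one_off = history + [prompt_text]
print(len(history), len(one_off))  # 2 3

# .append() mutates `history` in place, so the prompt becomes part of the conversation
history.append(prompt_text)
print(len(history))  # 3
```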

In [47]:
# Create a new human message. Assign to prompt.
prompt = HumanMessage(content="What safety measures were used in the development of llama 2?")

# Invoke a chat with GPT, sending the messages plus the prompt.
# Assign to res. Don't use .append()!
res = chat.invoke(messages + [prompt])

# Print the contents of the response.
print(res.content)

In the development of Llama 2, safety measures were implemented to ensure the usability and safety of the models optimized for dialogue use cases. The safety measures included:

- Optimization for dialogue use cases: The Llama 2 LLMs were fine-tuned and optimized specifically for dialogue use cases, which likely involved training the models on datasets relevant to conversation and ensuring they perform well in dialogue scenarios.

- Human evaluations for helpfulness and safety: The models were evaluated by humans to assess their helpfulness and safety. This likely involved gathering feedback from human evaluators on how well the models performed in terms of providing useful responses and maintaining a safe interaction environment.

- Comparison with closed-source models: The Llama 2 models were compared with closed-source models to assess their performance and safety. This comparison likely involved evaluating how well the Llama 2 models aligned with human preferences and safety standa

The chatbot is able to respond about Llama 2 thanks to its conversational history stored in `messages`. However, it doesn't know anything about the safety measures themselves, as we have not provided it with that information via the RAG pipeline. Let's try again, but with RAG.

### Instructions

Ask GPT about Llama 2 again.

- Do the same thing again, but this time augment the prompt using `augment_prompt()`.

In [48]:
# Create another human message with the same question, augmenting the prompt
prompt = HumanMessage(content=augment_prompt("What safety measures were used in the development of llama 2?"))

# Invoke a chat with the list of messages + the latest prompt
res = chat.invoke(messages + [prompt])

# Print the contents of the response
print(res.content)

In the development of Llama 2, safety measures were implemented to increase the safety of the models. These safety measures included:

1. Using safety-specific data annotation and tuning: The models were annotated and tuned with safety in mind to ensure that they adhere to safety standards and considerations.

2. Conducting red-teaming: Red-teaming involves simulating adversarial attacks or potential risks to identify vulnerabilities in the models and improve their robustness.

3. Employing iterative evaluations: Continuous evaluations were conducted throughout the development process to assess the safety of the models and make necessary adjustments.

These safety measures were aimed at enhancing the safety of the fine-tuned LLMs and ensuring responsible development practices in the field of language model research.


We get a much better-informed response that includes several items missing from the previous non-RAG response, such as "safety-specific data annotation and tuning", "red-teaming", and "iterative evaluations".

## Summary

You built a chatbot that can answer questions about cutting-edge large language models!

In particular, you

- learned how to have a conversation with GPT by appending messages.
- saw how to provide context in a prompt to help GPT answer questions.
- set up a Pinecone database and added data to a vector index.
- retrieved text relevant to user questions.
- combined it all to create a chatbot that answered questions that GPT could not answer by itself.