# Building RAG Chatbots with LangChain
(taken from [Pinecone's examples repository](https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb))


In this example, we'll build an AI chatbot from start to finish. We will use LangChain, OpenAI, and the Pinecone vector database to build a chatbot capable of learning from the external world using Retrieval Augmented Generation (RAG).

We will be using a dataset sourced from the Llama 2 ArXiv paper and other related papers to help our chatbot answer questions about the latest and greatest in the world of GenAI.

By the end of the example we'll have a functioning chatbot and RAG pipeline that can hold a conversation and provide informative responses based on a knowledge base.

## Before you begin
You'll need an OpenAI API key and a Pinecone API key.

## Prerequisites
Before we start building our chatbot, we need to install some Python libraries. Here's a brief overview of what each library does:

**langchain**: This is a framework for developing applications powered by language models. We'll use it to chain together different language models and components for our chatbot.

**openai**: This is the official OpenAI Python client. We'll use it to interact with the OpenAI API and generate responses for our chatbot.

**datasets**: This library provides a vast array of datasets for machine learning. We'll use it to load our knowledge base for the chatbot.

**pinecone-client**: This is the official Pinecone Python client. We'll use it to interact with the Pinecone API and store our chatbot's knowledge base in a vector database.

You can install these libraries using pip like so:

In [None]:
%pip install -qU \
    langchain==0.0.292 \
    openai==0.28.0 \
    datasets==2.10.1 \
    pinecone-client==2.2.4 \
    tiktoken==0.5.1
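
Because the versions above are pinned, it can help to confirm that the environment actually matches them before continuing. The following is a minimal sketch using only the standard library; `PINS`, `installed_version`, and `find_mismatches` are names introduced here for illustration and are not part of any of the installed libraries.

```python
from importlib import metadata

# The pins from the install cell above.
PINS = {
    "langchain": "0.0.292",
    "openai": "0.28.0",
    "datasets": "2.10.1",
    "pinecone-client": "2.2.4",
    "tiktoken": "0.5.1",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    return {
        name: (expected, lookup(name))
        for name, expected in pins.items()
        if lookup(name) != expected
    }


# Prints one line per package that is missing or at a different version.
for name, (expected, found) in find_mismatches(PINS).items():
    print(f"{name}: expected {expected}, found {found}")
```

If nothing is printed, every pinned package is installed at the expected version.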


## Building a Chatbot (no RAG)
We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation. We do this by initializing a ChatOpenAI object. For this we do need an OpenAI API key.

In [7]:
import os
from langchain.chat_models import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or "YOUR_API_KEY"

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)
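
One caveat with the cell above: if the environment variable is missing, the placeholder string is passed on, and the problem would typically only surface as an authentication error at the first API call. A minimal sketch of a stricter lookup follows; `require_api_key` is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
import os


def require_api_key(name="OPENAI_API_KEY", env=None):
    """Return the key stored under `name`, or raise a clear error if it is unset or empty."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before running this example."
        )
    return key


# Demonstrated with an explicit mapping so the sketch runs without a real key.
print(require_api_key(env={"OPENAI_API_KEY": "sk-example"}))
```

The returned value could then be passed as `openai_api_key` when creating the `ChatOpenAI` object.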


Chats with OpenAI's gpt-3.5-turbo and gpt-4 chat models are typically structured (in plain text) like this:
```text
System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:
```

The final "Assistant:" without a response is what would prompt the model to continue the conversation. In the official OpenAI ChatCompletion endpoint these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
```
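
To make the link between the two representations concrete, here is a minimal sketch that renders a list of role dictionaries as the plain-text transcript shown earlier. The helper `render_transcript` and the `ROLE_LABELS` mapping are illustrative assumptions and not part of the OpenAI client; the API itself takes the list of dictionaries directly.

```python
# Illustrative only: `render_transcript` is a hypothetical helper for
# visualising a chat, not something the OpenAI client provides.
ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}


def render_transcript(messages):
    """Render role/content dicts as plain text, ending with an open Assistant turn."""
    lines = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # the open turn the model is asked to complete
    return "\n\n".join(lines)


openai_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."},
]

print(render_transcript(openai_messages))
```
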
In LangChain there is a slightly different format. We use three message objects like so:



In [9]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

The format is very similar; we've just swapped the role of "user" for HumanMessage, and the role of "assistant" for AIMessage.
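
The role mapping can be sketched in a few lines. The snippet below uses small stand-in dataclasses so that it runs without LangChain installed; they only mimic the `content` attribute of the real message classes, and `to_openai_dicts` is a hypothetical helper introduced here for illustration.

```python
from dataclasses import dataclass


# Stand-ins for LangChain's message classes so the sketch runs on its own.
# The real classes live in `langchain.schema`; only `content` is mimicked here.
@dataclass
class SystemMessage:
    content: str


@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str


# Which OpenAI role each message class corresponds to.
ROLE_BY_CLASS = {
    SystemMessage: "system",
    HumanMessage: "user",
    AIMessage: "assistant",
}


def to_openai_dicts(messages):
    """Convert message objects into OpenAI-style role/content dicts."""
    return [{"role": ROLE_BY_CLASS[type(m)], "content": m.content} for m in messages]


example = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
]

for d in to_openai_dicts(example):
    print(d["role"], "->", d["content"])
```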

We generate the next response from the AI by passing these messages to the ChatOpenAI object.



In [10]:
res = chat(messages)
res


AIMessage(content="Sure, I can help you with that. String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and their interactions. It proposes that the fundamental building blocks of the universe are not point-like particles, but rather tiny, vibrating strings.\n\nIn string theory, these strings can vibrate in different ways, producing different particles with different properties. The vibrations of these strings determine the mass, charge, and other fundamental characteristics of particles.\n\nOne of the key ideas in string theory is that it requires extra dimensions beyond the three spatial dimensions (length, width, and height) that we are familiar with. These extra dimensions are compactified or curled up in tiny, microscopic sizes that are not directly observable in our everyday experience.\n\nString theory also suggests the existence of different types of strings, such as closed loops or open-ended strings. Closed strings f

In response we get another AI message object. We can print it more clearly like so:


In [11]:
print(res.content)


Sure, I can help you with that. String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and their interactions. It proposes that the fundamental building blocks of the universe are not point-like particles, but rather tiny, vibrating strings.

In string theory, these strings can vibrate in different ways, producing different particles with different properties. The vibrations of these strings determine the mass, charge, and other fundamental characteristics of particles.

One of the key ideas in string theory is that it requires extra dimensions beyond the three spatial dimensions (length, width, and height) that we are familiar with. These extra dimensions are compactified or curled up in tiny, microscopic sizes that are not directly observable in our everyday experience.

String theory also suggests the existence of different types of strings, such as closed loops or open-ended strings. Closed strings form closed loops and give

Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)
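
The append-then-call pattern above can be wrapped in a small helper. The sketch below uses a stub model (`echo_model`, a hypothetical stand-in for the `chat` object) and plain tuples instead of LangChain messages so that it runs without an API key; the point is only how the history grows by two entries per turn.

```python
def echo_model(history):
    """Stub standing in for the real chat model: replies based on the last user message."""
    _, last_user_text = history[-1]
    return ("ai", f"(stub reply to: {last_user_text})")


def chat_turn(history, user_text, model=echo_model):
    """Append the user message, call the model on the full history, then append its reply."""
    history.append(("human", user_text))
    reply = model(history)
    history.append(reply)
    return reply


history = [("system", "You are a helpful assistant.")]
chat_turn(history, "I'd like to understand string theory.")
chat_turn(history, "Why do physicists believe it can produce a 'unified theory'?")

for role, text in history:
    print(f"{role}: {text}")
```

Since the full history is sent on every call, the model sees all earlier turns each time, which is what lets it resolve "it" in the second question.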


## Dealing with Hallucinations

We have our chatbot, but the knowledge of LLMs can be limited. The reason is that LLMs learn everything they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the *parametric knowledge* of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, such as the new (and very popular) Llama 2 LLM.

In [12]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [13]:
print(res.content)

I'm sorry, but I don't have any information on something called "Llama 2." It's possible that you might be referring to something specific that I'm not aware of. Could you please provide more context or clarify your question so that I can better assist you?


Our chatbot can no longer help us because it doesn't contain the information we need to answer the question. It was very clear from this answer that the LLM doesn't know the information, but sometimes an LLM may respond as if it does know the answer, and this can be very hard to detect.

OpenAI have since adjusted the behavior for this particular example as we can see below:


In [14]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [15]:
print(res.content)

I apologize, but I am not familiar with a specific technology called "LLMChain" within "LangChain." It's possible that you may be referring to a lesser-known or specialized aspect of a particular blockchain or cryptocurrency project. If you could provide more details or context about LLMChain or LangChain, I may be able to provide you with more specific information.


There is another way of feeding knowledge into LLMs. It is called *source knowledge* and it refers to any information fed into the LLM via the prompt. We can try that with the LLMChain question. We can take a description of this object from the LangChain documentation.

In [16]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [17]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""
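
As a reusable variant of the cell above, the prompt construction can be put in a function. This is a minimal sketch; `build_augmented_prompt` is a name introduced here, the template text mirrors the one used in this example, and the two context strings are shortened placeholders.

```python
def build_augmented_prompt(query, contexts):
    """Join the context passages and place them ahead of the query."""
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )


demo_prompt = build_augmented_prompt(
    "Can you tell me about the LLMChain in LangChain?",
    [
        "A LLMChain is the most common type of chain.",  # shortened placeholder
        "LangChain is a framework.",  # shortened placeholder
    ],
)
print(demo_prompt)
```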

Now we feed this into our chatbot as we did before.

In [18]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [19]:
print(res.content)

The LLMChain is a specific type of chain within the LangChain framework. LangChain is a framework for developing applications powered by language models. The LLMChain, short for "Language Model Chain," is the most common type of chain in LangChain.

The LLMChain consists of three main components: a PromptTemplate, a model (which can either be an LLM or a ChatModel), and an optional output parser. The purpose of the LLMChain is to take multiple input variables, format them using the PromptTemplate into a prompt, and pass that prompt to the language model.

The language model, whether it's an LLM or a ChatModel, generates a response based on the given prompt. Finally, the LLMChain uses the optional output parser to parse the output of the language model into a final format, if needed.

Overall, the LLMChain is designed to provide a modular and flexible way to connect a language model to other sources of data and enable interactions with the environment within the LangChain framework. It 

The quality of this answer is phenomenal. This is made possible by augmenting our query with external knowledge (source knowledge). There's just one problem: how do we get this information in the first place?

We learned in the previous chapters about Pinecone and vector databases. Well, they can help us here too. But first, we'll need a dataset.
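
Before bringing in a vector database, the retrieval idea can be illustrated with a toy example: score each passage against the query and keep the best matches as source knowledge. The sketch below uses simple word overlap rather than embeddings, so it is not how Pinecone works; `tokenize`, `retrieve`, and the sample passages are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, top_k=1):
    """Return the top_k passages sharing the most words with the query (toy scoring)."""
    query_words = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & tokenize(p)),
        reverse=True,
    )
    return ranked[:top_k]


# Sample passages written for this sketch, not taken from the dataset.
passages = [
    "Llama 2 is a collection of pretrained and fine-tuned large language models.",
    "String theory proposes that particles are tiny vibrating strings.",
    "A LLMChain consists of a PromptTemplate, a model, and an optional output parser.",
]

print(retrieve("What is so special about Llama 2?", passages))
```

A vector database plays the same role as `retrieve` here, but compares embedding vectors instead of counting shared words, which lets it match passages that use different wording from the query.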

## Importing the Data

In this step, we import our data using the Hugging Face Datasets library. Specifically, we will use the `jamescalam/llama-2-arxiv-papers` dataset, a collection of ArXiv papers that will serve as the external knowledge base for our chatbot.



In [30]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers",
    split="train"
)

dataset

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.