<a href="https://colab.research.google.com/github/Dhanyabahadur/project_notebooks/blob/main/Rag_chatbot_openai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Building RAG Chatbots with LangChain**

We will use LangChain, OpenAI, and the Pinecone vector database to build a chatbot that can learn from the external world using Retrieval Augmented Generation (RAG).

By the end of this notebook we'll have a functioning chatbot and RAG pipeline that can hold a conversation and provide informative responses based on a knowledge base.

**Requirements**

Before we start building our chatbot, we need to install some Python libraries. Here's a brief overview of what each library does:

* langchain: A framework for building applications powered by large language models. We'll use it to chain together the language model and the other components of our chatbot.

* openai: The official OpenAI Python client. We'll use it to interact with the OpenAI API and generate responses for our chatbot.

* datasets: Hugging Face's library for loading and processing datasets. We'll use it to load the knowledge base for our chatbot.

* pinecone-client: The official Pinecone Python client. We'll use it to interact with the Pinecone API and store our chatbot's knowledge base in a vector database.

* tiktoken: OpenAI's tokenizer. LangChain's OpenAIEmbeddings uses it to count the tokens in the text being embedded.

In [1]:
!pip install -qU \
     langchain==0.0.292 \
     openai==0.28.0 \
     datasets==2.10.1 \
     pinecone-client==2.2.4 \
     tiktoken==0.5.1


**Building a Chatbot (no RAG)**

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any RAG.

In [3]:
import os
from langchain.chat_models import ChatOpenAI

os.environ["OPENAI_API_KEY"] = os.getenv('OPENAI_API_KEY') or "YOUR_OPENAI_API_KEY"

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)

In [7]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I am great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand generative ai.")
]

We generate the next response from the AI by passing these messages to the ChatOpenAI object.
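Under the hood, ChatOpenAI converts each message into the role/content dictionary format used by OpenAI's chat API: SystemMessage becomes "system", HumanMessage becomes "user", and AIMessage becomes "assistant". The API keeps no memory between calls, so the full list is sent every time. The plain-Python sketch below only illustrates that mapping; Message and to_openai_messages are hypothetical stand-ins, not LangChain's implementation.

```python
from typing import Dict, List, NamedTuple


# Hypothetical stand-in for LangChain's message classes: a kind plus its text.
class Message(NamedTuple):
    kind: str  # "system", "human", or "ai"
    content: str


# Mapping from LangChain-style message kinds to OpenAI chat API roles.
ROLE_FOR_KIND = {"system": "system", "human": "user", "ai": "assistant"}


def to_openai_messages(messages: List[Message]) -> List[Dict[str, str]]:
    """Convert a conversation into the list of role/content dicts sent to the API."""
    return [{"role": ROLE_FOR_KIND[m.kind], "content": m.content} for m in messages]


conversation = [
    Message("system", "You are a helpful assistant."),
    Message("human", "Hi AI, how are you today?"),
    Message("ai", "I am great thank you. How can I help you?"),
    Message("human", "I'd like to understand generative ai."),
]

# The whole history is converted (and sent) on every call.
for entry in to_openai_messages(conversation):
    print(entry["role"], "->", entry["content"])
```

This is why the notebook keeps appending to messages: whatever is not in the list is invisible to the model on the next call.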

In [8]:
res = chat(messages)
res

AIMessage(content='Generative AI refers to a subset of artificial intelligence that involves generating new and original content. It can create various types of media, including images, text, music, and even videos. Generative AI models are typically trained on a large dataset and can then generate new content based on the patterns and styles learned during training.\n\nOne popular approach to generative AI is the use of generative adversarial networks (GANs). GANs consist of two components: a generator and a discriminator. The generator generates new content, while the discriminator tries to distinguish between the generated content and real content. Through an iterative process, the generator learns to produce increasingly realistic content, while the discriminator becomes better at identifying generated content.\n\nGenerative AI has been used in various applications, such as image synthesis, text generation, and even creating deepfake videos. It has also been used in creative applic

In [9]:
print(res.content)

Generative AI refers to a subset of artificial intelligence that involves generating new and original content. It can create various types of media, including images, text, music, and even videos. Generative AI models are typically trained on a large dataset and can then generate new content based on the patterns and styles learned during training.

One popular approach to generative AI is the use of generative adversarial networks (GANs). GANs consist of two components: a generator and a discriminator. The generator generates new content, while the discriminator tries to distinguish between the generated content and real content. Through an iterative process, the generator learns to produce increasingly realistic content, while the discriminator becomes better at identifying generated content.

Generative AI has been used in various applications, such as image synthesis, text generation, and even creating deepfake videos. It has also been used in creative applications, such as generat

Because res is just another AIMessage object, we can append it to messages, add another HumanMessage, and generate the next response in the conversation.

In [10]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe in Quantum Physics?"
)

# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Physicists believe in quantum physics because it is a highly successful and well-established scientific theory that accurately describes the behavior of the microscopic world. Quantum physics, also known as quantum mechanics, was developed in the early 20th century to explain the behavior of particles at the atomic and subatomic level.

There are several reasons why physicists have confidence in quantum physics:

1. Experimental evidence: Numerous experiments have been conducted that confirm the predictions of quantum physics. These experiments have been rigorously tested and repeatedly verified, providing strong evidence for the validity of the theory.

2. Mathematical consistency: Quantum physics is based on a set of mathematical equations that have been extensively tested and shown to be internally consistent. The mathematical formalism of quantum mechanics allows physicists to make precise predictions and calculations that align with experimental observations.

3. Predictive power:

**Dealing with Hallucinations**

We have our chatbot, but the knowledge of LLMs is limited. This is because an LLM learns everything it knows during training: it essentially compresses the "world" as seen in its training data into the model's internal parameters. We call this the parametric knowledge of the model.

By default, LLMs have no access to the external world.

The effect is clear when we ask an LLM about more recent information, such as the new (and very popular) Llama 2 LLM.

In [12]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Lllama 2?"
)

# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [13]:
print(res.content)

I'm sorry, but I am not familiar with "Lllama 2" specifically. It's possible that you might be referring to something specific, such as a game, a movie, or a scientific discovery. Could you please provide more context or clarify your question? That way, I can provide you with more accurate information.


Our chatbot can no longer help us: it doesn't have the information needed to answer the question. Here the answer makes it very clear that the LLM doesn't know, but sometimes an LLM will respond as if it does know the answer, and this can be very hard to detect.

For this particular example, OpenAI have since adjusted the model's behaviour: rather than answering as if it knew, it now says it doesn't, as we can see below:

In [14]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content = "Can you tell me about the LLMChain in LangChain?"
)

# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [15]:
print(res.content)

I apologize, but I couldn't find any specific information about an "LLMChain" or "LangChain" in my database. It's possible that these terms might be specific to a certain context or project that I'm not aware of. Without further information, I'm unable to provide you with details about the LLMChain in LangChain. Could you please provide more context or clarify your question?


There is another way of feeding knowledge into LLMs, called source knowledge: any information fed into the LLM via the prompt. We can try this with the LLMChain question, taking a description of this object from the LangChain documentation.

In [16]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which refers to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

We can feed this additional knowledge into our prompt, along with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [17]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this into our chatbot as before.

In [18]:
# Create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)

# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [19]:
print(res.content)

LLMChain is a type of chain within the LangChain framework. Chains in LangChain are modular components or other chains combined in a specific way to achieve a particular use case. LLMChain is the most common type of chain in LangChain.

An LLMChain consists of three main components: a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. The chain takes multiple input variables and uses the PromptTemplate to format them into a prompt. This prompt is then passed to the language model (LLM or ChatModel). Finally, the output of the language model is optionally parsed into a final format using the output parser.

The LangChain framework aims to enable powerful and differentiated applications by connecting language models to other data sources and allowing them to interact with their environment. The LLMChain, as part of this framework, provides a structured way to utilize language models within applications developed using LangChain.


The quality of this answer is phenomenal. This is made possible by augmenting our query with external knowledge (source knowledge). There's just one problem: how do we get that information in the first place?

**Importing the Data**

In this task, we will import our data using the Hugging Face datasets library. Specifically, we will use the "jamescalam/llama-2-arxiv-papers-chunked" dataset, which contains a collection of ArXiv papers that will serve as the external knowledge base for our chatbot.

In [20]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset


Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

In [21]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

**Dataset Overview**

The dataset is a collection of academic papers related to Llama 2 from ArXiv, an open-access repository of electronic preprints that are approved for posting after moderation but are not peer reviewed. Each entry in the dataset represents a "chunk" of text from one of these papers.

Because most LLMs only contain knowledge of the world as it was during training, they cannot answer our questions about Llama 2 without access to papers like these.
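Splitting papers into chunks helps keep each piece small enough to embed and to fit into a prompt. We don't know exactly how this dataset was split, but the general idea can be sketched as a sliding window over the characters of a paper. chunk_text and its size and overlap values are hypothetical, chosen for illustration rather than taken from the dataset.

```python
from typing import List


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into windows of `size` characters; neighbouring windows share `overlap` characters.

    Hypothetical chunker for illustration only; not the method used to build the dataset.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


paper = "abcdefghij" * 3  # 30-character stand-in for a paper's text
for chunk in chunk_text(paper, size=12, overlap=4):
    print(repr(chunk))
```

The overlap reduces the chance that text cut at a chunk boundary loses its surrounding context, at the cost of storing some characters twice.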

**Task: Building the Knowledge Base**

We now have a dataset that can serve as our chatbot's knowledge base. Our next task is to transform it into a knowledge base our chatbot can search. To do this we need an embedding model and a vector database.
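Conceptually, the vector database stores one embedding per chunk and, given the embedding of a query, returns the k stored vectors most similar to it. Since the index we create below uses metric='cosine', similarity here means cosine similarity: the dot product of two vectors divided by the product of their lengths. The brute-force sketch below shows the idea with toy 3-dimensional vectors. A real vector database such as Pinecone uses specialised indexes rather than scoring every stored vector, so this illustrates the metric, not Pinecone's implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector and return the k best (id, score) pairs."""
    scored = [(vec_id, cosine(query, vec)) for vec_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; the ids mimic the doi-chunk-id scheme used later.
store = {
    "paper-a-0": [0.9, 0.1, 0.0],
    "paper-a-1": [0.7, 0.3, 0.1],
    "paper-b-0": [0.0, 0.2, 0.9],
}

for vec_id, score in top_k([1.0, 0.0, 0.0], store, k=2):
    print(vec_id, round(score, 3))
```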

In [22]:
import pinecone

pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY') or 'YOUR_PINECONE_API_KEY',
    environment=os.environ.get('PINECONE_ENVIRONMENT') or 'gcp-starter'
)

In [23]:
import time

index_name = 'llama-2-rag'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

index = pinecone.Index(index_name)

In [24]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

In [25]:
from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

Using this model we can create embeddings like so:

In [26]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)

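From this we get two 1536-dimensional embeddings, one for each of our two chunks of text. This matches the dimension=1536 we used when creating the Pinecone index; the embedding size and the index dimension must agree.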

We're now ready to embed and index all our data! We do this by looping through our dataset and embedding and inserting everything in batches.
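The loop below slices the dataframe with data.iloc[i:i_end], where i_end is clipped to the number of rows. That slicing arithmetic can be checked on its own with the sketch below; batch_bounds is a hypothetical helper that yields the same (i, i_end) pairs as the loop. With the 4,838 rows shown earlier and a batch size of 100, it produces 49 batches, the last one holding the remaining 38 rows.

```python
from typing import Iterator, Tuple


def batch_bounds(n_rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) slice bounds covering n_rows in steps of batch_size."""
    for start in range(0, n_rows, batch_size):
        # the final batch is clipped so it never runs past the last row
        yield start, min(n_rows, start + batch_size)


bounds = list(batch_bounds(4838, 100))
print(len(bounds), bounds[0], bounds[-1])  # 49 (0, 100) (4800, 4838)
```

Batching keeps each embedding call and each upsert to a bounded size, instead of sending one request per row or the whole dataset at once.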

In [27]:
from tqdm.auto import tqdm # for progress bar

data = dataset.to_pandas() # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone as a list of (id, vector, metadata) tuples
    index.upsert(vectors=list(zip(ids, embeds, metadata)))


In [28]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.048,
 'namespaces': {'': {'vector_count': 4800}},
 'total_vector_count': 4800}

**RAG**

In [29]:
from langchain.vectorstores import Pinecone

text_field = "text" # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)



In [30]:
query = "What is so special about Llama 2?"

vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun

We return a lot of text here, and it's not clear what we need or what is relevant. Fortunately, our LLM can parse this information much faster than we can. All we need to do is connect the output from our vector store to our chatbot, using the same logic we used earlier.

In [31]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from teh results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt
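Because the f-string in augment_prompt is indented along with the function body, those leading spaces become part of the prompt, as the printed output below shows. The model copes with this, but if you prefer a clean prompt, one option is to assemble it from flush-left pieces. build_prompt is a hypothetical helper that takes the retrieved texts directly, so it can be tried without Pinecone.

```python
from typing import List


def build_prompt(query: str, contexts: List[str]) -> str:
    """Assemble the augmented prompt without picking up the function's indentation."""
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        "Contexts:\n"
        f"{source_knowledge}\n\n"
        f"Query: {query}"
    )


print(build_prompt(
    "What is so special about Llama 2?",
    ["first retrieved chunk", "second retrieved chunk"],
))
```

Inside augment_prompt this would amount to returning build_prompt(query, [x.page_content for x in results]).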

In [32]:
print(augment_prompt(query))

Using the contexts below, answer the query.
    
    Contexts:
    Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety
asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyﬁne-tunedtoalign

There is still a lot of text here, so let's pass it to our chat model and see how it performs.

In [33]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

According to the provided context, Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. These fine-tuned LLMs, named L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases. They outperform open-source chat models on most benchmarks tested and are considered as potential substitutes for closed-source models based on evaluations for helpfulness and safety.

The special features of Llama 2 include:

1. Scale: Llama 2 comprises LLMs of varying sizes, ranging from 7 billion to 70 billion parameters. Larger models often have improved performance and capabilities.

2. Fine-tuning: The LLMs in Llama 2 have undergone a process of fine-tuning, which is a technique used to optimize models for specific tasks or use cases. In this case, Llama 2 models are fine-tuned for dialogue applications, making them well-suited for conversational interactions.

3. Performance: Llama 2 m

In [34]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

In the development of Llama 2, safety measures were taken to ensure the models' safety and improve their usability. These measures include safety-specific data annotation and tuning, conducting red-teaming, and employing iterative evaluations.

Safety-specific data annotation and tuning involve carefully selecting and curating the data used to train and fine-tune the models. This process aims to reduce biases and enhance the models' safety by ensuring that they align with human preferences and ethical guidelines.

Red-teaming refers to the practice of having independent evaluators or reviewers examine the models for potential safety issues. It involves testing the models from different perspectives to identify any vulnerabilities or risks.

Iterative evaluations involve continuously assessing and monitoring the models' performance and safety throughout their development. This approach allows for ongoing improvements and refinements to ensure the models meet the desired safety standards