# Question 2: AI Community
## Akshat Kumar
## 22B4513

## Building RAG chatbots with LangChain and Qdrant

## Using Publicly Available LLMs to Enhance Markdown Explanations

I used publicly available large language models (LLMs) to help me write clear and effective markdown explanations for my code. Here's the process:

### Prompt
"I write my own explanation of the code."

### Logic of the Code
- **Passing the Code**: The code is passed to the language model as input.
- **Generating Explanations**: The language model processes the input code and generates a detailed and well-structured explanation in markdown format.
- **Review and Refinement**: The generated explanation is reviewed and refined as needed to ensure clarity and accuracy.

This approach leverages the capabilities of LLMs to improve the quality and readability of code documentation.



### Install libraries
```shell
!pip install -U   \
    langchain     \
    openai        \
    datasets      \
    qdrant-client \
    tiktoken
```

I have found that `!pip install dotenv` doesn't work well, so install `python-dotenv` this way instead:

In [None]:
!python -m pip install python-dotenv

In [None]:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
load_dotenv('./.env')

True

### Enter Your OpenAI API Key

In [None]:
chat = ChatOpenAI(
    model='gpt-3.5-turbo',
    openai_api_key='your-openai-api-key'  # Replace your-openai-api-key with your own API key to run the notebook
)
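
Hard-coding a key in a notebook makes it easy to leak. Since `load_dotenv` has already been called, one option is to read the key from the environment instead. A minimal sketch, assuming the key is stored under the name `OPENAI_API_KEY`; `require_env` is a hypothetical helper, not part of LangChain or python-dotenv, and the demo variable below is a placeholder:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value

# Demonstration with a placeholder variable (hypothetical name and value).
os.environ["DEMO_API_KEY"] = "placeholder-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook this could be used as `ChatOpenAI(model='gpt-3.5-turbo', openai_api_key=require_env('OPENAI_API_KEY'))`, so a missing key fails early with a readable error.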

### Code Explanation

The provided code snippet demonstrates how to create a series of messages between a human and an AI assistant using the `langchain.schema` module. Here’s a brief explanation of each part:

1. **Importing Classes**:
   - `SystemMessage`: Represents a message from the system, often used to set the context or initial instructions for the AI.
   - `HumanMessage`: Represents a message from the human user.
   - `AIMessage`: Represents a response from the AI.

2. **Creating Messages**:
   - The `messages` list contains a sequence of interactions, starting with a `SystemMessage` to set the context.
   - The first `HumanMessage` is a greeting from the user: "Hi AI, how are you today?"
   - The `AIMessage` provides a friendly response from the AI: "I'm great thank you. How can I help you?"
   - The second `HumanMessage` indicates the user’s request: "I'd like to understand machine learning."

This sequence of messages sets up a conversational context where the AI assistant is positioned as helpful and ready to provide information about machine learning.


In [None]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand machine learning.")
]

The following code snippet continues the conversation with the AI assistant and appends the AI's response to the message list:

1. **Invoking the AI**:
   - `res = chat.invoke(messages)`: Sends the current list of messages to the AI and stores the response in `res`.

2. **Appending the Response**:
   - `messages.append(res)`: Adds the AI's response to the `messages` list for further interactions.

This allows for an ongoing conversation where each new message is appended to the existing list.


In [None]:
res = chat.invoke(messages)
res

AIMessage(content='Machine learning is a subset of artificial intelligence that involves developing algorithms and statistical models that allow computers to learn from and make predictions or decisions based on data. The main goal of machine learning is to enable computers to automatically learn and improve from experience without being explicitly programmed. There are various types of machine learning algorithms, such as supervised learning, unsupervised learning, and reinforcement learning, each serving different purposes and applications. Do you have any specific questions about machine learning that I can help clarify for you?')

In [None]:
# Printing AI Response
print(res.content)

Machine learning is a subset of artificial intelligence that involves developing algorithms and statistical models that allow computers to learn from and make predictions or decisions based on data. The main goal of machine learning is to enable computers to automatically learn and improve from experience without being explicitly programmed. There are various types of machine learning algorithms, such as supervised learning, unsupervised learning, and reinforcement learning, each serving different purposes and applications. Do you have any specific questions about machine learning that I can help clarify for you?


In [None]:
# Remember the new conversation
messages.append(res)

In [None]:
# Giving New Prompt and appending it
prompt = HumanMessage(
    content="Whats the difference between supervised and unsupervised?"
)
messages.append(prompt)

In [None]:
res = chat.invoke(messages)

In [None]:
# Printing the AI response
print(res.content)

Supervised and unsupervised learning are two main types of machine learning approaches that involve different methods of training models and making predictions:

1. Supervised learning: In supervised learning, the algorithm is trained on a labeled dataset, where each example in the dataset is associated with a target output or label. The goal is for the model to learn a mapping between the input data and the corresponding output labels. During training, the model adjusts its parameters based on the difference between its predictions and the true labels in order to minimize the prediction error. Once the model is trained, it can make predictions on new, unseen data by generalizing from the labeled training examples.

Examples of supervised learning tasks include classification (predicting discrete class labels) and regression (predicting continuous values). Common algorithms used in supervised learning include linear regression, logistic regression, support vector machines, decision tre

In [None]:
# add latest response to messages
messages.append(res)

# create a new user prompt
prompt = HumanMessage(
    content="What is so special about Mistral 7B?"
)
# append to messages
messages.append(prompt)

# send to GPT
res = chat.invoke(messages)

In [None]:
print(res.content)

Mistral 7B is a high-performance computing system developed by Atos, a global leader in digital transformation. The Mistral 7B supercomputer is notable for its advanced capabilities and specifications that make it suitable for demanding computational tasks across various fields, such as scientific research, engineering simulations, weather forecasting, and AI applications. Some of the key features and highlights of Mistral 7B include:

1. High computational power: Mistral 7B is equipped with a powerful combination of processors, memory, and storage systems that enable it to perform complex calculations and simulations at a rapid pace. Its high-performance architecture allows for efficient processing of large datasets and computation-intensive tasks.

2. Scalability and flexibility: The Mistral 7B supercomputer is designed to be scalable, allowing users to expand its computational capacity as needed. It offers flexibility in terms of configuring and optimizing the system for specific wo
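
The cells above repeat the same three steps for every turn: append the user prompt, invoke the model, append the reply. A minimal sketch of wrapping that pattern in a helper is shown here. To keep it runnable offline, `Message` and `EchoChat` are hypothetical stand-ins for LangChain's message classes and `ChatOpenAI`; only the `.invoke(messages)` call shape is taken from the notebook.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "system", "human", or "ai"
    content: str

class EchoChat:
    """Hypothetical offline stand-in for a chat model exposing .invoke(messages)."""
    def invoke(self, messages):
        # A real model would generate text; this one just echoes the last message.
        return Message("ai", f"(reply to: {messages[-1].content})")

def ask(chat, messages, text):
    """Append the user's text, call the model, remember the reply, return its content."""
    messages.append(Message("human", text))
    reply = chat.invoke(messages)
    messages.append(reply)
    return reply.content

history = [Message("system", "You are a helpful assistant.")]
answer = ask(EchoChat(), history, "What is RAG?")
print(answer)
print([m.role for m in history])
```

Because the helper appends both the prompt and the reply, the history is always ready for the next turn and the separate `messages.append(res)` cells become unnecessary.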

## Testing the Chatbot with Uncommon Questions

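We can see that the chatbot answers common ML questions well. Its answer about Mistral 7B, however, is wrong: it describes a supercomputer rather than the language model, presumably because Mistral 7B is not part of GPT-3.5's training data. Now, let's ask about something else that GPT-3.5 was not trained on and see the results.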


In [None]:
# add latest response to messages
messages.append(res)

# create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# append to messages
messages.append(prompt)

# send to GPT
res = chat.invoke(messages)

In [None]:
print(res.content)

I'm sorry, but I couldn't find specific information about an "LLMChain" in relation to LangChain. It's possible that the term or concept you are referring to is either new, specialized, or not widely recognized in the public domain. If you can provide more context or details about LLMChain or LangChain, I'll do my best to assist you further or provide information based on the details you provide.


## Adding LLMChain Information Using RAG

As we can see, this chatbot doesn't have any information related to `LLMChain`. So, let's now add this information using Retrieval-Augmented Generation (RAG).

### Code Explanation

The following code snippet demonstrates how to enhance the chatbot's knowledge by providing it with specific information about `LLMChain` using RAG. The process involves augmenting the chatbot's prompt with additional context before querying the model.

1. **Define the Information**:
   - `llmchain_information`: A list of strings containing detailed information about `LLMChain`.
   - This information covers what `LLMChain` is, its components, and the purpose of the LangChain framework.

2. **Combine the Information**:
   - `source_knowledge = "\n".join(llmchain_information)`: Combines the list of information strings into a single string, separated by newline characters.

3. **Create the Augmented Prompt**:
   - `query`: The specific question we want to ask the chatbot.
   - `augmented_prompt`: Combines the source knowledge with the query in a structured format. The prompt instructs the model to use the provided contexts to answer the question.

4. **Create the HumanMessage**:
   - `prompt = HumanMessage(content=augmented_prompt)`: Constructs a message with the augmented prompt, simulating a human query with additional context.

5. **Append the Prompt and Invoke the Chatbot**:
   - `messages.append(prompt)`: Adds the augmented prompt to the list of messages.
   - `res = chat.invoke(messages)`: Sends the updated list of messages to the chatbot and stores the response in `res`.

This approach uses RAG to improve the chatbot's ability to provide accurate and detailed information about topics it was not originally trained on.



In [None]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

In [None]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below to answer the question.

Contexts:
{source_knowledge}

Question: {query}"""

In [None]:
prompt = HumanMessage(
    content=augmented_prompt
)

messages.append(prompt)

res = chat.invoke(messages)

In [None]:
print(res.content)

In the context of LangChain, the LLMChain is a common type of chain that plays a crucial role in the framework for developing applications powered by language models. The LLMChain consists of three main components: a PromptTemplate, a model (which can be an LLM or a ChatModel), and an optional output parser. 

Here's how the LLMChain works within the LangChain framework:
1. Input variables are passed to the LLMChain.
2. The PromptTemplate formats these input variables into a prompt.
3. The formatted prompt is then passed to the model (LLM or ChatModel) for processing.
4. The model generates an output based on the input prompt.
5. Optionally, the output parser can be used to further process and format the output from the language model into a final usable format.

The LLMChain is a key component within LangChain that facilitates the interaction between input data, language models, and output processing. By leveraging the LLMChain within the LangChain framework, developers can build powe

Thanks to RAG, the chatbot can now answer the question about `LLMChain` in LangChain, even though it initially could not.


In [None]:
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-3-small")

In [None]:
texts = [
    'this is one chunk',
    'this is the second chunk of text'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)
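
Each text is now a 1536-dimensional vector, and retrieval works by comparing such vectors. A minimal sketch of cosine similarity, a common comparison for text embeddings, follows. Whether a given vector store uses cosine or another distance is a configuration detail, so treat the choice here as an assumption; the 2-dimensional vectors are toy values, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same = cosine_similarity([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated direction
print(same, orthogonal)
```

Texts with similar meaning should produce vectors pointing in similar directions, so their score is closer to 1 than the score of unrelated texts.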

## Loading RAG Dataset
We are going to use the Mistral 7B paper as the RAG data: [Link](http://arxiv.org/pdf/2310.06825)

In [None]:
from datasets import load_dataset

# Loading the dataset
dataset = load_dataset("infoslack/mistral-7b-arxiv-paper-chunked", split="train")

dataset

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 25
})

In [None]:
dataset[0]

{'doi': '2310.06825',
 'chunk-id': '0',
 'chunk': 'Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,\nDevendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,\nGuillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,\nPierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,\nWilliam El Sayed\nAbstract\nWe introduce Mistral 7B, a 7–billion-parameter language model engineered for\nsuperior performance and efficiency. Mistral 7B outperforms the best open 13B\nmodel (Llama 2) across all evaluated benchmarks, and the best released 34B\nmodel (Llama 1) in reasoning, mathematics, and code generation. Our model\nleverages grouped-query attention (GQA) for faster inference, coupled with sliding\nwindow attention (SWA) to effectively handle sequences of arbitrary length with a\nreduced inference cost. We also provide a model fine-tuned to follow instructions,\nMistral 7B – Instruct, that surpasses Llama 2

In [None]:
# Converting dataset to pandas
data = dataset.to_pandas()

In [None]:
data.head()

| | doi | chunk-id | chunk | id | title | summary | source | authors | categories | comment | journal_ref | primary_category | published | updated | references |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2310.06825 | 0 | Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayr... | 2310.06825 | Mistral 7B | We introduce Mistral 7B v0.1, a 7-billion-para... | http://arxiv.org/pdf/2310.06825 | [Albert Q. Jiang, Alexandre Sablayrolles, Arth... | [cs.CL, cs.AI, cs.LG] | Models and code are available at\n https://mi... | | cs.CL | 20231010 | 20231010 | [{'id': '1808.07036'}, {'id': '1809.02789'}, {... |
| 1 | 2310.06825 | 1 | automated benchmarks. Our models are released ... | 2310.06825 | Mistral 7B | We introduce Mistral 7B v0.1, a 7-billion-para... | http://arxiv.org/pdf/2310.06825 | [Albert Q. Jiang, Alexandre Sablayrolles, Arth... | [cs.CL, cs.AI, cs.LG] | Models and code are available at\n https://mi... | | cs.CL | 20231010 | 20231010 | [{'id': '1808.07036'}, {'id': '1809.02789'}, {... |
| 2 | 2310.06825 | 2 | GQA significantly accelerates the inference sp... | 2310.06825 | Mistral 7B | We introduce Mistral 7B v0.1, a 7-billion-para... | http://arxiv.org/pdf/2310.06825 | [Albert Q. Jiang, Alexandre Sablayrolles, Arth... | [cs.CL, cs.AI, cs.LG] | Models and code are available at\n https://mi... | | cs.CL | 20231010 | 20231010 | [{'id': '1808.07036'}, {'id': '1809.02789'}, {... |
| 3 | 2310.06825 | 3 | Mistral 7B takes a significant step in balanci... | 2310.06825 | Mistral 7B | We introduce Mistral 7B v0.1, a 7-billion-para... | http://arxiv.org/pdf/2310.06825 | [Albert Q. Jiang, Alexandre Sablayrolles, Arth... | [cs.CL, cs.AI, cs.LG] | Models and code are available at\n https://mi... | | cs.CL | 20231010 | 20231010 | [{'id': '1808.07036'}, {'id': '1809.02789'}, {... |
| 4 | 2310.06825 | 4 | parameters of the architecture are summarized ... | 2310.06825 | Mistral 7B | We introduce Mistral 7B v0.1, a 7-billion-para... | http://arxiv.org/pdf/2310.06825 | [Albert Q. Jiang, Alexandre Sablayrolles, Arth... | [cs.CL, cs.AI, cs.LG] | Models and code are available at\n https://mi... | | cs.CL | 20231010 | 20231010 | [{'id': '1808.07036'}, {'id': '1809.02789'}, {... |


In [None]:
docs = data[['chunk', 'source']]
docs.head()

| | chunk | source |
|---|---|---|
| 0 | Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayr... | http://arxiv.org/pdf/2310.06825 |
| 1 | automated benchmarks. Our models are released ... | http://arxiv.org/pdf/2310.06825 |
| 2 | GQA significantly accelerates the inference sp... | http://arxiv.org/pdf/2310.06825 |
| 3 | Mistral 7B takes a significant step in balanci... | http://arxiv.org/pdf/2310.06825 |
| 4 | parameters of the architecture are summarized ... | http://arxiv.org/pdf/2310.06825 |


## RAG
We create an instance of the `DataFrameLoader` class called `loader`. We provide two arguments to initialize the loader:

**docs:** The DataFrame containing the documents we want to load. Each row of the DataFrame represents one document.

**page_content_column="chunk":** The name of the DataFrame column that holds the text of each document. Here, the content is stored in the "chunk" column. The remaining columns (here, "source") become each document's metadata.

In [None]:
from langchain_community.document_loaders import DataFrameLoader

loader = DataFrameLoader(docs, page_content_column="chunk")
documents = loader.load()

In [None]:
documents[0]

Document(page_content='Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,\nDevendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,\nGuillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,\nPierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,\nWilliam El Sayed\nAbstract\nWe introduce Mistral 7B, a 7–billion-parameter language model engineered for\nsuperior performance and efficiency. Mistral 7B outperforms the best open 13B\nmodel (Llama 2) across all evaluated benchmarks, and the best released 34B\nmodel (Llama 1) in reasoning, mathematics, and code generation. Our model\nleverages grouped-query attention (GQA) for faster inference, coupled with sliding\nwindow attention (SWA) to effectively handle sequences of arbitrary length with a\nreduced inference cost. We also provide a model fine-tuned to follow instructions,\nMistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on hu

In [None]:
documents[0].metadata

{'source': 'http://arxiv.org/pdf/2310.06825'}
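
Conceptually, the loader maps each row to a document: the chosen column becomes the page content and the remaining columns become metadata, as the two outputs above show. A rough sketch of that mapping in plain Python is given here. `Doc` and `rows_to_docs` are hypothetical names used for illustration, not LangChain's implementation, and the row content is shortened toy data.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)

def rows_to_docs(rows, page_content_column="chunk"):
    """Turn row dictionaries into Doc objects, keeping other columns as metadata."""
    docs = []
    for row in rows:
        metadata = {k: v for k, v in row.items() if k != page_content_column}
        docs.append(Doc(page_content=row[page_content_column], metadata=metadata))
    return docs

rows = [
    {"chunk": "first chunk of text", "source": "http://arxiv.org/pdf/2310.06825"},
    {"chunk": "second chunk of text", "source": "http://arxiv.org/pdf/2310.06825"},
]
toy_docs = rows_to_docs(rows)
print(toy_docs[0].page_content)
print(toy_docs[0].metadata)
```

Keeping the source in the metadata means a retrieved chunk can later be traced back to the paper it came from.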

## Setting Up Qdrant Vector Store

In this code snippet, we configure the Qdrant vector store for our documents with LangChain and integrate it with OpenAI's text embeddings.

### Step 1: Importing Required Modules

We start by importing the necessary modules:

- `Qdrant` from `langchain_community.vectorstores` module
- `OpenAIEmbeddings` from `langchain_openai` module

### Step 2: Initializing OpenAI Embeddings

Next, we initialize the OpenAI embeddings model. Here, we're using the "text-embedding-3-small" model.

### Step 3: Retrieving Qdrant Configuration

We retrieve the Qdrant URL and API key from the environment variables.

### Step 4: Creating Qdrant Vector Store

We then create an instance of the Qdrant vector store using the `Qdrant.from_documents` method:

- `documents`: This represents the documents we want to store in Qdrant.
- `embedding`: This specifies the embeddings model to be used for encoding the documents.
- `url`: This is the URL endpoint for the Qdrant service.
- `collection_name`: This is the name of the collection where the documents will be stored in Qdrant.
- `api_key`: This is the API key required for accessing the Qdrant service.

This code demonstrates how to set up and configure a Qdrant vector store for efficient storage and retrieval of documents using LangChain and OpenAI's text embeddings.


**Note:** To run this block of code, you must set `QDRANT_URL` and `QDRANT_KEY` in your environment (for example, in the `.env` file loaded earlier).

In [None]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Replace with your own information

url = os.getenv("QDRANT_URL")
api_key = os.getenv("QDRANT_KEY")

qdrant = Qdrant.from_documents(
    documents=documents,
    embedding=embeddings,
    url=url,
    collection_name="chatbot",
    api_key=api_key
)

Setting up the query and running a similarity search:

In [None]:
query = "What is so special about Mistral 7B?"
qdrant.similarity_search(query, k=3)

[Document(page_content='Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,\nDevendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,\nGuillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,\nPierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,\nWilliam El Sayed\nAbstract\nWe introduce Mistral 7B, a 7–billion-parameter language model engineered for\nsuperior performance and efficiency. Mistral 7B outperforms the best open 13B\nmodel (Llama 2) across all evaluated benchmarks, and the best released 34B\nmodel (Llama 1) in reasoning, mathematics, and code generation. Our model\nleverages grouped-query attention (GQA) for faster inference, coupled with sliding\nwindow attention (SWA) to effectively handle sequences of arbitrary length with a\nreduced inference cost. We also provide a model fine-tuned to follow instructions,\nMistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on h
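
Conceptually, `similarity_search(query, k=3)` embeds the query and returns the `k` stored documents whose vectors score highest against it. A toy sketch of that idea is shown below, with hand-made 2-dimensional unit vectors standing in for real embeddings. `dot`, `top_k`, and the example texts are hypothetical. This is not how Qdrant works internally, since a real vector database uses approximate nearest-neighbour indexes rather than a full scan.

```python
import heapq

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, index, k=3):
    """Return the texts of the k entries in `index` that score highest for the query."""
    best = heapq.nlargest(k, index, key=lambda item: dot(query_vec, item[1]))
    return [text for text, _ in best]

# Toy index of (text, unit vector) pairs.
index = [
    ("grouped-query attention speeds up inference", [1.0, 0.0]),
    ("author list", [0.0, 1.0]),
    ("sliding window attention handles long sequences", [0.8, 0.6]),
]
hits = top_k([1.0, 0.0], index, k=2)
print(hits)
```

Raising `k` returns more context for the prompt at the cost of a longer, more expensive request.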

## Custom Prompt Generation Function

This function, `custom_prompt(query: str)`, generates a custom prompt for answering a query using the Qdrant similarity search results.

### Function Explanation

The function takes a single argument `query`, which represents the query string for which we want to generate a prompt.

### Step 1: Similarity Search

The function performs a similarity search using Qdrant for the given query string. It retrieves the top 3 most similar documents based on the query.

### Step 2: Extracting Source Knowledge

Next, the function extracts the textual content of the top 3 similar documents to use as the source knowledge for answering the query.

### Step 3: Generating Prompt

The function constructs a prompt by combining the source knowledge with the query string.

### Step 4: Returning Prompt

Finally, the function returns the generated prompt.

This function facilitates the generation of custom prompts for answering queries based on the similarity search results from Qdrant.


In [None]:
def custom_prompt(query: str):
    results = qdrant.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augment_prompt = f"""Using the contexts below, answer the query:

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augment_prompt

In [None]:
print(custom_prompt(query))

Using the contexts below, answer the query:

    Contexts:
    Mistral 7B
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,
Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,
Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,
Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,
William El Sayed
Abstract
We introduce Mistral 7B, a 7–billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B – Instruct, that surpasses Llama 2 1
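
The printed prompt above shows `Contexts:` indented by four spaces while most of the retrieved text is not, because the triple-quoted f-string inherits the indentation of the function body. A minimal sketch of one way to avoid this is to dedent the template once and fill it in afterwards. `PROMPT_TEMPLATE` and `format_prompt` are hypothetical names, and the contexts are passed in as plain strings so the example runs without Qdrant.

```python
from textwrap import dedent

# Dedent the template before substitution, so multi-line contexts cannot
# interfere with the common-indentation detection.
PROMPT_TEMPLATE = dedent("""\
    Using the contexts below, answer the query:

    Contexts:
    {source_knowledge}

    Query: {query}""")

def format_prompt(contexts, query):
    """Join the retrieved texts and insert them into the dedented template."""
    source_knowledge = "\n".join(contexts)
    return PROMPT_TEMPLATE.format(source_knowledge=source_knowledge, query=query)

example = format_prompt(["first context\nspanning two lines", "second context"],
                        "What is so special about Mistral 7B?")
print(example)
```

In the notebook, the `contexts` argument would be `[x.page_content for x in results]` from the similarity search.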

In [None]:
prompt = HumanMessage(
    content=custom_prompt(query)
)

messages.append(prompt)

res = chat.invoke(messages)

print(res.content)

Mistral 7B is a high-performance language model engineered by the Mistral AI team to achieve superior performance and efficiency. Here are some key features that make Mistral 7B stand out:

1. **Model Size**: Mistral 7B is a 7-billion-parameter language model, making it one of the largest language models in existence. Its size allows for capturing complex patterns and relationships in language data, leading to better performance in various natural language processing tasks.

2. **Performance**: Mistral 7B outperforms other large language models, such as the 13-billion-parameter Llama 2 model and the 34-billion-parameter LLaMa 34B model, across multiple benchmarks. It excels in tasks related to reasoning, mathematics, code generation, and following instructions.

3. **Efficiency**: Mistral 7B is designed to balance high performance with efficiency. It leverages innovative attention mechanisms, such as grouped-query attention (GQA) and sliding window attention (SWA), to accelerate infere

With RAG in place, the chatbot receives the most relevant chunks of the Mistral 7B paper together with the query, and its response now correctly describes Mistral 7B as a language model.

## Summary

First, we used a plain GPT-3.5 model to create a basic chatbot. It answered general machine learning (ML) questions easily and accurately. However, when asked about Mistral 7B, it responded fluently but inaccurately, describing a supercomputer instead of the language model. It also failed when asked about LangChain's `LLMChain`.

We then extended the chatbot with Retrieval-Augmented Generation (RAG). Our implementation uses OpenAI's GPT-3.5-turbo LLM through LangChain's `ChatOpenAI` class, OpenAI's text-embedding-3-small for embeddings, and the Qdrant vector database as the knowledge base.

With RAG in place, we observed significant improvements. The chatbot accurately answered questions about LangChain and Mistral 7B, extending its capabilities beyond its original training data.
