## Building RAG chatbots with LangChain and Qdrant

### install libraries
```shell
!pip install -U   \
    langchain     \
    openai        \
    datasets      \
    qdrant-client \
    tiktoken
```

In [1]:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
load_dotenv('./.env')

True

In [2]:
chat = ChatOpenAI(
    model='gpt-3.5-turbo'
)

In [3]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand machine learning.")
]

In [4]:
res = chat.invoke(messages)
res

AIMessage(content='Machine learning is a subset of artificial intelligence that involves developing algorithms and statistical models that allow computers to learn from and make predictions or decisions based on data. The main goal of machine learning is to enable computers to automatically learn and improve from experience without being explicitly programmed. There are various types of machine learning algorithms, such as supervised learning, unsupervised learning, and reinforcement learning, each serving different purposes and applications. Do you have any specific questions about machine learning that I can help clarify for you?')

In [5]:
print(res.content)

Machine learning is a subset of artificial intelligence that involves developing algorithms and statistical models that allow computers to learn from and make predictions or decisions based on data. The main goal of machine learning is to enable computers to automatically learn and improve from experience without being explicitly programmed. There are various types of machine learning algorithms, such as supervised learning, unsupervised learning, and reinforcement learning, each serving different purposes and applications. Do you have any specific questions about machine learning that I can help clarify for you?


In [6]:
messages.append(res)

In [7]:
prompt = HumanMessage(
    content="Whats the difference between supervised and unsupervised?"
)
messages.append(prompt)

In [8]:
res = chat.invoke(messages)

In [9]:
print(res.content)

Supervised and unsupervised learning are two main types of machine learning approaches that involve different methods of training models and making predictions:

1. Supervised learning: In supervised learning, the algorithm is trained on a labeled dataset, where each example in the dataset is associated with a target output or label. The goal is for the model to learn a mapping between the input data and the corresponding output labels. During training, the model adjusts its parameters based on the difference between its predictions and the true labels in order to minimize the prediction error. Once the model is trained, it can make predictions on new, unseen data by generalizing from the labeled training examples.

Examples of supervised learning tasks include classification (predicting discrete class labels) and regression (predicting continuous values). Common algorithms used in supervised learning include linear regression, logistic regression, support vector machines, decision tre

In [10]:
# add latest response to messages
messages.append(res)

# create a new user prompt
prompt = HumanMessage(
    content="What is so special about Mistral 7B?"
)
# append to messages
messages.append(prompt)

# send to GPT
res = chat.invoke(messages)

In [11]:
print(res.content)

Mistral 7B is a high-performance computing system developed by Atos, a global leader in digital transformation. The Mistral 7B supercomputer is notable for its advanced capabilities and specifications that make it suitable for demanding computational tasks across various fields, such as scientific research, engineering simulations, weather forecasting, and AI applications. Some of the key features and highlights of Mistral 7B include:

1. High computational power: Mistral 7B is equipped with a powerful combination of processors, memory, and storage systems that enable it to perform complex calculations and simulations at a rapid pace. Its high-performance architecture allows for efficient processing of large datasets and computation-intensive tasks.

2. Scalability and flexibility: The Mistral 7B supercomputer is designed to be scalable, allowing users to expand its computational capacity as needed. It offers flexibility in terms of configuring and optimizing the system for specific wo

In [12]:
# add latest response to messages
messages.append(res)

# create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# append to messages
messages.append(prompt)

# send to GPT
res = chat.invoke(messages)

In [13]:
print(res.content)

I'm sorry, but I couldn't find specific information about an "LLMChain" in relation to LangChain. It's possible that the term or concept you are referring to is either new, specialized, or not widely recognized in the public domain. If you can provide more context or details about LLMChain or LangChain, I'll do my best to assist you further or provide information based on the details you provide.


In [14]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

In [15]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below to answer the question.

Contexts:
{source_knowledge}

Question: {query}"""

In [16]:
prompt = HumanMessage(
    content=augmented_prompt
)

messages.append(prompt)

res = chat.invoke(messages)

In [17]:
print(res.content)

In the context of LangChain, the LLMChain is a common type of chain that plays a crucial role in the framework for developing applications powered by language models. The LLMChain consists of three main components: a PromptTemplate, a model (which can be an LLM or a ChatModel), and an optional output parser. 

Here's how the LLMChain works within the LangChain framework:
1. Input variables are passed to the LLMChain.
2. The PromptTemplate formats these input variables into a prompt.
3. The formatted prompt is then passed to the model (LLM or ChatModel) for processing.
4. The model generates an output based on the input prompt.
5. Optionally, the output parser can be used to further process and format the output from the language model into a final usable format.

The LLMChain is a key component within LangChain that facilitates the interaction between input data, language models, and output processing. By leveraging the LLMChain within the LangChain framework, developers can build powe

In [18]:
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-3-small")

In [19]:
texts = [
    'this is one chunk',
    'this is the second chunk of text'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)

In [20]:
from datasets import load_dataset

dataset = load_dataset("infoslack/mistral-7b-arxiv-paper-chunked", split="train")

dataset

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 25
})

In [21]:
dataset[0]

{'doi': '2310.06825',
 'chunk-id': '0',
 'chunk': 'Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,\nDevendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,\nGuillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,\nPierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,\nWilliam El Sayed\nAbstract\nWe introduce Mistral 7B, a 7–billion-parameter language model engineered for\nsuperior performance and efficiency. Mistral 7B outperforms the best open 13B\nmodel (Llama 2) across all evaluated benchmarks, and the best released 34B\nmodel (Llama 1) in reasoning, mathematics, and code generation. Our model\nleverages grouped-query attention (GQA) for faster inference, coupled with sliding\nwindow attention (SWA) to effectively handle sequences of arbitrary length with a\nreduced inference cost. We also provide a model fine-tuned to follow instructions,\nMistral 7B – Instruct, that surpasses Llama 2

In [22]:
data = dataset.to_pandas()

In [23]:
data.head()

Unnamed: 0,doi,chunk-id,chunk,id,title,summary,source,authors,categories,comment,journal_ref,primary_category,published,updated,references
0,2310.06825,0,"Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayr...",2310.06825,Mistral 7B,"We introduce Mistral 7B v0.1, a 7-billion-para...",http://arxiv.org/pdf/2310.06825,"[Albert Q. Jiang, Alexandre Sablayrolles, Arth...","[cs.CL, cs.AI, cs.LG]",Models and code are available at\n https://mi...,,cs.CL,20231010,20231010,"[{'id': '1808.07036'}, {'id': '1809.02789'}, {..."
1,2310.06825,1,automated benchmarks. Our models are released ...,2310.06825,Mistral 7B,"We introduce Mistral 7B v0.1, a 7-billion-para...",http://arxiv.org/pdf/2310.06825,"[Albert Q. Jiang, Alexandre Sablayrolles, Arth...","[cs.CL, cs.AI, cs.LG]",Models and code are available at\n https://mi...,,cs.CL,20231010,20231010,"[{'id': '1808.07036'}, {'id': '1809.02789'}, {..."
2,2310.06825,2,GQA significantly accelerates the inference sp...,2310.06825,Mistral 7B,"We introduce Mistral 7B v0.1, a 7-billion-para...",http://arxiv.org/pdf/2310.06825,"[Albert Q. Jiang, Alexandre Sablayrolles, Arth...","[cs.CL, cs.AI, cs.LG]",Models and code are available at\n https://mi...,,cs.CL,20231010,20231010,"[{'id': '1808.07036'}, {'id': '1809.02789'}, {..."
3,2310.06825,3,Mistral 7B takes a significant step in balanci...,2310.06825,Mistral 7B,"We introduce Mistral 7B v0.1, a 7-billion-para...",http://arxiv.org/pdf/2310.06825,"[Albert Q. Jiang, Alexandre Sablayrolles, Arth...","[cs.CL, cs.AI, cs.LG]",Models and code are available at\n https://mi...,,cs.CL,20231010,20231010,"[{'id': '1808.07036'}, {'id': '1809.02789'}, {..."
4,2310.06825,4,parameters of the architecture are summarized ...,2310.06825,Mistral 7B,"We introduce Mistral 7B v0.1, a 7-billion-para...",http://arxiv.org/pdf/2310.06825,"[Albert Q. Jiang, Alexandre Sablayrolles, Arth...","[cs.CL, cs.AI, cs.LG]",Models and code are available at\n https://mi...,,cs.CL,20231010,20231010,"[{'id': '1808.07036'}, {'id': '1809.02789'}, {..."


In [24]:
docs = data[['chunk', 'source']]
docs.head()

Unnamed: 0,chunk,source
0,"Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayr...",http://arxiv.org/pdf/2310.06825
1,automated benchmarks. Our models are released ...,http://arxiv.org/pdf/2310.06825
2,GQA significantly accelerates the inference sp...,http://arxiv.org/pdf/2310.06825
3,Mistral 7B takes a significant step in balanci...,http://arxiv.org/pdf/2310.06825
4,parameters of the architecture are summarized ...,http://arxiv.org/pdf/2310.06825


## RAG

In [25]:
from langchain_community.document_loaders import DataFrameLoader

loader = DataFrameLoader(docs, page_content_column="chunk")
documents = loader.load()

In [26]:
documents[0]

Document(page_content='Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,\nDevendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,\nGuillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,\nPierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,\nWilliam El Sayed\nAbstract\nWe introduce Mistral 7B, a 7–billion-parameter language model engineered for\nsuperior performance and efficiency. Mistral 7B outperforms the best open 13B\nmodel (Llama 2) across all evaluated benchmarks, and the best released 34B\nmodel (Llama 1) in reasoning, mathematics, and code generation. Our model\nleverages grouped-query attention (GQA) for faster inference, coupled with sliding\nwindow attention (SWA) to effectively handle sequences of arbitrary length with a\nreduced inference cost. We also provide a model fine-tuned to follow instructions,\nMistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on hu

In [27]:
documents[0].metadata

{'source': 'http://arxiv.org/pdf/2310.06825'}

In [28]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

url = os.getenv("QDRANT_URL")
api_key = os.getenv("QDRANT_KEY")

qdrant = Qdrant.from_documents(
    documents=documents,
    embedding=embeddings,
    url=url,
    collection_name="chatbot",
    api_key=api_key
)

In [29]:
query = "What is so special about Mistral 7B?"
qdrant.similarity_search(query, k=3)

[Document(page_content='Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,\nDevendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,\nGuillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,\nPierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,\nWilliam El Sayed\nAbstract\nWe introduce Mistral 7B, a 7–billion-parameter language model engineered for\nsuperior performance and efficiency. Mistral 7B outperforms the best open 13B\nmodel (Llama 2) across all evaluated benchmarks, and the best released 34B\nmodel (Llama 1) in reasoning, mathematics, and code generation. Our model\nleverages grouped-query attention (GQA) for faster inference, coupled with sliding\nwindow attention (SWA) to effectively handle sequences of arbitrary length with a\nreduced inference cost. We also provide a model fine-tuned to follow instructions,\nMistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on h

In [30]:
def custom_prompt(query: str):
    results = qdrant.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augment_prompt = f"""Using the contexts below, answer the query:

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augment_prompt

In [31]:
print(custom_prompt(query))

Using the contexts below, answer the query:

    Contexts:
    Mistral 7B
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,
Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,
Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,
Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,
William El Sayed
Abstract
We introduce Mistral 7B, a 7–billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B – Instruct, that surpasses Llama 2 1

In [32]:
prompt = HumanMessage(
    content=custom_prompt(query)
)

messages.append(prompt)

res = chat.invoke(messages)

print(res.content)

Mistral 7B is a high-performance language model engineered by the Mistral AI team to achieve superior performance and efficiency. Here are some key features that make Mistral 7B stand out:

1. **Model Size**: Mistral 7B is a 7-billion-parameter language model, making it one of the largest language models in existence. Its size allows for capturing complex patterns and relationships in language data, leading to better performance in various natural language processing tasks.

2. **Performance**: Mistral 7B outperforms other large language models, such as the 13-billion-parameter Llama 2 model and the 34-billion-parameter LLaMa 34B model, across multiple benchmarks. It excels in tasks related to reasoning, mathematics, code generation, and following instructions.

3. **Efficiency**: Mistral 7B is designed to balance high performance with efficiency. It leverages innovative attention mechanisms, such as grouped-query attention (GQA) and sliding window attention (SWA), to accelerate infere

## Groq

In [33]:
!pip install langchain-groq



In [34]:
from langchain_groq import ChatGroq
chat = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768")

In [35]:
prompt = HumanMessage(
    content=custom_prompt(query)
)

messages.append(prompt)
res = chat.invoke(messages)
print(res.content)

Mistral 7B is a 7-billion-parameter language model that stands out for its superior performance and efficiency compared to other models. It outperforms the best open 13B model (Llama 2) across all evaluated benchmarks and the best released 34B model (Llama 1) in reasoning, mathematics, and code generation. Mistral 7B utilizes two main attention mechanisms:

1. Grouped-Query Attention (GQA): This mechanism significantly accelerates inference speed and reduces memory requirements during decoding, allowing for higher batch sizes and throughput. It is particularly beneficial for real-time applications.
2. Sliding Window Attention (SWA): SWA effectively handles sequences of arbitrary length at a reduced computational cost. It is designed to manage longer sequences, alleviating a common limitation in large language models (LLMs).

These attention mechanisms contribute to the enhanced performance and efficiency of Mistral 7B. Additionally, Mistral 7B is released under the Apache 2.0 license, 