<a href="https://colab.research.google.com/github/gastan81/generative_ai/blob/main/4_rag_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG chatbot
Let's take the next step in our journey! We've explored LLMs, chatbots, and RAG. Now, it's time to put them all together to create a powerful tool: a RAG chain with memory.

---
## 1.&nbsp; Installations and Settings 🛠️
Except for one item, this is the same as the last notebook. The only new item is a line that downloads a saved version of the vector database created from Alice's Adventures in Wonderland. This saves us from loading, splitting, and vectorising the book all over again.

In [1]:
%%bash
pip install -qqq -U langchain-huggingface
pip install -qqq -U langchain
pip install -qqq -U langchain-community
pip install -qqq -U faiss-cpu

# download saved vector database for Alice's Adventures in Wonderland
gdown --folder 1A8A9lhcUXUKRrtCe7rckMlQtgmfLZRQH

Processing file 1h_lk4wTr12FAEaCS3eIJ4xsdcmnuIGmt index.faiss
Processing file 1O0Jz2Lx5cZdpQM7S5uw6Kx9_OLm5DuSQ index.pkl


Retrieving folder contents
Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From: https://drive.google.com/uc?id=1h_lk4wTr12FAEaCS3eIJ4xsdcmnuIGmt
To: /content/faiss_index/index.faiss
  0%|          | 0.00/421k [00:00<?, ?B/s]100%|██████████| 421k/421k [00:00<00:00, 37.0MB/s]
Downloading...
From: https://drive.google.com/uc?id=1O0Jz2Lx5cZdpQM7S5uw6Kx9_OLm5DuSQ
To: /content/faiss_index/index.pkl
  0%|          | 0.00/216k [00:00<?, ?B/s]100%|██████████| 216k/216k [00:00<00:00, 73.3MB/s]
Download completed
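
Once the download finishes, it can be worth confirming that both index files arrived before trying to load them. The following is a minimal sketch and not part of the original notebook: the helper name `missing_index_files` is hypothetical, and the file names and folder are taken from the download log above.

```python
from pathlib import Path

# File names as they appear in the gdown log above.
EXPECTED_FILES = ("index.faiss", "index.pkl")


def missing_index_files(folder):
    """Return the expected index files that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# "/content/faiss_index" is the download target shown in the log.
missing = missing_index_files("/content/faiss_index")
if missing:
    print("Missing index files:", ", ".join(missing))
else:
    print("Vector database files are in place.")
```

If the list is not empty, re-running the download cell is probably the simplest fix.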


In [2]:
import os
from google.colab import userdata # we stored our access token as a colab secret

os.environ["HUGGINGFACEHUB_API_TOKEN"] = userdata.get('HF_TOKEN')

---
## 2.&nbsp; Setting up the chain 🔗
There are three cooperating pieces here to work with:
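- `create_history_aware_retriever` chains together an LLM, a retriever, and a prompt, so that the chat history can be taken into account when searching for documents. It is similar to yesterday's `RetrievalQA`.
- `create_stuff_documents_chain` again calls on your LLM and prompt: it places the retrieved documents into the prompt's context so the model can respond conversationally from them.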
- `create_retrieval_chain` chains the other two pieces together.
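
To make the division of labour between these three pieces concrete, here is a toy sketch in plain Python that mimics the data flow without calling LangChain or a model. Everything in it is a hypothetical stand-in for illustration, not the library's implementation: the sample passages, the keyword-overlap scoring in `toy_retriever`, and the way `history_aware_query` folds in the previous question (a real chain would ask the LLM to do that step).

```python
import re

# Toy passages standing in for the chunks stored in the vector database.
PASSAGES = [
    "The Queen of Hearts shouted 'Off with her head!'",
    "The Mad Hatter and the March Hare were having tea.",
    "Alice fell down the rabbit hole.",
]


def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_retriever(query, k=2):
    """Rank passages by words shared with the query (stand-in for vector search)."""
    ranked = sorted(PASSAGES, key=lambda p: len(words(query) & words(p)), reverse=True)
    return ranked[:k]


def history_aware_query(question, chat_history):
    """Piece 1: with no history, search with the question as-is;
    otherwise prepend the previous human turn as a crude stand-in for rephrasing."""
    if not chat_history:
        return question
    previous = [m["content"] for m in chat_history if m["role"] == "human"]
    return " ".join(previous[-1:] + [question])


def stuff_documents(docs):
    """Piece 2: 'stuff' all retrieved passages into one context string."""
    return "\n\n".join(docs)


def toy_rag_chain(question, chat_history):
    """Piece 3: run retrieval, then hand the stuffed context onwards."""
    query = history_aware_query(question, chat_history)
    docs = toy_retriever(query)
    return {"input": question, "query": query, "context": stuff_documents(docs)}


result = toy_rag_chain("Who is the queen?", [])
print(result["context"])
```

In the real chain, the last step would send the stuffed context and the question to the LLM and return its reply under an `answer` key; the toy stops just before that call.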

Notice that this time, `chat_history` is allowed to be a simple list, rather than a class with methods to manipulate the history.

Finally, when invoking our RAG chatbot, the call must pass the chat history explicitly. The chain does not update the list for us, so we append each new question and answer to it after every call.
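
The bookkeeping for the history list can be wrapped in a small helper. This is a sketch only: `record_turn` and `demo_history` are hypothetical names, and the role/content dictionaries simply mirror the format used in the cells of this notebook.

```python
def record_turn(chat_history, question, answer):
    """Append one question/answer exchange to the history list, in place."""
    chat_history.extend([
        {"role": "human", "content": question},
        {"role": "assistant", "content": answer},
    ])
    return chat_history


# A separate list, so the notebook's own chat_history is left untouched.
demo_history = []
record_turn(demo_history, "Who is the queen?", "The Queen of Hearts.")
print(len(demo_history))  # → 2
```

Each exchange adds two entries, so the list grows with every turn of the conversation.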

In [3]:
from langchain_huggingface import HuggingFaceEndpoint, HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS  # vector stores live in langchain-community
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import create_history_aware_retriever
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents.stuff import create_stuff_documents_chain


hf_model = "mistralai/Mistral-7B-Instruct-v0.3"
llm = HuggingFaceEndpoint(repo_id=hf_model)

embedding_model = "sentence-transformers/all-MiniLM-l6-v2"
embeddings_folder = "/content/"

embeddings = HuggingFaceEmbeddings(model_name=embedding_model,
                                   cache_folder=embeddings_folder)

vector_db = FAISS.load_local("/content/faiss_index", embeddings, allow_dangerous_deserialization=True)

retriever = vector_db.as_retriever(search_kwargs={"k": 2})
template = """You are a nice chatbot having a conversation with a human. Answer the question based only on the following context and previous conversation. Keep your answers short and succinct.

Previous conversation:
{chat_history}

Context to answer question:
{context}

New human question: {input}
Response:"""

chat_history = []

prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

doc_retriever = create_history_aware_retriever(
    llm, retriever, prompt
)

doc_chain = create_stuff_documents_chain(llm, prompt)

rag_bot = create_retrieval_chain(
    doc_retriever, doc_chain
)


In [4]:
# The chain does not update the history itself, so we pass it in and extend it after each call.
ans = rag_bot.invoke({"input": "Who is the queen?", "chat_history": chat_history, "context": retriever})
chat_history.extend([{"role": "human", "content": ans["input"]}, {"role": "assistant", "content": ans["answer"]}])

In [5]:
print(ans['answer'])



Response: The Queen of Hearts.


In [6]:
ans = rag_bot.invoke({"input": "Whose head does she chop off?", "chat_history": chat_history, "context": retriever})
chat_history.extend([{"role": "human", "content": ans["input"]}, {"role": "assistant", "content": ans["answer"]}])
print(ans["answer"])


AI:

In the provided context, it's not explicitly clear who the Queen of Hearts chops off someone's head. However, it's mentioned that she threatens to execute everyone, but no specific execution takes place in the provided text.


We can also use a while loop here to make our chatbot a little more interactive.

In [7]:
chain_2 = create_retrieval_chain(
    doc_retriever, doc_chain
)
history = []
# Start the conversation loop
while True:
    user_input = input("You: ")

    # Check for exit condition
    if user_input.lower() == 'end':
        print("Ending the conversation. Goodbye!")
        break

    # Get the response from the conversation chain
    response = chain_2.invoke({"input": user_input, "chat_history": history, "context": retriever})
    # Record the exchange so the next turn can refer back to it
    history.extend([{"role": "human", "content": response["input"]}, {"role": "assistant", "content": response["answer"]}])
    # Print the chatbot's response
    print(response["answer"])

You: What is the topic you can tell me most about?


Response: The story of Alice in Wonderland, as given in the context provided, seems to revolve around the adventures of Alice in a strange, surreal world. The main characters in the story are Alice, the Mad Hatter, the March Hare, and the Dormouse. The story is full of nonsensical logic, peculiar characters, and nonsensical situations, which are common themes in the novel "Alice's Adventures in Wonderland" by Lewis Carroll.
You: What is the story of Alice about?

AI:
The story of Alice is about a young girl named Alice who falls down a rabbit hole and enters a surreal world known as Wonderland. In Wonderland, she encounters strange and peculiar characters such as the Mad Hatter, the March Hare, and the Dormouse, and experiences nonsensical situations and nonsensical logic. The story is an adventure of Alice trying to make sense of the strange world she has found herself in and trying to find her way back home.
You: Please tell me who