# Introduction to Zero-shot Retrieval-Augmented Generation (RAG) Using Langchain and ChromaDB

Welcome to this tutorial on Retrieval-Augmented Generation (RAG), an approach in natural language processing that combines language models with knowledge stored in external databases. In this tutorial, we will cover the fundamentals of RAG using Langchain and ChromaDB, two tools that facilitate the integration of external knowledge sources into language models.

### What is RAG?
Retrieval-Augmented Generation is a technique that enhances the capabilities of language models by allowing them to retrieve and utilize information from external databases. This method not only expands the model's knowledge base beyond its training data but also enables the generation of more accurate, relevant, and contextually rich responses.
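
To make the retrieve-then-generate idea concrete, here is a minimal sketch in plain Python. It is not Langchain or ChromaDB code: the helper names (`tokenize`, `retrieve`, `build_prompt`) and the word-overlap scoring are assumptions made for illustration, standing in for the embedding-based retrieval used later in this tutorial.

```python
# Toy illustration of the RAG flow; hypothetical helpers, not a library API.
import re

def tokenize(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, documents, k=1):
    """Return the k documents that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, context_docs):
    """Place the retrieved text ahead of the question."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

documents = [
    "Ironclad possessed the power of strength and healing.",
    "Silent excelled in agility and poison.",
    "Defect wielded the power of orbs.",
]
query = "What power does Ironclad have?"
top_docs = retrieve(query, documents, k=1)
prompt = build_prompt(query, top_docs)
print(prompt)  # in a real system, this prompt would be sent to a language model
```

The language model never sees the whole document collection, only the retrieved text placed in the prompt.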

### Langchain
Langchain, a versatile library, plays a crucial role in our RAG implementation. It provides an intuitive framework for integrating language models with various external data sources.

### ChromaDB
ChromaDB stands at the core of our retrieval system. It is an open-source vector database: it stores the embeddings of our text chunks and returns the chunks most similar to a query, serving as the external knowledge repository for our RAG model.

### Practical Application: QnA on a Story "Slay the Spire"
To demonstrate the capabilities of RAG using Langchain and ChromaDB, we will embark on a unique project: creating a QnA model on a short story inspired by the game "Slay the Spire."

### Sources
Some other sources to read more on how to implement RAG:
1. https://haystack.deepset.ai/tutorials/07_rag_generator
2. https://haystack.deepset.ai/tutorials/22_pipeline_with_promptnode
3. https://python.langchain.com/docs/integrations/vectorstores/chroma
4. https://python.langchain.com/docs/expression_language/cookbook/retrieval
5. https://python.langchain.com/docs/expression_language/cookbook/prompt_llm_parser



### Set up the environment

In [None]:
!pip install transformers accelerate langchain==0.0.263 tiktoken huggingface_hub sentence_transformers chromadb==0.4.5 openai==0.27.8

### Import all libs

In [None]:
import os
import requests
from pprint import pprint  # used later to display chunks
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter, TokenTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

### Read HuggingFace Token

Make sure to create a HuggingFace account.

You can find your token here: https://huggingface.co/settings/tokens

In [34]:
# Let's read the file and store the token in the variable `hf_token`.
try:
    with open("./hf_token.txt", "r") as file:
        hf_token = file.readline().strip()
    success = True
except FileNotFoundError:
    hf_token = ""
    success = False

success

True

In [5]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hf_token

Load the data using TextLoader

Read More: https://python.langchain.com/docs/modules/data_connection/document_transformers/

In [8]:
loader = TextLoader("./story.txt")
data = loader.load()

In [10]:
data[0].page_content

"In the mystical world of Slay the Spire, four adventurers embarked on a perilous journey to conquer the ever-changing spire.\n\nIronclad, a former soldier, possessed the power of strength and healing. His strikes were mighty, and his ability to recover health with each enemy defeated made him a formidable foe. The whispers of his past haunted him, driving his relentless pursuit to reach the spire's summit.\n\nSilent, a deadly huntress, excelled in agility and poison. Her quick strikes and ability to apply deadly toxins to her blades made her a master of prolonged battles. Silent, with her mysterious past, sought answers that she believed lay at the top of the spire.\n\nDefect, a rogue automaton, wielded the power of orbs. These orbs, orbiting around him, granted various abilities - lightning for damage, frost for defense, and plasma for energy. Defect's journey was one of self-discovery, seeking to understand its existence beyond its initial programming.\n\nWatcher, a monk from a dist

Now that we have the data loaded, we want to split it into chunks. We will use the `RecursiveCharacterTextSplitter` with a chunk size of 200 characters and an overlap of up to 50 characters between consecutive chunks.

In [13]:
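r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,    # maximum characters per chunk
    chunk_overlap=50,  # characters shared between consecutive chunks
    # Separators are tried in order. The third entry is written as a raw string to
    # avoid an invalid-escape warning; it only behaves as a regex lookbehind if the
    # splitter is configured to treat separators as regexes, otherwise it is literal.
    separators=["\n\n", "\n", r"(?<=\. )", " "]
)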

In [14]:
paras = r_splitter.split_documents(data)

In [15]:
len(paras)

12
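
To see what an overlap between chunks means, here is a simplified sketch that cuts a string into fixed-size character windows. This is an assumption-laden illustration, not how `RecursiveCharacterTextSplitter` works internally (it splits on separators first and then merges pieces); the function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each window starts 6 characters after the previous one, so the last 4 characters of a chunk reappear at the start of the next. The overlap reduces the chance that a sentence carrying the answer is cut in half at a chunk boundary.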

Let's take a look at the chunks.

In [32]:
for para in paras:
    pprint(para.page_content)
    print("-"*100)

('In the mystical world of Slay the Spire, four adventurers embarked on a '
 'perilous journey to conquer the ever-changing spire.')
----------------------------------------------------------------------------------------------------
('Ironclad, a former soldier, possessed the power of strength and healing. His '
 'strikes were mighty, and his ability to recover health with each enemy '
 'defeated made him a formidable foe. The whispers of')
----------------------------------------------------------------------------------------------------
('made him a formidable foe. The whispers of his past haunted him, driving his '
 "relentless pursuit to reach the spire's summit.")
----------------------------------------------------------------------------------------------------
('Silent, a deadly huntress, excelled in agility and poison. Her quick strikes '
 'and ability to apply deadly toxins to her blades made her a master of '
 'prolonged battles. Silent, with her mysterious past,')
-------

Now we will load the HF Embeddings

In [None]:
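# Embeddings: with no arguments, HuggingFaceEmbeddings loads its default
# sentence-transformers model (downloaded on first use).
embeddings_model = HuggingFaceEmbeddings()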

In [22]:
persistent_directory = './chroma_db'

Save to disk

Learn more: https://python.langchain.com/docs/integrations/vectorstores/chroma

In [27]:
from pprint import pprint

In [23]:
vectordb = Chroma.from_documents(
    documents=paras,
    embedding=embeddings_model,
    persist_directory=persistent_directory,
)

Let's retrieve the top 3 chunks for a given query.
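
Conceptually, a similarity search embeds the query and ranks the stored chunks by how close their vectors are to it. The sketch below illustrates that idea with cosine similarity over made-up 3-dimensional vectors. Both the vectors and the choice of cosine similarity are assumptions for illustration; the metric Chroma actually uses depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return the names of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Made-up "embeddings"; a real embedding model produces hundreds of dimensions.
doc_vecs = {
    "ironclad_chunk": [0.9, 0.1, 0.0],
    "defect_chunk": [0.5, 0.5, 0.1],
    "watcher_chunk": [0.1, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]
ranking = top_k(query_vec, doc_vecs, k=3)
print(ranking)
```

The cell that follows performs the real search against the Chroma store built earlier.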

In [31]:
query = "What is the Ironclad's unique ability?"
docs = vectordb.similarity_search(query, k=3)
for rank, doc in enumerate(docs):
    print(f"Rank {rank+1}:")
    pprint(doc.page_content)
    print("\n")

Rank 1:
('Ironclad, a former soldier, possessed the power of strength and healing. His '
 'strikes were mighty, and his ability to recover health with each enemy '
 'defeated made him a formidable foe. The whispers of')


Rank 2:
('Defect, a rogue automaton, wielded the power of orbs. These orbs, orbiting '
 'around him, granted various abilities - lightning for damage, frost for '
 "defense, and plasma for energy. Defect's journey was")


Rank 3:
('her attack power at the risk of receiving more damage. Her spiritual quest '
 "for enlightenment was intertwined with the spire's mysteries.")




We can see that the Rank 1 chunk contains the answer, but we cannot return the raw chunk as the answer. We need to extract or generate the answer from it, so we use a language model (LM/LLM) to generate the answer with the chunk as context.

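We will use an OpenAI model through Langchain's `OpenAI` wrapper (the intent here is GPT-3.5 Turbo, though the wrapper falls back to its own default model when no model name is passed); you can also use a free model from HF.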

In [35]:
# Let's read the file and store the token in the variable `open_ai_key`.
try:
    with open("./open-ai-key.txt", "r") as file:
        open_ai_key = file.readline().strip()
    success = True
except FileNotFoundError:
    open_ai_key = ""
    success = False

success

True

In [36]:
llm = OpenAI(openai_api_key=open_ai_key)

Now we provide a prompt template to the LLM, which contains the context (the retrieved chunks) and the query. The LLM then generates the answer.

In [43]:
from langchain.chains import RetrievalQA

def get_answer(query):
    # Prompt with two placeholders: {context} (retrieved chunks) and {question}.
    template = (
        "Use the following pieces of context to answer truthfully.\n"
        "If the context does not provide the truthful answer, make the answer as truthful as possible.\n"
        "Use 15 words maximum. Keep the response as concise as possible.\n"
        "{context}\n"
        "Question: {question}\n"
        "Response: "
    )
    QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

    # Build the chain: the retriever fetches chunks from the vector store,
    # and they are inserted into the prompt before the LLM is called.
    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=vectordb.as_retriever(),
        return_source_documents=True,
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    )

    # Run the chain and return only the generated answer text.
    result = qa_chain({"query": query})
    return result["result"]
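
To see what the LLM actually receives, the sketch below fills the same template by hand with two of the chunks printed earlier. Joining the chunks with a blank line is an assumption about how the chain assembles the context; the variable names `retrieved_chunks` and `filled_prompt` are illustrative only.

```python
# Same template text as in get_answer, written as a plain format string.
template = (
    "Use the following pieces of context to answer truthfully.\n"
    "If the context does not provide the truthful answer, make the answer as truthful as possible.\n"
    "Use 15 words maximum. Keep the response as concise as possible.\n"
    "{context}\n"
    "Question: {question}\n"
    "Response: "
)

# Two chunks copied from the splitter output shown earlier.
retrieved_chunks = [
    "Ironclad, a former soldier, possessed the power of strength and healing. His "
    "strikes were mighty, and his ability to recover health with each enemy "
    "defeated made him a formidable foe. The whispers of",
    "made him a formidable foe. The whispers of his past haunted him, driving his "
    "relentless pursuit to reach the spire's summit.",
]

# Assumption: chunks are joined with a blank line before being placed in {context}.
context = "\n\n".join(retrieved_chunks)
filled_prompt = template.format(
    context=context,
    question="What is the Ironclad's unique ability?",
)
print(filled_prompt)
```

The prompt ends with `Response: `, so the model's completion is the answer itself.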

In [42]:
result = get_answer(query)
print("Question:", query)
print("Answer:", result.strip())

Question: What is the Ironclad's unique ability?
Answer: Strength and healing with each enemy defeated.


In [46]:
query = "How does the Silent use her skills in battle?"
result = get_answer(query)
print("Question:", query)
print("Answer:",result.strip())

Question: How does the Silent use her skills in battle?
Answer: Silent uses agility and poison to quickly strike and apply toxins to her blades for prolonged


In [47]:
query = "What are the types of orbs used by Defect and their functions?"
result = get_answer(query)
print("Question:", query)
print("Answer:",result.strip())

Question: What are the types of orbs used by Defect and their functions?
Answer: Lightning (damage), Frost (defense), Plasma (energy).
