<a href="https://colab.research.google.com/github/andreabenevenut/LLM_workshop/blob/main/QA_over_documents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RETRIEVAL-AUGMENTED GENERATION (RAG)

Suppose that we have a vast document collection. By retrieving the pertinent passages, an LLM can give fast and precise answers when users ask about these documents.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, up to a specific point in time. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of retrieving the appropriate information and inserting it into the model prompt is known as Retrieval-Augmented Generation (RAG).

In the following notebook, we will go through the process of creating a virtual assistant powered by an LLM. The virtual assistant will help us investigate our sources, with the possibility of having full human-like conversations.

We will follow a step-by-step process:

>[0: SET UP](#scrollTo=8-mi3R-LuEa5)

>[1: CREATING OUR OWN FAQ CHATBOT](#scrollTo=PVg09NifuWsr)

>[2: INTRODUCING MEMORY FOR A CONVERSATIONAL CHATBOT](#scrollTo=z5zlas7UdaEY)

>[3: QUERYING OVER DATAROOTS WEBSITE](#scrollTo=A1JYOSYLWray)

>[4: INTEGRATE LLMS WITH YOUTUBE VIDEOS](#scrollTo=tW-Xl8j-ybYr)



# 0: SET UP

In [None]:
%%capture
!pip install langchain openai chromadb tiktoken

In [None]:
import os
from google.colab import userdata
from google.colab import drive

drive.mount('/content/drive')
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

In [None]:
from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from langchain.indexes.vectorstore import VectorstoreIndexCreator

# 1: CREATING OUR OWN FAQ CHATBOT


The goal is to ask the chatbot questions about a specific document, 'state of the union', relative to ... .

How can we ask a question about a document (or multiple)?

A first idea is the simple chain approach (put link to other notebook): pass the whole document in the prompt template and ask the question. This works if the document is small enough. However, there are limits (link) on how many tokens we can provide in the prompt of a Large Language Model, so a long document will not fit. Moreover, it would be inefficient, especially when the question concerns only a specific part of the document.
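
To get a feeling for the token limit, here is a rough budget check. It is only a sketch: the ratio of about 4 characters per token is a rule of thumb for English text, the 4,096-token window is an assumed value, and the helper names `estimate_tokens` and `fits_in_prompt` are hypothetical. Exact counts depend on the tokenizer of the model you use.

```python
# Rough budget check: does a document fit in a model's context window?
# ASSUMPTIONS: ~4 characters per token (rule of thumb for English text) and a
# hypothetical 4,096-token window. Real counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 4096


def estimate_tokens(text: str) -> int:
    """Return an approximate token count for `text`."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_prompt(document: str, question: str, reserved_for_answer: int = 256) -> bool:
    """Check whether document + question still leave room for the answer."""
    used = estimate_tokens(document) + estimate_tokens(question)
    return used + reserved_for_answer <= CONTEXT_WINDOW_TOKENS


short_doc = "word " * 200    # 1,000 characters, roughly 250 tokens
long_doc = "word " * 8000    # 40,000 characters, roughly 10,000 tokens

print(fits_in_prompt(short_doc, "What is this about?"))
print(fits_in_prompt(long_doc, "What is this about?"))
```

The short document fits comfortably, while the long one exceeds the assumed window on its own, which is why we need a way to pass only the relevant parts.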

In [None]:
with open("/content/drive/MyDrive/LLM workshop/Data/state_of_the_union.txt") as f:
    state_of_the_union = f.read()
len(state_of_the_union)

First, we could think of splitting the document into smaller chunks that are more manageable by our LLM.

In [None]:
def print_chunk(chunk_text):
  chunk_text_reformatted = chunk_text.replace('\n\n', '\n')
  print(f"'''\n{chunk_text_reformatted}\n'''")

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
texts = text_splitter.split_text(state_of_the_union)
print("Number of chunks: ", len(texts))

In [None]:
print_chunk(texts[1])
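
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter in plain Python. It is a sketch, not the algorithm used by `CharacterTextSplitter` (which splits on a separator first and then merges the pieces); the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so a sentence that
    falls on a boundary is still fully visible in at least one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one. A larger overlap loses less context at the boundaries but produces more chunks to embed and store.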

Now that we have our document split into smaller pieces, we need to find a way to store the information efficiently.
The idea is that, when we ask a question, we would like to retrieve the most relevant parts of the document that could be useful to answer the question.

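One option is to use a vector store: a database that stores each chunk alongside its embedding (a numeric vector that represents the chunk's meaning) and can return the chunks whose embeddings are closest to that of a query.

There are many different choices. For this tutorial we will use Chroma.
In the link you can see all the other available vector stores.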

In [None]:
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_texts(texts,
                                embeddings,
                                metadatas=[{"source": f"page {i}"} for i in range(len(texts))])
print(f"The vector store contains {vectorstore._collection.count()} documents in total.")

Now let's test our vector store against one arbitrary question and check which chunks are the most relevant.

By default, the 4 chunks most similar to the question are retrieved, but we can change this parameter.

The question is embedded with the same model as the chunks, and the chunks whose embedding vectors are closest to the question's vector under the store's distance metric are returned.
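
The idea behind similarity-based retrieval can be sketched with toy vectors. This is an illustration only: the three-dimensional vectors below are made up (real embeddings have hundreds or thousands of dimensions), cosine similarity is one common choice of metric (the metric a store actually uses depends on its configuration), and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k=4):
    """Return the names of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in chunk_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Made-up embeddings, for illustration only
toy_vectors = {
    "chunk about the Supreme Court": [0.9, 0.1, 0.0],
    "chunk about the economy": [0.1, 0.9, 0.1],
    "chunk about healthcare": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for the embedded question

print(top_k(query_vector, toy_vectors, k=2))
```

The chunk whose vector points in nearly the same direction as the query comes first. A vector store does the same ranking, usually with an index that avoids comparing the query against every stored vector.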

In [None]:
retriever = vectorstore.as_retriever()
query = "What did the president say about Justice Breyer"
docs = retriever.get_relevant_documents(query)

In [None]:
def show_docs(docs):
  for doc in docs:
    print_chunk(doc.page_content)
    print(doc.metadata['source'])
    print("\n\n")

In [None]:
len(docs)

In [None]:
show_docs(docs)

Now it is time to define our FAQ chatbot which, given a question, uses the most relevant chunks to provide an answer. To define it, we can use the LangChain chain `load_qa_chain` (put link).

In [None]:
from langchain.chains.question_answering import load_qa_chain

In [None]:
chain = load_qa_chain(llm=OpenAI(temperature=0), chain_type="stuff")
query = "What did the president say about Justice Breyer"
docs = retriever.get_relevant_documents(query)
chain.run(input_documents=docs, question=query)

Let's see what is happening under the hood.

First, we can have a look at all the parameters that are passed to our chain.

In [None]:
chain.dict()

In particular, this is the template that is passed to our LLM.

In [None]:
print(chain.dict()['llm_chain']['prompt']['template'])

We can see that the template uses two special variables:

- __context__: this variable is replaced by the most relevant pieces of documents fetched from the vector store to answer the question

- __question__: the question provided by the user
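
Conceptually, the "stuff" chain type places all retrieved chunks into the context slot of one prompt. The sketch below mimics that with plain string formatting. It is a simplified stand-in, not LangChain's implementation: the template text, the `"\n\n"` separator, the example chunks, and the function name `stuff_prompt` are assumptions made for illustration.

```python
# Simplified stand-in for a question-answering prompt template
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "If you don't know the answer, just say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunks and insert them into the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Placeholder chunks standing in for the retriever's output
retrieved_chunks = [
    "First retrieved chunk of text.",
    "Second retrieved chunk of text.",
]
prompt = stuff_prompt(retrieved_chunks, "What did the president say about Justice Breyer")
print(prompt)
```

Because every chunk ends up in a single prompt, this approach is simple but only works while the combined chunks fit within the model's token limit.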

Now let's rerun the chain and set `verbose=True`.

In [None]:
chain = load_qa_chain(llm=OpenAI(temperature=0), chain_type="stuff", verbose=True)
query = "What did the president say about Justice Breyer"
docs = retriever.get_relevant_documents(query)
chain.run(input_documents=docs, question=query)

It is also possible to further customize the chain and provide our own instructions in the template. We should keep the special variables __context__ and __question__ but we can add our own variables too.

In [None]:
prompt_template = """
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Answer in {language}:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question", "language"]
)
chain = load_qa_chain(llm=OpenAI(temperature=0), chain_type="stuff", prompt=PROMPT)

In [None]:
query = "What did the president say about Justice Breyer"
docs = retriever.get_relevant_documents(query)
final_chain = chain({"input_documents": docs, "question": query, "language": "Dutch"})
answer = final_chain['output_text']
answer

In [None]:
query = "What did the president say about Justice Breyer"
docs = retriever.get_relevant_documents(query)
final_chain = chain({"input_documents": docs, "question": query, "language": "Italian"})
answer = final_chain['output_text']
answer

GUARDRAILS?

# 2: INTRODUCING MEMORY FOR A CONVERSATIONAL CHATBOT

A further improvement to our chatbot is to add a memory element, so that using it really feels like having a conversation with a virtual assistant rather than querying a simple FAQ engine.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

In [None]:
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=retriever
)

In [None]:
question = "What did the president say about Justice Breyer"
result = qa_chain({"query": question})
print(result["result"])

In [None]:
question = "Translate the previous answer in French"
result = qa_chain({"query": question})
print(result["result"])

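The chain has no way to know what "the previous answer" refers to, because `RetrievalQA` handles each query independently: we are not keeping track of past exchanges. In other words, we are not using memory. To create a conversational chatbot, we need to include that element.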
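
The core idea of conversational memory can be sketched in plain Python: store each question and answer, then give the history to the model together with the follow-up so that it can be rewritten as a standalone question before retrieval. This is a minimal sketch under assumptions; the prompt wording, the placeholder answer, and the names `remember` and `build_condense_prompt` are hypothetical and do not reproduce LangChain's internals.

```python
chat_history = []  # list of (question, answer) pairs, oldest first


def remember(question, answer):
    """Store one finished exchange in the conversation memory."""
    chat_history.append((question, answer))


def build_condense_prompt(history, follow_up):
    """Build a prompt asking the model to make the follow-up self-contained."""
    history_text = "\n".join(
        f"Human: {question}\nAssistant: {answer}" for question, answer in history
    )
    return (
        "Given the following conversation and a follow-up question, "
        "rephrase the follow-up question to be a standalone question.\n\n"
        f"Chat history:\n{history_text}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


# Placeholder answer standing in for the model's first reply
remember("What did the president say about Justice Breyer?", "<first answer>")
condense_prompt = build_condense_prompt(chat_history, "Translate the previous answer in French")
print(condense_prompt)
```

With the history included, "the previous answer" now has something to refer to, which is exactly what the chain without memory was missing.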

In [None]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa_chain_with_memory = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=retriever,
    memory=memory,
)

In [None]:
question = "What did the president say about Justice Breyer?"
result = qa_chain_with_memory({"question": question})
result['answer']

In [None]:
question = "Translate the previous answer in French"
result = qa_chain_with_memory({"question": question})
result['answer']

In [None]:
qa_chain_with_memory

# 3: QUERYING OVER DATAROOTS WEBSITE

In [None]:
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader(["https://dataroots.io/our-dna", "https://dataroots.io/blog/state-of-data-quality-october-2023"])

dataroots_website = loader.load()

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)

#print("Number of chunks: ", len(texts))

In [None]:
metadatas = []
content = []
for web_page in dataroots_website:
  texts = text_splitter.split_text(web_page.page_content)
  content += texts
  for i, text in enumerate(texts):
    metadata = web_page.metadata.copy()
    metadata['part'] = i + 1
    metadatas.append(metadata)

In [None]:
metadatas

In [None]:
vectorstore.delete_collection()
embeddings = OpenAIEmbeddings()
docs = text_splitter.create_documents(content, metadatas)
vectorstore = Chroma.from_documents(docs, embeddings)

In [None]:
vectorstore._collection.count()

In [None]:
from langchain.chains import RetrievalQAWithSourcesChain

user_input = "What is Dataroots DNA?"
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=OpenAI(temperature=0),
    retriever=vectorstore.as_retriever()
)
result = qa_chain({"question": user_input})
result

In [None]:
user_input = "What is data quality?"
result = qa_chain({"question": user_input})
result

In [None]:
user_input = "What are the most used data quality tools?"
result = qa_chain({"question": user_input})
result

# 4: INTEGRATE LLMS WITH YOUTUBE VIDEOS

In [None]:
!pip install youtube-transcript-api pytube

In [None]:
from langchain.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    'https://www.youtube.com/watch?v=aywZrzNaKjs', add_video_info=True
)

docs = loader.load()
# Turn the bare video id stored in `source` into a full URL
docs[0].metadata['source'] = "https://www.youtube.com/watch?v=" + docs[0].metadata['source']

embeddings = OpenAIEmbeddings()
vectorstore.delete_collection()
vectorstore = Chroma.from_documents(docs, embeddings)

qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=OpenAI(temperature=0),
    retriever=vectorstore.as_retriever()
)

In [None]:
user_input = "What is the video about?"
result = qa_chain({"question": user_input})
result

In [None]:
user_input = "What are the advantages of using LangChain?"
result = qa_chain({"question": user_input})
result