# Stuff documents & Retrieval pipeline

Uses the *stuff documents chain* and the *retrieval chain* to set up a basic RAG Q&A pipeline.

1. Set up a vector database (Chroma)
2. Set up a prompt
3. Create the RAG chain as a combination of:
   - Stuff documents chain
   - Retrieval chain
4. Test the chain and demonstrate the challenge with context retrieval.


## Set up the LLM

Use the utility script for creating an instance of the desired LLM object.

In [1]:
from dotenv import load_dotenv
import sys
import json

from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

# Load the file that contains the API keys - OPENAI_API_KEY
load_dotenv('C:\\Users\\raj\\.jupyter\\.env')

# setting path
sys.path.append('../')

from utils.create_chat_llm import create_gpt_chat_llm, create_cohere_chat_llm

# Try with GPT
llm = create_gpt_chat_llm()

## 1. Set up the vector database

* Use the WebBaseLoader to load a few blogs
* Chunk the content of the blogs
* Add the chunks to ChromaDB

In [2]:
# 1. Load a couple of blogs

# Sample blogs on RAG that we will add to the vector database

# RAG, Tuning & Scratch build techniques
url1 = "https://cloud.google.com/blog/products/ai-machine-learning/to-tune-or-not-to-tune-a-guide-to-leveraging-your-data-with-llms"

# Discusses how Agents can be used with Amazon models
url2 = "https://aws.amazon.com/blogs/aws/build-rag-and-agent-based-generative-ai-applications-with-new-amazon-titan-text-premier-model-available-in-amazon-bedrock/"

loader = WebBaseLoader(
    web_paths=(url1,url2)
)

docs = loader.load()


In [3]:
# 2. Chunk the blogs
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunked_documents = text_splitter.split_documents(docs)
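
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of overlapping character windows in plain Python. It is a simplification and not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and sentences. The helper `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only; a real splitter also respects separators.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last 4 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.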

In [4]:
# 3. Add chunks to the ChromaDB

# Load the chunks into Chroma using the all-MiniLM-L6-v2 embedding model
collection_name = 'sample-blog'
collection_metadata = {'embedding': 'all-MiniLM-L6-v2'}

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

vector_store = Chroma(collection_name=collection_name, collection_metadata=collection_metadata, embedding_function=embedding_function)
vector_store.add_documents(chunked_documents)

vector_store

<langchain_community.vectorstores.chroma.Chroma at 0x1f627d65390>
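
As a rough picture of what the vector store does at query time, the sketch below ranks stored vectors by similarity to a query vector. It rests on several assumptions: the three-dimensional vectors are made up (all-MiniLM-L6-v2 produces 384-dimensional embeddings), cosine similarity is chosen for illustration and is not necessarily the metric Chroma is configured with, and `toy_index` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings keyed by chunk text (values are invented)
toy_index = {
    "chunk about RAG": [0.9, 0.1, 0.0],
    "chunk about fine-tuning": [0.1, 0.9, 0.0],
    "chunk about agents": [0.0, 0.2, 0.9],
}


def top_k(query_vector: list[float], index: dict, k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


query_vector = [0.8, 0.3, 0.1]  # made-up embedding of a question about RAG
results = top_k(query_vector, toy_index)
print(results)
# ['chunk about RAG', 'chunk about fine-tuning']
```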

## 2. Set up the prompt



In [5]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)



## 3. Create the RAG chain

API references:

* `create_retrieval_chain`: https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html
* `create_stuff_documents_chain`: https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html

In [6]:
# Setup the Retriever
retriever = vector_store.as_retriever()

# The retrieval chain passes the input text to the retriever, then hands the
# retrieved documents to the stuff documents chain, which fills {context}
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
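
The data flow through the two chains can be sketched in plain Python. This is an illustration under stated assumptions, not LangChain's implementation: `fake_retriever` and `fake_llm` are hypothetical stand-ins for the real retriever and chat model, and the prompt template is shortened. It shows the retrieved documents being "stuffed" (concatenated) into the `{context}` slot and the result being returned with `input`, `context` and `answer` keys.

```python
def fake_retriever(query: str) -> list[str]:
    """Hypothetical stand-in for the retriever; returns fixed chunks."""
    return [
        "RAG retrieves relevant text for a query.",
        "The retrieved text is passed into the prompt.",
    ]


def fake_llm(prompt_text: str) -> str:
    """Hypothetical stand-in for the chat model; reports the prompt size."""
    return f"(answer generated from a {len(prompt_text)}-character prompt)"


PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {input}"
)


def stuff_documents(inputs: dict) -> str:
    """Concatenate all documents into one context string and call the model."""
    context = "\n\n".join(inputs["context"])
    prompt_text = PROMPT_TEMPLATE.format(context=context, input=inputs["input"])
    return fake_llm(prompt_text)


def retrieval_chain(inputs: dict) -> dict:
    """Retrieve with the raw input text, then answer from the retrieved chunks."""
    docs = fake_retriever(inputs["input"])
    with_context = {**inputs, "context": docs}
    return {**with_context, "answer": stuff_documents(with_context)}


result = retrieval_chain({"input": "What is RAG?"})
print(sorted(result))
# ['answer', 'context', 'input']
```

Note that the retriever only ever sees `inputs["input"]`, which matters for the demo in the next section.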

## 4. Demo: Challenge with retrievers

The chain we have built uses the input query directly to retrieve relevant context. In a conversational setting, however, a user query may only make sense in light of earlier turns. The following inputs demonstrate the challenge: the second question refers to RAG only as "it", and the chain has no memory of the first question.
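
The failure mode can be reproduced without an LLM. The sketch below uses a toy keyword-overlap retriever as a stand-in for embedding-based search. The chunks, the scoring rule and the `retrieve` helper are all invented for illustration. A follow-up question that says "it" retrieves nothing about RAG, while a standalone version of the same question does.

```python
import re

# Made-up chunks standing in for the contents of the vector database
CHUNKS = [
    "RAG grounds model outputs by retrieving relevant data for each query.",
    "Fine tuning trains a model on labeled input-output pairs.",
    "RLHF tunes a model toward human preferences such as brand voice.",
]


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return up to k chunks that share at least one word with the query."""
    query_tokens = tokens(query)
    scored = [(len(query_tokens & tokens(chunk)), chunk) for chunk in CHUNKS]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in matches[:k]]


follow_up = "How is it different than fine tuning?"
standalone = "How is RAG different than fine tuning?"

print(retrieve(follow_up))   # only the fine tuning chunk
print(retrieve(standalone))  # the fine tuning chunk and the RAG chunk
```

One possible remedy, suggested by this sketch, is to rewrite the follow-up into a standalone question using the conversation history before it reaches the retriever.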

In [7]:
response = rag_chain.invoke({"input": "What is RAG?"})
response['answer']

'RAG, or Retrieval-augmented generation, helps ensure model outputs are grounded on your data by searching for relevant information in your data for a query and passing that information into the prompt. It supports fresh, constantly updated data, private data, large-scale multimodal data, and more, with an ecosystem of products that enable its implementation. RAG is beneficial for controlling access to grounding data, combating hallucinations, and enhancing result interpretability.'

In [8]:
response = rag_chain.invoke({"input": "How is it different than fine tuning?"})
response['answer']

"Fine-tuning involves training a model on specific input-output pairs provided by the user to adapt it to a particular task, such as classifying meeting transcripts into categories. On the other hand, RLHF, or Reinforcement Learning from Human Feedback, focuses on tuning a model based on human preferences for desired behavior that may not be easily quantifiable or categorized, such as achieving a specific tone or brand voice. While fine-tuning requires labeled data for supervised learning, RLHF relies on human feedback to guide the model's adjustments towards the desired outcome."