
# Retrieval Augmented Generation (RAG)
Large language models (LLMs) typically have the following limitations:

1. Knowledge Cutoff: Their knowledge is limited to the data they were trained on (often months or years old).
2. Hallucinations: They can confidently make up facts or generate plausible-sounding but incorrect information.
3. Lack of Domain-Specific/Private Knowledge: They don't know the contents of your organization's internal documents and data.

RAG addresses these limitations by giving the LLM access to external, domain-specific, and up-to-date information at the moment it answers a question. A RAG pipeline consists of the following key steps:

1. Document Loading: Read the source documents (here, PDF files) into text, keeping metadata such as the file name and page number.
2. Text Splitting: Why split? Documents are long, while LLM context windows are limited, so we break large documents into smaller, manageable chunks.
3. Embedding: How do we find relevant chunks quickly? We convert each text chunk into an embedding, a numeric vector that captures its semantic meaning, so chunks with similar meanings have similar vectors.
4. Vector Storage: Where do we store the vectors for fast search? A vector database can quickly find the stored vectors that are most similar to a query vector.
5. Query Embedding: When a user asks a question, convert the query text into a vector using the same embedding model used for the documents.
6. Retrieval: Use the query vector to search the vector store for the most similar document vectors, then retrieve the top K (e.g., 2-5) corresponding text chunks.
7. Context Stuffing: Combine the original user query and the retrieved text chunks into a single prompt for the LLM. The prompt looks something like: "Here is some context: [Retrieved Chunk 1] [Retrieved Chunk 2] ... Based on this context, answer the following question: [User Query]".
8. Answer Generation: The LLM receives the prompt containing the retrieved context and generates an answer grounded in, and limited to, that context. This typically reduces hallucinations and keeps the answer relevant to the external data, although it cannot eliminate them (for example, when retrieval returns the wrong chunks).
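The eight steps can be traced end to end in a few lines of plain Python. This is a toy sketch, not how the libraries used later work internally: the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, a Python list stands in for the vector database, and the documents are made-up examples.

```python
import math
import re
from collections import Counter

# Step 1 (loading): hypothetical documents, already short enough to act as chunks (step 2).
CHUNKS = [
    "Employees receive 25 days of annual leave plus public holidays.",
    "All staff must complete drug and alcohol testing before starting.",
    "The company offers private medical insurance after three months.",
]

def embed(text):
    """Step 3: toy 'embedding' = bag-of-words counts (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 4: a plain list of (vector, text) pairs stands in for the vector database.
STORE = [(embed(chunk), chunk) for chunk in CHUNKS]

def retrieve(query, k=2):
    query_vector = embed(query)  # Step 5: same embedding function as the chunks
    ranked = sorted(STORE, key=lambda pair: cosine(query_vector, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]  # Step 6: top-k chunks

def build_prompt(query, chunks):
    """Step 7: 'stuff' the retrieved chunks and the question into one prompt."""
    context = "\n".join(chunks)
    return (f"Here is some context:\n{context}\n\n"
            f"Based on this context, answer the following question: {query}")

question = "How many days of annual leave do I get?"
prompt = build_prompt(question, retrieve(question, k=1))
print(prompt)
# Step 8 would send `prompt` to an LLM; that call is not shown here.
```

Real embedding models capture meaning rather than shared words, so a query like "time off" would also match the leave policy; the bag-of-words stand-in only keeps the example dependency-free.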

## Load Packages
We install langchain (orchestration framework), pypdf (PDF reading), sentence-transformers (embedding model), ctransformers (running local LLMs), and chromadb (vector store).

In [None]:
!pip install langchain pypdf sentence-transformers ctransformers chromadb -q

## Data Loading & Processing
We create a directory (`docs`) to hold all the documents.

In [None]:
!mkdir -p docs

Now upload all the files to this directory. You can upload them manually or run the following code to download them from the GitHub repository.

In [None]:
!pip install wget -q

In [None]:
import os
from urllib.parse import unquote

import requests
import wget

def get_github_files(repo_owner, repo_name, directory_path):
    """
    Fetches the download URLs of the PDF files in a GitHub repository directory.

    Args:
        repo_owner (str): The owner of the GitHub repository.
        repo_name (str): The name of the GitHub repository.
        directory_path (str): The path to the directory within the repository.

    Returns:
        list: A list of raw download URLs, one per PDF file in the directory.
    """
    api_url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/contents/{directory_path}"
    headers = {"Accept": "application/vnd.github+json"}  # Recommended media type for the GitHub REST API
    response = requests.get(api_url, headers=headers, timeout=30)
    response.raise_for_status()  # Raise an exception for bad status codes

    pdf_files = []
    for file_data in response.json():
        if file_data["type"] == "file" and file_data["name"].endswith(".pdf"):
            # Use the API's own download_url: it is already URL-encoded (so spaces and
            # special characters in file names work) and points at the correct branch.
            pdf_files.append(file_data["download_url"])
    return pdf_files

# --- Usage ---
repo_owner = "RDGopal"
repo_name = "IB9LQ0-GenAI"
directory_path = "Data/Onboarding"

pdf_urls = get_github_files(repo_owner, repo_name, directory_path)

# Create the 'docs' directory if it doesn't exist
os.makedirs("docs", exist_ok=True)

# Download the PDF files
for url in pdf_urls:
    filename = unquote(os.path.basename(url))  # e.g. "My%20File.pdf" -> "My File.pdf"
    wget.download(url, out=os.path.join("docs", filename))
    print(f"\nDownloaded: {filename}")

In [None]:
!pip install langchain-community

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

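Load all the PDF files in the directory. `PyPDFLoader` returns one document per page and records the file path and page number in each document's metadata.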

In [None]:
import os
# Assuming PDFs are in a 'docs' folder
pdf_folder_path = 'docs/'
if not os.path.exists(pdf_folder_path):
    print(f"Error: '{pdf_folder_path}' not found. Please upload your PDFs there.")
else:
    loaders = [PyPDFLoader(os.path.join(pdf_folder_path, fn)) for fn in os.listdir(pdf_folder_path) if fn.endswith('.pdf')]
    print(f"Found {len(loaders)} PDF documents.")
    docs = []
    for loader in loaders:
        docs.extend(loader.load())
    print(f"Loaded {len(docs)} pages total.")

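Split the pages into overlapping chunks. By default, `chunk_size` and `chunk_overlap` are measured in characters, so each chunk below is at most about 200 characters long, and consecutive chunks can share up to 25 characters so that text near a chunk boundary keeps some of its surrounding context.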

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=25)
splits = text_splitter.split_documents(docs)
print(f"Split into {len(splits)} chunks.")

In [None]:
for i, split in enumerate(splits):
    print(f"Chunk {i + 1}:\n{split.page_content}\n")
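To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is an assumption-labeled sketch, not LangChain's algorithm: `RecursiveCharacterTextSplitter` additionally tries to break on paragraph, line, and word boundaries before falling back to raw characters, whereas `split_with_overlap` (a hypothetical helper) always cuts at fixed character positions.

```python
def split_with_overlap(text, chunk_size=200, chunk_overlap=25):
    """Cut `text` into windows of `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the previous
    one, so consecutive windows share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(450))  # 450 characters of dummy text
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])  # [200, 200, 100]
```

The last 25 characters of each window reappear at the start of the next, which is the overlap the splitter above is configured with.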

## Embedding and Indexing

In [None]:
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

We use `all-MiniLM-L6-v2`, a small sentence-transformers model that maps each chunk to a 384-dimensional vector.

In [None]:
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
print("Embedding model loaded.")
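Similarity between embeddings is commonly measured with cosine similarity, while Chroma's default distance is, to our knowledge, squared Euclidean (L2) distance. Assuming the embeddings have unit length (which, as far as we know, `all-MiniLM-L6-v2` produces by default), the two measures rank chunks identically:

$$
\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert},
\qquad
\lVert\mathbf{a}-\mathbf{b}\rVert^2 = \lVert\mathbf{a}\rVert^2 + \lVert\mathbf{b}\rVert^2 - 2\,\mathbf{a}\cdot\mathbf{b}
= 2 - 2\cos(\mathbf{a},\mathbf{b}) \quad \text{when } \lVert\mathbf{a}\rVert = \lVert\mathbf{b}\rVert = 1.
$$

A smaller L2 distance therefore means a higher cosine similarity, so the nearest chunks are the same under either measure.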

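Create and populate the vector store. Chroma embeds every chunk with the model above and saves the index to the `db` directory so it can be reloaded later.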

In [None]:
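persist_directory = "db"  # Directory where Chroma stores the index on disk
# Embed all chunks and add them to a Chroma collection stored on disk.
# Note: re-running this cell adds the same chunks again, creating duplicates.
vectorstore = Chroma.from_documents(
    documents=splits, embedding=embeddings, persist_directory=persist_directory)
# Chroma >= 0.4 writes to disk automatically; this explicit call only matters for older versions.
vectorstore.persist()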
print(f"Vector store created and populated with embeddings, persisted to {persist_directory}.")
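`Chroma.from_documents` gives each chunk a random ID unless you pass your own, which is why re-running the cell above duplicates entries. One way around this, sketched below under that assumption, is to derive a repeatable ID from each chunk's source, page, and text with a hypothetical helper `stable_chunk_ids`; identical chunks get a numeric suffix so every ID in a batch stays unique.

```python
import hashlib
from collections import Counter
from types import SimpleNamespace

def stable_chunk_ids(chunks):
    """Return one repeatable ID per chunk, based on its source, page and text.

    `chunks` is any iterable of objects with `.page_content` and `.metadata`
    (e.g. LangChain Documents). Repeated identical chunks get suffixes -1, -2, ...
    """
    ids, seen = [], Counter()
    for chunk in chunks:
        key = "|".join([
            str(chunk.metadata.get("source", "")),
            str(chunk.metadata.get("page", "")),
            chunk.page_content,
        ])
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        seen[digest] += 1
        ids.append(f"{digest}-{seen[digest]}")
    return ids

# Usage with hypothetical stand-in chunks (in the notebook you would pass `splits`).
demo = [SimpleNamespace(page_content=t, metadata={"source": "docs/a.pdf", "page": 0})
        for t in ["leave policy", "drug testing", "leave policy"]]
demo_ids = stable_chunk_ids(demo)
print(demo_ids)
```

In the notebook, these IDs would be passed as `Chroma.from_documents(documents=splits, embedding=embeddings, ids=stable_chunk_ids(splits), persist_directory=persist_directory)`. This avoids duplicates only if the installed Chroma integration writes with upsert semantics, which recent `langchain_community` versions do to our knowledge; check your version before relying on it.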


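## Simple Retrieval
Let's look at simple retrieval from the vector store. Called without arguments, `as_retriever()` runs a similarity search and returns the top 4 chunks.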

In [None]:
retriever = vectorstore.as_retriever()
example_query = "tell me about drug testing" #@param {type:"string"}
retrieved_docs = retriever.invoke(example_query)
print(f"\nExample Retrieval for query: '{example_query}'")
print(f"Retrieved {len(retrieved_docs)} documents:")
for i, doc in enumerate(retrieved_docs):
    print(f"--- Document {i+1} ---")
    print(doc.page_content[:200] + "...")  # Print the first 200 characters
    print(f"Source: {doc.metadata.get('source', 'N/A')}")

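## LLM Setup
We run Llama 2 7B Chat locally with ctransformers, using a quantized GGML version of the model downloaded from Hugging Face. Expect generation to be slow on a CPU-only runtime.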

In [None]:
from langchain_community.llms import CTransformers
from langchain.chains import RetrievalQA

In [None]:
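llm = CTransformers(
    model="TheBloke/Llama-2-7B-Chat-GGML",  # Hugging Face repo ID; pass model_file=... to pick a specific quantized file
    model_type="llama",  # Model architecture, used by ctransformers to load the weights
    config={'max_new_tokens': 1024, 'temperature': 0.1}  # Low temperature for more deterministic answers
)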
print("Local LLM model loaded")

## RAG chain

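In [None]:
# Return the top 3 chunks for each query (the default is 4).
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
print(f"Retriever configured to return top {retriever.search_kwargs['k']} documents.")

In [None]:
# Create a RetrievalQA chain that uses the retriever configured above.
# Build the chain after configuring the retriever; otherwise it keeps using the old one.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
print("RetrievalQA chain created.")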
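With `chain_type` left at its default (`"stuff"`), `RetrievalQA` performs step 7 of the overview: it joins the retrieved chunks and inserts them, with the question, into a single prompt. The sketch below approximates that prompt; the template wording is, to our knowledge, LangChain's default for non-chat LLMs such as `CTransformers`, and `stuff_prompt` is a hypothetical helper, not a LangChain function.

```python
# Approximation of the default "stuff" QA prompt for non-chat LLMs (to our knowledge).
DEFAULT_QA_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

def stuff_prompt(question, chunks, separator="\n\n"):
    """Join the retrieved chunk texts and fill them, with the question, into the template."""
    return DEFAULT_QA_TEMPLATE.format(context=separator.join(chunks), question=question)

# Hypothetical retrieved chunks for illustration.
print(stuff_prompt("how many holidays do I get?",
                   ["Staff receive 25 days of annual leave.", "Bank holidays are additional."]))
```

To see the exact template your installed version uses, `print(qa_chain.combine_documents_chain.llm_chain.prompt.template)` works in the LangChain versions we are aware of.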

## Answer Query

In [None]:
query1 = "how many holidays do I get?" #@param {type:"string"}
print(f"\nRunning Query 1: '{query1}'")
response1 = qa_chain.invoke({"query": query1})
print("\n--- Response ---")
print(response1['result'])
if 'source_documents' in response1:
    print("\n--- Sources ---")
    for i, doc in enumerate(response1['source_documents']):
        print(f"Source {i+1}: {doc.metadata.get('source', 'N/A')} (page {doc.metadata.get('page', 'N/A')})")

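## Callback Handlers
Callback handlers let you hook into the stages of a chain's execution, for example to see when the LLM is invoked and exactly what prompt it receives. `StdOutCallbackHandler` prints these events to standard output; for this chain, that typically includes each chain starting and finishing and the fully formatted prompt.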

In [None]:
from langchain.callbacks import StdOutCallbackHandler

# Create an instance of the StdOutCallbackHandler
handler = StdOutCallbackHandler()

# --- Run your queries, but pass the handler via the 'config' parameter ---

print(f"\nRunning Query 1: '{query1}' with callback tracing...")

# Use the .invoke() method and pass callbacks in the config dictionary
response1 = qa_chain.invoke(
    {"query": query1},
    config={"callbacks": [handler]} # <-- Add this config
)

print("\n--- Final Response 1 ---")
print(response1['result'])
if 'source_documents' in response1:
    print("\n--- Sources ---")
    for i, doc in enumerate(response1['source_documents']):
        print(f"Source {i+1}: {doc.metadata.get('source', 'N/A')} (page {doc.metadata.get('page', 'N/A')})")

print("-" * 30) # Separator

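# Using the Gemini LLM
Instead of a local model, we can send the retrieved context to Google's Gemini API. The API key is read from Colab Secrets.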

In [None]:
!pip install -q -U google-genai  # Install or update google-genai

from google.colab import userdata
from google import genai

# Read the API key from Colab Secrets (stored under the name 'Google_API')
GOOGLE_API_KEY = userdata.get('Google_API')
client = genai.Client(api_key=GOOGLE_API_KEY)
MODEL = "gemini-2.0-flash"

In [None]:
def answer_with_gemini(query):
    """
    Retrieves semantically similar chunks and uses Gemini to answer the query.

    Args:
        query (str): The user's question.

    Returns:
        str: Gemini's answer to the question.
    """

    # 1. Retrieval of semantically similar chunks:
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
    retrieved_docs = retriever.invoke(query)

    # 2. Construct the prompt for Gemini:
    # Separate chunks with blank lines so their text does not run together.
    context = "\n\n".join(doc.page_content for doc in retrieved_docs)

    # System instructions for summarizing and structuring
    system_instructions = """
    You are a helpful and informative AI assistant.
    Summarize and structure your response based on the provided context,
    specifically addressing the user's query.
    If the context does not contain the answer, state that you don't know.
    Do not fabricate an answer.
    """

    prompt = f"""{system_instructions}

    Context:
    {context}

    Question:
    {query}
    """

    # 3. Generate the answer using Gemini:
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt
    )

    return response.text
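In `answer_with_gemini`, the prompt contains only the raw chunk text. A hypothetical helper such as `format_context` could replace step 2 so that each chunk carries its source file and page, which makes it possible to ask Gemini to say which chunk supports its answer. This is a sketch, not part of LangChain or the Gemini SDK; it only assumes objects with `.page_content` and `.metadata`, as LangChain `Document`s have.

```python
from types import SimpleNamespace

def format_context(docs):
    """Number each retrieved chunk and label it with its source file and page."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "N/A")
        page = doc.metadata.get("page", "N/A")
        blocks.append(f"[{i}] (source: {source}, page: {page})\n{doc.page_content}")
    return "\n\n".join(blocks)

# Usage with hypothetical stand-ins for LangChain Documents:
demo_docs = [
    SimpleNamespace(page_content="Medical cover starts after probation.",
                    metadata={"source": "docs/benefits.pdf", "page": 2}),
    SimpleNamespace(page_content="Dependants can be added to the policy.",
                    metadata={"source": "docs/benefits.pdf", "page": 3}),
]
print(format_context(demo_docs))
```

Inside `answer_with_gemini`, the call would be `context = format_context(retrieved_docs)`, optionally with an extra system instruction asking Gemini to cite the chunk numbers it relied on.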

In [None]:
# Example Usage:
user_question = "Tell me about medical insurance" #@param {type:"string"}
answer = answer_with_gemini(user_question)
print(f"Answer: {answer}")

# Your turn
Visit https://warwick.ac.uk/news, save a few articles as PDF documents, and build a RAG system based on them.