## RAG - Retrieval Augmented Generation

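RAG improves a language model's answers by retrieving relevant external information at query time and adding it to the prompt, so the model can ground its response in that context instead of relying only on what it learned during training.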

Components:
1. Indexing: Build external knowledge base
    - Document Ingestion
    - Text Chunking
    - Embedding Generation
    - Storage in Vector Store Database
2. Retrieval: Extract relevant information for a query
    - Generate Embedding vector of the query
    - Search
    - Ranking
    - Return top K
3. Augmentation: Prompt creation using Query + Context (from the retrieval step)
4. Generation: LLM generates response based on the augmented prompt.
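
The four components can be sketched end to end in plain Python. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, the three sample sentences stand in for ingested chunks, and the generation step is left out because it needs a model call. All names here (`embed`, `cosine`, `retrieve`, `build_prompt`) are assumptions made for this sketch.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 1. Indexing: each sample sentence plays the role of one chunk
documents = [
    "DeepMind is an AI research lab.",
    "Nuclear fusion powers the sun.",
    "FAISS is a library for vector similarity search.",
]
index = [(doc, embed(doc)) for doc in documents]


# 2. Retrieval: embed the query, rank chunks by similarity, return the top k
def retrieve(query, k=2):
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# 3. Augmentation: combine the query with the retrieved context
def build_prompt(query, context_docs):
    context = "\n".join(context_docs)
    return f"Answer only from the context.\n\nContext:\n{context}\n\nQuestion: {query}"


query = "What is DeepMind?"
top_docs = retrieve(query, k=1)
prompt_text = build_prompt(query, top_docs)
print(prompt_text)  # 4. Generation would send this prompt to an LLM
```

The rest of this notebook replaces each toy piece with a real one: OpenAI embeddings, a FAISS vector store, and a chat model.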

### Improvement areas - Advanced RAG
1. UI-based enhancements

2. Evaluation
    1. Ragas [faithfulness, answer_relevancy, context_precision, context_recall]
    2. LangSmith

3. Indexing
    1. Document Ingestion
    2. Text Splitting
    3. Vector Store

4. Retrieval
    1. Pre-Retrieval
        1. Query re-writing using LLM
        2. Multi query generation
        3. Domain-aware routing (using different retrievers for different query domains)
    2. During Retrieval
        1. MMR (Maximal Marginal Relevance)
        2. Hybrid Retrieval
        3. Reranking
    3. Post Retrieval
        1. Contextual Compression

5. Augmentation
    1. Prompt Templating
    2. Answer grounding
    3. Context window optimisation

6. Generation
    1. Answer with citation
    2. Guardrails

7. System Design
    1. Multimodal
    2. Agentic
    3. Memory Based
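
Of the retrieval-time techniques listed above, MMR (Maximal Marginal Relevance) is the easiest to show in a few lines. The idea is to pick documents one at a time, scoring each candidate by its relevance to the query minus its similarity to the documents already picked. The sketch below is a minimal plain-Python version under that definition; the function names and the toy 3-dimensional vectors are assumptions for illustration, not LangChain's implementation. In LangChain the same behaviour is normally requested with `search_type="mmr"` on `as_retriever`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance.

    lambda_mult=1.0 means pure relevance; lower values favour diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalise similarity to anything already selected
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors: documents 0 and 1 are near-duplicates, document 2 covers another aspect
query_vec = [1.0, 1.0, 0.0]
doc_vecs = [
    [1.0, 0.02, 0.0],
    [1.0, 0.05, 0.0],
    [0.0, 1.0, 0.0],
]

diverse = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5)
relevance_only = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=0.5` the second pick is document 2 rather than the near-duplicate document 0, which is what a plain similarity ranking (`lambda_mult=1.0`) would return.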

### Importing libraries

In [None]:
import os
import sys

from dotenv import load_dotenv
load_dotenv()

# os.environ['HF_HOME']="/Users/nikhil20.sharma/Desktop/langchain/.cache"
os.environ['HF_HOME']="/Users/nikhil20.sharma/Desktop/hf-cache"

# Confirm that the expected API keys were loaded, printing only masked values
print("Loaded Environment Variables:")
for key, value in os.environ.items():
    if key in ['OPENAI_API_KEY', 'LANGSMITH_API_KEY', 'HUGGINGFACE_TOKEN']:
        # Mask sensitive values for security
        masked_value = value[:8] + "..." + value[-4:] if value else value
        print(f"- {key}: {masked_value}")

In [None]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

### Step 1a - Indexing (Document Ingestion)

In [None]:
video_id = "Gfr50f6ZBvo" # only the ID, not full URL
try:
    # Request the English transcript; each returned segment is a dict with a "text" field
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])

    # Flatten it to plain text
    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    print(transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")

### Step 1b - Indexing (Text Splitting)

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [None]:
len(chunks)

In [None]:
chunks[100]
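
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width splitter in plain Python. It is only a sketch: `RecursiveCharacterTextSplitter` also tries to break on natural separators (paragraphs, lines, spaces) before falling back to hard cuts, which this version does not attempt. The function name `split_with_overlap` and the alphabet sample are assumptions for illustration.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

Each piece starts with the last four characters of the previous one. The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.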

### Step 1c & 1d - Indexing (Embedding Generation and Storing in Vector Store)

In [None]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(chunks, embeddings)

In [None]:
vector_store.index_to_docstore_id

In [None]:
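# Document IDs are random UUIDs generated on every run, so look one up instead of hard-coding it
first_id = vector_store.index_to_docstore_id[0]
vector_store.get_by_ids([first_id])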
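
The two attributes inspected above hint at how the store is organised: vectors live at integer positions, and a mapping links each position to a document ID. The sketch below mimics that layout with a brute-force L2 search. It is a toy model under the assumption of a flat (exhaustive) index; `TinyVectorStore` and the 2-dimensional vectors are made up for illustration and are not the FAISS or LangChain API.

```python
import math
import uuid


class TinyVectorStore:
    """Toy in-memory store: exhaustive L2 search over the stored vectors."""

    def __init__(self):
        self.vectors = []               # position -> vector
        self.index_to_docstore_id = {}  # position -> document ID
        self.docstore = {}              # document ID -> text

    def add(self, text, vector):
        doc_id = str(uuid.uuid4())      # a fresh random ID on every run
        position = len(self.vectors)
        self.vectors.append(vector)
        self.index_to_docstore_id[position] = doc_id
        self.docstore[doc_id] = text
        return doc_id

    def search(self, query_vector, k=1):
        # Compare the query against every stored vector and keep the k closest
        distances = sorted(
            (math.dist(query_vector, vec), pos) for pos, vec in enumerate(self.vectors)
        )
        return [self.docstore[self.index_to_docstore_id[pos]] for _, pos in distances[:k]]


store = TinyVectorStore()
fusion_id = store.add("chunk about fusion", [0.9, 0.1])
store.add("chunk about chess", [0.1, 0.9])

nearest = store.search([0.8, 0.2], k=1)
print(nearest)
```

Because the IDs are generated at insert time, an ID copied from a previous run will not match anything after the index is rebuilt.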

### Step 2 - Retrieval

In [None]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [None]:
retriever.invoke('What is deepmind')

### Step 3 - Augmentation

In [None]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)

In [None]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)

In [None]:
question = "is the topic of nuclear fusion discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [None]:
retrieved_docs

In [None]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

In [None]:
final_prompt = prompt.invoke({"context": context_text, "question": question})
final_prompt

### Step 4 - Generation

In [None]:
answer = llm.invoke(final_prompt)
print(answer.content)

## Building a Chain

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [None]:
def format_docs(retrieved_docs):
    """Join the page content of the retrieved documents into one context string."""
    context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
    return context_text

In [None]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})

In [None]:
parallel_chain.invoke('who is Demis')

In [None]:
parser = StrOutputParser()

In [None]:
main_chain = parallel_chain | prompt | llm | parser

In [None]:
main_chain.invoke('Can you summarize the video')
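
The `|` composition used in `main_chain` can be demystified with a small plain-Python model. This is a conceptual sketch only, not how LangChain implements runnables: `Step`, `parallel`, and the fake retriever and LLM are hypothetical stand-ins so the example runs without API keys.

```python
class Step:
    """Minimal stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # (a | b).invoke(x) is b.invoke(a.invoke(x))
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(steps):
    """Run every step on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in steps.items()})


# Hypothetical stand-ins for the retriever, formatter, prompt and model
fake_retriever = Step(lambda question: ["chunk A", "chunk B"])
join_chunks = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"{d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt_text: f"ANSWER<{prompt_text}>")

toy_parallel = parallel({"context": fake_retriever | join_chunks, "question": passthrough})
toy_chain = toy_parallel | fill_prompt | fake_llm

result = toy_chain.invoke("who is Demis")
print(result)
```

The parallel step is what lets a single input string fan out into both the `context` and `question` fields that the prompt expects.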