## Day 2: Introduction to RAG
## 🔍 1. Dense vs Sparse Retrieval (Theory)

Sparse Retrieval (e.g., BM25)

    Technique: Traditional keyword-based matching.

    Representation: Documents are represented as sparse vectors (e.g., TF-IDF).

    Strength: Fast and interpretable.

    Limitation: Struggles with semantic understanding (e.g., "car" vs "automobile").
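
The "car" vs "automobile" limitation can be seen with a minimal TF-IDF sketch. This is a simplified, standard-library illustration (whitespace tokenization, a smoothed IDF), not BM25 itself; the helper names `tfidf_vectors` and `cosine` and the toy corpus are made up for this example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps every weight positive, even for very common terms
        vectors.append({t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = ["car", "automobile", "car rental"]
vecs = tfidf_vectors(docs)

synonym_score = cosine(vecs[0], vecs[1])  # 0.0: no shared keyword, despite the same meaning
keyword_score = cosine(vecs[0], vecs[2])  # positive: both contain the token "car"
print(synonym_score, keyword_score)
```

Only the non-zero terms are stored, which is what makes the representation sparse, and the score is easy to interpret because each matching term contributes directly to it.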
  
Dense Retrieval (e.g., using embeddings)

    Technique: Neural models encode text into dense vectors (fixed-size, floating-point vectors).

    Representation: Similar meanings result in closer vectors in embedding space.

    Strength: Captures semantics well.

    Limitation: Requires vector similarity search (e.g., FAISS).
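
The idea that similar meanings land close together can be sketched with cosine similarity. The 3-dimensional vectors below are hypothetical values invented for illustration; real embeddings come from a trained model and have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (made up, not produced by any model)
toy_embeddings = {
    "car": [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.00, 0.20, 0.95],
}

sim_synonyms = cosine_similarity(toy_embeddings["car"], toy_embeddings["automobile"])
sim_unrelated = cosine_similarity(toy_embeddings["car"], toy_embeddings["banana"])
print(f"car vs automobile: {sim_synonyms:.3f}")
print(f"car vs banana:     {sim_unrelated:.3f}")
```

Unlike the keyword approach, the two synonyms score high here even though they share no tokens, because the comparison happens in vector space.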

Use Dense Retrieval when:

    You need semantic similarity.

    You have complex queries or open-domain QA.

## 🤖 2. Text Embedding using Sentence Transformers

Popular models:

    all-MiniLM-L6-v2 – Fast, with a good speed/quality trade-off; produces 384-dimensional embeddings.

    multi-qa-MiniLM-L6-cos-v1 – Optimized for question-answer (semantic search) tasks.

In [None]:
# !pip install sentence-transformers

In [1]:
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Example texts
texts = [
    "What is machine learning?",
    "Machine learning is a subfield of artificial intelligence."
]

# Get embeddings
embeddings = model.encode(texts)

print(embeddings.shape)  # (2, 384)



(2, 384)


## 🧠 3. Indexing with FAISS for Similarity Search

FAISS is a library for efficient similarity search of dense vectors.

    Flat Index: Exact nearest neighbors.

    IVF / HNSW: Approximate, faster on large data.
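
Conceptually, a flat index compares the query against every stored vector and keeps the closest ones. The sketch below illustrates that idea in plain Python; it is not how FAISS is implemented, and `flat_search` is a hypothetical helper name.

```python
import heapq

def squared_l2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def flat_search(vectors, query, k):
    """Exact k-nearest-neighbor search: score every vector, keep the k smallest distances."""
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(vectors))
    nearest = heapq.nsmallest(k, scored)
    distances = [d for d, _ in nearest]
    indices = [i for _, i in nearest]
    return distances, indices

vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = flat_search(vectors, [0.9, 0.1], k=2)
print("Distances:", distances)
print("Indices:", indices)  # vector 1 is closest, then vector 0
```

The cost grows linearly with the number of stored vectors, which is why approximate indexes such as IVF or HNSW trade a little accuracy for speed on large collections.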

In [2]:
!pip install faiss-cpu



In [3]:
import faiss
import numpy as np

# Let's say embeddings is a NumPy array of shape (n, d)
# From previous section
d = embeddings.shape[1]
index = faiss.IndexFlatL2(d)  # L2 distance

# Add vectors to index
index.add(np.array(embeddings))

# Query vector
query = model.encode(["What is AI?"])

# Search
D, I = index.search(np.array(query), k=2)  # Top 2 results
print("Distances:", D)
print("Indices:", I)


Distances: [[0.8922303  0.93205845]]
Indices: [[1 0]]
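
A note on reading these numbers: `IndexFlatL2` reports squared Euclidean distances, so smaller means closer. If the embeddings are unit-normalized (an assumption here; whether a model normalizes its output depends on its configuration), squared L2 distance and cosine similarity are related by `||u - v||^2 = 2 - 2 * cos(u, v)`, so both give the same ranking. A small check with made-up vectors:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def squared_l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Arbitrary example vectors, normalized to unit length
u = normalize([1.0, 2.0, 3.0])
v = normalize([2.0, 0.5, 1.0])

cos_uv = dot(u, v)            # for unit vectors, the dot product is the cosine
dist_uv = squared_l2(u, v)

print(dist_uv, 2 - 2 * cos_uv)  # the two values agree up to floating-point error
```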


## 📄 4. Hands-On: Embed and Index a Document Corpus

In [4]:
# Prepare a Text Corpus
documents = [
    "Python is a popular programming language.",
    "Machine learning enables computers to learn from data.",
    "The capital of France is Paris.",
    "Natural Language Processing deals with text and language."
]


In [5]:
# Embed the corpus
doc_embeddings = model.encode(documents)

In [6]:
# Create a FAISS Index
index = faiss.IndexFlatL2(doc_embeddings.shape[1])
index.add(np.array(doc_embeddings))


In [7]:
# Query
query = "How do computers learn?"
query_vec = model.encode([query])

D, I = index.search(np.array(query_vec), k=2)

for idx in I[0]:
    print(f"Retrieved: {documents[idx]}")


Retrieved: Machine learning enables computers to learn from data.
Retrieved: Python is a popular programming language.


## RAG Application

### **Step 1:**
  Document loader: load the documents you want to query. LangChain offers different document loaders for different file formats (e.g., text files, PDFs). In this example, we load a single PDF file with the PyPDFLoader class, which returns one Document per page.

In [None]:
!pip install langchain-community
!pip install pypdf
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("/content/Rebanta_Aryal_CV.pdf")
docs = loader.load()
docs


In [2]:
from langchain_core.prompts import PromptTemplate

In [3]:
template = """
Answer the question based on the context below. If the context is not relevant, just reply "I don't know"

Context: {context}

Question: {question}
"""

prompt = PromptTemplate(template=template)
print(prompt.format(context = "Here is some context", question = "Here is a question"))


Answer the question based on the context below. If the context is not relevant, just reply "I don't know"

Context: Here is some context

Question: Here is a question



In [None]:
# Chat API using LangChain
# !pip install langchain-google-genai
from google.colab import userdata

In [15]:
from langchain_google_genai import ChatGoogleGenerativeAI
api_key = userdata.get('gemini_api_key')

llm = ChatGoogleGenerativeAI(
    google_api_key=api_key,
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

In [16]:
llm_chain = prompt | llm
response = llm_chain.invoke({"context": "The name of the college is NCE", "question": "what is the name of the college?"})
response.content

'NCE'

### **Step 2:** Load the documents, split them, and store them in Chroma


In [None]:
# !pip install langchain langchain_community langchain_chroma
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings


embeddings = GoogleGenerativeAIEmbeddings(google_api_key=api_key, model= "models/embedding-001")

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)
splits = text_splitter.split_documents(docs)
# print(splits)
vector_store = Chroma.from_documents(splits, embedding=embeddings)
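
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-size character splitter. It is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this hypothetical `split_with_overlap` helper does not do.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into chunks of at most chunk_size characters,
    where each chunk repeats the last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.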

### **Step 3:** Retrieve the relevant snippets from the document and generate an answer

In [19]:
#@ STEP 3: Retrieve and generate using the relevant snippets from the PDF.

#@ Let's create a prompt for the model
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()

template = """
Answer the question based on the context below. If the context is not relevant, just reply "Hmmm! I don't know"

context: {context}

question: {question}
"""

prompt = PromptTemplate(template=template)
# print(prompt.format(context="Here is the context", question="Here is the question"))

retriever = vector_store.as_retriever()

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
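    # Retrieve context for the question and pass the question through unchanged
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt  # the prompt defined in this cell (llm_chain still wraps the earlier prompt)
    | llm
    | parser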

)
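
The data flow of this chain (retrieve, format, fill the prompt, call the model) can be sketched in plain Python. Everything below is a stand-in invented for illustration: `retrieve` ranks by word overlap instead of using a vector store, `fake_llm` only reports the prompt size instead of calling a model, and `join_docs` plays the role of `format_docs` on plain strings.

```python
def retrieve(question, corpus, k=2):
    """Toy retriever: rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def join_docs(docs):
    """Concatenate retrieved snippets into one context string."""
    return "\n\n".join(docs)

def build_prompt(context, question):
    """Fill the context and question into a prompt string."""
    return f"context: {context}\n\nquestion: {question}\n"

def fake_llm(prompt_text):
    """Stand-in for the model call; a real chain would send prompt_text to an LLM."""
    return f"(model answer based on {len(prompt_text)} prompt characters)"

def rag_answer(question, corpus):
    docs = retrieve(question, corpus)       # 1. retrieve
    context = join_docs(docs)               # 2. format
    prompt_text = build_prompt(context, question)  # 3. fill the prompt
    return fake_llm(prompt_text)            # 4. generate

corpus = [
    "The name of the college is NCE.",
    "The capital of France is Paris.",
    "Python is a popular programming language.",
]
question = "What is the capital of France?"
top = retrieve(question, corpus, k=1)
print(top)
print(rag_answer(question, corpus))
```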

In [21]:
rag_chain.invoke("What is the skills set of rebanta aryal")

'**SKILLS HIGHLIGHT**\n*   **Programming Languages:** Python, C++, C\n*   **Machine Learning and AI:** Tensorflow, PyTorch, Keras, Scikit-learn, Hugging Face, Transformers\n*   **Natural Language Processing:** BERT, GPT, DistilBERT, Sentence Transformers, RASA\n*   **Large Language Models:** Fine-tuning, Prompt Engineering, Few shot Learning, GPT-3.5, GPT-4, Mistral Embed, gemini-1.5-flash, text-embedding-004.\n*   **Recommender Systems:** Collaborative Filtering, Content Based Filtering, Hybrid Approaches, Graph Neural Network (GNN)\n*   **Retrieval-Augmented Generation (RAG):** Basic RAG Implementation, Multi-vector retrieval, Rerank then Read\n*   **Vector Database:** ChromaDB, Pinecone, Milvus DB\n*   **Knowledge Graphs:** Neo4j, Nebula DB, Networkx, knowledge graph construction and querying\n*   **Web Technologies:** FastAPI, Flask, Streamlit\n*   **Data analysis and visualization:** Numpy, Matplotlib, Seaborn, Pandas\n*   **Database:** SQL, MongoDB\n*   **Version control:** GitHu