<a href="https://colab.research.google.com/github/amar-naik/generative-ai-for-beginners/blob/main/RAG_Demo_10_class.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [7]:
# ✅ Install Required Packages
!pip install transformers sentence-transformers faiss-cpu




In [8]:
# ✅ Prepare a Basic Knowledge Base: Sample Facts About Amar's Relatives
documents = [
    "Amar's father is Ramesh Naik.",
    "Amar's mother is Latha Naik.",
    "Amar has a younger sister named Sneha Naik.",
    "Amar's uncle is Dr. Suresh Naik.",
]


In [9]:
# ✅ Embed Documents with SentenceTransformers + Create FAISS Index
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Load sentence transformer
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Create embeddings
document_embeddings = embedder.encode(documents)

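# Create a flat (exact, brute-force) FAISS index that compares vectors by L2 distance
dimension = document_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
# FAISS expects a float32 array of shape (n_documents, dimension)
index.add(np.asarray(document_embeddings, dtype="float32"))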
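
To make the indexing step less of a black box, here is a minimal pure-Python sketch of what an exact (flat) L2 search does conceptually: score every stored vector against the query and keep the `k` closest. The helper names `squared_l2` and `flat_l2_search` and the 2-D toy vectors are hypothetical, invented only for illustration; FAISS performs the same kind of search far more efficiently on the real sentence embeddings, and `IndexFlatL2` likewise reports squared distances.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Exact nearest-neighbour search: score every vector, keep the k closest."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    distances = [d for d, _ in top]
    indices = [i for _, i in top]
    return distances, indices


# Toy 2-D "embeddings" (assumed values, not real model output)
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = flat_l2_search(toy_vectors, [0.9, 0.1], k=2)
print(indices)  # positions of the two closest toy vectors
```

The returned positions play the same role as the `I` array from `index.search`: they are row numbers that map back into the `documents` list.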



In [10]:
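# ✅ Define a Function to Retrieve Context
def retrieve_context(query, k=3):
    """Return the k documents whose embeddings are closest to the query."""
    k = min(k, len(documents))  # FAISS pads results with -1 when k exceeds the index size
    query_embedding = embedder.encode([query])
    _distances, indices = index.search(query_embedding, k)
    return [documents[i] for i in indices[0]]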


In [11]:
# ✅ Load a Tiny Language Model for Generation
from transformers import pipeline, set_seed

# Load small text-generation model
generator = pipeline("text-generation", model="distilgpt2")
set_seed(42)


Device set to use cpu


In [12]:
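# Ask the Question Without RAG (Fails)
# Note: max_length=50 has no effect in this run. As the warning in the output
# reports, max_new_tokens (=256) is also set and takes precedence.
question = "Who are Amar's relatives?"
print("🔴 Answer WITHOUT RAG:\n")
print(generator(question, max_length=50, do_sample=True)[0]['generated_text'])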


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🔴 Answer WITHOUT RAG:

Who are Amar's relatives?”


In [13]:
# ✅ Ask the Same Question WITH RAG (Retrieved Context Is Added to the Prompt)
# Retrieve context
context = retrieve_context(question)
context_str = "\n".join(context)

# Build final prompt
prompt = f"Context:\n{context_str}\n\nQuestion: {question}\nAnswer:"

print("\n🟢 Answer WITH RAG:\n")
print(generator(prompt, max_length=100, do_sample=True)[0]['generated_text'])


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



🟢 Answer WITH RAG:

Context:
Amar has a younger sister named Sneha Naik.
Amar's uncle is Dr. Suresh Naik.
Amar's father is Ramesh Naik.

Question: Who are Amar's relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr. Suresh Naik.
Question: Do you have any relatives?
Answer:
Amar's cousin is Dr.
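
The output above echoes the prompt and then loops through invented "Question:/Answer:" pairs. One possible way to tidy this up is to post-process the generated text. The sketch below is an assumption-laden illustration, not part of the original notebook: `build_prompt` and `extract_answer` are hypothetical helpers, and the `"\nQuestion:"` stop marker is chosen only because it matches the prompt layout used here. It removes the echoed prompt, cuts the text at the first follow-up question, and drops repeated lines. It does not make the answer correct: the model still says "cousin" where the context says "uncle".

```python
def build_prompt(context_docs, question):
    """Assemble the same prompt layout used in the RAG cell."""
    context_str = "\n".join(context_docs)
    return f"Context:\n{context_str}\n\nQuestion: {question}\nAnswer:"


def extract_answer(generated_text, prompt, stop_marker="\nQuestion:"):
    """Strip the echoed prompt, cut at the first follow-up question, dedupe lines."""
    # The generated text starts with the prompt itself, so remove it first.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Keep only the text before the model starts inventing a new question.
    completion = completion.split(stop_marker, 1)[0]
    # Drop verbatim repeated lines while preserving their order.
    seen, lines = set(), []
    for line in completion.strip().splitlines():
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            lines.append(line)
    return "\n".join(lines)


# Replay a shortened copy of the recorded output through the helper.
docs = [
    "Amar has a younger sister named Sneha Naik.",
    "Amar's uncle is Dr. Suresh Naik.",
    "Amar's father is Ramesh Naik.",
]
prompt = build_prompt(docs, "Who are Amar's relatives?")
raw = (
    prompt
    + "\nAmar's cousin is Dr. Suresh Naik."
    + "\nAmar's cousin is Dr. Suresh Naik."
    + "\nQuestion: Do you have any relatives?\nAnswer:\nAmar's cousin is Dr."
)
answer = extract_answer(raw, prompt)
print(answer)
```

In the notebook, the same helper could be applied to `generator(prompt, ...)[0]['generated_text']`. A small model like distilgpt2 is not tuned to follow instructions, so cleaner formatting is about the most this kind of post-processing can offer.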
