<a href="https://colab.research.google.com/github/Sirisha044/Rag_assign/blob/main/RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -q transformers faiss-cpu                 # fast vector similarity search


**Ingestion**

In [None]:
sample_text = """
Albert Einstein was a theoretical physicist who developed the theory of relativity,
one of the two pillars of modern physics (alongside quantum mechanics). His work is also
known for its influence on the philosophy of science. He is best known to the general public
for his mass–energy equivalence formula E = mc².
"""


**Embedding**

In [None]:
from transformers import AutoTokenizer, AutoModel
import torch                                             # PyTorch: runs the model's forward pass on the tokenized input
import numpy as np

# Load model for embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def get_embedding(text):
    """Return a sentence-level embedding for one string as a 1-D NumPy array."""
    tokens = tokenizer(text, return_tensors='pt', truncation=True, padding=True)
    with torch.no_grad():                                            # inference only, no gradients are tracked
        output = model(**tokens)
    # Mean-pool the token embeddings, drop the batch dimension, convert to NumPy
    return output.last_hidden_state.mean(dim=1).squeeze().numpy()
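
Because `get_embedding` receives one string at a time, `padding=True` adds no padding and a plain mean over tokens is fine. If several texts were batched together, padded positions would leak into the average. A minimal sketch of mask-aware mean pooling, written with plain Python lists instead of tensors; `masked_mean_pool` and the toy values are hypothetical illustrations, not part of the notebook:

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask entry is 1."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy input: two real tokens followed by one padding position
token_vectors = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
attention_mask = [1, 1, 0]
pooled = masked_mean_pool(token_vectors, attention_mask)
print(pooled)
```

With tensors, the same idea is usually expressed by multiplying `last_hidden_state` by the expanded attention mask and dividing by the mask sum.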



**Retrieval**

In [None]:
import faiss

# The sample text is short, so it is kept as a single chunk
chunks = [sample_text]

# Create embeddings for chunks
embeddings = [get_embedding(chunk) for chunk in chunks]

# Create FAISS index
dim = len(embeddings[0])
index = faiss.IndexFlatL2(dim)
index.add(np.array(embeddings, dtype="float32"))  # FAISS expects a float32 matrix of shape (n_chunks, dim)
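
`faiss.IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances. The idea can be sketched in plain Python; `l2_search` is a hypothetical helper for illustration, not a FAISS API:

```python
def l2_search(vectors, query, top_k):
    """Return (distances, indices) of the top_k vectors closest to query."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), i)  # squared L2 distance
        for i, vec in enumerate(vectors)
    )
    top = scored[:top_k]
    return [dist for dist, _ in top], [i for _, i in top]


vectors = [[0, 0], [1, 1], [5, 5]]
distances, indices = l2_search(vectors, query=[1, 2], top_k=2)
print(distances, indices)
```

FAISS does the same comparison against every stored vector, only in optimized native code, which is why a flat index needs no training step.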


In [None]:
from transformers import pipeline

# Load a generator model (keep it small for Colab)
qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-small")

def retrieve_and_answer(query, top_k=1):
    query_embedding = get_embedding(query).reshape(1, -1)   # FAISS expects a 2-D array of shape (1, dim)
    _, indices = index.search(query_embedding, top_k)       # indices of the most relevant chunks
    retrieved_texts = [chunks[i] for i in indices[0]]       # look up the matching chunks in the document

    context = " ".join(retrieved_texts)                     # merge the chunks into one block of background knowledge
    prompt = f"Context: {context} \n\nQuestion: {query}\nAnswer:"

    # max_new_tokens caps the answer length; do_sample=False gives the same output for the same input
    result = qa_pipeline(prompt, max_new_tokens=100, do_sample=False)
    return result[0]['generated_text']



Device set to use cpu
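
One thing to watch in `retrieve_and_answer`: when `top_k` exceeds the number of indexed vectors, FAISS pads the result with `-1`, and `chunks[-1]` would then silently return the last chunk. A minimal sketch of a guard, assuming a hypothetical helper `build_prompt` that uses the same Context/Question/Answer layout:

```python
def build_prompt(query, chunks, indices):
    """Build the generator prompt from valid retrieval indices only."""
    valid = [i for i in indices if 0 <= i < len(chunks)]  # drop FAISS's -1 padding
    context = " ".join(chunks[i] for i in valid)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"


prompt = build_prompt("Who is Einstein?", ["chunk A", "chunk B"], [1, -1])
print(prompt)
```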


In [None]:
question = "Who is Einstein?"
answer = retrieve_and_answer(question)
print("Q:", question)
print("A:", answer)



Q: Who is Einstein?
A: a theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics).


In [None]:
#from google.colab import files

# uploaded = files.upload()
# pdf_path = list(uploaded.keys())[0]


In [None]:
# !pip install -q pymupdf

# import fitz  # PyMuPDF

# def extract_text_from_pdf(pdf_path):
#     doc = fitz.open(pdf_path)
#     text = ""
#     for page in doc:
#         text += page.get_text()
#     return text

# document_text = extract_text_from_pdf("sample-pdf-file.pdf")
# print(document_text[:1000])  # Print preview


Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been 
the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of 
type and scrambled it to make a type specimen book. It has survived not only five centuries, but also 
the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 
1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with 
desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. 
 
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been 
the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of 
type and scrambled it to make a type specimen book. It has survived not only five centuries, but also 
the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 
1960s with the relea

In [None]:
!pip install -q pymupdf faiss-cpu transformers sentence-transformers



In [None]:
from google.colab import files

# Upload a PDF from your machine; files.upload() returns a dict keyed by filename
uploaded = files.upload()
pdf_path = list(uploaded.keys())[0]



Saving Top 50 LLM Interview Questions.pdf to Top 50 LLM Interview Questions.pdf


In [None]:
import fitz  # PyMuPDF

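def extract_text_from_pdf(pdf_path):
    """Return the plain text of every page in the PDF, concatenated in page order."""
    with fitz.open(pdf_path) as doc:  # the context manager closes the file afterwards
        return "".join(page.get_text() for page in doc)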

document_text = extract_text_from_pdf(pdf_path)
print(document_text[:1000])  # Preview first 1000 characters


TOP
TOP
TOP
50 LLM
50 LLM
50 LLM
Bhavishya Pandit
Interview Questions
Interview Questions
Q1. What is tokenization, and why is it important in LLMs?
Ans - Tokenization is the process of splitting text into smaller units
called tokens, which can be words, subwords, or even characters. For
instance, the word “tokenization” might be broken down into smaller
subwords like “token” and “ization.” This step is crucial because LLMs
do not understand raw text directly. Instead, they process sequences
of numbers that represent these tokens. 
Effective tokenization allows models to handle various languages,
manage rare words, and reduce the vocabulary size, which improves
both efficiency and performance.
Image Source: Cognitive Class
Bhavishya Pandit
Q2. What is LoRA and QLoRA?
Ans - LoRA and QLoRA are techniques designed to optimize the fine-
tuning of Large Language Models (LLMs), focusing on reducing
memory usage and enhancing efficiency without compromising
performance in Natural Language Pro
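
The preview shows some lines repeated back to back (`TOP`, `50 LLM`), which is likely an artifact of how the PDF renders its title. A minimal sketch for collapsing consecutive duplicate lines before chunking; `drop_repeated_lines` is a hypothetical helper, and it does not remove non-adjacent repeats such as the page header:

```python
from itertools import groupby


def drop_repeated_lines(text):
    """Collapse runs of identical consecutive lines into a single line."""
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(key for key, _ in groupby(lines))


raw = "TOP\nTOP\nTOP\n50 LLM\n50 LLM\nQ1. What is tokenization?"
cleaned = drop_repeated_lines(raw)
print(cleaned)
```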

In [None]:
import textwrap

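def chunk_text(text, chunk_size=300):
    # textwrap.wrap replaces newlines and tabs with spaces and splits on word
    # boundaries, so each chunk is at most chunk_size characters long
    return textwrap.wrap(text, width=chunk_size)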

chunks = chunk_text(document_text)
print(f"Total chunks: {len(chunks)}")
print("Sample chunk:", chunks[0])


Total chunks: 125
Sample chunk: TOP TOP TOP 50 LLM 50 LLM 50 LLM Bhavishya Pandit Interview Questions Interview Questions Q1. What is tokenization, and why is it important in LLMs? Ans - Tokenization is the process of splitting text into smaller units called tokens, which can be words, subwords, or even characters. For instance,
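
Fixed-width chunks can cut an answer off from its question, as the sample chunk ending in "For instance," shows. One common alternative is to let neighbouring chunks overlap. A minimal sketch, assuming word-based sizes; `chunk_words` and its `overlap` parameter are hypothetical and not part of the notebook:

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of chunk_size words that share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this chunk reached the end
            break
        start += step
    return chunks


demo_chunks = chunk_words("a b c d e f g", chunk_size=3, overlap=1)
print(demo_chunks)
```

The overlap trades a larger index for a better chance that a question and its answer land in the same chunk.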


In [None]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

embed_model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed chunks
embeddings = embed_model.encode(chunks)

# Create FAISS index
dim = embeddings[0].shape[0]
index = faiss.IndexFlatL2(dim)
index.add(np.array(embeddings))
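
`IndexFlatL2` ranks chunks by Euclidean distance. If the embeddings are scaled to unit length, that ranking matches cosine similarity, because the squared distance between unit vectors equals 2 minus twice their dot product. A small self-check in plain Python; the helper names are hypothetical:

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


a = normalize([2.0, 0.0])
b = normalize([3.0, 4.0])
# For unit vectors: ||a - b||^2 == 2 - 2 * (a . b)
print(squared_l2(a, b), 2 - 2 * dot(a, b))
```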


In [None]:
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")


Device set to use cpu


In [None]:
def retrieve_and_answer(query, top_k=2):
    # Embed the query
    query_embedding = embed_model.encode([query])

    # Search the index; the returned distances are not used here
    _, indices = index.search(np.array(query_embedding), top_k)

    # Get relevant chunks
    context = " ".join([chunks[i] for i in indices[0]])

    # Generate answer; max_new_tokens caps the answer length and do_sample=False keeps it deterministic
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    response = generator(prompt, max_new_tokens=100, do_sample=False)

    return response[0]['generated_text']
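
`index.search` also returns the distance of each hit, which can be used to discard weak matches before they reach the generator. A minimal sketch, assuming a hypothetical helper `filter_hits` and a hand-picked `max_distance` that would need tuning per embedding model:

```python
def filter_hits(distances, indices, max_distance):
    """Keep only the indices whose distance is within max_distance."""
    return [i for d, i in zip(distances, indices) if d <= max_distance]


# Toy values standing in for one row of index.search output
kept = filter_hits([0.4, 1.3, 2.0], [7, 2, 5], max_distance=1.5)
print(kept)
```

If no chunk passes the threshold, the caller can return a fallback such as "not found in the document" rather than prompting with unrelated context.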


In [None]:
query = "What is LLM?"
answer = retrieve_and_answer(query)

print("Q:", query)
print("A:", answer)




Q: What is LLM?
A: LLMs can handle a wide range of tasks, from answering questions and summarizing text to performing translations and even creative writing
