## Dependencies

In [2]:
!pip install -U transformers sentence-transformers
!pip install PyPDF2
!pip install faiss-cpu
!pip install google-generativeai


## Retrieving Data From the Source

We use the book Modern C by Jens Gustedt (`modernC.pdf`) for demonstration.

In [3]:
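# Extract the text of every page of the PDF
from PyPDF2 import PdfReader

reader = PdfReader('modernC.pdf')
text = ""
for page in reader.pages:
    # Guard against a page that yields no extractable text
    text += page.extract_text() or ""

# Normalize: lowercase everything and turn tabs into spaces
text = text.lower()
text = text.replace('\t', " ")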
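The extracted text still carries PDF layout residue, for example words hyphenated across line breaks (`un-` / `derstanding` shows up in the retrieved context later on). A minimal sketch of a slightly more thorough cleanup follows; `clean_extracted_text` is a hypothetical helper, not part of the notebook, and it assumes every hyphen directly before a newline is a line-break hyphen, which will occasionally merge a genuinely hyphenated word.

```python
import re

def clean_extracted_text(raw: str) -> str:
    """Normalize text pulled from a PDF (hypothetical helper, not in the notebook)."""
    # Rejoin words split across a line break: "un-\nderstanding" -> "understanding".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space; newlines are kept.
    text = re.sub(r"[ \t]+", " ", text)
    return text.lower()

sample = "Hurdles to un-\nderstanding\tPointers  in C"
print(clean_extracted_text(sample))  # hurdles to understanding pointers in c
```

Running this before chunking should give the embedder cleaner tokens, though whether it changes retrieval quality on this book is untested here.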

## Breaking the Text into Chunks

In [4]:
chunk_size = 300
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
print(len(chunks))

3395
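Fixed 300-character windows cut words and sentences wherever the boundary happens to fall, which is why the retrieved context later begins mid-word. One common mitigation is to let neighboring chunks overlap. The sketch below is an assumption rather than what the notebook runs; `chunk_text` and its `overlap` parameter are hypothetical names, and `overlap=0` reproduces the slicing used above.

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window start advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

demo = "abcdefghij" * 10  # 100 characters
pieces = chunk_text(demo, chunk_size=40, overlap=10)
print(len(pieces), [len(p) for p in pieces])  # 4 [40, 40, 40, 10]
```

Overlap increases the number of chunks (and so the embedding time) in exchange for a better chance that a sentence appears whole in at least one chunk.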


## Embedding the Chunks

In [6]:
from sentence_transformers import SentenceTransformer
from tqdm import tqdm

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = []

for chunk in tqdm(chunks, desc="Embedding chunks"):
    vec = embedder.encode(chunk)   # encode ONE chunk
    embeddings.append(vec)


Embedding chunks: 100%|██████████| 3395/3395 [00:21<00:00, 157.77it/s]


## Using FAISS as the Vector Store

In [15]:
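import faiss
import numpy as np

# FAISS expects a 2-D float32 array of shape (n_chunks, d)
embeddings = np.array(embeddings, dtype='float32')
d = embeddings.shape[1]  # embedding dimension

# Flat (exhaustive) index that scores by inner product
index = faiss.IndexFlatIP(d)
index.add(embeddings)

# Row i of the index corresponds to chunks[i]
id_to_chunk = {i: chunk for i, chunk in enumerate(chunks)}
id_to_chunk[0]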

'modern c\njens gustedt\ninria, france\nicube, strasbourg, france\nthis is the 3rdedition of this book, as of october 15, 2024, in line with the most recent c standard, c23.\nthe contents of this book is identical to the print and ebook version that is licensed to\nmanning publications co., shelter island,'
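`IndexFlatIP` compares the query against every stored vector and ranks by inner product. That ranking equals cosine similarity only when the vectors have unit length; if an embedding model does not normalize its output, the vectors can be normalized first (FAISS provides `faiss.normalize_L2` for this). The pure-Python sketch below illustrates the idea on toy 2-D vectors; `normalize` and `inner_product_search` are hypothetical helpers, not FAISS calls.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def inner_product_search(vectors, query, k=2):
    """Exhaustively score every vector against the query; return the top-k (score, index)."""
    scores = [(sum(a * b for a, b in zip(v, query)), i) for i, v in enumerate(vectors)]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return scores[:k]

# Toy "embeddings": after normalization the inner product is the cosine similarity.
vectors = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
query = normalize([2.0, 0.1])

for score, idx in inner_product_search(vectors, query, k=2):
    print(idx, round(score, 3))
```

The vector pointing along the first axis ranks first and the diagonal vector second, which matches the intuitive "closest direction" ordering.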

In [16]:
import numpy as np

def retrieve(query, k=5):
    query_embedding = embedder.encode(query).astype("float32").reshape(1, d)
    distances, indices = index.search(query_embedding, k)
    retrieved_chunks = [id_to_chunk[i] for i in indices[0]]
    return retrieved_chunks

def build_prompt(context, question):
    context_text = "\n\n".join(context)

    prompt = f"""
You need to rate the context being provided. we are creating RAG application and we need to check if the context is relevant to the question. rate the context out of 10. and be honest brutally.

Context:
{context_text}

Question: {question}

Answer:
"""

    return prompt.strip()




## Load the LLM

## Retrieve Chunks and Send Them to the LLM with the Query

In [17]:
import google.generativeai as genai
from google.colab import userdata
key = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=key)

model = genai.GenerativeModel("gemini-2.5-flash")
def answer_question(prompt):
    response = model.generate_content(prompt)
    return response.text


def rag_answer(query):
    chunks = retrieve(query, k=5)
    prompt = build_prompt(chunks, query)
    print(prompt)
    answer = answer_question(prompt)
    return answer
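Because the prompt asks the model to rate the context out of 10, it can be handy to pull that number out of the free-text reply so ratings can be compared across queries. The sketch below assumes the reply contains a phrase such as `7/10` or `7 out of 10`; that format is not guaranteed by the model, and `parse_rating` is a hypothetical helper, not part of the notebook.

```python
import re

def parse_rating(reply: str):
    """Return the first 'N/10' or 'N out of 10' score in the reply as a float, else None."""
    match = re.search(
        r"(?<![\d.])(\d+(?:\.\d+)?)\s*(?:/|out of)\s*10\b",
        reply,
        re.IGNORECASE,
    )
    if not match:
        return None
    value = float(match.group(1))
    # Reject numbers outside the scale, e.g. a stray "17/10".
    return value if 0 <= value <= 10 else None

print(parse_rating("Rating: 7/10. Two chunks are on topic."))  # 7.0
print(parse_rating("I would give this 3 out of 10."))          # 3.0
print(parse_rating("The context is mostly irrelevant."))       # None
```

A typical use would be `parse_rating(rag_answer("What are Pointers in C"))`, treating `None` as "the model did not give a score in the expected form".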


In [18]:
print(rag_answer("What are Pointers in C"))


You need to rate the context being provided. we are creating RAG application and we need to check if the context is relevant to the question. rate the context out of 10. and be honest brutally.

Context:
ys. we are now able to attack the major hurdles to un-
derstanding the relationship between arrays and pointers: the fact that c uses the same
syntax for pointer and array element access andthat it rewrites array parameters of
functions to pointers. both features provide convenient shortcuts for the

ing pointers with structs, arrays, and functions
pointers are the first real hurdle to a deeper understanding of c. they are used in
contexts where we have to be able to access objects from different points in the code or
where data is structured dynamically on the fly.
the confusion of inexperience

 *) provides access to memory that is
stripped of the original type information.
c has invented a powerful tool to handle such pointers more generically. these
are pointers to a sort of non-ty