**AI-Powered Document QA System (RAG)**

In [24]:
pip install pymupdf

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.


**Load the PDF and extract its text**

In [25]:
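import fitz  # PyMuPDF

my_path = "/kaggle/input/timetmm/timemach.pdf"

def extract_text(doc):
    # Concatenate the plain text of every page, in page order
    return "".join(page.get_text() for page in doc)

# Open the PDF and close it again once the text has been extracted
with fitz.open(my_path) as doc:
    text = extract_text(doc)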
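
PDF extraction often preserves typographic ligatures as single characters (the chunk printed further down contains "ﬂushed" and "ﬁre" in that form), which can hurt keyword matching and tokenization. The sketch below shows one possible cleanup step using Unicode NFKC normalization from the standard library. The helper name `normalize_extracted_text` is hypothetical and not part of the original notebook.

```python
import unicodedata

def normalize_extracted_text(raw: str) -> str:
    """Fold compatibility characters such as the 'fi' ligature (U+FB01) into plain letters."""
    # NFKC applies compatibility decomposition followed by canonical composition,
    # so U+FB01 becomes "f" + "i" and U+FB02 becomes "f" + "l".
    return unicodedata.normalize("NFKC", raw)

sample = "his usually pale face was \ufb02ushed and animated. The \ufb01re burned brightly"
print(normalize_extracted_text(sample))
# prints: his usually pale face was flushed and animated. The fire burned brightly
```

NFKC also rewrites other compatibility characters (full-width letters, superscript digits, non-breaking spaces), so it is worth checking a sample of the output before applying it to a whole corpus.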


**Chunk the extracted text**

In [26]:
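def chunk_text(text, chunk_size=100, overlap=50):
    # Slide a window of chunk_size words over the text, advancing by
    # (chunk_size - overlap) words so that consecutive chunks share context
    if overlap >= chunk_size:
        # Without this check the window would never advance (infinite loop)
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk = " ".join(words[start:end])
        chunks.append(chunk)
        start += chunk_size - overlap
    return chunks

chunks = chunk_text(text)
print(f"Total chunks created: {len(chunks)}")
# Preview the first 500 characters of the first chunk
print(chunks[0][:500])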


Total chunks created: 651
The Time Machine by H. G. Wells 1895 2 Contents 1 5 2 11 3 15 4 19 5 27 6 39 7 43 8 49 9 55 10 61 11 65 12 69 Epilogue 73 3 4 CONTENTS Chapter 1 The Time Traveller (for so it will be convenient to speak of him) was expounding a recondite matter to us. His grey eyes shone and twinkled, and his usually pale face was ﬂushed and animated. The ﬁre burned brightly, and the soft radiance of the incandescent lights in the lilies of silver caught the bubbles that ﬂashed and passed
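
To make the windowing easier to see, here is a minimal sketch that returns the `(start, end)` word offsets instead of the joined text. It is a variant, not the notebook's function: `sliding_windows` is a hypothetical name, and unlike `chunk_text` it stops as soon as a window reaches the last word, since any later window would only repeat part of the tail.

```python
def sliding_windows(words, chunk_size=100, overlap=50):
    """Return (start, end) word offsets for overlapping windows over `words`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap  # how far the window moves each step
    windows = []
    for start in range(0, len(words), stride):
        end = min(start + chunk_size, len(words))
        windows.append((start, end))
        if end == len(words):
            break  # later windows would be fully contained in this one
    return windows

toy_words = [f"w{i}" for i in range(12)]
print(sliding_windows(toy_words, chunk_size=6, overlap=3))
# prints: [(0, 6), (3, 9), (6, 12)]
```

On the same toy input, the `while` loop in `chunk_text` would also emit a fourth window `(9, 12)`, which is a subset of `(6, 12)`. Whether that small redundant tail chunk matters depends on the retrieval setup.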


In [27]:
pip install sentence-transformers scikit-learn torch transformers faiss-cpu


**Embed each chunk using Sentence Transformers**

In [28]:
from sentence_transformers import SentenceTransformer

# Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings for all chunks
chunk_embeddings = model.encode(chunks)

print(f"Embedding shape for one chunk: {chunk_embeddings[0].shape}")


Embedding shape for one chunk: (384,)


**Build a FAISS index to store and search embeddings**

In [29]:
import faiss
import numpy as np

# Convert embeddings to numpy array
embeddings_np = np.array(chunk_embeddings).astype('float32')

# Create a flat FAISS index (exact, brute-force search using L2 distance)
dimension = embeddings_np.shape[1]
index = faiss.IndexFlatL2(dimension)

# Add embeddings to index
index.add(embeddings_np)

print(f"FAISS index total vectors: {index.ntotal}")


FAISS index total vectors: 651
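
`IndexFlatL2` does no clustering or compression: it compares the query against every stored vector and reports squared Euclidean distances. The pure-Python sketch below illustrates that computation on toy data. It is only a conceptual stand-in, not how FAISS is implemented, and `squared_l2` and `flat_l2_search` are hypothetical names.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(vectors, query, top_k):
    """Exhaustively score every vector and return the top_k nearest (distances, indices)."""
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(vectors))
    best = heapq.nsmallest(top_k, scored)  # smallest distance first
    distances = [d for d, _ in best]
    indices = [i for _, i in best]
    return distances, indices

toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
distances, indices = flat_l2_search(toy_vectors, [1.0, 1.0], top_k=2)
print(distances, indices)
# prints: [1.0, 2.0] [1, 0]
```

Exhaustive search costs one distance computation per stored vector for each query, which is unproblematic for the 651 chunks indexed here.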


**Function to search for relevant chunks given a user question**

In [32]:
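def search_chunks(query, model, index, chunks, top_k=10):
    # Embed the query with the same model that embedded the chunks
    query_embedding = model.encode([query]).astype('float32')
    # indices[0] holds the positions of the top_k nearest chunks, closest first
    distances, indices = index.search(query_embedding, top_k)
    results = [chunks[i] for i in indices[0]]
    return results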

# Example usage
question = "What happens when the Time Traveller stops"
top_chunks = search_chunks(question, model, index, chunks)
print("Top relevant chunks:")
for chunk in top_chunks:
    print(chunk)  


Top relevant chunks:
I stayed on, waiting for the Time Traveller; waiting for the second, perhaps still stranger story, and the specimens and photographs he would bring with him. But I am beginning now to fear that I must wait a lifetime. The Time Traveller vanished three years ago. And, as everybody knows now, he has never returned. Epilogue One cannot choose but wonder. Will he ever return? It may be that he swept back into the past, and fell among the blood-drinking, hairy savages of the Age of Unpolished Stone; into the abysses of the Cretaceous Sea; or among the grotesque saurians,
garden opened, and the man-servant appeared. We looked at each other. Then ideas began to come. ‘Has Mr.— gone out that way?’ said I. ‘No, sir. No one has come out this way. I was expecting to ﬁnd him here.’ At that I understood. At the risk of disappointing Richardson I stayed on, waiting for the Time Traveller; waiting for the second, perhaps still stranger story, and the specimens and photographs he 
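
`search_chunks` discards the distances that `index.search` returns, although they are useful for judging how relevant each hit is. The sketch below pairs each hit with its distance. It assumes the FAISS convention of padding the index row with `-1` when fewer than `top_k` results are available, and `pair_hits` is a hypothetical helper, not part of the notebook.

```python
def pair_hits(distances_row, indices_row, chunks):
    """Pair each retrieved chunk with its distance, skipping padded (-1) slots."""
    hits = []
    for dist, idx in zip(distances_row, indices_row):
        if idx < 0:
            continue  # assumed padding for a missing result
        hits.append((float(dist), chunks[int(idx)]))
    return hits

toy_chunks = ["chunk a", "chunk b", "chunk c"]
print(pair_hits([0.4, 0.9, 3.4e38], [2, 0, -1], toy_chunks))
# prints: [(0.4, 'chunk c'), (0.9, 'chunk a')]
```

Inside `search_chunks`, this could be called as `pair_hits(distances[0], indices[0], chunks)`, which would also make it possible to drop hits above a distance threshold before building the prompt.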

**Pass retrieved context to an LLM for final answer generation**

In [34]:
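import os
from google import genai

# Read the API key from the environment instead of hard-coding it in the notebook
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def generate_answer_gemini(question, contexts, client, model_name="gemini-2.5-flash"):
    # Separate the retrieved chunks with blank lines so they do not run together
    context_block = "\n\n".join(contexts)
    prompt = f"Answer the question in detail, based on the context below.\n\nContext:\n{context_block}\n\nQuestion: {question}\nAnswer:"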
    
    response = client.models.generate_content(
        model=model_name,
        contents=prompt
    )
    
    return response.text

final_answer = generate_answer_gemini(question, top_chunks, client, model_name="gemini-2.5-flash")
print("Final Answer:\n", final_answer)


Final Answer:
 Based on the context provided, when the Time Traveller stops or returns, several notable things occur, as described through two distinct "stops": his arrival from a journey and his ultimate disappearance.

**Upon his return from a journey:**

When the Time Traveller first stops and reappears before his companions, it is a dramatic and somewhat alarming event. The door from the corridor opens slowly and silently, and he stands before them. The narrator's immediate reaction is one of surprise and concern, exclaiming, "Good heavens! man, what’s the matter?" This suggests the Time Traveller's appearance or demeanor is unusual.

Immediately after his arrival:
*   He is described as having "faltering articulation" when he speaks, but he assures them, "I’m all right."
*   He swiftly holds out his glass for more drink and takes it off "at a draught," indicating a strong need for refreshment or a return to composure.
*   Following the drink, his "eyes grew brighter, and a faint c