# RAG-Based AI Chatbot

## 1. Project Setup

In [3]:
# High-level helper with pipeline
from transformers import pipeline

# Load the Qwen3-0.6B model
pipe = pipeline("text-generation", model="Qwen/Qwen3-0.6B")




Device set to use cpu


In [10]:
# User input
user_input = "How's the weather today in Islamabad?"

# Generate response
response = pipe(user_input, max_new_tokens=50, do_sample=True, temperature=0.7)

In [11]:
# Print result
print("User:", user_input)
print("Bot:", response[0]['generated_text'])

User: How's the weather today in Islamabad?
Bot: How's the weather today in Islamabad? The user is asking about the current weather in Islamabad. 

As a language model, I can't access real-time data. However, I can provide general information about the weather in Islamabad based on current knowledge. 

Islamabad is the capital of Pakistan
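
The reply printed above begins with the user's own question, because a text-generation pipeline returns the prompt followed by the continuation by default. The sketch below shows one way to trim the echoed prompt. `strip_prompt` is a hypothetical helper, and the sample `response` only imitates the shape of the pipeline's return value (a list of dicts). Passing `return_full_text=False` to the pipeline call should give a similar result without any post-processing.

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Drop the echoed prompt from the front of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Sample data shaped like the pipeline output above (assumed, not a real model run).
user_input = "How's the weather today in Islamabad?"
response = [{
    "generated_text": (
        "How's the weather today in Islamabad? "
        "The user is asking about the current weather in Islamabad."
    )
}]

reply = strip_prompt(user_input, response[0]["generated_text"])
print("User:", user_input)
print("Bot:", reply)
```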


## 2. Data Ingestion and Splitting Text

🔹 **Goal:** Load documents into the chatbot system.  

### Instructions:
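1. Place your document (e.g., `data.pdf`) inside a `docs/` folder, or set `pdf_path` to wherever the PDF is stored.  

2. Use **LangChain’s PyPDFLoader** to load the PDF and extract its text, one document per page.  

3. Use **RecursiveCharacterTextSplitter** to split the text into smaller, overlapping parts (chunks) for better processing.  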
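
To make the idea of chunking with overlap concrete, here is a simplified sketch that slides a fixed-size window over a string. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); `chunk_text` is a hypothetical helper that only illustrates what a chunk size and an overlap, both measured in characters, mean.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

Each chunk starts with the last three characters of the previous one, which is the role the overlap plays in keeping sentences that straddle a boundary retrievable from either side.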




In [18]:
# Step 2 & 3: Load Single PDF and Split into Chunks

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Provide the direct path to your PDF
pdf_path = r"C:\Users\Sami\Desktop\Skills\AI_engineering\2\11_RAG_based_AI_Chatbot\pdc_for_paper.pdf"

# Load the PDF (one document per page)
loader = PyPDFLoader(pdf_path)
documents = loader.load()

print(f"Total pages loaded: {len(documents)}")

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # at most 500 characters per chunk (roughly 80 to 100 English words)
    chunk_overlap=50   # 50-character overlap so context is not cut off at chunk boundaries
)

docs = text_splitter.split_documents(documents)

print(f"Total chunks created: {len(docs)}")
print("First chunk example:\n", docs[0])


Total pages loaded: 30
Total chunks created: 86
First chunk example:
 page_content='Desccribe poijnt -to-point and collective communication in MPI? also discuss how mpi handles 
communication between nodes in a heterogeneous cluster.  
ChatGPT said:  
Alright Sami, here’s your exam -friendly, simple but detailed  explanation:  
 
1. Point -to-Point Communication in MPI  
• Meaning:  One process sends a message directly to another process.  
• Example:  Process 0 sends data to Process 1.  
• Functions Used in MPI:  
o MPI_Send()  → send a message.' metadata={'producer': 'Microsoft® Word 2016', 'creator': 'Microsoft® Word 2016', 'creationdate': '2025-08-10T22:34:31+05:00', 'author': 'Sami', 'moddate': '2025-08-10T22:34:31+05:00', 'source': 'C:\\Users\\Sami\\Desktop\\Skills\\AI_engineering\\2\\11_RAG_based_AI_Chatbot\\pdc_for_paper.pdf', 'total_pages': 30, 'page': 0, 'page_label': '1'}
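
Every chunk carries the metadata shown above, which is useful later for telling the user where an answer came from. A small sketch, assuming the `source`, `page`, and `page_label` keys seen in that output; `format_source` is a hypothetical helper, not part of LangChain.

```python
from pathlib import PureWindowsPath


def format_source(metadata: dict) -> str:
    """Build a short citation such as 'file.pdf, p. 1' from chunk metadata."""
    # PureWindowsPath parses backslash-separated paths on any operating system.
    file_name = PureWindowsPath(metadata.get("source", "unknown")).name
    # Prefer the human-readable label; otherwise convert the 0-based page index.
    page = metadata.get("page_label") or str(metadata.get("page", 0) + 1)
    return f"{file_name}, p. {page}"


metadata = {
    "source": r"C:\Users\Sami\Desktop\Skills\AI_engineering\2\11_RAG_based_AI_Chatbot\pdc_for_paper.pdf",
    "total_pages": 30,
    "page": 0,
    "page_label": "1",
}
print(format_source(metadata))
```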


## 3. Create Embeddings and Store in Vector Database

In [20]:
# Step 4: Create Embeddings using SentenceTransformers (Offline)

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# Local embedding model (lightweight and free; downloaded once, then runs offline)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Convert the chunks to embeddings and store them in FAISS
vectorstore = FAISS.from_documents(docs, embeddings)

# Save embeddings locally
vectorstore.save_local("faiss_index")

print("✅ Offline embeddings created and saved successfully!")


✅ Offline embeddings created and saved successfully!
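
Conceptually, the index built above maps each chunk to a vector and answers a query by finding the nearest vectors. The sketch below does this by brute force with Euclidean (L2) distance, which, to my understanding, is the default metric in LangChain's FAISS wrapper. The 3-dimensional vectors and chunk titles are invented for illustration; `all-MiniLM-L6-v2` produces 384-dimensional vectors.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, index, k=3):
    """Return the k (distance, text) pairs closest to query_vec, nearest first."""
    scored = [(l2_distance(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


# Toy index: (chunk text, made-up embedding).
toy_index = [
    ("CUDA overview", [0.9, 0.1, 0.0]),
    ("MPI point-to-point", [0.1, 0.9, 0.0]),
    ("OpenCL overview", [0.7, 0.2, 0.1]),
    ("Middleware roles", [0.0, 0.1, 0.9]),
]

query_vec = [1.0, 0.0, 0.0]  # pretend this is the embedding of "What is CUDA?"
hits = top_k(query_vec, toy_index, k=2)
for distance, text in hits:
    print(f"{distance:.3f}  {text}")
```

A smaller distance means a closer match, so the first hit is the chunk whose vector lies nearest to the query vector.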


In [22]:
# Step 5: Load the FAISS Index and Test Search

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings  # or OpenAIEmbeddings if you are using a hosted API

# Use the same embeddings model that was used to build the index
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Load the FAISS index saved in step 4.
# allow_dangerous_deserialization=True is needed because the index metadata is pickled;
# only enable it for index files you created yourself.
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

# Test query
query = "What is CUDA?"

# Retrieve the top 3 most similar chunks
results = vectorstore.similarity_search(query, k=3)

print("🔍 Query:", query)
print("\nTop 3 relevant chunks:\n")
for i, res in enumerate(results, 1):
    print(f"{i}. {res.page_content[:200]}...")  # show only the first 200 characters


🔍 Query: What is CUDA?

Top 3 relevant chunks:

1. Alright Sami, here’s your simple but detailed  explanation for CUDA  and OpenCL , structured 
so you can write it easily in tomorrow’s exam.  
 
1. CUDA (Compute Unified Device Architecture)  
Definit...
2. work together in parallel computing  — this will make your answer more visual and 
memorable.  
Do you want me to add that?  
You said:  
Cuda and Open CL in detail  
ChatGPT said:...
3. 4. Summary  
• CUDA  → Best performance for NVIDIA GPUs, easy to use, hardware -specific.  
• OpenCL  → Portable, works on many devices, but harder to tune for peak performance....
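
The second hit above is mostly chat residue ("Do you want me to add that?", "You said:", "ChatGPT said:") rather than content about CUDA. One option is to strip such lines from each page before splitting and embedding. The sketch below is a rough illustration; `strip_chat_residue` and its patterns are assumptions based only on the lines visible in the output above and would need tuning for a real document.

```python
import re

# Lines that look like chat-export residue rather than study content (assumed patterns).
RESIDUE_PATTERNS = [
    re.compile(r"^\s*ChatGPT said:\s*$", re.IGNORECASE),
    re.compile(r"^\s*You said:\s*$", re.IGNORECASE),
    re.compile(r"^\s*Do you want me to .*\?\s*$", re.IGNORECASE),
]


def strip_chat_residue(text: str) -> str:
    """Remove whole lines that match any residue pattern."""
    kept = [
        line for line in text.splitlines()
        if not any(pattern.match(line) for pattern in RESIDUE_PATTERNS)
    ]
    return "\n".join(kept).strip()


sample = (
    "memorable.  \n"
    "Do you want me to add that?  \n"
    "You said:  \n"
    "Cuda and Open CL in detail  \n"
    "ChatGPT said:  \n"
    "1. CUDA (Compute Unified Device Architecture)"
)
cleaned = strip_chat_residue(sample)
print(cleaned)
```

With LangChain documents, the same function could be applied to each `page_content` before calling the text splitter.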


## 4. Retrieve and Generate Responses
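
As a rough picture of what the "stuff" chain in the next cell does: the retrieved chunks are joined into a single context string, inserted into the prompt template, and sent to the LLM in one call. The sketch below imitates that with plain strings. `build_stuff_prompt` and `extract_answer` are hypothetical helpers, and `fake_output` stands in for a model response that echoes the prompt, as the local pipeline output further down does.

```python
TEMPLATE = (
    "Answer the question using only the provided context.\n"
    "If you don't know, say \"I don't know.\"\n\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(chunks, question):
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


def extract_answer(full_text):
    """Keep only the text after the last 'Answer:' marker, if there is one."""
    _, found, tail = full_text.rpartition("Answer:")
    return tail.strip() if found else full_text.strip()


chunks = [
    "Middleware is software that sits between the client and the server.",
    "It helps them communicate, manage requests, and share data.",
]
question = "What is middleware?"

prompt = build_stuff_prompt(chunks, question)
fake_output = prompt + " Software that sits between the client and the server."
answer = extract_answer(fake_output)
print(answer)
```

Splitting on the last `Answer:` marker is a heuristic: it would cut too much if the model's own reply contained that marker again.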

In [26]:
# Step 6: Retrieval + Generation Pipeline

from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# 1. Embeddings (the same model used earlier)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# 2. Load the FAISS index (saved in step 4)
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

# 3. Create the retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 4. Prompt template
template = """
Answer the question using only the provided context.
If you don’t know, say "I don’t know."

Context: {context}
Question: {question}
Answer:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template
)

# 5. LLM (local Hugging Face model)
from transformers import pipeline
local_llm = pipeline("text-generation", model="Qwen/Qwen3-0.6B")  # any Hugging Face text-generation model can be used here

# Wrapper (LangChain friendly)
from langchain_huggingface import HuggingFacePipeline  # replaces the deprecated langchain.llms import
llm = HuggingFacePipeline(pipeline=local_llm)

# 6. RetrievalQA Chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",   # "stuff" all retrieved chunks into a single prompt for the LLM
    chain_type_kwargs={"prompt": prompt}
)

# 7. Test Query
query = "Explain the role of middleware in client / server communcation. Aslo differentiate between 2 -tier and 3 tier client/server architecture."
result = qa.invoke({"query": query})["result"]  # .run() is deprecated in recent LangChain releases

print("🔍 Question:", query)
print("🤖 Answer:", result)


Device set to use cpu


🔍 Question: Explain the role of middleware in client / server communcation. Aslo differentiate between 2 -tier and 3 tier client/server architecture.
🤖 Answer: 
Answer the question using only the provided context.
If you don’t know, say "I don’t know."

Context: Explain the role of middleware in client / server communcation. Aslo differentiate between 2 -tier 
and 3 tier client/server architecture.  
ChatGPT said:  
Alright Sami, here’s the simple but detailed  explanation you can write in your exam:  
 
1. Role of Middleware in Client/Server Communication  
• Middleware  is software that sits between  the client and the server.  
• It helps them communicate, manage requests, and share data smoothly . 
• Main Roles:  
1. Communication Management:

5. Scalability & Reliability:  
▪ Allows system to handle more clients without breaking.  
6. Transparency:  
▪ Client doesn’t need to know server’s location or hardware; middleware 
handles it.  
Example:  In a banking app, middleware handle