## 📘 Introduction

In this notebook, we demonstrate a simple Retrieval-Augmented Generation (RAG) pipeline using Hugging Face Transformers, Sentence Transformers, and FAISS.

The goal is to:
- Extract text from a PDF file (university information).
- Split the text into chunks for efficient retrieval.
- Generate embeddings with a transformer model.
- Build a FAISS vector index to enable semantic search.
- Use a large language model (LLM), Mistral Nemo Instruct, to answer questions based on the retrieved chunks.

This approach is widely used in question answering, chatbots, and knowledge retrieval systems.

## ⚙️ Steps

![image](https://miro.medium.com/v2/resize:fit:1100/format:webp/0*ykFSvJzAtPg8W2GN)

#### 1. Setup & Model Loading
- Log in to Hugging Face.
- Load the Mistral Nemo Instruct 2407 model for text generation.
- Install and import the necessary libraries.

In [None]:
from huggingface_hub import login

login()

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "mistralai/Mistral-Nemo-Instruct-2407"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

def generate_text(prompt, max_length=100, num_return_sequences=1):
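    # Move the input tensors to the model's device (device_map="auto" may place it on a GPU).
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)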
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.7,
    )
    return [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
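
Two details of `generate_text` are easy to miss: in `transformers`, `max_length` bounds the whole sequence (prompt plus continuation) for a decoder-only model, and the decoded output starts with the prompt itself. The following is a minimal sketch of that arithmetic and slicing using plain Python lists. The token IDs are made up, the helper names `new_token_budget` and `strip_prompt` are hypothetical, and the model is not called.

```python
def new_token_budget(prompt_len: int, max_length: int) -> int:
    """Tokens left for the continuation when max_length also counts the prompt."""
    return max(0, max_length - prompt_len)


def strip_prompt(output_ids: list[int], prompt_len: int) -> list[int]:
    """Keep only the IDs that were generated after the prompt."""
    return output_ids[prompt_len:]


# Made-up token IDs standing in for a tokenized prompt and a generated sequence.
prompt_ids = [101, 7592, 2088, 102]
output_ids = prompt_ids + [2023, 2003, 1037]

budget = new_token_budget(len(prompt_ids), max_length=100)
continuation = strip_prompt(output_ids, len(prompt_ids))
print(budget, continuation)
```

With real tensors the equivalent slice would be `outputs[:, inputs["input_ids"].shape[1]:]`, and `max_new_tokens` is the `generate` argument that bounds only the continuation.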

In [None]:
!pip install sentence-transformers PyPDF2 faiss-cpu -q

#### 2. PDF Text Extraction
- Use PyPDF2 to extract text from the uploaded PDF.
- Concatenate all pages into a single document string.

In [None]:
from sentence_transformers import SentenceTransformer
from PyPDF2 import PdfReader
import faiss
import numpy as np

In [None]:
def extract_text_from_pdf(pdf_path):
    reader = PdfReader(pdf_path)
    full_text = ""
    for page in reader.pages:
        # extract_text() can come back empty (e.g. scanned pages); treat that as "".
        full_text += (page.extract_text() or "") + "\n"
    return full_text

#### 3. Chunking the Document
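- Split the text into overlapping chunks of whitespace-separated words (this notebook uses 50 words per chunk with a 5-word overlap; the function defaults are 500 and 50).
- The overlap preserves some context across chunk boundaries.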

In [None]:
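def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of `chunk_size` words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # the rest of the text is already covered by this chunk
    return chunks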
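
To make the overlap concrete, here is a small self-contained sketch of the same sliding-window idea on a toy sentence. The helper name `window_starts` and the sample text are illustrative assumptions, not part of the notebook. With the settings used later (50 words, 5 overlap) the window would advance 45 words at a time.

```python
def window_starts(n_words: int, chunk_size: int, overlap: int) -> list[int]:
    """Start offsets of a sliding window that advances by chunk_size - overlap."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return list(range(0, n_words, step))


toy_words = "the quick brown fox jumps over the lazy dog again".split()  # 10 words
starts = window_starts(len(toy_words), chunk_size=4, overlap=1)
toy_windows = [" ".join(toy_words[s:s + 4]) for s in starts]

for s, window in zip(starts, toy_windows):
    print(s, "->", window)
```

Each window repeats the last word of its predecessor. With a plain `range` loop like this one, the final one-word window is already contained in the window before it; stopping as soon as a window reaches the end of the text avoids such duplicates.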

#### 4. Embeddings & FAISS Indexing
- Generate embeddings using SentenceTransformers (`all-MiniLM-L6-v2`).
- Store embeddings in a FAISS vector index for fast semantic search.

In [None]:
def embed_chunks(chunks, model_name='sentence-transformers/all-MiniLM-L6-v2'):
    model = SentenceTransformer(model_name)
    embeddings = model.encode(chunks, convert_to_numpy=True)
    return model, embeddings

In [None]:
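def create_faiss_index(embeddings):
    # FAISS expects a contiguous float32 matrix of shape (n_vectors, dim).
    embeddings = np.ascontiguousarray(embeddings, dtype=np.float32)
    dim = embeddings.shape[1]
    index = faiss.IndexFlatL2(dim)  # exact (brute-force) L2 search
    index.add(embeddings)
    return index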
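
`IndexFlatL2` ranks results by squared Euclidean distance. If the embeddings are unit-normalized (an assumption here; whether a given embedding model normalizes its output should be checked), this produces the same ranking as cosine similarity, because $\|a - b\|^2 = 2 - 2\,a \cdot b$ for unit vectors. A quick numeric check in plain Python, using made-up 3-dimensional vectors:

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


vec_a = normalize([1.0, 2.0, 2.0])
vec_b = normalize([2.0, 0.0, 1.0])

lhs = squared_l2(vec_a, vec_b)   # squared Euclidean distance
rhs = 2 - 2 * dot(vec_a, vec_b)  # 2 - 2 * cosine similarity
print(round(lhs, 6), round(rhs, 6))
```

The two printed values agree, so a smaller L2 distance corresponds to a larger cosine similarity in the normalized case.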

In [None]:
def search_index(query, model, index, chunks, k=5):
    query_embedding = model.encode([query], convert_to_numpy=True)
    distances, indices = index.search(query_embedding, k)
    # FAISS pads the result with -1 when fewer than k vectors are available; skip those.
    return [chunks[i] for i in indices[0] if i != -1]
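
Conceptually, a flat index compares the query against every stored vector and keeps the k closest ones. The sketch below mimics that behavior in plain Python on made-up 2-dimensional vectors; `brute_force_search` is a hypothetical stand-in for illustration, not the FAISS API.

```python
def brute_force_search(query, vectors, k):
    """Return (squared L2 distance, position) pairs for the k nearest vectors."""
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, vec)), pos)
        for pos, vec in enumerate(vectors)
    ]
    scored.sort()  # smallest distance first
    return scored[:k]


# Made-up texts and vectors standing in for chunks and their embeddings.
toy_texts = ["campus location", "tuition fees", "online programs"]
toy_vectors = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

hits = brute_force_search([1.0, 0.0], toy_vectors, k=2)
for dist, pos in hits:
    print(round(dist, 2), toy_texts[pos])
```

This exhaustive scan is exact but grows linearly with the number of stored vectors, which is acceptable for a single small PDF.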

#### 5. Question Answering

- Define user queries (e.g., "Where is Tips Hindawi University located?").
- Retrieve the top-k most relevant chunks from FAISS.
- Pass the top-ranked chunk to the LLM (Mistral Nemo) as context for generating the answer.
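
Retrieval returns several chunks, and one way to hand more than the top hit to the LLM is to join them into a single context block. A minimal sketch follows; the helper `build_prompt`, its template wording, and the sample strings are assumptions for illustration rather than the prompt used in this notebook.

```python
def build_prompt(question: str, retrieved: list[str], max_chunks: int = 3) -> str:
    """Join the first max_chunks retrieved passages into one numbered context."""
    context = "\n\n".join(
        f"[{n}] {passage}" for n, passage in enumerate(retrieved[:max_chunks], 1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder passages, not taken from the PDF.
sample_prompt = build_prompt(
    "Which degrees are offered?",
    ["Passage about degrees.", "Passage about fees.", "Passage about housing.", "Extra."],
    max_chunks=2,
)
print(sample_prompt)
```

A longer context uses up more of the `max_length` budget passed to `generate_text`, since that limit includes the prompt.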

In [None]:
pdf_path = "/kaggle/input/tips-pdf/Tips_Hindawi_University_Info.pdf"

text = extract_text_from_pdf(pdf_path)
chunks = chunk_text(text, chunk_size=50, overlap=5)

model_embeddings, embeddings = embed_chunks(chunks)

index = create_faiss_index(embeddings)

##### Question 1

In [None]:
question_1 = "Where is Tips Hindawi University located?"
top_chunks_1 = search_index(question_1, model_embeddings, index, chunks, k=3)

for i, chunk in enumerate(top_chunks_1, 1):
    print(f"\n--- Chunk {i} ---\n{chunk}")

In [None]:
chunk_1 = top_chunks_1[0]
prompt_1 = f"Answer the next question: {question_1} by reading the following text:{chunk_1}"

In [None]:
answer_1 = generate_text(prompt_1, max_length=700)
print(answer_1[0])

##### Question 2

In [None]:
question_2 = "Does the university offer online programs?"
top_chunks_2 = search_index(question_2, model_embeddings, index, chunks, k=3)

for i, chunk in enumerate(top_chunks_2, 1):
    print(f"\n--- Chunk {i} ---\n{chunk}")

In [None]:
chunk_2 = top_chunks_2[0]
prompt_2 = f"Answer the next question: {question_2} by reading the following text:{chunk_2}"

In [None]:
answer_2 = generate_text(prompt_2, max_length=700)
print(answer_2[0])

##### Question 3

In [None]:
question_3 = "Is there financial aid for international students?"
top_chunks_3 = search_index(question_3, model_embeddings, index, chunks, k=3)

for i, chunk in enumerate(top_chunks_3, 1):
    print(f"\n--- Chunk {i} ---\n{chunk}")

In [None]:
chunk_3 = top_chunks_3[0]
prompt_3 = f"Answer the next question: {question_3} by reading the following text:{chunk_3}"

In [None]:
answer_3 = generate_text(prompt_3, max_length=700)
print(answer_3[0])

##### Question 4

In [None]:
question_4 = "What languages are used for instruction?"
top_chunks_4 = search_index(question_4, model_embeddings, index, chunks, k=3)

for i, chunk in enumerate(top_chunks_4, 1):
    print(f"\n--- Chunk {i} ---\n{chunk}")

In [None]:
chunk_4 = top_chunks_4[0]
prompt_4 = f"Answer the next question: {question_4} by reading the following text:{chunk_4}"

In [None]:
answer_4 = generate_text(prompt_4, max_length=700)
print(answer_4[0])

## ✅ Conclusion

In this notebook, we successfully implemented a basic RAG pipeline that:
- Retrieves relevant context from a document.
- Combines semantic search with a generative model.
- Answers user queries based on actual document content.

This approach can be extended to:
- Larger document collections.
- More advanced embedding models.
- Deployment as a chatbot or API.

Combining FAISS retrieval with an LLM grounds answers in the source document, which helps make question answering more accurate and context-aware.

## 👨‍💻 Made by: Abdelrahman Eldaba

- Check out my website with a portfolio [Here](https://sites.google.com/view/abdelrahman-eldaba110) 🌟
- Connect with me on [LinkedIn](https://www.linkedin.com/in/abdelrahmaneldaba) 🌐
- Look at my [GitHub](https://github.com/Abdelrahman47-code) and [Kaggle](https://www.kaggle.com/abdelrahmanahmed110) 🚀