<a href="https://colab.research.google.com/github/avish006/Rag-Project/blob/main/RAG_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [105]:
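# PyMuPDF provides the `fitz` module; the PyPI package named "fitz" is unrelated and can break the import
!pip install pymupdf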



In [106]:
!pip install --upgrade --force-reinstall pymupdf

Collecting pymupdf
  Using cached pymupdf-1.25.4-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Using cached pymupdf-1.25.4-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (20.0 MB)
Installing collected packages: pymupdf
  Attempting uninstall: pymupdf
    Found existing installation: PyMuPDF 1.25.4
    Uninstalling PyMuPDF-1.25.4:
      Successfully uninstalled PyMuPDF-1.25.4
Successfully installed pymupdf-1.25.4


In [107]:
import fitz

# Extract the text of every page in a PDF document
def extract_text_from_pdf(pdf_path):
    text = ""
    with fitz.open(pdf_path) as doc:  # close the file once extraction is done
        for page in doc:
            text += page.get_text("text") + "\n"
    return text
text = extract_text_from_pdf('/content/Attention is all you need.pdf')

In [108]:
# Split text into overlapping chunks with RecursiveCharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
def recursive_chunking(text, chunk_size=200, overlap=100):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
    return text_splitter.split_text(text)
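
To make the effect of `chunk_size` and `overlap` concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplified stand-in and not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs and sentences). The helper name `sliding_window_chunks` and the sample string are assumptions made for illustration.

```python
def sliding_window_chunks(text, chunk_size=200, overlap=100):
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghij" * 5  # 50 characters
demo_chunks = sliding_window_chunks(sample, chunk_size=20, overlap=10)
```

With the overlap set to half the chunk size, as in the defaults above, most characters land in two chunks. That roughly doubles the number of embeddings to store, in exchange for a lower chance that a sentence is cut in half at a chunk boundary.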


In [109]:
#Splitting the documents into chunks of text
chunks = recursive_chunking(text)

In [110]:
# Sentence Transformer for creating vector embeddings of chunks
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('BAAI/bge-large-en-v1.5')
# Encode all chunks in one batched call instead of one call per chunk
embeddings = model.encode(chunks, normalize_embeddings=True).tolist()

In [111]:
pip install faiss-cpu



In [112]:
import faiss
import numpy as np

def store_embeddings_faiss(embeddings):
    embedding_dim = len(embeddings[0])  # Get the size of each embedding
    index = faiss.IndexFlatL2(embedding_dim)  # Exact (brute-force) index using squared L2 distance

    np_embeddings = np.array(embeddings).astype('float32')  # Convert list to NumPy array
    index.add(np_embeddings)  # Add embeddings to FAISS

    faiss.write_index(index, "vector_store.index")  # Save index to disk
    print("Embeddings stored successfully in FAISS!")

    return index

In [113]:
def load_faiss_index():
    return faiss.read_index("vector_store.index")
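
Because the chunks are encoded with `normalize_embeddings=True`, ranking by the squared L2 distance that `IndexFlatL2` reports gives the same order as ranking by cosine similarity. The sketch below checks the underlying identity, ||a - b||^2 = 2 - 2(a · b) for unit vectors, in plain Python. The two example vectors and the helper names are made up for illustration.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])

# For unit vectors the two quantities are tied together exactly
lhs = squared_l2(a, b)
rhs = 2 - 2 * dot(a, b)
```

A smaller distance therefore always corresponds to a larger cosine similarity, so an L2 index is a reasonable choice here as long as both the chunks and the query are normalized.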

In [114]:
index = store_embeddings_faiss(embeddings)  # Store embeddings

Embeddings stored successfully in FAISS!


In [115]:
loaded_index = load_faiss_index()  # Load stored embeddings

In [116]:
from openai import OpenAI
import numpy as np

# Initialize OpenRouter client
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="#####",
)
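
The key above is redacted. One common way to keep a real key out of the notebook is to read it from an environment variable. The sketch below assumes a variable named `OPENROUTER_API_KEY`; both that name and the helper `get_api_key` are hypothetical choices, not something OpenRouter or the `openai` package requires.

```python
import os

def get_api_key(env_var="OPENROUTER_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before creating the client")
    return key
```

The client could then be created with `api_key=get_api_key()` in place of the literal string.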

In [117]:
!pip install rank_bm25



In [118]:
# Hybrid retrieval: FAISS vector search combined with BM25 keyword search
import numpy as np
from rank_bm25 import BM25Okapi
import nltk
# Download the tokenizer data that `word_tokenize` needs
nltk.download('punkt_tab')
nltk.download('punkt')
from nltk.tokenize import word_tokenize

def hybrid_search(query, index, chunks, alpha=0.5, top_k=5):
    """
    Combines vector-based retrieval and BM25 keyword search to rank document chunks.

    Parameters:
        query (str): The user query.
        index (FAISS Index): The pre-built FAISS vector store containing document embeddings.
        chunks (list): The list of document chunks (strings).
        alpha (float): Weight for BM25 score in [0, 1]. (1 - alpha) is used for vector similarity.
        top_k (int): The number of top chunks to return.

    Returns:
        list: Top-k document chunks ranked by the combined score.
    """
    # Step 1: Vector-based retrieval using FAISS
    query_embedding = model.encode(query,normalize_embeddings=True)  # Ensure this uses the same embedding model used for chunks
    query_embedding = np.array([query_embedding]).astype('float32')

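    # Retrieve vector distances and indices for all chunks.
    # FAISS returns results sorted by distance, so each distance must be mapped back to its chunk index.
    distances, indices = index.search(query_embedding, len(chunks))
    # Convert distances to similarity scores.
    # For L2 distance, a common transformation is similarity = 1 / (1 + distance)
    vector_similarities = np.zeros(len(chunks), dtype='float32')
    for dist, idx in zip(distances[0], indices[0]):
        if idx != -1:  # FAISS pads missing results with -1
            vector_similarities[idx] = 1 / (1 + dist)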

    # Step 2: Keyword-based retrieval using BM25
    # Tokenize each document chunk (lowercase for uniformity)
    tokenized_chunks = [word_tokenize(chunk.lower()) for chunk in chunks]
    bm25 = BM25Okapi(tokenized_chunks)
    tokenized_query = word_tokenize(query.lower())
    bm25_scores = bm25.get_scores(tokenized_query)

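    # Step 3: Combine scores from vector search and BM25.
    # The combined score is a weighted sum: alpha * BM25 score + (1 - alpha) * Vector similarity.
    # Note: BM25 scores are unbounded while the vector similarities lie in (0, 1],
    # so alpha only weights the two signals evenly if the scores are rescaled first.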
    combined_scores = []
    for i in range(len(chunks)):
        combined = alpha * bm25_scores[i] + (1 - alpha) * vector_similarities[i]
        combined_scores.append((i, combined))

    # Step 4: Sort the document chunks by the combined score in descending order
    combined_scores.sort(key=lambda x: x[1], reverse=True)

    # Step 5: Get top_k indices and retrieve corresponding document chunks
    top_indices = [idx for idx, score in combined_scores[:top_k]]
    retrieved_chunks = [chunks[i] for i in top_indices]

    return retrieved_chunks

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
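
In `hybrid_search`, the raw BM25 scores and the `1 / (1 + distance)` similarities live on different scales, so BM25 can dominate the weighted sum. One possible remedy, sketched here in plain Python, is to min-max normalize each score list to [0, 1] before combining. This is a suggested variation rather than what the notebook does; `min_max_normalize`, `combine_scores`, and the sample numbers are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores linearly to the range [0, 1]."""
    scores = list(scores)
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # all scores equal: no ranking signal
    return [(s - lo) / (hi - lo) for s in scores]

def combine_scores(bm25_scores, vector_similarities, alpha=0.5):
    """Weighted sum of normalized BM25 scores and normalized vector similarities."""
    bm25_norm = min_max_normalize(bm25_scores)
    vec_norm = min_max_normalize(vector_similarities)
    return [alpha * b + (1 - alpha) * v for b, v in zip(bm25_norm, vec_norm)]

# Three chunks: BM25 prefers the third, vector similarity prefers the second
combined = combine_scores([0.0, 4.0, 8.0], [0.5, 0.9, 0.6], alpha=0.5)
best_index = max(range(len(combined)), key=lambda i: combined[i])
```

After normalization, `alpha` behaves as the docstring describes: 0 ranks purely by vector similarity, 1 purely by BM25, and values in between trade the two off on a comparable scale.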


In [119]:
def multi_step_query_rag(query, index, chunks):
    # Step 1: Retrieve the top-k relevant chunks via hybrid (vector + BM25) search
    relevant_chunks = hybrid_search(query,index ,chunks)

    # Step 2: Generate a reasoning breakdown
    reasoning_prompt = f"""
    Let's break down this query step by step to enhance reasoning.

    Query: {query}

    Step 1 - Retrieve Relevant Context: {relevant_chunks}

    Step 2 - Identify Key Information: Extract facts, relationships, and key details from the context.

    Step 3 - Apply Logical Deduction: Use the extracted facts to form an answer with reasoning.

    Step 4 - Provide a Final Answer: Give a structured and intuitive response.

    Now, let's perform this step-by-step reasoning.
    """

    # Step 3: Query the LLM with stepwise reasoning
    response = client.chat.completions.create(
        model="deepseek/deepseek-r1:free",
        messages=[
            {"role": "system", "content": "You are an expert in document-based reasoning."},
            {"role": "user", "content": reasoning_prompt}
        ],
        temperature=0.8,  # Fairly high sampling temperature for more varied wording
        top_p=0.7,  # Nucleus sampling: sample only from the most likely tokens covering 70% of the probability mass
        presence_penalty=0.35,  # Discourage reusing tokens that have already appeared
        frequency_penalty=0.2,  # Penalize tokens in proportion to how often they repeat
        max_tokens=1600
    )

    return response.choices[0].message.content
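
Interpolating a Python list directly into an f-string, as the prompt above does with the retrieved chunks, inserts the list's `repr`: square brackets, quotes, and escaped `\n` sequences included. A small formatting helper can present the chunks as numbered passages instead. This is a minimal sketch; `format_context` is a hypothetical helper and the two sample passages are invented.

```python
def format_context(chunks):
    """Render retrieved chunks as numbered passages instead of a Python list repr."""
    return "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )

context = format_context(["  First passage. ", "Second passage."])
```

The resulting string could be placed in the prompt where the list is interpolated today, which also lets the model refer to passages by number.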

In [120]:
# Build the final prompt from the retrieved context and reasoning, then query the LLM via OpenRouter
def query_rag(user_query, index, chunks):
    reasoning = multi_step_query_rag(user_query, index,chunks)
    retrieved_chunks = hybrid_search(user_query, index, chunks)  # Get top matching chunks

    # Construct full prompt with retrieved context
    full_prompt = f"""
      You are an advanced AI system specializing in deep reasoning and retrieval-augmented generation (RAG). Your goal is to **analyze** the retrieved context, apply **multi-step logical reasoning**, and generate a **well-structured, insightful** response.

      ---
      ## **Step 1: Understanding the Query**
      **User Query:**
      {user_query}

      ---
      ## **Step 2: Retrieving Relevant Information**
      The following context has been retrieved from the document using hybrid search (FAISS vector similarity combined with BM25):
      {retrieved_chunks}

      ---

      ## **Step 3: Multi-Step Reasoning Beyond Retrieval**
      To generate the best possible response, follow this structured thought process:

      **Direct Extraction (if possible):**
      - Identify **explicit** answers in the retrieved context.
      - If the answer is **fully present**, structure it for clarity.

      **Inference & Deduction (if required):**
      - If the answer is **not explicitly stated**, use logical inference based on the given context.
      - Identify **patterns, relationships, or missing links** to construct a complete answer.

      **External Knowledge Integration (if needed):**
      - If retrieval provides partial data, combine it with **general reasoning or background knowledge** to improve accuracy.
      - Ensure the information is **logically consistent** and does not introduce hallucinations.

      **Contextual Linking & Deep Reasoning:**
      - Connect different retrieved chunks **logically** to form a complete, well-rounded response.
      - Compare multiple sources, **resolve contradictions**, and extract the most reliable answer.

      **Abstract Interpretation & Implications:**
      - If applicable, go beyond factual retrieval to provide a **higher-level understanding**.
      - Explain **why** the information matters, potential implications, or how it fits into a broader concept.

      **Refer to Reasoning Text:**
      - {reasoning}

      ---
      ## **Step 4: Generate the Final Answer**
      Now, generate a **detailed, structured, and logically sound response** that follows the above reasoning process.

      **Your response should be:**
      **Comprehensive** → Cover key details from the retrieved context.
      **Logical** → Show the **step-by-step** reasoning process.
      **Insightful** → Provide interpretation and implications where necessary.
      **Concise but Informative** → Avoid unnecessary repetition.

      *If uncertainty exists, clearly state the limitations rather than making up information.*
      """


    # Send the request via OpenRouter (Qwen is used here; DeepSeek R1 is the commented-out alternative)
    completion = client.chat.completions.create(
        extra_headers={
            "HTTP-Referer": "<YOUR_SITE_URL>",
            "X-Title": "<YOUR_SITE_NAME>",
        },
        model="qwen/qwen2.5-vl-72b-instruct:free",  # alternative: deepseek/deepseek-r1:free
        messages=[{"role": "user", "content": full_prompt}],
        temperature=0.7,  # Moderately high sampling temperature for more varied wording
        top_p=0.7,  # Nucleus sampling: sample only from the most likely tokens covering 70% of the probability mass
        presence_penalty=0.2,  # Discourage reusing tokens that have already appeared
        frequency_penalty=0.2,  # Penalize tokens in proportion to how often they repeat
        max_tokens=1600
    )

    return completion.choices[0].message.content

In [121]:
response = query_rag("Explain the concept and architecture of Transformer", index, chunks)
print(response)

### **Concept and Architecture of the Transformer**

#### **Concept of the Transformer**
The Transformer is a groundbreaking neural network architecture that fundamentally shifts the paradigm in sequence-to-sequence tasks, such as machine translation, by eliminating the reliance on recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Instead, it leverages **self-attention mechanisms** to model the relationships between input and output tokens. This design choice allows the model to process all tokens in parallel, significantly enhancing computational efficiency and enabling it to capture long-range dependencies more effectively than traditional RNNs and CNNs. By relying entirely on self-attention mechanisms, the Transformer can directly model the dependencies between input and output tokens without the need for sequential processing. This innovation enables the model to handle long-range dependencies more efficiently and effectively. The core idea is to allow ever