<a href="https://colab.research.google.com/github/divyansh-dhawan/Question-Answering-System-using-RAG/blob/main/QA_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install required packages
!pip install -q faiss-cpu sentence-transformers gradio


In [None]:
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import gradio as gr
from typing import List, Dict
import warnings
warnings.filterwarnings("ignore")

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


In [None]:
class RAG:
    """Minimal retrieval-based QA system: sentence-transformer embeddings searched with an exact FAISS index."""

    def __init__(self):
        print("🚀 Initializing RAG System...")

        # Load embedding model
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        print("📊 Embedding model loaded")

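        # Create vector database (FAISS): an exact, brute-force index that
        # compares the query against every stored vector by squared L2 distance
        self.embedding_dim = self.embedder.get_sentence_embedding_dimension()  # 384 for all-MiniLM-L6-v2
        self.index = faiss.IndexFlatL2(self.embedding_dim)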
        print("🗃️ Vector database initialized")

        # Storage for documents
        self.documents = []

        print("✅ RAG System ready!")

    def add_documents(self, texts: List[str]):
        """Add documents to the knowledge base"""
        print(f"📚 Adding {len(texts)} documents...")

        # Create embeddings for all documents
        embeddings = self.embedder.encode(texts, show_progress_bar=True)

        # Add embeddings to FAISS index
        self.index.add(embeddings.astype('float32'))

        # Store the original texts
        self.documents.extend(texts)

        print(f"✅ Total documents in knowledge base: {len(self.documents)}")

    def search(self, query: str, top_k: int = 3) -> List[Dict]:
        """Search for relevant documents"""
        if not self.documents:
            return []

        # Convert query to embedding
        query_embedding = self.embedder.encode([query])

        # Search in FAISS index
        distances, indices = self.index.search(
            query_embedding.astype('float32'),
            min(top_k, len(self.documents))
        )

        # Prepare results
        results = []
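        for i, (distance, idx) in enumerate(zip(distances[0], indices[0])):
            # FAISS pads missing results with -1, so guard against negative indices too
            if 0 <= idx < len(self.documents):
                # IndexFlatL2 returns squared L2 distances; map them to a (0, 1] score (higher = closer)
                similarity = 1 / (1 + distance)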
                results.append({
                    "text": self.documents[idx],
                    "similarity": similarity,
                    "rank": i + 1
                })

        return results

    def answer_question(self, question: str) -> tuple:
        """Answer a question by retrieving passages and returning them extractively (no LLM generation step)."""
        # Step 1: Retrieve relevant documents
        relevant_docs = self.search(question, top_k=3)

        if not relevant_docs:
            return "I don't have information to answer this question.", []

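        # Step 2: Join the retrieved passages into one context string.
        # This extractive version does not use it further; a generator LLM would be conditioned on it.
        context = "\n\n".join([doc["text"] for doc in relevant_docs])

        # Step 3: Build the answer extractively (no text generation):
        # return the top-ranked passage, followed by a 200-character preview of the runner-up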
        answer = f"Based on the available information:\n\n{relevant_docs[0]['text']}"

        if len(relevant_docs) > 1:
            answer += f"\n\nAdditional relevant information:\n{relevant_docs[1]['text'][:200]}..."

        return answer, relevant_docs

print("🎯 RAG system class defined!")

🎯 RAG system class defined!
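As written, `answer_question` is extractive: it assembles a `context` string but never passes it to a language model, so nothing is actually generated. In a full RAG pipeline the retrieved passages would be placed into a prompt for a generator. A minimal sketch of such a prompt builder is shown next; the template wording, the `build_rag_prompt` name, and the character budget are assumptions, and the LLM call itself is left out.

```python
from typing import Dict, List


def build_rag_prompt(question: str, docs: List[Dict], max_chars: int = 1500) -> str:
    """Assemble a grounded prompt from retrieved passages (hypothetical template).

    `docs` has the same shape RAG.search returns: dicts with a "text" key,
    already ordered from most to least relevant.
    """
    passages = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        text = doc["text"]
        # Stop adding passages once the character budget would be exceeded
        if used + len(text) > max_chars:
            break
        passages.append(f"[{i}] {text}")
        used += len(text)

    context = "\n\n".join(passages)
    return (
        "Answer the question using only the numbered passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy retrieval results in the RAG.search format (hypothetical texts and scores)
example_docs = [
    {"text": "XAI makes AI decisions interpretable.", "similarity": 0.67, "rank": 1},
    {"text": "SHAP and LIME are post-hoc XAI methods.", "similarity": 0.48, "rank": 2},
]
prompt = build_rag_prompt("What is explainable AI?", example_docs)
print(prompt)
```

Capping the context by characters is a crude stand-in for a token budget; a real pipeline would count tokens with the generator's own tokenizer. Numbering the passages lets the model cite `[1]`, `[2]` in its answer.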


In [None]:
# Sample documents for our knowledge base
sample_documents = [

    # Neural Networks
    "Neural networks are machine learning models inspired by the structure and function of the human brain. They consist of layers of artificial neurons organized into an input layer, one or more hidden layers, and an output layer. Each neuron processes input data and passes the information forward through weighted connections and non-linear activation functions like ReLU or Sigmoid. Neural networks are trained using supervised learning techniques such as backpropagation and gradient descent to minimize a loss function. These models are powerful for recognizing patterns and approximating complex non-linear functions.",
    "The architecture of a neural network includes neurons (also called nodes), weights, biases, and activation functions. Training involves feeding labeled data through the network, calculating the loss, and updating the weights using optimization algorithms. Common challenges include overfitting, vanishing gradients, and the need for large datasets.",
    "Neural networks are used in a wide range of applications including image and speech recognition, financial forecasting, medical diagnosis, and time series prediction. They serve as the foundational architecture for many deep learning models and are widely implemented in fields like computer vision, natural language processing, and autonomous systems.",

    # Convolutional Neural Networks (CNNs)
    "Convolutional Neural Networks (CNNs) are a class of deep learning models particularly effective at processing spatial and visual data such as images and videos. They apply convolutional filters to local receptive fields in the input to capture hierarchical patterns and features. CNNs use layers like convolution, activation (typically ReLU), pooling (like max pooling), and fully connected layers to perform classification or regression tasks.",
    "A standard CNN architecture consists of several convolutional layers for feature extraction, pooling layers for dimensionality reduction, and fully connected layers for classification. Techniques like dropout and batch normalization are used for regularization and performance improvement. Famous CNN architectures include LeNet-5, AlexNet, VGGNet, and ResNet.",
    "CNNs are widely used in applications such as image classification, object detection (YOLO, SSD), semantic segmentation, facial recognition, and even video analysis. They are also adapted for non-image tasks involving spatial hierarchies like audio signal processing and text classification. Despite their power, CNNs require significant computational resources and large labeled datasets for training.",

    # Transfer Learning
    "Transfer learning is a deep learning technique that involves reusing a pre-trained model on a new, related task. Instead of training a model from scratch, transfer learning leverages knowledge gained from a large source dataset and adapts it to a target task with less data. This is especially useful when labeled data is limited or expensive to obtain.",
    "There are two main approaches to transfer learning: feature extraction and fine-tuning. In feature extraction, a pre-trained model is used to extract features from new data, and a new classifier is trained on top. In fine-tuning, the weights of the pre-trained model are further trained on the new dataset, often with a reduced learning rate. Popular pre-trained models include ResNet for images and BERT or GPT for text.",
    "Transfer learning is widely used in computer vision (e.g., using ImageNet-pretrained CNNs for medical imaging), natural language processing (e.g., fine-tuning BERT for sentiment analysis), and audio recognition. It offers benefits such as reduced training time, improved accuracy on small datasets, and better generalization. However, it may suffer from domain mismatch and inherited biases from the source data.",

    # Recurrent Neural Networks (RNNs)
    "Recurrent Neural Networks (RNNs) are a type of neural network designed for sequential and time-series data. Unlike feedforward networks, RNNs have connections that form cycles, allowing them to maintain an internal memory of previous inputs. This makes them suitable for tasks where the order of input data matters, such as natural language, speech, or sensor data.",
    "The architecture of an RNN includes an input layer, a hidden recurrent layer, and an output layer. At each time step, the hidden state is updated using both the current input and the previous hidden state, allowing information to persist across the sequence. Common activation functions include Tanh and ReLU.",
    "RNNs are commonly applied in language modeling, text generation, speech recognition, time series forecasting, and sentiment analysis. However, they suffer from issues like vanishing and exploding gradients, making it hard to learn long-term dependencies in long sequences. This limitation led to the development of advanced architectures like LSTM and GRU.",

    # Long Short-Term Memory (LSTM)
    "Long Short-Term Memory (LSTM) networks are an extension of RNNs designed to capture long-range dependencies in sequential data. They solve the vanishing gradient problem by introducing a more complex architecture with internal memory cells and gates that control the flow of information.",
    "An LSTM unit includes three main gates: the input gate (controls what new information is stored), the forget gate (controls what information is discarded), and the output gate (controls what is passed to the next step). These gates allow LSTMs to selectively remember or forget information across long sequences.",
    "LSTMs are widely used in natural language processing tasks such as machine translation, text summarization, and question answering. They are also effective in speech recognition, handwriting generation, and anomaly detection in time series data. While LSTMs improve upon basic RNNs, they are more computationally intensive and slower to train.",

    # Transformer
    "Transformers are a deep learning architecture that processes sequences using self-attention mechanisms, enabling the model to consider all positions in a sequence simultaneously. Unlike RNNs and LSTMs, transformers do not require sequential processing, making them more parallelizable and efficient for large-scale data.",
    "The core of the transformer architecture is the self-attention mechanism, which allows the model to weigh the importance of each word in a sequence relative to others. Transformers consist of encoder and decoder blocks, each composed of multi-head attention layers, feedforward layers, layer normalization, and residual connections.",
    "Transformers are the foundation for many state-of-the-art models in natural language processing, including BERT, GPT, and T5. They are used for tasks like machine translation, text generation, summarization, question answering, and code generation. Recent research has also extended transformers to other domains such as image classification (Vision Transformers) and protein folding (AlphaFold). Despite their power, transformers require significant memory and compute resources.",

    # Autoencoders
    "Autoencoders are unsupervised neural networks used to learn efficient representations (encodings) of data. They consist of two main parts: an encoder that compresses the input into a lower-dimensional latent space, and a decoder that reconstructs the original input from this representation. The goal is to minimize the reconstruction error between the input and its output.",
    "The architecture of an autoencoder includes an input layer, one or more hidden layers forming the encoder, a latent space (bottleneck), and a mirrored set of hidden layers forming the decoder. Activation functions like ReLU, Sigmoid, or Tanh are commonly used, and Mean Squared Error is a typical loss function.",
    "Autoencoders are used in dimensionality reduction, image denoising, anomaly detection, and feature extraction. Variants include Denoising Autoencoders (for noisy input), Sparse Autoencoders (for sparse representation), and Variational Autoencoders (for generative modeling). However, autoencoders can overfit and may simply learn to copy input unless regularized appropriately.",

    # Large Language Models (LLMs)
    "Large Language Models (LLMs) are deep learning models trained on massive text corpora to understand, process, and generate human-like language. Built on transformer architectures, these models learn rich representations of linguistic structure and semantics, allowing them to perform a wide range of NLP tasks with minimal fine-tuning.",
    "LLMs consist of millions to billions of parameters and are usually pretrained on general-purpose datasets using objectives like masked language modeling (as in BERT) or autoregressive prediction (as in GPT). After pretraining, they can be fine-tuned or prompted for specific downstream tasks.",
    "Applications of LLMs include text generation, summarization, translation, sentiment analysis, question answering, chatbot development, and even code generation. Popular LLMs include OpenAI's GPT series, Google's PaLM, Meta's LLaMA, and BERT. These models have sparked a revolution in generative AI, but raise concerns around biases, hallucinations, and computational cost.",

    # Retrieval-Augmented Generation (RAG)
    "Retrieval-Augmented Generation (RAG) is a hybrid approach in which a large language model is augmented with an information retrieval module. Instead of relying solely on internal knowledge, the model retrieves relevant documents from an external source (e.g., a search index or database) and uses that information to generate more accurate and up-to-date responses.",
    "A RAG system typically consists of two components: a retriever (like BM25 or dense vector-based models) that fetches relevant documents based on the user query, and a generator (like GPT or BART) that conditions on the retrieved context to generate a response. This framework allows open-domain question answering and long-context generation without needing to retrain the language model.",
    "RAG is used in applications such as knowledge-intensive Q&A systems, personalized assistants, legal or scientific document summarization, and enterprise search. It improves factual consistency and reduces hallucinations by grounding generation in external, authoritative data. However, RAG performance depends heavily on retrieval quality and indexing strategy.",

    # Generative Adversarial Networks (GANs)
    "Generative Adversarial Networks (GANs) are a class of deep learning models that consist of two neural networks—the generator and the discriminator—trained in opposition. The generator creates synthetic data instances, while the discriminator evaluates whether they are real (from the training data) or fake (from the generator). The two models play a minimax game, where the generator tries to fool the discriminator, and the discriminator tries to correctly identify real versus fake inputs.",
    "The architecture of a GAN includes: (1) a generator network, typically a deconvolutional neural network, which maps a random noise vector to a synthetic sample (e.g., an image); and (2) a discriminator, typically a CNN, which acts as a binary classifier. The networks are trained iteratively—first updating the discriminator and then the generator.",
    "GANs are used in realistic image synthesis, super-resolution, image-to-image translation (e.g., turning sketches into photos), text-to-image generation, data augmentation, and video generation. Variants of GANs include Conditional GANs (cGANs), CycleGANs, and StyleGANs. Despite their success, GANs are difficult to train due to mode collapse, instability, and sensitivity to hyperparameters.",

    # Diffusion Models (DDPMs)
    "Denoising Diffusion Probabilistic Models (DDPMs), or simply diffusion models, are a type of generative model that generate data by reversing a diffusion process. The forward process gradually adds noise to training data over a series of steps, and the model learns to reverse this noising process to generate new samples from pure noise.",
    "A typical diffusion model consists of a U-Net-based neural network trained to predict either the noise added or the original data at each timestep, given a noisy input and a time index. During inference, the model begins with a random noise sample and iteratively denoises it using a learned reverse process to obtain realistic outputs.",
    "Diffusion models have achieved state-of-the-art results in image generation tasks, rivaling or surpassing GANs in quality and diversity. Applications include high-resolution image synthesis (e.g., DALL·E 2, Stable Diffusion), inpainting, image editing, and even text-to-video generation. They are considered more stable to train than GANs but require longer sampling times during inference.",

    # Explainable AI (XAI)
    "Explainable AI (XAI) refers to techniques and tools that make the behavior and decisions of AI systems interpretable and transparent to humans. XAI is essential for building trust, ensuring fairness, detecting biases, and meeting regulatory requirements—especially in high-stakes domains like healthcare, finance, and criminal justice.",
    "XAI methods can be categorized as intrinsic (models that are inherently interpretable, such as decision trees or linear models) or post-hoc (techniques applied to complex models after training). Post-hoc techniques include SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), saliency maps, and counterfactual explanations.",
    "XAI is used in model debugging, regulatory compliance, fairness audits, and communicating model behavior to non-technical stakeholders. It is especially critical in domains involving accountability and ethical concerns. However, XAI techniques may introduce approximation errors and trade-offs between interpretability and model complexity.",

    # Reinforcement Learning / Deep Reinforcement Learning
    "Reinforcement Learning (RL) is a learning paradigm where an agent learns to take actions in an environment to maximize cumulative rewards over time. Unlike supervised learning, RL does not require labeled input/output pairs—instead, it relies on feedback in the form of rewards or penalties. Deep Reinforcement Learning (DRL) combines RL with deep neural networks to handle high-dimensional state and action spaces, enabling learning directly from raw inputs like images.",
    "The architecture of RL involves key components: the agent (decision-maker), the environment (where it operates), states (observations), actions, rewards, and a policy (strategy). Deep RL architectures include Deep Q-Networks (DQN), which approximate Q-values using CNNs; Policy Gradient methods that directly optimize the policy; and Actor-Critic models which combine value-based and policy-based approaches. Training involves techniques like Q-learning, REINFORCE, and Advantage Actor-Critic (A2C).",
    "RL/DRL is used in a wide range of applications such as game playing (e.g., AlphaGo, OpenAI Five), robotics (manipulation and navigation), recommendation systems, industrial control, autonomous vehicles, and finance. It enables autonomous systems to improve through experience, but faces challenges like sample inefficiency, exploration-exploitation trade-off, and sensitivity to hyperparameters and reward shaping."
]

print(f"📖 Created {len(sample_documents)} sample documents")
print("\nFirst document preview:")
print(sample_documents[0])

📖 Created 39 sample documents

First document preview:
Neural networks are machine learning models inspired by the structure and function of the human brain. They consist of layers of artificial neurons organized into an input layer, one or more hidden layers, and an output layer. Each neuron processes input data and passes the information forward through weighted connections and non-linear activation functions like ReLU or Sigmoid. Neural networks are trained using supervised learning techniques such as backpropagation and gradient descent to minimize a loss function. These models are powerful for recognizing patterns and approximating complex non-linear functions.


In [None]:
# Create our RAG system
rag = RAG()

# Add documents to the knowledge base
rag.add_documents(sample_documents)

🚀 Initializing RAG System...
📊 Embedding model loaded
🗃️ Vector database initialized
✅ RAG System ready!
📚 Adding 39 documents...
✅ Total documents in knowledge base: 39
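The FAISS index now holds one vector per document. `faiss.IndexFlatL2` performs exact, exhaustive search and reports squared Euclidean distances, which `RAG.search` turns into a score with `1 / (1 + distance)`. To illustrate what that search computes, here is a NumPy-only sketch (no FAISS) on toy 2-D vectors; `flat_l2_search` is a hypothetical helper, not a FAISS function.

```python
import numpy as np


def flat_l2_search(index_vectors: np.ndarray, queries: np.ndarray, k: int):
    """Brute-force k-nearest-neighbour search by squared L2 distance.

    Returns (distances, indices), each of shape (n_queries, k), sorted from
    nearest to farthest, mirroring the layout of faiss.Index.search.
    """
    # Pairwise squared distances ||q - x||^2 for every query q and stored vector x
    diffs = queries[:, None, :] - index_vectors[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Sort each query row by distance and keep the k nearest
    order = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(sq_dists, order, axis=1), order


# Toy 2-D "embeddings" standing in for the 384-d sentence vectors
docs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype="float32")
query = np.array([[1.0, 1.0]], dtype="float32")

distances, indices = flat_l2_search(docs, query, k=2)
similarities = 1 / (1 + distances)  # same heuristic as RAG.search
print(indices)    # [[1 0]]
print(distances)  # [[1. 2.]]
print(similarities)
```

Each query costs one distance computation per stored vector, which is negligible for 39 documents; the squared (not plain) distance is what the similarity heuristic receives.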


In [None]:
# Test the system with a sample question
test_question = "What is explainable AI?"
print(f"🤔 Question: {test_question}")

answer, retrieved_docs = rag.answer_question(test_question)

print(f"\n🤖 Answer:\n{answer}")

print(f"\n🔍 Retrieved {len(retrieved_docs)} relevant documents:")
for i, doc in enumerate(retrieved_docs):
    print(f"{i+1}. Similarity: {doc['similarity']:.3f}")
    print(f"   Text: {doc['text'][:100]}...\n")

🤔 Question: What is explainable AI?

🤖 Answer:
Based on the available information:

Explainable AI (XAI) refers to techniques and tools that make the behavior and decisions of AI systems interpretable and transparent to humans. XAI is essential for building trust, ensuring fairness, detecting biases, and meeting regulatory requirements—especially in high-stakes domains like healthcare, finance, and criminal justice.

Additional relevant information:
XAI methods can be categorized as intrinsic (models that are inherently interpretable, such as decision trees or linear models) or post-hoc (techniques applied to complex models after training). Post-...

🔍 Retrieved 3 relevant documents:
1. Similarity: 0.667
   Text: Explainable AI (XAI) refers to techniques and tools that make the behavior and decisions of AI syste...

2. Similarity: 0.479
   Text: XAI methods can be categorized as intrinsic (models that are inherently interpretable, such as decis...

3. Similarity: 0.452
   Text: XAI is 
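These scores are not cosine similarities. `search` computes `s = 1 / (1 + d)`, where `d` is the squared L2 distance returned by `IndexFlatL2`, so `d = 1/s - 1`. If the embeddings are unit-length, then `d = ||a||^2 + ||b||^2 - 2 a·b = 2 - 2 cos(a, b)`, giving `cos = 1 - d/2`. Unit length is an assumption here (worth checking with `np.linalg.norm(rag.embedder.encode(["test"]), axis=1)`); the sketch below applies the conversion to the three scores printed above.

```python
import numpy as np

# Similarity scores printed by the test cell above (rounded to 3 decimals there)
similarities = np.array([0.667, 0.479, 0.452])

# Invert s = 1 / (1 + d) to recover the squared L2 distance d
sq_l2 = 1 / similarities - 1

# ASSUMPTION: embeddings are unit-length, so ||a - b||^2 = 2 - 2 * cos(a, b)
cosine = 1 - sq_l2 / 2

for s, d, c in zip(similarities, sq_l2, cosine):
    print(f"similarity={s:.3f}  squared_L2={d:.3f}  cosine={c:.3f}")
```

Under that assumption, the top passage's score of 0.667 corresponds to a cosine similarity of about 0.75. The mapping is monotonic, so the ranking is identical either way; only the scale differs.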

In [None]:
def chat_interface(question):
    """Gradio callback: return the answer and a formatted list of the retrieved documents."""
    if not question.strip():
        return "Please enter a question!", "No documents retrieved."

    answer, docs = rag.answer_question(question)

    # Format retrieved documents for display
    docs_display = "\n\n".join([
        f"📄 Document {i+1} (Similarity: {doc['similarity']:.3f}):\n{doc['text']}"
        for i, doc in enumerate(docs)
    ])

    return answer, docs_display

# Create Gradio interface
demo = gr.Interface(
    fn=chat_interface,
    inputs=gr.Textbox(
        label="Ask a Question",
        placeholder="What would you like to know?",
        lines=2
    ),
    outputs=[
        gr.Textbox(label="Answer", lines=6),
        gr.Textbox(label="Retrieved Documents", lines=8)
    ],
    title="🤖 Easy RAG System Demo",
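    description="Ask questions about the topics in the knowledge base: neural networks, CNNs, transfer learning, RNNs and LSTMs, transformers, autoencoders, LLMs, RAG, GANs, diffusion models, explainable AI, and reinforcement learning.",
    examples=[
        "What is explainable AI?",
        "How does RAG work?",
        "What are the applications of CNNs?",
        "How do LSTMs handle long-term dependencies?",
        "What is a diffusion model?"
    ]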
)

# Launch the interface
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://89a86551113d6f5e6d.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [None]:
# Function to add custom documents
def add_custom_documents(text_input):
    """Add custom text to the knowledge base"""
    if not text_input.strip():
        return "Please provide some text to add."

    # Split by double newlines to separate paragraphs
    paragraphs = [p.strip() for p in text_input.split('\n\n') if p.strip()]

    if paragraphs:
        rag.add_documents(paragraphs)
        return f"✅ Added {len(paragraphs)} new document(s) to the knowledge base!"
    else:
        return "No valid paragraphs found in the input."

# Create interface for adding documents
add_docs_demo = gr.Interface(
    fn=add_custom_documents,
    inputs=gr.Textbox(
        label="Add New Documents",
        placeholder="Paste your text here. Separate different documents with double newlines.",
        lines=5
    ),
    outputs=gr.Textbox(label="Status"),
    title="📚 Add Documents to Knowledge Base",
    description="Expand the knowledge base by adding your own documents!"
)

add_docs_demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://217d7e0b83d90cf35a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
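`add_custom_documents` splits only on the exact sequence `'\n\n'`, so text pasted with Windows line endings (`'\r\n\r\n'`) or with blank lines that contain spaces stays a single document. A hedged alternative is to split on any two line breaks separated only by whitespace; the helper name `split_paragraphs` is illustrative, and its return value could be passed to `rag.add_documents` in place of the list comprehension.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    r"""Split text into paragraphs at blank lines.

    A blank line here is two line breaks separated only by whitespace, so
    '\n\n', '\r\n\r\n' and '\n   \n' all count; a single line break does not.
    """
    parts = re.split(r"\n\s*\n", text)
    # Strip surrounding whitespace (including stray '\r') and drop empty pieces
    return [p.strip() for p in parts if p.strip()]


sample = "First paragraph.\r\n\r\nSecond paragraph,\nstill second.\n   \nThird."
print(split_paragraphs(sample))
# ['First paragraph.', 'Second paragraph,\nstill second.', 'Third.']
```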




In [None]:
# Test with multiple questions
test_questions = [
    "What are the applications of CNN?",
    "How does RAG work?",
    "What is the difference between RNNs, LSTMs and transformers",
    "Why are GANs used for generating data?"
]

print("🧪 Testing with multiple questions:\n")

for i, question in enumerate(test_questions, 1):
    print(f"Question {i}: {question}")
    answer, docs = rag.answer_question(question)
    print(f"Answer: {answer[:200]}...")
    print(f"Retrieved {len(docs)} documents\n")
    print("-" * 50)

🧪 Testing with multiple questions:

Question 1: What are the applications of CNN?
Answer: Based on the available information:

Convolutional Neural Networks (CNNs) are a class of deep learning models particularly effective at processing spatial and visual data such as images and videos. Th...
Retrieved 3 documents

--------------------------------------------------
Question 2: How does RAG work?
Answer: Based on the available information:

RAG is used in applications such as knowledge-intensive Q&A systems, personalized assistants, legal or scientific document summarization, and enterprise search. It...
Retrieved 3 documents

--------------------------------------------------
Question 3: What is the difference between RNNs, LSTMs and transformers
Answer: Based on the available information:

Transformers are a deep learning architecture that processes sequences using self-attention mechanisms, enabling the model to consider all positions in a sequence ...
Retrieved 3 documents

--------
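The loop above prints answers but gives no score, which makes it hard to tell whether a change (another embedding model, different document splitting, a larger `top_k`) improves retrieval. For instance, "How does RAG work?" was answered with the passage on RAG applications rather than the one describing the retriever and generator. Two standard retrieval metrics are hit rate at k (share of questions with at least one relevant document in the top k) and mean reciprocal rank (average of 1/rank of the first relevant document). The sketch below computes both from ranked document ids; the ids and relevance labels are hypothetical hand-written examples, not results from this notebook.

```python
from typing import Sequence, Set


def hit_rate_at_k(retrieved: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant id among the top-k retrieved ids."""
    if len(retrieved) != len(relevant):
        raise ValueError("need one relevance set per query")
    if not retrieved:
        return 0.0
    hits = sum(1 for ranked, gold in zip(retrieved, relevant) if gold & set(ranked[:k]))
    return hits / len(retrieved)


def mean_reciprocal_rank(retrieved: Sequence[Sequence[str]], relevant: Sequence[Set[str]]) -> float:
    """Mean of 1/rank of the first relevant id per query (a query with none contributes 0)."""
    if len(retrieved) != len(relevant):
        raise ValueError("need one relevance set per query")
    if not retrieved:
        return 0.0
    total = 0.0
    for ranked, gold in zip(retrieved, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break
    return total / len(retrieved)


# Hypothetical ranked results and hand-labelled relevant ids for three questions
retrieved = [
    ["cnn_def", "cnn_apps", "cnn_arch"],
    ["rag_apps", "rag_arch", "rag_def"],
    ["xai_def", "xai_methods", "xai_uses"],
]
relevant = [{"cnn_apps"}, {"rag_arch", "rag_def"}, {"xai_def"}]

print(f"hit@1 = {hit_rate_at_k(retrieved, relevant, k=1):.3f}")  # 0.333
print(f"hit@3 = {hit_rate_at_k(retrieved, relevant, k=3):.3f}")  # 1.000
print(f"MRR   = {mean_reciprocal_rank(retrieved, relevant):.3f}")  # 0.667
```

With this notebook, the ids could be positions in `rag.documents` (for example `rag.documents.index(doc["text"])` for each search result), with relevance labels written by hand for a fixed question set.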

In [None]:
# Display system information
print("📊 RAG System Statistics:")
print(f"📚 Total documents: {len(rag.documents)}")
print(f"🧮 Embedding dimension: {rag.embedding_dim}")
print(f"🗃️ FAISS index size: {rag.index.ntotal}")
print(f"🔢 Average document length: {np.mean([len(doc) for doc in rag.documents]):.0f} characters")

# Show sample of document lengths
doc_lengths = [len(doc) for doc in rag.documents]
print(f"📏 Document length range: {min(doc_lengths)} - {max(doc_lengths)} characters")

📊 RAG System Statistics:
📚 Total documents: 39
🧮 Embedding dimension: 384
🗃️ FAISS index size: 39
🔢 Average document length: 377 characters
📏 Document length range: 287 - 619 characters
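`IndexFlatL2` keeps every embedding as raw `float32`, so its vector storage grows as n × d × 4 bytes and every query scans all n vectors. For the 39 × 384 index above that is 39 × 384 × 4 = 59,904 bytes. The sketch below applies the same arithmetic to larger, hypothetical collection sizes; it counts vector storage only, ignoring FAISS and Python overhead and the stored document strings.

```python
BYTES_PER_FLOAT32 = 4
EMBEDDING_DIM = 384  # all-MiniLM-L6-v2, as printed above


def flat_index_bytes(n_vectors: int, dim: int = EMBEDDING_DIM) -> int:
    """Raw vector storage for an exact (flat) float32 index: n * d * 4 bytes."""
    return n_vectors * dim * BYTES_PER_FLOAT32


for n in (39, 100_000, 1_000_000):
    size = flat_index_bytes(n)
    print(f"{n:>9,} vectors -> {size:>13,} bytes ({size / 2**20:,.1f} MiB)")
```

At a million documents the vectors alone take about 1.5 GB and each query compares against all of them; that is roughly the scale at which FAISS's approximate indexes (such as IVF or HNSW variants) become worth evaluating.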
