# 🤖 RAG Chatbot: ML/AI Knowledge Assistant

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/your-username/your-repo/blob/main/rag_notebook.ipynb)

## 📋 Project Overview

This notebook implements a **Retrieval-Augmented Generation (RAG) chatbot** that answers questions about machine learning, deep learning, AI, and related topics. It retrieves relevant passages from a Chroma vector store and passes them to a Gemini model as context for the answer.

### 🎯 What This Notebook Does

1. **Loads ML/AI Knowledge**: Streams The Pile dataset from Hugging Face, falling back to a small built-in sample set if the dataset cannot be loaded
2. **Processes Text Data**: Filters and chunks relevant ML/AI content
3. **Creates Vector Database**: Stores embeddings in Chroma for fast retrieval
4. **Implements RAG Pipeline**: Retrieves relevant context and generates answers
5. **Tests the System**: Validates functionality with sample questions

### 🛠️ Technologies Used

- **🤖 Generation Model**: Google Gemini 2.0 Flash (experimental, model ID `gemini-2.0-flash-exp`)
- **🔗 RAG Framework**: LangChain
- **🗄️ Vector Database**: Chroma
- **📚 Dataset**: The Pile (EleutherAI/the_pile) from Hugging Face
- **🧠 Embeddings**: Sentence Transformers

### 🚀 How to Run This Notebook

1. **Open in Colab**: Click the badge above or upload to Google Colab
2. **Set API Key**: Add your Gemini API key to Colab secrets
3. **Run All Cells**: Execute cells sequentially (Ctrl+F9)
4. **Test Chatbot**: Try the sample questions at the end

### 📊 Expected Outputs

- **Vector Database**: Chroma collection with ML/AI knowledge
- **RAG Pipeline**: Fully functional question-answering system
- **Test Results**: Sample Q&A demonstrating chatbot capabilities
- **Configuration**: Settings file for deployment


## 📦 Step 1: Installation and Setup

### 🔧 Required Packages

This cell installs all necessary dependencies for the RAG chatbot:

- **Streamlit**: Web interface framework
- **LangChain**: RAG pipeline orchestration
- **Chroma**: Vector database for embeddings
- **Sentence Transformers**: Text embedding models
- **Google Generative AI**: Gemini API integration
- **Hugging Face Datasets**: Dataset access

### ⚠️ Important Notes

- Run this cell first before any other cells
- Installation may take 2-3 minutes
- Restart runtime if you encounter import errors
- All packages are pinned to specific versions for compatibility


In [None]:
# Install required packages
!pip install streamlit==1.28.1
!pip install langchain==0.1.0
!pip install langchain-community==0.0.10
!pip install langchain-google-genai==0.0.6
!pip install chromadb==0.4.18
!pip install datasets==2.14.6
!pip install transformers==4.35.2
!pip install sentence-transformers==2.2.2
!pip install google-generativeai==0.3.2
!pip install tiktoken==0.5.1
!pip install numpy==1.24.3
!pip install pandas==2.0.3
!pip install tqdm==4.66.1


## 🔑 Step 2: API Key Configuration

### 🔐 Google Gemini API Setup

To use this chatbot, you need a Google Gemini API key:

1. **Get API Key**: Visit [Google AI Studio](https://makersuite.google.com/app/apikey)
2. **Create Key**: Generate a new API key
3. **Add to Colab**: Use the secrets manager (🔑 icon in sidebar)
4. **Set Secret Name**: `GEMINI_API_KEY`

### 🛡️ Security Best Practices

- Never hardcode API keys in notebooks
- Use Colab secrets for secure storage
- Keep your API key private and don't share it
- Monitor your API usage to avoid unexpected charges
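
The practices above can be sketched in plain Python. This is a minimal illustration, not the notebook's own loader: `load_api_key` and `mask` are hypothetical helper names, and the key is read from a mapping such as `os.environ` instead of being written into the source.

```python
import os


def load_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the key stored under `name`, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it as a secret or environment variable")
    return key


def mask(key, visible=4):
    """Hide all but the last few characters so logs never contain the full key."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Printing `mask(key)` instead of the key itself keeps notebook output safe to share.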


In [None]:
# Set up Google Gemini API key
import os
from google.colab import userdata

# Get API key from Colab secrets
try:
    GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')
    os.environ['GOOGLE_API_KEY'] = GEMINI_API_KEY
    print("✅ Gemini API key loaded successfully!")
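except Exception as e:  # e.g. the secret is missing or notebook access was not granted
    print(f"❌ Could not load the Gemini API key: {e}")
    print("Please add your Gemini API key to Colab secrets:")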
    print("1. Go to the key icon (🔑) in the left sidebar")
    print("2. Add a new secret with key 'GEMINI_API_KEY' and your API key as value")
    print("3. Restart the runtime and run this cell again")
    
    # Alternative: Set directly (not recommended for production)
    # GEMINI_API_KEY = "your_api_key_here"
    # os.environ['GOOGLE_API_KEY'] = GEMINI_API_KEY


## 📚 Step 3: Dataset Loading and Processing

### 🗃️ The Pile Dataset Overview

**The Pile** is a large-scale, diverse text dataset created by EleutherAI for training language models. For this project, we:

- **Access via API**: Use Hugging Face Datasets library (no local downloads)
- **Filter for ML/AI**: Extract content relevant to machine learning and AI
- **Process Text**: Clean, chunk, and prepare for embedding
- **Create Knowledge Base**: Build a searchable vector database

### 🔍 Content Filtering Strategy

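We keep a text sample if it contains at least one of the following keywords (case-insensitive substring match). Generic terms such as `data` and `model` also match a lot of non-ML text, so the filter is broad rather than precise: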
- Machine learning, deep learning, neural networks
- Artificial intelligence, algorithms, models
- Training, data, features, classification
- Regression, clustering, optimization, gradient, tensor
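
To make the matching behavior concrete, here is a small sketch that compares plain substring matching with a stricter whole-word variant. The shortened keyword list and the `whole_word_match` helper are illustrative assumptions, not part of the notebook.

```python
import re

# Shortened keyword list for illustration only
ML_KEYWORDS = ["machine learning", "deep learning", "neural network", "gradient", "tensor"]


def substring_match(text, keywords=ML_KEYWORDS):
    """Case-insensitive substring test: true if any keyword occurs anywhere in the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


# Hypothetical stricter variant: keywords must appear as whole words (optional plural "s")
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in ML_KEYWORDS) + r")s?\b",
    re.IGNORECASE,
)


def whole_word_match(text):
    """True only if a keyword appears as a standalone word or phrase."""
    return _PATTERN.search(text) is not None
```

With these definitions, a sentence about an "extensor muscle" passes the substring test (it contains `tensor`) but not the whole-word test.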

### 📊 Processing Pipeline

1. **Load Dataset**: Stream data from Hugging Face
2. **Filter Content**: Keep only ML/AI relevant text
3. **Clean Text**: Remove extra whitespace and format
4. **Validate Length**: Keep cleaned texts between 100 and 2000 characters
5. **Chunk Text**: Split the kept texts into chunks of up to 500 words (done in Step 5)
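
The cleaning and length-validation steps can be sketched as a lazy pipeline that also works on a streamed dataset. The function names `clean` and `select_texts` are assumptions made for this example.

```python
import re
from itertools import islice


def clean(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def select_texts(samples, min_chars=100, max_chars=2000, limit=1000):
    """Return up to `limit` cleaned texts whose length is within the bounds.

    Generators keep the work lazy, so iteration stops as soon as
    `limit` texts have been collected.
    """
    cleaned = (clean(sample) for sample in samples)
    kept = (text for text in cleaned if min_chars <= len(text) <= max_chars)
    return list(islice(kept, limit))
```

Because `islice` stops pulling from the generator once the limit is reached, the same function can be pointed at a very large or streaming source.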


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
from datasets import load_dataset
from tqdm import tqdm
import re
import os

print("📚 Loading The Pile dataset...")

# Stream The Pile instead of downloading it,
# to avoid memory and disk issues in Colab
try:
    # streaming=True yields samples lazily; no filtering happens at load time
    dataset = load_dataset("EleutherAI/the_pile", split="train", streaming=True)

    # Collect up to 1000 matching samples for demonstration
    texts = []
    ml_keywords = ['machine learning', 'deep learning', 'neural network', 'artificial intelligence', 
                   'algorithm', 'model', 'training', 'data', 'feature', 'classification', 
                   'regression', 'clustering', 'optimization', 'gradient', 'tensor']
    
    print("🔍 Filtering ML/AI related content...")
    count = 0
    for sample in tqdm(dataset, desc="Processing samples"):
        if count >= 1000:  # Limit to 1000 samples for Colab
            break
            
        text = sample['text']
        # Check if text contains ML/AI keywords
        if any(keyword in text.lower() for keyword in ml_keywords):
            # Clean and preprocess text
            text = re.sub(r'\s+', ' ', text)  # Remove extra whitespace
            text = text.strip()
            
            # Only keep texts that are reasonable length (not too short or too long)
            if 100 <= len(text) <= 2000:
                texts.append(text)
                count += 1
    
    print(f"✅ Loaded {len(texts)} ML/AI related text samples")
    
except Exception as e:
    print(f"❌ Error loading dataset: {e}")
    print("🔄 Using fallback sample data...")
    
    # Fallback sample data if The Pile is not accessible
    texts = [
        "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data. Deep learning uses neural networks with multiple layers to process complex patterns in data.",
        "Neural networks are computing systems inspired by biological neural networks. They consist of interconnected nodes that process information using a connectionist approach.",
        "Supervised learning uses labeled training data to learn a mapping from inputs to outputs. Common algorithms include linear regression, decision trees, and support vector machines.",
        "Unsupervised learning finds hidden patterns in data without labeled examples. Clustering algorithms like K-means group similar data points together.",
        "Natural language processing combines computational linguistics with machine learning to help computers understand human language. It includes tasks like text classification and sentiment analysis.",
        "Computer vision enables machines to interpret and understand visual information from the world. It uses deep learning models like convolutional neural networks.",
        "Reinforcement learning is a type of machine learning where agents learn to make decisions by interacting with an environment and receiving rewards or penalties.",
        "Feature engineering is the process of selecting and transforming raw data into features that can be used by machine learning algorithms. Good features can significantly improve model performance.",
        "Cross-validation is a technique used to assess how well a machine learning model generalizes to new data. It involves splitting data into training and validation sets multiple times.",
        "Overfitting occurs when a model learns the training data too well and performs poorly on new data. Regularization techniques help prevent overfitting."
    ]
    print(f"✅ Using {len(texts)} sample texts")


## 🧠 Step 4: Vector Database and Embeddings Setup

### 🔧 Embedding Model Selection

We use **Sentence Transformers** with the `all-MiniLM-L6-v2` model:

- **Lightweight**: Fast and efficient for Colab environments
- **High Quality**: Good semantic understanding for ML/AI content
- **English-focused**: Trained mainly on English text, which matches the English ML/AI content used here
- **Optimized**: Designed for similarity search tasks
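
As a rough illustration of what "similarity search" means for embeddings, the sketch below compares cosine similarity with squared L2 distance on toy 2-D vectors (real `all-MiniLM-L6-v2` embeddings have 384 dimensions). It assumes unit-length vectors, for which the two measures are related by `squared_l2 = 2 - 2 * cosine` and therefore rank neighbors identically.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy stand-ins for two sentence embeddings
query, doc = [3.0, 4.0], [4.0, 3.0]
cos = cosine_similarity(query, doc)
dist = squared_l2(normalize(query), normalize(doc))
```

Higher cosine similarity corresponds to lower distance, which is why a distance-based index can still return the most similar documents first.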

### 🗄️ Chroma Vector Database

**Chroma** is our vector database choice because:

- **Easy Setup**: Simple Python API
- **Persistent Storage**: Saves embeddings between sessions
- **Efficient Search**: Fast similarity search capabilities
- **Scalable**: Can handle large collections of documents

### 📊 Database Architecture

- **Collection Name**: `ml_ai_knowledge`
- **Storage**: Local directory `./chroma_db`
- **Metadata**: Document source, chunk index, text length
- **Indexing**: Automatic vector indexing for fast retrieval


In [None]:
# Initialize embeddings and vector database
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings

print("🧠 Initializing embeddings model...")

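# Load a lightweight sentence transformer model.
# Note: the collection below is created without an embedding_function, so Chroma
# embeds documents and queries with its own default function, not with this object.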
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("✅ Embedding model loaded!")

print("🗄️ Setting up Chroma vector database...")

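# Create Chroma client with persistent storage.
# In chromadb 0.4.x, chromadb.Client(Settings(persist_directory=...)) stays in memory;
# PersistentClient is what actually writes to ./chroma_db.
chroma_client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(anonymized_telemetry=False)
)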

# Create or get collection
collection_name = "ml_ai_knowledge"
try:
    collection = chroma_client.get_collection(collection_name)
    print(f"✅ Found existing collection: {collection_name}")
except Exception:  # the collection does not exist yet
    collection = chroma_client.create_collection(
        name=collection_name,
        metadata={"description": "ML/AI knowledge base from The Pile dataset"}
    )
    print(f"✅ Created new collection: {collection_name}")

print("🎯 Vector database ready!")


## 📝 Step 5: Text Processing and Embedding Storage

### 🔄 Text Chunking Strategy

We split each text into fixed-size, overlapping word windows:

- **Chunk Size**: 500 words per chunk
- **Overlap**: 50 words between chunks (prevents information loss)
- **Minimum Length**: Chunks of 50 characters or fewer are dropped (filters out short fragments)
- **Metadata**: Track source document and chunk position
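
The window arithmetic behind these settings can be checked with a short sketch. `window_spans` is a hypothetical helper that returns the word offsets each chunk would cover, stopping once a window reaches the last word.

```python
def window_spans(n_words, chunk_size=500, overlap=50):
    """Return (start, end) word offsets for overlapping windows over n_words words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this many words after the previous one
    spans = []
    start = 0
    while start < n_words:
        end = min(start + chunk_size, n_words)
        spans.append((start, end))
        if end == n_words:
            break  # the text is fully covered
        start += step
    return spans


spans = window_spans(1000)
```

For a 1000-word text the windows start at words 0, 450, and 900, so consecutive chunks share 50 words. Texts that passed the earlier 100 to 2000 character filter are typically shorter than 500 words and therefore usually produce a single chunk.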

### 💾 Batch Processing

To handle large datasets efficiently:

- **Batch Size**: 100 documents per batch
- **Memory Management**: Process in chunks to avoid OOM errors
- **Progress Tracking**: Visual progress bars for long operations
- **Error Handling**: Graceful handling of processing errors
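
A minimal sketch of the batching idea, assuming three parallel lists as in this notebook (documents, IDs, metadata). `batched_records` is an illustrative name; it validates that the lists are aligned before slicing them together.

```python
def batched_records(documents, ids, metadatas, batch_size=100):
    """Yield aligned (documents, ids, metadatas) slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not (len(documents) == len(ids) == len(metadatas)):
        raise ValueError("documents, ids, and metadatas must have the same length")
    for start in range(0, len(documents), batch_size):
        stop = start + batch_size
        yield documents[start:stop], ids[start:stop], metadatas[start:stop]


# Example with 250 placeholder records
docs = [f"text {i}" for i in range(250)]
ids = [f"doc_{i}" for i in range(250)]
metas = [{"chunk_index": i} for i in range(250)]
sizes = [len(batch_docs) for batch_docs, _, _ in batched_records(docs, ids, metas)]
```

Slicing all three lists with the same bounds keeps each document paired with its own ID and metadata.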

### 🏷️ Document Metadata

Each document chunk includes:

- **Source ID**: Original document identifier
- **Chunk Index**: Position within the document
- **Total Chunks**: Number of chunks in the document
- **Text Length**: Character count for quality control


In [None]:
# Process and embed text data
import uuid
from tqdm import tqdm

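def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration

    for i in range(0, len(words), step):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk.strip()) > 50:  # Only keep substantial chunks
            chunks.append(chunk)
        if i + chunk_size >= len(words):
            break  # this window reached the end; another would only repeat the overlap

    return chunks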

print("📝 Processing and chunking text data...")

# Check if collection already has data
existing_count = collection.count()
print(f"📊 Current documents in collection: {existing_count}")

if existing_count == 0:
    print("🔄 Adding new documents to collection...")
    
    all_chunks = []
    chunk_ids = []
    chunk_metadatas = []
    
    for i, text in enumerate(tqdm(texts, desc="Processing texts")):
        chunks = chunk_text(text)
        
        for j, chunk in enumerate(chunks):
            chunk_id = f"doc_{i}_chunk_{j}"
            metadata = {
                "source": f"the_pile_doc_{i}",
                "chunk_index": j,
                "total_chunks": len(chunks),
                "text_length": len(chunk)
            }
            
            all_chunks.append(chunk)
            chunk_ids.append(chunk_id)
            chunk_metadatas.append(metadata)
    
    print(f"📊 Created {len(all_chunks)} text chunks")
    
    # Add documents to Chroma in batches to avoid memory issues
    batch_size = 100
    for i in tqdm(range(0, len(all_chunks), batch_size), desc="Adding to Chroma"):
        batch_chunks = all_chunks[i:i + batch_size]
        batch_ids = chunk_ids[i:i + batch_size]
        batch_metadatas = chunk_metadatas[i:i + batch_size]
        
        collection.add(
            documents=batch_chunks,
            ids=batch_ids,
            metadatas=batch_metadatas
        )
    
    print("✅ All documents added to Chroma!")
else:
    print("✅ Collection already contains data, skipping addition")

# Verify the collection
final_count = collection.count()
print(f"📊 Final document count: {final_count}")


## 🤖 Step 6: Google Gemini Model Integration

### 🧠 Model Configuration

We use **Google Gemini 2.0 Flash** (experimental) for text generation:

- **Model**: `gemini-2.0-flash-exp` (experimental model ID)
- **Temperature**: 0.7 (balanced creativity and accuracy)
- **Max Tokens**: 1024 (sufficient for detailed responses)
- **System Integration**: LangChain wrapper for easy use

### 🔧 LangChain Integration

**LangChain** provides:

- **Unified Interface**: Consistent API across different LLMs
- **Message Handling**: System and human message management
- **Error Handling**: Robust error management and retries
- **Streaming**: Optional streaming responses

### 🧪 Model Testing

We test the model to ensure:

- **API Connectivity**: Verify API key and connection
- **Response Quality**: Check output format and content
- **Error Handling**: Test error scenarios
- **Performance**: Measure response times
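
Response times and error scenarios can be checked without calling the API by wrapping any callable in a timer. This is a sketch under the assumption that the model call is passed in as a function; `timed_call` is a hypothetical helper, and a stub stands in for the real model here.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, error, elapsed_seconds) without raising."""
    start = time.perf_counter()
    try:
        result, error = fn(*args, **kwargs), None
    except Exception as exc:  # capture the failure so the caller can report it
        result, error = None, exc
    elapsed = time.perf_counter() - start
    return result, error, elapsed


def stub_model(prompt):
    """Stand-in for a real model call; just echoes the prompt in upper case."""
    return prompt.upper()


result, error, elapsed = timed_call(stub_model, "hello")
```

In the notebook, the same wrapper could be applied to `llm.invoke` to record latency alongside the response.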


In [None]:
# Initialize Gemini model
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import HumanMessage, SystemMessage

print("🤖 Initializing Gemini model...")

# Initialize the Gemini model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-exp",  # Experimental model ID; replace it if this ID is retired
    temperature=0.7,
    max_output_tokens=1024,
    convert_system_message_to_human=True
)

print("✅ Gemini model initialized!")

# Test the model
try:
    test_response = llm.invoke("Hello! Can you tell me about machine learning?")
    print("🧪 Test response:", test_response.content[:100] + "...")
    print("✅ Gemini model is working!")
except Exception as e:
    print(f"❌ Error testing Gemini model: {e}")
    print("Please check your API key and try again.")


## 🔍 Step 7: RAG Pipeline Implementation

### 🔄 Complete RAG Workflow

The RAG pipeline combines retrieval and generation:

1. **Query Processing**: User question is received
2. **Document Retrieval**: Similar documents are found using vector search
3. **Context Assembly**: Retrieved documents are combined into context
4. **Answer Generation**: Gemini generates response using context
5. **Response Delivery**: Formatted answer is returned to user
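
The five steps can be sketched end to end with stand-in components. Everything here is illustrative: `run_rag`, `toy_retrieve`, and `toy_generate` are hypothetical names, the retriever ranks by word overlap instead of vector search, and the generator only formats a string instead of calling Gemini.

```python
def run_rag(query, retrieve, generate, n_results=5):
    """Retrieve documents, join them into a context, and generate an answer."""
    documents = retrieve(query, n_results)
    if not documents:
        return None, []
    context = "\n\n".join(documents)
    return generate(query, context), documents


CORPUS = [
    "Overfitting occurs when a model memorizes the training data.",
    "K-means groups similar data points into clusters.",
]


def toy_retrieve(query, n_results):
    """Rank corpus entries by the number of words they share with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in CORPUS]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:n_results]]


def toy_generate(query, context):
    """Stand-in for the language model: reports what it was given."""
    return f"Q: {query}\nContext used: {len(context)} characters"


answer, docs = run_rag("what is overfitting", toy_retrieve, toy_generate)
```

Passing the retriever and generator in as functions keeps the control flow testable without network access.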

### 🎯 Retrieval Strategy

- **Similarity Search**: Nearest-neighbor search over embeddings; Chroma uses squared L2 distance by default unless the collection sets `hnsw:space` to `cosine`
- **Top-K Results**: Retrieve the 5 most relevant chunks
- **Context Length**: Join the retrieved chunks into a single context string
- **Metadata Tracking**: Return distances (lower means more similar) and document sources
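
Top-K selection itself is simple once each document has a distance to the query. The sketch below uses made-up distances and a hypothetical `top_k` helper; a vector database performs the same selection with an approximate index instead of scanning every item.

```python
import heapq


def top_k(distances, documents, k=5):
    """Return the k (distance, document) pairs with the smallest distances."""
    pairs = zip(distances, documents)
    return heapq.nsmallest(k, pairs, key=lambda pair: pair[0])


# Made-up distances for three documents
nearest = top_k([0.9, 0.1, 0.5], ["doc_a", "doc_b", "doc_c"], k=2)
```

Because these are distances, the best match is the pair with the smallest value.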

### 🤖 Generation Strategy

- **System Prompt**: Specialized instructions for ML/AI responses
- **Context Integration**: Retrieved documents used as context
- **Response Formatting**: Markdown support for rich text
- **Error Handling**: Graceful handling of generation errors
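
One way to assemble the user prompt is sketched below. It assumes a hypothetical `max_context_chars` cap (not a setting in this notebook) and uses `textwrap.dedent` so that source-code indentation does not leak into the prompt text.

```python
from textwrap import dedent

# Dedent the template once, before any document text is substituted in
_TEMPLATE = dedent("""\
    Context:
    {context}

    Question: {query}

    Please provide a comprehensive answer based on the context above.""")


def build_user_prompt(query, documents, max_context_chars=6000):
    """Join retrieved documents into a context block and fill the prompt template."""
    context = "\n\n".join(documents)[:max_context_chars]
    return _TEMPLATE.format(context=context, query=query)


prompt = build_user_prompt("What is overfitting?", ["Doc one.", "Doc two."])
```

Dedenting before formatting matters: multi-line document text inserted first would change the common indentation that `dedent` removes.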


In [None]:
# Create RAG pipeline
def retrieve_relevant_docs(query, n_results=5):
    """Retrieve relevant documents from Chroma"""
    try:
        results = collection.query(
            query_texts=[query],
            n_results=n_results
        )
        
        # Extract documents and metadata
        documents = results['documents'][0]
        metadatas = results['metadatas'][0]
        distances = results['distances'][0]
        
        return documents, metadatas, distances
    except Exception as e:
        print(f"Error retrieving documents: {e}")
        return [], [], []

def create_context(documents):
    """Create context string from retrieved documents"""
    context = "\n\n".join(documents)
    return context

def generate_answer(query, context):
    """Generate answer using Gemini with retrieved context"""
    system_prompt = """You are an AI assistant specialized in machine learning, deep learning, and artificial intelligence. 
    Use the provided context to answer questions accurately and comprehensively. If the context doesn't contain enough 
    information, you can supplement with your general knowledge, but always prioritize the provided context.
    
    Provide clear, well-structured answers with examples when appropriate."""
    
    user_prompt = f"""Context:
    {context}
    
    Question: {query}
    
    Please provide a comprehensive answer based on the context above."""
    
    try:
        messages = [
            SystemMessage(content=system_prompt),
            HumanMessage(content=user_prompt)
        ]
        
        response = llm.invoke(messages)
        return response.content
    except Exception as e:
        return f"Error generating answer: {e}"

def rag_pipeline(query, n_results=5):
    """Complete RAG pipeline"""
    print(f"🔍 Processing query: '{query}'")
    
    # Retrieve relevant documents
    documents, metadatas, distances = retrieve_relevant_docs(query, n_results)
    
    if not documents:
        # Return the same 4-tuple shape as the success path so callers can always unpack
        return "Sorry, I couldn't find relevant information for your query.", [], [], []
    
    print(f"📚 Retrieved {len(documents)} relevant documents")
    
    # Create context
    context = create_context(documents)
    
    # Generate answer
    answer = generate_answer(query, context)
    
    return answer, documents, metadatas, distances

print("✅ RAG pipeline created!")


## 🧪 Step 8: System Testing and Validation

### 🎯 Test Questions

We test the RAG system with diverse ML/AI questions covering:

- **Basic Concepts**: Fundamental ML/AI definitions
- **Algorithms**: Specific algorithm explanations
- **Applications**: Real-world use cases
- **Technical Details**: Deep technical concepts
- **Advanced Topics**: Cutting-edge AI research

### 📊 Performance Metrics

During testing, we evaluate:

- **Response Quality**: Accuracy and relevance of answers
- **Retrieval Performance**: Quality of retrieved documents
- **Response Time**: Speed of query processing
- **Context Relevance**: How well retrieved context matches queries
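
Retrieval performance can be quantified if each test question is labeled with the source it should retrieve. The sketch below computes a simple hit rate; the labels and the `hit_rate_at_k` helper are assumptions for illustration, since the notebook itself inspects results manually.

```python
def hit_rate_at_k(retrieved_sources, expected_sources, k=5):
    """Fraction of queries whose expected source appears in the top-k retrieved sources."""
    if len(retrieved_sources) != len(expected_sources):
        raise ValueError("need one expected source per query")
    if not expected_sources:
        return 0.0
    hits = sum(
        1 for got, want in zip(retrieved_sources, expected_sources) if want in got[:k]
    )
    return hits / len(expected_sources)


# Hypothetical results for three queries
retrieved = [["d1", "d2"], ["d3"], ["d4", "d5"]]
expected = ["d2", "d9", "d4"]
score = hit_rate_at_k(retrieved, expected)
```

Here two of the three queries retrieve their expected source, so the hit rate is two thirds.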

### 🔍 Debugging Information

Each test shows:

- **Retrieved Documents**: Number and content of retrieved chunks
- **Distances**: Chroma distance for each retrieved chunk (lower means more similar)
- **Response Content**: Generated answer quality
- **Error Handling**: Any issues encountered


In [None]:
# Test the RAG system
test_questions = [
    "What is machine learning?",
    "How do neural networks work?",
    "What is the difference between supervised and unsupervised learning?",
    "Explain deep learning",
    "What is overfitting in machine learning?"
]

print("🧪 Testing RAG system with sample questions...\n")

for i, question in enumerate(test_questions, 1):
    print(f"❓ Question {i}: {question}")
    print("-" * 50)
    
    try:
        answer, documents, metadatas, distances = rag_pipeline(question)
        print(f"🤖 Answer: {answer}")
        print(f"📊 Retrieved {len(documents)} documents")
        print(f"🎯 Distances (lower = more similar): {[f'{d:.3f}' for d in distances]}")
        print("\n" + "="*80 + "\n")
    except Exception as e:
        print(f"❌ Error: {e}")
        print("\n" + "="*80 + "\n")


## 💾 Step 9: Configuration and Deployment Preparation

### 🔧 Configuration Management

We save system configuration for deployment:

- **Model Settings**: Embedding model, LLM parameters
- **Database Config**: Collection name, storage settings
- **Pipeline Settings**: Retrieval parameters, generation settings
- **Version Info**: Component versions for reproducibility
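
A consumer of the saved settings might load and validate them as sketched below. The key names mirror the `rag_config` dictionary written by this notebook; the `load_config` helper and its validation rule are assumptions for illustration.

```python
import json

REQUIRED_KEYS = {
    "collection_name",
    "embedding_model_name",
    "gemini_model",
    "temperature",
    "max_output_tokens",
    "n_results",
}


def load_config(path):
    """Read a JSON config file and fail early if any expected key is missing."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    return config
```

Failing at load time gives a clearer error than a `KeyError` deep inside the app.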

### 📁 Output Files

The notebook generates:

- **Vector Database**: `./chroma_db/` directory with embeddings
- **Configuration**: `rag_config.json` with system settings
- **Test Results**: Validation of system functionality
- **Documentation**: Setup and usage instructions

### 🚀 Deployment Readiness

The system is now ready for:

- **Streamlit Deployment**: Use `app.py` for web interface
- **Hugging Face Spaces**: Deploy to cloud platform
- **Local Development**: Run in local environment
- **Production Use**: Scale for multiple users


In [None]:
# Save components for Streamlit app
import json

print("💾 Saving components for Streamlit app...")

# Collect the settings the Streamlit app needs to rebuild the pipeline
rag_config = {
    'collection_name': collection_name,
    'embedding_model_name': 'all-MiniLM-L6-v2',
    'gemini_model': 'gemini-2.0-flash-exp',
    'temperature': 0.7,
    'max_output_tokens': 1024,
    'n_results': 5
}

# Save configuration
with open('rag_config.json', 'w') as f:
    json.dump(rag_config, f, indent=2)

print("✅ Configuration saved to rag_config.json")

# Create a simple test to verify everything works
print("\n🎯 Final verification test...")
test_query = "What is artificial intelligence?"
try:
    answer, docs, metas, dists = rag_pipeline(test_query)
    print(f"✅ Test successful! Answer length: {len(answer)} characters")
    print(f"📊 Retrieved {len(docs)} documents")
except Exception as e:
    print(f"❌ Test failed: {e}")

print("\n🎉 RAG system is ready!")
print("📁 Files created:")
print("  - chroma_db/ (vector database)")
print("  - rag_config.json (configuration)")
print("\n🚀 You can now use this system in the Streamlit app!")


# 🤖 RAG Chatbot: ML/AI Knowledge Assistant

This notebook implements a Retrieval-Augmented Generation (RAG) chatbot that provides information about machine learning, deep learning, AI, and related topics using:

- **Generation Model**: Google Gemini 2.0 Flash (experimental, model ID `gemini-2.0-flash-exp`)
- **RAG Framework**: LangChain
- **Vector Database**: Chroma
- **Dataset**: The Pile (EleutherAI/the_pile) from Hugging Face

## 🎯 Project Overview

The chatbot works by:
1. Loading text data from The Pile dataset
2. Preprocessing and embedding the text
3. Storing embeddings in Chroma vector database
4. Retrieving relevant context for user queries
5. Generating answers using Gemini 2.0 Flash (experimental) with retrieved context


## 📦 Installation and Setup

First, let's install all required packages:


In [None]:
# Install required packages
!pip install streamlit==1.28.1
!pip install langchain==0.1.0
!pip install langchain-community==0.0.10
!pip install langchain-google-genai==0.0.6
!pip install chromadb==0.4.18
!pip install datasets==2.14.6
!pip install transformers==4.35.2
!pip install sentence-transformers==2.2.2
!pip install google-generativeai==0.3.2
!pip install tiktoken==0.5.1
!pip install numpy==1.24.3
!pip install pandas==2.0.3
!pip install tqdm==4.66.1


In [2]:
!pip install chromadb

In [1]:
import streamlit, langchain, chromadb
print("Installation successful!")


Installation successful!


## 🔑 API Key Setup

Set up your Google Gemini API key. You can get one from [Google AI Studio](https://makersuite.google.com/app/apikey).


In [2]:
# Set up Google Gemini API key
import os
from google.colab import userdata

# Get API key from Colab secrets
try:
    GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')
    os.environ['GOOGLE_API_KEY'] = GEMINI_API_KEY
    print("✅ Gemini API key loaded successfully!")
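except Exception as e:  # e.g. the secret is missing or notebook access was not granted
    print(f"❌ Could not load the Gemini API key: {e}")
    print("Please add your Gemini API key to Colab secrets:")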
    print("1. Go to the key icon (🔑) in the left sidebar")
    print("2. Add a new secret with key 'GEMINI_API_KEY' and your API key as value")
    print("3. Restart the runtime and run this cell again")

    # Alternative: Set directly (not recommended for production)
    # GEMINI_API_KEY = "your_api_key_here"
    # os.environ['GOOGLE_API_KEY'] = GEMINI_API_KEY


✅ Gemini API key loaded successfully!


## 📚 Step 1: Load Dataset from The Pile

We'll load text data from The Pile dataset using Hugging Face's datasets library. We'll focus on ML/AI related content.


In [3]:
# Import required libraries
import pandas as pd
import numpy as np
from datasets import load_dataset
from tqdm import tqdm
import re
import os

print("📚 Loading The Pile dataset...")

# Load a subset of The Pile dataset
# We'll use a smaller subset for demonstration to avoid memory issues
try:
    # Load a specific subset that contains ML/AI content
    dataset = load_dataset("EleutherAI/the_pile", split="train", streaming=True)

    # Take first 1000 samples for demonstration
    texts = []
    ml_keywords = ['machine learning', 'deep learning', 'neural network', 'artificial intelligence',
                   'algorithm', 'model', 'training', 'data', 'feature', 'classification',
                   'regression', 'clustering', 'optimization', 'gradient', 'tensor']

    print("🔍 Filtering ML/AI related content...")
    count = 0
    for sample in tqdm(dataset, desc="Processing samples"):
        if count >= 1000:  # Limit to 1000 samples for Colab
            break

        text = sample['text']
        # Check if text contains ML/AI keywords
        if any(keyword in text.lower() for keyword in ml_keywords):
            # Clean and preprocess text
            text = re.sub(r'\s+', ' ', text)  # Remove extra whitespace
            text = text.strip()

            # Only keep texts that are reasonable length (not too short or too long)
            if 100 <= len(text) <= 2000:
                texts.append(text)
                count += 1

    print(f"✅ Loaded {len(texts)} ML/AI related text samples")

except Exception as e:
    print(f"❌ Error loading dataset: {e}")
    print("🔄 Using fallback sample data...")

    # Fallback sample data if The Pile is not accessible
    texts = [
        "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data. Deep learning uses neural networks with multiple layers to process complex patterns in data.",
        "Neural networks are computing systems inspired by biological neural networks. They consist of interconnected nodes that process information using a connectionist approach.",
        "Supervised learning uses labeled training data to learn a mapping from inputs to outputs. Common algorithms include linear regression, decision trees, and support vector machines.",
        "Unsupervised learning finds hidden patterns in data without labeled examples. Clustering algorithms like K-means group similar data points together.",
        "Natural language processing combines computational linguistics with machine learning to help computers understand human language. It includes tasks like text classification and sentiment analysis.",
        "Computer vision enables machines to interpret and understand visual information from the world. It uses deep learning models like convolutional neural networks.",
        "Reinforcement learning is a type of machine learning where agents learn to make decisions by interacting with an environment and receiving rewards or penalties.",
        "Feature engineering is the process of selecting and transforming raw data into features that can be used by machine learning algorithms. Good features can significantly improve model performance.",
        "Cross-validation is a technique used to assess how well a machine learning model generalizes to new data. It involves splitting data into training and validation sets multiple times.",
        "Overfitting occurs when a model learns the training data too well and performs poorly on new data. Regularization techniques help prevent overfitting."
    ]
    print(f"✅ Using {len(texts)} sample texts")


📚 Loading The Pile dataset...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


❌ Error loading dataset: No (supported) data files found in EleutherAI/the_pile
🔄 Using fallback sample data...
✅ Using 10 sample texts


## 🧠 Step 2: Initialize Embeddings and Vector Database

We'll use sentence transformers for embeddings and Chroma for vector storage.
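
To make the idea of vector retrieval concrete, here is a minimal sketch that ranks stored vectors against a query by cosine similarity. It is not how Chroma is implemented internally; the 3-dimensional vectors, the document IDs, and the helper names `cosine_similarity` and `top_k` are made up for illustration (the real `all-MiniLM-L6-v2` model produces 384-dimensional vectors).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy "embeddings"; real embeddings come from the model.
toy_store = {
    "neural_networks": [0.9, 0.1, 0.0],
    "clustering": [0.1, 0.9, 0.1],
    "overfitting": [0.2, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # pretend embedding of a question about neural networks
results = top_k(query, toy_store, k=2)
print(results)
```

A vector database does the same ranking, but with an index that avoids comparing the query against every stored vector.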


In [4]:
# Initialize embeddings and vector database
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings

print("🧠 Initializing embeddings model...")

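# Use a lightweight sentence transformer model.
# Note: the collection below is created without an embedding_function, so Chroma
# embeds documents with its own default model; this object is not passed to Chroma.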
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("✅ Embedding model loaded!")

print("🗄️ Setting up Chroma vector database...")

# Create Chroma client with persistent storage.
# In chromadb >= 0.4, chromadb.Client() is in-memory; PersistentClient writes to disk.
chroma_client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(anonymized_telemetry=False)
)

# Create or get collection
collection_name = "ml_ai_knowledge"
try:
    collection = chroma_client.get_collection(collection_name)
    print(f"✅ Found existing collection: {collection_name}")
except Exception:  # Collection does not exist yet; avoid a bare except that also traps KeyboardInterrupt
    collection = chroma_client.create_collection(
        name=collection_name,
        metadata={"description": "ML/AI knowledge base from The Pile dataset"}
    )
    print(f"✅ Created new collection: {collection_name}")

print("🎯 Vector database ready!")


🧠 Initializing embeddings model...
✅ Embedding model loaded!
🗄️ Setting up Chroma vector database...
✅ Created new collection: ml_ai_knowledge
🎯 Vector database ready!


## 📝 Step 3: Process and Embed Text Data

We'll chunk the text data and create embeddings for storage in Chroma.
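
As a small illustration of overlapping chunks, the sketch below uses a window of 4 words with an overlap of 1 so the shared words are easy to see. The helper name `sliding_chunks` and the tiny sizes are assumptions chosen for readability, not the values used in this notebook.

```python
def sliding_chunks(words, chunk_size, overlap):
    """Split a list of words into windows that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    return [words[i:i + chunk_size] for i in range(0, len(words), step)]

words = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9".split()
chunks = sliding_chunks(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

Each window starts with the last word of the previous one. The final window contains only `w9`, which already appears in the window before it; a minimum-length filter like the one in `chunk_text` is one way to drop such short tails.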


In [5]:
# Process and embed text data
import uuid
from tqdm import tqdm

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks; chunk_size and overlap are counted in words."""
    words = text.split()
    chunks = []

    # Advance by chunk_size - overlap words so consecutive chunks share `overlap` words
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk.strip()) > 50:  # Keep only chunks longer than 50 characters
            chunks.append(chunk)

    return chunks

print("📝 Processing and chunking text data...")

# Check if collection already has data
existing_count = collection.count()
print(f"📊 Current documents in collection: {existing_count}")

if existing_count == 0:
    print("🔄 Adding new documents to collection...")

    all_chunks = []
    chunk_ids = []
    chunk_metadatas = []

    for i, text in enumerate(tqdm(texts, desc="Processing texts")):
        chunks = chunk_text(text)

        for j, chunk in enumerate(chunks):
            chunk_id = f"doc_{i}_chunk_{j}"
            metadata = {
                "source": f"the_pile_doc_{i}",
                "chunk_index": j,
                "total_chunks": len(chunks),
                "text_length": len(chunk)
            }

            all_chunks.append(chunk)
            chunk_ids.append(chunk_id)
            chunk_metadatas.append(metadata)

    print(f"📊 Created {len(all_chunks)} text chunks")

    # Add documents to Chroma in batches to avoid memory issues
    batch_size = 100
    for i in tqdm(range(0, len(all_chunks), batch_size), desc="Adding to Chroma"):
        batch_chunks = all_chunks[i:i + batch_size]
        batch_ids = chunk_ids[i:i + batch_size]
        batch_metadatas = chunk_metadatas[i:i + batch_size]

        collection.add(
            documents=batch_chunks,
            ids=batch_ids,
            metadatas=batch_metadatas
        )

    print("✅ All documents added to Chroma!")
else:
    print("✅ Collection already contains data, skipping addition")

# Verify the collection
final_count = collection.count()
print(f"📊 Final document count: {final_count}")


📝 Processing and chunking text data...
📊 Current documents in collection: 0
🔄 Adding new documents to collection...


Processing texts: 100%|██████████| 10/10 [00:00<00:00, 82891.38it/s]


📊 Created 10 text chunks


Adding to Chroma: 100%|██████████| 1/1 [00:01<00:00,  1.07s/it]

✅ All documents added to Chroma!
📊 Final document count: 10





## 🤖 Step 4: Initialize Gemini Model

Set up the Google Gemini 2.5 Flash model for text generation.
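
Before creating the model, it can help to confirm that an API key is actually available. The sketch below assumes the key is exposed as the `GOOGLE_API_KEY` environment variable, which `ChatGoogleGenerativeAI` typically reads when no key is passed explicitly; the helper name `require_api_key` is hypothetical, and how the key reaches the environment (for example from Colab secrets) is left out.

```python
import os

def require_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise a readable error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it before creating the model.")
    return key

# Demonstration with plain dicts instead of the real environment.
print(require_api_key({"GOOGLE_API_KEY": "dummy-key"}))
try:
    require_api_key({})
except RuntimeError as err:
    print(err)
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the first model call.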


In [7]:
!pip install langchain_google_genai

Collecting langchain_google_genai
  Downloading langchain_google_genai-3.0.0-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-core<2.0.0,>=1.0.0 (from langchain_google_genai)
  Downloading langchain_core-1.0.1-py3-none-any.whl.metadata (3.5 kB)
Collecting google-ai-generativelanguage<1.0.0,>=0.7.0 (from langchain_google_genai)
  Downloading google_ai_generativelanguage-0.9.0-py3-none-any.whl.metadata (10 kB)
Collecting filetype<2.0.0,>=1.2.0 (from langchain_google_genai)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting langsmith<1.0.0,>=0.3.45 (from langchain-core<2.0.0,>=1.0.0->langchain_google_genai)
  Downloading langsmith-0.4.38-py3-none-any.whl.metadata (14 kB)
Collecting protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<7.0.0,>=3.20.2 (from google-ai-generativelanguage<1.0.0,>=0.7.0->langchain_google_genai)
  Downloading protobuf-5.29.5-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Downloading langchain_google_genai-3.0.

In [12]:
!pip install langchain==0.1.0
!pip install langchain_google_genai==2.0.0

Collecting langchain-core<0.2,>=0.1.7 (from langchain==0.1.0)
  Using cached langchain_core-0.1.53-py3-none-any.whl.metadata (5.9 kB)
Collecting langsmith<0.1.0,>=0.0.77 (from langchain==0.1.0)
  Using cached langsmith-0.0.92-py3-none-any.whl.metadata (9.9 kB)
INFO: pip is looking at multiple versions of langchain-core to determine which version is compatible with other requirements. This could take a while.
Collecting langchain-core<0.2,>=0.1.7 (from langchain==0.1.0)
  Using cached langchain_core-0.1.52-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.51-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.50-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.49-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.48-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.47-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.46-py3-none-any.whl.metadata (5.9 kB)
INFO: pip is still loo

Collecting langchain_google_genai==2.0.0
  Downloading langchain_google_genai-2.0.0-py3-none-any.whl.metadata (3.9 kB)
Collecting google-generativeai<0.8.0,>=0.7.0 (from langchain_google_genai==2.0.0)
  Downloading google_generativeai-0.7.2-py3-none-any.whl.metadata (4.0 kB)
Collecting langchain-core<0.4,>=0.3.0 (from langchain_google_genai==2.0.0)
  Downloading langchain_core-0.3.79-py3-none-any.whl.metadata (3.2 kB)
Collecting google-ai-generativelanguage==0.6.6 (from google-generativeai<0.8.0,>=0.7.0->langchain_google_genai==2.0.0)
  Downloading google_ai_generativelanguage-0.6.6-py3-none-any.whl.metadata (5.6 kB)
ERROR: Operation cancelled by user
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/pip/_internal/cli/base_command.py", line 179, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pip/_internal/cli/req_command.py", line 67, in wrapper
    retur

In [1]:
!pip uninstall -y langchain langchain-core langsmith langchain_google_genai

Found existing installation: langchain 0.1.0
Uninstalling langchain-0.1.0:
  Successfully uninstalled langchain-0.1.0
Found existing installation: langchain-core 0.1.23
Uninstalling langchain-core-0.1.23:
  Successfully uninstalled langchain-core-0.1.23
Found existing installation: langsmith 0.0.87
Uninstalling langsmith-0.0.87:
  Successfully uninstalled langsmith-0.0.87

In [2]:
!pip install langchain==0.1.0
!pip install langchain_google_genai==1.0.0


Collecting langchain==0.1.0
  Using cached langchain-0.1.0-py3-none-any.whl.metadata (13 kB)
Collecting langchain-core<0.2,>=0.1.7 (from langchain==0.1.0)
  Using cached langchain_core-0.1.53-py3-none-any.whl.metadata (5.9 kB)
Collecting langsmith<0.1.0,>=0.0.77 (from langchain==0.1.0)
  Using cached langsmith-0.0.92-py3-none-any.whl.metadata (9.9 kB)
INFO: pip is looking at multiple versions of langchain-core to determine which version is compatible with other requirements. This could take a while.
Collecting langchain-core<0.2,>=0.1.7 (from langchain==0.1.0)
  Using cached langchain_core-0.1.52-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.51-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.50-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.49-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.48-py3-none-any.whl.metadata (5.9 kB)
  Using cached langchain_core-0.1.47-py3-none-any.whl.metadata (5.9 kB)
 

In [3]:
!pip uninstall -y langchain_google_genai
!pip install langchain_google_genai==1.0.1

Collecting langchain_google_genai==1.0.1
  Downloading langchain_google_genai-1.0.1-py3-none-any.whl.metadata (3.8 kB)
Collecting google-generativeai<0.5.0,>=0.4.1 (from langchain_google_genai==1.0.1)
  Downloading google_generativeai-0.4.1-py3-none-any.whl.metadata (6.2 kB)
Collecting google-ai-generativelanguage==0.4.0 (from google-generativeai<0.5.0,>=0.4.1->langchain_google_genai==1.0.1)
  Downloading google_ai_generativelanguage-0.4.0-py3-none-any.whl.metadata (5.1 kB)
Collecting protobuf (from google-generativeai<0.5.0,>=0.4.1->langchain_google_genai==1.0.1)
  Using cached protobuf-4.25.8-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)
INFO: pip is looking at multiple versions of grpcio-status to determine which version is compatible with other requirements. This could take a while.
Collecting grpcio-status<2.0.0,>=1.33.2 (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-ai-ge

In [6]:
# Initialize Gemini model
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import HumanMessage, SystemMessage

print("🤖 Initializing Gemini 2.5 Flash model...")

# Initialize the Gemini model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-exp",  # Experimental Gemini 2.0 Flash model ID (not 2.5 Flash, despite the messages in this step)
    temperature=0.7,
    max_output_tokens=1024,
    convert_system_message_to_human=True
)

print("✅ Gemini model initialized!")

# Test the model
try:
    test_response = llm.invoke("Hello! Can you tell me about machine learning?")
    print("🧪 Test response:", test_response.content[:100] + "...")
    print("✅ Gemini model is working!")
except Exception as e:
    print(f"❌ Error testing Gemini model: {e}")
    print("Please check your API key and try again.")


🤖 Initializing Gemini 2.5 Flash model...
✅ Gemini model initialized!
🧪 Test response: Okay, let's dive into the fascinating world of machine learning!

**What is Machine Learning (ML)?**...
✅ Gemini model is working!


## 🔍 Step 5: Create RAG Pipeline

Now we'll create the complete RAG pipeline that retrieves relevant context and generates answers.
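
The retrieve, build-context, generate flow can be sketched without Chroma or Gemini. Everything here is a stand-in: `keyword_retrieve` ranks by shared words rather than by embeddings, `echo_generate` is a fake model, and `build_prompt` and `answer_with_context` are hypothetical names. The point is the control flow, including returning the same shape whether or not anything was retrieved.

```python
def build_prompt(query, documents):
    """Join retrieved passages and place them ahead of the question."""
    context = "\n\n".join(documents)
    return f"Context:\n{context}\n\nQuestion: {query}"

def answer_with_context(query, retrieve, generate, n_results=2):
    """Run retrieve -> build prompt -> generate; always return (answer, documents)."""
    documents = retrieve(query, n_results)
    if not documents:
        return "No relevant context found.", []
    return generate(build_prompt(query, documents)), documents

# Hypothetical corpus so the sketch runs on its own.
corpus = [
    "Overfitting occurs when a model learns the training data too well.",
    "Clustering algorithms like K-means group similar data points together.",
]

def keyword_retrieve(query, n_results):
    """Rank passages by how many query words they share (not a vector search)."""
    terms = set(query.lower().replace("?", "").split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0][:n_results]

def echo_generate(prompt):
    """Fake model: reports how much prompt text it received."""
    return f"(stub answer based on {len(prompt)} prompt characters)"

answer, docs = answer_with_context("What is overfitting?", keyword_retrieve, echo_generate)
print(answer)
print(docs)
```

Passing the retriever and generator in as functions keeps the pipeline easy to test, since either side can be swapped for a stub.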


In [7]:
# Create RAG pipeline
def retrieve_relevant_docs(query, n_results=5):
    """Retrieve relevant documents from Chroma"""
    try:
        results = collection.query(
            query_texts=[query],
            n_results=n_results
        )

        # Extract documents and metadata
        documents = results['documents'][0]
        metadatas = results['metadatas'][0]
        distances = results['distances'][0]

        return documents, metadatas, distances
    except Exception as e:
        print(f"Error retrieving documents: {e}")
        return [], [], []

def create_context(documents):
    """Create context string from retrieved documents"""
    context = "\n\n".join(documents)
    return context

def generate_answer(query, context):
    """Generate answer using Gemini with retrieved context"""
    system_prompt = """You are an AI assistant specialized in machine learning, deep learning, and artificial intelligence.
    Use the provided context to answer questions accurately and comprehensively. If the context doesn't contain enough
    information, you can supplement with your general knowledge, but always prioritize the provided context.

    Provide clear, well-structured answers with examples when appropriate."""

    user_prompt = f"""Context:
    {context}

    Question: {query}

    Please provide a comprehensive answer based on the context above."""

    try:
        messages = [
            SystemMessage(content=system_prompt),
            HumanMessage(content=user_prompt)
        ]

        response = llm.invoke(messages)
        return response.content
    except Exception as e:
        return f"Error generating answer: {e}"

def rag_pipeline(query, n_results=5):
    """Complete RAG pipeline"""
    print(f"🔍 Processing query: '{query}'")

    # Retrieve relevant documents
    documents, metadatas, distances = retrieve_relevant_docs(query, n_results)

    if not documents:
        # Return the same 4-tuple shape as the success path so callers can always unpack
        return "Sorry, I couldn't find relevant information for your query.", [], [], []

    print(f"📚 Retrieved {len(documents)} relevant documents")

    # Create context
    context = create_context(documents)

    # Generate answer
    answer = generate_answer(query, context)

    return answer, documents, metadatas, distances

print("✅ RAG pipeline created!")


✅ RAG pipeline created!


## 🧪 Step 6: Test the RAG System

Let's test our RAG chatbot with some sample questions about ML/AI topics.
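
The numbers printed under "Similarity scores" in the test loop are the distances returned by the Chroma query, so smaller values mean closer matches. As a rough sketch, and assuming the collection uses squared L2 distance on unit-length embeddings (which we believe is Chroma's default setup, but have not verified for this notebook), a distance can be converted to a cosine similarity. The helper names below are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_from_squared_l2(distance):
    """For unit vectors, ||a - b||^2 = 2 - 2*cos, so cos = 1 - distance / 2."""
    return 1.0 - distance / 2.0

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
d = squared_l2(a, b)
cos_direct = sum(x * y for x, y in zip(a, b))

# The converted value should match the directly computed cosine.
print(round(d, 3), round(cosine_from_squared_l2(d), 3), round(cos_direct, 3))
```

Under the same assumptions, a reported distance of 0.528 would correspond to a cosine similarity of about 0.736.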


In [8]:
# Test the RAG system
test_questions = [
    "What is machine learning?",
    "How do neural networks work?",
    "What is the difference between supervised and unsupervised learning?",
    "Explain deep learning",
    "What is overfitting in machine learning?"
]

print("🧪 Testing RAG system with sample questions...\n")

for i, question in enumerate(test_questions, 1):
    print(f"❓ Question {i}: {question}")
    print("-" * 50)

    try:
        answer, documents, metadatas, distances = rag_pipeline(question)
        print(f"🤖 Answer: {answer}")
        print(f"📊 Retrieved {len(documents)} documents")
        # Chroma returns distances here, so lower values indicate closer matches
        print(f"🎯 Similarity scores: {[f'{d:.3f}' for d in distances]}")
        print("\n" + "="*80 + "\n")
    except Exception as e:
        print(f"❌ Error: {e}")
        print("\n" + "="*80 + "\n")


🧪 Testing RAG system with sample questions...

❓ Question 1: What is machine learning?
--------------------------------------------------
🔍 Processing query: 'What is machine learning?'
📚 Retrieved 5 relevant documents
🤖 Answer: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data. It enables systems to improve their performance on a specific task over time without being explicitly programmed.

📊 Retrieved 5 documents
🎯 Similarity scores: ['0.528', '0.741', '0.901', '0.926', '0.984']


❓ Question 2: How do neural networks work?
--------------------------------------------------
🔍 Processing query: 'How do neural networks work?'
📚 Retrieved 5 relevant documents
🤖 Answer: Neural networks are computing systems inspired by biological neural networks, employing a connectionist approach to process information. They consist of interconnected nodes that work together to learn patterns from data. The context describes them as the basis for 

## 💾 Step 7: Save Components for Streamlit App

Save the pipeline configuration so the Streamlit app can recreate the same setup.
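
On the app side, the saved file can be read back and checked before use. The sketch below is one possible loader, not part of the notebook: the function name `load_rag_config` and the required-key check are assumptions, and the demo writes to a temporary directory so it does not touch the real `rag_config.json`.

```python
import json
import os
import tempfile

REQUIRED_KEYS = {
    "collection_name", "embedding_model_name", "gemini_model",
    "temperature", "max_output_tokens", "n_results",
}

def load_rag_config(path):
    """Read the JSON config and fail early if a required key is missing."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"Missing config keys: {sorted(missing)}")
    return config

# Round-trip demo using the same values this notebook saves.
sample = {
    "collection_name": "ml_ai_knowledge",
    "embedding_model_name": "all-MiniLM-L6-v2",
    "gemini_model": "gemini-2.0-flash-exp",
    "temperature": 0.7,
    "max_output_tokens": 1024,
    "n_results": 5,
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rag_config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=2)
    loaded = load_rag_config(path)

print(loaded["gemini_model"], loaded["n_results"])
```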


In [9]:
# Save components for Streamlit app
import json

print("💾 Saving components for Streamlit app...")

# Collect the settings needed to rebuild the RAG pipeline (the functions themselves are not saved)
rag_config = {
    'collection_name': collection_name,
    'embedding_model_name': 'all-MiniLM-L6-v2',
    'gemini_model': 'gemini-2.0-flash-exp',
    'temperature': 0.7,
    'max_output_tokens': 1024,
    'n_results': 5
}

# Save configuration
with open('rag_config.json', 'w') as f:
    json.dump(rag_config, f, indent=2)

print("✅ Configuration saved to rag_config.json")

# Create a simple test to verify everything works
print("\n🎯 Final verification test...")
test_query = "What is artificial intelligence?"
try:
    answer, docs, metas, dists = rag_pipeline(test_query)
    print(f"✅ Test successful! Answer length: {len(answer)} characters")
    print(f"📊 Retrieved {len(docs)} documents")
except Exception as e:
    print(f"❌ Test failed: {e}")

print("\n🎉 RAG system is ready!")
print("📁 Files created:")
print("  - chroma_db/ (vector database)")
print("  - rag_config.json (configuration)")
print("\n🚀 You can now use this system in the Streamlit app!")


💾 Saving components for Streamlit app...
✅ Configuration saved to rag_config.json

🎯 Final verification test...
🔍 Processing query: 'What is artificial intelligence?'
📚 Retrieved 5 relevant documents
✅ Test successful! Answer length: 255 characters
📊 Retrieved 5 documents

🎉 RAG system is ready!
📁 Files created:
  - chroma_db/ (vector database)
  - rag_config.json (configuration)

🚀 You can now use this system in the Streamlit app!


In [25]:
!pip install streamlit streamlit-chat

Collecting streamlit-chat
  Downloading streamlit_chat-0.1.1-py3-none-any.whl.metadata (4.2 kB)
Downloading streamlit_chat-0.1.1-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 16.7 MB/s eta 0:00:00
Installing collected packages: streamlit-chat
Successfully installed streamlit-chat-0.1.1


In [51]:
import gradio as gr

# Chat history
chat_history = []

# Chat logic
def chat_with_rag(user_input):
    global chat_history
    if not user_input.strip():
        return chat_history

    # Call your RAG pipeline
    answer, _, _, _ = rag_pipeline(user_input)

    chat_history.append({"role": "user", "message": user_input})
    chat_history.append({"role": "assistant", "message": answer})
    return chat_history

def clear_chat():
    global chat_history
    chat_history = []
    return chat_history

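# HTML message template with modern styling
from html import escape

def format_message(message):
    role = message["role"]
    # Escape the text so characters like < and & in a message cannot break the markup
    text = escape(message["message"])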

    if role == "user":
        bubble_color = "#4CAF50"  # Green
        text_color = "#FFFFFF"
        justify = "flex-end"
        avatar = "https://cdn-icons-png.flaticon.com/512/194/194938.png"
        name = "You"
    else:
        bubble_color = "#1E1E1E"  # Dark grey
        text_color = "#F5F5F5"
        justify = "flex-start"
        avatar = "https://cdn-icons-png.flaticon.com/512/1995/1995574.png"
        name = "AI Assistant"

    return f"""
    <div style="display:flex; justify-content:{justify}; margin:8px 0; align-items:flex-start;">
        <img src="{avatar}" style="width:38px;height:38px;border-radius:50%; margin-right:10px;" />
        <div style="max-width:70%; background-color:{bubble_color};
                    color:{text_color}; padding:14px 18px;
                    border-radius:20px; font-size:15px; line-height:1.5;
                    box-shadow: 0 4px 12px rgba(0,0,0,0.2); font-family: 'Inter', sans-serif;
                    transition: transform 0.2s;">
            <div style="font-weight:600; margin-bottom:4px; opacity:0.85;">{name}</div>
            <div>{text}</div>
        </div>
    </div>
    """

custom_theme = gr.themes.Soft(
    primary_hue="green"
)

with gr.Blocks(theme=custom_theme, css="""
    #chatbox {
        height: 450px;
        overflow-y: auto;
        padding: 12px;
        background-color: #F7F7F7;
        border-radius: 15px;
        border: 1px solid #DDD;
    }
    #user_input textarea {
        font-size: 16px;
    }
    .gr-button {
        font-weight: 600;
    }
""") as demo:



    gr.Markdown(
    """
    <div style="display:flex; align-items:center; gap:12px;">
        <img src="https://cdn-icons-png.flaticon.com/512/1995/1995574.png"
             style="width:40px; height:40px; border-radius:50%;" />
        <div>
            <h1 style="margin:0;">AI Chat Assistant</h1>
            <span style="color:#FFFFFF; text-shadow: 1px 1px 2px #000;">
                Your AI assistant for Machine Learning, Deep Learning, and AI — Explore insights, learn concepts, and get expert guidance.
            </span>
        </div>
    </div>
    """,
    elem_id="title"
    )



    chat_box = gr.HTML(elem_id="chatbox")
    user_input = gr.Textbox(
        placeholder="Type your question...",
        label="Your message",
        lines=2
    )
    send_btn = gr.Button("Send", variant="primary")
    clear_btn = gr.Button("Clear Chat", variant="secondary")

    def update_display(user_message):
        chat_with_rag(user_message)
        html = "".join([format_message(m) for m in chat_history])
        return html, ""

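    def clear_display():
        # Reset the stored history and return one value for the single output component
        clear_chat()
        return ""

    send_btn.click(update_display, inputs=user_input, outputs=[chat_box, user_input])
    clear_btn.click(clear_display, inputs=None, outputs=chat_box)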

demo.launch()


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://97ee63e42da153d3f5.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


