# RAG Module - Google Colab

This notebook runs **ONLY the RAG Module** for legal question answering.

## 📋 What this does:
- Connects to your existing Pinecone vector database
- Provides a web interface for legal questions
- Generates multiple response options
- Supports conversation history
- Handles simple greetings and complex legal queries

## 🔑 Requirements:
- Vector Database already set up (use Vector_DB_Colab.ipynb first)
- Same Pinecone credentials as Vector DB
- (Optional) OpenAI API key for enhanced responses

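The requirements above can be checked before any cell touches Pinecone. The sketch below is an illustration, not part of the RAG module: `missing_settings` is a hypothetical helper, and it assumes the credentials are read from the same environment variable names that Step 4 sets, with placeholder values ending in `-here`.

```python
import os

# Assumption: placeholder values in this notebook end with "-here",
# e.g. 'your-pinecone-api-key-here'.
PLACEHOLDER_SUFFIX = "-here"
REQUIRED_VARS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "INDEX_NAME")


def missing_settings(env=os.environ, required=REQUIRED_VARS):
    """Return the required names that are unset, empty, or still a placeholder."""
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.endswith(PLACEHOLDER_SUFFIX):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_settings()
    if problems:
        print("Missing or placeholder settings:", ", ".join(problems))
    else:
        print("All required settings are present.")
```

Running this after Step 4 would flag any value that was left unchanged; `OPENAI_API_KEY` is deliberately not in the required list because it is optional.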

In [None]:
# 🔗 Step 1: Mount Google Drive and Setup
from google.colab import drive
import os
import sys
import zipfile

print("📁 Mounting Google Drive...")
drive.mount('/content/drive')
print("✅ Google Drive mounted successfully!")

In [None]:
# 📦 Step 2: Install RAG Module Specific Packages
print("📦 Installing RAG Module packages...")

!pip install -q streamlit
!pip install -q "pinecone-client<3"  # Step 5 uses the v2 pinecone.init() API, which was removed in v3
!pip install -q sentence-transformers
!pip install -q transformers
!pip install -q torch
!pip install -q numpy pandas scikit-learn
!pip install -q nltk spacy
!pip install -q fastapi uvicorn
!pip install -q python-multipart pydantic
!pip install -q python-dotenv
!pip install -q plotly wordcloud matplotlib seaborn
!pip install -q pyngrok
!pip install -q openai  # Optional for enhanced responses

print("✅ RAG Module packages installed!")

In [None]:
# 📂 Step 3: Extract RAG Module
# 🔧 CHANGE THIS PATH to your uploaded zip file location
ZIP_PATH = '/content/drive/MyDrive/Vector_DB_module and RGA_Module.zip'

print(f"📂 Extracting RAG module from: {ZIP_PATH}")

try:
    # Extract the project
    with zipfile.ZipFile(ZIP_PATH, 'r') as zip_ref:
        zip_ref.extractall('/content/')
    
    # Change to RAG Module directory
    rag_dir = '/content/Vector_DB_module and RGA_Module/RAG_Module'
    os.chdir(rag_dir)
    
    # Add to Python path
    sys.path.append('/content/Vector_DB_module and RGA_Module')
    sys.path.append(rag_dir)
    
    print("✅ RAG module extracted successfully!")
    print("📁 RAG module contents:")
    !ls -la
    
except FileNotFoundError:
    print("❌ Zip file not found! Please check the ZIP_PATH variable.")
except Exception as e:
    print(f"❌ Error extracting project: {e}")

In [None]:
# 🔑 Step 4: Setup RAG Module Environment
print("🔑 Setting up RAG Module environment...")

# 🔧 CHANGE THESE VALUES - Use the SAME credentials as Vector DB Module
PINECONE_API_KEY = 'your-pinecone-api-key-here'
PINECONE_ENVIRONMENT = 'your-pinecone-environment-here'
INDEX_NAME = 'sri-lankan-legal-docs'  # Same as Vector DB
OPENAI_API_KEY = 'your-openai-api-key-here'  # Optional

# Set environment variables
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY
os.environ['PINECONE_ENVIRONMENT'] = PINECONE_ENVIRONMENT
os.environ['INDEX_NAME'] = INDEX_NAME
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

# Create .env file
env_content = f"""PINECONE_API_KEY={PINECONE_API_KEY}
PINECONE_ENVIRONMENT={PINECONE_ENVIRONMENT}
INDEX_NAME={INDEX_NAME}
OPENAI_API_KEY={OPENAI_API_KEY}
"""

with open('.env', 'w') as f:
    f.write(env_content)

print("✅ RAG Module environment configured!")
print(f"📊 Connecting to Index: {INDEX_NAME}")
print("⚠️  Make sure these are the SAME credentials used in Vector DB Module!")

In [None]:
# 🔍 Step 5: Test Vector Database Connection
print("🔍 Testing connection to Vector Database...")

try:
    import pinecone
    
    # Initialize Pinecone
    pinecone.init(
        api_key=os.environ['PINECONE_API_KEY'],
        environment=os.environ['PINECONE_ENVIRONMENT']
    )
    
    # Check if index exists
    index_name = os.environ['INDEX_NAME']
    
    if index_name in pinecone.list_indexes():
        index = pinecone.Index(index_name)
        stats = index.describe_index_stats()
        
        print("✅ Vector Database Connection Successful!")
        print(f"📊 Index: {index_name}")
        print(f"📈 Total Vectors: {stats['total_vector_count']}")
        print(f"📏 Dimension: {stats['dimension']}")
        
        if stats['total_vector_count'] > 0:
            print("🎉 Vector database has legal documents ready!")
        else:
            print("⚠️  Vector database is empty. Run Vector DB Module first!")
            
    else:
        print(f"❌ Index '{index_name}' not found!")
        print("Available indexes:", pinecone.list_indexes())
        print("💡 Please run the Vector DB Module first to create the index.")
        
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("💡 Check your Pinecone credentials and make sure Vector DB Module ran successfully.")

In [None]:
# 🧪 Step 6: Test RAG System Components
print("🧪 Testing RAG system components...")

try:
    # Test imports
    from vector_db_connector import VectorDBConnector
    from response_generator import ResponseGenerator
    from dialog_manager import DialogManager
    from rag_system import RAGSystem
    
    print("✅ All RAG components imported successfully!")
    
    # Initialize RAG system
    print("🔧 Initializing RAG system...")
    rag_system = RAGSystem()
    
    print("✅ RAG system initialized successfully!")
    
    # Test with a simple query
    print("🧪 Testing with sample query...")
    test_result = rag_system.process_query("What is property law?")
    
    if test_result['success']:
        print("✅ RAG system test successful!")
        print(f"📊 Retrieved {test_result['retrieved_docs_count']} documents")
        print(f"⏱️  Processing time: {test_result['processing_time']}s")
    else:
        print("⚠️  RAG system test had issues but components are working")
        
except Exception as e:
    print(f"❌ Component test failed: {e}")
    print("💡 Some components may need adjustment for Colab environment")
    import traceback
    traceback.print_exc()

In [None]:
# 🌐 Step 7: Launch RAG Web Interface
from pyngrok import ngrok
import subprocess
import threading
import time

print("🌐 Launching Sri Lankan Legal AI Assistant Web Interface...")

# Kill any existing processes
!pkill -f streamlit
!pkill -f "python.*app.py"

# Function to run the RAG web app
def run_rag_app():
    try:
        # Run from the RAG Module directory without changing the notebook's own
        # working directory (os.chdir inside a thread affects the whole process)
        subprocess.run(
            ['python', 'app.py', 'web'],
            cwd='/content/Vector_DB_module and RGA_Module/RAG_Module',
            check=True,
        )
    except Exception as e:
        print(f"❌ Error starting RAG app: {e}")

# Start the app in a separate thread
app_thread = threading.Thread(target=run_rag_app)
app_thread.daemon = True
app_thread.start()

print("⏳ Waiting for RAG application to start...")
time.sleep(25)  # Wait for the app to fully start

try:
    # Create ngrok tunnel for public access and keep only the URL string
    public_url = ngrok.connect(8501).public_url
    
    print("\n" + "="*70)
    print("🎉 Sri Lankan Legal AI Assistant is READY!")
    print("="*70)
    print(f"🌐 Access your Legal AI Assistant at:")
    print(f"   {public_url}")
    print("="*70)
    print("\n🚀 Features Available:")
    print("   ✅ Legal Question Answering")
    print("   ✅ Multiple Response Options (Professional, Detailed, Concise)")
    print("   ✅ Example Questions (Working!)")
    print("   ✅ Advanced Search Filters")
    print("   ✅ Beautiful Conversation History")
    print("   ✅ Simple Greeting Responses")
    print("   ✅ Document Source Citations")
    print("\n💡 Try These Examples:")
    print("   • What are the property ownership rights in Sri Lanka?")
    print("   • How to file for divorce in Sri Lankan courts?")
    print("   • Employment termination procedures")
    print("   • Commercial contract dispute resolution")
    print("\n⚠️  Keep this Colab tab open to maintain the connection!")
    print("📱 Share the URL with others to let them use your Legal AI!")
    
except Exception as e:
    print(f"❌ Error creating public URL: {e}")
    print("💡 The app might still be starting. Try running this cell again.")
    print("🔧 Alternative: Use CLI mode in the next cell.")

In [None]:
# 🔧 Step 8: Alternative - CLI Mode
print("🔧 Alternative: RAG Module CLI Mode")
print("If web interface doesn't work, use CLI mode:")
print()

# Uncomment the lines below to run in CLI mode:

# print("🤖 Starting CLI mode...")
# os.chdir('/content/Vector_DB_module and RGA_Module/RAG_Module')
# !python app.py cli

print("💡 To use CLI mode:")
print("   1. Uncomment the lines above")
print("   2. Run this cell")
print("   3. Ask legal questions directly in the terminal")
print("   4. Type 'quit' to exit")

print("\n🔄 Or try restarting the web interface:")
print("   1. Go back to Step 7")
print("   2. Run that cell again")
print("   3. Wait a bit longer for startup")

In [None]:
# 📊 Step 9: Monitor RAG System Performance
print("📊 RAG System Performance Monitoring")

# Check system resources
print("\n💻 System Resources:")
!nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits

# Check memory usage
import psutil
memory = psutil.virtual_memory()
print(f"💾 RAM Usage: {memory.percent}% ({memory.used//1024//1024}MB / {memory.total//1024//1024}MB)")

# Check if processes are running
print("\n🔍 Active Processes:")
!ps aux | grep -E "(streamlit|python.*app.py)" | grep -v grep

# Show recent logs (if available)
print("\n📝 Recent Activity:")
if os.path.exists('app.log'):
    !tail -10 app.log
else:
    print("No log file found yet")

print("\n✅ Monitoring complete!")
print("💡 If you see issues, try restarting the web interface.")

In [None]:
# 💾 Step 10: Save RAG Session Results
from datetime import datetime
import shutil

print("💾 Saving RAG Module session results...")

# Create results directory in Google Drive
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
results_dir = f'/content/drive/MyDrive/RAG_Module_Results_{timestamp}'
os.makedirs(results_dir, exist_ok=True)

# Save configuration and logs
files_to_save = [
    '.env',
    'rag_config.py',
    'config.py',
    'app.log',
    'rag_system.log',
    'conversation_history.json'
]

for file_name in files_to_save:
    if os.path.exists(file_name):
        shutil.copy2(file_name, results_dir)
        print(f"✅ Saved {file_name}")

# Save session summary
summary = f"""RAG Module Session Summary
============================
Timestamp: {datetime.now()}
Index Name: {os.environ.get('INDEX_NAME', 'N/A')}
Pinecone Environment: {os.environ.get('PINECONE_ENVIRONMENT', 'N/A')}
OpenAI Enabled: {'Yes' if os.environ.get('OPENAI_API_KEY') not in (None, '', 'your-openai-api-key-here') else 'No (Fallback mode)'}

RAG Module running in Google Colab
Web interface accessible via ngrok tunnel

Features Active:
- Legal Question Answering ✅
- Multiple Response Options ✅
- Example Questions ✅
- Advanced Search Filters ✅
- Conversation History ✅
- Simple Greeting Responses ✅
"""

with open(f'{results_dir}/session_summary.txt', 'w') as f:
    f.write(summary)

print(f"\n💾 Results saved to: {results_dir}")
print("✅ RAG Module session backup completed!")
print("\n🎉 Your Sri Lankan Legal AI Assistant is fully operational!")
print("🌐 Users can now ask legal questions and get AI-powered responses!")

## 🎉 RAG Module Successfully Deployed!

### ✅ What's Now Available:
- **Web Interface**: Accessible via ngrok public URL
- **AI Legal Assistant**: Answers questions about Sri Lankan law
- **Multiple Responses**: Professional, Detailed, and Concise options
- **Example Questions**: One-click legal question templates
- **Advanced Search**: Filter by document type, legal domain, etc.
- **Conversation History**: Beautiful, clickable chat history
- **Smart Greetings**: Handles simple interactions without vector search

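The greeting shortcut in the list above can be pictured as a small router that answers greetings directly and only calls the retriever for real questions. This is a minimal sketch under assumptions: `GREETINGS`, `is_simple_greeting`, and `route_query` are hypothetical names, and the actual RAG module may detect greetings differently.

```python
import re

# Hypothetical greeting vocabulary; the real module may use a different list.
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you",
             "good morning", "good evening"}


def is_simple_greeting(text):
    """True when the whole message is a short greeting, not a legal question."""
    normalized = re.sub(r"[^a-z\s]", "", text.lower()).strip()
    normalized = re.sub(r"\s+", " ", normalized)
    return normalized in GREETINGS


def route_query(text, retrieve):
    """Answer greetings directly; call `retrieve` only for real questions."""
    if is_simple_greeting(text):
        return {"type": "greeting",
                "answer": "Hello! Ask me a question about Sri Lankan law."}
    return {"type": "legal", "documents": retrieve(text)}
```

Matching the whole normalized message, rather than searching for a greeting word inside it, keeps a question such as "hello, how do I file for divorce?" on the retrieval path.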
### 🚀 How Users Can Interact:
1. **Visit the public URL** provided above
2. **Ask legal questions** in natural language
3. **Get multiple response options** to choose from
4. **Use example questions** for quick start
5. **Apply filters** for specific searches
6. **Review conversation history** and click on previous responses

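Filtered searches like the ones in step 5 usually map onto Pinecone metadata filters, which use operators such as `$eq` and `$in`. The sketch below shows one way to build such a filter; `build_metadata_filter` is a hypothetical helper, and the field names `document_type` and `legal_domain` are assumptions about how the Vector DB module labelled its metadata.

```python
def build_metadata_filter(**criteria):
    """Turn keyword criteria into a Pinecone-style metadata filter dict.

    Scalars become {"$eq": value}; lists and tuples become {"$in": [...]}.
    Criteria set to None are skipped, so unused UI filters drop out.
    """
    conditions = {}
    for field, value in criteria.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            conditions[field] = {"$in": list(value)}
        else:
            conditions[field] = {"$eq": value}
    return conditions


# Example with assumed field names:
example_filter = build_metadata_filter(
    document_type="act",
    legal_domain=["property", "family"],
    year=None,  # not selected in the UI, so it is left out
)
```

The resulting dict could then be passed as the `filter` argument of a Pinecone `index.query(...)` call; multiple fields in one dict are combined as a logical AND.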
### 📱 Sharing Your Legal AI:
- **Share the ngrok URL** with colleagues, students, or legal professionals
- **Keep this Colab session active** to maintain the service
- **Monitor performance** using the monitoring cell above

### 🔧 Maintenance:
- **Session Duration**: Colab sessions time out after a period of inactivity
- **Restart if Needed**: Re-run Step 7 if the service stops
- **Backup Results**: Your session data is saved to Google Drive

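Before re-running Step 7, it can help to check whether the web app is still listening on its port. The following is a sketch using only the standard library; `wait_for_port` is a hypothetical helper, and port 8501 is assumed because that is the port Step 7 tunnels through ngrok.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage (assumes the app serves on port 8501, as in Step 7):
# if not wait_for_port(8501, timeout=5):
#     print("Web app is not responding; re-run Step 7.")
```

The same helper could stand in for a fixed startup delay, since it returns as soon as the port accepts connections instead of always waiting the full interval.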
**🎯 Your Sri Lankan Legal AI Assistant is now reachable by anyone with the ngrok URL!** 🌍
