# 🏛️ Sri Lankan Legal AI Assistant - Google Colab Setup

This notebook will help you run the Vector DB Module and RAG Module in Google Colab.

## 📋 Before You Start:
1. Upload your project zip file to Google Drive
2. Get your Pinecone API key and environment
3. (Optional) Get your OpenAI API key
4. Get an ngrok authtoken (ngrok requires one to open the public tunnel in Step 6)


In [None]:
# 🔗 Step 1: Mount Google Drive
from google.colab import drive
import os
import sys

print("📁 Mounting Google Drive...")
drive.mount('/content/drive')
print("✅ Google Drive mounted successfully!")

In [None]:
# 📦 Step 2: Install Required Packages
print("📦 Installing required packages...")

!pip install -q streamlit
!pip install -q pinecone-client
!pip install -q sentence-transformers
!pip install -q transformers
!pip install -q torch
!pip install -q numpy pandas scikit-learn
!pip install -q nltk spacy
!pip install -q fastapi uvicorn
!pip install -q python-multipart pydantic
!pip install -q python-dotenv
!pip install -q plotly wordcloud matplotlib seaborn
!pip install -q pyngrok

print("✅ All packages installed successfully!")
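
The success message above is printed unconditionally. The check below is a sketch that confirms the key packages actually import. It assumes the hand-written pip-to-import mapping (`PIP_TO_IMPORT`, a hypothetical name) covers the packages the project needs most. Several pip names differ from their import names (for example `scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# pip distribution name -> name used in `import` (several of these differ)
PIP_TO_IMPORT = {
    "streamlit": "streamlit",
    "pinecone-client": "pinecone",
    "sentence-transformers": "sentence_transformers",
    "transformers": "transformers",
    "torch": "torch",
    "scikit-learn": "sklearn",
    "python-dotenv": "dotenv",
    "pyngrok": "pyngrok",
}


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(PIP_TO_IMPORT.values())
if missing:
    print(f"⚠️  Not importable yet: {missing}")
else:
    print("✅ All key packages are importable")
```

`find_spec` only locates a module without executing it, so the check is cheap even for heavy packages such as `torch`.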

In [None]:
# 📂 Step 3: Extract Project Files
import zipfile
import shutil

# 🔧 CHANGE THIS PATH to your uploaded zip file location
ZIP_PATH = '/content/drive/MyDrive/Vector_DB_module and RGA_Module.zip'

print(f"📂 Extracting project from: {ZIP_PATH}")

project_dir = '/content/Vector_DB_module and RGA_Module'

if not os.path.exists(ZIP_PATH):
    print("❌ Zip file not found! Please check the ZIP_PATH variable.")
    print("💡 Make sure you've uploaded your project zip to Google Drive.")
else:
    try:
        # Extract the project into /content/
        with zipfile.ZipFile(ZIP_PATH, 'r') as zip_ref:
            zip_ref.extractall('/content/')

        # The zip must contain a top-level folder with exactly this name
        if not os.path.isdir(project_dir):
            raise FileNotFoundError(f"folder not found after extraction: {project_dir}")

        # Change to project directory
        os.chdir(project_dir)

        # Add the project and both modules to the Python path (skipping duplicates on re-runs)
        for path in [project_dir,
                     os.path.join(project_dir, 'Vector_DB_module'),
                     os.path.join(project_dir, 'RAG_Module')]:
            if path not in sys.path:
                sys.path.append(path)

        print("✅ Project extracted successfully!")
        print("📁 Project contents:")
        !ls -la

    except Exception as e:
        print(f"❌ Error extracting project: {e}")
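
Step 3 assumes the zip contains a single top-level folder named exactly `Vector_DB_module and RGA_Module`. If the archive was created differently (for example zipped from inside the folder, or on macOS, where Finder adds a `__MACOSX` folder), that assumption breaks. The helper below is a sketch with the hypothetical name `top_level_entries` that lists what sits at the top of the archive, so you can adjust `project_dir` to match.

```python
import zipfile
from pathlib import PurePosixPath

# Folders that archivers add and that are not part of the project
IGNORED_TOP_LEVEL = {"__MACOSX"}


def top_level_entries(zip_path):
    """Return the sorted top-level file/folder names stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        names = {
            PurePosixPath(name).parts[0]  # zip entry names always use "/" separators
            for name in zf.namelist()
            if name.strip("/")  # skip empty or root-only entries
        }
    return sorted(names - IGNORED_TOP_LEVEL)
```

For example, `top_level_entries(ZIP_PATH)` should return a one-element list whose item matches the folder name used for `project_dir`.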

In [None]:
# 🔑 Step 4: Setup Environment Variables
import os

print("🔑 Setting up environment variables...")

# 🔧 CHANGE THESE VALUES to your actual API keys
PINECONE_API_KEY = 'your-pinecone-api-key-here'
PINECONE_ENVIRONMENT = 'your-pinecone-environment-here'
OPENAI_API_KEY = 'your-openai-api-key-here'  # Optional

# Set environment variables
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY
os.environ['PINECONE_ENVIRONMENT'] = PINECONE_ENVIRONMENT
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

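# Create .env files in the project root and in RAG_Module.
# Absolute paths keep this working regardless of the current working directory.
project_dir = '/content/Vector_DB_module and RGA_Module'
env_content = f"""PINECONE_API_KEY={PINECONE_API_KEY}
PINECONE_ENVIRONMENT={PINECONE_ENVIRONMENT}
OPENAI_API_KEY={OPENAI_API_KEY}
"""

for env_path in [os.path.join(project_dir, '.env'),
                 os.path.join(project_dir, 'RAG_Module', '.env')]:
    with open(env_path, 'w') as f:
        f.write(env_content)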

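print("✅ Environment variables configured!")
if any(value.startswith('your-') for value in (PINECONE_API_KEY, PINECONE_ENVIRONMENT)):
    print("⚠️  Placeholder API keys detected! Replace them with your actual keys and re-run this cell.")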
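
Typing API keys directly into a notebook cell risks leaking them when the notebook is shared. Colab provides a Secrets panel (the key icon in the left sidebar) readable through `google.colab.userdata.get`. The sketch below reads a key from Colab Secrets when available and falls back to `os.environ` elsewhere. It assumes your secrets are named the same as the environment variables in Step 4; `get_secret` and `find_unset_keys` are hypothetical helper names, and the `your-` placeholder prefix matches the placeholders above.

```python
import os
from typing import Optional


def get_secret(name: str) -> Optional[str]:
    """Read a key from Colab Secrets when running in Colab, falling back to os.environ."""
    value = None
    try:
        from google.colab import userdata  # only importable inside Colab
        try:
            value = userdata.get(name)
        except Exception:  # secret not defined, or notebook access not granted
            value = None
    except ImportError:
        pass  # not running in Colab
    return value or os.environ.get(name)


def find_unset_keys(names, placeholder_prefix="your-"):
    """Return the names whose value is missing or still a 'your-...' placeholder."""
    unset = []
    for name in names:
        value = get_secret(name)
        if not value or value.startswith(placeholder_prefix):
            unset.append(name)
    return unset


print(find_unset_keys(["PINECONE_API_KEY", "PINECONE_ENVIRONMENT"]))
```

An empty list means both required Pinecone values are set to something other than a placeholder.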

In [None]:
# 🗄️ Step 5: Initialize Vector Database (Optional)
print("🗄️ Checking Vector Database setup...")

try:
    os.chdir('/content/Vector_DB_module and RGA_Module/Vector_DB_module')
    
    # Check if we have legal documents to process
    if os.path.exists('legal_documents'):
        print("📚 Legal documents found. To process and store them in the vector database, run:")
        print('   !cd "/content/Vector_DB_module and RGA_Module/Vector_DB_module" && python main.py')
    else:
        print("📚 No legal_documents folder found.")
        print("💡 Add a legal_documents folder with your documents to Vector_DB_module, then re-run this cell.")
    
    # Go back to project root
    os.chdir('/content/Vector_DB_module and RGA_Module')
    
except Exception as e:
    print(f"⚠️  Vector DB module check failed: {e}")

print("✅ Vector Database check completed!")
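
`main.py` has to run with `Vector_DB_module` as its working directory, because it looks for `legal_documents` relative to it. A sketch of doing that from Python without `os.chdir()` is shown below; `run_script` is a hypothetical helper, and the assumption is that `main.py` needs no extra arguments.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_dir, script="main.py", *args):
    """Run `python <script> <args>` with script_dir as the working directory and print its output."""
    script_dir = Path(script_dir)
    if not (script_dir / script).is_file():
        raise FileNotFoundError(f"{script} not found in {script_dir}")
    result = subprocess.run(
        [sys.executable, script, *args],  # same interpreter as the notebook kernel
        cwd=script_dir,                   # run inside the module folder without os.chdir()
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr)
    return result
```

Usage: `run_script('/content/Vector_DB_module and RGA_Module/Vector_DB_module')`. Because output is captured, it is shown only after the script finishes; for long runs the `!cd ... && python main.py` form gives live progress instead.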

In [None]:
# 🚀 Step 6: Launch RAG Module Web Interface
from pyngrok import ngrok
import subprocess
import time

print("🚀 Starting Sri Lankan Legal AI Assistant...")

# Kill any existing app processes from a previous run
!pkill -f streamlit
!pkill -f "python.*app.py"

# Start the web app as a background process. Passing cwd= instead of calling os.chdir()
# keeps the notebook kernel's own working directory unchanged.
RAG_DIR = '/content/Vector_DB_module and RGA_Module/RAG_Module'
try:
    app_process = subprocess.Popen(['python', 'app.py', 'web'], cwd=RAG_DIR)
except Exception as e:
    app_process = None
    print(f"❌ Error starting app: {e}")

print("⏳ Waiting for the application to start...")
time.sleep(20)  # Give the app time to start listening on its port

if app_process is not None and app_process.poll() is not None:
    print(f"❌ The app exited early (exit code {app_process.returncode}).")

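try:
    # ngrok requires an authtoken (free account at ngrok.com)
    # 🔧 CHANGE THIS to your ngrok authtoken
    ngrok.set_auth_token('your-ngrok-authtoken-here')

    # Create ngrok tunnel for public access (8501 is Streamlit's default port)
    tunnel = ngrok.connect(8501)
    public_url = tunnel.public_url

    print("\n" + "="*60)
    print("🎉 Sri Lankan Legal AI Assistant is READY!")
    print("="*60)
    print("🌐 Access your Legal AI Assistant at:")
    print(f"   {public_url}")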
    print("="*60)
    print("\n📋 Features Available:")
    print("   ✅ Legal Question Answering")
    print("   ✅ Multiple Response Options")
    print("   ✅ Example Questions")
    print("   ✅ Advanced Search Filters")
    print("   ✅ Conversation History")
    print("   ✅ Simple Greeting Responses")
    print("\n💡 Tips:")
    print("   • Try asking about Sri Lankan property law")
    print("   • Use the Example Questions for quick start")
    print("   • Check the Advanced Search for specific filters")
    print("\n⚠️  Keep this Colab tab open to maintain the connection!")
    
except Exception as e:
    print(f"❌ Error creating public URL: {e}")
    print("💡 Make sure your ngrok authtoken is set, then re-run this cell. If the app was still starting, wait a few seconds first.")
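
The fixed `time.sleep(20)` in Step 6 is a guess: on a slow runtime the app may not be ready yet, and on a fast one the wait is wasted. A sketch of polling the port instead is shown below. It assumes the web app listens on port 8501 on the same machine; `wait_for_port` is a hypothetical helper name.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=60.0, interval=1.0):
    """Poll until something accepts TCP connections on host:port; False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful TCP connect means the server is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # connection refused or timed out: not ready yet
            time.sleep(interval)
    return False
```

For example, `time.sleep(20)` could be replaced by `if not wait_for_port(8501): print("❌ App did not start within 60 seconds")`, so the ngrok tunnel is only opened once the app is actually reachable.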

In [None]:
# 🔧 Step 7: Alternative - CLI Mode (Optional)
print("🔧 Alternative: Run in CLI mode")
print("If the web interface doesn't work, you can use CLI mode:")
print()

# Uncomment the lines below to run in CLI mode instead
# os.chdir('/content/Vector_DB_module and RGA_Module/RAG_Module')
# !python app.py cli

print("💡 To use CLI mode:")
print("   1. Uncomment the lines above")
print("   2. Run this cell")
print("   3. Follow the prompts to ask legal questions")

In [None]:
# 💾 Step 8: Save Results to Google Drive (Optional)
import shutil
from datetime import datetime

print("💾 Saving results to Google Drive...")

# Create results directory in Google Drive
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
results_dir = f'/content/drive/MyDrive/Legal_AI_Results_{timestamp}'
os.makedirs(results_dir, exist_ok=True)

# Project root (same location as Step 3)
project_dir = '/content/Vector_DB_module and RGA_Module'

# Copy logs if they exist
for log_file in ['app.log', 'rag_system.log', 'vector_db.log']:
    log_path = os.path.join(project_dir, 'RAG_Module', log_file)
    if os.path.exists(log_path):
        shutil.copy2(log_path, results_dir)
        print(f"✅ Saved {log_file}")

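# Save configuration files
# ⚠️ .env contains your API keys: remove it from this list if you don't want them stored in Google Drive
config_files = ['.env', 'RAG_Module/config.py', 'RAG_Module/rag_config.py']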
for config_file in config_files:
    config_path = os.path.join(project_dir, config_file)
    if os.path.exists(config_path):
        shutil.copy2(config_path, results_dir)
        print(f"✅ Saved {config_file}")

print(f"\n💾 Results saved to: {results_dir}")
print("✅ Backup completed!")
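
Step 8 copies every file flat into one Drive folder, so files with the same name from different subfolders would overwrite each other, and the backup no longer mirrors the project layout. The sketch below keeps each file's relative path instead; `backup_files` is a hypothetical helper, not part of the project.

```python
import shutil
from pathlib import Path


def backup_files(src_root, rel_paths, dest_root):
    """Copy src_root/<rel> to dest_root/<rel> for each existing file, keeping subfolders.

    Returns the relative paths that were actually copied.
    """
    src_root, dest_root = Path(src_root), Path(dest_root)
    copied = []
    for rel in rel_paths:
        src = src_root / rel
        if src.is_file():
            dest = dest_root / rel
            dest.parent.mkdir(parents=True, exist_ok=True)  # recreate e.g. RAG_Module/
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied
```

Usage: `backup_files(project_dir, ['RAG_Module/config.py', 'RAG_Module/app.log'], results_dir)` returns only the paths that existed, which makes it easy to report what was actually saved.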

## 🔧 Troubleshooting

### Common Issues:

1. **"Zip file not found"**
   - Check the `ZIP_PATH` in Step 3
   - Make sure you uploaded the zip file to Google Drive

2. **"API key errors"**
   - Update the API keys in Step 4
   - Make sure your Pinecone account is active

3. **"App not starting"**
   - Wait longer: increase the `time.sleep(20)` in Step 6 (for example to 30 seconds)
   - Try restarting the runtime
   - Check the cell output for error messages

4. **"ngrok tunnel failed"**
   - Make sure an ngrok authtoken is set (ngrok requires one to open tunnels)
   - Try running Step 6 again
   - Use CLI mode (Step 7) as an alternative

### Quick Fixes:
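```python
# Stop the app and close any ngrok tunnels, then re-run Step 6
from pyngrok import ngrok

!pkill -f streamlit
!pkill -f "python.*app.py"  # avoid `pkill -f python`: it also kills the notebook's own Python kernel
ngrok.kill()
```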

### Memory Issues:
- Runtime → Restart session (named "Restart runtime" in older Colab versions)
- Edit → Clear all outputs
- Switch to a GPU runtime (Runtime → Change runtime type) to speed up model inference
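
Before loading embedding models, it can help to see how much RAM is actually free and whether a GPU is visible. The sketch below reads `MemAvailable` from `/proc/meminfo` (Linux only, which includes Colab) and asks `torch` whether CUDA is available; `mem_available_mib` is a hypothetical helper name.

```python
import os


def mem_available_mib(meminfo_text):
    """Return the MemAvailable value from /proc/meminfo-style text in MiB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this field in kB (KiB)
            return kib // 1024
    return None


if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(f"Available RAM: {mem_available_mib(f.read())} MiB")

try:
    import torch
    print("GPU available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed")
```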
