# 🏛️ Legal RAG Indonesia - Kaggle Setup

This notebook sets up and runs the Legal RAG system on Kaggle.

**Run cells in order:**
1. Setup & Dependencies
2. GPU Check
3. Run Diagnostic Test (optional but recommended)
4. Launch Application

## Cell 1: Setup Path & Dependencies

In [None]:
import os
import sys

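# Project root: the repository must already be present under /kaggle/working
PROJECT_ROOT = '/kaggle/working/06_ID_Legal'
if not os.path.isdir(PROJECT_ROOT):
    raise FileNotFoundError(f"Project not found at {PROJECT_ROOT}; upload or clone it first.")
os.chdir(PROJECT_ROOT)

# Make project packages (e.g. `api`, `ui`) importable; skip duplicates when the cell is re-run
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)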

print(f"Working directory: {os.getcwd()}")
print(f"Python path includes project: {PROJECT_ROOT in sys.path}")

## Cell 2: GPU Memory Check

In [None]:
import gc
import torch

def check_gpu():
    """Release unused cached memory, then report GPU 0 and the memory PyTorch has allocated on it."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.synchronize()

        # memory_allocated() counts tensors allocated by this process only
        allocated = torch.cuda.memory_allocated(0) / 1024**3
        total = torch.cuda.get_device_properties(0).total_memory / 1024**3
        name = torch.cuda.get_device_name(0)

        print(f"✅ GPU: {name}")
        print(f"✅ Memory: {allocated:.2f}GB / {total:.2f}GB used ({100*allocated/total:.1f}%)")
        return True
    else:
        print("❌ No GPU available!")
        return False

check_gpu()

## Cell 3: (Optional) Run Diagnostic Test

Run this to verify that the pipeline handles several sequential queries without blocking.
It takes ~15-20 minutes but confirms the full pipeline works end to end.

In [None]:
# OPTIONAL: Run diagnostic test to verify multi-turn works
# Uncomment the line below to run
# %run tests/minimal_pipeline_test.py
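
The diagnostic script itself is not shown in this notebook. As a rough sketch of the idea only (not the contents of `tests/minimal_pipeline_test.py`), the helper below sends queries one after another and flags the first one that does not return within a timeout, which is how the "API blocks after a few requests" symptom from Troubleshooting would show up. `run_sequential_queries` and `fake_ask` are hypothetical names; wire `ask` to whatever function or HTTP call answers a query in your setup.

```python
import threading
import time
from typing import Callable, List, Tuple


def run_sequential_queries(
    ask: Callable[[str], object],
    queries: List[str],
    per_query_timeout: float = 120.0,
) -> List[Tuple[str, str, float]]:
    """Send queries one at a time and return (query, status, seconds) tuples.

    status is "ok", "error" (ask raised), or "timeout" (no answer within
    per_query_timeout). The run stops at the first timeout, because later
    queries would only queue behind the stuck one.
    """
    results: List[Tuple[str, str, float]] = []
    for query in queries:
        done = threading.Event()
        errors: List[BaseException] = []

        def worker(q: str = query, done: threading.Event = done,
                   errors: List[BaseException] = errors) -> None:
            try:
                ask(q)
            except Exception as exc:  # record the failure instead of losing it in the thread
                errors.append(exc)
            finally:
                done.set()

        start = time.monotonic()
        # Daemon thread: a call that never returns cannot keep the process alive
        threading.Thread(target=worker, daemon=True).start()
        finished = done.wait(per_query_timeout)
        elapsed = time.monotonic() - start
        if not finished:
            results.append((query, "timeout", elapsed))
            break
        results.append((query, "error" if errors else "ok", elapsed))
    return results


# Hypothetical placeholder for a real pipeline or API call
def fake_ask(question: str) -> str:
    return f"answer to {question}"


for query, status, seconds in run_sequential_queries(fake_ask, ["Q1", "Q2", "Q3"]):
    print(f"{status:>7}  {seconds:.2f}s  {query}")
```

A timed-out call keeps running in its thread because Python cannot cancel a running thread; that is why the loop stops at the first timeout instead of piling more requests onto a stuck server.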

## Cell 4: Launch Application

This launches the API server in the background, then the Gradio UI.

**Important:** This cell will:
1. Initialize and load all models (takes ~5-10 minutes)
2. Start API server on port 8000
3. Launch Gradio UI with public share URL

In [None]:
%run kaggle_launcher.py
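
If Cell 4 (or the alternative cells below) is re-run in the same session, the previous servers may still hold their ports, and the new launch can then fail with an "address already in use" error. A minimal standard-library check, assuming port 8000 for the API (Cell 4) and 7860 for Gradio (the port used in Cell 4c); `port_is_free` is a hypothetical helper, and a bind test is only a heuristic.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a new socket could bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # typically EADDRINUSE: something already holds the port
            return False
        return True


# 8000: API server; 7860: Gradio port used in Cell 4c
for port in (8000, 7860):
    print(f"Port {port}: {'free' if port_is_free(port) else 'in use'}")
```

If a port shows as "in use", restarting the kernel is the simplest way to release it.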

## Alternative: Manual Launch (if the launcher above doesn't work)

If the launcher above has issues, use these cells to manually control each step.

In [None]:
# Alternative Cell 4a: Start API Server in Background Thread
import threading

def run_api_server():
    # Imports and app creation run inside the thread so this cell returns immediately
    import uvicorn
    from api.server import create_app

    app = create_app()
    config = uvicorn.Config(app, host="127.0.0.1", port=8000, log_level="warning")
    server = uvicorn.Server(config)
    server.run()  # blocks this thread until the server stops

# Daemon thread: the notebook stays usable and the thread ends with the kernel
api_thread = threading.Thread(target=run_api_server, daemon=True)
api_thread.start()

print("API server starting in background...")
print("Run the next cell to wait until the API reports it is ready.")
print("Loading the models may take 5-10 minutes.")
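
Cell 4a runs a blocking server in a daemon thread so the notebook stays interactive. The same pattern with only the standard library, plus a clean shutdown that Cell 4a does not provide, is sketched below. `ReadyHandler` and `start_background_server` are hypothetical: the handler only mimics the `/api/v1/ready` response fields that Cell 4b reads (`ready`, `message`) and is not the project's API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ReadyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real API: answers /api/v1/ready only."""

    def do_GET(self):
        if self.path == "/api/v1/ready":
            body = json.dumps({"ready": True, "message": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep notebook output quiet, similar to log_level="warning"


def start_background_server(port: int = 0):
    """Start the server in a daemon thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ReadyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


server, thread = start_background_server()
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/api/v1/ready", timeout=5) as resp:
    print(json.load(resp))
server.shutdown()      # stops serve_forever() from another thread
server.server_close()  # releases the port
```

With uvicorn, setting `server.should_exit = True` on the `uvicorn.Server` instance plays a similar role to `shutdown()` here, but Cell 4a would need to keep a reference to `server` outside `run_api_server` to do that.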

In [None]:
# Alternative Cell 4b: Wait for API to be ready
import requests
import time

READY_URL = "http://127.0.0.1:8000/api/v1/ready"

def wait_for_api(timeout=600, interval=10):
    """Poll the readiness endpoint until it reports ready or `timeout` seconds pass."""
    print("Waiting for API to be ready...")
    start = time.time()

    while time.time() - start < timeout:
        try:
            resp = requests.get(READY_URL, timeout=5)
            if resp.status_code == 200:
                data = resp.json()
                if data.get('ready'):
                    print("✅ API is ready!")
                    return True
                print(f"Loading: {data.get('message', '...')}")
        except (requests.exceptions.RequestException, ValueError):
            # Server not accepting connections yet, or the response was not valid JSON
            pass
        time.sleep(interval)

    print(f"❌ API not ready after {timeout}s")
    return False

wait_for_api()
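
Cell 4b polls at a fixed interval and measures elapsed time with `time.time()`, which can jump if the system clock changes. A variant that uses `time.monotonic()`, backs off between attempts, and takes the readiness check as a parameter (so the loop can be tested without a running server) might look like this. `wait_until_ready` and its parameters are hypothetical names, not part of the project.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    probe: Callable[[], Optional[bool]],
    timeout: float = 600.0,
    first_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() until it returns a truthy value or `timeout` seconds elapse.

    OSError (which covers connection errors) and ValueError (bad JSON) raised
    by probe count as "not ready yet". The delay doubles after each miss,
    capped at max_delay, and never sleeps past the deadline.
    """
    deadline = clock() + timeout
    delay = first_delay
    while True:
        try:
            if probe():
                return True
        except (OSError, ValueError):
            pass  # server still booting or returned a non-JSON body
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

In the notebook, a probe could be `lambda: requests.get("http://127.0.0.1:8000/api/v1/ready", timeout=5).json().get("ready")`; `requests` exceptions derive from `OSError` (via `IOError`) and its JSON decode errors derive from `ValueError`, so both are treated as "still loading".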

In [None]:
# Alternative Cell 4c: Launch Gradio UI
from ui.unified_app_api import launch_app

# share=True asks Gradio for a temporary public link, since Kaggle does not expose notebook ports directly
launch_app(share=True, server_port=7860, server_name="0.0.0.0")

## Troubleshooting

### If the API stops responding after a few requests:
1. Run `%run tests/minimal_pipeline_test.py` to verify pipeline works
2. Check memory: `check_gpu()` should show < 90% used
3. Restart kernel and run again
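
The "< 90% used" rule of thumb in step 2 can be turned into a small check. This is a sketch: `gpu_memory_status` is a hypothetical helper that takes raw byte counts so it runs without a GPU. In the notebook you would pass `torch.cuda.memory_allocated(0)` and `torch.cuda.get_device_properties(0).total_memory`, the same values `check_gpu()` reads.

```python
from typing import Tuple


def gpu_memory_status(allocated_bytes: int, total_bytes: int,
                      threshold: float = 0.90) -> Tuple[float, bool]:
    """Return (fraction_used, ok), where ok means usage is below `threshold`."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = allocated_bytes / total_bytes
    return fraction, fraction < threshold


# Example with made-up numbers for a 16 GiB card
fraction, ok = gpu_memory_status(12 * 1024**3, 16 * 1024**3)
print(f"{100 * fraction:.1f}% used -> {'OK' if ok else 'close to full, restart the kernel'}")
# prints "75.0% used -> OK"
```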

### If Gradio doesn't connect:
1. Check that the API is running: `requests.get('http://127.0.0.1:8000/api/v1/health')`
2. Run the alternative cells (4a, 4b, 4c) one at a time
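
Calling `requests.get` directly raises an exception when nothing is listening on port 8000. A small standard-library wrapper that always returns `True` or `False` is sketched below. It assumes a healthy API answers `/api/v1/health` with HTTP 200; the notebook does not document that endpoint's response, and `api_is_healthy` is a hypothetical helper.

```python
import http.client
import urllib.request


def api_is_healthy(url: str = "http://127.0.0.1:8000/api/v1/health",
                   timeout: float = 5.0) -> bool:
    """Return True only if `url` answers with HTTP 200.

    Connection refused, timeouts, and HTTP error statuses (urllib raises
    HTTPError for those, a subclass of OSError) all return False.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        return False


print("API healthy:", api_is_healthy())
```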

### If out of memory:
1. Restart kernel
2. Don't run the diagnostic test before launching (saves GPU memory)
3. Use a lower `thinking_mode` (`'low'` instead of `'high'`)