# 🧠 Sovereign-Doc Cloud Brain

GPU-accelerated Vision-Language AI for document analysis using Qwen2.5-VL-7B.

## Requirements
- ✅ GPU Runtime (T4 or better)
- ✅ ngrok Auth Token
- ✅ HuggingFace Token
- ✅ Sovereign Access Token

## Setup Instructions
1. Enable GPU: Runtime → Change runtime type → T4 GPU
2. Add secrets (🔑 icon in sidebar):
   - `NGROK_TOKEN`: Your ngrok token
   - `HF_TOKEN`: Your HuggingFace token
   - `SOVEREIGN_ACCESS_TOKEN`: `mKEHz_sxn8g3LLGqe7cTsuRBs_QEolmDkCh_sL91akM`
3. Upload `colab_brain/` folder (3 files) to `/content/colab_brain/`
4. Run cells in order
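
Before running the cells, it can help to confirm that the runtime really has a GPU attached. The following is a minimal sketch, assuming the NVIDIA driver tool `nvidia-smi` is on the PATH, as it normally is on Colab GPU runtimes. `gpu_available` is a hypothetical helper name and is not part of `colab_brain`.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools: most likely a CPU runtime
    try:
        result = subprocess.run(
            [exe, "-L"],  # -L lists the attached GPUs, one per line
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU detected" if gpu_available() else "No GPU detected: check the runtime type")
```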

## 📦 Step 1: Install Dependencies

In [None]:
# Install required packages
!pip install -q vllm fastapi uvicorn pyngrok python-multipart nest-asyncio loguru pillow transformers
print('✅ Dependencies installed')

## 🔑 Step 2: Load Configuration

In [None]:
import os
from google.colab import userdata

# Load secrets from Colab
try:
    os.environ['NGROK_TOKEN'] = userdata.get('NGROK_TOKEN')
    os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')
    os.environ['SOVEREIGN_ACCESS_TOKEN'] = userdata.get('SOVEREIGN_ACCESS_TOKEN')
    
    print('✅ All tokens loaded successfully')
    print(f'   - Ngrok token: {os.environ["NGROK_TOKEN"][:10]}...')
    print(f'   - HF token: {os.environ["HF_TOKEN"][:10]}...')
    print(f'   - Access token: {os.environ["SOVEREIGN_ACCESS_TOKEN"][:10]}...')
except Exception as e:
    print(f'❌ Error loading secrets: {e}')
    print('   Make sure you added all tokens in Colab Secrets (🔑 icon)')
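
Because the cell above stops at the first secret it cannot read, it reports only one problem at a time. A small helper can list every missing name at once. This is a sketch only: `missing_secrets` and `REQUIRED_SECRETS` are hypothetical names, and the helper inspects a plain mapping, so it can be pointed at `os.environ` after the loading cell has run.

```python
import os
from typing import Iterable, List, Mapping

# Names taken from the setup instructions above.
REQUIRED_SECRETS = ("NGROK_TOKEN", "HF_TOKEN", "SOVEREIGN_ACCESS_TOKEN")


def missing_secrets(env: Mapping[str, str],
                    required: Iterable[str] = REQUIRED_SECRETS) -> List[str]:
    """Return the required names that are absent or blank in env."""
    return [name for name in required if not (env.get(name) or "").strip()]


missing = missing_secrets(os.environ)
if missing:
    print("Missing secrets:", ", ".join(missing))
else:
    print("All required secrets are set")
```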

## 📁 Step 3: Verify Uploaded Files

Upload the `colab_brain/` folder using the file browser (📁 icon) on the left.

In [None]:
# Check if files are uploaded
import os

required_files = [
    'colab_brain/__init__.py',
    'colab_brain/inference.py',
    'colab_brain/server.py'
]

print('üìÅ Checking uploaded files...')
all_present = True
for file in required_files:
    if os.path.exists(file):
        print(f'   ✅ {file}')
    else:
        print(f'   ❌ {file} - MISSING!')
        all_present = False

if all_present:
    print('\n✅ All files present! Ready to start server.')
else:
    print('\n❌ Some files are missing. Please upload the colab_brain/ folder.')

## 🚀 Step 4: Start Cloud Brain Server

This will:
1. Start ngrok tunnel on port 8000
2. Load Qwen2.5-VL-7B model on GPU
3. Start FastAPI server
4. Display public URL

**Keep this cell running!** The server will stop if you interrupt it.

In [None]:
import os
import sys

from pyngrok import ngrok

# Start ngrok tunnel
print('üîå Starting ngrok tunnel...')
ngrok.set_auth_token(os.environ['NGROK_TOKEN'])
tunnel = ngrok.connect(8000, bind_tls=True)

print('\n✅ Cloud Brain is LIVE!')
print('=' * 60)
print(f'🌐 Public URL: {tunnel.public_url}')
print('=' * 60)
print('\n📡 Endpoints:')
print(f'  - GET  {tunnel.public_url}/health')
print(f'  - POST {tunnel.public_url}/analyze')
print(f'\n🔐 Authentication: Bearer {os.environ["SOVEREIGN_ACCESS_TOKEN"][:20]}...')
print('\n⚡ Starting server (this will take ~2 minutes to load the model)...')
print('=' * 60)

# Add colab_brain to Python path
sys.path.insert(0, '/content')

# Import FastAPI app
from colab_brain.server import app

# Run server in Colab-compatible way
import nest_asyncio
nest_asyncio.apply()

# Start uvicorn server
import uvicorn
config = uvicorn.Config(app, host='0.0.0.0', port=8000, log_level='info')
server = uvicorn.Server(config)
await server.serve()

## ✅ Testing (Optional)

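Run this while the server is running. Colab executes one cell at a time, so a cell in this notebook queues behind the server cell; run the check from a **separate notebook** or your local machine, with `SOVEREIGN_ACCESS_TOKEN` set in that environment.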

In [None]:
import os

import requests

# Replace with your tunnel URL from above
TUNNEL_URL = 'https://your-tunnel-url.ngrok-free.dev'
ACCESS_TOKEN = os.environ['SOVEREIGN_ACCESS_TOKEN']

# Test health endpoint
print('üè• Testing health endpoint...')
response = requests.get(
    f'{TUNNEL_URL}/health',
    headers={'X-Sovereign-Token': ACCESS_TOKEN}
)
print(f'Status: {response.status_code}')
print(f'Response: {response.json()}')

print('\n✅ If you see status 200, the server is working!')
print('   Your local Sovereign-Doc app can now use this URL for Vision Agent.')
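
Since the model takes a couple of minutes to load, a single health check can fail simply because the server is not ready yet. The sketch below polls until a check succeeds or a deadline passes. `wait_until_ready` is a hypothetical helper, and the timeout and interval defaults are assumptions rather than values taken from the project.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool],
                     timeout: float = 300.0,
                     interval: float = 5.0) -> bool:
    """Call check() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # treat connection errors as "not ready yet"
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the variables from the test cell, a call might look like `wait_until_ready(lambda: requests.get(f'{TUNNEL_URL}/health', headers={'X-Sovereign-Token': ACCESS_TOKEN}, timeout=10).status_code == 200)`.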

## 📝 Notes

- **Keep the server cell running** - don't interrupt it
- **Free GPU limit**: Free Colab sessions run for at most about 12 hours, and T4 availability is not guaranteed
- **Restart needed**: If the model crashes, restart runtime and run all cells again
- **Tunnel URL changes**: Each time you restart, you get a new ngrok URL

## 🛠️ Troubleshooting

**"CUDA out of memory"**
- Restart runtime: Runtime → Restart runtime
- Make sure you selected T4 GPU, not CPU
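
A rough estimate shows why memory is tight on a T4. This sketch assumes about 7 billion parameters (read off the model name; the exact count may differ) stored as 16-bit values, and it counts weights only, not the KV cache or activations. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: ~7e9 parameters at 2 bytes each (fp16/bf16).
print(f"{weight_memory_gib(7e9):.1f} GiB")  # → 13.0 GiB
```

A T4 has 16 GB of memory, so under these assumptions the weights alone leave little headroom, and leftover allocations from a previous run can be enough to trigger the error.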

**"ngrok authentication failed"**
- Check your NGROK_TOKEN in Colab Secrets
- Get a new token from https://dashboard.ngrok.com

**"Model loading failed"**
- Check your HF_TOKEN in Colab Secrets
- Verify you have access to Qwen models on HuggingFace