# 🚀 TrainForge External GPU Worker - Google Colab

This notebook connects your Google Colab GPU to your local TrainForge instance.

## Setup Steps:
1. **Enable GPU**: Runtime → Change runtime type → GPU (T4, V100, or A100)
2. **Expose your local TrainForge**: Use ngrok or localtunnel
3. **Run this notebook**

---

## Step 1: Check GPU Availability

In [None]:
import torch
import subprocess

print("🔍 Checking GPU availability...")

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    # Report memory with one decimal instead of truncating to a whole number
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"✅ GPU Available: {gpu_name}")
    print(f"💾 GPU Memory: {gpu_memory:.1f}GB")
    print(f"🔧 CUDA Version: {torch.version.cuda}")
    print(f"🐍 PyTorch Version: {torch.__version__}")
else:
    print("❌ No GPU available - Please enable GPU in Runtime settings")

# Show nvidia-smi output
try:
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    print("\n📊 nvidia-smi output:")
    print(result.stdout)
except FileNotFoundError:
    # subprocess.run raises FileNotFoundError when the executable is missing
    print("⚠️ nvidia-smi not available")
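If PyTorch is not importable, the same information can be read from `nvidia-smi` alone. The following is a minimal sketch that uses the tool's `--query-gpu` and `--format=csv,noheader` options; the helper names `parse_gpu_csv` and `query_gpus` are illustrative and not part of TrainForge.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory.total' CSV lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma so a comma inside a GPU name is preserved
        name, memory = (part.strip() for part in line.rsplit(",", 1))
        gpus.append((name, int(memory.split()[0])))  # "15360 MiB" -> 15360
    return gpus


def query_gpus():
    """Return the GPUs nvidia-smi reports, or [] if the tool cannot be run."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` in a cell returns an empty list when no GPU runtime is attached, which makes it easy to stop early before installing anything else.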

## Step 2: Install Dependencies

In [None]:
# Install required packages. zipfile and pathlib ship with Python's standard
# library, so only requests needs to be installed.
!pip install requests

print("✅ Dependencies installed")

## Step 3: Download TrainForge Worker

In [None]:
import requests
from pathlib import Path

# Location of the worker script (placeholder: replace "your-repo" with your repository)
worker_url = "https://raw.githubusercontent.com/your-repo/trainforge/main/external-gpu/colab_worker.py"

# For now, the worker script is embedded directly instead of being fetched from worker_url
worker_script = '''
# Paste the contents of colab_worker.py here
'''

# Save the worker script next to the notebook
with open('colab_worker.py', 'w') as f:
    f.write(worker_script)

print("✅ TrainForge worker saved to colab_worker.py")
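Once `worker_url` points at a real file, the script can be fetched instead of pasted. The following is a minimal sketch using only the standard library; `fetch_worker` is a hypothetical helper name, and the URL in the notebook is still a placeholder that must be replaced first.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_worker(url, dest="colab_worker.py"):
    """Download url to dest, refusing to save an empty file."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urlopen(url, timeout=30) as response:
        body = response.read()
    if not body.strip():
        raise ValueError(f"Downloaded file from {url} is empty")
    path = Path(dest)
    path.write_bytes(body)
    return path
```

With a valid URL, `fetch_worker(worker_url)` writes `colab_worker.py` into the current directory, which is where Step 6 expects to import it from.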

## Step 4: Expose Your Local TrainForge API

### Option A: Using ngrok (Recommended)

1. Install ngrok: https://ngrok.com/download
2. Run on your local machine:
   ```bash
   ngrok http 3000
   ```
3. Copy the https URL (e.g., `https://abc123.ngrok.io`)

### Option B: Using localtunnel

1. Install: `npm install -g localtunnel`
2. Run: `lt --port 3000`
3. Copy the URL provided
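
A mistyped or local-only URL is an easy mistake at this point, so it can help to check the copied value before using it. The following is a minimal sketch; `normalize_api_url` is a hypothetical helper, and the rule that rejects `localhost` assumes the notebook runs on Colab's remote machine rather than on your own.

```python
from urllib.parse import urlparse


def normalize_api_url(raw):
    """Return a cleaned base URL, or raise ValueError if it is unusable."""
    url = raw.strip().rstrip("/")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"URL must start with http:// or https://, got: {raw!r}")
    if not parsed.hostname:
        raise ValueError(f"URL has no host: {raw!r}")
    if parsed.hostname in ("localhost", "127.0.0.1"):
        # From Colab, localhost refers to the Colab VM, not your machine
        raise ValueError("localhost is not reachable from Colab; use the tunnel URL")
    return url
```

Stripping the trailing slash keeps later paths such as `{API_URL}/health` from containing a double slash.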

---

## Step 5: Connect to TrainForge

In [None]:
import requests

# Enter your TrainForge API URL here (surrounding spaces and a trailing slash are removed)
API_URL = input("Enter your TrainForge API URL (from ngrok/localtunnel): ").strip().rstrip("/")

if not API_URL:
    print("❌ Please enter a valid API URL")
else:
    print(f"📡 API URL set to: {API_URL}")

    # Test connection; the timeout keeps the cell from hanging on a dead tunnel
    try:
        response = requests.get(f"{API_URL}/health", timeout=10)
        if response.status_code == 200:
            print("✅ Successfully connected to TrainForge API")
        else:
            print(f"❌ Failed to connect: HTTP {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"❌ Connection failed: {e}")
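
A tunnel can take a few seconds to come up, so a single failed request does not always mean the URL is wrong. The following is a minimal sketch of retrying the health check with a growing delay; `wait_until_ok` and its parameters are hypothetical names, not part of TrainForge.

```python
import time


def wait_until_ok(check, attempts=5, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True; return whether it ever did."""
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
            print(f"Attempt {attempt}/{attempts}: API not ready")
        except Exception as e:  # e.g. a connection error while the tunnel starts
            print(f"Attempt {attempt}/{attempts}: {e}")
        if attempt < attempts:
            sleep(delay * attempt)  # linear backoff: delay, 2*delay, 3*delay, ...
    return False


# Usage with the notebook's API_URL (needs the requests package):
# ready = wait_until_ok(
#     lambda: requests.get(f"{API_URL}/health", timeout=10).status_code == 200
# )
```

Passing the check as a callable keeps the retry logic independent of the HTTP library, so the same helper can wrap any of the requests in this notebook.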

## Step 6: Start GPU Worker

In [None]:
# Import and start the worker
import sys
sys.path.append('/content')

from colab_worker import ColabGPUWorker

# Create worker
worker = ColabGPUWorker(API_URL)

print("🚀 Starting TrainForge GPU Worker...")
print("⚠️ Keep this cell running to maintain the connection")
print("⚠️ The worker will stop if you close this tab or the session times out")
print("\n" + "="*50)

# Start worker (this will run indefinitely)
try:
    worker.start()
except KeyboardInterrupt:
    print("\n⚠️ Worker stopped by user")
except Exception as e:
    print(f"❌ Worker error: {e}")
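
The cell above only relies on two things from `colab_worker.py`: a constructor that takes the API URL and a blocking `start()` method. The following is a purely hypothetical stand-in showing that shape, not the real `ColabGPUWorker`. It assumes the worker polls `/api/jobs/pending` (the endpoint the monitor cell reads) and that the response is a JSON list; `SketchWorker`, `poll_interval`, `max_polls`, `fetch_pending`, and `handle` are invented for illustration.

```python
import json
import time
from urllib.request import urlopen


class SketchWorker:
    """Hypothetical stand-in for ColabGPUWorker; not the real implementation."""

    def __init__(self, api_url, poll_interval=5.0):
        self.api_url = api_url.rstrip("/")
        self.poll_interval = poll_interval

    def fetch_pending(self):
        # Assumption: the endpoint returns a JSON list of job dictionaries
        with urlopen(f"{self.api_url}/api/jobs/pending", timeout=10) as resp:
            return json.load(resp)

    def handle(self, job):
        # Placeholder: a real worker would run the training job here
        print(f"Would run job {job.get('job_id', 'Unknown')}")

    def start(self, max_polls=None, sleep=time.sleep):
        """Poll for jobs until interrupted, or until max_polls polls are done."""
        polls = 0
        while max_polls is None or polls < max_polls:
            for job in self.fetch_pending():
                self.handle(job)
            polls += 1
            sleep(self.poll_interval)
```

With `max_polls=None` the loop never returns, which matches the "runs indefinitely" behavior the cell warns about; stopping it requires interrupting the cell.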

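## 📊 Monitor Worker Status

Run this cell to check the worker status. The worker cell in Step 6 blocks for as long as the worker runs, so a cell queued in the same notebook will not execute until the worker stops; run this check before starting the worker or from a second notebook with `API_URL` set: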

In [None]:
import requests

try:
    # Check API health
    health = requests.get(f"{API_URL}/health", timeout=10)
    print(f"API Health: {health.status_code}")

    # Check workers
    workers = requests.get(f"{API_URL}/api/workers", timeout=10)
    if workers.status_code == 200:
        worker_list = workers.json()
        print(f"\n📊 Active Workers: {len(worker_list)}")
        for w in worker_list:
            print(f"  - {w.get('worker_id', 'Unknown')}: {w.get('status', 'Unknown')}")

    # Check pending jobs
    jobs = requests.get(f"{API_URL}/api/jobs/pending", timeout=10)
    if jobs.status_code == 200:
        job_list = jobs.json()
        print(f"\n🎯 Pending Jobs: {len(job_list)}")
        for job in job_list:
            print(f"  - {job.get('job_id', 'Unknown')}: {job.get('status', 'Unknown')}")

except Exception as e:
    print(f"❌ Error checking status: {e}")
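
When many workers or jobs are listed, a per-status count is easier to scan than one line per record. The following is a minimal sketch that relies only on the `status` field the cell above already reads; `count_by_status` is a hypothetical helper, and the actual status values depend on your TrainForge instance.

```python
from collections import Counter


def count_by_status(items):
    """Count API records (workers or jobs) by their 'status' field."""
    # Records without a status are grouped under "Unknown", as in the monitor cell
    return Counter(item.get("status", "Unknown") for item in items)


# Usage after the monitor cell has run:
# for status, n in count_by_status(worker_list).most_common():
#     print(f"{status}: {n}")
```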

## 🎯 Usage Tips

1. **Keep the notebook running**: The worker will stop if you close the tab
2. **Check connection**: Use the monitor cell to verify connectivity
3. **Session limits**: Colab sessions time out after 12-24 hours
4. **GPU types**: You might get T4, V100, or A100 depending on availability
5. **Multiple workers**: You can run multiple Colab instances for more GPUs

## 🚨 Troubleshooting

- **Connection failed**: Check your ngrok/localtunnel URL
- **No GPU**: Go to Runtime → Change runtime type → GPU
- **Worker not appearing**: Check the API URL and firewall settings
- **Jobs not running**: Verify the worker is registered and API is accessible

---

**Happy Training! 🚀**