# 🚀 Qwen Coder on Google Colab with Ollama

This notebook sets up **Qwen2.5-Coder** using **Ollama** and exposes the API via **Cloudflare Tunnel** for use with OpenCode.

## Quick Start
1. Make sure a GPU runtime is enabled (Runtime → Change runtime type → T4 GPU)
2. Run all cells in order
3. Copy the `trycloudflare.com` URL from the output
4. Use the URL in OpenCode as your API endpoint

---

## 1️⃣ Check GPU Availability

In [None]:
# Check if GPU is available
!nvidia-smi

import torch
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"\n✅ GPU Available: {gpu_name}")
    print(f"📊 VRAM: {gpu_memory:.1f} GB")
else:
    print("❌ No GPU detected! Please enable GPU in Runtime settings.")

## 2️⃣ Install Ollama

In [None]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh
print("\n✅ Ollama installed successfully!")

## 3️⃣ Install Cloudflared (for tunneling)

In [None]:
# Install Cloudflared
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb
!cloudflared --version
print("\n✅ Cloudflared installed successfully!")

## 4️⃣ Start Ollama Server

In [None]:
import os
import subprocess
import time

import requests

# Listen on all interfaces and accept requests from any origin,
# so calls arriving through the tunnel are not rejected
os.environ['OLLAMA_HOST'] = '0.0.0.0:11434'
os.environ['OLLAMA_ORIGINS'] = '*'

# Start the Ollama server in the background. Logs go to a file rather than
# an unread pipe, which could fill up and block the server.
ollama_log = open('ollama.log', 'w')
ollama_process = subprocess.Popen(
    ['ollama', 'serve'],
    stdout=ollama_log,
    stderr=subprocess.STDOUT,
    env=os.environ
)

# Poll until the server answers (up to 30 attempts, one second apart)
server_ready = False
for _ in range(30):
    try:
        response = requests.get('http://localhost:11434/api/tags', timeout=5)
        if response.status_code == 200:
            server_ready = True
            break
        print(f"⚠️ Server responded with status: {response.status_code}")
    except requests.RequestException:
        pass  # Server is not accepting connections yet
    time.sleep(1)

if server_ready:
    print("✅ Ollama server is running on port 11434!")
else:
    print("⏳ Server did not respond in time. Check ollama.log and try again.")

## 5️⃣ Download Qwen Coder Model

**Available models for T4 GPU (15GB VRAM):**
- `qwen2.5-coder:7b` - Recommended for free tier (fast)
- `qwen2.5-coder:3b` - Lighter option
- `qwen2.5-coder:1.5b` - Fastest, smallest

For larger models, use the AirLLM notebook instead.
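
As a rough sanity check on why these sizes suit a T4, the sketch below estimates memory use from parameter count. The `estimate_vram_gb` and `fits_on_t4` helpers, the 4.5 bits-per-weight figure (a guess at a typical 4-bit quantization), and the 1.5 GB overhead allowance are all assumptions for illustration, not values taken from Ollama. Actual usage depends on the quantization and the context length.

```python
# Back-of-envelope VRAM estimate. All constants are illustrative assumptions.
T4_VRAM_GB = 15  # VRAM figure quoted in the list above

def estimate_vram_gb(params_billion, bits_per_weight=4.5, overhead_gb=1.5):
    """Approximate memory: quantized weights plus a fixed allowance for
    the KV cache and runtime buffers (both assumed, not measured)."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

def fits_on_t4(params_billion):
    """True if the rough estimate stays within the T4 budget."""
    return estimate_vram_gb(params_billion) <= T4_VRAM_GB

for size in (1.5, 3, 7):
    print(f"qwen2.5-coder:{size}b -> ~{estimate_vram_gb(size):.1f} GB, "
          f"fits: {fits_on_t4(size)}")
```

Treat the numbers as an order-of-magnitude guide only; `nvidia-smi` after loading the model shows the real footprint.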

In [None]:
# Pull the Qwen Coder model (this may take a few minutes)
# Change the model name below if you want a different size

MODEL_NAME = "qwen2.5-coder:7b"  # @param ["qwen2.5-coder:7b", "qwen2.5-coder:3b", "qwen2.5-coder:1.5b"]

print(f"📥 Downloading {MODEL_NAME}...")
print("This may take 5-10 minutes depending on model size.\n")

!ollama pull {MODEL_NAME}

print(f"\n✅ Model {MODEL_NAME} downloaded successfully!")

## 6️⃣ Test the Model Locally

In [None]:
import requests
import json

# Test with a simple prompt
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': MODEL_NAME,
        'prompt': 'Write a Python function to calculate factorial',
        'stream': False
    },
    timeout=120
)

if response.status_code == 200:
    result = response.json()
    print("✅ Model test successful!\n")
    print("Response:")
    print("-" * 50)
    print(result.get('response', 'No response')[:500])
else:
    print(f"❌ Test failed with status: {response.status_code}")
    print(response.text)

## 7️⃣ Start Cloudflare Tunnel 🌐

This will create a public URL for your Ollama API.

**⚠️ Keep this cell running!** The tunnel stays active as long as this cell is executing.
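
Once the tunnel is up, it can help to confirm from your own machine that the public URL actually reaches Ollama. The sketch below assumes the same `/api/tags` endpoint that the server check in this notebook uses; `tags_url` and `model_names` are hypothetical helper names, and the URL in the usage comment is a placeholder.

```python
def tags_url(base_url):
    """Build the model-listing endpoint from the tunnel's base URL."""
    return base_url.rstrip('/') + '/api/tags'

def model_names(tags_json):
    """Extract model names from a decoded /api/tags response."""
    return [m['name'] for m in tags_json.get('models', [])]

# Usage from your own machine (placeholder URL; needs the requests package):
# import requests
# resp = requests.get(tags_url('https://YOUR-SUBDOMAIN.trycloudflare.com'), timeout=10)
# print(model_names(resp.json()))
```

If the model pulled earlier appears in the list, the tunnel and the server are both working.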

In [None]:
import subprocess
import re
import time
from IPython.display import display, HTML

print("🚀 Starting Cloudflare Tunnel...\n")

# Start cloudflared tunnel
tunnel = subprocess.Popen(
    ['cloudflared', 'tunnel', '--url', 'http://localhost:11434'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True
)

# Extract the public URL
public_url = None
for line in tunnel.stdout:
    print(line, end='')
    if 'trycloudflare.com' in line:
        match = re.search(r'https://[^\s]+\.trycloudflare\.com', line)
        if match:
            public_url = match.group()
            break

if public_url:
    # Display the URL prominently
    display(HTML(f'''
    <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); 
                padding: 20px; border-radius: 10px; margin: 20px 0;">
        <h2 style="color: white; margin: 0;">🎉 Your API is Live!</h2>
        <p style="color: #f0f0f0; font-size: 14px;">Use this URL in OpenCode or any API client:</p>
        <div style="background: rgba(255,255,255,0.2); padding: 10px; border-radius: 5px; 
                    font-family: monospace; font-size: 16px; color: white;">
            {public_url}
        </div>
        <br>
        <p style="color: #f0f0f0; font-size: 12px; margin: 0;">
            <b>OpenCode Config:</b><br>
            • Provider: Ollama<br>
            • Base URL: {public_url}<br>
            • Model: {MODEL_NAME}
        </p>
    </div>
    '''))
    
    print("\n" + "="*60)
    print("📋 COPY THIS URL FOR OPENCODE:")
    print(f"   {public_url}")
    print("="*60)
    print(f"\n🔧 Model: {MODEL_NAME}")
    print("\n⚠️ Keep this cell running! The tunnel closes when you stop it.")
    print("\n" + "-"*60)
    print("Tunnel logs:")
    
    # Keep reading output so the cell (and the tunnel) stays alive
    for line in tunnel.stdout:
        print(line, end='')
else:
    print("\n❌ No tunnel URL found. Check the cloudflared output above.")

---

## 📖 API Usage Examples

### Generate Text (Ollama Native API)
```bash
curl YOUR_URL/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python hello world",
  "stream": false
}'
```
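The same call can be made from Python, and with `"stream": true` the reply arrives as one JSON object per line instead of a single body. The sketch below shows one way to build the request body and stitch a streamed reply back together; `generate_payload` and `join_stream` are hypothetical helper names, and `YOUR_URL` in the usage comment is the same placeholder as in the curl example.

```python
import json

def generate_payload(model, prompt, stream=False):
    """Request body for POST /api/generate, mirroring the curl example."""
    return {'model': model, 'prompt': prompt, 'stream': stream}

def join_stream(lines):
    """Concatenate the 'response' fragments of a streamed reply.

    Each non-empty line is expected to hold one JSON object; reading
    stops at the object marked as done.
    """
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get('response', ''))
        if chunk.get('done'):
            break
    return ''.join(parts)

# Usage (placeholder URL; needs the requests package):
# import requests
# body = generate_payload('qwen2.5-coder:7b', 'Write a Python hello world', stream=True)
# r = requests.post(f'{YOUR_URL}/api/generate', json=body, stream=True, timeout=120)
# print(join_stream(r.iter_lines()))
```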

### Chat (Ollama Native API)
```bash
curl YOUR_URL/api/chat -d '{
  "model": "qwen2.5-coder:7b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'
```
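The chat endpoint is stateless, so a multi-turn conversation means sending the full message history with every request. A minimal sketch of that bookkeeping, assuming the non-streamed response carries the reply under a `message` key with `role` and `content` fields; `chat_payload` and `record_reply` are hypothetical helper names.

```python
def chat_payload(model, messages, stream=False):
    """Request body for POST /api/chat, mirroring the curl example."""
    return {'model': model, 'messages': list(messages), 'stream': stream}

def record_reply(messages, chat_json):
    """Append the assistant message from a non-streamed /api/chat response
    so the next request carries the whole conversation."""
    reply = chat_json['message']
    messages.append({'role': reply['role'], 'content': reply['content']})
    return reply['content']

# Usage (placeholder URL; needs the requests package):
# import requests
# history = [{'role': 'user', 'content': 'Hello!'}]
# r = requests.post(f'{YOUR_URL}/api/chat',
#                   json=chat_payload('qwen2.5-coder:7b', history), timeout=120)
# print(record_reply(history, r.json()))
# history.append({'role': 'user', 'content': 'Now write a unit test for it.'})
```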

### OpenAI-Compatible (v1 API)
```bash
curl YOUR_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "qwen2.5-coder:7b",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
```
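Responses from the v1 endpoint follow the OpenAI chat-completion layout, which differs from the native API: the text sits under `choices` rather than `message` or `response`. The sketch below shows where to find it; `completions_url` and `first_reply` are hypothetical helper names, and `YOUR_URL` is the same placeholder as above.

```python
def completions_url(base_url):
    """Build the OpenAI-compatible chat endpoint from the tunnel's base URL."""
    return base_url.rstrip('/') + '/v1/chat/completions'

def first_reply(completion_json):
    """Text of the first choice in an OpenAI-style chat completion."""
    return completion_json['choices'][0]['message']['content']

# Usage (placeholder URL; needs the requests package):
# import requests
# body = {'model': 'qwen2.5-coder:7b',
#         'messages': [{'role': 'user', 'content': 'Hello!'}]}
# r = requests.post(completions_url(YOUR_URL), json=body, timeout=120)
# print(first_reply(r.json()))
```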