# Ollama Setup on Google Colab with Cloudflare Tunnel

This notebook will:
1. Install Ollama on Google Colab
2. Set up Cloudflare Tunnel for public access
3. Download and run large language models

**Important Notes:**
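- This requires a Colab runtime with enough RAM and disk space for the chosen model; a GPU runtime is strongly recommended
- Models are several gigabytes each and take time to download
- The Cloudflare tunnel stays active only while the Colab runtime and the `cloudflared` process are running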
- Free Colab has runtime limits; consider Colab Pro for longer sessions

## 1. System Information and Prerequisites

In [None]:
# Check system information
!echo "=== System Information ==="
!uname -a
!printf "\n=== GPU Information ===\n"
!nvidia-smi 2>/dev/null || echo "No GPU detected (CPU mode will be slower)"
!printf "\n=== Memory Information ===\n"
!free -h
!printf "\n=== Disk Space ===\n"
!df -h /

## 2. Install Ollama

In [None]:
import os
import subprocess
import time
import requests
from pathlib import Path

print("📦 Installing Ollama...")
print("=" * 80)

# Download and install Ollama
!curl -fsSL https://ollama.com/install.sh | sh

print("\n✅ Ollama installation completed!")
print("=" * 80)

## 3. Start Ollama Server

In [None]:
import subprocess
import time
import requests

print("🚀 Starting Ollama server...")
print("=" * 80)

# Start Ollama server in background.
# Output goes to a log file: an unread subprocess.PIPE can fill up and
# stall a long-running server. The file stays open while the server runs.
ollama_log = open('/tmp/ollama_serve.log', 'w')
ollama_process = subprocess.Popen(
    ['ollama', 'serve'],
    stdout=ollama_log,
    stderr=subprocess.STDOUT
)

# Wait for server to start
print("Waiting for Ollama server to start...")
max_retries = 30
retry_count = 0
server_ready = False

while retry_count < max_retries:
    try:
        response = requests.get('http://localhost:11434/api/tags', timeout=1)
        if response.status_code == 200:
            print("✅ Ollama server is running!")
            server_ready = True
            break
    except requests.exceptions.RequestException:
        pass  # Server is not accepting connections yet
    
    time.sleep(1)
    retry_count += 1
    print(f"Attempt {retry_count}/{max_retries}...", end='\r')

if not server_ready:
    print("\n❌ Failed to start Ollama server (see /tmp/ollama_serve.log)")
else:
    print("\n✅ Server started successfully on http://localhost:11434")
    print("=" * 80)
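
The wait loop above is an instance of a general poll-until-ready pattern. A minimal sketch, assuming a hypothetical helper named `wait_until_ready` (not part of Ollama or this notebook) that accepts any zero-argument check function:

```python
import time


def wait_until_ready(check, max_retries=30, delay=1.0, sleep=time.sleep):
    """Call check() until it returns True or max_retries is reached.

    Hypothetical helper for illustration. check is any zero-argument
    callable; exceptions it raises count as "not ready yet".
    Returns the number of attempts used, or None if it never succeeded.
    """
    for attempt in range(1, max_retries + 1):
        try:
            if check():
                return attempt
        except Exception:
            pass  # Treat errors (e.g. connection refused) as not ready
        sleep(delay)
    return None


# Demonstration with a fake check that succeeds on its third call
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


attempts = wait_until_ready(fake_check, delay=0, sleep=lambda seconds: None)
print(attempts)
```

In this notebook, `check` could be a function that requests `http://localhost:11434/api/tags` and returns whether the status code is 200.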

## 4. Install Cloudflare Tunnel (cloudflared)

In [None]:
print("☁️  Installing Cloudflare Tunnel...")
print("=" * 80)

# Download and install cloudflared
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb

# Verify installation
!cloudflared --version

print("\n✅ Cloudflare Tunnel installed successfully!")
print("=" * 80)

## 5. Start Cloudflare Tunnel to Expose Ollama

In [None]:
import subprocess
import threading
import time
import re
from IPython.display import display, HTML

print("🌐 Starting Cloudflare Tunnel...")
print("=" * 80)
print("This will create a public URL to access your Ollama instance.")
print("The URL will be displayed below when ready.\n")

# Start cloudflared tunnel.
# --http-host-header makes requests reach Ollama with a local Host header;
# without it, Ollama may reject requests addressed to the public hostname.
tunnel_process = subprocess.Popen(
    ['cloudflared', 'tunnel', '--url', 'http://localhost:11434',
     '--http-host-header=localhost:11434'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
    bufsize=1
)

# Extract the public URL
public_url = None
print("Waiting for tunnel to establish...")

for line in iter(tunnel_process.stdout.readline, ''):
    print(line.strip())
    
    # Look for the trycloudflare.com URL
    if 'trycloudflare.com' in line:
        match = re.search(r'https://[\w-]+\.trycloudflare\.com', line)
        if match:
            public_url = match.group(0)
            break

# Keep reading cloudflared's output in a background thread so the pipe
# never fills up and blocks the tunnel process.
def _drain_output(stream):
    for _ in stream:
        pass

threading.Thread(target=_drain_output, args=(tunnel_process.stdout,), daemon=True).start()

if public_url:
    print("\n" + "=" * 80)
    print("✅ TUNNEL ACTIVE!")
    print("=" * 80)
    print(f"\n🌍 Public URL: {public_url}")
    print(f"\n📝 API Endpoint: {public_url}/api")
    print("\n🔗 Use this URL to connect to your Ollama instance from anywhere!")
    print("\n⚠️  The tunnel runs in the background until the runtime stops or cloudflared exits.")
    print("=" * 80)
    
    # Display clickable link
    display(HTML(f'<h2>Your Public Ollama URL:</h2><a href="{public_url}" target="_blank" style="font-size:20px; color:blue;">{public_url}</a>'))
    
    # Store URL for later use
    with open('/tmp/ollama_url.txt', 'w') as f:
        f.write(public_url)
else:
    print("\n❌ Failed to get public URL from cloudflared")
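
The URL extraction in the cell above can be isolated into a small function and tried on sample text. A minimal sketch: `extract_tunnel_url` is a hypothetical name, and the sample log lines are made up for illustration rather than copied from real `cloudflared` output.

```python
import re

# Same pattern as in the cell above
TUNNEL_URL_PATTERN = re.compile(r'https://[\w-]+\.trycloudflare\.com')


def extract_tunnel_url(line):
    """Return the first trycloudflare.com URL found in line, or None."""
    match = TUNNEL_URL_PATTERN.search(line)
    return match.group(0) if match else None


# Made-up sample lines (illustrative only)
sample_lines = [
    "INF Requesting new quick Tunnel on trycloudflare.com...",
    "INF |  https://example-words-here.trycloudflare.com  |",
]

for sample in sample_lines:
    print(extract_tunnel_url(sample))
```

The first sample mentions the domain without a full URL, so only the second one yields a match.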

## 6. Download Models

Choose which model to download. Available options include:
- `llama3:8b` - Meta's Llama 3 8B model (~4.7GB)
- `llama2:13b` - Meta's Llama 2 13B model (~7.4GB); Llama 3 has no 13B variant
- `mistral:7b` - Mistral 7B model (~4.1GB)
- `codellama:13b` - Code Llama 13B (~7.4GB)
- `qwen2.5:14b` - Qwen 2.5 14B model (~9GB)

**Note:** Models are large! Download time depends on your connection speed.
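
To get a feel for how download time scales with model size, here is a back-of-the-envelope sketch. The sizes come from the list above; the connection speed is an assumption, and `estimate_download_minutes` is a hypothetical helper.

```python
def estimate_download_minutes(size_gb, speed_mbps):
    """Estimate download time in minutes.

    size_gb: model size in gigabytes (1 GB taken as 1000 MB)
    speed_mbps: sustained download speed in megabits per second (assumed)
    """
    size_megabits = size_gb * 1000 * 8
    return size_megabits / speed_mbps / 60


# Approximate sizes from the list above
MODEL_SIZES_GB = {
    "llama3:8b": 4.7,
    "mistral:7b": 4.1,
    "codellama:13b": 7.4,
    "qwen2.5:14b": 9.0,
}

ASSUMED_SPEED_MBPS = 200  # Assumption; measure your own runtime's speed

for name, size_gb in MODEL_SIZES_GB.items():
    minutes = estimate_download_minutes(size_gb, ASSUMED_SPEED_MBPS)
    print(f"{name}: about {minutes:.1f} min at {ASSUMED_SPEED_MBPS} Mbit/s")
```

Real downloads also depend on registry throughput and disk write speed, so treat these numbers as rough lower bounds.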

In [None]:
import subprocess
import time

# Configure which model to download
# Change this to your preferred model
MODEL_NAME = "llama3:8b"  # Must be a tag that exists in the Ollama library

print(f"📥 Downloading model: {MODEL_NAME}")
print("=" * 80)
print("⏳ This may take 10-30 minutes depending on the model size...")
print("")

try:
    # Pull the model
    process = subprocess.Popen(
        ['ollama', 'pull', MODEL_NAME],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1
    )
    
    # Stream output
    for line in iter(process.stdout.readline, ''):
        if line:
            print(line.strip())
    
    process.wait()
    
    if process.returncode == 0:
        print("\n" + "=" * 80)
        print(f"✅ Model '{MODEL_NAME}' downloaded successfully!")
        print("=" * 80)
    else:
        print(f"\n❌ Failed to download model '{MODEL_NAME}'")
        
except Exception as e:
    print(f"\n❌ Error downloading model: {str(e)}")

## 7. List Available Models

In [None]:
print("📋 Available Models:")
print("=" * 80)

!ollama list

print("=" * 80)

## 8. Test the Model Locally

In [None]:
import requests
import json

def chat_with_ollama(prompt, model="llama3:8b", stream=False):
    """
    Send a chat request to the local Ollama instance
    """
    url = "http://localhost:11434/api/chat"
    
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "stream": stream
    }
    
    print(f"🤖 Asking {model}: {prompt}")
    print("=" * 80)
    print("Response:")
    print()
    
    try:
        response = requests.post(url, json=payload, stream=stream)
        response.raise_for_status()  # Surface HTTP errors such as an unknown model
        
        if stream:
            full_response = ""
            for line in response.iter_lines():
                if line:
                    data = json.loads(line)
                    if 'message' in data:
                        content = data['message'].get('content', '')
                        print(content, end='', flush=True)
                        full_response += content
            print()  # New line at the end
            return full_response
        else:
            data = response.json()
            if 'message' in data:
                content = data['message'].get('content', '')
                print(content)
                return content
    
    except Exception as e:
        print(f"❌ Error: {str(e)}")
        return None
    
    finally:
        print()
        print("=" * 80)

# Test with a simple prompt
chat_with_ollama(
    "Explain what Ollama is in 2 sentences.",
    model=MODEL_NAME,
    stream=True
)
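
With `stream=True`, the function above reads the response as one JSON object per line and joins the `content` fragments. A minimal sketch of that parsing step on its own, using hand-written sample lines rather than a live server; `join_stream_content` is a hypothetical name.

```python
import json


def join_stream_content(lines):
    """Concatenate message content from an iterable of JSON lines.

    Each non-empty line is expected to be a JSON object; objects without
    a 'message' key are skipped, mirroring the loop in the cell above.
    """
    parts = []
    for line in lines:
        if not line:
            continue  # Skip empty lines between chunks
        data = json.loads(line)
        if 'message' in data:
            parts.append(data['message'].get('content', ''))
    return ''.join(parts)


# Hand-written sample chunks (illustrative, not captured from a server)
sample_chunks = [
    b'{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    b'',
    b'{"message": {"role": "assistant", "content": "lo!"}, "done": false}',
    b'{"done": true}',
]

print(join_stream_content(sample_chunks))
```

Keeping the parsing separate from the HTTP call makes it possible to test the logic without a running model.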

## 9. Test Public URL Access

In [None]:
import requests
import json
from pathlib import Path

# Read the public URL
try:
    with open('/tmp/ollama_url.txt', 'r') as f:
        public_url = f.read().strip()
    
    print(f"🌐 Testing public access via: {public_url}")
    print("=" * 80)
    
    # Test the /api/tags endpoint
    response = requests.get(f"{public_url}/api/tags")
    
    if response.status_code == 200:
        print("✅ Public URL is accessible!")
        print("\nAvailable models via public URL:")
        data = response.json()
        if 'models' in data:
            for model in data['models']:
                print(f"  - {model.get('name', 'Unknown')}")
    else:
        print(f"❌ Failed to access public URL (Status: {response.status_code})")
    
    print("\n" + "=" * 80)
    print("📝 Example API Usage:")
    print("=" * 80)
    print(f"""
# Python example:
import requests

url = "{public_url}/api/chat"
payload = {{
    "model": "{MODEL_NAME}",
    "messages": [
        {{"role": "user", "content": "Hello!"}}
    ],
    "stream": False
}}

response = requests.post(url, json=payload)
print(response.json())

# cURL example:
curl {public_url}/api/chat -d '{{
  "model": "{MODEL_NAME}",
  "messages": [
    {{"role": "user", "content": "Hello!"}}
  ]
}}'
    """)
    
except FileNotFoundError:
    print("❌ Public URL not found. Make sure the Cloudflare tunnel is running.")
except Exception as e:
    print(f"❌ Error testing public URL: {str(e)}")

## 10. Interactive Chat Interface

In [None]:
import requests
import json

def interactive_chat(model=MODEL_NAME, use_public_url=False):
    """
    Simple interactive chat interface
    """
    # Determine which URL to use
    if use_public_url:
        try:
            with open('/tmp/ollama_url.txt', 'r') as f:
                base_url = f.read().strip()
        except OSError:
            print("❌ Public URL not available, using localhost")
            base_url = "http://localhost:11434"
    else:
        base_url = "http://localhost:11434"
    
    url = f"{base_url}/api/chat"
    conversation_history = []
    
    print("🤖 Interactive Chat with Ollama")
    print("=" * 80)
    print(f"Model: {model}")
    print(f"Endpoint: {base_url}")
    print("\nType your message and press Enter. Type 'quit' to exit.")
    print("=" * 80)
    print()
    
    while True:
        # Get user input
        user_input = input("You: ").strip()
        
        if user_input.lower() in ['quit', 'exit', 'bye']:
            print("\n👋 Goodbye!")
            break
        
        if not user_input:
            continue
        
        # Add user message to history
        conversation_history.append({"role": "user", "content": user_input})
        
        # Send request
        payload = {
            "model": model,
            "messages": conversation_history,
            "stream": True
        }
        
        try:
            print("\nAssistant: ", end="", flush=True)
            response = requests.post(url, json=payload, stream=True)
            
            full_response = ""
            for line in response.iter_lines():
                if line:
                    data = json.loads(line)
                    if 'message' in data:
                        content = data['message'].get('content', '')
                        print(content, end='', flush=True)
                        full_response += content
            
            print()  # New line
            print()
            
            # Add assistant response to history
            conversation_history.append({"role": "assistant", "content": full_response})
            
        except Exception as e:
            print(f"\n❌ Error: {str(e)}")
            print()

# Start interactive chat
interactive_chat(model=MODEL_NAME, use_public_url=False)

## 11. Utility Functions

In [None]:
import requests
import json

def show_status():
    """Display current status of Ollama and tunnel"""
    print("📊 Status Report")
    print("=" * 80)
    
    # Check Ollama server
    try:
        response = requests.get('http://localhost:11434/api/tags', timeout=2)
        if response.status_code == 200:
            print("✅ Ollama server: RUNNING")
            data = response.json()
            models = data.get('models', [])
            print(f"   Models installed: {len(models)}")
        else:
            print("❌ Ollama server: NOT RESPONDING")
    except requests.exceptions.RequestException:
        print("❌ Ollama server: OFFLINE")
    
    # Check public URL
    try:
        with open('/tmp/ollama_url.txt', 'r') as f:
            public_url = f.read().strip()
        print(f"✅ Public URL: {public_url}")
        
        # Test public access
        response = requests.get(f"{public_url}/api/tags", timeout=5)
        if response.status_code == 200:
            print("   Status: ACCESSIBLE")
        else:
            print("   Status: NOT ACCESSIBLE")
    except FileNotFoundError:
        print("❌ Public URL: NOT CONFIGURED")
    except Exception as e:
        print(f"⚠️  Public URL: ERROR ({str(e)})")
    
    print("=" * 80)

def download_additional_model(model_name):
    """Download an additional model"""
    print(f"📥 Downloading {model_name}...")
    !ollama pull {model_name}
    print(f"\n✅ Model {model_name} downloaded!")

def get_public_url():
    """Get the current public URL"""
    try:
        with open('/tmp/ollama_url.txt', 'r') as f:
            return f.read().strip()
    except:
        return None

# Run status check
show_status()

## 12. Keep-Alive Cell

Run this cell to check once a minute that both Ollama and the Cloudflare tunnel still respond.
A long-running cell keeps the runtime busy, but it cannot extend Colab's session limits.

In [None]:
import time
import requests
from datetime import datetime

print("🔄 Keep-Alive Monitor")
print("=" * 80)
print("This cell will run indefinitely to keep services active.")
print("Press the Stop button to interrupt.\n")

try:
    counter = 0
    while True:
        counter += 1
        current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        
        # Check Ollama
        try:
            requests.get('http://localhost:11434/api/tags', timeout=2)
            ollama_status = "✅"
        except Exception:  # A bare except would also swallow the Stop button's KeyboardInterrupt
            ollama_status = "❌"
        
        # Check tunnel
        try:
            with open('/tmp/ollama_url.txt', 'r') as f:
                public_url = f.read().strip()
            requests.get(f"{public_url}/api/tags", timeout=5)
            tunnel_status = "✅"
        except Exception:
            tunnel_status = "❌"
        
        print(f"[{current_time}] Check #{counter} - Ollama: {ollama_status} | Tunnel: {tunnel_status}", end='\r')
        
        time.sleep(60)  # Check every minute
        
except KeyboardInterrupt:
    print("\n\n⏹️  Keep-alive monitor stopped.")
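
The monitor above checks the services over HTTP; a complementary check is whether the background processes themselves are still alive. A minimal sketch, assuming the `Popen` objects from the earlier cells are still in scope; `describe_process` is a hypothetical helper.

```python
import subprocess
import sys


def describe_process(proc):
    """Return 'running' or 'exited (code N)' for a subprocess.Popen object."""
    code = proc.poll()  # None while the process is still running
    if code is None:
        return "running"
    return f"exited (code {code})"


# Demonstration with a short-lived child process instead of ollama/cloudflared
demo = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
demo.wait()
print(describe_process(demo))
```

In this notebook, `describe_process(ollama_process)` and `describe_process(tunnel_process)` would report on the two background services.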

## 📚 Additional Information

### Available Models

You can download additional models using:
```python
!ollama pull <model-name>
```

Popular models:
- `llama3:8b` - Llama 3 8B
- `llama2:13b` - Llama 2 13B
- `llama3:70b` - Llama 3 70B (requires significant resources)
- `mistral:7b` - Mistral 7B
- `codellama:13b` - Code Llama 13B
- `qwen2.5:14b` - Qwen 2.5 14B

### API Endpoints

- **Chat**: `POST /api/chat`
- **Generate**: `POST /api/generate`
- **List Models**: `GET /api/tags`
- **Pull Model**: `POST /api/pull`
- **Delete Model**: `DELETE /api/delete`
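
The endpoints above are plain HTTP calls with JSON bodies. A minimal sketch of building a non-streaming `POST /api/generate` request with only the standard library; `build_generate_request` is a hypothetical helper, and the base URL and model name are placeholders.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build (but do not send) a POST /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("http://localhost:11434", "llama3:8b", "Hello!")
print(request.get_method(), request.full_url)

# To actually send it (requires a running Ollama server):
# with urllib.request.urlopen(request, timeout=120) as response:
#     print(json.loads(response.read())["response"])
```

Setting `"stream": False` asks for a single JSON object instead of a line-by-line stream, which keeps the client code simple.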

### Important Notes

1. **Resource Usage**: Large models require significant RAM and GPU memory
2. **Session Limits**: Free Colab sessions have time limits (~12 hours)
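3. **Tunnel URL**: The Cloudflare tunnel URL changes each time the tunnel is restarted
4. **Security**: The public URL has no authentication, so anyone who has it can call every API endpoint, including pull and delete; share it carefully and avoid sending sensitive data through it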

### Troubleshooting

- **Out of Memory**: Try using smaller models (7B or 8B instead of 13B+)
- **Slow Performance**: Ensure GPU is enabled in Colab settings
- **Tunnel Issues**: Restart the Cloudflare tunnel cell
- **Model Download Fails**: Check internet connection and disk space
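
For the disk-space check mentioned above, a minimal sketch using the standard library. The helper names and the 2 GB headroom are assumptions chosen for illustration.

```python
import shutil


def free_disk_gb(path="/"):
    """Return free disk space at path in gigabytes (1 GB = 10**9 bytes)."""
    return shutil.disk_usage(path).free / 10**9


def has_room_for(model_size_gb, free_gb, headroom_gb=2.0):
    """True if free_gb covers the model plus some headroom.

    headroom_gb is an arbitrary safety margin chosen for this sketch.
    """
    return free_gb >= model_size_gb + headroom_gb


free = free_disk_gb("/")
print(f"Free disk space: {free:.1f} GB")
print("Room for a ~4.7 GB model:", has_room_for(4.7, free))
```

Running this before `ollama pull` gives an early warning instead of a failure partway through a long download.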

### Documentation

- Ollama: https://github.com/ollama/ollama
- Ollama API: https://github.com/ollama/ollama/blob/main/docs/api.md
- Cloudflare Tunnel: https://developers.cloudflare.com/cloudflare-one/connections/connect-apps