This notebook demonstrates how to connect to the Ollama cloud service from Python using the `ollama` client package.

## Prerequisites
- Ollama installed and running locally
- Python `ollama` package installed
- At least one model pulled (e.g., gpt-oss:20b or gpt-oss:120b)

## Approaches Covered:
- Cloud: high-performance cloud processing (requires subscription)


## 1. Setup and Imports

**Note:** Make sure you have the required packages installed:
```bash
pip install ollama python-dotenv
```


In [1]:
# Import required libraries
import ollama
import json
import time
import os
from datetime import datetime
import warnings
from dotenv import load_dotenv
from pathlib import Path

# Load environment variables from .env file
load_dotenv()

warnings.filterwarnings('ignore')

# Load configuration from environment variables
OLLAMA_CLOUD_BASE_URL = os.getenv('OLLAMA_CLOUD_BASE_URL', 'https://ollama.com')
OLLAMA_CLOUD_MODEL = os.getenv('OLLAMA_CLOUD_MODEL', 'gpt-oss:20b')
OLLAMA_API_KEY = os.getenv('OLLAMA_API_KEY', '')
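
With the defaults above, a missing `OLLAMA_API_KEY` quietly becomes an empty string and is only noticed later. As a minimal sketch, the same three settings could be gathered into one object and checked up front. `CloudConfig` and `load_cloud_config` are hypothetical names introduced for illustration and are not part of the `ollama` package; the function accepts any mapping so it can be tried without touching the real environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CloudConfig:
    """Hypothetical container for the three settings read in this notebook."""
    base_url: str
    model: str
    api_key: str

    @property
    def auth_headers(self) -> dict:
        # Same header shape the notebook passes to ollama.Client(headers=...)
        return {'Authorization': f'Bearer {self.api_key}'}


def load_cloud_config(env: Mapping[str, str] = os.environ) -> CloudConfig:
    """Read the settings from a mapping and fail early if the key is missing."""
    api_key = env.get('OLLAMA_API_KEY', '').strip()
    if not api_key:
        raise ValueError("OLLAMA_API_KEY is not set; add it to your .env file")
    return CloudConfig(
        base_url=env.get('OLLAMA_CLOUD_BASE_URL', 'https://ollama.com'),
        model=env.get('OLLAMA_CLOUD_MODEL', 'gpt-oss:20b'),
        api_key=api_key,
    )


# A plain dict stands in for the environment; 'demo-key' is a placeholder
example = load_cloud_config({'OLLAMA_API_KEY': 'demo-key'})
print(example.base_url, example.model)
```

Calling `load_cloud_config()` with no argument would read the process environment, which `load_dotenv()` has already populated from the `.env` file.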


## 2. Test Ollama Cloud Connection


In [2]:
# Test connection to Ollama cloud service
if not OLLAMA_API_KEY:
    print("ERROR: Ollama API key not found in .env file!")
else:
    try:
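        # Configure the cloud client. Constructing it sends no request;
        # the generate() call below is the actual connectivity check.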
        cloud_client = ollama.Client(
            host=OLLAMA_CLOUD_BASE_URL,
            headers={'Authorization': f'Bearer {OLLAMA_API_KEY}'}
        )
        
        print("SUCCESS: Successfully connected to Ollama Cloud!")
        print(f"Connection time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"Connected to: {OLLAMA_CLOUD_BASE_URL}")
        print(f"Cloud Model: {OLLAMA_CLOUD_MODEL}")
        print(f"API Key: {'Set' if OLLAMA_API_KEY else 'Not set'}")
        
        # Send a short test request to confirm the model responds
        print("\nTesting cloud model availability...")
        try:
            response = cloud_client.generate(
                model=OLLAMA_CLOUD_MODEL,
                prompt="Hello",
                options={'num_predict': 10}
            )
            print(f"SUCCESS: Cloud model '{OLLAMA_CLOUD_MODEL}' is working!")
        except Exception as e:
            print(f"WARNING: Cloud model test failed: {e}")
            
    except Exception as e:
        print(f"ERROR: Failed to connect to Ollama Cloud: {e}")
        print("\nTroubleshooting tips:")
        print("1. Check your API key in the .env file")
        print("2. Verify your subscription at https://ollama.com")
        print("3. Check your internet connection")


SUCCESS: Successfully connected to Ollama Cloud!
Connection time: 2025-10-01 19:38:36
Connected to: https://ollama.com
Cloud Model: gpt-oss:20b
API Key: Set

Testing cloud model availability...
SUCCESS: Cloud model 'gpt-oss:20b' is working!


## 3. Cloud Performance

In [3]:
# Configure model for cloud usage using environment variables
MODEL_NAME = OLLAMA_CLOUD_MODEL  # Use cloud model from .env file

try:
    # Create cloud client
    cloud_client = ollama.Client(
        host=OLLAMA_CLOUD_BASE_URL,
        headers={'Authorization': f'Bearer {OLLAMA_API_KEY}'}
    )
    
    # Measure generation time using time.time()
    start_time = time.time()
    
    # Basic generation with cloud model
    response = cloud_client.generate(
        model=MODEL_NAME,
        prompt="Explain what artificial intelligence is in one sentence.",
        options={
            'temperature': 0.7,
            'top_p': 0.9,
            'num_predict': 100
        }
    )
    
    end_time = time.time()
    generation_time = end_time - start_time
    
    print("Response:")
    print(response['response'])
    print(f"\nGeneration time: {generation_time:.2f} seconds")
    print(f"Response length: {len(response['response'])} characters")
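    # Note: this divides whitespace-separated words by wall-clock time,
    # so it is only a rough proxy for true token throughput
    print(f"Tokens per second: {len(response['response'].split()) / generation_time:.1f}")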
    
except Exception as e:
    print(f"ERROR: Error with cloud model {MODEL_NAME}: {e}. Make sure your API key is set in the .env file.")




Response:
Artificial intelligence is the simulation of human intelligence processes by machines—particularly computer systems—that enable them to learn, reason, and solve problems

Generation time: 1.40 seconds
Response length: 169 characters
Tokens per second: 15.0
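
The throughput figure above divides whitespace-separated words by wall-clock time, so it counts words rather than tokens and includes network latency. The Ollama API documentation describes `eval_count` (generated tokens) and `eval_duration` (nanoseconds) on completed generate responses. Assuming the cloud response carries those fields too, a token-based rate could be sketched as follows. `tokens_per_second` is a hypothetical helper, shown here on a plain dictionary with made-up numbers.

```python
from typing import Mapping, Optional


def tokens_per_second(response: Mapping) -> Optional[float]:
    """Compute generation speed from Ollama-style timing fields, if present.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    count = response.get('eval_count')
    duration_ns = response.get('eval_duration')
    if not count or not duration_ns:
        return None  # Fields missing or zero: no rate can be computed
    return count / (duration_ns / 1e9)


# Stand-in dictionary with made-up numbers, not real measurements
sample = {'response': '...', 'eval_count': 50, 'eval_duration': 2_000_000_000}
print(tokens_per_second(sample))
```

Returning `None` instead of raising keeps the helper usable when a service omits the timing fields; the word-based estimate can then serve as a fallback.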


## 4. Streaming Responses


In [4]:
# Stream a response from the cloud model chunk by chunk
cloud_client = ollama.Client(
    host=OLLAMA_CLOUD_BASE_URL,
    headers={'Authorization': f'Bearer {OLLAMA_API_KEY}'}
)

stream = cloud_client.generate(
    model=MODEL_NAME,
    prompt="Write a short poem about programming.",
    stream=True,
    options={
        'temperature': 0.8,
        'num_predict': 150
    }
)
    
print("Streaming response:")
full_response = ""

for chunk in stream:
    if 'response' in chunk:
        print(chunk['response'], end='', flush=True)
        full_response += chunk['response']

print("\n\nSUCCESS: Streaming completed!")
# The count below is whitespace-separated words, not model tokens
print(f"Total tokens: {len(full_response.split())} words")

Streaming response:
In loops we chase the logic's dance,  
A cursor glides through endless code.  
Variables whisper, functions rhyme—  
The silent hum of keys, our ode.

SUCCESS: Streaming completed!
Total tokens: 24 words
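
The loop above prints and accumulates in one pass. If the accumulation needs to be reused or tested without a network call, it could be pulled into a small function that accepts any iterable of chunk dictionaries. This is a sketch only: `collect_stream` is a hypothetical helper, and the fake chunks merely imitate the `'response'` key the loop reads.

```python
from typing import Iterable, Mapping, Tuple


def collect_stream(chunks: Iterable[Mapping], key: str = 'response') -> Tuple[str, int]:
    """Join the text pieces of a stream and count the non-empty chunks."""
    pieces = []
    for chunk in chunks:
        text = chunk.get(key) or ''
        if text:
            pieces.append(text)
    # Joining once at the end avoids repeated string concatenation
    return ''.join(pieces), len(pieces)


# Fake chunks shaped like the dictionaries the loop above reads
fake_stream = [{'response': 'Hello'}, {'response': ', '}, {'response': 'world'}, {'done': True}]
text, n_chunks = collect_stream(fake_stream)
print(text, n_chunks)
```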


**Note:** This section demonstrates the cloud-based Ollama Turbo service, which requires:
- Ollama account subscription ($20/month)
- API key from https://ollama.com/settings/keys
- Internet connection


In [5]:
def setup_turbo_client():
    if not OLLAMA_API_KEY:
        print("ERROR: Ollama API key not found in .env file!")
        return None
    try:
        turbo_client = ollama.Client(
            host=OLLAMA_CLOUD_BASE_URL,
            headers={'Authorization': f'Bearer {OLLAMA_API_KEY}'}
        )
        print("SUCCESS: Ollama Turbo client configured successfully!")
        print(f"Cloud URL: {OLLAMA_CLOUD_BASE_URL}")
        print(f"Cloud Model: {OLLAMA_CLOUD_MODEL}")
        print(f"API Key: {'Set' if OLLAMA_API_KEY else 'Not set'}")
        return turbo_client
    except Exception as e:
        print(f"ERROR: Error setting up Turbo client: {e}")
        return None

def test_turbo_performance():
    turbo_client = setup_turbo_client()
    if not turbo_client:
        return
    
    print("\nTesting Ollama Turbo Cloud Service")
    print("=" * 50)
    
    try:
        messages = [
            {
                'role': 'user',
                'content': 'Explain quantum computing in simple terms.',
            },
        ]
        
        print(f"Streaming response from Turbo cloud ({OLLAMA_CLOUD_MODEL}):")
        start_time = time.time()
        
        for part in turbo_client.chat(OLLAMA_CLOUD_MODEL, messages=messages, stream=True):
            print(part['message']['content'], end='', flush=True)
        
        end_time = time.time()
        print(f"\n\nTotal time: {end_time - start_time:.2f} seconds")
        print("SUCCESS: Turbo cloud service test completed!")
        
    except Exception as e:
        print(f"ERROR: Turbo cloud error: {e}")

# Run the test if an API key is available; otherwise prompt the user to add one to .env
if OLLAMA_API_KEY:
    print("API key found - testing Turbo cloud service...")
    test_turbo_performance()
else:
    print("Add OLLAMA_API_KEY to your .env file")


API key found - testing Turbo cloud service...
SUCCESS: Ollama Turbo client configured successfully!
Cloud URL: https://ollama.com
Cloud Model: gpt-oss:20b
API Key: Set

Testing Ollama Turbo Cloud Service
Streaming response from Turbo cloud (gpt-oss:20b):
### What is quantum computing?

Think of a normal computer as a very fast **dumb switch** that can flip a bit from **0** to **1** (or vice‑versa). Every calculation is a sequence of those on/off switches. All the work an ordinary computer can do is done by turning the right mixture of switches on and off at the right times.

A quantum computer, on the other hand, uses *quantum bits* (**qubits**) that can be 0, 1, or a **blend of both** at the same time. This blend is called **superposition**. It’s like having a coin that can be heads, tails, *and* in between all at once. Because of this, a handful of qubits can represent many more possible states at the same moment than the same number of regular bits.

### Two key ingredients

1. **S
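
The test above reports only the total time for the streamed reply. For streaming, the delay before the first piece of text arrives is often the more noticeable number. A minimal sketch of measuring both follows; `timed_stream` and `fake_parts` are hypothetical names, and the fake generator stands in for the chat stream so the example runs without an API key.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def timed_stream(
    parts: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[str, float, float]:
    """Consume text parts; return (text, seconds to first part, total seconds)."""
    start = clock()
    first = None
    pieces = []
    for part in parts:
        if first is None:
            first = clock() - start  # Time until the first part arrived
        pieces.append(part)
    total = clock() - start
    # If nothing was streamed, report the total as the first-part time
    return ''.join(pieces), (first if first is not None else total), total


def fake_parts() -> Iterator[str]:
    """Stand-in for a streamed reply, with a small artificial delay per part."""
    for word in ('Quantum ', 'bits ', 'are ', 'qubits.'):
        time.sleep(0.01)
        yield word


text, first_s, total_s = timed_stream(fake_parts())
print(f"First part after {first_s:.3f}s, total {total_s:.3f}s")
```

With the client from this notebook, the input could be a generator expression such as `(part['message']['content'] for part in turbo_client.chat(OLLAMA_CLOUD_MODEL, messages=messages, stream=True))`.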

## 7. Output to Markdown File


In [6]:
# Markdown output functionality
class MarkdownWriter:
    """Class to write outputs to markdown files."""
    
    def __init__(self, filename="ollama_output.md"):
        # Ensure outputs directory exists
        outputs_dir = Path("outputs")
        outputs_dir.mkdir(exist_ok=True)
        
        # Set filename to outputs subfolder
        self.filename = outputs_dir / filename
        self.content = []
        self.start_time = datetime.now()
        
    def add_header(self, level, text):
        """Add a header to the markdown."""
        self.content.append(f"{'#' * level} {text}\n")
        
    def add_text(self, text):
        """Add plain text to the markdown."""
        self.content.append(f"{text}\n")
        
    def add_code_block(self, code, language="python"):
        """Add a code block to the markdown."""
        self.content.append(f"```{language}\n{code}\n```\n")
        
    def add_json_block(self, data, title="JSON Data"):
        """Add formatted JSON to the markdown."""
        self.content.append(f"### {title}\n")
        self.content.append(f"```json\n{json.dumps(data, indent=2)}\n```\n")
        
    def add_table(self, headers, rows):
        """Add a table to the markdown."""
        # Create table header
        header_row = "| " + " | ".join(headers) + " |"
        separator = "| " + " | ".join(["---"] * len(headers)) + " |"
        
        self.content.append(header_row + "\n")
        self.content.append(separator + "\n")
        
        # Add data rows
        for row in rows:
            row_str = "| " + " | ".join(str(cell) for cell in row) + " |"
            self.content.append(row_str + "\n")
        
    def add_timestamp(self):
        """Add timestamp to the markdown."""
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        self.content.append(f"\n---\n*Generated on: {timestamp}*\n")
        
    def save(self):
        """Save the markdown content to file."""
        # Build the report header separately instead of inserting it into
        # self.content, so calling save() twice does not duplicate it
        header = [
            "# Ollama Turbo Cloud Service Report\n",
            f"*Generated on: {self.start_time.strftime('%Y-%m-%d %H:%M:%S')}*\n\n",
        ]
        
        # Write to file
        with open(self.filename, 'w', encoding='utf-8') as f:
            f.writelines(header + self.content)
        
        print(f"SUCCESS: Markdown output saved to {self.filename}")
        return self.filename
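
`add_table` joins cells with ` | `, so a cell that itself contains a pipe character would split into extra columns in most Markdown renderers. A standalone sketch of the same layout with escaping is shown below. `markdown_table` is a hypothetical function, not a method of the class above, and the sample values are made up purely to show the output shape.

```python
from typing import Iterable, List, Sequence


def markdown_table(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> List[str]:
    """Build Markdown table lines, escaping characters that would break a row."""

    def clean(cell: object) -> str:
        # Escape pipes and flatten newlines so each row stays on one line
        return str(cell).replace('|', '\\|').replace('\n', ' ')

    lines = [
        "| " + " | ".join(clean(h) for h in headers) + " |",
        "| " + " | ".join(["---"] * len(headers)) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(clean(c) for c in row) + " |")
    return lines


# Made-up values purely to show the layout
for line in markdown_table(["Model", "Note"], [["gpt-oss:20b", "a|b"]]):
    print(line)
```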




In [7]:
# Example usage of MarkdownWriter with Turbo cloud service
def ymo_markdown_output():
    # Create markdown writer
    md_writer = MarkdownWriter("ollama_turbo_output.md")
    
    # Add configuration section
    md_writer.add_header(2, "Configuration")
    md_writer.add_text("Current Ollama Turbo cloud configuration loaded from .env file:")
    
    config_data = {
        "cloud_url": OLLAMA_CLOUD_BASE_URL,
        "cloud_model": OLLAMA_CLOUD_MODEL,
        "api_key_set": bool(OLLAMA_API_KEY)
    }
    
    md_writer.add_json_block(config_data, "Turbo Cloud Configuration")
    
    # Add service test section
    md_writer.add_header(2, "Turbo Cloud Service Test")
    
    if OLLAMA_API_KEY:
        try:
            # Setup Turbo client
            turbo_client = ollama.Client(
                host=OLLAMA_CLOUD_BASE_URL,
                headers={'Authorization': f'Bearer {OLLAMA_API_KEY}'}
            )
            
            md_writer.add_text("Testing Ollama Turbo Cloud Service performance...")
            
            messages = [
                {
                    'role': 'user',
                    'content': 'Explain quantum computing in simple terms.',
                },
            ]
            
            md_writer.add_text(f"**Model:** {OLLAMA_CLOUD_MODEL}")
            md_writer.add_text("**Prompt:** Explain quantum computing in simple terms.")
            md_writer.add_text("**Response:**")
            
            # Capture streaming response + time
            start_time = time.time()
            full_response = ""
            
            for part in turbo_client.chat(OLLAMA_CLOUD_MODEL, messages=messages, stream=True):
                if 'message' in part and 'content' in part['message']:
                    content = part['message']['content']
                    full_response += content
            
            end_time = time.time()
            duration = end_time - start_time
            
            # Write the captured response to the markdown report
            md_writer.add_text(full_response)
            md_writer.add_text(f"\n**Response Time:** {duration:.2f} seconds")
            md_writer.add_text(f"**Response Length:** {len(full_response)} characters")
            
            # Add performance summary
            performance_summary = {
                "model": OLLAMA_CLOUD_MODEL,
                "response_time_seconds": round(duration, 2),
                "response_length": len(full_response),
                "status": "success"
            }
            
            md_writer.add_json_block(performance_summary, "Performance Summary")
            
        except Exception as e:
            md_writer.add_text(f"Error during Turbo cloud service test: {e}")
            error_info = {
                "error": str(e),
                "status": "failed"
            }
            md_writer.add_json_block(error_info, "Error Details")
    else:
        md_writer.add_text("API key not configured - cannot test Turbo cloud service")
        md_writer.add_text("To use Ollama Turbo cloud service:")
        md_writer.add_text("1. Sign up at https://ollama.com ($20/month)")
        md_writer.add_text("2. Get API key from https://ollama.com/settings/keys")
        md_writer.add_text("3. Add OLLAMA_API_KEY to your .env file")
    
    # Add closing summary section
    md_writer.add_header(2, "Summary")
    md_writer.add_text("This report demonstrates Ollama Turbo Cloud Service integration with:")
    md_writer.add_text("- Environment configuration loading")
    md_writer.add_text("- Turbo cloud service testing")
    md_writer.add_text("- Streaming response capture")
    md_writer.add_text("- Performance metrics collection")
    md_writer.add_text("- Markdown output generation")
    
    # Save
    filename = md_writer.save()
    return filename

# Run the report generation
print("Generating Turbo cloud service markdown output...")
output_file = ymo_markdown_output()
print(f"Output saved to: {output_file}")

# Preview the first 500 characters of the saved file
if Path(output_file).exists():
    with open(output_file, 'r', encoding='utf-8') as f:
        content = f.read()
        print(f"\nPreview of {output_file}:")
        print("=" * 50)
        print(content[:500] + "..." if len(content) > 500 else content)

Generating Turbo cloud service markdown output...
SUCCESS: Markdown output saved to outputs/ollama_turbo_output.md
Output saved to: outputs/ollama_turbo_output.md

Preview of outputs/ollama_turbo_output.md:
# Ollama Turbo Cloud Service Report
*Generated on: 2025-10-01 19:38:43*

## Configuration
Current Ollama Turbo cloud configuration loaded from .env file:
### Turbo Cloud Configuration
```json
{
  "cloud_url": "https://ollama.com",
  "cloud_model": "gpt-oss:20b",
  "api_key_set": true
}
```
## Turbo Cloud Service Test
Testing Ollama Turbo Cloud Service performance...
**Model:** gpt-oss:20b
**Prompt:** Explain quantum computing in simple terms.
**Response:**
## Quantum Computing – In Plain English...


### Resources:
- [Ollama Documentation](https://ollama.ai/docs)
- [Ollama Python Package](https://github.com/ollama/ollama-python)
- [Model Library](https://ollama.ai/library)