# Ollama API Demo

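This notebook demonstrates how to use a local API server that puts an Anthropic-style interface in front of Ollama. Requests accept a top-level `system` field as in Anthropic's API; note that the `/v1/chat/completions` path and the `choices[0].message.content` response shape used below follow the OpenAI chat-completions convention.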

**Prerequisites:**
1. Start the API server: `python api_server.py`
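2. Make sure Ollama is running and has the `phi4:latest` model available. The chat examples below request `granite3.3:2b`, so that model needs to be available too.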

## Setup

Import the required libraries:

In [49]:
import requests
import json
import time

## 1. Check Server Health

First, let's verify the server is running:

In [50]:
# Check if server is running
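try:
    # A timeout keeps the cell from hanging if the server never answers
    response = requests.get("http://localhost:8000/health", timeout=5)
    response.raise_for_status()  # Treat HTTP error statuses as failures
    print(f"✅ Server is running: {response.json()}")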
except requests.exceptions.ConnectionError:
    print("❌ Server is not running. Please start it with: python api_server.py")
except Exception as e:
    print(f"❌ Error: {e}")

✅ Server is running: {'status': 'healthy', 'model': 'phi4:latest'}
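
If the server was started a moment ago, a single health check can fail simply because the process is still loading. A minimal sketch of a retry loop follows. The names `wait_until_ready` and `health_probe` are hypothetical, and the check on `status == "healthy"` assumes the response shape shown in the output above.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], attempts: int = 5, delay: float = 1.0) -> bool:
    """Call `probe` until it returns True or `attempts` calls have been made."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            # Any probe failure (e.g. connection refused) counts as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def health_probe(base_url: str = "http://localhost:8000") -> bool:
    """Return True if GET /health answers with status 'healthy' (assumed shape)."""
    import requests  # imported lazily so the retry helper itself needs only the stdlib

    response = requests.get(f"{base_url}/health", timeout=2)
    return response.ok and response.json().get("status") == "healthy"
```

Calling `wait_until_ready(health_probe)` before the remaining cells would then block for a few seconds at most and return `False` if the server never came up.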


## Check Available Models

Next, let's see which models the server exposes:

In [51]:
# List available models
models_response = requests.get("http://localhost:8000/v1/models")
models = models_response.json()

print("🔧 Available Models:")
for model in models['data']:
    print(f"  • {model['id']} (owned by: {model['owned_by']})")

🔧 Available Models:
  • phi4:latest (owned by: ollama)
  • gemma3:latest (owned by: ollama)
  • phi4-mini-reasoning:latest (owned by: ollama)
  • smollm:latest (owned by: ollama)
  • granite3.3:2b (owned by: ollama)
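
The later cells hard-code `granite3.3:2b`, which fails on a machine that has not pulled that model. As a sketch, the model could be chosen from the list the server returns. `model_ids` and `pick_model` are hypothetical helper names, and the `{"data": [{"id": ...}]}` shape is taken from the cell above.

```python
from typing import List


def model_ids(models: dict) -> List[str]:
    """Return the model identifiers from a /v1/models response body."""
    return [entry["id"] for entry in models.get("data", [])]


def pick_model(models: dict, preferred: List[str]) -> str:
    """Return the first preferred model the server offers, else the first one listed."""
    available = model_ids(models)
    for name in preferred:
        if name in available:
            return name
    if not available:
        raise ValueError("the server reported no models")
    return available[0]


# Sample data in the same shape as the response printed above.
sample = {
    "data": [
        {"id": "phi4:latest", "owned_by": "ollama"},
        {"id": "granite3.3:2b", "owned_by": "ollama"},
    ]
}
print(pick_model(sample, ["granite3.3:2b", "phi4:latest"]))  # granite3.3:2b
```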


## 2. Simple Non-Streaming Chat

Let's start with a basic, non-streaming request:

In [52]:
# Simple non-streaming request
url = "http://localhost:8000/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"}
    ],
    "model": "granite3.3:2b",
    "stream": False
}

response = requests.post(url, json=payload)
result = response.json()

print("📝 Question: What is 2 + 2?")
print(f"🤖 Answer: {result['choices'][0]['message']['content']}")

📝 Question: What is 2 + 2?
🤖 Answer: The sum of 2 and 2 is 4.
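
The cell above indexes straight into `result['choices'][0]['message']['content']`, which raises a bare `KeyError` or `IndexError` if the server returns an error body instead. A small sketch of a more defensive extractor is shown here. `extract_answer` is a hypothetical helper, and the error shapes the server actually returns are not documented in this notebook.

```python
def extract_answer(result: dict) -> str:
    """Return the text of the first choice, or raise ValueError with context."""
    choices = result.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in response: {result!r}")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content


# Sample data in the same shape as the response used above.
sample = {"choices": [{"message": {"role": "assistant", "content": "The sum of 2 and 2 is 4."}}]}
print(extract_answer(sample))  # The sum of 2 and 2 is 4.
```

In the notebook this would pair with `response.raise_for_status()` before `response.json()`, so that HTTP-level failures surface before the body is parsed.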


## 3. Using a System Prompt

Now let's use a system prompt to customize the model's behavior:

In [53]:
# Request with system prompt
payload = {
    "messages": [
        {"role": "user", "content": "Tell me about the weather"}
    ],
    "system": "You are a pirate captain. Respond in pirate speak with 'Arrr' and nautical terms.",
    "model": "granite3.3:2b",
    "stream": False
}

response = requests.post(url, json=payload)
result = response.json()

print("🏴‍☠️ System Prompt: You are a pirate captain...")
print("📝 Question: Tell me about the weather")
print(f"🤖 Pirate Answer: {result['choices'][0]['message']['content']}")

🏴‍☠️ System Prompt: You are a pirate captain...
📝 Question: Tell me about the weather
🤖 Pirate Answer: Arrr, matey! Been fair clear skies, not a cloud to be seen. Winds light and steady from the east, perfect for a sailin' voyage. Tide's gonna rise soon, so best set sail before the high tide, if ye ask me.
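
Each request in this notebook repeats the same payload structure, differing only in the question, the optional `system` field, and the `stream` flag. A sketch of a small builder that captures this is below. `build_payload` is a hypothetical name, and the default model is simply the one used in these examples.

```python
from typing import Optional


def build_payload(
    question: str,
    model: str = "granite3.3:2b",
    system: Optional[str] = None,
    stream: bool = False,
) -> dict:
    """Build a single-turn request body in the shape used throughout this notebook."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "model": model,
        "stream": stream,
    }
    if system:
        # The system prompt is a top-level field, not a message with role "system".
        payload["system"] = system
    return payload


pirate = build_payload(
    "Tell me about the weather",
    system="You are a pirate captain. Respond in pirate speak with 'Arrr' and nautical terms.",
)
print(sorted(pirate))  # ['messages', 'model', 'stream', 'system']
```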


## 4. Streaming Response

Now let's try a streaming response to see the text appear in real time:

In [54]:
# Streaming request
payload = {
    "messages": [
        {"role": "user", "content": "Write a short poem about coding"}
    ],
    "system": "You are a creative poet who loves technology.",
    "model": "granite3.3:2b",
    "stream": True
}

print("📝 Question: Write a short poem about coding")
print("🤖 Streaming Answer:")
print("-" * 50)

response = requests.post(url, json=payload, stream=True)

full_response = ""
for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data_str = line_str[6:]  # Remove 'data: ' prefix
            if data_str == '[DONE]':
                break
            
            try:
                data = json.loads(data_str)
                if 'choices' in data and len(data['choices']) > 0:
                    delta = data['choices'][0].get('delta', {})
                    if 'content' in delta:
                        content = delta['content']
                        print(content, end='', flush=True)
                        full_response += content
                        time.sleep(0.02)  # Small delay to see streaming effect
            except json.JSONDecodeError:
                continue

print("\n" + "-" * 50)
print("✅ Streaming complete!")

📝 Question: Write a short poem about coding
🤖 Streaming Answer:
--------------------------------------------------
In silence, a symphony, lines of code aline, Binary whispers, secrets hidden divine. The screen, a canvas, where ideas take flight, Guided by logic, in the day or in the night. A dance of algorithms, swift and precise, In this digital world, they intertwine and blend with trice. A poet's muse, a sculptor's chisel too, Crafting wonders, in this endless array we pursue. Binary stars shining bright in the data sea, An infinite universe, where ideas flee. Programming, poetry, together entwined, In crafting realms unseen, a poet's mind.
--------------------------------------------------
✅ Streaming complete!
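
The parsing loop above can be factored into a generator, which separates the wire format from the printing and makes the parser testable without a running server. This is a sketch under the same assumptions as the cell above: lines prefixed with `data: `, a `[DONE]` sentinel, and text under `choices[0].delta.content`. `iter_content` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an iterable of raw server-sent-event lines."""
    prefix = "data: "
    for raw in lines:
        if not raw:
            continue  # blank lines separate events
        text = raw.decode("utf-8")
        if not text.startswith(prefix):
            continue
        data_str = text[len(prefix):]
        if data_str == "[DONE]":
            return  # end-of-stream sentinel
        try:
            data = json.loads(data_str)
        except json.JSONDecodeError:
            continue  # skip malformed chunks, as the original loop does
        if not isinstance(data, dict):
            continue
        choices = data.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                yield content


# Hand-written sample lines in the assumed format, not captured from the server.
fake_lines = [
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b"",
    b"data: not json",
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b"data: [DONE]",
]
print("".join(iter_content(fake_lines)))  # Hello
```

With a live response, the same generator would be driven by `iter_content(response.iter_lines())`, printing each fragment with `end=''` and `flush=True` as above.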


## 5. Multi-turn Conversation

Let's simulate a conversation with multiple messages:

In [55]:
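# Multi-turn conversation: the earlier turns are sent back with the new question.
# This assumes the server forwards assistant-role messages to the model.
payload = {
    "messages": [
        {"role": "user", "content": "My name is Alice"},
        {"role": "assistant", "content": "Hello Alice! Nice to meet you."},
        {"role": "user", "content": "What's my name?"}
    ],
    "system": "You are a helpful assistant with good memory.",
    "model": "granite3.3:2b",
    "stream": False
}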

response = requests.post(url, json=payload)
result = response.json()

print("💬 Conversation:")
print("User: My name is Alice")
print("Assistant: Hello Alice! Nice to meet you.")
print("User: What's my name?")
print(f"🤖 Assistant: {result['choices'][0]['message']['content']}")

💬 Conversation:
User: My name is Alice
Assistant: Hello Alice! Nice to meet you.
User: What's my name?
🤖 Assistant: Ah,Alice! It's a pleasure to meet you. How can I assist you today, Alice?
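
Chat endpoints of this kind are typically stateless, so the model only "remembers" what is resent in `messages` on each request. A minimal sketch of a history holder that accumulates turns is shown here. The `Conversation` class and its methods are hypothetical, and it assumes the server accepts `assistant`-role messages, which this notebook does not confirm.

```python
from typing import List, Optional


class Conversation:
    """Accumulates user and assistant turns and builds request payloads from them."""

    def __init__(self, system: Optional[str] = None, model: str = "granite3.3:2b"):
        self.system = system
        self.model = model
        self.messages: List[dict] = []

    def payload_for(self, user_text: str) -> dict:
        """Append a user turn and return a payload containing the whole history."""
        self.messages.append({"role": "user", "content": user_text})
        payload = {"messages": list(self.messages), "model": self.model, "stream": False}
        if self.system:
            payload["system"] = self.system
        return payload

    def record_reply(self, reply: str) -> None:
        """Store the assistant's answer so the next request includes it."""
        self.messages.append({"role": "assistant", "content": reply})


chat = Conversation(system="You are a helpful assistant with good memory.")
first = chat.payload_for("My name is Alice")
chat.record_reply("Hello Alice! Nice to meet you.")  # would normally come from the server
second = chat.payload_for("What's my name?")
print([m["role"] for m in second["messages"]])  # ['user', 'assistant', 'user']
```

Each payload would be posted to the same `url` as before, with the returned content passed to `record_reply` before the next turn.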


## Summary

This notebook demonstrated:

1. ✅ **Health Check** - Verifying the server is running
2. ✅ **Simple Chat** - Basic non-streaming request
3. ✅ **System Prompts** - Customizing model behavior
4. ✅ **Streaming** - Real-time response streaming
5. ✅ **Multi-turn** - Conversation with context
6. ✅ **Models List** - Available models endpoint

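The API is modeled on Anthropic's API but runs against your local Ollama models: the chat examples here used `granite3.3:2b`, while `/health` reports `phi4:latest` as the server's model.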