# Percolate API Streaming Example

This notebook demonstrates how to properly consume streaming responses from the Percolate API in a Jupyter notebook environment.

The key to seeing output appear incrementally, down to a character-by-character display, is to call the running API with the `requests` library and `stream=True` instead of going through TestClient, and to process each chunk as it arrives.

In [None]:
import requests
import json
import time
from IPython.display import clear_output, display
import sys
import os

# Add the parent directory to the path so we can import percolate
sys.path.append(os.path.dirname(os.getcwd()))

## Setup API URL

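Change this to match your environment. Methods 2 and 3 send real HTTP requests to this URL, so the API server must already be running there.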

In [None]:
# API URL - adjust as needed for your environment
API_URL = "http://localhost:8000/chat/completions"
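
If the URL differs between machines, one option is to read it from an environment variable and fall back to the local default. This is only a sketch: the variable name `PERCOLATE_API_URL` is a hypothetical choice for illustration, not something the Percolate API defines.

```python
import os

# Local default, same value as in the cell above.
DEFAULT_API_URL = "http://localhost:8000/chat/completions"

# PERCOLATE_API_URL is a hypothetical variable name used for illustration.
API_URL = os.environ.get("PERCOLATE_API_URL", DEFAULT_API_URL)
print(f"Using API URL: {API_URL}")
```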

## Method 1: Using TestClient (problematic for streaming)

This approach reproduces the issue you're experiencing: the chunks appear all at once rather than incrementally.

In [None]:
from fastapi.testclient import TestClient
from fastapi import FastAPI
import percolate.api.routes.chat.router as chat_router

# Create a simple FastAPI app for testing
app = FastAPI()
app.include_router(chat_router.router, prefix="/chat")

# Create test client
client = TestClient(app)

# Request payload
payload = {
    "model": "gpt-4o-mini",  # Change to match your available model
    "prompt": "Write a poem about Paris in 4 lines",
    "max_tokens": 50,
    "temperature": 0.7,
    "stream": True
}

# Make streaming request with TestClient
print("Using TestClient (chunks appear all at once):")
with client.stream("POST", "/chat/completions", json=payload) as response:
    for chunk in response.iter_text():
        print(f"Received chunk: {chunk}")

## Method 2: Using requests with stream=True (recommended)

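This approach prints each chunk as soon as it arrives. The character-by-character effect comes from the small `time.sleep` delay in the display loop, not from the server, which sends one chunk at a time.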

In [None]:
def stream_with_requests():
    """Example of properly consuming a streaming response with requests."""
    
    # Request payload
    payload = {
        "model": "gpt-4o-mini",  # Change to match your available model
        "prompt": "Write a poem about Paris in 4 lines",
        "max_tokens": 100,
        "temperature": 0.7,
        "stream": True
    }
    
    # Make the request with streaming enabled
    print("Sending request...")
    response = requests.post(
        API_URL,
        json=payload,
        headers={"Content-Type": "application/json"},
        stream=True  # This is critical for streaming!
    )
    
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        print(response.text)
        return
    
    print("\nStreaming response (character by character):")
    print("-" * 50)
    
    # Accumulated text for display
    full_text = ""
    
    # Process each line in the stream
    for line in response.iter_lines():
        if line:
            try:
                # Decode the line from bytes to string
                line_text = line.decode('utf-8')
                
                # Handle SSE format if it starts with 'data: '
                if line_text.startswith('data: '):
                    line_text = line_text[6:]  # Remove 'data: ' prefix
                
                # Skip the [DONE] marker
                if line_text == "[DONE]":
                    continue
                
                # Parse the JSON
                data = json.loads(line_text)
                
                # Extract content based on the response format
                content = ""
                if "choices" in data and data["choices"]:
                    # Standard OpenAI format
                    if "delta" in data["choices"][0]:
                        delta = data["choices"][0]["delta"]
                        if "content" in delta:
                            content = delta["content"]
                    # Our canonical format
                    elif "text" in data["choices"][0]:
                        content = data["choices"][0]["text"]
                
                # Add to full text and display incrementally
                if content:
                    full_text += content
                    # Display each character with a slight delay to demonstrate streaming
                    for char in content:
                        # Print character by character with flush to show incremental updates
                        print(char, end='', flush=True)
                        time.sleep(0.01)  # Small delay to make the streaming visible
            
            except json.JSONDecodeError:
                # Handle non-JSON data if any
                print(f"Raw: {line.decode('utf-8')}")
    
    print("\n" + "-" * 50)
    print(f"Complete text: {full_text}")

# Run the example
stream_with_requests()
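
The line-parsing steps inside the loop above (strip the `data: ` prefix, skip `[DONE]`, parse the JSON, read `delta.content` or `text`) can also be pulled out into a small function, which makes them easy to check without a running server. The sketch below is one possible way to do that; the name `extract_content` is hypothetical, and the two response shapes it handles are the ones already used in this notebook.

```python
import json
from typing import Optional, Union


def extract_content(line: Union[bytes, str]) -> Optional[str]:
    """Return the text carried by one stream line, or None if it has none."""
    text = line.decode("utf-8") if isinstance(line, bytes) else line
    text = text.strip()

    # Server-sent events prefix each payload with "data:".
    if text.startswith("data:"):
        text = text[len("data:"):].strip()

    # Blank keep-alive lines and the end-of-stream marker carry no content.
    if not text or text == "[DONE]":
        return None

    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    choices = data.get("choices") or []
    if not choices:
        return None

    first = choices[0]
    if "delta" in first:
        # OpenAI-style streaming chunk.
        return (first["delta"] or {}).get("content") or None
    # Completion-style chunk with a plain "text" field.
    return first.get("text") or None


print(extract_content(b'data: {"choices": [{"delta": {"content": "Hi"}}]}'))
```

With a helper like this, the body of the streaming loop reduces to calling it on each line from `response.iter_lines()` and printing whatever is not `None`.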

## Method 3: Advanced Visualization with IPython

This method uses IPython's display capabilities to show a more visually appealing incremental update.

In [None]:
from IPython.display import display, HTML, clear_output

def stream_with_ipython_display():
    """Example with IPython's display capabilities for nice visualization."""
    
    payload = {
        "model": "gpt-4o-mini",  # Change to match your available model
        "prompt": "Write a poem about Paris in 4 lines, highlighting the beauty of the Eiffel Tower.",
        "max_tokens": 100,
        "temperature": 0.7,
        "stream": True
    }
    
    print("Sending request...")
    response = requests.post(
        API_URL,
        json=payload,
        headers={"Content-Type": "application/json"},
        stream=True
    )
    
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        print(response.text)
        return
    
    print("Streaming response with IPython display:")
    
    # Accumulated text
    full_text = ""
    new_text = ""
    
    # Process each line
    for line in response.iter_lines():
        if line:
            try:
                line_text = line.decode('utf-8')
                if line_text.startswith('data: '):
                    line_text = line_text[6:]
                if line_text == "[DONE]":
                    continue
                    
                data = json.loads(line_text)
                
                content = ""
                if "choices" in data and data["choices"]:
                    if "delta" in data["choices"][0]:
                        delta = data["choices"][0]["delta"]
                        content = delta.get("content", "")
                    elif "text" in data["choices"][0]:
                        content = data["choices"][0]["text"]
                
                if content:
                    # Add to accumulated text
                    full_text += content
                    new_text += content
                    
                    # Display with highlighting for new text
                    display_text = full_text[:-len(new_text)] + f"<span style='color:red;font-weight:bold'>{new_text}</span>"
                    clear_output(wait=True)
                    display(HTML(f"<p>{display_text}</p>"))
                    
                    # Reset new text after a short delay
                    if len(new_text) > 10 or " " in new_text:
                        time.sleep(0.3)
                        new_text = ""
                    else:
                        time.sleep(0.05)
                    
            except json.JSONDecodeError:
                pass
    
    # Final display without highlighting
    clear_output(wait=True)
    display(HTML(f"<p>{full_text}</p>"))

# Run the advanced example
stream_with_ipython_display()
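
The cell above interpolates the model's text directly into HTML, so a `<` or `&` in the output would be interpreted as markup, and line breaks in the poem collapse inside the `<p>` tag. A minimal sketch of a safer rendering step, assuming you want the same red highlight for the newest text, is shown below; `render_stream_html` is a hypothetical helper name.

```python
import html


def render_stream_html(full_text: str, new_text: str = "") -> str:
    """Build an HTML snippet with the newest text highlighted."""
    # Only highlight new_text when it really is the tail of full_text.
    if new_text and full_text.endswith(new_text):
        old_text = full_text[: len(full_text) - len(new_text)]
    else:
        old_text, new_text = full_text, ""

    # Escape both parts so model output is shown as text, not markup.
    old_html = html.escape(old_text)
    new_html = html.escape(new_text)
    if new_html:
        new_html = f"<span style='color:red;font-weight:bold'>{new_html}</span>"

    # white-space:pre-wrap keeps the line breaks of the streamed text visible.
    return f"<p style='white-space:pre-wrap'>{old_html}{new_html}</p>"


print(render_stream_html("Paris <3 at dusk", "dusk"))
```

The returned string can be passed to `display(HTML(...))` in place of the inline f-strings used above.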

## Why TestClient Doesn't Show Character-by-Character Streaming

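The TestClient in FastAPI is designed for testing endpoints, not for real-world consumption of streaming APIs. It calls the application in-process instead of going over a network connection, so it doesn't process streams the same way that a real HTTP client would. Depending on the installed Starlette version, it may also buffer the whole response body before your loop sees the first chunk.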

When you use TestClient:
1. It collects chunks as defined by the server's `yield` statements
2. It doesn't break these down further into smaller increments
3. It presents each complete chunk as a single unit

For true streaming behavior, use a real HTTP client such as `requests` with `stream=True` against a running server.
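
One way to check whether a given client really delivers chunks incrementally is to record when each chunk arrives. The sketch below wraps any iterable of chunks and yields the elapsed time alongside each one. Both `timed_chunks` and `fake_stream` are hypothetical names, and `fake_stream` is only a stand-in for a real response so the example runs without a server.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def timed_chunks(chunks: Iterable[T]) -> Iterator[Tuple[float, T]]:
    """Yield (seconds_since_start, chunk) for each chunk as it arrives."""
    start = time.monotonic()
    for chunk in chunks:
        yield time.monotonic() - start, chunk


def fake_stream(delay: float = 0.05) -> Iterator[str]:
    """Stand-in for a streaming response: three chunks, `delay` seconds apart."""
    for word in ("Paris ", "at ", "dusk"):
        time.sleep(delay)
        yield word


arrivals = list(timed_chunks(fake_stream()))
for elapsed, chunk in arrivals:
    print(f"{elapsed:6.3f}s  {chunk!r}")
```

Wrapping `response.iter_text()` from Method 1 or `response.iter_lines()` from Method 2 in `timed_chunks` gives a rough comparison: timestamps that are spread out suggest incremental delivery, while timestamps that are all nearly identical suggest the response was buffered before the loop started.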

## Summary

To get proper streaming in your application:

1. Use the real `requests` library, not TestClient
2. Always set `stream=True` in your requests call
3. Process chunks incrementally using `response.iter_lines()`
4. Print content as soon as it arrives with `flush=True`, adding a per-character delay only if you want the typewriter effect

This approach shows the response incrementally, as the API actually delivers it.