# Streaming Responses with HelpingAI 🌊

Learn how to stream responses in real time for a better user experience. Streaming is especially useful for long-form content generation and interactive applications.

## 🎯 What You'll Learn
- Basic streaming implementation
- Handling streaming data
- Building real-time interfaces
- Error handling in streams
- Advanced streaming patterns

In [None]:
import os
import time
import sys
from HelpingAI import HAI
from IPython.display import display, clear_output
import threading

os.environ["HAI_API_KEY"] = "hl-*******************"  # Replace with your own key; never commit a real key
hai = HAI()

print("🌊 Ready to explore streaming responses!")

🌊 Ready to explore streaming responses!


## 🚀 Basic Streaming

Let's start with a simple streaming example.

In [2]:
def basic_streaming_example():
    """Basic streaming response example"""
    print("🌊 Basic Streaming Example:")
    print("=" * 40)
    print("AI Response (streaming): ", end="")
    
    # Create streaming response
    stream = hai.chat.completions.create(
        model="Helpingai3-raw",
        messages=[
            {"role": "user", "content": "Tell me a short story about a robot learning to paint."}
        ],
        stream=True,
        temperature=0.8,
        max_tokens=500
    )
    
    # Process the stream
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
            full_response += content
            time.sleep(0.02)  # Small delay to simulate real-time typing
    
    print("\n\n" + "=" * 40)
    print(f"✅ Streaming complete! Total characters: {len(full_response)}")
    return full_response

# Run basic streaming example
story = basic_streaming_example()

🌊 Basic Streaming Example:
AI Response (streaming): Aight, fam. So this robot, R3, was straight up tired of being a data whiz. It was like, "Nah, I wanna paint. I wanna make art." So it starts in some corner, just using pixels and lines. At first, it's like a big blob, but it keeps trying. It's like, "I got this."

Then, it starts learning, watching all the art bots out there. It's like, "Wait, you can do this?" So R3 starts mixing colors, playing with shades. It's like a kid in a candy store with its paint palette.

But then, it starts getting mad. Like, "Where's my style?" It's like, "Yo, I'm not just copying. I'm adding my own vibe." So it starts putting its own twist on things. Like, it's like it's saying, "I'm in the game now."

And you know what? People are liking it. They're coming up to R3, saying, "Bro, this is lit." And R3's like, "I told you I was gonna paint."

So there you have it, fam. R3's journey from pixels to a whole new level of art. It's like, "I may be a robot, but
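
The loop above indexes `chunk.choices[0]` directly. Some OpenAI-style streaming APIs can emit chunks whose `choices` list is empty or whose `delta.content` is `None`; whether HelpingAI does so is an assumption here, not something this notebook confirms. A minimal defensive sketch, using hypothetical stand-in chunks (`fake_chunk`) so it runs without an API key:

```python
from types import SimpleNamespace


def delta_text(chunk):
    """Return the text carried by a chunk, or "" if it carries none."""
    choices = getattr(chunk, "choices", None) or []
    if not choices:
        return ""
    delta = getattr(choices[0], "delta", None)
    return getattr(delta, "content", None) or ""


def fake_chunk(text=None, with_choice=True):
    """Build a stand-in chunk for local testing (hypothetical helper, not part of the SDK)."""
    if not with_choice:
        return SimpleNamespace(choices=[])
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: normal text, an empty delta, a chunk with no choices, more text
demo_stream = [
    fake_chunk("Hello"),
    fake_chunk(None),
    fake_chunk(with_choice=False),
    fake_chunk(", world"),
]

collected = "".join(delta_text(chunk) for chunk in demo_stream)
print(collected)
```

In the streaming loops of this notebook, `delta_text(chunk)` could stand in for the `chunk.choices[0].delta.content` check, assuming the real chunks have the same shape as the stand-ins.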

## 🧠 Streaming with Dhanishtha 2.0 Thinking

See how thinking processes stream in real-time.

In [3]:
def streaming_with_thinking():
    """Stream Dhanishtha 2.0 responses with thinking process"""
    print("🧠 Streaming with Thinking Process:")
    print("=" * 50)
    
    stream = hai.chat.completions.create(
        model="Dhanishtha-2.0-preview",
        messages=[
            {
                "role": "user", 
                "content": "Solve this step by step: If a pizza is cut into 8 equal slices and 3 people eat 2 slices each, what fraction of the pizza is left?"
            }
        ],
        stream=True,
        hide_think=False,  # Show thinking process
        temperature=0.3
    )
    
    full_response = ""
    in_thinking = False
    
    for chunk in stream:
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            
            # Detect thinking blocks (assumes each tag arrives whole within one chunk)
            if "<think>" in content:
                in_thinking = True
                print("\n🤔 [THINKING] ", end="")
                content = content.replace("<think>", "")
            
            if "</think>" in content:
                # Text before the closing tag still belongs to the thinking block
                thought, content = content.split("</think>", 1)
                print(f"\033[94m{thought}\033[0m", end="", flush=True)
                full_response += thought
                in_thinking = False
                print("\n\n💡 [SOLUTION] ", end="")
            
            # Color coding for different sections
            if in_thinking:
                print(f"\033[94m{content}\033[0m", end="", flush=True)  # Blue for thinking
            else:
                print(content, end="", flush=True)  # Normal for solution
            
            full_response += content
            time.sleep(0.03)
    
    print("\n\n" + "=" * 50)
    return full_response

# Run thinking stream example
math_solution = streaming_with_thinking()

🧠 Streaming with Thinking Process:

🤔 [THINKING] 
Let me solve this pizza problem step by step. We have a pizza cut into 8 equal slices. Three people each eat 2 slices, so that's 3 × 2 = 6 slices total consumed. Out of the original 8 slices, we now have 8 - 6 = 2 slices remaining. To express this as a fraction of the

## 🎮 Interactive Streaming Interface

Build an interactive streaming chat interface.

In [4]:
class StreamingChatInterface:
    def __init__(self, model="Helpingai3-raw"):
        self.hai = HAI()
        self.model = model
        self.conversation = []
        self.system_message = {
            "role": "system",
            "content": "You are a helpful and friendly AI assistant. Provide engaging and informative responses."
        }
    
    def stream_response(self, user_message, show_typing=True):
        """Stream a response to user message"""
        # Add user message to conversation
        self.conversation.append({"role": "user", "content": user_message})
        
        # Prepare messages for API
        messages = [self.system_message] + self.conversation
        
        print(f"👤 You: {user_message}")
        print("🤖 AI: ", end="")
        
        if show_typing:
            # Simulate typing indicator
            for _ in range(3):
                print(".", end="", flush=True)
                time.sleep(0.5)
            print("\r🤖 AI:    \r🤖 AI: ", end="", flush=True)  # Overwrite the dots before streaming
        
        # Stream the response
        stream = self.hai.chat.completions.create(
            model=self.model,
            messages=messages,
            stream=True,
            temperature=0.7
        )
        
        assistant_response = ""
        for chunk in stream:
            if chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                print(content, end="", flush=True)
                assistant_response += content
                time.sleep(0.02)
        
        print("\n")
        
        # Add assistant response to conversation
        self.conversation.append({"role": "assistant", "content": assistant_response})
        
        return assistant_response
    
    def get_conversation_summary(self):
        """Get a summary of the conversation"""
        total_messages = len(self.conversation)
        user_messages = len([msg for msg in self.conversation if msg["role"] == "user"])
        assistant_messages = len([msg for msg in self.conversation if msg["role"] == "assistant"])
        
        return {
            "total_messages": total_messages,
            "user_messages": user_messages,
            "assistant_messages": assistant_messages
        }

# Create streaming chat interface
chat = StreamingChatInterface()

print("💬 Interactive Streaming Chat Demo:")
print("=" * 50)

# Simulate a conversation
demo_messages = [
    "Hello! Can you help me understand what makes a good story?",
    "That's helpful! Can you give me an example of a compelling character?",
    "Great example! How important is the setting in storytelling?"
]

for message in demo_messages:
    chat.stream_response(message)
    print("-" * 30)

# Show conversation summary
summary = chat.get_conversation_summary()
print(f"\n📊 Conversation Summary: {summary}")

💬 Interactive Streaming Chat Demo:
👤 You: Hello! Can you help me understand what makes a good story?
🤖 AI: ...Aight, listen up! A good story's got some key ingredients, ya feel me? 

1. 🔥 Plot: Gotta have a story that keeps you on your toes. Something that's gonna make you go, "Whoaa, what's good?" 

2. 🧠 Characters: You need people you can vibe with, who're gonna make you laugh, cry, or just be like, "Bruh, that's wild."

3. 💬 Dialogue: The way they talk to each other matters. It's like, do they sound real or fake? 

So, if you got those, you're cooking with gas. Just make sure you mix it up, keep it spicy, and don't leave anyone hanging. That's the recipe for a story that slaps! 😎

------------------------------
👤 You: That's helpful! Can you give me an example of a compelling character?
🤖 AI: ...Yeah, no problem! A compelling character's like a legend in the making. They got that spark, that "something" that makes you wanna know more about 'em. 

Like, imagine this: A character who'

## 🛡️ Error Handling in Streaming

Robust error handling for streaming responses.

In [5]:
from HelpingAI import HAIError, RateLimitError, AuthenticationError

def robust_streaming(prompt, max_retries=3):
    """Streaming with comprehensive error handling"""
    print(f"🛡️ Robust Streaming: {prompt[:50]}...")
    print("=" * 50)
    
    for attempt in range(max_retries):
        try:
            print(f"Attempt {attempt + 1}: ", end="")
            
            stream = hai.chat.completions.create(
                model="Helpingai3-raw",
                messages=[{"role": "user", "content": prompt}],
                stream=True,
                temperature=0.7,
                max_tokens=300
            )
            
            response_parts = []
            
            for chunk in stream:
                try:
                    if chunk.choices[0].delta.content:
                        content = chunk.choices[0].delta.content
                        print(content, end="", flush=True)
                        response_parts.append(content)
                        time.sleep(0.02)
                
                except Exception as chunk_error:
                    print(f"\n⚠️ Chunk error: {chunk_error}")
                    continue
            
            print("\n✅ Streaming completed successfully!")
            return "".join(response_parts)
        
        except RateLimitError:
            print(f"\n⏰ Rate limit hit on attempt {attempt + 1}")
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Waiting {wait_time} seconds before retry...")
                time.sleep(wait_time)
        
        except AuthenticationError:
            print("\n❌ Authentication error - check your API key")
            return None  # Retrying cannot fix a bad key
        
        except HAIError as e:
            print(f"\n🚨 API error on attempt {attempt + 1}: {e}")
            if attempt < max_retries - 1:
                print("Retrying...")
                time.sleep(1)
        
        except Exception as e:
            print(f"\n💥 Unexpected error: {e}")
            return None  # Do not retry unknown failures
    
    print("❌ All retry attempts failed")
    return None

# Test robust streaming
result = robust_streaming("Write a haiku about artificial intelligence and creativity.")

🛡️ Robust Streaming: Write a haiku about artificial intelligence and cr...
Attempt 1: AI's creativity blooms,
Machines dream, humans create,
Together, art thrives.
✅ Streaming completed successfully!
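
The retry loop above waits `2 ** attempt` seconds after a rate limit. When many clients retry at the same moment, a common refinement is to cap the wait and add random jitter. The sketch below is one way to do that; `backoff_delays` and `call_with_retries` are hypothetical helpers, and `ConnectionError` stands in for whichever SDK exception should be retried (for instance the `RateLimitError` imported above).

```python
import random
import time


def backoff_delays(count, base=1.0, cap=30.0, rng=None):
    """Yield `count` wait times: a random value up to a capped exponential ceiling."""
    rng = rng or random.Random()
    for attempt in range(count):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0, ceiling)


def call_with_retries(operation, max_retries=3, retry_on=(ConnectionError,), sleep=time.sleep):
    """Run operation(); on a retryable error, wait and try again up to max_retries times."""
    delays = list(backoff_delays(max_retries - 1))
    for attempt in range(max_retries):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error to the caller
            sleep(delays[attempt])


# Simulated operation that fails twice, then succeeds
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


waits = []  # Record the waits instead of sleeping, so the demo runs instantly
outcome = call_with_retries(flaky, max_retries=3, sleep=waits.append)
print(outcome, attempts["count"], len(waits))
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies.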


## 📊 Streaming Performance Analysis

Analyze streaming performance and characteristics.

In [6]:
import time

class StreamingAnalyzer:
    def __init__(self):
        self.reset_metrics()
    
    def reset_metrics(self):
        self.start_time = None
        self.first_token_time = None
        self.chunk_times = []
        self.chunk_sizes = []
        self.total_tokens = 0
        self.total_characters = 0
    
    def start_analysis(self):
        self.reset_metrics()
        self.start_time = time.time()
    
    def process_chunk(self, chunk):
        current_time = time.time()
        
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            
            # Record first token time
            if self.first_token_time is None:
                self.first_token_time = current_time
            
            # Record chunk metrics
            self.chunk_times.append(current_time)
            self.chunk_sizes.append(len(content))
            self.total_characters += len(content)
            
            # Rough estimate: counts whitespace-separated words, not model tokens
            self.total_tokens += len(content.split())
            
            return content
        return ""
    
    def get_metrics(self):
        if not self.chunk_times:
            return {"error": "No data collected"}
        
        end_time = self.chunk_times[-1]
        total_duration = end_time - self.start_time
        time_to_first_token = self.first_token_time - self.start_time if self.first_token_time else 0
        
        # Calculate intervals between chunks
        intervals = []
        for i in range(1, len(self.chunk_times)):
            intervals.append(self.chunk_times[i] - self.chunk_times[i-1])
        
        avg_interval = sum(intervals) / len(intervals) if intervals else 0
        
        return {
            "total_duration": round(total_duration, 2),
            "time_to_first_token": round(time_to_first_token, 2),
            "total_chunks": len(self.chunk_times),
            "total_characters": self.total_characters,
            "estimated_tokens": self.total_tokens,
            "avg_chunk_interval": round(avg_interval, 3),
            # Guard the rates against a zero duration
            "characters_per_second": round(self.total_characters / total_duration, 1) if total_duration > 0 else 0.0,
            "tokens_per_second": round(self.total_tokens / total_duration, 1) if total_duration > 0 else 0.0
        }

def analyze_streaming_performance(prompt, model="Helpingai3-raw"):
    """Analyze streaming performance metrics"""
    analyzer = StreamingAnalyzer()
    
    print("📊 Analyzing Streaming Performance:")
    print(f"Model: {model}")
    print(f"Prompt: {prompt[:50]}...")
    print("=" * 50)
    
    analyzer.start_analysis()
    
    stream = hai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        temperature=0.7,
        max_tokens=400
    )
    
    print("Response: ", end="")
    full_response = ""
    
    for chunk in stream:
        content = analyzer.process_chunk(chunk)
        if content:
            print(content, end="", flush=True)
            full_response += content
    
    print("\n\n📈 Performance Metrics:")
    print("-" * 30)
    
    metrics = analyzer.get_metrics()
    for key, value in metrics.items():
        print(f"{key.replace('_', ' ').title()}: {value}")
    
    return metrics, full_response

# Analyze performance for different models
print("🔬 Performance Analysis:")
print("=" * 60)

test_prompt = "Explain the concept of machine learning in simple terms with examples."

# Test Helpingai3-raw
metrics1, response1 = analyze_streaming_performance(test_prompt, "Helpingai3-raw")

print("\n" + "=" * 60)

# Test Dhanishtha 2.0
metrics2, response2 = analyze_streaming_performance(test_prompt, "Dhanishtha-2.0-preview")

🔬 Performance Analysis:
📊 Analyzing Streaming Performance:
Model: Helpingai3-raw
Prompt: Explain the concept of machine learning in simple ...
Response: Aight, listen up. Machine learning? It's like teaching a computer to learn and get smarter on its own. Imagine you're trying to figure out if a picture is a cat or a dog. You'd look at the shape of the ears, the color of the fur, and other stuff. A computer does the same thing with machine learning, but it gets better at it the more it sees.

So, let's say you want the computer to learn what a cat looks like. You show it a bunch of pictures of cats and tell it "this is a cat". The computer looks at the pictures, sees the whiskers, the pointy ears, and all that. It starts to build a "model" in its head - like a mental picture of what a cat looks like.

Now, when you show it a new picture, it can say "that's a cat" or "no, that's not a cat". The more pictures it sees, the better it gets. It's like the computer is going to school, but ins
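
The analyzer above timestamps chunks with `time.time()`, which follows the system clock and can jump if the clock is adjusted. For measuring intervals, `time.perf_counter()` is monotonic and usually the safer choice. A minimal sketch of the same measurements on a simulated stream, so it runs without an API call; `timed_stream` and `simulated_stream` are hypothetical names:

```python
import time


def timed_stream(chunks, clock=time.perf_counter):
    """Consume an iterable of text chunks and return (full_text, metrics)."""
    start = clock()
    first = None
    parts = []
    for text in chunks:
        if not text:
            continue  # Skip empty chunks, as the analyzer above does
        if first is None:
            first = clock()
        parts.append(text)
    duration = clock() - start
    full_text = "".join(parts)
    metrics = {
        "time_to_first_chunk": (first - start) if first is not None else None,
        "total_duration": duration,
        "total_chunks": len(parts),
        "chars_per_second": len(full_text) / duration if duration > 0 else 0.0,
    }
    return full_text, metrics


def simulated_stream():
    """Stand-in for a model stream: yields a few words with a short delay each."""
    for word in ["Machine ", "learning ", "finds ", "patterns."]:
        time.sleep(0.01)
        yield word


text, metrics = timed_stream(simulated_stream())
print(text)
for key, value in metrics.items():
    print(f"{key}: {value}")
```

The exact timings vary from run to run, so no expected values are shown.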

## 🎯 Key Insights About Streaming

From these examples, we can observe important streaming characteristics:

### ⚡ Performance Benefits
- **Perceived Speed**: Users see content immediately
- **Better UX**: No waiting for complete responses
- **Interactive Feel**: More conversational experience
- **Early Feedback**: Users can interrupt if needed

### 🛠️ Implementation Considerations
- **Error Handling**: Robust handling of stream interruptions
- **Buffer Management**: Handling partial content gracefully
- **UI Updates**: Real-time interface updates
- **Performance Monitoring**: Track streaming metrics
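
As one illustration of buffer management, the sketch below groups streamed fragments into whole sentences before handing them on, which can help when the consumer (a text-to-speech engine or a UI that renders paragraph by paragraph, say) works better with complete units. The function name `buffered_sentences` and the `max_chars` fallback are assumptions made for this example.

```python
import re

# A sentence ends at ".", "!" or "?" followed by whitespace
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def buffered_sentences(chunks, max_chars=200):
    """Yield complete sentences from an iterable of text fragments.

    If no sentence boundary appears within max_chars characters, the buffer
    is flushed anyway so the consumer is never starved.
    """
    buffer = ""
    for text in chunks:
        buffer += text
        while True:
            match = _SENTENCE_END.search(buffer)
            if match:
                yield buffer[:match.start()]
                buffer = buffer[match.end():]
            elif buffer and len(buffer) >= max_chars:
                yield buffer
                buffer = ""
            else:
                break
    # Emit whatever is left when the stream ends
    if buffer.strip():
        yield buffer.strip()


# Simulated fragments that cut sentences in the middle
fragments = ["Stream", "ing feels fast. It sh", "ows text early! Does it", " help? Yes"]
sentences = list(buffered_sentences(fragments))
for sentence in sentences:
    print(repr(sentence))
```

A smaller `max_chars` favors responsiveness, while a larger one favors cleaner sentence boundaries.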

### 🎨 Use Cases for Streaming
- **Long-form Content**: Stories, articles, explanations
- **Interactive Chat**: Real-time conversations
- **Live Demonstrations**: Step-by-step tutorials
- **Creative Writing**: Poetry, stories, creative content

## 🚀 Best Practices

- **Handle Errors Gracefully**: Implement retry logic and fallbacks
- **Show Progress**: Visual indicators for streaming status
- **Buffer Wisely**: Balance responsiveness with stability
- **Monitor Performance**: Track metrics for optimization
- **User Control**: Allow users to stop/pause streams
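
For the user-control point, one possible pattern is to check a `threading.Event` between chunks and stop consuming once it is set. The sketch below uses an endless simulated generator; `stream_until_stopped` is a hypothetical helper, and whether the SDK's stream object exposes a `close()` method is an assumption, which is why the code checks for it before calling.

```python
import threading


def stream_until_stopped(chunks, stop_event, on_text):
    """Forward chunks to on_text until the stream ends or stop_event is set."""
    parts = []
    stopped = False
    for text in chunks:
        if stop_event.is_set():
            stopped = True
            # Close the source if it supports it, so its cleanup code can run
            close = getattr(chunks, "close", None)
            if callable(close):
                close()
            break
        on_text(text)
        parts.append(text)
    return "".join(parts), stopped


def endless_words():
    """Stand-in for a long model response."""
    n = 0
    while True:
        n += 1
        yield f"word{n} "


stop = threading.Event()
shown = []


def show(text):
    shown.append(text)
    if len(shown) == 3:
        stop.set()  # Stands in for a user pressing a "Stop" button


partial_text, was_stopped = stream_until_stopped(endless_words(), stop, show)
print(repr(partial_text), was_stopped)
```

In an application, a UI callback or another thread would call `stop.set()`. The text collected so far is still returned, so it can be kept in the conversation history if desired.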

## 📚 Next Steps

- **[04-parameters.ipynb](04-parameters.ipynb)** - Fine-tuning AI behavior
- **[../advanced/](../advanced/)** - Advanced streaming applications
- **[../../examples/applications/](../../examples/applications/)** - Real-world streaming implementations

---

**Create engaging real-time AI experiences with streaming! 🌊✨**