Release 🏁 L0 (Python) v0.21.0 - Streaming Performance Overhaul, Guardrail Optimizations, and Drift Efficiency · ai-2070/reliable-ai-streams-py

This release is a major internal performance upgrade for the Python runtime.

No API changes, but substantial improvements to:

streaming efficiency (O(n²) → O(n))
guardrail execution cost
drift detection memory + speed
event + callback overhead

Net result: faster, more scalable streaming with lower overhead across the entire pipeline.

✨ Highlights

1. O(n) Token Accumulation (Major Performance Fix)

String concatenation during streaming has been replaced with a buffered approach.

Before:

state.content += token  # O(n²) over time

Now:

Tokens appended to _content_buffer
Joined lazily via a descriptor (_ContentDescriptor)
Flushed only when state.content is read

Result:

O(n) total complexity
Dramatically better performance for long streams
Reduced memory churn
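Below is a minimal sketch of the buffered pattern, assuming a simple state object. `_content_buffer` and `_ContentDescriptor` are the names used in these notes. `StreamState`, `append_token`, and the internal `_content` field are hypothetical, and the runtime's actual implementation may differ.

```python
class _ContentDescriptor:
    """Exposes `content` as a string while tokens accumulate in a list (sketch)."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        if obj._content_buffer:
            # Flush pending tokens with a single join instead of one copy per token.
            obj._content += "".join(obj._content_buffer)
            obj._content_buffer.clear()
        return obj._content

    def __set__(self, obj, value):
        # Assigning content directly (e.g. on reset) drops any pending tokens.
        obj._content_buffer.clear()
        obj._content = value


class StreamState:  # hypothetical stand-in for the runtime's state object
    content = _ContentDescriptor()

    def __init__(self):
        self._content_buffer = []
        self._content = ""

    def append_token(self, token):
        # Amortized O(1): a list append, no string copy.
        self._content_buffer.append(token)


state = StreamState()
for token in ["Hel", "lo", ", ", "world"]:
    state.append_token(token)
print(state.content)  # Hello, world
```

Total cost stays linear only while `content` is read rarely (for example at checkpoints or on completion). Reading it after every token would bring back a full copy per read.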

2. Drift Detection: Sliding Window + Bounded Memory

Drift detection has been rewritten to avoid unbounded growth.

Changes:

list → deque(maxlen=N) for:
- entropy tracking
- token history
Only stores a window, not full content
Uses last_window instead of full last_content

Impact:

Stable memory usage
Faster drift checks
Better scalability on long-running streams
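Here is a minimal sketch of window-bounded drift tracking. It assumes a character-level Shannon entropy signal and a simple mean-deviation threshold. `last_window` and the `deque(maxlen=N)` approach come from these notes. `DriftWindow`, `check`, `threshold`, and the entropy heuristic itself are hypothetical and are not the library's actual detector.

```python
import math
from collections import Counter, deque


def shannon_entropy(text):
    """Character-level Shannon entropy in bits (0.0 for empty text)."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


class DriftWindow:  # hypothetical; illustrates bounded memory, not L0's heuristics
    def __init__(self, window_size=50, threshold=1.0):
        # deque(maxlen=N) evicts the oldest entry on append, so each stays at N entries.
        self.tokens = deque(maxlen=window_size)
        self.entropy_history = deque(maxlen=window_size)
        self.threshold = threshold
        self.last_window = ""  # only the recent window is kept, never the full content

    def add_token(self, token):
        self.tokens.append(token)

    def check(self):
        """Return True if the window's entropy departs from its recent mean."""
        window = "".join(self.tokens)
        entropy = shannon_entropy(window)
        drifted = False
        if self.entropy_history:
            mean = sum(self.entropy_history) / len(self.entropy_history)
            drifted = abs(entropy - mean) > self.threshold
        self.entropy_history.append(entropy)
        self.last_window = window
        return drifted
```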

3. Guardrails: Significant Runtime Optimizations

JSON Guardrail

Adds is_json_content caching
Avoids repeated looks_like_json() calls
Resets cache correctly on stream resets
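The next sketch shows the caching idea, assuming detection is decided once, from the first non-blank content. `is_json_content` and `looks_like_json()` are named in these notes. The heuristic below, the `JsonGuardrail`/`check`/`reset` shape, and the placeholder bracket check are hypothetical.

```python
def looks_like_json(text):
    """Hypothetical heuristic: does the first non-blank character open an object/array?"""
    return text.lstrip()[:1] in ("{", "[")


class JsonGuardrail:  # hypothetical shape, not the library's class
    def __init__(self):
        self.is_json_content = None  # None means "not decided yet"
        self.detect_calls = 0  # instrumentation for this example only

    def check(self, content):
        # Detect once, as soon as there is non-blank content, then reuse the cached answer.
        if self.is_json_content is None and content.strip():
            self.detect_calls += 1
            self.is_json_content = looks_like_json(content)
        if not self.is_json_content:
            return []  # unknown or not JSON: nothing to validate
        # Placeholder structural check (ignores brackets inside strings).
        if content.count("}") > content.count("{"):
            return ["unexpected '}'"]
        return []

    def reset(self):
        # A stream reset (e.g. a retry) may produce different content, so re-detect.
        self.is_json_content = None
```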

Markdown Guardrail

Skips all analysis during streaming
Only runs on completion
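This is a minimal sketch of completion-only checking. The function name, the `completed` flag, and the single unclosed-fence rule are hypothetical illustrations, not the guardrail's actual rule set.

```python
FENCE = "`" * 3  # a Markdown code fence marker, built this way to keep the snippet readable


def markdown_check(content, completed):
    """Hypothetical markdown guardrail that does no work until the stream completes."""
    if not completed:
        return []  # streaming: partial Markdown is routinely unbalanced, so skip it
    fences = [line for line in content.splitlines() if line.lstrip().startswith(FENCE)]
    return ["unclosed code fence"] if len(fences) % 2 else []
```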

Pattern Guardrail (Major Change)

Precompiles all patterns into a single regex
Uses incremental scanning:
- scans only new content (+ small overlap)
- full scan only on completion

Result:

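From repeated full scans (work proportional to the whole output on every check) → near O(delta) per check, where delta is the newly arrived text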
Much lower overhead on large streams
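Here is a minimal sketch of single-regex, incremental scanning, assuming plain `re` patterns. `PatternGuardrail`, `overlap`, `scanned_upto`, and `check(..., completed=...)` are hypothetical names. For a match split across chunks to be found, the overlap must be at least the longest expected match minus one. Unbounded patterns (such as `\d+`) can still be cut short mid-stream, which is presumably why a full scan on completion remains the final word.

```python
import re


class PatternGuardrail:  # hypothetical sketch, not the library's class
    def __init__(self, patterns, overlap=64):
        # Compile every pattern once into a single alternation: one pass finds any of them.
        self.regex = re.compile("|".join(f"(?:{p})" for p in patterns))
        self.overlap = overlap  # should cover the longest expected match minus one
        self.scanned_upto = 0

    def check(self, content, completed=False):
        if completed:
            return [m.group(0) for m in self.regex.finditer(content)]  # full scan
        # Incremental scan: only the new text plus a small overlap with the old text.
        start = max(0, self.scanned_upto - self.overlap)
        self.scanned_upto = len(content)
        return [m.group(0) for m in self.regex.finditer(content, start)]

    def reset(self):
        self.scanned_upto = 0  # e.g. after a retry discards the content
```

Because consecutive scan windows overlap, a match lying entirely inside the overlap can be reported twice. A production version would likely deduplicate by match position.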

4. Runtime Hot Path Optimizations

Callback Execution

Skips function calls when callbacks are None
Reduces overhead per token
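The sketch below illustrates this hot-path idea, assuming optional per-token callbacks. `run_stream`, `on_token`, and `on_complete` are hypothetical names. In CPython, an `is not None` check is much cheaper than calling a no-op function for every token.

```python
def run_stream(tokens, on_token=None, on_complete=None):
    """Hypothetical streaming loop that only pays for callbacks that were supplied."""
    parts = []
    for token in tokens:
        parts.append(token)
        if on_token is not None:  # identity check instead of calling a no-op default
            on_token(token)
    content = "".join(parts)
    if on_complete is not None:
        on_complete(content)
    return content
```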

Observability Events

Guardrail observability only runs if handlers exist
Avoids unnecessary timing + event construction
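This minimal sketch gates observability on the presence of handlers. `run_guardrail`, the `handlers` list, and the event's fields are hypothetical. With no handlers registered, neither the timer nor the event dictionary is created.

```python
import time


def run_guardrail(check, content, handlers):
    """Run a guardrail check; time it and emit an event only if someone is listening."""
    if not handlers:
        return check(content)  # fast path: no perf_counter() calls, no event dict
    start = time.perf_counter()
    violations = check(content)
    event = {
        "type": "guardrail",  # hypothetical event schema
        "duration_ms": (time.perf_counter() - start) * 1000.0,
        "violation_count": len(violations),
    }
    for handler in handlers:
        handler(event)
    return violations
```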

Buffer Reset Fixes

_content_buffer now cleared correctly on:
- retries
- checkpoint resets

5. Improved Checkpoint + State Handling

Ensures buffer and content stay in sync during:
- retries
- invalid checkpoint recovery
Prevents subtle duplication or stale state issues
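The following sketch shows why the buffer must be cleared together with the content, assuming a list-backed token buffer like the one described under token accumulation. `BufferedState`, `checkpoint()`, `restore()`, and `flush()` are hypothetical. If `restore()` skipped the `clear()`, tokens from the abandoned attempt would be flushed on top of the restored content, duplicating or resurrecting discarded output.

```python
class BufferedState:  # hypothetical; a content string plus a pending-token buffer
    def __init__(self):
        self._content = ""
        self._content_buffer = []

    def append_token(self, token):
        self._content_buffer.append(token)

    def flush(self):
        if self._content_buffer:
            self._content += "".join(self._content_buffer)
            self._content_buffer.clear()
        return self._content

    def checkpoint(self):
        return self.flush()  # a checkpoint must include tokens still in the buffer

    def restore(self, checkpoint=""):
        # Retry or checkpoint recovery: replace content AND drop pending tokens together.
        self._content = checkpoint
        self._content_buffer.clear()
```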

6. Updated Benchmarks (Python 3.13)

Performance improvements reflected in benchmarks:

L0 Core: ~596K tokens/sec
Full Stack: ~114K tokens/sec
Lower overhead percentages across most scenarios

Still comfortably above real-world model throughput.
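Throughput can also be read as a per-token cost, which makes pipeline overhead easier to compare against the time a model spends producing each token. The helper below is hypothetical and does arithmetic only on the figures above:

```python
def per_token_us(tokens_per_sec):
    """Convert a throughput in tokens/sec to microseconds spent per token."""
    return 1_000_000 / tokens_per_sec


core = per_token_us(596_000)  # about 1.68 µs per token
full = per_token_us(114_000)  # about 8.77 µs per token
print(f"core {core:.2f} us/token, full stack {full:.2f} us/token")
```

For comparison, a hypothetical model emitting 100 tokens/sec spends 10,000 µs per token, roughly three orders of magnitude more than the full-stack overhead.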

7. Documentation Updates

README now includes Python performance section
BENCHMARKS.md updated with latest numbers
WHITEPAPER.md significantly expanded

🧭 Upgrade Notes

No breaking changes
Fully backward compatible
Strongly recommended if you:
- stream large outputs
- use guardrails heavily
- rely on drift detection
- run long-lived pipelines
