feat: streaming benchmark suite — PromptKit vs LangChain vs Strands by chaholl · Pull Request #915 · AltairaLabs/PromptKit

chaholl · 2026-04-07T12:50:30Z

Summary

Reproducible benchmark suite comparing PromptKit's LLM streaming performance against LangChain and Strands Agents (AWS AgentCore preferred runtime).

- Mock upstream servers: OpenAI SSE, Deepgram STT WebSocket, Cartesia TTS WebSocket with configurable latency profiles
- Client harness: concurrent load driver with per-request timing, percentiles, jitter, RSS/CPU sampling, JSON/markdown/CSV output
- Framework implementations: minimal, idiomatic wrappers for PromptKit (Go/SDK), LangChain (Python/FastAPI), Strands Agents (Python/FastAPI), Pipecat (Python/voice pipeline)
- Docker Compose + Makefile orchestration for one-command reproducible runs

Local benchmark results (darwin/arm64 14 cores, fast profile)

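| Concurrent | PromptKit rps | PromptKit p50 | LangChain rps | LangChain p50 | Strands rps | Strands p50 |
|---|---|---|---|---|---|---|
| 100 | 889 | 107ms | 319 | 316ms | 185 | 540ms |
| 1000 | 921 | 1.06s | 192 | 5.23s | 175 | 5.19s |
| 5000 | 922 | 5.32s | 2.9 | 44.7s | 122 | 17.6s |
| 10000 | 920 | 10.7s | 1.7 | 1m40s | 115 | 14.9s |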

PromptKit sustains ~920 rps from 100 to 10k concurrent. LangChain collapses at 2.5k. Strands degrades gracefully but peaks at ~193 rps (4.8x less throughput, 7.7x less memory-efficient per rps).

Test plan

All 22 Go tests pass (benchmarks/mockupstream/ + benchmarks/harness/)
Mock upstream smoke tested locally (all 3 protocols)
Harness smoke tested against mock upstream
PromptKit Round 1 benchmarked at 10-25k concurrent
LangChain Round 1 benchmarked at 10-10k concurrent
Strands Agents Round 1 benchmarked at 10-10k concurrent
Docker Compose end-to-end (not yet validated — local runs used direct processes)
Round 2 voice pipeline (Pipecat) — deferred to follow-up

Add benchmarks/ as a new Go module in the workspace with:
- go.mod declaring module github.com/AltairaLabs/PromptKit/benchmarks
- gorilla/websocket v1.5.3 and gopkg.in/yaml.v3 v3.0.1 dependencies
- Makefile with build-mock, build-harness, round1, round2, all, clean targets
- results/.gitkeep placeholder directory
- .gitignore entries for *.json, *.csv, *.md result artifacts

Implements Profile, OpenAIProfile, STTProfile, TTSProfile structs with YAML tags and time.Duration fields. Provides DefaultProfile() and LoadProfile(path) with full test coverage. Adds fast.yaml (10ms delays) and realistic.yaml (200ms first-chunk, 30ms inter-chunk) profiles. Also fixes unused "net/http" import in stt_ws_test.go.

Implements NewTTSHandler(cfg TTSProfile) http.Handler that upgrades HTTP to WebSocket at /tts/ws, reads a Cartesia-compatible synthesis request, waits FirstByteDelay, streams ceiling(32000/ChunkSize) full binary audio chunks (each exactly ChunkSize bytes), then sends JSON {"type":"done"}. Includes TestTTSWebSocket_StreamsAudio and TestTTSWebSocket_FirstByteDelay.
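
The chunk count above is an integer ceiling division. A minimal sketch, assuming a hypothetical helper named `chunkCount` (the handler's real internals are not shown in this PR):

```go
package main

import "fmt"

// totalAudioBytes is the 32000-byte payload size quoted in the commit message.
const totalAudioBytes = 32000

// chunkCount returns ceil(total/chunkSize) using integer arithmetic only.
// A non-positive chunkSize yields 0 rather than dividing by zero.
func chunkCount(total, chunkSize int) int {
	if chunkSize <= 0 {
		return 0
	}
	return (total + chunkSize - 1) / chunkSize
}

func main() {
	for _, size := range []int{3200, 4096} {
		n := chunkCount(totalAudioBytes, size)
		fmt.Printf("chunkSize=%d chunks=%d streamedBytes=%d\n", size, n, n*size)
	}
}
```

Because every chunk is a full `ChunkSize` bytes, the streamed total can exceed 32000 when `ChunkSize` does not divide it evenly (for example, 8 chunks of 4096 bytes is 32768 bytes).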

Implements NewSTTHandler(STTProfile) http.Handler at /v1/listen with Deepgram-compatible Results events. Read loop accepts binary audio frames and CloseStream JSON; write loop emits interim transcripts on a ticker and guarantees at least one interim before sending the final transcript. Also declares the shared wsUpgrader used by all WebSocket handlers.

wsUpgrader is already declared in tts_ws.go (added by the TTS task). Remove the redundant declaration to avoid a redeclaration build error.

Implements BenchmarkReport and TierResult types with WriteJSON (indented), RenderMarkdown (table with Framework/Concurrent/p50/p99/Throughput/RSS columns), and WriteCSV (header + one row per tier result). Round-trip and content tests included.

…ation Wraps the PromptKit SDK behind an OpenAI-compatible HTTP endpoint so the benchmark harness can measure it identically to LangChain and Pipecat.

Add docker-compose.yaml with profiled services for round1 (LLM streaming) and round2 (voice pipeline), using repo root as build context so Dockerfiles can access the full monorepo. Add Dockerfiles for mockupstream and harness using multi-stage builds. Replace stub Makefile with full orchestration: tiered concurrency loops for both rounds, help target, and local build/clean targets.

- mockupstream/stt_ws.go: handle Pipecat's KeepAlive (ignored) and Finalize (emit final transcript without closing) messages in the STT WebSocket read loop; accept any URL path/query-params so the Deepgram SDK's decorated handshake URLs connect cleanly
- mockupstream/tts_ws.go: auto-detect Pipecat/Cartesia request format (transcript + voice.id) and respond with base64-encoded JSON chunk frames instead of raw binary; simple protocol unchanged
- mockupstream/stt_ws_test.go: add TestSTTWebSocket_KeepAliveIgnored and TestSTTWebSocket_FinalizeTriggersImmediateFinal
- mockupstream/tts_ws_test.go: add TestTTSWebSocket_PipecatProtocol
- benchmarks/frameworks/promptkit/round2: new standalone Go WebSocket server that coordinates STT→LLM→TTS using stdlib + gorilla/websocket (no SDK dependency), measuring raw Go runtime performance
- go.work: add round2 module
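
The control-message handling described for the STT read loop can be sketched as a small classifier over text frames. This assumes the messages are JSON objects with a `type` field (for example `{"type":"KeepAlive"}`); the `action` type and `classify` function are hypothetical names for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// action is what the read loop should do after a text frame (hypothetical type).
type action int

const (
	actionIgnore   action = iota // KeepAlive, unknown, or malformed frames
	actionFinalize               // emit a final transcript, keep the socket open
	actionClose                  // emit a final transcript, then close
)

// classify inspects the assumed "type" field of a JSON control message.
func classify(msg []byte) action {
	var ctl struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(msg, &ctl); err != nil {
		return actionIgnore
	}
	switch ctl.Type {
	case "Finalize":
		return actionFinalize
	case "CloseStream":
		return actionClose
	default:
		return actionIgnore
	}
}

func main() {
	for _, m := range []string{`{"type":"KeepAlive"}`, `{"type":"Finalize"}`, `{"type":"CloseStream"}`} {
		fmt.Println(m, "->", classify([]byte(m)))
	}
}
```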

…ipeline Replace Pipecat framework-specific implementation with a raw Python asyncio equivalent that coordinates STT → LLM → TTS using the same WebSocket protocol as the PromptKit round2 server. This measures Python's async runtime overhead for voice pipeline coordination — the fairest comparison: same protocol, same pipeline logic, different language runtime.

…iver
- Real Pipecat framework using FastAPIWebsocketTransport for multi-client
- Protobuf frame serialization (AudioRawFrame) matching Pipecat's wire format
- Generated Go protobuf bindings from Pipecat's frames.proto
- Updated mock TTS to include context_id for Cartesia protocol compat
- Updated mock STT/TTS to use catch-all handlers for SDK URL decoration
- Added python-asyncio as separate framework (renamed from pipecat)
- Harness round2-pipecat mode with 440Hz sine wave audio for VAD triggering
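
A 440Hz test tone of the kind mentioned in the last item can be generated as 16-bit little-endian PCM. This is a sketch under assumed parameters: the 16kHz sample rate, 20ms frame size, and 0.5 amplitude are guesses, and `sinePCM16` is a hypothetical helper, not the harness's function.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// sinePCM16 returns numSamples of a mono sine tone as 16-bit little-endian PCM.
// amplitude is a fraction of full scale and is expected to be in [0, 1].
func sinePCM16(freqHz float64, sampleRate, numSamples int, amplitude float64) []byte {
	buf := make([]byte, numSamples*2)
	for i := 0; i < numSamples; i++ {
		v := amplitude * math.Sin(2*math.Pi*freqHz*float64(i)/float64(sampleRate))
		sample := int16(v * math.MaxInt16)
		binary.LittleEndian.PutUint16(buf[i*2:], uint16(sample))
	}
	return buf
}

func main() {
	// 320 samples at an assumed 16 kHz is one 20 ms frame.
	frame := sinePCM16(440, 16000, 320, 0.5)
	fmt.Println("frame bytes:", len(frame))
}
```

A steady tone carries enough energy to trip a voice activity detector, whereas silence or near-silence would never start a turn.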

…entations
- GenKit: Google's Go-based AI framework, OpenAI-compatible plugin pointing at mock upstream. Benchmarked Round 1 at 10-10k concurrent.
- LiveKit Agents: Python voice pipeline framework with fake I/O pattern (bypasses LiveKit server). FastAPI WebSocket wrapper for harness compat. Round 2 deferred: needs mock HTTP STT/TTS endpoints (/audio/transcriptions, /audio/speech) which aren't implemented yet.
- Renamed python-asyncio framework (was incorrectly in pipecat/ directory)

…sults
- Mock upstream now serves /v1/audio/transcriptions (Whisper-compatible) and /v1/audio/speech (TTS) on the OpenAI port for LiveKit Agents compat
- LiveKit Agents benchmarked at 1-100 concurrent voice sessions
- Remove GenKit from Round 1 comparison (easily optimizable HTTP client config)

sonarqubecloud · 2026-04-07T16:58:23Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

chaholl added 24 commits April 7, 2026 12:33

feat(benchmarks): add mock OpenAI SSE server

f0629ba

fix(benchmarks): remove duplicate wsUpgrader from stt_ws.go

dba114a

feat(benchmarks): add mock upstream CLI entry point

3fffb0e

feat(benchmarks): add resource sampler (RSS/CPU via ps)

62c2d07
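
Sampling RSS and CPU via `ps` might look roughly like the sketch below. The function names are hypothetical and the sampler's real flags are not shown in this PR; passing `-o rss=` and `-o %cpu=` as separate options is one portable way to suppress headers on both macOS and Linux.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// parsePS parses "<rss_kb> <cpu_pct>" as printed by ps with empty headers.
func parsePS(out string) (rssKB int64, cpuPct float64, err error) {
	fields := strings.Fields(out)
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("unexpected ps output %q", out)
	}
	rssKB, err = strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	cpuPct, err = strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, 0, err
	}
	return rssKB, cpuPct, nil
}

// sample shells out to ps for one process and returns RSS (KiB) and CPU (%).
func sample(pid int) (int64, float64, error) {
	out, err := exec.Command("ps", "-o", "rss=", "-o", "%cpu=", "-p", strconv.Itoa(pid)).Output()
	if err != nil {
		return 0, 0, err
	}
	return parsePS(string(out))
}

func main() {
	rss, cpu, err := sample(os.Getpid())
	if err != nil {
		fmt.Println("sample failed:", err)
		return
	}
	fmt.Printf("rss=%d KiB cpu=%.1f%%\n", rss, cpu)
}
```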

feat(benchmarks): add metrics collection and percentile aggregation

43e20fe

feat(benchmarks): add Round 1 streaming benchmark driver

d5bbc90

feat(benchmarks): add Round 2 voice pipeline benchmark driver

aade12a

feat(benchmarks): add harness CLI entry point

8615917

feat(benchmarks): add LangChain and Pipecat framework implementations

01bfdd9

feat(benchmarks): add PromptKit Round 1 streaming framework implement…

7e85378

fix(benchmarks): compute throughput and wall clock in harness CLI

b7800f7

fix(benchmarks): use valid PromptPack schema in benchmark chat pack

b4abb58

feat(benchmarks): add Strands Agents framework implementation

20a80a3

chaholl mentioned this pull request Apr 7, 2026

perf: investigate 930 rps streaming throughput ceiling #916

Closed

chaholl merged commit 1a508b8 into main Apr 7, 2026
32 checks passed

chaholl deleted the feat/benchmark-suite branch April 18, 2026 17:49
