A topic index of streaming protocols, platforms, and processing engines tracked by API Evangelist.
This repository is the landscape map for streaming as a distinct discipline from events (discrete event envelopes) and async-apis (the AsyncAPI specification). Streaming is the real-time, high-volume, often bidirectional pipe between producers and consumers — partitioned, replayable, ordered (per-partition), and increasingly the spine of operational AI, observability, and customer-facing real-time experiences.
| | Streaming | Events | Async APIs |
|---|---|---|---|
| Primary artifact | A partitioned, replayable log / pipe | A discrete event envelope (CloudEvents) | A contract document (AsyncAPI spec) |
| Ordering | Per-partition / per-key | Per producer | Per channel |
| Volume | High-throughput, sustained | Bursty, business-significant | Defined by the contract |
| Replay | Yes (offsets, sequence numbers) | Sometimes (event store) | Out of scope |
| Bidirectional | Often (WebSocket, gRPC bidi) | No | Defined per channel |
| Examples here | Kafka, Kinesis, WebSocket, Flink | CloudEvents, Webhooks | AsyncAPI 3.0 |
The three topics overlap (CloudEvents flow over Kafka, AsyncAPI describes Kafka channels), but the primary operational concerns differ enough that they get separate catalogs.
The reference architecture: an append-only, partitioned log with consumer-tracked offsets. Records are durable, replayable, and ordered within a partition.
- Apache Kafka — the reference open-source streaming platform; topics, partitions, consumer groups, exactly-once, KRaft consensus.
- Apache Pulsar — cloud-native, multi-tenant, with BookKeeper-backed storage, tiered storage offload, and native geo-replication.
- Redpanda — Kafka-API-compatible, written in C++, single binary, no ZooKeeper/JVM.
- NATS JetStream — the persistence layer built into NATS; adds durable, replayable streams for edge, IoT, and microservice topologies.
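The log model shared by these platforms can be sketched in a few lines. This is a toy, in-memory illustration only, not any platform's actual implementation: the `PartitionedLog` class, its method names, and the CRC32 key hash are all assumptions made for the example. It shows why ordering is guaranteed per key (same key, same partition) and why replay is just a matter of moving a consumer-tracked offset.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class PartitionedLog:
    """Toy append-only log (hypothetical): same key always lands in the same partition."""

    def __init__(self, num_partitions: int = 3) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]
        # Offsets live outside the log, keyed by (consumer group, partition).
        self.committed: Dict[Tuple[str, int], int] = defaultdict(int)

    def partition_for(self, key: str) -> int:
        # A stable hash keeps per-key ordering; real platforms use their own partitioners.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group: str, partition: int, max_records: int = 10):
        # Reading never removes records; it only slices from the committed offset.
        start = self.committed[(group, partition)]
        return start, self.partitions[partition][start:start + max_records]

    def commit(self, group: str, partition: int, next_offset: int) -> None:
        self.committed[(group, partition)] = next_offset

    def seek(self, group: str, partition: int, offset: int) -> None:
        # Replay: rewind the group's position; the records are still in the log.
        self.committed[(group, partition)] = offset
```

A consumer would call `poll`, process the batch, then `commit` the next offset; calling `seek(group, partition, 0)` replays the partition from the beginning.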
The hyperscalers offer fully managed log-structured or pub/sub services, often with Kafka-protocol compatibility.
- Amazon Kinesis — Data Streams (shards), Firehose (delivery), Video Streams (media), HTTP/2 SubscribeToShard.
- Google Cloud Pub/Sub + Dataflow — managed pub/sub with at-least-once and optional exactly-once delivery, feeding Beam pipelines on Dataflow.
- Azure Event Hubs — Kafka-protocol compatible with Capture to ADLS / Blob.
- Confluent Cloud — managed Kafka by Kafka's original authors, plus ksqlDB, Schema Registry, Stream Governance, and Flink.
- StreamNative — managed Apache Pulsar from Pulsar's contributors.
The protocols an API consumer actually speaks when subscribing to a real-time feed. These are what end-users see in an SDK — the platform tier above is what an operator runs.
- Server-Sent Events (SSE) — one-way HTTP streaming using `text/event-stream`; the de facto delivery for LLM token streams and live dashboards.
- WebSocket — full-duplex, bidirectional, RFC 6455; foundation for chat, collaborative apps, market data, and real-time control planes.
- gRPC streaming — server-streaming, client-streaming, and bidirectional RPC over HTTP/2; the default for service-to-service streaming and Kubernetes-native APIs.
- GraphQL Subscriptions — GraphQL operation type for typed streams, typically over WebSocket via `graphql-ws`.
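To make the simplest of these wire formats concrete, here is a minimal sketch of a `text/event-stream` parser. It follows the basic framing rules (fields as `name: value`, a blank line dispatches the event, lines starting with `:` are comments) but is not a complete client: the function name `parse_sse` is hypothetical, and `retry` handling and reconnection are left out.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per dispatched event from text/event-stream lines."""
    event_type = ""
    data_lines: List[str] = []
    last_id = ""  # the last event ID persists across events
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carried data.
            if data_lines:
                yield {
                    "event": event_type or "message",
                    "data": "\n".join(data_lines),
                    "id": last_id,
                }
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is stripped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value
        # "retry" and unknown fields are ignored in this sketch.
```

The function accepts any iterable of lines, so it can be fed from a file, a list in a test, or a line iterator over an HTTP response body.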
Streams aren't useful unless data flows into and out of them. The connector tier ingests from databases, file systems, and SaaS APIs and delivers to warehouses, search indexes, and lakes.
- Kafka Connect — source/sink connector framework; distributed mode is a REST-controlled cluster of workers.
- Debezium — change-data-capture for Postgres, MySQL, MongoDB, SQL Server, Oracle, Cassandra; reads each database's native replication log and emits row-level change events into Kafka.
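As a rough illustration of what a consumer does with row-level change events, the sketch below applies Debezium-style envelopes to an in-memory table. It assumes the envelope has already been unwrapped to its `op`, `before`, and `after` fields (with schemas enabled these sit under a `payload` key), and that rows carry a primary key named `id`; the `apply_change` helper and the `key_field` parameter are hypothetical names for this example.

```python
from typing import Any, Dict


def apply_change(table: Dict[Any, dict], event: dict, key_field: str = "id") -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, and update all carry the new row state in "after".
        row = event["after"]
        table[row[key_field]] = row
    elif op == "d":
        # delete carries the last known row state in "before".
        row = event["before"]
        table.pop(row[key_field], None)
    else:
        raise ValueError(f"unhandled op: {op!r}")


replica: Dict[Any, dict] = {}
changes = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "b@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "c@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "c@example.com"}, "after": None},
]
for change in changes:
    apply_change(replica, change)
```

Because the events for a given row arrive in order within a partition, replaying them from the start of the topic rebuilds the same replica.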
Once you have a stream, you process it: filter, enrich, join, window, materialize. This tier defines the event-time / watermark / window / trigger semantics that make stream processing correct.
- Apache Flink — distributed stateful stream processing; SQL, DataStream, Table APIs; reference engine for event-time, watermarks, and exactly-once state.
- Spark Structured Streaming — micro-batch (and experimental continuous) processing on Spark SQL; "stream as unbounded table."
- Materialize — operational data warehouse maintaining incrementally updated materialized views via Differential Dataflow.
- Tinybird — real-time analytics on ClickHouse, exposing SQL pipes as parameterized HTTP API endpoints.
- Bytewax — Python-native stream processing on Timely Dataflow; built for data scientists and ML teams.
- Apache Beam — unified batch + streaming model whose pipelines run on Dataflow, Flink, Spark, and Samza runners; defines the canonical streaming semantics.
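The event-time, watermark, and window vocabulary is easier to follow with a small example. The sketch below counts events in tumbling windows and fires a window once a watermark passes its end. It is a single-process simplification and not how any of the engines above is implemented: the function name `tumbling_counts`, the bounded out-of-orderness watermark (`max event time seen` minus `max_out_of_order`), and the policy of dropping late events are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_counts(
    events: Iterable[Tuple[int, str]], size: int, max_out_of_order: int
) -> Iterator[Tuple[int, int, int]]:
    """Yield (window_start, window_end, count) once the watermark reaches window_end."""
    open_windows: Dict[int, int] = defaultdict(int)
    watermark = float("-inf")
    for ts, _value in events:
        start = ts - ts % size  # window is [start, start + size)
        if start + size <= watermark:
            continue  # late event: its window has already fired, so drop it
        open_windows[start] += 1
        # The watermark trails the largest event time seen by the allowed disorder.
        watermark = max(watermark, ts - max_out_of_order)
        for s in sorted(w for w in open_windows if w + size <= watermark):
            yield s, s + size, open_windows.pop(s)
    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        yield s, s + size, open_windows.pop(s)


events = [(1, "a"), (4, "b"), (12, "c"), (3, "late-ok"),
          (17, "d"), (2, "too-late"), (25, "e")]
results = list(tumbling_counts(events, size=10, max_out_of_order=5))
```

In this run the event at time 3 arrives out of order but is still counted, because the watermark has not yet passed the end of its window; the event at time 2 arrives after that window fired and is dropped. Production engines offer other choices for late data, such as side outputs or updated results.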
| File | Purpose |
|---|---|
| `apis.yml` | APIs.json catalog of every platform, protocol, and engine indexed here, with links to docs and topic-level API Evangelist repos. |
| `vocabulary/streaming-vocabulary.yml` | Domain vocabulary — stream, topic, partition, offset, watermark, window, exactly-once, CDC, schema registry, and the rest. |
| `json-schema/streaming-stream-schema.json` | JSON Schema for a Stream — a topic / shard / partitioned log as it appears in a catalog. |
| `json-schema/streaming-stream-record-schema.json` | JSON Schema for a StreamRecord — the unit of data on a stream, generalized across Kafka, Pulsar, Kinesis, Pub/Sub. |
| `json-schema/streaming-stream-platform-schema.json` | JSON Schema for a StreamPlatform — catalog record for a Kafka, Pulsar, Kinesis, etc. installation. |
| `json-structure/streaming-stream-structure.json` | JSON Structure description of a Stream record for control-plane catalog APIs. |
| `json-ld/streaming-context.jsonld` | JSON-LD context mapping the streaming vocabulary into schema.org-compatible linked data. |
| `examples/` | Reference payloads for a Stream, a Stream Record, and a Stream Platform record. |
- events — discrete event envelopes, CloudEvents, webhook patterns.
- async-apis — AsyncAPI specification and async API contracts.
- asyncapi — the AsyncAPI Initiative itself.
- webhooks — server-to-server callback patterns.
- cloudevents — the CloudEvents specification.
- kafka / apache-kafka — the Kafka project.
- apache-flink, materialize, amazon-kinesis, kafka-connect — provider-level repos for individual streaming products.
Maintained by Kin Lane and the API Evangelist Network. Contributions and corrections welcome via PR.