Hands‑on lab to learn end‑to‑end streaming with:
- PostgreSQL logical replication + Outbox pattern
- Debezium (Kafka Connect) Change Data Capture (CDC)
- Kafka topics (CDC envelopes vs direct app events)
- Python producer & consumer building a materialized projection
- Optional ksqlDB / future Kafka Streams extension
## Stack

- Docker Compose orchestrates services
- Apache Kafka (broker + Zookeeper)
- Kafka Connect (Debezium Postgres connector)
- PostgreSQL 15 (orders + outbox tables + projection table)
- Schema Registry (placeholder for future Avro use)
- Python producer & consumer containers (or run locally)
- Optional: ksqlDB server (enable via profile `ksql`)
## Learning Goals

- Capture DB changes as immutable Kafka events (CDC).
- Compare direct app-produced JSON vs Debezium envelopes.
- Build a read model (`order_totals`) from a stream.
- Replay state by truncating and re-consuming.
- Inspect replication artifacts (slot, publication) & message metadata.
## Services

- db: primary Postgres OLTP (orders, order_items)
- connect: Kafka Connect + Debezium for CDC of public schema tables
- producer: simple Python app generating Orders
- consumer: Python app consuming enriched order events and writing aggregates
- streams-app: Java (Kafka Streams) performing join & aggregation (optional)
- schema-registry: Confluent Schema Registry (Avro serialization)
## Data Flow

- Producer inserts into Postgres (`orders` table) and also writes an outbox event (`orders_outbox`); see the sketch after this list.
- Debezium captures row-level changes -> emits to Kafka topics (`dbserver1.public.orders`, ..., `dbserver1.public.outbox_events`).
- Streams app / ksqlDB transforms raw CDC events into domain events topic (orders.events.v1).
- Consumer aggregates total order value per customer -> writes to table order_totals.
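A minimal sketch of the first step, the transactional outbox write, assuming `psycopg2` and illustrative column names (check `db/init/01_schema.sql` for the real schema):

```python
import json
import uuid

import psycopg2


def create_order(conn, customer_id: str, amount: float) -> None:
    """Write the order row and its outbox event in ONE transaction,
    so Debezium either sees both changes or neither."""
    order_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on exception
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (id, customer_id, amount) VALUES (%s, %s, %s)",
                (order_id, customer_id, amount),
            )
            cur.execute(
                "INSERT INTO orders_outbox (id, aggregate_id, event_type, payload) "
                "VALUES (%s, %s, %s, %s)",
                (
                    str(uuid.uuid4()),
                    order_id,
                    "OrderCreated",
                    json.dumps({"order_id": order_id,
                                "customer_id": customer_id,
                                "amount": amount}),
                ),
            )


if __name__ == "__main__":
    # Connection parameters are assumptions; match your compose file.
    conn = psycopg2.connect(host="localhost", port=5432, dbname="postgres",
                            user="postgres", password="postgres")
    create_order(conn, "c-1", 42.50)
```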
## Prerequisites

- Docker & Docker Compose (Docker Desktop or another Docker engine) running
- Java 17 (only for the optional Streams app)
- Python 3.11+ (if running producer/consumer locally)
- (Optional) `jq` for pretty JSON
## Quick Start

```bash
git clone <this repo>   # if not already
cd kafka-playground

# 1. Start the full stack (starting only Postgres first is no longer required):
docker compose up -d --build

# 2. Register Debezium connector (creates slot + publication + topics)
./scripts/register_connector.sh   # or: bash scripts/register_connector.sh

# 3. Verify connector RUNNING
curl -s http://localhost:8083/connectors/orders-outbox-connector/status

# 4. List topics (should see dbserver1.public.* after first inserts)
docker compose exec kafka kafka-topics --bootstrap-server localhost:9092 --list

# 5. (Optional) Run producer locally instead of container
pip install -r services/python/requirements.txt
python services/python/producer.py

# 6. (Optional) Run consumer locally
python services/python/consumer.py
```

The included producer/consumer containers also start automatically (see `docker compose ps`).
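For orientation, `scripts/register_connector.sh` essentially POSTs a connector config to the Kafka Connect REST API. A Python sketch of the same call, with illustrative values (the script is the source of truth for the actual settings):

```python
# Illustrative sketch of what scripts/register_connector.sh does: POST a
# Debezium Postgres connector config to Kafka Connect. All values below
# are assumptions; check the script for the real ones.
import requests

config = {
    "name": "orders-outbox-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",          # Postgres logical decoding plugin
        "database.hostname": "db",
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "postgres",
        "topic.prefix": "dbserver1",        # yields dbserver1.public.* topics
        "table.include.list": "public.orders,public.outbox_events",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=config, timeout=10)
resp.raise_for_status()
print(resp.json())
```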
## Verify CDC

1. Confirm Postgres replication settings:
   ```bash
   docker compose exec db psql -U postgres -c "SHOW wal_level;"
   ```
   Expect: `logical`.
2. Check replication slot & publication:
   ```bash
   docker compose exec db psql -U postgres -c "SELECT slot_name, plugin, active FROM pg_replication_slots;"
   docker compose exec db psql -U postgres -c "SELECT pubname FROM pg_publication;"
   ```
3. Insert an order if the producer is not running (see `db/init/02_seed.sql` for example rows).
4. Consume a CDC message:
   ```bash
   docker compose exec kafka kafka-console-consumer --bootstrap-server localhost:9092 \
     --topic dbserver1.public.orders --from-beginning --max-messages 1
   ```
   Look for JSON with `"op":"c"` (create) and the `source` metadata block. A Python sketch for decoding the envelope follows below.
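To poke at a Debezium envelope programmatically, here is a minimal sketch assuming the `confluent-kafka` client and a broker reachable on `localhost:9092`:

```python
import json

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "cdc-inspect",          # throwaway group just for inspection
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["dbserver1.public.orders"])

msg = consumer.poll(timeout=10.0)
if msg is not None and msg.error() is None:
    envelope = json.loads(msg.value())
    # JSON converter wraps the event in {"schema": ..., "payload": ...}
    payload = envelope.get("payload", envelope)
    print("op:", payload["op"])          # "c"=create, "u"=update, "d"=delete, "r"=snapshot read
    print("after:", payload["after"])    # row state after the change
    print("source:", payload["source"])  # lsn, table, tx metadata
consumer.close()
```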
## Run Producer & Consumer

Container mode (already running):

```bash
docker compose logs -f consumer | head
```

Local mode (stop the containerized ones first if you wish):

```bash
python services/python/producer.py
python services/python/consumer.py
```

Projection table: `order_totals` (sum + count per customer). Inspect:

```bash
docker compose exec db psql -U postgres -c "SELECT * FROM order_totals;"
```

Replay (rebuild from earliest offsets):

```bash
docker compose exec db psql -U postgres -c "TRUNCATE order_totals;"
python services/python/consumer.py   # restart consumer; it will re-consume
```
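The consumer's aggregation step is conceptually a per-customer upsert. A sketch with illustrative column names (the real logic lives in `services/python/consumer.py`):

```python
# Assumes order_totals has a UNIQUE constraint on customer_id.
def apply_order(cur, customer_id: str, amount: float) -> None:
    # Additive upsert: one row per customer, accumulating sum and count.
    cur.execute(
        """
        INSERT INTO order_totals (customer_id, total_amount, order_count)
        VALUES (%s, %s, 1)
        ON CONFLICT (customer_id) DO UPDATE
        SET total_amount = order_totals.total_amount + EXCLUDED.total_amount,
            order_count  = order_totals.order_count + 1
        """,
        (customer_id, amount),
    )
```

After `TRUNCATE order_totals`, re-consuming from the earliest offset rebuilds the projection. Note that a plain additive upsert double-counts duplicate deliveries; making it idempotent is exactly what idea 6 under "Extending the Playground" tests for.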
## Topics

| Purpose | Topic |
|---|---|
| Debezium CDC orders | dbserver1.public.orders |
| Debezium CDC outbox | dbserver1.public.outbox_events |
| Heartbeat | __debezium-heartbeat.dbserver1 |
| Connect internal | connect-configs, connect-offsets, connect-status |
| Direct published events (bypass CDC) | outbox.events |
## Scripts

| Script | Purpose |
|---|---|
| `scripts/register_connector.sh` | Registers the Debezium Postgres connector |
| `scripts/check_env.sh` | Quick diagnostic (replication settings, topics, connector) |
## Handy Commands

```bash
# Show running services
docker compose ps

# List topics (all)
docker compose exec kafka kafka-topics --bootstrap-server localhost:9092 --list

# Consume N CDC messages
docker compose exec kafka kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic dbserver1.public.orders --from-beginning --max-messages 5
```

## Repository Layout

```
scripts/
  register_connector.sh
  check_env.sh
db/
  init/
    01_schema.sql
    02_seed.sql
services/
  python/
    requirements.txt
    producer.py
    consumer.py
  streams-app/
    build.gradle
    settings.gradle
    src/main/java/... (Streams code)
```
## Troubleshooting
| Symptom | Check / Fix |
|---------|-------------|
| Connector 400: wal_level must be logical | Ensure `db` service includes the `command` with `wal_level=logical`; `docker compose down -v && up -d` |
| No `dbserver1.public.*` topics | Insert a new row; ensure connector RUNNING; list topics without grep |
| Connect logs show broker errors | Ensure `KAFKA_LISTENERS` & `KAFKA_ADVERTISED_LISTENERS` both set; restart kafka & connect |
| Projection not updating | Confirm consumer running; check its stdout/logs; ensure topic offsets not exhausted |
| Want a clean slate | `docker compose down -v` removes volumes (loses DB data) |
| Disk bloat during experimentation | `docker system prune -f` |
Manual recovery steps for Docker Desktop hiccups and Debezium snapshot issues are covered by the checks in the table above.
## Extending the Playground
Short next-task ideas:
1. Remove direct publish to `outbox.events` and derive everything from CDC.
2. Introduce Avro + Schema Registry (switch producer serializer & consumer deserializer).
3. Add Kafka Streams / ksqlDB to project a cleaner `orders.events.v1` topic.
4. Add a dead-letter topic for consumer failures.
5. Add a replay script that rebuilds `order_totals` purely from CDC topics.
6. Add tests (Python) for aggregation logic (idempotency / replay); see the sketch below.
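A possible starting point for idea 6, assuming the aggregation can be factored into a pure reducer (`reduce_orders` here is hypothetical; adapt it to however `consumer.py` is actually structured):

```python
# Hypothetical pure reducer mirroring the order_totals projection.
def reduce_orders(events):
    totals = {}
    for e in events:
        entry = totals.setdefault(e["customer_id"], {"sum": 0.0, "count": 0})
        entry["sum"] += e["amount"]
        entry["count"] += 1
    return totals


def test_replay_is_deterministic():
    events = [
        {"customer_id": "c-1", "amount": 10.0},
        {"customer_id": "c-2", "amount": 5.0},
        {"customer_id": "c-1", "amount": 2.5},
    ]
    first = reduce_orders(events)
    replayed = reduce_orders(events)  # same input -> same projection
    assert first == replayed
    assert first["c-1"] == {"sum": 12.5, "count": 2}
```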
## Cleanup
```bash
docker compose down -v   # stop everything and remove volumes (loses DB data)
```
## Future Ideas

- Add schema evolution (alter orders table, update Avro schema).
- Add dead letter topic handling.
- Switch broker to KRaft mode (remove Zookeeper).
- Replace Python consumer with Materialize or ClickHouse sink.

## License

MIT