Add adaptive backpressure and bound librdkafka internal memory#32
Merged
Conversation
- **Proportional batch sizing**: consume batch size scales linearly from `CONSUME_BATCH_SIZE` (buffer empty) down to 10 (buffer at the flush threshold). Smooths throughput during catchup and traffic spikes.
- **Bound librdkafka memory**: `queued.max.messages.kbytes=16384` (16 MB per partition) prevents librdkafka from pre-fetching unbounded data. This is the actual OOM prevention; batch sizing is throughput smoothing.
- **New metrics**: `millpond_buffer_fullness`, `millpond_consume_batch_size_current`.
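The linear scaling described above can be sketched as follows. Constant values and the function name are illustrative assumptions, not millpond's actual code:

```python
# Hedged sketch of proportional batch sizing. CONSUME_BATCH_SIZE's value
# here is an assumed placeholder; the floor of 10 comes from the PR text.
CONSUME_BATCH_SIZE = 500  # assumed full-speed batch size (buffer empty)
MIN_BATCH_SIZE = 10       # batch size at the flush threshold

def adaptive_batch_size(pending_bytes: int, flush_bytes: int) -> int:
    """Scale linearly from CONSUME_BATCH_SIZE (empty buffer) down to
    MIN_BATCH_SIZE (buffer at or past the flush threshold)."""
    fullness = pending_bytes / flush_bytes if flush_bytes else 0.0
    fullness = min(fullness, 1.0)  # fullness can exceed 1.0 past the threshold
    size = CONSUME_BATCH_SIZE - fullness * (CONSUME_BATCH_SIZE - MIN_BATCH_SIZE)
    return max(MIN_BATCH_SIZE, int(size))
```

One formula covers catchup (empty buffer, big batches), steady state, and bursts (full buffer, tiny batches), with no mode switching.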
Pull request overview
Adds adaptive consume backpressure and caps librdkafka’s internal consumer queue to reduce OOM risk during catchup and traffic spikes.
Changes:
- Introduces proportional batch sizing based on pending buffer fullness (`millpond/backpressure.py`) and wires it into the main consume loop.
- Adds a librdkafka queue memory bound via `queued.max.messages.kbytes`, with tests for it.
- Adds Prometheus gauges for buffer fullness and current adaptive batch size, plus documentation updates.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `millpond/backpressure.py` | New adaptive batch sizing logic + metric emission. |
| `millpond/main.py` | Uses adaptive batch size when calling `consumer.consume()`. |
| `millpond/consumer.py` | Sets `queued.max.messages.kbytes` to bound internal buffering. |
| `millpond/metrics.py` | Adds gauges for buffer fullness and current batch size. |
| `tests/unit/test_backpressure.py` | Unit tests for batch sizing + metric updates. |
| `tests/unit/test_consumer.py` | Unit test asserting the consumer config includes the queue bound. |
| `README.md` | Documents adaptive backpressure behavior and metrics. |
| `AGENT.md` | Adds design notes for adaptive backpressure and metrics. |
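The two gauges added in `millpond/metrics.py` might look like the following sketch using `prometheus_client`; the metric names come from the PR, while the help strings and set values are illustrative assumptions:

```python
# Sketch of the PR's two new Prometheus gauges (prometheus_client).
from prometheus_client import Gauge

BUFFER_FULLNESS = Gauge(
    "millpond_buffer_fullness",
    "Ratio of pending buffered bytes to the flush size threshold",
)
CONSUME_BATCH_SIZE_CURRENT = Gauge(
    "millpond_consume_batch_size_current",
    "Current adaptive consume batch size",
)

# Example values: a buffer 42% full and the batch size computed from it.
BUFFER_FULLNESS.set(0.42)
CONSUME_BATCH_SIZE_CURRENT.set(294)
```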
- Remove unused logger from backpressure.py
- Clamp max_batch_size to MIN_BATCH_SIZE in init()
- Use setdefault for queued.max.messages.kbytes (allow env override)
- Fix README formula to include int()
- Align docs: backpressure is throughput smoothing, not OOM prevention
- Fix AGENT.md: buffer_fullness can exceed 1.0
- Merge conflict resolution: include offset resume tests from main
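The setdefault change above (applying the queue bound only when no override is supplied) might look like this sketch; the function name and the idea of passing overrides in are assumptions, not millpond's actual API:

```python
# Hedged sketch: apply the 16 MB per-partition bound as a default,
# but let an externally supplied value (e.g. from the environment) win.
def build_consumer_config(overrides=None):
    conf = dict(overrides or {})
    conf.setdefault("queued.max.messages.kbytes", 16384)  # 16 MB cap
    return conf
```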
Summary
Two complementary mechanisms to prevent OOM during catchup and traffic spikes:
Adaptive batch sizing (`backpressure.py`)

Consume batch size scales proportionally with buffer fullness: buffer empty → consume at full speed; buffer approaching the flush threshold → consume in tiny batches. No state machine, no mode switching. Handles catchup, steady state, and bursts with one formula.
This is a throughput-smoothing mechanism — it controls how fast we dequeue from librdkafka's internal buffer.
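Wiring this into the consume loop might look like the sketch below. `consume(num_messages=..., timeout=...)` follows confluent-kafka's Consumer signature; every other name (`consume_once`, `sizer`, the buffer methods) is an assumption for illustration:

```python
# Hedged sketch of one consume-loop iteration: size the batch from
# current buffer fullness, then dequeue at most that many messages.
def consume_once(consumer, buffer, sizer, timeout=1.0):
    batch = sizer(buffer.pending_bytes())          # recompute every iteration
    for msg in consumer.consume(num_messages=batch, timeout=timeout):
        buffer.append(msg)
    return batch
```

Because the size is recomputed each iteration, backpressure adapts within a single catchup pass rather than per rebalance or per restart.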
librdkafka memory bound (`consumer.py`)

`queued.max.messages.kbytes=16384` (16 MB per partition). This is the OOM prevention mechanism. Without it, librdkafka pre-fetches up to 64 MB per partition regardless of consume rate. With 8 partitions (prod at 64 replicas), that's 512 MB of uncontrolled internal buffering. Now capped at 128 MB.

New metrics

- `millpond_buffer_fullness` — ratio of pending bytes to flush size (0.0 = empty, 1.0 = flush threshold)
- `millpond_consume_batch_size_current` — current adaptive batch size

Test plan