perf: O(1) qsize, AWS Full Jitter, buffered recordings by TexasCoding · Pull Request #111 · TexasCoding/kalshi-python-sdk

TexasCoding · 2026-05-17T11:42:03Z

Summary

Three disjoint performance fixes from the v1.2 audit:

Performance: MessageQueue.qsize() is O(n) linear scan #103 MessageQueue.qsize() is now O(1) via a counter updated on put/get/__anext__/DROP_OLDEST eviction (sentinel never counts). Picked the counter approach per task hint.
Performance: REST retry jitter is additive, not proportional (AWS Full Jitter) #104 REST retry backoff switched to AWS Full Jitter — uniform(0, min(cap, base * 2**attempt)). Cap applied before randomization so the upper bound never exceeds retry_max_delay.
Performance: RecordingTransport does O(N²) work per recording session #105 RecordingTransport rewrite: new _RecorderBuffer lazy-loads existing on-disk pairs once per (method, path), tracks a dirty set, flushes only touched keys on close. Cross-session append semantics preserved. Was O(N²) per session; now O(1) amortized per request.

Test deltas

Issue	New test file	Note
#103	`tests/ws/test_backpressure.py` (+)	Counter correctness across put/get/drop
#104	`tests/test_backoff.py` (new)	Verifies upper-bound invariant; deterministic via `random.seed`
#105	`tests/test_mock_transport.py` (+)	Counter-based regression: exactly 1 `save_pairs` call for 50 sequential requests (no wall-clock)

Full suite: 1413 passed (+6), 48 skipped. ruff + mypy --strict clean.

Closes #103
Closes #104
Closes #105

Test plan

uv run pytest tests/ --ignore=tests/integration -v
uv run ruff check .
uv run mypy kalshi/

Replace the O(n) linear scan over the deque (subtracting the at-most-one sentinel) with an integer counter incremented in put() and decremented in get()/__anext__() and on DROP_OLDEST eviction. Measured 600x regression at maxsize=1000 in the audit becomes a single attribute read. Sentinel never counts toward the running total, so qsize() continues to report logical occupancy unchanged. Regression test exercises put, get, DROP_OLDEST eviction, and put_sentinel paths to pin the counter against logical occupancy. Closes #103 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the fixed-magnitude additive jitter (base * 2**attempt + uniform(0, 0.5)) with AWS "Full Jitter": uniform(0, min(cap, base * 2**attempt)). The previous formula collapsed to <3% jitter at higher attempts and forced every parallel client into a narrow ~0.5s window on attempt 0, defeating the purpose of jitter during retry storms. Full Jitter spreads colliding retries evenly across the entire backoff window. retry_max_delay continues to cap the exponential term before randomization. Tests assert: every sample stays within [0, capped_upper]; attempt-0 samples cover the full window with distinct values; the cap clamps high attempts. Closes #104 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

RecordingTransport / AsyncRecordingTransport previously did load_pairs -> append -> save_pairs (with atomic .tmp rename) for every single request. With N sequential requests to one endpoint that's O(N^2) work over the session: the last request re-parses and re-writes all 999 prior pairs. Switch to a session-scoped _RecorderBuffer that lazily loads any existing on-disk pairs once per (method, path), appends to an in-memory list, and flushes only the dirty keys to disk on close()/aclose(). Re-recording into the same fixtures dir still extends prior pairs (load-once preserves the original semantics). Regression test counts save_pairs invocations: 50 requests to one endpoint must produce exactly one save_pairs call at flush time, holding a list of 50 pairs. Second test verifies cross-session append safety. Closes #105 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude · 2026-05-17T11:45:42Z

Code Review — PR #111: O(1) qsize, AWS Full Jitter, buffered recordings

Overall this is a solid, well-scoped performance PR. Each fix is independently motivated, the test coverage matches the changes, and the implementation choices are sound. A few things worth calling out.

#103 — O(1) `qsize()` counter (`kalshi/ws/backpressure.py`)

Correct and well-executed. The counter is updated on every path that mutates the buffer (put, DROP_OLDEST eviction, get, __anext__), and the sentinel is correctly excluded from the count because:

put_sentinel() doesn't increment _size
get() and __anext__() raise StopAsyncIteration before decrementing when they hit the sentinel

Since asyncio is single-threaded, there are no race conditions within the hot path (no await between eviction and append).

Minor gap: There's no test that mixes get() and __anext__() calls on the same queue. Both paths decrement _size, and while they're identical in structure, a quick combined exercise would close that coverage hole.

#104 — AWS Full Jitter (`kalshi/_base_client.py`)

The formula is correctly implemented — cap is applied before randomization, so the upper bound never exceeds retry_max_delay. The tests are well-targeted: upper-bound invariant, spread check, and cap-clamp at high attempt numbers.

One behavioral change worth acknowledging: the old formula had a guaranteed minimum delay of retry_base_delay on attempt 0 (e.g., 0.5s). Full Jitter allows essentially 0-delay retries at any attempt. For a trading API, a first-retry delay near 0 on a 429 could re-hammer a rate-limited server immediately. The test explicitly asserts min(samples) < 0.1, confirming this is intentional — but it may be worth a note in the _compute_backoff docstring or KalshiConfig that retry_base_delay no longer acts as a minimum floor.

#105 — Buffered `RecordingTransport` (`kalshi/testing/_recorder.py`)

Clean rewrite. The _dirty set is a nice touch — flush only rewrites files that were actually touched, avoiding unnecessary I/O for endpoints not recorded in a session.

Correctness of try/finally: The close()/aclose() wrapping is correct — the real transport closes even if flush raises.

Blocking I/O in async context: AsyncRecordingTransport.aclose() calls self._buffer.flush() synchronously. flush() does disk writes without await. This blocks the event loop briefly during test teardown. For test infrastructure it's acceptable, but worth a short comment acknowledging the intentional sync-in-async call.

Removed inline comment: The old handle_request had the note "Recordings are expected to run sequentially." It was removed from the method body but still partially appears in the sync class docstring. The async handle_async_request is now missing this note entirely — async recording has a stronger race risk than sync, so it's more important there.

Test design: The monkey-patch approach for counting save_pairs calls is correct — patching the module attribute makes the lookup in flush() hit the stub. The try/finally cleanup restoring the real function is present and necessary. The cross-session append test is particularly valuable.

Tests summary

Change	Test quality
#103 qsize counter	Good — covers put/get/drop/sentinel transitions
#104 full jitter	Good — upper-bound invariant + spread check + cap-clamp
#105 buffered flush	Good — call-count regression + cross-session semantics

Verdict

Approve with minor suggestions. No blocking issues. All three changes are correct and well-tested. Suggested follow-ups (none blocking):

Add a test mixing get() and __anext__() on the same queue to fully close the counter coverage gap.
Add a note to _compute_backoff or KalshiConfig.retry_base_delay that Full Jitter allows near-zero first-retry delays (the old minimum floor from additive jitter is gone).
Restore the sequential-recordings comment in AsyncRecordingTransport.handle_async_request — it matters more there than in the sync version.

TexasCoding and others added 3 commits May 17, 2026 06:41

TexasCoding merged commit ca0d125 into main May 17, 2026
4 checks passed

TexasCoding deleted the perf/issues-103-104-105 branch May 17, 2026 12:23

This was referenced May 17, 2026

test(recorder): cover non-JSON body branch (HTML 502, plain text, binary) #132

Merged

test(retry): cover Retry-After cap, HTTP-date fallback, TimeoutException #133

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: O(1) qsize, AWS Full Jitter, buffered recordings#111

perf: O(1) qsize, AWS Full Jitter, buffered recordings#111
TexasCoding merged 3 commits into
mainfrom
perf/issues-103-104-105

TexasCoding commented May 17, 2026

Uh oh!

claude Bot commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

TexasCoding commented May 17, 2026

Summary

Test deltas

Test plan

Uh oh!

claude Bot commented May 17, 2026

Code Review — PR #111: O(1) qsize, AWS Full Jitter, buffered recordings

#103 — O(1) qsize() counter (kalshi/ws/backpressure.py)

#104 — AWS Full Jitter (kalshi/_base_client.py)

#105 — Buffered RecordingTransport (kalshi/testing/_recorder.py)

Tests summary

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

#103 — O(1) `qsize()` counter (`kalshi/ws/backpressure.py`)

#104 — AWS Full Jitter (`kalshi/_base_client.py`)

#105 — Buffered `RecordingTransport` (`kalshi/testing/_recorder.py`)