v2.4.0 — multi-reviewer SDK audit closure
Comprehensive multi-reviewer audit (#224) — 33 issues across security, HTTP
transport, WebSocket reliability, models/types, resources, performance,
testing, and documentation. Identified by an 8-agent parallel review; executed
across 4 sequential waves of disjoint git worktrees. The most impactful items
by category:
Critical (silent data loss / silent money corruption fixes)
- WS orderbook resync after sequence gap (#189). Before: a single dropped
frame cleared the local book and never asked the server for a fresh
snapshot, so the consumer kept receiving deltas against a permanently empty
book. After: _handle_seq_gap drives a real unsubscribe+resubscribe with
per-sid ticker tracking so all-markets subscriptions are also covered.
- Page.to_dataframe / Page.to_polars Decimal preservation (#190).
Before: DollarDecimal / FixedPointCount serializers ran for
mode='python' too, so DataFrame columns held str and df['price'].sum()
returned concatenated strings instead of a numeric sum. After: serializers
use when_used='json'; live Decimal flows through pandas/polars.
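
The gap-recovery behavior described for #189 can be illustrated with a toy model. This is a minimal sketch, not the SDK's implementation: ToyBook, its fields, and the resync_requests counter (standing in for the unsubscribe+resubscribe round trip) are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyBook:
    """Hypothetical stand-in for a local orderbook; not the SDK's class."""

    levels: dict = field(default_factory=dict)
    last_seq: Optional[int] = None
    awaiting_snapshot: bool = False
    resync_requests: int = 0

    def on_snapshot(self, seq: int, levels: dict) -> None:
        # A snapshot replaces the book wholesale and re-arms delta handling.
        self.levels = dict(levels)
        self.last_seq = seq
        self.awaiting_snapshot = False

    def on_delta(self, seq: int, price: str, change: int) -> None:
        if self.awaiting_snapshot:
            return  # ignore deltas until a fresh snapshot arrives
        if self.last_seq is not None and seq != self.last_seq + 1:
            # Gap: the book is unreliable, so clear it AND request a resync.
            # Clearing without the request is the pre-fix failure mode.
            self.levels.clear()
            self.awaiting_snapshot = True
            self.resync_requests += 1  # stands in for unsubscribe+resubscribe
            return
        self.levels[price] = self.levels.get(price, 0) + change
        self.last_seq = seq


book = ToyBook()
book.on_snapshot(1, {"0.50": 10})
book.on_delta(2, "0.50", 5)   # in order: applied
book.on_delta(4, "0.50", 1)   # seq 3 was dropped: triggers a resync
book.on_delta(5, "0.51", 7)   # ignored while awaiting the snapshot
book.on_snapshot(6, {"0.50": 16, "0.51": 7})
book.on_delta(7, "0.51", -2)  # applied against the fresh snapshot
print(book.levels, book.resync_requests)
```

The key point of the sketch is that clearing the book and requesting a new snapshot happen together; deltas that arrive in between are discarded rather than applied to an empty book.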
High-impact correctness (HTTP + WS + Decimal + V1 orders)
- DollarDecimal serialization is positional (#191). _decimal_to_str
uses f'{v:f}' so values like Decimal('1E+10') never reach the wire as
scientific notation that Kalshi would reject.
- Retry policy widened (#192). RETRYABLE_STATUS_CODES now includes
408, 425, and the Cloudflare 5xx range (520–524). POST/DELETE still never
retry, preserving idempotency.
- Total wall-clock retry budget (#193). New KalshiConfig.total_timeout
caps cumulative time spent inside a single request including retries.
None (default) preserves the legacy unbounded behavior.
- V1 batch order endpoints surface typed per-leg responses (#194,
BREAKING). orders.batch_create now returns
BatchCreateOrdersResponse (was list[Order] that crashed on any failed
leg). orders.batch_cancel now returns BatchCancelOrdersResponse (was
None) exposing per-order reduced_by_fp. Migration: upgrade reads from
response[i] to response.orders[i].order (and check .error).
- WS generic subscribe() rejects unknown param keys (#195). Previously it
silently dropped typos like params={'tickerz': [...]} and subscribed
the consumer to a much broader stream than intended.
- WS server-side seq reset detection (#196). SequenceTracker.track
now distinguishes seq == last (drop) from seq < last (reset → gap
recovery); previously it silently dispatched the reset window with no signal.
- WS fast-fail on permanent close codes (#197). ConnectionClosed with
codes 1002, 1003, 1007-1010, or 4xxx now raises KalshiConnectionError
immediately instead of burning the 10-retry budget on doomed reconnect
attempts.
- WS payload type alignment with REST (#198). *_fp count/size/volume
fields on every WS payload model now type as FixedPointCount; RFC3339
timestamps type as datetime. Eliminates silent str+int TypeErrors when
consumer code mixes REST and WS data.
- order_group_updates sequence gap recovery (#205). Same resubscribe
helper as orderbook gaps; previously, events were missed with no signal.
- WS unsubscribe drops orderbook state (#206). Long-running
subscribe/unsubscribe cycles no longer leak _BookState entries.
- ERROR backpressure strategy raises through iterator (#207). Consumer
async for now raises KalshiBackpressureError instead of terminating
silently (indistinguishable from a clean close).
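
The positional-formatting point in #191 relies only on the standard library's Decimal format spec. A minimal sketch follows; the function name decimal_to_str is a stand-in for illustration and is not claimed to match the SDK's private helper beyond the f'{v:f}' expression quoted above.

```python
from decimal import Decimal


def decimal_to_str(v: Decimal) -> str:
    # The 'f' presentation type always yields positional (fixed-point) digits,
    # whereas str() may choose scientific notation for some exponents.
    return f"{v:f}"


big = Decimal("1E+10")
small = Decimal("1E-7")
price = Decimal("0.50")

print(str(big), "->", decimal_to_str(big))      # 1E+10 -> 10000000000
print(str(small), "->", decimal_to_str(small))  # 1E-7 -> 0.0000001
print(str(price), "->", decimal_to_str(price))  # 0.50 -> 0.50
```

Trailing zeros are preserved, so a value such as 0.50 keeps its declared precision on the wire.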
Performance
- WS recv loop stops rebuilding+discarding orderbook snapshots (#199).
New _apply_*_inplace variants on OrderbookManager skip the O(n log n)
sort + ~2N OrderbookLevel allocations on the per-frame hot path.
- Pluggable JSON loader/dumper (#209). KalshiConfig.ws_json_loads /
ws_json_dumps allow opt-in to orjson / ujson for high-rate
streaming (default: stdlib json).
- WS reconnect uses AWS Full Jitter (#221 polish). Matches the REST
policy; eliminates the thundering-herd window at the capped-delay end.
- Batch order bodies serialized once (#223 polish). Resource layer
routes batch_create/batch_cancel through new _post_json / _delete_with_body_json
bytes helpers that use model_dump_json + httpx content=, skipping one
full dict-walk per call.
- _list_all cursor-loop guard is O(1) (#223 polish). Switched from
unbounded set[str] to single last_cursor (only catches realistic
server-replay shape).
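
For readers unfamiliar with the Full Jitter strategy mentioned for WS reconnects, here is a minimal sketch of the general technique. The function name and the base and cap values are assumptions chosen for illustration; the SDK's actual parameters are not stated in these notes.

```python
import random
from typing import Optional


def full_jitter_delay(
    attempt: int,
    base: float = 0.5,   # assumed base delay in seconds
    cap: float = 30.0,   # assumed maximum delay in seconds
    rng: Optional[random.Random] = None,
) -> float:
    """Return a reconnect delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    # Drawing from the whole interval, rather than adding a small jitter to
    # the ceiling, spreads clients out even once the cap has been reached.
    return source.uniform(0.0, ceiling)


rng = random.Random(0)  # seeded only so the demonstration is repeatable
delays = [full_jitter_delay(n, rng=rng) for n in range(10)]
print([round(d, 2) for d in delays])
```

Because every delay is sampled from the full range, clients that hit the cap at the same time do not reconnect in lockstep, which is the thundering-herd window the bullet refers to.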
Security & robustness
- Response-body buffering bounded (#203). _map_error caps via
Content-Length (16KB) and truncates the exception message to 1024
chars. Prevents memory + log-volume blowup on hostile error payloads.
- base_url validated to include /trade-api/v2 (#202). Misconfigs
fail at construction instead of producing silent 401s from a corrupted
signing path.
- Passphrase-protected PEMs supported (#217). KalshiAuth.from_pem /
from_key_path / from_env accept password= (str/bytes/callable);
KALSHI_PRIVATE_KEY_PASSPHRASE env var. Users no longer need to write
plaintext keys to disk.
- URL-encoded path segments (#211). _seg() helper applied across
every resource; user-supplied IDs with /, ?, .. etc. are encoded
or rejected at the SDK boundary.
- RecordingTransport scrubs response headers (#220 polish).
Set-Cookie, Authorization, and X-Kalshi-*-(id|key|account|user) headers
filtered by default (user-overridable).
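
The path-segment hardening in #211 can be approximated with the standard library. This is a sketch under stated assumptions: encode_segment is a hypothetical name, and the choice of which inputs to reject (empty, "." and "..") versus encode is a guess at one reasonable policy, not the SDK's documented rule.

```python
from urllib.parse import quote


def encode_segment(value: str) -> str:
    """Percent-encode one URL path segment, rejecting traversal-style inputs."""
    # Assumed policy: dot segments and empty strings cannot be made safe by
    # encoding alone, so they are rejected outright.
    if value in ("", ".", ".."):
        raise ValueError(f"invalid path segment: {value!r}")
    # safe="" makes quote() encode "/" as well, so the value cannot introduce
    # additional path segments or a query string.
    return quote(value, safe="")


print(encode_segment("KXBTC-24"))    # KXBTC-24
print(encode_segment("KX/ABC?x=1"))  # KX%2FABC%3Fx%3D1

try:
    encode_segment("..")
except ValueError as exc:
    print("rejected:", exc)
```

Encoding at the boundary means a caller-supplied identifier always occupies exactly one segment of the request path.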
Typed-exception expansion
- New KalshiConflictError (409), KalshiTimeoutError, KalshiPoolExhaustedError
(#201, #204). 422 routes to KalshiValidationError. httpx.PoolTimeout
raises KalshiPoolExhaustedError and IS safe to retry on POST/DELETE
(request never reached the wire); httpx.TimeoutException raises
KalshiTimeoutError and preserves the existing POST/DELETE never-retry
policy (server may have committed).
- KalshiSequenceGapError + KalshiSubscriptionError carry structured
channel/sid/last_seq/next_seq/op context (#213).
- AuthRequiredError default message mentions both
KALSHI_PRIVATE_KEY_PATH and KALSHI_PRIVATE_KEY (#215).
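
The retry distinction between pool exhaustion and request timeouts can be sketched as a small decision function. The exception classes below are local stand-ins defined for this example (they are not imported from the SDK), and the assumption that timeouts on other methods are retryable is inferred from the wording above rather than confirmed.

```python
class PoolExhausted(Exception):
    """Stand-in: no connection could be acquired, so nothing was sent."""


class RequestTimeout(Exception):
    """Stand-in: the request may have reached the server before timing out."""


NEVER_RETRY_METHODS = {"POST", "DELETE"}


def should_retry(method: str, exc: Exception) -> bool:
    if isinstance(exc, PoolExhausted):
        # The request never left the client, so a retry cannot duplicate it.
        return True
    if isinstance(exc, RequestTimeout):
        # The server may already have committed the write; only retry
        # methods outside the never-retry set (assumed behavior).
        return method.upper() not in NEVER_RETRY_METHODS
    return False


print(should_retry("POST", PoolExhausted()))   # True
print(should_retry("POST", RequestTimeout()))  # False
print(should_retry("GET", RequestTimeout()))   # True
```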
Configuration knobs (additive, all opt-in)
- total_timeout (#193)
- ws_ping_interval, ws_close_timeout (#208)
- ws_json_loads, ws_json_dumps (#209)
- http2 install extra (#220 polish; pip install kalshi-sdk[http2])
- Per-request extra_headers plumbed through transport (#220 polish)
Documentation
- docs/migration.md now has continuous coverage v1 → v2.3 (was missing
v2.1→v2.2 and v2.2→v2.3 sections; #200) plus a v2.3→v2.4 section
documenting #194's breaking shape and the new typed exceptions.
- README + docs/websockets.md agree on channel count + use real SDK
method names (#218).
- New docs/websockets.md Performance section: queue sizing, overflow
strategy, orjson example, recv-loop threading (#222 polish).
- docs/configuration.md, docs/environment-variables.md, cancel/delete
docstrings, and stale audit/predecessor refs cleaned up (#222 polish).
- pydantic.AwareDatetime adopted on REST response model datetime fields;
new datetime-semantics note in docs/concepts.md (#221 polish).
Testing
- WS hardening: 27+ new tests across orderbook resync, seq reset, close
codes, backpressure signal, unsubscribe cleanup (#231).
- Phantom-kwarg behavioral coverage parametrized across all 23 Request
models (#219).
- Three new bench harnesses: scripts/bench_ws_recv.py,
scripts/bench_orderbook_delta.py, scripts/bench_request_hot_path.py
(#223).
- Integration conftest.py env-bridging moved from import-time mutation
to a session-scoped fixture for clean test isolation (#223).
Breaking changes summary
Only one user-visible breaking change: orders.batch_create and
orders.batch_cancel return typed response models instead of list[Order]
and None respectively (#194). The V2 family (batch_create_v2 /
batch_cancel_v2) was already shaped this way; the V1 fix brings parity.
Migration in docs/migration.md v2.3→v2.4 section.
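
The shape change can be pictured with stand-in types. This is a rough sketch only: Leg and BatchResponse are hypothetical dataclasses that mirror the response.orders[i].order and .error access pattern named in the migration note, and the field types (a dict for the order, a string for the error) are assumptions. Consult docs/migration.md for the real models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leg:
    """Hypothetical per-leg result: exactly one of order/error is set."""

    order: Optional[dict]
    error: Optional[str]


@dataclass
class BatchResponse:
    """Hypothetical stand-in for the typed batch response."""

    orders: list


def split_legs(response: BatchResponse):
    """Separate successful legs from failed ones instead of indexing blindly."""
    placed, failed = [], []
    for index, leg in enumerate(response.orders):
        if leg.error is not None:
            failed.append((index, leg.error))
        else:
            placed.append(leg.order)
    return placed, failed


response = BatchResponse(
    orders=[
        Leg(order={"order_id": "a1"}, error=None),
        Leg(order=None, error="insufficient balance"),
    ]
)
placed, failed = split_legs(response)
print(placed)  # [{'order_id': 'a1'}]
print(failed)  # [(1, 'insufficient balance')]
```

Code that previously read response[i] directly should check each leg's error before touching its order, since a failed leg no longer aborts the whole call.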