feat: venue decoder crates + SPSC benchmark overhaul #28
Conversation
…ion simulator

Replace "BBO cross = fill" with queue-aware fill model:

- SimulatedExchange with FIFO queue tracking per price level
- Latency injection (configurable profiles: optimistic/realistic/pessimistic)
- Partial fills (trade_qty minus queue_ahead)
- Cancel/repost loses queue priority
- Post-only safety check at arrival-time book state
- Markout tracking (adverse selection at 100ms/500ms/1s/5s)
- PnL breakdown: execution vs MTM vs rebate estimate (rebates separate)
- Two modes only: dry_run (realistic simulation) and live

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
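The partial-fill rule in the list above (trade_qty minus queue_ahead) boils down to a saturating subtraction capped by the order's own resting size. A minimal sketch with hypothetical names, not the PR's actual SimulatedExchange code:

```rust
/// Hypothetical sketch of the queue-aware partial-fill rule: a resting
/// order only fills with the trade volume left over after the quantity
/// queued ahead of it at the same price level has been consumed.
fn simulated_fill(trade_qty: u64, queue_ahead: u64, order_qty: u64) -> u64 {
    // Trade volume is consumed by the queue ahead of us first.
    let available = trade_qty.saturating_sub(queue_ahead);
    // We can never fill more than our own resting quantity.
    available.min(order_qty)
}
```

A trade of 100 against 80 ahead of a 50-lot order yields a 20-lot partial fill; a trade smaller than the queue ahead fills nothing.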
…rsing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmarks parse_decimal_bytes against rust_decimal, fixed crate, fast-float, lexical-core, and stdlib f64 parse. Tests Polymarket short prices (0.53) and Binance medium prices (67396.70). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
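For context on what such a benchmark exercises, here is a minimal sketch of byte-slice decimal parsing into a scaled integer, assuming exactly two fractional digits. It is illustrative only, not the crate's `parse_decimal_bytes` (which is generic over precision and handles signs and overflow):

```rust
/// Illustrative fixed-point parse from bytes, scale = 100 (two decimals).
/// No sign or overflow handling; rejects a second dot or excess precision.
fn parse_cents(b: &[u8]) -> Option<i64> {
    if b.is_empty() {
        return None;
    }
    let (mut int_part, mut frac_part) = (0i64, 0i64);
    let mut frac_digits = 0u32;
    let mut seen_dot = false;
    for &c in b {
        match c {
            b'0'..=b'9' => {
                let d = i64::from(c - b'0');
                if seen_dot {
                    if frac_digits == 2 {
                        return None; // excess precision
                    }
                    frac_part = frac_part * 10 + d;
                    frac_digits += 1;
                } else {
                    int_part = int_part * 10 + d;
                }
            }
            b'.' if !seen_dot => seen_dot = true,
            _ => return None, // second dot or non-digit
        }
    }
    // Scale the fraction up to exactly two places.
    while frac_digits < 2 {
        frac_part *= 10;
        frac_digits += 1;
    }
    Some(int_part * 100 + frac_part)
}
```

On the benchmark's own inputs, `b"0.53"` parses to 53 and `b"67396.70"` to 6739670.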
Required for simd-json in-place parsing. read_text() becomes read_bytes() writing into a reusable Vec<u8> (allocated once on IO thread). Callback receives &mut [u8] — decoders can call simd_json::from_slice directly without copying. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
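The reusable-buffer pattern described above can be sketched as follows. `Source`, `Replay`, and `pump` are hypothetical stand-ins for the real connection type, whose `read_bytes` signature may differ:

```rust
// Sketch of the reusable-buffer read loop. `Source` stands in for the
// WebSocket connection; the buffer is allocated once and reused.
trait Source {
    /// Writes the next frame into `buf`, returning bytes written (0 = no data).
    fn read_bytes(&mut self, buf: &mut Vec<u8>) -> usize;
}

fn pump<S: Source, F: FnMut(&mut [u8])>(src: &mut S, frames: usize, mut on_message: F) {
    let mut buf: Vec<u8> = Vec::with_capacity(64 * 1024); // one-time allocation
    for _ in 0..frames {
        let n = src.read_bytes(&mut buf);
        if n > 0 {
            // The decoder may parse in place (e.g. simd_json::from_slice),
            // which is why the callback takes &mut [u8] rather than &str.
            on_message(&mut buf[..n]);
        }
    }
}

/// Tiny in-memory source for demonstration: replays canned frames.
struct Replay(Vec<Vec<u8>>);

impl Source for Replay {
    fn read_bytes(&mut self, buf: &mut Vec<u8>) -> usize {
        match self.0.pop() {
            Some(frame) => {
                buf.clear();
                buf.extend_from_slice(&frame);
                frame.len()
            }
            None => 0,
        }
    }
}
```

The callback sees each frame through the same backing allocation, so the hot path performs no per-message heap allocation.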
Add workspace dependency entries (serde, serde_json, simd-json, tracing) and create the mantis-binance crate with zero-allocation BinanceBookTicker serde schema and spawn stub. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Zero-allocation decoder converting Binance bookTicker JSON into HotEvent TopOfBook values using FixedI64 parse_decimal_bytes and InstrumentMeta tick/lot conversion. Supports simd-json feature gate with serde_json fallback. Includes 4 tests covering happy path, seq increment, malformed input, and empty input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add mantis-polymarket workspace dependency and crate with zero-allocation serde schema types for Polymarket book, price_change, and trade messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ice_change Add registry-backed decoder that converts Polymarket WS messages into HotEvent values with peek_type() for simd-json-safe type dispatch. Includes 11 tests covering all message types, unknown tokens, and malformed input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

This PR removes legacy SPSC Criterion benchmarks and C++/Rust FFI contenders, adds a standalone cross-language benchmarks suite (Rust + C++), introduces new Binance and Polymarket decoder crates, changes transport WebSocket text APIs to a mutable byte-buffer API, adds byte-slice decimal parsing, and supplies many benchmark tooling scripts and CI/workflow adjustments.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant FeedThread
    participant WsConnection
    participant Decoder
    participant Results
    rect rgba(200,200,255,0.5)
        Client->>FeedThread: spawn(on_message: FnMut(&mut [u8]))
    end
    FeedThread->>WsConnection: read_bytes(&mut buffer)
    WsConnection-->>FeedThread: bytes_read (usize)
    alt bytes_read > 0
        FeedThread->>Decoder: pass &mut buffer (in-place parse)
        Decoder-->>FeedThread: events / continue? (bool)
        FeedThread->>Results: record histograms / write JSON
    else bytes_read == 0
        FeedThread-->>FeedThread: no-data / continue loop
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
…module Add benchmarks/rust/ as a standalone binary crate (excluded from workspace) with CycleHistogram (4096 direct + 20 log-bucketed overflow), percentile computation, and serializable BenchResult/LatencyResults/SystemInfo structs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…re suite

Replace all single-threaded push+pop benchmarks with proper two-thread latency measurement using core pinning and rdtsc timestamping.

- Add benchmarks/rust/: standalone Rust binary with mantis-inline, mantis-copy, and rtrb adapters, real histogram percentiles, JSON output
- Add benchmarks/cpp/: standalone C++ binary with native rigtorp and drogalis adapters (no FFI), same protocol and JSON schema
- Add benchmarks/scripts/: system validation (check_system, prepare_system), perf profiling (perf_profile, perf_cache, perf_branch, perf_flamegraph), orchestration (run_bench, setup_machine, deploy_and_run), and compare.py
- Remove crates/bench/benches/spsc.rs and all FFI contender code
- Remove crossbeam (MPMC) from SPSC comparisons

Addresses methodology flaws identified by the HFT University article: single-thread measurement, FFI-handicapped contenders, forced copies, MPMC-vs-SPSC confusion, and Gaussian percentile approximation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
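The two-thread latency protocol can be illustrated in miniature: the producer timestamps each message at enqueue and the consumer records the delta on dequeue. This sketch uses `std::sync::mpsc` and `Instant` purely to show the message flow; the actual harness pins each thread to a core and timestamps with rdtsc against an SPSC ring:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

/// Miniature cross-thread latency measurement: each message carries its
/// send timestamp; the consumer records send -> receive deltas in nanoseconds.
fn measure(n: usize) -> Vec<u128> {
    let (tx, rx) = mpsc::sync_channel::<Instant>(1024);
    let producer = thread::spawn(move || {
        for _ in 0..n {
            // Timestamp taken immediately before enqueue.
            let _ = tx.send(Instant::now());
        }
    });
    let mut deltas = Vec::with_capacity(n);
    for _ in 0..n {
        let sent = rx.recv().expect("producer hung up early");
        deltas.push(sent.elapsed().as_nanos());
    }
    producer.join().expect("producer panicked");
    deltas
}
```

Percentiles are then computed from the recorded deltas (the real suite feeds them into a cycle histogram rather than a `Vec`).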
Benchmark Report

Commit:

Sequence Lock (mantis-seqlock)

Linux
CPU:
- seqlock_read_contended
- seqlock_read_uncontended
- seqlock_write

macOS
CPU:
- seqlock_read_contended
- seqlock_read_uncontended
- seqlock_write

Fixed-Point Arithmetic (mantis-fixed)

Linux
CPU:
- fixed_checked_add
- fixed_checked_div
- fixed_checked_mul_trunc
- fixed_decimal_parse
- fixed_display
- fixed_mul_round_vs_trunc
- fixed_parse
- fixed_rescale

macOS
CPU:
- fixed_checked_add
- fixed_checked_div
- fixed_checked_mul_trunc
- fixed_decimal_parse
- fixed_display
- fixed_mul_round_vs_trunc
- fixed_parse
- fixed_rescale

Market-State Engine (mantis-market-state)

Linux
CPU:
- market_state_array_book
- market_state_engine

macOS
CPU:
- market_state_array_book
- market_state_engine
3d1fa1c to 37b8138
These files were committed before the .gitignore rule was added. Remove them from tracking — they remain on disk but are ignored. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update fastrand 2.4.0 -> 2.4.1 (yanked crate)
- Fix live_feeds test: use str::from_utf8 for &mut [u8] callback Display
- Fix mantis-binance doc: remove intra-doc link to HotEvent::TopOfBook

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
- Coverage   95.84%   95.57%   -0.27%
==========================================
  Files          75       77       +2
  Lines        9183    10009     +826
  Branches     9183    10009     +826
==========================================
+ Hits         8801     9566     +765
- Misses        338      393      +55
- Partials       44       50       +6
```
ring 0.17 fails to compile on nightly + macOS Xcode 16.4. The new venue decoder crates (binance, polymarket) pull ring into the bench crate transitively. Switch to aws-lc-rs which compiles cleanly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove dead SPSC normalize/render functions from bench-report.sh
- Update positional args (no longer expects spsc report as $1/$2)
- Fix bench.yml to pass correct args to bench-report.sh
- Remove stale comment in criterion-to-json.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/transport/src/polymarket/market.rs (1)
3-5: ⚠️ Potential issue | 🟡 Minor. Update docs to match byte-slice callback API.

The docs still describe “text frames,” but the callback now receives `&mut [u8]`. Please update wording to avoid API confusion.

✏️ Suggested doc fix
```diff
-//! forwards raw JSON text frames via the `FeedThread` callback.
+//! forwards raw JSON byte frames via the `FeedThread` callback.
@@
-/// The callback receives each raw JSON text frame (excluding `"PONG"`
+/// The callback receives each raw JSON byte frame (excluding `"PONG"`
```

Also applies to: 41-43
🧹 Nitpick comments (10)
crates/fixed/src/parse.rs (1)
93-135: Orphaned documentation block.

Lines 93-112 contain documentation that was originally for `from_str_decimal`, but this doc block is now orphaned: it precedes the new `parse_decimal_bytes` documentation (lines 113-135) and doesn't attach to any function. The `from_str_decimal` function at line 270 has its own documentation. Consider removing lines 93-112 to avoid documentation duplication and confusion.
♻️ Proposed fix
```diff
 #[expect(
     clippy::cast_possible_truncation,
     reason = "i128-to-i64 cast is guarded by range check"
 )]
 impl<const D: u8> FixedI64<D> {
-    /// Parse a decimal string like `"1.23"` or `"-0.5"` into `FixedI64<D>`.
-    ///
-    /// Accepts:
-    /// - Optional sign prefix (`-` or `+`)
-    /// - Integer-only: `"123"`
-    /// - Decimal: `"1.23"`, `".5"`, `"0.5"`
-    /// - Leading zeros: `"007.50"`
-    ///
-    /// Rejects:
-    /// - Empty strings
-    /// - Exponent notation (`"1e5"`)
-    /// - Double dots (`"1..2"`)
-    /// - Non-digit characters
-    /// - Fractional digits exceeding D
-    ///
-    /// # Errors
-    ///
-    /// Returns [`ParseFixedError::InvalidFormat`] for malformed input,
-    /// [`ParseFixedError::Overflow`] if the value exceeds `i64` range,
-    /// or [`ParseFixedError::ExcessPrecision`] if fractional digits exceed D.
     /// Parse a decimal byte slice like `b"1.23"` or `b"-0.5"` into `FixedI64<D>`.
```

benchmarks/scripts/prepare_system.sh (2)
87-88: Avoid variables in printf format string.

Using variables directly in the printf format string can cause issues if they contain `%` characters. Use `%b` to interpret escape sequences safely.

🧹 Proposed fix
```diff
-printf "\n${GRN}System prepared for benchmarking.${RST}\n"
+printf "\n%bSystem prepared for benchmarking.%b\n" "$GRN" "$RST"
```
18-21: Remove unused `RED` variable.

The `RED` variable is defined but never used in the script. This is flagged by shellcheck (SC2034).

🧹 Proposed fix
```diff
-RED='\033[0;31m' GRN='\033[0;32m' BLD='\033[1m' RST='\033[0m'
+GRN='\033[0;32m' BLD='\033[1m' RST='\033[0m'
```

benchmarks/rust/src/queues/mantis_inline.rs (1)
69-74: Match the empty-pop error explicitly.

`Err(_) => false` treats every `try_pop` failure as "queue empty". If `mantis_queue` already has, or later adds, a non-empty error case, the harness will spin forever instead of surfacing it.

As per coding guidelines: "No wildcard matches — use explicit destructuring in pattern matching".
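The point generalizes to any spin-drain loop: distinguish "empty" from every other failure. A stand-in sketch (the real `mantis_queue` error type may differ):

```rust
/// Stand-in error enum; the real queue's error type may differ.
#[derive(Debug, PartialEq)]
enum PopError {
    Empty,
    Poisoned, // hypothetical non-empty failure mode
}

/// Returns true if an item was drained, false if the queue was empty.
/// Any other failure is surfaced instead of silently retried.
fn drain_step(result: Result<u64, PopError>) -> bool {
    match result {
        Ok(_item) => true,
        // Explicit variant: "no data yet" is expected, keep spinning.
        Err(PopError::Empty) => false,
        // Explicit non-empty arm: never fold this into a wildcard.
        Err(PopError::Poisoned) => panic!("queue poisoned; surface, don't spin"),
    }
}
```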
crates/polymarket/src/market/schema.rs (1)
1-1: Doc comment slightly overstates "zero-allocation".

The `Vec<BookLevel<'a>>` fields in `PolymarketBookMsg` allocate heap memory for the level vectors. Consider updating to "low-allocation" or "zero-copy string" to be more precise, since the borrowed `&'a str` fields do avoid per-field string allocations.

📝 Suggested doc update
```diff
-//! Zero-allocation serde structs for Polymarket WebSocket messages.
+//! Low-allocation serde structs for Polymarket WebSocket messages.
+//!
+//! String fields borrow from the input buffer; only `Vec` fields allocate.
```

benchmarks/cpp/src/harness.hpp (1)
75-106: `consumer_done` atomic is set but never read.

The `consumer_done` flag is stored on line 106 but never checked. The `consumer_thread.join()` on line 130 already ensures the consumer completes. Consider removing `consumer_done` to simplify the code.

🧹 Suggested cleanup
```diff
     CycleHistogram result_hist;
-    std::atomic<bool> consumer_done{false};

     // Spawn consumer thread
     std::thread consumer_thread([&]() {
         // ... existing code ...
         result_hist = histogram;
-        consumer_done.store(true, std::memory_order_release);
     });
```

benchmarks/cpp/src/main.cpp (2)
56-60: `atoi`/`atoll` silently return 0 on invalid input.

Invalid numeric arguments (e.g., `--messages abc`) will be silently accepted as 0, which could lead to confusing benchmark behavior. Consider using `strtoul`/`strtoull` with error checking, or at minimum validate that parsed values are positive where expected.
91-97: Missing p99.99 percentile in output.

The summary prints `p999` (99.9th percentile) but the comment mentions "percentile computation via histograms" with p99.99. Consider adding the p99.99 percentile for consistency with typical latency reporting, or clarify if this is intentional.

📊 Suggested enhancement
```diff
-    fprintf(stderr, "  run %u/%u: p50=%lu p99=%lu p999=%lu max=%lu mean=%.1f cycles/op\n",
+    fprintf(stderr, "  run %u/%u: p50=%lu p99=%lu p999=%lu p9999=%lu max=%lu mean=%.1f cycles/op\n",
             run, args.runs,
             static_cast<unsigned long>(hist.percentile(50.0)),
             static_cast<unsigned long>(hist.percentile(99.0)),
             static_cast<unsigned long>(hist.percentile(99.9)),
+            static_cast<unsigned long>(hist.percentile(99.99)),
             static_cast<unsigned long>(hist.max()),
             hist.mean());
```

benchmarks/scripts/run_bench.sh (1)
29-39: Argument parsing can fail silently on missing values at end of args.

If the last argument is a flag like `--messages` without a value, `shift 2` will fail due to `set -u`, but the error message from `next()` won't be shown since this is bash, not the C++ parser. This is acceptable given `set -euo pipefail`, but consider adding validation after parsing to ensure numeric args are valid.
crates/bench/benches/decode.rs (1)
52-65: Consider adding throughput measurement for better cross-hardware comparison.

Per coding guidelines, benchmarks should export JSON for cross-hardware comparison. While Criterion handles JSON export, adding throughput annotations would make comparisons more meaningful across different hardware.
📈 Suggested enhancement
```diff
 fn bench_binance_decode(c: &mut Criterion) {
     let mut group = c.benchmark_group("decode/binance_book_ticker");
+    let json_size = binance_json().len();
+    group.throughput(criterion::Throughput::Bytes(json_size as u64));
     group.bench_function("full_pipeline", |b| {
```

Based on learnings: "All benchmarks must export JSON for cross-hardware comparison"
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benchmarks/cpp/src/stats.hpp`:
- Around line 22-39: The overflow bucket upper bound currently returns the
exclusive top of the bucket (and overshoots by one), so update
overflow_upper_bound to return the inclusive upper bound (i.e. (DIRECT_BUCKETS
<< (i+1)) - 1) for i < LOG_BUCKETS-1 and still return
numeric_limits<uint64_t>::max() for the final bucket; keep using DIRECT_BUCKETS
and LOG_BUCKETS to locate the calculation. Also review log_bucket usage to
ensure it is only called for values >= 4096 (shifted != 0) and leave its
bit-computation logic as-is; this guarantees percentiles coming from overflow_
report the bucket’s true inclusive top rather than an off-by-one/exclusive
value.
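The inclusive-bound arithmetic this finding describes can be checked in isolation. A Rust transliteration of the bucket math, using the DIRECT_BUCKETS/LOG_BUCKETS constants named above (sketch only, not the C++ source):

```rust
const DIRECT_BUCKETS: u64 = 4096;
const LOG_BUCKETS: usize = 20;

/// Inclusive upper bound of overflow bucket `i`:
/// bucket i covers [4096 << i, (4096 << (i+1)) - 1].
fn overflow_upper_bound(i: usize) -> u64 {
    if i < LOG_BUCKETS - 1 {
        (DIRECT_BUCKETS << (i + 1)) - 1
    } else {
        u64::MAX // final catch-all bucket
    }
}

/// Log-bucket index for a value; only meaningful for values >= 4096.
fn log_bucket(value: u64) -> usize {
    debug_assert!(value >= DIRECT_BUCKETS);
    let shifted = value / DIRECT_BUCKETS; // >= 1 by the assert above
    (63 - shifted.leading_zeros() as usize).min(LOG_BUCKETS - 1)
}
```

With the inclusive bound, a sample of 5000 lands in bucket 0 and reports at most 8191 rather than the exclusive 8192 the review flags.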
- Around line 3-10: Add the missing header for std::min: include <algorithm> at
the top of the file (stats.hpp) so std::min is available; update the existing
include block that currently has <array>, <cmath>, <cstdint>, <cstdio>,
<fstream>, <limits>, <sstream>, <string> to also include <algorithm> to avoid
relying on transitive includes.
In `@benchmarks/README.md`:
- Around line 97-103: The README example for cache profiling is missing the
required benchmark command argument for scripts/perf_cache.sh; update the second
code block so the perf_cache.sh invocation includes the benchmark command
placeholder (e.g., append " -- <bench-command...>" or a concrete example like "
-- ./scripts/bench.sh" after the flags) so that scripts/perf_cache.sh receives
the expected <bench-command...>; reference the scripts/perf_cache.sh invocation
in the README and mirror the same structure used in the perf_profile.sh example.
In `@benchmarks/rust/src/harness.rs`:
- Around line 13-20: The pin_to_core function should be rewritten to return
Result<(), String> and use let...else (or ok_or/ok_or_else) to early-return on
failures when calling core_affinity::get_core_ids and finding the target core,
and must check the bool result of core_affinity::set_for_current and return an
Err if it fails; update callers (where consumer and producer threads call
pin_to_core) to propagate or handle the Result (do not silently ignore
failures), and add a validation check where producer_core and consumer_core are
parsed/used to ensure they are distinct and return an error if they are equal so
cross-core measurements are not run with the same core IDs.
In `@benchmarks/rust/src/rdtsc.rs`:
- Around line 7-10: The unsafe block calling core::arch::x86_64::_mm_lfence()
and core::arch::x86_64::_rdtsc() lacks the required safety rationale; add a
single-line comment starting with "// SAFETY:" immediately above the unsafe
block that documents the invariants (this code relies on x86_64 architecture and
availability of RDTSC/LFENCE, requires caller to run on that ISA, and that these
intrinsics do not violate memory or UB assumptions), what guarantee is being
upheld (ordering and timestamp read semantics), and the failure mode if the
invariant is not met (undefined behavior or incorrect timing on non-x86_64).
Ensure the comment references the intrinsics _mm_lfence and _rdtsc so reviewers
can map the rationale to the unsafe calls.
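One shape such a SAFETY comment and its guarded intrinsics might take; this is a sketch, and the portable fallback exists only so the example builds off x86_64:

```rust
#[cfg(target_arch = "x86_64")]
fn rdtsc_serialized() -> u64 {
    // SAFETY: `_mm_lfence` and `_rdtsc` are baseline x86_64 ISA, so they are
    // always available when this cfg-gated function is compiled. LFENCE
    // serializes prior loads so the timestamp is not read early; RDTSC reads
    // the timestamp counter and touches no memory. On any other architecture
    // this function does not exist, so the intrinsics cannot be reached;
    // violating the arch invariant would be a compile error, not UB.
    unsafe {
        core::arch::x86_64::_mm_lfence();
        core::arch::x86_64::_rdtsc()
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn rdtsc_serialized() -> u64 {
    // Portable stand-in so the sketch compiles on non-x86_64 targets.
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}
```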
In `@benchmarks/rust/src/stats.rs`:
- Around line 56-83: The percentile() routine currently returns the
exclusive/log-bucket upper bound (via overflow_upper_bound) which overstates
values (e.g. 5000→8192) and is off-by-one for documented buckets; update
percentile() so that when walking overflow (the overflow array / LOG_BUCKETS) it
returns the inclusive upper bound (overflow_upper_bound(i) - 1) and clamp that
value by self.max (use saturating_sub/min to avoid underflow), and similarly
apply the same inclusive-bound logic to the other percentile code path noted
(the duplicate at lines ~115-122) so exported tail latencies reflect the real
bucket value range (e.g., bucket 0 -> 4096..8191).
In `@benchmarks/scripts/compare.py`:
- Around line 28-33: group_by_implementation currently groups solely by
r["implementation"] which can mix runs with different metadata; update it to
either validate that key config fields (e.g., r["language"], r["capacity"],
r["message_size"], r["pinning"], r["host_cpu"] or similar keys present in
results) are identical for all entries in a group and raise an error if they
mismatch, or include those fields in the grouping key so runs are only collapsed
when all relevant config fields match; also apply the same validation/adjustment
where print_system_info or any other grouping logic aggregates by implementation
to ensure the “Runs per implementation” and reported CPU/core/message metadata
correspond to homogeneous benchmark configs.
In `@benchmarks/scripts/deploy_and_run.sh`:
- Around line 70-73: The current rsync pulls all remote results into a single
directory and causes overwrites; update the target path to include a
host-specific namespace by creating and syncing into a host subdirectory (use
BENCH_DIR, REMOTE_HOST and REMOTE_DIR from the script) — e.g., ensure mkdir -p
"$BENCH_DIR/results/$REMOTE_HOST" and change the rsync destination to
"$BENCH_DIR/results/$REMOTE_HOST/" (alternatively, prepend REMOTE_HOST to
artifact filenames on the remote side before pulling) so each host’s results are
isolated.
- Around line 63-65: The remote invocation currently collapses BENCH_ARGS into a
single string causing re-splitting and injection; change the call so you do not
expand BENCH_ARGS into the remote shell string but instead stream/exec the
run_bench.sh on the remote host and pass BENCH_ARGS as separate positional
parameters so they arrive unchanged. Concretely, stop using "${BENCH_ARGS[*]}"
in the ssh command that references REMOTE_HOST and REMOTE_DIR and instead invoke
the remote bash with the script fed via stdin (or executed remotely) while
supplying BENCH_ARGS as separate arguments so run_bench.sh receives them as
"$@"; ensure the remote invocation uses the standard "--" convention to
terminate options and still performs the cd/chmod steps before running
run_bench.sh.
In `@benchmarks/scripts/run_bench.sh`:
- Around line 117-120: Add an existence check for the compare.py script before
invoking it: test for the file at "$SCRIPT_DIR/compare.py" (using -f or -x as
appropriate) and if missing print a clear error to STDERR and exit non‑zero;
otherwise run python3 "$SCRIPT_DIR/compare.py" "$RESULTS_DIR" | tee
"$RESULTS_DIR/comparison.md". Reference the existing variables SCRIPT_DIR and
RESULTS_DIR and the compare.py filename so the check is colocated with the
current Step 7 invocation.
In `@benchmarks/scripts/setup_machine.sh`:
- Around line 11-25: The install_packages function currently omits git and curl
so subsequent curl and git clone calls will fail; update install_packages to
include git and curl in the package lists for each package manager branch
(apt-get: add git and curl to the sudo apt-get install line and linux-tools
list, dnf: add git and curl to sudo dnf install, pacman: add git and curl to
sudo pacman -Sy), and also update the fallback warning to mention git and curl
as required tools so a manual install reminder is accurate.
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 61-65: The dispatcher currently treats "tick_size_change" as
unsupported (falls to _ => 0) so tick-size updates never reach the decoder; add
a branch for "tick_size_change" that calls a new handler (e.g.
self.decode_tick_size_change(buf, recv_ts, out)) in the same match where
decode_price_change/decode_trade/decode_book are called, and mirror that change
in the other dispatcher instance referenced around lines 516-521. Implement
decode_tick_size_change to parse the tick-size payload and update the instrument
metadata/state used by price_to_ticks/qty_to_lots (or notify the instrument
registry), ensuring subsequent conversions use the new tick size. Use the
existing function names (decode_price_change, decode_trade, decode_book) as
references when adding the "tick_size_change" branch and implement a
decode_tick_size_change method that updates the shared instrument map/state.
- Around line 170-245: The code currently computes `total` from raw JSON levels
and sets EventFlags::LAST_IN_BATCH inside the `for` loops using that raw count,
which misses cases where `parse_price_qty::<D>` continues or emission stops at
64; instead, stop setting LAST_IN_BATCH when constructing each
`HotEvent::book_delta` (only use EventFlags::IS_SNAPSHOT there), and after both
loops, if `count > 0` set the LAST_IN_BATCH flag on the last emitted event (e.g.
mutate or rebuild `out[count-1]` to OR in EventFlags::LAST_IN_BATCH); refer to
`parse_price_qty::<D>`, `HotEvent::book_delta`, `EventFlags::LAST_IN_BATCH`,
`out`, and `count` to locate and implement this change.
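The "flag the last emitted event after the loop" approach can be sketched with stand-in flag constants (hypothetical values, not the real EventFlags; a negative qty stands in for a level that fails `parse_price_qty`):

```rust
const IS_SNAPSHOT: u8 = 0b01;
const LAST_IN_BATCH: u8 = 0b10;

/// Emit per-level flag bytes: never set LAST_IN_BATCH inside the loop,
/// because skipped levels and the emission cap make the raw level count
/// an unreliable predictor of which event is actually last.
fn emit_levels(levels: &[(i64, i64)], cap: usize) -> Vec<u8> {
    let mut out = Vec::new();
    for &(_price, qty) in levels.iter().take(cap) {
        if qty < 0 {
            continue; // stands in for a level that fails to parse
        }
        out.push(IS_SNAPSHOT);
    }
    // Only the event that was actually emitted last gets the flag.
    if let Some(last) = out.last_mut() {
        *last |= LAST_IN_BATCH;
    }
    out
}
```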
---
Nitpick comments:
In `@benchmarks/cpp/src/harness.hpp`:
- Around line 75-106: The consumer_done atomic is written by the consumer thread
but never read anywhere; remove the unused variable and its stores to simplify
the logic: delete the std::atomic<bool> consumer_done declaration and remove the
consumer_done.store(true, std::memory_order_release) call in the consumer thread
body (leave consumer_thread.join() and existing result_hist assignment intact),
ensuring no other code references consumer_done before committing.
In `@benchmarks/cpp/src/main.cpp`:
- Around line 56-60: The parsing of numeric CLI args (when match(...) and next()
set args.producer_core, args.consumer_core, args.messages, args.warmup,
args.runs) uses atoi/atoll which silently yields 0 on invalid input; replace
these with strtoul/strtoull calls, capture the endptr and errno, validate that
endptr consumed the whole token and the result is in the expected range (>0 for
counts/cores), and on error emit a clear message and exit or set a safe default.
Ensure you cast the validated unsigned long/unsigned long long to the target
types (unsigned / uint64_t) only after checking for overflow/underflow.
- Around line 91-97: The fprintf summary currently prints p50/p99/p999/max/mean
using the fprintf call that references hist.percentile(50.0),
hist.percentile(99.0), hist.percentile(99.9), hist.max(), and hist.mean();
update that fprintf invocation to include the 99.99th percentile by calling
hist.percentile(99.99) and adding a corresponding format token and label (e.g.,
add "p99.99=%lu" or rename labels to clarify p99.9 vs p99.99) so the arguments
list includes static_cast<unsigned long>(hist.percentile(99.99)) alongside the
existing percentile casts (still using run and args.runs, hist.max(),
hist.mean()).
In `@benchmarks/rust/src/queues/mantis_inline.rs`:
- Around line 69-74: The match on self.0.try_pop() uses a wildcard Err(_) which
hides non-empty errors; change it to explicitly match the queue-empty variant
(e.g., Err(<EmptyVariant>) => false) and handle any other Err(e) separately (for
example by panicking or returning/propagating the error) so unexpected errors
are surfaced. Update the match arms around the call to self.0.try_pop() to use
the concrete empty variant from the mantis_queue error enum and a separate
Err(e) arm that logs/panics/returns the error.
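The match shape this comment asks for can be sketched without the real crate. `PopError` and its variants below are hypothetical stand-ins for mantis_queue's actual error enum; only the pattern (explicit empty arm, loud arm for everything else) carries over:

```rust
// Stand-in for mantis_queue's error enum; variant names are assumptions.
#[derive(Debug, PartialEq)]
enum PopError {
    Empty,
    Poisoned, // hypothetical non-empty failure mode
}

// Toy pop: `fail` simulates an unexpected queue error.
fn try_pop(queue: &mut Vec<u64>, fail: bool) -> Result<u64, PopError> {
    if fail {
        return Err(PopError::Poisoned);
    }
    queue.pop().ok_or(PopError::Empty)
}

/// Returns true if a value was consumed; false only for the empty variant.
fn consume(queue: &mut Vec<u64>, fail: bool) -> bool {
    match try_pop(queue, fail) {
        Ok(_) => true,
        // Empty is the expected "no work" case.
        Err(PopError::Empty) => false,
        // Any other error is a real fault and must be surfaced, not swallowed.
        Err(e) => panic!("unexpected queue error: {e:?}"),
    }
}

fn main() {
    let mut q = vec![1u64];
    assert!(consume(&mut q, false)); // pops the 1
    assert!(!consume(&mut q, false)); // empty -> false, not a panic
    println!("ok");
}
```

The point of the separate `Err(e)` arm is that a wildcard `Err(_) => false` would silently treat a poisoned or closed queue as merely empty.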
In `@benchmarks/scripts/prepare_system.sh`:
- Around line 87-88: The printf calls printing colored status messages use
variables directly in the format string which can break if they contain %
sequences; update the two printf invocations (the lines that currently call
printf "\n${GRN}System prepared for benchmarking.${RST}\n" and printf "Run
./check_system.sh to verify.\n") to use a safe format specifier (e.g. "%b\n" or
"%s\n") and pass the composed string or variables as arguments so escape
sequences and % chars are interpreted safely rather than treated as format
directives.
- Around line 18-21: Remove the unused RED variable definition: delete the
RED='\033[0;31m' line (symbol RED) from the script so only the used color
variables (GRN, BLD, RST) remain; confirm there are no other references to RED
elsewhere in the script and run shellcheck to verify SC2034 is resolved.
In `@benchmarks/scripts/run_bench.sh`:
- Around line 29-39: The argument parsing loop can leave flags like --messages
without values (causing silent failures under set -euo pipefail); after the
while/case block, add explicit validation for the parsed variables
(PRODUCER_CORE, CONSUMER_CORE, MESSAGES, WARMUP, RUNS) to ensure they are set
and contain valid integers (e.g., non-empty and matching /^[0-9]+$/), and call
usage (the existing usage function) with an error if any check fails; also
consider validating that no trailing single flag remained by checking "$#" or
the raw args before shifting to surface a clear error message for missing
values.
In `@crates/bench/benches/decode.rs`:
- Around line 52-65: The benchmark lacks a throughput annotation for
cross-hardware comparison; update bench_binance_decode to call
group.throughput(...) before group.bench_function, using an appropriate
Throughput unit (e.g., Throughput::Bytes with the JSON buffer size or
Throughput::Elements for per-item work) computed from binance_json() (or a
constant representing its length), and ensure the Throughput type is imported so
Criterion will include throughput in its JSON export when running the
full_pipeline benchmark.
In `@crates/fixed/src/parse.rs`:
- Around line 93-135: Remove the duplicated/orphaned doc block that describes
from_str_decimal which currently sits immediately before the parse_decimal_bytes
docs (the earlier block that starts the duplicate description); delete that
orphaned documentation so only the parse_decimal_bytes doc (and the existing
from_str_decimal doc attached to the from_str_decimal function) remain, ensuring
no duplicate or unattached docs are left around parse_decimal_bytes or
from_str_decimal.
In `@crates/polymarket/src/market/schema.rs`:
- Line 1: The crate doc comment claims "Zero-allocation serde structs", but
PolymarketBookMsg contains Vec<BookLevel<'a>> which allocates heap memory;
change the module-level doc to a more accurate phrase like "low-allocation" or
"zero-copy string serde structs" and mention that fields use borrowed & 'a str
to avoid per-field string allocations; update references to PolymarketBookMsg
and BookLevel<'a> in the comment to reflect this more precise wording.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c5611b1f-af32-4fe6-ba47-6f3f631d6f80
⛔ Files ignored due to path filters (2)
Cargo.lock is excluded by !**/*.lock
benchmarks/rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (62)
- .github/workflows/bench.yml
- Cargo.toml
- benchmarks/.gitignore
- benchmarks/README.md
- benchmarks/cpp/CMakeLists.txt
- benchmarks/cpp/src/harness.hpp
- benchmarks/cpp/src/main.cpp
- benchmarks/cpp/src/message.hpp
- benchmarks/cpp/src/queues/drogalis.hpp
- benchmarks/cpp/src/queues/rigtorp.hpp
- benchmarks/cpp/src/rdtsc.hpp
- benchmarks/cpp/src/stats.hpp
- benchmarks/results/.gitkeep
- benchmarks/rust/Cargo.toml
- benchmarks/rust/src/harness.rs
- benchmarks/rust/src/main.rs
- benchmarks/rust/src/message.rs
- benchmarks/rust/src/queues/mantis_copy.rs
- benchmarks/rust/src/queues/mantis_inline.rs
- benchmarks/rust/src/queues/mod.rs
- benchmarks/rust/src/queues/rtrb_queue.rs
- benchmarks/rust/src/rdtsc.rs
- benchmarks/rust/src/stats.rs
- benchmarks/scripts/check_system.sh
- benchmarks/scripts/compare.py
- benchmarks/scripts/deploy_and_run.sh
- benchmarks/scripts/perf_branch.sh
- benchmarks/scripts/perf_cache.sh
- benchmarks/scripts/perf_flamegraph.sh
- benchmarks/scripts/perf_profile.sh
- benchmarks/scripts/prepare_system.sh
- benchmarks/scripts/run_bench.sh
- benchmarks/scripts/setup_machine.sh
- crates/bench/Cargo.toml
- crates/bench/benches/decode.rs
- crates/bench/benches/fixed.rs
- crates/bench/benches/spsc.rs
- crates/bench/build.rs
- crates/bench/cpp/drogalis_ffi.cpp
- crates/bench/cpp/rigtorp_ffi.cpp
- crates/bench/src/drogalis_ffi.rs
- crates/bench/src/lib.rs
- crates/bench/src/messages.rs
- crates/bench/src/rigtorp_ffi.rs
- crates/bench/src/workloads.rs
- crates/binance/Cargo.toml
- crates/binance/src/decoder.rs
- crates/binance/src/lib.rs
- crates/binance/src/schema.rs
- crates/binance/src/spawn.rs
- crates/fixed/src/parse.rs
- crates/polymarket/Cargo.toml
- crates/polymarket/src/lib.rs
- crates/polymarket/src/market/decoder.rs
- crates/polymarket/src/market/mod.rs
- crates/polymarket/src/market/schema.rs
- crates/polymarket/src/market/spawn.rs
- crates/transport/src/binance/reference.rs
- crates/transport/src/feed.rs
- crates/transport/src/polymarket/market.rs
- crates/transport/src/ws.rs
- crates/transport/tests/feed_thread.rs
💤 Files with no reviewable changes (9)
- crates/bench/build.rs
- crates/bench/src/lib.rs
- crates/bench/src/messages.rs
- crates/bench/src/rigtorp_ffi.rs
- crates/bench/src/drogalis_ffi.rs
- crates/bench/cpp/rigtorp_ffi.cpp
- crates/bench/src/workloads.rs
- crates/bench/benches/spsc.rs
- crates/bench/cpp/drogalis_ffi.cpp
fn pin_to_core(core_id: usize) {
    let core_ids = core_affinity::get_core_ids().expect("failed to get core ids");
    let target = core_ids
        .into_iter()
        .find(|c| c.id == core_id)
        .unwrap_or_else(|| panic!("core {core_id} not found"));
    core_affinity::set_for_current(target);
}
🧩 Analysis chain
🏁 Script executed:
cat -n benchmarks/rust/src/harness.rs | head -100
Repository: Milerius/Mantis
Length of output: 3634
🏁 Script executed:
rg "core_affinity" -A 3 -B 3 benchmarks/rust/
Repository: Milerius/Mantis
Length of output: 2074
🏁 Script executed:
# Check Cargo.toml to see the core_affinity version
find benchmarks/rust -name "Cargo.toml" -exec cat {} \;
Repository: Milerius/Mantis
Length of output: 433
🌐 Web query:
core_affinity 0.8 set_for_current return type rust
💡 Result:
In core_affinity 0.8.x, set_for_current returns a bool:
pub fn set_for_current(core_id: CoreId) -> bool
It returns false if setting the current thread’s CPU affinity fails. [1]
Source: [1]
Use let...else for early returns and handle set_for_current failure.
Line 19 ignores the bool return value of core_affinity::set_for_current, which fails silently. If affinity setup fails in the consumer thread (line 50), the thread is not actually pinned, but continues to signal readiness at line 58, causing the producer to proceed with unreliable cross-core latency measurements. Additionally, there is no validation that producer_core and consumer_core are distinct, so identical core IDs can also produce misleading results. Rewrite pin_to_core using let...else for early returns per Rust guidelines:
Suggested pattern
fn pin_to_core(core_id: usize) -> Result<(), String> {
let core_ids = core_affinity::get_core_ids()
.ok_or("failed to get core ids")?;
let target = core_ids
.into_iter()
.find(|c| c.id == core_id)
.ok_or_else(|| format!("core {core_id} not found"))?;
let _ = core_affinity::set_for_current(target)
.then_some(())
.ok_or("failed to set core affinity")?;
Ok(())
}
Then propagate errors in the consumer (line 50) and producer (line 83) call sites, and add validation that producer and consumer cores are distinct (line 27).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/rust/src/harness.rs` around lines 13 - 20, The pin_to_core
function should be rewritten to return Result<(), String> and use let...else (or
ok_or/ok_or_else) to early-return on failures when calling
core_affinity::get_core_ids and finding the target core, and must check the bool
result of core_affinity::set_for_current and return an Err if it fails; update
callers (where consumer and producer threads call pin_to_core) to propagate or
handle the Result (do not silently ignore failures), and add a validation check
where producer_core and consumer_core are parsed/used to ensure they are
distinct and return an error if they are equal so cross-core measurements are
not run with the same core IDs.
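The distinct-core validation the prompt asks for is independent of core_affinity and can be sketched in plain Rust; the function name and error wording here are illustrative, not the harness's actual API:

```rust
/// Reject benchmark configs where producer and consumer share a core:
/// a same-core run hides the cross-core latency the benchmark measures.
fn validate_cores(producer_core: usize, consumer_core: usize) -> Result<(), String> {
    if producer_core == consumer_core {
        return Err(format!(
            "producer and consumer cores must differ, both set to {producer_core}"
        ));
    }
    Ok(())
}

fn main() {
    assert!(validate_cores(2, 4).is_ok());
    assert!(validate_cores(3, 3).is_err());
    println!("ok");
}
```

Calling this before spawning either thread turns a silently misleading measurement into an immediate configuration error.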
mkdir -p "$BENCH_DIR/results"
rsync -avz \
    "$REMOTE_HOST:$REMOTE_DIR/results/" \
    "$BENCH_DIR/results/"
🧩 Analysis chain
🏁 Script executed:
cat -n benchmarks/scripts/deploy_and_run.sh | head -100
Repository: Milerius/Mantis
Length of output: 3324
🏁 Script executed:
cat -n benchmarks/scripts/run_bench.sh | head -80
Repository: Milerius/Mantis
Length of output: 3259
🏁 Script executed:
cat -n benchmarks/scripts/run_bench.sh | tail -50
Repository: Milerius/Mantis
Length of output: 2170
🏁 Script executed:
find benchmarks -name "*.rs" -type f | head -20
Repository: Milerius/Mantis
Length of output: 367
🏁 Script executed:
grep -n "output" benchmarks/rust/src/main.rs | head -20
Repository: Milerius/Mantis
Length of output: 336
🏁 Script executed:
grep -n "output" benchmarks/cpp/src/main.cpp | head -20
Repository: Milerius/Mantis
Length of output: 362
🏁 Script executed:
sed -n '100,120p' benchmarks/cpp/src/main.cpp
Repository: Milerius/Mantis
Length of output: 716
🏁 Script executed:
cat -n benchmarks/scripts/compare.py | head -50
Repository: Milerius/Mantis
Length of output: 1913
Add host-specific namespace to prevent result overwrites across multiple hosts.
Lines 70–73 sync all remote results to the same local benchmarks/results/ directory. Since the Rust and C++ benchmark binaries generate fixed filenames like rust_mantis_copy_run_1.json and cpp_rigtorp_run_1.json (based only on queue/implementation name and run number), running this script sequentially against multiple hosts causes later hosts to overwrite earlier hosts' results. Data loss occurs because filenames contain no host identifier.
Sync into host-specific subdirectories (e.g., results/$REMOTE_HOST/) or include the hostname in artifact filenames before pulling results back.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/scripts/deploy_and_run.sh` around lines 70 - 73, The current rsync
pulls all remote results into a single directory and causes overwrites; update
the target path to include a host-specific namespace by creating and syncing
into a host subdirectory (use BENCH_DIR, REMOTE_HOST and REMOTE_DIR from the
script) — e.g., ensure mkdir -p "$BENCH_DIR/results/$REMOTE_HOST" and change the
rsync destination to "$BENCH_DIR/results/$REMOTE_HOST/" (alternatively, prepend
REMOTE_HOST to artifact filenames on the remote side before pulling) so each
host’s results are isolated.
Binance: 5 new tests covering invalid price/qty parse failures. Polymarket: 8 new tests covering invalid side, price, size, empty book levels, skipped invalid levels, and default trade aggressor. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
crates/polymarket/src/market/decoder.rs (1)
170-245: ⚠️ Potential issue | 🟠 Major
LAST_IN_BATCH incorrectly computed from raw level count, not emitted count.
When parse_price_qty fails (line 182/216 continue) or the buffer fills at 64, the is_last check at lines 185/219 uses the raw total from JSON, not the actual number of successfully emitted events. This can result in no event having LAST_IN_BATCH set.
Example: bids = [invalid, valid], asks = [] → total = 2, but only 1 event emits. At count = 0, is_last = (0 + 1 == 2) = false, so the only emitted event lacks LAST_IN_BATCH.
Proposed fix: set LAST_IN_BATCH post-loop
 for (depth_idx, level) in msg.bids.iter().enumerate() {
     if count >= 64 {
         break;
     }
     let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
         continue;
     };
-    let is_last = count + 1 == total;
-    let flags = if is_last {
-        EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
-    } else {
-        EventFlags::IS_SNAPSHOT
-    };
+    let flags = EventFlags::IS_SNAPSHOT;
     self.seq += 1;
     out[count] = HotEvent::book_delta(
         recv_ts,
         SeqNum::from_raw(self.seq),
         instrument_id,
         self.source_id,
         flags,
         BookDeltaPayload {
             price,
             qty,
             side: Side::Bid,
             action: UpdateAction::New,
             depth: depth_idx as u8,
             _pad: [0; 5],
         },
     );
     count += 1;
 }
 for (depth_idx, level) in msg.asks.iter().enumerate() {
     if count >= 64 {
         break;
     }
     let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
         continue;
     };
-    let is_last = count + 1 == total;
-    let flags = if is_last {
-        EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
-    } else {
-        EventFlags::IS_SNAPSHOT
-    };
+    let flags = EventFlags::IS_SNAPSHOT;
     self.seq += 1;
     out[count] = HotEvent::book_delta(
         recv_ts,
         SeqNum::from_raw(self.seq),
         instrument_id,
         self.source_id,
         flags,
         BookDeltaPayload {
             price,
             qty,
             side: Side::Ask,
             action: UpdateAction::New,
             depth: depth_idx as u8,
             _pad: [0; 5],
         },
     );
     count += 1;
 }
+if count > 0 {
+    out[count - 1].header.flags |= EventFlags::LAST_IN_BATCH;
+}
 count
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/polymarket/src/market/decoder.rs` around lines 170 - 245, The LAST_IN_BATCH flag is computed against the raw JSON total but some levels are skipped (parse_price_qty) or truncated by the 64-entry limit, so update logic to set LAST_IN_BATCH based on the actual emitted count: keep emitting as you do in the bids/asks loops (functions/variables: parse_price_qty, out, count, HotEvent::book_delta, self.seq, EventFlags::LAST_IN_BATCH), but remove/ignore the is_last check inside each loop and after both loops, if count > 0 set the LAST_IN_BATCH bit on the final emitted event in out[count-1] (preserving any existing IS_SNAPSHOT flag) so the last successfully emitted event gets LAST_IN_BATCH.
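The post-loop approach from the fix above can be exercised in isolation. This sketch swaps the crate's EventFlags/HotEvent types for plain bit constants; every name here is a stand-in, only the batching logic matches the proposed fix:

```rust
const IS_SNAPSHOT: u8 = 0b01;
const LAST_IN_BATCH: u8 = 0b10;

/// Emit one flag byte per parseable level. The raw input length is
/// deliberately NOT used to decide which event is last.
fn emit_flags(levels: &[Option<u32>], cap: usize) -> Vec<u8> {
    let mut out = Vec::new();
    for level in levels {
        if out.len() >= cap {
            break;
        }
        // A skipped level (parse failure) advances the input position but
        // emits nothing, which is why comparing against the raw total is wrong.
        if level.is_none() {
            continue;
        }
        out.push(IS_SNAPSHOT);
    }
    // Tag the batch end only once we know how many events actually emitted.
    if let Some(last) = out.last_mut() {
        *last |= LAST_IN_BATCH;
    }
    out
}

fn main() {
    // bids = [invalid, valid]: one event emits, and it carries LAST_IN_BATCH.
    let out = emit_flags(&[None, Some(1)], 64);
    assert_eq!(out, vec![IS_SNAPSHOT | LAST_IN_BATCH]);
    println!("ok");
}
```

Setting the flag after the loop also handles the truncation-at-64 case for free, since the last element of `out` is always the last emitted event.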
🧹 Nitpick comments (4)
.github/scripts/bench-report.sh (1)
8-13: Keep the static usage comment in sync with the new argument contract.
Runtime usage here is updated, but the header comment still documents the old 2-arg form. Updating that comment will prevent confusion when invoking the script manually.
Proposed doc-only fix
-# Usage: bench-report.sh <linux-json> <macos-json>
+# Usage: bench-report.sh <linux-seqlock> <macos-seqlock> [<linux-fixed> <macos-fixed>] [<linux-market-state> <macos-market-state>]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/scripts/bench-report.sh around lines 8 - 13, Update the script header usage comment to match the new positional-argument contract used by the variable assignments (linux_seqlock, macos_seqlock, linux_fixed, macos_fixed, linux_market_state, macos_market_state): replace the old 2-argument usage line with a single-line usage that documents required first two args and the optional third/fourth and fifth/sixth pairs (e.g. bench-report.sh <linux-seqlock> <macos-seqlock> [<linux-fixed> <macos-fixed>] [<linux-market-state> <macos-market-state>]) so the top-of-file static comment is consistent with how the script actually parses arguments.
crates/polymarket/src/market/decoder.rs (3)
291-294: Misleading SAFETY comment on safe code.
Line 292's // SAFETY: comment implies an unsafe block, but core::str::from_utf8 is a safe function. Per coding guidelines, // SAFETY: comments should only accompany unsafe blocks. Consider rewording to a regular comment explaining the ASCII assumption.
Suggested rewording
- // SAFETY: JSON values in the "type" field are always ASCII
+ // Note: JSON "type" field values are ASCII, so UTF-8 conversion is infallible
  core::str::from_utf8(remaining.get(..end)?).ok()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/polymarket/src/market/decoder.rs` around lines 291 - 294, The comment preceding the call to core::str::from_utf8 in decoder.rs incorrectly uses a "// SAFETY:" marker (implying an unsafe block) even though from_utf8 is safe; replace or reword that comment to a normal explanatory comment (e.g., "// JSON values in the 'type' field are always ASCII") adjacent to the core::str::from_utf8(remaining.get(..end)?).ok() call so it documents the ASCII assumption without using a SAFETY label.
702-723: Test should verify LAST_IN_BATCH flag on the emitted event.
This test validates that invalid levels are skipped, but doesn't check that the single emitted event has LAST_IN_BATCH set. Adding this assertion would expose the LAST_IN_BATCH computation bug flagged earlier.
Suggested assertion
  // Only the valid level should be emitted
  assert_eq!(n, 1);
+
+ // The sole emitted event should have LAST_IN_BATCH
+ assert!(out[0].header.flags.contains(EventFlags::LAST_IN_BATCH));
  if let EventBody::BookDelta(p) = out[0].body {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/polymarket/src/market/decoder.rs` around lines 702 - 723, Update the test decode_book_skips_invalid_price_level to also assert the LAST_IN_BATCH flag is set on the single emitted event: after decoding and before inspecting EventBody, check the event's flags (e.g., out[0].flags or out[0].meta.flags) includes EventFlags::LAST_IN_BATCH (or the equivalent bit/constant) so the test verifies PolymarketMarketDecoder::decode correctly marks the only emitted BookDelta as last in batch when make_out produces a single slot.
296-318: Consider using the memchr crate for hot-path byte scanning.
Since peek_type is documented as a "hot-path optimization", the linear scans in memchr_find and memchr_byte could benefit from the SIMD-accelerated memchr crate. This is optional since the current implementation is correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/polymarket/src/market/decoder.rs` around lines 296 - 318, Replace the manual linear scans in memchr_find and memchr_byte with the SIMD-accelerated routines from the memchr crate: add memchr = "..." to Cargo.toml, change memchr_byte to call memchr::memchr(byte, haystack) and change memchr_find to use memchr::memmem::Finder::new(needle).find(haystack) (or memchr::memmem::find) to locate the subslice; keep the same Option<usize> return semantics so callers such as peek_type continue to work unchanged.
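The Option<usize> contract both helpers must keep can be pinned down with std-only stand-ins before swapping in the memchr crate. The bodies below use std iterators and are illustrative, not the decoder's actual implementation:

```rust
/// Position of the first occurrence of `byte`, or None.
/// Same contract as memchr::memchr(byte, haystack).
fn memchr_byte(byte: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == byte)
}

/// Position of the first occurrence of `needle`, or None.
/// Same contract as memchr::memmem::find(haystack, needle).
fn memchr_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        // Empty needle matches at the start, mirroring memmem::find.
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    let json: &[u8] = br#"{"type":"book","asset_id":"abc"}"#;
    assert_eq!(memchr_byte(b':', json), Some(7));
    assert_eq!(memchr_find(json, b"book"), Some(9));
    assert_eq!(memchr_find(json, b"trade"), None);
    println!("ok");
}
```

Because both return Option<usize>, callers such as peek_type can switch between the std version and the crate-backed version without changing signatures.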
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 690-700: The test function decode_book_empty_levels_returns_zero
contains a formatting issue around the raw byte literal
br#"{"type":"book","asset_id":"abc123","bids":[],"asks":[]}"#.to_vec() which
triggers rustfmt; run cargo +nightly fmt --all to auto-fix or manually reformat
the literal and surrounding lines so they follow rustfmt rules (ensure spacing
and line breaks around the br! literal and the to_vec() call are normalized) and
re-run tests; check the PolymarketMarketDecoder::<6>::new and decoder.decode
usages remain unchanged after formatting.
---
Duplicate comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 170-245: The LAST_IN_BATCH flag is computed against the raw JSON
total but some levels are skipped (parse_price_qty) or truncated by the 64-entry
limit, so update logic to set LAST_IN_BATCH based on the actual emitted count:
keep emitting as you do in the bids/asks loops (functions/variables:
parse_price_qty, out, count, HotEvent::book_delta, self.seq,
EventFlags::LAST_IN_BATCH), but remove/ignore the is_last check inside each loop
and after both loops, if count > 0 set the LAST_IN_BATCH bit on the final
emitted event in out[count-1] (preserving any existing IS_SNAPSHOT flag) so the
last successfully emitted event gets LAST_IN_BATCH.
---
Nitpick comments:
In @.github/scripts/bench-report.sh:
- Around line 8-13: Update the script header usage comment to match the new
positional-argument contract used by the variable assignments (linux_seqlock,
macos_seqlock, linux_fixed, macos_fixed, linux_market_state,
macos_market_state): replace the old 2-argument usage line with a single-line
usage that documents required first two args and the optional third/fourth and
fifth/sixth pairs (e.g. bench-report.sh <linux-seqlock> <macos-seqlock>
[<linux-fixed> <macos-fixed>] [<linux-market-state> <macos-market-state>]) so
the top-of-file static comment is consistent with how the script actually parses
arguments.
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 291-294: The comment preceding the call to core::str::from_utf8 in
decoder.rs incorrectly uses a "// SAFETY:" marker (implying an unsafe block)
even though from_utf8 is safe; replace or reword that comment to a normal
explanatory comment (e.g., "// JSON values in the 'type' field are always
ASCII") adjacent to the core::str::from_utf8(remaining.get(..end)?).ok() call so
it documents the ASCII assumption without using a SAFETY label.
- Around line 702-723: Update the test decode_book_skips_invalid_price_level to
also assert the LAST_IN_BATCH flag is set on the single emitted event: after
decoding and before inspecting EventBody, check the event's flags (e.g.,
out[0].flags or out[0].meta.flags) includes EventFlags::LAST_IN_BATCH (or the
equivalent bit/constant) so the test verifies PolymarketMarketDecoder::decode
correctly marks the only emitted BookDelta as last in batch when make_out
produces a single slot.
- Around line 296-318: Replace the manual linear scans in memchr_find and
memchr_byte with the SIMD-accelerated routines from the memchr crate: add memchr
= "..." to Cargo.toml, change memchr_byte to call memchr::memchr(byte, haystack)
and change memchr_find to use memchr::memmem::Finder::new(needle).find(haystack)
(or memchr::memmem::find) to locate the subslice; keep the same Option<usize>
return semantics so callers such as peek_type continue to work unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0148a23f-2ec0-41c9-bd57-fb841d81e8b1
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
- .github/scripts/bench-report.sh
- .github/scripts/criterion-to-json.sh
- .github/workflows/bench.yml
- crates/binance/src/decoder.rs
- crates/polymarket/src/market/decoder.rs
- crates/transport/Cargo.toml
- crates/transport/tests/live_feeds.rs
✅ Files skipped from review due to trivial changes (3)
- .github/scripts/criterion-to-json.sh
- .github/workflows/bench.yml
- crates/transport/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/binance/src/decoder.rs
- Add missing <algorithm> include for std::min in C++ stats.hpp
- Add // SAFETY: comment to rdtsc unsafe block per project policy
- Assert core_affinity::set_for_current return value in harness
- Validate producer and consumer cores are distinct
- Fix README perf script examples (missing benchmark binary path)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/polymarket/src/market/decoder.rs (2)
701-722: Test doesn't assert on depth value.
This test verifies that invalid levels are skipped, but doesn't check the depth field of the emitted event. Adding assert_eq!(p.depth, ...) would clarify the expected semantic (should it be 0 or 1 when the first level is skipped?) and catch regressions if the depth logic changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/polymarket/src/market/decoder.rs` around lines 701 - 722, In the test decode_book_skips_invalid_price_level, add an assertion that the emitted BookDelta's depth is the expected value after the first (invalid) level is skipped — e.g., assert that p.depth equals Depth::from_raw(0) (or the equivalent zero-depth value used in your code) right after the existing assertions on p.price, p.qty, and p.side so the test verifies depth semantics for PolymarketMarketDecoder::<6>::new and EventBody::BookDelta.
296-318: Consider the memchr crate for hot-path byte searches.
These naive O(n)/O(n*m) search implementations are called on every incoming message. The memchr crate provides SIMD-accelerated alternatives (memchr::memmem::find and memchr::memchr) that would improve throughput on this hot path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/polymarket/src/market/decoder.rs` around lines 296 - 318, The custom byte/slice search helpers memchr_find and memchr_byte are used on the hot path and should be replaced with SIMD-accelerated functions from the memchr crate to improve throughput: swap memchr_find's implementation to call memchr::memmem::find(haystack, needle) and memchr_byte to call memchr::memchr(byte, haystack) (both return Option<usize> so adapt directly), add memchr to Cargo.toml dependencies, and update the use/imports (e.g., use memchr::memmem and memchr::memchr) so tests and callers of memchr_find/memchr_byte keep the same signature and behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 177-209: The emitted BookDeltaPayload.depth currently uses the
enumerate index (depth_idx) which advances even when parse_price_qty::<D>
returns None, causing holes; fix by introducing a per-side emitted counter
(e.g., let mut bid_depth = 0u8) inside the bids loop and set
BookDeltaPayload.depth = bid_depth, incrementing bid_depth only when you
successfully parse and push an event; apply the same pattern (e.g., ask_depth)
to the asks loop and ensure you stop at count >= 64 as before and still
increment self.seq and count only when emitting an event.
---
Nitpick comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 701-722: In the test decode_book_skips_invalid_price_level, add an
assertion that the emitted BookDelta's depth is the expected value after the
first (invalid) level is skipped — e.g., assert that p.depth equals
Depth::from_raw(0) (or the equivalent zero-depth value used in your code) right
after the existing assertions on p.price, p.qty, and p.side so the test verifies
depth semantics for PolymarketMarketDecoder::<6>::new and EventBody::BookDelta.
- Around line 296-318: The custom byte/slice search helpers memchr_find and
memchr_byte are used on the hot path and should be replaced with
SIMD-accelerated functions from the memchr crate to improve throughput: swap
memchr_find's implementation to call memchr::memmem::find(haystack, needle) and
memchr_byte to call memchr::memchr(byte, haystack) (both return Option<usize> so
adapt directly), add memchr to Cargo.toml dependencies, and update the
use/imports (e.g., use memchr::memmem and memchr::memchr) so tests and callers
of memchr_find/memchr_byte keep the same signature and behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3786a78b-5af8-41bc-a67c-e4e27ae84f31
📒 Files selected for processing (1)
crates/polymarket/src/market/decoder.rs
for (depth_idx, level) in msg.bids.iter().enumerate() {
    if count >= 64 {
        break;
    }
    let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
        continue;
    };

    let is_last = count + 1 == total;
    let flags = if is_last {
        EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
    } else {
        EventFlags::IS_SNAPSHOT
    };

    self.seq += 1;
    out[count] = HotEvent::book_delta(
        recv_ts,
        SeqNum::from_raw(self.seq),
        instrument_id,
        self.source_id,
        flags,
        BookDeltaPayload {
            price,
            qty,
            side: Side::Bid,
            action: UpdateAction::New,
            depth: depth_idx as u8,
            _pad: [0; 5],
        },
    );
    count += 1;
}
🧩 Analysis chain
🏁 Script executed:
find . -name "decoder.rs" | head -5
Repository: Milerius/Mantis
Length of output: 132
🏁 Script executed:
cd crates/polymarket && pwd && head -300 src/market/decoder.rs | tail -150
Repository: Milerius/Mantis
Length of output: 4799
🏁 Script executed:
cat -n crates/polymarket/src/market/decoder.rs | sed -n '150,250p'
Repository: Milerius/Mantis
Length of output: 3820
🏁 Script executed:
cat -n crates/polymarket/src/market/decoder.rs | sed -n '210,280p'
Repository: Milerius/Mantis
Length of output: 2796
🏁 Script executed:
rg -n "depth:" crates/polymarket/src/ -A 2 -B 2
Repository: Milerius/Mantis
Length of output: 1225
🏁 Script executed:
rg -n "struct BookDeltaPayload\|depth:" crates/ -A 5 -B 2
Repository: Milerius/Mantis
Length of output: 41
🏁 Script executed:
rg -n "depth.*venue\|venue.*depth\|depth.*index\|depth.*position" crates/polymarket/ -i
Repository: Milerius/Mantis
Length of output: 41
🏁 Script executed:
find . -name "*decoder*test*" -o -name "*test*decoder*" 2>/dev/null
Repository: Milerius/Mantis
Length of output: 41
🏁 Script executed:
find . -path "*/test*" -name "*.rs" | xargs rg -l "decoder\|BookDelta" 2>/dev/null | head -5
Repository: Milerius/Mantis
Length of output: 41
🏁 Script executed:
cat -n crates/polymarket/src/market/decoder.rs | sed -n '150,300p'
Repository: Milerius/Mantis
Length of output: 5820
🏁 Script executed:
rg -n "depth" crates/polymarket/src/market/ -B 3 -A 3
Repository: Milerius/Mantis
Length of output: 4824
🏁 Script executed:
rg -n "BookDeltaPayload" crates/ -B 5 -A 10 | head -100
Repository: Milerius/Mantis
Length of output: 7370
🏁 Script executed:
find . -name "*.md" -o -name "*.txt" | xargs rg -l "depth" 2>/dev/null | head -3
Repository: Milerius/Mantis
Length of output: 134
🏁 Script executed:
cat -n crates/polymarket/src/market/decoder.rs | sed -n '420,500p'
Repository: Milerius/Mantis
Length of output: 3761
🏁 Script executed:
rg -n "parse_price_qty\|Invalid|fail" crates/polymarket/src/market/decoder.rs -B 2 -A 2
Repository: Milerius/Mantis
Length of output: 285
The `depth` field records the input array index, not the emitted event's position within each side.
When `parse_price_qty` fails, `continue` is used but the `enumerate` index still advances. If `bids[0]` fails and `bids[1]` succeeds, the emitted event carries `depth: 1` instead of the correct `depth: 0`. The same issue applies to the asks loop.
Use a separate counter per side to track the emitted position:
Fix
```diff
-for (depth_idx, level) in msg.bids.iter().enumerate() {
+let mut bid_depth: u8 = 0;
+for level in &msg.bids {
     if count >= 64 {
         break;
     }
     let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
         continue;
     };
     // ... flags logic ...
     self.seq += 1;
     out[count] = HotEvent::book_delta(
         // ...
         BookDeltaPayload {
             price,
             qty,
             side: Side::Bid,
             action: UpdateAction::New,
-            depth: depth_idx as u8,
+            depth: bid_depth,
             _pad: [0; 5],
         },
     );
+    bid_depth += 1;
     count += 1;
 }
```

Apply the same pattern to the asks loop (lines 211+).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let mut bid_depth: u8 = 0;
for level in &msg.bids {
    if count >= 64 {
        break;
    }
    let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
        continue;
    };
    let is_last = count + 1 == total;
    let flags = if is_last {
        EventFlags::IS_SNAPSHOT | EventFlags::LAST_IN_BATCH
    } else {
        EventFlags::IS_SNAPSHOT
    };
    self.seq += 1;
    out[count] = HotEvent::book_delta(
        recv_ts,
        SeqNum::from_raw(self.seq),
        instrument_id,
        self.source_id,
        flags,
        BookDeltaPayload {
            price,
            qty,
            side: Side::Bid,
            action: UpdateAction::New,
            depth: bid_depth,
            _pad: [0; 5],
        },
    );
    bid_depth += 1;
    count += 1;
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/polymarket/src/market/decoder.rs` around lines 177 - 209, The emitted
BookDeltaPayload.depth currently uses the enumerate index (depth_idx) which
advances even when parse_price_qty::<D> returns None, causing holes; fix by
introducing a per-side emitted counter (e.g., let mut bid_depth = 0u8) inside
the bids loop and set BookDeltaPayload.depth = bid_depth, incrementing bid_depth
only when you successfully parse and push an event; apply the same pattern
(e.g., ask_depth) to the asks loop and ensure you stop at count >= 64 as before
and still increment self.seq and count only when emitting an event.
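The failure mode is easy to reproduce in isolation. A minimal, self-contained sketch (with a hypothetical `parse_level` standing in for `parse_price_qty`) contrasts the enumerate-index and emitted-counter approaches:

```rust
// Why an enumerate() index produces depth holes when invalid levels are
// skipped, while a separate emitted-position counter stays contiguous.
fn parse_level(raw: &str) -> Option<f64> {
    raw.parse().ok() // stand-in for parse_price_qty
}

fn depths_enumerate(levels: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for (depth_idx, raw) in levels.iter().copied().enumerate() {
        let Some(_price) = parse_level(raw) else { continue };
        out.push(depth_idx as u8); // index advanced past the skipped level
    }
    out
}

fn depths_counter(levels: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut depth: u8 = 0;
    for raw in levels.iter().copied() {
        let Some(_price) = parse_level(raw) else { continue };
        out.push(depth); // 0 = best emitted level, no holes
        depth += 1;
    }
    out
}

fn main() {
    let levels = ["bad", "0.53", "0.52"]; // first level fails to parse
    assert_eq!(depths_enumerate(&levels), vec![1, 2]); // hole: no depth 0
    assert_eq!(depths_counter(&levels), vec![0, 1]);   // contiguous
    println!("ok");
}
```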
Actionable comments posted: 2
♻️ Duplicate comments (1)
benchmarks/cpp/src/stats.hpp (1)
33-40: ⚠️ Potential issue | 🟡 Minor
Use the inclusive overflow-bucket ceiling.
`percentile()` reports `overflow_upper_bound()` directly, but Line 39 currently returns the exclusive end of the documented range. Bucket 0 is described as `4096..8191`, yet this code reports `8192`, so any percentile that lands in `overflow_` is overstated by one cycle.
🩹 Proposed fix
```diff
 inline uint64_t overflow_upper_bound(size_t i) {
     if (i >= LOG_BUCKETS - 1) {
         return std::numeric_limits<uint64_t>::max();
     }
     // bucket i covers [4096 << i, (4096 << (i+1)) - 1]
-    return static_cast<uint64_t>(DIRECT_BUCKETS) << (i + 1);
+    return (static_cast<uint64_t>(DIRECT_BUCKETS) << (i + 1)) - 1;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/cpp/src/stats.hpp` around lines 33 - 40, The function overflow_upper_bound(size_t i) currently returns the exclusive end of the documented bucket range (e.g., returns 8192 for bucket 0) which inflates percentiles; change its non-max branch to return the inclusive upper bound by subtracting one from the shifted value (i.e., use (static_cast<uint64_t>(DIRECT_BUCKETS) << (i + 1)) - 1) while keeping the existing numeric_limits<uint64_t>::max() for the final overflow bucket.
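Whichever boundary contract the team settles on (a later comment in this thread records that the exclusive top was intentional on the Rust side), the one-cycle difference is easy to pin down numerically. A sketch in Rust, assuming `DIRECT_BUCKETS = 4096` as documented and a placeholder `LOG_BUCKETS = 16`:

```rust
// Numeric check of the off-by-one the review describes. Constants mirror the
// C++ header; LOG_BUCKETS = 16 is an illustrative placeholder.
const DIRECT_BUCKETS: u64 = 4096;
const LOG_BUCKETS: usize = 16;

// Exclusive variant (current code): first cycle *outside* bucket i.
fn upper_exclusive(i: usize) -> u64 {
    if i >= LOG_BUCKETS - 1 { u64::MAX } else { DIRECT_BUCKETS << (i + 1) }
}

// Inclusive variant (proposed): last cycle actually inside bucket i.
fn upper_inclusive(i: usize) -> u64 {
    if i >= LOG_BUCKETS - 1 { u64::MAX } else { (DIRECT_BUCKETS << (i + 1)) - 1 }
}

fn main() {
    // Bucket 0 is documented as 4096..8191.
    assert_eq!(upper_exclusive(0), 8192); // one past the documented range
    assert_eq!(upper_inclusive(0), 8191); // matches the documented range
    assert_eq!(upper_inclusive(LOG_BUCKETS - 1), u64::MAX); // final bucket
    println!("ok");
}
```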
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benchmarks/cpp/src/stats.hpp`:
- Around line 314-317: Replace the locale-dependent snprintf usage that formats
hist.mean() (the mean_buf/snprintf sequence and the os << "
\"cycles_per_op_mean\"" emission) with a locale-independent formatter: include
<iomanip> and <locale>, create a std::ostringstream, imbue it with
std::locale::classic(), use std::fixed and std::setprecision(1) to stream
hist.mean() into the ostringstream, then write the resulting string to os for
the "cycles_per_op_mean" field; this ensures JSON uses a dot decimal separator
regardless of LC_NUMERIC.
In `@benchmarks/README.md`:
- Around line 13-20: The README currently claims using `lfence; rdtsc` and
same-CCD + `isolcpus` is sufficient, but it omits that synchronized/invariant
TSC is a hard prerequisite because `lfence` only serializes `rdtsc` locally and
does not eliminate cross-core TSC skew; update the benchmark docs to explicitly
require invariant/synchronized TSC (or mention using a kernel TSC sync feature)
and warn that producer/consumer timestamp deltas (`producer_rdtsc`,
`consumer_rdtsc`) can wrap or produce huge outliers on machines without it; keep
the existing same-CCD/isolation guidance but add a clear note adjacent to the
`lfence; rdtsc` and same-CCD/isolcpus lines stating this hard requirement and
recommended ways to verify or enable synchronized TSC.
---
Duplicate comments:
In `@benchmarks/cpp/src/stats.hpp`:
- Around line 33-40: The function overflow_upper_bound(size_t i) currently
returns the exclusive end of the documented bucket range (e.g., returns 8192 for
bucket 0) which inflates percentiles; change its non-max branch to return the
inclusive upper bound by subtracting one from the shifted value (i.e., use
(static_cast<uint64_t>(DIRECT_BUCKETS) << (i + 1)) - 1) while keeping the
existing numeric_limits<uint64_t>::max() for the final overflow bucket.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8a13487a-888e-4bfd-af46-b0cf048f0ad3
📒 Files selected for processing (4)
- benchmarks/README.md
- benchmarks/cpp/src/stats.hpp
- benchmarks/rust/src/harness.rs
- benchmarks/rust/src/rdtsc.rs
✅ Files skipped from review due to trivial changes (1)
- benchmarks/rust/src/rdtsc.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- benchmarks/rust/src/harness.rs
```cpp
// Format mean with one decimal place
char mean_buf[64];
std::snprintf(mean_buf, sizeof(mean_buf), "%.1f", hist.mean());
os << "  \"cycles_per_op_mean\": " << mean_buf << ",\n";
```
Serialize cycles_per_op_mean with a locale-independent decimal point.
Line 316 uses std::snprintf("%.1f", ...), which follows LC_NUMERIC. On hosts with a comma decimal separator, this emits 12,3 and breaks JSON validity for the cross-hardware comparison tooling. Use std::ostringstream with std::locale::classic() to force the classic locale.
Add these includes at the top of the file:
```cpp
#include <iomanip>
#include <locale>
```
Then update the code:
🩹 Proposed fix
```diff
-// Format mean with one decimal place
-char mean_buf[64];
-std::snprintf(mean_buf, sizeof(mean_buf), "%.1f", hist.mean());
-os << "  \"cycles_per_op_mean\": " << mean_buf << ",\n";
+// Format mean with one decimal place using a locale-independent decimal point
+std::ostringstream mean_os;
+mean_os.imbue(std::locale::classic());
+mean_os << std::fixed << std::setprecision(1) << hist.mean();
+os << "  \"cycles_per_op_mean\": " << mean_os.str() << ",\n";
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/cpp/src/stats.hpp` around lines 314 - 317, Replace the
locale-dependent snprintf usage that formats hist.mean() (the mean_buf/snprintf
sequence and the os << " \"cycles_per_op_mean\"" emission) with a
locale-independent formatter: include <iomanip> and <locale>, create a
std::ostringstream, imbue it with std::locale::classic(), use std::fixed and
std::setprecision(1) to stream hist.mean() into the ostringstream, then write
the resulting string to os for the "cycles_per_op_mean" field; this ensures JSON
uses a dot decimal separator regardless of LC_NUMERIC.
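For contrast, the Rust side of the JSON contract is safe by construction: Rust's `format!` machinery never consults `LC_NUMERIC`. A sketch of the equivalent field emission (the `emit_mean` helper name is hypothetical):

```rust
// Rust formatting always uses '.' as the decimal separator, regardless of
// the process locale, so the same JSON field needs no special handling.
fn emit_mean(mean: f64) -> String {
    format!("  \"cycles_per_op_mean\": {:.1},", mean)
}

fn main() {
    assert_eq!(emit_mean(12.34), "  \"cycles_per_op_mean\": 12.3,");
    println!("ok");
}
```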
- Fix isolcpus parsing: use dedicated cmdline param parser instead of read_proc_field which expects colon-separated fields - Fix LAST_IN_BATCH: set flag on last emitted event after both loops, not based on raw JSON level count (which ignores skipped/truncated levels) - Add git/curl to setup_machine.sh bootstrap packages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
♻️ Duplicate comments (1)
crates/polymarket/src/market/decoder.rs (1)
176-201: ⚠️ Potential issue | 🟠 Major
`BookDeltaPayload.depth` is incorrect when invalid levels are skipped.
At Line 176 and Line 203, `depth` is derived from the `enumerate()` index. Because Line 180 and Line 207 can `continue`, emitted depths can have holes (e.g., the first valid bid emitted with depth 1). This breaks the documented “0 = best level” contract.
Suggested fix
```diff
-for (depth_idx, level) in msg.bids.iter().enumerate() {
+let mut bid_depth: u8 = 0;
+for level in &msg.bids {
     if count >= 64 {
         break;
     }
     let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
         continue;
     };
     self.seq += 1;
     out[count] = HotEvent::book_delta(
         recv_ts,
         SeqNum::from_raw(self.seq),
         instrument_id,
         self.source_id,
         EventFlags::IS_SNAPSHOT,
         BookDeltaPayload {
             price,
             qty,
             side: Side::Bid,
             action: UpdateAction::New,
-            depth: depth_idx as u8,
+            depth: bid_depth,
             _pad: [0; 5],
         },
     );
+    bid_depth = bid_depth.saturating_add(1);
     count += 1;
 }
-for (depth_idx, level) in msg.asks.iter().enumerate() {
+let mut ask_depth: u8 = 0;
+for level in &msg.asks {
     if count >= 64 {
         break;
     }
     let Some((price, qty)) = parse_price_qty::<D>(level.price, level.size, &meta) else {
         continue;
     };
     self.seq += 1;
     out[count] = HotEvent::book_delta(
         recv_ts,
         SeqNum::from_raw(self.seq),
         instrument_id,
         self.source_id,
         EventFlags::IS_SNAPSHOT,
         BookDeltaPayload {
             price,
             qty,
             side: Side::Ask,
             action: UpdateAction::New,
-            depth: depth_idx as u8,
+            depth: ask_depth,
             _pad: [0; 5],
         },
     );
+    ask_depth = ask_depth.saturating_add(1);
     count += 1;
 }
```
Also applies to: 203-228
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/polymarket/src/market/decoder.rs` around lines 176 - 201, The emitted BookDeltaPayload.depth currently uses the enumerate() index (depth_idx) which becomes incorrect when parse_price_qty(...) returns None and the loop continues; change each bid and ask loop to maintain a separate depth counter (e.g., bid_depth / ask_depth initialized to 0) that is only incremented when you successfully parse and emit a level so depths are contiguous (0 = best level) and assign that counter to BookDeltaPayload.depth; apply the same fix to both the bids loop (for (depth_idx, level) in msg.bids.iter().enumerate()) and the asks loop so skipped/invalid levels do not create depth holes.
🧹 Nitpick comments (2)
crates/polymarket/src/market/decoder.rs (1)
61-66: Avoid wildcard match arms to comply with repo rule.
Both dispatch matches use `_ => ...`. Please rewrite with explicit branching to satisfy the no-wildcard rule. As per coding guidelines: No wildcard matches — use explicit destructuring in pattern matching.
Also applies to: 248-252
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/polymarket/src/market/decoder.rs` around lines 61 - 66, Replace the wildcard arms in the message-type dispatches with explicit branches: in the match over msg_type (the block that currently has "price_change", "last_trade_price", "book" and `_ => 0`) change the `_` arm to a named binding (e.g., other) and handle it explicitly by logging/handling the unrecognized msg_type and returning 0; do the same for the other dispatch at the same file (the match around lines 248-252). Specifically update the match in the decoder (the function that calls self.decode_price_change / self.decode_trade / self.decode_book) to replace `_ => 0` with an explicit arm like other => { /* log unrecognized message type using other */ 0 } so there are no wildcard `_` matches while preserving existing behavior.
benchmarks/rust/src/stats.rs (1)
379-395: Lock the overflow-bucket contract with an exact assertion.
This only proves that `p99.99` lands somewhere above the direct range. It still passes if a later cleanup changes the intentional exclusive-boundary behavior and Rust/C++ quietly drift. For this fixture, asserting `h.percentile(99.99) == 131072` (or checking `overflow_upper_bound()` directly) would make that contract explicit.
Based on learnings, `overflow_upper_bound(i)` intentionally returns the exclusive top of each log bucket rather than an inclusive upper bound.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/rust/src/stats.rs` around lines 379 - 395, The test's assertion is too weak; replace the loose range check with an exact contract check so overflow behavior can't drift. In the histogram_overflow_buckets test, assert that CycleHistogram::percentile(99.99) equals the expected exclusive overflow upper bound (use assert_eq!(p9999, 131072)) or alternatively call h.overflow_upper_bound(...) and assert that it returns 131072; update the assertion replacing the current assert!(p9999 > 4095, ...) to an exact equality assertion referencing percentile (and/or overflow_upper_bound) to lock the overflow-bucket contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benchmarks/rust/src/stats.rs`:
- Line 181: The serialized `compiler` field is calling rustc_version() at
runtime (e.g., the `compiler: rustc_version()` assignment in stats.rs), which
reports the runner's toolchain; instead capture the compiler at build time via a
build script and read that value at runtime. Add a build.rs that queries the
compiler (e.g., using the rustc_version crate or `rustc --version`) and emits it
with println!("cargo:rustc-env=BUILD_RUSTC_VERSION={}") and then replace calls
to rustc_version() in stats.rs (and the other spots around the 212-219 block) to
read the env var (e.g., env!("BUILD_RUSTC_VERSION") or std::env::var at runtime)
so the JSON contains the toolchain used to build the binary rather than the
runner's rustc.
- Around line 19-20: The mean calculation can be corrupted by wrapped TSC deltas
(near u64::MAX); modify the stats update path (the method that processes a new
cycles value—e.g., add() or the code that does self.sum += cycles) to detect and
ignore suspiciously large deltas before mutating sum/count (for example: if
cycles > (u64::MAX / 2) or another large threshold treat it as a wrapped/invalid
sample and return early), and keep min/max/count updates unchanged if you
prefer; also ensure mean() uses checked arithmetic (checked_add/checked_div or
using u128 accumulation) to avoid panics/wraps when computing self.sum /
self.count using the struct fields sum, min, max, count.
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 272-284: peek_type currently only matches the exact bytes
"\"type\":\"" and so misses valid JSON with whitespace; update peek_type to
first locate the key bytes b"\"type\"" (using memchr_find), then advance past
that key, skip any ASCII whitespace bytes, assert/verify the next byte is b':',
skip any ASCII whitespace again, verify the next byte is the opening quote b'"',
and then use memchr_byte to find the closing quote and return the slice between
quotes; refer to peek_type, memchr_find, memchr_byte and the variables
pos/value_start/remaining when making the change.
---
Duplicate comments:
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 176-201: The emitted BookDeltaPayload.depth currently uses the
enumerate() index (depth_idx) which becomes incorrect when parse_price_qty(...)
returns None and the loop continues; change each bid and ask loop to maintain a
separate depth counter (e.g., bid_depth / ask_depth initialized to 0) that is
only incremented when you successfully parse and emit a level so depths are
contiguous (0 = best level) and assign that counter to BookDeltaPayload.depth;
apply the same fix to both the bids loop (for (depth_idx, level) in
msg.bids.iter().enumerate()) and the asks loop so skipped/invalid levels do not
create depth holes.
---
Nitpick comments:
In `@benchmarks/rust/src/stats.rs`:
- Around line 379-395: The test's assertion is too weak; replace the loose range
check with an exact contract check so overflow behavior can't drift. In the
histogram_overflow_buckets test, assert that CycleHistogram::percentile(99.99)
equals the expected exclusive overflow upper bound (use assert_eq!(p9999,
131072)) or alternatively call h.overflow_upper_bound(...) and assert that it
returns 131072; update the assertion replacing the current assert!(p9999 > 4095,
...) to an exact equality assertion referencing percentile (and/or
overflow_upper_bound) to lock the overflow-bucket contract.
In `@crates/polymarket/src/market/decoder.rs`:
- Around line 61-66: Replace the wildcard arms in the message-type dispatches
with explicit branches: in the match over msg_type (the block that currently has
"price_change", "last_trade_price", "book" and `_ => 0`) change the `_` arm to a
named binding (e.g., other) and handle it explicitly by logging/handling the
unrecognized msg_type and returning 0; do the same for the other dispatch at the
same file (the match around lines 248-252). Specifically update the match in the
decoder (the function that calls self.decode_price_change / self.decode_trade /
self.decode_book) to replace `_ => 0` with an explicit arm like other => { /*
log unrecognized message type using other */ 0 } so there are no wildcard `_`
matches while preserving existing behavior.
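The wrapped-TSC guard suggested for `stats.rs` can be sketched as follows (the field names `sum`/`min`/`max`/`count` mirror the prompt, but the exact struct shape is an assumption):

```rust
// Guard a cycle-stats accumulator against wrapped TSC deltas: a delta near
// u64::MAX almost certainly came from TSC wrap/skew, so drop it rather than
// corrupt the mean. u128 accumulation keeps sum overflow-free.
struct Stats { sum: u128, min: u64, max: u64, count: u64 }

impl Stats {
    fn new() -> Self { Stats { sum: 0, min: u64::MAX, max: 0, count: 0 } }

    fn add(&mut self, cycles: u64) {
        if cycles > u64::MAX / 2 { return; } // treat as wrapped/invalid sample
        self.sum += u128::from(cycles);
        self.min = self.min.min(cycles);
        self.max = self.max.max(cycles);
        self.count += 1;
    }

    fn mean(&self) -> f64 {
        if self.count == 0 { 0.0 } else { self.sum as f64 / self.count as f64 }
    }
}

fn main() {
    let mut s = Stats::new();
    s.add(400);
    s.add(600);
    s.add(u64::MAX - 3); // wrapped sample, ignored
    assert_eq!(s.count, 2);
    assert!((s.mean() - 500.0).abs() < f64::EPSILON);
    println!("ok");
}
```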
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f70fa48-1426-4937-a132-d0191d424b06
📒 Files selected for processing (3)
- benchmarks/rust/src/stats.rs
- benchmarks/scripts/setup_machine.sh
- crates/polymarket/src/market/decoder.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- benchmarks/scripts/setup_machine.sh
```rust
implementation: implementation.to_string(),
language: "Rust".to_string(),
version: env!("CARGO_PKG_VERSION").to_string(),
compiler: rustc_version(),
```
Capture the build compiler, not the runner's rustc.
Every JSON result serializes this value, but rustc --version here reports whatever toolchain happens to be installed on the machine executing the benchmark. In the deploy/run workflow that can stamp results with the wrong compiler version or "unknown" even though the binary was built with a known toolchain.
Based on learnings, "All benchmarks must export JSON for cross-hardware comparison".
Also applies to: 212-219
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/rust/src/stats.rs` at line 181, The serialized `compiler` field is
calling rustc_version() at runtime (e.g., the `compiler: rustc_version()`
assignment in stats.rs), which reports the runner's toolchain; instead capture
the compiler at build time via a build script and read that value at runtime.
Add a build.rs that queries the compiler (e.g., using the rustc_version crate or
`rustc --version`) and emits it with
println!("cargo:rustc-env=BUILD_RUSTC_VERSION={}") and then replace calls to
rustc_version() in stats.rs (and the other spots around the 212-219 block) to
read the env var (e.g., env!("BUILD_RUSTC_VERSION") or std::env::var at runtime)
so the JSON contains the toolchain used to build the binary rather than the
runner's rustc.
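A minimal `build.rs` along the lines the prompt describes might look like this (the `BUILD_RUSTC_VERSION` name and the fallback behavior are illustrative, not the crate's actual wiring):

```rust
// build.rs sketch: capture the compiler that builds the binary and expose it
// as a compile-time env var, so the serialized JSON `compiler` field reflects
// the build toolchain rather than the runner's rustc.
use std::process::Command;

fn detect_rustc() -> String {
    // Cargo sets RUSTC for build scripts; fall back to `rustc` on PATH.
    let rustc = std::env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    Command::new(rustc)
        .arg("--version")
        .output()
        .ok()
        .and_then(|o| String::from_utf8(o.stdout).ok())
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .unwrap_or_else(|| "unknown".to_string())
}

fn main() {
    // Consumers then read the value with env!("BUILD_RUSTC_VERSION").
    println!("cargo:rustc-env=BUILD_RUSTC_VERSION={}", detect_rustc());
}
```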
```rust
fn peek_type(buf: &[u8]) -> Option<&str> {
    // Search for the byte pattern: "type":"
    let needle = b"\"type\":\"";
    let pos = memchr_find(buf, needle)?;
    let value_start = pos + needle.len();

    // Find the closing quote
    let remaining = buf.get(value_start..)?;
    let end = memchr_byte(b'"', remaining)?;

    // SAFETY: JSON values in the "type" field are always ASCII
    core::str::from_utf8(remaining.get(..end)?).ok()
}
```
peek_type currently rejects valid JSON formatting variants.
Line 274 hardcodes "\"type\":\""; valid JSON like {"type" : "book"} won’t match and gets dropped. This creates avoidable decode misses.
Suggested fix
```diff
 fn peek_type(buf: &[u8]) -> Option<&str> {
-    // Search for the byte pattern: "type":"
-    let needle = b"\"type\":\"";
-    let pos = memchr_find(buf, needle)?;
-    let value_start = pos + needle.len();
-
-    // Find the closing quote
-    let remaining = buf.get(value_start..)?;
-    let end = memchr_byte(b'"', remaining)?;
-
-    // SAFETY: JSON values in the "type" field are always ASCII
-    core::str::from_utf8(remaining.get(..end)?).ok()
+    let key = b"\"type\"";
+    let pos = memchr_find(buf, key)?;
+    let mut i = pos + key.len();
+    while matches!(buf.get(i), Some(b' ' | b'\n' | b'\r' | b'\t')) { i += 1; }
+    if *buf.get(i)? != b':' {
+        return None;
+    }
+    i += 1;
+    while matches!(buf.get(i), Some(b' ' | b'\n' | b'\r' | b'\t')) { i += 1; }
+    if *buf.get(i)? != b'"' {
+        return None;
+    }
+    i += 1;
+    let rest = buf.get(i..)?;
+    let end = memchr_byte(b'"', rest)?;
+    core::str::from_utf8(rest.get(..end)?).ok()
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/polymarket/src/market/decoder.rs` around lines 272 - 284, peek_type
currently only matches the exact bytes "\"type\":\"" and so misses valid JSON
with whitespace; update peek_type to first locate the key bytes b"\"type\""
(using memchr_find), then advance past that key, skip any ASCII whitespace
bytes, assert/verify the next byte is b':', skip any ASCII whitespace again,
verify the next byte is the opening quote b'"', and then use memchr_byte to find
the closing quote and return the slice between quotes; refer to peek_type,
memchr_find, memchr_byte and the variables pos/value_start/remaining when making
the change.
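A standalone, runnable version of the whitespace-tolerant scan (with a naive byte search standing in for the crate's `memchr_find`/`memchr_byte` helpers, whose exact signatures are assumptions here):

```rust
// Whitespace-tolerant "type" key scan: accepts `"type":"book"` as well as
// `"type" : "book"`, returning None on malformed input.
fn find(buf: &[u8], needle: &[u8]) -> Option<usize> {
    buf.windows(needle.len()).position(|w| w == needle)
}

fn peek_type(buf: &[u8]) -> Option<&str> {
    let key = b"\"type\"";
    let mut i = find(buf, key)? + key.len();
    while matches!(buf.get(i), Some(b' ' | b'\t' | b'\n' | b'\r')) { i += 1; }
    if *buf.get(i)? != b':' { return None; }
    i += 1;
    while matches!(buf.get(i), Some(b' ' | b'\t' | b'\n' | b'\r')) { i += 1; }
    if *buf.get(i)? != b'"' { return None; }
    i += 1;
    let rest = buf.get(i..)?;
    let end = rest.iter().position(|&b| b == b'"')?;
    core::str::from_utf8(&rest[..end]).ok()
}

fn main() {
    assert_eq!(peek_type(br#"{"type":"book"}"#), Some("book"));
    assert_eq!(peek_type(br#"{"type" : "book"}"#), Some("book")); // whitespace ok
    assert_eq!(peek_type(b"{}"), None); // no key: decode miss, not a panic
    println!("ok");
}
```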
core_affinity only sees cores in the process's default affinity mask, which excludes isolcpus cores. Use sched_setaffinity(2) directly via libc which can pin to any online core including isolated ones. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pthread_setaffinity_np may silently fail on isolated cores. Use sched_setaffinity(2) directly with error checking, matching the Rust harness approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both producer and consumer now run on spawned threads (not main). Consumer records raw cycle deltas into a pre-allocated Vec/vector, then builds the histogram after measurement completes. This avoids cache pollution from histogram writes during the hot loop, which was inflating C++ numbers by ~25x. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No traits, no Arc, no Vec, no Result unwrapping. Uses leaked Box for shared state, usize-cast for Send, inline asm rdtsc, and sum += delta matching the HFT University run_latency() exactly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two &mut references to the same SpscRing from different threads is UB per Rust aliasing rules. Use spsc_ring() split handles instead — they access the engine through Arc<RingEngine> via &self, which is sound. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Inline RawRing with no Arc, no Result, no &mut self, no generics. push() -> bool, pop(&mut out) -> bool. Stack-local via Box::leak. Raw pointer field projections avoid Rust noalias interference. Same algorithm as RingEngine but stripped to bare metal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…wrap - Colocate head+tail_cached on same 64B cache line (producer-local) - Colocate tail+head_cached on same 64B cache line (consumer-local) - Use 64-byte alignment instead of 128-byte (x86_64 native cache line) - Branch-based index wrapping (branch predictor > always-execute AND) - Enable LTO + codegen-units=1 for cross-crate inlining - pop() takes raw *mut Msg to avoid any &mut reference overhead Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l API, inline(always) Benchmark-driven optimizations closing the gap with C++ rigtorp: - Add CacheLine<T> with native alignment (64B x86, 128B ARM) - Colocate head+tail_cached on producer CacheLine, tail+head_cached on consumer CacheLine — reduces hot-path cache footprint from 4 lines to 2 - Add push() -> bool and unsafe pop(*mut T) -> bool to RingEngine, RawRing, Producer, Consumer — no Result overhead, no return-by-value - Add BranchWrap index strategy (branch predictor > bitmask AND) - Upgrade all hot-path functions to #[inline(always)] - Add read_into slot operation for zero-copy pop into caller buffer Fully backwards-compatible: existing try_push/try_pop Result API and Arc-based split handles are unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iants - Add push_shared/pop_shared on RawRing taking &self instead of &mut self to avoid LLVM noalias interference in two-thread benchmarks - Raw bench now tests all 3 Rust queues: mantis-inline (push_shared), mantis-copy (push/pop &self), rtrb (push/pop Result) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SpscRingFast preset using BranchWrap index strategy - Upgrade InlineStorage::capacity and slot_ptr to inline(always) - Raw bench now tests: Pow2Masked, BranchWrap, copy, and rtrb Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove dead benchmark modules (harness, message, queues, rdtsc, stats) - Simplify main.rs to only use raw_bench protocol - Fix clippy: backticks in docs, allow inline_always workspace-wide, use &raw mut in tests - Add comprehensive RESULTS.md with methodology, hardware, configuration, and final benchmark numbers (472 cycles/op SpscRingFast) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace single-thread SPSC claims (<10ns) with measured cross-core latency (130ns / 472 TSC cycles on isolated cores). Remove "same ballpark as Optiver/LMAX" claim. Add link to benchmarks/RESULTS.md. Update benchmark table, quick start, and highlights to reflect actual measured performance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…o C++ Rust: consumer thread captures perf_event_open grouped counters (instructions, branch misses, L1D misses, LLC misses) per-op. Printed alongside cycles/op when available (Linux only). C++: add --mode raw flag for sum/count protocol matching HFT University. Tests both rigtorp AND drogalis in raw mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…parison Identical algorithm to library SpscRingFast but with no Storage trait, no IndexStrategy, no Instrumentation, no RawRing wrapper — pure struct with inline push/pop. Shows the true floor of Rust SPSC latency vs the library abstraction cost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two optimizations identified via assembly analysis: 1. Shadow tail_local/head_local on x86_64: LLVM won't fold a Relaxed load after a Release store to the same atomic. Cell shadow breaks the dependency chain. Expected: ~30-50 cycles/op improvement. 2. Combine rdtsc inside asm block: "shl rdx,32; or rax,rdx" produces 2 instructions vs the previous 3 (shl + mov eax,eax + add). Expected: ~2-5 cycles/op improvement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
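The combined-shift `rdtsc` described above can be sketched in stable inline asm (x86_64 only; a portable stand-in is included so the snippet compiles elsewhere, and the exact asm in the benchmark crate may differ):

```rust
// `lfence` serializes the instruction stream before the read; the shift/or
// pair folds EDX:EAX into a single 64-bit value inside the asm block, which
// is the 2-instruction combine the commit describes.
#[cfg(target_arch = "x86_64")]
fn rdtsc() -> u64 {
    let tsc: u64;
    unsafe {
        core::arch::asm!(
            "lfence",
            "rdtsc",          // EDX:EAX = TSC (upper halves of rax/rdx zeroed)
            "shl rdx, 32",
            "or rax, rdx",    // rax = full 64-bit timestamp
            out("rax") tsc,
            out("rdx") _,
            options(nostack),
        );
    }
    tsc
}

#[cfg(not(target_arch = "x86_64"))]
fn rdtsc() -> u64 {
    // Portable stand-in so the sketch compiles off-x86: wall-clock nanos.
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_nanos() as u64
}

fn main() {
    let a = rdtsc();
    assert!(a > 0);
    println!("ok");
}
```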
Summary
- `mantis-binance` and `mantis-polymarket` crates with zero-alloc market data decoders
- `parse_decimal_bytes` for zero-alloc hot-path price parsing
- `&str` to `&mut [u8]` for in-place parsing
- SPSC ring overhaul: `BranchWrap`, `inline(always)`

SPSC Benchmark Results (AMD Ryzen 7 PRO 8700GE, isolated cores)

At 3.65 GHz TSC, 472 cycles = 129 ns per cross-core message handoff.

Protocol: two threads on separate pinned cores, `isolcpus`, `rdtsc` with `lfence`, 48-byte messages, 1M operations, `chrt -f 99`, `taskset -c 2,3`. See `benchmarks/RESULTS.md` for full methodology.

Key optimizations that closed the Rust-C++ gap

- `push() -> bool` + `pop(*mut T) -> bool` instead of `Result<T>` (~500 cycles vs rtrb)
- `&self` shared references: `push_shared`/`pop_shared` avoid LLVM `noalias` interference
- `inline(always)` + LTO: guaranteed cross-crate inlining of hot paths

Test plan

- `cargo +nightly test`: workspace tests pass
- `cargo +nightly clippy --all-targets --features alloc,std -- -D warnings`: clean
- `cargo +nightly fmt --all --check`: formatted
- `cd benchmarks/rust && cargo +nightly build --release`: standalone bench builds
- Existing benches (`fixed`, `seqlock`, `market_state`, `decode`) still run

🤖 Generated with Claude Code
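The bool-returning SPSC API shape listed under “Key optimizations” can be sketched minimally (capacity, padding, and payload type here are illustrative, not the crate's):

```rust
// Minimal SPSC ring with the push() -> bool / pop(&mut out) -> bool shape:
// no Result, &self on both sides, monotonic counters masked into a
// power-of-two slot array, head/tail on separate cache lines.
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

const CAP: usize = 1024; // power of two

#[repr(align(64))]
struct Pad<T>(T); // keep each index on its own cache line

struct RawRing {
    head: Pad<AtomicUsize>, // written by producer
    tail: Pad<AtomicUsize>, // written by consumer
    slots: [UnsafeCell<u64>; CAP],
}

unsafe impl Sync for RawRing {}

impl RawRing {
    fn new() -> Self {
        RawRing {
            head: Pad(AtomicUsize::new(0)),
            tail: Pad(AtomicUsize::new(0)),
            slots: std::array::from_fn(|_| UnsafeCell::new(0)),
        }
    }

    #[inline(always)]
    fn push(&self, v: u64) -> bool {
        let head = self.head.0.load(Ordering::Relaxed);
        let tail = self.tail.0.load(Ordering::Acquire);
        if head - tail == CAP { return false; } // full
        unsafe { *self.slots[head & (CAP - 1)].get() = v; }
        self.head.0.store(head + 1, Ordering::Release);
        true
    }

    #[inline(always)]
    fn pop(&self, out: &mut u64) -> bool {
        let tail = self.tail.0.load(Ordering::Relaxed);
        let head = self.head.0.load(Ordering::Acquire);
        if head == tail { return false; } // empty
        unsafe { *out = *self.slots[tail & (CAP - 1)].get(); }
        self.tail.0.store(tail + 1, Ordering::Release);
        true
    }
}

fn main() {
    let ring = RawRing::new();
    assert!(ring.push(42));
    let mut v = 0;
    assert!(ring.pop(&mut v));
    assert_eq!(v, 42);
    assert!(!ring.pop(&mut v)); // empty again
    println!("ok");
}
```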