perf(zeroex_v2 trades): materialize settler-txs staging across all 17 chains (collapse ~14x traces re-scan) #9734
…se repeated traces scans

Extract zeroex_settler_txs_cte into a materialized incremental staging model (zeroex_v2_bnb.settler_txs) and read it via ref() in the trades model. The settler-trace filter searches calldata (varbinary_position on input), which is non-pushable, so Trino re-inlined the zeroex_tx CTE into ~14 full scans of bnb.traces per build. Materializing scans traces once.

Measured on a fixed 1-day window (warm): CPU 2583s->1622s (-37%), scan 268GB->177GB (-34%), bnb.traces IO 14x->1x. Proven equivalent: full-pipeline (old EXCEPT new) = (new EXCEPT old) = 0.
PR Summary
Medium Risk
Overview: Schema YAML adds the staging model with column docs and a uniqueness test on
Reviewed by Cursor Bugbot for commit fbe197c.
Regression passed on prod (7-day window): 162,467 staging rows, 0 duplicate Per-chain rollout estimate (spellbook-dex, 24h). The 14x
Top 4 (bnb, base, polygon, worldchain) = 81.3 CPU-hrs/day = 73% of the family, so these are the priority rollout. Caveat: -37% is bnb-measured; per-chain actuals scale with each chain's traces-scan share, and the steady-state saving is marginally lower once the staging table's own incremental MERGE + OPTIMIZE is counted. Worth measuring base/polygon before the full rollout.
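The percentages and shares quoted in this thread can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the input figures are the ones stated above, while the variable names and the `projected_saving` helper are hypothetical and not part of the PR.

```python
# Figures quoted in this thread (bnb, fixed 1-day window, warm runs).
cpu_old_s, cpu_new_s = 2583, 1622
scan_old_gb, scan_new_gb = 268, 177


def reduction(old: float, new: float) -> float:
    """Fractional reduction going from old to new."""
    return (old - new) / old


cpu_cut = reduction(cpu_old_s, cpu_new_s)        # about 0.372, quoted as -37%
scan_cut = reduction(scan_old_gb, scan_new_gb)   # about 0.340, quoted as -34%

# Top-4 chains: 81.3 CPU-hrs/day, stated as 73% of the family.
top4_cpu_hrs = 81.3
top4_share = 0.73
family_cpu_hrs = top4_cpu_hrs / top4_share       # about 111.4 CPU-hrs/day


def projected_saving(chain_cpu_hrs_per_day: float, cut: float = cpu_cut) -> float:
    """Hypothetical helper: daily saving if a chain matched the bnb-measured CPU cut."""
    return chain_cpu_hrs_per_day * cut


print(f"CPU cut {cpu_cut:.1%}, scan cut {scan_cut:.1%}, "
      f"family ~{family_cpu_hrs:.1f} CPU-hrs/day")
```

The implied family total of roughly 111 CPU-hrs/day is consistent with the ~110 CPU-hrs/day figure in the PR description. As the caveat says, applying the bnb cut to other chains is only an estimate.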
…worldchain

Same materialization as bnb applied to the next 3 highest-cost chains. Each gets a zeroex_v2_<chain>.settler_txs staging model and a ref() swap so <chain>.traces is scanned once per build instead of ~14x inline.

Per-chain rn determinism verified on prod (7d): unique (block_month, tx_hash, rn) holds with 0 collisions; worldchain/polygon 0 zid-ties, base 5 ties all byte-identical in downstream columns (benign). Estimated saving at the bnb-measured -37% CPU: base ~8.0, polygon ~8.0, worldchain ~3.6 CPU-hrs/day.
…7 chains

Adds the remaining 13 chains (arbitrum, avalanche_c, berachain, blast, ethereum, ink, linea, mantle, mode, monad, optimism, scroll, unichain) to the settler-txs materialization. Each gets a zeroex_v2_<chain>.settler_txs staging model + ref() swap so <chain>.traces is scanned once per build instead of ~14x inline.

Per-chain rn determinism verified on prod (7d) before shipping: unique (block_month, tx_hash, rn) holds with 0 collisions on every chain; only ethereum had zid-ties (2) and all were byte-identical in downstream columns (benign). blast has no settler activity in-window (empty staging, harmless).
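One way to picture the zid-tie check described in these commits is the sketch below. It assumes a tie means two or more rows sharing the same (tx_hash, zid), and that "benign" means those rows agree on every downstream column. The table name `settler_rows`, the single `payload` column standing in for the downstream columns, and the sample rows are all hypothetical, and SQLite is used here only so the example runs locally; the actual checks ran on prod.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the staging rows; payload represents
    -- the downstream columns that must agree for a tie to be benign.
    CREATE TABLE settler_rows (tx_hash TEXT, zid TEXT, payload TEXT);
    INSERT INTO settler_rows VALUES
        ('0xaa', 'z1', 'p1'),
        ('0xaa', 'z2', 'p2'),
        ('0xbb', 'z1', 'p3'),
        ('0xbb', 'z1', 'p3'),
        ('0xcc', 'z5', 'p4');
""")

# A tie: several rows share the ordering key (zid) within one tx_hash,
# so ROW_NUMBER ordered by zid could number them in either order.
TIES_SQL = """
    SELECT tx_hash, zid,
           COUNT(*) AS n_rows,
           COUNT(DISTINCT payload) AS n_payloads
    FROM settler_rows
    GROUP BY tx_hash, zid
    HAVING COUNT(*) > 1
"""
ties = conn.execute(TIES_SQL).fetchall()

# A tie is benign when all tied rows carry the same payload.
non_benign = [t for t in ties if t[3] > 1]

print("ties:", ties)
print("non-benign ties:", non_benign)
```

With this sample data there is one tie (on `0xbb`) and it is benign, which mirrors the situation reported for base and ethereum: the row numbering among tied rows is arbitrary, but the choice cannot change the output.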
The new settler_txs staging models build as state:new in CI (full refresh), and the 17 modified trades models also full-refresh on the initial CI run - each scanning traces/logs/prices from start_date '2024-07-15'. Building all 17 chains at once blew the 1h30m CI budget. Floor start_date to the last 14 days when target.name == 'ci' inside the three shared macros (zeroex_settler_txs_cte, zeroex_v2_trades, zeroex_v2_trades_detail), which govern every full-refresh date filter. Production (target dunesql) is unchanged.
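The CI-only date floor described above lives in the dbt macros; the Python sketch below only illustrates the rule under stated assumptions. The function name `effective_start_date`, the constant `CI_LOOKBACK_DAYS`, and passing `today` explicitly are choices made for this example, not the macro's actual interface.

```python
from datetime import date, timedelta

CI_LOOKBACK_DAYS = 14  # window described in the commit message


def effective_start_date(target_name: str, start_date: date, today: date) -> date:
    """Return the start date a full refresh would use for the given target.

    Assumption: only the 'ci' target is floored; any other target
    (for example 'dunesql') keeps the model's configured start_date.
    """
    if target_name != "ci":
        return start_date
    floor = today - timedelta(days=CI_LOOKBACK_DAYS)
    # Never move the start earlier than the configured start_date.
    return max(start_date, floor)


configured = date(2024, 7, 15)
print(effective_start_date("ci", configured, date(2025, 1, 1)))
print(effective_start_date("dunesql", configured, date(2025, 1, 1)))
```

Taking the later of the two dates keeps CI builds bounded to the last 14 days without ever widening the window for a model whose configured start date is more recent than that.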

Materializes the 0x Settler transaction scan that the zeroex_v2_<chain>_trades models build inline, across all 17 chains (arbitrum, avalanche_c, base, berachain, blast, bnb, ethereum, ink, linea, mantle, mode, monad, optimism, polygon, scroll, unichain, worldchain).

The settler filter searches transaction calldata (varbinary_position(input, <selector>)), which is non-pushable, so the zeroex_tx CTE (referenced four times in the zeroex_v2_trades macro and re-expanded through tbl_all_logs) was inlined by Trino into ~14 full scans of <chain>.traces on every build. This was the single biggest cost driver on the spellbook-dex cluster (~110 CPU-hrs/day, ~24% of the cluster).

The fix extracts zeroex_settler_txs_cte into a materialized incremental staging model (zeroex_v2_<chain>.settler_txs, partitioned by block_month) and reads it via ref() in each trades model, so traces is scanned once per run instead of ~14x. No macro logic changes; the trades models just source the pre-computed CTE.

Measured A/B (warm runs, prod, fixed 1-day window). NEW total includes the staging build cost:
The IO cut varies with how traces-dominated each chain is (bnb scans the most traces); the CPU cut is consistently 37-52% and the per-task peak memory generally drops.
<chain>.traces goes from ~14 inline scans to 1 staging scan.

Equivalence proven per chain:
- (old EXCEPT new) = (new EXCEPT old) = 0 over a fixed window.
- checksum() over all 29 output columns.
- settler_txs.rn (ROW_NUMBER over tx_hash ordered by zid) is deterministic. Determinism checks on prod (7-day window) confirm the staging unique key (block_month, tx_hash, rn) holds with 0 collisions on all 17 chains; the only zid ties found (base 5, ethereum 2; all others 0) are byte-identical in every downstream column, so which row gets rn=1 is irrelevant. (If a tie were ever non-benign, the current 4x-inlined model would already be nondeterministic, since base_filtered_logs filters rn = 1; materializing freezes one consistent assignment.)
- unique_combination_of_columns test passes on real data for all chains. (blast has no settler activity in-window → empty staging, harmless.)

Estimated family saving ~30-40 CPU-hrs/day; the three measured chains alone (bnb+base+polygon) save ~30 CPU-hrs/day, and the measured CPU reductions exceed the original -37% estimate.
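The symmetric EXCEPT check listed above can be sketched locally as follows. This is a minimal illustration, not the queries run on prod: the table names `trades_old` and `trades_new`, their three columns, and the sample rows are hypothetical, and SQLite stands in for Trino purely so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical outputs of the old (inline) and new (staged) pipelines.
    CREATE TABLE trades_old (tx_hash TEXT, evt_index INTEGER, amount_usd REAL);
    CREATE TABLE trades_new (tx_hash TEXT, evt_index INTEGER, amount_usd REAL);
    INSERT INTO trades_old VALUES ('0xaa', 1, 10.0), ('0xbb', 2, 25.5);
    INSERT INTO trades_new VALUES ('0xbb', 2, 25.5), ('0xaa', 1, 10.0);
""")


def except_count(conn: sqlite3.Connection, left: str, right: str) -> int:
    """Count distinct rows present in `left` but absent from `right`."""
    sql = (
        f"SELECT COUNT(*) FROM "
        f"(SELECT * FROM {left} EXCEPT SELECT * FROM {right})"
    )
    return conn.execute(sql).fetchone()[0]


old_minus_new = except_count(conn, "trades_old", "trades_new")
new_minus_old = except_count(conn, "trades_new", "trades_old")

# Both directions must be empty for the two outputs to match as sets.
equivalent = old_minus_new == 0 and new_minus_old == 0
print(old_minus_new, new_minus_old, equivalent)
```

EXCEPT compares distinct rows, so on its own it does not catch a row that appears a different number of times on each side. The uniqueness test on the staging key and the checksum comparison listed above address different aspects of the same question.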
The related zeroex_*_api_fills family (same zeroex_tx-from-traces pattern, referenced ~11x) is a separate follow-up.

Fixes CUR2-2706