charts: add R8 analyzers to benchmark/charts/ #72
Three standalone Python scripts that consume per-system bench output at /tmp/bench_r8_full/<sys>/ and produce the Solarized-Dark chart set used in the R8 review post (issue #77).

- r8_analyze.py: 6-panel overlay across 7 systems (throughput, bloat, CPU, NVMe write, true backlog, delivery-lag p99). LINEAR y-axes everywhere; p99 lag clipped at 5s (no log scale). Backlog column is producer_total - consumer_total, not the n_live_tup snapshot.
- r8_ash_analyze.py: per-system stacked area of ASH wait-event categories (CPU* / IO / LWLock / Lock / Client / IPC / Activity / Other) over 2h, in 1-minute buckets, on a LINEAR 0-1.0 proportion axis.
- r8_pgfr_analyze.py: 4-column x 7-row pgfr deep-dive. Col 1: top-5 queries by cumulative total_exec_time with actual truncated query text (DO blocks unwrapped to the first PERFORM/SELECT/UPDATE/DELETE/INSERT statement — no more opaque q1/q2/q3 labels). Col 2: per-query buffer hit rate. Col 3: per-query wal_bytes. Col 4: global WAL rate in MiB/s plus active backends on a twin axis. Falls back to pgss.csv for systems without pgfr installed.

Styling (Solarized Dark rcParams block, phase bands, legend placement) inherits from benchmark/charts/r6_smoke_chart.py in PR #66.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
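The DO-block unwrapping described above could be sketched roughly as follows. This is a hedged illustration, not the committed code: the helper name `unwrap_do_block` and the exact regex are assumptions.

```python
import re

def unwrap_do_block(query: str) -> str:
    """Sketch of the labelling trick described above: if a query is a
    DO $$ ... $$ wrapper, label it by its first PERFORM/SELECT/UPDATE/
    DELETE/INSERT statement instead of an opaque q1/q2/q3 name."""
    if not re.match(r"\s*DO\b", query, re.IGNORECASE):
        return query  # not a DO block; keep the query text as-is
    m = re.search(r"\b(PERFORM|SELECT|UPDATE|DELETE|INSERT)\b[^;]*",
                  query, re.IGNORECASE | re.DOTALL)
    return m.group(0).strip() if m else query
```

The returned statement would then be truncated for display in the Col 1 labels.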
Previous proportion-based (0..1.0) rendering obscured the actual workload difference. User feedback: standard ASH views plot the count of active sessions, with each wait-event category as a stack layer whose height is the number of backends sampled in that category for the bucket.

Change bucket_stack() to return the mean count per bucket (rows per bucket divided by distinct sample timestamps), and set the y-limit per subplot to max(total) + 1 with integer ticks. Linear scale; no normalization.

Effect: pgque/pgq visibly jump from ~1 to ~2 active backends during the TX phase (the held-xmin session joins, sitting on ClientRead); DELETE-based systems sit at ~4-5 (their -c 4 consumers plus the producer) and climb to ~6 during TX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
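A minimal sketch of the count-based bucketing described in this commit, assuming the ASH samples arrive as a pandas frame with `sample_ts` (timestamp) and `category` columns; those column names are assumptions, not necessarily the script's actual schema.

```python
import pandas as pd

def bucket_stack(ash: pd.DataFrame, freq: str = "1min") -> pd.DataFrame:
    """Mean active-session count per wait-event category per time bucket:
    rows for the category in the bucket, divided by the number of distinct
    ASH sample timestamps in the bucket. No 0-1 normalization."""
    ash = ash.copy()
    ash["bucket"] = ash["sample_ts"].dt.floor(freq)
    samples = ash.groupby("bucket")["sample_ts"].nunique()   # polls per bucket
    counts = (ash.groupby(["bucket", "category"]).size()
                 .unstack(fill_value=0))                     # rows per category
    return counts.div(samples, axis=0)
```

Per the change above, the subplot y-limit would then be `stack.sum(axis=1).max() + 1` with integer ticks.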
e0abe13 to b9a2a36
REV review — 5 perspectives (SOC2 skipped)

Anti-leak (HEADLINE) — CLEAN

Engineer's claim of 7 hits removed verified. Ran the full scrub regex against both committed sources and the diff:
Security — CLEAN
Bug hunter — minor findings
Test analyzer — n/a (acceptable)

No automated tests on the analyzers, no fixture inputs, no smoke command in README. Consistent with Guidelines compliance.
Docs — gap
CI — 8/9 checks green (

Confidence + classification
Classification: LGTM with minor nits, none blocking. Strongest single ask: fix the binary-unit issue in

Posting review only — not approving, not merging, per scope.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment-scrub sweep applied per CLAUDE.md style: no scrub needed — all comments in the new chart scripts describe algorithmic choices or input/output format, none are bug-narrative.
Summary
Commits the R8 chart-analysis scripts (main + ASH + pgfr) into `benchmark/charts/` so future bench rounds can reuse them instead of writing fresh scripts every time. Per user feedback on issue #77: "why every time graphs are new? why don't you use same scripts? they are in git".

Three standalone scripts consume per-system bench output at `/tmp/bench_r8_full/<sys>/` and produce the Solarized-Dark chart set used in the R8 review post:

- `r8_analyze.py` — 6-panel overlay across 7 systems (throughput, bloat, CPU, NVMe write, true backlog, delivery-lag p99). LINEAR y-axes everywhere; p99 lag clipped at 5s (no log scale). Backlog column is `producer_total - consumer_total` from the pgbench + NOTICE parser, not the old `n_live_tup` snapshot, which mislabelled rotation-owned live rows as backlog.
- `r8_ash_analyze.py` — per-system stacked area of ASH wait-event categories (CPU* / IO / LWLock / Lock / Client / IPC / Activity / Other) over 2h, in 1-minute buckets, on a LINEAR 0-1.0 proportion axis. Fixes the timestamp parse bug (`+00` vs `+00:00`) that made all systems render "no ash.csv" in the previous round.
- `r8_pgfr_analyze.py` — 4-column × 7-row pgfr deep-dive:
  - Col 1: top-5 queries by cumulative `total_exec_time` with actual truncated query text (DO blocks unwrapped to the first `PERFORM|SELECT|UPDATE|DELETE|INSERT` statement — no more opaque `q1`/`q2`/`q3` labels).
  - Col 2: per-query buffer hit rate, `shared_blks_hit / (hit + read)`; green ≥99%, yellow ≥95%, red below.
  - Col 3: per-query `wal_bytes` — WAL amplification per query class.
  - Col 4: global WAL rate in MiB/s (from `pgfr_snapshots.wal_bytes`) plus an active-backends count on a twin axis.
  - Falls back to `pgss.csv` (point-in-time dump) for systems where `pgfr_record` isn't installed.

Styling (Solarized Dark rcParams block, phase bands, legend placement) inherits from `benchmark/charts/r6_smoke_chart.py` in PR #66.

Test plan
- `python3 benchmark/charts/r8_analyze.py` — produces `/tmp/r8_main_chart.png` + summary table; all 7 systems render; no log axes.
- `python3 benchmark/charts/r8_ash_analyze.py` — produces `/tmp/r8_ash_chart.png`; per-system samples counted correctly (11k-35k per system).
- `python3 benchmark/charts/r8_pgfr_analyze.py` — produces `/tmp/r8_pgfr_chart.png`; falls back to `pgss.csv` where pgfr is not installed (pgq/pgmq/river in R8).
- No `set_yscale('log')` or `symlog` anywhere — user's hard rule.

Draft — staged for next bench round to confirm the scripts run on the R9 dataset before merge.
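The `+00` vs `+00:00` timestamp fix mentioned in the summary might look like the sketch below. The helper name `parse_pg_ts` is hypothetical, and Python 3.11+ already accepts the bare `+00` offset; the padding matters on older interpreters.

```python
import re
from datetime import datetime

def parse_pg_ts(raw: str) -> datetime:
    """Postgres prints UTC timestamptz values with a bare '+00' offset,
    which datetime.fromisoformat() rejects before Python 3.11.
    Pad a trailing two-digit offset to the '+00:00' form first."""
    raw = re.sub(r"([+-]\d{2})$", r"\1:00", raw.strip())
    return datetime.fromisoformat(raw)
```

With the unpatched parse, every row raised and the script treated the file as missing, which is one plausible way all systems ended up rendering "no ash.csv".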
🤖 Generated with Claude Code