Skip to content

charts: add R8 analyzers to benchmark/charts/#72

Merged
NikolayS merged 4 commits intomainfrom
charts-r7-r8-analyzers
Apr 30, 2026
Merged

charts: add R8 analyzers to benchmark/charts/#72
NikolayS merged 4 commits intomainfrom
charts-r7-r8-analyzers

Conversation

@NikolayS
Copy link
Copy Markdown
Owner

Summary

Commits the R8 chart-analysis scripts (main + ASH + pgfr) into benchmark/charts/ so future bench rounds can reuse them instead of writing fresh scripts every time. Per user feedback on issue #77: "why every time graphs are new? why don't you use same scripts? they are in git".

Three standalone scripts consume per-system bench output at /tmp/bench_r8_full/<sys>/ and produce the Solarized-Dark chart set used in the R8 review post:

  • r8_analyze.py — 6-panel overlay across 7 systems (throughput, bloat, CPU, NVMe write, true backlog, delivery-lag p99). LINEAR y-axes everywhere; p99 lag clipped at 5s (no log scale). Backlog column is producer_total - consumer_total from pgbench + NOTICE parser, not the old n_live_tup snapshot which mislabelled rotation-owned live rows as backlog.
  • r8_ash_analyze.py — per-system stacked-area of ASH wait-event categories (CPU* / IO / LWLock / Lock / Client / IPC / Activity / Other) over 2h, 1-minute buckets, LINEAR 0-1.0 proportion. Fixes the timestamp parse bug (+00 vs +00:00) that made all systems render "no ash.csv" in the previous round.
  • r8_pgfr_analyze.py — 4-column × 7-row pgfr deep-dive:
    • Col 1: top-5 queries by cumulative total_exec_time with actual truncated query text (DO blocks unwrapped to first PERFORM|SELECT|UPDATE|DELETE|INSERT statement — no more opaque q1/q2/q3 labels).
    • Col 2: per-query buffer hit rate = shared_blks_hit / (hit + read); green ≥99%, yellow ≥95%, red below.
    • Col 3: per-query wal_bytes — WAL amplification per query class.
    • Col 4: global WAL rate MiB/s (from pgfr_snapshots.wal_bytes) + active-backends count twin-axis.
    • Falls back to pgss.csv (point-in-time dump) for systems where pgfr_record isn't installed.

Styling (Solarized Dark rcParams block, phase bands, legend placement) inherits from benchmark/charts/r6_smoke_chart.py in PR #66.

Test plan

  • python3 benchmark/charts/r8_analyze.py — produces /tmp/r8_main_chart.png + summary/table; 7 systems all render; no log axes.
  • python3 benchmark/charts/r8_ash_analyze.py — produces /tmp/r8_ash_chart.png; per-system samples counted correctly (11k-35k per system).
  • python3 benchmark/charts/r8_pgfr_analyze.py — produces /tmp/r8_pgfr_chart.png; fallback to pgss.csv where pgfr not installed (pgq/pgmq/river in R8).
  • Charts posted to gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks#77 note 3264187736 (reviewer-visible).
  • No set_yscale('log') or symlog anywhere — user's hard rule.

Draft — staged for next bench round to confirm scripts run on R9 dataset before merge.

🤖 Generated with Claude Code

Nik Samokhvalov and others added 3 commits April 30, 2026 03:23
Three standalone Python scripts that consume per-system bench output at
/tmp/bench_r8_full/<sys>/ and produce the Solarized-Dark chart set used
in the R8 review post (issue #77).

- r8_analyze.py: 6-panel overlay across 7 systems (throughput, bloat, CPU,
  NVMe write, true backlog, delivery-lag p99). LINEAR y-axes everywhere;
  p99 lag clipped at 5s (no log scale). Backlog column is
  producer_total - consumer_total, not n_live_tup snapshot.
- r8_ash_analyze.py: per-system stacked-area of ASH wait-event categories
  (CPU* / IO / LWLock / Lock / Client / IPC / Activity / Other) over 2h,
  1-minute buckets, LINEAR 0-1.0 proportion.
- r8_pgfr_analyze.py: 4-column-x-7-row pgfr deep-dive. Col 1 top-5 queries
  by cumulative total_exec_time with actual truncated query text (DO
  blocks unwrapped to first PERFORM/SELECT/UPDATE/DELETE/INSERT
  statement — no more opaque q1/q2/q3 labels). Col 2 per-query buffer hit
  rate. Col 3 per-query wal_bytes. Col 4 global WAL rate MiB/s + active
  backends twin-axis. Falls back to pgss.csv for systems without pgfr
  installed.

Styling (Solarized Dark rcParams block, phase bands, legend placement)
inherits from benchmark/charts/r6_smoke_chart.py in PR #66.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous proportion-based (0..1.0) rendering obscured the actual workload
difference. User feedback: standard ASH views plot the count of active
sessions, with each wait-event category as a stack layer whose height =
number of backends sampled in that category for the bucket.

Change bucket_stack() to return mean count per bucket (rows per bucket
divided by distinct-sample-timestamps), and set y-limit per subplot to
max(total) + 1 with integer ticks. Linear scale; no normalization.

Effect: pgque/pgq visibly jump from ~1 to ~2 active backends during the
TX phase (the held-xmin session joins, sitting on ClientRead); DELETE-
based systems sit at ~4-5 (their -c 4 consumers plus the producer) and
climb to ~6 during TX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NikolayS NikolayS force-pushed the charts-r7-r8-analyzers branch from e0abe13 to b9a2a36 Compare April 30, 2026 10:25
@NikolayS NikolayS marked this pull request as ready for review April 30, 2026 10:25
@NikolayS
Copy link
Copy Markdown
Owner Author

REV review — 5 perspectives (SOC2 skipped)

Anti-leak (HEADLINE) — CLEAN

Engineer's claim of 7 hits removed verified. Ran the full scrub regex against both committed sources and the diff:

  • gitlab.com, sahmed, artifact[_-]registry, @AR, wi[ -]?#?7[67], round[ -][4-9], \bR[4-9]\b, hetzner, @postgres-ai, nik-... — zero hits in benchmark/charts/r8_*.py
  • The only r[4-9] matches are: file/module-name self-reference (r8_analyze.py in module docstring), and the English word "Round/rounded" used for numeric rounding. Both fine.
  • User-facing strings (fig.suptitle, set_title, print(...)) audited: chart captions read "7 Postgres queue systems · 2h (30m clean + 60m held-xmin + 30m recovery) · R=2000/s", "ASH active sessions...", "pgfr deep dive..." — no round labels, no internal IDs.
  • Filenames r8_*.py retained as path structure: confirmed consistent with the r5_analyze.py / r6_smoke_chart.py pattern already on main. This convention is established prior art, not a new leak.
  • PR description still says /tmp/bench_r8_full/<sys>/ but the committed sources scrubbed this to /tmp/bench_full/<sys>/. Fine — PR body isn't shipped.

Security — CLEAN

  • No subprocess, os.system, popen, or shell=True anywhere.
  • No credentials, tokens, env-var reads.
  • File access limited to fixed /tmp/bench_full/... reads + /tmp/bench_* writes. Path inputs are CSV column values used only as dict keys — no unsafe-deserialization, no eval, no pickle.

Bug hunter — minor findings

  1. Unused imports in all three scripts. import sys is never used in r8_analyze.py, r8_ash_analyze.py, r8_pgfr_analyze.py. Also import numpy as np is unused in r8_analyze.py (numpy is genuinely used in the other two). Lint nit.

  2. Docstring filename mismatches. Module docstring on line 2 of r8_ash_analyze.py says """ash_analyze.py — ... and r8_pgfr_analyze.py says """pgfr_analyze.py — .... Files are named r8_ash_analyze.py / r8_pgfr_analyze.py. Cosmetic but visible if anyone runs python3 -c "import r8_ash_analyze; help(r8_ash_analyze)".

  3. Hardcoded paths, no CLI override. All three scripts hardcode base = Path("/tmp/bench_full") and write to fixed /tmp/bench_*_chart.png outputs. No argparse. This matches r5_analyze.py / r6_smoke_chart.py precedent on main, so it's consistent with the existing convention rather than a regression — but a follow-up that accepts --input-dir / --output-dir would make these reusable across rounds without editing source.

  4. Decimal/binary unit mix. fmt_bytes() in r8_pgfr_analyze.py:247 divides by 1e9 / 1e6 (decimal SI) but labels axes as bare T/G/M/k, applied to a WAL bytes axis. CLAUDE.md mandates binary units (KiB/MiB/GiB/TiB) for data sizes. Same issue, smaller surface, in the three print(f"... {size/1024:.0f} KB)") lines (r8_analyze.py:312, r8_ash_analyze.py:257, r8_pgfr_analyze.py:417) — divides by 1024 (binary) but labels KB (decimal). Should be KiB. Recommend dividing by 1024 throughout fmt_bytes and labelling KiB/MiB/GiB/TiB for full CLAUDE.md compliance. The chart-axis labels for WAL throughput (MiB/s) are already correct.

  5. Duplicated find_t0 / find_bench_t0 helper across r8_ash_analyze.py and r8_pgfr_analyze.py. Same regex (^(\d{10})\s), same producer_agg.* glob, same return semantics. Would naturally fold into a small benchmark/charts/_common.py if/when a third caller arrives — leaving as-is is acceptable for two analyzers.

  6. ASH t0 fallback semantics (r8_ash_analyze.py:108-125). When find_t0() returns None (no producer_agg.* log), the first ash.csv row's epoch becomes t0. If the ash sampler started before pgbench (typical), the ASH series will be offset by the sampler warmup, mislabeling the TX phase band. Not a regression — but worth a one-line warn to stderr when this fallback fires.

  7. Backlog approximation in r8_analyze.py:226-235. prod_rate = 2000.0 is hardcoded, comment says "target rate". If a future round changes -R, this silently mismodels the producer curve. Consider sourcing from producer.log (pgbench prints scale + rate) or making it a module-level RATE_HZ constant near TX_START_MIN for visibility.

  8. np.array([]) vs pts.size (r8_pgfr_analyze.py:200). if len(pts) < 2: return np.array([]), np.array([]) — fine, but later checks use xs_wal.size which on np.array([]) is 0. Consistent and correct, just noting.

Test analyzer — n/a (acceptable)

No automated tests on the analyzers, no fixture inputs, no smoke command in README. Consistent with r5/r6 precedent. Engineer ran each script locally per Test plan checklist; outputs landed in the (private) bench WI. Recommend, as a v0.2 follow-up, a tiny benchmark/charts/fixtures/ with a 10-line synthetic events_consumed_per_sec.csv and a smoke command (python3 r8_analyze.py against the fixture) so CI can at least compile + run them without crashing.

Guidelines compliance

  • No shell scripts in this PR — set -Eeuo pipefail rule n/a.
  • Conventional Commits: 3 commits, charts: and chore(charts): prefixes — both fine (slight nit: charts: is project-defined, not stock cc; matches existing repo usage so OK).
  • Binary units: see Bug hunter Assemble pgque-install.sql and pgque-uninstall.sql with idempotency #4. Axis labels good; print-line + fmt_bytes off.
  • No SQL in this PR — schema-qualification rule n/a.
  • Python style: PEP 8 mostly fine, semicolon-chained statements (BG = "..."; SURF = "...") are unusual but match r6_smoke_chart.py precedent.

Docs — gap

benchmark/README.md has a charts/ block listing r5_analyze.py and r6_smoke_chart.py. This PR adds three new analyzers but does NOT update that block. One-line additions for each new script (matching the 80-col comment format already there) would close the gap. No benchmark/charts/README.md exists — not blocking, but the engineer's own observation that "future bench rounds can reuse them instead of writing fresh scripts every time" only holds if someone can FIND them.

CI

8/9 checks green (test 14/15/16/17/18, verify, plus secondary tooling). claude-review and client-smoke pending — orthogonal to this PR's content (chart scripts don't touch core SQL or clients).

Confidence + classification

  • Anti-leak: clean (high confidence — full regex sweep, user-facing strings audited).
  • Security: clean (high confidence — no shell, no eval, no network).
  • Bug risk: low (analysis-only scripts, fail-graceful on missing CSVs).
  • CLAUDE.md compliance: minor binary-unit nit (Assemble pgque-install.sql and pgque-uninstall.sql with idempotency #4 above).
  • Test/doc gap: real but precedent-consistent.

Classification: LGTM with minor nits, none blocking. Strongest single ask: fix the binary-unit issue in fmt_bytes + the three print lines (#4) — that's a CLAUDE.md hard rule, easy fix. Everything else is follow-up material.

Posting review only — not approving, not merging, per scope.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NikolayS
Copy link
Copy Markdown
Owner Author

Comment-scrub sweep applied per CLAUDE.md style: no scrub needed — all comments in the new chart scripts describe algorithmic choices or input/output format, none are bug-narrative.

@NikolayS NikolayS merged commit d4d9946 into main Apr 30, 2026
8 checks passed
@NikolayS NikolayS deleted the charts-r7-r8-analyzers branch April 30, 2026 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant