charts: add R8 analyzers to benchmark/charts/ by NikolayS · Pull Request #72 · NikolayS/pgque

NikolayS · 2026-04-19T03:23:14Z

Summary

Commits the R8 chart-analysis scripts (main + ASH + pgfr) into benchmark/charts/ so future bench rounds can reuse them instead of writing fresh scripts every time. Per user feedback on issue #77: "why every time graphs are new? why don't you use same scripts? they are in git".

Three standalone scripts consume per-system bench output at /tmp/bench_r8_full/<sys>/ and produce the Solarized-Dark chart set used in the R8 review post:

r8_analyze.py — 6-panel overlay across 7 systems (throughput, bloat, CPU, NVMe write, true backlog, delivery-lag p99). LINEAR y-axes everywhere; p99 lag clipped at 5s (no log scale). Backlog column is producer_total - consumer_total from pgbench + NOTICE parser, not the old n_live_tup snapshot which mislabelled rotation-owned live rows as backlog.
r8_ash_analyze.py — per-system stacked-area of ASH wait-event categories (CPU* / IO / LWLock / Lock / Client / IPC / Activity / Other) over 2h, 1-minute buckets, LINEAR 0-1.0 proportion. Fixes the timestamp parse bug (+00 vs +00:00) that made all systems render "no ash.csv" in the previous round.
r8_pgfr_analyze.py — 4-column × 7-row pgfr deep-dive:
- Col 1: top-5 queries by cumulative total_exec_time with actual truncated query text (DO blocks unwrapped to first PERFORM|SELECT|UPDATE|DELETE|INSERT statement — no more opaque q1/q2/q3 labels).
- Col 2: per-query buffer hit rate = shared_blks_hit / (hit + read); green ≥99%, yellow ≥95%, red below.
- Col 3: per-query wal_bytes — WAL amplification per query class.
- Col 4: global WAL rate MiB/s (from pgfr_snapshots.wal_bytes) + active-backends count twin-axis.
- Falls back to pgss.csv (point-in-time dump) for systems where pgfr_record isn't installed.

Styling (Solarized Dark rcParams block, phase bands, legend placement) inherits from benchmark/charts/r6_smoke_chart.py in PR #66.

Test plan

python3 benchmark/charts/r8_analyze.py — produces /tmp/r8_main_chart.png + summary/table; 7 systems all render; no log axes.
python3 benchmark/charts/r8_ash_analyze.py — produces /tmp/r8_ash_chart.png; per-system samples counted correctly (11k-35k per system).
python3 benchmark/charts/r8_pgfr_analyze.py — produces /tmp/r8_pgfr_chart.png; fallback to pgss.csv where pgfr not installed (pgq/pgmq/river in R8).
Charts posted to gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks#77 note 3264187736 (reviewer-visible).
No set_yscale('log') or symlog anywhere — user's hard rule.

Draft — staged for next bench round to confirm scripts run on R9 dataset before merge.

🤖 Generated with Claude Code

Three standalone Python scripts that consume per-system bench output at /tmp/bench_r8_full/<sys>/ and produce the Solarized-Dark chart set used in the R8 review post (issue #77). - r8_analyze.py: 6-panel overlay across 7 systems (throughput, bloat, CPU, NVMe write, true backlog, delivery-lag p99). LINEAR y-axes everywhere; p99 lag clipped at 5s (no log scale). Backlog column is producer_total - consumer_total, not n_live_tup snapshot. - r8_ash_analyze.py: per-system stacked-area of ASH wait-event categories (CPU* / IO / LWLock / Lock / Client / IPC / Activity / Other) over 2h, 1-minute buckets, LINEAR 0-1.0 proportion. - r8_pgfr_analyze.py: 4-column-x-7-row pgfr deep-dive. Col 1 top-5 queries by cumulative total_exec_time with actual truncated query text (DO blocks unwrapped to first PERFORM/SELECT/UPDATE/DELETE/INSERT statement — no more opaque q1/q2/q3 labels). Col 2 per-query buffer hit rate. Col 3 per-query wal_bytes. Col 4 global WAL rate MiB/s + active backends twin-axis. Falls back to pgss.csv for systems without pgfr installed. Styling (Solarized Dark rcParams block, phase bands, legend placement) inherits from benchmark/charts/r6_smoke_chart.py in PR #66. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previous proportion-based (0..1.0) rendering obscured the actual workload difference. User feedback: standard ASH views plot the count of active sessions, with each wait-event category as a stack layer whose height = number of backends sampled in that category for the bucket. Change bucket_stack() to return mean count per bucket (rows per bucket divided by distinct-sample-timestamps), and set y-limit per subplot to max(total) + 1 with integer ticks. Linear scale; no normalization. Effect: pgque/pgq visibly jump from ~1 to ~2 active backends during the TX phase (the held-xmin session joins, sitting on ClientRead); DELETE- based systems sit at ~4-5 (their -c 4 consumers plus the producer) and climb to ~6 during TX. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

NikolayS · 2026-04-30T10:30:05Z

REV review — 5 perspectives (SOC2 skipped)

Anti-leak (HEADLINE) — CLEAN

Engineer's claim of 7 hits removed verified. Ran the full scrub regex against both committed sources and the diff:

gitlab.com, sahmed, artifact[_-]registry, @AR, wi[ -]?#?7[67], round[ -][4-9], \bR[4-9]\b, hetzner, @postgres-ai, nik-... — zero hits in benchmark/charts/r8_*.py
The only r[4-9] matches are: file/module-name self-reference (r8_analyze.py in module docstring), and the English word "Round/rounded" used for numeric rounding. Both fine.
User-facing strings (fig.suptitle, set_title, print(...)) audited: chart captions read "7 Postgres queue systems · 2h (30m clean + 60m held-xmin + 30m recovery) · R=2000/s", "ASH active sessions...", "pgfr deep dive..." — no round labels, no internal IDs.
Filenames r8_*.py retained as path structure: confirmed consistent with the r5_analyze.py / r6_smoke_chart.py pattern already on main. This convention is established prior art, not a new leak.
PR description still says /tmp/bench_r8_full/<sys>/ but the committed sources scrubbed this to /tmp/bench_full/<sys>/. Fine — PR body isn't shipped.

Security — CLEAN

No subprocess, os.system, popen, or shell=True anywhere.
No credentials, tokens, env-var reads.
File access limited to fixed /tmp/bench_full/... reads + /tmp/bench_* writes. Path inputs are CSV column values used only as dict keys — no unsafe-deserialization, no eval, no pickle.

Bug hunter — minor findings

Unused imports in all three scripts. import sys is never used in r8_analyze.py, r8_ash_analyze.py, r8_pgfr_analyze.py. Also import numpy as np is unused in r8_analyze.py (numpy is genuinely used in the other two). Lint nit.
Docstring filename mismatches. Module docstring on line 2 of r8_ash_analyze.py says """ash_analyze.py — ... and r8_pgfr_analyze.py says """pgfr_analyze.py — .... Files are named r8_ash_analyze.py / r8_pgfr_analyze.py. Cosmetic but visible if anyone runs python3 -c "import r8_ash_analyze; help(r8_ash_analyze)".
Hardcoded paths, no CLI override. All three scripts hardcode base = Path("/tmp/bench_full") and write to fixed /tmp/bench_*_chart.png outputs. No argparse. This matches r5_analyze.py / r6_smoke_chart.py precedent on main, so it's consistent with the existing convention rather than a regression — but a follow-up that accepts --input-dir / --output-dir would make these reusable across rounds without editing source.
Decimal/binary unit mix. fmt_bytes() in r8_pgfr_analyze.py:247 divides by 1e9 / 1e6 (decimal SI) but labels axes as bare T/G/M/k, applied to a WAL bytes axis. CLAUDE.md mandates binary units (KiB/MiB/GiB/TiB) for data sizes. Same issue, smaller surface, in the three print(f"... {size/1024:.0f} KB)") lines (r8_analyze.py:312, r8_ash_analyze.py:257, r8_pgfr_analyze.py:417) — divides by 1024 (binary) but labels KB (decimal). Should be KiB. Recommend dividing by 1024 throughout fmt_bytes and labelling KiB/MiB/GiB/TiB for full CLAUDE.md compliance. The chart-axis labels for WAL throughput (MiB/s) are already correct.
Duplicated find_t0 / find_bench_t0 helper across r8_ash_analyze.py and r8_pgfr_analyze.py. Same regex (^(\d{10})\s), same producer_agg.* glob, same return semantics. Would naturally fold into a small benchmark/charts/_common.py if/when a third caller arrives — leaving as-is is acceptable for two analyzers.
ASH t0 fallback semantics (r8_ash_analyze.py:108-125). When find_t0() returns None (no producer_agg.* log), the first ash.csv row's epoch becomes t0. If the ash sampler started before pgbench (typical), the ASH series will be offset by the sampler warmup, mislabeling the TX phase band. Not a regression — but worth a one-line warn to stderr when this fallback fires.
Backlog approximation in r8_analyze.py:226-235. prod_rate = 2000.0 is hardcoded, comment says "target rate". If a future round changes -R, this silently mismodels the producer curve. Consider sourcing from producer.log (pgbench prints scale + rate) or making it a module-level RATE_HZ constant near TX_START_MIN for visibility.
np.array([]) vs pts.size (r8_pgfr_analyze.py:200). if len(pts) < 2: return np.array([]), np.array([]) — fine, but later checks use xs_wal.size which on np.array([]) is 0. Consistent and correct, just noting.

Test analyzer — n/a (acceptable)

No automated tests on the analyzers, no fixture inputs, no smoke command in README. Consistent with r5/r6 precedent. Engineer ran each script locally per Test plan checklist; outputs landed in the (private) bench WI. Recommend, as a v0.2 follow-up, a tiny benchmark/charts/fixtures/ with a 10-line synthetic events_consumed_per_sec.csv and a smoke command (python3 r8_analyze.py against the fixture) so CI can at least compile + run them without crashing.

Guidelines compliance

No shell scripts in this PR — set -Eeuo pipefail rule n/a.
Conventional Commits: 3 commits, charts: and chore(charts): prefixes — both fine (slight nit: charts: is project-defined, not stock cc; matches existing repo usage so OK).
Binary units: see Bug hunter Assemble pgque-install.sql and pgque-uninstall.sql with idempotency #4. Axis labels good; print-line + fmt_bytes off.
No SQL in this PR — schema-qualification rule n/a.
Python style: PEP 8 mostly fine, semicolon-chained statements (BG = "..."; SURF = "...") are unusual but match r6_smoke_chart.py precedent.

Docs — gap

benchmark/README.md has a charts/ block listing r5_analyze.py and r6_smoke_chart.py. This PR adds three new analyzers but does NOT update that block. One-line additions for each new script (matching the 80-col comment format already there) would close the gap. No benchmark/charts/README.md exists — not blocking, but the engineer's own observation that "future bench rounds can reuse them instead of writing fresh scripts every time" only holds if someone can FIND them.

CI

8/9 checks green (test 14/15/16/17/18, verify, plus secondary tooling). claude-review and client-smoke pending — orthogonal to this PR's content (chart scripts don't touch core SQL or clients).

Confidence + classification

Anti-leak: clean (high confidence — full regex sweep, user-facing strings audited).
Security: clean (high confidence — no shell, no eval, no network).
Bug risk: low (analysis-only scripts, fail-graceful on missing CSVs).
CLAUDE.md compliance: minor binary-unit nit (Assemble pgque-install.sql and pgque-uninstall.sql with idempotency #4 above).
Test/doc gap: real but precedent-consistent.

Classification: LGTM with minor nits, none blocking. Strongest single ask: fix the binary-unit issue in fmt_bytes + the three print lines (#4) — that's a CLAUDE.md hard rule, easy fix. Everything else is follow-up material.

Posting review only — not approving, not merging, per scope.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

NikolayS · 2026-04-30T10:42:16Z

Comment-scrub sweep applied per CLAUDE.md style: no scrub needed — all comments in the new chart scripts describe algorithmic choices or input/output format, none are bug-narrative.

This was referenced Apr 30, 2026

Sprint v0.2.0 — Drivers, CI, README hero #85

Closed

W7: PR #72 polish — R8 chart analyzers (blocked by W5/PR #66) #93

Closed

Nik Samokhvalov and others added 3 commits April 30, 2026 03:23

chore(charts): scrub round labels from captions

b9a2a36

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

NikolayS force-pushed the charts-r7-r8-analyzers branch from e0abe13 to b9a2a36 Compare April 30, 2026 10:25

NikolayS marked this pull request as ready for review April 30, 2026 10:25

chore(charts): use binary units + update README index

b79bb50

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

NikolayS merged commit d4d9946 into main Apr 30, 2026
8 checks passed

NikolayS deleted the charts-r7-r8-analyzers branch April 30, 2026 14:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

charts: add R8 analyzers to benchmark/charts/#72

charts: add R8 analyzers to benchmark/charts/#72
NikolayS merged 4 commits intomainfrom
charts-r7-r8-analyzers

NikolayS commented Apr 19, 2026

Uh oh!

NikolayS commented Apr 30, 2026

Uh oh!

NikolayS commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NikolayS commented Apr 19, 2026

Summary

Test plan

Uh oh!

NikolayS commented Apr 30, 2026

REV review — 5 perspectives (SOC2 skipped)

Anti-leak (HEADLINE) — CLEAN

Security — CLEAN

Bug hunter — minor findings

Test analyzer — n/a (acceptable)

Guidelines compliance

Docs — gap

CI

Confidence + classification

Uh oh!

NikolayS commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant