docs: bench methodology + tooling under benchmark/ #66
NikolayS
left a comment
PR #66 Security Audit — NikolayS/pgque docs/bench-methodology
Audited commit: c8618af (tip of origin/docs/bench-methodology)
Scope: 39 files / 3,366 LOC under benchmark/
Auditor: automated scan (read-only, no modifications)
Summary
PASS — no secrets found. Ready to mark ready-for-review.
Hard secrets scan
| Pattern | Findings |
|---|---|
| AWS access keys (`AKIA…`) | 0 |
| AWS secret key references (`aws_secret_access_key` / `aws_access_key_id`) | 0 |
| GitHub tokens (`ghp_`, `gho_`, `ghs_`, `github_pat_`) | 0 |
| GitLab tokens (`glpat-…`) | 0 |
| Slack tokens (`xoxb-…`) | 0 |
| Private key headers (`BEGIN OPENSSH/RSA/EC/PRIVATE/DSA`) | 0 |
| PG credential files (`pg_service.conf`, `.pgpass`) | 0 |
| Quoted `password = '...'` | 0 |
| Postgres URIs with embedded password (`postgres://user:pw@…`) | 0 |
| `Bearer …` tokens | 0 |
| Raw 64-hex tokens | 0 |
Soft leakage
| Pattern | Findings | Notes |
|---|---|---|
| Public IPv4 addresses | 0 | Only 127.0.0.1 appears; no RFC1918-external, no public IPs |
| `arn:aws:…` | 0 | |
| 12-digit AWS account IDs | 0 | |
| `~/.ssh/…` or `*.pem` paths | 0 | |
| EC2 public hostnames (`ec2-…compute.amazonaws.com`) | 0 | |
| EC2 instance IDs (`i-…`) | 0 | |
| `PGPASSWORD`, `export *_TOKEN=…`, `curl` Authorization headers | 0 | |
| Inline `ssh user@host` / `scp …` commands | 0 | |
| Base64-looking long strings | 1 | False positive: Go import path `com/riverqueue/river/riverdriver/riverpgxv5` in `install/install_river.sh:34` |
IPv4 references (all loopback, expected)
All 12 IPv4 hits in the tree are 127.0.0.1 in passwordless local DSNs, consistent with the pg_hba trust-for-localhost bootstrap in install/bootstrap.sh:77. These are expected for single-VM benchmarks:
- `tooling/idle_in_tx.py:8`, `tooling/bloat_sampler.py:10`, `tooling/pg_stat_statements_snapshot.py:16`, `tooling/pgq_ticker_daemon.py:9`
- `install/install_river.sh:13,53`, `install/install_pgboss.sh:13,61`
- `install/bootstrap.sh:77`
- `runners/run_r7.sh:55,62,69`
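The loopback-only claim above can be re-checked mechanically. The following is a minimal sketch, not the audit's actual tooling: the function name `classify_ipv4_hits` and the sample string are hypothetical, and the regex is just the usual dotted-quad shape, which over-matches, so each candidate is validated with the standard-library `ipaddress` module.

```python
import ipaddress
import re

# Dotted-quad shape only; it also matches invalid strings such as 999.1.2.3,
# so every candidate is validated with ipaddress below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def classify_ipv4_hits(text):
    """Return (loopback, other) lists of valid IPv4 addresses found in text."""
    loopback, other = [], []
    for candidate in IPV4_RE.findall(text):
        try:
            addr = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # dotted quad that is not a real address
        (loopback if addr.is_loopback else other).append(candidate)
    return loopback, other


if __name__ == "__main__":
    # Hypothetical input; 203.0.113.7 is from a documentation-only range.
    sample = "host=127.0.0.1 port=5432 user=postgres  # peer 203.0.113.7, 999.1.2.3"
    print(classify_ipv4_hits(sample))
```

A tree that matches the audit's finding would yield an empty second list for every scanned file.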
Historical commits
- Commits touching `benchmark/` on this branch: 1 (c8618af)
- Removed-but-still-in-history secrets: none
- Full `git log -p -- 'benchmark/**'` scanned for `AKIA|ghp_|gho_|ghs_|glpat|xoxb|BEGIN (OPENSSH|RSA|EC|PRIVATE|DSA)|aws_secret|bearer`: zero matches
No squash / force-push required.
File-type specific checks
- Shell scripts (`install/*.sh`, `runners/*.sh`, `tooling/*.sh`): no hardcoded credential env vars
- Python (`tooling/*.py`, `charts/*.py`, `gifs/*.py`): all DSNs are passwordless localhost; no `dsn=…password=…` or `psycopg2.connect(…, password=…)`
- Markdown (`README.md`, `METHODOLOGY.md`, `HARDWARE.md`, `OPS_GOTCHAS.md`, `install/README.md`): zero hits on `AWS_`, `aws_access_key_id`, `EC2_`, `S3_BUCKET`
- SQL (`consumers/*.sql`, `producers/*.sql`, `install/pgmq-partitioned_setup_5min.sql`): no credentials
Recommendation
PASS — the branch is safe to mark ready-for-review. No redaction, no force-push needed.
Non-blocking observations (not required for merge):
- DSNs throughout `tooling/*.py` and `install/*.sh` assume `user=postgres` with trust auth on `127.0.0.1`. This is correct for the documented benchmark VM setup but worth a one-line note in `benchmark/README.md` so that copy-pasters don't run these against a prod-like host. Already partially covered by `OPS_GOTCHAS.md`; minor.
- None of the scripts take credentials from env vars (no `PGPASSWORD` fallbacks). If the methodology is ever extended to RDS/remote PG, reviewers will want `os.environ.get("PGPASSWORD")` hooks rather than hardcoded DSN strings. Not a security issue today.
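For the second observation, one possible shape of such a hook is sketched below. This is an illustration only: `build_dsn` is a hypothetical helper, not something in the branch, and the `dbname=postgres` default is an assumption. The localhost and `user=postgres` defaults mirror the trust-auth setup described above.

```python
import os


def _quote(value):
    """Quote a libpq keyword/value setting when it is empty or has special characters."""
    if value and not any(c in value for c in " '\\"):
        return value
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_dsn(env=None):
    """Build a keyword/value DSN; a password is included only if PGPASSWORD is set."""
    env = os.environ if env is None else env
    parts = {
        "host": env.get("PGHOST", "127.0.0.1"),
        "port": env.get("PGPORT", "5432"),
        "user": env.get("PGUSER", "postgres"),
        "dbname": env.get("PGDATABASE", "postgres"),  # assumed default
    }
    password = env.get("PGPASSWORD")
    if password:
        parts["password"] = password
    return " ".join(f"{key}={_quote(val)}" for key, val in parts.items())


if __name__ == "__main__":
    print(build_dsn({}))  # passwordless localhost, as in the benchmark scripts
```

libpq-based clients generally read the `PG*` environment variables on their own, so simply dropping the hardcoded fields from the DSN strings may be enough in practice; the explicit helper mainly makes the fallback order visible.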
Scan methodology: hard-pattern regex (AWS/GitHub/GitLab/Slack tokens, private-key headers, quoted passwords, DSN-embedded credentials, Bearer tokens, 64-hex), soft-pattern regex (public IPv4, ARN, 12-digit account IDs, SSH/PEM paths, EC2 hostnames & instance IDs, base64-looking blobs), env-var extraction, and full branch git log -p sweep.
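The hard-pattern pass described above could look roughly like the sketch below. The audit's exact regexes are not shown in this thread, so every pattern here is an approximation chosen for illustration, and `HARD_PATTERNS` and `scan_text` are hypothetical names.

```python
import re

# Illustrative approximations only; not the audit's actual pattern set.
HARD_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\b(?:ghp_|gho_|ghs_|github_pat_)[A-Za-z0-9_]{20,}"),
    "gitlab_token": re.compile(r"\bglpat-[A-Za-z0-9_-]{20,}"),
    "slack_token": re.compile(r"\bxoxb-[A-Za-z0-9-]+"),
    "private_key": re.compile(r"-----BEGIN (?:OPENSSH |RSA |EC |DSA )?PRIVATE KEY-----"),
    "pg_uri_password": re.compile(r"postgres(?:ql)?://[^\s:/@]+:[^\s@]+@"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]{20,}"),
    "raw_64_hex": re.compile(r"\b[0-9a-fA-F]{64}\b"),
}


def scan_text(text):
    """Return the number of matches per pattern name for one blob of text."""
    return {name: len(rx.findall(text)) for name, rx in HARD_PATTERNS.items()}


if __name__ == "__main__":
    # A passwordless local DSN, like the ones in the benchmark scripts.
    print(scan_text("psql 'host=127.0.0.1 port=5432 user=postgres'"))
```

Running the same function over the output of `git log -p` would cover the history sweep; a zero count for every key corresponds to the tables above.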
Three standalone Python scripts that consume per-system bench output at /tmp/bench_r8_full/<sys>/ and produce the Solarized-Dark chart set used in the R8 review post (issue #77).

- r8_analyze.py: 6-panel overlay across 7 systems (throughput, bloat, CPU, NVMe write, true backlog, delivery-lag p99). LINEAR y-axes everywhere; p99 lag clipped at 5s (no log scale). Backlog column is producer_total - consumer_total, not n_live_tup snapshot.
- r8_ash_analyze.py: per-system stacked-area of ASH wait-event categories (CPU* / IO / LWLock / Lock / Client / IPC / Activity / Other) over 2h, 1-minute buckets, LINEAR 0-1.0 proportion.
- r8_pgfr_analyze.py: 4-column-x-7-row pgfr deep-dive. Col 1 top-5 queries by cumulative total_exec_time with actual truncated query text (DO blocks unwrapped to first PERFORM/SELECT/UPDATE/DELETE/INSERT statement; no more opaque q1/q2/q3 labels). Col 2 per-query buffer hit rate. Col 3 per-query wal_bytes. Col 4 global WAL rate MiB/s + active backends twin-axis. Falls back to pgss.csv for systems without pgfr installed.

Styling (Solarized Dark rcParams block, phase bands, legend placement) inherits from benchmark/charts/r6_smoke_chart.py in PR #66.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
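The "true backlog" and clipped p99 columns mentioned in the commit message can be sketched as follows. This is not the code in r8_analyze.py; the function names and the list-based inputs are assumptions made for illustration. The reason for preferring the counter difference is that `n_live_tup` is an estimate maintained by the statistics system, whereas cumulative producer and consumer totals give an exact difference at each sample.

```python
def true_backlog(producer_total, consumer_total):
    """Backlog per sample: cumulative events produced minus cumulative events consumed."""
    if len(producer_total) != len(consumer_total):
        raise ValueError("both series must be sampled at the same timestamps")
    return [p - c for p, c in zip(producer_total, consumer_total)]


def clip_lag(p99_lag_s, ceiling_s=5.0):
    """Clip delivery-lag p99 values so a linear y-axis stays readable."""
    return [min(value, ceiling_s) for value in p99_lag_s]


if __name__ == "__main__":
    # Hypothetical cumulative counters at three sample points.
    print(true_backlog([100, 200, 300], [90, 150, 300]))
    print(clip_lag([0.2, 7.5, 5.0]))
```

Clipping hides how far above the ceiling a spike went, so a chart built this way should say that values at the ceiling mean "at least 5s".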
c8618af to 03bc107
REV Review: PR #66 (post-scrub)
CI: 7/8 pass (test 14/15/16/17/18, client-smoke, verify); claude-review pending
Blocking
Non-blocking
Potential
Summary
REV-style review (security, bugs, tests, guidelines, docs). SOC2 items skipped per project policy. Anti-leak independently re-verified.
…ark/

Adds a strictly-additive benchmark/ directory documenting the methodology, tooling, and operational lessons from the pgque-vs-pgq-vs-pgmq-vs-river-vs-que-vs-pgboss-vs-pgmq-partitioned bench that backs #61 and PR #62.

- README.md: entry point + quick-start
- METHODOLOGY.md: methodology fix per review feedback
- OPS_GOTCHAS.md: 15 operational lessons (NEW: NVMe mount, partman stale rows, que func leftovers, pgboss covering index, pgq ticker, pgque xid8 bug, spot reclaim, ASH prereqs, NOTICE instrumentation, etc.)
- HARDWARE.md: i4i.2xlarge specs, PG tuning, microbench baselines
- tooling/, runners/, consumers/, producers/, install/, charts/, gifs/

No pgque production SQL is touched.

Refs: #61, #62.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- benchmark/runners/fix_nvme_mount.sh: switch to /usr/bin/env bash shebang; use set -Eeuo pipefail (was set -euo pipefail without -E)
- benchmark/runners/run_r7.sh: add -Ee flags to existing pipefail
- benchmark/runners/clean_reinstall.sh: read -r in two while-loops

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove all references to private internal URLs, round numbering (R4/R5/R6/R8), and private repository paths from benchmark/ files.

- METHODOLOGY.md: drop internal URL + note IDs from header; remove internal posting-style section (§9); neutralize round refs; fix /tmp/bench_r<N> path reference
- README.md: drop internal reference comment
- HARDWARE.md: fix binary units (GB→GiB, TB→TiB); drop R7 round ref
- OPS_GOTCHAS.md: neutralize R4/R6 round refs in lessons; fix binary units (GB→GiB, MB/s→MiB/s)
- consumers/*.sql: drop "R6 instrumented" prefix from all 7 files
- runners/run_r7.sh: remove R6/R7 round refs from inline comments
- tooling/sys_metrics_sampler.py: remove R7 from docstring
- tooling/parse_events_consumed.py: remove R6 from docstring + msg
- charts/r5_analyze.py, r6_smoke_chart.py: remove Rn from docstrings, chart titles, and file-size output (KB→KiB)

PR description updated separately via gh pr edit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
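The GB→GiB and KB→KiB fixes in this commit matter because the two unit families differ by a few percent per step. A minimal sketch of a formatter that always reports binary units is shown below; `format_binary` is a hypothetical helper, not the function used in the chart scripts.

```python
def format_binary(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    value = float(num_bytes)
    if abs(value) < 1024:
        return f"{int(num_bytes)} B"
    for unit in ("KiB", "MiB", "GiB", "TiB"):
        value /= 1024
        # Stop at the first unit that keeps the number below 1024, or at TiB.
        if abs(value) < 1024 or unit == "TiB":
            return f"{value:.1f} {unit}"


if __name__ == "__main__":
    print(format_binary(2**30))   # exactly one GiB
    print(format_binary(10**9))   # one decimal GB, expressed in binary units
```

The second call shows the gap: one decimal gigabyte (10^9 bytes) is about 953.7 MiB, not 1 GiB.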
- Remove private WI refs from bootstrap.sh:41,44 (replace with "see methodology notes")
- Fix set -Eeuo pipefail in 7 shell scripts that only had partial flags
- Fix broken OPS_GOTCHAS.md:185 link (install_pgque.sh → install/README.md)
- Fix binary unit in install_pgboss.sh:2 (GB → GiB)
- Fix run_r7.sh tool paths: resolve from benchmark/tooling/ by default instead of hardcoded /tmp/r7 and /tmp/r6; override via R7_DIR/R6_DIR

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
03bc107 to d586e35
REV Review: PR #66 (round 2, post-loop)
CI: 7/8 SUCCESS (test 14/15/16/17/18, client-smoke, verify);
Independent anti-leak re-scan: CLEAN
Pattern set (excluding the documented vendored-data exception):
Blocking
None.
Non-blocking
Potential issues
None new in r2.
Summary table
Anti-leak independent verification
Verdict
READY FOR USER REVIEW
Both R1 BLOCKING anti-leak findings are resolved at history + working-tree + PR-metadata level. The two BLOCKING items from R1 are gone, the four non-blocking items are addressed (shell-style, broken link, binary units, R-round scrub), and r2's independent re-scan is clean.
Pending CI:
REV-style review (security, bugs, tests, guidelines, docs). SOC2 items skipped per project policy. Anti-leak independently re-verified on commit history, diff, working tree, PR body, and PR comments.
- logging_collector=off does not mean zero log I/O; journald still writes to disk (#123)
- premake=20 planner cost is first-query-in-session, not per-query; root cause of steady-state regression is a follow-up (#124)
- add RAISE NOTICE observer-effect caveat for high-frequency use (#127)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* charts: add R8 analyzers (main + ASH + pgfr) to benchmark/charts/

  Three standalone Python scripts that consume per-system bench output at /tmp/bench_r8_full/<sys>/ and produce the Solarized-Dark chart set used in the R8 review post (issue #77).

  - r8_analyze.py: 6-panel overlay across 7 systems (throughput, bloat, CPU, NVMe write, true backlog, delivery-lag p99). LINEAR y-axes everywhere; p99 lag clipped at 5s (no log scale). Backlog column is producer_total - consumer_total, not n_live_tup snapshot.
  - r8_ash_analyze.py: per-system stacked-area of ASH wait-event categories (CPU* / IO / LWLock / Lock / Client / IPC / Activity / Other) over 2h, 1-minute buckets, LINEAR 0-1.0 proportion.
  - r8_pgfr_analyze.py: 4-column-x-7-row pgfr deep-dive. Col 1 top-5 queries by cumulative total_exec_time with actual truncated query text (DO blocks unwrapped to first PERFORM/SELECT/UPDATE/DELETE/INSERT statement; no more opaque q1/q2/q3 labels). Col 2 per-query buffer hit rate. Col 3 per-query wal_bytes. Col 4 global WAL rate MiB/s + active backends twin-axis. Falls back to pgss.csv for systems without pgfr installed.

  Styling (Solarized Dark rcParams block, phase bands, legend placement) inherits from benchmark/charts/r6_smoke_chart.py in PR #66.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* charts: ASH y-axis shows active-session COUNT (standard ASH convention)

  Previous proportion-based (0..1.0) rendering obscured the actual workload difference. User feedback: standard ASH views plot the count of active sessions, with each wait-event category as a stack layer whose height = number of backends sampled in that category for the bucket.

  Change bucket_stack() to return mean count per bucket (rows per bucket divided by distinct-sample-timestamps), and set y-limit per subplot to max(total) + 1 with integer ticks. Linear scale; no normalization.

  Effect: pgque/pgq visibly jump from ~1 to ~2 active backends during the TX phase (the held-xmin session joins, sitting on ClientRead); DELETE-based systems sit at ~4-5 (their -c 4 consumers plus the producer) and climb to ~6 during TX.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(charts): scrub round labels from captions

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(charts): use binary units + update README index

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Nik Samokhvalov <nik@Niks-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
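The mean-count rule described in the second commit (rows per bucket divided by distinct sample timestamps) can be sketched as below. This is a reconstruction from the commit message, not the code in r8_ash_analyze.py: the input layout of one `(timestamp_seconds, category)` row per active backend per sample is an assumption, and `y_limit` is a hypothetical helper for the `max(total) + 1` rule.

```python
from collections import defaultdict


def bucket_stack(samples, bucket_s=60):
    """Mean active-session count per wait-event category per time bucket.

    samples: iterable of (ts_seconds, category), one row per active backend
    per ASH sample (assumed layout).
    Returns {bucket_start: {category: mean_count}}.
    """
    rows = defaultdict(lambda: defaultdict(int))
    stamps = defaultdict(set)
    for ts, category in samples:
        bucket = int(ts // bucket_s) * bucket_s
        rows[bucket][category] += 1
        stamps[bucket].add(ts)
    return {
        bucket: {cat: n / len(stamps[bucket]) for cat, n in cats.items()}
        for bucket, cats in sorted(rows.items())
    }


def y_limit(stack):
    """Upper y-limit for one subplot: tallest stacked total plus one."""
    return max(sum(cats.values()) for cats in stack.values()) + 1


if __name__ == "__main__":
    # Two samples in the first minute, one in the second (hypothetical data).
    demo = [(0, "CPU*"), (0, "Client"), (30, "CPU*"), (60, "IO")]
    stack = bucket_stack(demo)
    print(stack, y_limit(stack))
```

One caveat of this layout: a sample instant with no active backends produces no rows, so it is not counted in the denominator and the mean for that bucket is biased upward.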
Summary
Adds a complete `benchmark/` directory documenting the methodology, tooling, and operational lessons from the pgque-vs-pgq-vs-pgmq-vs-river-vs-que-vs-pgboss-vs-pgmq-partitioned bench that backs #61 (the held-xmin bloat issue) and #62 (the subscription/tick rotation fix).
The content is strictly additive — no pgque production SQL is touched. Everything new sits under `benchmark/`.
Contents
Why now
PR #62 is the rotation fix for the held-xmin bloat pattern from #61. This PR is the evidence PR: the harness that proves #62 works, the ops playbook so a reviewer can reproduce, and the catalog of operational surprises we hit so the next person doesn't rediscover them.
Sanitation
- `[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}` matches in committed files are `127.0.0.1` (DB connections on the VM itself); no AWS IPs, no public IPs.
- `<pgque-ip>`, `<your-ssh-key>` placeholders.
- `127.0.0.1` connection strings are kept, since that's operationally correct for local psql / pgbench.

Test plan
- `METHODOLOGY.md` cross-references (all in-doc links resolve to files that exist under `benchmark/`).
- `OPS_GOTCHAS.md` for completeness: any ops lesson missing?
- `install/bootstrap.sh` for any leaked IPs or tokens before merge.
- `runners/run_r7.sh pgque` on a fresh i4i.2xlarge once PR #62 (Rotate subscription and tick tables to avoid held-xmin bloat (#61)) lands.

Refs: #61, #62.