Merged
1 change: 1 addition & 0 deletions .claude/visibility
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
public
497 changes: 185 additions & 312 deletions CLAUDE.md

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions docker/docker-compose.yml
Expand Up @@ -33,6 +33,9 @@ services:
-c max_parallel_workers_per_gather=8
-c max_parallel_workers=14
-c max_worker_processes=16
-c random_page_cost=1.1
-c effective_io_concurrency=200
-c temp_buffers=64MB
volumes:
- ./postgres-data:/var/lib/postgresql/data
ports:
Expand Down
20 changes: 20 additions & 0 deletions docker/tuning.md
Expand Up @@ -21,6 +21,26 @@ Tuning notes for the local Docker fwapg instance. Settings in `docker-compose.ym
| `max_parallel_workers` | 14 | Leave 2 cores for OS/Docker |
| `max_worker_processes` | 16 | Match core count |
| `shm_size` | 36gb | Must exceed shared_buffers (Docker constraint) |
| `random_page_cost` | 1.1 | SSD: random reads ≈ sequential. Default 4.0 (spinning rust) over-prices index scans, biases planner toward seq scans |
| `effective_io_concurrency` | 200 | SSD/NVMe: hundreds of concurrent requests. Default 1 under-issues prefetch on bitmap heap scans |
| `temp_buffers` | 64MB | Per-session temp-table memory. Default 8MB spills working tables (per-WSG segments, breaks, scratch) to disk gratuitously on a 128 GB box |

## SSD assumption

All NewGraph hosts run on SSD (M4 NVMe, M1 Colima virtiofs over APFS,
cypher DO block storage). The `random_page_cost` and
`effective_io_concurrency` values above bias the planner toward index
scans for segment-keyed lookups (the
`WHERE blue_line_key = … AND drm <= …` against `streams_breaks` hot
path in link's pipeline); `temp_buffers` keeps per-WSG working tables
in RAM. Out-of-the-box PostgreSQL defaults (4.0 / 1 / 8MB) are
calibrated for spinning rust and are wrong for any modern host. Don't
revert these to the stock values without a spinning-disk reason.
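
As an illustrative session-level A/B on the hot-path query shape —
table and column names as above, the `blue_line_key` literal and the
index are hypothetical, and whether the plan actually flips depends on
table statistics:

```sql
-- Hypothetical A/B: same query under spinning-rust vs SSD costing.
-- With the stock value the planner tends toward a seq scan; at 1.1 it
-- should prefer an index on (blue_line_key, drm) if one exists.
SET random_page_cost = 4.0;
EXPLAIN SELECT * FROM streams_breaks
WHERE blue_line_key = 354132308 AND drm <= 1200;

SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM streams_breaks
WHERE blue_line_key = 354132308 AND drm <= 1200;

RESET random_page_cost;
```

`SET` here is session-scoped, so this experiment never disturbs other
connections.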

**M1/cypher hosts use a docker-compose override file.** Compose merges
override `command:` lists by **replacing** the base, not by appending —
so any `-c` flags added here must also be added to the M1/cypher
override (tracked in the rtj repo). Forgetting this silently leaves
the 32 GB hosts on PostgreSQL defaults.
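
A minimal sketch of the replace-not-append semantics (file names and
the `shared_buffers` values are hypothetical; the real override is
tracked in the rtj repo):

```yaml
# Base (docker-compose.yml) — full command list:
services:
  db:
    command: >
      postgres
      -c shared_buffers=32GB
      -c random_page_cost=1.1
      -c effective_io_concurrency=200
      -c temp_buffers=64MB
---
# Override (e.g. docker-compose.override.yml on M1/cypher).
# This `command:` REPLACES the base list wholesale — every -c flag
# from the base must be repeated here, with host-scaled memory values.
services:
  db:
    command: >
      postgres
      -c shared_buffers=8GB
      -c random_page_cost=1.1
      -c effective_io_concurrency=200
      -c temp_buffers=64MB
```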

## Scaling for other machines

Expand Down
110 changes: 66 additions & 44 deletions planning/active/findings.md
@@ -1,63 +1,85 @@
# Findings — fresh#158
# Findings — pg tuning: SSD planner-cost defaults (#199)

## Diagnosis (2026-05-01)
## Issue context

bcfishpass's per-species rear rule has an inline OR clause:
### Problem

```sql
(cw.channel_width >= t.rear_channel_width_min OR
(s.stream_order_parent >= 5 AND s.stream_order = 1))
```
`fresh/docker/docker-compose.yml` doesn't set `random_page_cost` or
`effective_io_concurrency`, so postgres falls through to PostgreSQL
defaults of `4.0` and `1` — both calibrated for spinning rust. All
NewGraph hosts run on SSD (M4 NVMe, M1 Colima virtiofs over APFS,
cypher DO block storage). With `random_page_cost=4`, the planner
systematically biases away from index scans for segment-keyed lookups
in link's pipeline (`WHERE blue_line_key = … AND drm <= …` against
`streams_breaks` is the hot path).

at [`bcfishpass/model/02_habitat_linear/sql/load_habitat_linear_bt.sql`](https://github.com/smnorris/bcfishpass/blob/main/model/02_habitat_linear/sql/load_habitat_linear_bt.sql) lines 89–96 (BT — same pattern in CH/CO/ST/WCT). Direct order-1 tributaries of order-5+ mainstems get credited as rearing **even when cw < rear_min**.
Verified on M4 + M1 + cypher today (2026-05-04) — all three show
`random_page_cost=4`, `effective_io_concurrency=1`, `temp_buffers=8MB`.

Biology: small tribs of large rivers support juvenile rearing despite small FWA-measured channel width — parent supplies flow / temperature / access; cool tributary water mixes at confluence; backwater + off-channel habitat near the mouth is high-value.
### Proposed

fresh has no implementation. link's `dimensions.csv::rear_stream_order_bypass = no` for all species in both bundles (correctly anticipating fresh has nothing to read). Provincial parity baseline (link 0.20.1) shows this gap on HORS / COLR / KHOR / CLRH / etc — Class B in `link/research/provincial_parity_2026_05_01.md`.
Add to the `db.command` block in base `docker-compose.yml`:

## Function design (per issue body)
```yaml
-c random_page_cost=1.1
-c effective_io_concurrency=200
-c temp_buffers=64MB
```
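
One way to confirm the flags are live after a container restart — this
uses the standard `pg_settings` catalog, nothing repo-specific:

```sql
-- Expect 1.1 / 200 / 64MB. Note pg_settings reports temp_buffers in
-- 8kB pages, so 64MB shows as 8192.
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('random_page_cost',
               'effective_io_concurrency',
               'temp_buffers');
```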

`frs_order_child(conn, table, habitat, species, label = "rearing", parent_order_min = 5, child_order_min = NULL, child_order_max = NULL, distance_max = NULL)`
Document in `fresh/docker/tuning.md` alongside the existing memory
rationale: SSD cost values, why they matter for link's segment-heavy
SQL.

Post-classification UPDATE:
### Verification

```sql
UPDATE habitat
SET <label> = TRUE
FROM streams s
WHERE habitat.id_segment = s.id_segment
AND habitat.species_code = '<species>'
AND habitat.accessible = TRUE
AND habitat.<label> IS NOT TRUE
AND s.stream_order = s.stream_order_max -- direct child
AND s.stream_order_parent >= <parent_order_min> -- of large river
AND s.stream_order >= <child_order_min> -- if set
AND s.stream_order <= <child_order_max> -- if set
AND (<distance_max> IS NULL OR
s.downstream_route_measure <= <distance_max>);
```
Post-merge, M4 reruns one of link's provincial-style runs. Compare
per-WSG wall times in `data-raw/logs/<TS>_per_wsg_times.csv` against
today's baseline (2026-05-04 `default_extrabreaks` run,
`data-raw/logs/provincial_default_extrabreaks/<TS>_per_wsg_times.csv`).
Acceptance: median per-WSG wall time reduced by ≥ 10 % at unchanged
segment count.

### Cross-ref

Companion rtj issue for the M1/cypher override file — the override's
`command:` list REPLACES the base under docker-compose's merge
semantics, so settings added here don't propagate to the 32 GB hosts.
Same change must land in both.

Direct-child filter: `s.stream_order = s.stream_order_max` ensures we stop at the order-change point (once the segment's order would exceed `stream_order_max`, you're no longer on the direct-child reach). Captures "small tributary directly into a large parent."
## Setting-by-setting rationale

`accessible = TRUE` guard: never adds rearing on segments above a definite barrier.
### `random_page_cost = 1.1`

`<label> IS NOT TRUE` guard: idempotent + additive — already-classified segments untouched.
PostgreSQL default is `4.0` — the cost ratio for a random vs
sequential page read on spinning rust (~4× slower for random I/O).
On SSD, random reads are nearly as cheap as sequential. PostgreSQL
docs and tuning consensus recommend `1.1` to `1.5` for SSD. With the
default, the planner systematically over-prices index scans and
prefers seq scans even when an index would be faster — exactly the
wrong bias for segment-keyed lookups.

## Pre vs post cluster — design decision (post-cluster wins)
### `effective_io_concurrency = 200`

bcfishpass embeds the bypass in the rule predicate (pre-cluster). Issue body's analysis: post-cluster is cleaner because:
- Bypassed segments don't need to pass connectivity (they're connected to the large parent by definition; the biology of the bypass IS the connectivity)
- `accessible = TRUE` already gates barrier-blocked reaches
- Post-cluster matches the parametric form (function takes WSG/species and applies, not a rule predicate that needs SQL grammar in fresh)
- Same end-state numbers as bcfp on the BCFP-parity caller-side defaults
PostgreSQL default is `1`. The setting tells PostgreSQL how many
concurrent I/O requests the storage can usefully serve, and drives
prefetch depth on bitmap heap scans. SSDs (especially NVMe) handle
hundreds of concurrent requests trivially — `200` is the standard
recommendation for SSD. At `1`, essentially no prefetch is issued and
bitmap-scan throughput is left on the table.
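
Worth noting (standard PostgreSQL, not used here since all hosts are
all-SSD): the setting can also be scoped per tablespace rather than
globally, if a tablespace ever lands on slower storage —

```sql
-- Per-tablespace override; the global value from docker-compose
-- remains the default for everything else.
ALTER TABLESPACE pg_default SET (effective_io_concurrency = 200);
```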

## Caller defaults (link side, separate PR)
### `temp_buffers = 64MB`

- bcfishpass bundle: `frs_order_child(species, parent_order_min = 5)` (defaults) for BT/CH/CO/ST/WCT
- default bundle: methodology-pending. Could ship same as bcfp first, tune later
PostgreSQL default is `8MB` — per-session memory for temporary tables.
Bumping to `64MB` lets short-lived temp/working tables live in RAM
instead of spilling to disk. fresh/link pipelines build per-WSG temp
tables (segments, breaks, working scratch) that are read repeatedly
within a session; spilling these to disk is gratuitous on a dev box
with 128 GB RAM.
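
One standard-PostgreSQL caveat when experimenting per-session: the
value can only be changed before the session first touches a temp
table; afterwards it is fixed for the session's lifetime. Sketch only
— the table and `blue_line_key` literal are made up:

```sql
-- Must run before any temp-table access in this session.
SET temp_buffers = '128MB';
CREATE TEMP TABLE scratch_segments AS  -- hypothetical working table
SELECT * FROM streams_breaks WHERE blue_line_key = 354132308;
```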

## Versions at start
## Repo state

- fresh main: 253abf2 (0.26.0)
- link main: 9643be5 (0.21.0)
- bcfishpass: 440bc1e
- `docker/docker-compose.yml:21-35` — current `db.command` block has
memory + parallelism flags but no I/O cost flags
- `docker/tuning.md` — has "Settings rationale" table and a "Scaling
for other machines" recipe; new flags are global (not RAM-scaled),
no scaling change needed
24 changes: 19 additions & 5 deletions planning/active/progress.md
@@ -1,7 +1,21 @@
# Progress — fresh#158
# Progress — pg tuning: SSD planner-cost defaults (#199)

## Session 2026-05-01
## Session 2026-05-04

- Branch `158-frs-order-child` from main HEAD `253abf2` (post-fresh#191 release).
- PWF baseline.
- Next: write `R/frs_order_child.R` per issue body spec, mock SQL tests.
- Plan-mode exploration — phases approved by user
- Archived stale top-level planning files (#177) and stale active/ (#158)
with outcome READMEs in commit `80170b4`
- Created branch `199-pg-tuning-ssd-planner-cost-defaults` off main (HEAD `80170b4`)
- Scaffolded PWF baseline from issue #199 with approved phases
- Phase 1 complete: added `random_page_cost=1.1`, `effective_io_concurrency=200`,
`temp_buffers=64MB` to `db.command`. Restarted local DB; all three values
verified live via `SHOW`.
- Phase 2 complete: added three rows to the Settings rationale table in
`docker/tuning.md`, plus an "SSD assumption" section noting the
M1/cypher override-file caveat (override `command:` replaces base, so
same flags must land in the rtj-tracked override).
- Phase 3 complete: pushed branch, opened PR #200 with `Closes #199`.
Manual code-check on the docker/ diff — clean (pure ASCII in the YAML
block, no secrets, indentation matches surrounding `-c` flags, DB
came up after restart and live `SHOW` confirmed all three values).
- Next: merge, archive PWF, companion rtj-side change (separate work).
91 changes: 46 additions & 45 deletions planning/active/task_plan.md
@@ -1,45 +1,46 @@
# Task Plan — fresh#158: frs_order_child

## Phase 1: Setup
- [x] Branch `158-frs-order-child` from main (HEAD `253abf2`)
- [ ] PWF baseline (task_plan, findings, progress)

## Phase 2: Code change — fresh-side
- [ ] New `R/frs_order_child.R` exporting `frs_order_child(conn, table, habitat, species, label = "rearing", parent_order_min = 5, child_order_min = NULL, child_order_max = NULL, distance_max = NULL)`
- [ ] SQL per the [issue spec](https://github.com/NewGraphEnvironment/fresh/issues/158): post-classification UPDATE that adds `<label> = TRUE` to segments where `accessible IS TRUE`, `<label> IS NOT TRUE`, `s.stream_order = s.stream_order_max`, `s.stream_order_parent >= parent_order_min`, optionally bounded by `child_order_min/max` and `distance_max`
- [ ] roxygen docstring with parameters, examples, biology rationale
- [ ] `devtools::document()` clean

## Phase 3: Tests
- [ ] `tests/testthat/test-frs_order_child.R`: SQL shape (mock `.frs_db_execute`)
- default invocation emits the canonical SQL with parent_order_min=5
- `child_order_min/max` bounds appear when set
- `distance_max` adds `downstream_route_measure <= ...` clause
- `accessible IS TRUE` and `<label> IS NOT TRUE` guards always present
- `species` substituted correctly
- [ ] `devtools::test(filter = "frs_order_child")` clean

## Phase 4: Code-check
- [ ] `/code-check` on staged diff

## Phase 5: Release
- [ ] DESCRIPTION: 0.26.0 → 0.27.0
- [ ] NEWS.md: 0.27.0 entry
- [ ] PR with `Fixes #158`
- [ ] Merge, tag

## Phase 6: Link follow-up
- [ ] Bump fresh dep 0.26.0 → 0.27.0
- [ ] dimensions.csv: flip `rear_stream_order_bypass = yes` for BT/CH/CO/ST/WCT in bcfishpass bundle
- [ ] Default-bundle TBD per methodology decision (issue mentions parametric `distance_max` could differ)
- [ ] `lnk_rules_build` already emits `channel_width_min_bypass` field but fresh doesn't read it yet — wire to call `frs_order_child` per-species after classify
- [ ] Verify HORS BT closes from -7.68% to within ±1%
- [ ] 15-WSG distributed re-run

## Verification

- HORS BT rearing_stream: -7.68% → expected within ±1%
- HORS CH/CO/ST: similar closure expected
- COLR/KHOR/CLRH WCT: similar closure expected (provincial Class B)
- bcfishpass-bundle other WSGs: bit-identical (function only fires where `rear_stream_order_bypass: yes` is set)
- default-bundle: TBD per methodology
# Task: pg tuning — SSD planner-cost defaults in docker-compose.yml (#199)

`docker/docker-compose.yml` doesn't set `random_page_cost` or
`effective_io_concurrency`, so postgres falls through to PostgreSQL
defaults of `4.0` and `1` — both calibrated for spinning rust. All
NewGraph hosts run on SSD (M4 NVMe, M1 Colima virtiofs over APFS,
cypher DO block storage). With `random_page_cost=4`, the planner
systematically biases away from index scans for segment-keyed lookups
in link's pipeline (`WHERE blue_line_key = … AND drm <= …` against
`streams_breaks` is the hot path).

Verified on M4 + M1 + cypher (2026-05-04) — all three show
`random_page_cost=4`, `effective_io_concurrency=1`, `temp_buffers=8MB`.

## Phase 1: Add SSD planner-cost flags to docker-compose.yml
- [x] Append to `db.command` block in `docker/docker-compose.yml`:
- `-c random_page_cost=1.1`
- `-c effective_io_concurrency=200`
- `-c temp_buffers=64MB`
- [x] Restart local Docker DB (`docker compose down && docker compose up -d db`)
- [x] Verify via `SHOW random_page_cost; SHOW effective_io_concurrency; SHOW temp_buffers;`
(verified live: `random_page_cost=1.1`, `effective_io_concurrency=200`, `temp_buffers=64MB`)

## Phase 2: Document in tuning.md
- [x] Add three rows to the "Settings rationale" table
(`random_page_cost`, `effective_io_concurrency`, `temp_buffers`) with
SSD justification per setting
- [x] Add a short "SSD assumption" note: all NewGraph hosts run on SSD
(M4 NVMe, M1 Colima virtiofs over APFS, cypher DO block storage),
these defaults bias the planner toward index scans for segment-keyed
lookups in link's pipeline
- [x] Cross-reference the companion rtj issue for the M1/cypher override
file — override `command:` REPLACES base under docker-compose merge
semantics; same change must land in both

## Phase 3: Verify and PR
- [x] `/code-check` on the diff (manual; clean)
- [x] Push branch, open PR with `Closes #199` (PR #200)
- [x] Note benchmark verification is a post-merge step (≥ 10 % median
  per-WSG wall-time reduction at unchanged segment count) — owner runs from
  M4 against `data-raw/logs/provincial_default_extrabreaks/<TS>_per_wsg_times.csv`
  baseline (2026-05-04). Documented in PR test-plan.

## Validation
- [ ] PWF checkboxes match landed work
- [ ] `/planning-archive` on completion
@@ -0,0 +1,7 @@
## Outcome

Dropped `format` and `long_value_col` parameters from `frs_habitat_overlay()`. Function now accepts a single canonical shape (row-per-(segment × species) with per-habitat indicator columns), parameterized by `species_col`, `by`, and `habitat_types`. Callers with non-canonical sources transform first (SQL view, R pivot, or link's `lnk_ingest_bcfishpass()`). Bridge mode retained.

## Closed By

PR #176.
File renamed without changes.
File renamed without changes.
File renamed without changes.
7 changes: 7 additions & 0 deletions planning/archive/2026-05-issue-158-frs-order-child/README.md
@@ -0,0 +1,7 @@
## Outcome

Added `frs_order_child()` — post-classification UPDATE that flags direct-children of large-order parent streams (`stream_order = stream_order_max` AND `stream_order_parent >= parent_order_min`). Wired into the rules YAML via `channel_width_min_bypass` block so per-species rearing classifiers can opt in. Closes the BT/CO/CH/ST/WCT rearing-stream gap on bcfishpass-bundled WSGs (HORS, COLR, KHOR, CLRH, etc.).

## Closed By

PR #193 (initial impl), with follow-ups #195 and #196 (SQL fixes for the missing/restored `stream_order_max` column via CTE).
63 changes: 63 additions & 0 deletions planning/archive/2026-05-issue-158-frs-order-child/findings.md
@@ -0,0 +1,63 @@
# Findings — fresh#158

## Diagnosis (2026-05-01)

bcfishpass's per-species rear rule has an inline OR clause:

```sql
(cw.channel_width >= t.rear_channel_width_min OR
(s.stream_order_parent >= 5 AND s.stream_order = 1))
```

at [`bcfishpass/model/02_habitat_linear/sql/load_habitat_linear_bt.sql`](https://github.com/smnorris/bcfishpass/blob/main/model/02_habitat_linear/sql/load_habitat_linear_bt.sql) lines 89–96 (BT — same pattern in CH/CO/ST/WCT). Direct order-1 tributaries of order-5+ mainstems get credited as rearing **even when cw < rear_min**.

Biology: small tribs of large rivers support juvenile rearing despite small FWA-measured channel width — parent supplies flow / temperature / access; cool tributary water mixes at confluence; backwater + off-channel habitat near the mouth is high-value.

fresh has no implementation. link's `dimensions.csv::rear_stream_order_bypass = no` for all species in both bundles (correctly anticipating fresh has nothing to read). Provincial parity baseline (link 0.20.1) shows this gap on HORS / COLR / KHOR / CLRH / etc — Class B in `link/research/provincial_parity_2026_05_01.md`.

## Function design (per issue body)

`frs_order_child(conn, table, habitat, species, label = "rearing", parent_order_min = 5, child_order_min = NULL, child_order_max = NULL, distance_max = NULL)`

Post-classification UPDATE:

```sql
UPDATE habitat
SET <label> = TRUE
FROM streams s
WHERE habitat.id_segment = s.id_segment
AND habitat.species_code = '<species>'
AND habitat.accessible = TRUE
AND habitat.<label> IS NOT TRUE
AND s.stream_order = s.stream_order_max -- direct child
AND s.stream_order_parent >= <parent_order_min> -- of large river
AND s.stream_order >= <child_order_min> -- if set
AND s.stream_order <= <child_order_max> -- if set
AND (<distance_max> IS NULL OR
s.downstream_route_measure <= <distance_max>);
```

Direct-child filter: `s.stream_order = s.stream_order_max` ensures we stop at the order-change point (once the segment's order would exceed `stream_order_max`, you're no longer on the direct-child reach). Captures "small tributary directly into a large parent."

`accessible = TRUE` guard: never adds rearing on segments above a definite barrier.

`<label> IS NOT TRUE` guard: idempotent + additive — already-classified segments untouched.

## Pre vs post cluster — design decision (post-cluster wins)

bcfishpass embeds the bypass in the rule predicate (pre-cluster). Issue body's analysis: post-cluster is cleaner because:
- Bypassed segments don't need to pass connectivity (they're connected to the large parent by definition; the biology of the bypass IS the connectivity)
- `accessible = TRUE` already gates barrier-blocked reaches
- Post-cluster matches the parametric form (function takes WSG/species and applies, not a rule predicate that needs SQL grammar in fresh)
- Same end-state numbers as bcfp on the BCFP-parity caller-side defaults

## Caller defaults (link side, separate PR)

- bcfishpass bundle: `frs_order_child(species, parent_order_min = 5)` (defaults) for BT/CH/CO/ST/WCT
- default bundle: methodology-pending. Could ship same as bcfp first, tune later

## Versions at start

- fresh main: 253abf2 (0.26.0)
- link main: 9643be5 (0.21.0)
- bcfishpass: 440bc1e
@@ -0,0 +1,7 @@
# Progress — fresh#158

## Session 2026-05-01

- Branch `158-frs-order-child` from main HEAD `253abf2` (post-fresh#191 release).
- PWF baseline.
- Next: write `R/frs_order_child.R` per issue body spec, mock SQL tests.