Updated scope (2026-04-28)
The original scope (a single hardcoded in_waterbody: false on stream rules) is replaced with a fully dimensions-driven approach. The new shape covers two related but independent methodology questions: which segments contribute to LINEAR habitat km, and which contribute to AREA-based polygon rollups.
Problem (refined)
Today's stream-edge rules silently overlap with polygon rules — mainlines threading through a polygon are matched by both. Adjacent issue: today's polygon rules (waterbody_type: L/W) tie linear rearing_km contribution to area-rollup lake_rearing_ha/wetland_rearing_ha contribution. The two are coupled, so a user can't say "count the lake area but exclude the polygon-mainline from linear."
These are independent dials and the package should expose them independently. Every model decision should be readable from dimensions.csv — no buried emission rules.
Proposed Solution
Two new per-species columns in dimensions.csv
| Column | Values | Effect |
| --- | --- | --- |
| spawn_stream_in_waterbody | yes / no | Emit in_waterbody: <value> on the stream-spawn rule. no excludes polygon-mainlines from spawn classification. |
| rear_stream_in_waterbody | yes / no | Same shape on the stream-rear rule. |
Driven by fresh#180 — predicate must land in fresh first.
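A minimal sketch of how the two columns might propagate into stream-rule emission. Only the column names and the in_waterbody key come from this issue; the rule layout (habitat, kind) and the function names are hypothetical, chosen for illustration.

```python
# Hypothetical sketch: turn one dimensions.csv row into stream-rule blocks.
# The rule layout ("habitat", "kind") is assumed, not the package's real schema.

def yes_no(value: str) -> bool:
    """Map a dimensions.csv yes/no cell to a boolean; reject anything else."""
    cleaned = value.strip().lower()
    if cleaned not in ("yes", "no"):
        raise ValueError(f"expected yes/no, got {value!r}")
    return cleaned == "yes"


def emit_stream_rules(row: dict) -> list:
    """Emit one spawn and one rear stream rule, each carrying in_waterbody."""
    return [
        {"habitat": "spawn", "kind": "stream",
         "in_waterbody": yes_no(row["spawn_stream_in_waterbody"])},
        {"habitat": "rear", "kind": "stream",
         "in_waterbody": yes_no(row["rear_stream_in_waterbody"])},
    ]


# Example row using the default-bundle values described in this issue.
row = {"species": "BT",
       "spawn_stream_in_waterbody": "no",
       "rear_stream_in_waterbody": "yes"}
for rule in emit_stream_rules(row):
    print(rule)
```

Rejecting any cell other than yes or no keeps a typo in dimensions.csv from silently becoming a default.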
Add edge_types_explicit: [1000, 1100] to L/W polygon-rule emission
Today's waterbody_type: L / waterbody_type: W rules have no edge_types filter, so they match every segment in the polygon (shorelines 1500/1700, banks 1800/1850, island edges, etc.) and credit them all to linear rearing. Filtering to mainlines only:
- Linear rollup includes only mainlines through the polygon (shorelines/banks excluded).
- Area rollup unchanged — fresh's bucket predicate sums polygon area where any segment carries the bucket flag, so as long as a polygon contains at least one tagged mainline it counts.
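The two bullets above can be illustrated with a toy polygon. This is a sketch with made-up segment lengths and area, not fresh's actual rollup code; the helper names are hypothetical and only the edge-type codes come from this issue.

```python
# Toy illustration of the edge_types_explicit filter; all data is made up.
MAINLINE_EDGE_TYPES = {1000, 1100}  # values proposed in this issue

# (edge_type, length_km) for segments inside one hypothetical lake polygon
segments = [(1000, 2.0), (1100, 0.5), (1500, 3.0), (1800, 1.2)]
polygon_area_ha = 40.0


def linear_km(segs, edge_types=None):
    """Sum length of matched segments; edge_types=None means no filter."""
    return sum(km for et, km in segs if edge_types is None or et in edge_types)


def area_ha(segs, area, edge_types=None):
    """Count the polygon's area once if any segment matches the rule."""
    matched = any(edge_types is None or et in edge_types for et, _ in segs)
    return area if matched else 0.0


print(linear_km(segments))                       # today: every segment counts
print(linear_km(segments, MAINLINE_EDGE_TYPES))  # proposed: mainlines only
print(area_ha(segments, polygon_area_ha, MAINLINE_EDGE_TYPES))
```

In this toy model the linear total shrinks once shorelines and banks are filtered out, while the area total is unchanged because at least one mainline still matches.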
Per-species area_only columns (depends on fresh#182)
| Column | Values | Effect |
| --- | --- | --- |
| rear_lake_area_only | yes / no | Emit area_only: true on the L polygon rule. When yes, fresh derives the lake_rearing bucket flag from the rule but excludes it from main rear predicate. Lake area still rolls up; mainlines through lakes don't count in linear via this rule. |
| rear_wetland_area_only | yes / no | Same shape for wetlands. |
These are the dials that make the "use case 2" model expressible (linear strict, area generous).
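A small model of the intended area_only semantics, as read from the table above. This is an assumption about behaviour, not fresh's implementation; the function name and argument layout are hypothetical.

```python
# Hypothetical model of area_only on a lake rule; not fresh's actual code.

def classify(segment_in_lake: bool, rear_lake: bool, area_only: bool) -> dict:
    """Return the flags one lake rule would set on a mainline segment."""
    matched = rear_lake and segment_in_lake
    return {
        # Bucket flag: feeds the lake_rearing_ha area rollup.
        "lake_rearing": matched,
        # Main predicate: feeds linear rearing_km; suppressed by area_only.
        "rear": matched and not area_only,
    }


print(classify(True, rear_lake=True, area_only=False))  # use case 1 shape
print(classify(True, rear_lake=True, area_only=True))   # use case 2 shape
```

Under this reading the segment can still be classified as rearing by another rule (for example the stream rule); area_only only stops the lake rule itself from contributing to linear.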
Use cases this expresses
The cells in each bundle's dimensions.csv determine which use case applies per species. Two example use cases, followed by the bcfishpass bundle for comparison:
Use case 1 — linear includes mainlines + area rollups:
species, rear_stream_in_waterbody, rear_lake, rear_lake_area_only, rear_wetland_polygon, rear_wetland_area_only
BT, yes, yes, no, yes, no
Use case 2 — linear excludes mainlines + area rollups:
species, rear_stream_in_waterbody, rear_lake, rear_lake_area_only, rear_wetland_polygon, rear_wetland_area_only
BT, no, yes, yes, yes, yes
bcfishpass bundle (strict partition, no polygon-area rollup):
species, rear_stream_in_waterbody, rear_lake, rear_lake_area_only, rear_wetland_polygon, rear_wetland_area_only
all, no, no, n/a, no, n/a
Proof artifact
Add a research doc / vignette section: research/rule_flexibility.md (or extend bcfishpass_comparison.md). The doc:
- Walks through the matrix above with a small worked example (one species, one WSG — DEAD or BABL).
- Runs the same pipeline three times (use case 1, use case 2, bcfishpass) by swapping just the dimensions.csv cells listed above.
- Tabulates the rollup output for each: rearing_km, lake_rearing_ha, wetland_rearing_ha.
- Shows the rules.yaml diff for one species across the three configs to demonstrate the dimensions.csv → rules.yaml propagation is mechanical and visible.
The intent: produce a single page where a reader can see the flexibility, compare to bcfishpass's approach (where the same logic is buried in per-species access SQL templates that are hard to diff), and validate that any future model variant is a CSV edit, not a code change.
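One possible shape for the rules.yaml diff step, sketched with the standard library only. The three configs reuse the cell values from the use-case matrix in this issue; the YAML-like layout produced by render_rules is hypothetical and stands in for whatever the real emitter writes.

```python
import difflib

# Cell values copied from the use-case matrix in this issue.
CONFIGS = {
    "use_case_1": {"rear_stream_in_waterbody": "yes", "rear_lake": "yes",
                   "rear_lake_area_only": "no", "rear_wetland_polygon": "yes",
                   "rear_wetland_area_only": "no"},
    "use_case_2": {"rear_stream_in_waterbody": "no", "rear_lake": "yes",
                   "rear_lake_area_only": "yes", "rear_wetland_polygon": "yes",
                   "rear_wetland_area_only": "yes"},
    "bcfishpass": {"rear_stream_in_waterbody": "no", "rear_lake": "no",
                   "rear_lake_area_only": "n/a", "rear_wetland_polygon": "no",
                   "rear_wetland_area_only": "n/a"},
}

POLYGON_RULES = (("L", "rear_lake", "rear_lake_area_only"),
                 ("W", "rear_wetland_polygon", "rear_wetland_area_only"))


def flag(cell: str) -> str:
    """Render a yes/no cell as a YAML boolean literal."""
    return "true" if cell == "yes" else "false"


def render_rules(cfg: dict) -> list:
    """Render rear rules as YAML-like lines (hypothetical layout)."""
    lines = ["rear:", "  - in_waterbody: " + flag(cfg["rear_stream_in_waterbody"])]
    for wb_type, on_col, area_col in POLYGON_RULES:
        if cfg[on_col] == "yes":  # polygon rule emitted only when enabled
            lines += ["  - waterbody_type: " + wb_type,
                      "    edge_types_explicit: [1000, 1100]",
                      "    area_only: " + flag(cfg[area_col])]
    return lines


def rules_diff(a: str, b: str) -> list:
    """Unified diff of the rendered rules for two named configs."""
    return list(difflib.unified_diff(render_rules(CONFIGS[a]),
                                     render_rules(CONFIGS[b]),
                                     fromfile=a, tofile=b, lineterm=""))


print("\n".join(rules_diff("use_case_1", "use_case_2")))
```

Because each changed CSV cell maps to one changed rule line, the diff doubles as a check that nothing else moved between configs.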
Implementation order
- fresh#180 / PR #181: in_waterbody predicate. Must land first.
- link#69 phase 1: spawn_stream_in_waterbody + rear_stream_in_waterbody columns + emission. Polygon-rule edge_types filter to [1000, 1100]. Two-bundle defaults set:
  - bcfishpass bundle: no everywhere (matches bcfp's strict partition).
  - default bundle: yes everywhere for rear_* (today's permissive behaviour for linear); no for spawn_* (biology: spawning happens in stream channels).
- fresh#182: area_only predicate.
- link#69 phase 2: rear_lake_area_only + rear_wetland_area_only columns + emission. Defaults set per bundle.
- link#69 phase 3: proof artifact (research doc + three-config rollup matrix).
Test plan
- Unit test: regenerated rules.yaml for both bundles carries in_waterbody on every stream-edge rule block per the column values.
- Unit test: regenerated rules.yaml carries edge_types_explicit: [1000, 1100] on every L/W polygon rule.
- Unit test: regenerated rules.yaml carries area_only per the column values.
- Pipeline test: BABL × CO under all three configs (use case 1, use case 2, bcfishpass). Rollup numbers match expected direction (case 1 ≥ case 2 ≥ bcfishpass on linear rearing_km; lake/wetland_ha equal between cases 1 and 2; zero in bcfishpass per rear_lake: no / rear_wetland_polygon: no).
- Reproducibility: two consecutive tar_make() runs produce bit-identical rollups under each config.
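The expected-direction check in the pipeline test could be expressed roughly as follows. The helper name and the rollup dictionary layout are hypothetical, and the numbers in the example are placeholders, not real BABL × CO results.

```python
# Hypothetical checker for the expected-direction assertions in the test plan.
AREA_COLS = ("lake_rearing_ha", "wetland_rearing_ha")


def check_direction(rollups: dict) -> list:
    """Return a list of violated expectations (empty list means pass)."""
    c1 = rollups["use_case_1"]
    c2 = rollups["use_case_2"]
    bcfp = rollups["bcfishpass"]
    problems = []
    # Linear: case 1 >= case 2 >= bcfishpass.
    if not (c1["rearing_km"] >= c2["rearing_km"] >= bcfp["rearing_km"]):
        problems.append("rearing_km is not ordered case 1 >= case 2 >= bcfishpass")
    for col in AREA_COLS:
        # Area: identical between cases 1 and 2, zero under bcfishpass.
        if c1[col] != c2[col]:
            problems.append(col + " differs between use case 1 and use case 2")
        if bcfp[col] != 0:
            problems.append(col + " is non-zero under bcfishpass")
    return problems


# Placeholder numbers for illustration only.
example = {
    "use_case_1": {"rearing_km": 120.0, "lake_rearing_ha": 300.0, "wetland_rearing_ha": 50.0},
    "use_case_2": {"rearing_km": 100.0, "lake_rearing_ha": 300.0, "wetland_rearing_ha": 50.0},
    "bcfishpass": {"rearing_km": 100.0, "lake_rearing_ha": 0.0, "wetland_rearing_ha": 0.0},
}
print(check_direction(example))
```

Returning the list of violations rather than a single boolean makes a failing run easier to diagnose.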
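Coordinates with
- fresh#180: in_waterbody predicate (must land first for phase 1).
- fresh#182: area_only predicate (must land before phase 2).
Relates to NewGraphEnvironment/sred-2025-2026#24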