Dimensions-driven in_waterbody + area_only emission with polygon-edge filter and proof matrix #69

@NewGraphEnvironment

Updated scope (2026-04-28)

Original scope (single hardcoded in_waterbody: false on stream rules) replaced with a fully dimensions-driven approach. The new shape covers two related but independent methodology questions: which segments contribute to LINEAR habitat km, and which contribute to AREA-based polygon rollups.

Problem (refined)

Today's stream-edge rules silently overlap with polygon rules — mainlines threading through a polygon are matched by both. Adjacent issue: today's polygon rules (waterbody_type: L/W) tie linear rearing_km contribution to area-rollup lake_rearing_ha/wetland_rearing_ha contribution. The two are coupled, so a user can't say "count the lake area but exclude the polygon-mainline from linear."

These are independent dials and the package should expose them independently. Every model decision should be readable from dimensions.csv — no buried emission rules.

Proposed Solution

Two new per-species columns in dimensions.csv

| Column | Values | Effect |
| --- | --- | --- |
| spawn_stream_in_waterbody | yes / no | Emit in_waterbody: <value> on the stream-spawn rule. no excludes polygon-mainlines from spawn classification. |
| rear_stream_in_waterbody | yes / no | Same shape on the stream-rear rule. |

Driven by fresh#180 — predicate must land in fresh first.
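
A minimal sketch of how the two columns above could drive emission. The column names come from this proposal; the species rows, the `stream_rules` helper, and the dict layout standing in for a rules.yaml block are all hypothetical. Python is used only for illustration and is not the package's own language.

```python
import csv
import io

# Hypothetical dimensions.csv excerpt. Column names are from the proposal;
# the rows and their values are illustrative only.
DIMENSIONS_CSV = """species,spawn_stream_in_waterbody,rear_stream_in_waterbody
BT,no,yes
CO,no,no
"""


def stream_rules(row):
    """Build stream-spawn and stream-rear rule dicts for one species row.

    The returned layout is an assumed stand-in for a rules.yaml block,
    not the package's actual schema.
    """
    def flag(value):
        # Reject anything other than the two documented cell values.
        if value not in ("yes", "no"):
            raise ValueError(f"expected yes/no, got {value!r}")
        return value == "yes"

    return {
        "spawn": {"in_waterbody": flag(row["spawn_stream_in_waterbody"])},
        "rear": {"in_waterbody": flag(row["rear_stream_in_waterbody"])},
    }


rules = {
    row["species"]: stream_rules(row)
    for row in csv.DictReader(io.StringIO(DIMENSIONS_CSV))
}
print(rules["BT"])
```

Because each emitted value is read straight from a CSV cell, changing the model for a species is a one-cell edit, which is the property the proposal is after.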

Add edge_types_explicit: [1000, 1100] to L/W polygon-rule emission

Today's waterbody_type: L / waterbody_type: W rules have no edge_types filter, so they match every segment in the polygon (shorelines 1500/1700, banks 1800/1850, island edges, etc.) and credit them all to linear rearing. Filtering to mainlines only:

  • Linear rollup includes only mainlines through the polygon (shorelines/banks excluded).
  • Area rollup unchanged — fresh's bucket predicate sums polygon area where any segment carries the bucket flag, so as long as a polygon contains at least one tagged mainline it counts.
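
The two bullets above can be illustrated with a toy rollup. This is a sketch under stated assumptions: the segment records, lengths, and polygon area are made up, and the `rollup` function only mimics the described behaviour (linear sums matched segments; area sums polygons that contain at least one matched segment). It is not fresh's implementation.

```python
# Edge types kept by the proposed edge_types_explicit filter (mainlines).
MAINLINE_EDGE_TYPES = {1000, 1100}

# Illustrative segments inside one lake polygon; all numbers are invented.
segments = [
    {"polygon_id": 1, "edge_type": 1000, "length_km": 2.0},  # mainline
    {"polygon_id": 1, "edge_type": 1500, "length_km": 5.0},  # shoreline
    {"polygon_id": 1, "edge_type": 1800, "length_km": 1.5},  # bank
]
polygon_area_ha = {1: 40.0}


def rollup(segments, edge_types=None):
    """Return (linear_km, area_ha) for segments matched by the polygon rule.

    edge_types=None models today's rule (no filter); a set models the
    proposed mainline-only filter.
    """
    matched = [
        s for s in segments
        if edge_types is None or s["edge_type"] in edge_types
    ]
    linear_km = sum(s["length_km"] for s in matched)
    # A polygon counts toward area as soon as any matched segment lies in it.
    tagged_polygons = {s["polygon_id"] for s in matched}
    area_ha = sum(polygon_area_ha[p] for p in tagged_polygons)
    return linear_km, area_ha


unfiltered = rollup(segments)
filtered = rollup(segments, MAINLINE_EDGE_TYPES)
print(unfiltered, filtered)
```

In this toy case the filter drops the shoreline and bank lengths from the linear total while the polygon area is unchanged, because the polygon still contains a tagged mainline.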

Per-species area_only columns (depends on fresh#182)

| Column | Values | Effect |
| --- | --- | --- |
| rear_lake_area_only | yes / no | Emit area_only: true on the L polygon rule. When yes, fresh derives the lake_rearing bucket flag from the rule but excludes it from main rear predicate. Lake area still rolls up; mainlines through lakes don't count in linear via this rule. |
| rear_wetland_area_only | yes / no | Same shape for wetlands. |

These are the dials that make the "use case 2" model expressible (linear strict, area generous).
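
One way to read the area_only semantics, sketched for a single mainline segment running through a lake. The `classify_segment` function and its arguments are hypothetical; the logic is an assumed interpretation of the description above (bucket flag always derived, main rear predicate skipped when area_only is set), not the behaviour of fresh#182 itself.

```python
def classify_segment(in_lake_polygon, matches_stream_rear_rule, rear_lake_area_only):
    """Classify one mainline segment under the lake (L) polygon rule.

    Assumed semantics:
      * lake_rearing (bucket flag) is set whenever the L rule matches.
      * rearing (main predicate) ignores the L rule when area_only is set,
        so the segment only counts in linear if another rear rule matches.
    """
    lake_rule_matches = in_lake_polygon
    lake_rearing = lake_rule_matches
    rearing = matches_stream_rear_rule or (
        lake_rule_matches and not rear_lake_area_only
    )
    return {"rearing": rearing, "lake_rearing": lake_rearing}


# rear_stream_in_waterbody = yes, rear_lake_area_only = no:
# the mainline counts in linear and the lake counts in area.
use_case_1 = classify_segment(True, True, False)

# rear_stream_in_waterbody = no, rear_lake_area_only = yes:
# the mainline is dropped from linear, the lake still counts in area.
use_case_2 = classify_segment(True, False, True)

print(use_case_1, use_case_2)
```

The second call shows why both dials are needed: with the stream rule excluded from waterbodies, the L rule would otherwise pull the mainline back into linear rearing.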

Use cases this expresses

Each bundle's dimensions.csv cells determine which use case applies per species. Two use cases, followed by the bcfishpass bundle for comparison:

Use case 1 — linear includes mainlines + area rollups:

species, rear_stream_in_waterbody, rear_lake, rear_lake_area_only, rear_wetland_polygon, rear_wetland_area_only
BT,      yes,                      yes,       no,                  yes,                  no

Use case 2 — linear excludes mainlines + area rollups:

species, rear_stream_in_waterbody, rear_lake, rear_lake_area_only, rear_wetland_polygon, rear_wetland_area_only
BT,      no,                       yes,       yes,                 yes,                  yes

bcfishpass bundle (strict partition, no polygon-area rollup):

species, rear_stream_in_waterbody, rear_lake, rear_lake_area_only, rear_wetland_polygon, rear_wetland_area_only
all,     no,                       no,        n/a,                 no,                   n/a

Proof artifact

Add a research doc / vignette section: research/rule_flexibility.md (or extend bcfishpass_comparison.md). The doc:

  1. Walks through the matrix above with a small worked example (one species, one WSG — DEAD or BABL).
  2. Runs the same pipeline three times (use case 1, use case 2, bcfishpass) by swapping just the dimensions.csv cells listed above.
  3. Tabulates the rollup output for each: rearing_km, lake_rearing_ha, wetland_rearing_ha.
  4. Shows the rules.yaml diff for one species across the three configs to demonstrate the dimensions.csv → rules.yaml propagation is mechanical and visible.

The intent: produce a single page where a reader can see the flexibility, compare to bcfishpass's approach (where the same logic is buried in per-species access SQL templates that are hard to diff), and validate that any future model variant is a CSV edit, not a code change.
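
Step 4 of the list above (the rules diff across configs) could be prototyped roughly as follows. The `render_rules` layout is a hypothetical simplification of a rules.yaml block, and the yes/no to true/false mapping is an assumption; only `difflib.unified_diff` is a real standard-library call.

```python
import difflib

# Cell values for BT taken from the use-case tables in this issue.
USE_CASE_1 = {"rear_stream_in_waterbody": "yes", "rear_lake": "yes",
              "rear_lake_area_only": "no", "rear_wetland_polygon": "yes",
              "rear_wetland_area_only": "no"}
USE_CASE_2 = {"rear_stream_in_waterbody": "no", "rear_lake": "yes",
              "rear_lake_area_only": "yes", "rear_wetland_polygon": "yes",
              "rear_wetland_area_only": "yes"}


def as_bool_text(cell):
    """Map a yes/no cell to the text emitted in the toy rules block."""
    return "true" if cell == "yes" else "false"


def render_rules(species, cells):
    """Render a toy rear-rule block as text lines (assumed layout)."""
    lines = [f"{species}:", "  rear:",
             f"    - in_waterbody: {as_bool_text(cells['rear_stream_in_waterbody'])}"]
    for column, area_only_column, waterbody_type in (
        ("rear_lake", "rear_lake_area_only", "L"),
        ("rear_wetland_polygon", "rear_wetland_area_only", "W"),
    ):
        if cells[column] == "yes":
            lines += [f"    - waterbody_type: {waterbody_type}",
                      "      edge_types_explicit: [1000, 1100]",
                      f"      area_only: {as_bool_text(cells[area_only_column])}"]
    return lines


diff = list(difflib.unified_diff(
    render_rules("BT", USE_CASE_1), render_rules("BT", USE_CASE_2),
    fromfile="use_case_1", tofile="use_case_2", lineterm="",
))
print("\n".join(diff))
```

Only the lines tied to edited cells appear in the diff, which is the "mechanical and visible" propagation the research doc is meant to demonstrate.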

Implementation order

  1. fresh#180 / PR #181 — in_waterbody predicate. Must land first.
  2. link#69 phase 1: spawn_stream_in_waterbody + rear_stream_in_waterbody columns + emission. Polygon-rule edge_types filter to [1000, 1100]. Two-bundle defaults set:
    • bcfishpass bundle: no everywhere (matches bcfp's strict partition).
    • default bundle: yes everywhere for rear_* (today's permissive behaviour for linear); no for spawn_* (biology — spawning happens in stream channels).
  3. fresh#182 — area_only predicate.
  4. link#69 phase 2: rear_lake_area_only + rear_wetland_area_only columns + emission. Defaults set per bundle.
  5. link#69 phase 3 — proof artifact (research doc + three-config rollup matrix).

Test plan

  • Unit test: regenerated rules.yaml for both bundles carries in_waterbody on every stream-edge rule block per the column values.
  • Unit test: regenerated rules.yaml carries edge_types_explicit: [1000, 1100] on every L/W polygon rule.
  • Unit test: regenerated rules.yaml carries area_only per the column values.
  • Pipeline test: BABL × CO under all three configs (use case 1, use case 2, bcfishpass). Rollup numbers match the expected direction: case 1 ≥ case 2 ≥ bcfishpass on linear rearing_km; lake_rearing_ha and wetland_rearing_ha equal between cases 1 and 2, and zero in bcfishpass per rear_lake: no / rear_wetland_polygon: no.
  • Reproducibility: two consecutive tar_make() runs produce bit-identical rollups under each config.
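
The pipeline and reproducibility checks above might look something like this in outline. Everything here is hypothetical: the `check_direction` and `rollup_digest` helpers are invented for illustration, and the rollup numbers are placeholders chosen only to satisfy the expected ordering, not measured results. In the real pipeline the digest would presumably be taken over the rollup output of each tar_make() run.

```python
import hashlib
import json

# Placeholder rollups, NOT real results. Shapes follow the metrics named
# in the proof artifact: rearing_km, lake_rearing_ha, wetland_rearing_ha.
rollups = {
    "use_case_1": {"rearing_km": 120.0, "lake_rearing_ha": 300.0, "wetland_rearing_ha": 80.0},
    "use_case_2": {"rearing_km": 95.0, "lake_rearing_ha": 300.0, "wetland_rearing_ha": 80.0},
    "bcfishpass": {"rearing_km": 90.0, "lake_rearing_ha": 0.0, "wetland_rearing_ha": 0.0},
}


def check_direction(rollups):
    """Return True if the three configs are ordered as the test plan expects."""
    c1, c2, bcfp = (rollups[k] for k in ("use_case_1", "use_case_2", "bcfishpass"))
    linear_ok = c1["rearing_km"] >= c2["rearing_km"] >= bcfp["rearing_km"]
    area_keys = ("lake_rearing_ha", "wetland_rearing_ha")
    area_equal = all(c1[k] == c2[k] for k in area_keys)
    bcfp_zero = all(bcfp[k] == 0 for k in area_keys)
    return linear_ok and area_equal and bcfp_zero


def rollup_digest(rollup):
    """Hash a rollup deterministically so two runs can be compared exactly."""
    payload = json.dumps(rollup, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


direction_ok = check_direction(rollups)
# Stand-in for two consecutive runs of the same config.
first_run = rollup_digest(rollups["use_case_2"])
second_run = rollup_digest(dict(rollups["use_case_2"]))
print(direction_ok, first_run == second_run)
```

Comparing digests rather than rounded numbers keeps the reproducibility check strict, in line with the bit-identical requirement.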

Coordinates with

Relates to NewGraphEnvironment/sred-2025-2026#24
