Skip to content

Wire CABD waterfalls + 4 edit CSVs into the falls source #102

@NewGraphEnvironment

Description

@NewGraphEnvironment

Problem

link's falls source is fresh::system.file("extdata", "falls.csv", package = "fresh") — a static 3,295-row CSV. bcfp pulls falls from cabd.waterfalls + applies four edit CSVs. The static fresh extract is missing falls that exist in CABD, including known dam-coincident falls in WSGs we'd expect to compare:

Site WSG In fresh falls.csv? In bcfp output (via CABD)?
Elk Falls CAMB yes (354154635 / 7662) yes
Stave Falls LFRA no yes
Alouette Falls LFRA no yes
Ladore Falls (Vancouver Island) no yes
Strathcona Dam falls (Vancouver Island) no yes
John Hart falls (Vancouver Island) no yes
Coquitlam / Seymour / Jordan / Capilano LFRA / nearby no yes

Where these falls exist in bcfp's output, link's habitat models classify habitat above the natural fall as accessible — over-crediting habitat that bcfp correctly excludes. This is the actual mechanism that bites the bcfp-bundle parity rollup on Vancouver Island and Lower Fraser. The "dams" framing was a red herring — bcfp's per-species habitat models are dam-blind (verified, see link#83 body); the natural falls at dam sites are what bcfp uses, via CABD.

bcfp's load_falls.sql, mapped

bcfp/model/01_access/sql/load_falls.sql — single transaction, four CABD-edit CSVs:

-- 1. Pull CABD waterfalls, exclude false positives, snap blkey via xref
with cabd as (
  select w.cabd_id as falls_id, blk.blue_line_key, st_transform(w.geom, 3005) as geom
  from cabd.waterfalls w
  left outer join bcfishpass.cabd_exclusions x on w.cabd_id = x.cabd_id        -- exclusions.csv
  left outer join bcfishpass.cabd_blkey_xref blk on w.cabd_id = blk.cabd_id    -- blkey_xref.csv
  where x.cabd_id is null
),

-- 2. Snap to nearest stream segment within 65 m, choose closest
matched as ( ...lateral join on fwa_stream_networks_sp... ),

-- 3. Apply passability override; convert passability_status_code = 1 → barrier_ind = TRUE
cabd_pts as (
  select n.*, cabd.fall_name_en as falls_name, cabd.fall_height_m as height_m,
         case when coalesce(u.passability_status_code, cabd.passability_status_code) = 1
              then true else false end as barrier_ind
  from matched n
  inner join cabd.waterfalls cabd on n.falls_id = cabd.cabd_id
  left outer join bcfishpass.cabd_passability_status_updates u on n.falls_id = u.cabd_id  -- passability_updates.csv
)

insert into bcfishpass.falls (...)
  select * from cabd_pts
  union all
  -- 4. Add hand-curated additions (additions.csv where feature_type = 'waterfalls')
  select ... from bcfishpass.cabd_additions p where p.feature_type = 'waterfalls';

CABD edit CSVs — what each one does + current row counts

All four edit CSVs are shared between dams and falls — they key on cabd_id which spans both cabd.waterfalls and cabd.dams. They're used by load_dams.sql and load_falls.sql symmetrically.

CSV Function Rows Falls relevance
cabd_exclusions.csv Drop specific cabd_ids (false positives in upstream CABD) 12 Likely some falls in this set
cabd_passability_status_updates.csv Override passability_status_code from CABD 12 Likely some falls
cabd_blkey_xref.csv Override the blkey snap when default lateral snap picks wrong stream 1 Tiny
cabd_additions.csv Add features not in CABD; filtered by feature_type (dams / waterfalls) 4 dams (US placeholders only); 0 waterfalls Zero impact on falls

None are deprecated; all four are actively wired in bcfp's 01_access pipeline. cabd_additions for waterfalls is currently empty, so for the falls path the edit layer is just exclusions + passability + (one) blkey.

The big lift is the cabd.waterfalls source pull — likely tens of thousands of rows in the upstream CABD database — not the ~30 edit rows on top.

Proposed approach

Three components, each gated:

  1. Source pull: a script that reads cabd.waterfalls (DB join via the tunnel? or a download from CABD's public release? — needs decision based on what NGE's licensing path looks like) and writes to <schema>.falls_cabd mirroring bcfp's intermediate.

  2. Edit application: ingest the four CSVs into <schema>.cabd_exclusions, cabd_blkey_xref, cabd_passability_status_updates, cabd_additions via lnk_load_overrides. Apply them in the same SQL pattern as load_falls.sql to produce the final <schema>.falls.

  3. Cutover: drop the static fresh::falls.csv consumption in link's pipeline. prep_load_aux reads from the DB-built <schema>.falls instead.

The 4 edit CSVs go into inst/extdata/configs/{bcfishpass,default}/overrides/, sourced from bcfp upstream — same redistribution pattern as user_barriers_definite_control.csv and the other bcfp-sourced files. Bundle config declares them in files: for lnk_load_overrides to pick up.

Verification — find the bite, prove the fix

Before closing this issue we want to demonstrate the fix on a specific case. Pre-flight WSG candidates (multiple known dam-coincident natural falls):

  • CAMP (Campbell River system, Vancouver Island) — Elk Falls / John Hart / Strathcona Dam area
  • NTOL / SQAM (Vancouver Island other)
  • LFRA — Stave Falls, Alouette Falls, Ruskin

Verification steps:

  1. Pre-fix snapshot: query bcfp tunnel for bcfishpass.falls rows in CAMP (or whichever), get the named-falls inventory.
  2. Confirm gap on link: query link's pre-fix <schema>.falls for the same WSG, list which named falls are absent.
  3. Apply fix: wire CABD source pull + edit layer per above.
  4. Post-fix verify: confirm the previously-absent named falls now appear in <schema>.falls with matching (blue_line_key, downstream_route_measure) to bcfp's coordinates (within rounding tolerance).
  5. Rollup confirm: re-run the WSG comparison, confirm the parity rollup tightens — segments now correctly inaccessible above each newly-added fall.

Won't close the issue until #5 is logged.

Out of scope

  • Dams data (tracked separately — see link#? CABD dams ingestion, this PR's sibling)
  • CABD release version pin / refresh cadence — open question, file follow-up after this lands
  • frs_falls_* function family in fresh (if the source-pull SQL ends up generic enough to belong there) — defer naming until prototype works on link side first

Reference

  • bcfp model/01_access/sql/load_falls.sql — implementation reference
  • bcfp data/cabd_*.csv — the 4 edit CSVs
  • research/bcfishpass_comparison.md § "2026-04-29 — parity claim retracted" — names this gap
  • link#83 — per-species dam methodology (sibling concern, separate scope)

Relates to NewGraphEnvironment/sred-2025-2026#24

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions