data-raw: manual snapshot of bcfp dependencies into local fresh schema #137

@NewGraphEnvironment

Plan (revised 2026-05-07)

Snapshot bcfp dependencies into local fwapg fresh.* schema so Surface 2 mapping_code (lnk_pipeline_access / lnk_pipeline_mapping_code) works against local data without tunnel reads at runtime.

Inputs (no tunnel pg_dump needed)

  • bcfishpass.* views — pull from public S3:
    • aws s3 cp s3://newgraph/bcfishpass.crossings_vw.fgb.zip → ogr2ogr → fresh.crossings
    • aws s3 cp s3://newgraph/bcfishpass.streams_vw.fgb.zip → ogr2ogr → fresh.streams_bcfp (if needed for parity comparison)
    • aws s3 cp s3://newgraph/bcfishobs.fiss_fish_obsrvtn_events_vw.fgb.zip → ogr2ogr → fresh.fiss_fish_obsrvtn_events
    • All publicly readable — no AWS auth required.
  • Modelled crossings: curl https://nrs.objectstore.gov.bc.ca/bchamp/modelled_stream_crossings.gpkg.zip + ogr2ogr → fresh.modelled_stream_crossings (mirrors bcfishpass/jobs/load_modelled_stream_crossings).
  • Override CSVs at the rebuild SHA, pulled from s3://fresh-bc/bcfishpass/csvs/ (populated by NewGraphEnvironment/db_newgraph#4). Issue #117 (csv-sync: switch to weekly cadence SHA-pinned to tunnel rebuild for comparison stability) wires this for the link bundle.
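
The fetch-and-load steps listed above could be sketched as follows. This is a dry-run illustration under stated assumptions, not the contents of snapshot_bcfp_dependencies.sh: the function name snapshot_cmds, the PG_DSN default, and the idea that each zip unpacks to a single <view>.fgb file are all hypothetical. It only prints the commands so they can be reviewed before anything touches the database.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for one public-S3 view snapshot.
# Assumptions (not from the issue): the PG_DSN default below, and that
# each <view>.fgb.zip unpacks to a single <view>.fgb file.

S3_BUCKET="s3://newgraph"
PG_DSN="${PG_DSN:-PG:dbname=fwapg}"

# snapshot_cmds <source view> <target table>
snapshot_cmds() {
  view="$1"
  table="$2"
  # --no-sign-request: the bucket is publicly readable, so skip AWS auth.
  echo "aws s3 cp --no-sign-request ${S3_BUCKET}/${view}.fgb.zip ."
  echo "unzip -o ${view}.fgb.zip"
  # -nln sets the target table name; -overwrite replaces a previous snapshot.
  echo "ogr2ogr -f PostgreSQL \"${PG_DSN}\" ${view}.fgb -nln ${table} -overwrite"
}

snapshot_cmds bcfishpass.crossings_vw fresh.crossings
snapshot_cmds bcfishobs.fiss_fish_obsrvtn_events_vw fresh.fiss_fish_obsrvtn_events
```

Printing rather than executing keeps the sketch safe to run anywhere; piping the output to sh would perform the actual download and load, assuming aws, unzip, and ogr2ogr are installed.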

Habitat: use link's per-species tables, not bcfp's wide rollup

lnk_pipeline_mapping_code currently takes bcfp's streams_habitat_linear shape (numeric km columns). Adapt to consume link's per-species <bundle>.streams_habitat_<sp> tables (booleans):

  • Caller-side pivot helper that joins per-species tables into the expected wide shape, OR
  • Function refactor to accept habitat_per_sp = list(bt = "...streams_habitat_bt", ...).
  • Either way, coerce booleans → 0/1 numeric so existing > 0 / < 1 CASE conditions work without rewrite.

No streams_habitat_linear dependency on tunnel.
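
A caller-side pivot along these lines might generate the wide shape with plain SQL. The sketch below only emits the query text. The join key id_segment, the boolean column names spawning and rearing, and the schema name my_bundle are hypothetical placeholders, since the issue does not specify the per-species table layout. It also assumes the first species table has one row per stream segment.

```shell
#!/bin/sh
# Sketch: emit SQL that pivots per-species boolean habitat tables into a
# wide 0/1 numeric shape. Hypothetical names: the id_segment join key and
# the boolean columns "spawning" / "rearing" in each per-species table.

# habitat_wide_sql <bundle schema> <species code>...
habitat_wide_sql() {
  bundle="$1"
  shift
  base="$1"   # first species anchors the FROM clause
  cols=""
  joins=""
  for sp in "$@"; do
    # boolean -> integer -> numeric; a missing row (NULL) counts as 0.
    cols="${cols},
  COALESCE(h_${sp}.spawning, false)::int::numeric AS spawning_${sp},
  COALESCE(h_${sp}.rearing, false)::int::numeric AS rearing_${sp}"
    if [ "$sp" != "$base" ]; then
      joins="${joins}
LEFT JOIN ${bundle}.streams_habitat_${sp} AS h_${sp} USING (id_segment)"
    fi
  done
  printf 'SELECT h_%s.id_segment%s\nFROM %s.streams_habitat_%s AS h_%s%s;\n' \
    "$base" "$cols" "$bundle" "$base" "$base" "$joins"
}

habitat_wide_sql my_bundle bt ch
```

Because every habitat column comes out as 0 or 1, comparisons such as > 0 and < 1 behave the same as they would against non-zero and zero km values.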

Run

  • New data-raw/snapshot_bcfp_dependencies.sh — manual run on demand by anyone with local fwapg write access. ~5 min end-to-end (no SSH, no pg_dump).
  • Reads from public S3 + bchamp objectstore.

SHA pinning

Automation

Weekly drift-monitor pattern, same shape as #117 (auto-merge clean / halt on shape drift):

Wed cron (after db_newgraph#4 populates s3) → pull view shapes from s3://newgraph/*.fgb.zip
                                            → pull bchamp gpkg layer schema
                                            → crate schema-validate against committed fingerprints
                                                ↓
                       baseline-row append only        → auto-merge PR
                       fingerprint shift (column add/drop) → halt + review
                       no change                         → exit clean
  • PR'd artifacts:
    • Appended row in data-raw/logs/bcfp_baselines.csv (model_run_id, model_version, date_completed).
    • Updated schema fingerprints in inst/extdata/bcfp_view_schemas/<view>.json (column lists per fgb).
    • Optional sample fixtures in inst/testdata/ for unit tests.
  • crate gate: crt_schema_validate against the committed fingerprints. Catches when bcfp views or the bchamp gpkg add/rename/drop columns.
  • Auto-merge: baseline-row append only (proves the snapshot ran with stable shapes).
  • Halt + review: schema fingerprint shift — real work for link to track.
  • Snapshot side effect: workflow can also run snapshot_bcfp_dependencies.sh against a long-lived cypher schema so parity runs see fresh data. Either via tailscale-bridged GHA or a self-hosted runner.
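
The three outcomes in the flow above (halt, auto-merge, exit clean) can be illustrated with a small stand-in. This is not crt_schema_validate and makes no claim about crate's interface. It assumes fingerprints have been reduced to plain-text column lists (one name per line) and that model_run_id is the first field of bcfp_baselines.csv. The function name classify_run is hypothetical.

```shell
#!/bin/sh
# Sketch of the three-way outcome in the drift monitor. Stand-in for the
# crate gate, not its real interface. Assumptions: fingerprints are plain
# column lists (one name per line) and model_run_id is the first CSV field.

# classify_run <committed cols> <observed cols> <baselines csv> <model_run_id>
classify_run() {
  committed="$1"
  observed="$2"
  baselines="$3"
  run_id="$4"
  # Compare as sorted sets so column order alone does not trigger a halt.
  if [ "$(sort "$committed")" != "$(sort "$observed")" ]; then
    echo "halt"         # column add/drop/rename: needs review
  elif grep -q "^${run_id}," "$baselines"; then
    echo "no-change"    # this model run is already recorded
  else
    echo "auto-merge"   # stable shape, new baseline row to append
  fi
}

# Demo with throwaway files: same columns in a different order, new run id.
tmp="$(mktemp -d)"
printf 'id\ngeom\n' > "$tmp/committed.txt"
printf 'geom\nid\n' > "$tmp/observed.txt"
printf 'model_run_id,model_version,date_completed\n' > "$tmp/baselines.csv"
classify_run "$tmp/committed.txt" "$tmp/observed.txt" "$tmp/baselines.csv" 42  # prints "auto-merge"
rm -rf "$tmp"
```

Checking the schema first means a shape change always halts, even when the same run would otherwise have qualified for an automatic baseline append.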

Acceptance

  • data-raw/snapshot_bcfp_dependencies.sh runs end-to-end against local fwapg with no SSH/tunnel access.
  • lnk_pipeline_access(barrier_sources = list(anthropogenic = "fresh.barriers_anthropogenic", ...)) returns counts that match a tunnel-pointed run.
  • lnk_pipeline_mapping_code consumes link's per-species habitat tables; ADMS parity preserved.
  • README documents both manual-run path and automation/cadence.
  • Weekly workflow in .github/workflows/snapshot-bcfp-dependencies.yaml runs Wed, opens PR on drift, auto-merges baseline-only changes.

Cross-refs
