Add lnk_load_overrides(config) source-agnostic API consuming crate::crt_ingest() #65

## Resolution / approach update — 2026-04-29

The original proposal (preserved below) added a parallel `lnk_load_overrides()` alongside the existing `lnk_config()`. Inventorying the current code surfaced significant overlap — `lnk_config()` already reads every override CSV via `read.csv()` and exposes them as `cfg$overrides$X` / `cfg$habitat_classification` / `cfg$observation_exclusions` / `cfg$wsg_species`. Two functions reading the same files would create parallel APIs with the same job.

Updated approach: **decompose into manifest + materialized data, single PR, single bump (v0.18.0)**. link has zero external R-code consumers (verified — only `@seealso` doc references in fresh; rtj refs are archived planning), so no backwards-compat shim is needed.

### What `lnk_config()` becomes

Manifest-only. Returns paths, provenance, `cfg$files` entries with `{source, path, canonical_schema}`. **No data frames** in the result. Cheap to call. `lnk_config_verify()` and `lnk_stamp()` run without parsing CSVs.
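A rough illustration of the manifest-only shape, assuming the field names used in this issue (`files`, `source`, `path`, `canonical_schema`). The helper `has_data_frames()` is hypothetical and not part of link; it only shows how a manifest-shape test could assert that no parsed data rides along.

```r
# Hypothetical manifest: paths and declarations only, no parsed data.
cfg <- list(
  name = "default",
  files = list(
    user_habitat_classification = list(
      source = "bcfp",
      path = "overrides/user_habitat_classification.csv",
      canonical_schema = "bcfp/user_habitat_classification"
    ),
    user_barriers_definite = list(
      source = "bcfp",
      path = "overrides/user_barriers_definite.csv"
    )
  )
)

# Hypothetical helper: TRUE if any element, at any depth, is a data frame.
# The data-frame check comes first because a data frame is also a list.
has_data_frames <- function(x) {
  if (is.data.frame(x)) return(TRUE)
  if (!is.list(x)) return(FALSE)
  any(vapply(x, has_data_frames, logical(1)))
}

has_data_frames(cfg)
```

A test along these lines would fail as soon as a data frame is reattached anywhere under `cfg`.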

### What `lnk_load_overrides()` does

Exported. Takes a config object (or name/path), returns named list of canonical-shape tibbles. Routes registered entries through `crate::crt_ingest(source, file_name, path)` (today: `bcfp/user_habitat_classification`). Unregistered entries fall through to plain CSV read until crate adds their schemas (one issue per file, follow-up).
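The routing described above could look roughly like the sketch below. Assumptions, all hypothetical: the function name `load_overrides_sketch()`, treating the presence of `canonical_schema` as the "registered" signal, and the injected `ingest` argument that stands in for `crate::crt_ingest(source, file_name, path)` so the example runs without crate installed. It returns plain data frames rather than tibbles.

```r
# Sketch only: `ingest` is a stand-in for crate::crt_ingest().
load_overrides_sketch <- function(files, dir, ingest) {
  out <- lapply(names(files), function(nm) {
    entry <- files[[nm]]
    path <- file.path(dir, entry[["path"]])
    if (!file.exists(path)) stop("missing override file: ", path)
    if (!is.null(entry[["canonical_schema"]])) {
      # Registered entry (assumed signal): hand off to the dispatcher.
      ingest(entry[["source"]], nm, path)
    } else {
      # Unregistered entry: fall through to a plain CSV read.
      utils::read.csv(path, stringsAsFactors = FALSE)
    }
  })
  stats::setNames(out, names(files))
}

# Demo with two throwaway CSVs and a fake dispatcher that tags its output.
dir <- tempfile("cfg")
dir.create(dir)
utils::write.csv(data.frame(id = 1:2), file.path(dir, "a.csv"), row.names = FALSE)
utils::write.csv(data.frame(id = 3L), file.path(dir, "b.csv"), row.names = FALSE)

files <- list(
  a = list(source = "bcfp", path = "a.csv", canonical_schema = "bcfp/a"),
  b = list(source = "bcfp", path = "b.csv")
)
fake_ingest <- function(source, file_name, path) {
  df <- utils::read.csv(path, stringsAsFactors = FALSE)
  attr(df, "ingested_by") <- paste(source, file_name, sep = "/")
  df
}

loaded <- load_overrides_sketch(files, dir, fake_ingest)
names(loaded)
```

The source name only ever arrives as data from the entry, which is the property the acceptance criteria ask for.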

### Config schema (per-entry source/schema declarations)

```yaml
# inst/extdata/configs/default/config.yaml
name: default
files:
  rules_yaml:
    path: rules.yaml
  dimensions_csv:
    path: dimensions.csv
  parameters_fresh:
    path: parameters_fresh.csv
  user_habitat_classification:
    source: bcfp                                  # crate-registered
    path: overrides/user_habitat_classification.csv
    canonical_schema: bcfp/user_habitat_classification
  user_barriers_definite:
    source: bcfp                                  # not yet in crate registry — falls through to plain CSV
    path: overrides/user_barriers_definite.csv
  # ... rest of overrides
extends: null   # supported in resolver; bundled configs don't use it
provenance:
  # unchanged from current schema — already byte/shape checksums per file
```

Pipeline knobs (`break_order`, `cluster`, `spawn_connected`, `apply_habitat_overlay`) stay where they are.

### Pipeline phase migration

Each `lnk_pipeline_*` phase that reads `cfg$overrides$X` or `cfg$habitat_classification` becomes a phase that takes a `loaded` object alongside `cfg`. That touches ~25 reference points across 8 files (`lnk_pipeline_{load,break,classify,connect,prepare,species}`, `lnk_stamp`, `lnk_config_verify`), plus tests, targets, and the vignette.
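A before/after sketch of the signature change. The phase names `phase_before()` / `phase_after()` and the key `loaded$user_habitat_classification` are assumptions for illustration; the real phases do more than count rows.

```r
# Before: the phase pulls parsed data out of the config object.
phase_before <- function(cfg) {
  nrow(cfg$habitat_classification)
}

# After: cfg is a manifest; the data arrives through `loaded`.
phase_after <- function(cfg, loaded) {
  nrow(loaded$user_habitat_classification)
}

# Same data, different access point.
hab <- data.frame(id = 1:3, species = c("x", "y", "z"))
old_cfg <- list(habitat_classification = hab)
new_cfg <- list(files = list())
loaded <- list(user_habitat_classification = hab)

identical(phase_before(old_cfg), phase_after(new_cfg, loaded))
```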

### Why not staged across two PRs

link is its own only consumer. A backwards-compat shim during transition would be dead code on arrival — written and removed in the same week. CLAUDE.md guidance: don't write shims when you can just change the code.

### Safety bar

- Pre-flight on a single WSG (~100s) before the full `tar_make()`
- Full `tar_make()` on 5 WSGs × 2 configs (~20 min) — bit-identical rollup vs pre-refactor baseline is the merge gate
- `lnk_config_verify()` + manifest-shape tests catch config-load failures
- Pipeline phase signature change is mechanical — same data, different access point
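One way the bit-identical gate could be checked, assuming the pre-refactor rollup was saved as an RDS file (this issue does not say how the baseline is stored). `rollup_matches()` and the toy rollup columns are hypothetical.

```r
# Hypothetical merge-gate check: exact comparison against a saved baseline.
rollup_matches <- function(baseline_rds, current) {
  identical(readRDS(baseline_rds), current)
}

# Toy baseline standing in for the pre-refactor rollup.
baseline_path <- tempfile(fileext = ".rds")
rollup <- data.frame(wsg = c("A", "B"), km = c(1.5, 2.25))
saveRDS(rollup, baseline_path)

same <- rollup_matches(baseline_path, rollup)

# A change far below all.equal()'s default tolerance still fails the gate.
changed <- rollup
changed$km[2] <- 2.25 + 1e-12
drifted <- rollup_matches(baseline_path, changed)

c(same = same, drifted = drifted)
```

`identical()` is used rather than `all.equal()` because the latter applies a numeric tolerance, which is looser than "bit-identical".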

### Acceptance criteria (revised)

- [ ] `DESCRIPTION` declares `crate (>= 0.0.1)` in `Imports` (plus a `Remotes` entry if crate is not yet on a registry)
- [ ] `lnk_config()` returns manifest-only object — no data frames in `cfg$overrides$X`, `cfg$habitat_classification`, etc.
- [ ] `lnk_load_overrides(cfg)` exported, returns named list of canonical tibbles
- [ ] `user_habitat_classification` routes through `crt_ingest("bcfp", "user_habitat_classification", path)` and shields callers from upstream long↔wide pivots
- [ ] Other registered files (when crate adds them) plug in by config edit alone — no link R code change
- [ ] Config schema supports `extends:` (project configs inherit defaults) and per-entry overrides
- [ ] All `lnk_pipeline_*` phases take `loaded` alongside `cfg` and use it
- [ ] `lnk_config_verify()` + `lnk_stamp()` work without parsing data CSVs
- [ ] Tests cover: manifest loads correctly without data; `lnk_load_overrides()` dispatches via crate; project config with `extends` + overrides; missing files and mis-shaped inputs fail loudly
- [ ] No source name (`bcfp`, `nge`, `local`) appears in link's R code — only in YAML configs
- [ ] `tar_make()` rollup bit-identical vs pre-refactor baseline on all 5 WSGs

### Out of scope (defer to follow-up issues)

- Crate schemas for the other ~9 bcfp-sourced CSVs (one issue per file as canonical-shape decisions concretize). They fall through to plain CSV read until then.
- `nge` and `local` source families (when project-experimental configs surface real need)
- Type-aware variant matching (crate v0.1.x roadmap)
- Lazy / per-WSG loading (`crossings.csv` is parsed twice today across 2 configs — possible follow-up since the manifest/data split makes it trivial)

---

## Original proposal (preserved for audit trail)

## Problem

When smnorris reshapes a bcfishpass override CSV (long→wide, type changes, column renames), link's processing code is shape-fragile — every consumer of the file knows the upstream shape directly. Yesterday's `user_habitat_classification.csv` reshape rippled into fresh's API (fresh#176, #177) and threatens cached report output. Beyond that, **link's API today couples to bcfp by hardcoding paths/shapes** — but link is meant to be source-agnostic. We want to add new data types (e.g. NGE-curated `user_habitat_known`) and swap experimental files (e.g. project-local `user_barriers_definite`) without changing link's code.

## Proposed Solution

Adopt a **config-driven, source-agnostic API** in link that delegates per-file ingest to crate's source-explicit dispatcher (`crate::crt_ingest(source, file_name, path)`). link's code never names a source; the source comes from config metadata.

### Public API

```r
lnk_load_overrides(config = "default")
# config: name of bundled config OR path to a config YAML file
# Returns: named list of canonical-shape tibbles, one per file in the resolved config
```

**Return contract:** named list of canonical-shape tibbles (one per file in resolved config). Caller decides whether to write to a DB. `lnk_load_overrides()` is pure-R-side, testable without a DB connection. No DB writes happen inside this function.

### Bundled config schema

```yaml
# inst/extdata/configs/default/config.yaml
name: default
files:
  user_barriers_definite:
    source: bcfp
    path_relative: overrides/user_barriers_definite.csv     # bundled, relative to config dir
    canonical_schema: bcfp/user_barriers_definite           # crate schema this file conforms to
  user_habitat_classification:
    source: bcfp
    path_relative: overrides/user_habitat_classification.csv
    canonical_schema: bcfp/user_habitat_classification
  # ... other bcfp-sourced files
```

### Project / experimental config example

A project repo (e.g. `restoration_wedzin_kwa_2024`) supplies its own config. Supports `extends:` to inherit the default, and `overrides:` to swap or add entries:

```yaml
# wedzin_kwa_2024/configs/experimental.yaml
name: experimental_wedzin_kwa
extends: default                                            # inherit all bcfp entries from link's default
overrides:
  user_barriers_definite:
    source: local                                           # OUR experimental table
    path: data/wedzin_kwa/barriers_definite_v3.csv          # absolute or relative to config file
    canonical_schema: bcfp/user_barriers_definite           # validate against same canonical shape
  user_habitat_known:                                       # NEW logical entry, doesn't exist in default
    source: nge                                             # NGE-curated source family
    path: data/wedzin_kwa/habitat_field_2024.csv
    canonical_schema: nge/user_habitat_known                # crate schema (lands when this domain is added to crate)
```

`lnk_load_overrides("wedzin_kwa_2024/configs/experimental.yaml")` resolves `extends`, applies `overrides`, dispatches each entry via `crate::crt_ingest(source, file_name, path)`, returns named list of canonical tibbles.
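The resolve-then-override step could be sketched as below, using nested lists in place of parsed YAML (for example from `yaml::read_yaml()`). `resolve_files_sketch()` and `lookup()` are hypothetical names, and whole-entry replacement is an assumption based on "swap or add entries"; a field-level merge would behave differently.

```r
# Sketch: inherit the parent's entries, then replace or add by entry name.
resolve_files_sketch <- function(cfg, lookup) {
  files <- list()
  if (!is.null(cfg[["extends"]])) {
    files <- resolve_files_sketch(lookup(cfg[["extends"]]), lookup)
  }
  for (block in list(cfg[["files"]], cfg[["overrides"]])) {
    # Whole entries are swapped in; nothing is merged field by field.
    if (length(block)) files[names(block)] <- block
  }
  files
}

default_cfg <- list(
  name = "default",
  files = list(
    user_barriers_definite = list(
      source = "bcfp", path = "overrides/user_barriers_definite.csv"
    ),
    user_habitat_classification = list(
      source = "bcfp", path = "overrides/user_habitat_classification.csv"
    )
  )
)
experimental <- list(
  name = "experimental_wedzin_kwa",
  extends = "default",
  overrides = list(
    user_barriers_definite = list(
      source = "local", path = "data/wedzin_kwa/barriers_definite_v3.csv"
    ),
    user_habitat_known = list(
      source = "nge", path = "data/wedzin_kwa/habitat_field_2024.csv"
    )
  )
)

# Hypothetical lookup for bundled configs by name.
lookup <- function(name) {
  switch(name, default = default_cfg, stop("unknown config: ", name))
}

files <- resolve_files_sketch(experimental, lookup)
names(files)
```

Here `user_barriers_definite` is swapped for the local table, `user_habitat_classification` is inherited untouched, and `user_habitat_known` is appended as a new entry.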

### Wrangling stays in project / consumer code

Once `lnk_load_overrides()` returns canonical-shape tibbles, project scripts wrangle them with plain dplyr/tidyr — combining experimental + bcfp tables, dedup, semi-joins, AOI filters. No special "merge" framework in link; pure data composition on canonical shapes.

### Implementation outline

- `R/lnk_load_overrides.R` — exported, resolves config (incl. extends/overrides), iterates files, calls `crate::crt_ingest()`
- `R/lnk_config.R` — internal helpers for config resolution (parse YAML, resolve extends, apply overrides, validate paths exist)
- `R/lnk_source.R` — internal `.lnk_source_resolve(entry)` — given a config entry, returns the source location handle (today: a file path resolved from `path` or `path_relative`; forward-compatible with S3 URLs, postgres connections, anything `crate::crt_ingest()` accepts in future). Not exported. Reserves the `lnk_source_*` namespace for future siblings (`lnk_source_list(config)`, `lnk_source_validate(entry)`, `lnk_source_check(config)`) that ship as project-experimental work surfaces real need.
- `inst/extdata/configs/default/config.yaml` — bundled default config listing bcfp entries
- `inst/extdata/configs/bcfishpass/config.yaml` — existing bundle; updated to new schema (still source: bcfp for everything; mirrors upstream verbatim for regression testing)
- `DESCRIPTION` — adds `crate (>= 0.0.1)` to `Imports` (plus a `Remotes` entry until crate is on a registry)
- Tests: load default config → returns expected files in canonical shape; load synthetic experimental config (extends + overrides + new file type) → returns merged result
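A minimal sketch of what `.lnk_source_resolve(entry)` might do for file paths today, under a different name (`source_resolve_sketch()`) to make clear it is not the real helper. The `config_dir` argument and the absolute-path pattern are assumptions. `[[` is used instead of `$` because `$` partial-matches on lists, so `entry$path` would silently pick up `path_relative`.

```r
# Hypothetical resolver: turn a config entry into a file path.
source_resolve_sketch <- function(entry, config_dir) {
  rel <- entry[["path_relative"]]
  if (!is.null(rel)) {
    # Bundled file, relative to the config directory.
    return(file.path(config_dir, rel))
  }
  path <- entry[["path"]]
  if (is.null(path)) {
    stop("entry has neither `path` nor `path_relative`")
  }
  # Absolute paths pass through; relative ones resolve against the config dir.
  if (grepl("^(/|~|[A-Za-z]:)", path)) path else file.path(config_dir, path)
}

bundled <- source_resolve_sketch(
  list(source = "bcfp", path_relative = "overrides/user_barriers_definite.csv"),
  config_dir = "/pkg/configs/default"
)
project <- source_resolve_sketch(
  list(source = "local", path = "data/wedzin_kwa/barriers_definite_v3.csv"),
  config_dir = "/repo/configs"
)
c(bundled, project)
```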

### Why no source name appears in link's R code

That's the load-bearing property. Adding a new source (NGE, lab, provincial) means:
- crate adds the schema YAML + adapter for the new source
- Project config references `source: <new-source>` in its YAML
- link's R code does NOT change

link is fully source-agnostic at the API level. Source knowledge lives in:
1. Crate (the adapter code)
2. Configs (data, in link's bundle for default + project repos for experimental)
3. NEVER in link's R code
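A guard test could enforce this property by scanning the package's R source for quoted source names. The sketch below is an assumption about how such a check might look, not an existing link test; `find_source_names()` is hypothetical, and it only matches quoted literals so that calls such as base R's `local()` are not flagged.

```r
# Hypothetical guard: return lines that hard-code a source name as a string.
find_source_names <- function(lines, sources = c("bcfp", "nge", "local")) {
  pattern <- paste0("[\"'](", paste(sources, collapse = "|"), ")[\"']")
  grep(pattern, lines, value = TRUE)
}

# Toy input standing in for readLines() over the files in R/.
code_lines <- c(
  "out <- crate::crt_ingest(entry$source, nm, path)",
  "if (entry$source == \"bcfp\") pivot_wide(x)",
  "res <- local({ 1 + 1 })"
)

find_source_names(code_lines)
```

Only the second line is returned: the first takes the source from config data, and the third uses `local()` as a function rather than as a source name.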

## Acceptance criteria

- [ ] `DESCRIPTION` declares `crate (>= 0.0.1)` in `Imports` (plus a `Remotes` entry if crate is not yet on a registry)
- [ ] `lnk_load_overrides()` resolves bundled configs by name (`config = "default"`) and arbitrary paths (`config = "/path/to/config.yaml"`)
- [ ] Config resolver supports `extends:` (inherit entries from another config) and `overrides:` (replace or add entries)
- [ ] `lnk_load_overrides()` dispatches each entry via `crate::crt_ingest(source, file_name, path)` and returns named list of canonical tibbles
- [ ] Bundled `inst/extdata/configs/default/config.yaml` lists all bcfp-sourced files with `source: bcfp` + `canonical_schema: bcfp/<file>`
- [ ] Bundled `inst/extdata/configs/bcfishpass/config.yaml` updated to new schema (retains regression-test purpose)
- [ ] Existing direct `read.csv()` calls of `user_habitat_classification.csv` in link's R/ are replaced by access through `lnk_load_overrides()$user_habitat_classification`
- [ ] Tests cover: default config loads correctly; synthetic project config with extends + overrides + new file type loads correctly; missing files and mis-shaped inputs fail loudly
- [ ] link code that consumes bcfp-sourced files needs no `format` / `species_col` shape-aware parameters (those move to crate's adapter; link sees only canonical shapes)
- [ ] No source name (`bcfp`, `nge`, `local`) appears in link's R code — only in YAML configs

## Scope

First-instance integration: `user_habitat_classification.csv`. Other ~9 bcfp-sourced CSVs in default config conform to the same `lnk_load_overrides` pattern automatically (they're config entries, not code changes), but their crate schema YAMLs land as separate follow-up issues (one per file as canonical-shape decisions concretize).

## Dependencies / coordination

- **Depends on:** NewGraphEnvironment/crate#2 released as v0.0.1 (crate ships `crt_ingest(source, file_name, path)` with `source = "bcfp"` registered + first-instance handler for `user_habitat_classification`)
- **Coupled to:** #64 (Phase 1 sync workflow shape fingerprint — same coordinated 2-PR dance whenever upstream reshapes)
- **Downstream effect:** fresh#177 simplifies (`format` parameter goes away once link normalizes via crate at ingest)
- **Future:** when project-experimental configs need it (e.g. `restoration_wedzin_kwa_2024`), crate releases v0.1.x adding `source = "local"` and `source = "nge"` registrations + corresponding schemas

## Context / related

- Comms thread (architectural design): `comms/crate/20260427_fresh_bcfishpass_csv_consumers.md`
- Implementation plan thread (forthcoming): `comms/crate/20260427_bcfp_ingest_impl_plan.md`
- Crate boundary doc: `crate/CLAUDE.md` (Boundary with rfp section — link consumes canonical, crate owns canonicalization framework)
- Path E architectural choice (config-driven source-agnostic API in link, source-explicit dispatcher in crate) — settled in comms thread after considering Path C (link calls per-source) and Path D (crate owns sync)


