
Manifest/data split: lnk_config + lnk_load_overrides + crate dispatch (#65) #79

Merged
NewGraphEnvironment merged 4 commits into main from
65-config-manifest-data-split
Apr 29, 2026

Summary

  • Decompose lnk_config() into a manifest layer (paths, file declarations, pipeline knobs, provenance — no parsed CSVs) and a new lnk_load_overrides(cfg) data-ingest layer that returns canonical-shape tibbles.
  • Registered entries (today: bcfp/user_habitat_classification) dispatch through crate::crt_ingest() for source-agnostic canonicalization. Unregistered entries fall through to local reads dispatched on path extension. Adding a new source family is a config edit + crate registration — no link R code change.
  • New config.yaml schema: top-level rules: / dimensions: paths (no format suffix), one flat files: map keyed by filename stem, optional extends: for project configs.
  • Every lnk_pipeline_* phase that reads a data table now takes both cfg and loaded; callers materialize once and thread loaded through.
  • Bumps crate (>= 0.0.2) for the schema-driven type enforcement (crt_schema_validate + crt_schema_apply) shipped in crate#5.
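
A minimal base-R sketch of the dispatch described above, for orientation only. It does not call link or crate: `ingest_registered()`, `read_local()` and `load_overrides_sketch()` are hypothetical stand-ins, the real `crate::crt_ingest()` signature is not shown in this PR, and the `source` / `canonical_schema` values and column names are assumed for illustration.

```r
# Hypothetical stand-in for a crate-registered handler. The real
# crate::crt_ingest() signature is not shown in this PR.
ingest_registered <- function(entry) {
  df <- utils::read.csv(entry$path, stringsAsFactors = FALSE)
  attr(df, "canonical_schema") <- entry$canonical_schema
  df
}

# Local fallback: choose a reader from the path's extension.
read_local <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("unsupported extension: ", ext)
  )
}

# Sketch of the data-ingest layer: returns one named list keyed by filename stem.
load_overrides_sketch <- function(cfg) {
  lapply(cfg$files, function(entry) {
    registered <- !is.null(entry$source) && !is.null(entry$canonical_schema)
    if (registered) ingest_registered(entry) else read_local(entry$path)
  })
}

# Demo with a throwaway CSV so the sketch runs anywhere.
tmp <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(species_code = "CO", threshold = 1), tmp,
                 row.names = FALSE)

cfg <- list(
  rules = "rules.yaml",            # manifest holds paths only
  dimensions = "dimensions.csv",
  files = list(
    # registered entry: has both source and canonical_schema (assumed values)
    user_habitat_classification = list(
      path = tmp, source = "bcfp",
      canonical_schema = "user_habitat_classification"
    ),
    # unregistered entry: falls through to the extension-based reader
    parameters_fresh = list(path = tmp)
  )
)

loaded <- load_overrides_sketch(cfg)
names(loaded)
```

In this sketch, registering a new file means adding `source` and `canonical_schema` to its entry; the loader itself does not change, which is the property the summary claims for the real implementation.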

Verification

tar_make() on 5 WSGs × 2 configs reproduces the v0.17.0 baseline rollup bit-identically:

  • Baseline (v0.17.0): sha256:a82de9928809b9751213e08916c476b4ee3f99286bc9ea2dc53f9659eeb92097
  • Post-refactor run 1 (crate 0.0.0.9000): match ✓
  • Post-refactor run 2 (crate 0.0.2 / Convention C): match ✓

The refactor introduces no behaviour change: the rollup hash is unchanged and all 608 unit tests pass.

Migration map

  Old                             New
  cfg$rules_yaml                  cfg$rules
  cfg$dimensions_csv              cfg$dimensions
  cfg$parameters_fresh            loaded$parameters_fresh
  cfg$habitat_classification      loaded$user_habitat_classification
  cfg$observation_exclusions      loaded$observation_exclusions
  cfg$wsg_species                 loaded$wsg_species_presence
  cfg$overrides$X                 loaded$X (filename-stem keys)
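
For callers with many scripts to update, the map can be applied mechanically. The helper below is a hypothetical sketch, not part of link: `migrate_lines()` and the sample `old_code` lines are invented for illustration, and it uses plain substring replacement, so the result should be reviewed by hand.

```r
# Old -> new access points, copied from the migration map.
migration_map <- c(
  "cfg$rules_yaml"             = "cfg$rules",
  "cfg$dimensions_csv"         = "cfg$dimensions",
  "cfg$parameters_fresh"       = "loaded$parameters_fresh",
  "cfg$habitat_classification" = "loaded$user_habitat_classification",
  "cfg$observation_exclusions" = "loaded$observation_exclusions",
  "cfg$wsg_species"            = "loaded$wsg_species_presence",
  "cfg$overrides$"             = "loaded$"   # cfg$overrides$X -> loaded$X
)

# Rewrite each line of source text; fixed = TRUE treats "$" literally.
migrate_lines <- function(lines, map = migration_map) {
  for (old in names(map)) {
    lines <- gsub(old, map[[old]], lines, fixed = TRUE)
  }
  lines
}

# Hypothetical pre-refactor caller code.
old_code <- c(
  "rules <- yaml::read_yaml(cfg$rules_yaml)",
  "spp <- cfg$wsg_species",
  "x <- cfg$overrides$some_table"
)
new_code <- migrate_lines(old_code)
new_code
```

The rewritten lines still need a `loaded` object in scope, so each caller also has to gain one call that materializes it.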

Test plan

  • devtools::test() — 608 passing
  • lintr::lint_package() — only repo-conventional style notes
  • tar_make() parity vs v0.17.0 baseline (run twice; both match)
  • lnk_config_verify(lnk_config("bcfishpass")) clean post-refactor
  • Pipeline phase migration verified end-to-end on all 5 WSGs (ADMS / BULK / BABL / ELKR / DEAD)

Out of scope (follow-up issues)

  • Crate schemas for the other 9 bcfp-sourced files — each gets its own issue as canonical-shape decisions concretize.
  • nge / local source families — added when project-experimental configs need them.
  • Attribution / NOTICE for redistributed upstream data — #78.

Fixes #65
Relates to NewGraphEnvironment/sred-2025-2026#28

NewGraphEnvironment and others added 4 commits April 29, 2026 06:48
PWF baseline. Path C unified rewrite in single PR — decompose lnk_config
into manifest (paths, provenance, file declarations) plus new
lnk_load_overrides() (canonical-shape data via crate::crt_ingest()).
Single v0.18.0 bump, no backwards-compat shim (link has zero external
R-code consumers).

Issue body updated with resolution preamble + revised acceptance
criteria; original proposal preserved below for audit trail.

Relates to #65
Relates to NewGraphEnvironment/sred-2025-2026#28

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ma_* family

Notifies link-Claude that crate v0.0.2 is tagged + pkgdown site updated.
Public API surface (crt_ingest, crt_files) unchanged so lnk_load_overrides()
callers don't change. Only impact on link#65: bump crate dep to >= 0.0.2 +
retest.

Documents internal renames table for context (internal_bcfp_* →
crt_handler_bcfp_*, registry_load → crt_registry_load, schema_apply →
crt_schema_apply, plus new crt_schema_read + crt_schema_validate). Notes
that local 65-schema-driven-types branch can be deleted (work superseded
under Convention C names). Includes brief process-note about the comms-
thread-first norm for cross-repo design changes.

Relates to NewGraphEnvironment/sred-2025-2026#28
Relates to NewGraphEnvironment/crate#4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reply to crate-Claude's 20260429_crate_v002_refactor_shipped thread.
Confirms link is on crate (>= 0.0.2), Convention C renames don't touch
link (link only uses crt_ingest public API), and going forward
crate-side work surfaced during link integration goes through
comms-first then a crate release before link consumes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes #65. Decompose lnk_config() into a manifest layer and a
data-ingest layer; route registered files through crate for
source-agnostic canonicalization.

lnk_config() is now manifest-only. Reads config.yaml and returns
paths, file declarations, pipeline knobs, and provenance — no
parsed CSVs. Cheap to call. lnk_config_verify() and lnk_stamp() no
longer pay for CSV parsing they don't need.

lnk_load_overrides(cfg) materializes the data files declared in
cfg$files and returns a named list of canonical-shape tibbles.
Entries with `source` + `canonical_schema` declarations dispatch
through crate::crt_ingest() (today: bcfp/user_habitat_classification);
others fall through to local reads dispatched on path extension.
New source families plug in by config edit alone — no link R code
change.

New config.yaml schema. Top-level `rules:` and `dimensions:` paths
replace `files.rules_yaml` / `files.dimensions_csv` (format follows
from the path's extension, not the key name). The previous `files:`
and `overrides:` maps merge into one flat `files:` map keyed by
filename stem. Each entry carries `path:` and optionally `source:`
and `canonical_schema:`. Configs may declare `extends:` to inherit
from another config; child entries override same-key parent entries.
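
One way to read the `extends:` rule, sketched in base R on already-parsed configs. `resolve_extends()` is a hypothetical helper and the paths are invented. The sketch assumes a child entry replaces the whole parent entry with the same key; whether link instead merges entries field by field is not stated here.

```r
# Merge a child config over its parent. Assumption: a child entry replaces
# the entire same-key parent entry (no field-by-field merge).
resolve_extends <- function(parent, child) {
  merged <- parent
  # Top-level keys (e.g. rules, dimensions): child wins when it sets them.
  for (key in setdiff(names(child), c("files", "extends"))) {
    merged[[key]] <- child[[key]]
  }
  # files: keep parent entries, overwrite or add the child's entries by key.
  if (length(child$files) > 0) {
    merged$files[names(child$files)] <- child$files
  }
  merged
}

parent <- list(
  rules = "rules.yaml",
  dimensions = "dimensions.csv",
  files = list(
    parameters_fresh       = list(path = "parameters_fresh.csv"),
    observation_exclusions = list(path = "observation_exclusions.csv")
  )
)

# Hypothetical project config that inherits from the parent.
child <- list(
  extends = "bcfishpass",
  files = list(
    parameters_fresh = list(path = "project/parameters_fresh.csv")
  )
)

cfg <- resolve_extends(parent, child)
cfg$files$parameters_fresh$path
```

Base R's `utils::modifyList()` would merge nested lists recursively, which gives field-level inheritance instead; the explicit assignment above keeps the override at entry level.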

Pipeline phase signatures gain `loaded`. Every lnk_pipeline_* phase
that reads a data table now takes cfg and loaded together. Callers
materialize once and thread the result through phases.
cfg$overrides$X and cfg$habitat_classification access points become
loaded$X. See data-raw/_targets.R and data-raw/compare_bcfishpass_wsg.R
for the pattern.
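
The calling pattern, sketched with stubs so it runs without link installed. `load_once()`, `phase_classify()` and `phase_exclude()` are hypothetical stand-ins for `lnk_load_overrides()` and the `lnk_pipeline_*` phases, and the tables are toy data.

```r
# Counter to show the data layer is materialized exactly once.
reads <- 0L

# Stand-in for lnk_load_overrides(cfg): returns a named list of tables.
load_once <- function(cfg) {
  reads <<- reads + 1L
  list(
    user_habitat_classification = data.frame(id = 1:3),
    observation_exclusions      = data.frame(id = 2L)
  )
}

# Stand-ins for pipeline phases: each takes cfg and loaded together
# and reads tables from loaded, never from cfg.
phase_classify <- function(cfg, loaded) {
  loaded$user_habitat_classification
}
phase_exclude <- function(cfg, loaded, habitat) {
  habitat[!habitat$id %in% loaded$observation_exclusions$id, , drop = FALSE]
}

cfg     <- list(name = "bcfishpass")  # manifest only, no parsed tables
loaded  <- load_once(cfg)             # materialize once
habitat <- phase_classify(cfg, loaded)
kept    <- phase_exclude(cfg, loaded, habitat)
```

Because `loaded` is built once and passed by argument, phases stay free of file I/O and can be tested with small hand-built lists.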

Verification: tar_make() on 5 WSGs × 2 configs reproduces the
v0.17.0 baseline rollup bit-identically
(sha256:a82de9928809b9751213e08916c476b4ee3f99286bc9ea2dc53f9659eeb92097)
under both crate 0.0.0.9000 and crate 0.0.2 (Convention C). Refactor
introduces no behaviour change.

Migration map for existing callers:

  cfg$rules_yaml                  -> cfg$rules
  cfg$dimensions_csv              -> cfg$dimensions
  cfg$parameters_fresh            -> loaded$parameters_fresh
  cfg$habitat_classification      -> loaded$user_habitat_classification
  cfg$observation_exclusions      -> loaded$observation_exclusions
  cfg$wsg_species                 -> loaded$wsg_species_presence
  cfg$overrides$X                 -> loaded$X (filename-stem keys)

Out of scope (follow-up issues):

  - crate schemas for the other 9 bcfp-sourced files (one issue per
    file as canonical-shape decisions concretize). Today they fall
    through to plain CSV read.
  - nge / local source families (when project-experimental configs
    need them).
  - Attribution / NOTICE for redistributed upstream data — link#78.

Fixes #65
Relates to NewGraphEnvironment/sred-2025-2026#28

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NewGraphEnvironment merged commit 749a69f into main Apr 29, 2026
1 check passed
NewGraphEnvironment deleted the 65-config-manifest-data-split branch April 29, 2026 20:50
Development

Successfully merging this pull request may close these issues.

Add lnk_load_overrides(config) source-agnostic API consuming crate::crt_ingest()
