Manifest/data split: lnk_config + lnk_load_overrides + crate dispatch (#65) #79
Merged
NewGraphEnvironment merged 4 commits into main on Apr 29, 2026
PWF baseline. Path C unified rewrite in a single PR: decompose lnk_config into a manifest (paths, provenance, file declarations) plus a new lnk_load_overrides() (canonical-shape data via crate::crt_ingest()). Single v0.18.0 bump, no backwards-compat shim (link has zero external R-code consumers). Issue body updated with resolution preamble + revised acceptance criteria; original proposal preserved below for audit trail.
Relates to #65
Relates to NewGraphEnvironment/sred-2025-2026#28
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ma_* family
Notifies link-Claude that crate v0.0.2 is tagged + pkgdown site updated. Public API surface (crt_ingest, crt_files) unchanged, so lnk_load_overrides() callers don't change. Only impact on link#65: bump crate dep to >= 0.0.2 + retest. Documents the internal renames table for context (internal_bcfp_* → crt_handler_bcfp_*, registry_load → crt_registry_load, schema_apply → crt_schema_apply, plus new crt_schema_read + crt_schema_validate). Notes that the local 65-schema-driven-types branch can be deleted (work superseded under Convention C names). Includes a brief process note about the comms-thread-first norm for cross-repo design changes.
Relates to NewGraphEnvironment/sred-2025-2026#28
Relates to NewGraphEnvironment/crate#4
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reply to crate-Claude's 20260429_crate_v002_refactor_shipped thread. Confirms link is on crate (>= 0.0.2), Convention C renames don't touch link (link only uses the crt_ingest public API), and going forward crate-side work surfaced during link integration goes through comms first, then a crate release, before link consumes it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #65.

Decompose lnk_config() into a manifest layer and a data-ingest layer; route registered files through crate for source-agnostic canonicalization.

lnk_config() is now manifest-only. Reads config.yaml and returns paths, file declarations, pipeline knobs, and provenance, with no parsed CSVs. Cheap to call. lnk_config_verify() and lnk_stamp() no longer pay for CSV parsing they don't need.

lnk_load_overrides(cfg) materializes the data files declared in cfg$files and returns a named list of canonical-shape tibbles. Entries with `source` + `canonical_schema` declarations dispatch through crate::crt_ingest() (today: bcfp/user_habitat_classification); others fall through to local reads dispatched on path extension. New source families plug in by config edit alone, with no link R code change.

New config.yaml schema. Top-level `rules:` and `dimensions:` paths replace `files.rules_yaml` / `files.dimensions_csv` (format follows from the path's extension, not the key name). The previous `files:` and `overrides:` maps merge into one flat `files:` map keyed by filename stem. Each entry carries `path:` and optionally `source:` and `canonical_schema:`. Configs may declare `extends:` to inherit from another config; child entries override same-key parent entries.

Pipeline phase signatures gain `loaded`. Every lnk_pipeline_* phase that reads a data table now takes cfg and loaded together. Callers materialize once and thread the result through phases. cfg$overrides$X and cfg$habitat_classification access points become loaded$X. See data-raw/_targets.R and data-raw/compare_bcfishpass_wsg.R for the pattern.

Verification: tar_make() on 5 WSGs × 2 configs reproduces the v0.17.0 baseline rollup bit-identically (sha256:a82de9928809b9751213e08916c476b4ee3f99286bc9ea2dc53f9659eeb92097) under both crate 0.0.0.9000 and crate 0.0.2 (Convention C). Refactor introduces no behaviour change.

Migration map for existing callers:
  cfg$rules_yaml              -> cfg$rules
  cfg$dimensions_csv          -> cfg$dimensions
  cfg$parameters_fresh        -> loaded$parameters_fresh
  cfg$habitat_classification  -> loaded$user_habitat_classification
  cfg$observation_exclusions  -> loaded$observation_exclusions
  cfg$wsg_species             -> loaded$wsg_species_presence
  cfg$overrides$X             -> loaded$X (filename-stem keys)

Out of scope (follow-up issues):
- crate schemas for the other 9 bcfp-sourced files (one issue per file as canonical-shape decisions concretize). Today they fall through to plain CSV read.
- nge / local source families (when project-experimental configs need them).
- Attribution / NOTICE for redistributed upstream data: link#78.

Fixes #65
Relates to NewGraphEnvironment/sred-2025-2026#28
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
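
The dispatch rule in the commit message above (entries declaring `source` + `canonical_schema` go through crate, everything else is read locally by path extension) can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: `ingest_registered()` is a hypothetical stand-in for `crate::crt_ingest()`, whose real signature is not shown in this PR, and `read_local()`, `load_overrides_sketch()`, and the demo file contents are likewise made up.

```r
# Hypothetical stand-in for crate::crt_ingest(); the real signature is not
# shown in this PR. It reads the file and tags it with the schema it claims.
ingest_registered <- function(path, source, canonical_schema) {
  df <- utils::read.csv(path, stringsAsFactors = FALSE)
  attr(df, "canonical_schema") <- paste(source, canonical_schema, sep = "/")
  df
}

# Local fallback: choose a reader from the path's extension.
read_local <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("no local reader for extension: ", ext)
  )
}

# Materialize every declared file into a named list keyed by filename stem.
load_overrides_sketch <- function(files) {
  lapply(files, function(entry) {
    registered <- !is.null(entry$source) && !is.null(entry$canonical_schema)
    if (registered) {
      ingest_registered(entry$path, entry$source, entry$canonical_schema)
    } else {
      read_local(entry$path)
    }
  })
}

# Demo with two throwaway CSVs in a temp directory (contents are made up).
tmp <- tempfile("lnk_sketch_")
dir.create(tmp)
p1 <- file.path(tmp, "user_habitat_classification.csv")
p2 <- file.path(tmp, "observation_exclusions.csv")
utils::write.csv(data.frame(id = 1:2), p1, row.names = FALSE)
utils::write.csv(data.frame(id = 3L), p2, row.names = FALSE)

files <- list(
  user_habitat_classification = list(
    path = p1, source = "bcfp",
    canonical_schema = "user_habitat_classification"
  ),
  observation_exclusions = list(path = p2)
)
loaded <- load_overrides_sketch(files)
```

In this sketch, adding another registered file only means adding `source` and `canonical_schema` to its entry; the loader itself does not change, which is the property the commit message describes.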
Summary
- Decompose `lnk_config()` into a manifest layer (paths, file declarations, pipeline knobs, provenance; no parsed CSVs) and a new `lnk_load_overrides(cfg)` data-ingest layer that returns canonical-shape tibbles.
- Registered entries (today: `bcfp/user_habitat_classification`) dispatch through `crate::crt_ingest()` for source-agnostic canonicalization. Unregistered entries fall through to local reads dispatched on path extension. Adding a new source family is a config edit + crate registration, with no link R code change.
- New `config.yaml` schema: top-level `rules:` / `dimensions:` paths (no format suffix), one flat `files:` map keyed by filename stem, optional `extends:` for project configs.
- Every `lnk_pipeline_*` phase that reads a data table now takes both `cfg` and `loaded`; callers materialize once and thread `loaded` through.
- Bumps the crate dependency to `crate (>= 0.0.2)` for the schema-driven type enforcement (`crt_schema_validate` + `crt_schema_apply`) shipped in crate#5.

Verification
`tar_make()` on 5 WSGs × 2 configs reproduces the v0.17.0 baseline rollup bit-identically:

sha256:a82de9928809b9751213e08916c476b4ee3f99286bc9ea2dc53f9659eeb92097

Refactor introduces zero behaviour change. 608 unit tests passing.
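
A minimal sketch of how a bit-identical parity check like the one above could be scripted. The PR does not say how its hash was produced, so everything here is an assumption: `hash_file()` and `hash_rollup()` are hypothetical helpers, the rollup columns are invented, and the sketch falls back to MD5 when base R lacks `tools::sha256sum()` (which, as far as I know, arrived in R 4.5.0).

```r
# Hash a file with SHA-256 when base R provides tools::sha256sum();
# otherwise fall back to MD5 so the sketch still runs on older R.
hash_file <- function(path) {
  tools_ns <- asNamespace("tools")
  if (exists("sha256sum", envir = tools_ns, inherits = FALSE)) {
    unname(get("sha256sum", envir = tools_ns)(path))
  } else {
    unname(tools::md5sum(path))
  }
}

# Serialize a rollup table to CSV and hash the bytes on disk.
hash_rollup <- function(rollup) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))
  utils::write.csv(rollup, path, row.names = FALSE)
  hash_file(path)
}

# Made-up rollups: `refactored` matches the baseline, `changed` does not.
baseline   <- data.frame(wsg = c("AAAA", "BBBB"), habitat_km = c(12.5, 7.25))
refactored <- data.frame(wsg = c("AAAA", "BBBB"), habitat_km = c(12.5, 7.25))
changed    <- data.frame(wsg = c("AAAA", "BBBB"), habitat_km = c(12.5, 7.5))

parity_ok      <- identical(hash_rollup(baseline), hash_rollup(refactored))
parity_changed <- identical(hash_rollup(baseline), hash_rollup(changed))
```

Hashing the serialized bytes is stricter than comparing with `all.equal()`: any change in values, column order, or row order changes the hash, which is the point of a "no behaviour change" claim.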
Migration map
cfg$rules_yaml              -> cfg$rules
cfg$dimensions_csv          -> cfg$dimensions
cfg$parameters_fresh        -> loaded$parameters_fresh
cfg$habitat_classification  -> loaded$user_habitat_classification
cfg$observation_exclusions  -> loaded$observation_exclusions
cfg$wsg_species             -> loaded$wsg_species_presence
cfg$overrides$X             -> loaded$X (filename-stem keys)

Test plan
- `devtools::test()`: 608 passing
- `lintr::lint_package()`: only repo-conventional style notes
- `tar_make()` parity vs v0.17.0 baseline (run twice; both match)
- `lnk_config_verify(lnk_config("bcfishpass"))` clean post-refactor

Out of scope (follow-up issues)
- `nge` / `local` source families: added when project-experimental configs need them.

Fixes #65
Relates to NewGraphEnvironment/sred-2025-2026#28
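
For reviewers, the `extends:` rule from the Summary (child entries override same-key parent entries) can be sketched in base R on configs already parsed into lists. This is an assumption-laden illustration, not link's resolver: `resolve_extends_sketch()` is hypothetical, the paths are invented, and treating other top-level keys as "child wins, otherwise inherit" is my guess, since the PR only specifies the behaviour for same-key entries.

```r
# Hypothetical parent and child configs, as if already parsed from YAML.
parent <- list(
  rules = "rules.yaml",
  dimensions = "dimensions.csv",
  files = list(
    parameters_fresh = list(path = "parameters_fresh.csv"),
    observation_exclusions = list(path = "observation_exclusions.csv")
  )
)
child <- list(
  extends = "parent",
  files = list(
    observation_exclusions = list(path = "project/observation_exclusions.csv")
  )
)

# Merge a child config over its parent.
# Assumption: top-level keys other than `files` are inherited unless the
# child sets them; `files` entries are replaced whole, key by key.
resolve_extends_sketch <- function(child, parent) {
  merged <- parent
  top <- setdiff(names(child), c("extends", "files"))
  if (length(top) > 0) {
    merged[top] <- child[top]
  }
  if (length(child$files) > 0) {
    merged$files[names(child$files)] <- child$files
  }
  merged
}

merged <- resolve_extends_sketch(child, parent)
```

Replacing a `files` entry whole, rather than merging field by field, keeps a child's `path:` from silently inheriting a parent's `source:` or `canonical_schema:`; whether link does the same is not stated here.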