Migrate remaining pipeline probes to manifest-driven gating #46

@NewGraphEnvironment

Problem

Two pipeline helpers still use information_schema.tables probes to discover whether a per-WSG table exists, rather than reading the config manifest:

  • .lnk_pipeline_prep_gradient() probes for <schema>.barriers_definite_control before running the DELETE FROM gradient_barriers_raw ... USING barriers_definite_control pruning step.
  • .lnk_pipeline_prep_overrides() probes for <schema>.user_habitat_classification before passing habitat to lnk_barrier_overrides().

Both probes work correctly today — the tables exist exactly when the corresponding manifest key (overrides.barriers_definite_control, habitat_classification) is declared, because .lnk_pipeline_prep_load_aux() writes the tables from the loaded CSVs. But they are indirect: they rely on the load step having succeeded rather than on the manifest's declared intent.

#44 introduced a manifest-driven gate for the new control argument to lnk_barrier_overrides(). The key cfg$overrides$barriers_definite_control is the direct expression of "this config activates the control mechanism." The same pattern should apply to the two probes above.

Proposed change

Replace the probes with manifest-key checks:

.lnk_pipeline_prep_gradient() — requires adding cfg to its signature (currently it only takes conn, aoi, schema). Change:

ctrl_exists <- DBI::dbGetQuery(conn, sprintf(
  "SELECT 1 FROM information_schema.tables
   WHERE table_schema = %s AND table_name = 'barriers_definite_control'",
  .lnk_quote_literal(schema)))
if (nrow(ctrl_exists) > 0) {
  .lnk_db_execute(conn, sprintf(
    "DELETE FROM %s.gradient_barriers_raw ...", schema, schema))
}

to:

if (!is.null(cfg$overrides$barriers_definite_control)) {
  .lnk_db_execute(conn, sprintf(
    "DELETE FROM %s.gradient_barriers_raw ...", schema, schema))
}

.lnk_pipeline_prep_overrides() — already receives cfg. Change:

habitat_exists <- DBI::dbGetQuery(conn, sprintf(
  "SELECT 1 FROM information_schema.tables
   WHERE table_schema = %s AND table_name = 'user_habitat_classification'",
  .lnk_quote_literal(schema)))
habitat_arg <- if (nrow(habitat_exists) > 0) habitat_tbl else NULL

to:

habitat_arg <- if (!is.null(cfg$habitat_classification)) habitat_tbl else NULL
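The manifest-key checks above can be tried out without a database. The following is a minimal sketch, assuming cfg behaves like a plain nested R list; manifest_declares() and the CSV paths are hypothetical names used only for illustration, not part of the package. It also shows one thing worth keeping in mind with this pattern: on plain lists, `$` does partial name matching, so a lookup of a shorter key can be satisfied by a longer sibling key, whereas `[[` matches exactly.

```r
# Hypothetical helper (not in the package): TRUE when every level of a
# nested manifest key is present. Uses `[[`, which matches names exactly.
manifest_declares <- function(cfg, ...) {
  node <- cfg
  for (key in c(...)) {
    if (!is.list(node) || is.null(node[[key]])) return(FALSE)
    node <- node[[key]]
  }
  TRUE
}

# Toy manifests; the file paths are made up for illustration.
cfg_with <- list(
  overrides = list(barriers_definite_control = "overrides/control.csv"),
  habitat_classification = "overrides/habitat.csv"
)
cfg_without <- list(overrides = list())

manifest_declares(cfg_with, "overrides", "barriers_definite_control")    # TRUE
manifest_declares(cfg_without, "overrides", "barriers_definite_control") # FALSE
manifest_declares(cfg_without, "habitat_classification")                 # FALSE

# Partial matching: only barriers_definite_control is declared here,
# yet `$barriers_definite` still resolves on a plain list.
partial <- cfg_with$overrides$barriers_definite         # "overrides/control.csv"
exact   <- cfg_with[["overrides"]][["barriers_definite"]]  # NULL
```

The full-name lookups in the proposed change are not affected by this, since an exact name match always wins. It only matters if a shorter key such as barriers_definite is checked with `$` while only a longer key sharing that prefix is declared, and only if cfg is a plain list rather than a class with its own `$` method.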

Not behaviour-changing on bcfishpass config

The current probes return the same answers the manifest would give, so a tar_make() run before the change and one after it should produce bit-identical rollups.

Rationale

  • The config manifest is the declarative contract for which capabilities a pipeline variant activates. Manifest-driven gating is consistent across the override-role family (modelled_fixes, pscis_barrier_status, barriers_definite, barriers_definite_control).
  • Indirect probes are one more thing to keep in sync when the load step changes (empty-table semantics, drops, etc.). See the asymmetric-gating bug fixed in #44 (Wire barriers_definite_control into lnk_barrier_overrides), which was rooted in the same "probe vs. manifest" seam.
  • Capability activation becomes locally readable from the config bundle alone, without cross-referencing DB state.

Scope

  • Modify two functions in R/lnk_pipeline_prepare.R.
  • Update the call to .lnk_pipeline_prep_gradient() inside lnk_pipeline_prepare() so that it passes cfg, matching the new signature.
  • Update mocked tests in tests/testthat/test-lnk_pipeline_prepare.R that exercise these probes (swap mocked DBI::dbGetQuery bindings for manifest-based stubs).
  • tar_make() verification: bit-identical rollup across pre- and post-change runs.
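As a rough illustration of what a manifest-based stub could look like in the updated tests, here is a sketch using testthat. It does not call the real helpers: prune_if_declared() is a hypothetical stand-in for the gated pruning step, the executor is a recording function rather than .lnk_db_execute(), and the SQL is a placeholder because the actual statement is elided in this issue.

```r
library(testthat)

# Hypothetical stand-in for the gated step. `execute` plays the role of the
# database executor so the test needs no connection and no mocked dbGetQuery.
prune_if_declared <- function(cfg, execute) {
  if (!is.null(cfg[["overrides"]][["barriers_definite_control"]])) {
    execute("-- pruning statement elided")
  }
  invisible(NULL)
}

test_that("pruning runs only when the manifest declares the control key", {
  calls <- character()
  record <- function(sql) calls <<- c(calls, sql)

  # Manifest without the key: nothing should be executed.
  prune_if_declared(list(overrides = list()), record)
  expect_length(calls, 0)

  # Manifest with the key: exactly one statement is executed.
  prune_if_declared(
    list(overrides = list(barriers_definite_control = "control.csv")),
    record
  )
  expect_length(calls, 1)
})
```

The point of the sketch is that the test input becomes a small cfg list instead of a mocked information_schema result, so each test states the declared intent directly.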

Versions at filing

Cross-references
