Ground causal-graph predicates: 28-mapping cohort, 482 edges grounded by realmarcin · Pull Request #61 · CultureBotAI/TraitMech

realmarcin · 2026-05-23T08:25:00Z

Summary

Adds the predicate-grounding pipeline that fills causal_graphs[].edges[].predicate_id from a curated label→CURIE TSV. This pass grounds 482 edges across 212 trait YAMLs using 28 high-confidence mappings spanning METPO, RO, biolink, and rdfs.

Mapping cohort (`mappings/predicate_grounding.tsv`, 28 rows)

- 6 METPO ObjectProperty matches (produces, oxidizes, uses carbon/electron-donor/electron-acceptor/energy-source)
- 4 RO matches (enables RO:0002327, contributes to RO:0002326, regulates RO:0002211, depends on RO:0002502)
- 17 biolink slot matches (causes, catalyzes, associated_with, located_in + 3 aliases, participates_in, part_of, occurs_in, interacts_with, develops_into, consumes, produces, encodes, etc.)
- 1 rdfs:subClassOf covering is a, specializes, example of

New machinery

- scripts/ground_causal_predicates.py: idempotent; never overwrites existing groundings, validates closed-mode before write, appends one CurationEvent per modified file.
- scripts/check_biolink_coverage.py: cross-checks applied mappings + residual labels against data/raw/biolink-model.yaml (vendored, 499 KB).
- just ground-predicates and just check-biolink-coverage recipes.
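
The idempotency contract described above can be illustrated with a minimal sketch. It is not the code in scripts/ground_causal_predicates.py: the function name `ground_edges`, the edge field name `predicate` for the free-text label, and the `EX:` placeholder CURIE are assumptions made for illustration only.

```python
from collections import Counter


def ground_edges(doc: dict, mapping: dict) -> tuple:
    """Fill empty predicate_id values in place.

    Returns (number of edges newly grounded, Counter of unmapped labels).
    Hypothetical helper; the field name "predicate" is an assumption.
    """
    changed = 0
    residual = Counter()
    for graph in doc.get("causal_graphs") or []:
        for edge in graph.get("edges") or []:
            label = (edge.get("predicate") or "").strip()
            # Skip unlabeled edges and never overwrite an existing grounding.
            if not label or edge.get("predicate_id"):
                continue
            curie = mapping.get(label)
            if curie is None:
                residual[label] += 1
            else:
                edge["predicate_id"] = curie
                changed += 1
    return changed, residual


doc = {"causal_graphs": [{"edges": [
    {"predicate": "enables"},
    {"predicate": "regulates", "predicate_id": "EX:0000001"},  # pre-existing, placeholder CURIE
    {"predicate": "manifests as"},                              # not in the mapping
]}]}
mapping = {"enables": "RO:0002327", "regulates": "RO:0002211"}

first_changed, first_residual = ground_edges(doc, mapping)
second_changed, second_residual = ground_edges(doc, mapping)
```

Running the function twice on the same document is the property the test plan checks: the second pass reports no additional groundings, and the pre-existing `predicate_id` is left untouched.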

Residual

537 edges across 191 distinct labels remain ungrounded. See reports/predicate_grounding_residual.tsv for the ranked tail; top residuals (manifests as 52, supports 26, selects for 20, drives 19) are curator-paraphrased predicates without a clean RO/Biolink home — candidates for an upstream METPO predicate proposal.

Audit snapshot

Includes audit-pass output from the audit-schema-gaps skill (reports/{gap_fix_backlog,schema_gap_audit,instance_validation_*,pipeline_*}). Corpus passes just validate-strict clean (0 ERROR rows / 357 files). The CI gate that locks this in is tracked as G01 in reports/gap_fix_backlog.md and lands in a follow-up PR.

Test plan

- just validate-strict: 0 ERROR rows / 357 files
- just ground-predicates (dry-run after --apply): reports 0 additional groundings (idempotent)
- just check-biolink-coverage: 28 applied mappings indexed, residual cross-checked
- CI re-runs validate-strict on the diff (gate ships in PR2)

🤖 Generated with Claude Code

Adds the predicate-grounding pipeline used to populate `causal_graphs[].edges[].predicate_id` from a curated label→CURIE TSV. This pass grounds 482 edges across 212 trait YAMLs using 28 high-confidence mappings (METPO, RO, biolink, rdfs).

New machinery:

- `scripts/ground_causal_predicates.py`: walks `data/traits/**/*.yaml`, fills empty `predicate_id` from `mappings/predicate_grounding.tsv`, validates closed-mode before write, appends one CurationEvent per modified file, never overwrites existing groundings.
- `scripts/check_biolink_coverage.py`: cross-checks applied mappings and residual labels against the Biolink model (`data/raw/biolink-model.yaml`, vendored to keep CI self-contained).
- `just ground-predicates` and `just check-biolink-coverage` recipes.

Initial mapping cohort (`mappings/predicate_grounding.tsv`, 28 rows):

- 6 METPO ObjectProperty matches (produces, oxidizes, uses carbon/electron-donor/electron-acceptor/energy-source).
- 4 RO matches (enables RO:0002327, contributes to RO:0002326, regulates RO:0002211, depends on RO:0002502).
- 17 biolink slot matches (causes, catalyzes, associated_with, located_in, participates_in, part_of, occurs_in, interacts_with, develops_into, consumes, produces, encodes, plus three located_in aliases: localized in/to, localizes to).
- 1 rdfs:subClassOf for `is a`, `specializes`, `example of`.

Residual: 537 edges across 191 distinct labels remain ungrounded. See `reports/predicate_grounding_residual.tsv` for the ranked tail; top residuals (`manifests as`, `supports`, `selects for`, `drives`) are curator-paraphrased predicates without a clean RO/Biolink home and are candidates for an upstream METPO predicate proposal.

Includes audit-pass output from the audit-schema-gaps skill (`reports/{gap_fix_backlog,schema_gap_audit,instance_validation_*,pipeline_*}`). Corpus passes `just validate-strict` clean: 0 ERROR rows across 357 files. The CI gate that locks this in is tracked as G01 in `reports/gap_fix_backlog.md` and lands in a follow-up PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a predicate-grounding workflow for TraitMech causal graphs, populating causal_graphs[].edges[].predicate_id from a curated label→CURIE TSV and wiring the workflow into just targets, alongside audit/coverage reports.

Changes:

- Add predicate grounding/coverage tooling (new scripts + just recipes) driven by mappings/predicate_grounding.tsv.
- Apply predicate groundings across many trait YAMLs by adding predicate_id plus new curation_history events.
- Add audit/summary/report artifacts under reports/ to capture writer-audit and validation snapshots.

Reviewed changes

Copilot reviewed 228 out of 229 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| `mappings/predicate_grounding.tsv` | Adds the curated label→CURIE mapping cohort used for grounding. |
| `justfile` | Adds `ground-predicates` and `check-biolink-coverage` recipes. |
| `reports/pipeline_writers_audit.tsv` | Snapshot TSV of YAML-writer audit (currently appears incomplete vs new scripts). |
| `reports/pipeline_gap_audit.md` | Narrative audit of YAML-writing scripts and pipeline gaps (needs updates for new writer). |
| `reports/instance_validation_summary.md` | Summary of strict instance validation run. |
| `reports/instance_validation_failures.tsv` | Empty/placeholder failures TSV (header only). |
| `reports/gap_fix_backlog.tsv` | Backlog of pipeline/schema follow-ups. |
| `data/traits/**.yaml` | Adds `predicate_id` groundings and `GROUND_CAUSAL_PREDICATES` curation events across many traits. |

+path	writes_yaml	appends_curation_history	has_write_safeguard	validates_before_write	wired_into_just
+scripts/audit_writers.py	yes	yes	yes	yes	yes
+scripts/build_embedding_index.py	yes	no	no	no	yes
+scripts/render_trait_pages.py	yes	no	yes	no	yes
+scripts/seed_from_metpo.py	yes	yes	yes	no	yes


+### `scripts/seed_from_metpo.py` — the only real trait-YAML writer
+
+This is the entry point for new trait records. It uses the safer **opt-in** convention (`--apply` defaults off; bare invocation is dry-run) and appends `CurationEvent` entries when it writes — both correct. It does **not** validate output against the schema before writing.
+
+**Gap (P1):** add an in-process strict validation pass before each write, using the same `linkml.validator.Validator(closed=True)` configured in `scripts/validate_strict.py`. If a record fails, log + skip rather than abort the whole run, so one bad record doesn't poison a 357-file seed. Effort: M (refactor the writer loop to construct a per-process Validator and call it before `path.write_text`). This is the highest-leverage fix on the pipeline axis because the seeder is the *only* path producing new trait records.
+
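
The P1 gap above (validate before each write, log and skip on failure rather than aborting) can be sketched roughly as follows. This is an illustration under stated assumptions, not the seeder's code: `write_validated` and `toy_validate` are hypothetical names, and the `validate` callable stands in for whatever strict validator scripts/validate_strict.py configures.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("seed_sketch")


def write_validated(
    records: Iterable[Tuple[Path, str]],
    validate: Callable[[str], List[str]],
) -> Tuple[List[Path], List[Path]]:
    """Write each record only if validation returns no errors.

    A failing record is logged and skipped so the rest of the run continues.
    """
    written, skipped = [], []
    for path, text in records:
        errors = validate(text)
        if errors:
            log.warning("skipping %s: %s", path, "; ".join(errors))
            skipped.append(path)
            continue
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written, skipped


def toy_validate(text: str) -> List[str]:
    # Stand-in check (assumption): a record must start with an "id:" key.
    return [] if text.startswith("id:") else ["missing required field 'id'"]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    written, skipped = write_validated(
        [(root / "good.yaml", "id: t1\n"), (root / "bad.yaml", "name: x\n")],
        toy_validate,
    )
    good_exists = (root / "good.yaml").exists()
    bad_exists = (root / "bad.yaml").exists()
```

Passing the validator in as a callable keeps the writer loop independent of the validation library, so the same loop could be exercised in tests with a stub.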


 - timestamp: '2026-05-09T00:00:00-07:00'
  curator: Codex
  action: CURATED_CAUSAL_GRAPH


  llm_assisted: false
 - timestamp: '2026-05-09T00:00:00-07:00'
  curator: Codex
  action: CURATED_CAUSAL_GRAPH


+| path | writes_yaml | curation_history | safeguard | validates_first | wired_into_just |
+|---|---|---|---|---|---|
+| `scripts/audit_writers.py` | yes | yes | yes | yes | yes |
+| `scripts/build_embedding_index.py` | yes | no | no | no | yes |
+| `scripts/render_trait_pages.py` | yes | no | yes | no | yes |
+| `scripts/seed_from_metpo.py` | yes | yes | yes | no | yes |


…#63)

Lifts the TraitMech causal-graph subsystem into METPO so downstream consumers can filter trait records by mechanism axis using METPO-native queries instead of TraitMech-internal LinkML enum codes. Cohort is committed here in-repo only, not filed upstream in this PR.

Cohort (proposals/metpo_traitmech_v1/):

- 3 top-level domain classes under METPO:1000000:
  - METPO:1007400 trait causal graph
  - METPO:1007401 trait causal node
  - METPO:1007402 trait causal edge
- 1 enum-parent under METPO:1007401:
  - METPO:1007410 trait causal node type
- 10 leaf classes under METPO:1007410, one per CausalNodeTypeEnum permissible value (METPO:1007411–1007420), e.g. causal-graph trait node, causal-graph pathway node, causal-graph environmental factor node (xref ENVO:01000254), causal-graph experimental factor node (xref EFO:0000001), etc.

Out of scope (documented in proposal.md):

- Scope A: no traitmech:NNNNNN synthetic IDs exist in corpus today.
- Scope B (causal-graph predicates): deferred until the predicate-grounding migration (#61) reduces the 191-label residual.
- 5 other LinkML enums (TraitCategoryEnum, TermKindEnum, SynonymTypeEnum, PriorityEnum, MappingStatusEnum): workflow internals, not ontology axes.

Tooling:

- scripts/verify_metpo_proposal.py: column-count, header, parent integrity, subset tag, scope-A/C coverage. Wired as `just verify-proposal <cohort>`.
- scripts/robot_validate_proposal.py: `robot template → merge with metpo.owl → reason ELK`. Wired as `just robot-validate-proposal <cohort>`. Discovers robot via $ROBOT, $ROBOT_BIN, PATH, then ../kg-microbe/data/raw/robot.
- .gitignore: reports/robot/ (regenerable, dominated by re-serialized metpo.owl at ~500 KB per file).

Verification (run locally on this branch):

- `just verify-proposal metpo_traitmech_v1` → PASS, 0 failures.
- `just robot-validate-proposal metpo_traitmech_v1` → PASS, no UNSAT, ELK delta +6 axioms (the inferred subclass closure).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
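
The robot discovery order stated in the commit message ($ROBOT, $ROBOT_BIN, PATH, then the kg-microbe checkout) might look something like the sketch below. `find_robot` is a hypothetical name, and the choice to check only that a candidate is a regular file (not that it is executable) is an assumption, not a description of scripts/robot_validate_proposal.py.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def find_robot(
    env: Optional[Mapping[str, str]] = None,
    fallback: Path = Path("../kg-microbe/data/raw/robot"),
) -> Optional[str]:
    """Return the first robot launcher found, or None.

    Order: $ROBOT, $ROBOT_BIN, any `robot` on PATH, then the fallback path.
    """
    env = os.environ if env is None else env
    for var in ("ROBOT", "ROBOT_BIN"):
        candidate = env.get(var)
        if candidate and Path(candidate).is_file():
            return candidate
    on_path = shutil.which("robot", path=env.get("PATH", ""))
    if on_path:
        return on_path
    return str(fallback) if fallback.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "robot"
    fake.write_text("#!/bin/sh\n")
    missing = Path(tmp) / "absent"
    expected = str(fake)
    # An empty PATH keeps the demo independent of the host machine.
    from_robot = find_robot({"ROBOT": expected, "PATH": ""}, fallback=missing)
    from_robot_bin = find_robot({"ROBOT_BIN": expected, "PATH": ""}, fallback=missing)
    from_fallback = find_robot({"PATH": ""}, fallback=fake)
    not_found = find_robot({"PATH": ""}, fallback=missing)
```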

…al-only) (#65)

* Add METPO predicate proposal cohort metpo_traitmech_v2 (8 predicates)

  Scope-B proposal: 8 new METPO object properties (METPO:2007400-METPO:2007407) covering 128 of the 537 residual causal-edge predicates left by the v1 grounding pass (#61).

  Cohort (proposals/metpo_traitmech_v2/):

  - manifests as (METPO:2007400, 52 edges): state → observable trait
  - selects for (METPO:2007401, 20 edges): env condition → adapted trait
  - feeds electrons into (METPO:2007402, 12 edges): donor → transport chain
  - transfers electrons to (METPO:2007403, 6 edges): single-step redox
  - fixed by (METPO:2007404, 9 edges): substrate → fixation pathway
  - oxidized to (METPO:2007405, 8 edges): substrate → oxidized product
  - challenges (METPO:2007406, 9 edges): stressor → tolerance trait
  - mitigates (METPO:2007407, 12 edges): defense → stressor (paired)

  Each candidate was checked against RO/Biolink first; rejections documented per-row in proposal.md (e.g. biolink:manifestation_of has range `disease`, which is too narrow; biolink:treats is clinical; etc.).

  Subset tag: metpo_traitmech_2026_06. Domain = range = METPO:1007401 (trait causal node, minted in v1). ROBOT/ELK validates clean: delta +6 axioms, no UNSAT (v1's METPO:1007401 resolves to unnamed external IRI without v1 merged, which is fine: no error, just preserved domain/range constraint).

  Per-corpus impact (after re-running ground-predicates --apply with the expanded 38-row mappings TSV):

  - Edges grounded: 482 → 618 (+136)
  - Edges residual: 537 → 401 (−136)
  - Distinct labels: 191 → 181 (−10)

  Also adds 2 RO mappings (controls, directs → RO:0002211 regulates) that match the RO definition of regulation but were not in the v1 mapping cohort.

  NOT filed upstream in this PR (per user instruction). Cohort lives in this repo only; upstream filing path documented in proposal.md. Mapping TSV notes flag the 8 proposed CURIEs as "proposed upstream in proposals/metpo_traitmech_v2" so reviewers know they're pending METPO adoption.

  Verified locally:

  - just verify-proposal metpo_traitmech_v2 → PASS (0 failures)
  - just robot-validate-proposal metpo_traitmech_v2 → PASS (ELK +6)
  - just validate-strict → 0 ERROR rows / 357 files

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #65

  Four fixes per Copilot inline comments:

  - TSV row 3 (manifests as): swap definition_source from `copiotrophic.yaml` (which has no `manifests as` edge, only `selects for`, `supports`, etc.) to `nutrient_adaptation.yaml#nutrient_adaptation_life_history_axis`, which is the canonical graph where `manifests as` first appears.
  - TSV row 9 (challenges): swap definition_source from `acidophilic.yaml` (no `challenges` edge; uses `selects for`) to `acidotolerant.yaml#acidotolerant_acid_stress_homeostasis`, which carries the `acidic_exposure challenges ...` edge directly.
  - proposal.md context paragraph: correct grounded counts from the incorrect "648 of 1185" to the actual "618 of 1019", matching the Corpus Impact table. Verified via `uv run python <<<` count over data/traits/**/causal_graphs[].edges[] (total=1019, grounded=618, residual=401).
  - proposal.md paired-predicate heading: rephrase "the only paired pair" → "the only paired predicate set" (removes the redundancy).

  Verified: `just verify-proposal metpo_traitmech_v2` → PASS, 0 failures.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
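
The corpus count quoted above (total=1019, grounded=618, residual=401) is the kind of figure a short script over `causal_graphs[].edges[]` can reproduce. The sketch below is an assumption about how such a count might be written, using in-memory documents in place of the YAML files; `edge_counts` is a hypothetical name. The closing arithmetic only checks that the reported before and after figures agree with each other.

```python
def edge_counts(docs):
    """Count total, grounded, and residual causal-graph edges.

    An edge counts as grounded when it carries a non-empty predicate_id.
    """
    total = grounded = 0
    for doc in docs:
        for graph in doc.get("causal_graphs") or []:
            for edge in graph.get("edges") or []:
                total += 1
                grounded += bool(edge.get("predicate_id"))
    return {"total": total, "grounded": grounded, "residual": total - grounded}


# Toy documents standing in for parsed trait YAMLs.
docs = [
    {"causal_graphs": [{"edges": [{"predicate_id": "RO:0002211"}, {}]}]},
    {"causal_graphs": [{"edges": [{"predicate_id": ""}]}]},
    {"name": "no graphs here"},
]
counts = edge_counts(docs)

# Consistency of the figures reported in the commit messages:
before_total = 482 + 537   # grounded + residual after the v1 pass
after_total = 618 + 401    # grounded + residual after the v2 mappings
delta = 618 - 482          # edges newly grounded by the 38-row TSV
```

Both totals come to 1019, which matches the corrected "618 of 1019" wording, and the delta of 136 matches the reported change in both the grounded and residual rows.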

Adds the node-grounding pipeline, mirror of the predicate-grounding work in #61. Fills empty `causal_graphs[].nodes[].grounding` from a curated (label, node_type) → CURIE TSV. This pass grounds 77 nodes across trait YAMLs (38% → 45% of causal-graph nodes are now grounded; 1252 nodes total).

New machinery:

- scripts/ground_causal_nodes.py: walks data/traits/**/*.yaml, fills empty `grounding` from mappings/node_grounding.tsv, validates closed-mode before write, appends one CurationEvent per modified file, never overwrites existing groundings. Keyed on (label, node_type) since the same free-text label can refer to different ontology classes depending on node type (e.g. "terminal electron acceptor" as CHEMICAL vs MOLECULAR_FUNCTION).
- just ground-nodes recipe.

Hardening per the original Copilot review:

- load_mapping validates required headers (label, node_type, target_curie) up-front; raises ValueError with a helpful message if any are missing (instead of silently producing an empty mapping when DictReader returns None for missing keys).
- ground_nodes_in_doc returns a `grounded_keys` counter alongside the residual counter, so when a file fails validation the just-grounded nodes are re-added to the residual TSV. Without this, those nodes were invisible (removed from per-CURIE counts but not added to residual, even though the file is rejected and the nodes remain ungrounded on disk).
- reports/pipeline_writers_audit.tsv refreshed to include the new writer (4 → 5 rows).

Initial mapping cohort (mappings/node_grounding.tsv, 39 rows):

- 14 CHEBI mappings for canonical metabolic chemicals (O2, CO2, CO, H2, CH4, methanol, NH3, NO3-, SO4(2-), S(2-), H+, Fe(2+), organic carbon, compatible solutes).
- 10 GO-BP mappings for canonical processes (peptidoglycan synthesis, methanogenesis, aerobic/anaerobic respiration, photosynthesis, N2 fixation, fermentation, C fixation, oxidative phosphorylation, cellular pH regulation, response to osmotic stress).
- 4 GO-CC mappings for canonical compartments (periplasmic space, outer membrane, plasma membrane, cytoplasm).
- 2 GO-MF mappings (kinase activity, oxidoreductase activity).
- 4 GO-BP/pathway mappings (ETC, photosynthetic ETC, Calvin-Benson, Wood-Ljungdahl).
- 2 PATO + 3 ENVO env-factor mappings (light intensity, decreased temperature, anaerobic + anoxic environment).

Residual: 688 nodes across 511 distinct (label, type) keys remain ungrounded. See reports/node_grounding_residual.tsv. The largest clusters are BIOLOGICAL_PROCESS abstractions (proton motive force, biomass, membrane fluidity) and GENE_OR_PROTEIN families (MreB, CRT enzymes, RuBisCO, FtsZ), which are candidates for either a METPO node-class proposal cohort or upstream UniProt/PRO grounding.

audit-writers TSV grows from 4 → 5 rows; the new script reports appends_curation_history + has_write_safeguard + validates_before_write all `yes` (matches the ground_causal_predicates contract from #61).

Verified locally:

- just ground-nodes (dry-run after --apply) → 0 additional groundings (idempotent)
- header-missing test: TSV with bad headers raises ValueError naming the missing columns
- just validate-strict → 0 ERROR rows / 357 files
- just audit-writers → 5 writers, all wired into justfile

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
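
The header check and the (label, node_type) keying described in this commit can be sketched as below. This is a simplified illustration, not the repository's `load_mapping`: the name `load_node_mapping`, the file-handle argument, and the `EX:` placeholder CURIEs are assumptions.

```python
import csv
import io

REQUIRED = ("label", "node_type", "target_curie")


def load_node_mapping(handle) -> dict:
    """Read a TSV into {(label, node_type): target_curie}.

    Raises ValueError naming every missing required column, rather than
    silently returning an empty mapping.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(
            "mapping TSV is missing required column(s): " + ", ".join(missing)
        )
    mapping = {}
    for row in reader:
        label, node_type, curie = ((row.get(col) or "").strip() for col in REQUIRED)
        if label and node_type and curie:  # skip incomplete rows
            mapping[(label, node_type)] = curie
    return mapping


good_tsv = (
    "label\tnode_type\ttarget_curie\n"
    "terminal electron acceptor\tCHEMICAL\tEX:0000001\n"
    "terminal electron acceptor\tMOLECULAR_FUNCTION\tEX:0000002\n"
)
bad_tsv = "label\tnodetype\ttargetcurie\nx\tCHEMICAL\tEX:0000001\n"

mapping = load_node_mapping(io.StringIO(good_tsv))
try:
    load_node_mapping(io.StringIO(bad_tsv))
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

Keying on the pair keeps the two "terminal electron acceptor" rows distinct, which a label-only dictionary would collapse into one entry.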

* Add tests for grounding pipeline + audit scripts (+43 tests)

  The grounding pipelines and audit scripts have been load-bearing infrastructure for the last several PRs (#61, #66, #67, #69, #70, all of which rewrite causal-graph fields based on these scripts' output). They had zero unit-test coverage. A silent regression in idempotency, header validation, or self-suppression would not be caught by validate-strict (which only checks per-record schema conformance, not pipeline correctness).

  Test counts:

  - tests/test_ground_causal_predicates.py: 9 tests
  - tests/test_ground_causal_nodes.py: 12 tests
  - tests/test_validate_strict.py: 11 tests
  - tests/test_audit_writers.py: 11 tests
  - total new: 43 tests
  - total suite: 54 tests (was 11)

  Coverage highlights:

  ground_causal_predicates.py:

  - load_mapping: basic happy path, conflict detection (same label → different CURIEs raises ValueError), incomplete-row skipping, missing-file error.
  - ground_edges_in_doc: idempotency (second pass = 0 changes), existing predicate_id never overwritten, residual counting for unmapped labels, empty/missing-predicate edges skipped.

  ground_causal_nodes.py:

  - All of the predicate suite plus:
  - (label, node_type) keyed lookup: same label, different node_types map to different CURIEs without aliasing.
  - Header validation (Copilot fix from PR #66): TSV with `nodetype` / `targetcurie` typo'd headers raises ValueError naming both missing columns.
  - grounded_keys-on-validation-failure separability (Copilot fix from PR #66): caller can union residual + grounded_keys to recover the corpus-state residual after rolling back an invalid file write.

  validate_strict.py:

  - classify: parametrized over the 5 categories (unexpected_field, missing_required, enum_mismatch, pattern_mismatch, other); the messages must match the actual jsonschema phrasings the validator emits.
  - validate_one: clean record produces 0 errors; unknown field surfaces unexpected_field (the G01 gate behavior); missing required field surfaces missing_required; YAML parse error surfaces as yaml_parse_error category.
  - iter_yaml_files: walks directories, filters .txt, picks up nested *.yaml.

  audit_writers.py:

  - looks_like_yaml_writer: yaml.safe_dump / yaml.dump positive, bare .write_text negative, .write_text near .yaml hint positive, arbitrary code negative.
  - audit: full-safeguards writer flagged yes/yes/yes/yes; no-safeguards writer flagged no/no/no; non-writer returns None; wired_into_just yes when justfile mentions the script stem.
  - Self-suppression (Copilot fix from PR #64): audit_writers.py itself returns None even though its own source matches yaml.safe_dump.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #71

  Add explicit `assert "b.yml" not in names` to test_iter_yaml_files_walks_directory_and_filters. The prior test documented the .yml-skipping behavior in a comment but never asserted it, so a regression that started picking up .yml during directory walks would have slipped through silently.

  Also add test_iter_yaml_files_accepts_yml_file_passed_directly to lock in the asymmetry that the previous test only hinted at: iter_yaml_files() does accept .yml when passed as a file argument (only the rglob('*.yaml') walk is .yaml-only).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update audit_writers tests to match #75's tightened heuristic

  PR #75 changed `looks_like_yaml_writer` to require that the yaml-serializer call feed directly into write_text on the same line (instead of the looser "any .write_text + any .yaml token" heuristic, which produced false positives for scripts that only READ trait YAMLs).

  The pre-#75 test asserted that `path.write_text(content) # .yaml` counted as a YAML writer. That returned True under the old heuristic and False under the new (correct) one. Replace it with two tests that lock in the new contract:

  - test_looks_like_yaml_writer_write_text_of_yaml_dump. Positive: write_text(yaml.safe_dump(...)) / write_text(yaml.dump(...)) both count.
  - test_looks_like_yaml_writer_write_text_of_json_is_false. Negative: a script that reads *.yaml then writes JSON via write_text is NOT a YAML writer. This is the false-positive case #75 explicitly fixed for scripts/build_embedding_index.py and scripts/render_trait_pages.py.

  Also rename test_looks_like_yaml_writer_write_text_without_yaml_hint_is_false to test_looks_like_yaml_writer_write_text_plain_is_false since the "yaml hint" phrasing was tied to the old heuristic.

  56 tests pass (was 54; +2).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
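
The tightened same-line rule from #75 can be pictured with a small regular-expression sketch. This covers only the `write_text(yaml.dump(...))` part of the heuristic and is an assumption about its shape, not the code in scripts/audit_writers.py; the name `writes_yaml_via_write_text` is hypothetical.

```python
import re

# A YAML serializer call must be the direct argument of write_text on one line.
_WRITE_OF_DUMP = re.compile(r"\.write_text\(\s*yaml\.(?:safe_)?dump\(")


def writes_yaml_via_write_text(source: str) -> bool:
    """True if any single line feeds yaml.dump / yaml.safe_dump into write_text."""
    return any(_WRITE_OF_DUMP.search(line) for line in source.splitlines())


safe_dump_case = writes_yaml_via_write_text("path.write_text(yaml.safe_dump(doc))")
dump_case = writes_yaml_via_write_text("path.write_text(yaml.dump(doc))")
# The old heuristic's false positive: write_text plus a stray ".yaml" token.
comment_case = writes_yaml_via_write_text("path.write_text(content)  # .yaml")
# A reader that loads YAML and writes JSON is not a YAML writer.
json_case = writes_yaml_via_write_text(
    "data = yaml.safe_load(src.read_text())\nout.write_text(json.dumps(data))"
)
```

The two positive and two negative cases mirror the contract the replacement tests are described as locking in.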

Copilot AI review requested due to automatic review settings May 23, 2026 08:25

Copilot started reviewing on behalf of realmarcin May 23, 2026 08:25

realmarcin mentioned this pull request May 23, 2026

Add validate-strict CI gate (G01) #62

Merged

4 tasks

Copilot AI reviewed May 23, 2026

realmarcin mentioned this pull request May 23, 2026

METPO proposal cohort metpo_traitmech_v1 (14 classes, local-only) #63

Merged

3 tasks

realmarcin merged commit 36b9fdd into main May 23, 2026
4 checks passed

realmarcin deleted the ground-causal-predicates-v1 branch May 23, 2026 20:46

realmarcin mentioned this pull request May 23, 2026

METPO predicate proposal cohort metpo_traitmech_v2 (8 predicates, local-only) #65

Merged

5 tasks

realmarcin mentioned this pull request May 24, 2026

Ground causal-graph nodes: 39-mapping cohort, 77 nodes grounded (38% → 45%) #66

Merged

4 tasks

realmarcin mentioned this pull request May 24, 2026

Tests for grounding pipeline + audit scripts (+43 tests) #71

Merged

3 tasks

realmarcin merged 1 commit into main from ground-causal-predicates-v1
