METPO node-class proposal cohort metpo_traitmech_v3 (6 classes, local-only) #69
Merged
Scope-D follow-on to v1 (causal-graph scaffolding) and v2 (causal-graph predicates). Lifts 6 microbe-trait-specific node-class abstractions that the node-grounding pipeline (#66) cannot reach because no clean CHEBI/GO/ENVO/PRO upstream home exists.

Cohort (proposals/metpo_traitmech_v3/, 6 class rows):
- METPO:1007500 proton motive force (16 nodes)
- METPO:1007501 microbial biomass (16 nodes)
- METPO:1007502 inorganic electron donor (7 nodes)
- METPO:1007503 reducing power (4 nodes)
- METPO:1007504 terminal electron acceptor (5 nodes, dual CHEMICAL+MF typing)
- METPO:1007505 membrane fluidity (6 nodes)

The 6 classes ground 54 nodes; an accompanying GO mapping (carotenoid biosynthesis → GO:0016117) grounds 4 more for a total of 58 newly-grounded nodes.

Each candidate's rejected upstream alternative documented per-row in proposal.md (PATO viscosity is generic, CHEBI lacks class-of role for electron-donor, GO has the PMF-generation process but not the gradient-as-state).

Four of these are mis-typed in the corpus (PMF/biomass/membrane fluidity as BIOLOGICAL_PROCESS though semantically a state / material entity / quality; reducing power as CHEMICAL though semantically a capacity). The mapping TSV grounds them at the current typing so the type-cleanup migration can land separately.

ID block: METPO:1007500-1007505 (fresh 1007500+ block per skill). Subset tag: metpo_traitmech_2026_07. All parented to METPO:1000000.

Mapping additions (mappings/node_grounding.tsv, 39 → 47 rows):
- 6 new METPO mappings (1007500-1007505)
- 1 dual-typing alias for terminal electron acceptor as MOLECULAR_FUNCTION
- 1 GO mapping for carotenoid biosynthesis → GO:0016117

Per-corpus impact:
  Nodes grounded: 564 → 622 (+58, 45% → 50%)
  Nodes residual: 688 → 630 (-58)
  Distinct keys: 511 → 503 (-8)

NOT filed upstream in this PR (per established convention from v1+v2). Cohort committed in-repo only.

Addresses Copilot review on the original PR #68 branch:
- Mapping-row count corrected from "7" / "10" to actual 8 rows (6 METPO + 1 GO + 1 dual-typing alias).
- Mis-typed-node count corrected from "Three" to "Four" (the four-item list now matches).
- Typo "dispoportionation" → "disproportionation".

Verified locally:
- just verify-proposal metpo_traitmech_v3 → PASS (0 failures)
- just robot-validate-proposal metpo_traitmech_v3 → PASS, ELK +6
- just validate-strict → 0 ERROR rows / 357 files
- just ground-nodes --apply (second run) → 0 additional groundings (idempotent)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
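The grounding behaviour this commit relies on (mapping rows keyed on label plus node type, and a second `--apply` run yielding 0 additional groundings) can be sketched roughly as below. This is an illustration only, not the repository's script: the column names (`label`, `node_type`, `target_curie`), the `curie` node field, and both function names are assumptions made for the example.

```python
import csv
import io

# Sample rows in the spirit of mappings/node_grounding.tsv.
# Column names are assumed for illustration; the real schema may differ.
SAMPLE_TSV = (
    "label\tnode_type\ttarget_curie\n"
    "proton motive force\tBIOLOGICAL_PROCESS\tMETPO:1007500\n"
    "terminal electron acceptor\tCHEMICAL\tMETPO:1007504\n"
    "terminal electron acceptor\tMOLECULAR_FUNCTION\tMETPO:1007504\n"
)


def load_mapping(tsv_text):
    """Return {(label, node_type): curie} from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        (row["label"].strip().lower(), row["node_type"].strip()): row["target_curie"].strip()
        for row in reader
    }


def ground_nodes(nodes, mapping):
    """Attach a CURIE to each ungrounded node; return how many were newly grounded."""
    newly_grounded = 0
    for node in nodes:
        if node.get("curie"):  # already grounded: never overwrite
            continue
        key = (node["label"].strip().lower(), node["node_type"])
        curie = mapping.get(key)
        if curie is not None:
            node["curie"] = curie
            newly_grounded += 1
    return newly_grounded


nodes = [
    {"label": "Proton motive force", "node_type": "BIOLOGICAL_PROCESS"},
    {"label": "terminal electron acceptor", "node_type": "MOLECULAR_FUNCTION"},
    {"label": "unmapped label", "node_type": "CHEMICAL"},
]
mapping = load_mapping(SAMPLE_TSV)
first_pass = ground_nodes(nodes, mapping)   # grounds the two mapped nodes
second_pass = ground_nodes(nodes, mapping)  # nothing left to ground
```

Because the dual-typing alias is simply a second row with a different `node_type`, the same label can be grounded under either typing without special-casing, and the unmapped node stays in the residual.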
Pull request overview
Adds a new local-only METPO proposal cohort (metpo_traitmech_v3) to lift 6 microbe-trait-specific node-class abstractions into METPO placeholder IDs, and expands node-grounding mappings to ground previously residual causal-graph nodes across the TraitMech corpus.
Changes:
- Add `proposals/metpo_traitmech_v3/` (narrative + ROBOT-template TSV) proposing 6 new METPO classes (METPO:1007500–METPO:1007505).
- Extend `mappings/node_grounding.tsv` with 8 new mapping rows (6 METPO + 1 GO + 1 dual-typing alias) and apply them across trait YAML causal-graph nodes.
- Update `reports/node_grounding_residual.tsv` to reflect newly grounded residual clusters.
Reviewed changes
Copilot reviewed 46 out of 46 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| reports/node_grounding_residual.tsv | Removes now-grounded residual keys from the residual report. |
| proposals/metpo_traitmech_v3/proposal.md | Documents rationale/scope/impact for the v3 METPO node-class cohort. |
| proposals/metpo_traitmech_v3/metpo_proposal_classes_robot.tsv | ROBOT-template TSV defining the 6 proposed METPO classes. |
| mappings/node_grounding.tsv | Adds GO + METPO mappings (incl. dual-typed terminal electron acceptor) for node grounding. |
| data/traits/physiology/phototrophic.yaml | Grounds the reducing power causal node to METPO:1007503 and records curation history. |
| data/traits/physiology/photoorganoheterotrophic.yaml | Grounds proton motive force/biomass causal nodes to METPO and records curation history. |
| data/traits/physiology/photolithotrophic.yaml | Grounds inorganic electron donor/biomass causal nodes to METPO and records curation history. |
| data/traits/physiology/photolithoautotrophic.yaml | Grounds inorganic electron donor/reducing power/biomass causal nodes to METPO and records curation history. |
| data/traits/physiology/photoheterotrophic.yaml | Grounds biomass causal node to METPO:1007501 and records curation history. |
| data/traits/physiology/photoautotrophic.yaml | Grounds biomass causal node to METPO:1007501 and records curation history. |
| data/traits/physiology/organotrophic.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/physiology/organoheterotrophic.yaml | Grounds biomass causal node to METPO:1007501 and records curation history. |
| data/traits/physiology/mixotrophic.yaml | Grounds biomass causal node to METPO:1007501 and records curation history. |
| data/traits/physiology/lithotrophic.yaml | Grounds inorganic electron donor and proton motive force causal nodes to METPO and records curation history. |
| data/traits/physiology/lithoheterotrophic.yaml | Grounds inorganic electron donor/biomass causal nodes to METPO and records curation history. |
| data/traits/physiology/lithoautotrophic.yaml | Grounds inorganic electron donor/reducing power causal nodes to METPO and records curation history. |
| data/traits/physiology/hydrogenotrophic.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/physiology/heterotrophic.yaml | Grounds biomass causal node to METPO:1007501 and records curation history. |
| data/traits/physiology/chemotrophic.yaml | Grounds terminal electron acceptor/proton motive force causal nodes to METPO and records curation history. |
| data/traits/physiology/chemoorganotrophic.yaml | Grounds terminal electron acceptor/proton motive force/biomass causal nodes to METPO and records curation history. |
| data/traits/physiology/chemoorganoheterotrophic.yaml | Grounds biomass causal node to METPO:1007501 and records curation history. |
| data/traits/physiology/chemolithoheterotrophic.yaml | Grounds proton motive force/biomass causal nodes to METPO and records curation history. |
| data/traits/physiology/chemolithoautotrophic.yaml | Grounds inorganic electron donor/proton motive force/reducing power/biomass causal nodes to METPO and records curation history. |
| data/traits/physiology/chemoheterotrophic.yaml | Grounds biomass causal node to METPO:1007501 and records curation history. |
| data/traits/physiology/chemoautolithotrophic.yaml | Grounds inorganic electron donor/biomass causal nodes to METPO and records curation history. |
| data/traits/physiology/carboxydotrophic.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/morphology/yellow_pigmented.yaml | Grounds carotenoid biosynthesis causal node to GO:0016117 and records curation history. |
| data/traits/morphology/pink_pigmented.yaml | Grounds carotenoid biosynthesis causal node to GO:0016117 and records curation history. |
| data/traits/morphology/orange_pigmented.yaml | Grounds carotenoid biosynthesis causal node to GO:0016117 and records curation history. |
| data/traits/morphology/carotenoid_pigmentation.yaml | Grounds carotenoid biosynthesis causal node to GO:0016117 and records curation history. |
| data/traits/morphology/gliding.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/metabolism/respiration.yaml | Grounds terminal electron acceptor/proton motive force causal nodes to METPO and records curation history. |
| data/traits/metabolism/oxidative_phosphorylation.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/metabolism/anaerobic_respiration.yaml | Grounds terminal electron acceptor (MF-typed) to METPO:1007504, wraps long text lines, and records curation history. |
| data/traits/metabolism/aerobic_respiration.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/environment/temperature_preference.yaml | Grounds membrane fluidity causal node to METPO:1007505 and records curation history. |
| data/traits/environment/temperature_optimum.yaml | Grounds membrane fluidity causal node to METPO:1007505 and records curation history. |
| data/traits/environment/psychrotolerant.yaml | Grounds membrane fluidity causal node to METPO:1007505 and records curation history. |
| data/traits/environment/psychrophilic.yaml | Grounds membrane fluidity causal node to METPO:1007505 and records curation history. |
| data/traits/environment/ph_optimum.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/environment/obligately_aerobic.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/environment/neutrophilic.yaml | Grounds proton motive force causal node to METPO:1007500 and records curation history. |
| data/traits/environment/mesophilic.yaml | Grounds membrane fluidity causal node to METPO:1007505 and records curation history. |
| data/traits/environment/facultative_psychrophilic.yaml | Grounds membrane fluidity causal node to METPO:1007505 and records curation history. |
| data/traits/environment/aerobic.yaml | Grounds terminal electron acceptor (MF-typed) to METPO:1007504, wraps long text lines, and records curation history. |
Comments suppressed due to low confidence (1)
proposals/metpo_traitmech_v3/proposal.md:152
- The “Mappings TSV size” row appears to be using the mapping-row count (excluding the header), but the after value is listed as 49. `mappings/node_grounding.tsv` currently has 47 mapping rows (plus 1 header row), so this should read 39 → 47 to match the actual file.
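The distinction the comment draws (data rows versus data rows plus header) is easy to check mechanically. A minimal sketch, assuming a tab-separated file with exactly one header row; the function name and sample column names are hypothetical:

```python
import csv
import io


def count_mapping_rows(tsv_text):
    """Count non-blank data rows, excluding the single header row."""
    rows = [
        row
        for row in csv.reader(io.StringIO(tsv_text), delimiter="\t")
        if any(cell.strip() for cell in row)
    ]
    return max(len(rows) - 1, 0)


# Header + two data rows, with a stray blank line that must not be counted.
sample = (
    "label\tnode_type\ttarget_curie\n"
    "membrane fluidity\tBIOLOGICAL_PROCESS\tMETPO:1007505\n"
    "\n"
    "reducing power\tCHEMICAL\tMETPO:1007503\n"
)
data_rows = count_mapping_rows(sample)
```

Running the same count over the real file's text is one way to confirm whether a documented "after" value includes the header row.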
Comment on lines +16 to +19
`data/traits/**/causal_graphs[].nodes[]`. As of this proposal,
**622 of 1252 causal-graph nodes are grounded** (50%) using 39
CHEBI / GO / ENVO / PATO mappings from #66 plus the 7 additions in
this cohort (6 METPO + 1 GO).
This was referenced May 24, 2026
realmarcin added a commit that referenced this pull request May 24, 2026

…0 nodes) (#74)

Adds 14 mappings across ENVIRONMENTAL_FACTOR (PATO + ENVO), CHEMICAL (CHEBI), and PATHWAY (GO) to ground the top remaining groundable labels in reports/node_grounding_residual.tsv after #66 (39 base) + #69 (8 METPO) + #70 (27 UniProt) + #72 (biomass retype).

Additions (mappings/node_grounding.tsv, 75 → 89 rows):

ENVIRONMENTAL_FACTOR (5 rows, PATO + ENVO):
  acidic external ph → PATO:0001428 acidic pH (5 nodes)
  alkaline external ph → PATO:0001429 alkaline pH (5 nodes)
  near-neutral external ph → PATO:0001432 neutral pH (4 nodes)
  very high temperature → PATO:0001637 extremely high temperature (2 nodes)
  high-salt environment → ENVO:01000687 saline environment (2 nodes)

CHEMICAL (3 rows, CHEBI):
  thiosulfate → CHEBI:33542 thiosulfate(2-) (2 nodes)
  electron donor → CHEBI:17499 electron donor (2 nodes)
  organic compound → CHEBI:50860 organic molecular entity (6 nodes)

PATHWAY (6 rows, GO):
  membrane electron transport chain → GO:0022900 ETC (3 nodes)
  electron transport chain → GO:0022900 (1 node)
  electron transport system → GO:0022900 (1 node)
  co2-fixation pathway → GO:0015977 carbon fixation (3 nodes)
  autotrophic co2 fixation → GO:0015977 (3 nodes)
  co2 fixation pathway → GO:0015977 (1 node)

The PATHWAY additions collapse 6 distinct corpus-paraphrased labels onto 2 GO terms, demonstrating that the (label, node_type) mapping convention supports multi-label-→-one-CURIE without conflict.

Per-corpus impact:
  Mapping TSV: 75 → 89 rows (+14)
  Nodes grounded: ~704 (53%) → ~744 (59%)

Verified:
- just ground-nodes --apply → 40 newly grounded
- just ground-nodes (idempotency) → 0
- just validate-strict → 0 ERROR rows / 357 files

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
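The many-labels-to-one-CURIE property noted in this commit follows from keying on (label, node_type): several keys may share a value, while one key pointing at two different CURIEs is a genuine conflict. A minimal sketch under that assumption; `build_mapping` is a hypothetical helper written for this example, not the repository's loader, and `GO:0000000` is a deliberately bogus placeholder used only to trigger the conflict:

```python
def build_mapping(rows):
    """Build {(label, node_type): curie}; reject one key mapped to two CURIEs."""
    mapping = {}
    for label, node_type, curie in rows:
        key = (label.strip().lower(), node_type)
        previous = mapping.get(key)
        if previous is not None and previous != curie:
            raise ValueError(f"{key!r} maps to both {previous} and {curie}")
        mapping[key] = curie
    return mapping


# The six PATHWAY rows listed in the commit message.
pathway_rows = [
    ("membrane electron transport chain", "PATHWAY", "GO:0022900"),
    ("electron transport chain", "PATHWAY", "GO:0022900"),
    ("electron transport system", "PATHWAY", "GO:0022900"),
    ("co2-fixation pathway", "PATHWAY", "GO:0015977"),
    ("autotrophic co2 fixation", "PATHWAY", "GO:0015977"),
    ("co2 fixation pathway", "PATHWAY", "GO:0015977"),
]
mapping = build_mapping(pathway_rows)
distinct_curies = set(mapping.values())  # six labels collapse onto two GO terms

# A second CURIE for an existing key is a conflict, not an alias.
try:
    build_mapping(pathway_rows + [("electron transport chain", "PATHWAY", "GO:0000000")])
    conflict_detected = False
except ValueError:
    conflict_detected = True
```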
realmarcin added a commit that referenced this pull request May 24, 2026

* Add tests for grounding pipeline + audit scripts (+43 tests)

The grounding pipelines and audit scripts have been load-bearing infrastructure for the last 7 PRs (#61, #66, #67, #69, #70, all of which rewrite causal-graph fields based on these scripts' output). They had zero unit-test coverage. A silent regression in idempotency, header validation, or self-suppression would not be caught by validate-strict (which only checks per-record schema conformance, not pipeline correctness).

Test counts:
  tests/test_ground_causal_predicates.py   9 tests
  tests/test_ground_causal_nodes.py       12 tests
  tests/test_validate_strict.py           11 tests
  tests/test_audit_writers.py             11 tests
  ---
  total new                               43 tests
  total suite                             54 tests (was 11)

Coverage highlights:

ground_causal_predicates.py:
- load_mapping: basic happy path, conflict detection (same label → different CURIEs raises ValueError), incomplete-row skipping, missing-file error.
- ground_edges_in_doc: idempotency (second pass = 0 changes), existing predicate_id never overwritten, residual counting for unmapped labels, empty/missing-predicate edges skipped.

ground_causal_nodes.py:
- All of the predicate suite plus:
- (label, node_type) keyed lookup: same label, different node_types map to different CURIEs without aliasing.
- Header validation (Copilot fix from PR #66): TSV with `nodetype` / `targetcurie` typo'd headers raises ValueError naming both missing columns.
- grounded_keys-on-validation-failure separability (Copilot fix from PR #66): caller can union residual + grounded_keys to recover the corpus-state residual after rolling back an invalid file write.

validate_strict.py:
- classify: parametrized over the 5 categories (unexpected_field, missing_required, enum_mismatch, pattern_mismatch, other); the messages must match the actual jsonschema phrasings the validator emits.
- validate_one: clean record produces 0 errors; unknown field surfaces unexpected_field (the G01 gate behavior); missing required field surfaces missing_required; YAML parse error surfaces as yaml_parse_error category.
- iter_yaml_files: walks directories, filters .txt, picks up nested *.yaml.

audit_writers.py:
- looks_like_yaml_writer: yaml.safe_dump / yaml.dump positive, bare .write_text negative, .write_text near .yaml hint positive, arbitrary code negative.
- audit: full-safeguards writer flagged yes/yes/yes/yes; no-safeguards writer flagged no/no/no; non-writer returns None; wired_into_just yes when justfile mentions the script stem.
- Self-suppression (Copilot fix from PR #64): audit_writers.py itself returns None even though its own source matches yaml.safe_dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #71

Add explicit `assert "b.yml" not in names` to test_iter_yaml_files_walks_directory_and_filters; the prior test documented the .yml-skipping behavior in a comment but never asserted it, so a regression that started picking up .yml during directory walks would have slipped through silently.

Also add test_iter_yaml_files_accepts_yml_file_passed_directly to lock in the asymmetry that the previous test only hinted at: iter_yaml_files() does accept .yml when passed as a file argument (only the rglob('*.yaml') walk is .yaml-only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update audit_writers tests to match #75's tightened heuristic

PR #75 changed `looks_like_yaml_writer` to require that the yaml-serializer call feed directly into write_text on the same line (instead of the looser "any .write_text + any .yaml token" heuristic, which produced false positives for scripts that only READ trait YAMLs).

The pre-#75 test asserted that `path.write_text(content) # .yaml` counted as a YAML writer. That returned True under the old heuristic and False under the new (correct) one. Replace it with two tests that lock in the new contract:

  test_looks_like_yaml_writer_write_text_of_yaml_dump
    Positive: write_text(yaml.safe_dump(...)) / write_text(yaml.dump(...)) both count.

  test_looks_like_yaml_writer_write_text_of_json_is_false
    Negative: a script that reads *.yaml then writes JSON via write_text is NOT a YAML writer; this is the false-positive case #75 explicitly fixed for scripts/build_embedding_index.py and scripts/render_trait_pages.py.

Also rename test_looks_like_yaml_writer_write_text_without_yaml_hint_is_false to test_looks_like_yaml_writer_write_text_plain_is_false since the "yaml hint" phrasing was tied to the old heuristic.

56 tests pass (was 54; +2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
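The tightened "serializer feeds write_text on the same line" rule can be approximated with a per-line regular expression. The sketch below is not the code from PR #75: the function name `write_text_feeds_yaml_dump` and the pattern are assumptions chosen to reproduce the three cases the commit message describes, and the real `looks_like_yaml_writer` may apply additional checks.

```python
import re

# Matches `.write_text(` immediately followed by a yaml.dump / yaml.safe_dump call.
_WRITE_TEXT_OF_YAML_DUMP = re.compile(r"\.write_text\(\s*yaml\.(?:safe_dump|dump)\s*\(")


def write_text_feeds_yaml_dump(source):
    """True if any single line passes a yaml serializer call straight into write_text."""
    return any(_WRITE_TEXT_OF_YAML_DUMP.search(line) for line in source.splitlines())


# Positive: the serializer output is written directly.
writer_src = "out.write_text(yaml.safe_dump(doc, sort_keys=False))\n"

# Negative: a trailing `.yaml` comment no longer counts as a hint.
comment_src = "path.write_text(content)  # .yaml\n"

# Negative: reads YAML elsewhere, but writes JSON.
json_src = (
    "doc = yaml.safe_load(Path('trait.yaml').read_text())\n"
    "index.write_text(json.dumps(doc))\n"
)

results = (
    write_text_feeds_yaml_dump(writer_src),
    write_text_feeds_yaml_dump(comment_src),
    write_text_feeds_yaml_dump(json_src),
)
```

A line-based pattern like this will miss calls that are split across lines by a formatter; whether the real heuristic handles that case is not stated in the commit message.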
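Summary
Scope-D follow-on to v1 (causal-graph scaffolding, #63) and v2 (causal-graph predicates, #65). Lifts 6 microbe-trait-specific node-class abstractions (`METPO:1007500`–`METPO:1007505`) that the node-grounding pipeline in #66 cannot reach because no clean CHEBI / GO / ENVO / PRO upstream home exists.

Not filed upstream in this PR; the cohort lives in this repo only, matching the v1+v2 pattern.

Cohort (`proposals/metpo_traitmech_v3/`)
| ID | Label | Nodes grounded |
|---|---|---|
| `METPO:1007500` | proton motive force | 16 |
| `METPO:1007501` | microbial biomass | 16 |
| `METPO:1007502` | inorganic electron donor | 7 |
| `METPO:1007503` | reducing power | 4 |
| `METPO:1007504` | terminal electron acceptor (dual CHEMICAL+MF typing) | 5 |
| `METPO:1007505` | membrane fluidity | 6 |

The 6 classes ground 54 nodes; an accompanying GO mapping (`carotenoid biosynthesis` → `GO:0016117`) grounds 4 more, for 58 newly-grounded nodes total. Each candidate's rejected upstream alternative is documented per-row in `proposal.md`.

Four of these are mis-typed in the corpus (PMF/biomass/membrane fluidity as `BIOLOGICAL_PROCESS` though semantically a state / material entity / quality; reducing power as `CHEMICAL` though semantically a capacity). The mapping TSV grounds them at the current typing so the type-cleanup migration can land separately.

Mapping expansion
`mappings/node_grounding.tsv` grows 39 → 47 rows:
- 6 new METPO mappings (`METPO:1007500`–`METPO:1007505`)
- 1 dual-typing alias for `terminal electron acceptor` as `MOLECULAR_FUNCTION`
- 1 GO mapping (`carotenoid biosynthesis` → `GO:0016117`)

Corpus impact
| Metric | Before | After | Change |
|---|---|---|---|
| Nodes grounded | 564 (45%) | 622 (50%) | +58 |
| Nodes residual | 688 | 630 | -58 |
| Distinct keys | 511 | 503 | -8 |

Copilot fixes (from PR #68)
- Mapping-row count corrected from "7" / "10" to the actual 8 rows (6 METPO + 1 GO + 1 dual-typing alias).
- Mis-typed-node count corrected from "Three" to "Four" (the four-item list now matches).
- Typo `dispoportionation` → `disproportionation`.

Verification (local)
- `just verify-proposal metpo_traitmech_v3` → PASS (0 failures)
- `just robot-validate-proposal metpo_traitmech_v3` → PASS, ELK +6
- `just validate-strict` → 0 ERROR rows / 357 files
- `just ground-nodes --apply` (second run) → 0 additional groundings (idempotent)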
Test plan
🤖 Generated with Claude Code