Skip to content

METPO node-class proposal cohort metpo_traitmech_v3 (6 classes, local-only)#69

Merged
realmarcin merged 1 commit into
mainfrom
metpo-proposal-traitmech-v3
May 24, 2026
Merged

METPO node-class proposal cohort metpo_traitmech_v3 (6 classes, local-only)#69
realmarcin merged 1 commit into
mainfrom
metpo-proposal-traitmech-v3

Conversation

@realmarcin
Copy link
Copy Markdown
Contributor

Replaces #68 (auto-closed when its stacked base branch ground-causal-nodes-v1 was deleted on merge of #66). Same content + Copilot review fixes applied.

Summary

Scope-D follow-on to v1 (causal-graph scaffolding, #63) and v2 (causal-graph predicates, #65). Lifts 6 microbe-trait-specific node-class abstractions (METPO:1007500METPO:1007505) that the node-grounding pipeline in #66 cannot reach because no clean CHEBI / GO / ENVO / PRO upstream home exists.

Not filed upstream in this PR — cohort lives in this repo only, matching the v1+v2 pattern.

Cohort (proposals/metpo_traitmech_v3/)

METPO ID Label Nodes Cluster
METPO:1007500 proton motive force 16 bioenergetic state — electrochemical proton gradient
METPO:1007501 microbial biomass 16 material entity — carbon-assimilation sink
METPO:1007502 inorganic electron donor 7 class-of-chemicals — lithotrophic substrates
METPO:1007503 reducing power 4 metabolic capacity — reduced-cofactor pool
METPO:1007504 terminal electron acceptor 5 role-of-chemical — end of ETC (dual CHEMICAL + MF node-type uses)
METPO:1007505 membrane fluidity 6 membrane quality — modulated by temperature / lipid composition

The 6 classes ground 54 nodes; an accompanying GO mapping (carotenoid biosynthesisGO:0016117) grounds 4 more, for 58 newly-grounded nodes total. Each candidate's rejected upstream alternative is documented per-row in proposal.md.

Four of these are mis-typed in the corpus (PMF/biomass/membrane fluidity as BIOLOGICAL_PROCESS though semantically a state / material entity / quality; reducing power as CHEMICAL though semantically a capacity). The mapping TSV grounds them at the current typing so the type-cleanup migration can land separately.

Mapping expansion

mappings/node_grounding.tsv grows 39 → 47 rows:

  • 6 new METPO mappings (METPO:1007500METPO:1007505)
  • 1 dual-typing alias for terminal electron acceptor as MOLECULAR_FUNCTION
  • 1 GO mapping (carotenoid biosynthesisGO:0016117)

Corpus impact

Before v3 After v3
Nodes grounded 564 622 (+58, 45% → 50%)
Nodes residual 688 630 (−58)
Distinct (label, type) residuals 511 503 (−8)

Copilot fixes (from PR #68)

  1. Mapping-row count corrected: "7 additions" → 8 rows (6 METPO + 1 GO + 1 dual-typing alias).
  2. Mis-typed-node count corrected: "Three of these" → "Four" (matches the 4-item list).
  3. Typo: dispoportionationdisproportionation.

Verification (local)

$ just verify-proposal metpo_traitmech_v3
  failures: 0
  status:   PASS

$ just robot-validate-proposal metpo_traitmech_v3
  merged.owl lines:    8425
  reasoned.owl lines:  8431
  delta:               +6
  status:              PASS (no UNSAT, ELK exited 0)

$ just validate-strict
  files scanned:      357
  files with ERROR:   0

Test plan

  • verify-proposal PASS
  • robot-validate-proposal PASS
  • validate-strict clean
  • Idempotency: ground-nodes --apply (2nd run) → 0 additional groundings
  • CI re-runs validate-strict on YAML diff

🤖 Generated with Claude Code

Scope-D follow-on to v1 (causal-graph scaffolding) and v2
(causal-graph predicates). Lifts 6 microbe-trait-specific
node-class abstractions that the node-grounding pipeline (#66)
cannot reach because no clean CHEBI/GO/ENVO/PRO upstream home
exists.

Cohort (proposals/metpo_traitmech_v3/, 6 class rows):
- METPO:1007500 proton motive force        (16 nodes)
- METPO:1007501 microbial biomass          (16 nodes)
- METPO:1007502 inorganic electron donor   (7 nodes)
- METPO:1007503 reducing power             (4 nodes)
- METPO:1007504 terminal electron acceptor (5 nodes, dual CHEMICAL+MF typing)
- METPO:1007505 membrane fluidity          (6 nodes)

The 6 classes ground 54 nodes; an accompanying GO mapping
(carotenoid biosynthesis → GO:0016117) grounds 4 more for a total
of 58 newly-grounded nodes. Each candidate's rejected upstream
alternative documented per-row in proposal.md (PATO viscosity is
generic, CHEBI lacks class-of role for electron-donor, GO has the
PMF-generation process but not the gradient-as-state).

Four of these are mis-typed in the corpus (PMF/biomass/membrane
fluidity as BIOLOGICAL_PROCESS though semantically a state /
material entity / quality; reducing power as CHEMICAL though
semantically a capacity). The mapping TSV grounds them at the
current typing so the type-cleanup migration can land separately.

ID block: METPO:1007500-1007505 (fresh 1007500+ block per skill).
Subset tag: metpo_traitmech_2026_07. All parented to METPO:1000000.

Mapping additions (mappings/node_grounding.tsv, 39 → 47 rows):
- 6 new METPO mappings (1007500-1007505)
- 1 dual-typing alias for terminal electron acceptor as MOLECULAR_FUNCTION
- 1 GO mapping for carotenoid biosynthesis → GO:0016117

Per-corpus impact:
  Nodes grounded:   564 → 622 (+58, 45% → 50%)
  Nodes residual:   688 → 630 (-58)
  Distinct keys:    511 → 503 (-8)

NOT filed upstream in this PR (per established convention from
v1+v2). Cohort committed in-repo only.

Addresses Copilot review on the original PR #68 branch:
- Mapping-row count corrected from "7" / "10" to actual 8 rows
  (6 METPO + 1 GO + 1 dual-typing alias).
- Mis-typed-node count corrected from "Three" to "Four" (the
  four-item list now matches).
- Typo "dispoportionation" → "disproportionation".

Verified locally:
  - just verify-proposal metpo_traitmech_v3 → PASS (0 failures)
  - just robot-validate-proposal metpo_traitmech_v3 → PASS, ELK +6
  - just validate-strict → 0 ERROR rows / 357 files
  - just ground-nodes --apply (second run) → 0 additional groundings (idempotent)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 24, 2026 04:43
@realmarcin realmarcin merged commit e713b68 into main May 24, 2026
3 checks passed
@realmarcin realmarcin deleted the metpo-proposal-traitmech-v3 branch May 24, 2026 04:44
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new local-only METPO proposal cohort (metpo_traitmech_v3) to lift 6 microbe-trait-specific node-class abstractions into METPO placeholder IDs, and expands node-grounding mappings to ground previously residual causal-graph nodes across the TraitMech corpus.

Changes:

  • Add proposals/metpo_traitmech_v3/ (narrative + ROBOT-template TSV) proposing 6 new METPO classes (METPO:1007500METPO:1007505).
  • Extend mappings/node_grounding.tsv with 8 new mapping rows (6 METPO + 1 GO + 1 dual-typing alias) and apply them across trait YAML causal-graph nodes.
  • Update reports/node_grounding_residual.tsv to reflect newly grounded residual clusters.

Reviewed changes

Copilot reviewed 46 out of 46 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
reports/node_grounding_residual.tsv Removes now-grounded residual keys from the residual report.
proposals/metpo_traitmech_v3/proposal.md Documents rationale/scope/impact for the v3 METPO node-class cohort.
proposals/metpo_traitmech_v3/metpo_proposal_classes_robot.tsv ROBOT-template TSV defining the 6 proposed METPO classes.
mappings/node_grounding.tsv Adds GO + METPO mappings (incl. dual-typed terminal electron acceptor) for node grounding.
data/traits/physiology/phototrophic.yaml Grounds the reducing power causal node to METPO:1007503 and records curation history.
data/traits/physiology/photoorganoheterotrophic.yaml Grounds proton motive force/biomass causal nodes to METPO and records curation history.
data/traits/physiology/photolithotrophic.yaml Grounds inorganic electron donor/biomass causal nodes to METPO and records curation history.
data/traits/physiology/photolithoautotrophic.yaml Grounds inorganic electron donor/reducing power/biomass causal nodes to METPO and records curation history.
data/traits/physiology/photoheterotrophic.yaml Grounds biomass causal node to METPO:1007501 and records curation history.
data/traits/physiology/photoautotrophic.yaml Grounds biomass causal node to METPO:1007501 and records curation history.
data/traits/physiology/organotrophic.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/physiology/organoheterotrophic.yaml Grounds biomass causal node to METPO:1007501 and records curation history.
data/traits/physiology/mixotrophic.yaml Grounds biomass causal node to METPO:1007501 and records curation history.
data/traits/physiology/lithotrophic.yaml Grounds inorganic electron donor and proton motive force causal nodes to METPO and records curation history.
data/traits/physiology/lithoheterotrophic.yaml Grounds inorganic electron donor/biomass causal nodes to METPO and records curation history.
data/traits/physiology/lithoautotrophic.yaml Grounds inorganic electron donor/reducing power causal nodes to METPO and records curation history.
data/traits/physiology/hydrogenotrophic.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/physiology/heterotrophic.yaml Grounds biomass causal node to METPO:1007501 and records curation history.
data/traits/physiology/chemotrophic.yaml Grounds terminal electron acceptor/proton motive force causal nodes to METPO and records curation history.
data/traits/physiology/chemoorganotrophic.yaml Grounds terminal electron acceptor/proton motive force/biomass causal nodes to METPO and records curation history.
data/traits/physiology/chemoorganoheterotrophic.yaml Grounds biomass causal node to METPO:1007501 and records curation history.
data/traits/physiology/chemolithoheterotrophic.yaml Grounds proton motive force/biomass causal nodes to METPO and records curation history.
data/traits/physiology/chemolithoautotrophic.yaml Grounds inorganic electron donor/proton motive force/reducing power/biomass causal nodes to METPO and records curation history.
data/traits/physiology/chemoheterotrophic.yaml Grounds biomass causal node to METPO:1007501 and records curation history.
data/traits/physiology/chemoautolithotrophic.yaml Grounds inorganic electron donor/biomass causal nodes to METPO and records curation history.
data/traits/physiology/carboxydotrophic.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/morphology/yellow_pigmented.yaml Grounds carotenoid biosynthesis causal node to GO:0016117 and records curation history.
data/traits/morphology/pink_pigmented.yaml Grounds carotenoid biosynthesis causal node to GO:0016117 and records curation history.
data/traits/morphology/orange_pigmented.yaml Grounds carotenoid biosynthesis causal node to GO:0016117 and records curation history.
data/traits/morphology/carotenoid_pigmentation.yaml Grounds carotenoid biosynthesis causal node to GO:0016117 and records curation history.
data/traits/morphology/gliding.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/metabolism/respiration.yaml Grounds terminal electron acceptor/proton motive force causal nodes to METPO and records curation history.
data/traits/metabolism/oxidative_phosphorylation.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/metabolism/anaerobic_respiration.yaml Grounds terminal electron acceptor (MF-typed) to METPO:1007504, wraps long text lines, and records curation history.
data/traits/metabolism/aerobic_respiration.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/environment/temperature_preference.yaml Grounds membrane fluidity causal node to METPO:1007505 and records curation history.
data/traits/environment/temperature_optimum.yaml Grounds membrane fluidity causal node to METPO:1007505 and records curation history.
data/traits/environment/psychrotolerant.yaml Grounds membrane fluidity causal node to METPO:1007505 and records curation history.
data/traits/environment/psychrophilic.yaml Grounds membrane fluidity causal node to METPO:1007505 and records curation history.
data/traits/environment/ph_optimum.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/environment/obligately_aerobic.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/environment/neutrophilic.yaml Grounds proton motive force causal node to METPO:1007500 and records curation history.
data/traits/environment/mesophilic.yaml Grounds membrane fluidity causal node to METPO:1007505 and records curation history.
data/traits/environment/facultative_psychrophilic.yaml Grounds membrane fluidity causal node to METPO:1007505 and records curation history.
data/traits/environment/aerobic.yaml Grounds terminal electron acceptor (MF-typed) to METPO:1007504, wraps long text lines, and records curation history.
Comments suppressed due to low confidence (1)

proposals/metpo_traitmech_v3/proposal.md:152

  • The “Mappings TSV size” row appears to be using the mapping-row count (excluding the header), but the after value is listed as 49. mappings/node_grounding.tsv currently has 47 mapping rows (plus 1 header row), so this should read 39 → 47 to match the actual file.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +16 to +19
`data/traits/**/causal_graphs[].nodes[]`. As of this proposal,
**622 of 1252 causal-graph nodes are grounded** (50%) using 39
CHEBI / GO / ENVO / PATO mappings from #66 plus the 7 additions in
this cohort (6 METPO + 1 GO).
realmarcin added a commit that referenced this pull request May 24, 2026
…0 nodes)

Adds 14 mappings across ENVIRONMENTAL_FACTOR (PATO + ENVO),
CHEMICAL (CHEBI), and PATHWAY (GO) to ground the top remaining
groundable labels in reports/node_grounding_residual.tsv after
#66 (39 base) + #69 (8 METPO) + #70 (27 UniProt) + #72 (biomass
retype).

Additions (mappings/node_grounding.tsv, 75 → 89 rows):

  ENVIRONMENTAL_FACTOR (5 rows, PATO + ENVO):
    acidic external ph        → PATO:0001428  acidic pH        (5 nodes)
    alkaline external ph      → PATO:0001429  alkaline pH      (5 nodes)
    near-neutral external ph  → PATO:0001432  neutral pH       (4 nodes)
    very high temperature     → PATO:0001637  extremely high temperature (2 nodes)
    high-salt environment     → ENVO:01000687 saline environment (2 nodes)

  CHEMICAL (3 rows, CHEBI):
    thiosulfate               → CHEBI:33542  thiosulfate(2-)        (2 nodes)
    electron donor            → CHEBI:17499  electron donor          (2 nodes)
    organic compound          → CHEBI:50860  organic molecular entity (6 nodes)

  PATHWAY (6 rows, GO):
    membrane electron transport chain → GO:0022900 ETC (3 nodes)
    electron transport chain          → GO:0022900     (1 node)
    electron transport system         → GO:0022900     (1 node)
    co2-fixation pathway              → GO:0015977 carbon fixation (3 nodes)
    autotrophic co2 fixation          → GO:0015977     (3 nodes)
    co2 fixation pathway              → GO:0015977     (1 node)

The PATHWAY additions collapse 6 distinct corpus-paraphrased
labels onto 2 GO terms, demonstrating that the
(label, node_type) mapping convention supports
multi-label-→-one-CURIE without conflict.

Per-corpus impact:
  Mapping TSV:      75 → 89 rows (+14)
  Nodes grounded:   ~704 (53%) → ~744 (59%)

Verified:
  - just ground-nodes --apply → 40 newly grounded
  - just ground-nodes (idempotency) → 0
  - just validate-strict → 0 ERROR rows / 357 files

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realmarcin added a commit that referenced this pull request May 24, 2026
…0 nodes) (#74)

Adds 14 mappings across ENVIRONMENTAL_FACTOR (PATO + ENVO),
CHEMICAL (CHEBI), and PATHWAY (GO) to ground the top remaining
groundable labels in reports/node_grounding_residual.tsv after
#66 (39 base) + #69 (8 METPO) + #70 (27 UniProt) + #72 (biomass
retype).

Additions (mappings/node_grounding.tsv, 75 → 89 rows):

  ENVIRONMENTAL_FACTOR (5 rows, PATO + ENVO):
    acidic external ph        → PATO:0001428  acidic pH        (5 nodes)
    alkaline external ph      → PATO:0001429  alkaline pH      (5 nodes)
    near-neutral external ph  → PATO:0001432  neutral pH       (4 nodes)
    very high temperature     → PATO:0001637  extremely high temperature (2 nodes)
    high-salt environment     → ENVO:01000687 saline environment (2 nodes)

  CHEMICAL (3 rows, CHEBI):
    thiosulfate               → CHEBI:33542  thiosulfate(2-)        (2 nodes)
    electron donor            → CHEBI:17499  electron donor          (2 nodes)
    organic compound          → CHEBI:50860  organic molecular entity (6 nodes)

  PATHWAY (6 rows, GO):
    membrane electron transport chain → GO:0022900 ETC (3 nodes)
    electron transport chain          → GO:0022900     (1 node)
    electron transport system         → GO:0022900     (1 node)
    co2-fixation pathway              → GO:0015977 carbon fixation (3 nodes)
    autotrophic co2 fixation          → GO:0015977     (3 nodes)
    co2 fixation pathway              → GO:0015977     (1 node)

The PATHWAY additions collapse 6 distinct corpus-paraphrased
labels onto 2 GO terms, demonstrating that the
(label, node_type) mapping convention supports
multi-label-→-one-CURIE without conflict.

Per-corpus impact:
  Mapping TSV:      75 → 89 rows (+14)
  Nodes grounded:   ~704 (53%) → ~744 (59%)

Verified:
  - just ground-nodes --apply → 40 newly grounded
  - just ground-nodes (idempotency) → 0
  - just validate-strict → 0 ERROR rows / 357 files

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realmarcin added a commit that referenced this pull request May 24, 2026
The grounding pipelines and audit scripts have been load-bearing
infrastructure for the last 7 PRs (#61, #66, #67, #69, #70 — all
of which rewrite causal-graph fields based on these scripts'
output). They had zero unit-test coverage. A silent regression in
idempotency, header validation, or self-suppression would not be
caught by validate-strict (which only checks per-record schema
conformance, not pipeline correctness).

Test counts:
  tests/test_ground_causal_predicates.py    9 tests
  tests/test_ground_causal_nodes.py        12 tests
  tests/test_validate_strict.py            11 tests
  tests/test_audit_writers.py              11 tests
  ---
  total new                                43 tests
  total suite                              54 tests (was 11)

Coverage highlights:

ground_causal_predicates.py:
- load_mapping: basic happy path, conflict detection (same label →
  different CURIEs raises ValueError), incomplete-row skipping,
  missing-file error.
- ground_edges_in_doc: idempotency (second pass = 0 changes),
  existing predicate_id never overwritten, residual counting for
  unmapped labels, empty/missing-predicate edges skipped.

ground_causal_nodes.py:
- All of the predicate suite plus:
- (label, node_type) keyed lookup — same label, different node_types
  map to different CURIEs without aliasing.
- Header validation (Copilot fix from PR #66): TSV with `nodetype`
  / `targetcurie` typo'd headers raises ValueError naming both
  missing columns.
- grounded_keys-on-validation-failure separability (Copilot fix
  from PR #66): caller can union residual + grounded_keys to
  recover the corpus-state residual after rolling back an invalid
  file write.

validate_strict.py:
- classify: parametrized over the 5 categories
  (unexpected_field, missing_required, enum_mismatch,
  pattern_mismatch, other) — the messages must match the actual
  jsonschema phrasings the validator emits.
- validate_one: clean record produces 0 errors; unknown field
  surfaces unexpected_field (the G01 gate behavior); missing
  required field surfaces missing_required; YAML parse error
  surfaces as yaml_parse_error category.
- iter_yaml_files: walks directories, filters .txt, picks up
  nested *.yaml.

audit_writers.py:
- looks_like_yaml_writer: yaml.safe_dump / yaml.dump positive,
  bare .write_text negative, .write_text near .yaml hint positive,
  arbitrary code negative.
- audit: full-safeguards writer flagged yes/yes/yes/yes;
  no-safeguards writer flagged no/no/no; non-writer returns None;
  wired_into_just yes when justfile mentions the script stem.
- Self-suppression (Copilot fix from PR #64): audit_writers.py
  itself returns None even though its own source matches
  yaml.safe_dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realmarcin added a commit that referenced this pull request May 24, 2026
* Add tests for grounding pipeline + audit scripts (+43 tests)

The grounding pipelines and audit scripts have been load-bearing
infrastructure for the last 7 PRs (#61, #66, #67, #69, #70 — all
of which rewrite causal-graph fields based on these scripts'
output). They had zero unit-test coverage. A silent regression in
idempotency, header validation, or self-suppression would not be
caught by validate-strict (which only checks per-record schema
conformance, not pipeline correctness).

Test counts:
  tests/test_ground_causal_predicates.py    9 tests
  tests/test_ground_causal_nodes.py        12 tests
  tests/test_validate_strict.py            11 tests
  tests/test_audit_writers.py              11 tests
  ---
  total new                                43 tests
  total suite                              54 tests (was 11)

Coverage highlights:

ground_causal_predicates.py:
- load_mapping: basic happy path, conflict detection (same label →
  different CURIEs raises ValueError), incomplete-row skipping,
  missing-file error.
- ground_edges_in_doc: idempotency (second pass = 0 changes),
  existing predicate_id never overwritten, residual counting for
  unmapped labels, empty/missing-predicate edges skipped.

ground_causal_nodes.py:
- All of the predicate suite plus:
- (label, node_type) keyed lookup — same label, different node_types
  map to different CURIEs without aliasing.
- Header validation (Copilot fix from PR #66): TSV with `nodetype`
  / `targetcurie` typo'd headers raises ValueError naming both
  missing columns.
- grounded_keys-on-validation-failure separability (Copilot fix
  from PR #66): caller can union residual + grounded_keys to
  recover the corpus-state residual after rolling back an invalid
  file write.

validate_strict.py:
- classify: parametrized over the 5 categories
  (unexpected_field, missing_required, enum_mismatch,
  pattern_mismatch, other) — the messages must match the actual
  jsonschema phrasings the validator emits.
- validate_one: clean record produces 0 errors; unknown field
  surfaces unexpected_field (the G01 gate behavior); missing
  required field surfaces missing_required; YAML parse error
  surfaces as yaml_parse_error category.
- iter_yaml_files: walks directories, filters .txt, picks up
  nested *.yaml.

audit_writers.py:
- looks_like_yaml_writer: yaml.safe_dump / yaml.dump positive,
  bare .write_text negative, .write_text near .yaml hint positive,
  arbitrary code negative.
- audit: full-safeguards writer flagged yes/yes/yes/yes;
  no-safeguards writer flagged no/no/no; non-writer returns None;
  wired_into_just yes when justfile mentions the script stem.
- Self-suppression (Copilot fix from PR #64): audit_writers.py
  itself returns None even though its own source matches
  yaml.safe_dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #71

Add explicit `assert "b.yml" not in names` to
test_iter_yaml_files_walks_directory_and_filters — the prior test
documented the .yml-skipping behavior in a comment but never
asserted it, so a regression that started picking up .yml during
directory walks would have slipped through silently.

Also add test_iter_yaml_files_accepts_yml_file_passed_directly
to lock in the asymmetry that the previous test only hinted at:
iter_yaml_files() does accept .yml when passed as a file argument
(only the rglob('*.yaml') walk is .yaml-only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update audit_writers tests to match #75's tightened heuristic

PR #75 changed `looks_like_yaml_writer` to require that the
yaml-serializer call feed directly into write_text on the same
line (instead of the looser "any .write_text + any .yaml token"
heuristic, which produced false positives for scripts that only
READ trait YAMLs).

The pre-#75 test asserted that
`path.write_text(content)  # .yaml` counted as a YAML writer.
That returned True under the old heuristic and False under the
new (correct) one. Replace it with two tests that lock in the
new contract:

  test_looks_like_yaml_writer_write_text_of_yaml_dump
    Positive: write_text(yaml.safe_dump(...)) / write_text(yaml.dump(...))
    both count.

  test_looks_like_yaml_writer_write_text_of_json_is_false
    Negative: a script that reads *.yaml then writes JSON via
    write_text is NOT a YAML writer — this is the false-positive
    case #75 explicitly fixed for scripts/build_embedding_index.py
    and scripts/render_trait_pages.py.

Also rename test_looks_like_yaml_writer_write_text_without_yaml_hint_is_false
to test_looks_like_yaml_writer_write_text_plain_is_false since the
"yaml hint" phrasing was tied to the old heuristic.

56 tests pass (was 54; +2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants