Skip to content

Audit backlog cleanup: G02, G03, G04, G05#64

Merged
realmarcin merged 2 commits into
mainfrom
audit-backlog-g02-g05
May 24, 2026
Merged

Audit backlog cleanup: G02, G03, G04, G05#64
realmarcin merged 2 commits into
mainfrom
audit-backlog-g02-g05

Conversation

@realmarcin
Copy link
Copy Markdown
Contributor

Summary

Closes four tier-2/tier-3 items from reports/gap_fix_backlog.md, all unblocked by the validate-strict CI gate (#62).

Item What changed Effort
G02 just validate-all now delegates to validate-strict (was open-mode linkml-validate per file via xargs, silently passing unknown fields). README updated. S
G03 scripts/seed_from_metpo.py validates each record with a process-local closed-mode Validator before path.write_text. Invalid records are logged and skipped per-record (does not abort the run); exit is non-zero if any were skipped. M
G04 Both TraitRecord.evidence and CausalEdge.evidence descriptions now explain the required: asymmetry rationale — trait evidence inherits METPO provenance (optional), causal-edge evidence is curator-asserted (required). Pure description change. S
G05 scripts/audit_writers.py suppresses self-match via path-identity check at top of audit(). Auditor goes from 5 → 4 writers. S

Side effect of G03

The seeder now matches the auditor's validates_before_write regex, so seed_from_metpo.py flips from no to yes in the audit-writers TSV — 4/4 writers (after self-suppression) now have validation safeguards.

Verification (local)

$ just validate-all
=== validate-strict summary ===
  files scanned:      357
  files with ERROR:   0

$ uv run python3 scripts/seed_from_metpo.py
  TOTAL                      354
Mode:                       DRY-RUN
# (no SKIP (invalid) rows — every constructed record validates)

$ just audit-writers
=== writers audit summary (4 writers) ===
  validates before write:     2 / 4
scripts/seed_from_metpo.py	yes	yes	yes	yes	yes

$ just validate-strict
  files with ERROR:   0

Test plan

  • just validate-all exits 0 (delegating to validate-strict)
  • Seeder dry-run validates 354 records, no SKIP (invalid)
  • Auditor: 5 → 4 rows; seeder now validates_before_write: yes
  • just validate-strict 0 ERROR / 357 files after schema description edits
  • CI re-runs validate-strict on schema edit + seeder change

🤖 Generated with Claude Code

Closes four tier-2/tier-3 items from reports/gap_fix_backlog.md
unblocked by the validate-strict CI gate (#62).

G02 — `just validate-all` delegates to `validate-strict`
  Previous implementation ran `linkml-validate` per-file via xargs in
  open mode and silently passed unknown fields. Now it's a thin
  wrapper around the strict closed-mode harness; README updated to
  describe the actual behavior.

G03 — `scripts/seed_from_metpo.py` validates before write
  Adds a process-local closed-mode Validator built lazily on first
  use. Each constructed record is validated before `path.write_text`;
  invalid records are logged and skipped (per record, not aborting
  the whole run — the seed touches hundreds of records). Exit code
  is non-zero if any record was skipped as invalid.

G04 — document the `evidence` required: asymmetry
  Expanded descriptions on both `TraitRecord.evidence` (optional —
  inherits METPO provenance) and `CausalEdge.evidence` (required —
  causal claims are curator-asserted and must carry their own
  citation). Pure description change; no schema-shape diff.

G05 — `scripts/audit_writers.py` suppresses self-match
  The auditor's own regex source contains `yaml.safe_dump`, so it
  appeared in its own output. Suppressed via path-identity check at
  the top of `audit()`. Auditor now reports 4 writers (was 5).

Side effect of G03: the seeder now matches the auditor's
`validates_before_write` regex, so `seed_from_metpo.py` flips from
`no` to `yes` in the audit-writers TSV (3/4 → 4/4 of validating
writers, counting after self-suppression).

Verified locally:
  - `just validate-all` → exits 0 (delegates to validate-strict)
  - `uv run python3 scripts/seed_from_metpo.py` (dry-run) → all 354
    candidate records validate clean, no SKIP (invalid) rows
  - `just audit-writers` → 4 rows, seed_from_metpo flagged as
    `validates_before_write: yes`
  - `just validate-strict` → 0 ERROR rows / 357 files (schema
    description edits don't change validation shape)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 23, 2026 21:24
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR closes several audit backlog items by tightening validation behavior across the repo: just validate-all now runs the closed-mode validator, the METPO seeder validates each generated record before writing, schema slot descriptions clarify evidence requirements, and the writer-auditor no longer reports itself as a writer.

Changes:

  • Make just validate-all delegate to validate-strict (closed-mode, fails on unknown fields / missing required attributes).
  • Add per-record LinkML closed-mode validation to scripts/seed_from_metpo.py, skipping invalid records and exiting non-zero if any were skipped.
  • Clarify schema evidence rationale and suppress scripts/audit_writers.py self-matching.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/traitmech/schema/traitmech.yaml Expands descriptions explaining why TraitRecord.evidence is optional but CausalEdge.evidence is required.
scripts/seed_from_metpo.py Adds an in-process closed-mode validator and validates each generated record before writing; skips invalid records.
scripts/audit_writers.py Prevents the auditor from including itself in the writer audit output.
README.md Updates documentation to reflect validate-all delegating to validate-strict and the stricter failure behavior.
justfile Replaces the old per-file open-mode validate-all with a delegation to validate-strict.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/seed_from_metpo.py Outdated
Defensive path handling in seed_from_metpo.py: fall back to str(path)
when --out is set outside the repo root (otherwise path.relative_to
raises ValueError exactly when an invalid record is encountered,
turning a per-record skip into a crash).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@realmarcin realmarcin merged commit 257c9d6 into main May 24, 2026
1 check passed
@realmarcin realmarcin deleted the audit-backlog-g02-g05 branch May 24, 2026 03:55
realmarcin added a commit that referenced this pull request May 24, 2026
The grounding pipelines and audit scripts have been load-bearing
infrastructure for the last 7 PRs (#61, #66, #67, #69, #70 — all
of which rewrite causal-graph fields based on these scripts'
output). They had zero unit-test coverage. A silent regression in
idempotency, header validation, or self-suppression would not be
caught by validate-strict (which only checks per-record schema
conformance, not pipeline correctness).

Test counts:
  tests/test_ground_causal_predicates.py    9 tests
  tests/test_ground_causal_nodes.py        12 tests
  tests/test_validate_strict.py            11 tests
  tests/test_audit_writers.py              11 tests
  ---
  total new                                43 tests
  total suite                              54 tests (was 11)

Coverage highlights:

ground_causal_predicates.py:
- load_mapping: basic happy path, conflict detection (same label →
  different CURIEs raises ValueError), incomplete-row skipping,
  missing-file error.
- ground_edges_in_doc: idempotency (second pass = 0 changes),
  existing predicate_id never overwritten, residual counting for
  unmapped labels, empty/missing-predicate edges skipped.

ground_causal_nodes.py:
- All of the predicate suite plus:
- (label, node_type) keyed lookup — same label, different node_types
  map to different CURIEs without aliasing.
- Header validation (Copilot fix from PR #66): TSV with `nodetype`
  / `targetcurie` typo'd headers raises ValueError naming both
  missing columns.
- grounded_keys-on-validation-failure separability (Copilot fix
  from PR #66): caller can union residual + grounded_keys to
  recover the corpus-state residual after rolling back an invalid
  file write.

validate_strict.py:
- classify: parametrized over the 5 categories
  (unexpected_field, missing_required, enum_mismatch,
  pattern_mismatch, other) — the messages must match the actual
  jsonschema phrasings the validator emits.
- validate_one: clean record produces 0 errors; unknown field
  surfaces unexpected_field (the G01 gate behavior); missing
  required field surfaces missing_required; YAML parse error
  surfaces as yaml_parse_error category.
- iter_yaml_files: walks directories, filters .txt, picks up
  nested *.yaml.

audit_writers.py:
- looks_like_yaml_writer: yaml.safe_dump / yaml.dump positive,
  bare .write_text negative, .write_text near .yaml hint positive,
  arbitrary code negative.
- audit: full-safeguards writer flagged yes/yes/yes/yes;
  no-safeguards writer flagged no/no/no; non-writer returns None;
  wired_into_just yes when justfile mentions the script stem.
- Self-suppression (Copilot fix from PR #64): audit_writers.py
  itself returns None even though its own source matches
  yaml.safe_dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realmarcin added a commit that referenced this pull request May 24, 2026
* Add tests for grounding pipeline + audit scripts (+43 tests)

The grounding pipelines and audit scripts have been load-bearing
infrastructure for the last 7 PRs (#61, #66, #67, #69, #70 — all
of which rewrite causal-graph fields based on these scripts'
output). They had zero unit-test coverage. A silent regression in
idempotency, header validation, or self-suppression would not be
caught by validate-strict (which only checks per-record schema
conformance, not pipeline correctness).

Test counts:
  tests/test_ground_causal_predicates.py    9 tests
  tests/test_ground_causal_nodes.py        12 tests
  tests/test_validate_strict.py            11 tests
  tests/test_audit_writers.py              11 tests
  ---
  total new                                43 tests
  total suite                              54 tests (was 11)

Coverage highlights:

ground_causal_predicates.py:
- load_mapping: basic happy path, conflict detection (same label →
  different CURIEs raises ValueError), incomplete-row skipping,
  missing-file error.
- ground_edges_in_doc: idempotency (second pass = 0 changes),
  existing predicate_id never overwritten, residual counting for
  unmapped labels, empty/missing-predicate edges skipped.

ground_causal_nodes.py:
- All of the predicate suite plus:
- (label, node_type) keyed lookup — same label, different node_types
  map to different CURIEs without aliasing.
- Header validation (Copilot fix from PR #66): TSV with `nodetype`
  / `targetcurie` typo'd headers raises ValueError naming both
  missing columns.
- grounded_keys-on-validation-failure separability (Copilot fix
  from PR #66): caller can union residual + grounded_keys to
  recover the corpus-state residual after rolling back an invalid
  file write.

validate_strict.py:
- classify: parametrized over the 5 categories
  (unexpected_field, missing_required, enum_mismatch,
  pattern_mismatch, other) — the messages must match the actual
  jsonschema phrasings the validator emits.
- validate_one: clean record produces 0 errors; unknown field
  surfaces unexpected_field (the G01 gate behavior); missing
  required field surfaces missing_required; YAML parse error
  surfaces as yaml_parse_error category.
- iter_yaml_files: walks directories, filters .txt, picks up
  nested *.yaml.

audit_writers.py:
- looks_like_yaml_writer: yaml.safe_dump / yaml.dump positive,
  bare .write_text negative, .write_text near .yaml hint positive,
  arbitrary code negative.
- audit: full-safeguards writer flagged yes/yes/yes/yes;
  no-safeguards writer flagged no/no/no; non-writer returns None;
  wired_into_just yes when justfile mentions the script stem.
- Self-suppression (Copilot fix from PR #64): audit_writers.py
  itself returns None even though its own source matches
  yaml.safe_dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #71

Add explicit `assert "b.yml" not in names` to
test_iter_yaml_files_walks_directory_and_filters — the prior test
documented the .yml-skipping behavior in a comment but never
asserted it, so a regression that started picking up .yml during
directory walks would have slipped through silently.

Also add test_iter_yaml_files_accepts_yml_file_passed_directly
to lock in the asymmetry that the previous test only hinted at:
iter_yaml_files() does accept .yml when passed as a file argument
(only the rglob('*.yaml') walk is .yaml-only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update audit_writers tests to match #75's tightened heuristic

PR #75 changed `looks_like_yaml_writer` to require that the
yaml-serializer call feed directly into write_text on the same
line (instead of the looser "any .write_text + any .yaml token"
heuristic, which produced false positives for scripts that only
READ trait YAMLs).

The pre-#75 test asserted that
`path.write_text(content)  # .yaml` counted as a YAML writer.
That returned True under the old heuristic and False under the
new (correct) one. Replace it with two tests that lock in the
new contract:

  test_looks_like_yaml_writer_write_text_of_yaml_dump
    Positive: write_text(yaml.safe_dump(...)) / write_text(yaml.dump(...))
    both count.

  test_looks_like_yaml_writer_write_text_of_json_is_false
    Negative: a script that reads *.yaml then writes JSON via
    write_text is NOT a YAML writer — this is the false-positive
    case #75 explicitly fixed for scripts/build_embedding_index.py
    and scripts/render_trait_pages.py.

Also rename test_looks_like_yaml_writer_write_text_without_yaml_hint_is_false
to test_looks_like_yaml_writer_write_text_plain_is_false since the
"yaml hint" phrasing was tied to the old heuristic.

56 tests pass (was 54; +2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants