Tests for grounding pipeline + audit scripts (+43 tests) #71

Merged
realmarcin merged 3 commits into main from tests-grounding-pipeline on May 24, 2026

@realmarcin
Contributor

Summary

The grounding pipelines and audit scripts have been load-bearing infrastructure for the last 7 PRs (#61, #66, #67, #69, #70 — all of which rewrite causal-graph fields based on these scripts' output). They had zero unit-test coverage. A silent regression in idempotency, header validation, or self-suppression would not be caught by validate-strict (which only checks per-record schema conformance, not pipeline correctness).

This PR adds 43 new tests across 4 new test files; total suite grows from 11 → 54.

| File | Tests | What it locks in |
|---|---|---|
| tests/test_ground_causal_predicates.py | 9 | load_mapping happy/conflict/missing/incomplete; ground_edges idempotency, existing-id-never-overwritten, residual counting |
| tests/test_ground_causal_nodes.py | 12 | all of the above plus (label, node_type) keyed lookup, header validation (Copilot fix from #66), grounded_keys-on-validation-failure separability (Copilot fix from #66) |
| tests/test_validate_strict.py | 11 | classify parametrized over the 5 categories (matching actual jsonschema phrasings); validate_one unknown-field → unexpected_field (the G01 gate); missing-required, YAML parse-error coverage; iter_yaml_files walks dirs + filters non-YAML |
| tests/test_audit_writers.py | 11 | looks_like_yaml_writer positive/negative cases; audit full vs no safeguards; wired_into_just detection; self-suppression (Copilot fix from #64) |
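
As a rough illustration of the idempotency, never-overwrite, and residual-counting contracts named in the table, the sketch below uses a hypothetical stand-in for the edge grounder. The name `ground_edges_sketch`, its signature, the `edges` list layout, and the `EX:` CURIEs are assumptions made for illustration only; the real script may be shaped differently.

```python
from collections import Counter


def ground_edges_sketch(doc, mapping):
    """Hypothetical stand-in for the edge grounder (not the project's code).

    Fills in `predicate_id` from `mapping` and returns (changes, residual),
    where residual counts labels that have no mapping entry.
    """
    changes = 0
    residual = Counter()
    for edge in doc.get("edges", []):
        label = edge.get("predicate")
        if not label:
            continue  # empty or missing predicate: skipped
        if edge.get("predicate_id"):
            continue  # an existing id is never overwritten
        curie = mapping.get(label)
        if curie is None:
            residual[label] += 1  # unmapped label counted as residual
            continue
        edge["predicate_id"] = curie
        changes += 1
    return changes, residual


def test_second_pass_is_a_no_op():
    doc = {"edges": [
        {"predicate": "increases"},
        {"predicate": "inhibits", "predicate_id": "EX:keep"},
        {"predicate": "unknown label"},
        {},
    ]}
    mapping = {"increases": "EX:0001", "inhibits": "EX:0002"}

    first, residual = ground_edges_sketch(doc, mapping)
    second, _ = ground_edges_sketch(doc, mapping)

    assert first == 1
    assert second == 0  # idempotent: nothing left to change
    assert doc["edges"][1]["predicate_id"] == "EX:keep"
    assert residual == Counter({"unknown label": 1})
```

The point of asserting on the second pass is that a schema check cannot see it: both the once-grounded and twice-grounded documents are schema-valid, so only a behavioural test catches a grounder that keeps rewriting fields.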

Why now

Verified locally

$ uv run pytest tests/ -v
============================== 54 passed in 2.87s ==============================

Test plan

  • All 54 tests green locally
  • No changes to production code — pure test additions
  • CI doesn't run pytest yet (only validate-strict); a future PR could add a pytest workflow

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 24, 2026 07:32

Copilot AI left a comment

Pull request overview

Adds unit-test coverage for TraitMech’s “grounding” pipelines and audit/validation scripts to prevent regressions in idempotency, mapping ingestion, residual reporting, and writer-auditing behaviors that aren’t caught by schema-only validation.

Changes:

  • Introduces focused unit tests for predicate grounding and node grounding (mapping load behavior, conflicts, idempotency, residual accounting).
  • Adds unit tests for validate_strict classification and per-file validation behaviors (unknown fields, missing required fields, YAML parse errors, directory walking).
  • Adds unit tests for audit_writers detection logic including self-suppression and safeguards classification.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tests/test_ground_causal_predicates.py | New unit tests for predicate mapping ingestion + edge grounding idempotency/residuals. |
| tests/test_ground_causal_nodes.py | New unit tests for node mapping ingestion (header validation) + node grounding contracts. |
| tests/test_validate_strict.py | New unit tests for error classification, closed-mode validation, and YAML file discovery. |
| tests/test_audit_writers.py | New unit tests for YAML-writer heuristics, safeguard detection, and auditor self-suppression. |

Comment thread tests/test_validate_strict.py
realmarcin added a commit that referenced this pull request May 24, 2026
Add explicit `assert "b.yml" not in names` to
test_iter_yaml_files_walks_directory_and_filters — the prior test
documented the .yml-skipping behavior in a comment but never
asserted it, so a regression that started picking up .yml during
directory walks would have slipped through silently.

Also add test_iter_yaml_files_accepts_yml_file_passed_directly
to lock in the asymmetry that the previous test only hinted at:
iter_yaml_files() does accept .yml when passed as a file argument
(only the rglob('*.yaml') walk is .yaml-only).
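
The asymmetry described above can be sketched roughly as follows. `iter_yaml_files_sketch` is a hypothetical stand-in, and the suffix check on file arguments is an assumption; only the `rglob('*.yaml')` walk is stated in this commit message.

```python
import tempfile
from pathlib import Path


def iter_yaml_files_sketch(paths):
    """Hypothetical stand-in: directories are walked with rglob('*.yaml');
    a file argument is accepted if it ends in .yaml or .yml (assumed)."""
    for p in map(Path, paths):
        if p.is_dir():
            yield from sorted(p.rglob("*.yaml"))
        elif p.suffix in {".yaml", ".yml"}:
            yield p


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    for name in ("a.yaml", "b.yml", "notes.txt", "nested/c.yaml"):
        (root / name).write_text("x: 1\n")

    walked = {p.name for p in iter_yaml_files_sketch([root])}
    direct = {p.name for p in iter_yaml_files_sketch([root / "b.yml"])}

assert walked == {"a.yaml", "c.yaml"}  # .yml and .txt are skipped during the walk
assert direct == {"b.yml"}             # .yml is accepted when passed directly
```

Asserting the negative case explicitly (`b.yml` absent from the walk) is what turns the documented behavior into a guarded one.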

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realmarcin and others added 3 commits May 24, 2026 01:15
The grounding pipelines and audit scripts have been load-bearing
infrastructure for the last 7 PRs (#61, #66, #67, #69, #70 — all
of which rewrite causal-graph fields based on these scripts'
output). They had zero unit-test coverage. A silent regression in
idempotency, header validation, or self-suppression would not be
caught by validate-strict (which only checks per-record schema
conformance, not pipeline correctness).

Test counts:
  tests/test_ground_causal_predicates.py    9 tests
  tests/test_ground_causal_nodes.py        12 tests
  tests/test_validate_strict.py            11 tests
  tests/test_audit_writers.py              11 tests
  ---
  total new                                43 tests
  total suite                              54 tests (was 11)

Coverage highlights:

ground_causal_predicates.py:
- load_mapping: basic happy path, conflict detection (same label →
  different CURIEs raises ValueError), incomplete-row skipping,
  missing-file error.
- ground_edges_in_doc: idempotency (second pass = 0 changes),
  existing predicate_id never overwritten, residual counting for
  unmapped labels, empty/missing-predicate edges skipped.

ground_causal_nodes.py:
- All of the predicate suite plus:
- (label, node_type) keyed lookup — same label, different node_types
  map to different CURIEs without aliasing.
- Header validation (Copilot fix from PR #66): TSV with `nodetype`
  / `targetcurie` typo'd headers raises ValueError naming both
  missing columns.
- grounded_keys-on-validation-failure separability (Copilot fix
  from PR #66): caller can union residual + grounded_keys to
  recover the corpus-state residual after rolling back an invalid
  file write.
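
A minimal sketch of the first two contracts (composite-key lookup and header validation), under stated assumptions: the loader name `load_node_mapping_sketch` is hypothetical, and the column names `label`, `node_type`, and `target_curie` are inferred from the typo'd headers mentioned above rather than confirmed.

```python
import csv
import io

REQUIRED = ("label", "node_type", "target_curie")  # assumed column names


def load_node_mapping_sketch(tsv_text):
    """Hypothetical loader: validates headers, then keys on (label, node_type)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        # Name every missing column so a typo'd header is easy to spot.
        raise ValueError(f"mapping TSV is missing columns: {', '.join(missing)}")
    mapping = {}
    for row in reader:
        key = (row["label"], row["node_type"])
        curie = row["target_curie"]
        if not all(key) or not curie:
            continue  # incomplete row: skipped
        if mapping.setdefault(key, curie) != curie:
            raise ValueError(f"conflicting CURIEs for {key}")
    return mapping


good = "label\tnode_type\ttarget_curie\nauxin\tchemical\tEX:1\nauxin\tpathway\tEX:2\n"
mapping = load_node_mapping_sketch(good)
# Same label, different node_type: two distinct entries, no aliasing.
assert mapping[("auxin", "chemical")] != mapping[("auxin", "pathway")]

bad = "label\tnodetype\ttargetcurie\nauxin\tchemical\tEX:1\n"
try:
    load_node_mapping_sketch(bad)
except ValueError as exc:
    assert "node_type" in str(exc) and "target_curie" in str(exc)
else:
    raise AssertionError("typo'd headers should have been rejected")
```

Without the header check, a `csv.DictReader` over the typo'd file would simply yield rows lacking the expected keys, which is the kind of failure that is easy to misread as "no mappings found".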

validate_strict.py:
- classify: parametrized over the 5 categories
  (unexpected_field, missing_required, enum_mismatch,
  pattern_mismatch, other) — the messages must match the actual
  jsonschema phrasings the validator emits.
- validate_one: clean record produces 0 errors; unknown field
  surfaces unexpected_field (the G01 gate behavior); missing
  required field surfaces missing_required; YAML parse error
  surfaces as yaml_parse_error category.
- iter_yaml_files: walks directories, filters .txt, picks up
  nested *.yaml.
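
For illustration, a table-driven check over the five categories might look like the sketch below. `classify_sketch` and its regexes are hypothetical, and the sample messages are modelled on typical jsonschema error wording, which can vary between library versions.

```python
import re

# Assumed message patterns, modelled on typical jsonschema error wording.
_RULES = [
    ("unexpected_field", re.compile(r"Additional properties are not allowed")),
    ("missing_required", re.compile(r"is a required property")),
    ("enum_mismatch", re.compile(r"is not one of")),
    ("pattern_mismatch", re.compile(r"does not match")),
]


def classify_sketch(message):
    """Hypothetical classifier: first matching rule wins, else 'other'."""
    for category, pattern in _RULES:
        if pattern.search(message):
            return category
    return "other"


CASES = [
    ("Additional properties are not allowed ('colour' was unexpected)", "unexpected_field"),
    ("'name' is a required property", "missing_required"),
    ("'maybe' is not one of ['yes', 'no']", "enum_mismatch"),
    ("'abc' does not match '^[A-Z]+:[0-9]+$'", "pattern_mismatch"),
    ("'abc' is not of type 'integer'", "other"),
]

for message, expected in CASES:
    assert classify_sketch(message) == expected
```

In a pytest suite the `CASES` table would typically be fed to `pytest.mark.parametrize` so each category reports as its own test.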

audit_writers.py:
- looks_like_yaml_writer: yaml.safe_dump / yaml.dump positive,
  bare .write_text negative, .write_text near .yaml hint positive,
  arbitrary code negative.
- audit: full-safeguards writer flagged yes/yes/yes/yes;
  no-safeguards writer flagged no/no/no; non-writer returns None;
  wired_into_just yes when justfile mentions the script stem.
- Self-suppression (Copilot fix from PR #64): audit_writers.py
  itself returns None even though its own source matches
  yaml.safe_dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #75 changed `looks_like_yaml_writer` to require that the
yaml-serializer call feed directly into write_text on the same
line (instead of the looser "any .write_text + any .yaml token"
heuristic, which produced false positives for scripts that only
READ trait YAMLs).

The pre-#75 test asserted that
`path.write_text(content)  # .yaml` counted as a YAML writer.
That returned True under the old heuristic and False under the
new (correct) one. Replace it with two tests that lock in the
new contract:

  test_looks_like_yaml_writer_write_text_of_yaml_dump
    Positive: write_text(yaml.safe_dump(...)) / write_text(yaml.dump(...))
    both count.

  test_looks_like_yaml_writer_write_text_of_json_is_false
    Negative: a script that reads *.yaml then writes JSON via
    write_text is NOT a YAML writer — this is the false-positive
    case #75 explicitly fixed for scripts/build_embedding_index.py
    and scripts/render_trait_pages.py.
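
The tightened contract can be approximated as below. This is a sketch only: `write_text_of_yaml_dump` and its single-line regex are assumptions standing in for whatever `looks_like_yaml_writer` actually does after #75.

```python
import re

# Assumption: "feeds directly into write_text on the same line" is approximated
# here by a single-line regex; the real heuristic may be implemented differently.
_WRITE_TEXT_OF_YAML = re.compile(r"\.write_text\(\s*yaml\.(?:safe_)?dump\(")


def write_text_of_yaml_dump(source):
    """Hypothetical check: True if any line passes a yaml dump to write_text."""
    return any(_WRITE_TEXT_OF_YAML.search(line) for line in source.splitlines())


# Positive: the serializer call feeds write_text directly.
assert write_text_of_yaml_dump("path.write_text(yaml.safe_dump(doc))")
assert write_text_of_yaml_dump("path.write_text(yaml.dump(doc))")

# Negative: the pre-#75 false positive (a .yaml token near write_text).
assert not write_text_of_yaml_dump("path.write_text(content)  # .yaml")

# Negative: reads YAML, writes JSON.
reader_script = (
    "doc = yaml.safe_load(Path('trait.yaml').read_text())\n"
    "out.write_text(json.dumps(doc))\n"
)
assert not write_text_of_yaml_dump(reader_script)
```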

Also rename test_looks_like_yaml_writer_write_text_without_yaml_hint_is_false
to test_looks_like_yaml_writer_write_text_plain_is_false since the
"yaml hint" phrasing was tied to the old heuristic.

56 tests pass (was 54; +2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@realmarcin realmarcin force-pushed the tests-grounding-pipeline branch from 6249749 to d938511 Compare May 24, 2026 08:16
@realmarcin realmarcin merged commit ebe7e21 into main May 24, 2026
@realmarcin realmarcin deleted the tests-grounding-pipeline branch May 24, 2026 08:17
@realmarcin realmarcin mentioned this pull request May 26, 2026
3 tasks
realmarcin added a commit that referenced this pull request May 26, 2026
#71 added 43 unit tests across 4 files locking in idempotency,
header validation, and other pipeline contracts — but none of
them run in CI. validate-strict.yaml only checks per-record schema
validity; it can't catch a regression that, say, breaks
grounded_keys propagation in ground_causal_nodes.py or removes
self-suppression from audit_writers.py.

This workflow mirrors validate-strict.yaml's shape:
- Triggers on PRs and pushes-to-main that touch scripts/**,
  src/traitmech/**, tests/**, pyproject.toml, or this workflow.
- Uses astral-sh/setup-uv@v3 + Python 3.12 + uv sync --extra dev
  (pytest is in the dev extra per pyproject.toml).
- Runs uv run pytest tests/ with -v --tb=short for readable
  failures in the GitHub Actions log.

Verified locally:
  - python3 -c "import yaml; yaml.safe_load(open(...))" → yaml ok
  - uv run pytest tests/ → 56 passed

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>