Tests for grounding pipeline + audit scripts (+43 tests) by realmarcin · Pull Request #71 · CultureBotAI/TraitMech

realmarcin · 2026-05-24T07:32:57Z

Summary

The grounding pipelines and audit scripts have been load-bearing infrastructure for the last 7 PRs (#61, #66, #67, #69, #70 — all of which rewrite causal-graph fields based on these scripts' output). They had zero unit-test coverage. A silent regression in idempotency, header validation, or self-suppression would not be caught by validate-strict (which only checks per-record schema conformance, not pipeline correctness).

This PR adds 43 new tests across 4 new test files; total suite grows from 11 → 54.

File	Tests	What it locks in
`tests/test_ground_causal_predicates.py`	9	load_mapping happy/conflict/missing/incomplete; ground_edges idempotency, existing-id-never-overwritten, residual counting
`tests/test_ground_causal_nodes.py`	12	all of the above plus `(label, node_type)` keyed lookup, header validation (Copilot fix from #66), grounded_keys-on-validation-failure separability (Copilot fix from #66)
`tests/test_validate_strict.py`	11	`classify` parametrized over the 5 categories (matching actual jsonschema phrasings); `validate_one` unknown-field → `unexpected_field` (the G01 gate); missing-required, YAML parse-error coverage; `iter_yaml_files` walks dirs + filters non-YAML
`tests/test_audit_writers.py`	11	`looks_like_yaml_writer` positive/negative cases; `audit` full vs no safeguards; wired_into_just detection; self-suppression (Copilot fix from #64)
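
As an illustration of what the `classify` row locks in, here is a minimal sketch of a message classifier and the table of cases a parametrized test would run over. `classify_message` and its matching rules are hypothetical stand-ins, not the code in `validate_strict.py`. The five category names come from the commit message in this PR, and the sample messages follow the phrasing jsonschema typically emits.

```python
import re

# Hypothetical rules: each pattern matches a typical jsonschema error message.
# The real classify() in validate_strict.py may use different logic.
_RULES = [
    (re.compile(r"Additional properties are not allowed"), "unexpected_field"),
    (re.compile(r"is a required property"), "missing_required"),
    (re.compile(r"is not one of"), "enum_mismatch"),
    (re.compile(r"does not match"), "pattern_mismatch"),
]


def classify_message(message: str) -> str:
    """Map a validator message to a category, falling back to 'other'."""
    for pattern, category in _RULES:
        if pattern.search(message):
            return category
    return "other"


# One (message, expected category) pair per category; field names are made up.
CASES = [
    ("Additional properties are not allowed ('foo' was unexpected)", "unexpected_field"),
    ("'id' is a required property", "missing_required"),
    ("'maybe' is not one of ['yes', 'no']", "enum_mismatch"),
    ("'abc' does not match '^[A-Z]+:[0-9]+$'", "pattern_mismatch"),
    ("42 is not of type 'string'", "other"),
]
```

In a pytest file, `CASES` would typically be passed to `pytest.mark.parametrize` so that each category fails independently.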

Why now

Idempotency is the single most important pipeline contract — every PR that runs ground-predicates --apply or ground-nodes --apply relies on it. Currently no test would catch a future refactor that breaks "second pass = 0 changes."
Header validation in ground_causal_nodes.load_mapping was added in PR #66 (Ground causal-graph nodes: 39-mapping cohort, 77 nodes grounded (38% → 45%)) in response to a Copilot review comment. Without a test, the next refactor could silently regress it.
grounded_keys-on-validation-failure was the other Copilot fix in PR #66: a subtle reporting bug where rolled-back grounded nodes were invisible in the residual. Now locked in.
audit_writers.py self-suppression was the Copilot fix in PR #64 (Audit backlog cleanup: G02, G03, G04, G05). It is easy to break in a future refactor that drops the path-identity check.
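
The idempotency contract ("second pass = 0 changes") can be sketched as follows. `ground_edges`, the `edges` and `predicate` keys, and the `EX:` CURIEs are hypothetical stand-ins chosen for illustration; only `predicate_id` is named in this PR, and the real function in `ground_causal_predicates.py` may have a different signature.

```python
import copy


def ground_edges(doc: dict, mapping: dict) -> tuple:
    """Fill in missing predicate_id values in place (hypothetical stand-in).

    Returns (number_of_changes, residual_labels_without_a_mapping).
    """
    changes = 0
    residual = []
    for edge in doc.get("edges", []):
        label = edge.get("predicate")
        if not label:
            continue  # empty or missing predicate: skipped
        if edge.get("predicate_id"):
            continue  # an existing id is never overwritten
        curie = mapping.get(label)
        if curie is None:
            residual.append(label)  # unmapped label: counted as residual
            continue
        edge["predicate_id"] = curie
        changes += 1
    return changes, residual


def demo() -> dict:
    mapping = {"increases": "EX:0001"}  # made-up CURIE
    doc = {"edges": [
        {"predicate": "increases"},
        {"predicate": "decreases", "predicate_id": "EX:9999"},  # already grounded
        {"predicate": "modulates"},  # no mapping available
        {},  # no predicate at all
    ]}
    first_changes, residual = ground_edges(doc, mapping)
    snapshot = copy.deepcopy(doc)
    second_changes, _ = ground_edges(doc, mapping)
    return {
        "first_changes": first_changes,
        "second_changes": second_changes,
        "unchanged_on_second_pass": doc == snapshot,
        "residual": residual,
        "preserved_id": doc["edges"][1]["predicate_id"],
    }
```

A test built on this shape asserts both that the second pass reports zero changes and that the document itself is byte-for-byte unchanged, since a pass could report zero while still mutating data.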

Verified locally

$ uv run pytest tests/ -v
============================== 54 passed in 2.87s ==============================

Test plan

All 54 tests green locally
No changes to production code — pure test additions
CI doesn't run pytest yet (only validate-strict); future PR could add a pytest workflow

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds unit-test coverage for TraitMech’s “grounding” pipelines and audit/validation scripts to prevent regressions in idempotency, mapping ingestion, residual reporting, and writer-auditing behaviors that aren’t caught by schema-only validation.

Changes:

Introduces focused unit tests for predicate grounding and node grounding (mapping load behavior, conflicts, idempotency, residual accounting).
Adds unit tests for validate_strict classification and per-file validation behaviors (unknown fields, missing required fields, YAML parse errors, directory walking).
Adds unit tests for audit_writers detection logic including self-suppression and safeguards classification.
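
The residual-reporting behavior mentioned above can be sketched like this, assuming that a file whose post-write validation fails is rolled back and that its grounded nodes therefore remain ungrounded in the corpus. `FileResult` and `corpus_residual` are hypothetical names used only for illustration; `residual` and `grounded_keys` are the terms used in this PR, but their actual types are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class FileResult:
    """Hypothetical per-file outcome of a node-grounding pass."""
    residual: set = field(default_factory=set)       # keys with no mapping
    grounded_keys: set = field(default_factory=set)  # keys grounded in this file
    valid: bool = True                               # False means the write was rolled back


def corpus_residual(results: list) -> set:
    """Keys that are still ungrounded in the corpus after all files ran."""
    remaining = set()
    for result in results:
        remaining |= result.residual
        if not result.valid:
            # The file was rolled back, so its grounded keys are ungrounded again.
            remaining |= result.grounded_keys
    return remaining


RESULTS = [
    FileResult(residual={("pH", "condition")},
               grounded_keys={("glucose", "chemical")}, valid=True),
    FileResult(residual=set(),
               grounded_keys={("acetate", "chemical")}, valid=False),
]
```

Keeping `grounded_keys` separate from `residual` lets the caller decide whether to union them, which is the separability the node-grounding tests describe.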

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
`tests/test_ground_causal_predicates.py`	New unit tests for predicate mapping ingestion + edge grounding idempotency/residuals.
`tests/test_ground_causal_nodes.py`	New unit tests for node mapping ingestion (header validation) + node grounding contracts.
`tests/test_validate_strict.py`	New unit tests for error classification, closed-mode validation, and YAML file discovery.
`tests/test_audit_writers.py`	New unit tests for YAML-writer heuristics, safeguard detection, and auditor self-suppression.

Add explicit `assert "b.yml" not in names` to test_iter_yaml_files_walks_directory_and_filters — the prior test documented the .yml-skipping behavior in a comment but never asserted it, so a regression that started picking up .yml during directory walks would have slipped through silently. Also add test_iter_yaml_files_accepts_yml_file_passed_directly to lock in the asymmetry that the previous test only hinted at: iter_yaml_files() does accept .yml when passed as a file argument (only the rglob('*.yaml') walk is .yaml-only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
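
The asymmetry this commit message describes can be sketched as follows. This `iter_yaml_files` is a hypothetical re-implementation written from the description alone (directory walks use `rglob('*.yaml')`, while a file passed directly is accepted with either suffix); the real function in `validate_strict.py` may differ in signature and details.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def iter_yaml_files(paths: Iterable[Path]) -> Iterator[Path]:
    """Hypothetical sketch of the described discovery behavior."""
    for path in paths:
        if path.is_dir():
            # Directory walk: only *.yaml is picked up, so *.yml is skipped.
            yield from sorted(path.rglob("*.yaml"))
        elif path.suffix in {".yaml", ".yml"}:
            # A file passed directly is accepted with either suffix.
            yield path


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "nested").mkdir()
        (root / "a.yaml").write_text("id: a\n")
        (root / "b.yml").write_text("id: b\n")
        (root / "notes.txt").write_text("not yaml\n")
        (root / "nested" / "c.yaml").write_text("id: c\n")

        walked = sorted(p.name for p in iter_yaml_files([root]))
        direct = [p.name for p in iter_yaml_files([root / "b.yml"])]
    return walked, direct
```

Asserting both halves (`"b.yml" not in walked` and `direct == ["b.yml"]`) pins the behavior down from each side, so a change to either branch is caught.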

The grounding pipelines and audit scripts have been load-bearing infrastructure for the last 7 PRs (#61, #66, #67, #69, #70, all of which rewrite causal-graph fields based on these scripts' output). They had zero unit-test coverage. A silent regression in idempotency, header validation, or self-suppression would not be caught by validate-strict (which only checks per-record schema conformance, not pipeline correctness).

Test counts:

File	Tests
tests/test_ground_causal_predicates.py	9
tests/test_ground_causal_nodes.py	12
tests/test_validate_strict.py	11
tests/test_audit_writers.py	11
total new	43
total suite	54 (was 11)

Coverage highlights:

ground_causal_predicates.py:
- load_mapping: basic happy path, conflict detection (same label → different CURIEs raises ValueError), incomplete-row skipping, missing-file error.
- ground_edges_in_doc: idempotency (second pass = 0 changes), existing predicate_id never overwritten, residual counting for unmapped labels, empty/missing-predicate edges skipped.

ground_causal_nodes.py:
- All of the predicate suite plus:
  - (label, node_type) keyed lookup: same label, different node_types map to different CURIEs without aliasing.
  - Header validation (Copilot fix from PR #66): TSV with `nodetype` / `targetcurie` typo'd headers raises ValueError naming both missing columns.
  - grounded_keys-on-validation-failure separability (Copilot fix from PR #66): caller can union residual + grounded_keys to recover the corpus-state residual after rolling back an invalid file write.

validate_strict.py:
- classify: parametrized over the 5 categories (unexpected_field, missing_required, enum_mismatch, pattern_mismatch, other); the messages must match the actual jsonschema phrasings the validator emits.
- validate_one: clean record produces 0 errors; unknown field surfaces unexpected_field (the G01 gate behavior); missing required field surfaces missing_required; YAML parse error surfaces as yaml_parse_error category.
- iter_yaml_files: walks directories, filters .txt, picks up nested *.yaml.

audit_writers.py:
- looks_like_yaml_writer: yaml.safe_dump / yaml.dump positive, bare .write_text negative, .write_text near .yaml hint positive, arbitrary code negative.
- audit: full-safeguards writer flagged yes/yes/yes/yes; no-safeguards writer flagged no/no/no; non-writer returns None; wired_into_just yes when justfile mentions the script stem.
- Self-suppression (Copilot fix from PR #64): audit_writers.py itself returns None even though its own source matches yaml.safe_dump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
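
The mapping-load contracts listed for ground_causal_nodes.py (header validation, `(label, node_type)` keyed lookup, conflict detection, incomplete-row skipping) can be sketched together. This is a hypothetical re-implementation that takes TSV text rather than a file path. The column names `label`, `node_type`, and `target_curie` are assumptions inferred from the typo'd headers mentioned above, and the `EX:` CURIEs are made up.

```python
import csv
import io
from typing import Optional

REQUIRED_COLUMNS = ("label", "node_type", "target_curie")  # assumed header names


def load_mapping(tsv_text: str) -> dict:
    """Hypothetical sketch: parse a mapping TSV into {(label, node_type): curie}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    present = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in present]
    if missing:
        raise ValueError("mapping TSV is missing required columns: " + ", ".join(missing))

    mapping = {}
    for row in reader:
        label = (row["label"] or "").strip()
        node_type = (row["node_type"] or "").strip()
        curie = (row["target_curie"] or "").strip()
        if not (label and node_type and curie):
            continue  # incomplete rows are skipped
        key = (label, node_type)
        if key in mapping and mapping[key] != curie:
            raise ValueError(f"conflicting CURIEs for {key}: {mapping[key]} vs {curie}")
        mapping[key] = curie
    return mapping


def error_message(tsv_text: str) -> Optional[str]:
    """Return the ValueError text raised by load_mapping, or None if it succeeds."""
    try:
        load_mapping(tsv_text)
    except ValueError as exc:
        return str(exc)
    return None


GOOD_TSV = (
    "label\tnode_type\ttarget_curie\n"
    "glucose\tchemical\tEX:0001\n"
    "glucose\tprocess\tEX:0002\n"   # same label, different node_type
    "acetate\tchemical\t\n"         # incomplete row
)
TYPO_TSV = "label\tnodetype\ttargetcurie\nglucose\tchemical\tEX:0001\n"
CONFLICT_TSV = (
    "label\tnode_type\ttarget_curie\n"
    "glucose\tchemical\tEX:0001\n"
    "glucose\tchemical\tEX:0003\n"
)
```

Checking the header before reading any rows means a typo'd column fails loudly instead of producing an empty mapping that looks like "nothing to ground".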

PR #75 changed `looks_like_yaml_writer` to require that the yaml-serializer call feed directly into write_text on the same line (instead of the looser "any .write_text + any .yaml token" heuristic, which produced false positives for scripts that only READ trait YAMLs).

The pre-#75 test asserted that `path.write_text(content) # .yaml` counted as a YAML writer. That returned True under the old heuristic and False under the new (correct) one. Replace it with two tests that lock in the new contract:

- test_looks_like_yaml_writer_write_text_of_yaml_dump: positive; write_text(yaml.safe_dump(...)) and write_text(yaml.dump(...)) both count.
- test_looks_like_yaml_writer_write_text_of_json_is_false: negative; a script that reads *.yaml then writes JSON via write_text is NOT a YAML writer. This is the false-positive case #75 explicitly fixed for scripts/build_embedding_index.py and scripts/render_trait_pages.py.

Also rename test_looks_like_yaml_writer_write_text_without_yaml_hint_is_false to test_looks_like_yaml_writer_write_text_plain_is_false since the "yaml hint" phrasing was tied to the old heuristic.

56 tests pass (was 54; +2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
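
The tightened write_text rule can be approximated with a per-line regular expression. This is a hypothetical sketch covering only that one rule; `write_text_feeds_yaml_dump` is an illustrative name, and the real `looks_like_yaml_writer` in `audit_writers.py` may combine it with other checks.

```python
import re

# Matches write_text( immediately followed by a yaml.safe_dump( or yaml.dump( call.
_WRITE_TEXT_OF_DUMP = re.compile(
    r"\.write_text\s*\(\s*yaml\.(?:safe_dump|dump)\s*\("
)


def write_text_feeds_yaml_dump(source: str) -> bool:
    """True if any single line passes a yaml dump call straight into write_text."""
    return any(_WRITE_TEXT_OF_DUMP.search(line) for line in source.splitlines())


# Positive cases: the serializer call is the argument to write_text.
SAFE_DUMP_WRITER = "path.write_text(yaml.safe_dump(doc, sort_keys=False))\n"
DUMP_WRITER = "out.write_text(yaml.dump(doc))\n"

# Negative cases: a plain write_text with a '.yaml' token nearby, and a script
# that reads YAML files but writes JSON.
PLAIN_WRITER = "path.write_text(content)  # .yaml\n"
JSON_WRITER = (
    'docs = [yaml.safe_load(p.read_text()) for p in root.rglob("*.yaml")]\n'
    "out.write_text(json.dumps(docs))\n"
)
```

Requiring the serializer call inside the `write_text(...)` argument is what separates scripts that emit YAML from scripts that merely read it.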

#71 added 43 unit tests across 4 files locking in idempotency, header validation, and other pipeline contracts, but none of them run in CI. validate-strict.yaml only checks per-record schema validity; it can't catch a regression that, say, breaks grounded_keys propagation in ground_causal_nodes.py or removes self-suppression from audit_writers.py.

This workflow mirrors validate-strict.yaml's shape:

- Triggers on PRs and pushes-to-main that touch scripts/**, src/traitmech/**, tests/**, pyproject.toml, or this workflow.
- Uses astral-sh/setup-uv@v3 + Python 3.12 + uv sync --extra dev (pytest is in the dev extra per pyproject.toml).
- Runs uv run pytest tests/ with -v --tb=short for readable failures in the GitHub Actions log.

Verified locally:

- python3 -c "import yaml; yaml.safe_load(open(...))" → yaml ok
- uv run pytest tests/ → 56 passed

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 24, 2026 07:32

Copilot started reviewing on behalf of realmarcin May 24, 2026 07:33

Copilot AI reviewed May 24, 2026

Comment thread tests/test_validate_strict.py

realmarcin mentioned this pull request May 24, 2026

Tighten audit_writers heuristic — drop false-positive YAML writers #75

Merged

4 tasks

realmarcin and others added 3 commits May 24, 2026 01:15

realmarcin force-pushed the tests-grounding-pipeline branch from 6249749 to d938511 May 24, 2026 08:16

realmarcin merged commit ebe7e21 into main May 24, 2026

realmarcin deleted the tests-grounding-pipeline branch May 24, 2026 08:17

realmarcin mentioned this pull request May 26, 2026

Add pytest CI workflow #79

Merged

3 tasks
