
feat: add final judge and RewriteWorkflow orchestrator#64

Merged
lipikaramaswamy merged 25 commits into main from lipikaramaswamy/feat/rewrite-engine-final-judge
Mar 26, 2026

@lipikaramaswamy
Collaborator

@lipikaramaswamy lipikaramaswamy commented Mar 23, 2026

Note: This PR is stacked on #61 (evaluate & repair workflows). The base branch is lipikaramaswamy/feat/rewrite-engine-eval-repair. Please review only the diff against that branch. Once #61 merges, this will be rebased onto main.

Summary

  • FinalJudgeWorkflow (engine/rewrite/final_judge.py) -- holistic LLM evaluation using LLMJudgeColumnConfig with three rubrics (privacy, quality, naturalness) on a 1-10 scale, ported verbatim from the research repo. needs_human_review is set from objective metrics only (failed rewrite, utility below threshold, leakage above threshold, any HIGH-sensitivity leak); judge scores are informational and are not used for automated decisions (deferred to #37, "feat(rewrite): judge rubric refinement").

  • RewriteWorkflow (engine/rewrite/rewrite_workflow.py) -- top-level orchestrator chaining all 6 sub-workflows: domain classification, sensitivity disposition, QA generation, rewrite generation, evaluate-repair loop, and final judge. The evaluate-repair loop runs up to max_repair_iterations (from EvaluationCriteria), exits early when all rows pass, and sends only failing rows to repair. The final judge is non-critical: a failure is logged and defaults are applied. A fast path skips all LLM calls when no entities are detected.

  • Schema cleanup -- removed JudgeScoreSchema / JudgeEvaluationSchema (redundant with LLMJudgeColumnConfig + Score rubrics).

  • Row split/merge helpers extracted as free functions (_split_by_entities, _merge_and_reorder, _apply_passthrough_defaults) to prepare for #60 ("refactor: extract shared entity-row split/reorder/recombine helper").
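
For readers skimming the summary, the evaluate-repair control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in rewrite_workflow.py: rows are plain dicts rather than dataframe rows, and the `evaluate` and `repair` callables and the `record_id` and `needs_repair` keys are hypothetical stand-ins for the real sub-workflows and column constants.

```python
from typing import Callable

Row = dict  # one record; the keys used below are hypothetical


def evaluate_repair_loop(
    rows: list[Row],
    evaluate: Callable[[Row], Row],
    repair: Callable[[Row], Row],
    max_repair_iterations: int,
) -> list[Row]:
    """Evaluate every row once, then repair and re-evaluate only failing rows."""
    # The initial evaluation always runs, even when max_repair_iterations == 0.
    rows = [evaluate(row) for row in rows]
    for _ in range(max_repair_iterations):
        failing = [row for row in rows if row["needs_repair"]]
        if not failing:
            break  # early exit: every row passes
        # Only failing rows are sent to repair; passing rows are left untouched.
        fixed = {row["record_id"]: evaluate(repair(row)) for row in failing}
        rows = [fixed.get(row["record_id"], row) for row in rows]
    return rows
```

Rows that still fail after the last iteration keep `needs_repair` set, so a later step can decide what to do with them.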

Config gaps closed

  • evaluation.max_repair_iterations -- consumed by evaluate-repair loop
  • evaluation.flag_utility_below -- consumed by needs_human_review
  • evaluation.flag_leakage_mass_above -- consumed by needs_human_review
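
As a rough sketch of how the two flag thresholds might feed needs_human_review, assuming the four objective conditions listed in the summary are simply OR-ed together: the `ReviewThresholds` class, the function signature, and the argument names are hypothetical, and only the config key names come from this PR. Judge scores are deliberately not an input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewThresholds:
    # Field names mirror the evaluation.* config keys; the class is hypothetical.
    flag_utility_below: float
    flag_leakage_mass_above: float


def needs_human_review(
    *,
    rewrite_failed: bool,
    utility: float,
    leakage_mass: float,
    has_high_sensitivity_leak: bool,
    thresholds: ReviewThresholds,
) -> bool:
    """Flag a row from objective metrics only; judge scores are not consulted."""
    return (
        rewrite_failed
        or utility < thresholds.flag_utility_below
        or leakage_mass > thresholds.flag_leakage_mass_above
        or has_high_sensitivity_leak
    )
```

Whether the real comparisons are strict or inclusive at the threshold is not stated in the PR; strict comparisons are an assumption here.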

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Refactoring

Testing

  • Tests pass locally
  • Added/updated tests for changes

Test plan

  • 15 tests for FinalJudgeWorkflow (prompt, rubrics, human review flagging)
  • 12 tests for RewriteWorkflow (fast path, call order, failed records, attrs, judge failure tolerance, repair loop)
  • Full suite: 391 passed

Related Issues

Closes #35

@lipikaramaswamy lipikaramaswamy requested a review from a team as a code owner March 23, 2026 19:29
@lipikaramaswamy lipikaramaswamy changed the base branch from main to lipikaramaswamy/feat/rewrite-engine-eval-repair March 23, 2026 20:34
Move all evaluation LLM calls from LLMStructuredColumnConfig to custom
columns so we can pass Pydantic validation context with expected IDs
per row. DD's correction loop retries when the LLM skips answers.
Also addresses PR #61 review: shared parsers module, repair unit tests,
COL_NEEDS_REPAIR to constants, consistent field() validation.
…ine-eval-repair' into lipikaramaswamy/feat/rewrite-engine-final-judge
…ine-eval-repair' into lipikaramaswamy/feat/rewrite-engine-final-judge
…ipikaramaswamy/feat/rewrite-engine-final-judge
…, use a new rewritten text colname in repair workflow
…ipikaramaswamy/feat/rewrite-engine-final-judge
Comment thread tests/engine/test_rewrite_workflow.py Outdated
Comment thread src/anonymizer/engine/schemas/__init__.py
Comment thread src/anonymizer/engine/rewrite/rewrite_workflow.py
Collaborator

@andreatgretel andreatgretel left a comment


with #68 handling the public wiring, the main thing I am still wondering about in this PR is the short-output case in RewriteWorkflow. right now we can warn on a row-count mismatch and keep going, but the loop still seems to assume the joined columns are present afterwards. maybe this wants either stricter failure semantics here, or a more explicit way of carrying partial-row failures through the loop.

Base automatically changed from lipikaramaswamy/feat/rewrite-engine-eval-repair to main March 25, 2026 16:14
…h exist in our workflows, when lengths differ; added test_evaluate_dropping_rows_degrades_gracefully
@lipikaramaswamy
Collaborator Author

> with #68 handling the public wiring, the main thing I am still wondering about in this PR is the short-output case in RewriteWorkflow. right now we can warn on a row-count mismatch and keep going, but the loop still seems to assume the joined columns are present afterwards. maybe this wants either stricter failure semantics here, or a more explicit way of carrying partial-row failures through the loop.

Addressed in the latest commit. _join_new_columns no longer skips the join on row-count mismatch -- it filters the working dataframe to surviving rows (via RECORD_ID_COLUMN), joins the new columns onto those rows, and lets the dropped rows propagate through failed_records. Added a degraded-case test to cover this path.
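
To make the alignment behaviour concrete, here is a minimal sketch of the idea using plain lists of dicts instead of dataframes. The function name, the `RECORD_ID` constant (a stand-in for RECORD_ID_COLUMN), and the returned list of dropped IDs are illustrative assumptions, not the actual _join_new_columns implementation.

```python
RECORD_ID = "record_id"  # hypothetical stand-in for RECORD_ID_COLUMN


def join_new_columns(
    working: list[dict],
    output: list[dict],
    new_columns: list[str],
) -> tuple[list[dict], list]:
    """Join new columns onto surviving rows by record ID; report dropped IDs."""
    by_id = {row[RECORD_ID]: row for row in output}
    survivors: list[dict] = []
    dropped_ids: list = []
    for row in working:
        new = by_id.get(row[RECORD_ID])
        if new is None:
            # The adapter dropped this row; the caller can record it as failed.
            dropped_ids.append(row[RECORD_ID])
            continue
        survivors.append({**row, **{col: new[col] for col in new_columns}})
    return survivors, dropped_ids
```

Matching on the ID rather than on position means a short or reordered output cannot attach a column value to the wrong row.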

Comment thread src/anonymizer/engine/rewrite/rewrite_workflow.py Outdated
…latten tests

- _join_new_columns aligns on RECORD_ID_COLUMN when adapter drops rows
  instead of crashing or skipping the join
- RECORD_ID_COLUMN included in all seed lists for stable ID across calls
- _join_judge_columns preserves all rows on partial judge failure,
  defaulting missing rows to needs_human_review=True
- Initial evaluate runs before repair loop (max_repair_iterations=0 fix)
- Flatten test_rewrite_workflow.py from classes to module-level functions
- Add tests: judge partial row loss, evaluate row drop degradation
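
The partial-judge-failure behaviour in this commit message could look something like the sketch below, which keeps every input row and defaults rows without judge output to needs_human_review=True. It uses plain dicts; the function name, the `RECORD_ID` constant, the `None` placeholder for missing scores, and the assumption that each row already carries a needs_human_review value are all illustrative.

```python
RECORD_ID = "record_id"  # hypothetical stand-in for RECORD_ID_COLUMN


def join_judge_columns(
    rows: list[dict],
    judged: list[dict],
    judge_columns: list[str],
) -> list[dict]:
    """Attach judge columns to every row; rows the judge lost go to review."""
    by_id = {row[RECORD_ID]: row for row in judged}
    merged: list[dict] = []
    for row in rows:
        judge_row = by_id.get(row[RECORD_ID])
        if judge_row is None:
            # No judge output for this row: keep it, blank the scores,
            # and force a human look at it.
            blanks = {col: None for col in judge_columns}
            merged.append({**row, **blanks, "needs_human_review": True})
        else:
            merged.append({**row, **{col: judge_row[col] for col in judge_columns}})
    return merged
```

The output always has the same length and order as the input, which is the property the new partial-row-loss test is described as covering.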
Comment thread src/anonymizer/engine/rewrite/final_judge.py Outdated
Comment thread src/anonymizer/engine/rewrite/rewrite_workflow.py
Comment thread src/anonymizer/engine/rewrite/rewrite_workflow.py
@lipikaramaswamy
Collaborator Author

Thanks all for your review, this was a hairy one. Merging now :)

@lipikaramaswamy lipikaramaswamy merged commit e4645f0 into main Mar 26, 2026
5 checks passed
@lipikaramaswamy lipikaramaswamy deleted the lipikaramaswamy/feat/rewrite-engine-final-judge branch March 26, 2026 01:44
