feat: add final judge and RewriteWorkflow orchestrator by lipikaramaswamy · Pull Request #64 · NVIDIA-NeMo/Anonymizer

lipikaramaswamy · 2026-03-23T19:29:52Z

Note: This PR is stacked on #61 (evaluate & repair workflows). The base branch is lipikaramaswamy/feat/rewrite-engine-eval-repair. Please review only the diff against that branch. Once #61 merges, this will be rebased onto main.

Summary

FinalJudgeWorkflow (engine/rewrite/final_judge.py) -- holistic LLM evaluation using LLMJudgeColumnConfig with three rubrics (privacy, quality, naturalness) on a 1-10 scale, ported verbatim from the research repo. needs_human_review is flagged from objective metrics only (failed rewrite, utility below threshold, leakage above threshold, any HIGH-sensitivity leak); judge scores are informational and are not used for automated decisions (deferred to #37, feat(rewrite): judge rubric refinement).
RewriteWorkflow (engine/rewrite/rewrite_workflow.py) -- top-level orchestrator chaining all 6 sub-workflows: domain classification, sensitivity disposition, QA generation, rewrite generation, evaluate-repair loop, and final judge. The evaluate-repair loop runs up to max_repair_iterations (from EvaluationCriteria), exits early when all rows pass, and sends only failing rows to repair. The final judge is non-critical (a failure is logged and defaults are applied). A fast path skips all LLM calls when no entities are detected.
Schema cleanup -- removed JudgeScoreSchema / JudgeEvaluationSchema (redundant with LLMJudgeColumnConfig + Score rubrics).
Row split/merge helpers extracted as free functions (_split_by_entities, _merge_and_reorder, _apply_passthrough_defaults) to prepare for #60 (refactor: extract shared entity-row split/reorder/recombine helper).
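
For readers skimming the summary, here is a minimal sketch of the evaluate-repair control flow it describes: evaluate once, repair only failing rows, stop early when everything passes. It is not the code in rewrite_workflow.py. The function name, the evaluate/repair callables, the list-of-dicts row shape, and the "needs_repair" column name are all assumptions made for illustration.

```python
from typing import Callable

Row = dict  # one record; keys are column names (illustrative shape)


def evaluate_repair_loop(
    rows: list[Row],
    evaluate: Callable[[Row], bool],
    repair: Callable[[Row], Row],
    max_repair_iterations: int,
) -> list[Row]:
    """Evaluate every row once, then repair and re-evaluate only failing rows."""
    current = list(rows)  # shallow copy so the caller's list is not reordered or replaced
    # The initial evaluation always runs, even when max_repair_iterations is 0.
    passed = [evaluate(row) for row in current]
    for _ in range(max_repair_iterations):
        failing = [i for i, ok in enumerate(passed) if not ok]
        if not failing:
            break  # early exit: every row passes
        for i in failing:
            # Only failing rows are sent to repair; passing rows are left untouched.
            current[i] = repair(current[i])
            passed[i] = evaluate(current[i])
    # "needs_repair" is a hypothetical column name used for this sketch only.
    return [{**row, "needs_repair": not ok} for row, ok in zip(current, passed)]
```

Rows that still fail after the last iteration keep needs_repair set, so a later step can decide how to treat them.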

Config gaps closed

evaluation.max_repair_iterations -- consumed by evaluate-repair loop
evaluation.flag_utility_below -- consumed by needs_human_review
evaluation.flag_leakage_mass_above -- consumed by needs_human_review
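
As a rough illustration of how the two flag thresholds could feed needs_human_review, the sketch below combines them with the other objective conditions listed in the Summary. The dataclass, the function signature, and the strict comparisons (< and >) are assumptions; only the config field names come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewThresholds:
    """Hypothetical container for the two evaluation.* flag thresholds."""

    flag_utility_below: float
    flag_leakage_mass_above: float


def needs_human_review(
    *,
    rewrite_failed: bool,
    utility_score: float,
    leakage_mass: float,
    has_high_sensitivity_leak: bool,
    thresholds: ReviewThresholds,
) -> bool:
    """Flag a row from objective metrics only; judge scores are deliberately not an input."""
    return bool(
        rewrite_failed
        or utility_score < thresholds.flag_utility_below  # assumed strict comparison
        or leakage_mass > thresholds.flag_leakage_mass_above  # assumed strict comparison
        or has_high_sensitivity_leak
    )
```

Any single condition is enough to flag the row, which matches the "any HIGH-sensitivity leak" wording in the Summary.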

Testing

Tests pass locally
Added/updated tests for changes

Test plan

15 tests for FinalJudgeWorkflow (prompt, rubrics, human review flagging)
12 tests for RewriteWorkflow (fast path, call order, failed records, attrs, judge failure tolerance, repair loop)
Full suite: 391 passed

Related Issues

Closes #35

Move all evaluation LLM calls from LLMStructuredColumnConfig to custom columns so we can pass Pydantic validation context with expected IDs per row. DD's correction loop retries when the LLM skips answers. Also addresses PR #61 review: shared parsers module, repair unit tests, COL_NEEDS_REPAIR to constants, consistent field() validation.

andreatgretel

with #68 handling the public wiring, the main thing I am still wondering about in this PR is the short-output case in RewriteWorkflow. right now we can warn on a row-count mismatch and keep going, but the loop still seems to assume the joined columns are present afterwards. maybe this wants either stricter failure semantics here, or a more explicit way of carrying partial-row failures through the loop.

lipikaramaswamy · 2026-03-25T17:56:30Z

with #68 handling the public wiring, the main thing I am still wondering about in this PR is the short-output case in RewriteWorkflow. right now we can warn on a row-count mismatch and keep going, but the loop still seems to assume the joined columns are present afterwards. maybe this wants either stricter failure semantics here, or a more explicit way of carrying partial-row failures through the loop.

Addressed in the latest commit. _join_new_columns no longer skips the join on row-count mismatch -- it filters the working dataframe to surviving rows (via RECORD_ID_COLUMN), joins the new columns onto those rows, and lets the dropped rows propagate through failed_records. Added a degraded-case test to cover this path.
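
To make the described behavior concrete, here is a small sketch of an ID-aligned join that tolerates dropped rows. It uses plain lists of dicts rather than the dataframes the workflow actually operates on, and it is not the real _join_new_columns. The string value given to RECORD_ID_COLUMN, the function name, and the returned tuple shape are assumptions.

```python
RECORD_ID_COLUMN = "record_id"  # hypothetical value; the real constant lives in the repo


def join_new_columns(
    working: list[dict],
    adapter_output: list[dict],
    new_columns: list[str],
) -> tuple[list[dict], list]:
    """Join new_columns onto surviving rows and report the IDs of dropped rows."""
    output_by_id = {row[RECORD_ID_COLUMN]: row for row in adapter_output}
    joined: list[dict] = []
    dropped_ids: list = []
    for row in working:
        record_id = row[RECORD_ID_COLUMN]
        produced = output_by_id.get(record_id)
        if produced is None:
            # The adapter dropped this row; the caller can route it to failed records.
            dropped_ids.append(record_id)
            continue
        joined.append({**row, **{col: produced[col] for col in new_columns}})
    return joined, dropped_ids
```

Aligning on the record ID rather than on position means a short adapter output cannot silently shift values onto the wrong rows.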

…latten tests
- _join_new_columns aligns on RECORD_ID_COLUMN when adapter drops rows instead of crashing or skipping the join
- RECORD_ID_COLUMN included in all seed lists for stable ID across calls
- _join_judge_columns preserves all rows on partial judge failure, defaulting missing rows to needs_human_review=True
- Initial evaluate runs before repair loop (max_repair_iterations=0 fix)
- Flatten test_rewrite_workflow.py from classes to module-level functions
- Add tests: judge partial row loss, evaluate row drop degradation
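
The judge join differs from the evaluate join in that it keeps every row. A hedged sketch of that left-join-with-defaults idea follows, again on lists of dicts. The function name, the RECORD_ID_COLUMN value, the judge column names passed in, and the choice of None for missing scores are assumptions; only the needs_human_review=True default comes from the commit notes above.

```python
RECORD_ID_COLUMN = "record_id"  # hypothetical value; the real constant lives in the repo
REVIEW_COLUMN = "needs_human_review"


def join_judge_columns(
    working: list[dict],
    judged: list[dict],
    judge_columns: list[str],
) -> list[dict]:
    """Keep every working row; rows the judge did not return get conservative defaults."""
    judged_by_id = {row[RECORD_ID_COLUMN]: row for row in judged}
    merged: list[dict] = []
    for row in working:
        hit = judged_by_id.get(row[RECORD_ID_COLUMN])
        if hit is None:
            # Judge output is missing for this row: no scores, and flag it for review.
            extra = {col: None for col in judge_columns}
            extra[REVIEW_COLUMN] = True
        else:
            extra = {col: hit.get(col) for col in judge_columns}
            extra[REVIEW_COLUMN] = bool(hit.get(REVIEW_COLUMN, False))
        merged.append({**row, **extra})
    return merged
```

Defaulting to review on missing judge output keeps the judge non-critical without letting unjudged rows pass unnoticed.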

lipikaramaswamy · 2026-03-26T01:44:29Z

Thanks all for your review, this was a hairy one. Merging now :)

lipikaramaswamy added 3 commits March 20, 2026 00:25

feat: introduce evaluate and repair workflows

b99b5bd

update vars after merging main, add warning

784b653

fix: remove unused schema:

13d2d33

lipikaramaswamy requested a review from a team as a code owner March 23, 2026 19:29

asteier2026 approved these changes Mar 23, 2026

feat(rewrite): add final judge workflow and rewrite orchestrator

205dd24

lipikaramaswamy changed the base branch from main to lipikaramaswamy/feat/rewrite-engine-eval-repair March 23, 2026 20:34

asteier2026 approved these changes Mar 23, 2026

lipikaramaswamy added 13 commits March 23, 2026 19:24

Merge remote-tracking branch 'origin/lipikaramaswamy/feat/rewrite-engine-eval-repair' into lipikaramaswamy/feat/rewrite-engine-final-judge

504dabc

fix: remove unused logging import

34ca931

fix: reject duplicate/extra answer IDs, complete required_columns, safe replace order

22ad899

Merge remote-tracking branch 'origin/lipikaramaswamy/feat/rewrite-engine-eval-repair' into lipikaramaswamy/feat/rewrite-engine-final-judge

576c382

use re.sub for replacements in prompts

b84d652

Merge branch 'lipikaramaswamy/feat/rewrite-engine-eval-repair' into lipikaramaswamy/feat/rewrite-engine-final-judge

7de1c2f

refactor: use render_template instead of str replace

fade562

fix: (looking forward to wiring) make custom cols output model_dump(), use a new rewritten text colname in repair workflow

8467bac

add tests for parsers

551f3b7

Merge branch 'lipikaramaswamy/feat/rewrite-engine-eval-repair' into lipikaramaswamy/feat/rewrite-engine-final-judge

1fc8517

feat(rewrite): strip seed dataframes before adapter calls, handle repair __next rename

9bda864

fix: update pre gen seed col'

b0bf4d6

lipikaramaswamy mentioned this pull request Mar 25, 2026

fix: rewrite: deduplicate entities before sensitivity disposition #67

Closed

lipikaramaswamy added 2 commits March 24, 2026 22:54

rewrite generation gets a stripped seed dataset

47a6a99

avoid failed row crash, log warning

1f233e1

lipikaramaswamy mentioned this pull request Mar 25, 2026

feat: interface wiring + display #68

Merged

11 tasks

andreatgretel reviewed Mar 25, 2026

Comment thread tests/engine/test_rewrite_workflow.py Outdated

andreatgretel reviewed Mar 25, 2026

Comment thread src/anonymizer/engine/schemas/__init__.py

andreatgretel reviewed Mar 25, 2026

Comment thread src/anonymizer/engine/rewrite/rewrite_workflow.py

andreatgretel requested changes Mar 25, 2026

Base automatically changed from lipikaramaswamy/feat/rewrite-engine-eval-repair to main March 25, 2026 16:14

lipikaramaswamy added 2 commits March 25, 2026 10:51

fix: update _join_new_columns to align on RECORD_ID_COLUMN, which must exist in our workflows, when lengths differ; added test_evaluate_dropping_rows_degrades_gracefully

e80c22c

Merge remote-tracking branch 'origin/main' into lipikaramaswamy/feat/rewrite-engine-final-judge

95b0655

lipikaramaswamy requested a review from andreatgretel March 25, 2026 17:56

lint-fix

b23247c

andreatgretel reviewed Mar 25, 2026

Comment thread src/anonymizer/engine/rewrite/rewrite_workflow.py Outdated

andreatgretel approved these changes Mar 25, 2026

johnnygreco reviewed Mar 25, 2026

Comment thread src/anonymizer/engine/rewrite/final_judge.py Outdated

johnnygreco reviewed Mar 25, 2026

Comment thread src/anonymizer/engine/rewrite/rewrite_workflow.py

Comment thread src/anonymizer/engine/rewrite/rewrite_workflow.py

lipikaramaswamy added 2 commits March 25, 2026 17:36

fix: harden repair and judge row-loss handling

89a428a

lintfix

0247d3a

lipikaramaswamy requested a review from johnnygreco March 26, 2026 00:39

andreatgretel approved these changes Mar 26, 2026

lipikaramaswamy merged commit e4645f0 into main Mar 26, 2026
5 checks passed

lipikaramaswamy deleted the lipikaramaswamy/feat/rewrite-engine-final-judge branch March 26, 2026 01:44

lipikaramaswamy merged 25 commits into main from lipikaramaswamy/feat/rewrite-engine-final-judge

4 participants
