feat: improve sensitivity disposition calibration by asteier2026 · Pull Request #150 · NVIDIA-NeMo/Anonymizer

asteier2026 · 2026-05-11T15:24:48Z

Changes include:

- Refactor entity_categories to assign explicit sensitivity tiers (high/medium/low) for
  quasi-identifiers and latent identifiers, grounded in re-identification risk rather than content
  sensitivity
- Add combined_risk_assessment block clarifying the distinction between sensitivity and
  combined_risk_level, and establishing that needs_protection follows from combined_risk_level
- Add bundle over-protection guidance to prevent medium assignment to every element of an identifying
  bundle
- Strengthen output_requirements to enforce combined_risk_level as the driver of needs_protection and
  require protection_reason to demonstrate persistence after high-risk anchors are suppressed

greptile-apps · 2026-05-11T15:27:56Z

Greptile Summary

This PR closes three previously-flagged P1 issues by adding StrictCombinedRiskLevel (constraining strict-mode entities to medium/high), wiring schema validators that enforce the combined_risk_level → needs_protection invariant, and fixing the misquoted strict-mode override string. It also refactors the LLM sensitivity-disposition prompt to ground protection decisions in re-identification risk rather than content sensitivity, and removes the now-obsolete sensitive_attribute and latent_sensitive_attribute categories from the detection and rewrite schemas.

- Schema hardening (rewrite.py): StrictEntityDispositionSchema now rejects combined_risk_level='low' at parse time, and EntityDispositionSchema._validate_protection_consistency enforces high→needs_protection=true and low→needs_protection=false as hard invariants.
- Prompt overhaul (sensitivity_disposition.py): Entity categories gain explicit sensitivity tiers (high/medium/low), a new combined_risk_assessment block clarifies the sensitivity-vs-combined_risk_level distinction and adds bundle over-protection guidance, and output requirements now name combined_risk_level as the sole driver of needs_protection.
- Category cleanup (detection.py, rewrite.py, qa_generation.py): sensitive_attribute and latent_sensitive_attribute are removed from all schemas with no remaining references in the codebase.

Confidence Score: 5/5

Safe to merge — all three previously-identified crash paths in the strict conversion flow are now closed.

The StrictCombinedRiskLevel enum prevents combined_risk_level='low' from ever reaching to_entity_disposition(), the new validators enforce both directions of the combined_risk_level → needs_protection contract, and no remaining references to the removed sensitive_attribute categories exist in the codebase.

tests/engine/test_schemas.py — the two new combined_risk_level invariants lack direct failure-path test cases.
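
The two invariants can be exercised with the kind of failure-path checks noted as missing. The following is a minimal sketch, not the project's code: it uses a stdlib dataclass as a stand-in for the real schema class, `EntityDisposition` and `is_rejected` are hypothetical names, and leaving medium unconstrained is an assumption based on only high and low being described as hard invariants.

```python
from dataclasses import dataclass
from enum import Enum


class CombinedRiskLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class EntityDisposition:
    """Hypothetical stand-in for EntityDispositionSchema (not the real class)."""

    combined_risk_level: CombinedRiskLevel
    needs_protection: bool

    def __post_init__(self) -> None:
        # Accept either an enum member or its string value.
        level = CombinedRiskLevel(self.combined_risk_level)
        if level is CombinedRiskLevel.HIGH and not self.needs_protection:
            raise ValueError("combined_risk_level='high' requires needs_protection=True")
        if level is CombinedRiskLevel.LOW and self.needs_protection:
            raise ValueError("combined_risk_level='low' requires needs_protection=False")
        # Assumption: 'medium' may pair with either needs_protection value.


def is_rejected(level: str, needs_protection: bool) -> bool:
    """Return True when the combination violates an invariant."""
    try:
        EntityDisposition(CombinedRiskLevel(level), needs_protection)
    except ValueError:
        return True
    return False
```

In a real test module, each rejected combination (high with needs_protection=False, low with needs_protection=True) would likely become its own test case asserting that validation raises.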

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/anonymizer/engine/schemas/rewrite.py | Adds StrictCombinedRiskLevel enum (medium/high only) and wires it into StrictEntityDispositionSchema; adds combined_risk_level→needs_protection invariant validators to EntityDispositionSchema. All three prior P1s are closed. |
| src/anonymizer/engine/rewrite/sensitivity_disposition.py | Refactors entity_categories guidance to assign explicit sensitivity tiers, adds combined_risk_assessment block, strengthens output_requirements, and fixes the misquoted strict-mode override string. |
| src/anonymizer/engine/schemas/detection.py | Removes latent_sensitive_attribute from LatentCategory enum; no downstream references remain in src or tests. |
| tests/engine/test_schemas.py | Adds combined_risk_level='low' to fixtures that needed it; missing explicit tests for the two new combined_risk_level invariants (high→needs_protection=true, low→needs_protection=false). |

Reviews (6): Last reviewed commit: "Update src/anonymizer/engine/rewrite/sen..."

asteier2026 · 2026-05-13T15:19:29Z

I made the changes greptile-apps recommended on Wednesday morning.

lipikaramaswamy · 2026-05-15T07:05:33Z

Q: in strict entity protection mode, should combined_risk_level="low" still be allowed?

I noticed StrictEntityDispositionSchema accepts combined_risk_level="low", but to_entity_disposition() always converts strict entities with needs_protection=True. With the new validator, that creates low + needs_protection=True, which raises because low combined risk now requires needs_protection=False.

I may be missing the intended strict-mode semantics here, but should strict mode either constrain combined_risk_level to high/medium, or handle this conversion differently?
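
The first option raised here, constraining strict mode to medium/high, can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: `Disposition` and the free function `to_entity_disposition` are hypothetical stand-ins for the real schema classes and method, and only the enum name `StrictCombinedRiskLevel` comes from the review summary.

```python
from dataclasses import dataclass
from enum import Enum


class StrictCombinedRiskLevel(str, Enum):
    """Strict mode has no 'low' member, so 'low' fails when the value is parsed."""

    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Disposition:
    """Hypothetical stand-in for the converted entity disposition."""

    combined_risk_level: str
    needs_protection: bool


def to_entity_disposition(raw_level: str) -> Disposition:
    """Convert a strict-mode entity; strict entities are always protected."""
    # Enum lookup by value raises ValueError for anything outside medium/high,
    # so low + needs_protection=True can never be constructed.
    level = StrictCombinedRiskLevel(raw_level)
    return Disposition(combined_risk_level=level.value, needs_protection=True)
```

Rejecting "low" at the enum boundary keeps the conversion unconditional, which avoids special-casing needs_protection inside the conversion itself.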

asteier2026 · 2026-05-15T15:36:19Z

Good catch Lipika, I made the adjustments.

lipikaramaswamy

Looks good. I rechecked the strict-mode edge case: combined_risk_level="low" is now rejected by the strict schema, and medium/high convert cleanly with needs_protection=True. Tests ran great, and I'm assuming the end-to-end pipeline works with the updated prompt/schema.

* feature: sensitivity enhancements
* feature: more sensitivity enhancements
* fix: address greptile suggestion to add checks for combined_risk_level
* fix: add test updates
* feature: more sensitivity enhancements
* fix: combined_risk_level and strict protection
* Update src/anonymizer/engine/rewrite/sensitivity_disposition.py

Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>

asteier2026 added 2 commits May 8, 2026 13:56

feature: sensitivity enhancements

0f8bb7c

feature: more sensitivity enhancements

6576a00

asteier2026 requested a review from a team as a code owner May 11, 2026 15:24

asteier2026 requested a review from lipikaramaswamy May 12, 2026 15:09

asteier2026 added 3 commits May 13, 2026 08:49

fix: address greptile suggestion to add checks for combined_risk_level

104579c

fix: add test updates

4ecfdf2

feature: more sensitivity enhancements

eb87464

fix: combined_risk_level and strict protection

5223827

lipikaramaswamy reviewed May 15, 2026

Comment thread src/anonymizer/engine/rewrite/sensitivity_disposition.py Outdated

lipikaramaswamy approved these changes May 15, 2026

Update src/anonymizer/engine/rewrite/sensitivity_disposition.py

f4fbe54

Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>

asteier2026 merged commit 9ca3e27 into main May 15, 2026
11 checks passed

asteier2026 deleted the asteier2026/feature/sensitivity-improvements branch May 15, 2026 17:43

asteier2026 merged 7 commits into main from asteier2026/feature/sensitivity-improvements
