feat: improve sensitivity disposition calibration #150

Merged
asteier2026 merged 7 commits into main from asteier2026/feature/sensitivity-improvements on May 15, 2026

@asteier2026
Contributor

Changes include:

  • Refactor entity_categories to assign explicit sensitivity tiers (high/medium/low) for
    quasi-identifiers and latent identifiers, grounded in re-identification risk rather than content
    sensitivity
  • Add combined_risk_assessment block clarifying the distinction between sensitivity and
    combined_risk_level, and establishing that needs_protection follows from combined_risk_level
  • Add bundle over-protection guidance to prevent medium assignment to every element of an identifying
    bundle
  • Strengthen output_requirements to enforce combined_risk_level as the driver of needs_protection and
    require protection_reason to demonstrate persistence after high-risk anchors are suppressed

@asteier2026 asteier2026 requested a review from a team as a code owner May 11, 2026 15:24
@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR closes three previously-flagged P1 issues by adding StrictCombinedRiskLevel (constraining strict-mode entities to medium/high), wiring schema validators that enforce the combined_risk_level → needs_protection invariant, and fixing the misquoted strict-mode override string. It also refactors the LLM sensitivity-disposition prompt to ground protection decisions in re-identification risk rather than content sensitivity, and removes the now-obsolete sensitive_attribute and latent_sensitive_attribute categories from the detection and rewrite schemas.

  • Schema hardening (rewrite.py): StrictEntityDispositionSchema now rejects combined_risk_level='low' at parse time, and EntityDispositionSchema._validate_protection_consistency enforces high→needs_protection=true and low→needs_protection=false as hard invariants.
  • Prompt overhaul (sensitivity_disposition.py): Entity categories gain explicit sensitivity tiers (high/medium/low), a new combined_risk_assessment block clarifies the sensitivity-vs-combined_risk_level distinction and adds bundle over-protection guidance, and output requirements now name combined_risk_level as the sole driver of needs_protection.
  • Category cleanup (detection.py, rewrite.py, qa_generation.py): sensitive_attribute and latent_sensitive_attribute are removed from all schemas with no remaining references in the codebase.
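
The two invariants described under schema hardening can be illustrated with a minimal sketch. It uses a stdlib dataclass as a stand-in, so it is not the code in rewrite.py, whose validation framework is not shown here. The names `CombinedRiskLevel`, `EntityDisposition`, and the `entity` field, the error messages, and the choice to leave medium unconstrained are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class CombinedRiskLevel(str, Enum):
    """Hypothetical enum of the three risk levels named in the summary."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class EntityDisposition:
    """Hypothetical stand-in for EntityDispositionSchema."""
    entity: str  # assumed field, for illustration only
    combined_risk_level: CombinedRiskLevel
    needs_protection: bool

    def __post_init__(self) -> None:
        # Coerce plain strings such as "high" into the enum; unknown values raise ValueError.
        level = CombinedRiskLevel(self.combined_risk_level)
        object.__setattr__(self, "combined_risk_level", level)

        # Invariant 1: high combined risk must be protected.
        if level is CombinedRiskLevel.HIGH and not self.needs_protection:
            raise ValueError("combined_risk_level='high' requires needs_protection=True")
        # Invariant 2: low combined risk must not be protected.
        if level is CombinedRiskLevel.LOW and self.needs_protection:
            raise ValueError("combined_risk_level='low' requires needs_protection=False")
        # Medium is left unconstrained in this sketch (an assumption).
```

Under these assumptions, `EntityDisposition("zip code", "high", False)` and `EntityDisposition("zip code", "low", True)` both raise `ValueError`, which is the kind of failure-path case a direct test of each invariant would assert.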

Confidence Score: 5/5

Safe to merge — all three previously-identified crash paths in the strict conversion flow are now closed.

The StrictCombinedRiskLevel enum prevents combined_risk_level='low' from ever reaching to_entity_disposition(), the new validators enforce both directions of the combined_risk_level → needs_protection contract, and no remaining references to the removed sensitive_attribute categories exist in the codebase.

tests/engine/test_schemas.py — the two new combined_risk_level invariants lack direct failure-path test cases.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/anonymizer/engine/schemas/rewrite.py | Adds StrictCombinedRiskLevel enum (medium/high only) and wires it into StrictEntityDispositionSchema; adds combined_risk_level→needs_protection invariant validators to EntityDispositionSchema; all three prior P1s are closed. |
| src/anonymizer/engine/rewrite/sensitivity_disposition.py | Refactors entity_categories guidance to assign explicit sensitivity tiers, adds combined_risk_assessment block, strengthens output_requirements, and fixes the misquoted strict-mode override string. |
| src/anonymizer/engine/schemas/detection.py | Removes latent_sensitive_attribute from LatentCategory enum; no downstream references remain in src or tests. |
| tests/engine/test_schemas.py | Adds combined_risk_level='low' to fixtures that needed it; missing explicit tests for the two new combined_risk_level invariants (high→needs_protection=true, low→needs_protection=false). |

Reviews (6): Last reviewed commit: "Update src/anonymizer/engine/rewrite/sen..."

@asteier2026
Contributor Author

I made the changes greptile-apps recommended Wed morning.

@lipikaramaswamy
Collaborator

Q: in strict entity protection mode, should combined_risk_level="low" still be allowed?

I noticed StrictEntityDispositionSchema accepts combined_risk_level="low", but to_entity_disposition() always converts strict entities with needs_protection=True. With the new validator, that creates low + needs_protection=True, which raises because low combined risk now requires needs_protection=False.

I may be missing the intended strict-mode semantics here, but should strict mode either constrain combined_risk_level to high/medium, or handle this conversion differently?
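
For illustration, the first option (constraining the level to medium/high) could look roughly like the sketch below. `StrictCombinedRiskLevel` is the name used in the review summary of this PR; the free-function form of `to_entity_disposition`, its signature, and the dict it returns are hypothetical simplifications, not the repository's actual method.

```python
from enum import Enum


class StrictCombinedRiskLevel(str, Enum):
    """Strict mode only admits medium and high; 'low' is not a member."""
    MEDIUM = "medium"
    HIGH = "high"


def to_entity_disposition(entity: str, combined_risk_level: str) -> dict:
    """Hypothetical strict-mode conversion that always sets needs_protection=True."""
    # Enum lookup raises ValueError for "low", so the contradictory pair
    # (low, needs_protection=True) is rejected before it can be constructed.
    level = StrictCombinedRiskLevel(combined_risk_level)
    return {
        "entity": entity,  # assumed field, for illustration only
        "combined_risk_level": level.value,
        "needs_protection": True,
    }
```

With this shape, the rejection happens when the strict input is parsed rather than later inside the conversion, so the low→needs_protection=False validator is never handed a strict entity it would have to refuse.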

@asteier2026
Copy link
Copy Markdown
Contributor Author

Good catch Lipika, I made the adjustments.

Comment thread src/anonymizer/engine/rewrite/sensitivity_disposition.py Outdated
Collaborator

@lipikaramaswamy lipikaramaswamy left a comment

Looks good. I rechecked the strict-mode edge case: combined_risk_level="low" is now rejected by the strict schema, and medium/high convert cleanly with needs_protection=True. Tests ran great, and I'm assuming the end-to-end pipeline works with the updated prompt/schema.

Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>
@asteier2026 asteier2026 merged commit 9ca3e27 into main May 15, 2026
11 checks passed
@asteier2026 asteier2026 deleted the asteier2026/feature/sensitivity-improvements branch May 15, 2026 17:43
asteier2026 added a commit that referenced this pull request May 15, 2026
* feature: sensitivity enhancements

* feature: more sensitivity enhancements

* fix: address greptile suggestion to add checks for combined_risk_level

* fix: add test updates

* feature: more sensitivity enhancements

* fix: combined_risk_level and strict protection

* Update src/anonymizer/engine/rewrite/sensitivity_disposition.py

Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>

---------

Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>