feat: improve sensitivity disposition calibration #150
Greptile Summary
This PR closes three previously-flagged P1 issues by adding
Confidence Score: 5/5
Safe to merge: all three previously-identified crash paths in the strict conversion flow are now closed. The StrictCombinedRiskLevel enum prevents combined_risk_level='low' from ever reaching to_entity_disposition(), the new validators enforce both directions of the combined_risk_level → needs_protection contract, and no remaining references to the removed sensitive_attribute categories exist in the codebase.
Remaining gap in tests/engine/test_schemas.py: the two new combined_risk_level invariants lack direct failure-path test cases.
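The two-directional contract described above, and the failure-path checks the summary says are still missing, could look roughly like the following. This is a minimal sketch using only the standard library, not the PR's actual schema: `CombinedRiskLevel`, `DispositionSketch`, and `is_rejected` are hypothetical names, and the exact rule (medium/high requires `needs_protection=True`, low requires `needs_protection=False`) is an assumption inferred from the summary.

```python
from dataclasses import dataclass
from enum import Enum


class CombinedRiskLevel(str, Enum):
    # Hypothetical non-strict enum; the member set is an assumption.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class DispositionSketch:
    # Hypothetical stand-in for the real schema, reduced to the two fields
    # that take part in the contract.
    combined_risk_level: CombinedRiskLevel
    needs_protection: bool

    def __post_init__(self) -> None:
        # Coerce plain strings; unknown values raise ValueError here.
        self.combined_risk_level = CombinedRiskLevel(self.combined_risk_level)
        elevated = self.combined_risk_level is not CombinedRiskLevel.LOW
        # Direction 1 (assumed): elevated risk implies protection.
        if elevated and not self.needs_protection:
            raise ValueError("medium/high combined_risk_level requires needs_protection=True")
        # Direction 2 (assumed): protection implies elevated risk.
        if self.needs_protection and not elevated:
            raise ValueError("needs_protection=True requires medium/high combined_risk_level")


def is_rejected(level: str, needs_protection: bool) -> bool:
    """Return True when the sketch schema refuses the combination."""
    try:
        DispositionSketch(level, needs_protection)
    except ValueError:
        return True
    return False
```

A direct failure-path test would then assert that `is_rejected("high", False)` and `is_rejected("low", True)` both hold, one per direction of the invariant.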
Reviews (6): Last reviewed commit: "Update src/anonymizer/engine/rewrite/sen..."
I made the changes greptile-apps recommended Wed morning.
Q: in strict entity protection mode, I noticed I may be missing the intended strict-mode semantics here, but should strict mode either constrain
Good catch Lipika, I made the adjustments. |
lipikaramaswamy left a comment
Looks good. I rechecked the strict-mode edge case: combined_risk_level="low" is now rejected by the strict schema, and medium/high convert cleanly with needs_protection=True. Tests ran great, and I'm assuming the end-to-end pipeline works with the updated prompt/schema.
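The strict-mode behavior rechecked above can be illustrated with a small sketch. It assumes the strict enum simply omits a `low` member, so that value fails at parse time; only the name `StrictCombinedRiskLevel` comes from this PR, while its members and the helper `to_entity_disposition_sketch` are hypothetical and do not reflect the real signature of `to_entity_disposition()`.

```python
from enum import Enum


class StrictCombinedRiskLevel(str, Enum):
    # Assumed member set: strict mode has no "low" option at all.
    MEDIUM = "medium"
    HIGH = "high"


def to_entity_disposition_sketch(combined_risk_level: str) -> dict:
    """Hypothetical strict conversion, reduced to the risk/protection fields."""
    # Enum lookup by value raises ValueError for "low" (or any unknown value),
    # so a low-risk entry never reaches the conversion logic below.
    level = StrictCombinedRiskLevel(combined_risk_level)
    # Every level the strict enum accepts implies protection.
    return {"combined_risk_level": level.value, "needs_protection": True}


def strict_rejects(combined_risk_level: str) -> bool:
    """Return True when the strict sketch refuses the given level."""
    try:
        to_entity_disposition_sketch(combined_risk_level)
    except ValueError:
        return True
    return False
```

Under these assumptions, `strict_rejects("low")` is true, while `"medium"` and `"high"` both convert with `needs_protection` set to `True`.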
Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>
* feature: sensitivity enhancements
* feature: more sensitivity enhancements
* fix: address greptile suggestion to add checks for combined_risk_level
* fix: add test updates
* feature: more sensitivity enhancements
* fix: combined_risk_level and strict protection
* Update src/anonymizer/engine/rewrite/sensitivity_disposition.py

Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>

---------

Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>
Changes include:
quasi-identifiers and latent identifiers, grounded in re-identification risk rather than content sensitivity
combined_risk_level, and establishing that needs_protection follows from combined_risk_level
bundle
require protection_reason to demonstrate persistence after high-risk anchors are suppressed