fix: align filtered display and harden detection prompt notation by lipikaramaswamy · Pull Request #43 · NVIDIA-NeMo/Anonymizer

lipikaramaswamy · 2026-03-13T07:11:52Z

Summary

Fix display rendering so filtered replace runs use final_entities when present, even if the filtered set is empty, instead of falling back to _detected_entities
Tighten detection prompt guidance around partial-token drops and technical-value classification to reduce noisy tagging
Rename inline tag markers from PII to SENSITIVE so prompt examples and tagged text better reflect the broader privacy-sensitive scope
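The first item hinges on the difference between "absent" and "empty". The display code itself is not shown in this PR conversation, so the following is only a minimal sketch of the distinction, assuming the fallback was previously written as a truthiness check. The function names and arguments are hypothetical; only final_entities and _detected_entities come from the summary above.

```python
from typing import Optional


def entities_for_display_buggy(final_entities: Optional[list], detected_entities: list) -> list:
    # Hypothetical "before" behaviour: an empty filtered list is falsy,
    # so the display silently falls back to the unfiltered detections.
    return final_entities or detected_entities


def entities_for_display(final_entities: Optional[list], detected_entities: list) -> list:
    # Hypothetical "after" behaviour: fall back only when no filtered
    # result exists at all (None), not when the filter removed everything.
    if final_entities is not None:
        return final_entities
    return detected_entities


detected = ["Alice", "procID"]

# Filter ran and removed every entity: the display should show nothing tagged.
shown_buggy = entities_for_display_buggy([], detected)
shown_fixed = entities_for_display([], detected)

# Filter never ran: falling back to the detected entities is still correct.
shown_unfiltered = entities_for_display(None, detected)
```

With an explicit None check, an empty filtered set is rendered as empty rather than being replaced by the raw detections.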

Type of Change

Testing

Tests pass locally
Added/updated tests for changes

andreatgretel · 2026-03-13T19:56:32Z

src/anonymizer/engine/detection/postprocess.py:369
if needle[0].isalnum() or needle[0] == "_":
    escaped = rf"(?<![A-Za-z0-9_]){escaped}"
if needle[-1].isalnum() or needle[-1] == "_":
    escaped = rf"{escaped}(?![A-Za-z0-9_])"
I think this still matches inside hyphenated tokens, so something like internal-procID-id may get tagged again during expansion. Maybe worth tightening the boundary check and adding a small regression test?
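The concern can be reproduced in isolation. The sketch below rebuilds the pattern from the four quoted lines and adds a hypothetical extra_boundary_chars parameter (not part of the PR) to show one possible tightening. It assumes escaped starts out as re.escape(needle), which the quoted snippet does not show. Whether a hyphen should count as part of a token likely depends on the data, so it is left as an option rather than hard-coded.

```python
import re


def build_pattern(needle: str, extra_boundary_chars: str = "") -> re.Pattern:
    # Assumption: the surrounding code escapes the needle before adding boundaries.
    escaped = re.escape(needle)
    # Characters that count as "inside a token"; extra_boundary_chars is hypothetical.
    token_chars = "A-Za-z0-9_" + re.escape(extra_boundary_chars)
    if needle[0].isalnum() or needle[0] == "_":
        escaped = rf"(?<![{token_chars}]){escaped}"
    if needle[-1].isalnum() or needle[-1] == "_":
        escaped = rf"{escaped}(?![{token_chars}])"
    return re.compile(escaped)


text = "internal-procID-id and procID"

# Boundary check as quoted: a hyphen is not a token character, so the needle
# also matches inside the hyphenated token.
current_starts = [m.start() for m in build_pattern("procID").finditer(text)]

# Tightened variant: treating "-" as a token character leaves only the
# standalone occurrence at the end of the text.
tightened_starts = [m.start() for m in build_pattern("procID", "-").finditer(text)]
```

A regression test along these lines could assert that the hyphenated occurrence is skipped while the standalone one is still found.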

lipikaramaswamy · 2026-03-16T07:06:05Z

Thanks! @asteier2026 and I discussed this, and we will move the hyphen work to a separate PR, since there are nuances depending on the type of data (#46).

lipikaramaswamy · 2026-03-16T16:43:22Z

I'll merge this PR; we will pick up the hyphen work in a separate PR linked to #46.

lipikaramaswamy added 3 commits March 13, 2026 00:04

fix: use final entities in filtered preview display

9b98103

fix: tighten prompt guidance for partial token and technical cases

91c62a9

refactor: rename inline privacy markers to sensitive (PII -> SENSITIVE)

388b470

lipikaramaswamy requested a review from a team as a code owner March 13, 2026 07:11

asteier2026 approved these changes Mar 13, 2026

remove hyphen from instruction

aed4aeb

lipikaramaswamy mentioned this pull request Mar 16, 2026

feat: add domain-aware hyphen handling for partial-token detection (maybe after rewrite integration) #46

Open

asteier2026 approved these changes Mar 16, 2026

lipikaramaswamy merged commit 3943fdb into main Mar 16, 2026
5 checks passed

lipikaramaswamy deleted the lipikaramaswamy/refactor/filtered-display-and-detection-prompts branch March 16, 2026 16:45
