
feat(extension): configurable PII entity types and custom regex patterns #470

Open
yjouini wants to merge 2 commits into dataiku:main from yjouini:feat/configurable-pii-entity-types

yjouini commented May 27, 2026

Closes #458, Closes #459

Chrome extension

  • PII entity types card: checkbox grid of all labels fetched from GET /api/pii/labels. The set of unchecked (disabled) types is persisted to chrome.storage.sync; the types that remain checked are sent as enabled_labels with each /api/pii/check request from background.js. A "Disable all / Enable all" toggle is included.
  • Custom patterns card: form to add regex rules (label, regex, optional replacement). A live preview tests the regex against sample text before saving. Saved patterns are listed with inline Edit/Remove. The label grid refreshes after each save so custom labels appear in the type toggles.

Backend — API

| Endpoint | Description |
| --- | --- |
| GET /api/pii/labels | Returns model labels from label_mappings.json (BIO-prefix stripped) merged with custom pattern labels |
| GET /api/pii/patterns | List all custom patterns |
| POST /api/pii/patterns | Create a pattern (validates regex compiles, ≤ 500 chars) |
| PUT /api/pii/patterns/{id} | Update a pattern |
| DELETE /api/pii/patterns/{id} | Delete a pattern |
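
As an illustration of two behaviours named in the endpoint list (BIO-prefix stripping for the labels response, and the "compiles, ≤ 500 chars" check on pattern creation), here is a minimal Go sketch. The function names `stripBIO`, `mergeLabels` and `validatePattern` are hypothetical and not taken from the PR. The format of label_mappings.json is not shown in the description, so the sketch simply takes the model tags as a slice, and dropping the bare "O" tag is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
	"unicode/utf8"
)

// maxRegexLen mirrors the "≤ 500 chars" limit from the PR description.
const maxRegexLen = 500

// stripBIO removes a leading "B-" or "I-" tagging prefix, if present.
func stripBIO(tag string) string {
	if strings.HasPrefix(tag, "B-") || strings.HasPrefix(tag, "I-") {
		return tag[2:]
	}
	return tag
}

// mergeLabels (hypothetical) returns the sorted, de-duplicated union of
// model labels (BIO prefix stripped) and custom pattern labels.
// Assumption: the bare "O" (outside) tag is not a maskable type.
func mergeLabels(modelTags, customLabels []string) []string {
	seen := map[string]bool{}
	out := []string{} // non-nil so it serializes as [] rather than null
	add := func(label string) {
		if label == "" || label == "O" || seen[label] {
			return
		}
		seen[label] = true
		out = append(out, label)
	}
	for _, t := range modelTags {
		add(stripBIO(t))
	}
	for _, l := range customLabels {
		add(l)
	}
	sort.Strings(out)
	return out
}

// validatePattern (hypothetical) rejects expressions longer than
// maxRegexLen characters or that Go's regexp package cannot compile.
func validatePattern(expr string) error {
	if n := utf8.RuneCountInString(expr); n > maxRegexLen {
		return fmt.Errorf("regex is %d chars, limit is %d", n, maxRegexLen)
	}
	if _, err := regexp.Compile(expr); err != nil {
		return fmt.Errorf("regex does not compile: %w", err)
	}
	return nil
}

func main() {
	tags := []string{"O", "B-EMAIL", "I-EMAIL", "B-SSN"}
	fmt.Println(mergeLabels(tags, []string{"EMPLOYEE_ID"}))
	fmt.Println(validatePattern(`EMP-\d{6}`)) // valid pattern
	fmt.Println(validatePattern("("))         // unbalanced group, rejected
}
```

Go's regexp package uses RE2 syntax, so a pattern that previews fine in the browser's JavaScript engine (for example one using lookahead) could still be rejected by a server-side check like this.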

Backend — Internals

  • SQLite: new custom_patterns table with enabled/replacement columns. ALTER TABLE migrations for existing databases.
  • Masking: MaskText accepts enabledLabels to filter which entity types are masked (nil = all, used by proxy pipeline). Custom regex runner appended after ML entities, respecting the same filter.
  • Deduplication: overlapping entities resolved by longest span; equal-length ties go to custom regex (appended last).
  • Generator fallback: unknown labels (e.g. EMPLOYEE_ID) produce a bracketed [LABEL] placeholder (here [EMPLOYEE_ID]) instead of a generic placeholder, so the LLM retains semantic context.
  • Handler: MaskPIIInTextFiltered exposed for the extension flow, MaskPIIInText unchanged for the proxy pipeline.
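
The masking rules listed above (label filter with nil meaning "all", longest-span deduplication with ties going to the later-appended custom entity, and the [LABEL] fallback) can be sketched roughly as follows. This is not the PR's code: the `Entity` type, the function names, the greedy overlap strategy, and filtering before deduplicating are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Entity is a hypothetical detected span: byte offsets [Start, End),
// a label, and an optional per-pattern replacement string.
type Entity struct {
	Start, End  int
	Label       string
	Replacement string
}

// filterByLabels keeps entities whose label is enabled.
// A nil slice means "all types"; an empty non-nil slice means "none".
func filterByLabels(ents []Entity, enabled []string) []Entity {
	if enabled == nil {
		return ents
	}
	allowed := make(map[string]bool, len(enabled))
	for _, l := range enabled {
		allowed[l] = true
	}
	out := []Entity{}
	for _, e := range ents {
		if allowed[e.Label] {
			out = append(out, e)
		}
	}
	return out
}

// deduplicate greedily keeps the longest spans first; among equal-length
// spans the one appended later (the custom regex match) wins.
func deduplicate(ents []Entity) []Entity {
	order := make([]int, len(ents))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		la := ents[order[a]].End - ents[order[a]].Start
		lb := ents[order[b]].End - ents[order[b]].Start
		if la != lb {
			return la > lb
		}
		return order[a] > order[b] // later index first on ties
	})
	kept := []Entity{}
	for _, idx := range order {
		e, overlaps := ents[idx], false
		for _, k := range kept {
			if e.Start < k.End && k.Start < e.End {
				overlaps = true
				break
			}
		}
		if !overlaps {
			kept = append(kept, e)
		}
	}
	sort.Slice(kept, func(a, b int) bool { return kept[a].Start < kept[b].Start })
	return kept
}

// placeholder prefers the entity's own replacement, else "[LABEL]".
func placeholder(e Entity) string {
	if e.Replacement != "" {
		return e.Replacement
	}
	return "[" + e.Label + "]"
}

// maskText replaces surviving spans right to left so that earlier
// offsets stay valid while the string changes length.
func maskText(text string, ents []Entity, enabled []string) string {
	kept := deduplicate(filterByLabels(ents, enabled))
	for i := len(kept) - 1; i >= 0; i-- {
		e := kept[i]
		text = text[:e.Start] + placeholder(e) + text[e.End:]
	}
	return text
}

func main() {
	text := "Contact EMP-123456 at a@b.co"
	ents := []Entity{
		{Start: 8, End: 18, Label: "USERNAME"},    // ML entity
		{Start: 22, End: 28, Label: "EMAIL"},      // ML entity
		{Start: 8, End: 18, Label: "EMPLOYEE_ID"}, // custom regex, appended last
	}
	fmt.Println(maskText(text, ents, nil))               // Contact [EMPLOYEE_ID] at [EMAIL]
	fmt.Println(maskText(text, ents, []string{"EMAIL"})) // Contact EMP-123456 at [EMAIL]
}
```

Filtering before deduplicating means a disabled type cannot suppress an overlapping enabled one; whether the PR orders the two steps this way is not stated in the description.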

Testing

New tests: TestCustomPattern_CreateAndList, TestCustomPattern_Update, TestCustomPattern_Delete, TestCustomPattern_EmptyListIsNotNil, plus 5 TestDeduplicateEntities_* cases.

All 55 Go tests pass: make test-go

Demo video: feature.select.pii.and.custom.regex.mp4

yjouini and others added 2 commits May 27, 2026 11:34
…u#458, dataiku#459)

Extension:
- PII types card: fetch labels from backend, persist disabled set in
  chrome.storage.sync, send enabled_labels per-request so the backend
  only masks the chosen types
- Custom patterns card: add/edit/remove user-defined regex rules with
  label, regex, description, and a custom replacement string; live
  preview against sample text
- Labels list merges ML-model labels with custom-pattern labels so
  custom types appear in the entity-type grid

Backend:
- GET /api/pii/labels — returns all known labels (model + custom)
- GET/POST /api/pii/patterns — list and create custom patterns
- PUT/DELETE /api/pii/patterns/{id} — update or remove a pattern
- custom_patterns table in SQLite with enabled/replacement columns;
  ALTER TABLE migrations handle existing databases
- MaskText accepts enabled_labels to filter which types are masked
  (nil = all, used by the transparent proxy)
- Custom regex runner (regex_detector.go) appended after ML entities;
  entity-level Replacement field lets each pattern specify its own
  placeholder instead of falling back to [LABEL]
- deduplicateEntities removes overlapping spans before masking,
  preferring the longer match; same-length ties go to the custom
  regex entity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove Description field
- Shorten placeholders, move examples to title tooltips
- Add visible hint for Replacement default behavior
- Remove text-overflow ellipsis so saved regexes aren't trimmed

Development

Successfully merging this pull request may close these issues.

  • Allow user-defined custom regex patterns from the extension UI
  • Make detected entity types configurable in the extension
