
feat(anonymize): add NULL, NAME, PHONE_NUMBER, DATE_OF_BIRTH, TEXT, WORD types #38

Merged
Andarius merged 1 commit into master from feat/more-anonymize-types on May 12, 2026


@Andarius
Contributor

Summary

anonymize previously supported only EMAIL. This extends FieldType so config files can cover more PII columns without falling back to custom SQL post-steps:

  • NULL: blank the column. Useful for hashes / tokens / anything that shouldn't carry information in a shared dump.
  • FIRST_NAME, LAST_NAME, NAME: faker.first_name() / last_name() / name().
  • PHONE_NUMBER: faker.phone_number().
  • DATE_OF_BIRTH: faker.date_of_birth(), accepts minimum_age / maximum_age extra args.
  • TEXT, WORD: faker.text() / faker.word().

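All Faker-backed types forward their extra_args to the underlying Faker call, so the extra-args pattern already used for EMAIL's domain: also works for the new types wherever Faker exposes options (e.g. minimum_age / maximum_age for DATE_OF_BIRTH).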
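A minimal sketch of the dispatch described above, under stated assumptions: `FieldType` values are assumed to equal their names, `get_fake_value` is a hypothetical stand-in for the private `_get_fake_value`, and the Faker instance is passed in (duck-typed) so the sketch runs without Faker installed. It is not the project's actual implementation.

```python
from enum import Enum
from typing import Any, Dict, Optional


class FieldType(str, Enum):
    """Hypothetical mirror of the PR's FieldType (member names from the PR)."""

    EMAIL = "EMAIL"
    NULL = "NULL"
    FIRST_NAME = "FIRST_NAME"
    LAST_NAME = "LAST_NAME"
    NAME = "NAME"
    PHONE_NUMBER = "PHONE_NUMBER"
    DATE_OF_BIRTH = "DATE_OF_BIRTH"
    TEXT = "TEXT"
    WORD = "WORD"


# Faker method backing each type. NULL is deliberately absent: it needs no Faker call.
FAKER_METHODS: Dict[FieldType, str] = {
    FieldType.EMAIL: "email",
    FieldType.FIRST_NAME: "first_name",
    FieldType.LAST_NAME: "last_name",
    FieldType.NAME: "name",
    FieldType.PHONE_NUMBER: "phone_number",
    FieldType.DATE_OF_BIRTH: "date_of_birth",
    FieldType.TEXT: "text",
    FieldType.WORD: "word",
}


def get_fake_value(faker: Any, field_type: str,
                   extra_args: Optional[Dict[str, Any]] = None) -> Any:
    """Hypothetical stand-in for _get_fake_value: dispatch on type, forward extra_args."""
    try:
        ft = FieldType(field_type)
    except ValueError:
        raise ValueError(f"unimplemented field type: {field_type!r}") from None
    if ft is FieldType.NULL:
        return None  # blank the column
    method = getattr(faker, FAKER_METHODS[ft])
    # e.g. {"domain": "example.com"} becomes faker.email(domain="example.com")
    return method(**(extra_args or {}))
```

With Faker installed, `get_fake_value(Faker(), "DATE_OF_BIRTH", {"minimum_age": 18, "maximum_age": 90})` would return a `datetime.date`. Mapping each type to a method name keeps the extra_args forwarding in one place; NULL short-circuits because there is no Faker method to call.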

Heads-up

YAML's bare NULL parses as a real null, so users need to quote the type:

```yaml
fields:
  - column: password_hash
    type: "NULL"          # quoted, otherwise parsed as None
```

Documented in the README example.
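Beyond documenting it, a loader could turn the silent None into an actionable error. A minimal sketch, assuming the config has already been loaded into dicts (for example with PyYAML's `yaml.safe_load`, which yields `None` for a bare `NULL` and the string `"NULL"` when quoted); `parse_field_type` and `VALID_TYPES` are hypothetical names, not the project's API.

```python
from typing import Any, Mapping

# Hypothetical set of type names the loader accepts (the types listed in this PR).
VALID_TYPES = frozenset({
    "EMAIL", "NULL", "FIRST_NAME", "LAST_NAME", "NAME",
    "PHONE_NUMBER", "DATE_OF_BIRTH", "TEXT", "WORD",
})


def parse_field_type(field: Mapping[str, Any]) -> str:
    """Validate a field's type, with a hint for the unquoted-NULL pitfall."""
    if "type" in field and field["type"] is None:
        # An unquoted `type: NULL` reaches Python as None, not the string "NULL".
        raise ValueError(
            f"column {field.get('column')!r}: type parsed as null; "
            'quote it as type: "NULL"'
        )
    raw = field.get("type")
    if not isinstance(raw, str) or raw not in VALID_TYPES:
        raise ValueError(f"unimplemented field type: {raw!r}")
    return raw
```

Failing loudly at load time points the user at the quoting fix instead of surfacing later as a confusing "unimplemented field type: None".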

Tests

  • 7 new parametrized cases over _get_fake_value (one per supported type) verifying the output matches the type's shape.
  • An explicit pytest.raises(ValueError, match="unimplemented field type") case checks that an unknown type raises instead of slipping through the fallback.
  • All 8 unit tests pass locally (DB-dependent tests untouched).
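The per-type shape checks could look like the following. This is a sketch under assumptions: the predicates, the seven-digit phone heuristic and the completed-years age arithmetic for DATE_OF_BIRTH are illustrative, not the PR's actual assertions; the 0 and 115 defaults mirror Faker's `date_of_birth` defaults.

```python
import re
from datetime import date, datetime
from typing import Any, Callable, Dict, Optional

# Loose "local@domain.tld" shape; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _years_ago(today: date, years: int) -> date:
    """Same month/day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def dob_in_range(value: Any, minimum_age: int = 0, maximum_age: int = 115,
                 today: Optional[date] = None) -> bool:
    """True if value is a date whose completed-years age is in [minimum_age, maximum_age]."""
    if not isinstance(value, date) or isinstance(value, datetime):
        return False
    if today is None:
        today = date.today()
    # Age >= minimum_age  <=>  born on or before today minus minimum_age years.
    latest = _years_ago(today, minimum_age)
    # Age <= maximum_age  <=>  born strictly after today minus (maximum_age + 1) years.
    earliest_exclusive = _years_ago(today, maximum_age + 1)
    return earliest_exclusive < value <= latest


def _non_empty_str(v: Any) -> bool:
    return isinstance(v, str) and v.strip() != ""


# Per-type shape predicates (hypothetical; the PR's real assertions may differ).
SHAPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "NULL": lambda v: v is None,
    "EMAIL": lambda v: isinstance(v, str) and EMAIL_RE.match(v) is not None,
    "FIRST_NAME": _non_empty_str,
    "LAST_NAME": _non_empty_str,
    "NAME": _non_empty_str,
    # Heuristic: formats vary by locale, so only require a handful of digits.
    "PHONE_NUMBER": lambda v: isinstance(v, str) and sum(c.isdigit() for c in v) >= 7,
    "DATE_OF_BIRTH": dob_in_range,
    "TEXT": _non_empty_str,
    "WORD": lambda v: _non_empty_str(v) and not any(c.isspace() for c in v),
}
```

Each parametrized case would then reduce to something like `assert SHAPE_CHECKS[type_name](value)` for the value `_get_fake_value` returns; DATE_OF_BIRTH cases that pass minimum_age / maximum_age would call `dob_in_range` with the same bounds.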

Test plan

  • CI green
  • Manual smoke against a dev DB with a config exercising each new type


`anonymize` previously supported only `EMAIL`. To cover more PII columns
without falling back to custom SQL post-steps, extend `FieldType` with:

- `NULL` — blank the column. Useful for hashes / tokens / anything that
  shouldn't carry information in a shared dump. Note YAML's bare `NULL`
  parses as null, so quote the type: `type: "NULL"`.
- `FIRST_NAME`, `LAST_NAME`, `NAME` — `faker.first_name()` / `last_name()` / `name()`.
- `PHONE_NUMBER` — `faker.phone_number()`.
- `DATE_OF_BIRTH` — `faker.date_of_birth()`, accepts `minimum_age` / `maximum_age`.
- `TEXT`, `WORD` — `faker.text()` / `faker.word()`.

All types forward their extra_args to the underlying Faker call, so the
existing `EMAIL` `domain:` pattern works for the new types where Faker
exposes options.

Tests: 7 new parametrized cases over `_get_fake_value` plus an explicit
check that an unknown type raises `ValueError`.

README: new "3. Anonymization" section with the type table + an example
config; subsequent sections renumbered.
Andarius self-assigned this on May 12, 2026.
Andarius merged commit 12a3bae into master on May 12, 2026.
1 check passed