chore: upgrade data-designer to 0.5.5 by andreatgretel · Pull Request #95 · NVIDIA-NeMo/Anonymizer

andreatgretel · 2026-04-07T18:00:44Z

Summary

- Bump data-designer from 0.5.0 to 0.5.5
- Work around a logging regression introduced in DataDesigner (DD) 0.5.5 (NVIDIA-NeMo/DataDesigner#388, "configure_logging() called before DataDesigner() is silently overwritten"): DD's _initialize_interface_runtime() now calls configure_logging() during DataDesigner.__init__(), which overwrites the data_designer logger level and the noisy-logger suppression that Anonymizer sets up. We re-apply our logger levels after constructing DataDesigner via reapply_log_levels().
- Clean up tests_e2e/test_e2e.py (see below)
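A minimal way to see the regression and the workaround, using only the standard `logging` module. This is a sketch, not Anonymizer's or DataDesigner's actual code: `FakeDataDesigner` stands in for a constructor that reconfigures logging as a side effect, the bodies of `configure_logging()` and `reapply_log_levels()` are assumed simplifications of the functions named above, and the noisy logger names `httpx` and `urllib3` are illustrative guesses.

```python
import logging
from typing import Dict, Optional

# Illustrative noisy third-party loggers to suppress (assumed names).
NOISY_LOGGERS = ("httpx", "urllib3")

# Remembered levels, so they can be re-applied after something overwrites them.
_active_config: Optional[Dict[str, int]] = None


def reapply_log_levels() -> None:
    """Re-apply the remembered levels; does nothing if none are remembered."""
    if _active_config is None:
        return
    for name, level in _active_config.items():
        logging.getLogger(name).setLevel(level)


def configure_logging(level: int = logging.WARNING) -> None:
    """Remember and apply the data_designer level plus noisy-logger suppression."""
    global _active_config
    _active_config = {"data_designer": level}
    _active_config.update({name: logging.WARNING for name in NOISY_LOGGERS})
    reapply_log_levels()


class FakeDataDesigner:
    """Stand-in for a constructor that reconfigures logging as a side effect."""

    def __init__(self) -> None:
        logging.getLogger("data_designer").setLevel(logging.INFO)
        for name in NOISY_LOGGERS:
            logging.getLogger(name).setLevel(logging.DEBUG)


configure_logging(logging.WARNING)
FakeDataDesigner()  # silently overwrites the levels set just above
print(logging.getLevelName(logging.getLogger("data_designer").level))  # INFO
reapply_log_levels()  # the workaround: restore our levels after construction
print(logging.getLevelName(logging.getLogger("data_designer").level))  # WARNING
```

Because the levels are remembered rather than recomputed, the workaround is a single call placed after the constructor that causes the override.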

Closes #87

E2E test changes

The original e2e test was a script that ran the full pipeline on all 25 records with debug logging. In practice it took 15+ minutes, produced very verbose output, and had no assertions, which made it hard to tell whether it was passing or stuck.

Changes:

- Fixed the broken data path (docs/notebook_source/data/ no longer exists; the data moved to docs/data/)
- Preview only (3 records) instead of a preview plus a full run on all 25 records. The preview exercises the same code paths (detection, validation, augmentation, rewrite, evaluation, repair); the only difference is DD's preview() vs. create(), which is a DD concern, not Anonymizer's.
- Default logging instead of debug: less noise, and progress is still visible via INFO-level stage markers
- Added assertions on the output (records returned > 0, expected rewrite columns present)
- Kept it as a script (not pytest) so logs stream live, which matters for a multi-minute LLM-backed test where you need to see progress
- Updated the Makefile test-e2e target to run the script directly
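The output assertions can live in a small helper that collects every failure instead of stopping at the first one, so a single multi-minute run reports everything that went wrong. This is a hypothetical sketch: `check_preview`, `main`, and the column name `rewritten_text` are placeholders, and in the real script the column names and record count would come from the preview result rather than being hard-coded.

```python
import sys
from typing import Iterable, List, Sequence

# Hypothetical placeholder; the real expected columns depend on Anonymizer's rewrite output.
EXPECTED_REWRITE_COLUMNS = ("rewritten_text",)


def check_preview(
    columns: Iterable[str],
    num_records: int,
    expected: Sequence[str] = EXPECTED_REWRITE_COLUMNS,
) -> List[str]:
    """Return human-readable failures; an empty list means every check passed."""
    failures: List[str] = []
    if num_records <= 0:
        failures.append(f"expected at least one record, got {num_records}")
    present = set(columns)
    missing = [col for col in expected if col not in present]
    if missing:
        failures.append(f"missing rewrite columns: {missing}")
    return failures


def main() -> int:
    # Hard-coded so the sketch runs on its own; the real script would read these
    # from the preview result (e.g. its column names and its length).
    columns = ["text", "rewritten_text"]
    num_records = 3
    failures = check_preview(columns, num_records)
    for failure in failures:
        print(f"FAIL: {failure}", file=sys.stderr)
    if not failures:
        print(f"OK: {num_records} records, all expected rewrite columns present")
    return 1 if failures else 0
```

Ending the script with `if __name__ == "__main__": sys.exit(main())` turns a failed check into a non-zero exit status, which is what lets the `test-e2e` Makefile target fail. Because it runs as a plain script, log output reaches the terminal as it is emitted rather than being captured.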

Test plan

- 516 unit tests pass
- E2E smoke test passes (3 records, 0 failures, ~6 min)
- Verified the logging fix: the data_designer logger stays at WARNING after Anonymizer() init

lipikaramaswamy · 2026-04-07T18:40:22Z

The data-designer 0.5.5 workaround makes sense, but I wanted to flag one thing based on the logging contract we established in the original logging PR. My understanding from that discussion was that Anonymizer() should try to avoid invasive logging side effects, and that configure_logging(enabled=False) was added specifically to give callers a clean opt-out when they already manage logging themselves.

Thinking about the global-state interaction here: configure_logging(enabled=False) sets _configured without clearing or updating _active_config, and Anonymizer.__init__() now unconditionally calls reapply_log_levels(). In a long-lived process, that means constructing Anonymizer() can still mutate the data_designer and noisy-logger levels based on stale config from an earlier call, even when the caller has opted out of Anonymizer-managed logging.

One possible workaround would be to scope reapply_log_levels() to the branch where Anonymizer creates the DataDesigner itself, instead of calling it unconditionally. That would avoid overriding logger state for callers who opt out of Anonymizer-managed logging or pass in their own preconfigured runtime. I’d also consider making reapply_log_levels() a no-op unless logging was explicitly configured with enabled=True, and adding a unit test for the configure_logging(enabled=False) path so we know stale global config can’t leak into later Anonymizer() construction.

lipikaramaswamy · 2026-04-07T20:50:41Z

Also,

Related Issues

Closes #87

- Bump data-designer dependency from 0.5.0 to 0.5.5
- Work around DD logging override (NVIDIA-NeMo/DataDesigner#388): reapply Anonymizer's logger levels after DataDesigner init
- Fix stale data path in e2e test (docs/notebook_source/data -> docs/data)
- Clean up e2e test: default logging, assertions, run as script

…ig on enabled=False
- configure_logging(enabled=False) now clears _active_config so reapply_log_levels() cannot act on stale prior config
- reapply_log_levels() only runs when Anonymizer creates its own DataDesigner, not when a pre-configured instance is passed in
- Add test for the enabled=False path

andreatgretel · 2026-04-07T23:19:21Z

@lipikaramaswamy Good catch. You're right that the notebook/long-lived process scenario makes this a real concern.

Addressed in the latest push:

- configure_logging(enabled=False) now clears _active_config, so reapply_log_levels() can't act on stale prior config
- reapply_log_levels() is now scoped to the branch where Anonymizer creates its own DataDesigner; it is skipped when a pre-configured instance is passed in
- Added test_enabled_false_clears_active_config covering the stale-config path
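Scoping the reapply call to the constructor branch that builds its own DataDesigner could look roughly like this. It is a simplified, hypothetical sketch: the `data_designer` keyword argument, `FakeDataDesigner`, and the pre-populated `_active_config` are assumptions for illustration, not Anonymizer's real signature or state.

```python
import logging
from typing import Dict, Optional

# Hypothetical remembered levels, as left by an earlier configure_logging(enabled=True).
_active_config: Optional[Dict[str, int]] = {"data_designer": logging.WARNING}


def reapply_log_levels() -> None:
    """Re-apply remembered levels; a no-op when nothing is remembered."""
    if _active_config is None:
        return
    for name, level in _active_config.items():
        logging.getLogger(name).setLevel(level)


class FakeDataDesigner:
    """Stand-in whose constructor resets the data_designer logger level."""

    def __init__(self) -> None:
        logging.getLogger("data_designer").setLevel(logging.INFO)


class Anonymizer:
    """Hypothetical, stripped-down constructor showing only the branching logic."""

    def __init__(self, data_designer: Optional[FakeDataDesigner] = None) -> None:
        if data_designer is None:
            # We created the DataDesigner, so undoing its logging side effects is on us.
            self._data_designer = FakeDataDesigner()
            reapply_log_levels()
        else:
            # Caller-supplied instance: their logging setup is left untouched.
            self._data_designer = data_designer
```

The underlying rule is that whoever constructs the DataDesigner owns its side effects. When Anonymizer builds it, Anonymizer undoes the logging override; when the caller passes one in, the caller's logger levels stay as they set them.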

lipikaramaswamy

lgtm, thanks!!

lipikaramaswamy · 2026-04-07T23:29:30Z

~~Actually sorry, hold on, e2e test ran with errors.~~

Never mind, I had set DATA_DESIGNER_ASYNC_ENGINE=1 🙃 All good without that envvar. I will go ahead and merge.

andreatgretel requested review from a team as code owners April 7, 2026 18:00

andreatgretel added 2 commits April 7, 2026 20:11

andreatgretel force-pushed the chore/upgrade-dd-0-5-5 branch from 74c238b to 2856bc7 April 7, 2026 23:18

lipikaramaswamy approved these changes Apr 7, 2026


lipikaramaswamy merged commit 5cc6bfc into main Apr 8, 2026
13 checks passed

lipikaramaswamy deleted the chore/upgrade-dd-0-5-5 branch April 8, 2026 01:21

andreatgretel mentioned this pull request Apr 8, 2026

bug: async engine drops side-effect column values and silently misresolves collisions (NVIDIA-NeMo/DataDesigner#508, open)

lipikaramaswamy merged 2 commits into main from chore/upgrade-dd-0-5-5
