chore: upgrade data-designer to 0.5.5 #95

Merged
lipikaramaswamy merged 2 commits into main from chore/upgrade-dd-0-5-5 on Apr 8, 2026

@andreatgretel
Contributor

@andreatgretel andreatgretel commented Apr 7, 2026

Summary

  • Bump data-designer from 0.5.0 to 0.5.5
  • Work around a logging regression introduced in DD 0.5.5 (DataDesigner#388, "configure_logging() called before DataDesigner() is silently overwritten"): DD's _initialize_interface_runtime() now calls configure_logging() during DataDesigner.__init__(), which overwrites the data_designer logger level and the noisy-logger suppression that Anonymizer sets up. We call reapply_log_levels() after constructing DataDesigner to restore our logger levels.
  • Clean up tests_e2e/test_e2e.py (see below)
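
The workaround in the second bullet can be sketched with the standard `logging` module alone. This is a minimal illustration, not Anonymizer's or DataDesigner's actual code: `FakeDataDesigner`, `_DESIRED_LEVELS`, and the body of `reapply_log_levels()` are hypothetical stand-ins. Only the function name `reapply_log_levels` and the `data_designer` logger name come from this PR.

```python
import logging

# Hypothetical: the levels the caller wants to hold after setup.
_DESIRED_LEVELS = {"data_designer": logging.WARNING}


def reapply_log_levels() -> None:
    """Re-assert the desired levels on the named loggers."""
    for name, level in _DESIRED_LEVELS.items():
        logging.getLogger(name).setLevel(level)


class FakeDataDesigner:
    """Stand-in for a library whose constructor reconfigures logging."""

    def __init__(self) -> None:
        # Mimics the regression: construction resets the logger level.
        logging.getLogger("data_designer").setLevel(logging.INFO)


reapply_log_levels()            # caller's initial setup
designer = FakeDataDesigner()   # constructor overwrites it
level_after_init = logging.getLogger("data_designer").level
reapply_log_levels()            # workaround: re-apply after construction
level_after_fix = logging.getLogger("data_designer").level
```

In this sketch the level is `INFO` right after construction and back at `WARNING` once the levels are re-applied, which is the ordering the workaround relies on.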

Closes #87

E2E test changes

The original e2e test was a script that ran the full pipeline on all 25 records with debug logging. In practice this took 15+ minutes, produced very verbose output, and had no assertions - making it hard to tell whether things were passing or stuck.

Changes:

  • Fixed broken data path (docs/notebook_source/data/ no longer exists, data moved to docs/data/)
  • Preview only (3 records) instead of preview + full run on all 25 records. The preview exercises the same code paths (detection, validation, augmentation, rewrite, evaluation, repair) - the only difference is DD's preview() vs create(), which is a DD concern, not Anonymizer's.
  • Default logging instead of debug - less noise, progress is still visible via INFO-level stage markers
  • Added assertions on output (records returned > 0, expected rewrite columns present)
  • Kept it as a script (not pytest) so logs stream live - important for a multi-minute LLM-backed test where you need to see progress
  • Updated Makefile test-e2e target to run the script directly
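
The shape described in the list above (a plain script, default INFO logging, a small preview, assertions on the output) might look roughly like the following. It is a sketch under stated assumptions, not the contents of tests_e2e/test_e2e.py: `run_preview` is a hypothetical stub standing in for the LLM-backed preview call, and the column names in `EXPECTED_COLUMNS` are invented for illustration.

```python
import logging
import sys

# INFO-level logging to stdout so progress streams live when run as a script.
logging.basicConfig(level=logging.INFO, stream=sys.stdout,
                    format="%(levelname)s %(message)s")
log = logging.getLogger("e2e")

# Hypothetical column names; the real rewrite columns are not listed in this PR.
EXPECTED_COLUMNS = {"text", "rewritten_text"}


def run_preview(num_records: int) -> list[dict[str, str]]:
    """Hypothetical stub for the real preview call."""
    log.info("stage: rewrite (%d records)", num_records)
    return [{"text": f"record {i}", "rewritten_text": f"rewritten {i}"}
            for i in range(num_records)]


def main() -> int:
    records = run_preview(num_records=3)
    # Fail loudly instead of leaving the reader to infer success from logs.
    assert len(records) > 0, "preview returned no records"
    missing = EXPECTED_COLUMNS - set(records[0])
    assert not missing, f"missing rewrite columns: {sorted(missing)}"
    log.info("e2e smoke test passed (%d records)", len(records))
    return 0


if __name__ == "__main__":
    main()
```

Running the file directly with the Python interpreter, rather than through a test runner that captures output, is what keeps the stage markers visible while the test is in progress.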

Test plan

  • 516 unit tests pass
  • E2E smoke test passes (3 records, 0 failures, ~6 min)
  • Verified logging fix: data_designer logger stays at WARNING after Anonymizer() init

@andreatgretel andreatgretel requested review from a team as code owners April 7, 2026 18:00
@lipikaramaswamy
Collaborator

The data-designer 0.5.5 workaround makes sense, but I wanted to flag one thing based on the logging contract we established in the original logging PR. My understanding from that discussion was that Anonymizer() should try to avoid invasive logging side effects, and that configure_logging(enabled=False) was added specifically to give callers a clean opt-out when they already manage logging themselves.

So thinking about the global-state interaction here -- configure_logging(enabled=False) sets _configured without clearing or updating _active_config, and Anonymizer.__init__() now unconditionally calls reapply_log_levels(). In a long-lived process, that means constructing Anonymizer() can still mutate data_designer / noisy logger levels based on stale prior config even when the caller opted out of Anonymizer-managed logging.

One possible workaround would be to scope reapply_log_levels() to the branch where Anonymizer creates the DataDesigner itself, instead of calling it unconditionally. That would avoid overriding logger state for callers who opt out of Anonymizer-managed logging or pass in their own preconfigured runtime. I’d also consider making reapply_log_levels() a no-op unless logging was explicitly configured with enabled=True, and adding a unit test for the configure_logging(enabled=False) path so we know stale global config can’t leak into later Anonymizer() construction.

@lipikaramaswamy
Collaborator

lipikaramaswamy commented Apr 7, 2026

Also,

Related Issues

Closes #87

- Bump data-designer dependency from 0.5.0 to 0.5.5
- Work around DD logging override (NVIDIA-NeMo/DataDesigner#388):
  reapply Anonymizer's logger levels after DataDesigner init
- Fix stale data path in e2e test (docs/notebook_source/data -> docs/data)
- Clean up e2e test: default logging, assertions, run as script
…ig on enabled=False

- configure_logging(enabled=False) now clears _active_config so
  reapply_log_levels() cannot act on stale prior config
- reapply_log_levels() only runs when Anonymizer creates its own
  DataDesigner, not when a pre-configured instance is passed in
- Add test for the enabled=False path
@andreatgretel andreatgretel force-pushed the chore/upgrade-dd-0-5-5 branch from 74c238b to 2856bc7 Compare April 7, 2026 23:18
@andreatgretel
Contributor Author

@lipikaramaswamy Good catch — you're right that the notebook/long-lived process scenario makes this a real concern.

Addressed in the latest push:

  1. configure_logging(enabled=False) now clears _active_config, so reapply_log_levels() can't act on stale prior config
  2. reapply_log_levels() is now scoped to the branch where Anonymizer creates its own DataDesigner — skipped when a pre-configured instance is passed in
  3. Added test_enabled_false_clears_active_config covering the stale-config path
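
The three changes above could be sketched as follows. This is an illustrative approximation, not the merged code: `FakeDataDesigner` and `FakeAnonymizer` are hypothetical stand-ins, the function bodies are assumptions, and the test body is a guess at what a test named `test_enabled_false_clears_active_config` might check.

```python
import logging

_active_config = None  # hypothetical module-level state


def configure_logging(enabled: bool = True) -> None:
    global _active_config
    # Opting out clears any earlier config instead of leaving it behind.
    _active_config = {"data_designer": logging.WARNING} if enabled else None


def reapply_log_levels() -> None:
    # No-op when nothing is actively configured.
    for name, level in (_active_config or {}).items():
        logging.getLogger(name).setLevel(level)


class FakeDataDesigner:
    """Stand-in for a runtime whose constructor resets logger levels."""

    def __init__(self) -> None:
        logging.getLogger("data_designer").setLevel(logging.INFO)


class FakeAnonymizer:
    def __init__(self, data_designer=None) -> None:
        if data_designer is None:
            self.data_designer = FakeDataDesigner()
            reapply_log_levels()  # only when we built the runtime ourselves
        else:
            self.data_designer = data_designer  # caller's runtime: leave logging alone


def test_enabled_false_clears_active_config() -> None:
    configure_logging(enabled=True)
    configure_logging(enabled=False)
    assert _active_config is None
    logger = logging.getLogger("data_designer")
    logger.setLevel(logging.DEBUG)
    reapply_log_levels()
    assert logger.level == logging.DEBUG  # stale config did not leak
```

Scoping the re-apply call to the branch that constructs the runtime means a caller who passes in a preconfigured instance keeps whatever logger levels they set.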

Collaborator

@lipikaramaswamy lipikaramaswamy left a comment

lgtm, thanks!!

@lipikaramaswamy
Collaborator

lipikaramaswamy commented Apr 7, 2026

Actually sorry, hold on, e2e test ran with errors.

Never mind, I had set DATA_DESIGNER_ASYNC_ENGINE=1 🙃 All good without that envvar. I will go ahead and merge.

@lipikaramaswamy lipikaramaswamy merged commit 5cc6bfc into main Apr 8, 2026
13 checks passed
@lipikaramaswamy lipikaramaswamy deleted the chore/upgrade-dd-0-5-5 branch April 8, 2026 01:21

Development

Successfully merging this pull request may close these issues.

feat: upgrade data-designer to 0.5.4 release
