test(datasets): regression coverage for #16141 (export with same table name, different schemas) #40123

Merged
rusackas merged 2 commits into master from test/16141-dataset-export-same-name-schema on May 14, 2026

Conversation

@rusackas
Member

SUMMARY

TDD-first regression coverage for #16141 (Aug 2021): exporting two datasets with the same table_name but different schemas (e.g. prod.users + dev.users) historically returned only one file because the export filename did not disambiguate the pair.

The current ExportDatasetsCommand already produces <table_name>_<id>.yaml, so the original collision should no longer occur — but no test pinned the behavior, leaving the bug technically open and at risk of silent regression.
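
As a rough illustration of the failure mode, the sketch below models an export bundle as a dict keyed by file name. The dict-based bundle and the `legacy_name` / `current_name` helpers are assumptions made for illustration and are not Superset code; only the `<table_name>_<id>.yaml` pattern comes from the description above, and the pre-fix naming is a guess at what "did not disambiguate the pair" looked like.

```python
# Hypothetical model of an export bundle: file name -> payload.
# None of these helpers exist in Superset; they only mimic the naming schemes.


def legacy_name(table_name: str) -> str:
    """Assumed pre-fix naming: table name only, so schemas are indistinguishable."""
    return f"{table_name}.yaml"


def current_name(table_name: str, dataset_id: int) -> str:
    """Naming pattern quoted in this PR: <table_name>_<id>.yaml."""
    return f"{table_name}_{dataset_id}.yaml"


datasets = [
    {"id": 1, "table_name": "users", "schema": "prod"},
    {"id": 2, "table_name": "users", "schema": "dev"},
]

# Keying by the legacy name lets the second dataset overwrite the first.
legacy_bundle = {legacy_name(d["table_name"]): d["schema"] for d in datasets}

# Keying by name plus id keeps one entry per dataset.
current_bundle = {
    current_name(d["table_name"], d["id"]): d["schema"] for d in datasets
}

print(len(legacy_bundle))   # number of files under the assumed legacy naming
print(len(current_bundle))  # number of files under the id-suffixed naming
```

Under these assumptions the legacy bundle ends up with a single entry and the id-suffixed bundle with two, which matches the symptom reported in #16141.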

This PR is a small experiment in a test-PR-first workflow: open a test, failing or passing, that encodes the bug's contract before making any code changes. If CI passes, the bug is provably fixed and #16141 can close. If CI fails, the test documents exactly what still needs work for whoever picks up the fix.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — test-only.

TESTING INSTRUCTIONS

pytest tests/unit_tests/datasets/commands/export_test.py::test_export_two_datasets_same_table_name_different_schema -xvs

The test creates two SqlaTable rows with identical table_name="users" and distinct schemas (prod, dev), runs ExportDatasetsCommand._export on each, and asserts:

  1. The two emitted file paths are distinct (no filename collision).
  2. Both emitted YAML payloads carry the correct schema: field — neither is silently overwritten.
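
The two assertions above can be sketched in a standalone form. This is not the test added by the PR: `FakeDataset` and `fake_export` are hypothetical stand-ins for `SqlaTable` and `ExportDatasetsCommand._export`, and the payload is a simplified YAML-like string, so the block only shows the shape of the contract being pinned.

```python
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Hypothetical stand-in for a SqlaTable row; not a Superset class."""

    id: int
    table_name: str
    schema: str


def fake_export(dataset: FakeDataset) -> tuple[str, str]:
    """Hypothetical exporter returning (file name, YAML-like payload)."""
    path = f"{dataset.table_name}_{dataset.id}.yaml"
    content = f"table_name: {dataset.table_name}\nschema: {dataset.schema}\n"
    return path, content


def check_same_name_different_schema() -> None:
    prod = FakeDataset(id=1, table_name="users", schema="prod")
    dev = FakeDataset(id=2, table_name="users", schema="dev")

    exported = [fake_export(prod), fake_export(dev)]
    paths = [path for path, _ in exported]

    # Assertion 1: the two file names are distinct (no collision).
    assert len(set(paths)) == 2

    # Assertion 2: each payload still carries its own schema.
    for dataset, (_, content) in zip([prod, dev], exported):
        assert f"schema: {dataset.schema}" in content.splitlines()


check_same_name_different_schema()
```

If the exporter dropped the id from the file name, the first assertion would fail, which is the regression the real test is meant to catch.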

ADDITIONAL INFORMATION

…e name, different schemas)

Issue #16141 (Aug 2021) reported that the dataset export API returned
only one file when exporting two datasets that shared a table name but
lived in different schemas (e.g. prod.users + dev.users). The current
ExportDatasetsCommand disambiguates the export filename with the
dataset id (`<table_name>_<id>.yaml`), so the collision should no
longer occur — but no test pinned the behavior.

This is a TDD-first PR: the test asserts the *fixed* behavior. If CI is
green, the bug is provably gone and #16141 can be closed; if CI is red,
the test documents exactly what still needs to be fixed for whoever
picks up the work.

Closes #16141

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dosubot (bot) added the data:dataset label (Related to dataset configurations) on May 14, 2026
@bito-code-review
Contributor

bito-code-review Bot commented May 14, 2026

Code Review Agent Run #3d5e37

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 233eb49..233eb49
    • tests/unit_tests/datasets/commands/export_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

@codecov

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.16%. Comparing base (62dc237) to head (3587a72).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40123      +/-   ##
==========================================
+ Coverage   64.15%   64.16%   +0.01%     
==========================================
  Files        2590     2590              
  Lines      138104   138087      -17     
  Branches    32039    32039              
==========================================
+ Hits        88599    88608       +9     
+ Misses      47984    47954      -30     
- Partials     1521     1525       +4     
Flag       Coverage Δ
hive       39.47% <ø> (+0.03%) ⬆️
mysql      59.17% <ø> (+0.04%) ⬆️
postgres   59.25% <ø> (+0.04%) ⬆️
presto     41.16% <ø> (+0.03%) ⬆️
python     60.69% <ø> (+0.04%) ⬆️
sqlite     58.89% <ø> (+0.04%) ⬆️
unit       100.00% <ø> (ø)

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a regression test for issue #16141, which previously caused dataset exports to collide when two datasets shared a table_name but had different schemas. The current ExportDatasetsCommand filenames already include the dataset id (<table_name>_<id>.yaml), so the test should now pass and pin the fix.

Changes:

  • Adds a single unit test that creates two SqlaTable rows with identical table_name but distinct schemas and verifies the exports produce distinct paths and preserve each schema in the YAML payload.

Comment thread tests/unit_tests/datasets/commands/export_test.py Outdated
Member

@sadpandajoe sadpandajoe left a comment

Left one question but not blocking so approving.

@bito-code-review
Contributor

Yes, using yaml.safe_load would be a more robust alternative to string splitting for parsing the YAML content, as it properly handles the structure and avoids potential errors from manual splitting. However, since the current approach works for this test's specific needs and isn't causing issues, it's not urgent to change.

tests/unit_tests/datasets/commands/export_test.py

import yaml

# Alternative parsing
schemas_in_yaml = {yaml.safe_load(c)['schema'] for c in contents}

…edback)

@sadpandajoe and the bito reviewer both flagged that the schema
extraction was hand-splitting the YAML string. yaml.safe_load is
both clearer and more robust to formatting changes (key order,
spacing, multi-line values).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
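
To make the reviewers' point concrete, the sketch below contrasts the two extraction styles. The original hand-split expression is not shown in this thread, so `schema_by_splitting` is a hypothetical reconstruction, and the sample payloads are invented for illustration. The block assumes PyYAML is installed.

```python
import yaml  # PyYAML; assumed to be available


def schema_by_splitting(content: str) -> str:
    """Hypothetical hand-split extraction: take the text after 'schema: '."""
    return content.split("schema: ")[1].split("\n")[0]


def schema_by_parsing(content: str) -> str:
    """Parse the document and read the key, as suggested in the review."""
    return yaml.safe_load(content)["schema"]


# Two payloads that mean the same thing to a YAML parser.
plain = "table_name: users\nschema: prod\n"
quoted = 'table_name: users\nschema: "prod"\n'

for label, content in [("plain", plain), ("quoted", quoted)]:
    print(label, repr(schema_by_splitting(content)), repr(schema_by_parsing(content)))
```

With the quoted payload, the split-based helper returns the value with its surrounding quote characters, while `yaml.safe_load` returns the same string for both payloads. That difference is the kind of formatting sensitivity the follow-up commit avoids.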
@rusackas rusackas merged commit 4e09889 into master May 14, 2026
65 checks passed
@rusackas rusackas deleted the test/16141-dataset-export-same-name-schema branch May 14, 2026 18:08
@bito-code-review
Contributor

Bito Automatic Review Skipped – PR Already Merged

Bito scheduled an automatic review for this pull request, but the review was skipped because this PR was merged before the review could be run.
No action is needed if you didn't intend to review it. To get a review, type /review in a comment and save it.

sha174n pushed a commit to sha174n/superset that referenced this pull request May 15, 2026
…e table name, different schemas) (apache#40123)

Co-authored-by: Superset Dev <dev@superset.apache.org>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Labels

data:dataset (Related to dataset configurations), size/M

Development

Successfully merging this pull request may close these issues.

API: Export: Exporting 2 physical datasets with the same table name but different schema returns only 1 dataset

3 participants