feat(versioning): capture and expose version history for charts, dashboards, and datasets #39603

Draft
mikebridge wants to merge 44 commits into apache:master from mikebridge:sc-103156-versioning

@mikebridge mikebridge commented Apr 23, 2026

SUMMARY

Adds backend plumbing to capture a version history for every save of a chart, dashboard, or dataset, and to expose that history via three new REST endpoints per entity (list, get, restore). No frontend in this PR. Second PR in the Versioning epic (sc-103156), depending on #39859 (composite-PK reshape on M2M association tables — sc-105349) and orthogonal to #39286 (sc-103157 soft-delete). See SIP-210 / issue #39492 for full design rationale.

🚧🚧🚧 This is still a draft/spike, not ready for final review. The branch contains two temp(*) commits (demo UI dropdowns + French i18n; URL-param stripping on restore navigation) that will be reverted before merge. 🚧🚧🚧

What changed:

  • Continuum wiring. Adds sqlalchemy-continuum as a base dependency. Wired in superset/extensions/__init__.py with the validity strategy; a custom VersionTransactionFactory renames the transaction table to version_transaction (the word transaction is reserved in several dialects) and a VersioningFlaskPlugin supplies the acting user via get_user_id() (not Flask-Login's current_user) so CLI / Celery / JWT-auth API saves all attribute correctly.

  • Six shadow tables, all Continuum-native:

    • parent shadows: dashboards_version, slices_version, tables_version
    • child shadows: table_columns_version, sql_metrics_version
    • M2M shadow: dashboard_slices_version

    Plus one version_transaction table (the per-flush "who/when/where" envelope) and one version_changes table (structured diff records, FK to version_transaction with ON DELETE CASCADE).

  • Three endpoints per entity type (/chart, /dashboard, /dataset):

    • GET /api/v1/<resource>/<uuid>/versions/ — list history (version_number, version_uuid, issued_at, changed_by)
    • GET /api/v1/<resource>/<uuid>/versions/<version_uuid>/ — one snapshot (scalar fields; plus columns / metrics for datasets, slices for dashboards)
    • POST /api/v1/<resource>/<uuid>/versions/<version_uuid>/restore — restore entity to that version
  • <version_uuid> is a deterministic UUIDv5 (fixed namespace, derived from the entity UUID + Continuum transaction id). Stable across replicas and retention pruning — the same transaction always produces the same version uuid, so API consumers can cache references safely.

  • ETag headers (ETag: W/"<version_uuid>") on all three GET endpoints + the live entity GET. Foundation for optimistic-locking enforcement on writes (Phase 2); not enforced in this PR.

  • Restore uses Continuum's native Reverter wrapped in a single_flush_scope context manager (suppresses autoflush inside the block, emits one trailing flush). The single-revert / single-flush shape was the spike outcome — earlier attempts at split-revert and JSON-snapshot tables were abandoned (see spike-continuum-restore.md and the revised ADR-004 in the spec folder).

  • Baseline capture. First save under versioning of an entity that pre-existed the migration inserts a synthetic operation_type=0 row capturing the pre-edit state, attributed to the entity's existing changed_on / changed_by_fk. Listener runs before Continuum's own before_flush so the baseline transaction_id is lower than the edit's (correct ordering).

  • No-op suppression. A SkipUnmodifiedPlugin marks Continuum Operations processed=True when post-flush column values are content-equal to the previous shadow row — including JSON-aware comparison for Dashboard.json_metadata that strips frontend-stamped audit sub-keys (map_label_colors, chart_configuration, …) so saves that only re-stamp those don't pollute history.

  • Force-parent-dirty on child changes. A before_flush listener flags the versioned parent (SqlaTable) as dirty when only its versioned children (TableColumn / SqlMetric) changed, so child-only edits surface in the parent's version dropdown.

  • Structured change records. Every save writes per-field diff records to version_changes keyed to the same transaction_id. Records carry kind / path / from_value / to_value — backbone for the Phase-2 UI's "Added column X" rendering, captured in V1 so the data is available from day one without a backfill.

  • Retention is time-based, run by a Celery beat task. SUPERSET_VERSION_HISTORY_RETENTION_DAYS (default 90; 0 or None disables versioning entirely). Deletes shadow rows older than the cutoff while preserving the live row regardless of age. ON DELETE CASCADE on version_changes.transaction_id keeps diffs in sync. No write-path overhead; the prune is asynchronous.

  • Composite PK reshape on M2M associations (sc-105349, PR #39859), required for Continuum's M2M tracker to populate dashboard_slices_version correctly. The PRs are intended to merge in order: #39859 first, then this one. The migration is included on this branch because the rebased history depends on it.

  • Authorisation. Version endpoints reuse the resource's existing can_write permission. No new FAB permissions. Row-level access enforced via security_manager.raise_for_ownership(entity) in the restore command.
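The deterministic <version_uuid> described above can be sketched with the standard library. This is a minimal illustration, not the PR's code: the namespace value, the `entity_uuid:transaction_id` name format, and the function name `derive_version_uuid` are all assumptions, since the description only says the UUIDv5 uses a fixed namespace and is derived from the entity UUID plus the Continuum transaction id.

```python
import uuid

# Hypothetical fixed namespace. The PR does not state the real value; any
# constant UUID works as long as every replica uses the same one.
VERSION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "versioning.example.invalid")


def derive_version_uuid(entity_uuid: uuid.UUID, transaction_id: int) -> uuid.UUID:
    """Derive a stable version UUID from an entity UUID and a transaction id.

    The name format "<entity_uuid>:<transaction_id>" is an assumption made
    for this sketch.
    """
    name = f"{entity_uuid}:{transaction_id}"
    return uuid.uuid5(VERSION_NAMESPACE, name)


if __name__ == "__main__":
    entity = uuid.UUID("12345678-1234-5678-1234-567812345678")
    # Same inputs always give the same UUID, so clients can cache references.
    print(derive_version_uuid(entity, 42) == derive_version_uuid(entity, 42))
```

Because UUIDv5 is a pure function of namespace and name, no lookup table is needed to map a transaction to its public identifier, which is what makes the value stable across replicas.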

What is NOT versioned in v1 (see specs/sc-103156-entity-versioning/future-work.md):

  • Tags, owners, roles — explicitly excluded from the versioned column set; restore leaves these at their live values.
  • Per-chart position inside a dashboard: position_json is versioned as an opaque blob (restored wholesale on dashboard restore); finer-grained layout versioning is Phase 2.
  • Auto-generated human-readable change summary text, change-type icons, search over diff content, AI attribution — all captured as Phase-2 frontend work.

Coordination with #39286 (sc-103157 soft-delete) — orthogonal in design; merge order can go either way. When sc-103157 merges, one small change hooks deleted_at into find_active_by_uuid() and the versioned models' Continuum exclude lists. Tracked as T043 in the spec.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A (backend-only). No UI is wired to the new endpoints yet. The temp(*) commits add non-final demo dropdowns and i18n strings for manual testing only; they will be reverted before merge.

TESTING INSTRUCTIONS

  1. Save a chart a few times to generate history:

     CHART_UUID=$(curl -s http://localhost:8088/api/v1/chart/?q='(page_size:1)' | jq -r '.result[0].uuid')
     for i in 1 2 3; do
       curl -s -X PUT http://localhost:8088/api/v1/chart/$CHART_UUID -d "{\"slice_name\":\"Test v$i\"}" -H 'Content-Type: application/json'
     done

  2. List the history:

     curl http://localhost:8088/api/v1/chart/$CHART_UUID/versions/ | jq

     Expect an ordered array with version_number, version_uuid, issued_at, changed_by. Response will include an ETag: W/"<version_uuid>" header for the most recent version.

  3. Fetch one version:

     VERSION_UUID=$(curl -s http://localhost:8088/api/v1/chart/$CHART_UUID/versions/ | jq -r '.result[0].version_uuid')
     curl -i http://localhost:8088/api/v1/chart/$CHART_UUID/versions/$VERSION_UUID/ | head -20

     Expect 200 + ETag header. A second request with If-None-Match: "<that-etag>" returns 304.

  4. Restore:

     curl -X POST http://localhost:8088/api/v1/chart/$CHART_UUID/versions/$VERSION_UUID/restore

     Expect 200. GET /api/v1/chart/$CHART_UUID should now reflect the restored state, and a new version row (the restore itself) appears in the version list.

  5. Inspect structured diffs:

     curl http://localhost:8088/api/v1/chart/$CHART_UUID/versions/$VERSION_UUID/ | jq '.result.changes'

     Each change carries {kind, path, from_value, to_value}.

  6. Repeat for dashboards and datasets using /api/v1/dashboard/ and /api/v1/dataset/. Datasets exercise child shadows (columns / metrics); dashboards exercise the M2M shadow (slices).

  7. Run the test suite:

     pytest tests/integration_tests/charts/version_history_tests.py \
            tests/integration_tests/dashboards/version_history_tests.py \
            tests/integration_tests/datasets/version_history_tests.py \
            tests/integration_tests/versioning/ -v

  8. Run the performance validation harness (skipped in CI; run on demand):

     SUPERSET_PERF_VALIDATION=1 pytest tests/integration_tests/versioning/perf_validation_tests.py -v -s

Asserts the three Success Criteria: list < 1 s, restore < 3 s, save p95 overhead < 50 ms.

ADDITIONAL INFORMATION

Migration list (in dependency order):

| Migration | Operation | Reversible? |
|---|---|---|
| 2bee73611e32 | Composite PK reshape on dashboard_slices + 7 other association tables (sc-105349 / #39859) | Yes |
| 56cd24c07170 | Create version_transaction + parent shadow tables (dashboards_version, slices_version, tables_version) | Yes |
| e1f3c5a7b9d0 | Create version_changes (structured diff records, ON DELETE CASCADE FK to version_transaction) | Yes |
| f7a2b3c4d5e6 | Create child shadow tables (table_columns_version, sql_metrics_version) + M2M shadow (dashboard_slices_version) | Yes |

All migrations are additive on the pre-existing slices / dashboards / tables / child tables — no existing columns altered. The composite-PK migration (2bee73611e32) reshapes the M2M association tables; round-trip tested on PostgreSQL, MySQL, and SQLite, including the MySQL FK / AUTO_INCREMENT quirks that required raw SQL workarounds (commits 56c36fde54, 65a3491861).

Write cost per save:

| Table | Rows added |
|---|---|
| version_transaction | 1 |
| *_version parent shadow | 0 if SkipUnmodifiedPlugin filtered the save; 1 if scalars changed |
| Child/M2M shadows | one per changed child/M2M entry |
| version_changes | one per atomic field-level change (zero for skipped saves) |

No write-path retention overhead — pruning is asynchronous via Celery beat.
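To make "one per atomic field-level change" concrete, here is a rough sketch of how per-field records with the {kind, path, from_value, to_value} shape could be produced from two flat snapshots. The function name `diff_fields` and the kind values ("added", "removed", "changed") are hypothetical; the PR does not list the actual kind vocabulary, and the real diff engine also handles nested paths and children.

```python
from typing import Any


def diff_fields(before: dict[str, Any], after: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one change record per field that differs between two snapshots.

    Only flat, top-level fields are compared in this sketch.
    """
    records: list[dict[str, Any]] = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if key not in before:
            kind = "added"      # hypothetical kind value
        elif key not in after:
            kind = "removed"    # hypothetical kind value
        elif old != new:
            kind = "changed"    # hypothetical kind value
        else:
            continue            # unchanged fields produce no record
        records.append(
            {"kind": kind, "path": key, "from_value": old, "to_value": new}
        )
    return records


if __name__ == "__main__":
    v1 = {"slice_name": "Test v1", "viz_type": "table"}
    v2 = {"slice_name": "Test v2", "viz_type": "table"}
    for record in diff_fields(v1, v2):
        print(record)
```

A save that changes nothing yields an empty list, which matches the "zero for skipped saves" row in the table above.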

Performance:

The numbers below were captured before the ADR-004 reversal (JSON-snapshot → full Continuum). Architecture has changed since — child writes now go through Continuum shadows instead of dataset_snapshots / dashboard_snapshots JSON tables. Re-validation against the final architecture pending before review. Targets unchanged:

| Criterion | Target |
|---|---|
| SC-002 list endpoint | < 1000 ms |
| SC-003 restore endpoint | < 3000 ms |
| SC-004 save p95 overhead | < 50 ms |

Harness: SUPERSET_PERF_VALIDATION=1 pytest tests/integration_tests/versioning/perf_validation_tests.py -v -s.
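For readers unfamiliar with the SC-004 metric, the following is one plausible way to compute a "save p95 overhead" from two sets of latency samples. It is a sketch under assumptions: the harness's actual definition is not shown in this PR, and treating overhead as the difference between the two p95 values is only one reasonable interpretation. The function names are hypothetical.

```python
import statistics
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile of the samples.

    statistics.quantiles(n=100) returns 99 cut points; index 94 is the
    95th percentile.
    """
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def save_p95_overhead_ms(
    baseline_ms: Sequence[float], versioned_ms: Sequence[float]
) -> float:
    """Assumed definition: p95 with versioning minus p95 without it."""
    return p95(versioned_ms) - p95(baseline_ms)


if __name__ == "__main__":
    baseline = [float(i) for i in range(1, 101)]   # synthetic latencies
    versioned = [value + 10.0 for value in baseline]
    overhead = save_p95_overhead_ms(baseline, versioned)
    print(f"p95 overhead: {overhead:.2f} ms, within target: {overhead < 50}")
```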

  • Has associated issue: sc-103156 / SIP-210 #39492
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions Bot added the risk:db-migration (PRs that require a DB migration) and api (Related to the REST API) labels Apr 23, 2026
mikebridge pushed a commit to mikebridge/superset that referenced this pull request Apr 23, 2026
Phase 1 of versioning added ``sqlalchemy-continuum==1.6.0`` to
``requirements/base.txt`` directly, but the pin was missing from
``pyproject.toml``'s ``[project.dependencies]``. CI's
``check-python-deps`` job regenerates the pinned files from the
``.in`` sources via ``scripts/uv-pip-compile.sh``; without the
pyproject declaration, regeneration strips the pin out, causing:

  ModuleNotFoundError: No module named 'sqlalchemy_continuum'

…on every Python-based job (test-sqlite, test-postgres, test-mysql,
unit-tests, test-postgres-hive, test-postgres-presto,
test-load-examples, docker-build) because ``superset/extensions/
__init__.py`` unconditionally imports from it at module load time.

Adds ``"sqlalchemy-continuum>=1.6.0, <2.0.0"`` to pyproject and
re-runs ``uv-pip-compile.sh`` to sync ``base.txt`` and
``development.txt``. One package regenerates in place; the only
other diffs are uv-resolver comment-graph updates (numpy's ``# via``
list) which CI's filter ignores.

Fixes CI failures on PR apache#39603.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikebridge mikebridge force-pushed the sc-103156-versioning branch from 8774778 to a1f0ddb Compare April 24, 2026 00:09
@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 81.55864% with 239 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.00%. Comparing base (512ba43) to head (c338d49).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/daos/version.py | 83.16% | 28 Missing and 22 partials ⚠️ |
| superset/versioning/changes.py | 76.43% | 33 Missing and 12 partials ⚠️ |
| superset/versioning/diff.py | 84.17% | 21 Missing and 4 partials ⚠️ |
| superset/dashboards/api.py | 67.60% | 23 Missing ⚠️ |
| superset/versioning/schemas.py | 0.00% | 22 Missing ⚠️ |
| superset/versioning/baseline.py | 72.72% | 14 Missing and 4 partials ⚠️ |
| superset/datasets/api.py | 84.72% | 10 Missing and 1 partial ⚠️ |
| superset/versioning/dataset_snapshots.py | 81.96% | 9 Missing and 2 partials ⚠️ |
| superset/charts/api.py | 90.27% | 7 Missing ⚠️ |
| superset/initialization/__init__.py | 85.00% | 5 Missing and 1 partial ⚠️ |
... and 6 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39603      +/-   ##
==========================================
- Coverage   64.41%   64.00%   -0.41%     
==========================================
  Files        2567     2578      +11     
  Lines      134411   135679    +1268     
  Branches    31203    31381     +178     
==========================================
+ Hits        86584    86846     +262     
- Misses      46330    47277     +947     
- Partials     1497     1556      +59     
| Flag | Coverage Δ |
|---|---|
| hive | 39.40% <24.45%> (-0.31%) ⬇️ |
| mysql | 60.44% <81.01%> (+0.43%) ⬆️ |
| postgres | 60.51% <80.55%> (+0.42%) ⬆️ |
| presto | 41.51% <41.43%> (+0.05%) ⬆️ |
| python | 60.79% <81.40%> (-0.86%) ⬇️ |
| sqlite | 60.17% <81.25%> (+0.44%) ⬆️ |
| unit | ? |


@mikebridge mikebridge force-pushed the sc-103156-versioning branch from 7979999 to 70e21bc Compare April 27, 2026 22:25
@netlify

netlify Bot commented Apr 27, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 9d5a459
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a0d03c0bab6e00008295931
😎 Deploy Preview https://deploy-preview-39603--superset-docs-preview.netlify.app

@mikebridge mikebridge force-pushed the sc-103156-versioning branch 2 times, most recently from db1b08c to c338d49 Compare April 30, 2026 17:29
@mikebridge mikebridge force-pushed the sc-103156-versioning branch from c338d49 to 04c6b8b Compare May 4, 2026 23:36
@mikebridge mikebridge force-pushed the sc-103156-versioning branch 2 times, most recently from d632e5e to e4f548e Compare May 7, 2026 21:37
@github-actions github-actions Bot added the i18n (Namespace | Anything related to localization), i18n:french (Translation related to French language), and risk:ci-script (PR modifies scripts that execute in CI (supply chain risk)) labels May 7, 2026
@mikebridge mikebridge force-pushed the sc-103156-versioning branch 6 times, most recently from 5549d67 to f031a2c Compare May 14, 2026 21:52
@mikebridge mikebridge force-pushed the sc-103156-versioning branch from f031a2c to c2b6db7 Compare May 18, 2026 21:47
Mike Bridge and others added 17 commits May 19, 2026 18:42
Adds a scheduled Celery task that prunes version history older than
``SUPERSET_VERSION_HISTORY_RETENTION_DAYS`` (default 30; settable
via env var; ``0`` disables retention entirely).

**Task** — ``superset.tasks.version_history_retention.prune_old_versions``:

1. Computes ``cutoff = utcnow() - timedelta(days=N)``.
2. Selects ``version_transaction.id`` rows with ``issued_at <
   cutoff`` and filters out any tx whose parent shadow includes a
   live row (``end_transaction_id IS NULL``). The live row is the
   only preservation rule — closed historical rows including the
   baseline (``operation_type=0``) age out. Per-entity minimum-history
   floor is an open question tracked in ``future-work.md``.
3. Deletes the rows owned by the transactions that passed the
   step-2 filter from each parent shadow table
   (``dashboards_version`` / ``slices_version`` /
   ``tables_version``).
4. Deletes child-shadow rows for the same transactions
   (``table_columns_version`` / ``sql_metrics_version`` /
   ``dashboard_slices_version``).
5. Drops the ``version_transaction`` rows for those same
   transactions. The ``version_changes`` rows cascade via the FK
   from the previous commit.

Idempotent and safely retried on partial failure.
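The five steps above can be sketched against an in-memory SQLite database. This is an illustration only, not the task's implementation: it models a single parent shadow table (``slices_version``), stores ``issued_at`` as a Unix timestamp, skips the child shadows from step 4, and uses a simplified schema that is an assumption of this sketch. The real task runs through SQLAlchemy against Superset's metadata database.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

# Simplified, hypothetical schema: one parent shadow plus the change table.
SCHEMA = """
CREATE TABLE version_transaction (
    id INTEGER PRIMARY KEY,
    issued_at REAL NOT NULL
);
CREATE TABLE slices_version (
    id INTEGER NOT NULL,
    transaction_id INTEGER NOT NULL,
    end_transaction_id INTEGER,
    PRIMARY KEY (id, transaction_id)
);
CREATE TABLE version_changes (
    id INTEGER PRIMARY KEY,
    transaction_id INTEGER NOT NULL
        REFERENCES version_transaction(id) ON DELETE CASCADE
);
"""


def prune_old_versions(
    conn: sqlite3.Connection, retention_days: int, now: Optional[datetime] = None
) -> list[int]:
    """Delete history older than the cutoff, keeping transactions with a live row."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).timestamp()  # step 1
    # Step 2: expired transactions that do not own a live shadow row.
    expired = [
        row[0]
        for row in conn.execute(
            """
            SELECT t.id FROM version_transaction t
            WHERE t.issued_at < ?
              AND NOT EXISTS (
                  SELECT 1 FROM slices_version s
                  WHERE s.transaction_id = t.id
                    AND s.end_transaction_id IS NULL
              )
            ORDER BY t.id
            """,
            (cutoff,),
        )
    ]
    for tx_id in expired:
        # Step 3: shadow rows first, then (step 5) the transaction itself;
        # version_changes rows follow through ON DELETE CASCADE.
        conn.execute("DELETE FROM slices_version WHERE transaction_id = ?", (tx_id,))
        conn.execute("DELETE FROM version_transaction WHERE id = ?", (tx_id,))
    conn.commit()
    return expired


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
    conn.executescript(SCHEMA)
    return conn
```

Running the function a second time finds nothing left to select, which is the sense in which the task is idempotent.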

**Schedule** — ``superset/config.py`` adds the task to the default
``CeleryConfig.beat_schedule`` (nightly at 03:00). Operators who
override ``CeleryConfig`` in their ``superset_config.py`` need to
merge this entry — see UPDATING.md.

Also adds ``"expose_headers": ["ETag"]`` to the default
``CORS_OPTIONS`` so cross-origin browser clients can read the
``ETag`` header introduced in the next commit. (Co-located here
because both touch ``superset/config.py``; the ETag mechanism
itself ships in the next commit.)

**Auto-discovery** — ``superset/tasks/celery_app.py`` adds
``version_history_retention`` to its late-imports so Celery's
auto-discovery picks up the task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Helper module that derives the strong-validator ``ETag`` value from
an entity's current live ``version_uuid`` and attaches it to a
Flask response. Two functions:

- ``set_version_etag(response, version_uuid)`` — direct path used by
  PUT handlers that already compute ``new_version_uuid`` (see the
  REST API commit two prior). Cheap; no extra query.
- ``set_version_etag_by_uuid(response, model_cls, entity_uuid)`` —
  used by version endpoints that operate on ``entity_uuid``; looks
  up ``entity_id`` then derives ``version_uuid`` via ``VersionDAO``.
  Costs one extra ``SELECT id WHERE uuid = ?``; documented in the
  docstring so callers prefer the cheap variant when they have the
  id already.

Integration tests cover all three entity types and four endpoint
shapes (entity GET, save PUT, version-list GET, single-version GET)
plus the entity-with-no-versions edge case (header is correctly
absent).

The ETag is wired into the API endpoints in the REST-API commit
(group 3) and the CORS ``expose_headers: ["ETag"]`` ships with the
retention commit (group 4) since both touch ``superset/config.py``.
Locking enforcement (``If-Match`` → 412) is explicitly NOT in this
change — deferred to the follow-up UI SIP per Open Question §7.
``ETag`` is informational in v1.
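As a framework-free illustration of the header mechanics, the sketch below formats an ``ETag`` value from a ``version_uuid`` and evaluates an ``If-None-Match`` request header using the weak comparison that HTTP prescribes for that header. The function names are hypothetical and this is not the helper module itself; the ``weak`` flag exists only because the PR description shows a ``W/`` prefix while this message speaks of a strong validator. Splitting the header on commas is a simplification that is safe for UUID-based tags.

```python
from typing import Optional


def format_version_etag(version_uuid: str, weak: bool = True) -> str:
    """Build an ETag header value such as W/"<version_uuid>"."""
    tag = f'"{version_uuid}"'
    return f"W/{tag}" if weak else tag


def _opaque_tag(etag: str) -> str:
    """Strip surrounding whitespace and an optional weak prefix."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def if_none_match_hits(header: Optional[str], current_etag: str) -> bool:
    """Return True when a conditional GET could be answered with 304.

    Weak comparison: two tags match if their opaque parts are equal,
    regardless of the W/ prefix.
    """
    if not header:
        return False
    if header.strip() == "*":
        return True
    current = _opaque_tag(current_etag)
    return any(_opaque_tag(candidate) == current for candidate in header.split(","))


if __name__ == "__main__":
    etag = format_version_etag("3f2b8c1e-0000-5000-8000-000000000001")
    print(etag, if_none_match_hits(etag, etag))
```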

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks in the no-op-suppression behavior implemented by
``SkipUnmodifiedPlugin`` (which lives in ``superset/versioning/factory.py``
shipping with the foundation commit). Five integration tests:

1. Owners-only edit doesn't mint a version row — exercises the
   case where every dirty column is an excluded relationship.
2. Re-save with identical scalar values doesn't mint a row —
   exercises the json_metadata re-serialise path where
   ``set_dash_metadata`` rewrites the column to a different byte
   sequence with identical parsed content; the plugin must compare
   post-flush values against the prior shadow row to detect this.
3. Real scalar change DOES mint a row — guards against the plugin
   over-suppressing.
4. Same assertion on a Slice (covers the ``String`` column path on
   a different entity type).
5. ``json_metadata`` sub-key edit DOES mint a row — covers the
   ``MediumText`` column path past the plugin's content-equality
   check.

Tests are designed so a column-type change in the parent entities
(e.g. flipping ``json_metadata`` from ``MediumText`` to ``JSON``)
will fail one of these if the plugin's Python ``!=`` comparison
breaks for the new type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds debug-only ``VersionHistoryDropdown`` widgets to the chart,
dashboard, and dataset list pages so the version surface can be
exercised from the UI during the spike. Each row's actions column
gets a clock-icon dropdown that fetches ``/api/v1/{resource}/<uuid>/
versions/`` on click, lists the ten most recent versions with a
formatted change-log summary, and offers per-version restore via
``POST .../versions/<uuid>/restore``.

Strings are wrapped in ``t('...')`` with placeholder formatting
(e.g. ``t('Added %(kind)s "%(name)s"', { kind, name })``) so
translators can reorder verbs and nouns rather than concatenating
fragments. ``KIND_LABELS`` is a static map keying English layout
kinds (``chart``, ``row``, ``column``, ``tab``, ``markdown``, etc.)
to ``t(...)``-extractable labels. Empty change lists render as
"Baseline" rather than "No changes recorded" since the empty case
is overwhelmingly the ``operation_type=0`` baseline row.

Locale-aware date rendering: ``new Date(iso).toLocaleString(lang)``
where ``lang`` comes from ``document.documentElement.lang`` (set
by ``src/views/App.tsx`` from the bootstrap ``locale``), so dates
follow the user's chosen Superset locale rather than the browser's.

French translations for the new strings are appended to
``superset/translations/fr/LC_MESSAGES/messages.po`` (Ajouté,
Supprimé, Modifié, Version initiale, kind labels, …). Run
``npm run build-translation`` and ``pybabel compile -l fr`` to
regenerate the JSON / MO packs.

This commit is **demo-only** per ADR-005 (V1 is backend-only). It
is intentionally marked ``temp`` so it can be reverted before the
PR splits — the production V1 ships without UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v1 import pipeline previously wrote dashboard ↔ chart membership
via raw Core DML (``db.session.execute(delete(dashboard_slices)…)`` +
``db.session.execute(insert(dashboard_slices)…)``). With Continuum's
M2M tracker enabled by the versioning feature, those Core writes
emit malformed shadow INSERTs into ``dashboard_slices_version`` —
the tracker can't see the composite-PK columns through the Core
layer and produces rows with only ``(transaction_id, operation_type)``
populated, triggering a ``NOT NULL`` violation on
``(dashboard_id, slice_id)``.

Rewrites both import paths (``ImportAssetsCommand._import`` in
``commands/importers/v1/assets.py`` and ``ImportDashboardsCommand._import``
in ``commands/dashboard/importers/v1/__init__.py``) to use ORM-level
``dashboard.slices = [...]`` reassignment followed by an explicit
``db.session.flush()``. The explicit flush is necessary to land the
M2M rows before any subsequent autoflush fires an inner-flush event
handler that would reset the relationship change (cf. the SAWarning
``Attribute history events accumulated on N previously clean instances
within inner-flush event handlers have been reset``).

The unit tests previously called ``_import`` directly twice in the same
session — production wraps ``run()`` in ``@transaction`` so each invocation
gets its own DB+Continuum transaction. Added ``db.session.commit()`` between
calls in ``test_import_adds_dashboard_charts``,
``test_import_removes_dashboard_charts``, and
``test_dashboard_import_with_overwrite_replaces_charts`` so the tests
mirror production semantics; otherwise the second call's M2M shadow
inserts conflict with the first call's on
``UNIQUE (dashboard_id, slice_id, transaction_id)``.
…ard.json_metadata

Continuum's no-op suppression compared post-flush column values
byte-for-byte against the previous live shadow row. For
``Dashboard.json_metadata`` that produced false-positive version rows
on saves where the user authored nothing — the frontend re-stamps
``map_label_colors`` (regenerated from the ``LabelsColorMap``
singleton) on every save, plus ``chart_configuration`` /
``global_chart_configuration`` / ``show_chart_timestamps`` /
``color_namespace`` (derived from the current chart set), so two
consecutive identical saves produce different bytes for the column.
The diff engine already excluded those keys via
``DASHBOARD_JSON_METADATA_AUDIT_KEYS`` when computing change records;
the skip-plugin diverged.

Adds a ``_COLUMN_NORMALIZERS`` registry keyed on
``(class_name, column_name)`` that maps to a per-column normalizer
applied to both pre- and post-image before equating. The first
entry parses ``Dashboard.json_metadata`` as JSON and drops the
audit-key set before comparing. The same registry is the extension
point for analogous transient fields on charts and datasets.
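
A minimal sketch of the registry idea, not the code from this PR. The
contents of ``DASHBOARD_JSON_METADATA_AUDIT_KEYS`` are assumed to be the
five keys named above, and ``columns_equal`` and
``_normalize_dashboard_json_metadata`` are hypothetical names used only
for illustration:

```python
import json
from typing import Any, Callable

# Assumption: the audit-key set holds exactly the keys the message names.
DASHBOARD_JSON_METADATA_AUDIT_KEYS = frozenset(
    {
        "map_label_colors",
        "chart_configuration",
        "global_chart_configuration",
        "show_chart_timestamps",
        "color_namespace",
    }
)


def _normalize_dashboard_json_metadata(value: Any) -> Any:
    """Parse the column as JSON and drop the audit keys."""
    if not isinstance(value, str):
        return value
    try:
        parsed = json.loads(value)
    except ValueError:
        return value  # not JSON: compare the raw value instead
    if not isinstance(parsed, dict):
        return parsed
    return {
        key: val
        for key, val in parsed.items()
        if key not in DASHBOARD_JSON_METADATA_AUDIT_KEYS
    }


# Keyed on (class_name, column_name), as described above.
_COLUMN_NORMALIZERS: dict[tuple[str, str], Callable[[Any], Any]] = {
    ("Dashboard", "json_metadata"): _normalize_dashboard_json_metadata,
}


def columns_equal(class_name: str, column_name: str, pre: Any, post: Any) -> bool:
    """Hypothetical comparison helper: normalize both images, then equate."""
    normalizer = _COLUMN_NORMALIZERS.get((class_name, column_name))
    if normalizer is None:
        return pre == post
    return normalizer(pre) == normalizer(post)
```

Comparing parsed dicts rather than raw strings also makes the check
insensitive to key order, which a byte-for-byte comparison is not.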

Promotes ``_DASHBOARD_JSON_METADATA_AUDIT_KEYS`` to a public name
(``DASHBOARD_JSON_METADATA_AUDIT_KEYS``) so the skip-plugin can import
it from ``superset.versioning.diff`` without reaching across a
leading-underscore boundary.

Integration coverage: ``test_map_label_colors_only_change_does_not_create_version``.

SQLAlchemy doesn't mark a parent as dirty when only its children
(``TableColumn`` / ``SqlMetric`` on ``SqlaTable``) are modified.
Continuum's UnitOfWork only creates operations for entities in
``session.dirty``, so a column-only edit produces shadow rows in
``table_columns_version`` but no parent shadow row in
``tables_version``. ``VersionDAO.list_versions`` queries the parent
shadow, so the version dropdown is empty for child-only saves —
exactly the failure mode reported when "I edited a column description
but no version appeared."

Extends ``register_baseline_listener`` with a new before-flush hook
``_force_parent_dirty_on_child_change`` that walks the existing
``_child_to_parent_registry`` and ``attributes.flag_modified(parent,
<first non-excluded versioned column>)`` whenever a versioned child
is dirty / new / deleted but the parent's own scalars haven't been
touched. The flag puts the parent in ``session.dirty`` so Continuum's
UoW creates a parent UPDATE operation; the resulting shadow row's
scalar columns mirror the previous version (only the children
actually changed), and the row exists to anchor the transaction in
the parent's version chain.

``SkipUnmodifiedPlugin._is_no_op_update`` was updated in this commit's
predecessor to recognize the "scalars match but children dirty" case
via ``_has_dirty_versioned_children``, so the forced parent UPDATE
isn't skipped.

Integration coverage: ``test_dataset_column_edit_creates_parent_version``.
…ert restore

VersionDAO.restore_version previously called Continuum's Reverter
once per relation in a split-revert loop with flush + expire between
calls. That closed an autoflush race in the Reverter when multiple
relations were reverted at once, but split one logical restore across
multiple Continuum transactions — and once the change-records listener
was wired up, the listener's tx-dedup guard skipped the second pass,
silently dropping child-addition records from version_changes. A
restore that re-added a calculated column would render as an empty
"Baseline" entry in the dropdown.

Replaces the split-revert with a single ``target_version.revert(relations=relations)``
call wrapped in a new ``single_flush_scope(db.session)`` context
manager (``superset/versioning/utils.py``). The context manager
suppresses autoflush inside the block and issues one trailing flush
on clean exit; on exception, the trailing flush is skipped so the
session's normal rollback path handles cleanup. Same autoflush window
closed, one Continuum transaction instead of N, the change-records
listener sees the complete shadow state in one after_flush pass.
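
A sketch of how such a context manager could look, assuming only that
the session exposes an ``autoflush`` attribute and a ``flush()`` method
(both exist on SQLAlchemy's ``Session``). ``FakeSession`` is a stand-in
so the example runs without a database; the real helper in
``superset/versioning/utils.py`` may differ:

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def single_flush_scope(session: Any) -> Iterator[Any]:
    """Suppress autoflush inside the block; flush once on clean exit."""
    previous = session.autoflush
    session.autoflush = False
    try:
        yield session
    finally:
        # Runs on both paths so the session never stays in no-autoflush mode.
        session.autoflush = previous
    # Only reached when the block did not raise: the single trailing flush.
    session.flush()


class FakeSession:
    """Stand-in for a SQLAlchemy Session; only what the sketch touches."""

    def __init__(self) -> None:
        self.autoflush = True
        self.flush_count = 0

    def flush(self) -> None:
        self.flush_count += 1
```

Because the flush sits after the ``try``/``finally`` rather than inside
it, an exception raised in the block propagates without flushing, which
matches the "skip the trailing flush on exception" behaviour described
above.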

The wrapper carries the full autoflush-race / cascade-add rationale
in its docstring so the restore_version call site can be a short
6-line block referencing it.

Integration coverage: ``test_restore_emits_full_child_diff_in_one_transaction``.
…sion bug

The full-Continuum spike (ADR-004 revised) replaced the JSON-snapshot
restore path with Continuum's native Reverter and removed the
``dataset_snapshots`` / ``dashboard_snapshots`` tables from the
migration chain. Seven VersionDAO methods and two module-level
helpers that read/wrote those tables stayed in the code anyway and
went unused — dead code that looked live.

Worse, ``VersionDAO.get_version`` still read from
``dataset_snapshots`` in its SqlaTable branch. On any environment
where the snapshot tables don't exist (current production behavior),
``GET /api/v1/dataset/<uuid>/versions/<version_uuid>/`` raised
``OperationalError``. The branch is rewritten to read column and
metric state from Continuum's child shadow tables
(``table_columns_version`` / ``sql_metrics_version``) via the
existing ``_shadow_rows_valid_at`` helper.

Deleted:
- ``_deserialize_snapshot_value`` (module helper)
- ``_coerce_snapshot_list`` (module helper)
- ``RESTORE_EXCLUDE_FIELDS`` (constant — only referenced by deleted code
  and a docstring)
- ``VersionDAO._restore_dataset_children``
- ``VersionDAO._parse_slice_ids_json``
- ``VersionDAO._apply_dashboard_slices``
- ``VersionDAO._restore_dashboard_children``
- ``VersionDAO._apply_snapshot_children``

The corresponding ~17 unit tests in
``tests/unit_tests/daos/test_version_dao.py`` are removed alongside.

Stale docstring references in ``versioning/changes.py`` and
``versioning/diff.py`` that pointed at the retired snapshot tables are
also cleaned up.

Also strips an 8-line comment block in ``restore_version`` that
duplicated the docstring of ``_stamp_audit_fields_for_restore``.

Net: −290 lines from ``daos/version.py``; a production-shape bug
fixed; dead code that looked live is gone.
…store commands onto BaseRestoreVersionCommand

Two coupled clean-code review fixes:

(1) Rename ``VersionDAO._find_active_entity_by_uuid`` →
``find_active_by_uuid``. The leading-underscore + three
``# pylint: disable=protected-access`` suppressions in the restore
commands were the smell of a wrongly-private API. The method is a
perfectly reasonable public DAO operation; dropping the underscore
removes the suppressions.

(2) Collapse ``RestoreChartVersionCommand``, ``RestoreDashboardVersionCommand``,
``RestoreDatasetVersionCommand`` onto a shared
``BaseRestoreVersionCommand`` (``superset/commands/version_restore.py``).
The three classes were textbook copy-paste — identical except for
the model class and three exception types. Each subclass now declares
``model_cls`` + ``not_found_exc`` + ``forbidden_exc`` and overrides
``run()`` with one ``@transaction(reraise=<failed_exc>)``-decorated
line delegating to ``self._do_restore()``. ~80 lines per file →
~45 lines per file; one shared workflow instead of three drift sources.
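
The shape of the refactor can be sketched as follows. This is an
illustration under assumptions, not the PR's code: the collaborators
(``find``, ``can_write``, ``restore``) are injected here so the example
is self-contained, the ``Chart`` class and exception types are
placeholders, and the ``@transaction(...)`` decorator is only mentioned
in a comment:

```python
from typing import Any, Callable, Optional


class BaseRestoreVersionCommand:
    """Shared workflow; subclasses declare three class attributes."""

    model_cls: type
    not_found_exc: type[Exception]
    forbidden_exc: type[Exception]

    def __init__(
        self,
        entity_uuid: str,
        version_uuid: str,
        *,
        find: Callable[[type, str], Optional[Any]],
        can_write: Callable[[Any], bool],
        restore: Callable[[Any, str], Any],
    ) -> None:
        self._entity_uuid = entity_uuid
        self._version_uuid = version_uuid
        self._find = find
        self._can_write = can_write
        self._restore = restore

    def _do_restore(self) -> Any:
        entity = self._find(self.model_cls, self._entity_uuid)
        if entity is None:
            raise self.not_found_exc()
        if not self._can_write(entity):
            raise self.forbidden_exc()
        return self._restore(entity, self._version_uuid)


class Chart:  # placeholder model class
    pass


class ChartNotFoundError(Exception):  # placeholder exception
    pass


class ChartForbiddenError(Exception):  # placeholder exception
    pass


class RestoreChartVersionCommand(BaseRestoreVersionCommand):
    model_cls = Chart
    not_found_exc = ChartNotFoundError
    forbidden_exc = ChartForbiddenError

    def run(self) -> Any:
        # In the PR this one-liner carries the @transaction(...) decorator.
        return self._do_restore()
```

The dashboard and dataset commands would differ only in the three class
attributes, which is what removes the three drift sources.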

The api.py imports of ``RestoreChartVersionCommand`` /
``RestoreDashboardVersionCommand`` / ``RestoreDatasetVersionCommand`` are
unchanged — public class names preserved.
… regen lockfile

DashboardList demo dropdown previously instructed the user to "Reload
the page to see the change" after a restore. The URL the user
returns to may still carry ``?native_filters_key=…`` /
``permalink_key`` / ``form_data_key`` from a prior session — those
point at server-cached snapshots (in ``key_value`` and the
filter-state cache) captured before the restore. On rehydration the
cached state is merged on top of the restored ``json_metadata``,
masking the rollback (e.g. dashboard-level colour-scheme restore
appears not to take effect).

Replaces the alert + manual reload with a direct ``window.location.href``
navigation to ``/superset/dashboard/<uuid>/`` — drops all URL params,
forcing hydration from the freshly restored DB state.

Also regenerates ``package-lock.json`` to pick up the ``zod 4.4.1 →
4.4.3`` bump that master's ``package.json`` already reflects.

(``temp(versioning)`` prefix per the demo dropdown's status — this
file is not part of V1 scope per ADR-005; the V2 UI SIP owns the
actual restore UI surface.)

VersionDAO carried five distinct concerns under one class: UUID
derivation, version metadata queries, change-record loading,
single-version snapshot retrieval, and restore orchestration. The
clean-code review flagged this as the next structural fix after the
dead-code purge: the class fails Bob's "and" test, because describing
its ~600 lines takes an "and" ("queries about versioned state of one
entity AND the workflow that mutates it").

Splits the read and write sides into purpose-built modules:

- ``superset/versioning/queries.py`` — UUID derivation
  (``VERSION_UUID_NAMESPACE``, ``derive_version_uuid``) + read-side
  helpers (``find_active_by_uuid``, ``current_version_number``,
  ``current_live_transaction_id``, ``current_live_version_uuid``,
  ``list_versions``, ``resolve_version_uuid``, ``get_version``,
  ``list_change_records_batch``). ~475 lines.

- ``superset/versioning/restore.py`` — write-side (``restore_version``,
  ``_stamp_audit_fields_for_restore``, ``_RESTORE_RELATIONS``).
  ~140 lines. Depends only on ``queries.find_active_by_uuid`` and
  ``utils.single_flush_scope``.

- ``superset/daos/version.py`` — collapsed to an ~85-line backward-compat
  façade that re-exports both modules under a single ``VersionDAO``
  class via ``staticmethod`` aliases. The module also re-exports
  ``VERSION_UUID_NAMESPACE`` and ``derive_version_uuid`` at module level
  so the ~10 existing callers (api.py handlers, command classes, the
  ETag emitter, integration tests) don't have to change their imports.
  New code is encouraged to import from the sub-modules directly.
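
A single-file sketch of the façade pattern described above. The function
bodies are stand-ins (the real ones live in ``queries.py`` and
``restore.py``); only the ``staticmethod`` aliasing is the point:

```python
import uuid
from typing import Any, Optional


def find_active_by_uuid(model_cls: type, entity_uuid: uuid.UUID) -> Optional[Any]:
    """Stand-in for the queries.py function; always reports 'not found'."""
    return None


def restore_version(
    model_cls: type, entity_uuid: uuid.UUID, version_uuid: uuid.UUID
) -> Optional[Any]:
    """Stand-in for restore.py: calls the bare module-level function."""
    entity = find_active_by_uuid(model_cls, entity_uuid)
    if entity is None:
        return None
    return entity


class VersionDAO:
    """Backward-compat facade: no logic of its own, only aliases."""

    find_active_by_uuid = staticmethod(find_active_by_uuid)
    restore_version = staticmethod(restore_version)
```

Since ``restore_version`` resolves ``find_active_by_uuid`` through its
own module namespace, replacing the attribute on ``VersionDAO`` has no
effect on it. That is why a test that wants to intercept the lookup has
to patch the name where it is called rather than the alias.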

The functions themselves are unchanged byte-for-byte aside from
internal call sites being rewritten from ``VersionDAO.foo`` to the bare
function name (since they now live as module-level functions, not
class methods).

One unit-test mock target moved: ``test_restore_version_returns_none_for_unknown_entity``
now patches ``superset.versioning.restore.find_active_by_uuid`` (the
actual call site) instead of ``VersionDAO.find_active_by_uuid`` (which
is now just an alias).

Each of the three modules now has one reason to change. When the
sc-103157 soft-delete pass adds the ``deleted_at IS NULL`` filter to
``find_active_by_uuid``, it touches only ``queries.py``. When a
per-entity-type restore Strategy replaces the string-keyed
``_RESTORE_RELATIONS`` dispatch, it touches only ``restore.py``.

Cleanup pass from the SQLAlchemy + migration code review. Eight items,
all in the "warnings / suggestions" tier — no behaviour change visible
to the API, but each closes a real correctness, perf, or maintainability
concern surfaced in review.

baseline.py
- Delete unused ``_get_user_id`` (W1). The function wrapped a broad
  ``except Exception:  # noqa: S110`` swallow that hid bugs; grep
  confirmed no callers anywhere. The legitimate audit-field paths
  (``row.get("changed_by_fk")`` etc.) already drive the
  ``version_transaction.user_id`` write.
- Batch ``_baseline_attached_slices`` from O(N) round-trips to
  three queries (W2): one membership SELECT, one existing-shadow
  SELECT, one bulk live-row SELECT for the missing ids. The previous
  per-slice ``COUNT(*)`` + ``SELECT`` was a measurable first-save
  hotspot on dashboards with many charts. Drops the now-unused
  ``_slice_has_shadow`` helper.
- Pick a stable column name for ``flag_modified`` in
  ``_force_parent_dirty_on_child_change`` (W3). ``uuid`` is on all
  three versioned parent classes and excluded by none, so the
  flagged attribute is deterministic across SQLAlchemy versions /
  mapper-config orders instead of depending on
  ``versioned_column_properties(parent)[0]``. Falls back to the
  first available column for forks that exclude ``uuid``.

changes.py
- Add ``Decimal`` handling to ``_jsonable`` (W4) — ``json.dumps``
  rejects ``Decimal``, so any numeric column (e.g. ``SqlMetric.currency``
  contents, or fork/plugin Decimal columns) would crash the bulk
  insert. Stringify rather than ``float()`` to preserve precision;
  the diff engine compares ``from_value`` / ``to_value`` by string
  equality after this coercion so both sides round-trip identically.

queries.py
- Promote the inline ``{0: "baseline", 1: "update", 2: "delete"}``
  dict to module-level ``_OP_TYPE_LABELS`` (W7). The literal was
  duplicated across ``list_versions`` and ``get_version``; the third
  caller is one bug fix away.
- Comment on ``resolve_version_uuid``'s Python-side ``derive_version_uuid``
  loop (W8) — no portable SQL form for UUIDv5 across PostgreSQL /
  MySQL / SQLite, iteration count is bounded by the retention
  window. Flags the place to revisit if retention is ever disabled
  (``=0``) on a heavily-edited entity.
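
  The Python-side loop can be sketched like this. The namespace value
  and the ``"<entity_uuid>:<transaction_id>"`` name format are
  placeholders chosen for the example, not the PR's actual derivation;
  only the use of UUIDv5 and the linear scan come from the text above:

```python
import uuid
from typing import Iterable, Optional

# Placeholder namespace; the real VERSION_UUID_NAMESPACE is not shown here.
VERSION_UUID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "versioning.example.invalid")


def derive_version_uuid(entity_uuid: uuid.UUID, transaction_id: int) -> uuid.UUID:
    """Deterministic UUIDv5 for one (entity, transaction) pair."""
    # Assumed name format.
    return uuid.uuid5(VERSION_UUID_NAMESPACE, f"{entity_uuid}:{transaction_id}")


def resolve_version_uuid(
    entity_uuid: uuid.UUID,
    version_uuid: uuid.UUID,
    transaction_ids: Iterable[int],
) -> Optional[int]:
    """Scan the entity's retained transactions for a matching derived UUID."""
    for transaction_id in transaction_ids:
        if derive_version_uuid(entity_uuid, transaction_id) == version_uuid:
            return transaction_id
    return None
```

  The scan is linear in the number of retained transactions for the
  entity, which is why the retention window bounds its cost.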

migrations/2026-05-01_23-36 (composite-PK)
- Belt-and-braces guard in ``_downgrade_mysql_table`` (W6): asserts
  ``t.name in AFFECTED_TABLES`` before interpolating into the
  backtick-quoted ALTER statements. The invariant was already
  structurally implied (callers iterate ``AFFECTED_TABLES``), but
  making it load-bearing means a future refactor can't slip an
  arbitrary table name through.

(W5 was verified-no-change: grepped ``tests/`` for ``metadata.create_all``
callers that exercise versioning tables; none. The cascade-FK
gap on ``version_changes.transaction_id`` is already documented
in ``tests/integration_tests/versioning/change_records_tests.py:27-32``.)

62 versioning unit tests pass.
…t_version

After the SRP split (8c9cf36) put both functions in the same module
~150 lines apart, their overlap became visible: same JOIN of
version_table → version_transaction → ab_user, same baseline-first
ordering, same user-row → ``changed_by`` projection, same lookup
``_ENTITY_KIND_BY_CLASS_NAME.get(model_cls.__name__)``. About 30 lines
of duplication.

Six small helpers extracted at the module top:

- ``_resolve_version_tables(model_cls)`` returns ``(ver_tbl, tx_tbl, user_tbl)``
- ``_version_with_tx_user_join(ver_tbl, tx_tbl, user_tbl)`` builds the join
- ``_baseline_first_ordering(ver_tbl)`` returns the order-by tuple
- ``_user_select_cols(user_tbl)`` returns the user-column list with
  ``user_id`` as the stable label (normalises the prior asymmetry
  where ``list_versions`` labelled it ``user_id`` and ``get_version``
  labelled it ``_user_id`` to dodge a column-name collision — the
  ``user_id`` label collides with neither)
- ``_changed_by_from_row(row)`` projects user columns onto the API shape
- ``_entity_kind_for(model_cls)`` resolves the change-records taxonomy lookup

Both call sites get shorter and read what they do (build query / project
user / build row) rather than how. Behavior unchanged; no test changes.

Also two small inline tidyings while in the file:

- Replace the ternary
  ``changes_by_tx = list_change_records_batch(...) if entity_kind else {}``
  with an explicit two-line if-statement in both functions. The ternary
  buries the decision; the if-statement reads as one thought.
- Inline the one-shot ``meta_cols`` set declaration in ``get_version``
  into the ``if col.name in {...}`` check that uses it three lines later.

Net: about 110 lines → about 80 lines across the two functions, plus
a small helper section at the top.

baseline.py:_insert_baseline_row and changes.py:_read_pre_state both
issued the same "read a single row through ``session.connection()``
inside ``with session.no_autoflush:``" pattern. Same five-line block,
same intent ("read the pre-flush state without triggering the in-flight
edit's flush").

Promoted to ``superset.versioning.utils.read_row_outside_flush(session,
table, entity_id)``. Companion to ``single_flush_scope`` — they sit
next to each other in utils.py and frame the two directions of the
"don't autoflush mid-listener" pattern.

Returns ``dict[str, Any]`` (or ``None``) so callers can't accidentally
hold a cursor-bound ``RowMapping`` past the listener boundary. Both
call sites get shorter by ~5 lines.

Also updates the changes.py docstring to mention Decimal
stringification. The W4 commit added the behaviour, but the docstring
still said "(datetime, UUID, bytes)"; it now matches the implementation.

Behaviour unchanged. 96 unit tests pass.
…icle order)

Pure file shuffle, zero behaviour change. Reorders ``baseline.py`` so it
reads top-down by level of abstraction (newspaper-article rule): the
public entry point at the top, supporting helpers descending below.

Before: 14 private helpers, then ``register_baseline_listener`` at the
bottom. A reader opening the file met the leaf builders first and had
to accumulate context before finding the call site.

After (top-down):

  - Entry point: ``register_baseline_listener`` + inner ``capture_baseline``
  - High-level helpers used by ``capture_baseline``:
      ``_force_parent_dirty_on_child_change``,
      ``_collect_parents_to_baseline``,
      ``_child_to_parent_registry``,
      ``_version_table_for``,
      ``_shadow_row_count``,
      ``_insert_baseline_and_children``
  - Mid-level builders:
      ``_insert_baseline_row``,
      ``_baseline_children_for_parent``
  - Per-entity child handlers + their dispatch table:
      ``_baseline_dataset_children``,
      ``_baseline_dashboard_children``,
      ``_CHILD_BASELINE_HANDLERS``
  - Leaf builders:
      ``_insert_child_baseline_rows``,
      ``_baseline_attached_slices``,
      ``_insert_synthetic_slice_baseline``

Three section-divider comments mark the abstraction levels. The
``_CHILD_BASELINE_HANDLERS`` dict literal stays after its referenced
handlers (module-level literals evaluate at import time and need names
already bound); a comment now flags this constraint.

Function bodies are byte-for-byte unchanged; ``git log -L`` on any
function shows only its relocation. 96 unit tests pass.
…phan version_transaction rows inline

Extends the existing docstring note ("the orphan is swept by retention")
with the reasoning behind not cleaning it up in the same flush. The
inline-delete is appealing in principle but would couple this plugin
to the change-records listener's buffer state via the ON DELETE
CASCADE on ``version_changes.transaction_id``: both listeners would
have to agree that the flush produced nothing before the version_transaction
row could be dropped safely. The orphan's ~40-byte storage cost +
retention's correct-by-construction handling (orphans have no parent
shadow, so they're never in the "preserve" set) make the coordination
overhead not worth it.

Captures the design decision in the file where the next reader will
look for it.
Mike Bridge added 7 commits May 20, 2026 14:12
…/M3/M5)

Three small follow-ups surfaced by aminghadersohi's review of the
SoftDeleteMixin PR (apache#39977) that apply equally here:

- H1: cache _child_to_parent_registry() with functools.cache. Called
  twice per save flush; mapping depends only on import-time model
  classes, so unbounded cache is the right shape (no invalidation).
- M5: tighten _CHILD_BASELINE_HANDLERS type from dict[str, Any] to
  dict[str, Callable[[Session, Any, int], None]] via a named alias.
  Mypy now catches a future broken handler signature.
- M3/M4: explain the inline-import pattern once in the module
  docstrings of baseline.py and changes.py. Both modules use
  pylint disable=import-outside-toplevel uniformly because they
  load during init_versioning() before mappers are configured;
  the per-callsite "why" comments would just repeat the same
  reason. Module-level explanation + a hint to comment unusual
  cases is the cleaner shape.

M6 (listener placement) doesn't apply — init_versioning() already
runs inside init_app_in_ctx(). M8 (loose OpenAPI schema in
*/api.py docstrings) is real but belongs in its own change.

The force-parent-dirty listener was calling attributes.flag_modified
on every parent reachable from a dirty child — including parents
themselves in session.new (e.g. brand-new SqlaTable + brand-new
TableColumns from POST /api/v1/dataset/). flag_modified rejects
unloaded attributes, and a session.new SqlaTable's uuid (default=uuid4
fires at flush time) is unloaded until then. CI caught this with
InvalidRequestError cascading into 422s across dataset creation /
upload / Playwright dataset specs.

The hook is only needed for the persistent-and-clean case (child
edited, parent's own scalars untouched, dropdown otherwise empty).
Anything in session.new will flush anyway; anything in session.dirty
is already flagged; session.deleted shouldn't be touched. Short-
circuit before the flag_modified call.

Unblocks test-sqlite, test-mysql, test-postgres (previous), and
playwright dataset specs.
…ions

When one ORM flush touches multiple versioned entities (dashboard +
slice + dataset all save at tx=X), each gets a shadow row sharing
that tx. If only the dashboard is later edited at tx=Y, the
dashboard row at tx=X is closed (end_tx=Y) while slice/dataset rows
stay live at tx=X. Retention then preserves tx=X (slice/dataset are
live there) and prunes tx=Y. The dashboard's closed row at tx=X
survives step 1, then its end_transaction_id=Y trips the FK when
step 2 deletes version_transaction row Y.

Fix: extend the shadow-row delete to also match end_transaction_id
IN tx_ids. Live rows have end_tx=NULL so they're never matched by
either predicate. Closed rows that touch a pruned tx at either
endpoint are pruned together — consistent with retention semantics
(any tx in the row's lifespan is gone, so the row's chain is broken
anyway).
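
The failure and the fix can be reproduced in miniature with SQLite. The
schema below is a simplified stand-in (one shadow table, three
transactions, hypothetical ``make_db`` / ``prune`` helpers), not the
migration from this PR; it only demonstrates why the second predicate
is needed before the ``version_transaction`` delete:

```python
import sqlite3
from typing import Sequence


def make_db() -> sqlite3.Connection:
    """In-memory DB: one dashboard with a three-step version chain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE version_transaction (id INTEGER PRIMARY KEY);
        CREATE TABLE dashboards_version (
            id INTEGER NOT NULL,
            transaction_id INTEGER NOT NULL REFERENCES version_transaction (id),
            end_transaction_id INTEGER REFERENCES version_transaction (id),
            PRIMARY KEY (id, transaction_id)
        );
        INSERT INTO version_transaction (id) VALUES (1), (2), (3);
        INSERT INTO dashboards_version VALUES (1, 1, 2), (1, 2, 3), (1, 3, NULL);
        """
    )
    return conn


def prune(conn: sqlite3.Connection, tx_ids: Sequence[int], *, match_end_tx: bool) -> None:
    """Step 1: delete shadow rows. Step 2: delete the transactions."""
    marks = ",".join("?" for _ in tx_ids)
    where = f"transaction_id IN ({marks})"
    params = list(tx_ids)
    if match_end_tx:
        # The fix: also drop closed rows whose chain ends at a pruned tx.
        where += f" OR end_transaction_id IN ({marks})"
        params += list(tx_ids)
    conn.execute(f"DELETE FROM dashboards_version WHERE {where}", params)
    conn.execute(
        f"DELETE FROM version_transaction WHERE id IN ({marks})", list(tx_ids)
    )
```

Pruning transaction 2 with ``match_end_tx=False`` leaves the row closed
at ``end_transaction_id = 2`` behind, so step 2 fails the foreign-key
check; with ``match_end_tx=True`` both steps succeed and only the live
row remains.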

Unblocks test_retention_prunes_old_rows on sqlite, mysql, postgres.

- ruff: import sort + E501 reflow on the parent-state guard in
  baseline.py
- ruff format: function-signature collapse and join-chain reflow in
  queries.py
- auto-walrus: two ``entity_kind = …; if … is not None:`` patterns
  in queries.py converted to assignment-expressions
… catch

The previous attempt (d0520f6) was too aggressive: skipping when
parent is in session.dirty/new/deleted bypassed the
persistent-and-clean case the hook EXISTS for. Some upstream code
paths put the dataset in session.dirty *before* this listener fires
(API controllers touching audit fields, etc.), so the
session-membership pre-check made us silently no-op on the very
scenario the hook needs to handle. CI symptom:
test_dataset_column_edit_creates_parent_version showed before=317,
after=317 (parent shadow not written).

Restore the unconditional flag_modified and catch the specific
InvalidRequestError that fires only for the session.new case
(uuid default callable hasn't populated state yet). Other states
fall through to the original behavior:
- persistent + clean → flag_modified succeeds, parent goes dirty,
  Continuum picks it up, SkipUnmodifiedPlugin keeps the row via
  _has_dirty_versioned_children. ✓
- persistent + dirty → flag_modified is harmless (already dirty).
- session.new → InvalidRequestError, skip (parent INSERTs anyway).
- session.deleted → flag_modified may or may not raise; if it does,
  we skip; if not, the delete dominates.

Should unblock test_dataset_column_edit_creates_parent_version,
test_get_version_returns_historical_snapshot_with_children, and
test_restore_with_column_edits_reverts_columns.

- factory.py: TID251 banned ``import json``; switch to
  ``from superset.utils import json`` (project convention).
- factory.py: ruff format reflow on _matches_previous_version.
- version_restore.py: ruff format collapse on restore_version call.

CI was pinning a different ruff version than my local uvx default;
re-ran against ruff==0.9.7 (the version in requirements/development.txt)
which surfaced these.
…rent-dirty

flag_modified(parent, "uuid") was producing FK integrity failures via
the column's BLOB/BINARY round-trip: SQLAlchemy logs the param as
``<memory at 0x…>`` and the UUID round-trip doesn't always match the
in-memory value byte-for-byte. Symptom: in scenarios where the parent
is already going to flush (Reverter applying historical state during
restore, RLS test triggering autoflush during a query), our added
``uuid`` UPDATE column tripped the FK check.

Pick ``description`` instead — plain Text column on all three
versioned parent classes (Dashboard, Slice, SqlaTable), no
TypeDecorator, no marshaling layer. Flagging it round-trips its
current value safely. Fallback chain ``description → uuid → col_keys[0]``
keeps the original deterministic-pick property for forks/subclasses
that excluded ``description``.
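
The fallback chain amounts to a few lines. ``pick_flag_column`` and
``_PREFERRED_FLAG_COLUMNS`` are hypothetical names for this sketch;
``col_keys`` stands for the parent's non-excluded versioned column
keys:

```python
from typing import Optional, Sequence

# Preference order from the description above.
_PREFERRED_FLAG_COLUMNS = ("description", "uuid")


def pick_flag_column(col_keys: Sequence[str]) -> Optional[str]:
    """Return the column to pass to flag_modified, or None if there is none."""
    for name in _PREFERRED_FLAG_COLUMNS:
        if name in col_keys:
            return name
    # Last resort for forks that exclude both preferred columns.
    return col_keys[0] if col_keys else None
```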

Should unblock test_restore_applies_scalar_field and the
test_rls_filter_alters_no_role_user_birth_names_query autoflush
error.
Mike Bridge added 3 commits May 21, 2026 15:55
…hanges

_force_parent_dirty_on_child_change was firing whenever ANY
TableColumn or SqlMetric of the parent appeared in
session.dirty / new / deleted — even when the child was there for
non-content reasons:

- Lazy-load side effects when a relationship is touched
- M2M relationship-cascade artifacts (e.g. RLS setUp doing
  rls_entry.tables.extend([dataset]) triggers cascade behavior
  that pulls children into the session)
- AuditMixin auto-bumps from earlier code paths
- Reverter side passes during restore

Force-touching the parent in those cases produced an incidental
UPDATE tables SET description=…, changed_on=…, changed_by_fk=…
whose changed_by_fk value or autoflush ordering tripped FK
integrity on some dialects. Symptoms:

- test_rls_filter_alters_no_role_user_birth_names_query → FK
  IntegrityError on autoflush during a query
- test_restore_applies_scalar_field → 422 "Dataset could not be
  updated" during restore

Fix: gate on Continuum's is_modified(child), which returns True
only when a non-excluded versioned column on the child has
SQLAlchemy attribute-history changes. New objects (session.new)
and genuinely-modified rows still flag the parent; phantom-dirty
rows do not.

The intended hook semantics — "child edit forces a parent shadow
row" — are preserved: a column-description edit through the
dataset API still triggers is_modified True, still flags the
parent. See test_dataset_column_edit_creates_parent_version.

Pre-commit (previous) flagged I001 unsorted-imports on the
backward-compat façade. Two queries imports merged into one
block (the aliased ``derive_version_uuid as _derive_version_uuid``
moves inline rather than living in its own block), and the
restore-side names sorted: ``_RESTORE_RELATIONS``,
``_stamp_audit_fields_for_restore``, ``restore_version``.

Pure mechanical reformatting; no behaviour change.

Previous fix (9c2391d) gated the force-parent-dirty hook on
is_modified(child) for ALL session collections (dirty/new/deleted).
That was over-restrictive: is_modified checks attribute history,
and deletion is a state transition with no attribute history —
so deleted children evaluated as not-modified and the parent
wasn't flagged. The change-records listener then didn't see the
deletion and no removal record was emitted.

Symptom: test_restore_emits_full_child_diff_in_one_transaction
failed expecting a column-removed change record after a restore
that removed the column; instead only the parent's scalar fields
appeared in observed paths.

Refine: apply the is_modified filter ONLY to persistent rows in
session.dirty. session.new (creation) and session.deleted
(removal) are always real content changes by virtue of their
session-collection membership — no is_modified check needed (and
in deletion's case, the check returns the wrong answer).
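
The refined rule reduces to the following decision, sketched here with
plain sequences standing in for the session collections and an injected
``is_modified`` predicate standing in for Continuum's helper. The
function name is hypothetical:

```python
from typing import Any, Callable, Sequence


def has_real_child_change(
    dirty: Sequence[Any],
    new: Sequence[Any],
    deleted: Sequence[Any],
    is_modified: Callable[[Any], bool],
) -> bool:
    """Decide whether the parent should be force-flagged for this flush."""
    # Creation and removal count by collection membership alone.
    if new or deleted:
        return True
    # Persistent rows count only with real attribute-history changes.
    return any(is_modified(child) for child in dirty)
```

A deleted child is never passed to ``is_modified``, which sidesteps the
wrong answer the previous version got for deletions.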

Labels

- api: Related to the REST API
- i18n:french: Translation related to French language
- i18n: Namespace | Anything related to localization
- risk:ci-script: PR modifies scripts that execute in CI (supply chain risk)
- risk:db-migration: PRs that require a DB migration
- size/XXL