
[Fabric/TSQL] Removed models persist in state snapshot when using virtual_environment_mode = "dev_only" #5775

@fresioAS

Describe the bug

When using Microsoft Fabric (TSQL) with virtual_environment_mode = "dev_only", models that have been removed from a plan continue to exist as snapshot records in the state. This causes the janitor to repeatedly attempt cleanup of non-existent (or already-dropped) models and fail.

Environment

  • Engine: Microsoft Fabric / TSQL
  • virtual_environment_mode = "dev_only"

Steps to reproduce

  1. Configure SQLMesh with a Fabric connection and virtual_environment_mode = "dev_only"
  2. Apply a plan that includes a model (e.g., my_schema.my_model)
  3. Remove the model from the project and apply a new plan
  4. Run the janitor (or wait for it to run automatically)
  5. Observe that the snapshot record for the removed model still exists in state

Expected behavior

After the janitor runs, the snapshot record for the removed model should be deleted from state.

Actual behavior

The snapshot record persists in state. The janitor logs errors on each subsequent run when attempting to clean up the model.

Root cause analysis

The issue is a two-step failure in the janitor's cleanup flow in sqlmesh/core/janitor.py:

snapshot_evaluator.cleanup(target_snapshots=batch.cleanup_tasks, ...)  # step 1
state_sync.delete_expired_snapshots(...)                                # step 2

delete_expired_snapshots (step 2) runs only if cleanup (step 1) succeeds. If step 1 raises an exception, the snapshot records are intentionally retained so that the next janitor run can retry. On Fabric, however, the retry never succeeds.
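To make the consequence of that ordering concrete, here is a minimal, self-contained sketch. It does not use SQLMesh: FakeState, failing_cleanup, working_cleanup and run_janitor_once are hypothetical stand-ins, and the try/except around the two steps is an assumption made for illustration only.

```python
class FakeState:
    """Hypothetical stand-in for the state store (not a SQLMesh class)."""

    def __init__(self, records):
        self.records = set(records)

    def delete_expired_snapshots(self, names):
        # Step 2: remove the snapshot records from state.
        self.records -= set(names)


def failing_cleanup(names):
    # Step 1 failing, e.g. a connection error after a catalog switch.
    raise RuntimeError("simulated cleanup failure")


def working_cleanup(names):
    # Step 1 succeeding: physical tables dropped without error.
    return None


def run_janitor_once(state, expired, cleanup):
    """Run both steps in order; return True only if both completed."""
    try:
        cleanup(expired)                         # step 1
        state.delete_expired_snapshots(expired)  # step 2, skipped if step 1 raised
        return True
    except Exception:
        # Assumption: the error is logged and the run ends; records stay for retry.
        return False


state = FakeState({"my_schema.my_model"})
attempts = [
    run_janitor_once(state, ["my_schema.my_model"], failing_cleanup)
    for _ in range(3)
]
print(attempts, state.records)
```

As long as step 1 keeps failing, every run ends the same way and the record never leaves state, which matches the behavior described above.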

The failure originates in _cleanup_snapshot in sqlmesh/core/snapshot/evaluator.py:

try:
    evaluation_strategy.delete(table_name, ...)
except Exception:
    if adapter.get_data_object(table_name) is not None:
        raise  # re-raises if table still exists
    logger.warning("Skipping cleanup ...")

In dev_only mode, snapshot.table_name(is_deployable=True) returns the original unversioned table name (e.g., my_schema.my_model) rather than a versioned sqlmesh__-prefixed name. This table lives in the user's actual Fabric warehouse (catalog), so accessing it requires a catalog switch.

Fabric's set_current_catalog implementation closes the connection pool and reopens it with a new catalog configuration (sqlmesh/core/engine_adapter/fabric.py). This teardown/rebuild cycle can fail in two ways:

  • Mode A: drop_table fails due to a connection/auth error triggered by the catalog switch. The fallback get_data_object call then also throws (same connection state issue), propagating the exception up and aborting the cleanup batch before delete_expired_snapshots is reached.
  • Mode B: drop_table fails, get_data_object returns non-None (the drop failed so the table still exists), the original exception is re-raised — same result.
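Both modes can be reproduced outside SQLMesh with a small simulation of the exception handling quoted above. FakeAdapter, cleanup_snapshot and outcome are hypothetical names introduced for this sketch; only the try/except shape mirrors the excerpt from _cleanup_snapshot, and the error types are assumed.

```python
import logging

logger = logging.getLogger("cleanup_sketch")


class FakeAdapter:
    """Hypothetical adapter whose drop and lookup calls can be made to fail."""

    def __init__(self, drop_error=None, lookup_error=None, table_exists=True):
        self.drop_error = drop_error
        self.lookup_error = lookup_error
        self.table_exists = table_exists

    def drop_table(self, table_name):
        if self.drop_error is not None:
            raise self.drop_error
        self.table_exists = False

    def get_data_object(self, table_name):
        if self.lookup_error is not None:
            raise self.lookup_error
        return table_name if self.table_exists else None


def cleanup_snapshot(adapter, table_name):
    # Same shape as the excerpt: drop, and on failure check whether the table exists.
    try:
        adapter.drop_table(table_name)
    except Exception:
        if adapter.get_data_object(table_name) is not None:
            raise  # table still exists, so the drop error propagates
        logger.warning("Skipping cleanup of %s: table not found", table_name)


def outcome(adapter, table_name="my_schema.my_model"):
    """Report whether cleanup completed or which error escaped."""
    try:
        cleanup_snapshot(adapter, table_name)
        return "completed"
    except Exception as exc:
        return f"raised: {exc}"


drop_failure = ConnectionError("drop failed")
# Mode A: the fallback lookup fails too, so its error escapes.
mode_a = outcome(FakeAdapter(drop_failure, ConnectionError("lookup failed")))
# Mode B: the lookup works and finds the table, so the drop error is re-raised.
mode_b = outcome(FakeAdapter(drop_failure, table_exists=True))
# Benign case: the table is already gone, so cleanup is skipped with a warning.
benign = outcome(FakeAdapter(drop_failure, table_exists=False))
print(mode_a, mode_b, benign, sep="\n")
```

Only the benign case returns normally. In Mode A and Mode B an exception leaves the function, which is what prevents delete_expired_snapshots from being reached.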

This is specific to dev_only mode because in full mode the physical tables use versioned sqlmesh__schema names that are less likely to require a catalog switch during cleanup.

Relevant code locations

  • sqlmesh/core/janitor.py: sequential cleanup → delete_expired_snapshots flow
  • sqlmesh/core/snapshot/evaluator.py: _cleanup_snapshot exception handling
  • sqlmesh/core/engine_adapter/fabric.py: set_current_catalog / _drop_catalog connection teardown
  • sqlmesh/core/state_sync/db/snapshot.py: get_expired_snapshots (snapshot expiry detection works correctly; the problem is in the cleanup step)

Possible fix

The get_data_object fallback call in _cleanup_snapshot should be made more resilient to Fabric connection errors — either by catching exceptions from get_data_object itself and treating them as "table unknown / skip", or by ensuring the Fabric adapter properly handles catalog context before the get_data_object query. Additionally, it may be worth investigating whether the catalog switch can be avoided entirely during janitor cleanup by using a fully-qualified table name query against INFORMATION_SCHEMA that does not require switching the active catalog.
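A rough sketch of the first option (catching errors from the fallback lookup and treating them as "table unknown / skip") could look like the following. This is not a patch against SQLMesh: FlakyAdapter and resilient_cleanup are hypothetical, and the returned status strings exist only to make the behavior easy to check.

```python
import logging

logger = logging.getLogger("resilient_cleanup_sketch")


class FlakyAdapter:
    """Hypothetical adapter; errors are injected to imitate a broken connection."""

    def __init__(self, drop_error=None, lookup_error=None, table_exists=True):
        self.drop_error = drop_error
        self.lookup_error = lookup_error
        self.table_exists = table_exists

    def drop_table(self, table_name):
        if self.drop_error is not None:
            raise self.drop_error

    def get_data_object(self, table_name):
        if self.lookup_error is not None:
            raise self.lookup_error
        return table_name if self.table_exists else None


def resilient_cleanup(adapter, table_name):
    """Drop a table; tolerate a failing existence check instead of aborting."""
    try:
        adapter.drop_table(table_name)
        return "dropped"
    except Exception:
        try:
            still_exists = adapter.get_data_object(table_name) is not None
        except Exception as lookup_error:
            # The existence check itself failed: treat the table as unknown and skip.
            logger.warning(
                "Could not verify %s (%s); skipping cleanup", table_name, lookup_error
            )
            return "skipped_unknown"
        if still_exists:
            raise  # the table is confirmed to exist, so surface the drop error
        logger.warning("Skipping cleanup of %s: table not found", table_name)
        return "skipped_missing"


status = resilient_cleanup(
    FlakyAdapter(ConnectionError("drop failed"), ConnectionError("lookup failed")),
    "my_schema.my_model",
)
print(status)
```

One trade-off to weigh: if a failed lookup is treated as a skip, the snapshot record can be deleted from state while the physical table still exists, leaving an orphaned table. Mode B (lookup succeeds and the table is still there) would still raise under this sketch, so it only addresses Mode A.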
