Skip to content

Fix: Include unexpired downstream views when cleaning up expired tables#5098

Closed
erindru wants to merge 6 commits intomainfrom
erin/janitor-drop-order
Closed

Fix: Include unexpired downstream views when cleaning up expired tables#5098
erindru wants to merge 6 commits intomainfrom
erin/janitor-drop-order

Conversation

@erindru
Copy link
Collaborator

@erindru erindru commented Aug 6, 2025

Fixes #5083 #4767

Prior to this, it was possible for the janitor to drop a snapshot table while another snapshot view still depended on it.

The reason is not related to DAG order - it turns out the janitor always drops in DAG order.

Rather, the reason was that it's possible to get into a situation where a snapshot table has expired (based on updated_ts + ttl_ms being earlier than the current time) but a downstream view that depends on it had not expired, because it was created later.

A naive implementation might simply just call DROP... CASCADE to automatically drop downstream objects but the problem with this is that while the physical objects are gone there will still be references in SQLMesh state.

So this PR updates the code that detects expired objects to also consider unexpired views downstream of expired tables.

Open question: Is it ever valid to drop an expired upstream dependency and not cascade that to all downstream dependencies? Should we be dropping all downstream unexpired tables too?

snapshots_by_version = defaultdict(set)
snapshots_by_dev_version = defaultdict(set)
for s in snapshots:
for s in sv_snapshots:
Copy link
Collaborator Author

@erindru erindru Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variable renames were just to placate mypy, because snapshot is used above to refer to objects of type Snapshot which does not match the objects of type SharedVersionSnapshot used here

@erindru erindru force-pushed the erin/janitor-drop-order branch 2 times, most recently from d06a9ef to 65b4c2e Compare August 6, 2025 06:13
@erindru erindru marked this pull request as ready for review August 6, 2025 06:32
@erindru erindru force-pushed the erin/janitor-drop-order branch 2 times, most recently from eeba4dd to 854f3ec Compare August 6, 2025 20:58
if not ignore_ttl:
expired_query = expired_query.where(
(exp.column("updated_ts") + exp.column("ttl_ms")) <= current_ts
((exp.column("updated_ts") + exp.column("ttl_ms")) <= current_ts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m wondering if it might make sense to break this into two steps, first querying only with the initial condition
(exp.column("updated_ts") + exp.column("ttl_ms")) <= current_ts
then only if that query returns results, we proceed with a second query for the views. the downside is that it adds an extra query in cases where there are expired tables.but I’m thinking about the scenario where there are no expired tables, where currently we still end up querying snapshots via full_candidates = self.get_snapshots(candidates.keys()) for all views and going through the full logic, even though there won’t be any results in the end. that said im not sure since both have tradeoffs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I understand the concern. I've broken it into the following steps (which only apply if ignore_ttl is not set):

  • Query a count of expired snapshots
  • If that count is >0, add in the OR clause to the query to fetch all view snapshots as well
  • If that count is 0, use it to short-circuit and return

adapter.create_table(long_table_name, {"col": exp.DataType.build("int")})


def test_janitor_drops_downstream_unexpired_hard_dependencies(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also add a test for a transitive dependency? For example:
Table A is expired ← View B (not expired) ← View C (not expired)
I believe this case was handled with the reversed dag before and that View C got picked up, but I’m not entirely certain without the dag construction now, so it’d be good to have a test to confirm

Copy link
Collaborator Author

@erindru erindru Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right, the drops weren't cascading through transitive dependencies. I've re implemented this to use a DAG again which simplified the implementation because it can be traversed in topological order and for any given node we can:

  • check if its expired (gets put directly in the expired list)
  • if it's a view, check if any of its parents are in the expired list. if they are, it should be expired too

I also updated the tests to test transitive dependencies too

@erindru erindru force-pushed the erin/janitor-drop-order branch from 78cc597 to 532d1c5 Compare August 8, 2025 00:29
Copy link
Contributor

@themisvaltinos themisvaltinos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, thanks for addressing comments and adding the extra tests

if snapshot.expiration_ts <= current_ts:
# All expired snapshots should be included regardless
expired_candidates[snapshot.snapshot_id] = snapshot.name_version
elif snapshot.model_kind_name == ModelKindName.VIEW:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be filtered by a database rather than the application layer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you mean? Line 242 adjusts the query to the database to include the views:

            if expired_record_count > 0:
                expired_query = t.cast(
                    exp.Select, expired_query.or_(exp.column("kind_name").eq(ModelKindName.VIEW))
                )

The DAG can only be built in the application layer, right? Since we have no way of knowing at the database level what views are affected, don't we have to select all views and filter them at the application layer?

We can't filter to expired views in the DB query because that's what caused this problem to begin with.

@erindru
Copy link
Collaborator Author

erindru commented Aug 12, 2025

After internal discussion, closing in favour of #5133

@erindru erindru closed this Aug 12, 2025
@erindru erindru deleted the erin/janitor-drop-order branch August 20, 2025 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected behaviour with FULL model kinds (DuckDB+Postgres)

3 participants