Skip to content

fix: Use Insert/Overwrite for Replace to Fix Dataframe Batching#2031

Merged
eakmanrq merged 3 commits intomainfrom
eakmanrq/fix_df_batching_full_refresh
Jan 26, 2024
Merged

fix: Use Insert/Overwrite for Replace to Fix Dataframe Batching#2031
eakmanrq merged 3 commits intomainfrom
eakmanrq/fix_df_batching_full_refresh

Conversation

@eakmanrq
Copy link
Contributor

Some engines don't support "CREATE OR REPLACE AS SELECT..." which means we had to come up with another way to replace the contents in a given table. Prior to this PR, some engines had custom logic to do this replacement. You could though think of a CREATE OR REPLACE AS SELECT... as a INSERT/OVERWRITE where the table already exists. Therefore this change makes replace_query use INSERT/OVERWRITE when it can't replace. Since INSERT/OVERWRITE already properly supports batching this will resolve the issues users were reporting while simplifying the code.

@eakmanrq eakmanrq marked this pull request as draft January 26, 2024 21:12
@eakmanrq eakmanrq force-pushed the eakmanrq/fix_df_batching_full_refresh branch from 0914992 to b98b81e Compare January 26, 2024 21:38
ctx.compare_with_current(table, replace_data)


def test_replace_query_batched(ctx: TestContext):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added to specifically test the issue users reported (and used to fail for some engines).

@eakmanrq eakmanrq force-pushed the eakmanrq/fix_df_batching_full_refresh branch from b2cecad to c9d73ec Compare January 26, 2024 21:57
@eakmanrq eakmanrq marked this pull request as ready for review January 26, 2024 22:07
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this result in line 326 running as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but that is fine right? Since it self-references then it would do a CREATE OR REPLACE AS SELECT... and reference itself so it would actually run. Without this it would error since it would reference a table that doesn't exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened before this was added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we wouldn't see this within SQLMesh since we always have a table but this now properly ensures that is the case regardless of context.

@eakmanrq eakmanrq force-pushed the eakmanrq/fix_df_batching_full_refresh branch from c9d73ec to c0bbe57 Compare January 26, 2024 22:16
@eakmanrq eakmanrq enabled auto-merge (squash) January 26, 2024 22:44
@eakmanrq eakmanrq merged commit 56ef74e into main Jan 26, 2024
@eakmanrq eakmanrq deleted the eakmanrq/fix_df_batching_full_refresh branch January 26, 2024 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants