fix: Use Insert/Overwrite for Replace to Fix Dataframe Batching #2031
    ctx.compare_with_current(table, replace_data)


def test_replace_query_batched(ctx: TestContext):
This test was added specifically to cover the issue users reported; it used to fail for some engines.
sqlmesh/core/engine_adapter/base.py
Could this result in line 326 running as well?
Yeah, but that is fine, right? Since the query self-references, it would do a CREATE OR REPLACE AS SELECT... that references the table itself, so it would actually run. Without this it would error, since it would reference a table that doesn't exist.
What happened before this was added?
Looks like it would error if self-referencing: https://github.com/TobikoData/sqlmesh/blob/c0bbe57041b090bd8c802e8ca04174040d8ba2f8/sqlmesh/core/engine_adapter/base.py#L618-L626
Note that we wouldn't see this within SQLMesh, since there we always have a table, but this now properly ensures that a table exists regardless of context.
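For readers following this thread, here is a minimal sketch of the self-referencing case. It is not the code in `base.py`: it uses SQLite from the Python standard library as a stand-in engine, the function name `replace_self_referencing` and the `staged` temp table are hypothetical, and DELETE followed by INSERT stands in for INSERT/OVERWRITE, which SQLite does not have. The only point is that a replacement query that reads from its own target needs the target to exist first.

```python
import sqlite3


def replace_self_referencing(conn, table, columns_ddl, select_sql):
    """Replace `table` with the result of `select_sql`, which may read from `table`.

    Hypothetical helper for illustration only; not SQLMesh's API.
    """
    # Guard: create the target if it is missing, so a self-referencing
    # SELECT can resolve it instead of failing with "no such table".
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns_ddl})")
    # Stage the result first, because the query reads the table
    # that is about to be overwritten.
    conn.execute(f"CREATE TEMP TABLE staged AS {select_sql}")
    # DELETE + INSERT stands in for INSERT/OVERWRITE here.
    conn.execute(f"DELETE FROM {table}")
    conn.execute(f"INSERT INTO {table} SELECT * FROM staged")
    conn.execute("DROP TABLE staged")


conn = sqlite3.connect(":memory:")
# The table does not exist yet; the guard makes this call succeed.
replace_self_referencing(conn, "t", "id INTEGER", "SELECT id + 1 AS id FROM t")
conn.execute("INSERT INTO t VALUES (1), (2)")
# Now the query replaces the table using its own current contents.
replace_self_referencing(conn, "t", "id INTEGER", "SELECT id + 1 AS id FROM t")
rows = conn.execute("SELECT id FROM t ORDER BY id").fetchall()
# rows == [(2,), (3,)]
```

Without the `CREATE TABLE IF NOT EXISTS` guard, the first call would raise `sqlite3.OperationalError`, which mirrors the error described above for a self-referencing query against a missing table.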
Some engines don't support "CREATE OR REPLACE AS SELECT...", which means we had to come up with another way to replace the contents of a given table. Prior to this PR, some engines had custom logic to do this replacement. However, you can think of a `CREATE OR REPLACE AS SELECT...` as an INSERT/OVERWRITE where the table already exists. Therefore, this change makes `replace_query` use INSERT/OVERWRITE when it can't replace. Since INSERT/OVERWRITE already properly supports batching, this resolves the issues users were reporting while simplifying the code.
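A rough sketch of the idea, assuming a toy in-memory engine rather than a real adapter. `ToyEngine`, `supports_replace_table`, `create_or_replace_as`, `insert_append`, and `insert_overwrite` are hypothetical names chosen for illustration and do not reflect SQLMesh's actual classes or signatures. The sketch assumes that an overwrite discards the existing contents once and then appends every batch, which is one plausible reading of "properly supports batching".

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]


class ToyEngine:
    """In-memory stand-in for an engine adapter (hypothetical, not SQLMesh's API)."""

    def __init__(self, supports_replace_table: bool) -> None:
        self.supports_replace_table = supports_replace_table
        self.tables: Dict[str, List[Row]] = {}

    def create_or_replace_as(self, table: str, rows: Iterable[Row]) -> None:
        # Models CREATE OR REPLACE TABLE ... AS SELECT ...
        if not self.supports_replace_table:
            raise NotImplementedError("engine lacks CREATE OR REPLACE TABLE")
        self.tables[table] = list(rows)

    def insert_append(self, table: str, rows: Iterable[Row]) -> None:
        # Models a plain INSERT INTO ... that keeps existing rows.
        self.tables[table].extend(rows)

    def insert_overwrite(self, table: str, batches: Iterable[List[Row]]) -> None:
        # Models INSERT/OVERWRITE: discard the existing contents once,
        # then append every batch so no batch clobbers another.
        self.tables[table] = []
        for batch in batches:
            self.insert_append(table, batch)

    def replace_query(self, table: str, batches: Iterable[List[Row]]) -> None:
        batches = list(batches)
        if self.supports_replace_table:
            # Replace with the first batch, then append the remaining ones.
            self.create_or_replace_as(table, batches[0] if batches else [])
            for batch in batches[1:]:
                self.insert_append(table, batch)
        else:
            # Fallback described in this PR: use INSERT/OVERWRITE
            # when the engine can't replace.
            self.insert_overwrite(table, batches)


engine = ToyEngine(supports_replace_table=False)
engine.tables["t"] = [(0, "old")]
engine.replace_query("t", [[(1, "a")], [(2, "b")]])
# engine.tables["t"] == [(1, "a"), (2, "b")]
```

Both paths end with the same table contents for the same batches, so callers of `replace_query` in this sketch don't need to know which strategy the engine used.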