Optimize insert-only merge to not rewrite existing files #246
A common pattern used with merge is to deduplicate new data against the data already in the Delta table in the following way.
Here there is no update clause, only an insert clause, so ideally only new files (containing the new, non-matching data from the source) should be added to the table (i.e. the operation should be append-only). However, the current implementation of merge does not optimize this case. This leads to the following problems.
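To make the append-only expectation concrete, here is a minimal sketch in plain Python. It does not use the Delta Lake API; the function name `rows_to_insert`, the `key` column, and the sample rows are all hypothetical and chosen only for illustration. It selects the source rows that have no matching key in the table, which is what a left anti-join would return, and appends only those.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def rows_to_insert(source: Iterable[Row], target: Iterable[Row], key: str = "key") -> List[Row]:
    """Left anti-join: keep the source rows whose key has no match in target.

    `key` is a hypothetical column name used only for this illustration.
    """
    target_keys = {row[key] for row in target}
    return [row for row in source if row[key] not in target_keys]


# Hypothetical sample data: key 2 already exists in the table, key 3 does not.
target = [{"key": 1, "value": "a"}, {"key": 2, "value": "b"}]
source = [{"key": 2, "value": "b2"}, {"key": 3, "value": "c"}]

new_rows = rows_to_insert(source, target)

# Existing rows are left untouched; only the non-matching row is appended.
appended = target + new_rows
```

In this sketch the existing rows are never rewritten, which mirrors the append-only outcome described above. Duplicate keys within the source itself are not removed here.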
The proposed solution is to optimize this case by performing an anti-join of the source data against the table and inserting only the resulting non-matching rows. This will