Read the data only once for incremental models using the merge strategy #455
Comments
First of all, thank you for the issue and for raising such a relevant point. The behavior that you observed is most likely because we use a tmp table to understand which columns to add in case of a schema change.
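To make the comparison step concrete, here is a minimal sketch of diffing the tmp table's columns against the destination's columns. The function name `diff_columns`, the example column names, and the `{column_name: data_type}` representation are all illustrative assumptions, not the adapter's actual code.

```python
from typing import Dict, List, Tuple


def diff_columns(
    source: Dict[str, str], target: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two {column_name: data_type} mappings.

    Returns (added, removed, retyped), where:
      added   = columns present in source but missing from target
      removed = columns present in target but missing from source
      retyped = columns present in both with a different data type
    """
    added = [col for col in source if col not in target]
    removed = [col for col in target if col not in source]
    retyped = [
        col for col in source if col in target and source[col] != target[col]
    ]
    return added, removed, retyped


# Hypothetical schemas: the tmp table carries one column the destination lacks.
tmp_schema = {"id": "bigint", "name": "varchar", "created_at": "timestamp"}
target_schema = {"id": "bigint", "name": "varchar"}

added, removed, retyped = diff_columns(tmp_schema, target_schema)
print(added)    # ['created_at']
print(removed)  # []
print(retyped)  # []
```

The diff itself only needs column names and types, not the rows, which is why the schema check does not in principle require materializing the data.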
I think there may be a way to discover schema changes and avoid creating a temp table in all cases: there's a trick where running a […]. Instead of running a CTAS and looking at the resulting table to determine if the schema has changed, the […]. @nicor88 do you think this is feasible?
In theory the approach should work, but the core logic to detect schema changes is in […]. I didn't investigate much, but it seems that […]
Incremental models can in the worst case end up costing double because data is written to a temp table before being inserted into the destination. Even when there are reductions in the size of the data from the source to the destination, the cost will be higher than if the data was written to the destination directly. It would be great to have an option to not use a temp table in cases where it is not strictly needed, to be able to reduce costs.
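As a rough illustration of the "double" figure, here is a sketch under a deliberately simplified cost model: cost is assumed proportional to bytes read, the source is read once to build the temp table, and the temp table is read once more to write into the destination. The function name `scan_ratio` and the byte counts are hypothetical.

```python
def scan_ratio(source_bytes: int, temp_bytes: int) -> float:
    """Bytes read when going through a temp table, relative to writing directly.

    Assumed model (a simplification, not a pricing formula):
      with temp table: read the source, then read the temp table
      direct write:    read the source only
    """
    with_temp = source_bytes + temp_bytes
    direct = source_bytes
    return with_temp / direct


# Worst case: the temp table is as large as the source.
worst = scan_ratio(100, 100)
print(worst)  # 2.0

# The query shrinks the data to a quarter of the source size.
reduced = scan_ratio(100, 25)
print(reduced)  # 1.25
```

Under this model the ratio never drops below 1, which matches the point above: even when the data shrinks between source and destination, the temp table still adds cost.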