incremental models on postgres are only fast if all CTEs are ephemeral #335
Comments
Hey @TjrGithub, you've got a good point. Postgres is, structurally, a very different database from the analytics-forward data warehouses with which we most often work. I'm not surprised to hear that Postgres' optimizer fails to push down filters into the underlying views; older versions of Postgres (<12) treated CTEs as optimization fences, too. This was one of the biggest innovations of Redshift's optimizer in the early days of analytic databases.
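For reference, a minimal sketch of the CTE behavior mentioned above. The `events` table and its columns are hypothetical, chosen only for illustration. Since Postgres 12, a non-recursive, side-effect-free CTE that is referenced only once can be inlined into the outer query, and the MATERIALIZED / NOT MATERIALIZED keywords request either behavior explicitly:

```sql
-- Hypothetical table: events(event_id, user_id, occurred_at)

-- Postgres 12+: forces the pre-12 behavior. The CTE is computed in full
-- and acts as an optimization fence; the outer filter is applied afterwards.
with recent as materialized (
    select event_id, user_id, occurred_at
    from events
)
select *
from recent
where occurred_at > now() - interval '1 day';

-- Postgres 12+: asks the planner to inline the CTE, so the filter on
-- occurred_at can be pushed down to the scan of events.
with recent as not materialized (
    select event_id, user_id, occurred_at
    from events
)
select *
from recent
where occurred_at > now() - interval '1 day';
```

On versions before 12 these keywords are not available, and every CTE behaves like the first query.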
I think this is a great idea! I'm going to transfer this issue to our docs.getdbt.com repo, since I don't foresee a code change in this one. Would you be willing to contribute the additional context? I think a comment in this section may be appropriate.
I agree, that's where I was expecting this to be documented. As an aside, what's also missing from that section is how to handle views. Do I put the is_incremental stuff in there, too? At least, that might be a way around the optimization fence problem. But it just doesn't feel right, from a functional programming point of view. What kind of missing additional context do you have in mind?
The tricky thing about I think the larger context here is: Incremental modeling is an attempt to optimize database performance. As such, it depends significantly on the database you're using, and it relies heavily on your understanding of what that database's optimizer will and won't be able to do.
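One way to check what the optimizer will and won't do is to read the query plan. A rough sketch, assuming a hypothetical view `stg_events` defined over a hypothetical table `raw_events`, and a hypothetical incremental target table `fct_events`; none of these names come from this thread:

```sql
-- Run the same filter an incremental model would apply, and inspect the plan.
-- Note: ANALYZE actually executes the query, so try it on a safe environment.
explain (analyze, buffers)
select event_id, user_id, occurred_at
from stg_events
where occurred_at > (select max(occurred_at) from fct_events);
```

If the plan scans all of `raw_events` and applies the `occurred_at` condition only at a higher node, that suggests the filter was not pushed down. If the condition shows up on the scan of `raw_events` itself (for example as an index condition), it was.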
Does this boil down to "if Postgres and incremental, then all models should be tables"?
I think that's right: if you're creating an incremental model on postgres, it should be selecting from (and filtering against) a table. Out of curiosity, which version of Postgres are you running?
No, a view does not work. That is the point of this bug report. Postgres 12.4 from https://hub.docker.com/_/postgres
My mistake in writing that above, I think this is what I meant: "If you're creating an incremental model on postgres, it should be selecting from (and filtering against) a table, not a view or an ephemeral model."
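To make that concrete, here is a minimal sketch of an incremental model that selects from, and filters against, an upstream model assumed to be materialized as a table. The model names (`fct_events`, `stg_events`) and columns are hypothetical; `config`, `ref`, `is_incremental()` and `this` are standard dbt constructs:

```sql
-- models/fct_events.sql (hypothetical model name)
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    occurred_at
from {{ ref('stg_events') }}  -- assumption: stg_events is materialized as a table

{% if is_incremental() %}
-- On incremental runs, only pick up rows newer than what is already loaded.
where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```

Because the filter is applied directly to a physical table, whether it is cheap depends mostly on that table (for example, on an index on `occurred_at`) rather than on the planner seeing through a view.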
Closing this issue as
@runleonarun I ran into this issue a few weeks ago and had no guidance from the dbt docs on why using incremental models wasn't improving my model runs in postgres at all. The docs should be edited to include this information; it feels critical to new dbt users who are working in postgres.
Describe the bug
When making a postgres-backed model incremental, you would expect the incremental loads of raw data to be faster. Instead, they get slower. However, once you set all the contributing CTEs (separate dbt model files) to ephemeral, and make sure they are consumed in a single chain with exactly one descendant each (i.e. a selects from b, b from c, c from d, d from source, so that everything can be inlined), you do get the performance benefits.
That means: views are performance barriers that prevent the optimizer from pushing the is_incremental() filter down into the underlying queries that the views select from.
Maybe you could mention that in the documentation?
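As an illustration of the workaround described above, this is roughly what one intermediate model in such a chain could look like. The file name, source name, and columns are hypothetical; only the `ephemeral` materialization itself is taken from the report:

```sql
-- models/staging/stg_events.sql (hypothetical intermediate model)
-- Ephemeral models are not built in the database; dbt compiles them
-- into a CTE inside the model that refs them.
{{ config(materialized='ephemeral') }}

select
    event_id,
    user_id,
    occurred_at
from {{ source('app', 'raw_events') }}  -- hypothetical source definition
```

Since Postgres 12 can inline a CTE that is referenced only once, this would be consistent with the observation that the chain only speeds up when each model has a single descendant, though that explanation is an assumption and not something verified here.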
System information
Which database are you using dbt with?
The output of dbt --version:
The operating system you're using:
$ uname -a
Linux 21d117632cc9 5.3.0-55-generic #49-Ubuntu SMP Thu May 21 12:47:19 UTC 2020 x86_64 GNU/Linux
i.e. the official docker container for python:latest
The output of python --version:
Python 3.8.4