incremental models on postgres are only fast if all CTEs are ephemeral #335

TjrGithub · 2020-08-07T15:08:52Z

Describe the bug

When making a postgres-backed model incremental, you would expect the incremental loads of raw data to be faster. Instead, they get slower. However, once you set all the contribution CTEs (separate DBT files) to ephemeral, and make sure the CTEs are consumed in a row with one descendant only (i.e.: with a from b, b from c, c from d, d from source -- so that it can be inlined), you do get the performance benefits.
That means: Views are performance barriers that prevent the optimizer from pushing down the {{is_incremental}} stuff to the underlying queries from which the views are selected.

Maybe you could mention that in the documentation?

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

Which database are you using dbt with?

The output of dbt --version:

installed version: 0.17.1
   latest version: 0.17.2

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - postgres: 0.17.1
  - redshift: 0.17.1
  - snowflake: 0.17.1
  - bigquery: 0.17.1

The operating system you're using:
$ uname -a
Linux 21d117632cc9 5.3.0-55-generic dbt-labs/dbt-core#49-Ubuntu SMP Thu May 21 12:47:19 UTC 2020 x86_64 GNU/Linux
i.e. the official docker container for python:latest

The output of python --version:
Python 3.8.4

Additional context

Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2020-08-07T18:27:18Z

Hey @TjrGithub, you've got a good point. Postgres is, structurally, a very different databases from the analytics-forward data warehouses with which we most often work. I'm not surprised to hear that Postgres' optimizer fails to push down filters into the underlying views; older versions of Postgres (<12) treated CTEs as optimization fences, too. This was one of the biggest innovations of Redshift's optimizer, in the early days of analytic databases.

Maybe you could mention that in the documentation?

I think this is a great idea! I'm going to transfer this issue to our docs.getdbt.com repo, since I don't foresee a code change in this one. Would you be willing to contribute the additional context? I think a comment in this section may be appropriate.

TjrGithub · 2020-08-07T20:19:07Z

I agree, that's where I was expecting this to be documented.

As an aside, what's also missing from that section is how to handle views. Do I put the is_incremental stuff in there, too? At least, that might be a way around the optimization fence problem. But it just doesn't feel right, from a functional programming point of view.

What kind of missing additional context to you have in mind?

jtcohen6 · 2020-08-09T19:14:05Z

The tricky thing about is_incremental() is that it checks four conditions, one of which is that the current model is materialized as incremental. For that reason, if you include is_incremental() in a view or ephemeral model, the condition will never resolve to true.

I think the larger context here is: Incremental modeling is an attempt to optimize database performance. As such it depends significantly on the database you're using, and it relies heavily on your understanding of what that database's optimizer will and won't be able to do.

TjrGithub · 2020-08-21T08:09:47Z

Does this boil down to "if Postgres and incremental, then all models should be tables"?
Making all views ephemeral does not work in my testing - it completely unhinges the query optimizer, and what previously took 30 seconds now takes 5 hours.

jtcohen6 · 2020-08-21T15:15:59Z

I think that's right—if you're creating an incremental model on postgres, it should be selecting from (and filtering against) a table ~~or a view~~, not an ephemeral model.

Out of curiosity, which version of Postgres are you running?

TjrGithub · 2020-09-17T11:03:38Z

No, a view does not work. That is the point of this bug report.

Postgres 12.4 from https://hub.docker.com/_/postgres

jtcohen6 · 2020-09-17T12:33:06Z

My mistake in writing that above, I think this is what I meant:

"If you're creating an incremental model on postgres, it should be selecting from (and filtering against) a table, not a view or an ephemeral model."

runleonarun · 2023-03-15T18:27:33Z

Closing this issue as won't fix. If you believe I have closed this in error, then let me know and I willput it on the icebox instead of closing completely.

kyletl · 2023-03-21T16:40:46Z

@runleonarun I ran into this issue a few weeks ago and had no guidance from the dbt docs on why using incremental models wasn't improving my model runs in postgres at all. The docs should be edited with this information, it feels critical to new dbt users who are working in postgres

REPO SYNC - Public to Private

jtcohen6 transferred this issue from dbt-labs/dbt-core Aug 7, 2020

runleonarun closed this as completed Mar 15, 2023

nghi-ly pushed a commit that referenced this issue Feb 13, 2024

Merge pull request #335 from dbt-labs/repo-sync

19e9003

REPO SYNC - Public to Private

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

incremental models on postgres are only fast if all CTEs are ephemeral #335

incremental models on postgres are only fast if all CTEs are ephemeral #335

TjrGithub commented Aug 7, 2020

jtcohen6 commented Aug 7, 2020 •

edited

Loading

TjrGithub commented Aug 7, 2020

jtcohen6 commented Aug 9, 2020

TjrGithub commented Aug 21, 2020 •

edited

Loading

jtcohen6 commented Aug 21, 2020 •

edited

Loading

TjrGithub commented Sep 17, 2020

jtcohen6 commented Sep 17, 2020

runleonarun commented Mar 15, 2023

kyletl commented Mar 21, 2023

incremental models on postgres are only fast if all CTEs are ephemeral #335

incremental models on postgres are only fast if all CTEs are ephemeral #335

Comments

TjrGithub commented Aug 7, 2020

Describe the bug

Steps To Reproduce

Expected behavior

Screenshots and log output

System information

Additional context

jtcohen6 commented Aug 7, 2020 • edited Loading

TjrGithub commented Aug 7, 2020

jtcohen6 commented Aug 9, 2020

TjrGithub commented Aug 21, 2020 • edited Loading

jtcohen6 commented Aug 21, 2020 • edited Loading

TjrGithub commented Sep 17, 2020

jtcohen6 commented Sep 17, 2020

runleonarun commented Mar 15, 2023

kyletl commented Mar 21, 2023

jtcohen6 commented Aug 7, 2020 •

edited

Loading

TjrGithub commented Aug 21, 2020 •

edited

Loading

jtcohen6 commented Aug 21, 2020 •

edited

Loading