
dbt compile runs in 7+ minutes #243

Closed · dot2dotseurat opened this issue Dec 28, 2022 · 8 comments
Labels: bug, performance, Stale

@dot2dotseurat

Describe the bug


Running dbt compile takes upwards of 7 minutes, and sometimes more than 10 minutes. Project size is listed below, as reported by the dbt logs.

✗ dbt compile
15:48:46  Running with dbt=1.2.3
15:48:48  Found 537 models, 842 tests, 0 snapshots, 0 analyses, 551 macros, 0 operations, 42 seed files, 672 sources, 0 exposures, 0 metrics
15:48:48  
15:54:40  Concurrency: 8 threads (target='dev')
15:54:40  
15:55:14  Done.


Steps To Reproduce


The project is open-sourced here. The issue can be reproduced by cloning the repo, installing the pipenv included in the repo, and running dbt compile.

Expected behavior


I would expect compiling to run in close to 30 seconds, based on what I've seen for other dbt projects.


System information

The output of dbt --version:

Core:
  - installed: 1.2.3
  - latest:    1.3.1 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.2.3 - Update available!
  - spark:      1.2.0 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using:
Mac Big Sur version 11.5.2, M1 chip

The output of python --version:
Python 3.9.15


@dot2dotseurat dot2dotseurat added the bug Something isn't working label Dec 28, 2022
@ueshin (Collaborator) commented Dec 28, 2022

Hi @dot2dotseurat, thanks for the report.

I tried the project locally and it took about 4 minutes even without any tables created, and I'd guess it will be even worse with tables created.
It's spending a long time retrieving metadata from more than 200 schemas.

I suspect this is not an adapter issue, but rather how dbt-core builds its metadata cache, which requires running queries for each schema, IIUC.
cc @jtcohen6

@dot2dotseurat (Author)

Hi @ueshin,

Thank you so much for the quick reply.

As I understand it, dbt compile doesn't require a database connection to run. I just did a quick check, and dbt compile runs in the expected couple of seconds if I remove the database credentials from my profiles.yml (replacing host, token, etc. with placeholders). After restoring all of the credentials, the project again compiles in 7+ minutes.
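For reference, the placeholder check looked roughly like the following profiles.yml target (project name and all values are hypothetical placeholders, not the real credentials):

```yaml
# profiles.yml — with placeholder credentials, dbt can still parse the
# project, but anything that actually queries the warehouse will fail.
my_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: PLACEHOLDER.cloud.databricks.com
      http_path: /sql/1.0/endpoints/PLACEHOLDER
      token: PLACEHOLDER
      schema: dev
```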

So I think you are right that it's something about querying metadata, but it's odd that dbt compile can apparently run successfully without running those queries.

Anyway, I am happy to open an issue with dbt-core if @jtcohen6 agrees that's the better place for this inquiry.

Thanks again for the quick response.

@jtcohen6 (Contributor) commented Jan 5, 2023

Thanks for tagging me in @ueshin - sorry for the delay.

@dot2dotseurat This isn't quite right:

As I understand it, dbt compile doesn't require a database connection to run.

While dbt doesn't require a database connection to parse your project, or to list the resources from it, dbt compile does require a database connection, because some Jinja-SQL models may depend on introspective queries (implicitly or explicitly) in order to template out SQL. If you have any models containing macros like (e.g.) {{ is_incremental() }}, {{ run_query(...) }}, or {{ dbt_utils.get_column_values(...) }} — all of those require running queries first, or accessing the relation cache to understand which objects do or don't exist in the database, before dbt can produce the compiled SQL for that model. That remains true even though dbt is not actually running that model / its compiled SQL.
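As a minimal illustration (the model and table names are hypothetical), a model like the following can't be compiled offline: run_query() executes SQL at compile time, and is_incremental() consults the relation cache to learn whether the target table already exists:

```sql
-- models/events_rollup.sql (hypothetical example)
{% set statuses = run_query("select distinct status from raw.events") %}

select *
from {{ ref('stg_events') }}
{% if is_incremental() %}
  -- requires knowing whether {{ this }} already exists in the warehouse
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```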

At the same time, I understand that cache population is a time-consuming step, especially on Spark/Databricks today. The good news is that #231 should (hopefully) speed up cache population significantly; I'm guessing that change will be landing in the next release of dbt-databricks (v1.4).

I also just opened a dbt-core issue, proposing an optional config to skip cache population entirely (dbt-labs/dbt-core#6526), which may be faster or slower depending on the command, environment, and selection criteria.

@ueshin (Collaborator) commented Jan 5, 2023

@jtcohen6 Thanks for the reply.

The good news is that #231 should (hopefully) speed up cache population significantly

Unfortunately, #231 won't help this case. When building the cache, dbt-core asks adapters to run list_relations_without_caching for each schema. If the project defines many schemas, the method will still be called once per schema.
I guess it's the same with dbt-snowflake, dbt-bigquery, and other adapters?
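The per-schema pattern described above can be sketched roughly as follows. This is a simplified illustration, not dbt-core's actual code: the adapter method name matches dbt's real hook, but FakeAdapter, populate_cache, and the schema list are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

class FakeAdapter:
    """Stand-in for a dbt adapter; illustrative only."""

    def list_relations_without_caching(self, schema):
        # A real adapter would run a metadata query here,
        # e.g. `show tables in <schema>` on Spark/Databricks.
        return [f"{schema}.table_{i}" for i in range(2)]

def populate_cache(adapter, schemas, threads=8):
    # One metadata call per schema, parallelized up to `threads` workers —
    # so 200+ schemas means 200+ calls regardless of how fast each one is.
    cache = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(adapter.list_relations_without_caching, schemas)
        for schema, relations in zip(schemas, results):
            cache[schema] = relations
    return cache

cache = populate_cache(FakeAdapter(), [f"schema_{i}" for i in range(200)])
print(len(cache))  # one metadata call per schema -> 200 entries
```

Even with #231 making each call cheaper, the call count itself scales with the number of schemas, which is the bottleneck ueshin describes.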

@jtcohen6 (Contributor) commented Jan 9, 2023

When building the cache, dbt-core asks adapters to run list_relations_without_caching for each schema. If the project defines many schemas, the method will still be called once per schema.

This is true; the overall mechanism remains the same. The important difference is speed. In our testing several months ago, in the issues & draft PRs that eventually led to #231, we found that show tables in [schema] + show views in [schema] is significantly faster than show table extended in [schema] like '*'. Also, dbt will parallelize the queries for different schemas, up to the configured number of --threads.
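Concretely, the two approaches compared above look like this per schema (the schema name is a placeholder):

```sql
-- Pre-#231: one heavyweight metadata query per schema
SHOW TABLE EXTENDED IN my_schema LIKE '*';

-- With #231: two lightweight queries per schema
SHOW TABLES IN my_schema;
SHOW VIEWS IN my_schema;
```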

I guess it's the same with dbt-snowflake, dbt-bigquery, or other adapters?

Correct, it's the same. On other adapters, the biggest difference is that the metadata queries powering cache population are quite fast. On dbt-bigquery, we skip running a query entirely, and just make an API call.

Even then, we do run into performance issues at serious scale (e.g. dbt-labs/dbt-snowflake#83, dbt-labs/dbt-bigquery#205). Hence the interest in allowing users to either do partial caching (supported as an experimental config), or skip cache population entirely (new issue/proposal I linked above).
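For the partial-caching option mentioned above, the experimental config can be set in profiles.yml (a sketch, assuming dbt-core v1.1+ where cache_selected_only was introduced; check the dbt docs for your version):

```yaml
# profiles.yml — experimental: only cache schemas for selected resources
config:
  cache_selected_only: True
```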

@ueshin (Collaborator) commented Jan 10, 2023

@dot2dotseurat Is it possible to upgrade dbt to 1.3?
If possible, could you try dbt-databricks 1.3.3rc1 that contains #231 to see how much it can improve?

@dot2dotseurat (Author)

@ueshin I gave it a shot and I didn't see an improvement. :/

@github-actions

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.
