Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only run dbt deps when there are dependencies #1030

Merged
merged 1 commit into from
Jun 7, 2024
Merged

Conversation

tatiana
Copy link
Collaborator

@tatiana tatiana commented Jun 6, 2024

As of Cosmos 1.4, Cosmos will attempt to run dbt deps even if there are no dependencies.yml or packages.yml in the dbt project directory. This causes an unnecessary overhead in creating subprocesses without any benefit.

This problem was initially spotted by @AlgirdasDubickas, who created a pull request proposing a solution to the problem: #893

Despite the original PR becoming stale, the problem it addresses remains relevant.

This PR proposes a different implementation to solve the same problem. It addresses the issue from a rendering perspective (converting a dbt project into an Airflow DAG using LoadMode.DBT_LS) and an execution perspective (when Airflow worker nodes run/trigger dbt commands to be run when using ExecutionMode.LOCAL or ExecutionMode.VIRTUALENV).

Co-authored-by: AlgirdasDubickas 123624084+AlgirdasDubickas@users.noreply.github.com

@tatiana tatiana added the area:performance Related to performance, like memory usage, CPU usage, speed, etc label Jun 6, 2024
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 6, 2024
Copy link

netlify bot commented Jun 6, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 8fe85dc
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/66622d08c168880008a59492

@dosubot dosubot bot added dbt:deps Primarily related to dbt deps command or functionality execution:local Related to Local execution environment labels Jun 6, 2024
As of Cosmos 1.4, Cosmos will attempt to run dbt deps even if there are no dependencies.yml or packages.yml in the dbt project directory.
This causes an unecessary overhead in creating subprocesses, without any benefit.

This problem was originally spotted by Algirdas, who created a pull request proposing a solution to the problem:
https://github.com/astronomer/astronomer-cosmos/pull/893/commits

Since then, the PR became stale, but the problem remains relevant.

This PR proposes a different implementation to solve the same problem.

Co-authored-by: AlgirdasDubickas <123624084+AlgirdasDubickas@users.noreply.github.com>
@tatiana tatiana changed the title Optimise performance: only run dbt deps when there are dependencies Only run dbt deps when there are dependencies Jun 6, 2024
@tatiana tatiana added the area:rendering Related to rendering, like Jinja, Airflow tasks, etc label Jun 6, 2024
@tatiana tatiana added this to the Cosmos 1.5.0 milestone Jun 6, 2024
@tatiana tatiana self-assigned this Jun 6, 2024
Copy link

codecov bot commented Jun 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.77%. Comparing base (966c94a) to head (8fe85dc).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1030      +/-   ##
==========================================
+ Coverage   95.75%   95.77%   +0.01%     
==========================================
  Files          62       62              
  Lines        2965     2979      +14     
==========================================
+ Hits         2839     2853      +14     
  Misses        126      126              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jun 7, 2024
@tatiana tatiana merged commit 3a63bee into main Jun 7, 2024
64 checks passed
@tatiana tatiana deleted the optimise-dbt-deps branch June 7, 2024 10:12
pankajkoti pushed a commit that referenced this pull request Jun 7, 2024
As of Cosmos 1.4, Cosmos will attempt to run dbt deps even if there are
no `dependencies.yml` or `packages.yml` in the dbt project directory.
This causes an unnecessary overhead in creating subprocesses without any
benefit.

This problem was initially spotted by @AlgirdasDubickas, who created a
pull request proposing a solution to the problem:
#893

Despite the original PR becoming stale, the problem it addresses remains
relevant.

This PR proposes a different implementation to solve the same problem.
It addresses the issue from a rendering perspective (converting a dbt
project into an Airflow DAG using `LoadMode.DBT_LS`) and an execution
perspective (when Airflow worker nodes run/trigger dbt commands to be
run when using `ExecutionMode.LOCAL` or `ExecutionMode.VIRTUALENV`).

Co-authored-by: AlgirdasDubickas <123624084+AlgirdasDubickas@users.noreply.github.com>
(cherry picked from commit 3a63bee)
@pankajkoti pankajkoti mentioned this pull request Jun 7, 2024
pankajkoti added a commit that referenced this pull request Jun 7, 2024
Bug fixes

* Bring back ``dataset`` as a required field for BigQuery profile by
@pankajkoti in #1033

Enhancements

* Only run ``dbt deps`` when there are dependencies by @tatiana in #1030

Docs

* Fix docs so it does not reference non-existing ``get_dbt_dataset`` by
@tatiana in #1034

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
As of Cosmos 1.4, Cosmos will attempt to run dbt deps even if there are
no `dependencies.yml` or `packages.yml` in the dbt project directory.
This causes an unnecessary overhead in creating subprocesses without any
benefit.

This problem was initially spotted by @AlgirdasDubickas, who created a
pull request proposing a solution to the problem:
astronomer#893

Despite the original PR becoming stale, the problem it addresses remains
relevant.

This PR proposes a different implementation to solve the same problem.
It addresses the issue from a rendering perspective (converting a dbt
project into an Airflow DAG using `LoadMode.DBT_LS`) and an execution
perspective (when Airflow worker nodes run/trigger dbt commands to be
run when using `ExecutionMode.LOCAL` or `ExecutionMode.VIRTUALENV`).

Co-authored-by: AlgirdasDubickas <123624084+AlgirdasDubickas@users.noreply.github.com>
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
Bug fixes

* Bring back ``dataset`` as a required field for BigQuery profile by
@pankajkoti in astronomer#1033

Enhancements

* Only run ``dbt deps`` when there are dependencies by @tatiana in astronomer#1030

Docs

* Fix docs so it does not reference non-existing ``get_dbt_dataset`` by
@tatiana in astronomer#1034

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:performance Related to performance, like memory usage, CPU usage, speed, etc area:rendering Related to rendering, like Jinja, Airflow tasks, etc dbt:deps Primarily related to dbt deps command or functionality execution:local Related to Local execution environment lgtm This PR has been approved by a maintainer size:M This PR changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants