Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor dbt ls to run from a temporary directory #414

Merged
merged 8 commits into from Aug 14, 2023

Conversation

tatiana
Copy link
Collaborator

@tatiana tatiana commented Jul 28, 2023

As of Cosmos 1.0.0, LoadMode.DBT_LS runs dbt ls from within the original dbt project directory.

The dbt ls outputs files to the directory it's running from unless the environment variables DBT_LOG_PATH and DBT_TARGET_PATH are specified (as of dbt 1.6).

Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory. This can lead to an error message similar to the following:

20:43:06  Encountered an error:
[Errno 30] Read-only file system: '/usr/local/airflow/dags/dbt_grindr/logs/dbt.log'

This PR changes the behavior of dbt ls to try to make the dbt ls artifacts (logs and target directory) not be written to the original project directory.

In addition to the introduced test, this change was validated using airflow 2.6 and dbt 1.6, by following these steps:
(1) Delete folders logs and target from astronomer-cosmos/dev/dags/dbt/jaffle_shop
(2) Add a breakpoint after stdout, stderr = process.communicate() in dbt/graph.py
(3) Run a DAG that uses astronomer-cosmos/dev/dags/dbt/jaffle_shop, e.g.:

airflow dags test basic_cosmos_dag `date -Iseconds`

(4) When the breakpoint happens, check that no target or logs folder was created after running dbt ls in astronomer-cosmos/dev/dags/dbt/jaffle_shop

A limitation with the current approach is that, although dbt ls is not creating these directories in the given circumstances, if the user is using the local executor or an earlier version of dbt, the files will still be written to the project directory.

Closes: #411

@netlify
Copy link

netlify bot commented Jul 28, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 7bd4e30
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/64d6136543a5b3000778ba9e

@codecov
Copy link

codecov bot commented Jul 28, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.07% 🎉

Comparison is base (1233f9c) 91.43% compared to head (7bd4e30) 91.51%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #414      +/-   ##
==========================================
+ Coverage   91.43%   91.51%   +0.07%     
==========================================
  Files          50       50              
  Lines        1752     1768      +16     
==========================================
+ Hits         1602     1618      +16     
  Misses        150      150              
Files Changed Coverage Δ
cosmos/constants.py 100.00% <100.00%> (ø)
cosmos/dbt/graph.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tatiana tatiana marked this pull request as ready for review July 28, 2023 15:58
@tatiana tatiana requested review from a team as code owners July 28, 2023 15:58
tests/dbt/test_graph.py Outdated Show resolved Hide resolved
cosmos/dbt/graph.py Outdated Show resolved Hide resolved
cosmos/dbt/graph.py Outdated Show resolved Hide resolved
As of Cosmos 1.0.0, `LoadMode.DBT_LS` ran `dbt ls` from within the original dbt project directory.

The `dbt ls` outputs files to the directory it's running from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified.

Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory. This PR changes the behavior of `dbt ls` to make a copy of the original project directory into a temporary directory and run the command `dbt ls` from there.

Closes: #411
…a separate dir

Unfortunately this does not work in dbt 1.5 or previous versions
@tatiana
Copy link
Collaborator Author

tatiana commented Aug 10, 2023

Thanks, @jlaneve , I think I addressed all the feedback!

@tatiana
Copy link
Collaborator Author

tatiana commented Aug 10, 2023

A different approach we could adopt to solve this issue is not to pass project-dir to dbt-ls and to create symbolic links for the original project-dir files to the temp-dir. This way, the target & logs will be written to the temp dir.

Still, this change will not solve the limitation that ATM local operators run things locally. @jlaneve @harels is the scope of this ticket only dbt ls or all local operator executions?

cosmos/dbt/graph.py Show resolved Hide resolved
@tatiana tatiana merged commit 41bf623 into main Aug 14, 2023
41 checks passed
@tatiana tatiana deleted the issue-411-dbt-ls-tmp-dir branch August 14, 2023 09:09
tatiana added a commit that referenced this pull request Aug 16, 2023
Feature (pending documentation!)
* Support dbt global flags (via dbt_cmd_global_flags in `operator_args` by @tatiana in #469

Enhancements
* Hide sensitive field when using BigQuery keyfile_dict profile mapping by @jbandoro in #471

Bug fixes
* Fix bug on select node add exclude selector subset ids logic by @jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only file system errors during DAG parsing, by @tatiana in #414

Others
* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467"
i
@tatiana tatiana mentioned this pull request Aug 16, 2023
tatiana added a commit that referenced this pull request Aug 29, 2023
Since Cosmos 1.0, `load_method.DBT_LS` is the default dbt project
parsing method, unless the user gives a manifest.

Using the original dbt project path has been a source of issues when
that path is Read-Only. This issue was faced when running commands that
generate `{project-dir}/target/` and `{project-dir}/logs/`, which was
solved as part of #414.

This issue is particularly problematic if we want to run `dbt deps` from
the original project directory since dbt 1.6 saves adaptors to
`{project_dir}/dbt_packages` unless specified in the user's
`dbt_project.yml`. To our knowledge, dbt currently does not allow users
to define this directory via flags or environment variables, as
discussed in #481.

This change aims to solve these issues, by creating a temporary
directory and creating symbolic links to the original directory.

Finally, during the development of this task, it was observed that when
running dbt ls in a project with `packages.yml`, dbt raises a
'Compilation Error'. Since dbt may raise other errors in stdout, this PR
captures "Errors" more generically - making it more evident potential
issues to the end-users.
@tatiana tatiana mentioned this pull request Sep 6, 2023
tatiana added a commit that referenced this pull request Sep 6, 2023
**Features**

* Support dbt global flags (via dbt_cmd_global_flags in operator_args)
by @tatiana in #469
* Support parsing DAGs when there are no connections by @jlaneve in #489

**Enhancements**

* Hide sensitive field when using BigQuery keyfile_dict profile mapping
by @jbandoro in #471
* Consistent Airflow Dataset URIs, inlets and outlets with `Openlineage
package <https://pypi.org/project/openlineage-integration-common/>`_ by
@tatiana in #485. `Read more
<https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html>`_.
* Refactor ``LoadMethod.DBT_LS`` to run from a temporary directory with
symbolic links by @tatiana in #488
* Run ``dbt deps`` when using ``LoadMethod.DBT_LS`` by @DanMawdsleyBA in
#481
* Update Cosmos log color to purple by @harels in #494
* Change operators to log ``dbt`` commands output as opposed to
recording to XCom by @tatiana in #513

**Bug fixes**

* Fix bug on select node add exclude selector subset ids logic by
@jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only
file system errors during DAG parsing, by @tatiana in #414
* Fix profile_config arg in DbtKubernetesBaseOperator by @david-mag in
#505
* Fix SnowflakePrivateKeyPemProfileMapping private_key reference by
@nacpacheco in #501
* Fix incorrect temporary directory creation in VirtualenvOperator init
by @tatiana in #500
* Fix log propagation issue by @tatiana in #498
* Fix PostgresUserPasswordProfileMapping to retrieve port from
connection by @jlneve in #511

**Others**

* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467
* Drop support to Python 3.7 in the CI test matrix by @harels in #490
* Add Airflow 2.7 to the CI test matrix by @tatiana in #487
* Add MyPy type checks to CI since we exceeded pre-commit disk quota
usage by @tatiana in #510
@dwreeves dwreeves mentioned this pull request Feb 12, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write logs to a temp dir when using dbt_ls parsing mode
3 participants