How can I improve on performance #840
Comments
🤖 Hey @zeedevio! Great to see you back here, bringing more interesting challenges to the table. Hope we can tackle this one together too! 🚀 Based on the information you've provided, there are several potential areas to explore to improve the performance of your Airflow DAG when executing dbt transformation models.

1. It appears that you're installing dependencies every time the DAG runs, which can significantly slow down execution. If your dependencies don't change frequently, consider installing them once and reusing that environment.
2. dbt supports parallel model execution, which can significantly speed up the transformation process, but this may not be fully utilized when running dbt within Airflow, depending on your Airflow configuration. Check that your configuration allows parallel task execution (see the sketch after this comment).
3. Performance also depends on the resources allocated to your Airflow and dbt tasks. Ensure your tasks have enough memory and CPU to run efficiently.
4. The volume and complexity of the data being processed matter as well. If you're dealing with large volumes of data or complex transformations, consider optimizing your dbt models or breaking the transformations into smaller, more manageable tasks.
5. If your Airflow instance and the database are not on the same network, there may be latency issues. Consider bringing them closer together, for example in the same VPC or via a direct connection.

Please note that these are general suggestions; the exact solution depends on your specific setup and requirements. If you could share more about your current Airflow configuration, particularly around parallel task execution, that would help in giving more specific advice.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot. |
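To make the parallel-execution point concrete, here is a minimal illustrative sketch (not from this thread) for checking the Airflow settings that most often cap how many task instances run at once. It assumes Airflow 2.x option names, which can differ between versions:

```python
# Illustrative sketch: inspect the Airflow settings that commonly limit how many
# dbt model tasks can run concurrently. Option names assume Airflow 2.x.
from airflow.configuration import conf

# Maximum task instances that can run concurrently across the whole Airflow instance.
print("core.parallelism =", conf.getint("core", "parallelism"))

# Maximum concurrent task instances per DAG (this key name is Airflow 2.2+).
print("core.max_active_tasks_per_dag =", conf.getint("core", "max_active_tasks_per_dag"))

# With the Celery executor, worker concurrency is another common bottleneck.
if conf.get("core", "executor") == "CeleryExecutor":
    print("celery.worker_concurrency =", conf.getint("celery", "worker_concurrency"))
```

Raising these limits (together with enough worker CPU and memory) is what allows more of the 56 models to run in parallel.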
Having a very similar performance issue. |
Hi, @zeedevio, I'm helping the Cosmos team manage our backlog and am marking this issue as stale. From what I understand, you opened this issue to address the slow performance of running DBT transformation models using Astro Airflow locally. In response, I provided several potential areas to explore for improving the performance, and kzajaczkowski commented that they are experiencing a similar performance issue. Could you please confirm if this issue is still relevant to the latest version of the Cosmos repository? If it is, please let the Cosmos team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you! |
Hi, @zeedevio. This is something we're currently working on and trying to improve. There have been significant improvements in Cosmos 1.4:
Please, could you confirm how the performance is for you after upgrading? |
Some progress: #1014. |
Hey, good day. In my case I used GCP Composer. What I did to speed up transformations was to update the Airflow configuration:
|
PR #1014 is significantly improving the performance with |
If using
Some ways to improve the performance using Cosmos 1.4 (see the sketch after this list for how the options fit together):

1. Can you pre-compile your dbt project? If yes, this would remove that responsibility from the Airflow DAG processor, greatly reducing the DAG parsing time. You could try this by using `LoadMode.DBT_MANIFEST` and specifying the path to the manifest file.
   More information: https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-manifest
2. If you need to use `LoadMode.DBT_LS`, can you make your dbt dependencies available ahead of time (for example, as part of your deployment)? If yes, this will avoid Cosmos having to run `dbt deps` every time before running any dbt command, both in the scheduler and worker nodes. In that case, you should set the corresponding Cosmos options (see the sketch below).
   More info:
3. If you need to use `LoadMode.DBT_LS`, is your dbt project large? Could you use selectors to select a subset?
   More info: https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html |
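As a rough illustration of how the suggestions above map onto the Cosmos 1.4 Python API, here is a minimal sketch of a `DbtDag`. The DAG id, schedule, profile details, and selector are placeholders, and the parameter names (`manifest_path`, `load_method`, `dbt_deps`, `select`, `install_deps`) should be checked against the Cosmos documentation for the version you run:

```python
# Hedged sketch only: paths, profile, schedule, and selector values are placeholders.
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig, RenderConfig
from cosmos.constants import LoadMode

dbt_transformations = DbtDag(
    dag_id="dbt_transformations",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/include/dbt/",
        # 1. Point Cosmos at a manifest compiled ahead of time (e.g. in CI),
        #    so the DAG processor never has to invoke dbt while parsing.
        manifest_path="/usr/local/airflow/include/dbt/target/manifest.json",
    ),
    profile_config=ProfileConfig(
        profile_name="my_profile",  # placeholder
        target_name="dev",          # placeholder
        profiles_yml_filepath="/usr/local/airflow/include/dbt/profiles.yml",
    ),
    render_config=RenderConfig(
        load_method=LoadMode.DBT_MANIFEST,  # 1. parse the project from the manifest
        dbt_deps=False,                     # 2. skip `dbt deps` during parsing (matters most with LoadMode.DBT_LS)
        select=["tag:daily"],               # 3. hypothetical selector to render only a subset
    ),
    # 2. Also skip `dbt deps` in the worker tasks if packages are pre-installed.
    operator_args={"install_deps": False},
)
```

If you stay on `LoadMode.DBT_LS` instead of the manifest, the `dbt_deps=False` and `select` options are where most of the parsing-time savings tend to come from.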
@tatiana, thank you for the detailed explanations. They're helpful. Is using the manifest a recommended mode in terms of performance? If so, will it still be the case after 1.5.0 is released? |
@kzajaczkowski, it really depends on your team's needs! Using the manifest is the safest approach in production, in the sense that you fully off-load the Airflow DAG processor from ever having to run the `dbt ls` command. We understand that, for many teams, it is handy not to have to pre-compile the dbt project. If it is acceptable for the team that, from time to time, the Airflow DAG processor will have to run `dbt ls`, then `LoadMode.DBT_LS` remains a reasonable choice. In the past weeks, we've collaborated with customers who tested every iteration of the 1.5 alphas. They will deploy them live once we release the production version, giving us confidence in the stability of what we've built. |
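For context, "pre-compiling" here just means producing `target/manifest.json` before the project is deployed, for example in a CI step. A minimal sketch using dbt's programmatic API (assuming dbt-core 1.5+ and a placeholder project path):

```python
# Hedged sketch: generate target/manifest.json ahead of time (e.g. in CI) so
# Cosmos can parse the project via LoadMode.DBT_MANIFEST instead of running dbt ls.
from dbt.cli.main import dbtRunner, dbtRunnerResult

PROJECT_DIR = "/usr/local/airflow/include/dbt"  # placeholder path

runner = dbtRunner()
result: dbtRunnerResult = runner.invoke(
    ["parse", "--project-dir", PROJECT_DIR, "--profiles-dir", PROJECT_DIR]
)

# `dbt parse` writes target/manifest.json without executing any models.
if not result.success:
    raise RuntimeError(f"dbt parse failed: {result.exception}")
```

The generated `manifest.json` can then be shipped with the Airflow deployment and referenced via `ProjectConfig(manifest_path=...)`, as in the earlier sketch.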
I'm using Astro Airflow locally.
So I'm trying to execute dbt transformation models using Airflow and find it to be extremely slow.
I'm looking for help to improve the performance of my Airflow DAG.
I'm trying to run 56 transformation models, and I have done the following tests:

- dbt Cloud: completes my model transformations in 4 minutes.
- Testing the profile locally, using `dbt run --profiles-dir /usr/local/airflow/include/dbt/`: completes my transformations in 6 minutes.
- Airflow: takes 44 minutes and the run fails.
I'm looking for a way to run my models much faster.