Create doc on "performance mode" #832

Closed
jlaneve opened this issue Feb 2, 2024 · 1 comment
Labels
area:docs (Relating to documentation, changes, fixes, improvement)
execution:kubernetes (Related to Kubernetes execution environment)

Comments

@jlaneve
Collaborator

jlaneve commented Feb 2, 2024

As a follow-on to #800, we should create a document on how to tune Cosmos for the best performance.


dosubot bot commented Feb 2, 2024

🤖

Hey @jlaneve, nice to see you diving into another challenge with us!

Sure, I can provide some guidance on creating a document that explains how to optimize the performance of Cosmos, following the changes made in pull request #800.

To optimize the performance of Cosmos, you can consider the following steps:

  1. Deep Copying Configurations: In the migrate_to_new_interface function, render_config and execution_config are deep copied so that changes do not affect other DAGs or Task Groups that reuse the same original configuration. Each DAG or Task Group therefore works on its own copy, which reduces the chance of conflicts. This protects correctness more than it improves speed.

  2. Argument Validation: The validate_arguments function checks for mutually exclusive selector filters and for deprecated arguments. Rejecting an invalid configuration early avoids unnecessary computation later on.

  3. User Configuration Validation: The validate_initial_user_config and validate_adapted_user_config functions ensure that the user has set the fields as expected and all the necessary fields required by Cosmos to render the DAG are set. This helps to avoid unnecessary errors and improves the performance of the system.

  4. Airflow Graph Building: The build_airflow_graph function is used to build the Airflow graph. This function is optimized to build the graph in the most efficient way possible.
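As an illustration of the argument validation step above, here is a minimal sketch of an early check for conflicting selectors. The function name `check_selectors` and the selector strings are hypothetical and chosen for illustration; this is not the Cosmos `validate_arguments` implementation, only the general idea of failing fast before any graph is built.

```python
from typing import Sequence


def check_selectors(select: Sequence[str], exclude: Sequence[str]) -> None:
    """Hypothetical helper: raise ValueError if a selector is in both lists."""
    overlap = sorted(set(select) & set(exclude))
    if overlap:
        raise ValueError(
            f"Selectors present in both select and exclude: {overlap}"
        )


# Disjoint selectors pass silently.
check_selectors(["tag:daily"], ["tag:hourly"])

# A selector that appears on both sides is rejected before any work is done.
try:
    check_selectors(["tag:daily", "path:models/core"], ["tag:daily"])
except ValueError as err:
    message = str(err)
```

Failing at configuration time keeps the error close to its cause and avoids parsing a project that could never render correctly.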

The changes made in pull request #800 that are relevant to performance optimization are:

  1. Deep copying of render_config and execution_config in the migrate_to_new_interface function. This ensures that changes do not affect other DAGs or Task Groups that may reuse the same original configuration, avoiding unintended side effects.

  2. The DbtToAirflowConverter class now directly uses ProjectConfig instead of creating a DbtProject. This reduces the overhead of creating an additional object and improves performance.

  3. The DbtGraph is now loaded with the load method. Whether this improves performance depends on the implementation of the load method.

  4. The build_airflow_graph function is now used to build the Airflow graph. Whether this improves performance depends on the implementation of the build_airflow_graph function.
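To make the deep copy point concrete, here is a small sketch of why a shared configuration object needs copying before reuse. `RenderConfigSketch` is a hypothetical stand-in written for this example, not the Cosmos `RenderConfig` class; only `copy.deepcopy` from the standard library is real.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class RenderConfigSketch:
    """Hypothetical stand-in for a render configuration (not the Cosmos class)."""

    select: List[str] = field(default_factory=list)


shared = RenderConfigSketch(select=["tag:daily"])

# Without a copy, both names point at the same object, so this edit
# is visible through `shared` as well.
alias = shared
alias.select.append("tag:hourly")

# With a deep copy, the nested list is duplicated, so this edit stays local.
isolated = copy.deepcopy(shared)
isolated.select.append("tag:weekly")
```

After this runs, `shared.select` contains the entry added through `alias` but not the one added to `isolated`. The copy costs a little time and memory, so it is best read as a safeguard against cross-DAG side effects rather than as a speed-up.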

Remember, performance optimization is a continuous process and it's important to monitor the performance of your system regularly to identify any potential bottlenecks. You can find more details in the cosmos/converter.py file.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


dosubot bot added the area:docs and execution:kubernetes labels Feb 2, 2024
dosubot bot added the stale label (Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed) May 3, 2024
dosubot bot closed this as not planned (Won't fix, can't repro, duplicate, stale) May 10, 2024
dosubot bot removed the stale label May 10, 2024