Add description on the ways how users should approach DB monitoring #36483
Inspired by the discussion in #36454: whenever a user issue is likely related to excessive DB usage, we should be able to simply link the discussion/issue to this documentation. It may also inspire Deployment Managers to set up monitoring of their DB, as they might not be aware that this falls under their responsibility once they choose the DB backend, and that they have to learn how to monitor the DB.
fe0b4bc
to
3bac97e
Compare
The first block has very long sentences; besides this, it is well explained.
I did not know about the SQLAlchemy logging yet and could not find it. It would be cool to have WARNINGs generated if queries exceed a certain threshold, e.g. running >500ms, but it seems this would rather be a contribution for SQLAlchemy :-D
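The threshold idea above can be illustrated without touching SQLAlchemy at all. The following is only a minimal sketch using the standard-library `sqlite3` and `logging` modules; the helper `execute_with_timing`, the logger name `slow_query_sketch` and the toy `task_instance` table are assumptions made up for this illustration, not part of Airflow or SQLAlchemy.

```python
import logging
import sqlite3
import time

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("slow_query_sketch")


def execute_with_timing(conn, sql, params=(), threshold_ms=500.0):
    """Run a query and log a WARNING if it takes at least threshold_ms."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms >= threshold_ms:
        logger.warning(
            "Slow query (%.1f ms >= %.1f ms): %s", elapsed_ms, threshold_ms, sql
        )
    return rows, elapsed_ms


# Toy in-memory database standing in for a real metadata DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO task_instance (state) VALUES (?)",
    [("running",), ("success",)],
)

# With the 500 ms default this tiny query should stay below the threshold.
rows, elapsed_ms = execute_with_timing(
    conn, "SELECT state FROM task_instance ORDER BY id"
)
```

This measures time on the client side, so it includes network latency and result fetching; server-side slow-query logs of the chosen database remain the more reliable source.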
👍
https://www.datadoghq.com/dg/monitor/databases/ is great for this kind of monitoring.
I agree with explaining monitoring of Airflow's meta-database in the docs. However, this chapter contains a lot of (IMO) unnecessary details, which make it difficult to extract the key information. How about restructuring it as:
Great initiative. I have ideas along similar lines.
I have observed in our use case that the DB grows indefinitely (about 1GB per day) unless we manually run the db clean CLI command, so it will eventually degrade the performance of the database over time. Could we add a feature to automatically clean the data based on a retention parameter that we set in the Airflow configuration?
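For illustration, a retention-based cleanup can already be scripted around the existing CLI. The sketch below only builds the argument list for `airflow db clean` and does not execute anything; the helper name `build_db_clean_command` and the `retention_days` parameter are assumptions for this example, not an Airflow configuration option.

```python
from datetime import datetime, timedelta, timezone


def build_db_clean_command(retention_days, now=None, dry_run=True):
    """Build an `airflow db clean` command that keeps the last retention_days days.

    retention_days is a hypothetical, deployment-specific setting.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    cmd = [
        "airflow",
        "db",
        "clean",
        "--clean-before-timestamp",
        cutoff.isoformat(),
    ]
    # Default to a dry run so nothing is deleted by accident;
    # --yes skips the interactive confirmation for unattended runs.
    cmd.append("--dry-run" if dry_run else "--yes")
    return cmd


command = build_db_clean_command(
    30, now=datetime(2024, 1, 31, tzinfo=timezone.utc)
)
print(" ".join(command))
```

Such a command could be run from cron or a maintenance job, with the retention period and schedule chosen per deployment.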
IMHO, the |
Good points. I will add the description about those commands.
Agree here with @hussein-awala. Those are tools we give the Deployment Managers, but they should adjust the usage to their needs and run those tools as and when needed, with the parameters they decide are best for their deployment. Generally speaking, I think we should stress (and this is partially my motivation) that Airflow (components) and Airflow maintainers will NOT solve the problems of DB retention, configuration and optimization.

Also commenting to @BasPH: I think the part where we explicitly say I think (and in parts that is my motivation behind this change) is not only to help our users to see what they should do but also make it

It's extremely complex and rather brittle to develop generic ways, applicable to all cases, to keep the database healthy and optimized. There is a good reason why databases need maintenance and someone to look after that; otherwise we would have it already implemented in Postgres and MySQL so that it's completely

But some of our users expect this will happen. And with that documentation chapter I want to make it crystal clear and set expectations

Also, mentioning "managed database" in this context is important: those databases might be closer to "zero-maintenance". And if you go to "managed Airflow", that's even more "zero-maintenance". When you pay someone to manage Airflow, then absolutely, yes, you should expect that you do not have to worry about database maintenance, and whoever provides "managed Airflow" should take care of it. This is precisely why you pay (among other things): you pay for the maintenance that you do not have to do.

@BasPH, yep, I hear you. I will try to remove some duplications and restructure it a bit (I just bought a ChatGPT 4.5 subscription just to see if it can help with that). But the points about "Deployment Manager responsibility" especially, and setting clear expectations about what they have to do, are a crucial part of this change and the main reason I am doing it in the first place.
I am not sure if we should endorse specific services in our docs, but I will leave enough clues for the users to search for services that offer it.
Here it is, after employing ChatGPT (after a few iterations and discussion with it) and applying the comments above. @BasPH, does it look better?
d3575a9
to
bfe7aea
Compare
Often our users are not aware that they are responsible for setting up and monitoring the database they chose as the metadata backend. While the details of the tables and database structure of the metadata DB used by Airflow are an internal detail, monitoring, tracking the usage, fine-tuning and optimising the database configuration, and detecting cases where the database becomes a bottleneck are generally tasks that the Deployment Manager should be aware of. They should be approached in a generic way, specific to the database chosen by the Deployment Manager, and they also depend a lot on the choice of managed database, if one is chosen. This chapter makes that explicit and gives the Deployment Manager enough leads to follow after choosing the database; it also explains the specific parameters that the Deployment Manager should pay attention to when setting up such monitoring. We also add an explanation of how the Deployment Manager can set up client-side logging of SQL queries generated by Airflow in case database access is suspected of causing performance issues with Airflow, as a poor man's version of complete, server-side monitoring, and explain the caveats of such client-side configuration.
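As a rough illustration of the client-side logging mentioned above: SQLAlchemy reports the statements it executes through the standard `sqlalchemy.engine` logger, so raising that logger to INFO makes the SQL text visible. The sketch below is a generic Python logging setup, not the exact configuration described in the Airflow docs; the helper name `enable_sql_query_logging` is an assumption for this example, and the demo at the end only runs if SQLAlchemy happens to be installed.

```python
import logging


def enable_sql_query_logging(level: int = logging.INFO) -> logging.Logger:
    """Turn on client-side logging of SQL statements issued via SQLAlchemy.

    INFO logs the SQL text; DEBUG additionally logs result rows.
    """
    sql_logger = logging.getLogger("sqlalchemy.engine")
    sql_logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not sql_logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        sql_logger.addHandler(handler)
    return sql_logger


sql_logger = enable_sql_query_logging()

# Optional demo against an in-memory SQLite database.
try:
    from sqlalchemy import create_engine, text
except ImportError:
    pass
else:
    engine = create_engine("sqlite://")
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
```

Keep in mind the caveats: this can produce a very large volume of log output, may expose query parameters in logs, and only shows what one client sends, not how long the server spends on it.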
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
bfe7aea
to
ab62d9f
Compare
@@ -244,7 +244,7 @@ Zombie/Undead Tasks
 No system runs perfectly, and task instances are expected to die once in a while. Airflow detects two kinds of task/process mismatch:

 * *Zombie tasks* are ``TaskInstances`` stuck in a ``running`` state despite their associated jobs being inactive
-  (e.g. their process didn't send a recent heartbeat as it got killed, or the machine died). Airflow will find these
+  (e.g. their process did not send a recent heartbeat as it got killed, or the machine died). Airflow will find these
This failed when I ran --clean-build locally, so I fixed it :)
@@ -273,7 +273,7 @@ The explanation of the criteria used in the above snippet to detect zombie tasks

 3. **Job Type**

-   The job associated with the task must be of type "LocalTaskJob."
+   The job associated with the task must be of type ``LocalTaskJob``.
Same here.
I will merge it for now; we can always correct it later. I believe the structured way is much cleaner and responds to the "too many details" concern.
…36483)
* Add description on the ways how users should approach DB monitoring
* Update docs/apache-airflow/howto/set-up-database.rst
* fixup! Update docs/apache-airflow/howto/set-up-database.rst
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
(cherry picked from commit dea715d)