Add description on the ways how users should approach DB monitoring #36483

Merged
merged 3 commits into from
Dec 30, 2023
4 changes: 2 additions & 2 deletions docs/apache-airflow/core-concepts/tasks.rst
@@ -244,7 +244,7 @@ Zombie/Undead Tasks
No system runs perfectly, and task instances are expected to die once in a while. Airflow detects two kinds of task/process mismatch:

* *Zombie tasks* are ``TaskInstances`` stuck in a ``running`` state despite their associated jobs being inactive
- (e.g. their process didn't send a recent heartbeat as it got killed, or the machine died). Airflow will find these
+ (e.g. their process did not send a recent heartbeat as it got killed, or the machine died). Airflow will find these
Member Author

This failed when I ran --clean-build locally, so I fixed it :)

periodically, clean them up, and either fail or retry the task depending on its settings.

* *Undead tasks* are tasks that are *not* supposed to be running but are, often caused when you manually edit Task
@@ -273,7 +273,7 @@ The explanation of the criteria used in the above snippet to detect zombie tasks

3. **Job Type**

- The job associated with the task must be of type "LocalTaskJob."
+ The job associated with the task must be of type ``LocalTaskJob``.
Member Author

Same here.


4. **Queued by Job ID**

96 changes: 96 additions & 0 deletions docs/apache-airflow/howto/set-up-database.rst
@@ -383,6 +383,102 @@ After configuring the database and connecting to it in Airflow configuration, yo

airflow db migrate

Database Monitoring and Maintenance in Airflow
----------------------------------------------

Airflow relies heavily on a relational metadata database for task scheduling and execution.
Monitoring and properly configuring this database are crucial for optimal Airflow performance.

Key Concerns
............
1. **Performance Impact**: Long-running or excessive queries can significantly affect Airflow's functionality.
   These may arise from workflow specifics, missing optimizations, or code bugs.
2. **Database Statistics**: Incorrect optimization decisions by the database engine,
   often due to outdated data statistics, can degrade performance.

Responsibilities
................

The responsibilities for database monitoring and maintenance in Airflow environments vary depending on
whether you're using self-managed databases and Airflow instances or opting for managed services.

**Self-Managed Environments**:

In setups where both the database and Airflow are self-managed, the Deployment Manager
is responsible for setting up, configuring, and maintaining the database. This includes monitoring
its performance, managing backups, running periodic cleanups, and ensuring its optimal operation with Airflow.

**Managed Services**:

- Managed Database Services: When using managed DB services, the provider handles many maintenance tasks
  (such as backups, patching, and basic monitoring). However, the Deployment Manager still needs to
  oversee the configuration of Airflow, optimize performance settings specific to their workflows,
  manage periodic cleanups, and monitor the database to ensure optimal operation with Airflow.

- Managed Airflow Services: With managed Airflow services, the service provider takes responsibility
  for the configuration and maintenance of Airflow and its database. However, the Deployment Manager
  needs to work with the service's configuration options to ensure that the sizing and configuration
  of the managed service match the workflow requirements.

Monitoring Aspects
..................

Regular monitoring should include:

- CPU, I/O, and memory usage.
- Query frequency and number.
- Identification and logging of slow or long-running queries.
- Detection of inefficient query execution plans.
- Analysis of disk swap versus memory usage and cache swapping frequency.
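
As an illustration of spotting an inefficient execution plan, the sketch below uses Python's built-in ``sqlite3`` module with an in-memory database. The ``demo_task`` table, its columns, and the index name are hypothetical stand-ins, not Airflow's actual schema, and the exact plan wording varies between SQLite versions. Production databases such as PostgreSQL or MySQL expose their own ``EXPLAIN`` statements with different output.

```python
import sqlite3

# Hypothetical stand-in table; this is NOT Airflow's real metadata schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_task (id INTEGER PRIMARY KEY, state TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO demo_task (state, payload) VALUES (?, ?)",
    [("running" if i % 10 == 0 else "success", "x") for i in range(1000)],
)


def query_plan(connection, sql):
    """Return the human-readable 'detail' column of each plan step."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


QUERY = "SELECT * FROM demo_task WHERE state = 'running'"

# Without an index on "state", the engine has to scan the whole table.
plan_before = query_plan(conn, QUERY)

# After adding an index, the same query can use an index lookup instead.
conn.execute("CREATE INDEX idx_demo_task_state ON demo_task (state)")
plan_after = query_plan(conn, QUERY)

print(plan_before)
print(plan_after)
```

Here ``plan_before`` reports a scan of the table, while ``plan_after`` reports a search that uses the new index. Comparing plans like this before and after a change is one way to confirm that a slow query is caused by a missing or unused index.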

Tools and Strategies
....................

- Airflow doesn't provide direct tooling for database monitoring.
- Use server-side monitoring and logging to obtain metrics.
- Enable tracking of long-running queries based on defined thresholds.
- Regularly run housekeeping tasks (such as the ``ANALYZE`` SQL command) for maintenance.
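
To make the ``ANALYZE`` housekeeping step concrete, here is a minimal sketch that refreshes planner statistics, again using the built-in ``sqlite3`` module as a stand-in for a production database. The ``demo_run`` table and the ``refresh_statistics`` helper are hypothetical names chosen for this example. Where statistics are stored and how they are refreshed differs per database engine, so consult your database's documentation for the equivalent command and scheduling options.

```python
import sqlite3

# Hypothetical stand-in table; this is NOT Airflow's real metadata schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_run (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("CREATE INDEX idx_demo_run_state ON demo_run (state)")
conn.executemany(
    "INSERT INTO demo_run (state) VALUES (?)",
    [("success",)] * 500 + [("failed",)] * 5,
)


def refresh_statistics(connection):
    """Run ANALYZE and return the statistics rows SQLite collected."""
    connection.execute("ANALYZE")
    # SQLite stores the gathered statistics in the sqlite_stat1 table.
    return connection.execute(
        "SELECT tbl, idx, stat FROM sqlite_stat1"
    ).fetchall()


stats = refresh_statistics(conn)
for table_name, index_name, stat in stats:
    print(table_name, index_name, stat)
```

In a real deployment the same idea applies on the server side: run the statistics refresh after large data changes (for example after a cleanup) so the query planner works from current numbers.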

Database Cleaning Tools
.......................

- **Airflow DB Clean Command**: Use the ``airflow db clean`` command to help manage and clean
  up your database.
- **Python Methods in** ``airflow.utils.db_cleanup``: This module provides additional Python methods for
  database cleanup and maintenance, offering more fine-grained control and customization for specific needs.
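
A periodic cleanup can be scripted around ``airflow db clean``. The sketch below only builds the command line and does not run it. The ``build_db_clean_command`` helper and the 90-day retention period are assumptions made for this example, and the ``--clean-before-timestamp`` and ``--dry-run`` flags are used as I understand them from the Airflow CLI, so confirm them with ``airflow db clean --help`` for your Airflow version.

```python
from datetime import datetime, timedelta, timezone


def build_db_clean_command(retention_days, now=None, dry_run=True):
    """Build an ``airflow db clean`` command that purges rows older than the cutoff.

    Hypothetical helper for illustration; it returns the argument list
    and leaves execution to the caller.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    command = [
        "airflow",
        "db",
        "clean",
        "--clean-before-timestamp",
        cutoff.isoformat(sep=" "),
    ]
    if dry_run:
        # Report what would be deleted without deleting anything.
        command.append("--dry-run")
    return command


example = build_db_clean_command(
    retention_days=90,
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(" ".join(example))
```

On a host where Airflow is installed, the returned list could be passed to ``subprocess.run(command, check=True)``. Starting with a dry run and taking a database backup before a real cleanup is the more cautious approach.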

Recommendations
...............

- **Proactive Monitoring**: Implement monitoring and logging in production without significantly
impacting performance.
- **Database-Specific Guidance**: Consult the chosen database's documentation for specific monitoring
setup instructions.
- **Managed Database Services**: Check if automatic maintenance tasks are available with your
database provider.

SQLAlchemy Logging
..................

For detailed query analysis, enable SQLAlchemy client logging (``echo=True`` in SQLAlchemy
engine configuration).

- This method is more intrusive and can affect Airflow's client-side performance.
- It generates a lot of logs, especially in a busy Airflow environment.
- Suitable for non-production environments like staging systems.

The ``echo=True`` engine option is explained in the
`SQLAlchemy logging documentation <https://docs.sqlalchemy.org/en/14/core/engines.html#configuring-logging>`_.

To enable it in Airflow, use the :ref:`config:database__sql_alchemy_engine_args` configuration parameter to
set the ``echo`` argument to ``True``.
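
As a rough sketch of what that setting could look like, the snippet below builds the value for the corresponding environment variable. It assumes that ``sql_alchemy_engine_args`` accepts a JSON-encoded dictionary and that the standard ``AIRFLOW__{SECTION}__{KEY}`` environment variable naming applies; verify both against the configuration reference for your Airflow version before relying on it.

```python
import json

# Engine arguments to pass through to SQLAlchemy's create_engine().
# Assumption: Airflow parses this option as a JSON-encoded dictionary.
engine_args = {"echo": True}

env_name = "AIRFLOW__DATABASE__SQL_ALCHEMY_ENGINE_ARGS"
env_value = json.dumps(engine_args)

# Print a shell line that could be added to a staging environment's setup.
print(f"export {env_name}='{env_value}'")
```

Setting this only in a staging environment keeps the extra log volume away from production.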

Caution
.......

- Be mindful of the impact on Airflow's performance and system resources when enabling extensive logging.
- Prefer server-side monitoring over client-side logging for production environments to minimize
performance interference.

What's next?
------------
