
Add description on the ways how users should approach DB monitoring (#36483)

* Add description on the ways how users should approach DB monitoring

Often our users are not aware that they are responsible for setting
up and monitoring the database they chose as the metadata backend.

While the tables and database structure of the metadata DB used by
Airflow are an internal detail, monitoring the database, tracking its
usage, fine-tuning and optimising its configuration, and detecting
cases where the database becomes a bottleneck are tasks the
Deployment Manager should be aware of. They should be approached in a
generic way - specific to the database chosen by the Deployment
Manager - and they also depend a lot on whether a managed database
service is used.

This chapter makes this explicit and gives the Deployment Manager
enough leads to follow after they have chosen the database. It also
explains the specific parameters the Deployment Manager should pay
attention to when setting up such monitoring.

We also add an explanation of how the Deployment Manager can set up
client-side logging of SQL queries generated by Airflow when database
access is suspected of causing performance issues with Airflow - a
poor-man's version of complete, server-side monitoring - and explain
the caveats of such client-side configuration.

* Update docs/apache-airflow/howto/set-up-database.rst

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>

* fixup! Update docs/apache-airflow/howto/set-up-database.rst

---------

Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
potiuk and jscheffl committed Dec 30, 2023
1 parent 3d6ecdf commit dea715d
Showing 2 changed files with 98 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/apache-airflow/core-concepts/tasks.rst
@@ -244,7 +244,7 @@ Zombie/Undead Tasks
No system runs perfectly, and task instances are expected to die once in a while. Airflow detects two kinds of task/process mismatch:

* *Zombie tasks* are ``TaskInstances`` stuck in a ``running`` state despite their associated jobs being inactive
-  (e.g. their process didn't send a recent heartbeat as it got killed, or the machine died). Airflow will find these
+  (e.g. their process did not send a recent heartbeat as it got killed, or the machine died). Airflow will find these
periodically, clean them up, and either fail or retry the task depending on its settings.

* *Undead tasks* are tasks that are *not* supposed to be running but are, often caused when you manually edit Task
@@ -273,7 +273,7 @@ The explanation of the criteria used in the above snippet to detect zombie tasks

3. **Job Type**

-  The job associated with the task must be of type "LocalTaskJob."
+  The job associated with the task must be of type ``LocalTaskJob``.

4. **Queued by Job ID**

96 changes: 96 additions & 0 deletions docs/apache-airflow/howto/set-up-database.rst
@@ -383,6 +383,102 @@ After configuring the database and connecting to it in Airflow configuration, you
airflow db migrate
Database Monitoring and Maintenance in Airflow
----------------------------------------------

Airflow extensively utilizes a relational metadata database for task scheduling and execution.
Monitoring and proper configuration of this database are crucial for optimal Airflow performance.

Key Concerns
............

1. **Performance Impact**: Long or excessive queries can significantly affect Airflow's functionality.
These may arise due to workflow specifics, lack of optimizations, or code bugs.
2. **Database Statistics**: Incorrect optimization decisions by the database engine,
often due to outdated data statistics, can degrade performance.

Responsibilities
................

The responsibilities for database monitoring and maintenance in Airflow environments vary depending on
whether you're using self-managed databases and Airflow instances or opting for managed services.

**Self-Managed Environments**:

In setups where both the database and Airflow are self-managed, the Deployment Manager
is responsible for setting up, configuring, and maintaining the database. This includes monitoring
its performance, managing backups and periodic cleanups, and ensuring its optimal operation with Airflow.

**Managed Services**:

- Managed Database Services: When using managed DB services, many maintenance tasks (like backups,
  patching, and basic monitoring) are handled by the provider. However, the Deployment Manager still
  needs to oversee the configuration of Airflow, optimize performance settings specific to their
  workflows, manage periodic cleanups, and monitor their DB to ensure optimal operation with Airflow.

- Managed Airflow Services: With managed Airflow services, the service provider takes responsibility
  for the configuration and maintenance of Airflow and its database. However, the Deployment Manager
  still needs to make sure that their sizing and workflow requirements match the sizing and
  configuration of the managed service.

Monitoring Aspects
..................

Regular monitoring should include:

- CPU, I/O, and memory usage.
- Query frequency and number.
- Identification and logging of slow or long-running queries (see the sketch after this list).
- Detection of inefficient query execution plans.
- Analysis of disk swap versus memory usage and cache swapping frequency.
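
For example, if PostgreSQL is used as the metadata database, currently long-running queries can be
spotted via ``pg_stat_activity``. The sketch below is a minimal illustration only - it assumes it runs
where Airflow is installed and configured, so that the SQLAlchemy engine set up in ``airflow.settings``
points at the metadata database, and the query has to be adapted for other database engines.

.. code-block:: python

    # Minimal sketch: list queries running for more than 30 seconds on a
    # PostgreSQL metadata database. Adapt the query for MySQL or other engines.
    from sqlalchemy import text

    from airflow import settings

    LONG_RUNNING_SQL = text(
        """
        SELECT pid, now() - query_start AS duration, state, query
        FROM pg_stat_activity
        WHERE state <> 'idle'
          AND now() - query_start > interval '30 seconds'
        ORDER BY duration DESC
        """
    )

    with settings.engine.connect() as conn:
        for row in conn.execute(LONG_RUNNING_SQL):
            print(f"{row.duration}  pid={row.pid}  {row.query[:120]}")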

Tools and Strategies
....................

- Airflow doesn't provide direct tooling for database monitoring.
- Use server-side monitoring and logging to obtain metrics.
- Enable tracking of long-running queries based on defined thresholds.
- Regularly run house-keeping tasks (like the ``ANALYZE`` SQL command, sketched below) for maintenance.
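
As an illustration of such a house-keeping task, the minimal sketch below refreshes the query planner
statistics of a PostgreSQL metadata database by running ``ANALYZE`` through the engine configured by
Airflow (an assumption - you can equally run it from ``psql`` or schedule it inside the database;
MySQL uses ``ANALYZE TABLE <table>`` per table instead, and managed database services often run such
maintenance automatically).

.. code-block:: python

    # Minimal sketch: refresh optimizer statistics on a PostgreSQL metadata database.
    from sqlalchemy import text

    from airflow import settings

    # engine.begin() opens a transaction and commits it on successful exit.
    with settings.engine.begin() as conn:
        conn.execute(text("ANALYZE"))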

Database Cleaning Tools
.......................

- **Airflow DB Clean Command**: Utilize the ``airflow db clean`` command to help manage and clean
up your database.
- **Python methods in** ``airflow.utils.db_cleanup``: This module provides additional Python methods for
  database cleanup and maintenance, offering more fine-grained control and customization for specific
  needs (see the sketch below).
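
For example, archiving and deleting old metadata rows programmatically could look like the minimal
sketch below. It is an illustration only - the parameters accepted by ``run_cleanup`` and the set of
tables it can clean depend on your Airflow version, so check the reference documentation and keep
``dry_run=True`` until you are happy with what would be removed.

.. code-block:: python

    # Minimal sketch: archive and delete metadata rows older than 90 days.
    # Verify the parameters against your Airflow version before relying on this.
    import pendulum

    from airflow.utils.db_cleanup import run_cleanup

    run_cleanup(
        clean_before_timestamp=pendulum.now("UTC").subtract(days=90),
        table_names=["log", "task_instance"],  # omit to clean all supported tables
        dry_run=True,  # set to False once the output looks right
        confirm=False,  # skip the interactive confirmation prompt
    )

The ``airflow db clean`` command exposes similar options (for example ``--clean-before-timestamp``
and ``--dry-run``) if you prefer to run the cleanup from the command line.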

Recommendations
...............

- **Proactive Monitoring**: Implement monitoring and logging in production without significantly
impacting performance.
- **Database-Specific Guidance**: Consult the chosen database's documentation for specific monitoring
setup instructions.
- **Managed Database Services**: Check if automatic maintenance tasks are available with your
database provider.

SQLAlchemy Logging
..................

For detailed query analysis, enable SQLAlchemy client logging (``echo=True`` in SQLAlchemy
engine configuration).

- This method is more intrusive and can affect Airflow's client-side performance.
- It generates a lot of logs, especially in a busy Airflow environment.
- Suitable for non-production environments like staging systems.

You can enable it by setting ``echo=True`` in the SQLAlchemy engine configuration, as explained in the
`SQLAlchemy logging documentation <https://docs.sqlalchemy.org/en/14/core/engines.html#configuring-logging>`_.

Use the :ref:`config:database__sql_alchemy_engine_args` configuration parameter to set the ``echo`` argument to ``True``.
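
For example, assuming your Airflow version reads this option as a JSON-encoded value, the setting could
look like this in ``airflow.cfg`` (or be supplied via the ``AIRFLOW__DATABASE__SQL_ALCHEMY_ENGINE_ARGS``
environment variable):

.. code-block:: ini

    [database]
    sql_alchemy_engine_args = {"echo": true}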

Caution
.......

- Be mindful of the impact on Airflow's performance and system resources when enabling extensive logging.
- Prefer server-side monitoring over client-side logging for production environments to minimize
performance interference.

What's next?
------------

