Commit cd33c2a

Migrate executor docs to respective providers (#37728)

RNHTTR committed Feb 27, 2024
1 parent 5d1b271 commit cd33c2a
Showing 30 changed files with 124 additions and 83 deletions.
3 changes: 0 additions & 3 deletions .github/boring-cyborg.yml
@@ -15,7 +15,6 @@
# specific language governing permissions and limitations
# under the License.
---

# Details: https://github.com/kaxil/boring-cyborg

labelPRBasedOnFilePath:
@@ -154,8 +153,6 @@ labelPRBasedOnFilePath:
- airflow/example_dags/example_kubernetes_executor.py
- airflow/providers/cncf/kubernetes/**/*
- airflow/providers/celery/executors/celery_kubernetes_executor.py
- docs/apache-airflow/core-concepts/executor/kubernetes.rst
- docs/apache-airflow/core-concepts/executor/celery_kubernetes.rst
- docs/apache-airflow-providers-cncf-kubernetes/**/*
- kubernetes_tests/**/*
- tests/providers/cncf/kubernetes/**/*
2 changes: 1 addition & 1 deletion airflow/providers/celery/executors/celery_executor.py
@@ -19,7 +19,7 @@
.. seealso::
For more information on how the CeleryExecutor works, take a look at the guide:
:ref:`executor:CeleryExecutor`
:doc:`/celery_executor`
"""
from __future__ import annotations

@@ -19,7 +19,7 @@
.. seealso::
For more information on how the KubernetesExecutor works, take a look at the guide:
:ref:`executor:KubernetesExecutor`
:doc:`/kubernetes_executor`
"""
from __future__ import annotations

@@ -15,9 +15,6 @@
specific language governing permissions and limitations
under the License.
.. _executor:CeleryExecutor:

Celery Executor
===============

@@ -36,7 +33,7 @@ change your ``airflow.cfg`` to point the executor parameter to
For more information about setting up a Celery broker, refer to the
exhaustive `Celery documentation on the topic <https://docs.celeryq.dev/en/latest/getting-started/>`_.

The configuration parameters of the Celery Executor can be found in :doc:`apache-airflow-providers-celery:configurations-ref`.
The configuration parameters of the Celery Executor can be found in the Celery provider's :doc:`configurations-ref`.

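As a hedged illustration of the wiring involved (the broker and result-backend URLs below are placeholders, not recommendations; the provider's configuration reference above is authoritative):

.. code-block:: python

    import os

    # AIRFLOW__<SECTION>__<KEY> is Airflow's standard environment-variable
    # override for airflow.cfg; these must be set before Airflow starts.
    os.environ["AIRFLOW__CORE__EXECUTOR"] = "CeleryExecutor"
    os.environ["AIRFLOW__CELERY__BROKER_URL"] = "redis://localhost:6379/0"  # placeholder
    os.environ["AIRFLOW__CELERY__RESULT_BACKEND"] = "db+postgresql://user:pass@localhost/airflow"  # placeholder
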
Here are a few imperative requirements for your workers:

@@ -97,7 +94,7 @@ Some caveats:
- Tasks can consume resources. Make sure your worker has enough resources to run ``worker_concurrency`` tasks
- Queue names are limited to 256 characters, but each broker backend might have its own restrictions

See :doc:`/administration-and-deployment/modules_management` for details on how Python and Airflow manage modules.
See :doc:`apache-airflow:administration-and-deployment/modules_management` for details on how Python and Airflow manage modules.

Architecture
------------
@@ -173,7 +170,7 @@ The components communicate with each other in many places
Task execution process
----------------------

.. figure:: ../../img/run_task_on_celery_executor.png
.. figure:: img/run_task_on_celery_executor.png
:scale: 50 %

Sequence diagram - task execution process
@@ -205,7 +202,7 @@ During this process, two processes are created:
| [11] **WorkerProcess** saves status information in **ResultBackend**.
| [13] When **SchedulerProcess** asks **ResultBackend** again about the status, it will get information about the status of the task.
.. _executor:CeleryExecutor:queue:
.. _celery_executor:queue:

Queues
------
@@ -15,9 +15,6 @@
specific language governing permissions and limitations
under the License.
.. _executor:CeleryKubernetesExecutor:

CeleryKubernetes Executor
=========================

@@ -36,7 +33,7 @@ An executor is chosen to run a task based on the task's queue.
``CeleryKubernetesExecutor`` inherits the scalability of the ``CeleryExecutor`` to
handle high load at peak times, and the runtime isolation of the ``KubernetesExecutor``.

The configuration parameters of the Celery Executor can be found in :doc:`apache-airflow-providers-celery:configurations-ref`.
The configuration parameters of the Celery Executor can be found in the Celery provider's :doc:`configurations-ref`.

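A minimal sketch of that queue-based routing, assuming the tasks are defined inside a DAG and that ``kubernetes`` is the configured ``[celery_kubernetes_executor] kubernetes_queue`` (its documented default; task names are illustrative):

.. code-block:: python

    from airflow.operators.bash import BashOperator

    # No queue set: runs on a Celery worker via the default queue.
    light = BashOperator(task_id="light", bash_command="echo celery")

    # Matching the configured kubernetes_queue routes this task to the KubernetesExecutor.
    heavy = BashOperator(
        task_id="heavy",
        bash_command="echo kubernetes",
        queue="kubernetes",
    )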

When to use CeleryKubernetesExecutor
10 changes: 10 additions & 0 deletions docs/apache-airflow-providers-celery/index.rst
@@ -29,6 +29,16 @@
Changelog <changelog>
Security <security>


.. toctree::
:hidden:
:maxdepth: 1
:caption: Executors

CeleryExecutor details <celery_executor>
CeleryKubernetesExecutor details <celery_kubernetes_executor>


.. toctree::
:hidden:
:maxdepth: 1
File renamed without changes
File renamed without changes
8 changes: 8 additions & 0 deletions docs/apache-airflow-providers-cncf-kubernetes/index.rst
@@ -29,6 +29,14 @@
Changelog <changelog>
Security <security>

.. toctree::
:hidden:
:maxdepth: 1
:caption: Executors

KubernetesExecutor details <kubernetes_executor>
LocalKubernetesExecutor details <local_kubernetes_executor>

.. toctree::
:hidden:
:maxdepth: 1
@@ -16,7 +16,7 @@
under the License.
.. _executor:KubernetesExecutor:
.. _KubernetesExecutor:

Kubernetes Executor
===================
@@ -38,12 +38,12 @@ KubernetesExecutor requires a non-sqlite database in the backend.

When a DAG submits a task, the KubernetesExecutor requests a worker pod from the Kubernetes API. The worker pod then runs the task, reports the result, and terminates.

.. image:: ../../img/arch-diag-kubernetes.png
.. image:: img/arch-diag-kubernetes.png


One example of an Airflow deployment running on a distributed set of five nodes in a Kubernetes cluster is shown below.

.. image:: ../../img/arch-diag-kubernetes2.png
.. image:: img/arch-diag-kubernetes2.png

Consistent with the regular Airflow architecture, the Workers need access to the DAG files to execute the tasks within those DAGs and interact with the Metadata repository. Also, configuration information specific to the Kubernetes Executor, such as the worker namespace and image information, needs to be specified in the Airflow Configuration file.

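Per-task pod customization is done through the ``executor_config`` argument with a ``pod_override``; a minimal sketch, assuming a standard ``cncf.kubernetes`` provider install (task name and resource request are illustrative):

.. code-block:: python

    from kubernetes.client import models as k8s

    from airflow.operators.bash import BashOperator

    memory_hungry = BashOperator(
        task_id="memory_hungry",
        bash_command="echo hello",
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        # "base" is the conventional name of the main task container.
                        k8s.V1Container(
                            name="base",
                            resources=k8s.V1ResourceRequirements(requests={"memory": "2Gi"}),
                        )
                    ]
                )
            )
        },
    )
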
@@ -56,7 +56,7 @@ Additionally, the Kubernetes Executor enables specification of additional features
.. Airflow_Worker -> Kubernetes: Pod completes with state "Succeeded" and k8s records in ETCD
.. Kubernetes -> Airflow_Scheduler: Airflow scheduler reads "Succeeded" from k8s watcher thread
.. @enduml
.. image:: ../../img/k8s-happy-path.png
.. image:: img/k8s-happy-path.png

Configuration
-------------
@@ -272,7 +272,7 @@ In the case where a worker dies before it can report its status to the backend DB
..
.. @enduml
.. image:: ../../img/k8s-failed-pod.png
.. image:: img/k8s-failed-pod.png


A Kubernetes watcher is a thread that can subscribe to every change that occurs in Kubernetes' database. It is alerted when pods start, run, end, and fail.
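
As a hedged illustration of that watch mechanism, using the Kubernetes Python client directly (the namespace is an assumption, and the executor's real watcher thread is considerably more involved):

.. code-block:: python

    from kubernetes import client, config, watch

    config.load_kube_config()  # or config.load_incluster_config() when running in a cluster
    v1 = client.CoreV1Api()

    # Stream pod lifecycle events, much as the executor's watcher thread does.
    for event in watch.Watch().stream(v1.list_namespaced_pod, namespace="airflow"):
        pod = event["object"]
        print(event["type"], pod.metadata.name, pod.status.phase)
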
@@ -16,7 +16,7 @@
under the License.
.. _executor:LocalKubernetesExecutor:
.. _LocalKubernetesExecutor:

LocalKubernetes Executor
=========================
4 changes: 2 additions & 2 deletions docs/apache-airflow-providers-cncf-kubernetes/operators.rst
@@ -35,7 +35,7 @@ you to create and run Pods on a Kubernetes cluster.
- :ref:`EksPodOperator <howto/operator:EksPodOperator>` operator for `Amazon Elastic Kubernetes Service <https://aws.amazon.com/eks/>`__.

.. note::
The :doc:`Kubernetes executor <apache-airflow:core-concepts/executor/kubernetes>` is **not** required to use this operator.
The :doc:`Kubernetes executor <kubernetes_executor>` is **not** required to use this operator.

How does this operator work?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -602,7 +602,7 @@ you to create and run Jobs on a Kubernetes cluster.
- ``GKEStartJobOperator`` operator for `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__.

.. note::
The :doc:`Kubernetes executor <apache-airflow:core-concepts/executor/kubernetes>` is **not** required to use this operator.
The :doc:`Kubernetes executor <kubernetes_executor>` is **not** required to use this operator.

How does this operator work?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -32,7 +32,7 @@ We maintain an :doc:`official Helm chart <helm-chart:index>` for Airflow that helps
Kubernetes Executor
^^^^^^^^^^^^^^^^^^^

The :doc:`Kubernetes Executor </core-concepts/executor/kubernetes>` allows you to run all the Airflow tasks on
The :doc:`Kubernetes Executor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>` allows you to run all the Airflow tasks on
Kubernetes as separate Pods.

KubernetesPodOperator
@@ -135,7 +135,7 @@ HTTP monitoring for Celery Cluster

You can optionally use Flower to monitor the health of the Celery cluster. It also provides an HTTP API that you can use to build a health check for your environment.

For details about installation, see: :ref:`executor:CeleryExecutor`. For details about usage, see: `The Flower project documentation <https://flower.readthedocs.io/>`__.
For details about installation, see: :doc:`apache-airflow-providers-celery:celery_executor`. For details about usage, see: `The Flower project documentation <https://flower.readthedocs.io/>`__.

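A hedged sketch of such a health check (the ``/api/workers`` endpoint, the default port ``5555``, and the response shape are assumptions; verify them against your Flower version and deployment):

.. code-block:: python

    import requests

    def celery_cluster_is_healthy(base_url: str = "http://localhost:5555") -> bool:
        """Return True if Flower reports at least one responsive Celery worker."""
        resp = requests.get(f"{base_url}/api/workers", params={"status": "true"}, timeout=5)
        resp.raise_for_status()
        # Assumed shape: {"worker-name": true, ...}
        return any(resp.json().values())
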
CLI Check for Celery Workers
----------------------------
@@ -60,8 +60,8 @@ Airflow uses :class:`~airflow.executors.sequential_executor.SequentialExecutor`
nature, the user is limited to executing at most one task at a time. ``Sequential Executor`` also pauses
the scheduler when it runs a task, hence it is not recommended in a production setup. You should use the
:class:`~airflow.executors.local_executor.LocalExecutor` for a single machine.
For a multi-node setup, you should use the :doc:`Kubernetes executor <../core-concepts/executor/kubernetes>` or
the :doc:`Celery executor <../core-concepts/executor/celery>`.
For a multi-node setup, you should use the :doc:`Kubernetes executor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>` or
the :doc:`Celery executor <apache-airflow-providers-celery:celery_executor>`.


Once you have configured the executor, it is necessary to make sure that every node in the cluster contains
@@ -170,19 +170,19 @@ of the executor you use:
the tasks it runs - then you will need to clear and restart those tasks manually after the upgrade
is completed (or rely on ``retry`` being set for stopped tasks).

* For the :doc:`Celery executor <../core-concepts/executor/celery>`, you have to first put your workers in
* For the :doc:`Celery executor <apache-airflow-providers-celery:celery_executor>`, you have to first put your workers in
offline mode (usually by sending a ``TERM`` signal to the workers), wait until the workers
finish all the running tasks, and then upgrade the code (for example by replacing the image the workers run
in and restarting the workers). You can monitor your workers via the ``flower`` monitoring tool and see the number
of running tasks going down to zero. Once the workers are upgraded, they will be automatically put in online
mode and start picking up new tasks. You can then upgrade the ``Scheduler`` in a rolling restart mode.

* For the :doc:`Kubernetes executor <../core-concepts/executor/kubernetes>`, you can upgrade the scheduler,
* For the :doc:`Kubernetes executor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>`, you can upgrade the scheduler,
triggerer, and webserver in a rolling restart mode, and generally you should not worry about the workers, as they
are managed by the Kubernetes cluster and will be automatically adopted by ``Schedulers`` when they are
upgraded and restarted.

* For the :doc:`CeleryKubernetesExecutor <../core-concepts/executor/celery-kubernetes>`, you follow the
* For the :doc:`CeleryKubernetesExecutor <apache-airflow-providers-celery:celery_kubernetes_executor>`, you follow the
same procedure as for the ``CeleryExecutor`` - you put the workers in offline mode, wait for the running
tasks to complete, upgrade the workers, and then upgrade the scheduler, triggerer and webserver in a
rolling restart mode - which should also adopt tasks run via the ``KubernetesExecutor`` part of the
Expand Down
2 changes: 1 addition & 1 deletion docs/apache-airflow/best-practices.rst
@@ -82,7 +82,7 @@ it difficult to check the logs of that Task from the Webserver. If that is not d
Communication
--------------

Airflow executes tasks of a DAG on different servers in case you are using :doc:`Kubernetes executor </core-concepts/executor/kubernetes>` or :doc:`Celery executor </core-concepts/executor/celery>`.
Airflow executes tasks of a DAG on different servers in case you are using :doc:`Kubernetes executor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>` or :doc:`Celery executor <apache-airflow-providers-celery:celery_executor>`.
Therefore, you should not store any file or config in the local filesystem as the next task is likely to run on a different server without access to it — for example, a task that downloads the data file that the next task processes.
In the case of :class:`Local executor <airflow.executors.local_executor.LocalExecutor>`,
storing a file on disk can make retries harder, e.g. when your task requires a config file that is deleted by another task in the DAG.
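
A minimal sketch of passing small values between tasks through XCom instead of the local filesystem (TaskFlow API; DAG and task names are illustrative, and XCom is only suitable for small payloads since it lives in the metadata database):

.. code-block:: python

    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(schedule=None, start_date=datetime(2024, 1, 1))
    def handoff():
        @task
        def download() -> dict:
            # The return value is pushed to XCom automatically.
            return {"rows": 42}

        @task
        def process(payload: dict) -> None:
            print(payload["rows"])

        process(download())

    handoff()
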
@@ -15,44 +15,10 @@
specific language governing permissions and limitations
under the License.
.. _executor:DebugExecutor:

Debug Executor (deprecated)
===========================
.. _concepts:debugging:

The :class:`~airflow.executors.debug_executor.DebugExecutor` is meant as
a debug tool and can be used from an IDE. It is a single-process executor that
queues :class:`~airflow.models.taskinstance.TaskInstance` objects and executes them by running
the ``_run_raw_task`` method.

Due to its nature, the executor can be used with a SQLite database. When used
with sensors, the executor will change the sensor mode to ``reschedule`` to avoid
blocking the execution of the DAG.

Additionally, ``DebugExecutor`` can be used in a fail-fast mode that will make
all other running or scheduled tasks fail immediately. To enable this option, set
``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust the ``fail_fast`` option in your ``airflow.cfg``.
For more information on setting the configuration, see :doc:`../../howto/set-config`.

**IDE setup steps:**

1. Add a ``main`` block at the end of your DAG file to make it runnable.

It will run a backfill job:

.. code-block:: python

    if __name__ == "__main__":
        from airflow.utils.state import State

        dag.clear()
        dag.run()

2. Set ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in the run configuration of your IDE. In
this step you should also set all environment variables required by your DAG.

3. Run / debug the DAG file.
Debugging Airflow DAGs
======================

Testing DAGs with dag.test()
*****************************
@@ -74,7 +40,7 @@ and that's it! You can add arguments such as ``execution_date`` if you want to test
you can run or debug DAGs as needed.
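
The collapsed lines above contain the documented snippet; a minimal self-contained sketch (DAG id and operator are illustrative):

.. code-block:: python

    from datetime import datetime

    from airflow.models.dag import DAG
    from airflow.operators.empty import EmptyOperator

    with DAG(dag_id="example_dag", start_date=datetime(2024, 1, 1), schedule=None) as dag:
        EmptyOperator(task_id="noop")

    if __name__ == "__main__":
        dag.test()  # runs the DAG in-process; no scheduler, executor, or webserver needed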

Comparison with DebugExecutor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-----------------------------

The ``dag.test`` command has the following benefits over the :class:`~airflow.executors.debug_executor.DebugExecutor`
class, which is now deprecated:
@@ -102,3 +68,42 @@ Run ``python -m pdb <path to dag file>.py`` for an interactive debugging experience
-> bash_command='echo 1',
(Pdb) run_this_last
<Task(EmptyOperator): run_this_last>
.. _executor:DebugExecutor:

Debug Executor (deprecated)
***************************

The :class:`~airflow.executors.debug_executor.DebugExecutor` is meant as
a debug tool and can be used from an IDE. It is a single-process executor that
queues :class:`~airflow.models.taskinstance.TaskInstance` objects and executes them by running
the ``_run_raw_task`` method.

Due to its nature, the executor can be used with a SQLite database. When used
with sensors, the executor will change the sensor mode to ``reschedule`` to avoid
blocking the execution of the DAG.

Additionally, ``DebugExecutor`` can be used in a fail-fast mode that will make
all other running or scheduled tasks fail immediately. To enable this option, set
``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust the ``fail_fast`` option in your ``airflow.cfg``.
For more information on setting the configuration, see :doc:`../../howto/set-config`.

**IDE setup steps:**

1. Add a ``main`` block at the end of your DAG file to make it runnable.

It will run a backfill job:

.. code-block:: python

    if __name__ == "__main__":
        from airflow.utils.state import State

        dag.clear()
        dag.run()

2. Set ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in the run configuration of your IDE. In
this step you should also set all environment variables required by your DAG.

3. Run / debug the DAG file.
12 changes: 4 additions & 8 deletions docs/apache-airflow/core-concepts/executor/index.rst
@@ -52,19 +52,15 @@ There are two types of executors - those that run tasks *locally* (inside the ``
.. toctree::
:maxdepth: 1

debug
local
sequential

**Remote Executors**

.. toctree::
:maxdepth: 1

celery
celery_kubernetes
kubernetes
local_kubernetes
* :doc:`CeleryExecutor <apache-airflow-providers-celery:celery_executor>`
* :doc:`CeleryKubernetesExecutor <apache-airflow-providers-celery:celery_kubernetes_executor>`
* :doc:`KubernetesExecutor <apache-airflow-providers-cncf-kubernetes:kubernetes_executor>`
* :doc:`LocalKubernetesExecutor <apache-airflow-providers-cncf-kubernetes:local_kubernetes_executor>`


.. note::
7 changes: 7 additions & 0 deletions docs/apache-airflow/core-concepts/index.rst
@@ -51,3 +51,10 @@ Here you can find detailed documentation about each one of the core concepts of
xcoms
variables
params

**Debugging**

.. toctree::
:maxdepth: 1

debug
