Document cron and delta timetables (apache#32392)
Add a comparison between delta and cron data interval timetable to
highlight differences when catchup is False.
AlexisBRENON authored and syedahsn committed Jul 11, 2023
1 parent c292215 commit 2eed48e
Showing 2 changed files with 89 additions and 3 deletions.
86 changes: 83 additions & 3 deletions docs/apache-airflow/authoring-and-scheduling/timetable.rst
@@ -114,6 +114,11 @@ DeltaDataIntervalTimetable
Schedules data intervals with a time delta. Can be selected by providing a
:class:`datetime.timedelta` or ``dateutil.relativedelta.relativedelta`` to the ``schedule`` parameter of a DAG.

This timetable focuses on the data interval value and does not necessarily align execution dates with
arbitrary boundaries such as the start of a day or of an hour.

.. seealso:: `Differences between the cron and delta data interval timetables`_

.. code-block:: python

    @dag(schedule=datetime.timedelta(minutes=30))
@@ -129,6 +134,7 @@ A timetable that accepts a cron expression, creates data intervals according to
trigger points, and triggers a DAG run at the end of each data interval.

.. seealso:: `Differences between the two cron timetables`_
.. seealso:: `Differences between the cron and delta data interval timetables`_

This can be selected by providing a string that is a valid cron expression to the ``schedule``
parameter of a DAG as described in the :doc:`../core-concepts/dags` documentation.
@@ -171,10 +177,15 @@ first) event for the data interval, otherwise manual runs will run with a ``data
    def example_dag():
        pass
Timetable comparisons
----------------------


.. _Differences between the two cron timetables:

Differences between the two cron timetables
-------------------------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two timetables, `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_, that accept a cron expression.
There are some differences between the two:
@@ -183,7 +194,7 @@ There are some differences between the two:
expect cron to behave than that of `CronDataIntervalTimetable`_ (when ``catchup`` is ``False``).

Whether taking care of *Data Interval*
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`CronTriggerTimetable`_ *does not* take the *data interval* into account. This means that the values of
``data_interval_start``, ``data_interval_end`` and the legacy ``execution_date`` are all the same: the time when
a DAG run is triggered.
@@ -193,7 +204,7 @@ On the other hand, `CronDataIntervalTimetable`_ *does* care the idea of *data in
and end of the interval respectively.
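
The contrast can be illustrated with plain ``datetime`` arithmetic. This is a minimal sketch, not Airflow code: the two helper functions are hypothetical, the date is arbitrary, and it assumes a cron expression that fires at a fixed period (for example ``0 0 * * *``, once a day).

```python
from datetime import datetime, timedelta


def trigger_run_fields(trigger_time):
    """Hypothetical model of a CronTriggerTimetable run:
    both interval bounds equal the trigger time."""
    return {
        "data_interval_start": trigger_time,
        "data_interval_end": trigger_time,
    }


def data_interval_run_fields(trigger_time, period):
    """Hypothetical model of a CronDataIntervalTimetable run:
    the run fires at the end of the interval it covers."""
    return {
        "data_interval_start": trigger_time - period,
        "data_interval_end": trigger_time,
    }


# Arbitrary example: a daily schedule firing at midnight on February 2nd.
midnight = datetime(2023, 2, 2)
print(trigger_run_fields(midnight))
print(data_interval_run_fields(midnight, timedelta(days=1)))
```

In this model the trigger-style run reports a zero-length interval, while the data-interval-style run reports the full day that just ended.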

The time when a DAG run is triggered
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is no difference between the two when ``catchup`` is ``True``. :ref:`dag-catchup` tells you how DAG runs are
triggered when ``catchup`` is ``True``.
@@ -217,3 +228,72 @@ is immediately triggered after you re-enable the DAG.

These examples show that the way `CronTriggerTimetable`_ triggers DAG runs is more intuitive, and closer to how
people expect cron to behave, than the way `CronDataIntervalTimetable`_ does.


.. _Differences between the cron and delta data interval timetables:

Differences between the cron and delta data interval timetables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Choosing between `DeltaDataIntervalTimetable`_ and `CronDataIntervalTimetable`_ depends on your use case.
The following table summarizes the DAG runs created (and the data intervals they cover) when you enable a DAG
at 01:05 on February 1st, depending on three arguments: ``schedule``, ``start_date`` and ``catchup``.

.. list-table::
   :header-rows: 1

   * - ``schedule``
     - ``start_date``
     - ``catchup``
     - Intervals covered
     - Remarks

   * - ``*/30 * * * *``
     - ``year-02-01``
     - ``True``
     - * 00:00 - 00:30
       * 00:30 - 01:00
     - Same behavior as using the timedelta object.

   * - ``*/30 * * * *``
     - ``year-02-01``
     - ``False``
     - * 00:30 - 01:00
     -

   * - ``*/30 * * * *``
     - ``year-02-01 00:10``
     - ``True``
     - * 00:30 - 01:00
     - Interval 00:00 - 00:30 is not after the start date, and so is skipped.

   * - ``*/30 * * * *``
     - ``year-02-01 00:10``
     - ``False``
     - * 00:30 - 01:00
     - Regardless of the start date, the data intervals are aligned with hour/day/etc. boundaries.

   * - ``datetime.timedelta(minutes=30)``
     - ``year-02-01``
     - ``True``
     - * 00:00 - 00:30
       * 00:30 - 01:00
     - Same behavior as using the cron expression.

   * - ``datetime.timedelta(minutes=30)``
     - ``year-02-01``
     - ``False``
     - * 00:35 - 01:05
     - Interval is not aligned with the start date but with the current time.

   * - ``datetime.timedelta(minutes=30)``
     - ``year-02-01 00:10``
     - ``True``
     - * 00:10 - 00:40
     - Interval is aligned with the start date. The next one will be triggered in 5 minutes, covering 00:40 - 01:10.

   * - ``datetime.timedelta(minutes=30)``
     - ``year-02-01 00:10``
     - ``False``
     - * 00:35 - 01:05
     - Interval is aligned with the current time. The next run will be triggered in 30 minutes.
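
The rows of the table can be reproduced with a small standalone model. This is only a sketch built on the standard library, not the Airflow implementation: ``cron_intervals``, ``delta_intervals`` and the other helpers are hypothetical names, the year 2023 is an arbitrary stand-in for ``year``, and the cron model only handles a ``*/30``-style schedule whose period divides the day evenly.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=30)


def floor_to_step(moment, step=STEP):
    """Round down to a wall-clock boundary (midnight + k * step)."""
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((moment - midnight) // step) * step


def ceil_to_step(moment, step=STEP):
    """Round up to a wall-clock boundary, keeping exact boundaries as-is."""
    floored = floor_to_step(moment, step)
    return floored if floored == moment else floored + step


def completed_intervals(first_start, now, step):
    """List every interval starting at first_start that has ended by `now`."""
    intervals = []
    start = first_start
    while start + step <= now:
        intervals.append((start, start + step))
        start += step
    return intervals


def cron_intervals(start_date, now, catchup, step=STEP):
    """Cron-like model: intervals always sit on wall-clock boundaries."""
    first_start = ceil_to_step(start_date, step)
    if not catchup:
        # Keep only the most recent completed, boundary-aligned interval.
        first_start = max(first_start, floor_to_step(now, step) - step)
    return completed_intervals(first_start, now, step)


def delta_intervals(start_date, now, catchup, step=STEP):
    """Delta model: intervals are anchored on start_date, or on `now`
    when catchup is disabled."""
    first_start = start_date if catchup else max(start_date, now - step)
    return completed_intervals(first_start, now, step)


def fmt(intervals):
    return [f"{s:%H:%M} - {e:%H:%M}" for s, e in intervals]


now = datetime(2023, 2, 1, 1, 5)  # the DAG is enabled at 01:05
for start_date in (datetime(2023, 2, 1), datetime(2023, 2, 1, 0, 10)):
    for catchup in (True, False):
        print("cron ", start_date.time(), catchup, fmt(cron_intervals(start_date, now, catchup)))
        print("delta", start_date.time(), catchup, fmt(delta_intervals(start_date, now, catchup)))
```

The only difference between the two models is the anchor of the first interval: the cron-like model snaps it to a clock boundary, while the delta model takes it from the start date or from the current time.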
6 changes: 6 additions & 0 deletions docs/apache-airflow/core-concepts/dag-run.rst
@@ -161,6 +161,12 @@ with a data between 2016-01-01 and 2016-01-02, and the next one will be created
just after midnight on the morning of 2016-01-03 with a data interval between
2016-01-02 and 2016-01-03.

Be aware that using a ``datetime.timedelta`` object as the schedule can lead to different behavior.
In that case, the single DAG Run created will cover data between 2016-01-01 06:00 and
2016-01-02 06:00 (one schedule interval ending now). For a more detailed description of the
differences between a cron-based and a delta-based schedule, take a look at the
:ref:`timetables comparison <Differences between the cron and delta data interval timetables>`.
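
The arithmetic behind that interval is a single subtraction. The sketch below is illustrative only: ``latest_delta_interval`` is a hypothetical helper, and it assumes a one-day ``timedelta`` schedule and a current time of 2016-01-02 06:00, as implied by the dates above.

```python
from datetime import datetime, timedelta


def latest_delta_interval(now, schedule):
    """Hypothetical helper: the one schedule interval that ends at `now`."""
    return now - schedule, now


now = datetime(2016, 1, 2, 6, 0)  # assumed moment the scheduler picks up the DAG
start, end = latest_delta_interval(now, timedelta(days=1))
print(start, "->", end)
```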

If the ``dag.catchup`` value had been ``True`` instead, the scheduler would have created a DAG Run
for each completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02,
as that interval hasn't completed) and the scheduler will execute them sequentially.