From 2eed48e1b7ec5f9404430a6cd18f894460194e15 Mon Sep 17 00:00:00 2001
From: Alexis BRENON
Date: Fri, 7 Jul 2023 18:09:26 +0200
Subject: [PATCH] Document cron and delta timetables (#32392)

Add a comparison between delta and cron data interval timetable to
highlight differences when catchup is False.
---
 .../authoring-and-scheduling/timetable.rst    | 86 ++++++++++++++++++-
 docs/apache-airflow/core-concepts/dag-run.rst |  6 ++
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/docs/apache-airflow/authoring-and-scheduling/timetable.rst b/docs/apache-airflow/authoring-and-scheduling/timetable.rst
index 490e27d4f2084..6366e18d22883 100644
--- a/docs/apache-airflow/authoring-and-scheduling/timetable.rst
+++ b/docs/apache-airflow/authoring-and-scheduling/timetable.rst
@@ -114,6 +114,11 @@ DeltaDataIntervalTimetable
 Schedules data intervals with a time delta. Can be selected by providing a
 :class:`datetime.timedelta` or ``dateutil.relativedelta.relativedelta`` to the ``schedule`` parameter of a DAG.
 
+This timetable focuses on the length of the data interval and does not necessarily align execution dates with
+calendar boundaries such as the start of a day or of an hour.
+
+.. seealso:: `Differences between the cron and delta data interval timetables`_
+
 .. code-block:: python
 
     @dag(schedule=datetime.timedelta(minutes=30))
@@ -129,6 +134,7 @@ A timetable that accepts a cron expression, creates data intervals according to
 trigger points, and triggers a DAG run at the end of each data interval.
 
 .. seealso:: `Differences between the two cron timetables`_
+.. seealso:: `Differences between the cron and delta data interval timetables`_
 
 This can be selected by providing a string that is a valid cron expression to the ``schedule``
 parameter of a DAG as described in the :doc:`../core-concepts/dags` documentation.
@@ -171,10 +177,15 @@ first) event for the data interval, otherwise manual runs will run with a ``data
     def example_dag():
         pass
 
+
+Timetable comparisons
+---------------------
+
+
 .. _Differences between the two cron timetables:
 
 Differences between the two cron timetables
--------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 There are two timetables `CronTriggerTimetable`_ and `CronDataIntervalTimetable`_ that accepts a cron expression.
 There are some differences between the two:
@@ -183,7 +194,7 @@ There are some differences between the two:
   expect cron to behave than that of `CronDataIntervalTimetable`_ (when ``catchup`` is ``False``).
 
 Whether taking care of *Data Interval*
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 `CronTriggerTimetable`_ *does not* care the idea of *data interval*. It means the value of ``data_interval_start``,
 ``data_interval_end`` and legacy ``execution_date`` are the same - the time when a DAG run is triggered.
@@ -193,7 +204,7 @@ On the other hand, `CronDataIntervalTimetable`_ *does* care the idea of *data in
 and end of the interval respectively.
 
 The time when a DAG run is triggered
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 There is no difference between the two when ``catchup`` is ``True``.
 :ref:`dag-catchup` tells you how DAG runs are triggered when ``catchup`` is ``True``.
@@ -217,3 +228,72 @@ is immediately triggered after you re-enable the DAG.
 
 By these examples, you see how `CronTriggerTimetable`_ triggers DAG runs is more intuitive and more similar to what
 people expect cron to behave than how `CronDataIntervalTimetable`_ does.
+
+
+.. _Differences between the cron and delta data interval timetables:
+
+Differences between the cron and delta data interval timetables
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Choosing between `DeltaDataIntervalTimetable`_ and `CronDataIntervalTimetable`_ depends on your use case.
+If you enable a DAG at 01:05 on February 1st, the following table summarizes the DAG runs created (and the
+data interval that they cover), depending on three arguments: ``schedule``, ``start_date`` and ``catchup``.
+
+.. list-table::
+   :header-rows: 1
+
+   * - ``schedule``
+     - ``start_date``
+     - ``catchup``
+     - Intervals covered
+     - Remarks
+
+   * - ``*/30 * * * *``
+     - ``year-02-01``
+     - ``True``
+     - * 00:00 - 00:30
+       * 00:30 - 01:00
+     - Same behavior as using the timedelta object.
+
+   * - ``*/30 * * * *``
+     - ``year-02-01``
+     - ``False``
+     - * 00:30 - 01:00
+     -
+
+   * - ``*/30 * * * *``
+     - ``year-02-01 00:10``
+     - ``True``
+     - * 00:30 - 01:00
+     - Interval 00:00 - 00:30 starts before the start date, so it is skipped.
+
+   * - ``*/30 * * * *``
+     - ``year-02-01 00:10``
+     - ``False``
+     - * 00:30 - 01:00
+     - Regardless of the start date, data intervals are aligned with the boundaries of the cron expression.
+
+   * - ``datetime.timedelta(minutes=30)``
+     - ``year-02-01``
+     - ``True``
+     - * 00:00 - 00:30
+       * 00:30 - 01:00
+     - Same behavior as using the cron expression.
+
+   * - ``datetime.timedelta(minutes=30)``
+     - ``year-02-01``
+     - ``False``
+     - * 00:35 - 01:05
+     - The interval is aligned with the current time, not with the start date.
+
+   * - ``datetime.timedelta(minutes=30)``
+     - ``year-02-01 00:10``
+     - ``True``
+     - * 00:10 - 00:40
+     - The interval is aligned with the start date. The next run will be triggered in 5 minutes, covering 00:40 - 01:10.
+
+   * - ``datetime.timedelta(minutes=30)``
+     - ``year-02-01 00:10``
+     - ``False``
+     - * 00:35 - 01:05
+     - The interval is aligned with the current time. The next run will be triggered in 30 minutes.
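The rows of the comparison table added to ``timetable.rst`` can be cross-checked with a small, dependency-free Python model. This is a sketch under stated assumptions, not Airflow's scheduler code: the helpers ``align_down``, ``align_up``, ``cron_intervals`` and ``delta_intervals`` are hypothetical, the year 2023 stands in for ``year``, and the cron expression ``*/30 * * * *`` is modelled as interval boundaries every 30 minutes counted from midnight.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified model of the behaviour summarized in the table.
# NOT Airflow's implementation: cron boundaries are assumed to fall on
# multiples of ``step`` counted from midnight (true for "*/30 * * * *").
Interval = tuple[datetime, datetime]


def align_down(t: datetime, step: timedelta) -> datetime:
    """Latest step boundary at or before ``t``."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((t - midnight) // step) * step


def align_up(t: datetime, step: timedelta) -> datetime:
    """Earliest step boundary at or after ``t``."""
    down = align_down(t, step)
    return down if down == t else down + step


def cron_intervals(start: datetime, now: datetime, step: timedelta, catchup: bool) -> list[Interval]:
    first = align_up(start, step)  # intervals never begin before start_date
    if catchup:
        intervals, lo = [], first
        while lo + step <= now:  # only completed intervals get a run
            intervals.append((lo, lo + step))
            lo += step
        return intervals
    # Without catchup: only the latest completed, boundary-aligned interval.
    end = align_down(now, step)
    return [(end - step, end)] if end - step >= first else []


def delta_intervals(start: datetime, now: datetime, delta: timedelta, catchup: bool) -> list[Interval]:
    if catchup:
        intervals, lo = [], start  # intervals are aligned with start_date
        while lo + delta <= now:
            intervals.append((lo, lo + delta))
            lo += delta
        return intervals
    # Without catchup: one interval ending "now", aligned with the current time.
    lo = max(start, now - delta)
    return [(lo, lo + delta)] if lo + delta <= now else []


def fmt(intervals: list[Interval]) -> list[str]:
    return [f"{a:%H:%M} - {b:%H:%M}" for a, b in intervals]


now = datetime(2023, 2, 1, 1, 5)  # DAG enabled at 01:05 on February 1st
step = timedelta(minutes=30)
for start in (datetime(2023, 2, 1), datetime(2023, 2, 1, 0, 10)):
    for catchup in (True, False):
        print(f"{start:%H:%M}", catchup,
              "cron:", fmt(cron_intervals(start, now, step, catchup)),
              "delta:", fmt(delta_intervals(start, now, step, catchup)))
```

Each printed line corresponds to one (``start_date``, ``catchup``) pair of the table: the cron results always stay on :00/:30 boundaries, while the delta results follow either ``start_date`` (with catchup) or the current time (without catchup).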
diff --git a/docs/apache-airflow/core-concepts/dag-run.rst b/docs/apache-airflow/core-concepts/dag-run.rst
index 391fe958ff45e..f9ae98e3511ae 100644
--- a/docs/apache-airflow/core-concepts/dag-run.rst
+++ b/docs/apache-airflow/core-concepts/dag-run.rst
@@ -161,6 +161,12 @@ with a data between 2016-01-01 and 2016-01-02, and the next one will be created
 just after midnight on the morning of 2016-01-03 with a data interval between
 2016-01-02 and 2016-01-03.
 
+Be aware that using a ``datetime.timedelta`` object as the schedule results in different behavior.
+In such a case, the single DAG Run created will cover data between 2016-01-01 06:00 and
+2016-01-02 06:00 (one schedule interval ending now). For a more detailed description of the
+differences between a cron-based and a delta-based schedule, take a look at the
+:ref:`timetables comparison `.
+
 If the ``dag.catchup`` value had been ``True`` instead, the scheduler would have created a DAG Run
 for each completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02,
 as that interval hasn't completed) and the scheduler will execute them sequentially.
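The 2016 example in ``dag-run.rst`` can be reproduced with plain ``datetime`` arithmetic. The sketch below is an illustration under assumptions, not Airflow code: it models the cron schedule as daily intervals aligned on midnight and the ``timedelta`` schedule as one interval ending at the moment the scheduler picks the DAG up (2016-01-02 06:00), both with ``catchup`` set to ``False`` and a ``start_date`` of 2015-12-01.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of the catchup=False rule for a daily schedule.
# This is not Airflow code; "now" is when the scheduler picks the DAG up.
start_date = datetime(2015, 12, 1)
now = datetime(2016, 1, 2, 6, 0)
day = timedelta(days=1)

# Cron-style daily schedule: the latest completed interval aligned on midnight.
cron_end = now.replace(hour=0, minute=0, second=0, microsecond=0)
cron_interval = (cron_end - day, cron_end)

# timedelta schedule: one interval of length ``day`` ending now
# (start_date only matters if it is later than now - day).
delta_start = max(start_date, now - day)
delta_interval = (delta_start, delta_start + day)

print("cron: ", [f"{t:%Y-%m-%d %H:%M}" for t in cron_interval])
print("delta:", [f"{t:%Y-%m-%d %H:%M}" for t in delta_interval])
```

The only difference between the two branches is the anchor: the cron model snaps the interval end to the last midnight, while the delta model uses the current time as the interval end.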