FAQ entry about start_date
mistercrunch committed Mar 5, 2016
1 parent 6832671 commit be69d39
Showing 2 changed files with 41 additions and 1 deletion.
4 changes: 3 additions & 1 deletion airflow/models.py
@@ -1419,7 +1419,9 @@ class derived from this one results in the creation of a task object,
start_date are offset in a way that their execution_date don't line
up, A's dependencies will never be met. If you are looking to delay
a task, for example running a daily task at 2AM, look into the
``TimeSensor`` and ``TimeDeltaSensor``.
``TimeSensor`` and ``TimeDeltaSensor``. We advise against using
dynamic ``start_date`` and recommend using fixed ones. Read the
FAQ entry about start_date for more information.
:type start_date: datetime
:param end_date: if specified, the scheduler won't go beyond this date
:type end_date: datetime
38 changes: 38 additions & 0 deletions docs/faq.rst
@@ -60,3 +60,41 @@ documentation

- Verify that the ``fernet_key`` defined in ``$AIRFLOW_HOME/airflow.cfg`` is a valid Fernet key. It must be a base64-encoded 32-byte key. You need to restart the webserver after you update the key
- For existing connections (the ones that you had defined before installing ``airflow[crypto]`` and creating a Fernet key), you need to open each connection in the connection admin UI, re-type the password, and save it

**What's the deal with ``start_date``?**

``start_date`` is partly legacy from the pre-DagRun era, but it is still
relevant in many ways. When creating a new DAG, you probably want to set
a global ``start_date`` for your tasks using ``default_args``. The first
DagRun to be created will be based on the ``min(start_date)`` for all your
tasks. From that point on, the scheduler creates new DagRuns based on
your ``schedule_interval`` and the corresponding task instances run as your
dependencies are met. When introducing new tasks to your DAG, you need to
pay special attention to ``start_date``, and may want to reactivate
inactive DagRuns to get the new task onboarded properly.
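
A minimal sketch of this setup, with a hypothetical DAG id and task id; all
tasks inherit the same fixed ``start_date`` through ``default_args``:

.. code-block:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # All tasks share this fixed start_date via default_args, so
    # min(start_date) across tasks is well defined and stable.
    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2016, 3, 1),
    }

    dag = DAG('my_dag', default_args=default_args, schedule_interval='@daily')

    do_something = DummyOperator(task_id='do_something', dag=dag)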

We recommend against using dynamic values as ``start_date``, especially
``datetime.now()``, as it can be quite confusing. The task is triggered
once its period closes, and in theory an ``@hourly`` DAG would never run,
as its period would always end an hour after ``now()`` and ``now()`` keeps
moving along.
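
As a quick contrast, here is a sketch (hypothetical DAG ids) of the
anti-pattern next to the recommended fixed ``start_date``:

.. code-block:: python

    from datetime import datetime

    from airflow import DAG

    # Avoid: start_date is re-evaluated every time the file is parsed, so
    # the first schedule period keeps sliding and may never close.
    bad_dag = DAG('my_dynamic_dag', start_date=datetime.now(),
                  schedule_interval='@hourly')

    # Prefer: a fixed, rounded start_date in the past.
    good_dag = DAG('my_fixed_dag', start_date=datetime(2016, 3, 1),
                   schedule_interval='@hourly')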

We also recommend using a rounded ``start_date`` in relation to your
``schedule_interval``. This means an ``@hourly`` job would start at ``00:00``
minutes:seconds, a ``@daily`` job at midnight, and a ``@monthly`` job on the
first of the month. You can use any sensor, such as a ``TimeDeltaSensor``, to
delay the execution of tasks within that period. While ``schedule_interval``
does allow specifying a ``datetime.timedelta`` object, we recommend using the
macros or cron expressions instead, as they enforce this idea of rounded
schedules.
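
A sketch of this pattern, assuming the ``TimeDeltaSensor`` import path of
Airflow at the time (``airflow.operators.sensors``) and a hypothetical DAG id:
a daily DAG anchored at midnight delays its work until 2AM within each period.

.. code-block:: python

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.sensors import TimeDeltaSensor

    dag = DAG(
        'example_rounded_start',
        start_date=datetime(2016, 3, 1),  # midnight, rounded to the @daily interval
        schedule_interval='@daily',
    )

    # Holds downstream tasks until two hours into each daily period, i.e. 2AM.
    wait_until_2am = TimeDeltaSensor(
        task_id='wait_until_2am',
        delta=timedelta(hours=2),
        dag=dag,
    )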

When using ``depends_on_past=True`` it's important to pay special attention
to ``start_date``, as the past dependency is only waived for the run scheduled
at the task's ``start_date``. It's also important to watch DagRun activity
status over time when introducing new ``depends_on_past=True`` tasks, unless
you are planning on running a backfill for the new task(s).
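
A minimal sketch of turning this on through ``default_args`` (hypothetical DAG
and task ids):

.. code-block:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    default_args = {
        'start_date': datetime(2016, 3, 1),
        # Each run of a task waits on the success of its own previous run.
        'depends_on_past': True,
    }

    dag = DAG('example_depends_on_past', default_args=default_args,
              schedule_interval='@daily')

    do_something = DummyOperator(task_id='do_something', dag=dag)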

Also important to note is that a task's ``start_date``, in the context of a
backfill CLI command, gets overridden by the backfill command's ``start_date``.
This allows a backfill on tasks that have ``depends_on_past=True`` to actually
start; if it weren't the case, the backfill just wouldn't start.
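
For example, assuming a DAG named ``my_dag``, a backfill of January 2016 could
be triggered like this; the ``-s`` date overrides the tasks' ``start_date`` for
that run:

.. code-block:: bash

    airflow backfill my_dag -s 2016-01-01 -e 2016-01-31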
