Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI: airflow dags test { dag w/ schedule_interval=None } error: "No run dates were found" #18473

Closed
1 of 2 tasks
MatrixManAtYrService opened this issue Sep 23, 2021 · 7 comments · Fixed by #18742
Closed
1 of 2 tasks
Labels
affected_version:2.2 Issues Reported for 2.2 area:core kind:bug This is a clearly a bug
Milestone

Comments

@MatrixManAtYrService
Copy link
Contributor

Apache Airflow version

2.2.0b2 (beta snapshot)

Operating System

ubuntu 20.04

Versions of Apache Airflow Providers

n/a

Deployment

Virtualenv installation

Deployment details

pip install /path/to/airflow/src

What happened

Given any DAG initialized with: schedule_interval=None

Run airflow dags test mydagname $(date +%Y-%m-%d) and get an error:

INFO - No run dates were found for the given dates and dag interval.

This behavior changed in #15397, it used to trigger a backfill dagrun at the given date.

What you expected to happen

I expected a backfill dagrun with the given date, regardless of whether it fit into the schedule_interval.

If AIP-39 made that an unrealistic expectation, then I'd hope for some way to define unscheduled dags which can still be tested from the command line (which, so far as I know, is the fastest way to iterate on a DAG.).

As it is, I keep changing schedule_interval back and forth depending on whether I want to iterate via astro dev start (which tolerates None but does superfluous work if the dag is scheduled) or via airflow dags test ... (which doesn't tolerate None).

How to reproduce

Initialize a DAG with: schedule_interval=None and run it via airflow dags test mydagname $(date +%Y-%m-%d)

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@MatrixManAtYrService MatrixManAtYrService added area:core kind:bug This is a clearly a bug labels Sep 23, 2021
@MatrixManAtYrService MatrixManAtYrService changed the title CLI: airflow dags test { any dag with schedule_interval=None } causes error: "No run dates were found" CLI: airflow dags test { dag w/ schedule_interval=None } causes error: "No run dates were found" Sep 23, 2021
@MatrixManAtYrService MatrixManAtYrService changed the title CLI: airflow dags test { dag w/ schedule_interval=None } causes error: "No run dates were found" CLI: airflow dags test { dag w/ schedule_interval=None } error: "No run dates were found" Sep 23, 2021
@MatrixManAtYrService
Copy link
Contributor Author

I had filed this issue in the wrong repo previously (oops). Here's a comment about it from @uranusjr :

Hmm, this is some weird inherited design. So airflow dag test triggers backfill runs (in this instance one backfill run on start_date) but ignores all the scheduling limitations specifies by the DAG. It schedules backfills even for a None schedule (which logically never backfills), and ignore start_date, end_date, etc. The timetable implementation unified the scheduling logic and causes a None schedule to never backfill, and a backfill to never happen outside of the DAG and its tasks’ start_date and end_date.

I think we have a design decision to make. We could stick to the current logic and change airflow dag test to trigger manual runs instead, but that’s affect the behaviour of DAGs that have a schedule since now the backfilled time to run more than once (in the previous implementation, if a date was run by airflow dag test, the scheduler won’t schedule a backfill at that time again). Or we need to add some additional logic to the timetable protocol (and users’ custom timetables) to account for this “hey I know this timetable don’t schedule backfills during this time but please give me a backfilling schedule anyway” logic.

@ashb ashb added this to the Airflow 2.2.0 milestone Sep 23, 2021
@ashb ashb added the affected_version:2.2 Issues Reported for 2.2 label Sep 23, 2021
@uranusjr
Copy link
Member

uranusjr commented Sep 23, 2021

Some additional unorganised thoughts:

  1. It’s possible to match the pre-AIP-39 behaviour.
  2. But I am not very keen to do it and would prefer other possibilities if possible because the previous design really does not make sense.
  3. For this specific use case, changing the test runs to manual would be sufficient. But that may cause issues for DAGs that do have a backfilling schedule (changing the type to manual would result in duplicated runs, one backfilled and one manual).
  4. Do we really need airflow dags test now since we can do airflow dags trigger -e? This has clearer semantic (obviously manual), and airflow backfill -s ... -e ... is available if you want backfill runs instead. Maybe we can match the old behaviour and deprecate the entire command instead.

@MatrixManAtYrService
Copy link
Contributor Author

regarding (4), the trouble with using those commands as a substitute for airflow dags test is that they don't show you the task logs, so they're not especially useful if your dev workflow looks like:

  • edit mydag.py
  • airflow dags test mydag
  • look at task logs
  • edit some more

Also, they require that you have a scheduler running, which is a small step, but still an extra setup/teardown to keep track of.

@uranusjr
Copy link
Member

uranusjr commented Sep 24, 2021

That’s a good point. It’s difficult to have the same behaviour the same for trigger right now (needs some refactoring), so if this is an important feature I’m inclined to restore the previous behaviour for 2.2 and plan for a deprecation/unification later. (I’m still not entirely convinced we should cover schedule_interval=None though, it really feels weird).

@uranusjr
Copy link
Member

cc @ashb for thoughts.

@uranusjr
Copy link
Member

uranusjr commented Oct 5, 2021

I've submitted a hack to restore the behaviour #18742. I'm still not entirely convinced this is the right thing to do; maybe eventually we should get rid of airflow dags test entirely and add a flag to trigger and backfill instead to make them block and emit logs to the CLI (i.o.w. allow the user to run them with DebugExecutor instead of the default LocalExecutor) to cover the use case. This would make more semantical sense because it'd be obvious to the user what kind of run they are requesting (backfill or manual), and it'd be easier to understand why a backfill does not trigger any runs at all.

@dstandish dstandish reopened this Oct 5, 2021
@dstandish
Copy link
Contributor

maybe eventually we should get rid of airflow dags test entirely and add a flag to trigger and backfill instead to make them block and emit logs to the CLI (i.o.w. allow the user to run them with DebugExecutor instead of the default LocalExecutor) to cover the use case. This would make more semantical sense because it'd be obvious to the user what kind of run they are requesting (backfill or manual), and it'd be easier to understand why a backfill does not trigger any runs at all.

Yeah now that you mention it, dags test is not not super obvious that it's actually running the task. One could think it is running tests for the dag.

kaxil pushed a commit that referenced this issue Oct 6, 2021
This restores the behavior around this prior to AIP-39 implementation. It is arguably not correct, but nobody ever complained about it (and they have to the new behavior), so we should meet user expectations.

Close #18473
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affected_version:2.2 Issues Reported for 2.2 area:core kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants