
New dags fail to run #7872

Closed
cccs-cat001 opened this issue Mar 25, 2020 · 6 comments
Labels
kind:bug This is clearly a bug

Comments

@cccs-cat001
Contributor

Apache Airflow version: 1.10.9

Kubernetes version (if you are using kubernetes) (use kubectl version): N/A

Environment: JupyterLab docker image, Ubuntu 18.04 VM

  • Cloud provider or hardware configuration: Microsoft Azure

  • OS (e.g. from /etc/os-release): Ubuntu 18.04

  • Kernel (e.g. uname -a): Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools: apache-airflow

  • Others:

What happened:
On a fresh install of Airflow, I run airflow initdb and then create a DAG (bash.py):

from datetime import datetime, timedelta

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

start_date = datetime(2020, 1, 1)

args = {'owner': 'cccs-cat001', 'start_date': start_date}

dag = DAG(
    dag_id='date',
    default_args=args,
    schedule_interval=timedelta(minutes=5),
    dagrun_timeout=timedelta(minutes=60),
)

run_dag = BashOperator(task_id='show_date', bash_command='date', dag=dag)

for i in range(5):
    task = DummyOperator(task_id='dummy_' + str(i), dag=dag)
    task.set_upstream(run_dag)

Then airflow list_dags shows that the DAG exists, but running airflow trigger_dag date gives the following error:

Traceback (most recent call last):
  File "/home/artifactory/.local/bin/airflow", line 37, in <module>
    args.func(args)
  File "/home/artifactory/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 75, in wrapper
    return f(*args, **kwargs)
  File "/home/artifactory/.local/lib/python3.6/site-packages/airflow/bin/cli.py", line 237, in trigger_dag
    execution_date=args.exec_date)
  File "/home/artifactory/.local/lib/python3.6/site-packages/airflow/api/client/local_client.py", line 34, in trigger_dag
    execution_date=execution_date)
  File "/home/artifactory/.local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 124, in trigger_dag
    raise DagNotFound("Dag id {} not found in DagModel".format(dag_id))
airflow.exceptions.DagNotFound: Dag id bash not found in DagModel

And it won't run. But if you run airflow initdb and then trigger the DAG again, it works fine.

What you expected to happen: I should just be able to run the DAG without initializing the DB again...

How to reproduce it: see above

Anything else we need to know: We also run version 1.10.5, and this issue doesn't seem to happen there. Once you run airflow initdb after adding the DAG, it will always run. But add a new DAG and trigger it, and the same thing occurs.

cccs-cat001 added the kind:bug label Mar 25, 2020
@boring-cyborg

boring-cyborg bot commented Mar 25, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@kaxil
Member

kaxil commented Mar 25, 2020

This is the expected behavior. The scheduler needs to perform various validations before it can execute tasks, so the DAG must be in the database, and the database must have the needed tables (with migrations completed).
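To make the failure mode concrete: as the traceback shows, trigger_dag looks the DAG up in the metadata database (the DagModel table), not in the DAG files on disk. Here is a minimal, self-contained sketch of that lookup using an in-memory SQLite table; the simplified "dag" table and the trigger_dag function are illustrative stand-ins for Airflow's real schema and API, but the query-then-raise shape mirrors the traceback above.

```python
import sqlite3


class DagNotFound(Exception):
    pass


def trigger_dag(conn, dag_id):
    # Like airflow/api/common/experimental/trigger_dag.py, consult the
    # metadata database rather than the DAG files on disk.
    row = conn.execute(
        "SELECT dag_id FROM dag WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    if row is None:
        raise DagNotFound("Dag id {} not found in DagModel".format(dag_id))
    return "triggered {}".format(dag_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")

# The DAG file exists on disk, but nothing has written a row for it yet,
# so triggering fails just like in the reported traceback.
try:
    trigger_dag(conn, "date")
except DagNotFound as exc:
    print(exc)

# Once the scheduler (or a DagBag sync during initdb) records the DAG,
# triggering succeeds without touching the file again.
conn.execute("INSERT INTO dag (dag_id) VALUES ('date')")
print(trigger_dag(conn, "date"))
```

This is why starting the scheduler (which parses DAG files and syncs them to the database) fixes the problem just as well as rerunning initdb.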

@kaxil kaxil closed this as completed Mar 25, 2020
@cccs-cat001
Contributor Author

So when adding a new DAG to Airflow, the process is: add the DAG to the repo/directory, reinitialize the DB, then run it? That wasn't the process before; as I said, in 1.10.5 you just needed to add the DAG to the repo/directory and you could run it right away...

@kaxil
Member

kaxil commented Mar 25, 2020

So when adding a new DAG to Airflow, the process is: add the DAG to the repo/directory, reinitialize the DB, then run it? That wasn't the process before; as I said, in 1.10.5 you just needed to add the DAG to the repo/directory and you could run it right away...

No, you don't need to reinitialize it if you have already initialized it.

I don't think we had that in 1.10.5. Can you try to reproduce it using 1.10.5 in a different virtualenv with a different AIRFLOW_HOME and let me know? I think your DB might have been initialized before, which is why it worked on 1.10.5, but I'm happy to hear what you find.

@cccs-cat001
Contributor Author

I see. So the same behavior does happen in 1.10.5. If the scheduler is running, it works fine; the scheduler must be adding the DAG to the DB for me?
With that said, I think it'd be nice if the error message reminded you to have the scheduler running, or something like that. Just a thought.
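A clearer message along these lines would point straight at the usual cause. This is purely illustrative; the helper name and the exact wording are not actual Airflow code, only a sketch of the suggestion above.

```python
class DagNotFound(Exception):
    pass


def dag_not_found_error(dag_id):
    # Hypothetical: keep the existing message but append a hint about
    # the common cause (scheduler not running / DAG not yet synced).
    return DagNotFound(
        "Dag id {} not found in DagModel. Is the scheduler running? "
        "New DAG files are only registered in the metadata database "
        "once the scheduler (or initdb) has parsed them.".format(dag_id)
    )


print(dag_not_found_error("date"))
```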

@NeoWang9999

I agree with cccs-cat001; the error message doesn't make it clear what is actually going on.
