Conversation
Nice. This is a very good start. Much in line with what I was thinking.
airflow/models.py
we may want to add a state here, so that the scheduler could completely disregard DAGs that are fully processed. I'm not sure whether it should just be a boolean or whether having more states would help.
I would use a string to be able to mark failed dags as well.
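To illustrate the suggestion, here is a minimal sketch of why a string-valued state is more flexible than a boolean. The `DagModelState` constants, `DagModel` class, and `dags_to_schedule` helper are all hypothetical names invented for this example, not code from the PR:

```python
# Hypothetical sketch: a string-valued state on the DAG model, so the
# scheduler can disregard both fully processed and failed DAGs.
class DagModelState:
    RUNNING = "running"
    DONE = "done"      # fully processed; scheduler can skip it
    FAILED = "failed"  # terminal failure; also skippable


class DagModel:
    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.state = DagModelState.RUNNING


def dags_to_schedule(dags):
    # A boolean flag could only express done/not-done; a string state
    # lets the scheduler filter out completed AND failed DAGs here.
    return [d for d in dags if d.state == DagModelState.RUNNING]
```

A boolean could later be migrated to a string, but starting with a string avoids that migration.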
We may need a property
Somewhat unrelated: I've been planning on allowing the scheduler to be distributable (many scheduler instances running concurrently). It would be a matter of taking locks in DagModel, adding a …
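One way the locking idea above could work is for each scheduler instance to atomically "claim" a DAG row before processing it. The sketch below is a hypothetical illustration (the `dag_model` table, `locked_by` column, and `claim_dag` function are invented for this example, not part of the PR), using a conditional UPDATE so only one scheduler wins:

```python
import sqlite3


def claim_dag(conn, dag_id, scheduler_id):
    # Atomically claim the DAG: the UPDATE only matches if no other
    # scheduler instance has already taken the lock.
    cur = conn.execute(
        "UPDATE dag_model SET locked_by = ? "
        "WHERE dag_id = ? AND locked_by IS NULL",
        (scheduler_id, dag_id),
    )
    conn.commit()
    # rowcount == 1 only for the scheduler that won the claim
    return cur.rowcount == 1
```

With a real database backend one could use row-level locks (e.g. `SELECT ... FOR UPDATE`) instead, but the claim-column approach also gives a visible record of which instance owns each DAG.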
Can a dag have branches with tasks that are never executed within one run? If yes, then this check would not be sufficient.
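To make the concern concrete: with branching, some tasks are legitimately never executed in a run, so a check like "all tasks succeeded" is too strict. The helper names below are hypothetical, written only to illustrate the difference:

```python
def run_is_complete_naive(task_states):
    # Too strict: a run with a skipped branch never looks complete.
    return all(s == "success" for s in task_states.values())


def run_is_complete(task_states):
    # Treat tasks skipped by branching as finished too; the run is
    # complete once no task can still execute.
    return all(s in ("success", "skipped") for s in task_states.values())
```

The naive check would leave such a run "incomplete" forever, which is exactly why the sufficiency question matters here.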
I am aware that this pull request is not finished (tests, error handling, documentation). I would like to start a more concrete discussion on externally triggered DAGs, as mentioned in issue #417.
Within my company (http://blue-yonder.com) we are evaluating whether we could use airflow, and I would really love to do so. I especially liked the model you have chosen in the APIs and the possibility to define the DAGs in Python.
What we really need is the possibility to trigger DAGs externally. I read the discussion in the roadmap issue #417 and liked the ideas expressed there. I built a first prototype of the DagRun object and used it in the scheduler. Before investing further work in stabilizing it, I would like to get your feedback on whether this approach fits the existing concepts. Does it make sense from your point of view to keep working on this, or do you already have different plans/implementations?
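To sketch the idea being proposed (not the PR's actual code; field and function names here are assumptions): a DagRun records one run per DAG, flagged when it was triggered externally, so the scheduler only has to scan for run rows instead of deriving runs from the schedule alone.

```python
import datetime


class DagRun:
    """One execution of a DAG, whether scheduled or externally triggered."""

    def __init__(self, dag_id, execution_date, external_trigger=False):
        self.dag_id = dag_id
        self.execution_date = execution_date
        self.external_trigger = external_trigger
        self.state = "running"


def trigger_dag(dag_id):
    # An external caller creates a DagRun; the scheduler then picks it up
    # alongside the runs it creates itself from the schedule.
    return DagRun(dag_id, datetime.datetime.utcnow(), external_trigger=True)
```

The key design point is that scheduled and external runs share one representation, so downstream scheduler logic does not need to special-case either.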