Update tutorial docs to include a definition of operators #25012
docs/apache-airflow/tutorial.rst
An operator defines a unit of work for Airflow to complete. They are the most basic building blocks of DAGs.

All operators inherit from the BaseOperator, which includes all of the required arguments for
All operators inherit from the BaseOperator
This is technically inaccurate, since task-mapping produces operators that do not inherit from BaseOperator. It may also confuse some users, since using @task (TaskFlow decorators) does not seem to have anything to do with operators (they do create operators behind the scenes, but that is more of an implementation detail). So I'd probably just say something like:
An operator defines a unit of work for Airflow to complete. All operator classes include required arguments for running work in Airflow. […]
I also do not agree that they are the most basic building blocks. Hooks, IMHO, are much more of that.
We should treat this chapter (which I agree is important to add) not only as describing the current (or rather past) approach, but also as a way to start passing a message to our users about where we want to steer their way of developing DAGs.
I think (and hope) we will manage to transition away from using Operators "by default" toward using more TaskFlow, where the Operator concept does not exist and is replaced by a much more flexible and volatile "task". There are a few things missing for that (lineage for hooks, for one) before we can say "TaskFlow and hooks are the BEST way", but I think we will eventually get there, so I would not like to stress in the official documentation that Operators are "most basic". I would rather gently steer our users away from using operators and toward the TaskFlow approach: initially combining it with operators, but finally transitioning to it fully.
What I would like to make sure of here is that we:
- mention that operators are the "classic" approach, but that using TaskFlow and Hooks is more flexible and in many cases simpler, and that operators can be combined with TaskFlow-based tasks.
- mention that Operators are important but not necessary to build DAGs. They are one way of doing it, but "most basic building block" is not a message we want to pass.
- mention that the more modern and more Pythonic way is TaskFlow, and link to it (while noting that "classic" operators are still OK in a number of cases).
Those are the words (and message) I think we should pass:
- Operators are "classic" and are heading in the direction of "legacy"
- TaskFlow is more "modern", "flexible", and "Pythonic"
- We can interoperate between the "classic" and "modern" approaches in a single DAG, and generally using more TaskFlow where it makes sense is advised.
@potiuk I updated the definition to include the language that you suggested, and added a sentence about why operators are helpful in this basic tutorial: I think the bit-shift operators in this tutorial help illustrate the shape of a DAG, which is a little bit harder to visualize if you jump directly into using TaskFlow.
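To make the point about bit-shift operators concrete, here is a minimal sketch of how `>>` and `<<` can express the shape of a DAG. It is a toy model in plain Python, not Airflow's implementation: `ToyTask`, `edges`, and the task names `extract`, `transform`, and `load` are hypothetical and exist only for illustration.

```python
# Toy stand-in for a task. This is NOT Airflow's implementation; it only
# shows how Python's shift operators can record "runs before" edges.
class ToyTask:
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = []    # tasks that must finish before this one
        self.downstream = []  # tasks that run after this one

    def __rshift__(self, other):
        """self >> other: `other` runs after `self`."""
        self.downstream.append(other)
        other.upstream.append(self)
        return other  # return the right-hand side so chains read left to right

    def __lshift__(self, other):
        """self << other: `self` runs after `other`."""
        other >> self
        return other


def edges(tasks):
    """List (upstream_id, downstream_id) pairs for the given tasks."""
    return [(t.task_id, d.task_id) for t in tasks for d in t.downstream]


# Hypothetical three-step pipeline.
extract = ToyTask("extract")
transform = ToyTask("transform")
load = ToyTask("load")

extract >> transform >> load  # the chain mirrors the shape of the graph

dag_edges = edges([extract, transform, load])
```

Because the chain reads in execution order, the dependency graph is visible in a single line, which is the property the comment above attributes to the tutorial's bit-shift syntax.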
LGTM now.
Yeah. There are a few things that might be improved in the TaskFlow approach. I wonder if we can make the << and >> operators work with TaskFlow. Should be possible, @uranusjr?
I believe those operators already work right now.
Static checks unrelated. Fix is coming. Merging.
Awesome work, congrats on your first merged pull request!
The basic Tutorial in docs introduces tasks without first introducing the concept of operators. I tried to document a basic definition of operators for this tutorial, and I updated the definition of tasks to more clearly relate to operators.