Update tutorial docs to include a definition of operators #25012

Merged
merged 10 commits into from
Jul 18, 2022

jwitz
Contributor

@jwitz jwitz commented Jul 13, 2022

The basic Tutorial in docs introduces tasks without first introducing the concept of operators. I tried to document a basic definition of operators for this tutorial, and I updated the definition of tasks to more clearly relate to operators.

Comment on lines 108 to 110
An operator defines a unit of work for Airflow to complete. They are the most basic building blocks of DAGs.

All operators inherit from the BaseOperator, which includes all of the required arguments for
Member

@uranusjr uranusjr Jul 13, 2022

All operators inherit from the BaseOperator

This is kind of technically inaccurate since task-mapping produces operators that do not inherit from BaseOperator. It may also confuse some users since using @task (taskflow decorators) does not seem to have anything to do with operators (they do create operators behind the scenes, but it’s more of an implementation detail). So I’d probably just say something like

An operator defines a unit of work for Airflow to complete. All operator classes include required arguments for running work in Airflow. […]

Member

@potiuk potiuk Jul 13, 2022

I also do not agree that they are the most basic building blocks. Hooks, IMHO, are much more of that.

We should treat this chapter (which I agree is important to add) not only as describing the current (or rather past) approach, but also as a way to start passing a message to our users about where we want to steer their way of developing DAGs.

I think (and hope) we will manage to transition away from using Operators "by default" and toward using more TaskFlow, where the Operator concept does not exist and is replaced by a much more flexible and volatile "task". There are a few things missing for that (lineage for hooks, for one) before we can say "TaskFlow and hooks are the BEST way", but I think we will eventually get there, so I would not like to stress in the official documentation that Operators are "most basic". I would rather gently steer our users away from operators and toward the TaskFlow approach: initially combining it with operators, but finally fully transitioning to it.

What I would like here is to make sure that we:

  • mention that operators are the "classic" approach, but that using TaskFlow and Hooks is more flexible and in many cases simpler, and that operators can be combined with TaskFlow-based tasks.

  • mention that Operators are important but not necessary to build DAGs. They are one way of doing it, but "most basic building block" is not a message we want to pass.

  • mention that the more modern and more Pythonic way is TaskFlow, and link to that (while noting that "classic" operators are still OK in a number of cases).

Those are the words (and message) I think we should pass:

  • Operators are "classic" and they are heading in the direction of "legacy"
  • TaskFlow is more "modern", "flexible", and "more Pythonic"
  • We can interoperate between the "classic" and "modern" approaches in a single DAG, and generally using more TaskFlow where it makes sense is advised.

@jwitz
Contributor Author

jwitz commented Jul 13, 2022

@potiuk I updated the definition to include the language that you suggested, and added a sentence about why operators are helpful in this basic tutorial: I think the bit-shift operators in this tutorial help illustrate the shape of a DAG, which is a little bit harder to visualize if you jump directly into using TaskFlow.
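
As a rough illustration of why bit-shift syntax makes the shape of a DAG easy to see, here is a minimal pure-Python sketch. It is a toy model, not Airflow's implementation: the `Task` class, its `downstream` list, and the `edges` helper are all hypothetical names invented for this example.

```python
class Task:
    """Toy stand-in for a task node; not Airflow's real class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that should run after this one

    def __rshift__(self, other):
        # `a >> b` records the edge a -> b and returns b so chains keep going.
        self.downstream.append(other)
        return other

    def __lshift__(self, other):
        # `a << b` records the edge b -> a and returns b.
        other.downstream.append(self)
        return other


def edges(*tasks):
    """List every (upstream, downstream) pair as task_id tuples."""
    return [(t.task_id, d.task_id) for t in tasks for d in t.downstream]


extract = Task("extract")
transform = Task("transform")
load = Task("load")

# Reads left to right in the same order the work would run.
extract >> transform >> load

print(edges(extract, transform, load))
```

The dependency line reads like the graph itself, which is the property the tutorial relies on when it introduces operators first.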

Member

@potiuk potiuk left a comment

LGTM now.

@potiuk
Member

potiuk commented Jul 13, 2022

> @potiuk I updated the definition to include the language that you suggested, and added a sentence about why operators are helpful in this basic tutorial: I think the bit-shift operators in this tutorial help illustrate the shape of a DAG, which is a little bit harder to visualize if you jump directly into using TaskFlow.

Yeah. There are a few things that might be improved in the TaskFlow approach. I wonder if we can make the << >> operators work with TaskFlow. Should be possible, @uranusjr?

@uranusjr
Member

I believe those operators already work right now.
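
To make the idea concrete, here is a small pure-Python sketch of how decorator-created and class-created tasks could share the same `>>` operator in one graph. It is only a toy model under stated assumptions: `Node`, the `task` decorator, and the function names are hypothetical and are not Airflow's real classes or API.

```python
import functools


class Node:
    """Toy graph node; hypothetical, not an Airflow class."""

    def __init__(self, task_id, fn=None):
        self.task_id = task_id
        self.fn = fn  # optional callable holding the actual work
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` records the edge a -> b and returns b for chaining.
        self.downstream.append(other)
        return other


def task(fn):
    """Hypothetical decorator: calling the wrapped function builds a Node
    instead of running the function body."""

    @functools.wraps(fn)
    def factory():
        return Node(fn.__name__, fn)

    return factory


@task
def fetch():
    return "data"


@task
def report():
    return "done"


# A "classic" node built directly from the class.
cleanup = Node("cleanup")

# Decorator-style nodes are created by calling the decorated functions.
fetch_node = fetch()
report_node = report()

# Both styles participate in the same dependency chain.
fetch_node >> cleanup >> report_node

print([d.task_id for d in fetch_node.downstream])
print([d.task_id for d in cleanup.downstream])
```

Because both kinds of object expose the same `__rshift__` method, the chain does not need to know how each node was created.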

@potiuk
Member

potiuk commented Jul 18, 2022

Static checks unrelated. Fix is coming. Merging

@potiuk potiuk merged commit cf41bca into apache:main Jul 18, 2022
@boring-cyborg

boring-cyborg bot commented Jul 18, 2022

Awesome work, congrats on your first merged pull request!

@ephraimbuddy ephraimbuddy added the type:doc-only Changelog: Doc Only label Aug 14, 2022

4 participants