Implement Metadata to emit runtime extra #38650

Merged
merged 12 commits into from
Apr 8, 2024

uranusjr (Member) commented Apr 1, 2024

Closes #37810. This implements the yield syntax, the Metadata class, and a mechanism to collect the yielded values in the worker.
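
To make the described mechanism concrete, here is a minimal, self-contained Python sketch: a task callable yields Metadata values, and the worker collects them. `Metadata` and `run_callable` are hypothetical stand-ins, not Airflow's actual classes or signatures. The sketch only shows how a worker can drain a generator callable, collect what it yields, and still recover the callable's return value.

```python
from __future__ import annotations

import inspect
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Metadata:
    """Hypothetical stand-in: extra information for one dataset event."""

    uri: str
    extra: dict[str, Any] = field(default_factory=dict)


def run_callable(
    func: Callable[..., Any], *args: Any, **kwargs: Any
) -> tuple[Any, list[Metadata]]:
    """Call func and return (return_value, collected Metadata).

    A plain function is simply called. A generator function is drained:
    every yielded Metadata is collected, and the generator's ``return``
    value (carried by StopIteration.value) is kept as the return value.
    """
    if not inspect.isgeneratorfunction(func):
        return func(*args, **kwargs), []

    collected: list[Metadata] = []
    gen = func(*args, **kwargs)
    while True:
        try:
            item = next(gen)
        except StopIteration as stop:
            return stop.value, collected
        if not isinstance(item, Metadata):
            # This sketch only accepts Metadata; the real behavior may differ.
            raise TypeError(f"unexpected yielded value: {item!r}")
        collected.append(item)


# Usage: a task callable that annotates its output and also returns a value.
def produce_table():
    yield Metadata("s3://bucket/table", {"row_count": 42})
    return "done"


result, events = run_callable(produce_table)
```

The design choice shown here is that plain callables keep working unchanged: only generator functions take the collecting path.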

uranusjr requested a review from jscheffl April 1, 2024 06:40
boring-cyborg (bot) added the area:core-operators (Operators, Sensors and hooks within Core Airflow) and kind:documentation labels Apr 1, 2024
Review thread on airflow/operators/python.py (outdated, resolved)
eladkal (Contributor) left a comment

Nice!

eladkal added this to the Airflow 2.9.0 milestone Apr 1, 2024
jscheffl (Contributor) left a comment

I understand the desire to add the option to yield, but I do not see why the execution internals of the PythonOperator implementation need method-signature changes for this. I feel that DataSetEvents should follow the same approach as other context information, so the callable execution should not need a signature change.

Review thread on airflow/operators/python.py (outdated, resolved)
The dataset URI coercing logic has been extracted into its own
(internal) function for reuse.
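
The commit note above mentions extracting the dataset URI coercion into an internal function. The exact coercion rules are not spelled out here, so the following is only a hypothetical sketch. It assumes the helper accepts either a dataset object or a plain URI string. `Dataset` and `coerce_uri` are illustrative names, not Airflow's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical minimal stand-in for a dataset identified by a URI."""

    uri: str


def coerce_uri(target: Dataset | str) -> str:
    """Hypothetical helper: turn a Dataset or a plain URI string into the URI.

    Keeping this in one small internal function lets every call site
    (for example, a Metadata constructor) share the same rules.
    """
    if isinstance(target, Dataset):
        return target.uri
    if isinstance(target, str):
        return target
    raise TypeError(f"expected Dataset or str, got {type(target).__name__}")
```
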
Also reverted the check for task.pre_execute and task.post_execute, since
those are defined on BaseOperator and shouldn't be generator functions.
Instead, those functions need to individually check whether the hooks
wrapped inside them are generator functions.
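
One way to read the note above: `pre_execute` stays an ordinary method, and that method decides, per wrapped hook, whether to call the hook or drain it as a generator. The class and attribute names below (`SketchOperator`, `on_pre_execute`, `collected`) are hypothetical. This is an illustration under that assumption, not BaseOperator's implementation.

```python
from __future__ import annotations

import inspect
from typing import Any, Callable, Optional


class SketchOperator:
    """Hypothetical operator: pre_execute itself is never a generator."""

    def __init__(self, on_pre_execute: Optional[Callable[[dict], Any]] = None) -> None:
        self.on_pre_execute = on_pre_execute  # user-supplied hook (may be a generator function)
        self.collected: list[Any] = []  # values yielded by generator hooks

    def pre_execute(self, context: dict) -> None:
        hook = self.on_pre_execute
        if hook is None:
            return
        if inspect.isgeneratorfunction(hook):
            # Drain the wrapped generator hook and keep what it yielded.
            self.collected.extend(hook(context))
        else:
            # Ordinary hook: just call it.
            hook(context)
```
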
The key is not always available. If it isn't, we just create an accessor
object on the fly.
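
The last commit note can be sketched as a simple fallback. This assumes the accessor lives in a dict-like context under a key such as `outlet_events` (the key name is an assumption here), and that a fresh accessor is acceptable when the key is absent. `EventAccessors` and `get_accessors` are hypothetical names for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


class EventAccessors:
    """Hypothetical accessor: maps a dataset URI to a mutable extra dict."""

    def __init__(self) -> None:
        self._extras: dict[str, dict[str, Any]] = defaultdict(dict)

    def __getitem__(self, uri: str) -> dict[str, Any]:
        return self._extras[uri]


def get_accessors(context: dict[str, Any]) -> EventAccessors:
    accessors = context.get("outlet_events")  # assumed key name
    if accessors is None:
        # Key missing: build a throwaway accessor instead of failing.
        accessors = EventAccessors()
    return accessors
```

Because the fresh accessor is not written back into the context, anything recorded on it is simply dropped. That matches the "on the fly" wording, but it is an interpretation.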
uranusjr force-pushed the task-yield-metadata branch 2 times, most recently from c3cd16d to 5b929fc April 3, 2024 01:55
uranusjr requested a review from jscheffl April 3, 2024 07:08
jscheffl (Contributor) left a comment

Thanks for the rework. Now I really "like" it. Before, I would have stepped back from the review, but I would not have blocked the merge :-D

One minor suggestion: it would be great to also use this in one of the example DAGs, for example in example_producer_1 or 2. Does it make sense to add it to one of the examples? (I propose this on top of the docs, which are good; sometimes people look for inspiration in the example code, and not many people read the docs before designing or coding.)

uranusjr merged commit e04af7e into apache:main Apr 8, 2024
41 checks passed
uranusjr deleted the task-yield-metadata branch April 8, 2024 02:06
odaneau-astro pushed a commit to odaneau-astro/airflow that referenced this pull request Apr 8, 2024
utkarsharma2 pushed a commit to astronomer/airflow that referenced this pull request Apr 22, 2024
ephraimbuddy added the type:new-feature (Changelog: New Features) label Jun 3, 2024
Labels: area:core-operators (Operators, Sensors and hooks within Core Airflow), kind:documentation, type:new-feature (Changelog: New Features)
Development

Successfully merging this pull request may close these issues.

Annotate a Dataset Event in the Source Task
4 participants