Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskFlow unexpectedly unwinds 'dict' #27819

Open
2 tasks done
bolkedebruin opened this issue Nov 21, 2022 · 5 comments
Open
2 tasks done

TaskFlow unexpectedly unwinds 'dict' #27819

bolkedebruin opened this issue Nov 21, 2022 · 5 comments
Labels
AIP-31 Task Flow API for nicer DAG definition area:core kind:feature Feature Requests

Comments

@bolkedebruin
Copy link
Contributor

bolkedebruin commented Nov 21, 2022

Apache Airflow version

main (development)

What happened

if using the TaskFlow API and one does the following

@task()
def MyFunc(myarg: int) -> dict:
  return dict({x: "y", z: "x"})

Airflow automatically unwinds dict automatically into separate variables x and z and if you do not want that kind of bahavior you will need to set multiple_outputs=False. While this is in the documentation, it is rather non intuitive requiring someone to be aware of the documentation that the default of multiple_outputs changes depending the return type of your function. Moreover, it is unexpected behavior for people coming from plain python (e.g. someone converting their functions to Airflow tasks) and wanting to do:

@task()
def f1(myarg: int) -> dict:
  return dict({x: myarg, z: "x"})

@task
def f2(in: dict):
  print(in)

a = f1(10)
f2(a)

Which would suddenly have issues in unittesting.

cc @josh-fell @BasPH @uranusjr

What you think should happen instead

Airflow should only unwind dicts if explicitly asked to do so.

How to reproduce

No response

Operating System

Not relevant

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@bolkedebruin bolkedebruin added kind:bug This is a clearly a bug area:core labels Nov 21, 2022
@uranusjr
Copy link
Member

This is multiple output inference: https://airflow.apache.org/docs/apache-airflow/stable/tutorial/taskflow.html#multiple-outputs-inference

Personally I’ve always found this feature confusing. You could argue it’s not a good design and I would not object, but it is not a bug.

@uranusjr uranusjr added kind:feature Feature Requests and removed kind:bug This is a clearly a bug area:core labels Nov 22, 2022
@bolkedebruin
Copy link
Contributor Author

Agreed it is not a bug, but an awkward design choice. It is indeed confusing as @BasPH also encountered it and didnt know what was causing it.

@uranusjr
Copy link
Member

So it seems this is introduced in 2.0.0 via #8996, so I’m going to cc @turbaszek for some context. Type annotations have come a long way since 2020 so maybe the feature does not blend well with modern best practice anymore?

bolkedebruin added a commit to bolkedebruin/airflow that referenced this issue Nov 22, 2022
dict return values from functions were automatically unrolled
in separate values. This meant that the default of `multiple_outputs`
was changing based on the return type of the function.

This is confusing as it requires the user to be aware of this and
rather un-pythonic (explicit over implicit) and makes conversion from
plain python functions to Airflow tasks harder as unit tests would suddenly
fail.

Closes: apache#27819
@bolkedebruin
Copy link
Contributor Author

bolkedebruin commented Nov 22, 2022

I don't think it was introduced there (i.e. the automatic inference)? multiple_outputs in itself is fine of course, the change default based on return type is the part that is confusing.

@uranusjr
Copy link
Member

uranusjr commented Nov 22, 2022

It was added there, and you can see the documentation addition in the pull request.

@eladkal eladkal added AIP-31 Task Flow API for nicer DAG definition area:core labels Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-31 Task Flow API for nicer DAG definition area:core kind:feature Feature Requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants