Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using .output on non-templated fields #27285

Open
potiuk opened this issue Oct 26, 2022 Discussed in #26938 · 6 comments
Open

Using .output on non-templated fields #27285

potiuk opened this issue Oct 26, 2022 Discussed in #26938 · 6 comments
Labels

Comments

@potiuk
Copy link
Member

potiuk commented Oct 26, 2022

Discussed in #26938

Originally posted by stephenonethree October 7, 2022
I just discovered the .output property functionality that apparently was released in Airflow 2 for classic operators, as a simple way of accessing their output XComs. I think that this is a super useful feature because it would allow simpler connections between tasks than what I have been doing until now.

Until now, I've been explicitly giving a downstream task the task_ids and XCom names that it needs to pull from its upstreams (as hardcoded string parameters). Something like:

# pushes XCom named myvar
upstream_task=SomeOperator(task_id='upstream_task')

# this is a custom operator and within execute() I have a line roughly like:
# actual_input = context['task_instance'].xcom_pull(
#     dag_id=context['dag'].dag_id, task_ids=upstream_task_id, key=upstream_xcom_name)
this_task = OtherOperator(
    task_id='this_task',
    upstream_task_id='upstream_task',
    upstream_xcom_name='myvar'
)
upstream_task >> this_task

With .output I could simplify this to:

# pushes XCom named myvar
upstream_task=SomeOperator(task_id='upstream_task')

this_task = OtherOperator(
    task_id='this_task',
    actual_input=upstream_task.output['myvar']
)
upstream_task >> this_task

Unfortunately it seems that there is one limitation. On the TaskFlow documentation page (https://airflow.apache.org/docs/apache-airflow/2.4.1/tutorial/taskflow.html#consuming-xcoms-between-decorated-and-traditional-tasks) it says: Using the .output property as an input to another task is supported only for operator parameters listed as a template_field.

I don't use Jinja templating for very many of my parameters as it's mostly irrelevant for me. So my questions are, is there a technical reason for this limitation? If not, is this limitation something that you are considering dropping in a future Airflow version? I suppose I could just make these fields templated to get around it, but I don't really want to turn on templating if I don't expect to need it, since I suppose it introduces the possibility of incorrect interpolation (though perhaps that's a remote possibility because I don't think most of my variables will include {{ or }}.

Let me know if I should file this as a feature request instead, for now I guess the Ideas category works.

@uranusjr
Copy link
Member

I don’t think there’s a strict technical limitation, but a user perception considertation: Not all fields can reference an XComArg (e.g. outlets and executor_config can’t), so even if the limitation is lifted, it still wouldn’t be all fields. Instead of trying to teach users what fields can reference dynamic values and what can’t, it’s easier to positively list what can, and template_fields is an easy one to go with.

@potiuk
Copy link
Member Author

potiuk commented Oct 27, 2022

Following the comment - I have a bold proposal ... It's not exaclty what the original proposal is, but in a way it provides a possibility to do what was originally requested here.

Why don't we add an option (disabled by default) to make ALL ELIGIBLE fields - "templated_fields" (and automatically .output -capable).

That bothered me for a while but I think there is very little impact of making all fields templated and often people complained that templated fields. Performance overhead should be negligible (just walking through parameters and jinjafying them which in most cases will be no-op).

The only drawback it might have is that the if a string contains " {{}}" acidentally - this will be replaced with "" - which is backwards-incompatible. We could also provide a mechanism that would eclude a field from being templated just in case.

I think that has a number of benefits - for example our users will not have extend operators that miss some fields in "templated_fields".

I am not too worried anout "outlets" and executor_config not being available for .output and user's education. As long as we simply error out in this case that should be good.

In a way it woudl be similar to render_template_as_native_obj DAG paraemeter.

@rkarish
Copy link
Contributor

rkarish commented Jan 5, 2023

@potiuk Is this bold proposal still something you want to move forward with? I think it sounds interesting and I see how it can be useful to the Users. I can make an attempt at implementing the solution you have proposed.

@potiuk
Copy link
Member Author

potiuk commented Jan 6, 2023

Sure. Why not. I have not seen an opposition so far, and I think it's worth doing - I also see no big drawback if it's going to be similar setting to render_templates_as_native_obj that you set for the whole DAG.

@rkarish
Copy link
Contributor

rkarish commented Jan 19, 2023

Hey Jarek, I've opened a PR for this. I took a slightly different approach than a DAG level setting. I managed the functionality at the Operator level, but this can still be applied to the whole DAG with the use of default_args. I think that this approach gives the Users more flexibility. Looking forward to getting some feedback!

@potiuk
Copy link
Member Author

potiuk commented Jan 19, 2023

Hey Jarek, I've opened a PR for this. I took a slightly different approach than a DAG level setting. I managed the functionality at the Operator level, but this can still be applied to the whole DAG with the use of default_args. I think that this approach gives the Users more flexibility. Looking forward to getting some feedback!

Very nice. I asked others for input, but I like the way you proposed.

rkarish added a commit to rkarish/airflow that referenced this issue Jan 20, 2023
rkarish added a commit to rkarish/airflow that referenced this issue Jan 20, 2023
rkarish added a commit to rkarish/airflow that referenced this issue Jan 20, 2023
rkarish added a commit to rkarish/airflow that referenced this issue Jan 20, 2023
rkarish added a commit to rkarish/airflow that referenced this issue Jan 20, 2023
rkarish added a commit to rkarish/airflow that referenced this issue Jan 21, 2023
rkarish added a commit to rkarish/airflow that referenced this issue Jan 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants