[AIP-31] Create XComArg model #8052

casassg · 2020-04-02T02:17:01Z

Description

XComArg is a class that references an XCom entry from an operator. Can be used to explicitly set XCom dependencies and explictly pass results between operators.

Class attributes:

operator: BaseOperator: Origin operator instance.
key: str: Stores key for XCom value. Defaults to airflow.models.xcom. XCOM_RETURN_KEY

Class methods

get(context: dict)-> Any: Resolves XCom value given operator and key.
__getitem__(key, str) -> XComArg: Easy method to create new XComArg using the same operator but a different key.
add_downstream_dependent(operator: BaseOperator)->None: Add an operator as a downstream dependency of operator. [Optional] Makes this task simpler.

Proposed implementation: https://github.com/casassg/corrent/blob/master/corrent/xcom_arg.py

Use case / motivation

Explicit way to pass around results between operators.
This object is a reference to an XCom value that has not been created and will need to be resolved in the future.
It can generate new XComArgs that point to the same operator but has a different key by using the key accessor. E.g: output['train'].
To do so, it includes as class attributes both the originating operator instance and a key (both needed to pull the XCom).
Has a property op that allows access to the origin operator.

Stretch goal

Store type class for the XCom and lie to MyPy when trying to check class signature.

The text was updated successfully, but these errors were encountered:

boring-cyborg · 2020-04-02T02:17:02Z

Thanks for opening your first issue here! Be sure to follow the issue template!

turbaszek · 2020-04-17T12:03:09Z

Hey @casassg any update on this issue? :)

casassg · 2020-04-18T01:17:24Z

I've been a bit overwhelmed with the work from home situation. Will try to get to this on the weekend or next week. Otherwise, please feel free to take it from me 😄

jonathanshir · 2020-04-22T17:11:29Z

Hey guys, the databand team and I decided to implement [AIP-31] starting tomorrow.
Will be in touch with you soon for implementation discussions.
Thanks, Johnny :)

casassg · 2020-04-22T23:39:40Z

Sounds great to me! I can help review and give directions. Let's sync some time! I also do plan to contribute some part

evgenyshulman · 2020-04-25T13:43:43Z

now the main purpose of this XComArgs is about being able to inject “dependency” on another operator + route to fetch that dependency from XCom. I assume generic class is required around jinja templates baking + defining dependencies in the Arg. Let’s create class ArgBase that will have these 3 interfaces defined :

it can be “resolved”, it can be represented as Jinja template
it can provide list of dependencies ( list of upstream operators)
Jinja string that represents it

turbaszek · 2020-04-25T18:44:03Z

Should "normal" operators be also XComArgs?

Once we have functional operators, should I as a user expect this DAG to work?

with DAG("example"):
    op1 = CreateClusterOperator("op", ...)
    op2 = RunJobOperator("op2", cluster_name=op1)

# Or with custom XCom key

with DAG("example"):
    op1 = CreateClusterOperator("op", ...)
    op2 = RunJobOperator("op2", cluster_name=op1["cluster_id"])

If no, why not?

casassg · 2020-04-25T19:09:36Z

it can be represented as Jinja template

I do not agree on this. Why do we need to stick to Jinja templates? Seems we are trying to fit the new solution to the existing workaround, when we should think of this as a more generic solution. I dont see why we need a ArgBase class. Could you illustrate with an example?

Should "normal" operators be also XComArgs?

No let's use #8055 instead:

with DAG("example"):
    op1 = CreateClusterOperator("op", ...)
    op2 = RunJobOperator("op2", cluster_name=op1.output)

# Or with custom XCom key

with DAG("example"):
    op1 = CreateClusterOperator("op", ...)
    op2 = RunJobOperator("op2", cluster_name=op1.output["cluster_id"])

evgenyshulman · 2020-04-25T19:20:35Z

An example would be cases when people build command lines, text messages, SQL queries, and so on based on the previous operator's results
for example:
1. BashOperator with bash_command ( that is expected to be string)
BashOperator( "do_something --cluster-id %s" %..)
2. HiveOperator( hql = "... %s" % some_op_result["table_id")

I personally don't like using Jinja templating, but till now, this was the only option for Airflow. I would implement it for "backward compatibility," not only on the technical side but also on the philosophical side. Also, it worth mentioning, I would not do that as separate features, but everything else (Jinja, Jinja resolving, templated_params is already part of airflow, so we just need to implement "to string" representation for this feature.

evgenyshulman · 2020-04-25T19:22:44Z

The reasoning for base class is only because there is a proposal about having different implementation for the class. So it worth defining what is expected fro the new implementation in the base class. (for example what if some implementation doesn't use XCom at all, why should it inherit from XcomArg class? These are non-critical question, I am ok with no base class at all.

turbaszek · 2020-04-25T19:32:49Z

An example would be cases when people build command lines, text messages, SQL queries, and so on based on the previous operator's results
for example:

BashOperator with bash_command ( that is expected to be string)
BashOperator( "do_something --cluster-id %s" %..)

HiveOperator( hql = "... %s" % some_op_result["table_id")

Imho here we should be opinionated and "purely functional". Thus I would suggest using factory function (operator) like prepare_bash_cmd or build_hql which result (XComArg) will be then passed as an input.

evgenyshulman · 2020-04-25T22:35:22Z

it's definitely an option for people who wants a "purely function" approach from the beginning. Most of Airflow users are already very invested in the current implementation without functional operators.

They will have to rewrite their current implementation ( extract into function with @task).
In case it's part of internal operator implementation it's even harder to do
It will affect an execution graph (extra node), while some Airflow users are trying to keep graph very clean.

IMHO: ( as I wrote at current PR), we should decide before we push implementation into Airflow/master and we should base our decision on some examples we wrote with end to end usage of AIP-31 implementation. I see a point in keeping everything as simple as possible, at the same time, I want all users to be happy if it doesn't affect implementation in a bad way :)

casassg · 2020-04-26T05:31:32Z

Regarding templating, I understand why you say that now. An option we implemented internally to do this was to add a map function to XComArg which is lazy evaluated and that enables transforming the value pulled before passing it to the operator.

My main worry is enabling the Jinja string is that it may cause some non-expected behaviour. For example, what happens when the value is not a string? I rather not enable it mostly because it still can be done using context['ti'].xcom_pull if users needed, while not causing errors for users that are just starting to use Airflow.

Regarding BaseArg class. I would say no as we are moving the extensibility piece to XCom itself. That should allow to inject custom serialization/deserialization behaviour consistently across Airflow and not only XComArg. Can't think of any other reason to overwrite XComArg apart from custom serial/deserial logic.

evgenyshulman · 2020-04-26T14:48:36Z

Regarding templating, I understand why you say that now. An option we implemented internally to do this was to add a map function to XComArg which is lazy evaluated and that enables transforming the value pulled before passing it to the operator.

My main worry is enabling the Jinja string is that it may cause some non-expected behavior. For example, what happens when the value is not a string? I rather not enable it mostly because it still can be done using context['ti'].xcom_pull if users needed, while not causing errors for users that are just starting to use Airflow.

I agree that Jinja + XCom was always problematic, however, most of the time it does work. if object is nonstr -> Jinja will convert it into str as it was doing for any other object coming directly from xcom_pull. We can remove str implementation if we have any problems around it before we merge into apache-airflow

Regarding BaseArg class. I would say no as we are moving the extensibility piece to XCom itself. That should allow injecting custom serialization/deserialization behavior consistently across Airflow and not only XComArg. Can't think of any other reason to overwrite XComArg apart from custom serial/deserial logic.

My working assumption is that replacing XCom only is going to be not enough to implement "real" replacement for serialization/deserialization, but.. if I am wrong, we can easily implement that base class. So let's just wait till somebody will do that for real.

casassg · 2020-04-26T22:32:46Z

Sounds good. Regarding Jinja, maybe a way is to make it xcom_arg.jinja property instead of making it under str. But it's mostly a nitpick, so fine by me.

If we keep XComArg as a simple XCom wrapper, that should work. Want to make sure we keep XComArg as low complex as possible, and if we need to add some extra XCom logic add it to XCom class itself instead.

casassg added the kind:feature Feature Requests label Apr 2, 2020

This was referenced Apr 2, 2020

[AIP-31] Resolve XComArg in template_fields #8053

Closed

[AIP-31] Add dependency relationship on operator when templated_field is XComArg #8054

Closed

[AIP-31] Add "output" property in BaseOperator #8055

Closed

turbaszek added the AIP-31 Task Flow API for nicer DAG definition label Apr 2, 2020

kaxil assigned casassg Apr 2, 2020

turbaszek assigned jonathanshir and unassigned casassg Apr 24, 2020

jonathanshir mentioned this issue Apr 30, 2020

[AIP-31] Implement XComArg model to functionally pass output from one operator to the next #8652

Merged

6 tasks

kaxil closed this as completed in #8652 May 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AIP-31] Create XComArg model #8052

[AIP-31] Create XComArg model #8052

casassg commented Apr 2, 2020 •

edited

Loading

boring-cyborg bot commented Apr 2, 2020

turbaszek commented Apr 17, 2020

casassg commented Apr 18, 2020

jonathanshir commented Apr 22, 2020

casassg commented Apr 22, 2020

evgenyshulman commented Apr 25, 2020

turbaszek commented Apr 25, 2020

casassg commented Apr 25, 2020 •

edited

Loading

evgenyshulman commented Apr 25, 2020

evgenyshulman commented Apr 25, 2020

turbaszek commented Apr 25, 2020

evgenyshulman commented Apr 25, 2020

casassg commented Apr 26, 2020

evgenyshulman commented Apr 26, 2020

casassg commented Apr 26, 2020

[AIP-31] Create XComArg model #8052

[AIP-31] Create XComArg model #8052

Comments

casassg commented Apr 2, 2020 • edited Loading

boring-cyborg bot commented Apr 2, 2020

turbaszek commented Apr 17, 2020

casassg commented Apr 18, 2020

jonathanshir commented Apr 22, 2020

casassg commented Apr 22, 2020

evgenyshulman commented Apr 25, 2020

turbaszek commented Apr 25, 2020

casassg commented Apr 25, 2020 • edited Loading

evgenyshulman commented Apr 25, 2020

evgenyshulman commented Apr 25, 2020

turbaszek commented Apr 25, 2020

evgenyshulman commented Apr 25, 2020

casassg commented Apr 26, 2020

evgenyshulman commented Apr 26, 2020

casassg commented Apr 26, 2020

casassg commented Apr 2, 2020 •

edited

Loading

casassg commented Apr 25, 2020 •

edited

Loading