Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove cattrs from project #26130

Closed
ashb opened this issue Sep 2, 2022 · 11 comments · Fixed by #34672
Closed

Remove cattrs from project #26130

ashb opened this issue Sep 2, 2022 · 11 comments · Fixed by #34672
Labels
area:dependencies Issues related to dependencies problems kind:meta High-level information important to the community

Comments

@ashb
Copy link
Member

ashb commented Sep 2, 2022

Cattrs is currently only used in two places: Serialization for operator extra links, and for Lineage.

However cattrs is not a well maintained project and doesn't support many features that attrs itself does; in short, it's not worth the brain cycles to keep cattrs.

@ashb ashb added the kind:meta High-level information important to the community label Sep 2, 2022
@kaxil
Copy link
Member

kaxil commented Sep 2, 2022

@potiuk
Copy link
Member

potiuk commented Sep 2, 2022

Agree. It's been a source of many problems.

@Taragolis
Copy link
Contributor

Just a silly question. Are operator extra links use any ability of attrs?

I could find only airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleIndexableLink which is Extra link to deprecated airflow.providers.google.cloud.operators.bigquery.BigQueryExecuteQueryOperator

@attr.s(auto_attribs=True)
class BigQueryConsoleIndexableLink(BaseOperatorLink):
"""Helper class for constructing BigQuery link."""
index: int = attr.ib()
@property
def name(self) -> str:
return f'BigQuery Console #{self.index + 1}'

All other links just inherit of BaseOperatorLink without use decorator @attr.s so it become as regular class and child not use any specific methods (actually this methods such as __attrs_post_init__ not use in links).

Might be possible to convert extra links to dataclasses? In this case it required to use dataclasses.fields for serialize and dataclasses.make_dataclass for deserialize

@ashb
Copy link
Member Author

ashb commented Sep 2, 2022

@Taragolis Attrs automatically works for subclasses, so those subclasses are attrs classes:

In [7]: @attr.define
   ...: class A:
   ...:     x: int

In [8]: class B(A):
   ...:     y: str

In [9]: attr.has(B())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [9], in <module>
----> 1 attr.has(B())

TypeError: A.__init__() missing 1 required positional argument: 'x'

In [10]: attr.has(B(x=1))
Out[10]: True

@ashb
Copy link
Member Author

ashb commented Sep 2, 2022

But the bigger issue: We don't know what custom links might exist out in the wild, so we can't just blindly change the tech used without breaking someone's custom code.

@Taragolis
Copy link
Contributor

Attrs automatically works for subclasses, so those subclasses are attrs classes

I mean this case

Python 3.9.10 (main, Feb 25 2022, 16:54:01) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 8.4.0
Python 3.9.10 (main, Feb 25 2022, 16:54:01) 
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin

import attr
@attr.s(auto_attribs=True)
class AAttr:
    x: int
    y: int = attr.field(init=False)
    def __attrs_post_init__(self):
        self.y = 42
        
AAttr(x=1)
Out[4]: AAttr(x=1, y=42)

class BAttr(AAttr):
    z: int = attr.field(init=False)
    def __attrs_post_init__(self):
        super().__attrs_post_init__()
        self.z = 43
        
BAttr(x=1)
Out[6]: BAttr(x=1, y=42)

@attr.s(auto_attribs=True)
class CAttr(AAttr):
    z: int = attr.field(init=False)
    def __attrs_post_init__(self):
        super().__attrs_post_init__()
        self.z = 43
        
CAttr(x=1)
Out[8]: CAttr(x=1, y=42, z=43)

But it mostly synthetic because BaseOperatorLink not use any of the __attrs_pre_init__, __attrs_post_init__, __attrs_init__ methods.

I don't remember why decorator required however without this decorator this link wouldn't show any warnings. Might be also related to serialisation/deserialisation cattr issue

@attr.s(auto_attribs=True)
class ExternalTaskSensorLink(ExternalDagLink):
"""
This external link is deprecated.
Please use :class:`airflow.sensors.external_task.ExternalDagLink`.
"""
def __attrs_post_init__(self):
warnings.warn(
"This external link is deprecated. "
"Please use :class:`airflow.sensors.external_task.ExternalDagLink`.",

But the bigger issue: We don't know what custom links might exist out in the wild, so we can't just blindly change the tech used without breaking someone's custom code.

I think it main reason why we can't just replace attrs by dataclasses

@kaxil
Copy link
Member

kaxil commented Sep 2, 2022

@Taragolis The problem isn't attrs but cattrs -- attrs is well maintained and is a superset of dataclasses.

You can read about it in https://www.attrs.org/en/stable/why.html#data-classes

@Taragolis
Copy link
Contributor

@kaxil I think attrs not actual superset of dataclasses because dataclasses itself introduced only in 3.7 and dataclasses itself has less abilities than attrs.

My main point more about is it actually required to use attrs for extra links or it is possible to use any other types such as dataclasses or even regular class. However since we really don't know about third-party extra links we can't replace it.

@potiuk
Copy link
Member

potiuk commented Sep 2, 2022

@kaxil I think attrs not actual superset of dataclasses because dataclasses itself introduced only in 3.7 and dataclasses itself has less abilities than attrs.

My main point more about is it actually required to use attrs for extra links or it is possible to use any other types such as dataclasses or even regular class. However since we really don't know about third-party extra links we can't replace it.

But this does not mean we cannot removed cattrs as required Airlfow dependency.

Potentially we could fallback to optionally available cattrs if the custom base link fail to instantiate. We can safely make an assumption that in most cases where somoene uses custom links they already have cattrs installed so this is not going to be a breaking change for them. And even if they (for example) use our image and no cattrs is installed there, we could check it and crash hard with error "please install cattrs to make it work again" in case custom link cannot be instantiated without catrs.

@Taragolis
Copy link
Contributor

But this does not mean we cannot removed cattrs as required Airlfow dependency.

Sure. I don't think cattrs need actually for ser/deser. Some simple snippet show that it could be archived by attrs

import attr
from airflow.utils.module_loading import import_string
from airflow.models.baseoperator import BaseOperatorLink

operator_links_source = {
    'airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleIndexableLink': {
        'index': 0
    },
    'airflow.operators.trigger_dagrun.TriggerDagRunLink': {}
}

for link_class, link_kwargs in operator_links_source.items():
    # Ser
    klass = attr.make_class(
        link_class.rsplit(".")[-1],
        [],
        bases=(import_string(link_class), )
    )
    assert issubclass(klass, BaseOperatorLink)
    link = klass(**link_kwargs)

    # DeSer
    dotted_class = f"{klass.__module__}.{klass.__name__}"
    link_dict = attr.asdict(link)

@kaxil
Copy link
Member

kaxil commented Sep 2, 2022

Yup we are going to use attrs or serialized_objects (from Airflow) for that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dependencies Issues related to dependencies problems kind:meta High-level information important to the community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants