Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-58: Add object storage backend for xcom #37058

Merged
merged 20 commits into from
Feb 3, 2024

Conversation

bolkedebruin
Copy link
Contributor

This adds the possibility to store xcoms on a object storage supported backend.

This is an initial implementation and I am open for feedback (duh ;-) ). The BaseXCom is a bit messy and required some bug fixes. Hence, I have implemented the backend currently separated from BaseXCom. While functionally it is quite easy to merge both.

Notes:

  • Made sure that custom serialization is now called when using get_one
  • Added the purge function that is called when deleting / clearing xcoms so custom backend can purge xcom when required. This has also been reported as a bug: XCom delete in ui and db clean do not trigger delete for custom backend #31774
  • I have looked at FSMap that comes with fsspec, but decided that wasn't worth the effort and did not provide extra flexibility
  • I could move this implementation to common.io but given that I'd like to merge it with the standard BaseXCom I though to put it where it is now. Can be discussed of course
  • The state of BaseXCom leaves something to be desired

@potiuk @dstandish @hussein-awala @uranusjr


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

This adds the possibility to store xcoms on a object storage
supported backend.
airflow/io/xcom.py Outdated Show resolved Hide resolved
airflow/io/xcom.py Outdated Show resolved Hide resolved
airflow/io/xcom.py Outdated Show resolved Hide resolved
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a first pass

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good! I'm just wondering if we should move the introduced XCom backend and its configurations to the common.io provider 🤔

airflow/models/xcom.py Show resolved Hide resolved
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. Looks good overall. Not tested by running though, was just reading.

docs/apache-airflow/core-concepts/xcoms.rst Outdated Show resolved Hide resolved
airflow/io/xcom.py Outdated Show resolved Hide resolved
airflow/io/xcom.py Outdated Show resolved Hide resolved
@bolkedebruin
Copy link
Contributor Author

bolkedebruin commented Jan 28, 2024

It looks good! I'm just wondering if we should move the introduced XCom backend and its configurations to the common.io provider 🤔

@hussein-awala as mentioned in the topic start we can two multiple ways:

  1. Merge this into BaseXCom (and cleanup BaseXCom)
  2. Keep it separate

If we chose 2 then it makes sense to move it into common.io (with all the extras that it requires cause it is a 'new' capability for a provider).

As it is backwards compatible and by default does the same as BaseXCom (storing data in the db) option 1 might make more sense. Then it should not go in common.io.

For now (and preventing all the stuff that comes with a provider adjustment) I have assumed option 1 down the line.

@potiuk
Copy link
Member

potiuk commented Jan 29, 2024

If we chose 2 then it makes sense to move it into common.io (with all the extras that it requires cause it is a 'new' capability for a provider).

I think I''d support @hussein-awala 's idea here. There is a great feature which is non-obvious when thing are added in provider, namely dependency independence and ability to upgrade/downgrde separately from Airflow. And it's not much different than what we already have with the other secret backends - which are also implemented in providers, so this one would just follow the suite, and turns common.io into actualy pretty useful provider.

Binding such "auxiliary" functionality into Airflow core is also against our "Airflow-as-a-platform" approach. Without my suggestions applied (where I considered adding a DB change to make it clearer what is local and what remote), in my view this one simply builds on top of the Public API that we already have and there is no particular reason it should be in "core" airflow.

Copy link
Contributor

@dirrao dirrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment.

airflow/providers/common/io/xcom/backend.py Show resolved Hide resolved
@bolkedebruin bolkedebruin merged commit 573d650 into apache:main Feb 3, 2024
57 checks passed
@bolkedebruin bolkedebruin deleted the xcom_objectstorage branch February 3, 2024 08:31
@jedcunningham jedcunningham added the type:new-feature Changelog: New Features label Feb 5, 2024
@jedcunningham
Copy link
Member

(@bolkedebruin, moving this back to 2.9 as we won't ship this in a core patch release)

@bolkedebruin
Copy link
Contributor Author

bolkedebruin commented Feb 6, 2024

@jedcunningham it was agreed to have this in 2.8.2 as we consider this a bug fix more than a feature release. What has changed?

Edit: I'm okay with targeting 2.9 it was just in this thread that we decided to use 2.8.2. However checking my code for the provider I'm checking against 2.9 so it's fine to me just unexpected.

Cc @potiuk @ephraimbuddy

@jedcunningham
Copy link
Member

Sorry, that decision was in a collapsed discussion I didn't expand :) It doesn't "feel" like a bugfix to me, but I'm happy to step back on this. @ephraimbuddy is the rm ultimately.

@jedcunningham jedcunningham removed the type:new-feature Changelog: New Features label Feb 6, 2024
@bolkedebruin
Copy link
Contributor Author

O yes the "expansion bug" ;-). I'm fine either way. Targeting 2.8.2 will require a code change.

@cmarteepants
Copy link
Collaborator

cmarteepants commented Feb 8, 2024

@ephraimbuddy @bolkedebruin - I want to promote this PR - just confirming that it's going out in 2.8.2 for real? I agree with @jedcunningham that this feels like a new feature to me, but can never object to people getting something early.

@potiuk
Copy link
Member

potiuk commented Feb 8, 2024

@ephraimbuddy @bolkedebruin - I want to promote this PR - just confirming that it's going out in 2.8.2 for real? I agree with @jedcunningham that this feels like a new feature to me, but can never object to people getting something early.

Side comment: We can sneakily pretend it's a bug-fix (as if - it should have behaved this way already, we just have not realized that.

We had a lot of discussion on semver and classifying things and I am always in a position that SemVer is about the "intent" not technically breaking things, or technically adding new methods and fields and we can make individual decisions on some features or changes to classify them differently (mostly based on our assesment on how likely it is things will get broken or how much this new feature is really "new")

I think people take "compatibility" far too seriously and far too "technically" :).

@potiuk
Copy link
Member

potiuk commented Feb 8, 2024

Targeting 2.8.2 will require a code change.

Just chenging >= 2.9.0 right ? That's the only thing to change as far as I understand @bolkedebruin ?

@bolkedebruin
Copy link
Contributor Author

bolkedebruin commented Feb 12, 2024

@potiuk the code assumes >= 2.9.0 now. So no code change is required for targeting 2.9.0 from my point of view.

Ok missed part of the thread:

If targeting 2.8.2 we need to change the code in the provider to check for 2.8.2+:

https://github.com/apache/airflow/blob/main/airflow/providers/common/io/xcom/__init__.py#L26

If targeting 2.9.0 no change is required.

I'm okay either way. As we have marked Object Storage experimental and adding the API call to XCom can really be considered a bug fix but has no effect of existing installations, I think 2.8.2 is fine

@potiuk potiuk modified the milestones: Airflow 2.8.2, Airflow 2.9.0 Feb 12, 2024
@potiuk
Copy link
Member

potiuk commented Feb 12, 2024

Ok. Thought a bit and I propose, we take it slower (reassigned it to 2.9.0). I think we have some other - more important issues to solvve for 2.8.*, it was a bit "bendig" the rules to put it at 2.8.2 - and also likely some stability issues to be implemented as result of fsspec/universal_pathlib#173 and upcoming 0.2 release of universal pathlib (also see #37311 ) are quite a bit more important to handle

@bolkedebruin
Copy link
Contributor Author

Fine to me!

@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants