Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Upgrade Kryo version of Kryo Java extension #27635

Closed
1 of 15 tasks
Gadiguibou opened this issue Jul 24, 2023 · 11 comments
Closed
1 of 15 tasks

[Feature Request]: Upgrade Kryo version of Kryo Java extension #27635

Gadiguibou opened this issue Jul 24, 2023 · 11 comments

Comments

@Gadiguibou
Copy link
Contributor

Gadiguibou commented Jul 24, 2023

What would you like to happen?

New Kryo versions support useful features like serialization of Java records. These features are currently unavailable to users of the Kryo Beam extension. Without this, users must fall back to other serialization methods or maintaining custom implementations for equals() hashCode() and friends.

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@aromanenko-dev
Copy link
Contributor

Thanks @Gadiguibou !
Are you going to work on this?
And what's about incompatibility between versions 4 and 5?

@Gadiguibou
Copy link
Contributor Author

Gadiguibou commented Aug 1, 2023

Hi @aromanenko-dev!
I think I could. Unfortunately Kryo v4 and v5 are not binary compatible. How is this usually handled in Beam? Is this even a concern?

The migration itself seems relatively straightforward.

@Gadiguibou
Copy link
Contributor Author

While I'm here I could also add support for Java records to SchemaCoder. This could be as simple as creating a new JavaRecordSchema class like JavaBeanSchema and JavaFieldSchema and reading from the record components instead of the fields or getters. This would probably warrant its own issue/PR for better tracking, but I'd like to have the opinion of a maintainer before opening pull requests.

@aromanenko-dev
Copy link
Contributor

aromanenko-dev commented Aug 1, 2023

@Gadiguibou

I think I could. Unfortunately Kryo v4 and v5 are not binary compatible. How is this usually handled in Beam? Is this even a concern?

In Beam, we tend to avoid the breaking changes that may affect users after Beam upgrade, especially, if it may cause potentially data loss or corruption, as much as we can.

Since Beam doesn't support "hot" upgrade on the fly while running a pipeline, and Kryo extension is used only as a KryoCoder (correct me if I'm wrong) then serialised data should not be stored somewhere after pipeline is stopped. Also, I hope we don't expose any Kryo API to users as well.

So, this means that it should be safe.

Regarding the adding support for Java records to SchemaCoder - I believe it deserves a dedicated issue/PR.

Btw, I'm not sure that we have an active maintainer for this extension but I'll ping @dmvk, who initially worked on this and Euphoria extensions (that heavily uses KryoCoder), for additional comments.

@Gadiguibou
Copy link
Contributor Author

Moved the issue about SchemaCoder to #27802.

@je-ik
Copy link
Contributor

je-ik commented Aug 22, 2023

Since Beam doesn't support "hot" upgrade on the fly while running a pipeline, and Kryo extension is used only as a KryoCoder (correct me if I'm wrong) then serialised data should not be stored somewhere after pipeline is stopped. Also, I hope we don't expose any Kryo API to users as well.

@aromanenko-dev Is this an official policy? Upgrading pipeline in general can be tricky, but I believe we should do our best to enable users to upgrade their pipelines, if possible. This change forces users to do a bootstrap of streaming pipelines which might be expensive and might discourage users from upgrading.

I think we could have tried introducing KryoV5Coder and deprecate original KryoCoder for at least several releases, we might have even kept old coder until we drop support for JDK 11 (Kryo 4 does not support JDK 17). Also some more context on this: https://lists.apache.org/thread/5glkdp68b024t8pnwq7ncz5mt9gcr27r

I came across this when validating 2.50.0 RC (this is even a code breaking change, beacuse the KryoCoder 'leaks' kryo API). Wondering if we should try to fix it before the release.

@je-ik
Copy link
Contributor

je-ik commented Aug 22, 2023

I'd personally prefer to rollback this change for 2.50.0 and introduce KryoV5Coder in 2.51.0, but I don't think this needs to be a blocker for release. Could we do that in case we need to build another RC?

@aromanenko-dev
Copy link
Contributor

aromanenko-dev commented Sep 5, 2023

@je-ik I'm sorry for delay with an answer, I was on vacation.

I agree that we have to do our best to avoid breaking changes (I'm one of the person who always propose to deprecate things before) and I mentioned this before, but I don't remember any "official" (or previously agreed) policy about such a support of Beam upgrade on the fly, and particularly, for streaming pipelines.

So, do we need to guarantee a Coder compatibility between Beam releases since Beam coders are supposed to be used only "inside" a pipeline?

I came across this when validating 2.50.0 RC (this is even a code breaking change, beacuse the KryoCoder 'leaks' kryo API)

Did you have a compilation time or runtime issue? Could you provide more details on this?

Also, as a side note:
It leads us to the question - should we even make Kryo provided and let users to choose which versions to use?

@je-ik
Copy link
Contributor

je-ik commented Sep 6, 2023

I think this is part of a broader discussion. IMPOV, speaking about an 'ideal' solution, we should be able to provide way to upgrade any streaming pipeline, at least for the case when the application logic does not change. After all, the streaming pipeline is a long-running software component and thus upgrades are expected to happen. The truth is that Beam is not quite there, yet. And there are fundamental reasons why upgrading a pipeline after changing logic might be hard (if even possible), which means users must be able to restart their pipeline from scratch.

In the case of this issue, we probably could have taken path that would not break existing pipelines (introducing different Coder, shading Kryo v5 into different package), so that we would minimize impact of the upgrade. But because the new coder has already been released I think there are no more actions needed to be taken here. I did not see this as enough important to block the release on this.

Regarding the 'leakage' of the Kryo API, it is generates compile-time error, because there are some (minor) changes in the Kryo v5 API (e.g. Serializers). I'm not sure we can meaningfully make Kryo provided, because the coder is always bound to a specific (major) version of Kryo.

@aromanenko-dev
Copy link
Contributor

@je-ik Thanks for details.

I fully agree with your statement about streaming pipelines upgrade that in real world is a very complicated thing to do and, especially, to maintain in the future for all parts of Beam.

I also agree that for this current issue we could introduce a new coder and deprecated an old one but, as I explained before, at that time I thought it was not so necessary to do.

Regarding a custom user Kryp version - we had the similar issue with Avro (it was even worse since Avro was a citizen of "core") and we finally made it possible to use any supported Avro version for users.

@je-ik
Copy link
Contributor

je-ik commented Sep 6, 2023

I meant, yes, we could make Kryo pluggable (I can imagine many ways of doing this), but the biggest pain always comes from incompatibility of the serialized data. The best option would be to include a version marker in the data and give KryoCoder the ability to work with multiple versions of Kryo at the same time. But that would require major reimplementation of the coder. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants