Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-6869] [scala] Specify serialVersionUID for all Scala serializers #4090

Closed
wants to merge 3 commits into from

Conversation

tzulitai
Copy link
Contributor

@tzulitai tzulitai commented Jun 8, 2017

This PR fixes 2 issues:

1. Configuration snapshots of Scala serializers were not readable:
Prior to this PR, the configuration snapshot classes of Scala serializers did not have the proper default empty constructor that is used for deserializing the configuration snapshot. Since some Scala serializers' config snapshots extend the Java CompositeTypeSerializerConfigSnapshot, their config snapshot classes are also changed to be implemented in Java since in Scala we can only call a single base class constructor from subclasses.

2. Scala serializers did not specify the serialVersionUID:
Previously, Scala serializers did not specify the serialVersionUID, and therefore prohibited restore from previous Flink version snapshots because the serializers' implementations changed in 1.3. The serialVersionUIDs added in this PR are identical to what they were (as generated by Java) in Flink 1.2, so that we can at least restore state that were written with the Scala serializers as of 1.2.

@tzulitai
Copy link
Contributor Author

tzulitai commented Jun 8, 2017

R: @StefanRRichter @aljoscha tagging you because I talked to you about the issue offline :) Could you have a quick look?

@tzulitai
Copy link
Contributor Author

tzulitai commented Jun 8, 2017

One caveat that this PR does not yet fully fix:
the deserialization of the anonymous class serializers (CaseClassSerializer and TraversableSerializer), even with the serialVersionUID specified, can still fail because there is no guarantee of the generated classname of anonymous classes (it depends on the order of when the anonymous classes were instantiated, and format seems to also change across compilers).

At this moment, I've hit a bit of a wall trying to resolve this. The problem was always there pre-1.3, as if users simply change the order of their Scala type serializer generation (simply changing call order of createTypeInformation for their Scala types), the classnames would change and they wouldn't be able to restore state.

Note that this problem only exists for the heap backends, because there we deserialize eagerly on restore.

@StefanRRichter
Copy link
Contributor

Overall, I think this is ok as a best effort until we have some eager registration that helps with the remaining problems in the heap backend.

…hot classes

Prior to this commit, the configuration snapshot classes of Scala
serializers did not have the proper default empty constructor that is
used for deserializing the configuration snapshot.

Since some Scala serializers' config snapshots extend the Java
CompositeTypeSerializerConfigSnapshot, their config snapshot classes are
also changed to be implemented in Java since in Scala we can only call a
single base class constructor from subclasses.
Previously, Scala serializers did not specify the serialVersionUID, and
therefore prohibited restore from previous Flink version snapshots
because the serializers' implementations changed.

The serialVersionUIDs added in this commit are identical to what they
were (as generated by Java) in Flink 1.2, so that we can at least
restore state that were written with the Scala serializers as of 1.2.
…s classed serializers

This commit lets the TypeSerializerSerializationProxy be tolerable for
serialVersionUID mismatches when reading anonymous classed serializers.

Our Scala case class serializers require this since they use Scala
macros to be generated at compile time, and therefore is not possible to
fix a certain serialVersionUID for them.

This commit also updates the streaming state docs to educate the user to
avoid using anonymous classes for their state serializers.
try {
typeSerializer = InstantiationUtil.deserializeObject(buffer, userClassLoader);

ClassLoader old = Thread.currentThread().getContextClassLoader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to previousClassLoader

@asfgit asfgit closed this in b216a4a Jun 13, 2017
asfgit pushed a commit that referenced this pull request Jun 13, 2017
…d anonymous serializers

This commit lets the TypeSerializerSerializationProxy be tolerable for
serialVersionUID mismatches when reading anonymous classed serializers
or our Scala serializers.

Our Scala serializers require this since they use Scala macros to be
generated at compile time, and therefore is not possible to fix a
certain serialVersionUID for them. For non-generated Scala serializers,
we still also need this because their serialVersionUIDs pre-1.3 may
vary depending on the Scala version used.

This can be seen as a workaround, and should be reverted once 1.2
savepoint compatibility is no longer maintained.

This commit also updates the streaming state docs to educate the user to
avoid using anonymous classes for their state serializers.

This closes #4090.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants