Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-9377] [core] Implement restore serializer factory method for simple composite serializers #6273

Closed
wants to merge 6 commits into from

Conversation

tzulitai
Copy link
Contributor

@tzulitai tzulitai commented Jul 6, 2018

What is the purpose of the change

This PR is built on top of #6235. It is a WIP PR.

This PR implements the restore serializer factory method for all simple composite serializers (i.e., Flink serializers with nested serializers). More complex serializers such as the Scala serializers, POJO serializers, KryoSerializer, AvroSerializer, etc. will come as a follow-up PR.

Brief change log

  • Introduce the CompositeTypeSerializer base class, which wraps the configuration snapshotting logic and compatibility checks.
  • Let all simple composite type serializers extend the CompositeTypeSerializer.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

… factory for restoring serializers

This commit is the first step towards removing serializers from
checkpointed state meta info and making Flink checkpoints Java
serialization free.

Instead of writing serializers in checkpoints, and trying to read that
to obtain a restore serializer at restore time, we aim to only write the
config snapshot as the single source of truth and use it as a factory to
create a restore serializer.

This commit adds the method and signatures to the
TypeSerializerConfigSnapshot interface. Use of the method, as well as
properly implementing the method for all serializers, will be
implemented in follow-up commits.
… CompatibilityResult

Now that the config snapshot is used as a factory for the restore
serializer, it should be guaranteed that a restore serializer is always
available. This removes the need for the user to provide a "fallback"
convert serializer in the case where a migration is required.
This commit deprecates all utility methods and classes related to
serializing serializers. All methods that will still be in use, i.e.
writing config snapshots, are now moved to a separate new
TypeSerializerConfigSnapshotSerializationUtil class.
The BackwardsCompatibleConfigSnapshot is a wrapper, dummy config
snapshot which wraps an actual config snapshot, as well as a
pre-existing serializer instance.

In previous versions, since the config snapshot wasn't a serializer
factory but simply a container for serializer parameters, previous
serializers didn't necessarily have config snapshots that are capable of
correctly creating a correct corresponding restore serializer.

In this case, since previous serializers still have serializers written
in the checkpoint, the backwards compatible solution would be to wrap
the written serializer and the config snapshot within the
BackwardsCompatibleConfigSnapshot dummy. When attempting to restore the
serializer, the wrapped serializer instance is returned instead of
actually calling the restoreSerializer method of the wrapped config
snapshot.
… meta infos

This commit officially removes the behaviour of writing serializers in
the state meta info of keyed state, operator state, and timers state.
This affects the serialization formats of the
KeyedBackendSerializationProxy, OperatorBackendSerializationProxy, and
InternalTimerServiceSerializationProxy, and therefore their versions are
all upticked.
… simple composite serializer config snapshots
@yanghua
Copy link
Contributor

yanghua commented Jul 8, 2018

conflicts, please update the PR~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants