Skip to content

Conversation

chermenin
Copy link

Fixed JIRA issue [FLINK-2608] Arrays.asList(..) does not work with CollectionInputFormat as issue in Twitter Chill project (twitter/chill#255). Updated version of Twitter Chill in pom.xml.

@zentol
Copy link
Contributor

zentol commented Oct 11, 2016

Could you include a simple test to verify that this does in fact fix the issue?

@rmetzger
Copy link
Contributor

Does this change have any implications on our Kryo version, since Chill since 0.8.0 uses Kryo3 ? https://github.com/twitter/chill/releases/tag/0.8.0

@chermenin
Copy link
Author

@rmetzger If I'm not mistaken Kryo 2.24 and 3.0 versions is compatible each other at the level of standard IO serialization. And if it possible we will be able to a little wait for Chill 0.7.5 version (with needed fixes and Kryo 2.x) or migrate Flink to Kryo 3.x version.

@fhueske
Copy link
Contributor

fhueske commented Oct 12, 2016

Bumping the version of serializers is a sensitive issue in Flink.
If the serialization format changes, savepoints of previous versions can no longer be used.

@rmetzger
Copy link
Contributor

This change adds Kryo-shaded to our dependency tree:

[INFO] |  +- com.twitter:chill_2.10:jar:0.8.1:compile
[INFO] |  |  +- com.twitter:chill-java:jar:0.8.1:compile
[INFO] |  |  \- com.esotericsoftware:kryo-shaded:jar:3.0.3:compile
[INFO] |  |     \- com.esotericsoftware:minlog:jar:1.3.0:compile

I suspect that maven is not recognizing this because chill seems to depend on kryo-shaded.
Apparently, kryo-shaded has a shaded ASM version included, but it is not relocating the regular Kryo classes. So we'll end up having two Kryo versions in our classpath.
So if we want to upgrade Kryo, we need to do it explicitly, to avoid having two Kryo versions in our classpath.

Another issue we need to consider is the serialization compatibility. Savepoints in Flink could potentially contain data serialized with Kryo 2.24. If we want to provide savepoint compatibility between Flink 1.1 and 1.2, we need to consider that.
According to the Kryo documentation, 2.24 to 3.0.0 is serialization compatible (I hope the same holds true for chill): https://github.com/EsotericSoftware/kryo/blob/master/CHANGES.md#2240---300-2014-10-04
I would like to hear @StephanEwen and @uce's opinion on this.

@StephanEwen
Copy link
Contributor

As far as I read it, Kryo 3.x is not strictly serialization compatible with 2.x, hence the major version number bump.

If the interfaces are still stable, then it should be fine to bump the chill dependency version, exclude any kryo dependency, and add our own 2.x kryo dependency. I would prefer that approach.

@StephanEwen
Copy link
Contributor

Okay, in detail it seems that compatibility is not maintained for "Unsafe based I/O", but maintained for standard I/O.

@StephanEwen
Copy link
Contributor

@chermenin Do you plan to continue with this pull request? Would be great if you

I would suggest to adjust it in the following way:

  • Update the Chill dependency (validate if they change any existing serializer)
  • Exclude Kryo version 2 and version 3 from the Chill dependency. That way, Flink's Kryo version should be used by Chill as well.

Also, testing the serializers via a "GroupReduce" integration test seems heavy and unspecific. Can you change this to a serializer Unit test? Have a look at the Kryo Serializer unit tests for an example.

Do you think that would work?

@chermenin
Copy link
Author

I exclude Kryo as a dependency from Chill and add simple test to check Java collections. Tests added to flink-tests because it is depends on flink-runtime (where used Chill).

return env.fromCollection(data);
}

public static DataSet<PojoWithCollection> getPojoWithArraysAsListCollection(ExecutionEnvironment env) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these still needed any more by the tests?

Copy link
Author

@chermenin chermenin Oct 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be saved. Just on the safe side.

@StephanEwen
Copy link
Contributor

I like this approach. One minor question, otherwise +1 to go here

@chermenin
Copy link
Author

@StephanEwen What about merging this PR?

@mxm
Copy link
Contributor

mxm commented Nov 22, 2016

The build fails during the Maven Rat plugin license check. Could you fix the build and rebase to the latest master? I think we can merge then.

@chermenin
Copy link
Author

Yep, fixed.

@mxm
Copy link
Contributor

mxm commented Nov 23, 2016

Thank you!

@asfgit asfgit closed this in 0d3ff88 Nov 23, 2016
@StephanEwen
Copy link
Contributor

@mxm @chermenin Can we please do a followup to this, given that it is already merged?

The tests in GroupReduceITCase need to be replaced by proper targeted unit tests.

Sorry for being super strict, but our build times are already exploding, and it is exactly cases like this one (adding distributed runtime Integration Tests to test a serializer) that contribute a lot to that.

@mxm
Copy link
Contributor

mxm commented Nov 24, 2016

Sure, I'll take care of it. I saw you giving a +1 and didn't see an issue myself.

alpinegizmo pushed a commit to alpinegizmo/flink that referenced this pull request Nov 28, 2016
[FLINK-2608] Updated test with Java collections.

[FLINK-2608] Updated Chill and Kryo dependencies.

[FLINK-2608] Added collections serialization test.

This closes apache#2623.
liuyuzhong pushed a commit to liuyuzhong/flink that referenced this pull request Dec 5, 2016
[FLINK-2608] Updated test with Java collections.

[FLINK-2608] Updated Chill and Kryo dependencies.

[FLINK-2608] Added collections serialization test.

This closes apache#2623.
static-max pushed a commit to static-max/flink that referenced this pull request Dec 13, 2016
[FLINK-2608] Updated test with Java collections.

[FLINK-2608] Updated Chill and Kryo dependencies.

[FLINK-2608] Added collections serialization test.

This closes apache#2623.
joseprupi pushed a commit to joseprupi/flink that referenced this pull request Feb 12, 2017
[FLINK-2608] Updated test with Java collections.

[FLINK-2608] Updated Chill and Kryo dependencies.

[FLINK-2608] Added collections serialization test.

This closes apache#2623.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants