Skip to content

Conversation

rmetzger
Copy link
Contributor

This pull request is basically a collection of the work of others ;)
@tillrohrmann contributed some of the tests (in particular for the serializer itself). He did the integration with Chill.
@twalthr contributed one commit for fixing a TypeExtractor bug.

I added two IT cases, checking it to work with the following types:

java.util.List<String>
scala.math.BigInt
java.io.File
java.math.BigDecimal;
java.math.BigInteger;
java.util.HashMap;
java.io.Serializable;

I also tested it on a cluster with a Collection of Objects (containing Strings and Integers). It worked. I didn't test it for performance yet. The purpose of the pull request is to extend the type support for cases we don't cover by our own serialization framework.

I'll now test the change with the code that brought the issue on top of my TODO list: https://issues.apache.org/jira/browse/FLINK-629?focusedCommentId=14241856&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241856

@rmetzger
Copy link
Contributor Author

I rebased the pull request to the 0.8 branch and added a fix for https://issues.apache.org/jira/browse/FLINK-1333.

@StephanEwen
Copy link
Contributor

Very good idea. I think we could fix this still a bit and make it better:

  • Avro created a schema from the objects to serialize (somewhat a simple thing as our POJO analysis). That way, it does not need to write all classnames. Kryo writes all classnames by default, which is very painful, unless they are registered. I think that was one of the reasons why Avro was actually faster for us out of the box.
    • We should register the types at Kryo that we see as being (recursively contained in the type information)
    • We should also allow at the execution environment to manually register types or serializers.
  • Why is the copy() method not using Kryo's copy functionality? The current implementation seems painfully inefficient, and will still be used until @aljoscha 's change about reuse/non-reuse mode is in.
  • The tests that have a hardwired check for the KryoSerializer break design...

It seems that the Java/Scala code is moving closer together still, with the common serializer being configured in the Java API for Scala classes. That means we might really collapse the projects at some point.

@rmetzger
Copy link
Contributor Author

Thank you for the feedback.

  • We are using the Twitter Chill library. It registers many commonly used classes with Kryo (Collections, Date, BigInt, ...). So the issue with class names being written does only apply to userdefined types.
    You are indeed right that we should provide facilities to register custom classes and that we should register all classes we see during type analysis.
    I didn't implement these features because Timo is actually working on this and I wanted to have at least basic Kryo support in the release. Timo will integrate Kryo more tightly with the type analysis.
  • The copy() method of Kryo is not implemented for many types (in particular those by Chill). For example the java.sql.Date type (https://github.com/twitter/chill/blob/develop/chill-java/src/main/java/com/twitter/chill/java/SqlDateSerializer.java) doesn't have a copy() method.
    Kryo provides a default copy() method which fails on mutable types. (https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/Serializer.java#L102). I thought about contributing the missing copy() methods for chill.
    We could also whitelist some classes for kryo's copy and fall back to the slow variant.
  • The hardwired check for the KryoSerializer is very ugly. Maybe I can come up with a different solution. Till suggested to add a method (canCreateInstance) to all serializers .. but thats a lot of code for one test case.

@StephanEwen
Copy link
Contributor

How about we try to use Kryo's copy method and fall back to serialization
copies upon failure?
Am 17.12.2014 11:59 schrieb "Robert Metzger" notifications@github.com:

Thank you for the feedback.

We are using the Twitter Chill library. It registers many commonly
used classes with Kryo (Collections, Date, BigInt, ...). So the issue with
class names being written does only apply to userdefined types.
You are indeed right that we should provide facilities to register
custom classes and that we should register all classes we see during type
analysis.
I didn't implement these features because Timo is actually working on
this and I wanted to have at least basic Kryo support in the release. Timo
will integrate Kryo more tightly with the type analysis.

The copy() method of Kryo is not implemented for many types (in
particular those by Chill). For example the java.sql.Date type (
https://github.com/twitter/chill/blob/develop/chill-java/src/main/java/com/twitter/chill/java/SqlDateSerializer.java)
doesn't have a copy() method.
Kryo provides a default copy() method which fails on mutable types. (
https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/Serializer.java#L102).
I thought about contributing the missing copy() methods for chill.
We could also whitelist some classes for kryo's copy and fall back to
the slow variant.

The hardwired check for the KryoSerializer is very ugly. Maybe I can
come up with a different solution. Till suggested to add a method (
canCreateInstance) to all serializers .. but thats a lot of code for
one test case.


Reply to this email directly or view it on GitHub
#271 (comment)
.

@rmetzger
Copy link
Contributor Author

Okay, I'll add that

@StephanEwen
Copy link
Contributor

That makes sense.

+1 to merge

mbalassi pushed a commit to mbalassi/flink that referenced this pull request Dec 17, 2014
asfgit pushed a commit that referenced this pull request Dec 17, 2014
@rmetzger
Copy link
Contributor Author

Manually closing it. Has been merged

@rmetzger rmetzger closed this Dec 18, 2014
mbalassi pushed a commit to mbalassi/flink that referenced this pull request Dec 18, 2014
zhijiangW pushed a commit to zhijiangW/flink that referenced this pull request Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants