This repository has been archived by the owner. It is now read-only.

Improve serialization performance #20

Merged
merged 1 commit into from Jun 21, 2017

@c-w c-w commented Jun 20, 2017

This PR switches our Spark setup to use the Kryo serializer for common DTO classes.

Unfortunately, we can't enforce the use of the Kryo serializer for all classes (via the setting spark.kryo.registrationRequired). Some classes that we depend upon (e.g. TwitterReceiver) are non-public, so we can't easily get a handle to them to register them in Kryo.
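For readers unfamiliar with the setup described above, here is a minimal sketch of what registering DTO classes with Kryo can look like. It is not the code from this PR: ExampleEvent and the app name are hypothetical placeholders, and only the SparkConf calls and setting names come from Spark itself.

```scala
import org.apache.spark.SparkConf

// Hypothetical DTO standing in for the project's own data classes.
case class ExampleEvent(id: String, text: String)

object KryoSetupSketch {
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("kryo-sketch") // hypothetical app name
      // Use Kryo instead of the default Java serializer.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Kept at "false" (the default): with "true", Kryo throws an error when
      // it has to serialize any class that was not registered.
      .set("spark.kryo.registrationRequired", "false")
      // Register the commonly serialized DTO classes with Kryo.
      .registerKryoClasses(Array(classOf[ExampleEvent]))
}
```

With registrationRequired left off, unregistered classes can still be serialized by Kryo, only with the full class name written alongside each object.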

Resolves #15

@c-w c-w requested review from erikschlegel and kevinhartman Jun 20, 2017
Unfortunately we can't use Kryo serializer for all classes (via setting
"spark.kryo.registrationRequired") because some classes that we depend
upon are non-public (e.g. TwitterReceiver) so we can't easily get a
handle to them to register them in Kryo.
@c-w c-w force-pushed the improve-serialization branch from 43391b2 to a3ed4fc Jun 21, 2017

@kevinhartman kevinhartman left a comment

LGTM. Perhaps we can register some internal classes using reflection in a later PR?
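A rough sketch of what such reflection-based registration might look like, for illustration only. The fully qualified class name below is an assumption and should be checked against the Twitter streaming library version in use; nothing like this is part of the PR.

```scala
import org.apache.spark.SparkConf

object ReflectiveKryoRegistrationSketch {
  // Assumption: fully qualified name of the non-public receiver class.
  val internalClassNames: Seq[String] =
    Seq("org.apache.spark.streaming.twitter.TwitterReceiver")

  def buildConf(): SparkConf = {
    // Class.forName looks a class up by name at runtime, so it also works for
    // classes that are not accessible at compile time. It throws
    // ClassNotFoundException if the name is wrong or the jar is missing.
    val internalClasses: Array[Class[_]] =
      internalClassNames.map(name => Class.forName(name)).toArray[Class[_]]

    new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(internalClasses)
  }
}
```

Even with this in place, turning on spark.kryo.registrationRequired would likely require registering every other class that ends up being serialized, so this only sketches the lookup step.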


@c-w c-w commented Jun 21, 2017

I did some research and didn't find anyone doing this. From my understanding, it's usually good enough to use Kryo for the commonly serialized data types (i.e., the DTOs) and keep using the Java serializer for less common types (like Receivers).


@c-w c-w merged commit 4bb54b2 into master Jun 21, 2017
2 checks passed
@c-w c-w deleted the improve-serialization branch Jun 21, 2017
@c-w c-w removed the in progress label Jun 21, 2017