Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design doc on static evidence for grouping/serialization #1886

Open
johnynek opened this issue Nov 27, 2018 · 1 comment
Open

Design doc on static evidence for grouping/serialization #1886

johnynek opened this issue Nov 27, 2018 · 1 comment

Comments

@johnynek
Copy link
Collaborator

@ttim has a design and even PR: #1857 to improve performance in scalding. The idea is to move towards requiring evidence that we can do binary sorting without deserializing needed for sort-partitioning data. It would also be interesting to optionally take static evidence we can serialize the values as well.

We do have a PR, but this is probably the most major change to scalding since we introduced the typed API. I think we should make a few page google doc and iterate on that to minimize the pain of adoption.

For instance, I think we could possibly introduce a SerializationProducer type, which is something like:

trait SerializationProducer[A] {
  def build(conf: Config, mode: Mode): Serialization[A] = ...
}

so we can defer building the actual serializers until just before job submission. In this way, we can get the config of the job to set serialization options. Something like this would be needed to support the current Kryo stuff, which has Config-based options.

@ttim
Copy link
Collaborator

ttim commented Nov 27, 2018

I did a bit more of experimentation around the subject during last week: https://github.com/ttim/scalding/tree/tabishev/data_tags/scalding-core/src/main/scala/com/twitter/scalding/datatag.

The idea was to introduce serialization-independent compile time generated data class descriptors. Using this descriptors it's possible to derive serialization for example, or to derive TypedTsvs TypeDescriptors.

I'll try to write it down this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants