
[Scala] support scala collection jit serialization #1077

Merged (2 commits) on Nov 4, 2023


@chaokunyang (Collaborator) commented Nov 4, 2023

What do these changes do?

  • Support JIT serialization for Scala collections.
  • Relax ScalaDispatcher to intercept all Scala map/seq/set types.
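One way to "intercept all" Scala map/seq/set types is to test a runtime class against the base collection traits instead of matching concrete classes one by one. The sketch below assumes a dispatcher only needs to know which family a class belongs to. Checking assignability to `scala.collection.Map`, `Seq` and `Set` covers `List`, `Vector`, `HashMap`, the small `Map1`/`Set1` classes and mutable variants alike. `ScalaCollectionKinds` and `classify` are hypothetical names for illustration, not Fury's actual `ScalaDispatcher` code.

```scala
// Hypothetical sketch: classify a runtime class by the Scala collection
// trait it implements. Not Fury's ScalaDispatcher implementation.
object ScalaCollectionKinds {
  sealed trait Kind
  case object MapKind extends Kind
  case object SeqKind extends Kind
  case object SetKind extends Kind
  case object NotACollection extends Kind

  /** Maps any subclass of scala.collection.{Map, Seq, Set} to its family. */
  def classify(cls: Class[_]): Kind =
    if (classOf[scala.collection.Map[_, _]].isAssignableFrom(cls)) MapKind
    else if (classOf[scala.collection.Seq[_]].isAssignableFrom(cls)) SeqKind
    else if (classOf[scala.collection.Set[_]].isAssignableFrom(cls)) SetKind
    else NotACollection

  def main(args: Array[String]): Unit = {
    println(classify(List(1, 2, 3).getClass))            // SeqKind
    println(classify(Map(1 -> "a").getClass))            // MapKind
    println(classify(Set(1).getClass))                   // SetKind
    println(classify(classOf[java.util.HashMap[_, _]]))  // NotACollection
  }
}
```

Because the checks target traits, a new concrete collection class needs no extra registration to be routed to the right family.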

Related issue number

Closes #1076

#765

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@chaokunyang merged commit f125517 into apache:main on Nov 4, 2023 (15 checks passed).
@chaokunyang (Collaborator, Author) commented:

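With the following naive tests, Fury is faster than both twill (Kryo via chill's `ScalaKryoInstantiator`) and JDK serialization.

For a case class wrapping a `List[String]`, Fury is about 2.7x faster than twill/Kryo and 5.4x faster than JDK. Its output is about 21% smaller than JDK's but about 2% larger than twill's: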

```scala
import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
// Fury and ScalaDispatcher imports depend on the Fury version
// (io.fury.* before the Apache package rename, org.apache.fury.* after).

case class Foo(list: List[String])

object Test {
  def main(args: Array[String]): Unit = {
    val fury = Fury.builder().requireClassRegistration(false)
      .withScalaOptimizationEnabled(true).build()
    fury.getClassResolver.setSerializerFactory(new ScalaDispatcher)

    val c = Foo(Range(1, 1000).map(x => String.valueOf(x)).toList)
    // chill's KryoInstantiator setters return a new instantiator instead of
    // mutating the receiver, so chain the call before newKryo().
    val kryo = new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()

    var bytes: Array[Byte] = new Array[Byte](0)
    val output = new Output(40000)
    var start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      output.clear()
      kryo.writeClassAndObject(output, c)
      bytes = output.toBytes
    }
    printf("twill time: %s, size %s\n", System.currentTimeMillis() - start, bytes.length)

    // Measure Fury's size once, outside the timed loop.
    val furySize = fury.serializeJavaObject(c).length
    start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      fury.serializeJavaObject(c)
    }
    printf("fury time: %s, size %s\n", System.currentTimeMillis() - start, furySize)

    // Measure the JDK size once, outside the timed loop.
    val bas = new ByteArrayOutputStream(200)
    val objectOutputStream = new ObjectOutputStream(bas)
    objectOutputStream.writeObject(c)
    objectOutputStream.flush()
    bytes = bas.toByteArray
    start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      bas.reset()
      val objectOutputStream = new ObjectOutputStream(bas)
      objectOutputStream.writeObject(c)
      objectOutputStream.flush()
    }
    printf("jdk time: %s, size %s\n", System.currentTimeMillis() - start, bytes.length)
  }
}
```

Benchmark results:

twill time: 16855, size 4925
fury time: 6356, size 5039
jdk time: 34283, size 6390
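These loops time everything from a cold start, so early iterations run before the JIT has compiled the hot path. A slightly less naive harness might run an untimed warmup phase first and use the monotonic `System.nanoTime` clock; for rigorous numbers, JMH is the usual tool. The sketch below uses only JDK serialization so it runs without extra dependencies. `NaiveBench`, `timeMillis` and `jdkSize` are hypothetical helper names, and the warmup count is an arbitrary assumption.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical helpers, not part of Fury, Kryo or the original benchmark.
object NaiveBench {
  /** Runs `body` `warmup` times untimed, then `iters` times timed; returns elapsed ms. */
  def timeMillis(warmup: Int, iters: Int)(body: => Unit): Long = {
    var i = 0
    while (i < warmup) { body; i += 1 }
    val start = System.nanoTime() // monotonic, unlike currentTimeMillis
    i = 0
    while (i < iters) { body; i += 1 }
    (System.nanoTime() - start) / 1000000L
  }

  /** Serializes `obj` with JDK serialization and returns the number of bytes written. */
  def jdkSize(obj: AnyRef): Int = {
    val bas = new ByteArrayOutputStream(256)
    val oos = new ObjectOutputStream(bas)
    try oos.writeObject(obj)
    finally oos.close() // close() flushes buffered block data into `bas`
    bas.size()
  }

  def main(args: Array[String]): Unit = {
    val c = Range(1, 1000).toList
    val size = jdkSize(c) // measured once, outside the timed region
    val ms = timeMillis(warmup = 20000, iters = 200000) { jdkSize(c) }
    printf("jdk time: %s, size %s\n", ms, size)
  }
}
```

The same `timeMillis` wrapper could surround the Kryo and Fury loops so that all three serializers get an identical warmup.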

For a plain `List[Int]` of 999 elements, Fury is about 1.5x faster than twill/Kryo and 9x faster than JDK, and its output is the smallest of the three:

```scala
import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
// Fury and ScalaDispatcher imports depend on the Fury version
// (io.fury.* before the Apache package rename, org.apache.fury.* after).

object Test {
  def main(args: Array[String]): Unit = {
    val fury = Fury.builder().requireClassRegistration(false)
      .withScalaOptimizationEnabled(true).build()
    fury.getClassResolver.setSerializerFactory(new ScalaDispatcher)

    val c = Range(1, 1000).toList
    // chill's KryoInstantiator setters return a new instantiator instead of
    // mutating the receiver, so chain the call before newKryo().
    val kryo = new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()

    var bytes: Array[Byte] = new Array[Byte](0)
    val output = new Output(40000)
    var start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      output.clear()
      kryo.writeClassAndObject(output, c)
      bytes = output.toBytes
    }
    printf("twill time: %s, size %s\n", System.currentTimeMillis() - start, bytes.length)

    // Measure Fury's size once, outside the timed loop.
    val furySize = fury.serialize(c).length
    start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      fury.serialize(c)
    }
    printf("fury time: %s, size %s\n", System.currentTimeMillis() - start, furySize)

    // Measure the JDK size once, outside the timed loop.
    val sizeStream = new ByteArrayOutputStream(200)
    val sizeOut = new ObjectOutputStream(sizeStream)
    sizeOut.writeObject(c)
    sizeOut.flush()
    bytes = sizeStream.toByteArray
    start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      val bas = new ByteArrayOutputStream(200)
      val objectOutputStream = new ObjectOutputStream(bas)
      objectOutputStream.writeObject(c)
      objectOutputStream.flush()
    }
    printf("jdk time: %s, size %s\n", System.currentTimeMillis() - start, bytes.length)
  }
}
```

Benchmark results:

twill time: 5935, size 2938
fury time: 4076, size 2060
jdk time: 37672, size 10479
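The speedup factors quoted in this thread can be recomputed from the raw timings. A small sketch that divides each baseline time by Fury's time and compares output sizes, using only the numbers printed in the two benchmark runs above. `SpeedupRatios` and `Result` are illustrative names.

```scala
// Timings (ms) and sizes (bytes) copied from the two benchmark runs in this
// thread; SpeedupRatios and Result are illustrative names only.
object SpeedupRatios {
  final case class Result(name: String, timeMs: Long, sizeBytes: Int)

  /** How many times faster `candidate` is than `baseline` (ratio of elapsed times). */
  def speedup(baseline: Result, candidate: Result): Double =
    baseline.timeMs.toDouble / candidate.timeMs

  /** Output size of `candidate` relative to `baseline`; below 1.0 means smaller. */
  def sizeRatio(baseline: Result, candidate: Result): Double =
    candidate.sizeBytes.toDouble / baseline.sizeBytes

  val caseClassRun = Seq(Result("twill", 16855, 4925), Result("fury", 6356, 5039), Result("jdk", 34283, 6390))
  val intListRun = Seq(Result("twill", 5935, 2938), Result("fury", 4076, 2060), Result("jdk", 37672, 10479))

  /** Prints Fury's speedup and relative size against every other serializer in a run. */
  def report(run: Seq[Result]): Unit = {
    val fury = run.find(_.name == "fury").get
    for (other <- run if other.name != "fury")
      printf("fury vs %s: %.2fx faster, %.2f of the size\n",
        other.name, speedup(other, fury), sizeRatio(other, fury))
  }

  def main(args: Array[String]): Unit = {
    report(caseClassRun)
    report(intListRun)
  }
}
```

For the case-class run this works out to about 2.65x over twill and 5.39x over JDK, with Fury's output at roughly 1.02x twill's size and 0.79x JDK's. For the `List[Int]` run it is about 1.46x and 9.24x, with Fury at roughly 0.70x and 0.20x of the other sizes.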

@pjfanning The benchmark results seem pretty promising.
