[Scala] support scala collection jit serialization #1077

Merged: chaokunyang merged 2 commits into apache:main from chaokunyang:support_scala_collection_jit on Nov 4, 2023

chaokunyang (Collaborator) commented on Nov 4, 2023

What do these changes do?

  • Support JIT serialization for Scala collections.
  • Relax ScalaDispatcher so that it intercepts all Scala Map/Seq/Set types.
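One way to picture "intercepting all Scala Map/Seq/Set types" is to dispatch on the broad scala.collection traits rather than on individual concrete classes. The sketch below is a hypothetical illustration that uses only the Scala standard library; CollectionDispatchSketch, Kind and kindOf are made-up names and are not Fury's ScalaDispatcher API.

```scala
import scala.collection.{Map => AnyMap, Seq => AnySeq, Set => AnySet}

// Hypothetical illustration only; this is not Fury's ScalaDispatcher.
// It classifies a runtime class by the broad scala.collection traits, so any
// implementation (immutable, mutable, or user-defined) is recognised.
object CollectionDispatchSketch {
  sealed trait Kind
  case object MapKind extends Kind
  case object SeqKind extends Kind
  case object SetKind extends Kind

  def kindOf(cls: Class[_]): Option[Kind] =
    if (classOf[AnyMap[_, _]].isAssignableFrom(cls)) Some(MapKind)
    else if (classOf[AnySeq[_]].isAssignableFrom(cls)) Some(SeqKind)
    else if (classOf[AnySet[_]].isAssignableFrom(cls)) Some(SetKind)
    else None // not a Scala Map/Seq/Set, e.g. a String or a JVM array
}
```

A real dispatcher would then map each kind to a serializer; that step is left out here. JVM arrays are deliberately not matched, since they are not scala.collection types.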

Related issue number

Closes #1076

#765

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

chaokunyang merged commit f125517 into apache:main on Nov 4, 2023
chaokunyang (Collaborator, Author) commented:

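With the following naive test, Fury is about 2.7x faster than Kryo (used through Twitter chill's ScalaKryoInstantiator and labelled "twill" in the output) and about 5.4x faster than JDK serialization. Its output is smaller than JDK's but about 2% larger than Kryo's: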

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
// Fury 0.x package names (io.fury) as used at the time of this PR.
import io.fury.Fury
import io.fury.serializer.scala.ScalaDispatcher

case class Foo(list: List[String])

object Test {
  def main(args: Array[String]): Unit = {
    // Fury with Scala optimizations and the Scala collection dispatcher.
    val fury = Fury.builder().requireClassRegistration(false)
      .withScalaOptimizationEnabled(true).build()
    fury.getClassResolver.setSerializerFactory(new ScalaDispatcher)

    val c = Foo(Range(1, 1000).map(x => String.valueOf(x)).toList)
    // setRegistrationRequired returns a new instantiator instead of mutating
    // the receiver, so it must be chained before newKryo().
    val kryo = new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()

    // Kryo (via chill): reuse one output buffer across iterations.
    var bytes: Array[Byte] = new Array[Byte](0)
    val output = new Output(40000)
    var start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      output.clear()
      kryo.writeClassAndObject(output, c)
      bytes = output.toBytes
    }
    printf("twill time: %s, size %s\n", System.currentTimeMillis() - start, bytes.length)

    // Fury: measure the size once, outside the timed loop.
    val furySize = fury.serializeJavaObject(c).length
    start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      fury.serializeJavaObject(c)
    }
    printf("fury time: %s, size %s\n", System.currentTimeMillis() - start, furySize)

    // JDK: measure the size once, then reuse the byte buffer in the loop.
    val bas = new ByteArrayOutputStream(200)
    val objectOutputStream = new ObjectOutputStream(bas)
    objectOutputStream.writeObject(c)
    objectOutputStream.flush()
    bytes = bas.toByteArray
    start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      bas.reset()
      val objectOutputStream = new ObjectOutputStream(bas)
      objectOutputStream.writeObject(c)
      objectOutputStream.flush()
    }
    printf("jdk time: %s, size %s\n", System.currentTimeMillis() - start, bytes.length)
  }
}

Benchmark result:

twill time: 16855, size 4925
fury time: 6356, size 5039
jdk time: 34283, size 6390
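These tests are naive in that each loop is timed with System.currentTimeMillis and only the Fury and JDK paths run once before their timed loop. A minimal sketch of a fairer loop, assuming that discarding a fixed number of warm-up iterations is enough for the JVM to compile the hot path, is below. WarmBench, timeMillis and jdkBytes are hypothetical helpers, not part of Fury, Kryo or this PR; a harness such as JMH remains the more reliable option.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical helper (not part of Fury, Kryo or the PR): runs `body` a fixed
// number of times untimed, then times a second batch of runs.
object WarmBench {
  def timeMillis(warmup: Int, iterations: Int)(body: => Unit): Long = {
    var i = 0
    while (i < warmup) { body; i += 1 }      // warm-up runs, not measured
    val start = System.nanoTime()
    i = 0
    while (i < iterations) { body; i += 1 }  // measured runs
    (System.nanoTime() - start) / 1000000L   // nanoseconds -> milliseconds
  }

  // One JDK serialization, written the same way as the benchmark's JDK path.
  def jdkBytes(obj: AnyRef): Array[Byte] = {
    val bas = new ByteArrayOutputStream(200)
    val oos = new ObjectOutputStream(bas)
    oos.writeObject(obj)
    oos.flush()
    bas.toByteArray
  }
}
```

For example, the Fury loop above could be timed as WarmBench.timeMillis(20000, 200000) { fury.serializeJavaObject(c) }, where the warm-up count is an arbitrary choice to tune per machine.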

For the following collection, a plain List[Int], Fury is about 1.5x faster than Kryo (via chill) and about 9x faster than JDK, and it also produces the smallest output:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
// Fury 0.x package names (io.fury) as used at the time of this PR.
import io.fury.Fury
import io.fury.serializer.scala.ScalaDispatcher

object Test {
  def main(args: Array[String]): Unit = {
    val fury = Fury.builder().requireClassRegistration(false)
      .withScalaOptimizationEnabled(true).build()
    fury.getClassResolver.setSerializerFactory(new ScalaDispatcher)

    val c = Range(1, 1000).toList
    // setRegistrationRequired returns a new instantiator; chain it.
    val kryo = new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()

    // Kryo (via chill)
    var bytes: Array[Byte] = new Array[Byte](0)
    val output = new Output(40000)
    var start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      output.clear()
      kryo.writeClassAndObject(output, c)
      bytes = output.toBytes
    }
    printf("twill time: %s, size %s\n", System.currentTimeMillis() - start, bytes.length)

    // Fury
    start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      fury.serialize(c)
    }
    val furyTime = System.currentTimeMillis() - start
    printf("fury time: %s, size %s\n", furyTime, fury.serialize(c).length)

    // JDK: a fresh stream per iteration
    start = System.currentTimeMillis()
    for (_ <- Range(0, 200000)) {
      val bas = new ByteArrayOutputStream(200)
      val objectOutputStream = new ObjectOutputStream(bas)
      objectOutputStream.writeObject(c)
      objectOutputStream.flush()
    }
    val jdkTime = System.currentTimeMillis() - start
    // Measure the JDK size outside the timed loop.
    val x = new ByteArrayOutputStream(200)
    val objectOutputStream = new ObjectOutputStream(x)
    objectOutputStream.writeObject(c)
    objectOutputStream.flush()
    printf("jdk time: %s, size %s\n", jdkTime, x.toByteArray.length)
  }
}

Benchmark result:

twill time: 5935, size 2938
fury time: 4076, size 2060
jdk time: 37672, size 10479
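The factors quoted for this List[Int] run can be checked against the reported numbers, with t the time in milliseconds and s the size in bytes (Kryo being the run labelled twill):

```latex
\begin{aligned}
\frac{t_{\text{Kryo}}}{t_{\text{Fury}}} &= \frac{5935}{4076} \approx 1.46, &
\frac{t_{\text{JDK}}}{t_{\text{Fury}}} &= \frac{37672}{4076} \approx 9.24, \\
\frac{s_{\text{Kryo}}}{s_{\text{Fury}}} &= \frac{2938}{2060} \approx 1.43, &
\frac{s_{\text{JDK}}}{s_{\text{Fury}}} &= \frac{10479}{2060} \approx 5.09
\end{aligned}
```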

@pjfanning The benchmark results seem pretty promising.
