Rework of the concept of collection in the Schema abstraction #290

Merged
merged 14 commits into main from dfrancoeur/collections
Jul 7, 2022

Conversation

Contributor

@daddykotex daddykotex commented Jul 4, 2022

[Oli's comment]

This replaces list/set in Schema with a generic collection construct that takes a CollectionTag as a parameter. The CollectionTag carries methods to (de)construct the collections, and can be used to implement SchemaVisitors generically, without worrying about which collection we're dealing with. In addition, the CollectionTag is sealed, and can be inspected to provide specialised versions of the encoding/decoding logic when relevant.
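A minimal sketch of the idea described above, not the code from this PR. Only the `CollectionTag[C[_]]` name and its sealed nature come from the conversation; the `name`, `iterator` and `build` members, the `VectorTag` member and the `mapElements` helper are hypothetical names chosen for illustration.

```scala
// Hypothetical sketch: only `CollectionTag` is named in the PR; every member
// and helper below is an assumption made for illustration.
object CollectionTagSketch {

  // Sealed, so an interpreter can still pattern-match on the known members.
  sealed trait CollectionTag[C[_]] {
    def name: String
    // Deconstruct: expose the elements of a collection.
    def iterator[A](c: C[A]): Iterator[A]
    // Construct: build a collection from a stream of elements.
    def build[A](elems: Iterator[A]): C[A]
  }

  object CollectionTag {
    case object ListTag extends CollectionTag[List] {
      val name: String = "list"
      def iterator[A](c: List[A]): Iterator[A] = c.iterator
      def build[A](elems: Iterator[A]): List[A] = elems.toList
    }
    case object SetTag extends CollectionTag[Set] {
      val name: String = "set"
      def iterator[A](c: Set[A]): Iterator[A] = c.iterator
      def build[A](elems: Iterator[A]): Set[A] = elems.toSet
    }
    case object VectorTag extends CollectionTag[Vector] {
      val name: String = "vector"
      def iterator[A](c: Vector[A]): Iterator[A] = c.iterator
      def build[A](elems: Iterator[A]): Vector[A] = elems.toVector
    }
  }

  // Written once against the tag, so it works for every collection kind.
  def mapElements[C[_], A, B](tag: CollectionTag[C], c: C[A])(f: A => B): C[B] =
    tag.build(tag.iterator(c).map(f))
}
```

A generic visitor would only ever call the tag's methods, as `mapElements` does, and never needs to know whether it is handling a list, a set or a vector.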

@daddykotex daddykotex marked this pull request as draft July 4, 2022 18:08
@Baccata
Contributor

Baccata commented Jul 5, 2022

@daddykotex please run the jsoniter benchmarks against both the current version and this branch to see whether this impacts performance when decoding/encoding lists.

@plokhotnyuk
Contributor

@daddykotex please run the jsoniter benchmarks against both the current version and this branch to see whether this impacts performance when decoding/encoding lists.

It will impact performance. Please see the ListOfBooleansReading and ListOfBooleansWriting benchmarks.

Does this new design offer an option to have specialized codecs for some collection tags?

@Baccata
Contributor

Baccata commented Jul 5, 2022

@plokhotnyuk if we keep the abstraction sealed, and expose the sealed members to recover the abstracted type, it'll allow for specialisation. I'd be fine with this.

@plokhotnyuk
Contributor

plokhotnyuk commented Jul 5, 2022

Below are the benchmark results for different sizes.

Before:

[info] Benchmark                           (size)   Mode  Cnt         Score        Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  25712962.540 ± 739164.263  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4185337.378 ± 231537.022  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    502574.157 ±   8798.920  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     50891.385 ±    235.812  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  83653388.040 ± 780133.856  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  31529158.661 ± 475120.464  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   6066729.156 ± 110610.169  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    468400.626 ±  13864.790  ops/s

After:

[info] Benchmark                           (size)   Mode  Cnt         Score         Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  23543939.110 ±  186439.061  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4052132.013 ±   36540.274  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    440101.808 ±    2434.889  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     49631.207 ±     224.666  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  82515183.792 ± 7018104.795  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28518167.248 ±  903701.144  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   5602098.425 ±  171208.549  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    464679.344 ±   12586.326  ops/s

The command used for that:

sbt -java-home /usr/lib/jvm/zulu-17 clean jsoniter-scala-benchmarkJVM/test 'jsoniter-scala-benchmarkJVM/jmh:run -p size=1,10,100,1000 ListOfBooleans(Reading|Writing).smithy4sJson'

@daddykotex
Contributor Author

Last update, with case object instead of def list/set:

Before:

[info] Benchmark                           (size)   Mode  Cnt         Score        Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  25993006.192 ± 720633.324  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4430988.006 ±  34950.346  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    475538.740 ±   6568.466  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     47875.703 ±    472.228  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  67302416.332 ± 727037.113  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28217963.212 ± 828884.018  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3448066.045 ±  49790.724  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    283070.390 ±   2624.769  ops/s

After:

[info] Benchmark                           (size)   Mode  Cnt         Score         Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  23691087.074 ±  346083.797  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   3878777.112 ±  179290.385  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    428526.504 ±    3785.386  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     42317.259 ±    1073.143  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  70342211.467 ± 1301975.796  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28465901.292 ±  337870.882  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3409951.467 ±   49448.292  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    344767.207 ±    1870.146  ops/s

@Baccata
Contributor

Baccata commented Jul 5, 2022

@daddykotex, in the SchemaVisitorJCodec, the desired change would be to revert to something specific to the collection types being handled, such as listImpl here. You'd also have a setImpl like before, and additional methods for the added collection types.

Then, in the collection method, you'd pattern-match against the CollectionTag construct and delegate to the relevant, specialised method. Therefore, the interface methods exposed by CollectionTag would not be used in the Jsoniter-specific interpreter, to squeeze as much performance as possible out of the json decoding logic.
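The delegation described above could look roughly like the following sketch. It is not the SchemaVisitorJCodec code: the `Encoder` alias and the string-based `listImpl`/`setImpl` bodies are stand-ins invented so the example runs without jsoniter, and the tag is reduced to two members. The casts are an assumption made to keep the sketch compiling on Scala 2, where matching on a case object does not refine `C`.

```scala
// Hypothetical sketch: a string "encoder" stands in for a jsoniter codec.
object SpecialisedDispatchSketch {

  sealed trait CollectionTag[C[_]]
  object CollectionTag {
    case object ListTag extends CollectionTag[List]
    case object SetTag extends CollectionTag[Set]
  }

  type Encoder[T] = T => String

  // Specialised implementations, one per collection type. They work on the
  // concrete collection directly and never go through the tag's interface.
  def listImpl[A](elem: Encoder[A]): Encoder[List[A]] =
    list => list.map(elem).mkString("[", ",", "]")

  def setImpl[A](elem: Encoder[A]): Encoder[Set[A]] =
    set => set.iterator.map(elem).mkString("[", ",", "]")

  // The generic entry point inspects the sealed tag and delegates.
  // Each cast is safe because the matched tag pins down what C is.
  def collection[C[_], A](tag: CollectionTag[C], elem: Encoder[A]): Encoder[C[A]] =
    tag match {
      case CollectionTag.ListTag => listImpl(elem).asInstanceOf[Encoder[C[A]]]
      case CollectionTag.SetTag  => setImpl(elem).asInstanceOf[Encoder[C[A]]]
    }
}
```

Because the tag is sealed, the compiler can warn when a newly added collection type has no specialised branch yet.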

@daddykotex daddykotex closed this Jul 5, 2022
@daddykotex daddykotex reopened this Jul 5, 2022
@daddykotex
Contributor Author

@daddykotex, in the SchemaVisitorJCodec, the desired change would be to revert to something specific to the collection types being handled, such as listImpl here. You'd also have a setImpl like before, and additional methods for the added collection types.

Then, in the collection method, you'd pattern-match against the CollectionTag construct and delegate to the relevant, specialised method. Therefore, the interface methods exposed by CollectionTag would not be used in the Jsoniter-specific interpreter, to squeeze as much performance as possible out of the json decoding logic.

Got it. I'm sorry, I did not think we wanted to ditch the CollectionTag implementation in this specific scenario.

Updating the code and re-running the benchmarks.

@daddykotex
Contributor Author

Before:

[info] Benchmark                           (size)   Mode  Cnt         Score        Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  25993006.192 ± 720633.324  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4430988.006 ±  34950.346  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    475538.740 ±   6568.466  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     47875.703 ±    472.228  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  67302416.332 ± 727037.113  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28217963.212 ± 828884.018  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3448066.045 ±  49790.724  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    283070.390 ±   2624.769  ops/s

After:

[info] Benchmark                           (size)   Mode  Cnt         Score         Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  23691087.074 ±  346083.797  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   3878777.112 ±  179290.385  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    428526.504 ±    3785.386  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     42317.259 ±    1073.143  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  70342211.467 ± 1301975.796  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28465901.292 ±  337870.882  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3409951.467 ±   49448.292  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    344767.207 ±    1870.146  ops/s

After specialization:

[info] Benchmark                           (size)   Mode  Cnt         Score         Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  25857027.983 ± 1667865.836  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4439332.358 ±   34581.609  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    462419.385 ±    8232.583  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     47899.194 ±     792.417  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  67179957.102 ±  685996.973  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28097529.170 ±  923336.601  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3392081.900 ±   41404.484  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    282219.904 ±    6480.179  ops/s
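To compare the three tables, it can help to express each score as a percentage change against the "Before" run. The sketch below does that for the ListOfBooleansReading row at size 100, using the scores posted above. The `percentChange` helper is a name made up for this example, and the calculation ignores the reported error margins.

```scala
object RelativeChange {

  // Percentage change of `after` relative to `before`.
  // For throughput (ops/s), a negative value means the branch is slower.
  def percentChange(before: Double, after: Double): Double =
    (after - before) / before * 100.0

  def main(args: Array[String]): Unit = {
    // ListOfBooleansReading.smithy4sJson, size = 100, scores copied from the tables above.
    val before      = 475538.740 // "Before"
    val generic     = 428526.504 // "After" (generic CollectionTag path)
    val specialised = 462419.385 // "After specialization"

    println(f"generic vs before:     ${percentChange(before, generic)}%.1f%%")
    println(f"specialised vs before: ${percentChange(before, specialised)}%.1f%%")
  }
}
```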

package smithy4s
package schema

sealed trait CollectionTag[C[_]] {
Contributor

👍

  def struct[S](shapeId: ShapeId, hints: Hints, fields: Vector[SchemaField[S, _]], make: IndexedSeq[Any] => S): MaybeCT[S] = None
  def union[U](shapeId: ShapeId, hints: Hints, alternatives: Vector[SchemaAlt[U, _]], dispatch: U => Alt.SchemaAndValue[U, _]): MaybeCT[U] = None
  def biject[A, B](schema: Schema[A], to: A => B, from: B => A): MaybeCT[B] = {
    if (to.isInstanceOf[Newtype.Make[A, B]]) apply(schema).asInstanceOf[MaybeCT[B]]
Contributor

This is hacky, but when we have newtypes of primitives, we should attempt to retain compaction.
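The detection trick in this comment can be sketched in isolation as follows. This is a simplified assumption about how it fits together, not the smithy4s source: `Newtype.Make`, `apply` and `make` are named in the PR, while `value`, `isNewtypeBijection` and the `Foo` object are hypothetical.

```scala
// Hypothetical sketch of recognising newtype bijections at runtime.
object NewtypeDetectionSketch {

  object Newtype {
    // Marker subtype of Function1: a bijection whose `to` function is a Make
    // is known to come from a newtype, so it is an identity at runtime.
    trait Make[A, B] extends (A => B)
  }

  abstract class Newtype[A] {
    type Type
    // What users call to construct instances of the newtype.
    def apply(a: A): Type = a.asInstanceOf[Type]
    // Only here so that interpreters can recognise the function by its class.
    val make: Newtype.Make[A, Type] = new Newtype.Make[A, Type] {
      def apply(a: A): Type = a.asInstanceOf[Type]
    }
    def value(t: Type): A = t.asInstanceOf[A]
  }

  // An interpreter can check this before deciding whether to reuse the
  // (possibly compact) representation chosen for the underlying schema.
  def isNewtypeBijection[A, B](to: A => B): Boolean =
    to.isInstanceOf[Newtype.Make[_, _]]

  object Foo extends Newtype[Int]
}
```

An ordinary lambda such as `(i: Int) => i.toString` is not a `Make`, so a bijection built from it would take the general path.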

@Baccata
Contributor

Baccata commented Jul 6, 2022

The changes seem to behave as expected. Using Spark's SizeEstimator utility, we can estimate the size of instances on the heap.

Using the following smithy schema:

namespace foo

integer Foo

and the following script:

//> using scala 2.13

import $ivy.`org.openjdk.jol:jol-core:0.16`
import $ivy.`org.apache.spark::spark-core:3.3.0`
import $ivy.`com.disneystreaming.smithy4s::smithy4s-json:dev`

import scala.reflect.ClassTag
import org.openjdk.jol.info.ClassLayout;
import scala.collection.immutable.ArraySeq
import org.apache.spark.util.SizeEstimator
import foo.Foo
import smithy4s.http.json._
import com.github.plokhotnyuk.jsoniter_scala.core._
import smithy4s.schema._

object Main {

  implicit val indexedSeqFooCodec = smithy4s.http.json.JCodec.fromSchema(Schema.indexedSeq(Foo.schema))
  implicit val indexedSeqIntCodec = smithy4s.http.json.JCodec.fromSchema(Schema.indexedSeq(Schema.int))

  def main(args: Array[String]): Unit =  {
    implicit val ct: ClassTag[Foo] = implicitly[ClassTag[Int]].asInstanceOf[ClassTag[Foo]]
    val naive = IndexedSeq.fill(1000)(1)
    val ints = ArraySeq.fill(1000)(1)
    val foos = ArraySeq.fill(1000)(Foo(1))
    val json = writeToArray[IndexedSeq[Int]](ints)
    val resultInt = readFromArray[IndexedSeq[Int]](json)
    val resultFoo = readFromArray[IndexedSeq[Foo]](json)
    println(SizeEstimator.estimate(naive)) // Ints are stored as objects in standard IndexedSeq construction, we get 4696
    println(SizeEstimator.estimate(ints)) // 4032
    println(SizeEstimator.estimate(foos)) // 4032
    println(SizeEstimator.estimate(resultInt)) // 4032
    println(SizeEstimator.estimate(resultFoo)) // 4032
  }
}

@daddykotex
Contributor Author

That's interesting, thanks for the scala-cli script. TIL about the SizeEstimator.

@daddykotex daddykotex marked this pull request as ready for review July 6, 2022 14:09
@Baccata Baccata changed the title from "Preliminary rework of the collection in the Schema abstraction" to "Rework of the concept of collection in the Schema abstraction" Jul 6, 2022
Comment on lines 26 to 27
@inline final def apply(a: A): Type = a.asInstanceOf[Type]
final val make: Newtype.Make[A, Type] = apply(_: A)
Contributor

With this change, is there ever a case where someone should use the apply method rather than make? If not, should the apply method be private?

Contributor

Users should very much keep using the apply method to construct instances of newtypes. The make value is a kind of hack to ensure we have a way of detecting whether a bijection actually comes from a newtype. I'll try to hide it further and make the Make construct package private.

@Baccata
Contributor

Baccata commented Jul 7, 2022

I'm gonna go forward with this. We can refine later if need be.

@Baccata Baccata merged commit 0a41994 into main Jul 7, 2022
@Baccata Baccata deleted the dfrancoeur/collections branch July 7, 2022 15:20
4 participants