Rework of the concept of collection in the Schema abstraction #290

daddykotex · 2022-07-04T18:08:46Z

[Oli's comment]

This replaces list/set in Schema with a generic collection construct that takes a CollectionTag as a parameter. The CollectionTag carries methods to (de)construct the collections, and can be used to generically implement SchemaVisitors without worrying about which collection we're dealing with. In addition, the CollectionTag is sealed, and can be inspected to provide specialised versions of the encoding/decoding logic when relevant.
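To make the description concrete, here is a minimal sketch of what such a tag could look like. It is an illustration only: the `iterator` and `fromIterator` method names, the two case objects shown, and the `mapAll` helper are assumptions made for this example and are not necessarily what this PR ships.

```scala
// Hypothetical sketch: member names are illustrative, not the PR's actual code.
sealed trait CollectionTag[C[_]] {
  def name: String
  // `A` is a method-level type parameter, so one tag serves every element type.
  def iterator[A](c: C[A]): Iterator[A]
  def fromIterator[A](it: Iterator[A]): C[A]
}

object CollectionTag {
  case object ListTag extends CollectionTag[List] {
    val name = "list"
    def iterator[A](c: List[A]): Iterator[A] = c.iterator
    def fromIterator[A](it: Iterator[A]): List[A] = it.toList
  }

  case object SetTag extends CollectionTag[Set] {
    val name = "set"
    def iterator[A](c: Set[A]): Iterator[A] = c.iterator
    def fromIterator[A](it: Iterator[A]): Set[A] = it.toSet
  }
}

object GenericExample {
  // Generic code only talks to the tag, so it works for any collection the tag describes.
  def mapAll[C[_], A, B](tag: CollectionTag[C], c: C[A])(f: A => B): C[B] =
    tag.fromIterator(tag.iterator(c).map(f))
}
```

Because the trait is sealed, an interpreter that cares about performance can still pattern-match on the concrete tag instead of going through these generic methods.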

modules/core/src/smithy4s/schema/CollectionTag.scala

modules/core/src/smithy4s/internals/SchematicDocumentDecoder.scala

modules/core/src/smithy4s/schema/SchematicRepr.scala

Baccata · 2022-07-05T08:00:01Z

@daddykotex please run the jsoniter benchmarks against both the current version and this branch, to see whether this impacts performance when decoding/encoding lists.

plokhotnyuk · 2022-07-05T08:46:04Z

@daddykotex please run the jsoniter benchmarks against both the current version and this branch, to see whether this impacts performance when decoding/encoding lists.

It will impact performance. Please see the ListOfBooleansReading and ListOfBooleansWriting benchmarks.

Do we have in this new design an option to have specialized codecs for some collection tags?

Baccata · 2022-07-05T08:49:12Z

@plokhotnyuk if we keep the abstraction sealed, and expose the sealed members to recover the abstracted type, it'll allow for specialisation. I'd be fine with this.

plokhotnyuk · 2022-07-05T09:20:38Z

Below are the benchmark results for different sizes.

Before:

[info] Benchmark                           (size)   Mode  Cnt         Score        Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  25712962.540 ± 739164.263  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4185337.378 ± 231537.022  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    502574.157 ±   8798.920  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     50891.385 ±    235.812  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  83653388.040 ± 780133.856  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  31529158.661 ± 475120.464  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   6066729.156 ± 110610.169  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    468400.626 ±  13864.790  ops/s

After:

[info] Benchmark                           (size)   Mode  Cnt         Score         Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  23543939.110 ±  186439.061  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4052132.013 ±   36540.274  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    440101.808 ±    2434.889  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     49631.207 ±     224.666  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  82515183.792 ± 7018104.795  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28518167.248 ±  903701.144  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   5602098.425 ±  171208.549  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    464679.344 ±   12586.326  ops/s

The command used for that:

sbt -java-home /usr/lib/jvm/zulu-17 clean jsoniter-scala-benchmarkJVM/test 'jsoniter-scala-benchmarkJVM/jmh:run -p size=1,10,100,1000 ListOfBooleans(Reading|Writing).smithy4sJson'

daddykotex · 2022-07-05T14:08:11Z

Last update, with case object instead of def list/set:

Before:

[info] Benchmark                           (size)   Mode  Cnt         Score        Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  25993006.192 ± 720633.324  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4430988.006 ±  34950.346  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    475538.740 ±   6568.466  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     47875.703 ±    472.228  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  67302416.332 ± 727037.113  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28217963.212 ± 828884.018  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3448066.045 ±  49790.724  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    283070.390 ±   2624.769  ops/s

After:

[info] Benchmark                           (size)   Mode  Cnt         Score         Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  23691087.074 ±  346083.797  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   3878777.112 ±  179290.385  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    428526.504 ±    3785.386  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     42317.259 ±    1073.143  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  70342211.467 ± 1301975.796  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28465901.292 ±  337870.882  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3409951.467 ±   49448.292  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    344767.207 ±    1870.146  ops/s

Baccata · 2022-07-05T14:13:41Z

@daddykotex, in the SchemaVisitorJCodec, the desired change would be reverted to something specific to the collection types being handled, such as listImpl here. You'd also have a setImpl like before, and additional methods for the added collection types.

Then, in the collection method, you'd pattern-match against the CollectionTag construct and delegate to the relevant, specialised method. Therefore, the interface methods exposed by CollectionTag would not be used in the Jsoniter-specific interpreter, to squeeze as much performance as possible out of the json decoding logic.
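A rough sketch of that dispatch, under stated assumptions: `Tag`, `Encoder`, `listImpl`, `setImpl` and the string-based output below are simplified stand-ins invented for this example, not the actual `SchemaVisitorJCodec` or jsoniter-scala code.

```scala
// Hypothetical, self-contained stand-ins for the real tag and codec types.
sealed trait Tag[C[_]]
object Tag {
  case object ListTag extends Tag[List]
  case object SetTag extends Tag[Set]
}

trait Encoder[A] {
  def encode(a: A, out: StringBuilder): Unit
}

object Specialised {
  val intEncoder: Encoder[Int] = new Encoder[Int] {
    def encode(a: Int, out: StringBuilder): Unit = { out.append(a.toString); () }
  }

  // List-specific path: walks the cons cells directly with a while loop.
  def listImpl[A](enc: Encoder[A]): Encoder[List[A]] = new Encoder[List[A]] {
    def encode(xs: List[A], out: StringBuilder): Unit = {
      out.append('[')
      var rest = xs
      var first = true
      while (rest.nonEmpty) {
        if (!first) out.append(',')
        enc.encode(rest.head, out)
        first = false
        rest = rest.tail
      }
      out.append(']')
      ()
    }
  }

  // Set-specific path: iterates over the set's own iterator.
  def setImpl[A](enc: Encoder[A]): Encoder[Set[A]] = new Encoder[Set[A]] {
    def encode(xs: Set[A], out: StringBuilder): Unit = {
      out.append('[')
      val it = xs.iterator
      var first = true
      while (it.hasNext) {
        if (!first) out.append(',')
        enc.encode(it.next(), out)
        first = false
      }
      out.append(']')
      ()
    }
  }

  // The single generic entry point inspects the sealed tag and delegates.
  // The cast keeps the sketch independent of how each compiler refines `C` per branch.
  def collection[C[_], A](tag: Tag[C], enc: Encoder[A]): Encoder[C[A]] = {
    val chosen = tag match {
      case Tag.ListTag => listImpl(enc)
      case Tag.SetTag  => setImpl(enc)
    }
    chosen.asInstanceOf[Encoder[C[A]]]
  }
}
```

The generic methods on the tag stay available for interpreters that do not need this level of tuning; only the performance-sensitive interpreter bypasses them.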

daddykotex · 2022-07-05T14:20:32Z

daddykotex · 2022-07-05T14:21:27Z

@daddykotex, in the SchemaVisitorJCodec, the desired change would be reverted to something specific to the collection types being handled, such as listImpl here. You'd also have a setImpl like before, and additional methods for the added collection types.

Then, in the collection method, you'd pattern-match against the CollectionTag construct and delegate to the relevant, specialised method. Therefore, the interface methods exposed by CollectionTag would not be used in the Jsoniter-specific interpreter, to squeeze as much performance as possible out of the json decoding logic.

Got it, I'm sorry, I did not think we wanted to ditch the CollectionTag implementation in this specific scenario.

Updating the code and re-running the benchmarks.

daddykotex · 2022-07-05T14:29:56Z

Before:

[info] Benchmark                           (size)   Mode  Cnt         Score        Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  25993006.192 ± 720633.324  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4430988.006 ±  34950.346  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    475538.740 ±   6568.466  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     47875.703 ±    472.228  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  67302416.332 ± 727037.113  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28217963.212 ± 828884.018  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3448066.045 ±  49790.724  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    283070.390 ±   2624.769  ops/s

After:

[info] Benchmark                           (size)   Mode  Cnt         Score         Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  23691087.074 ±  346083.797  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   3878777.112 ±  179290.385  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    428526.504 ±    3785.386  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     42317.259 ±    1073.143  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  70342211.467 ± 1301975.796  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28465901.292 ±  337870.882  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3409951.467 ±   49448.292  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    344767.207 ±    1870.146  ops/s

After specialization:

[info] Benchmark                           (size)   Mode  Cnt         Score         Error  Units
[info] ListOfBooleansReading.smithy4sJson       1  thrpt    5  25857027.983 ± 1667865.836  ops/s
[info] ListOfBooleansReading.smithy4sJson      10  thrpt    5   4439332.358 ±   34581.609  ops/s
[info] ListOfBooleansReading.smithy4sJson     100  thrpt    5    462419.385 ±    8232.583  ops/s
[info] ListOfBooleansReading.smithy4sJson    1000  thrpt    5     47899.194 ±     792.417  ops/s
[info] ListOfBooleansWriting.smithy4sJson       1  thrpt    5  67179957.102 ±  685996.973  ops/s
[info] ListOfBooleansWriting.smithy4sJson      10  thrpt    5  28097529.170 ±  923336.601  ops/s
[info] ListOfBooleansWriting.smithy4sJson     100  thrpt    5   3392081.900 ±   41404.484  ops/s
[info] ListOfBooleansWriting.smithy4sJson    1000  thrpt    5    282219.904 ±    6480.179  ops/s
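For reading these tables, a small helper that turns two scores into a relative change can be handy. This is only a convenience sketch: `relativeChangePct` is a name made up here, and the three values are the ListOfBooleansReading scores for size 100 copied from the tables above.

```scala
object BenchDelta {
  // Relative change of `after` with respect to `before`, in percent.
  // Negative values mean lower throughput than the baseline.
  def relativeChangePct(before: Double, after: Double): Double =
    (after - before) / before * 100.0

  // ListOfBooleansReading, size = 100, in ops/s.
  val before = 475538.740
  val afterGeneric = 428526.504
  val afterSpecialised = 462419.385

  def main(args: Array[String]): Unit = {
    println(f"generic:     ${relativeChangePct(before, afterGeneric)}%.2f%%")
    println(f"specialised: ${relativeChangePct(before, afterSpecialised)}%.2f%%")
  }
}
```

Any such delta should be read alongside the Error column, since some differences are of the same order as the reported error.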

Baccata · 2022-07-05T14:34:05Z

modules/core/src/smithy4s/schema/CollectionTag.scala

+package smithy4s
+package schema
+
+sealed trait CollectionTag[C[_]] {


Baccata · 2022-07-06T08:31:36Z

modules/core/src/smithy4s/schema/CollectionTag.scala

+    def struct[S](shapeId: ShapeId, hints: Hints, fields: Vector[SchemaField[S, _]], make: IndexedSeq[Any] => S): MaybeCT[S] = None
+    def union[U](shapeId: ShapeId, hints: Hints, alternatives: Vector[SchemaAlt[U, _]], dispatch: U => Alt.SchemaAndValue[U, _]): MaybeCT[U] = None
+    def biject[A, B](schema: Schema[A], to: A => B, from: B => A): MaybeCT[B] = {
+      if (to.isInstanceOf[Newtype.Make[A, B]]) apply(schema).asInstanceOf[MaybeCT[B]]


This is hacky, but when we have newtypes of primitives, we should attempt to retain compaction.
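For context, a small sketch of why a `ClassTag` matters for compaction. It assumes Scala 2.13 or later, and it assumes `MaybeCT` in the diff stands for an optional `ClassTag`; the `build` helper below is hypothetical and only mirrors that idea.

```scala
import scala.collection.immutable.ArraySeq
import scala.reflect.ClassTag

object Compaction {
  // Hypothetical helper: builds an ArraySeq, using the ClassTag when one is available.
  def build[A](maybeCT: Option[ClassTag[A]], values: Iterator[A]): ArraySeq[A] =
    maybeCT match {
      // With a primitive ClassTag the backing array is primitive (e.g. Array[Int]).
      case Some(ct) => ArraySeq.from(values)(ct)
      // Without one, elements are stored as object references, so Ints get boxed.
      case None => ArraySeq.untagged.from(values)
    }
}
```

Since a newtype of `Int` is still an `Int` at runtime, reusing the underlying `ClassTag` for the newtype keeps the compact representation.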

Baccata · 2022-07-06T10:12:50Z

The changes seem to behave as expected. Using the Spark SizeEstimator utility, we can estimate the size of instances on the heap.

Using the following Smithy schema:

namespace foo

integer Foo

and the following script:

//> using scala 2.13

import $ivy.`org.openjdk.jol:jol-core:0.16`
import $ivy.`org.apache.spark::spark-core:3.3.0`
import $ivy.`com.disneystreaming.smithy4s::smithy4s-json:dev`

import scala.reflect.ClassTag
import org.openjdk.jol.info.ClassLayout;
import scala.collection.immutable.ArraySeq
import org.apache.spark.util.SizeEstimator
import foo.Foo
import smithy4s.http.json._
import com.github.plokhotnyuk.jsoniter_scala.core._
import smithy4s.schema._

object Main {

  implicit val indexedSeqFooCodec = smithy4s.http.json.JCodec.fromSchema(Schema.indexedSeq(Foo.schema))
  implicit val indexedSeqIntCodec = smithy4s.http.json.JCodec.fromSchema(Schema.indexedSeq(Schema.int))

  def main(args: Array[String]): Unit =  {
    implicit val ct: ClassTag[Foo] = implicitly[ClassTag[Int]].asInstanceOf[ClassTag[Foo]]
    val naive = IndexedSeq.fill(1000)(1)
    val ints = ArraySeq.fill(1000)(1)
    val foos = ArraySeq.fill(1000)(Foo(1))
    val json = writeToArray[IndexedSeq[Int]](ints)
    val resultInt = readFromArray[IndexedSeq[Int]](json)
    val resultFoo = readFromArray[IndexedSeq[Foo]](json)
    println(SizeEstimator.estimate(naive)) // Ints are stored as objects in standard IndexedSeq construction, we get 4696
    println(SizeEstimator.estimate(ints)) // 4032
    println(SizeEstimator.estimate(foos)) // 4032
    println(SizeEstimator.estimate(resultInt)) // 4032
    println(SizeEstimator.estimate(resultFoo)) // 4032
  }
}

daddykotex · 2022-07-06T14:09:45Z

That's interesting, thanks for the scala-cli script. TIL about the SizeEstimator.

lewisjkl · 2022-07-06T16:13:19Z

modules/core/src-2/Newtype.scala

  @inline final def apply(a: A): Type = a.asInstanceOf[Type]
+  final val make: Newtype.Make[A, Type] = apply(_: A)


With this change, is there ever a case where someone should use the apply method rather than make? If not, should the apply method be private?

Users should very much keep using the apply method to construct instances of newtypes. This is very much a kind of hack to ensure we have a way of detecting whether a bijection is actually coming from a newtype. I'll try to hide it further and make the Make construct package private.
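The detection trick can be illustrated in isolation. The following is a simplified model, not the smithy4s `Newtype` implementation: `makeFoo`, `plain` and `comesFromNewtype` are names invented for this sketch.

```scala
object NewtypeSketch {
  // Marker subtype of Function1: no extra behaviour, only a recognisable runtime class.
  trait Make[A, B] extends (A => B)

  // Hypothetical newtype-style constructor exposed as a Make instance.
  val makeFoo: Make[Int, Int] = new Make[Int, Int] {
    def apply(a: Int): Int = a
  }

  // An ordinary function, as any other bijection would use.
  val plain: Int => Int = _ + 1

  // Mirrors the isInstanceOf check from the diff: true only for Make-based functions.
  def comesFromNewtype[A, B](to: A => B): Boolean =
    to.isInstanceOf[Make[_, _]]
}
```

An interpreter receiving the `to` function of a bijection can use such a check to decide whether the underlying representation is unchanged, while user code keeps calling `apply` as usual.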

Baccata · 2022-07-07T15:14:56Z

I'm gonna go forward with this. We can refine later if need be.

daddykotex added 2 commits July 4, 2022 14:05

Schema refactor in core

fe1d613

Fix errors throughout in other modules

52daa2d

daddykotex marked this pull request as draft July 4, 2022 18:08

Baccata reviewed Jul 5, 2022

modules/core/src/smithy4s/schema/CollectionTag.scala (outdated)

Baccata reviewed Jul 5, 2022

modules/core/src/smithy4s/schema/CollectionTag.scala (outdated)

Baccata reviewed Jul 5, 2022

modules/core/src/smithy4s/internals/SchematicDocumentDecoder.scala (outdated)

Baccata reviewed Jul 5, 2022

modules/core/src/smithy4s/schema/SchematicRepr.scala (outdated)

daddykotex added 5 commits July 5, 2022 09:13

Merge remote-tracking branch 'origin/main' into dfrancoeur/collections

c9e2c03

Improve CollectionTag

6b6ecb8

Move `A` as a method param Add name: String on the interface

Add Vector and ArraySeq implementation

465829f

Remove ArraySeq for now

147f728

Use tag.name in string representation

9a4a772

daddykotex closed this Jul 5, 2022

daddykotex reopened this Jul 5, 2022

Rework implementation for jcodec collections

df05035

Baccata reviewed Jul 5, 2022

modules/core/src/smithy4s/schema/CollectionTag.scala

package smithy4s

package schema

sealed trait CollectionTag[C[_]] {

👍

Add IndexedSeq to CollectionTag

ec50c65

Baccata reviewed Jul 6, 2022

Baccata added 3 commits July 6, 2022 10:34

pre-compute builder-making function

eefd49c

Fix Scala 3

c6f454e

Fix cast

26aa019

Regenerated examples

b364081

daddykotex marked this pull request as ready for review July 6, 2022 14:09

Baccata changed the title ~~Preliminary rework of the collection in the Schema abstraction~~ Rework of the concept of collection in the Schema abstraction Jul 6, 2022

lewisjkl reviewed Jul 6, 2022

Make Newtype.Make package private

e2d5e2e

Baccata force-pushed the dfrancoeur/collections branch from 263a7a2 to e2d5e2e Compare July 7, 2022 07:25

Baccata approved these changes Jul 7, 2022

Baccata merged commit 0a41994 into main Jul 7, 2022

Baccata deleted the dfrancoeur/collections branch July 7, 2022 15:20
