Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Micro optimization for collect* operator. #983

Merged
merged 1 commit into from
Jan 20, 2024

Conversation

He-Pin
Copy link
Member

@He-Pin He-Pin commented Jan 19, 2024

Motivation:
As collect and collectWhile are hot, I think it's worth to do this kind of micro optimization, and these code unlikely to be changed too.
Same trick can be find https://github.com/scala/scala/blob/2.13.x/src/library/scala/collection/IterableOnce.scala#L1178
Result:
IF_ACMPEQ will be a little faster, and reduce an INVOKEVIRTUAL java/lang/Object.equals

Jmh/run -i 10 -wi 10 -f1 -t1 .CollectBenchmark.

first run:

[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                          Mode  Cnt         Score        Error  Units
[info] CollectBenchmark.benchNewCollect  thrpt   10  14550829.734 ± 110526.538  ops/s
[info] CollectBenchmark.benchOldCollect  thrpt   10  14268840.296 ± 332840.984  ops/s
[success] Total time: 783 s (13:03), completed 2024年1月20日 上午2:04:44
sbt:pekko-bench-jmh> Jmh/run -i 10 -wi 10 -f1 -t1 .*CollectBenchmark.*

info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                          Mode  Cnt         Score        Error  Units
[info] CollectBenchmark.benchNewCollect  thrpt   10  15099034.601 ± 331220.483  ops/s
[info] CollectBenchmark.benchOldCollect  thrpt   10  14517524.914 ± 281804.895  ops/s
[success] Total time: 469 s (07:49), completed 2024年1月20日 上午3:25:36

[info] Iteration  10: 2024-01-20 03:54:31,099 INFO  org.apache.pekko.actor.CoordinatedShutdown  - Running CoordinatedShutdown with reason [ActorSystemTerminateReason] MDC: {pekkoUid=5508704921438154807, sourceThread=org.apache.pekko.stream.CollectBenchmark.benchOldCollect-jmh-worker-1, sourceActorSystem=CollectBenchmark, pekkoAddress=pekko://CollectBenchmark, pekkoSource=CoordinatedShutdown(pekko://CollectBenchmark), pekkoTimestamp=19:54:31.099UTC}
[info] 14499689.518 ops/s
[info] Result "org.apache.pekko.stream.CollectBenchmark.benchOldCollect":
[info]   14603414.744 ±(99.9%) 115289.942 ops/s [Average]
[info]   (min, avg, max) = (14499689.518, 14603414.744, 14709273.129), stdev = 76257.158
[info]   CI (99.9%): [14488124.802, 14718704.686] (assumes normal distribution)
[info] # Run complete. Total time: 00:07:00
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                          Mode  Cnt         Score        Error  Units
[info] CollectBenchmark.benchNewCollect  thrpt   10  15046779.903 ± 565166.752  ops/s
[info] CollectBenchmark.benchOldCollect  thrpt   10  14603414.744 ± 115289.942  ops/s
[success] Total time: 465 s (07:45), completed 2024年1月20日 上午3:54:32

image

@He-Pin He-Pin added this to the 1.1.0 milestone Jan 19, 2024
@He-Pin He-Pin added the t:stream Pekko Streams label Jan 19, 2024
Copy link
Member

@jxnu-liguobin jxnu-liguobin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@pjfanning pjfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matches are the norm for this in Scala. Much more readable than instance of casting. I would ideally like to see benchmarks to agree to this sort of change.

@He-Pin
Copy link
Member Author

He-Pin commented Jan 19, 2024

I will post a benchmark then, I have diffed the bytecode, the mapError and recover ,recoverWith is not touched, as not on the hot path.

And on the otherside, the applyOrElse trick is already not that readable:)

Copy link
Member

@Roiocam Roiocam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue about this background: alexandru/scala-best-practices#24

I haven't benchmarked them yet, but I performed disassembly class: The compiler does optimized pattern matching to if-else, but in this PR, two ifeq could be optimized to a single if_acmpne

@He-Pin
Copy link
Member Author

He-Pin commented Jan 19, 2024

I trust the bytecode, I diffed the bytecode in detail, and the GFW slow me done:) and just find another issue osgi plugin bug issue too #986 .

@He-Pin
Copy link
Member Author

He-Pin commented Jan 19, 2024

@szeiger Would you like to confirm that this trick still apply? thanks.

@mdedetrich
Copy link
Contributor

Ill do some benchmarks as well, will post them here

Copy link
Contributor

@mdedetrich mdedetrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have done my own benchmarks which you can see here https://www.diffchecker.com/EGNbOWui/, left is the unoptimized version and right is optimized.

There probably is a difference albeit its small however I wouldn't put too much importance to my benchmarks due to it being run on the M1 laptop however I did ran each benchmark many times and the optimized version tends to win.

In general I am not against the merit of the PR since the whole tenet of Akka/Pekko is as fast performance as you can get even if its at the cost of code readability.

In other words one of the primary reasons why people use Akka/Pekko is because its fast and not that its written in perfect idiomatic Scala.

@pjfanning Would you reconsider the PR? Akka/Pekko has an entire history of micro-optimizations like this so blocking for this reason is going against the grain. If we want to discuss this point more generally we should do so on the mailing list.

@He-Pin
Copy link
Member Author

He-Pin commented Jan 20, 2024

Yes, as both zio and fs2 are doing this kind off optimization too.

But I think we may add some comments about it why.

@Roiocam
Copy link
Member

Roiocam commented Jan 20, 2024

I have also did my benchmark too. https://gist.github.com/Roiocam/27e258a43aba39b308732cad49e45ac8

Optimized version both win on happy and sad path.

[info] Benchmark                                                Mode  Cnt             Score              Error  Units
[info] CollectFirstBenchmark.modifiedCollectFirst_OnHappyPath  thrpt    3  130858794743.725 ± 159214224029.529  ops/s
[info] CollectFirstBenchmark.modifiedCollectFirst_OnSadPath    thrpt    3     219557169.282 ±    395479472.799  ops/s
[info] CollectFirstBenchmark.originalCollectFirst_OnHappyPath  thrpt    3  139881378115.900 ±  35329769673.033  ops/s
[info] CollectFirstBenchmark.originalCollectFirst_OnSadPath    thrpt    3      32618319.864 ±     17940245.081  ops/s

@He-Pin
Copy link
Member Author

He-Pin commented Jan 20, 2024

object Cast {

  val NotApplied: Any => Any = _ => NotApplied
  def cast[A](a: Any): A = a.asInstanceOf[A]

  def castWithMatch[A](a: Any): A = a match {
    case NotApplied  => ???
    case a: A @unchecked => a
  }

  def castWithRef[A](a: Any): A = {
    if (a.asInstanceOf[AnyRef] eq NotApplied) {
      ???
    } else {
      a.asInstanceOf[A]
    }
  }
}

and the bytecode:

// class version 52.0 (52)
// access flags 0x31
public final class org/apache/pekko/stream/impl/Cast$ {

  // compiled from: Cast.scala

  ATTRIBUTE Scala : unknown

  ATTRIBUTE ScalaInlineInfo : unknown
  // access flags 0x19
  public final static INNERCLASS java/lang/invoke/MethodHandles$Lookup java/lang/invoke/MethodHandles Lookup

  // access flags 0x19
  public final static Lorg/apache/pekko/stream/impl/Cast$; MODULE$

  // access flags 0x1A
  // signature Lscala/Function1<Ljava/lang/Object;Ljava/lang/Object;>;
  // declaration: NotApplied extends scala.Function1<java.lang.Object, java.lang.Object>
  private final static Lscala/Function1; NotApplied

  // access flags 0x9
  public static <clinit>()V
   L0
    LINENUMBER 25 L0
    NEW org/apache/pekko/stream/impl/Cast$
    DUP
    INVOKESPECIAL org/apache/pekko/stream/impl/Cast$.<init> ()V
    PUTSTATIC org/apache/pekko/stream/impl/Cast$.MODULE$ : Lorg/apache/pekko/stream/impl/Cast$;
   L1
    LINENUMBER 27 L1
    INVOKEDYNAMIC apply()Lscala/Function1; [
      // handle kind 0x6 : INVOKESTATIC
      java/lang/invoke/LambdaMetafactory.altMetafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
      // arguments:
      (Ljava/lang/Object;)Ljava/lang/Object;, 
      // handle kind 0x6 : INVOKESTATIC
      org/apache/pekko/stream/impl/Cast$.$anonfun$NotApplied$1(Ljava/lang/Object;)Lscala/Function1;, 
      (Ljava/lang/Object;)Lscala/Function1;, 
      5, 
      1, 
      (Ljava/lang/Object;)Lscala/Function1;
    ]
    PUTSTATIC org/apache/pekko/stream/impl/Cast$.NotApplied : Lscala/Function1;
   L2
    LINENUMBER 25 L2
    RETURN
    MAXSTACK = 2
    MAXLOCALS = 0

  // access flags 0x1
  // signature ()Lscala/Function1<Ljava/lang/Object;Ljava/lang/Object;>;
  // declaration: scala.Function1<java.lang.Object, java.lang.Object> NotApplied()
  public NotApplied()Lscala/Function1;
   L0
    LINENUMBER 27 L0
    GETSTATIC org/apache/pekko/stream/impl/Cast$.NotApplied : Lscala/Function1;
    ARETURN
   L1
    LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L1 0
    MAXSTACK = 1
    MAXLOCALS = 1

  // access flags 0x1
  // signature <A:Ljava/lang/Object;>(Ljava/lang/Object;)TA;
  // declaration: A cast<A>(java.lang.Object)
  public cast(Ljava/lang/Object;)Ljava/lang/Object;
    // parameter final  a
   L0
    LINENUMBER 28 L0
    ALOAD 1
    ARETURN
   L1
    LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L1 0
    LOCALVARIABLE a Ljava/lang/Object; L0 L1 1
    MAXSTACK = 1
    MAXLOCALS = 2

  // access flags 0x1
  // signature <A:Ljava/lang/Object;>(Ljava/lang/Object;)TA;
  // declaration: A castWithMatch<A>(java.lang.Object)
  public castWithMatch(Ljava/lang/Object;)Ljava/lang/Object;
    // parameter final  a
   L0
    LINENUMBER 31 L0
    ALOAD 0
    INVOKEVIRTUAL org/apache/pekko/stream/impl/Cast$.NotApplied ()Lscala/Function1;
    DUP
    IFNONNULL L1
    POP
    ALOAD 1
    IFNULL L2
    GOTO L3
   L1
   FRAME SAME1 scala/Function1
    ALOAD 1
    INVOKEVIRTUAL java/lang/Object.equals (Ljava/lang/Object;)Z
    IFEQ L3
   L2
   FRAME SAME
    GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
    INVOKEVIRTUAL scala/Predef$.$qmark$qmark$qmark ()Lscala/runtime/Nothing$;
    ATHROW
   L3
    LINENUMBER 32 L3
   FRAME SAME
    ALOAD 1
    INSTANCEOF java/lang/Object
    IFEQ L4
    ALOAD 1
    ARETURN
   L4
    LINENUMBER 30 L4
   FRAME SAME
    NEW scala/MatchError
    DUP
    ALOAD 1
    INVOKESPECIAL scala/MatchError.<init> (Ljava/lang/Object;)V
    ATHROW
   L5
    LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L5 0
    LOCALVARIABLE a Ljava/lang/Object; L0 L5 1
    MAXSTACK = 3
    MAXLOCALS = 2

  // access flags 0x1
  // signature <A:Ljava/lang/Object;>(Ljava/lang/Object;)TA;
  // declaration: A castWithRef<A>(java.lang.Object)
  public castWithRef(Ljava/lang/Object;)Ljava/lang/Object;
    // parameter final  a
   L0
    LINENUMBER 36 L0
    ALOAD 0
    INVOKEVIRTUAL org/apache/pekko/stream/impl/Cast$.NotApplied ()Lscala/Function1;
    ASTORE 2
   L1
    LINENUMBER 37 L1
    ALOAD 1
    ALOAD 2
    IF_ACMPNE L2
   L3
    LINENUMBER 38 L3
    GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
    INVOKEVIRTUAL scala/Predef$.$qmark$qmark$qmark ()Lscala/runtime/Nothing$;
    ATHROW
   L2
    LINENUMBER 40 L2
   FRAME APPEND [scala/Function1]
    ALOAD 1
    ARETURN
   L4
    LOCALVARIABLE sentinel Lscala/Function1; L1 L4 2
    LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L4 0
    LOCALVARIABLE a Ljava/lang/Object; L0 L4 1
    MAXSTACK = 2
    MAXLOCALS = 3

  // access flags 0x1019
  public final static synthetic $anonfun$NotApplied$1(Ljava/lang/Object;)Lscala/Function1;
    // parameter final  x$1
   L0
    LINENUMBER 27 L0
    GETSTATIC org/apache/pekko/stream/impl/Cast$.MODULE$ : Lorg/apache/pekko/stream/impl/Cast$;
    INVOKEVIRTUAL org/apache/pekko/stream/impl/Cast$.NotApplied ()Lscala/Function1;
    ARETURN
   L1
    LOCALVARIABLE x$1 Ljava/lang/Object; L0 L1 0
    MAXSTACK = 1
    MAXLOCALS = 1

  // access flags 0x2
  private <init>()V
   L0
    LINENUMBER 25 L0
    ALOAD 0
    INVOKESPECIAL java/lang/Object.<init> ()V
    RETURN
   L1
    LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L1 0
    MAXSTACK = 1
    MAXLOCALS = 1

  // access flags 0x100A
  private static synthetic $deserializeLambda$(Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object;
    ALOAD 0
    INVOKEDYNAMIC lambdaDeserialize(Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object; [
      // handle kind 0x6 : INVOKESTATIC
      scala/runtime/LambdaDeserialize.bootstrap(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/invoke/MethodHandle;)Ljava/lang/invoke/CallSite;
      // arguments:
      // handle kind 0x6 : INVOKESTATIC
      org/apache/pekko/stream/impl/Cast$.$anonfun$NotApplied$1(Ljava/lang/Object;)Lscala/Function1;
    ]
    ARETURN
    MAXSTACK = 1
    MAXLOCALS = 1
}

@mdedetrich
Copy link
Contributor

But I think we may add some comments about it why.

Do you want me to do this, I have permissions to alter PR branches by adding commits on them, if so then just let me know what the text is.

@He-Pin
Copy link
Member Author

He-Pin commented Jan 20, 2024

Seems like the networking is recovered, let me try. @mdedetrich

// eg: just a simple `IF_ACMPNE`, and you can find the same trick in `CollectWhile` operator.
// If you interest, you can check the associated PR for this change and the
// current implementation of `scala.collection.IterableOnceOps.collectFirst`.
if (result.asInstanceOf[AnyRef] eq Collect.NotApplied) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdedetrich I have updated the text, how do you think about it? you can change it directly, I may not be able to push next time:)

Copy link
Contributor

@mdedetrich mdedetrich Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me its good, just need to wait for @pjfanning t ore-review this. The code is not formatted so you need to repush though.

@He-Pin He-Pin requested a review from pjfanning January 20, 2024 09:57
Copy link
Contributor

@pjfanning pjfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@He-Pin He-Pin merged commit cd70ae8 into apache:main Jan 20, 2024
18 checks passed
@He-Pin He-Pin deleted the microCollectOpt branch January 20, 2024 11:31
@som-snytt
Copy link

som-snytt commented Jan 22, 2024

I mentioned on the forum that pattern matching on the singleton type has the desired effect.

       pf.applyOrElse(grab(in), NotApplied) match {
          case _: NotApplied.type             => completeStage()

Worth adding that sometimes fetching the value is not free and it's faster to do a type test first:

  thing match {
    case h :: t => 1
    case _ => 0
  }

Apparently, default case also reduces MatchError cruft.

@mdedetrich
Copy link
Contributor

@som-snytt On what Scala versions is this? Pekko cross compiles to Scala 2.12, 2.13 and 3.3.1 so unless this optimization is in all of those versions of Scala we still have to use the manual version

@som-snytt
Copy link

som-snytt commented Jan 23, 2024

@mdedetrich All versions. Somewhere I just noted that it's in the spec. OK I looked it up: https://scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html#type-patterns see singleton.

That is, it's not an optimization, so put that under optimization.

@mdedetrich
Copy link
Contributor

@som-snytt Thanks!

@He-Pin Do you want to look into using this pattern + checking bytecode to see if its the same for all scala versions? Feel free to make an issue on it if you don't have time to do it now (its not urgent)

@He-Pin
Copy link
Member Author

He-Pin commented Jan 23, 2024

@He-Pin I'm testing it while I compile my $Work code:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t:stream Pekko Streams
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants