-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Micro optimization for collect*
operator.
#983
Conversation
a6243fc
to
703999e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matches are the norm for this in Scala. Much more readable than instance of casting. I would ideally like to see benchmarks to agree to this sort of change.
I will post a benchmark then, I have diffed the bytecode, the And on the otherside, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue about this background: alexandru/scala-best-practices#24
I haven't benchmarked them yet, but I performed disassembly class: The compiler does optimized pattern matching to if-else
, but in this PR, two ifeq
could be optimized to a single if_acmpne
I trust the bytecode, I diffed the bytecode in detail, and the GFW slow me done:) and just find another issue osgi plugin bug issue too #986 . |
@szeiger Would you like to confirm that this trick still apply? thanks. |
edeadd6
to
98c37ff
Compare
Ill do some benchmarks as well, will post them here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I have done my own benchmarks which you can see here https://www.diffchecker.com/EGNbOWui/, left is the unoptimized version and right is optimized.
There probably is a difference albeit its small however I wouldn't put too much importance to my benchmarks due to it being run on the M1 laptop however I did ran each benchmark many times and the optimized version tends to win.
In general I am not against the merit of the PR since the whole tenet of Akka/Pekko is as fast performance as you can get even if its at the cost of code readability.
In other words one of the primary reasons why people use Akka/Pekko is because its fast and not that its written in perfect idiomatic Scala.
@pjfanning Would you reconsider the PR? Akka/Pekko has an entire history of micro-optimizations like this so blocking for this reason is going against the grain. If we want to discuss this point more generally we should do so on the mailing list.
Yes, as both zio and fs2 are doing this kind off optimization too. But I think we may add some comments about it why. |
I have also did my benchmark too. https://gist.github.com/Roiocam/27e258a43aba39b308732cad49e45ac8 Optimized version both win on happy and sad path.
|
object Cast {
val NotApplied: Any => Any = _ => NotApplied
def cast[A](a: Any): A = a.asInstanceOf[A]
def castWithMatch[A](a: Any): A = a match {
case NotApplied => ???
case a: A @unchecked => a
}
def castWithRef[A](a: Any): A = {
if (a.asInstanceOf[AnyRef] eq NotApplied) {
???
} else {
a.asInstanceOf[A]
}
}
} and the bytecode: // class version 52.0 (52)
// access flags 0x31
public final class org/apache/pekko/stream/impl/Cast$ {
// compiled from: Cast.scala
ATTRIBUTE Scala : unknown
ATTRIBUTE ScalaInlineInfo : unknown
// access flags 0x19
public final static INNERCLASS java/lang/invoke/MethodHandles$Lookup java/lang/invoke/MethodHandles Lookup
// access flags 0x19
public final static Lorg/apache/pekko/stream/impl/Cast$; MODULE$
// access flags 0x1A
// signature Lscala/Function1<Ljava/lang/Object;Ljava/lang/Object;>;
// declaration: NotApplied extends scala.Function1<java.lang.Object, java.lang.Object>
private final static Lscala/Function1; NotApplied
// access flags 0x9
public static <clinit>()V
L0
LINENUMBER 25 L0
NEW org/apache/pekko/stream/impl/Cast$
DUP
INVOKESPECIAL org/apache/pekko/stream/impl/Cast$.<init> ()V
PUTSTATIC org/apache/pekko/stream/impl/Cast$.MODULE$ : Lorg/apache/pekko/stream/impl/Cast$;
L1
LINENUMBER 27 L1
INVOKEDYNAMIC apply()Lscala/Function1; [
// handle kind 0x6 : INVOKESTATIC
java/lang/invoke/LambdaMetafactory.altMetafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
// arguments:
(Ljava/lang/Object;)Ljava/lang/Object;,
// handle kind 0x6 : INVOKESTATIC
org/apache/pekko/stream/impl/Cast$.$anonfun$NotApplied$1(Ljava/lang/Object;)Lscala/Function1;,
(Ljava/lang/Object;)Lscala/Function1;,
5,
1,
(Ljava/lang/Object;)Lscala/Function1;
]
PUTSTATIC org/apache/pekko/stream/impl/Cast$.NotApplied : Lscala/Function1;
L2
LINENUMBER 25 L2
RETURN
MAXSTACK = 2
MAXLOCALS = 0
// access flags 0x1
// signature ()Lscala/Function1<Ljava/lang/Object;Ljava/lang/Object;>;
// declaration: scala.Function1<java.lang.Object, java.lang.Object> NotApplied()
public NotApplied()Lscala/Function1;
L0
LINENUMBER 27 L0
GETSTATIC org/apache/pekko/stream/impl/Cast$.NotApplied : Lscala/Function1;
ARETURN
L1
LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L1 0
MAXSTACK = 1
MAXLOCALS = 1
// access flags 0x1
// signature <A:Ljava/lang/Object;>(Ljava/lang/Object;)TA;
// declaration: A cast<A>(java.lang.Object)
public cast(Ljava/lang/Object;)Ljava/lang/Object;
// parameter final a
L0
LINENUMBER 28 L0
ALOAD 1
ARETURN
L1
LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L1 0
LOCALVARIABLE a Ljava/lang/Object; L0 L1 1
MAXSTACK = 1
MAXLOCALS = 2
// access flags 0x1
// signature <A:Ljava/lang/Object;>(Ljava/lang/Object;)TA;
// declaration: A castWithMatch<A>(java.lang.Object)
public castWithMatch(Ljava/lang/Object;)Ljava/lang/Object;
// parameter final a
L0
LINENUMBER 31 L0
ALOAD 0
INVOKEVIRTUAL org/apache/pekko/stream/impl/Cast$.NotApplied ()Lscala/Function1;
DUP
IFNONNULL L1
POP
ALOAD 1
IFNULL L2
GOTO L3
L1
FRAME SAME1 scala/Function1
ALOAD 1
INVOKEVIRTUAL java/lang/Object.equals (Ljava/lang/Object;)Z
IFEQ L3
L2
FRAME SAME
GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
INVOKEVIRTUAL scala/Predef$.$qmark$qmark$qmark ()Lscala/runtime/Nothing$;
ATHROW
L3
LINENUMBER 32 L3
FRAME SAME
ALOAD 1
INSTANCEOF java/lang/Object
IFEQ L4
ALOAD 1
ARETURN
L4
LINENUMBER 30 L4
FRAME SAME
NEW scala/MatchError
DUP
ALOAD 1
INVOKESPECIAL scala/MatchError.<init> (Ljava/lang/Object;)V
ATHROW
L5
LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L5 0
LOCALVARIABLE a Ljava/lang/Object; L0 L5 1
MAXSTACK = 3
MAXLOCALS = 2
// access flags 0x1
// signature <A:Ljava/lang/Object;>(Ljava/lang/Object;)TA;
// declaration: A castWithRef<A>(java.lang.Object)
public castWithRef(Ljava/lang/Object;)Ljava/lang/Object;
// parameter final a
L0
LINENUMBER 36 L0
ALOAD 0
INVOKEVIRTUAL org/apache/pekko/stream/impl/Cast$.NotApplied ()Lscala/Function1;
ASTORE 2
L1
LINENUMBER 37 L1
ALOAD 1
ALOAD 2
IF_ACMPNE L2
L3
LINENUMBER 38 L3
GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
INVOKEVIRTUAL scala/Predef$.$qmark$qmark$qmark ()Lscala/runtime/Nothing$;
ATHROW
L2
LINENUMBER 40 L2
FRAME APPEND [scala/Function1]
ALOAD 1
ARETURN
L4
LOCALVARIABLE sentinel Lscala/Function1; L1 L4 2
LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L4 0
LOCALVARIABLE a Ljava/lang/Object; L0 L4 1
MAXSTACK = 2
MAXLOCALS = 3
// access flags 0x1019
public final static synthetic $anonfun$NotApplied$1(Ljava/lang/Object;)Lscala/Function1;
// parameter final x$1
L0
LINENUMBER 27 L0
GETSTATIC org/apache/pekko/stream/impl/Cast$.MODULE$ : Lorg/apache/pekko/stream/impl/Cast$;
INVOKEVIRTUAL org/apache/pekko/stream/impl/Cast$.NotApplied ()Lscala/Function1;
ARETURN
L1
LOCALVARIABLE x$1 Ljava/lang/Object; L0 L1 0
MAXSTACK = 1
MAXLOCALS = 1
// access flags 0x2
private <init>()V
L0
LINENUMBER 25 L0
ALOAD 0
INVOKESPECIAL java/lang/Object.<init> ()V
RETURN
L1
LOCALVARIABLE this Lorg/apache/pekko/stream/impl/Cast$; L0 L1 0
MAXSTACK = 1
MAXLOCALS = 1
// access flags 0x100A
private static synthetic $deserializeLambda$(Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object;
ALOAD 0
INVOKEDYNAMIC lambdaDeserialize(Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object; [
// handle kind 0x6 : INVOKESTATIC
scala/runtime/LambdaDeserialize.bootstrap(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/invoke/MethodHandle;)Ljava/lang/invoke/CallSite;
// arguments:
// handle kind 0x6 : INVOKESTATIC
org/apache/pekko/stream/impl/Cast$.$anonfun$NotApplied$1(Ljava/lang/Object;)Lscala/Function1;
]
ARETURN
MAXSTACK = 1
MAXLOCALS = 1
}
|
Do you want me to do this, I have permissions to alter PR branches by adding commits on them, if so then just let me know what the text is. |
Seems like the networking is recovered, let me try. @mdedetrich |
98c37ff
to
80ec105
Compare
// eg: just a simple `IF_ACMPNE`, and you can find the same trick in `CollectWhile` operator. | ||
// If you interest, you can check the associated PR for this change and the | ||
// current implementation of `scala.collection.IterableOnceOps.collectFirst`. | ||
if (result.asInstanceOf[AnyRef] eq Collect.NotApplied) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdedetrich I have updated the text, how do you think about it? you can change it directly, I may not be able to push next time:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me its good, just need to wait for @pjfanning t ore-review this. The code is not formatted so you need to repush though.
80ec105
to
29774b8
Compare
stream/src/main/scala/org/apache/pekko/stream/impl/fusing/CollectWhile.scala
Show resolved
Hide resolved
bench-jmh/src/main/scala/org/apache/pekko/stream/CollectBenchmark.scala
Outdated
Show resolved
Hide resolved
29774b8
to
810a7d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I mentioned on the forum that pattern matching on the singleton type has the desired effect.
Worth adding that sometimes fetching the value is not free and it's faster to do a type test first:
Apparently, default case also reduces |
@som-snytt On what Scala versions is this? Pekko cross compiles to Scala 2.12, 2.13 and 3.3.1 so unless this optimization is in all of those versions of Scala we still have to use the manual version |
@mdedetrich All versions. Somewhere I just noted that it's in the spec. OK I looked it up: https://scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html#type-patterns see singleton. That is, it's not an optimization, so put that under |
@som-snytt Thanks! @He-Pin Do you want to look into using this pattern + checking bytecode to see if its the same for all scala versions? Feel free to make an issue on it if you don't have time to do it now (its not urgent) |
@He-Pin I'm testing it while I compile my $Work code:) |
Motivation:
As
collect
andcollectWhile
are hot, I think it's worth to do this kind of micro optimization, and these code unlikely to be changed too.Same trick can be find https://github.com/scala/scala/blob/2.13.x/src/library/scala/collection/IterableOnce.scala#L1178
Result:
IF_ACMPEQ
will be a little faster, and reduce anINVOKEVIRTUAL java/lang/Object.equals
first run: