Scala
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
project
src/main/scala
.gitignore
README.md
baseline.assebly.txt
build.sbt

README.md

#Imp vs Implicitly I've made this repo in order to benchmark https://github.com/non/imp that claims to implement a faster implicitly.

I'm trying to check it's particular claim that implicitly[ClassTag[Int]] is slower than imp[ClassTag[Int]].

##Benchmarking framework used I do it by using the best benchmarking tool available for JVM: http://openjdk.java.net/projects/code-tools/jmh/

This is the state-of-the art tool avaliable for JVM. In this [thread] (https://groups.google.com/forum/#!topic/mechanical-sympathy/m4opvy4xq3U) you can see many problems that make most of benchmarking frameworks unreliable or simply wrong for that kind of benchmark.

Result

I've checked. And what I can say is that on Oracle HotSpot version "1.8.0_72"(x86_64 ubuntu i7-4770) it's not. Actually, both of them get optimized by JVM to be a simple load of same constant.

[info] Benchmark                       Mode  Cnt  Score   Error  Units
[info] ImplVsImplicitly.baseline       avgt   30  2.507 ± 0.031  ns/op
[info] ImplVsImplicitly.baseline:·asm  avgt         NaN            ---
[info] ImplVsImplicitly.measure        avgt   30  2.491 ± 0.044  ns/op
[info] ImplVsImplicitly.measure:·asm   avgt         NaN            ---

Run it yourself

You can re-run those benchmarks by executing in sbt promt:

 jmh:run -i 10 -wi 10 -f3 -t 1

Runtime low-level details

I've had a look on assembly generated for both methods. The generated assembly was the same. Hotspot was able to optimize both of them to a single read of the very same constant object.

baseline.assembly.txt contains the assembly for the implicitly[ClassTag[Int]] where you can see that hotspot was able to get rid of all the overheads.

Most of those 2.5 ns are actually taken by benchmark framework itself. So, to be entirelly honest, the numbers that I showed earlier do not show anything. As those are simply benchmarking overheads of the jmh itself. What is a good indicator is the assembly printout iself: it shows that hotspot was able to remove static loads and virtual calls in this example.

In order to reproduce this output you would need to run on Linux with -prof perfasm. You would also need a jvm that has dissassembler plugin installed. Setting it up is quite involved, and you'll likely need to build it from source of your version of JDK. Google for hsdis for more details.

Bytecode difference

There is a small difference in the size of generated method: implicitly[ClassTag[Int]] becomes

     0: getstatic     #13                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
     3: getstatic     #18                 // Field scala/reflect/ClassTag$.MODULE$:Lscala/reflect/ClassTag$;
     6: invokevirtual #21                 // Method scala/reflect/ClassTag$.Int:()Lscala/reflect/ClassTag;
     9: invokevirtual #25                 // Method scala/Predef$.implicitly:(Ljava/lang/Object;)Ljava/lang/Object;
    12: checkcast     #27                 // class scala/reflect/ClassTag

while imp[ClassTag[Int]] becomes

     0: getstatic     #13                 // Field scala/reflect/ClassTag$.MODULE$:Lscala/reflect/ClassTag$;
     3: invokevirtual #16                 // Method scala/reflect/ClassTag$.Int:()Lscala/reflect/ClassTag;
     6: areturn

This may lead to methods containing many calls to implicitly being less likely to be inlined, but it would argue this is very uncommon.

Contributions

I'd be very glad to see a PR that would prove me wrong. I'll learn something this way :-).

@lloydmeta has contributed results from OSX (El Capitan), Oracle Java8 JDK 1.8.0_05:

[info] Benchmark                  Mode  Cnt  Score   Error  Units
[info] ImplVsImplicitly.baseline  avgt   30  3.268 ± 0.025  ns/op
[info] ImplVsImplicitly.explicit  avgt   30  3.292 ± 0.036  ns/op
[info] ImplVsImplicitly.imply     avgt   30  3.291 ± 0.039  ns/op

He's running a bit more of code by computing 1 + 1 in a generic way. I didin't get assembly printout for his modification, but I assume that HotSpot was able to constant-fold the whole expression in a similar way.