Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
434 lines (344 sloc) 22.3 KB

In this module we study the effect of Spring and Spring Boot on startup time by first stripping out as much of it as we can, and then piling on more features, adding more dependencies and letting Spring Boot figure out the desired configuration. In the table below, the "demo" sample is the canonical, empty Spring Boot web app. All the smaller and faster apps are stripped down versions of that where we remove various things and finally get rid of even using reflection in the BeanFactory (using the new "functional bean registration").

Results:

  • With a couple of outliers we discuss below, the startup time is directly proportional to the number of classes loaded. This usually correlates with the number of beans in the application context, but things like Hibernate and Zuul add more classes than beans.

  • Reflection is not a bottleneck - more work would have to be done with more complex scenarios, but the sample that uses functional bean registration for everything fits the curve of the more heavyweight applications, so it isn’t intrinsically much faster, it just has fewer beans (and therefore fewer features).

  • Condition processing in Spring Boot is not expensive - when we remove conditions partially ("slim", "thin") and completely ("lite", "func") the startup time is right on the curve. The same is true of component scanning, except possibly in the very largest of applications.

  • There is some evidence that most of the cost is associated with JVM overheads, loading and parsing classes (this is supported by other data, for instance on devtools restarts, which are much quicker than cold starts).

Since more beans mean more features, you are paying at startup for actual functionality, so in some ways it should be an acceptable cost. On the other hand, there might be features that you end up never using, or don’t use until later and you would be willing to defer the cost. Spring doesn’t allow you to do that easily. It has some features that might defer the cost of creating beans, but that might not even help if the bean definitions still have to be created and most of the cost is actually to do with loading and parsing classes.

Benchmark   (sample) Mode  Cnt  Score   Error  Units Beans Classes
MainBenchmark  zuul  avgt   10  4.510 ± 0.095   s/op 516   9656
MainBenchmark  erkb  avgt   10  3.137 ± 0.049   s/op 494   7654
MainBenchmark  busr  avgt   10  2.525 ± 0.038   s/op 392   7129
MainBenchmark  erko  avgt   10  2.237 ± 0.071   s/op 313   6392
MainBenchmark  jpaa  avgt   10  3.232 ± 0.057   s/op 257   8297
MainBenchmark  jpag  avgt   10  2.897 ± 0.023   s/op 178   8294
MainBenchmark  jpaf  avgt   10  2.865 ± 0.043   s/op 177   8291
MainBenchmark  jpae  avgt   10  2.829 ± 0.038   s/op 176   8284
MainBenchmark  jpad  avgt   10  2.739 ± 0.073   s/op 175   7946
MainBenchmark  conf  avgt   10  1.636 ± 0.029   s/op 250   6232
MainBenchmark  actj  avgt   10  1.540 ± 0.049   s/op 226   6033
MainBenchmark  actr  avgt   10  1.316 ± 0.060   s/op 186   5666
MainBenchmark  jdbc  avgt   10  1.237 ± 0.050   s/op 147   5625
MainBenchmark  demo  avgt   10  1.056 ± 0.040   s/op 111   5266
MainBenchmark  slim  avgt   10  1.003 ± 0.011   s/op 105   5208
MainBenchmark  thin  avgt   10  0.855 ± 0.028   s/op 60    4892
MainBenchmark  lite  avgt   10  0.694 ± 0.015   s/op 30    4580
MainBenchmark  func  avgt   10  0.652 ± 0.017   s/op 25    4378
pubchart?oid=88442446&format=image
Figure 1. Number of Classes vs. Startup Time


pubchart?oid=2090464856&format=image
Figure 2. Number of Beans vs. Startup Time


Legend:

  • Zuul: same as "busr" sample but with Zuul proxy

  • Jpad: same as "demo" sample but with 1 JPA entity (and 0 repositories)

  • Jpae: same as "demo" sample but with 1 JPA entity (and 1 repository)

  • Jpaf: same as "demo" sample but with 2 JPA entities (and 2 repositories)

  • Jpag: same as "demo" sample but with 3 JPA entities (and 3 repositories)

  • Jpaa: same as "actr" sample but with 3 JPA entities (and 3 repositories)

  • Erkb: same as "busr" sample but with Eureka client

  • Busr: same as "conf" but adds Spring Cloud Bus and Rabbit

  • Erko: same as "actr" sample but with Eureka client (disabled)

  • Conf: same as "actr" sample plus config client

  • Actj: same as "actr" sample plus JDBC

  • Actr: same as "demo" sample plus Actuator

  • Jdbc: same as "demo" sample plus JDBC

  • Demo: vanilla Spring Boot MVC app with one endpoint (no Actuator)

  • Slim: same thing but explicitly @Imports all configuration

  • Thin: reduce the @Imports down to a set of 4 that are needed for the endpoint

  • Lite: copy the imports from "thin" and make them into hard-coded, unconditional configuration

  • Func: extract the configuration methods from "lite" and register bits of it using the function bean API

N.B. The "thin" sample has @EnableWebMvc (implicitly), but "lite" and "func" pulled the relevant features of that out into a separate class (so a few beans were dropped).

Outliers

Only 2 samples didn’t fit the trend for classes vs. startup. One starts up slower (Sleuth) and one faster (Erka). The JPA and Zuul samples are outliers for the beans vs. startup correlation, so we include those here again.

Benchmark   (sample) Mode  Cnt  Score   Error  Units Beans Trend Delta Classes
MainBenchmark  slth  avgt   10  5.110 ± 0.065   s/op 453    2762  2403    7674
MainBenchmark  erka  avgt   10  2.183 ± 0.076   s/op 287    2760  -577    7893
MainBenchmark  jpad  avgt   10  2.739 ± 0.073   s/op 175    1385  1354    7946
MainBenchmark  jpae  avgt   10  2.829 ± 0.038   s/op 176    1390  1439    8284
MainBenchmark  jpaf  avgt   10  2.865 ± 0.043   s/op 177    1396  1469    8291
MainBenchmark  jpag  avgt   10  2.897 ± 0.023   s/op 178    1401  1496    8294
MainBenchmark  jpaa  avgt   10  3.232 ± 0.057   s/op 257    1814  1418    8297
MainBenchmark  zuul  avgt   10  4.510 ± 0.095   s/op 516    3080  1430    9596


Legend:

  • Slth: same as "busr" sample but with Sleuth

  • Erka: same as "actr" sample but with Eureka client

The "Trend" number is the best fit prediction of the startup time from the number of classes (or beans for the JPA samples), taken from the non-outlier data. "Delta" is the difference between the actual startup time and the trend value (so it is the extra cost of the features being added).

Eureka

The "erka" sample started up faster then predicted, but it also has a suspiciously large number of loaded classes (even more classes than with Eureka and Bus). The loaded classes measurements are not stable - you get different answers from run to run - but they don’t usually fluctuate by enough to explain the difference here.

Sleuth

Here’s an explanation for the "slth" result. Spring processes @annotation matchers in @Pointcuts extremely inefficiently, so the startup time scales with the number of pointcuts with @annotations, not so much the number of beans. If the pointcuts are driving it (as suggested by results in these aspectj benchmarks), then the 4 pointcuts with @annotation matchers would be costing 2403ms or around 600ms each, which is horrendous but consistent with the aspectj benchmarks.

With AspectJ 1.8.13

Benchmark           (sample)  Mode  Cnt  Score   Error  Units  Beans  Classes
MainBenchmark.main      slth    ss   10  4.002 ± 0.113   s/op  450    8358

(Makes a huge difference, but still slower than the trend.)

JPA

Hibernate fixed startup cost is about 1300ms (the "delta" on "jpad"), which more or less doubles the startup time for a JPA app compared to the vanilla "demo". Spring Data JPA repository creation seems to have a fixed cost of about 90ms, which isn’t nothing but isn’t very large in comparison. Adding repositories and entities might cost something, but it isn’t a lot - the best estimate would be about 30ms per entity from these data (these were very basic, vanilla JpaRepositories, so maybe it would be more for more complex requirements). The JPA samples (and even Zuul) are a pretty good fit for number of classes loaded versus startup time, so Hibernate isn’t doing a lot of intensive stuff beyond forcing a lot of classes to be loaded.

Jackson

We can’t easily exclude Jackson from all the sample, but anything that doesn’t use the Actuator can be run with and without to see the difference. Here’s the vanilla "demo" sample

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.snap      demo    ss   10  1.150 ± 0.076   s/op

and with exclusions.spring-boot-jackson=org.springframework.boot:spring-boot-starter-json in thin.properties:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.snap      demo    ss   10  1.069 ± 0.036   s/op

So that’s probably worth having.

Hibernate Validator

Further excluding Hibernate Validator with exclusions.hibernate-validator=org.hibernate.validator:hibernate-validator:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.snap      demo    ss   10  1.014 ± 0.027   s/op

Running the Benchmarks

TL;DR: You need Java 8.

$ ../mvnw clean install
$ java -jar target/benchmarks.jar

and the whole suite takes quite a long time to run, so to just try it out quickly, it’s best to cherry pick some specific samples, e.g.

$ java -jar target/benchmarks.jar main -p sample=actr

There are 4 groups of benchmarks:

  1. MainBenchmark - add features to the "main" demo by manipulating the classpath

  2. StripBenchmark - "slim", "thin", "lite", "func" - stripping away from the "main" demo by hardcoding config

  3. OldBenchmark - same as MainBenchmarks but with Spring Boot 1.5.6.

  4. SnapBenchmark - same as MainBenchmarks but with Spring Boot 2.0.0 snapshots (and a restricted set of samples, "empt", "demo", "actr", "jdbc").

The JMH benchmarks are mostly just named after the class (so StripBenchmarks are all called "strip") but they have a @Param called "sample" whose value is the name of the sample. They can be run individually or as a group using a comma-separated list of sample names, e.g:

$ java -jar target/benchmarks.jar strip -p sample=func,slim

or altogether as

$ java -jar target/benchmarks.jar strip

Also to get decent results from the erk* samples you need Eureka running locally on port 8761. You can do that with the Spring Boot CLI (for example):

$ spring install org.springframework.cloud:spring-cloud-cli:1.3.4.RELEASE
$ spring cloud eureka

Eclipse J9

J9 is the IBM JVM, which they open sourced and is now available also as Eclipse J9. The benchmarks are tuned to use different command line optimizations depending on the JVM in use. Here’s a comparison between the regular OpenJDK Hotspot and the OpenJDK Eclipse J9 (still JDK 1.8) build:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units JVM
MainBenchmark.main      demo    ss   10  1.171 ± 0.044   s/op 8u152-zulu
MainBenchmark.main      demo    ss   10  1.015 ± 0.116   s/op 8u152-openj9

Eclipse J9 is about 10% faster than HotSpot, probably owing to the ability to cache class data between runs (which is switched on by default in the benchmarks but not in general).

Java 10

Java 10 is quite a bit slower than Java 8, but you can get back most or all of the difference by switching on Class Data Sharing (CDS):

Benchmark           (sample)  Mode  Cnt  Score   Error  Units  JVM
OldBenchmark.old        demo    ss   10  1.070 ± 0.031   s/op  8u152-zulu
OldBenchmark.old        actr    ss   10  1.370 ± 0.042   s/op  8u152-zulu
MainBenchmark.main      demo    ss   10  1.128 ± 0.044   s/op  8u152-zulu
MainBenchmark.main      actr    ss   10  1.554 ± 0.068   s/op  8u152-zulu
OldBenchmark.old        demo    ss   10  1.155 ± 0.035   s/op  jdk-10
OldBenchmark.old        actr    ss   10  1.432 ± 0.043   s/op  jdk-10
MainBenchmark.main      demo    ss   10  1.195 ± 0.061   s/op  jdk-10
MainBenchmark.main      actr    ss   10  1.605 ± 0.060   s/op  jdk-10
CdsBenchmark.main       demo    ss   10  0.912 ± 0.051   s/op  jdk-10
CdsBenchmark.main       actr    ss   10  1.286 ± 0.044   s/op  jdk-10
CdsBenchmark.old        demo    ss   10  0.875 ± 0.050   s/op  jdk-10
CdsBenchmark.old        actr    ss   10  1.134 ± 0.032   s/op  jdk-10

Spring Boot 1.5 still wins all the races though (the "old" benchmarks above).

Other versions of Java

Benchmark           (sample)  Mode  Cnt  Score   Error  Units JVM
MainBenchmark.main      demo    ss   10  1.171 ± 0.044   s/op 8u152-zulu
MainBenchmark.main      demo    ss   10  1.015 ± 0.116   s/op 8u152-openj9
MainBenchmark.main      demo    ss   10  1.253 ± 0.076   s/op OpenJDK10
MainBenchmark.main      demo    ss   10  1.280 ± 0.066   s/op 9.0.4-zulu

Lazy Beans

There’s a bean factory post processor in Spring Boot issue 9685 that makes all beans lazy by default. It’s quite interesting to see what happens if we add that to our sample applications:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units Classes
MainBenchmark.main      empt    ss   10  0.477 ± 0.018   s/op 3233
MainBenchmark.main      jlog    ss   10  0.811 ± 0.016   s/op 4374
MainBenchmark.main      demo    ss   10  0.913 ± 0.035   s/op 5463
MainBenchmark.main      flux    ss   10  0.885 ± 0.030   s/op 5325
MainBenchmark.main      actr    ss   10  1.241 ± 0.030   s/op 6225
MainBenchmark.main      jdbc    ss   10  1.001 ± 0.033   s/op 5618
MainBenchmark.main      actj    ss   10  1.388 ± 0.062   s/op 6432
MainBenchmark.main      jpae    ss   10  1.994 ± 0.055   s/op 8824
MainBenchmark.main      conf    ss   10  1.599 ± 0.118   s/op 6711
MainBenchmark.main      erka    ss   10  1.819 ± 0.045   s/op 6804
MainBenchmark.main      busr    ss   10  2.431 ± 0.068   s/op 7721
MainBenchmark.main      zuul    ss   10  3.029 ± 0.086   s/op 8348
MainBenchmark.main      erkb    ss   10  2.886 ± 0.107   s/op 8083
MainBenchmark.main      slth    ss   10  3.127 ± 0.041   s/op 8314

c.f. the non-lazy results for the same samples:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units Classes Lazy Premium
MainBenchmark.main      empt    ss   10  0.578 ± 0.028   s/op 3703    17.47%
MainBenchmark.main      jlog    ss   10  0.953 ± 0.012   s/op 4647    14.90%
MainBenchmark.main      demo    ss   10  1.105 ± 0.049   s/op 5808    17.38%
MainBenchmark.main      flux    ss   10  0.988 ± 0.038   s/op 5726    10.43%
MainBenchmark.main      actr    ss   10  1.542 ± 0.043   s/op 6692    19.52%
MainBenchmark.main      jdbc    ss   10  1.281 ± 0.045   s/op 6068    21.86%
MainBenchmark.main      actj    ss   10  1.819 ± 0.191   s/op 6953    23.69%
MainBenchmark.main      jpae    ss   10  2.003 ± 0.053   s/op 8824    0.45%
MainBenchmark.main      conf    ss   10  1.948 ± 0.097   s/op 7216    17.92%
MainBenchmark.main      erka    ss   10  2.703 ± 0.106   s/op 8909    32.70%
MainBenchmark.main      busr    ss   10  3.111 ± 0.157   s/op 8282    21.86%
MainBenchmark.main      zuul    ss   10  3.834 ± 0.086   s/op 9325    21.00%
MainBenchmark.main      erkb    ss   10  4.026 ± 0.119   s/op 10176   28.32%
MainBenchmark.main      slth    ss   10  4.066 ± 0.073   s/op 8901    23.09%

and the same thing for the Petclinic:

Benchmark                              (sample)  Mode  Cnt  Score   Error  Units Classes Lazy Premium
PetclinicLatestBenchmark.noverify        (lazy)  avgt   10  3.495 ± 0.059   s/op 9687    25.80%
PetclinicLatestBenchmark.explodedJarMain (lazy)  avgt   10  3.023 ± 0.092   s/op 10644   27.40%
PetclinicLatestBenchmark.noverify        none    avgt   10  4.710 ± 0.053   s/op 11099
PetclinicLatestBenchmark.explodedJarMain none    avgt   10  4.164 ± 0.068   s/op 12132

New Data

With 2.1.0 snapshots after M4:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.endp       N/A    ss   10  1.447 ± 0.038   s/op
SnapBenchmark.snap      empt    ss   10  0.572 ± 0.012   s/op
SnapBenchmark.snap      demo    ss   10  1.063 ± 0.023   s/op
SnapBenchmark.snap      actr    ss   10  1.447 ± 0.019   s/op
SnapBenchmark.snap      jdbc    ss   10  1.207 ± 0.023   s/op
SnapBenchmark.snap      actj    ss   10  1.616 ± 0.030   s/op
SnapBenchmark.snap      jpae    ss   10  2.062 ± 0.093   s/op
SnapBenchmark.snap      conf    ss   10  1.979 ± 0.183   s/op
MainBenchmark.main      demo    ss   10  1.147 ± 0.023   s/op
StripBenchmark.strip    slim    ss   10  1.083 ± 0.031   s/op
StripBenchmark.strip    thin    ss   10  0.919 ± 0.019   s/op

And before M1:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.endp       N/A    ss   10  1.478 ± 0.110   s/op
SnapBenchmark.snap      empt    ss   10  0.604 ± 0.032   s/op
SnapBenchmark.snap      demo    ss   10  1.067 ± 0.050   s/op
SnapBenchmark.snap      actr    ss   10  1.417 ± 0.036   s/op
SnapBenchmark.snap      jdbc    ss   10  1.245 ± 0.144   s/op
SnapBenchmark.snap      actj    ss   10  1.606 ± 0.074   s/op
SnapBenchmark.snap      jpae    ss   10  2.013 ± 0.062   s/op
SnapBenchmark.snap      conf    ss   10  1.792 ± 0.033   s/op

after M1:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.endp              ss   10  1.425 ± 0.036   s/op
SnapBenchmark.snap      empt    ss   10  0.570 ± 0.013   s/op
SnapBenchmark.snap      demo    ss   10  1.039 ± 0.014   s/op
SnapBenchmark.snap      actr    ss   10  1.417 ± 0.024   s/op
SnapBenchmark.snap      jdbc    ss   10  1.191 ± 0.028   s/op
SnapBenchmark.snap      actj    ss   10  1.601 ± 0.052   s/op
SnapBenchmark.snap      jpae    ss   10  2.017 ± 0.055   s/op
SnapBenchmark.snap      conf    ss   10  1.874 ± 0.071   s/op
Benchmark           (sample)  Mode  Cnt  Score   Error  Units
AutoBenchmark.auto      empt    ss   10  0.485 ± 0.024   s/op
AutoBenchmark.auto      demo    ss   10  0.864 ± 0.033   s/op
AutoBenchmark.auto      actr    ss   10  1.156 ± 0.060   s/op
AutoBenchmark.auto      jdbc    ss   10  0.953 ± 0.063   s/op
AutoBenchmark.auto      actj    ss   10  1.253 ± 0.036   s/op
AutoBenchmark.auto      conf    ss   10  1.552 ± 0.044   s/op

Spring Boot 2.0.0 snapshots (before RC2):

Benchmark             (sample)  Mode  Cnt  Score   Error  Units Beans Classes
StripBenchmark.strip      slim    ss   10  1.102 ± 0.041   s/op 107   5754
StripBenchmark.strip      thin    ss   10  0.941 ± 0.034   s/op 62    5444
StripBenchmark.strip      lite    ss   10  0.767 ± 0.021   s/op 30    5094
StripBenchmark.strip      func    ss   10  0.718 ± 0.010   s/op 26    5030

Even in the "lite" and "func" samples, where all the beans are hard coded (no scanning, no autoconfig, no condition evaluation), Boot 2.0 loads way more classes.

Old Data

(Boot 1.5.4 without -noverify)

sample configs beans startup(millis)

slth

176

460

5366

zuul

181

495

4336

busr

151

389

2758

erka

127

310

2423

conf

100

245

1779

actr

72

183

1430

demo

32

108

1154

slim

31

103

1112

thin

14

60

968

lite

4

30

813

func

1

25

742

(Boot 1.5.6, 2.0.0.M3 and 2.0.0.BUILD-SNAPSHOT)

Benchmark               (sample)  Mode  Cnt  Score   Error  Units  Beans  Classes
OldBenchmark.old            empt  avgt   10  0.738 ± 0.031   s/op  23     3031
OldBenchmark.old            demo  avgt   10  1.623 ± 0.069   s/op  109    4965
OldBenchmark.old            actr  avgt   10  2.098 ± 0.093   s/op  187    5384
OldBenchmark.old            jdbc  avgt   10  1.920 ± 0.083   s/op  140    5280
OldBenchmark.old            actj  avgt   10  2.417 ± 0.123   s/op  222    5715
OldBenchmark.old            jpae  avgt   10  2.536 ± 0.124   s/op  165    6841
OldBenchmark.old            conf  avgt   10  2.639 ± 0.146   s/op  251    5906
OldBenchmark.old            erka  avgt   10  2.960 ± 0.101   s/op  294    6077
OldBenchmark.old            busr  avgt   10  3.555 ± 0.125   s/op  370    6443
OldBenchmark.old            zuul  avgt   10  4.736 ± 0.507   s/op  433    6922
OldBenchmark.old            erkb  avgt   10  4.519 ± 0.365   s/op  434    6889
OldBenchmark.old            slth  avgt   10  7.331 ± 0.186   s/op  444    7058
MainBenchmark.main          empt  avgt   10  0.848 ± 0.059   s/op  22     3271
MainBenchmark.main          demo  avgt   10  1.773 ± 0.074   s/op  112    5360
MainBenchmark.main          actr  avgt   10  2.204 ± 0.121   s/op  187    5756
MainBenchmark.main          jdbc  avgt   10  2.081 ± 0.082   s/op  147    5625
MainBenchmark.main          actj  avgt   10  2.508 ± 0.091   s/op  226    6033
MainBenchmark.main          jpae  avgt   10  2.807 ± 0.100   s/op  176    8284
MainBenchmark.main          conf  avgt   10  2.781 ± 0.159   s/op  350    6232
MainBenchmark.main          erka  avgt   10  3.311 ± 0.407   s/op  294    6491
MainBenchmark.main          busr  avgt   10  3.777 ± 0.102   s/op  392    7129
MainBenchmark.main          zuul  avgt   10  4.758 ± 0.113   s/op  516    9656
MainBenchmark.main          erkb  avgt   10  4.773 ± 0.105   s/op  494    7654
MainBenchmark.main          slth  avgt   10  7.926 ± 0.197   s/op  453    7674
StripBenchmark.strip        func  avgt   10  1.112 ± 0.032   s/op  25     4378
StripBenchmark.strip        lite  avgt   10  1.205 ± 0.076   s/op  30     4580
StripBenchmark.strip        slim  avgt   10  1.743 ± 0.099   s/op  105    5208
StripBenchmark.strip        thin  avgt   10  1.501 ± 0.071   s/op  60     4892
SnapBenchmark.endp           N/A  avgt   10  2.515 ± 0.509   s/op  199    5838
SnapBenchmark.snap          empt  avgt   10  0.969 ± 0.123   s/op  22     3269
SnapBenchmark.snap          demo  avgt   10  1.880 ± 0.205   s/op  112    5356
SnapBenchmark.snap          actr  avgt   10  2.296 ± 0.101   s/op  198    5833
SnapBenchmark.snap          jdbc  avgt   10  2.136 ± 0.117   s/op  148    5716

Laptop (carbon)

Benchmark   (sample) Mode  Cnt  Score   Error  Units
MainBenchmark  demo  avgt   10  1.697 ± 0.081   s/op
MainBenchmark  slim  avgt   10  1.673 ± 0.098   s/op
MainBenchmark  thin  avgt   10  1.446 ± 0.061   s/op
MainBenchmark  lite  avgt   10  1.203 ± 0.072   s/op
MainBenchmark  func  avgt   10  1.150 ± 0.056   s/op