The Java buildpack in Cloud Foundry calculates the memory settings for a Java process. It has a hard job because it only has one input (the container memory limit) and it needs to come up with at least 5 numbers for the JVM. To do this it uses a standalone memory calculator program. We downloaded the memory calculator and used it to drive some tests on memory usage in a Spring Boot application.

Here are the command line options generated by the default settings with some typical container memory limits:

Container	-Xmx (Heap)	-XX:MaxMetaspaceSize (Metaspace)	-Xss (Stack)
128m	54613K	64M	568K
256m	160M	64M	853K
512m	382293K	64M	995K
1g	768M	104857K	1M

Example command line:

$ java-buildpack-memory-calculator-linux -memorySizes='metaspace:64m..' -memoryWeights=heap:75,metaspace:10,native:10,stack:5 -memoryInitials=heap:100%,metaspace:100% -totMemory=128m
-Xmx54613K -XX:MaxMetaspaceSize=64M -Xss568K -Xms54613K -XX:MetaspaceSize=64M

By default the initial metaspace and heap are identical to the maximum sizes (hence the -memoryInitials in the command line). The default arguments for the java-buildpack-memory-calculator-linux come from a config file in the buildpack.

The "native" value in the memory calculator is a margin for memory used by the process that isn't explicitly accounted for by heap, metaspace or stack. Some of it will go to class loader and JIT caches at runtime and some of it will be untraceable from within the JVM (it seems to be correlated with the amount of JAR data on the classpath).
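The heap and metaspace figures in the table above are consistent with a simple weighted split in which a region whose weighted share falls below its configured minimum is pinned at that minimum, and the remainder is shared among the other regions by weight. The Python sketch below is one reading of that behaviour inferred from the table, not the calculator's actual code; `split_memory` and its arguments are hypothetical names, and it makes no attempt to reproduce the -Xss figures.

```python
def split_memory(total_kb, weights, minimums_kb):
    """Split total_kb by weight, pinning regions that would fall below a minimum.

    Inferred from the published table, not taken from the calculator source.
    """
    remaining = dict(weights)
    pinned = {}
    budget = total_kb
    while remaining:
        weight_sum = sum(remaining.values())
        shares = {k: budget * w / weight_sum for k, w in remaining.items()}
        short = [k for k in remaining if shares[k] < minimums_kb.get(k, 0)]
        if not short:
            return {**pinned, **shares}
        for k in short:
            # Pin the region at its minimum and take it out of the weighted pool.
            pinned[k] = minimums_kb[k]
            budget -= minimums_kb[k]
            del remaining[k]
    return pinned


# Defaults as they appear in the example command line.
DEFAULT_WEIGHTS = {"heap": 75, "metaspace": 10, "native": 10, "stack": 5}
DEFAULT_MINIMUMS_KB = {"metaspace": 64 * 1024}

for limit_mb in (128, 256, 512, 1024):
    sizes = split_memory(limit_mb * 1024, DEFAULT_WEIGHTS, DEFAULT_MINIMUMS_KB)
    print(f"{limit_mb}m: -Xmx{int(sizes['heap'])}K "
          f"-XX:MaxMetaspaceSize={int(sizes['metaspace'])}K")
```

For the 128m case the metaspace minimum of 64M takes half the container, which is why the heap ends up at only 54613K.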

For the Impatient

For the impatient here is a quick summary of the analysis. Java applications can barely run in a container memory of 128m in Cloud Foundry, and loading JAR files is enough of a burden on memory to kill some apps that are on the boundary. By default Spring Boot apps are much happier with 512m or 1g of container memory even if they don't need it, but you can tweak the command line up to a point.

It would be nice if the default memory calculation changed, but until it does there are two things that might help: 1) shading the jar, which uses less memory at runtime, but only really matters for the small containers and makes uploads much less efficient; 2) configuring the buildpack using a custom command to allow the app a bigger "native" margin.

The calculation of the stack memory is idiosyncratic and probably should be improved.

Running the Freemarker Sample

The Freemarker sample in Spring Boot is a decent representative of a small, but active Spring Boot application. It does server side rendering with Freemarker, so it's not just serving static content, but it doesn't need a lot of memory. Locally, it is more than happy to run in a heap of 32MB, and you can squeeze it down to 24MB before you start to see garbage collection affecting performance.

We ran that app in Cloud Foundry (PWS) with a range of different parameters, trying to see where it runs and where it crashes. When it crashes it is almost always because the container manager killed it for using too much memory. Garbage collection can contribute to slow startup times if you squeeze the heap, but it is never so slow that the platform kills it because of a timeout. Here's a memory-oriented summary of the experiments (MB unless stated otherwise):

Heap	Metaspace	Stack	Native	Limit	Start(sec)	Key
23	64	11	72	170	9	A
55	64	18	-9	128	-	B
160	64	28	4	256	-	C
382	64	35	31	512	8	D
768	104	35	117	1024	4	E
53	64	11	128	256	3	M

The memory settings above are all generated by the java-buildpack-memory-calculator-linux tool. It provides settings for heap, metaspace and stack size, and in the table we convert the latter to a total stack by assuming there are 35 threads. The "native" value is the balance of the container limit that isn't explicitly claimed by the JVM (so when it is small or negative the app may fail to start, as in experiments "B" and "C").

Experiment "A" uses the smallest viable settings we found for the Freemarker sample launched with the default Spring Boot tooling, using the buildpack to calculate the memory but tweaking the inputs. In this experiment the "native" memory is explicitly constrained to force the buildpack to allow enough memory for off-heap usage in the JVM. Additionally the thread count is fixed to ensure that the stack size can be explicitly constrained (if you let it take its default values the stack sizes end up rather large). Experiment B has the default settings for a 128m container memory limit, and it is doomed to fail because it needs too much memory for its stack (the native memory margin is -9MB). Experiment C has the default settings for a 256m container limit. It also fails to start because it only has 4MB left of "native" memory once all its threads get going. Experiments D and E are the only ones that run successfully with the default buildpack memory settings (512m and 1g container limit respectively). Experiment D is pretty close to the bone with the native memory margin, and has over-allocated stack space.
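The "native" column is simply what is left of the container limit once heap, metaspace and total stack have been subtracted. A minimal Python sketch of that bookkeeping follows; the function names are hypothetical and the rows are copied from the summary table. The Stack column in the table is rounded, so recomputing it from the -Xss values and 35 threads gives slightly different figures for some rows.

```python
def stack_total_mb(threads, xss_kb):
    """Total stack in MB if every thread uses its full -Xss allowance."""
    return threads * xss_kb / 1024


def native_margin_mb(limit_mb, heap_mb, metaspace_mb, stack_mb):
    """Container memory left over after heap, metaspace and stack are claimed."""
    return limit_mb - heap_mb - metaspace_mb - stack_mb


# Rows copied from the summary table: key -> (heap, metaspace, stack, limit) in MB.
EXPERIMENTS = {
    "A": (23, 64, 11, 170),
    "B": (55, 64, 18, 128),
    "C": (160, 64, 28, 256),
    "D": (382, 64, 35, 512),
    "E": (768, 104, 35, 1024),
    "M": (53, 64, 11, 256),
}

for key, (heap, metaspace, stack, limit) in EXPERIMENTS.items():
    print(key, native_margin_mb(limit, heap, metaspace, stack))
```

A negative or single-digit margin, as in B and C, is the signal that the defaults leave the process too little room outside the regions the JVM knows about.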

If you want to verify the memory settings for each experiment, the memory calculator can be run with parameters as follows:

Key	memoryWeights	memorySizes	stackThreads	Notes
A	heap:40,metaspace:10,native:10,stack:20	metaspace:64m..,native:72m..	35	Smallest viable
B	heap:75,metaspace:10,native:10,stack:5	metaspace:64m..		Default for 128m
C	heap:75,metaspace:10,native:10,stack:5	metaspace:64m..		Default for 256m
D	heap:75,metaspace:10,native:10,stack:5	metaspace:64m..		Default for 512m
E	heap:75,metaspace:10,native:10,stack:5	metaspace:64m..		Default for 1024m
M	heap:40,metaspace:10,native:10,stack:10	metaspace:64m..,native:128m	35	Comfortable 256m

The Stack

The "Stack" values in the summary are calculated by assuming there will be 35 threads in a running app under load, and that each one uses the amount of memory specified in the -Xss argument to the JVM. If the number of threads sounds high, bear in mind that 14 of those threads are started by the JVM itself and have nothing to do with the app. The -Xss values are in turn calculated by the buildpack, and vary quite widely from about 300K to about 1M. There is no evidence that this serves any purpose, and indeed the stack size needed by an app depends more on the libraries and languages it uses than on its overall memory consumption, e.g. Groovy uses large stacks. A stack of 256K would have been perfectly adequate for this application, and it would have been useful to be able to configure it that way, independent of the container memory limit.

Using A Custom Command

Spring Boot uses a generic main class called JarLauncher which deals with nested jars on the classpath, and there is some overhead associated with that, mainly in heap usage, but also a little bit of extra time to process the archives. If heap is scarce then it can slow things down dramatically, but if there is plenty of heap available on startup it will perform quite well. In Cloud Foundry the archive is exploded so we might think about using a different main class and see whether that can help speed up the start up at all.

The default command for a Spring Boot app is this:

$PWD/.java-buildpack/open_jdk_jre/bin/java -cp $PWD/.:$PWD/.java-buildpack/spring_auto_reconfiguration/spring_auto_reconfiguration-1.10.0_RELEASE.jar -Djava.io.tmpdir=$TMPDIR -XX:OnOutOfMemoryError=$PWD/.java-buildpack/open_jdk_jre/bin/killjava.sh $CALCULATED_MEMORY -Djava.security.egd=file:/dev/./urandom -verbose:gc org.springframework.boot.loader.JarLauncher

where $CALCULATED_MEMORY is the result of the java-buildpack-memory-calculator-linux command with default parameters as listed above (and documented in the buildpack).

If we use a custom command we need to fix the memory explicitly, and at the same time we can run the main class directly instead of through the indirect (and slightly memory hungry) JarLauncher:

$PWD/.java-buildpack/open_jdk_jre/bin/java -cp $PWD/.:$PWD/lib/*:$PWD/.java-buildpack/spring_auto_reconfiguration/spring_auto_reconfiguration-1.10.0_RELEASE.jar -Djava.io.tmpdir=$TMPDIR -XX:OnOutOfMemoryError=$PWD/.java-buildpack/open_jdk_jre/bin/killjava.sh -XX:MaxMetaspaceSize=64M -Xss568K -Xmx54613K -Xms54613K -XX:MetaspaceSize=64M -Djava.security.egd=file:/dev/./urandom -verbose:gc sample.freemarker.SampleWebFreeMarkerApplication

This works with 256m (startup 3s) and fails with a 128m container limit, which is unsurprising given what we know about the native memory margin.

There are some things you can do to squeeze the app into 128m. For a start you can use initial values for heap and metaspace that are less than the maximum. We also found that some of the more obscure JVM flags do help, namely -XX:CompressedClassSpaceSize and -XX:ReservedCodeCacheSize, so this works in 128m even with the default JarLauncher:

-XX:MetaspaceSize=20M -XX:MaxMetaspaceSize=38M -Xss256K -Xms16M -Xmx32M -XX:CompressedClassSpaceSize=8M -XX:ReservedCodeCacheSize=4M
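To see roughly how much headroom a set of flags like this leaves, one can add up the upper bounds the JVM has been given and subtract them from the container limit. The Python sketch below does that under stated assumptions: 35 threads (the figure used elsewhere in this document), and the compressed class space counted inside the metaspace limit rather than on top of it. `flag_kb` and `claimed_mb` are hypothetical helper names, and the result is only an estimate of the margin, not a measurement.

```python
import re

UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def flag_kb(flags, name):
    """Size in KB of the JVM flag called name (e.g. '-Xmx'), or 0 if absent."""
    match = re.search(re.escape(name) + r"=?(\d+)([KMG])\b", flags)
    if match is None:
        return 0
    return int(match.group(1)) * UNIT_KB[match.group(2)]


def claimed_mb(flags, threads):
    """Sum of the explicit upper bounds: heap, metaspace, code cache, stacks."""
    kb = (flag_kb(flags, "-Xmx")
          + flag_kb(flags, "-XX:MaxMetaspaceSize")   # assumed to include class space
          + flag_kb(flags, "-XX:ReservedCodeCacheSize")
          + threads * flag_kb(flags, "-Xss"))
    return kb / 1024


FLAGS = ("-XX:MetaspaceSize=20M -XX:MaxMetaspaceSize=38M -Xss256K -Xms16M "
         "-Xmx32M -XX:CompressedClassSpaceSize=8M -XX:ReservedCodeCacheSize=4M")

# Prints 45.25: the MB of a 128m container not covered by an explicit bound.
print(128 - claimed_mb(FLAGS, threads=35))
```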

Another option which might avoid the overhead of JarLauncher and doesn't require a custom command would be to use the PropertiesLauncher (documented in the Spring Boot user guide). It still has to read the JAR files though, unless you use a customized assembly, so it is unlikely in practice to be much of a help.

Shading

Shading is a (technically poor) alternative to the Spring Boot tooling for creating executable jars, merging all the dependencies into a common root directory. It results in very slow uploads to Cloud Foundry because of the way the CLI interacts with the platform, but it doesn't require the JVM to open any JAR files, so it might give better memory performance on startup.

NOTE: Shaded versions of Spring Boot jars are easy to make using Maven if you use the spring-boot-starter-parent. You need to add a <start-class/> property to point to the main class, and swap the maven-shade-plugin for the spring-boot-maven-plugin.

This one started in 128m and then crashed under load:

-XX:MaxMetaspaceSize=40M -Xss256K -Xms16M -Xmx24M -XX:MetaspaceSize=20M

It bounces on startup, and interestingly has already passed a health check when it is killed with "out of memory":

...
2016-01-07T07:51:07.25+0000 [APP/0]      OUT 2016-01-07 07:51:07.252  INFO 11 --- [           main] s.f.SampleWebFreeMarkerApplication       : Started SampleWebFreeMarkerApplication in 7.234 seconds (JVM running for 7.573)
2016-01-07T07:51:07.28+0000 [HEALTH/0]   OUT healthcheck passed
2016-01-07T07:51:07.29+0000 [HEALTH/0]   OUT Exit status 0
2016-01-07T07:51:07.29+0000 [CELL/0]     OUT Container became healthy
2016-01-07T07:51:09.67+0000 [CELL/0]     OUT Exit status 255
2016-01-07T07:51:10.03+0000 [APP/0]      OUT Exit status 255
2016-01-07T07:51:10.08+0000 [API/4]      OUT App instance exited with guid cfa61cd1-348f-434c-8e96-6de4a5e89b63 payload: {"instance"=>"00b640b3-3777-4d16-5dea-06f8ceddc36e", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"2 error(s) occurred:\n\n* 2 error(s) occurred:\n\n* Exited with status 255 (out of memory)\n* cancelled\n* cancelled", "crash_count"=>1, "crash_timestamp"=>1452153070046693621, "version"=>"20e712ca-d7ca-4be2-9adc-3810458d1fdc"}
...

It tries again (without re-staging) and starts more quickly:

...
2016-01-07T07:51:20.41+0000 [APP/0]      OUT 2016-01-07 07:51:20.418  INFO 14 --- [           main] s.f.SampleWebFreeMarkerApplication       : Started SampleWebFreeMarkerApplication in 3.242 seconds (JVM running for 3.623)
2016-01-07T07:51:20.56+0000 [HEALTH/0]   OUT healthcheck passed
2016-01-07T07:51:20.58+0000 [HEALTH/0]   OUT Exit status 0
2016-01-07T07:51:20.58+0000 [CELL/0]     OUT Container became healthy
...

ASIDE: One hypothesis we came up with for this behaviour is that the file system is buffering JAR files as they are read, and although that memory is not needed by the process once the classes are loaded, it is not always returned to the OS promptly. In fact whether or not it is returned is essentially random and depends on the total load on the host, and not anything that you can control from within the container.

This one was the same but continued to run under load:

-XX:MaxMetaspaceSize=32M -Xss256K -Xms16M -Xmx24M -XX:MetaspaceSize=20M

The unshaded jar runs with these parameters in a 140m container but not 128m. Neither -Dsun.zip.disableMemoryMapping=true nor MALLOC_ARENA_MAX: 4 helped. The unshaded jar does run in 128m with the additional -XX parameters already listed above, i.e.

-XX:MetaspaceSize=20M -XX:MaxMetaspaceSize=38M -Xss256K -Xms16M -Xmx32M -XX:CompressedClassSpaceSize=8M -XX:ReservedCodeCacheSize=4M

A final data point: the shaded jar also runs fine with a heap of 70M (as long as the -Xms is set lower).

Simulating Container Behaviour

We can attempt to simulate the behaviour of the container using ulimit (a bash primitive that works in Linux and OSX). Example:

$ ulimit -m 128000
$ java -XX:MetaspaceSize=20M -XX:MaxMetaspaceSize=32M -Xss256K -Xms16M -Xmx32M -XX:CompressedClassSpaceSize=8M -XX:ReservedCodeCacheSize=4M -jar target/spring-boot-test-web-freemarker-1.3.2.BUILD-SNAPSHOT.jar

We had some success with that, but in the end it seems to be more lenient than the container in Cloud Foundry, so not a really good simulation. (Using ulimit -v causes the app to fail immediately because apparently you can't stop the JVM from requesting virtual memory at that level.)

A Bigger JAR File

As another experiment, we added some (unused) dependencies to the vanilla 256m app: spring-boot-starter-data-jpa, spring-cloud-starter-feign, spring-cloud-starter-stream-rabbit, h2. The jar goes up from 15MB to 42MB without any additional threads, although unfortunately there are quite a lot more classes loaded. Predictably it failed to start in PWS, although it did start locally (but took 29 seconds). Here's a summary of the local startup times of the two jars:

Non-heap	Classes	Startup Time (java -jar)	Startup Time (spring-boot:run)
72M	9200	29s	6s
45M	5800	3s	3s

The slow startup of this larger jar suggests that JarLauncher could be a target for optimization.

Ratpack

Ratpack uses fewer threads by default (it has additional thread pools that can be called on if needed), and starts up very quickly. The standard ratpack sample from Github is very minimal. It does start up successfully in Cloud Foundry with 128m container, and even serves HTTP requests under load. This is quite hard to explain given that it has 24 threads (measured locally), and so, even though it is using less memory than a Spring Boot Tomcat app, it should need more than the 128m available. It doesn't run in 64m. Here are the numbers:

Heap	Metaspace	Stack	Native	Total	Limit	Start(sec)	Key
55	64	12	-3	131	128	2	R

The JAR in the vanilla Ratpack sample is shaded. A shaded Spring Boot app with similar features using @EnableRatpack starts as well and behaves in a similar way, but a non-shaded version (with or without JarLauncher) fails to start. We conclude that reading JAR files uses memory that the shaded app doesn't, even if you don't use Spring Boot tooling, but the effect is only important in small containers. (The shaded Boot app bounced a bit as it was starting up, consistent with it being right on the limit of the container memory, but eventually settled down to run smoothly.)

What Would Work Better

What would work better would be to set the stack size based on the workload and/or make it easy to fix with a single environment variable. It might or might not make sense to expand the metaspace with the container size: really it depends more on the number of classes loaded than anything else, which correlates with the size of the archive uploaded, not so much on the memory.

Here are some memory settings that have been verified to actually work in Cloud Foundry with the freemarker sample (except the smallest), assuming 36 threads:

Container	-Xmx (Heap)	-XX:MaxMetaspaceSize (Metaspace)	-Xss (Stack)	Native	Startup(sec)
128m	32M	55M	256K	32	-
256m	140M	60M	256K	47	4
512m	384M	64M	256K	55	4
1g	800M	104M	256K	111	5

The smallest of these (128m) has very little chance of running a Spring Boot app with the default tooling, but might give you a fighting chance if you are prepared to tinker with the build (e.g. use shading).

To accommodate larger apps, we have to ask what a larger app would be doing that would need more memory. Most non-heap memory usage can be accounted for by a simple model involving threads and loaded classes.

memory = heap + non-heap

non-heap = threads x stack + classes x 7/1000

Adding more threads is quite common in enterprise-grade Java apps, and that costs memory which should be taken away from the heap if we want to keep the total constant. The number of application threads is something that the developer ideally needs to specify. Another common driver with larger apps would be to load more classes, and this is easier to predict based on the size of the archive (usually it is directly correlated unless the developer makes a mistake building the archive).
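As a rough check, the non-heap formula can be evaluated against the local measurements from the "A Bigger JAR File" table. This Python sketch treats all quantities as MB and assumes 35 threads with a 256K stack, figures borrowed from elsewhere in this document rather than measured for those two runs; `non_heap_mb` is a hypothetical name.

```python
def non_heap_mb(threads, stack_kb, classes):
    """non-heap = threads x stack + classes x 7/1000, all in MB."""
    return threads * stack_kb / 1024 + classes * 7 / 1000


# (classes loaded, measured non-heap in MB) from the local startup table.
for classes, measured_mb in ((9200, 72), (5800, 45)):
    estimate = non_heap_mb(threads=35, stack_kb=256, classes=classes)
    print(f"{classes} classes: model {estimate:.1f} MB, measured {measured_mb} MB")
```

The model lands within a few MB of both measurements, which is about as much as can be expected from a two-term estimate.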

In summary: it would be useful if the knobs that were available to modify JVM memory were more aligned with threads and classes, rather than the more abstract inputs that we have today. The classes input value could be guessed by the buildpack by measuring the size of the archive (assuming it is known).

A Basic Model

A rule of thumb would be 400 classes per MB of application. We could also make a rough guess for threads, given that bigger applications (in terms of archive bytes) probably need more threads. Here's a suggestion:

classes = archive(MB) * 400
threads = 15 + archive(MB) * 6 / 10

Metaspace isn't the whole of the non-heap memory, but it probably scales with it (and probably the balance is proportional to the archive size). This all leads to a guess of

Archive	Jar Type	Container	-Xmx (Heap)	-XX:MaxMetaspaceSize (Metaspace)	-Xss (Stack)	Native Buffer
A	J	L	L - N - B	M	S:256K	B

Where N = (T + A * 60%)*S + A * 280% is an estimator for the non-heap memory, based on the archive size (A), the number of threads (T) and the stack size (S). The default value of S is 256K, but users might want to bump it if they know they use a non-Java language or a lot of layers of proxies. The metaspace M is roughly M = N - 80% * A, and we also re-assign the 80% * A to other cache settings below.

Finally, B is a native buffer (memory that we empirically see being needed in the container but it is hard to account for in JVM metrics). We generally seem to need less buffer for shaded jars than not and we believe the size is related to the files being read by the classloader. Thus a candidate rule for B is:

Jar Type	Native Buffer (B)
Shaded	22 + 80% * A
Nested	22 + 180% * A

We do not recommend starting the JVM with initial values equal to the max (e.g. -Xms=-Xmx) because an app often seems to need a bit of extra memory to get it started and every little helps. Initial values of -Xms=16M and -XX:MetaspaceSize=20M seem to work fine, at least for smaller containers (maybe they should scale with the max values just as in the existing calculation).

Here are some example values calculated using the formula above:

Archive	Jar Type	Threads	Container	-Xmx (Heap)	-XX:MaxMetaspaceSize (Metaspace)	-Xss (Stack)	Native Buffer
15	Nested	35	128	28.25	38.75	256	49
15	Nested	35	256	156.25	38.75	256	49
15	Nested	35	512	412.25	38.75	256	49
15	Nested	35	1024	924.25	38.75	256	49
15	Shaded	35	128	43.25	38.75	256	34
15	Shaded	35	256	171.25	38.75	256	34
15	Shaded	35	512	427.25	38.75	256	34
15	Shaded	35	1024	939.25	38.75	256	34
42	Nested	45	128	-98.45	95.25	256	97.6
42	Nested	45	256	29.55	95.25	256	97.6
42	Nested	45	512	285.55	95.25	256	97.6
42	Nested	45	1024	797.55	95.25	256	97.6
42	Shaded	45	128	-56.45	95.25	256	55.6
42	Shaded	45	256	71.55	95.25	256	55.6
42	Shaded	45	512	327.55	95.25	256	55.6
42	Shaded	45	1024	839.55	95.25	256	55.6
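The rows above can be reproduced with a few lines of Python. The sketch treats the Threads column as the total thread count, i.e. the whole (T + A * 60%) term in the estimator for N, which is what the tabulated values correspond to. The 280% coefficient appears to be the 400 classes per MB rule combined with the 7/1000 MB per class from the earlier model. `basic_model` is a hypothetical name and all sizes are in MB unless marked otherwise.

```python
def basic_model(limit_mb, archive_mb, threads, nested, stack_kb=256):
    """Heap, metaspace and native buffer (MB) from the basic model.

    threads is the total thread count, as listed in the Threads column.
    """
    stack_mb = stack_kb / 1024
    non_heap = threads * stack_mb + 2.8 * archive_mb         # N
    buffer = 22 + (1.8 if nested else 0.8) * archive_mb      # B
    return {
        "heap": limit_mb - non_heap - buffer,                # L - N - B
        "metaspace": non_heap - 0.8 * archive_mb,            # M = N - 80% * A
        "buffer": buffer,
    }


for archive_mb, threads in ((15, 35), (42, 45)):
    for nested in (True, False):
        for limit_mb in (128, 256, 512, 1024):
            row = basic_model(limit_mb, archive_mb, threads, nested)
            print(archive_mb, "Nested" if nested else "Shaded", limit_mb,
                  {k: round(v, 2) for k, v in row.items()})
```

A negative heap, as for the 42MB archive in a 128m container, simply means the model predicts that the app cannot fit.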

We think that adding -XX:CompressedClassSpaceSize and -XX:ReservedCodeCacheSize can also be quite useful. We haven't studied how they might scale with A, but probably they are proportional (e.g. they could be represented already by the 80% * A composing all or part of the empirical B). Setting -XX:+DisableAttachMechanism will also save a thread or two, so it is occasionally worth a try in a small container.

Keep It Simple

Note that the number of threads T tends to scale with A (because you add more libraries and they all want more threads), so we can extrapolate the 2 data points we have and suppose that (approximately) T = 24 + 30% * A.

Multiplying everything out we have, in terms of inputs L (container limit), A (archive size), S (stack size) and J (jar type, 0 for shaded 1 otherwise):

Name	Value
-Xss	`S:256K`
-Xmx	`L - 22 - (24 + 90% * A) * S - (J + 360%) * A`
-XX:MaxMetaspaceSize	`(24 + 90% * A) * S + 200% * A`
-XX:CompressedClassSpaceSize	`55% * A`
-XX:ReservedCodeCacheSize	`25% * A`
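A minimal Python sketch of turning these formulas into a JVM argument string follows. It makes some assumptions that should be read as such: the fixed overhead is taken as the 22 MB constant from the native buffer rule (exposed as `fixed_mb` so it can be changed), the metaspace figure is emitted as -XX:MaxMetaspaceSize to match the column heading used in the other tables, and sizes are rounded to whole KB. The function names are hypothetical.

```python
def simple_settings_mb(limit_mb, archive_mb, nested, stack_kb=256, fixed_mb=22):
    """Evaluate the 'Keep It Simple' formulas; all results in MB."""
    stack_mb = stack_kb / 1024
    thread_stack = (24 + 0.9 * archive_mb) * stack_mb
    jar_type = 1 if nested else 0          # J: 0 for shaded, 1 otherwise
    return {
        "xmx": limit_mb - fixed_mb - thread_stack - (jar_type + 3.6) * archive_mb,
        "max_metaspace": thread_stack + 2.0 * archive_mb,
        "compressed_class_space": 0.55 * archive_mb,
        "code_cache": 0.25 * archive_mb,
    }


def to_flags(settings_mb, stack_kb=256):
    """Render the settings as JVM flags, with sizes rounded to whole KB."""
    def kb(mb):
        return f"{round(mb * 1024)}K"
    return " ".join([
        f"-Xss{stack_kb}K",
        f"-Xmx{kb(settings_mb['xmx'])}",
        f"-XX:MaxMetaspaceSize={kb(settings_mb['max_metaspace'])}",
        f"-XX:CompressedClassSpaceSize={kb(settings_mb['compressed_class_space'])}",
        f"-XX:ReservedCodeCacheSize={kb(settings_mb['code_cache'])}",
    ])


print(to_flags(simple_settings_mb(limit_mb=256, archive_mb=15, nested=True)))
```

The sketch does not guard against a negative heap, so a caller would need to check `xmx` before using the flags for a small container.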

Keep it Super Simple

To get back to a model that only has one input (L) we can also make some additional assumptions. If we don't know what the value of A is (even though it's easy to measure) we could assume that it also scales with L - people want more memory for bigger apps. We can also guess S if we don't know better and say that it should scale linearly between 256K and 1M for containers in 128m to 1g. So we'll go with A = 12% * L and S = 256K + 768K * min(1,max(L-128,0)/896). Here's the result:

Archive	Jar Type	Threads	Container	-Xmx (Heap)	-XX:MaxMetaspaceSize (Metaspace)	-Xss (Stack)	Native Buffer
8	Nested	26	64	16	24	256	36
15	Nested	29	128	26	40	256	50
31	Nested	33	256	74	80	366	77
61	Nested	42	512	162	168	585	133
64	Nested	43	1024	626	210	1024	137
64	Nested	43	2048	1650	210	1024	137
8	Shaded	26	64	16	24	256	28
15	Shaded	29	128	41	40	256	34
31	Shaded	33	256	105	80	366	47
61	Shaded	42	512	224	168	585	71
64	Shaded	43	1024	690	210	1024	73
64	Shaded	43	2048	1714	210	1024	73

(We added caps on the estimated values of A from both sides, min 8 and max 64.)
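The derived inputs in the table (the Archive, Threads and Stack columns) follow directly from the container limit. A short Python sketch, with hypothetical function names, sizes in MB except for the stack in KB, and values rounded to the nearest whole number as in the table:

```python
def estimate_archive_mb(limit_mb):
    """A = 12% of L, capped to the range 8..64 MB."""
    return min(64, max(8, 0.12 * limit_mb))


def estimate_stack_kb(limit_mb):
    """S rises linearly from 256K at 128m to 1M at 1g and is flat outside."""
    return 256 + 768 * min(1, max(limit_mb - 128, 0) / 896)


def estimate_threads(archive_mb):
    """T = 24 + 30% * A."""
    return 24 + 0.3 * archive_mb


for limit_mb in (64, 128, 256, 512, 1024, 2048):
    archive_mb = estimate_archive_mb(limit_mb)
    print(limit_mb,
          round(archive_mb),
          round(estimate_threads(archive_mb)),
          round(estimate_stack_kb(limit_mb)))
```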

