
override heap / jvm params for tests in gradle build [LUCENE-9160] #10200

Closed
asfimport opened this issue Jan 22, 2020 · 24 comments

asfimport commented Jan 22, 2020

Currently the gradle.properties that is generated lets you control the heap and flags for the gradle build jvms.

But there is no way to control these flags for the actual forked JVMs running the unit tests. For example, minHeap is hardcoded at 256m and maxHeap at 512m.

I would like to change minHeap to 512m as well, for a fixed heap, and to set some other JVM flags, such as -XX:+UseParallelGC, so that my tests are not slow for silly reasons :)

I think this is something Jenkins CI would need as well.
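For context, the gap the issue describes can be sketched in gradle.properties terms. The first property below is standard Gradle; the per-test-JVM equivalent did not exist at the time of filing:

```properties
# Controls the Gradle build/daemon JVMs only (standard Gradle property):
org.gradle.jvmargs=-Xmx2g

# There was no equivalent knob for the forked test JVMs when this issue
# was opened; their heap was hardcoded (256m min / 512m max).
```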


Migrated from LUCENE-9160 by Robert Muir (@rmuir), resolved Jan 22 2020
Attachments: LUCENE-9160.patch (versions: 2)

Robert Muir (@rmuir) (migrated from JIRA)

Here's a patch that works for me. It allows specifying these parameters similar to how you can with ant:

tests.heapsize=512m
tests.minheapsize=512m
args=-XX:+AlwaysPreTouch -XX:+UseTransparentHugePages -XX:+UseParallelGC
tests.workDir=/tmp/lucene_gradle

I tried to make the parameters match the ant build as much as possible, to reduce confusion, but I'm not stuck on the naming, just want to make it possible :)

Robert Muir (@rmuir) (migrated from JIRA)

FWIW, adding -XX:TieredStopAtLevel=1 to my args made an even bigger difference: it cut overall test time in half (18 minutes -> 9 minutes). We waste all resources testing the C2 compiler...

Michael McCandless (@mikemccand) (migrated from JIRA)

fwiw adding -XX:TieredStopAtLevel=1 to my args made an even bigger difference

I tested this option, on 72 core box, using JDK 11.

In lucene/core I ran ant test -Dtests.jvms=36 for baseline, twice:

BUILD SUCCESSFUL
Total time: 1 minute 18 seconds

BUILD SUCCESSFUL
Total time: 1 minute 13 seconds 

And then ran again with this option (to tell hotspot to not try so hard?), ant test -Dtests.jvms=36 -XX:TieredStopAtLevel=1:

BUILD SUCCESSFUL
Total time: 24 seconds
BUILD SUCCESSFUL
Total time: 42 seconds 

Net/net this is a crazy crazy speedup for our tests!!!

Uwe Schindler (@uschindler) (migrated from JIRA)

But we should not hardcode the JVM opts, as we would like to test all combinations (also C2 optimizations) on Jenkins.

So we can add sane defaults, but -Dargs should always override those settings.
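A minimal sketch of what Uwe describes, as a Gradle build-script fragment (the property and flag names here are illustrative, not the committed patch): apply a sane default, but let an explicit -Dargs always win:

```groovy
// Sketch only: names are illustrative. Default to C1-only compilation,
// but an explicit -Dargs overrides the default entirely.
def testJvmArgs = System.getProperty('args') ?: '-XX:TieredStopAtLevel=1'

tasks.withType(Test) {
    jvmArgs testJvmArgs.split(/\s+/)
}
```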

Uwe Schindler (@uschindler) (migrated from JIRA)

Basically -XX:TieredStopAtLevel=1 is very similar to -client in older JDKs. So for short-running processes this is optimal. Of course, it's a bad idea for benchmarks or server environments.

Robert Muir (@rmuir) (migrated from JIRA)

Yes, I'd like to just set args=-XX:TieredStopAtLevel=1 as a default. That's the only default I want. The other stuff I do here has tradeoffs, but this one is a no-brainer by default.

Robert Muir (@rmuir) (migrated from JIRA)

Updated patch: it sets the default, but you can override it of course. I changed the name to tests.jvmargs to be consistent with org.gradle.jvmargs, which is used for the build VMs.

I also updated the help page. I think it's ready.
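Putting the pieces together, a local override might then look like this in gradle.properties (the values are just the examples from earlier in this thread):

```properties
tests.heapsize=512m
tests.minheapsize=512m
tests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC
tests.workDir=/tmp/lucene_gradle
```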

Uwe Schindler (@uschindler) (migrated from JIRA)

OK, +1

ASF subversion and git services (migrated from JIRA)

Commit 9dae566 in lucene-solr's branch refs/heads/master from Robert Muir
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9dae566

LUCENE-9160: add params/docs to override jvm params in gradle build, default C2 off in tests.

Adds some build parameters to tune how tests run. There is an example
shown by "gradle helpLocalSettings"

Default C2 off in tests as it is wasteful locally and causes slowdown of
tests runs. You can override this by setting tests.jvmargs for gradle,
or args for ant.

Some crazy lucene stress tests may need to be toned down after the
change, as they may have been doing too many iterations by default...
but this is not a new problem.

Dawid Weiss (@dweiss) (migrated from JIRA)

It's correct that running tests effectively stresses the compiler. I'd do it slightly differently so that the options are self-documenting, but that is something I can follow up on later. LGTM overall. Great speedup for the common case.

ASF subversion and git services (migrated from JIRA)

Commit 9dae566 in lucene-solr's branch refs/heads/gradle-master from Robert Muir
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9dae566

LUCENE-9160: add params/docs to override jvm params in gradle build, default C2 off in tests.

Adds some build parameters to tune how tests run. There is an example
shown by "gradle helpLocalSettings"

Default C2 off in tests as it is wasteful locally and causes slowdown of
tests runs. You can override this by setting tests.jvmargs for gradle,
or args for ant.

Some crazy lucene stress tests may need to be toned down after the
change, as they may have been doing too many iterations by default...
but this is not a new problem.

David Smiley (@dsmiley) (migrated from JIRA)

While this change might improve Lucene tests (I didn't check yet), I'm finding this to be a large degradation in Solr tests. A machine I use to run tests normally takes around 38 minutes but is now taking 52 minutes. It's a corporate VM that supposedly has 16 CPUs; "ant test" uses 4 runners. I passed "-Dargs=" to undo the args change and I'm back to normal test run times.

Robert Muir (@rmuir) (migrated from JIRA)

The Solr tests are generally sleep()'ing and hence leave the CPU with plenty of spare cycles to run background compilation, so there is no downside to the overcompilation of tests, only the benefits. For Solr I can't recommend anything; the tests are really hopeless: I'd just use as many runners as possible.

Robert Muir (@rmuir) (migrated from JIRA)

Also, if you have a machine with 16 CPUs and you are running with just 4 runners, that is a misconfigured system: you are leaving 75% of your machine idle. So it shouldn't be any surprise that background compilation (even some insane amount of it) causes you no problems: 75% of your resources are wasted.

Set the jvms to 16 if you want to do a comparison.

Robert Muir (@rmuir) (migrated from JIRA)

Similar comparison: if you have a 16-CPU machine and only use 4 runners, I can speed up your tests by spawning 12 background threads from the build: 12 threads that spend 80% of their time mining cryptocurrency and only 20% of their time running tests.

You'd see a nice speedup, even though overall it's wasting all of your resources. And if you set test JVMs to 16 you'd see that these background threads only caused contention and slowed you down, because they are wasting your CPU overall. This is what the C2 compiler does in our tests :)

David Smiley (@dsmiley) (migrated from JIRA)

Yep; I hear you, and thanks for your amusing comparative explanation :).  I recently acquired use of this VM and hadn't tuned the build.  I tried tests.jvms=4,8,10,12,16 and ultimately found 10 yielded the best times on this VM – 17:24m.

Robert Muir (@rmuir) (migrated from JIRA)

are you sure you really have 16?

python -c "import psutil; print(psutil.cpu_count(logical=False))"

Robert Muir (@rmuir) (migrated from JIRA)

Keep in mind that with a VM, the admin may not have taken the time to expose resources "correctly" as far as hyperthreads and so on. I can pass -smp 48,cores=12,threads=4 to a KVM guest from my little 2-core machine and that is what the VM will see.

David Smiley (@dsmiley) (migrated from JIRA)

I don't have the psutil module and I'm not versed in python but I ran lscpu:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             16
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 61
Model name:            Intel Core Processor (Broadwell)
Stepping:              2
CPU MHz:               2397.222
BogoMIPS:              4794.44
Virtualization:        VT-x
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat

(and 48GB of RAM)

Robert Muir (@rmuir) (migrated from JIRA)

My guess is there are really only 8. Impossible to tell inside a VM :) I will open an issue; the gradle build divides the number of CPUs by 2, then artificially caps the result at 4, and I think we should change that. Divide by 2 is fine, but machines have more cores these days. 8 would have been a better default here.

Dawid Weiss (@dweiss) (migrated from JIRA)

I never had a chance to experiment on those super-beefy machines but I'm sure we can alter the defaults.

      // Approximate a common-sense default for running gradle with parallel
      // workers: half the count of available cpus but not more than 12.
      def cpus = Runtime.runtime.availableProcessors()
      def maxWorkers = (int) Math.max(1d, Math.min(cpus * 0.5d, 12))
      def testsJvms = (int) Math.max(1d, Math.min(cpus * 0.5d, 4)) 

My machines quickly saturate I/O and memory bandwidth at higher test parallelism, especially for Solr. The above is just an off-the-top-of-my-head default. It can certainly be improved.
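As a rough illustration of that heuristic, here is a plain-Java mirror of the Groovy snippet above (the class and method names are hypothetical), showing what the defaults work out to for a few CPU counts:

```java
public class TestJvmDefaults {
    // Mirrors the Groovy heuristic above: half the available CPUs,
    // capped at 12 gradle workers and 4 forked test JVMs.
    static int maxWorkers(int cpus) {
        return (int) Math.max(1d, Math.min(cpus * 0.5d, 12));
    }

    static int testsJvms(int cpus) {
        return (int) Math.max(1d, Math.min(cpus * 0.5d, 4));
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 8, 16, 72}) {
            System.out.println(cpus + " cpus -> workers=" + maxWorkers(cpus)
                    + ", testJvms=" + testsJvms(cpus));
        }
    }
}
```

So a 16-CPU box gets 8 gradle workers but still only 4 test JVMs; that hard cap of 4 is exactly what is suggested raising above.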

asfimport commented Jan 23, 2020

Robert Muir (@rmuir) (migrated from JIRA)

Dawid, I opened #10205 to discuss further.

Also keep in mind this JIRA ticket alters the defaults in ways that impact this.
For example, when running Lucene tests with 3 VMs I see a load average around 4.0 instead of the 15.0-16.0 before this very patch was committed!

That's because I don't have 3 CICompiler threads per JVM doing a lot of useless C2 recompilation. So it makes things more efficient, and I think we should raise the hard cap of 4 JVMs to 8 or 12.

ASF subversion and git services (migrated from JIRA)

Commit 16f240e in lucene-solr's branch refs/heads/branch_8x from Robert Muir
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=16f240e

LUCENE-9160: add params/docs to override jvm params in gradle build, default C2 off in tests.

Adds some build parameters to tune how tests run. There is an example
shown by "gradle helpLocalSettings"

Default C2 off in tests as it is wasteful locally and causes slowdown of
tests runs. You can override this by setting tests.jvmargs for gradle,
or args for ant.

Some crazy lucene stress tests may need to be toned down after the
change, as they may have been doing too many iterations by default...
but this is not a new problem.

Adrien Grand (@jpountz) (migrated from JIRA)

Closing after the 9.0.0 release
