RFC: Run each test in separate julia process to save memory #13567

andreasnoack · 2015-10-12T17:48:01Z

This is an attempt to fix #11553. Instead of running all tests in the same CPU_CORES processes where the memory consumption accumulates as the tests run, I here run each test in a separate julia process launched as a task such that at most CPU_CORES processes are running simultanously. E.g. linalg/traingular.jl causes the julia binary to grow to over 500mb so just shutting down that process saves a lot of memory.

I haven't used tasks much so this might not be the right implementation, even if the idea is right. Also, the printing is a complete mess right now. Anyone knows how to ensure that the printing is pretty?

ScottPJones · 2015-10-12T18:22:12Z

Maybe also whether or not a particular test needs to be run in a separate process could be indicated somewhere, or instead have some automatic restart after things grow past a certain size?

amitmurthy · 2015-10-12T19:41:51Z

With the current system we have caught bugs wherein a given test corrupted memory but passed successfully resulting in the next test scheduled on the same worker failing. Running each test in its own process everytime will prevent that.

I do understand that unless we get more RAM allocated on Travis, we may not have a choice.

As for the printing issue, we could execute an addprocs -> run test -> rmprocs cycle for each test. This would keep the printing more or less as is currently, but may result in some corruption faults being hidden.

andreasnoack · 2015-10-13T00:15:53Z

I see the point in catching memory bugs like that, but on the contrary I also see a point in running tests in new processes such that no tests are unaffected by global definitions from earlier tests.

However, the main reason for this pr is that it will reduce the memory usage of the test suite a lot. On my machine, the present suite has allocated a total of 4.4GB when it is done whereas the peak for test suite with this pr is not above 1.5GB.

Update: I found a fix for the printing.

tkelman · 2015-10-13T00:20:25Z

This isn't running the tests in parallel, so they're taking longer than usual.

andreasnoack · 2015-10-13T00:25:49Z

This isn't running the tests in parallel, so they're taking longer than usual.

That is not true. See https://travis-ci.org/JuliaLang/julia/builds/84994480 vs https://travis-ci.org/JuliaLang/julia/builds/84862284. The tests do run in parallel.

tkelman · 2015-10-13T00:31:14Z

It's 2-4 minutes slower on Travis. 6-7 on appveyor, https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.10031 vs https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.10013

andreasnoack · 2015-10-13T00:52:28Z

Well, the part after "so" is correct, but not because of the part before "so". The tests are running in parallel, but many more processes are launched and therefore the same code is compiled more times relative to the old setup. This effect is proabably also present when we go from one to two or more processes so the most efficient utilization of CI would then be no parallel tests.

I don't know if this pr is the right solution. I wanted to try it out and see the effect. Our test requirements are likely to grow so eventually some kind of resetting might be useful, but monitoring the memory allocation of each parallel process and restarting only when exceeding some threshold is a little too advanced for me right now.

tkelman · 2015-10-13T00:53:58Z

--precompiled=yes would probably help matters on appveyor, though failures would potentially have worse backtraces

yuyichao · 2015-10-13T12:51:58Z

Will this create a process for each test even if JULIA_CPU_CORES=1? It will be nice to keep the possibility to run tests in series on a single process (in a known order).

stevengj · 2015-10-13T13:09:14Z

Wouldn't it be easier to just addprocs(N), run N tests, then rmprocs(N), and repeat?

yuyichao · 2015-10-13T13:22:05Z

Wouldn't that take a much longer time to run unless we sort the tests carefully since they take very different amount of time to run?

andreasnoack · 2015-10-13T13:29:46Z

@stevengj I think that would be slower since it would put a "barrier" for each N tests whereas this approach tries to keep N tests running simultanously. However, the overhead from only running a single test in a process seems to be too large for the many small tests so the approach in #13577 is much better.

ScottPJones · 2015-10-13T15:07:37Z

@andreasnoack Do you have a list of the tests that you observed to use large amounts of memory?
A simple change based on your PR could be to just use a new worker for those only.

andreasnoack · 2015-10-13T15:18:06Z

I think I know which ones are problematic. The easiest way to calculate it is to run the tests twice in an single process and divide the first and the second running time. When the ratio is large the tests compile a lot. However, I think an automated approach like #13577 is better.

ScottPJones · 2015-10-13T15:20:21Z

Ah, yes, I hadn't seen #13577 yet, it seems to be like what I'd suggested here earlier (but didn't know how to do myself). That will be great.

Run each test in separate julia process to save memory

43429b6

andreasnoack force-pushed the anj/spawntest branch from cadbed3 to 43429b6 Compare October 12, 2015 20:24

kshyatt added the test This change adds or pertains to unit tests label Oct 12, 2015

andreasnoack changed the title ~~WIP: Run each test in separate julia process to save memory~~ RFC: Run each test in separate julia process to save memory Oct 13, 2015

amitmurthy mentioned this pull request Oct 13, 2015

restart worker during tests depending on resident memory size #13577

Merged

andreasnoack closed this Oct 13, 2015

andreasnoack deleted the anj/spawntest branch October 13, 2015 13:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Run each test in separate julia process to save memory #13567

RFC: Run each test in separate julia process to save memory #13567

andreasnoack commented Oct 12, 2015

ScottPJones commented Oct 12, 2015

amitmurthy commented Oct 12, 2015

andreasnoack commented Oct 13, 2015

tkelman commented Oct 13, 2015

andreasnoack commented Oct 13, 2015

tkelman commented Oct 13, 2015

andreasnoack commented Oct 13, 2015

tkelman commented Oct 13, 2015

yuyichao commented Oct 13, 2015

stevengj commented Oct 13, 2015

yuyichao commented Oct 13, 2015

andreasnoack commented Oct 13, 2015

ScottPJones commented Oct 13, 2015

andreasnoack commented Oct 13, 2015

ScottPJones commented Oct 13, 2015

RFC: Run each test in separate julia process to save memory #13567

RFC: Run each test in separate julia process to save memory #13567

Conversation

andreasnoack commented Oct 12, 2015

ScottPJones commented Oct 12, 2015

amitmurthy commented Oct 12, 2015

andreasnoack commented Oct 13, 2015

tkelman commented Oct 13, 2015

andreasnoack commented Oct 13, 2015

tkelman commented Oct 13, 2015

andreasnoack commented Oct 13, 2015

tkelman commented Oct 13, 2015

yuyichao commented Oct 13, 2015

stevengj commented Oct 13, 2015

yuyichao commented Oct 13, 2015

andreasnoack commented Oct 13, 2015

ScottPJones commented Oct 13, 2015

andreasnoack commented Oct 13, 2015

ScottPJones commented Oct 13, 2015