Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Run each test in separate julia process to save memory #13567

Closed
wants to merge 1 commit into from

Conversation

andreasnoack
Copy link
Member

This is an attempt to fix #11553. Instead of running all tests in the same CPU_CORES processes where the memory consumption accumulates as the tests run, I here run each test in a separate julia process launched as a task such that at most CPU_CORES processes are running simultanously. E.g. linalg/traingular.jl causes the julia binary to grow to over 500mb so just shutting down that process saves a lot of memory.

I haven't used tasks much so this might not be the right implementation, even if the idea is right. Also, the printing is a complete mess right now. Anyone knows how to ensure that the printing is pretty?

@ScottPJones
Copy link
Contributor

Maybe also whether or not a particular test needs to be run in a separate process could be indicated somewhere, or instead have some automatic restart after things grow past a certain size?

@amitmurthy
Copy link
Contributor

With the current system we have caught bugs wherein a given test corrupted memory but passed successfully resulting in the next test scheduled on the same worker failing. Running each test in its own process everytime will prevent that.

I do understand that unless we get more RAM allocated on Travis, we may not have a choice.

As for the printing issue, we could execute an addprocs -> run test -> rmprocs cycle for each test. This would keep the printing more or less as is currently, but may result in some corruption faults being hidden.

@kshyatt kshyatt added the test This change adds or pertains to unit tests label Oct 12, 2015
@andreasnoack
Copy link
Member Author

I see the point in catching memory bugs like that, but on the contrary I also see a point in running tests in new processes such that no tests are unaffected by global definitions from earlier tests.

However, the main reason for this pr is that it will reduce the memory usage of the test suite a lot. On my machine, the present suite has allocated a total of 4.4GB when it is done whereas the peak for test suite with this pr is not above 1.5GB.

Update: I found a fix for the printing.

@tkelman
Copy link
Contributor

tkelman commented Oct 13, 2015

This isn't running the tests in parallel, so they're taking longer than usual.

@andreasnoack
Copy link
Member Author

This isn't running the tests in parallel, so they're taking longer than usual.

That is not true. See https://travis-ci.org/JuliaLang/julia/builds/84994480 vs https://travis-ci.org/JuliaLang/julia/builds/84862284. The tests do run in parallel.

@andreasnoack andreasnoack changed the title WIP: Run each test in separate julia process to save memory RFC: Run each test in separate julia process to save memory Oct 13, 2015
@tkelman
Copy link
Contributor

tkelman commented Oct 13, 2015

@andreasnoack
Copy link
Member Author

Well, the part after "so" is correct, but not because of the part before "so". The tests are running in parallel, but many more processes are launched and therefore the same code is compiled more times relative to the old setup. This effect is proabably also present when we go from one to two or more processes so the most efficient utilization of CI would then be no parallel tests.

I don't know if this pr is the right solution. I wanted to try it out and see the effect. Our test requirements are likely to grow so eventually some kind of resetting might be useful, but monitoring the memory allocation of each parallel process and restarting only when exceeding some threshold is a little too advanced for me right now.

@tkelman
Copy link
Contributor

tkelman commented Oct 13, 2015

--precompiled=yes would probably help matters on appveyor, though failures would potentially have worse backtraces

@yuyichao
Copy link
Contributor

Will this create a process for each test even if JULIA_CPU_CORES=1? It will be nice to keep the possibility to run tests in series on a single process (in a known order).

@stevengj
Copy link
Member

Wouldn't it be easier to just addprocs(N), run N tests, then rmprocs(N), and repeat?

@yuyichao
Copy link
Contributor

Wouldn't that take a much longer time to run unless we sort the tests carefully since they take very different amount of time to run?

@andreasnoack
Copy link
Member Author

@stevengj I think that would be slower since it would put a "barrier" for each N tests whereas this approach tries to keep N tests running simultanously. However, the overhead from only running a single test in a process seems to be too large for the many small tests so the approach in #13577 is much better.

@andreasnoack andreasnoack deleted the anj/spawntest branch October 13, 2015 13:29
@ScottPJones
Copy link
Contributor

@andreasnoack Do you have a list of the tests that you observed to use large amounts of memory?
A simple change based on your PR could be to just use a new worker for those only.

@andreasnoack
Copy link
Member Author

I think I know which ones are problematic. The easiest way to calculate it is to run the tests twice in an single process and divide the first and the second running time. When the ratio is large the tests compile a lot. However, I think an automated approach like #13577 is better.

@ScottPJones
Copy link
Contributor

Ah, yes, I hadn't seen #13577 yet, it seems to be like what I'd suggested here earlier (but didn't know how to do myself). That will be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test This change adds or pertains to unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OOM killer causing the CI EOFError on linux recently
7 participants