Develop/Document multi-level parallelism policy #644

rrnewton opened this Issue Jul 21, 2015 · 25 comments


None yet

9 participants


I can see that stack test defaults to parallel builds. But that refers to parallelism between different test-suites, right? In the case of both builds and tests (with test-framework or tasty) there's the issue of "inner loop" parallelism within each build or test target. I suppose we can't parallelize the tests until we have some way for the build-tool to know it's a tasty/test-framework suite, not just exitcode-stdio-1.0, but that still leaves build-parallelism. Does Stack currently pass -j to GHC?

I haven't seen comprehensive benchmark numbers, but a small value like -j2 or -j4 for GHC offers some benefit. The current implementation is not scalable however (for example, I haven't seen anything good come of -j32).

Nested parallelism of course raises the issue of N * N jobs being spawned for N cores. As long as its quadratic oversubscription and not exponential, I think this is not that big a problem for CPU usage, but it very often can create problems with memory usage (or hitting ulimits / max # of processes).

There are several related cabal issues, but it's a little hard to tell the current status with them spanning a lot of time and various merged and unmerged pull requests:

@rrnewton rrnewton changed the title from Document multi-level parallelism policy to Develop/Document multi-level parallelism policy Jul 21, 2015

I'm seeing about three different things being raised here:

  • Asking GHC to build a single package in parallel
  • Telling a test suite to run its test cases in parallel
  • Running multiple test suites from a single package in parallel

stack does none of these right now. The only thing stack does in parallel is process individual packages in parallel. Does that answer your question?

As to what should stack be doing... I don't see a downside to passing -j to GHC when available. I'd rather avoid running test suites in parallel, but there's really no strong reason for that. I don't see how stack can have any impact on the insides of the test suite itself, since that's entirely up to the test framework.


One tricky thing is deciding how many threads GHC should be running if stack is running multiple builds (i.e. if you have 8 CPUs and stack is running 8 builds, each GHC shouldn't be running 8 of its own threads). Simplest might be to only pass -j if stack is only building a single package.

@borsboom borsboom added this to the Later improvements milestone Jul 21, 2015

Given that neither stack nor cabal nor GHC has a notion of global resource management on the machine, I guess in the medium term what I'd like is enough knobs to experiment with this.

For example, it would be great to be able to "turn it up" to max parallelism -- parallel packages, parallel targets within a package, parallel modules within ghc --make plus parallel tests. And then do some benchmarking to see what speedups and memory usage look like.

We can already vary things pretty easily by generating .cabal/.stack files that pass the right arguments through to GHC and to the test-suites. I guess the critical thing on which to get help from stack itself is the third bullet @snoyberg mentioned -- multiple test-suites (plus profiling/non-profiling/documentation) in parallel within one package, which corresponds to haskell/cabal#2623.

By the way, as far as I know, no one's properly analyzed GHC builds from a parallel algorithms perspective. I.e. we need to profile the dependence graph and do a work/span analysis to figure out what is limiting our scaling. (We're working on a research project where we may be hacking on fine-grain parallelism inside certain GHC phases, but it only makes sense if there's a coherent picture for parallelism at the coarser grain too.)

In lieu of a GHC server mode (which has been discussed on various issues), we can't directly implement an inter-process work-stealing type policy that mimics the now-standard intra-process load balancing approach. But we can do what that Gentoo builder seems to be doing and simply delay some tasks to manage resource use. The more look-ahead or previous knowledge we have about the task DAG the smarter we can be in prioritizing the critical path.


I'm tempted to close this issue, and just add a link to my comment above to the FAQ. Any objection?

@snoyberg snoyberg self-assigned this Jul 31, 2015

That's fine. Fixing this so that stack does something smart would be a big project not solved in a day, and that addresses the "document" part.

@snoyberg snoyberg closed this Jul 31, 2015

It would be great if this issue, since it's linked from the FAQ, tells me how to build at least my package in parallel using some ghc option, possibly specified in the cabal file or as an option to stack.

Alternatively, explicitly say that this cannot be done.

mgsloan commented Jun 7, 2016

@alexanderkjeldaas No such flag necessary, stack will build your package in parallel with other things if it can.

Perhaps you mean having ghc build modules in parallel? Unfortunately in my experience this doesn't speed things up as much as I'd hoped. You can do --ghc-options -j5 (or whatever number).


Yes that option is interesting - ghc seems to be able to use 6x the CPU and finish on exactly the sime time. Impressive!

sjakobi commented Jun 8, 2016

Yes that option is interesting - ghc seems to be able to use 6x the CPU and finish on exactly the sime time.

Related GHC ticket:

I did some simple timings on my machine (i3-2350M, 2 physical cores + hyperthreading) and always got the shortest build times with -j2 or -j3. The relative speedup varied a lot depending on the package, e.g. ~30% with vector-algorithms vs. ~10% with haskell-src-exts.

I was wondering how hard it would be to detect when stack doesn't use all of its build threads and in that case to pass -j2 or -j3 to the build jobs until all threads are used.
Build times would probably still be quite far from their optimum but I don't believe that this could result in build times that are worse than the status quo.


Related question (might need its own issue): how do I tell stack to set -j4 by default for itself, aside from ghc-options? I found nothing in, or by googling.

Studying this issue suggests also builds are parallel, and stack's source suggests it defaults to -j $(getNumProcessors), already in 1.1.2, which sounds good (depending on answers to the other questions), and that this can be tuned through e.g. jobs: 4 in config.yaml.

mgsloan commented Jun 13, 2016

stack build -j2. It's among the options listed in stack --help


@mgsloan I'm asking for docs on setting that by default, for all invocations.


Also a different setting for the current project only would make sense.
Dependencies might be built in parallel, but the current project won't.

On Mon, Jun 13, 2016 at 3:42 PM, Paolo G. Giarrusso <> wrote:

@mgsloan I'm asking for docs on setting that by
, for all invocations.

You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#644 (comment),
or mute the thread


Can anyone tell me how many parallel ghc processes stack will spawn? This is distinct from the ghc -j option, which specifies the level of parallelism within each ghc process, whereas I'm talking about how many of these ghc processes stack (or is it cabal?) keeps running at the same time for building dependencies in parallel.

I'm benchmarking some code on a 32-core machine, and stack seems to only spawn 8 concurrent ghc instances (building 8 dependencies in parallel), resulting in, at most, 25% use (8) of the 32 cores.

Based on my testing this figure should be equal to at least the number of available CPU cores, perhaps as much as five times that number, as ghc often seems to have a hard time using up just a single core fully (doing IO I presume). So if we set it to 5*NUM_CORES then each individual ghc process can use (on average) as little as 20% of one core, and we'd still be using all cores fully.

An actual use case for this would be automatically spawning the build process onto high-CPU VMs, so we can build stuff in 2 minutes rather than half an hour.

mgsloan commented Aug 9, 2016

@runeksvendsen By default, it will build NUM_CORES packages concurrently.

An actual use case for this would be automatically spawning the build process onto high-CPU VMs, so we can build stuff in 2 minutes rather than half an hour.

Often we can't actually build that many packages concurrently, and so your CPUs remain unsaturated. You can pass -j2 and similar to ghc via --ghc-options -j2. Unfortunately, in my experience this hasn't helped build time as much as I'd hoped.

alexanderkjeldaas commented Oct 29, 2016 edited

@snoyberg the above link Added here: is dead, then the new FAQ links back to this issue, making it circular as this issue was closed because this is supposedly documented.

reopen issue?


Also a separate issue that I'll mention here, a -j low-memory would be good to have for CI machines with limited RAM. It's not a problem to pre-fetch, and maybe configure in parallel, but not build.

@Blaisorblade Blaisorblade reopened this Oct 29, 2016

Reopened as requested. After the issue was closed, it seems more than one question was asked and not addressed by the docs (sorry if I'm wrong).
Re -j low memory: how low? Your request makes sense, but I ask because right now, at least on machines with 1G RAM, GHC tends to segfault rather than report it failed allocation.


I actually don't know what stack does by default, but I just tried to setup CI on, and stack build will give me the following out-of-the-box:

thyme- copy/register 

  --  While building package JuicyPixels-3.2.8 using: 

        /home/app/.stack/setup-exe-cache/x86_64-linux/setup-Simple-Cabal- --builddir=.stack-work/dist/x86_64-linux/Cabal- build --ghc-options " -ddump-hi -ddump-to-file" 

      Process exited with code: ExitFailure (-9) (THIS MAY INDICATE OUT OF MEMORY) 

      Logs have been written to: /mysecretproject/.stack-work/logs/JuicyPixels-3.2.8.log 

      Configuring JuicyPixels-3.2.8... 

      Building JuicyPixels-3.2.8... 

      Preprocessing library JuicyPixels-3.2.8... 

So some quick fix I could do to make CI work out-of-the-box is what I'd need.


For practical purposes, something like MAKEFLAGS for stack would be nice to have when stack is called from within some other build system. In that case it would be easy to slap an STACKBUILDFLAGS=-j1 <somebuildthing> to see if it solves the problem, instead of having to retrofit injecting stack build options through that other tool.

Not a big issue, but if someone is going to look at this, might as well add it.


Re -j low memory: how low?

For what it's worth, I experienced this on a 600M RAM f1-micro Google Cloud instance:

runesvend@cloudstore-test:~/code/test-gcloud-datastore$ stack install
Downloaded nightly-2016-09-15 build plan.    
Updating package index Hackage (mirrored at

Fetched package index.    
Populated index cache.    
stack: out of memory (requested 1048576 bytes)
runesvend@cloudstore-test:~/code/gcloud-datastore$ free -h
             total       used       free     shared    buffers     cached
Mem:          594M       195M       398M       4.2M       2.8M       113M
-/+ buffers/cache:        80M       514M
  • For what it's worth, I don't recommend trying with less than 2G RAM. And enough swap enabled. Most failures otherwise appear due to a single GHC instance and stack can't do much about them; @runeksvendsen's log shows a failure due to stack, but GHC requires far more memory so I see little point in trying to fix it.
  • By default, stack uses --jobs to the number of processors (a reasonable default). By default stack does not tell GHC to build at once multiple modules of a package, unless stack is explicitly configured otherwise in ~/.stack/config.yaml via ghc-options.

@alexanderkjeldaas You might want to run stack build -j1 (which forces building at most one package at a time) and see if that helps (it might, your trace looks like stack is setting -j2).


I can confirm that I had to completely give up trying to build my project on a 600M RAM machine. It worked OK to begin with, and it built GHC fine, but the closer it got towards actually finishing, the quicker all RAM was consumed.

I found a 1.7G RAM machine to be sufficient, however. Although, the build process sometimes requires a restart due to the occasional out-of-memory error (which, as mentioned, can be avoided -- while capping concurrency/performance -- by using eg. -j1).


By default, it will build NUM_CORES packages concurrently.

Quick note for the maintainers of the Windows build in case they're not already using it: NUMBER_OF_PROCESSORS will typically be set (certainly on "pro" editions / server editions / developer machines) in a like manner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment