`stack ghc` painfully slow #1671

Closed
ezrosent opened this Issue Feb 5, 2017 · 46 comments

@ezrosent commented Feb 5, 2017

  • A brief description
Managing Haskell projects with the stack tool is effectively unusable on WSL because of how slow it is.

  • Expected results
    (from a laptop running ubuntu 16.04)

time stack ghc -- --version
The Glorious Glasgow Haskell Compilation System, version 8.0.1

real    0m0.124s
user    0m0.092s
sys     0m0.036s
  • Actual results (with terminal output if applicable)
    On desktop running WSL
time stack ghc -- --version
The Glorious Glasgow Haskell Compilation System, version 8.0.1

real    0m50.520s
user    0m0.172s
sys     1m40.547s
  • Your Windows build number
    15025
  • Steps / All commands required to reproduce the error from a brand new installation
    After installation, need stack to pull in a version of GHC. This should do the trick.
stack setup
stack upgrade --install-ghc
time stack ghc -- --version
  • Strace of the failing command
The strace output (attached) includes a few long (multi-second) waits on FUTEX_WAIT, as well as one for mmap.
  • Required packages and commands to install
    Install stack with the standard instructions

stack_ghc_strace.txt

@benhillis (Member) commented Feb 6, 2017

This is on our backlog but is unlikely to make the Creators Update. I know we're planning on looking at this soon though.

For some context, I've looked at what causes this slowdown. For some reason stack has mapped a mind-bogglingly huge region of memory (I'm talking dozens of terabytes). When we fork we walk the entire address range to set up the new process's state. We have a design that should vastly speed this up, but we're approaching "pencils down" date for Creators Update.

@ezrosent (Author) commented Feb 6, 2017

Gotcha, thanks for the context!

@therealkenc (Collaborator) commented Feb 6, 2017

Terabytes. That's awesome. Can't wait to see it in Resource Monitor.

@benhillis (Member) commented Feb 6, 2017

I assume they're doing it to manage their own heap. It's a big "MAP_NORESERVE" region which Linux seems to intelligently handle since "allocate all the things" seems to be a common paradigm.

@therealkenc (Collaborator) commented Feb 7, 2017

This seems to be the related discussion over at ghc ticket 9706 here, for what it is worth. Quoth:

BTW, I found that I could mmap 100 TB with PROT_NONE (or even PROT_READ) and MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED with no measurable delay, and I can start 10000 such processes at once, so there doesn't seem to be any significant cost to setting up the page table mappings (at least on my system). Not sure why that is, exactly. The VSZ column in ps looks quite funny of course :)

So by my math that's 100×10¹² × 10⁴ = 10¹⁸ ≅ 2⁶⁰, which gets you in just under the wire. Or something.

@benhillis (Member) commented Feb 7, 2017

Adding @stehufntdev because he's been looking into this as well.

@mckennapsean commented Feb 18, 2017

I have encountered a similar bug with plain ghc and pandoc. Really slow just to call and print out version info. Can confirm for slow-ring Insider Build & the Windows Preview v.15.15014 (using the free VM).

I found this slowdown by installing ghc v.8.0.2 (anything v.7.10 and below was fast) or by installing pandoc v.1.18 & above. See directions to install ghc or install pandoc for testing. If needed, I can provide a simple set of commands to reproduce.

They both run similarly slow/delayed for me on both systems, but I have not seen reports from other *nix users seeing similar slowdowns, so I am guessing this is WSL related.

@sukhmel commented Feb 20, 2017

This does not require Stack to replicate, GHC compiler alone is enough. I'm experiencing the same dreadfully slow compiler work. Besides, programs compiled with 8.0.x are slow too.

  1. wget http://downloads.haskell.org/~ghc/8.0.2/ghc-8.0.2-x86_64-deb8-linux.tar.xz
  2. tar -xJf ghc-8.0.2-x86_64-deb8-linux.tar.xz
  3. cd ghc-8.0.2
  4. ./configure --prefix=/tmp/ghc
  5. make install
  6. time /tmp/ghc/bin/ghc -e 'putStrLn ""'

@RyanGlScott commented Feb 20, 2017

@benhillis: I believe you've bumped into GHC 8.0's new block-structured heap for 64-bit platforms. From the GHC 8.0.1 release notes:

We have a shiny new two-step memory allocator for 64-bit platforms (see Trac #9706). In addition to simplifying the runtime system’s implementation this may significantly improve garbage collector performance. Note, however, that Haskell processes will have an apparent virtual memory footprint of a terabyte or so. Don’t worry though, most of this amount is merely mapped but uncommitted address space which is not backed by physical memory.

@benhillis (Member) commented Feb 20, 2017

@RyanGlScott - I suspect you are right. We need to modify the way our memory manager keeps track of uncommitted pages.

I'd be very curious to see some performance measurements on how much better their allocator performs versus raw mmap / munmap calls.

@therealkenc (Collaborator) commented Feb 20, 2017

I was going to quip about that curiosity too, but stuck with "awesome" instead. So you benchmark 8.0 and find out it is some percent faster than 7.0. Or just as fast, but simpler. But you end up demonstrating not much in the exercise. The Haskell guys seem okay with a hello-world app asking for a terabyte of virtual memory. The Chakra guys seem okay with asking for 32GB to print hello, and if you are going to do that, [expletive], why not ask for a TB. I am still academically interested in how they arrived at 32GB. Why not 64GB or 128GB? Certainly not because "that would be crazy".

It's working code. Smart people thought it was a good idea. Shrug. What you gonna do except sigh and re-work the memory manager.

@RyanGlScott commented Feb 20, 2017

FWIW, Golang also does something similar by reserving a contiguous chunk of 512 GB of memory (see this comment).

I'm certainly not qualified enough to say how they came up with that number, other than that it's a power of two and—to use their words—"512 GB (MaxMem) should be big enough for now".

@pechersky commented Apr 19, 2017

I have a workaround in the meantime, based on the discussion in https://ghc.haskell.org/trac/ghc/ticket/13304. It involves compiling your own GHC without the large address space allocation, then supplying that GHC as the common GHC for your stack builds. In my example, I recompile GHC 8.0.2 using whatever ghc you already have on your system. I also make sure that Cabal is installed using this GHC -- otherwise, installing other packages will hit the same slowness. I suggest cleaning your ~/.stack and other stack directories to make sure you don't have any GHC lying around with the large-allocation functionality.

To fix, in the bash environment, I ran

# install necessary prereqs if not there
sudo apt-get install ghc happy alex
cd
git clone -b ghc-8.0.2-release --recursive git://git.haskell.org/ghc.git ghc-8.0.2
cd ghc-8.0.2
./boot
./configure --disable-large-address-space #can set --prefix=... here
make -j8 #-j(number-of-threads)
sudo make install
sudo ln -s /usr/local/bin/ghc ~/.local/bin/ghc-8.0.2 #or wherever your prefix put the binaries
# link the rest of the binaries, like runghc, ghci, etc
# this is to make sure the "system-ghc" is properly called
echo "system-ghc: true" >> ~/.stack/config.yaml
cd
# optional Cabal and cabal-install reinstallation to conform to new ghc
stack install Cabal
stack install cabal-install

Now you can do your stack install and stack build in your projects, using the specially compiled GHC.

You can monitor the VIRT usage with something like top or htop. Try stack exec ghci and monitor VIRT before and after.

@sgraf812 commented Apr 21, 2017

Don't you also have to recompile stack for this?

@pechersky commented Apr 21, 2017

@sgraf812 In my use cases, I have not had to recompile stack. If I understand correctly, stack itself never builds anything, just calls the appropriate ghc to do so, through the project-level or system-level ghc (ghci, ghc-through-cabal, etc). This issue only appears during builds, so as long as the ghc that stack uses is fine, stack itself should be fine. Monitoring the path of the ghc binary using htop during a build step might help diagnose what ghc is being used if you still see the 1TB VIRT allocs.

@TerrorJack commented Apr 21, 2017

@pechersky This issue affects not only GHC 8, but also anything compiled with it (stack, pandoc, etc). The official binaries provided by stack developers happen to run fine because the latest release version is built with lts-6.25 and uses ghc-7.10.3.

@pechersky commented Apr 21, 2017

@TerrorJack Thank you for clarifying that for me. My workaround fixes the "stack ghc is slow" issue, as well as @sukhmel's MWE. I did rebuild Cabal in my workflow. Regarding pandoc, I would point to the example in their docs: http://pandoc.org/installing.html#quick-stack-method. AFAIK stack just delegates builds to ghc, so as long as pandoc and friends are installed via stack install after supplying the fixed GHC, I think you should be fine. You could also rebuild stack from source.

@TerrorJack commented Apr 21, 2017

@pechersky Also, the stack install Cabal step is not necessary. I'm working with GHC HEAD, and directly installed ghc to ~/.stack/programs/.... (using the --prefix= flag); compiling Haskell projects with stack then works out of the box. I guess regular GHC releases should work the same.

@pechersky commented Apr 21, 2017

@TerrorJack The stack install Cabal outside of a project was in case someone wanted to use stack solver, which falls back to Cabal to inspect the .cabal file, calculate the build plan, and so on.

@TerrorJack commented Apr 21, 2017

@pechersky stack solver uses cabal-install (by invoking it and parsing the output). So in fact we need stack install cabal-install (or to install cabal-install by some other means).

@pechersky commented Apr 21, 2017

I have updated the code above to include your suggestion, @TerrorJack. According to https://docs.haskellstack.org/en/stable/faq/#what-is-the-relationship-between-stack-and-cabal, both the lib (... Cabal) and the executable (... cabal-install) are used. To be on the safe side, one could (re)install both.

@Roman2K commented Aug 1, 2017

On the latest stable non-insider as of 31/07/2017, I noticed untars and configure scripts being extremely slow. Looking at the Task Manager, the Antimalware Service Executable seems to be the culprit, as though file writes are being inspected through that service. Both untars and configures do lots of small file writes.

Antimalware only takes ~5% of the CPU so maybe the slow part is the transfer of contents from WSL to that process. But I don't have any idea of what I'm talking about 😅.

So I tried the insider preview 16251 and immediately noticed a huge difference 😍. Don't have any numbers from before to compare to, unfortunately, but it feels twice as fast. Though still a lot slower than virtualized Linux.

Here are times from WSL compared to Alpine in VirtualBox on the same machine, within a tmpfs for reads and writes:

./configure [WSL]

real    2m32.522s
user    0m12.547s
sys     1m58.063s

./configure [WSL w/ rootfs excluded in Defender]

real    2m30.279s
user    0m13.813s
sys     1m56.328s

./configure [VirtualBox]

real    0m 23.22s
user    0m 8.22s
sys     0m 1.76s

tar xf [WSL]

real    0m18.166s
user    0m0.531s
sys     0m4.047s

tar xf [VirtualBox]

real    0m 0.84s
user    0m 0.60s
sys     0m 0.80s

Sorry I don't have more to provide. I hope you will keep on improving WSL; that's excellent work on a fantastic concept.

@benhillis (Member) commented Aug 1, 2017

@Roman2K - Thanks for the information. I'm glad that it's much more usable for you, but we still do have a long way to go. We're looking into ways to improve base NTFS speed to help bring Windows filesystem performance more in line with Linux.

@therealkenc (Collaborator) commented Aug 1, 2017

I have anecdotally found the same problem with tar, which is (I think) separate from the huge-memory-allocation slowness. When I untar a large tarball (let's say 10GB) in a Linux VM, it returns almost immediately, because I have 20GB of RAM assigned to the VM and it all ends up in cache at near memcpy() speed. With WSL it seems to rate-limit on writes to disk. I did not report it because I don't untar large files that often, and limiting on writes to disk is hard to prove these days without low-level instrumentation (ugh, effort). But from the blinkenlights it looks like that is what's happening. It doesn't seem to be a CPU-limited thing, like the inefficient stat() calls behind the git slowness complaints, say.

[edit] Another data point is sync never seems to do anything in WSL. With the same 10GB untar in a VM, sync takes countable time to flush the cache.

@jstarks (Member) commented Dec 19, 2017

We have improved mmap performance further in insider build 17063. I believe this makes stack ghc bearable to use now :).

@AaronFriel commented Dec 20, 2017

Thank you! I can attest to the significant improvement.

@cemerick commented Jan 23, 2018

Another anecdote: I opted into 17074 hoping to get acceptable working conditions, but stack setup took exactly an hour to complete (even with windows defender temporarily disabled). For comparison, stack setup under cmd.exe starting from scratch took ~4 minutes. I'll give working with the result a shot, but it doesn't look promising.

Sorry for the negative report; keep doing great work, you'll get there. ❤️

@hvr commented Feb 14, 2018

In the hopes this may be useful to somebody here: I've set up a GHC PPA optimised for WSL (i.e. built with --disable-large-address-space) over at

https://launchpad.net/~hvr/+archive/ubuntu/ghc-wsl

It should merely be a matter of

sudo add-apt-repository ppa:hvr/ghc-wsl
sudo apt-get update
sudo apt-get install ghc-8.2.2-prof cabal-install-head

and then simply prepending /opt/ghc/bin/ to your $PATH env-var.

@cboudereau commented Apr 12, 2018

I would like to use VSCode on Windows + WSL + Stack ghci but due to this problem, it is really slow.

I will check if recompiling a custom ghc without large address space allocation is better or not. Thanks @pechersky !

@therealkenc (Collaborator) commented May 24, 2018

nb: GHC is "still slow" (as it were), but this was deemed fixed in insider builds back in July 2017, and finally made its way into the April Update.
