GHC is too big #8390

Closed
edolstra opened this issue Jun 18, 2015 · 16 comments

@edolstra
Member

The GHC package is enormous:

# du -sch /nix/store/w81ga836igs6ww8xvm5bckwql0178ma5-ghc-7.10.1/ 
913M    total

While this is probably unpleasant for users (and packages that depend on GHC), it also puts significant strain on Hydra: copying nearly a GB of data to each build machine, for every version of GHC, takes a lot of time. Not to mention the time/cost of uploading those copies to the binary cache.

Maybe this could be reduced somehow? For example, the path above contains 626 MB of *.a files, in profiling and non-profiling versions, which are in addition to the shared library versions.
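
A figure like the 626 MB above can be reproduced per file type with a small helper. This is only a sketch: `bytes_matching` is a hypothetical function name, not an existing tool, and it counts file contents in bytes rather than disk blocks, so its totals will differ slightly from what `du` reports.

```shell
#!/bin/sh
# Print the total size in bytes of all regular files under DIR whose
# names match the glob PATTERN (for example '*.a').
# bytes_matching is a hypothetical helper written for illustration.
bytes_matching() {
    dir=$1
    pattern=$2
    # Concatenate every matching file and count the bytes; tr removes the
    # padding that some wc implementations add around the number.
    find "$dir" -type f -name "$pattern" -exec cat {} + | wc -c | tr -d ' '
}
```

For example, `bytes_matching /nix/store/w81ga836igs6ww8xvm5bckwql0178ma5-ghc-7.10.1 '*.a'` would print the byte total for the static archives in the path shown above.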

@vcunat
Member

vcunat commented Jun 18, 2015

/cc @peti and #7117.

@peti
Member

peti commented Jun 18, 2015

I don't think that we can realistically drop any of those files from the ghc package; all of these files are required. We could take advantage of multiple outputs (#4504) to reduce the load required for the average build, which would help quite a bit.

Generally speaking, we have only two active versions of GHC in Nixpkgs: 7.8.4 and 7.10.1. I believe that builds requiring the former versions are rare, really, so the average Hydra build slave should not require more than two copies of GHC at a time.

@vcunat
Member

vcunat commented Jun 18, 2015

The *.a files are needed to compile static executables (thinking of haskell stuff linkage only), right? Is it viable to e.g. use only dynamic linkage by default and thus omit *.a from the default haskell stuff?

@edolstra
Member Author

@vcunat I guess you generally want static linking to prevent a runtime dependency on GHC :-)

@peti Two versions is not as bad as I thought, though of course you need to multiply it by the number of Nixpkgs jobsets (master, staging, release branches...).

@vcunat
Member

vcunat commented Jun 18, 2015

No, I would personally imagine Haskell so-libs shattered from one store path into many smaller ones and have runtime dependencies on (and among) those.

EDIT: that is, unless there are some other nontrivial disadvantages in Haskell dynamic linking (perhaps lower performance due to fewer optimization opportunities?).

@peti
Member

peti commented Jun 18, 2015

@vcunat, splitting the different libs into multiple outputs is what #4504 is about. I don't believe anyone is actively working on that issue, though.

@lucabrunox
Contributor

@edolstra @peti what's the problem with depending on ghc with dyn libs? Isn't that what we do for C/C++ apps that depend on libc/libstdc++ after all? Is depending on ghc runtime libs so costly compared to statically linked libs? Probably not.


@vcunat
Member

vcunat commented Jun 18, 2015

IMHO splitting individual libs is secondary, because most users of Haskell stuff will need to have GHC itself as well, which will require having all of the currently bundled libs anyway (I suppose).

What might make more sense is this *.a and *.so dichotomy. It seems a waste to me to always have both sets, though I must admit I don't have a good idea about static vs. dynamic use cases with GHC.

@peti
Member

peti commented Aug 9, 2015

@vcunat, some Haskell projects like pandoc and git-annex have a crazy number of build inputs. When linked dynamically, these programs require several dozen shared libraries. That works fine in principle, but those dependencies make the programs start up really slowly because the dyn-linking process is so expensive (see #4239, for example). In these cases, we disable dynamic linking to get rid of all those Haskell dependencies. This means, of course, that all Haskell libraries need to have *.a versions around for static linking to succeed. Another advantage of statically linked Haskell binaries is that they don't depend on ghc at run-time any more! So, basically, there are good reasons why both static and dynamic versions of those libraries need to exist.

Now, splitting GHC's run-time libraries into a separate output sounds like a good idea at first glance, but it won't accomplish much because lib makes up 94% of the total size of the GHC store path. To make headway here, you'd really have to have an output path per library.

@Mathnerd314
Contributor

For comparison, http://downloads.haskell.org/~ghc/7.10.2/ghc-7.10.2-x86_64-unknown-linux-deb7.tar.bz2 is 1.1 GB unpacked, Arch Linux's GHC is 943.7 MB, and Debian's GHC is 930 MB (split across 4 packages), so NixOS is actually on the small side.

@domenkozar
Member

GHC has perl in the closure for no good reason #10541

@vcunat
Member

vcunat commented Oct 22, 2015

So, the current numbers on 15.09 – ghc has ~1001 MB, consisting of:

  • 133 MB of HTML docs which can be split easily and immediately,
  • 100 MB of *.so libs (only!),
  • 628 MB of *.a libs,
  • 133 MB of *hi (*.{hi,p_hi,dyn_hi}) where I'm not certain about their role.
  • It's funny that the source tarball takes 52 MB unpacked...

Apart from docs, *.so seem worth splitting, assuming that we do have stuff dynamically linked against them not needing the other parts. I can't see a separate option for the paths of *.so (in ./configure --help), so it's likely we will have to move them manually and then somehow make ghc find those files. Maybe putting -L/foo/bar somewhere would be enough. Any ideas about this stuff? (I don't much use Haskell libs myself.)
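
As a quick sanity check on the breakdown in this comment, the listed components can be added up and compared with the approximate total. The sketch below only restates the figures quoted above (in MB); the variable names are chosen for illustration.

```shell
#!/bin/sh
# Component sizes in MB, copied from the list in this comment.
docs=133
so=100
a=628
hi=133
total=1001   # approximate total reported for the ghc store path

# Sum of the listed components.
accounted=$((docs + so + a + hi))
echo "accounted for: ${accounted} MB of ~${total} MB"

# Integer percentage of the total taken up by the static archives.
a_share=$((a * 100 / total))
echo "static archives: ${a_share}% of total"
```

The listed components add up to 994 MB, so they cover nearly the whole store path, and the static archives alone are a little over 60% of it.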

@copumpkin
Member

Back when GHC only did static linkage, I recall that running strip shrank things significantly. I wonder if those massive .a files could be un-ar'd, stripped, and then put back together for any sort of size improvement.

@vcunat .hi files are interface metadata files that GHC uses to keep track of what's in the corresponding objects.

Also, cc @thoughtpolice who's the primary GHC maintainer and might have ideas on what we can do to improve this.

@vcunat
Member

vcunat commented Dec 5, 2015

IIRC strip can be run directly on *.a files, and perhaps even is by default in stdenv.

@teh
Contributor

teh commented Jan 23, 2016

If people are deploying binaries on a server it's best to use enableSharedExecutables = false; to avoid pulling in the 1G dependency.
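
One way to check whether a built package still drags GHC into its runtime closure is to list the closure and filter it for a GHC store path. This is a rough sketch: `ghc_paths` is a hypothetical helper, the name pattern is an assumption based on the store path shown at the top of this issue, and `./result` is assumed to be the symlink left by `nix-build`.

```shell
#!/bin/sh
# Read store paths on stdin (one per line) and print the ones that look
# like a GHC compiler path, i.e. /nix/store/<hash>-ghc-<version>.
# The pattern is an assumption modelled on the path quoted in this issue;
# ghc_paths is a hypothetical helper, not part of Nix.
ghc_paths() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '^/nix/store/[^/]+-ghc-[0-9][0-9.]*$' || true
}

# Usage (requires Nix; ./result is assumed to come from nix-build):
#   nix-store --query --requisites ./result | ghc_paths
```

If the command prints nothing, the closure contains no path matching that pattern, which is what you would hope to see for a statically linked executable.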

With shared being the default maybe we should split out at least the .so files to avoid having that extra 1G by accident on servers.

A preliminary test [1] shows a lib/rest split works but I don't have the resources on my small laptop to test the full package set.

[1]
teh@45f2813

@peti
Member

peti commented Apr 28, 2016

Closing in favor of #4504, which encompasses this topic.
