New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ripgrep binaries are an order of magnitude larger than they need to be #413

Closed
BurntSushi opened this Issue Mar 19, 2017 · 11 comments

Comments

Projects
None yet
10 participants
@BurntSushi
Copy link
Owner

BurntSushi commented Mar 19, 2017

This discussion started in #412.

The problem is that when you compile ripgrep today, you get big binaries. Both the cargo install approach and installing Linux binaries from the Github releases yield large binaries.

To address the inconsistency concern in the abstract: there are many inconsistencies between the ripgrep releases and what's installed with cargo install, and these inconsistencies will always be there. For example, cargo install doesn't provide easy access to a man page, license info or to the various auto-completion scripts. So the fact that there is yet another inconsistency doesn't really bother me. Namely, cargo install is not the recommended way to install ripgrep, so I'm less sympathetic to issues that people have using it. It's provided on a best effort basis because it's occasionally convenient for folks who are otherwise Rust programmers.

Secondly, I will describe why I have debug = true for release builds. Perhaps there is some other solution that I haven't thought of.

There are two primary benefits I perceived when setting debug = true:

  1. A large fraction of time I spend developing ripgrep is with it compiled in release mode because I want to inspect its runtime performance. Release mode by default doesn't include debug symbols, so this makes profiling harder. If debug = false were committed to source control, then I would have to constantly be changing around that value in the Cargo.toml during development, which is a huge pain. I don't want to do it.
  2. If and when ripgrep panics and the bug is hard to reproduce, the reporter can set RUST_BACKTRACE=1 on some systems and get a useful stack trace if and only if debug symbols are in the binary.

(2) is interesting, because ripgrep hardly ever panics, so it actually has not turned out to be as useful as I would have thought. In the case where it does happen, we could ask the user to compile ripgrep on their own---which is probably what most projects do---but I had wanted to make it as effortless as possible to report a bug. Nevertheless, since it rarely happens, it seems weird to optimize for that case.

With all that said, it seems like this is an argument in favor of stripping debug info from the binaries released in Github, but to retain the debug = true in Cargo.toml. Is there another solution? Can Cargo's config do something to help me out here?

@sfackler

This comment has been minimized.

Copy link
Contributor

sfackler commented Mar 19, 2017

As of the 1.16 release, you can set debug = 1 or debug = 2 to select the level of debuginfo you want in addition to debug = true (which acts like debug = 2). I believe level 1 debuginfo includes everything necessary for backtraces and profiling? Not sure what the size difference is though.

@Aaronepower

This comment has been minimized.

Copy link
Contributor

Aaronepower commented Mar 19, 2017

I definitely feel the same with having to worry about making sure to not to publish with debug = true, one possible solution I was thinking was if there was another cargo profile. A way to differentiate between a cargo install and cargo build --release. With the install profile restricted in what it can override, maybe it can only toggle debug.

@pftbest

This comment has been minimized.

Copy link

pftbest commented Mar 28, 2017

Please also note that when debuginfo is enabled frame pointers are not eliminated.
rust-lang/rust#11906
This may (or may not) have some impact on a performance.

@DoumanAsh

This comment has been minimized.

Copy link
Contributor

DoumanAsh commented Apr 2, 2017

You can also consider to add lto = true for published builds.
As far as i know it is not a default for release builds.
But it requires testing.
It didn't give much difference on msvc toolchain for me

On other hand you can have debug symbols separately (cargo build even produces pdb for binaries), but i'm not sure if it possible to tell cargo to produce them separately instead of bundling together?
Since cargo build is capable (need confirmation on linux) it should be possible?
Most profiling tools accept loading of symbols in the same directory as your binary.

@kbknapp

This comment has been minimized.

Copy link
Contributor

kbknapp commented Apr 4, 2017

Using lto = true was discussed in #325 with the problem being compile times suffer for little gain (performance). However, quoting @BurntSushi

If someone wanted this then I'd be open to adding it only in CI when the final release artifact is produced.

@retep998

This comment has been minimized.

Copy link

retep998 commented Apr 18, 2017

Enabling debuginfo for msvc during cargo install provides no benefit because the .pdb is thrown away so you couldn't get a sane backtrace anyway. If you manually distribute compiled binaries with the .pdb then the debuginfo becomes valuable.

It didn't give much difference on msvc toolchain for me

This is because what you're seeing is LTO stripping unreferenced functions which ld is actually really bad at while link.exe is amazing at it. As a result LTO is just bringing the size of gnu binaries closer to where msvc binaries already are. LTO is still useful to maximize inlining potential though.

@scottchiefbaker

This comment has been minimized.

Copy link

scottchiefbaker commented Apr 18, 2017

I didn't know debug info was enabled and while I was doing a casual comparison of ag vs rg and I noticed the file sizes were drastically different.

ag is 244KB
rg is 25MB

I think it send the wrong message if you're 100x larger than the competition. The majority of people using ripgrep will never use the debug information so it's wasteful in both bandwidth and disk space.

@BurntSushi

This comment has been minimized.

Copy link
Owner Author

BurntSushi commented Apr 19, 2017

Why does it matter? If you care about a few megs of disk space, then I'm willing to bet that's a pretty specialized use case. In that case, just strip the binary yourself.

I don't know what "wrong message" you are referring to. Could you please be more specific?

@retep998

This comment has been minimized.

Copy link

retep998 commented Apr 19, 2017

Wow, the difference between msvc and gnu is pretty huge. For the latest release x86_64-pc-windows-gnu is 24,228,086 bytes while x86_64-pc-windows-msvc is 2,445,312 bytes, almost exactly an order of magnitude difference.

@kaushalmodi

This comment has been minimized.

Copy link
Contributor

kaushalmodi commented May 9, 2017

Setting debug to false reduced built rg binary size on RHEL 6.6 from 25 MB to 5.8 MB.

@luser

This comment has been minimized.

Copy link

luser commented May 22, 2017

One thing that might be feasible in the future is using ELF compressed sections. For example:

$ ls -l /tmp/rg
-rwxr-xr-x 1 luser docker 21233872 May 22 11:23 /tmp/rg
$ objcopy --compress-debug-sections /tmp/rg
$ ls -l /tmp/rg
-rwxr-xr-x 1 luser docker 6196568 May 22 11:24 /tmp/rg

Unfortunately libbacktrace doesn't support reading them yet, so this breaks their usefulness in Rust backtraces. If that got fixed it'd be a nice easy win (DWARF is pretty verbose).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment