Reduce Bazel binary size by deduping embedded tools #2385

Closed
aj-michael opened this issue Jan 20, 2017 · 8 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) type: feature request

@aj-michael
Contributor

Description of the problem / feature request / question:

The Bazel binary is large, partly due to lots of duplicated dependencies in the embedded tools deploy jars.

If possible, provide a minimal example to reproduce the problem:

At 88eca6e, the compiled Bazel binary is 141M.

$ git rev-parse @
88eca6e4f3951fe0ee1be3306bfb85f560bc046a

$ du -h bazel-bin/src/bazel
141M    bazel-bin/src/bazel

$ zipinfo -1 bazel-bin/src/bazel | grep embedded_tools.*\\.jar$ | xargs unzip -j bazel-bin/src/bazel -d jars
<inflating lots of jars>

$ zipinfo 'jars/*.jar' | sed '1,2d;$d' | grep 'class$' | awk '{c[$9]++}END{for (f in c) print c[f], f}' | sort -h | tail
22 archives were successfully processed.
11 org/checkerframework/dataflow/util/HashCodeUtils.class
11 org/checkerframework/dataflow/util/MostlySingleton$1.class
11 org/checkerframework/dataflow/util/MostlySingleton$2.class
11 org/checkerframework/dataflow/util/MostlySingleton.class
11 org/checkerframework/dataflow/util/MostlySingleton$State.class
11 org/checkerframework/dataflow/util/NodeUtils.class
11 org/checkerframework/dataflow/util/PurityChecker.class
11 org/checkerframework/dataflow/util/PurityChecker$PurityCheckerHelper.class
11 org/checkerframework/dataflow/util/PurityChecker$PurityResult.class
11 org/checkerframework/dataflow/util/PurityUtils.class

So we have up to 11 copies of some class files in the embedded tools. I've attached a copy of the full results of the last command: dupes.txt. One interesting thing to note is that com.google.common, aka Guava, appears 10 times. third_party/guava/guava-21.0-20161101.jar (which is only one of the jars in //third_party:guava) is 2.4MB, so 10 copies of that jar come to ~24MB, or ~17% of the 141MB Bazel binary.
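
For anyone who wants to repeat the measurement, here is a minimal sketch of the same duplicate count. It works on plain entry names (one per line) rather than on column 9 of the long zipinfo listing. The function names are made up for illustration, and the `jars` directory is assumed to be the one populated by the unzip command above.

```shell
# count_dupes: read one archive entry name per line on stdin and print
# "<count> <name>" for every .class entry that appears more than once.
count_dupes() {
  awk '/\.class$/ { c[$0]++ }
       END { for (f in c) if (c[f] > 1) print c[f], f }' |
    sort -k1,1n -k2
}

# list_jar_entries DIR: print the entry names of every jar in DIR.
# Relies on "unzip -Z1" (zipinfo mode, names only).
list_jar_entries() {
  local jar
  for jar in "$1"/*.jar; do
    [ -e "$jar" ] || continue   # skip the literal pattern if no jar matches
    unzip -Z1 "$jar"
  done
}

# share_pct COPIES SIZE_MB TOTAL_MB: percentage of TOTAL_MB taken up by
# COPIES copies of a SIZE_MB file, rounded to the nearest integer.
share_pct() {
  awk -v n="$1" -v mb="$2" -v total="$3" \
    'BEGIN { printf "%.0f\n", 100 * n * mb / total }'
}

# Usage:
#   list_jar_entries jars | count_dupes | tail
#   share_pct 10 2.4 141    # prints 17
```

The last call reproduces the ~17% figure from the numbers quoted above (10 copies, 2.4MB each, 141MB binary).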

The root of the problem is the way that the @bazel_tools repository is set up. When we have a tool that is used by a SpawnAction in a native rule, we typically define a java_binary for that tool in the BUILD file, build a deploy jar, and put it in the embedded_tools filegroup that gets put into this binary. Since most of our tools depend on common libraries like guava (or domain-specific ones like the android common libraries), each deploy jar bundles a separate copy of them.

There are a couple of solutions to this problem, each with its own drawbacks.

  1. Instead of embedding deploy jars for the tools, embed just the library jars, and reconstruct the dependency graph in the BUILD.tools files. This would allow us to have only one embedded copy of guava, but has the downside that we are maintaining the dependencies twice, once in BUILD and once in BUILD.tools.
  2. Instead of embedding deploy jars for the tools, embed the source code and build the tools in the client's workspace. This has the advantage of not needing multiple copies of guava, and allows us to only specify the dependencies once (in BUILD.tools, not BUILD). However, it has the downside of making clean builds slower because you need to build the tools you use. It also has the downside of allowing bazel to build itself even when its tools no longer build. E.g. if I change the AndroidResourceProcessingAction, but don't update its BUILD.tools and don't test it out, bazel build //src:bazel will still succeed, but building an android project with the output bazel will fail.
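
Either option boils down to storing each shared file once. A rough way to gauge the upper bound on savings before touching any BUILD files is sketched below. It assumes the input is one "size name" pair per line, collected from all embedded jars, and that entries with the same name have the same content. Both the function name and the input format are assumptions for illustration, not something Bazel provides.

```shell
# dedup_savings: read "<size> <entry-name>" lines on stdin and report the
# total size, the size if every distinct entry name were stored only once,
# and the difference between the two.
dedup_savings() {
  awk '{ total += $1
         if (!($2 in seen)) { seen[$2] = 1; unique += $1 } }
       END { printf "total=%d unique=%d saved=%d\n",
                    total, unique, total - unique }'
}

# Usage with hand-written input (sizes in bytes):
#   printf '100 a/B.class\n100 a/B.class\n50 a/C.class\n' | dedup_savings
#   # prints total=250 unique=150 saved=100
```

The result is only an estimate: it counts the sizes it is given, so feeding it uncompressed sizes will overstate the effect on the compressed binary.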

Environment info

  • Operating System: Ubuntu 14.04 LTS

  • Bazel version (output of bazel info release): development version

  • If bazel info release returns "development version" or "(@non-git)", please tell us what source tree you compiled Bazel from; git commit hash is appreciated (git rev-parse HEAD): 88eca6e

@aj-michael
Contributor Author

aj-michael commented Jan 20, 2017

I've been hacking up a prototype of option 1 in https://bazel-review.googlesource.com/#/c/8410/. Down to 99MB from 141MB.

@aj-michael
Contributor Author

@damienmg, what do you think of this approach? It certainly has trade-offs, but https://bazel-review.googlesource.com/#/c/8410/ reduces the bazel binary by 30%. Is this worth pursuing?

@damienmg
Contributor

30% is awesome, it is definitely worth it.

@damienmg
Contributor

Option 1 is definitely preferable. It sure has the downside of making our build a bit more hairy but meh :/

@ulfjack
Contributor

ulfjack commented Jan 21, 2017

Another option is to not embed the files but check them into a remote workspace, and ask users to add that to their WORKSPACE file.

@aj-michael
Contributor Author

Ok, I will try to clean up https://bazel-review.googlesource.com/#/c/8410/ early next week. (The one hairy part, where I'm unsure of the right approach, is some dependencies on java_proto_library targets.)

Two quick observations:

  1. There are 6ish copies of //third_party:guava in deploy jars that are used as tools for the java_toolchain rule in tools/jdk/BUILD (GenClass_deploy.jar, SingleJar_deploy.jar, etc). I can't use option 1 on those, because it introduces a circular dependency where the java_binary that I would put in the BUILD.tools file to create them depends on the java_toolchain implicitly.
  2. There are a bunch of duplicate classes in //third_party:guava itself that we could remove to save some space. For instance, the error_prone_core jar seems to contain copies of guava, dataflow_checker and jformatstring, each of which is also included in the filegroup separately. If we got a smaller error_prone_core jar, we could save a few MB.
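
The second observation can be checked by intersecting the entry listings of two jars. The sketch below assumes each listing was saved to a text file beforehand (for example with `unzip -Z1 some.jar > listing.txt`). The function name and the file names in the usage line are hypothetical.

```shell
# shared_classes LIST_A LIST_B: print the .class entries that appear in
# both listing files (one entry name per line in each file).
shared_classes() {
  awk 'NR == FNR { in_a[$0] = 1; next }
       /\.class$/ && ($0 in in_a)' "$1" "$2" | sort -u
}

# Usage (hypothetical listing files):
#   shared_classes error_prone_core.txt guava.txt | wc -l
```

A large count here would suggest the bundled copy is redundant, though matching names alone do not prove the class files are byte-identical.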

@damienmg damienmg added this to the 0.6 milestone Jan 23, 2017
@damienmg damienmg added the P3 We're not considering working on this, but happy to review a PR. (No assignee) label Jan 23, 2017
@aj-michael
Contributor Author

Regarding @ulfjack's comment: I think that's certainly the optimal way to go for most of these tools, especially the Android ones. In fact, we have already started the ball rolling on a new git repository for some of the emulator tools that we will need for android_test.

bazel-io pushed a commit that referenced this issue Aug 25, 2017
This change reduces the size taken up in the bazel binary by Android tools deploy jars from 38.2 MB to 9.8 MB, which is 15% of the bazel binary size. Also, some minor cleanups of our BUILD files.

#2385

RELNOTES: None
PiperOrigin-RevId: 166373241
@aj-michael
Contributor Author

Closing this because the android tools deduped their dependencies in e005adf.

There's probably more work to be done, but ulf's comment is the better long-term plan.
