-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combine server JAR and auto-extracting ZIP #5301
Comments
#6318 and https://github.com/bazelbuild/bazel/milestone/13 seem to be related. |
Well, this FR won't really change the size of the binary, though it might impact how well it compresses (by avoiding compressing an already-compressed file). It probably makes sense to add this bug to that milestone though. |
Looked at this again. As I had lost my previous diff, I had to redo this from scratch. See https://github.com/jmmv/bazel/tree/jar. Same conclusion as before but with more details now: making this work for Bazel is rather easy. But we cannot make progress because Blaze cannot work. I dug some more today and concluded that, indeed, the number of files is the problem. When a ZIP file includes more than 64k files, it has to use the ZIP64 format. That's fine in itself... except that ZIP64 files cannot be made self-extracting executables. To reproduce: just create a ZIP file with more entries than 64k and then prepend some junk to it (or the Bazel client iself). From there on, any Bazel currently has 24k files in its JAR, but Blaze has 75k. Making this change only for Bazel and not for Blaze is not a good idea because we'd need to handle two code paths and the benefit of this change is not large enough to justify that cost. So the answer here is to put Blaze on a diet first. Which... seeing the number of files... may be a good thing in itself. |
The funny thing is that
So... why cannot |
64K classes should be enough for everyone... I think that limiting ourselves to 64K classes will bite us in the future. I sure don't want to have code reviews of the sort "inline this inner class because then self-extraction will fail" Are you sure zip64 files cannot be self-extracting? Maybe it's not as easy as prepending some junk to them, but I'm hoping that it's not an inherent limitation of the file format and can be worked around by tweaking some in-file pointers appropriately. |
At least this tidbit insinuates that it's possible: https://s12316.pcdn.co/wp-content/documentation/xceed-filesystem-for-net/topic114.html "Xceed's self-extracting binary modules support Zip64, they can create self-extracting files larger than 2 GB" |
Huh, all details I found online (mostly limitations about all existing tooling I found) implied that this was not possible. I even went through the ZIP format documentation but found nothing relevant there and then, well, ran out of time to look into this on Friday. Yeah, limiting ourselves is probably a bad idea -- unless we could make the contents significantly smaller than they are today. |
All the work we're doing with https://github.com/bazelbuild/bazel/milestone/13 should limit the number of classes that appear in the final bazel zip. We haven't started doing anything similar internal (yet).
Did you measure this for your prototype? |
Maybe we get lucky and we don't even need to invoke the |
@meisterT I didn't measure yet but the impact of this change alone won't be large in the Bazel case: we have a lot of cruft outside of the JAR that we have to extract anyway, and the JAR is not thaaaat large. You can try though :) https://github.com/jmmv/bazel/tree/jar. In the Blaze case this will be more noticeable though because the JAR is much bigger and we only have a handful of files outside of it. My eventual desire is to get rid of the self-extraction altogether (though I'm sure that'll be controversial), which will definitely have an impact. Maybe this is not a big issue for end users because the extraction "only" happens one per upgrade... but think about two more cases:
|
@lberki Huh, good observation. I've been playing with singlejar and, indeed, if I tell it to use the Bazel client as the Java launcher, it creates a valid ZIP file (with all those 75k entries) with the prepended binary. Now, things are more complicated in Blaze because we do use the Java launcher feature there... but this now seems feasible without having a limit on the number of files. |
We do not have any plans to work on this issue. |
This is unfortunate. The change itself seemed straightforward. but a problem remained in the Blaze variant of this... which means even if someone else wanted to tackle the problem outside of Google, they wouldn't be able to due to the lack of a repro. Anyhow, I can see how this has little value and there are more important things to do. |
Yeah, it's unfortunate, but we have to prioritize :( |
The Bazel binary is a combination of the executable C++ client and a ZIP file. These two components are pasted together by a simple
cat
operation followed by azip -A
call to realign the ZIP contents.The ZIP file contains the server JAR. And the server JAR is, in itself, a ZIP file. Somewhat redundant, right?
It should be possible to combine both the auto-extracting ZIP with the server JAR. In other words: take the server JAR, add to it whatever embedded tools and data files we need, and then use the JAR itself as the payload for the C++ client. Then teach the client to just use itself as the JAR passed to the JVM instead of pointing the JVM to a file in the install base.
Doing this does not fully eliminate the need for the install base, but is a step towards that goal. This change would reduce the size of the install base significantly (because the server JAR is the biggest component in it), would also reduce the Bazel's startup time because the binary auto-extraction would need to unpack one fewer large file, and would also reduce the time it takes to build Bazel by shortening the packaging action (which is in the critical path).
I was able to make this work relatively easily with Bazel but then got stuck when applying the same change to Blaze. I think I hit an issue where the
zip
tool didn't like the large number of files within Blaze's server JAR and didn't find a way around it. Then gave up because I had other more important things to do... so filing this bug to avoid this idea getting lost.The text was updated successfully, but these errors were encountered: