Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use buffered IO when writing the output jar. #666

Merged
merged 2 commits into from
Feb 19, 2022

Conversation

mauriciogg
Copy link
Contributor

@mauriciogg mauriciogg commented Feb 9, 2022

This improves io performance for large jars
Running this code on my machine with a compressed 70MB zip with ~40K
entries now takes ~5s from ~25s. For the same uncompressed jar goes from ~20s to a subsecond operation.

* Simplify and optimize manifest modification logic

We only need to modify a file in the Jar but end up manually copying all
entries. For small jars this is not an issue, but for larger jars it has
terrible performance (specially in ci).
E.g on my M1 mac, stamping buckCompiler jar takes ~20 seconds with the
naive implementation. With the changes in this commit it now takes
~1.5s. The performance is even worse on ci which has terrible io (+10min
just to stamp this jar)

* avoid creating file with bad attrs
@shs96c
Copy link
Collaborator

shs96c commented Feb 18, 2022

The test failures look related to this change. Could you please make sure that bazel test tests/com/... passes for you?

This improves io performance for large jars
Running this code on my machine with a compressed 70MB zip with ~40K
entries now takes ~5s from ~25s. For an uncompressed jar this is a
subsecond operation.
@mauriciogg mauriciogg changed the title Simplify and optimize manifest modification logic Use buffered IO when writing the output jar. Feb 19, 2022
@mauriciogg
Copy link
Contributor Author

The test failures look related to this change. Could you please make sure that bazel test tests/com/... passes for you?

The failing tests would have been hard to fix using the FileSystem API, since the manifest has to be the first (not sure why java needs it there) entry and FileSystem hides the zip impl details and operates at a higher level. Turns out that simply wrapping the JarOutputStream around a buffered stream makes the performance good enough so I ended up just going that route. I updated the PR to reflect this change.

Copy link
Collaborator

@shs96c shs96c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for the patch, and the nice performance boost! :)

@shs96c shs96c merged commit 741fc11 into bazelbuild:master Feb 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants