Share the download cache with the invoking Bazel installation #63
Comments
Where is |
I'll rephrase the last sentence as a question. Were you able to run it with sandboxing? |
Yep, it runs with sandboxing, but only because of a horrendous hack involving absolute paths. In local dev I actually hardcode "-Dbazel.repository.cache=/tmp/cache" and have /tmp/cache as the repository cache of the invoking Bazel instance. That's clearly not a portable solution :)

I thought about this again, and IMO there are three ways to speed things up with shared caching: one that works out of the box without modifying Bazel, one that works with a little bit of help from the user, and one that means modifying Bazel.

Working out of the box, without modifying Bazel: prewarming

This might actually work across the spectrum, including remote execution, and without touching Bazel. Here, the WorkspaceDriver would be used during the build phase to generate a download cache, so that the download time is only paid once. I actually like that better than piggybacking on the caching mechanism of the invoking Bazel instance, because you make fewer assumptions about how Bazel works, and those assumptions might break when you have, e.g., remote execution. For example, you might use a slightly modified WorkspaceDriver in a genrule with the following:
Then you test as follows:
I am pretty sure an intuitive API can be designed to encapsulate all that. The advantage here is that you get a performance improvement across the board, without depending on a particular way of invoking Bazel. And you don't need to touch Bazel.

With a little bit of help from the user

Here, we let the user pass some caching options (there are many others besides --repository_cache) through a '--define bazel.caching.options="--repository_cache=..."'. This is straightforward to expand, see bazelbuild/bazel#3736. However, it means that caching does not work out of the box for users. Moreover, depending on how the caches are implemented, there is a risk of cache poisoning, and it is problematic overall from a security point of view.

Modifying Bazel

Here, we let Bazel expose the caching options that were passed to its local instance through the ctx object, maybe as a make variable (ctx.var). Again, there is a risk of cache poisoning, and this probably seems like too particular a use case to warrant loading up the ctx object with yet another configuration variable, especially since it comes with security issues.

======

Overall, I'm in support of prewarming, but I'm curious to have your opinion about that. |
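To make the "little bit of help from the user" option more concrete, here is a minimal Python sketch of how a test harness might turn the value of the bazel.caching.options define into extra Bazel arguments. This is an illustration only: the project itself is Java-based, the function name caching_args is hypothetical, and the allowlist of flags is an assumption added here to show one way of limiting what gets forwarded. It is not a fix for the cache-poisoning concern raised above.

```python
import shlex
from typing import List

# Assumption: only these caching flags are forwarded to the nested Bazel.
# The list is illustrative, not something the project defines.
ALLOWED_PREFIXES = ("--repository_cache=", "--experimental_distdir=")


def caching_args(define_value: str) -> List[str]:
    """Split the define value into arguments and reject unexpected flags."""
    # shlex.split honours shell-style quoting, so paths with spaces survive.
    args = shlex.split(define_value)
    rejected = [a for a in args if not a.startswith(ALLOWED_PREFIXES)]
    if rejected:
        raise ValueError("unsupported caching options: %s" % rejected)
    return args
```

The returned list would then be appended to the command line of the Bazel instance under test. Rejecting unknown flags keeps the define from becoming a general-purpose way of injecting arbitrary options.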
I don’t understand what you mean by executing bazel once in the build phase.
Can you elaborate?
…On Tue, 24 Apr 2018 at 13:10 Hadrien Chauvin ***@***.***> wrote:
Yep, it runs with sandboxing, but because of a horrendous hack involving
absolute paths. Actually in local dev I hardcode this:
"-Dbazel.repository.cache=/tmp/cache". That's clearly not a portable
solution :)
I thought of another way of sharing external dependencies, which might
actually work across the spectrum, including remote execution, and without
touching Bazel: "prewarming".
Here, bazel would be executed during the build phase to generate a
download cache, so that the ~14s are only paid once; I actually like that
better because you have fewer assumptions concerning how Bazel works, and
those assumptions might break when you have, e.g., remote execution.
|
Sorry, I wrote that too quickly. I have in mind using the WorkspaceDriver in a genrule. Changed the text so that it is clearer there as well. I don't know if it is possible, though, I need to try that. |
Oh interesting, so you’re thinking of somehow segregating the workspace
creation out so you can also run it as a genrule which will do “bazel
fetch”? But then it doesn’t have network access either
…On Tue, 24 Apr 2018 at 22:00 Hadrien Chauvin ***@***.***> wrote:
I have in mind using the WorkspaceDriver in a genrule.
|
Ah yes, they are not supposed to access the network. But does that mean that they can't? Another possibility, in this case, would be to have a new repository rule that uses repository_ctx.download to fetch a compressed archive. Then it is possible to use --experimental_distdir (https://github.com/bazelbuild/bazel/blob/beef2c452bb6b1c2dd2b08d3089062f26cccc859/src/main/java/com/google/devtools/build/lib/bazel/repository/RepositoryOptions.java) with a location within the workspace to have all the repository rules cached without rewriting everything. Something like a repository rule:
So this is almost the same, but now you have a new integration_testing_archives rule where you say exactly what you want to cache. And that's the missing piece, because it actually allows you to reuse the download cache of the "invoking Bazel instance" (this is a mouthful, but I didn't find better). Then you can disable downloading entirely by setting a "block-network" tag on the java_test in bazel_java_integration_test:
|
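For readers unfamiliar with the distdir mechanism discussed above: as Bazel's documentation describes it, a file in a distdir is used instead of a download when its name equals the base name of the URL and its hash matches the one given in the download request. A minimal Python sketch of that lookup, for illustration only (the function name find_in_distdirs and its parameters are hypothetical, and this is not Bazel's implementation):

```python
import hashlib
import posixpath
from pathlib import Path
from typing import Iterable, Optional
from urllib.parse import urlparse


def find_in_distdirs(url: str, sha256: str, distdirs: Iterable[Path]) -> Optional[Path]:
    """Return a local file that can stand in for downloading `url`, if any."""
    # The candidate file name is the base name of the URL path.
    name = posixpath.basename(urlparse(url).path)
    if not name:
        return None
    for directory in distdirs:
        candidate = Path(directory) / name
        if not candidate.is_file():
            continue
        # Only accept the file if its content hash matches the expected one.
        if hashlib.sha256(candidate.read_bytes()).hexdigest() == sha256:
            return candidate
    return None
```

Because the hash is checked on every hit, a stale or wrong file with the right name is ignored and the caller falls back to whatever it would have done otherwise.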
I think it can’t. @damienmg do you know?
Re experimental distdir: if you download to within the repo, then wouldn’t
that mess up your git?
Sorry if I sound like a party pooper
…On Tue, 24 Apr 2018 at 22:29 Hadrien Chauvin ***@***.***> wrote:
Ah yes they are not supposed to access the network. But does it mean that
they can't?
Another possibility, in this case, would be to have a new repository rule
that uses repository_ctx.download to have a compressed archive. Then it is
possible to use --experimental_distdir (
https://github.com/bazelbuild/bazel/blob/beef2c452bb6b1c2dd2b08d3089062f26cccc859/src/main/java/com/google/devtools/build/lib/bazel/repository/RepositoryOptions.java)
with a location within the workspace to have all the repository rules
cached without rewriting everything.
|
Preferably, tests and builds shouldn't access the network, and I think the Linux sandbox forbids it (unless that was removed for performance reasons; the blocking is possible). For build steps the network is not restricted (for performance reasons). You can also add a tag to ensure that a target gets access to the network. Anyway, you can make the repository cache accessible by using rctx.symlink. That does require some hack to pass around the name of the repository cache. I don't believe it makes sense to have Bazel integrate such a feature. |
@damienmg Thank you for the feedback, I agree with you, I don't think that Bazel should expose that, that is in the realm of the test infrastructure. For the blocking of the network, the "block-network" tag can be added. @ittaiz I came up with another approach which feels way less hackish, but it works only for testing bazel > 0.12.0, using --experimental_distdir, see #71. |
Sounds interesting!
I’ll take a look as soon as I can
…On Wed, 25 Apr 2018 at 13:15 Hadrien Chauvin ***@***.***> wrote:
@damienmg <https://github.com/damienmg> Thank you for the feedback, I
agree with you, I don't think that Bazel should expose that, that is in the
realm of the test infrastructure.
For the blocking of the network, the "block-network" tag can be added.
@ittaiz <https://github.com/ittaiz> I came up with another approach which
feels way less hackish, but it works only for testing bazel > 0.12.0, using
--experimental_distdir, see #71
<#71>.
|
... or have another way to share external dependencies.
Currently, the Bazel environment that is created by bazel-integration-testing has a download cache that is separate from that of the invoking Bazel installation (let's call the latter the "upstream" download cache). It is possible to keep a download cache across runs when developing by pinning TEST_TMPDIR, but since the download cache is content-addressable storage, wouldn't it make sense to simply use the upstream download cache, as it would be safe anyway? In some cases this could greatly improve performance. Agreed, such external dependencies should be avoided when doing integration testing, but sometimes they are more convenient.
This is not something that could be added right away, though, as it would require Bazel to publish in some way the location of its download cache.
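The safety argument rests on the cache being content-addressable: an entry is looked up by the hash of its content, so a reader can always re-check what it gets back. A minimal Python sketch of that idea, under stated assumptions (the on-disk layout and the names cache_path, put and get are hypothetical and only loosely modelled on a repository cache; this is not Bazel's code):

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_path(cache_root: Path, sha256: str) -> Path:
    # Hypothetical layout: one directory per digest, holding a single file.
    return cache_root / "sha256" / sha256 / "file"


def put(cache_root: Path, data: bytes) -> str:
    """Store data under its own SHA-256 digest and return that digest."""
    digest = hashlib.sha256(data).hexdigest()
    target = cache_path(cache_root, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return digest


def get(cache_root: Path, expected_sha256: str) -> Optional[bytes]:
    """Return the cached bytes, or None on a miss or a hash mismatch."""
    target = cache_path(cache_root, expected_sha256)
    if not target.is_file():
        return None
    data = target.read_bytes()
    # Re-hash on read: a corrupted or tampered entry is treated as a miss.
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return None
    return data
```

With verification on read, two Bazel instances sharing one cache directory can at worst cause each other a cache miss, provided every download declares its expected hash.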