Share the download cache with the invoking Bazel installation #63

hchauvin · 2018-04-23T15:40:23Z

... or have another way to share external dependencies.

Currently, the Bazel environment created by bazel-integration-testing has a download cache that is separate from that of the invoking Bazel installation (let's call the latter the "upstream" download cache). It is possible to keep a download cache across runs during development by pinning TEST_TMPDIR, but since the download cache is a content-addressable store, wouldn't it make sense to simply reuse the upstream download cache, as that would be safe anyway? In some cases this could greatly improve performance. Agreed, such external dependencies should be avoided in integration tests, but sometimes they are more convenient.

This is not something that could be added right away, though, as it would require Bazel to publish the location of its download cache in some way.
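
To illustrate why sharing a content-addressable store is safe, here is a minimal toy sketch. It is not Bazel's actual on-disk layout; the `cas_put` and `cas_get` helpers are hypothetical names. Because the key is derived from the content itself, two writers storing the same archive land on the same entry, and a reader can always verify what it gets.

```python
import hashlib
from pathlib import Path


def cas_put(cache_dir: Path, content: bytes) -> str:
    """Store content under its SHA-256 digest and return that digest."""
    digest = hashlib.sha256(content).hexdigest()
    entry = cache_dir / digest
    # Writing the same content twice is a no-op: the key is the content hash.
    if not entry.exists():
        entry.write_bytes(content)
    return digest


def cas_get(cache_dir: Path, digest: str) -> bytes:
    """Read an entry back and verify it still matches its key."""
    data = (cache_dir / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("corrupt cache entry: " + digest)
    return data
```

The verification in `cas_get` is what makes a shared cache tolerable: a wrong entry is detected rather than silently used.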

ittaiz · 2018-04-23T15:50:01Z

This is really interesting. @anchlovi is also facing some issues with setting up external dependencies for the scratch workspace, and your solution sounds promising. @anchlovi, what do you think?

ittaiz · 2018-04-24T05:45:42Z

Where is the REPOSITORY_CACHE that you used in the rules_kotlin PR defined?
Also, it sounds strange that the scratch_workspace would have access to the shared cache directory, since it's outside its sandbox.

ittaiz · 2018-04-24T05:46:53Z

I'll rephrase the last sentence as a question. Were you able to run it with sandboxing?

hchauvin · 2018-04-24T10:10:20Z

Yep, it runs with sandboxing, but only because of a horrendous hack involving absolute paths. In local dev I actually hardcode "-Dbazel.repository.cache=/tmp/cache" and use /tmp/cache as the repository cache of the invoking Bazel instance. That's clearly not a portable solution :)

I thought about this again, and IMO there are three ways to speed things up with shared caching: one that works out of the box without modifying Bazel, one that works with a little bit of help from the user, and one that requires modifying Bazel.

Working out of the box, without modifying bazel: prewarming

This might actually work across the spectrum, including remote execution, and without touching Bazel.

Here, the WorkspaceDriver would be used during the build phase to generate a download cache, so that the download time is only paid once. I actually like that better than piggybacking on the caching mechanism of the invoking Bazel instance, because it makes fewer assumptions about how Bazel works, and those assumptions might break when you have, e.g., remote execution.

For example, you might use a slightly modified WorkspaceDriver in a genrule with the following:

scratch a WORKSPACE with your external deps
build them with a repository_cache location that you control: "bazel build @foo//bar --repository_cache=..."
put everything in the repository_cache in a tarball that is the output of the genrule (again, this is just a CAS, so that's perfectly safe to do).

Then you test as follows:

put the tarball as data to the integration test
untar it in a repository_cache location that you control
invoke bazel, e.g.: "bazel test //hello:world --repository_cache=..."

I am pretty sure an intuitive API can be designed to encapsulate all that. The advantage here is that you get a performance improvement across the board, without depending on a particular way of invoking Bazel. And you don't need to touch Bazel.
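
As a rough sketch of the pack and unpack steps above, assuming the repository cache is just a directory tree that can be archived as-is: the helper names `pack_cache`, `unpack_cache` and `bazel_command` are hypothetical and not part of bazel-integration-testing. The last helper only builds the command line and does not run Bazel.

```python
import tarfile
from pathlib import Path


def pack_cache(cache_dir: Path, tarball: Path) -> None:
    """Build-phase step: archive the prewarmed repository cache."""
    with tarfile.open(tarball, "w:gz") as tar:
        # arcname="." keeps paths relative, so the cache can be restored anywhere.
        tar.add(cache_dir, arcname=".")


def unpack_cache(tarball: Path, cache_dir: Path) -> None:
    """Test-phase step: restore the cache into a location the test controls."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(cache_dir)


def bazel_command(target: str, cache_dir: Path) -> list:
    """Command line that points the scratch Bazel at the restored cache."""
    return ["bazel", "test", target, "--repository_cache=" + str(cache_dir)]
```

In a test, one would call `unpack_cache` on the tarball provided as data, then run the result of `bazel_command("//hello:world", cache_dir)` in the scratch workspace.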

With a little bit of help from the user

Here, we let the user pass some caching options (there are many others besides --repository_cache) through a '--define bazel.caching.options="--repository_cache=..."'. This is straightforward to expand, see bazelbuild/bazel#3736. However, it means that caching does not work out of the box for users. Moreover, depending on how the caches are implemented, there is a risk of cache poisoning, and it is problematic overall from a security point of view.
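
A minimal sketch of how the test harness might expand such a define value into Bazel flags. The `expand_caching_options` helper and the allow-list are assumptions made for illustration, not an existing API; the allow-list is only one possible way to limit what a user-supplied string can inject.

```python
import shlex

# Hypothetical allow-list: only flags this thread discusses are accepted.
ALLOWED_PREFIXES = ("--repository_cache=", "--experimental_distdir=")


def expand_caching_options(define_value: str) -> list:
    """Split the value of bazel.caching.options into individual flags."""
    flags = shlex.split(define_value)
    for flag in flags:
        if not flag.startswith(ALLOWED_PREFIXES):
            raise ValueError("unsupported caching option: " + flag)
    return flags
```

Using `shlex.split` means quoted paths containing spaces survive as a single flag.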

Modifying bazel

Here, we let Bazel expose, through the ctx object, the caching options that were passed to its local instance, maybe as a make variable (ctx.var). Again, there is a risk of cache poisoning, and this is probably too particular a use case to warrant loading up the ctx object with yet another configuration variable, especially since it comes with security issues.

======

Overall, I'm in support of prewarming, but I'm curious to have your opinion about that.

ittaiz · 2018-04-24T18:39:13Z

I don’t understand what you mean by executing bazel once in the build phase. Can you elaborate?


hchauvin · 2018-04-24T19:00:11Z

Sorry, I wrote that too quickly. I have in mind using the WorkspaceDriver in a genrule. Changed the text so that it is clearer there as well.

I don't know if it is possible, though, I need to try that.

ittaiz · 2018-04-24T19:15:16Z

Oh interesting, so you’re thinking of somehow segregating the workspace creation out so you can also run it as a genrule which will do “bazel fetch”? But then it doesn’t have network access either


hchauvin · 2018-04-24T19:29:41Z

Ah yes they are not supposed to access the network. But does it mean that they can't?

Another possibility, in this case, would be to have a new repository rule that uses repository_ctx.download to fetch a compressed archive. It is then possible to use --experimental_distdir (https://github.com/bazelbuild/bazel/blob/beef2c452bb6b1c2dd2b08d3089062f26cccc859/src/main/java/com/google/devtools/build/lib/bazel/repository/RepositoryOptions.java) with a location within the workspace to have all the repository rules cached without rewriting everything.

Something like a repository rule:

# WORKSPACE
integration_testing_archives(
    name = "archives",
    content = {
        "https://foo/bar.zip": "<sha256 digest>",
        "https://hello/world.tar.gz": "<sha256 digest>",
    }
)

# Integration WORKSPACE

# some_repository_rule is set to download https://foo/bar.zip.  But with the distdir,
# it actually looks up some predefined path for a matching sha256.
some_repository_rule(
    name = "repo",
)

So this is almost the same, but now you have a new integration_testing_archives rule where you say exactly what you want to cache. And that's the missing piece, because it actually allows you to reuse the download cache of the "invoking Bazel instance" (this is a mouthful, but I didn't find better). Then you can disable downloading entirely by setting a "block-network" tag on the java_test in bazel_java_integration_test:

bazel_java_integration_test(
    ...,
    cache = "@archives",
    tags = ["block-network"],
)
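
To make the distdir idea concrete, here is a simplified model of the lookup as I understand it: match a local file by the base name of the URL, then accept it only if its SHA-256 matches the expected digest. This is a sketch under that assumption, not Bazel's implementation, and `find_in_distdir` is a hypothetical name.

```python
import hashlib
from pathlib import Path, PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def find_in_distdir(distdir: Path, url: str, sha256: str) -> Optional[Path]:
    """Return the local file that can stand in for url, or None to download."""
    # Candidate is chosen by the file name at the end of the URL path.
    candidate = distdir / PurePosixPath(urlparse(url).path).name
    if not candidate.is_file():
        return None
    # The digest check is what makes the substitution safe.
    digest = hashlib.sha256(candidate.read_bytes()).hexdigest()
    return candidate if digest == sha256 else None
```

With network access blocked, a `None` result here would correspond to a failed fetch, which is why every archive has to be declared with its digest up front.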

ittaiz · 2018-04-24T20:06:10Z

I think it can’t. @damienmg do you know? Re experimental distdir: if you download to within the repo, then wouldn’t that mess up your git? Sorry if I sound like a party pooper.


damienmg · 2018-04-25T08:23:31Z

Preferably, tests and builds shouldn't access the network, and I think the Linux sandbox forbids it (unless that was removed for performance reasons; the blocking is possible in any case). For build steps, the network is not restricted (for performance reasons). You can also add a tag to ensure that it gets access to the network.

Anyway, you can make the repository cache accessible by using rctx.symlink. That indeed requires some hack to pass around the name of the repository cache. I don't believe it makes sense to have Bazel integrate such a feature.

hchauvin · 2018-04-25T10:15:47Z

@damienmg Thank you for the feedback. I agree with you: I don't think that Bazel should expose that; it is in the realm of the test infrastructure.

For the blocking of the network, the "block-network" tag can be added.

@ittaiz I came up with another approach which feels way less hackish, but it works only for testing bazel > 0.12.0, using --experimental_distdir, see #71.

ittaiz · 2018-04-25T10:39:29Z

Sounds interesting! I’ll take a look as soon as I can


ittaiz · 2018-04-29T03:09:31Z

Fixed by #71, thanks @hchauvin!

hchauvin mentioned this issue Apr 23, 2018

Executable and runfiles "teleportation" #64

Open

This was referenced Apr 24, 2018

Speed up local development with a global integration environment #67

Open

add code coverage support bazelbuild/rules_kotlin#52

Open

hchauvin mentioned this issue Apr 25, 2018

add a repository rule for download caching #71

Merged

ittaiz closed this as completed Apr 29, 2018
