Allow using remote cache for repository cache #6359
Comments
Also quoting @buchgr
@or-shachar have you made any progress on this? I know you investigated implementing it? :-)
Is anyone working on this?
Not that I am aware of! Feel free to pick it up :-).
This would be a great feature for stateless builds.
There's a Remote Repository Cache proposal from @jmillikin-stripe in https://github.com/bazelbuild/proposals, but it's still in draft state. @jmillikin-stripe any updates on the proposal, or pointers for how people could help out?
There's a draft implementation of the .proto at #8782, and I'm currently awaiting review from a Bazel core maintainer before I start writing the implementation. |
Our team is very keen to see this progressed. We've been struggling with flaky CI for a while now due to
To alleviate this we've tarred up our external folder and stored it on an internal file server, which we download and extract before CI. While this works to reduce flakiness, it is rather manual when updating any external repositories.
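The tar-based workaround described above can be sketched as follows. This is illustrative only: the directory layout stands in for Bazel's output base, and the upload/download step to the internal file server is elided.

```shell
# Snapshot an "external" directory and restore it on a fresh machine,
# as one would before a CI build. All paths here are illustrative.
mkdir -p demo/output_base/external/some_repo
echo "fetched artifact" > demo/output_base/external/some_repo/BUILD

# Archive the external folder (in practice, upload the tarball to an
# internal file server here).
tar -C demo/output_base -czf external-snapshot.tar.gz external

# On the next CI run (in practice, after downloading the tarball),
# extract it before invoking bazel so fetches hit the warm directory.
mkdir -p demo/fresh_machine
tar -C demo/fresh_machine -xzf external-snapshot.tar.gz
cat demo/fresh_machine/external/some_repo/BUILD
```

The manual step this thread wants to eliminate is the re-archiving whenever any external repository changes.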
We are currently trying to agree on an API. Here's a proposal similar to @jmillikin-stripe's that we are currently discussing: https://docs.google.com/document/d/10ari9WtTTSv9bqB_UU-oe2gBtaAA7HyQgkpP-RFP80c/edit?disco=AAAADULntWg&ts=5d5eecc1
Any progress/updates? |
#10622 is a proposed implementation of the most recent proposal. |
Thank you for the PR @jmillikin-stripe!
Experimental Remote Asset API support: to use this with Bazel, specify --experimental_remote_downloader=grpc://replace-with-your.host:port.
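A minimal `.bazelrc` sketch of how this might be wired up, assuming a server that implements both the Remote Asset API and the CAS behind one endpoint (the host and port are placeholders, not a real service):

```
# Placeholder endpoint; replace with your own remote cache / asset service.
build --remote_cache=grpc://cache.example.internal:9092
build --experimental_remote_downloader=grpc://cache.example.internal:9092
```

Pointing both flags at the same endpoint only works if the server speaks both protocols; otherwise the downloader needs its own endpoint.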
#10622 provides parameter "experimental_remote_downloader" Our current topology is: T-Engine server is what we need for Tracing and High Availability(retry 3 times). Remote-cache server can be restart at any time while user can't notice that |
I'm not planning to implement an HTTP version of the remote downloader code. Getting the gRPC version into Bazel took a large amount of work, and I do not have time to do the same for HTTP. According to alibaba/tengine#672, Tengine supports HTTP/2. I believe you could use it to proxy gRPC, because gRPC is built directly on the HTTP/2 protocol. The Tengine changelog says gRPC is available in versions 2.3.0 and later. This would require adding gRPC handlers to your bazel-remote-cache implementation. |
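A sketch of what proxying the gRPC downloader traffic through Tengine/nginx could look like, using the standard `grpc_pass` directive. The server names and ports are illustrative, and this assumes the backend cache has gRPC handlers as described above:

```nginx
# Hypothetical Tengine/nginx front-end for Bazel's gRPC remote downloader.
server {
    listen 9092 http2;                          # gRPC requires HTTP/2
    location / {
        grpc_pass grpc://bazel-remote-cache:9093;  # placeholder backend
    }
}
```

Retry and tracing behavior (the reasons given for the T-Engine tier) would be layered on with the usual proxy directives.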
Thank you for the solution @jmillikin-stripe. Now I am trying the new feature, but got an error with bazel-remote built from today's source, Bazel version 3.3.1. The first time I built a very simple demo project written in C++ with the new parameter, it worked; the directory /home/admin/.cache/bazel/_bazel_admin/cache was old and already had some files in it.
Then I deleted all cache content from local disk and built again, and got an error.
Finally, I found the reason in the bazel-remote start log: I had missed one configuration.
Is it a correct assessment that the repository execute action is never cached? I was using rules_ruby in CI and locally, and I was surprised that after every CI run (or
If this is the case, would it be an idea to also add this to the remote caching capability of Bazel?
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team ( |
@bazelbuild/triage, I think this is still relevant. |
Description of the problem:
The flag --repository_cache saves a lot of time otherwise wasted re-downloading third-party Maven jars and http_archives that we have already fetched. The problem is that when running on stateless build servers (like GCB), this feature doesn't really help us, as the disk gets reset on each build. Given many external binary dependencies from remote sources, just downloading everything may take expensive minutes on each build.
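For reference, the existing disk-based behavior described above is enabled like this (the cache path is a placeholder; on a stateless builder this directory is exactly what gets wiped between runs):

```
# .bazelrc sketch: point the repository cache at a persistent location.
build --repository_cache=/path/to/persistent/repo-cache
fetch --repository_cache=/path/to/persistent/repo-cache
```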
Feature requests:
If the execution is using an R/W remote cache, it only makes sense to use the remote cache instead of the disk.
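The repository cache is content-addressed by the download's SHA-256, so the requested behavior amounts to a cache lookup by digest before fetching upstream. A minimal sketch of that flow, using an in-memory dict as a stand-in for the remote cache (a real implementation would speak the remote cache protocol; all names here are illustrative):

```python
import hashlib

# Stand-in for a remote CAS keyed by SHA-256 hex digest.
remote_cache = {}

def fetch(url, expected_sha256, downloader):
    """Return file contents, preferring the remote repository cache."""
    if expected_sha256 in remote_cache:
        return remote_cache[expected_sha256]      # cache hit: no download
    data = downloader(url)                        # cache miss: fetch upstream
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {url}")
    remote_cache[digest] = data                   # populate cache for later builds
    return data
```

Because the key is the declared checksum, two stateless builders requesting the same archive share one upstream download, which is the win this feature request is after.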
Have you found anything relevant by searching the web?
See discussion here:
A different idea is to use GCS:
Several mitigations are available:
Of course, ideally, if we're also using remote execution that uses the same cache, the most efficient thing to do is to not download everything from the remote cache to the host environment at an early stage. Most of the binaries are not used in that environment, but only in remote workers that already have access to the cache.
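The "don't download to the host" behavior sketched above resembles what Bazel's builds-without-the-bytes flags provide for action outputs. A hedged `.bazelrc` sketch (the executor endpoint is a placeholder, and whether these flags cover repository fetches is exactly what this issue is asking for):

```
# Placeholder remote execution endpoint.
build --remote_executor=grpc://executor.example.internal:8980
# Only download outputs the host actually needs; intermediates stay remote.
build --remote_download_minimal
```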
CC: @buchgr and @aehlig