Allow using remote cache for repository cache #6359

Open
or-shachar opened this issue Oct 11, 2018 · 18 comments
Assignees
Labels
not stale Issues or PRs that are inactive but not considered stale P3 We're not considering working on this, but happy to review a PR. (No assignee) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@or-shachar
Contributor

or-shachar commented Oct 11, 2018

Description of the problem:

The --repository_cache flag saves a lot of time that would otherwise be wasted re-downloading third-party Maven jars and http_archive dependencies that we have already fetched.

The problem is that on stateless build servers (like GCB) the feature doesn't really help us, because the disk gets reset on each build. With many external binary dependencies coming from remote sources, just downloading everything can cost expensive minutes on every build.

Feature requests:

If the execution is using an R/W remote cache, it only makes sense to use the remote cache instead of the disk.

Have you found anything relevant by searching the web?

See discussion here:

A different idea is to use GCS:

comments

Several mitigations are available:

  • If the build server allows it, enable a persistent folder between builds.
  • Add a step that rsyncs that folder from storage before the build and back to storage after the build.
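
The second mitigation can be sketched roughly as below. This is only an illustration under stated assumptions: STORE_DIR, CACHE_DIR and the function names are hypothetical, the persistent storage is modelled as a plain directory (for example a mounted bucket or a CI cache volume), and cp -R stands in for rsync so the sketch has no extra dependencies.

```shell
#!/bin/sh
# Sketch only: both paths are hypothetical defaults, not Bazel conventions.
STORE_DIR="${STORE_DIR:-/mnt/ci-store/bazel-repo-cache}"
CACHE_DIR="${CACHE_DIR:-$HOME/.cache/bazel-repo-cache}"

# Copy the contents of directory $1 into directory $2, creating both if needed.
# (rsync -a "$1/" "$2/" would do the same job incrementally.)
sync_dir() {
    mkdir -p "$1" "$2"
    cp -R "$1/." "$2/"
}

# Pull the cache from persistent storage before the build.
restore_repo_cache() { sync_dir "$STORE_DIR" "$CACHE_DIR"; }

# Push the (possibly grown) cache back after the build.
save_repo_cache() { sync_dir "$CACHE_DIR" "$STORE_DIR"; }

# Typical CI usage:
#   restore_repo_cache
#   bazel build //... --repository_cache="$CACHE_DIR"
#   save_repo_cache
```

The build step passes the restored directory through --repository_cache, the flag discussed above. The copy still moves every cached archive to the build host, so it reduces upstream downloads rather than removing the transfer.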

Of course, ideally, if we're also using remote execution backed by the same cache, the most efficient thing is not to download everything from the remote cache to the host environment at an early stage. Most of the binaries are not used in that environment, only on remote workers that already have access to the cache.

CC: @buchgr and @aehlig

@irengrig irengrig added type: feature request team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. untriaged labels Oct 11, 2018
@aehlig aehlig assigned buchgr and unassigned aehlig Oct 15, 2018
@regisd
Contributor

regisd commented Oct 18, 2018

Also quoting @buchgr

I would love for this to be implemented as part of a community contribution :-) and would be happy
to work closely with anyone willing to take on this task!

@dslomov dslomov added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Oct 22, 2018
@buchgr
Contributor

buchgr commented Nov 12, 2018

@or-shachar have you made any progress on this? I know you investigated implementing it? :-)

@ob
Contributor

ob commented Mar 4, 2019

Is anyone working on this?

@buchgr
Contributor

buchgr commented Mar 5, 2019

Not that I am aware of! Feel free to pick it up :-).

@agoulti agoulti self-assigned this Apr 29, 2019
@NaurisSadovskis

This would be a great feature for stateless builds.

@robbertvanginkel
Contributor

There's a Remote Repository Cache proposal from @jmillikin-stripe in https://github.com/bazelbuild/proposals, but it's still in draft state.

@jmillikin-stripe any updates on the proposal or pointers for how people could help out?

@jmillikin-stripe
Contributor

There's a draft implementation of the .proto at #8782, and I'm currently awaiting review from a Bazel core maintainer before I start writing the implementation.

@Toxicable

Our team is very keen to see this progressed.

We've been struggling with flaky CI for a while now, due to "connect timed out" errors that occur randomly across many different external hosts, similar to this:

java.io.IOException: Error downloading [https://nodejs.org/dist/v10.16.0/node-v10.16.0-linux-x64.tar.xz] to /home/runner/.cache/bazel/_bazel_runner/f2e96da83c9a9bca36350376aeb4df02/external/nodejs_linux_amd64/bin/nodejs/node-v10.16.0-linux-x64.tar.xz: connect timed out

To alleviate this, we've tarred up our external folder and stored it on an internal file server; we download and extract it before CI. While this reduces flakiness, it is rather manual whenever we update any external repositories.
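
A rough sketch of that tarball workaround follows, with one addition that is an assumption rather than something described above: the archive name is derived from a checksum of the WORKSPACE file, so a changed WORKSPACE maps to a new archive instead of silently reusing a stale one. All function names and paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: name the archive after a checksum of the WORKSPACE
# file given as $1 (cksum is POSIX; a stronger hash would work equally well).
external_archive_name() {
    printf 'external-%s.tar.gz\n' "$(cksum < "$1" | cut -d' ' -f1)"
}

# Pack the contents of directory $1 into the archive $2.
# The archive must live outside of $1.
pack_external() {
    tar -czf "$2" -C "$1" .
}

# Unpack the archive $1 into directory $2, creating it if needed.
unpack_external() {
    mkdir -p "$2"
    tar -xzf "$1" -C "$2"
}

# Typical CI usage (paths are placeholders):
#   name="$(external_archive_name WORKSPACE)"
#   unpack_external "/mnt/fileserver/$name" "$EXTERNAL_DIR"
```

If no archive exists for the current checksum, the job would fall back to a normal fetch and then call pack_external to publish a fresh one.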

@buchgr
Contributor

buchgr commented Sep 6, 2019

We are currently trying to agree on an API. Here's a proposal similar to @jmillikin-stripe's that we are currently discussing: https://docs.google.com/document/d/10ari9WtTTSv9bqB_UU-oe2gBtaAA7HyQgkpP-RFP80c/edit?disco=AAAADULntWg&ts=5d5eecc1

@arjantop

arjantop commented Jan 10, 2020

Any progress/updates?

@jmillikin-stripe
Contributor

#10622 is a proposed implementation of the most recent proposal.

@philwo philwo added the team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website label Jun 15, 2020
@xingxinghuo1000

xingxinghuo1000 commented Jul 3, 2020

Thank you for the PR, @jmillikin-stripe.
#10622 implemented a parameter, "--experimental_remote_downloader".
How do I use this parameter, @jmillikin-stripe?
Could you please give an example?
Does this feature depend on a gRPC cache server?


IGNORE all of the above.
I found the answer on this page: https://github.com/buchgr/bazel-remote

Experimental Remote Asset API Support
There is (very) experimental support for a subset of the Fetch service in the Remote Asset API which can be enabled with the --experimental_remote_asset_api flag.

To use this with Bazel, specify --experimental_remote_downloader=grpc://replace-with-your.host:port.

@xingxinghuo1000

xingxinghuo1000 commented Jul 3, 2020

#10622 provides the parameter "experimental_remote_downloader".
From the docs, I think the current protocol between the Bazel client and the remote cache is gRPC.
But we have deployed multiple remote-cache servers behind a cluster of t-engine, which can only forward HTTP requests.
@jmillikin-stripe, could you please provide an implementation over HTTP, so that we can use this feature without changing anything?

Our current topology is:
DNS (global host) ---> LVS cluster ---> t-engine cluster ---> bazel-remote-cache cluster ---> backend storage (Aliyun OSS, similar to S3 or GCS)

The T-Engine server is what we need for tracing and high availability (it retries 3 times). A remote-cache server can be restarted at any time without users noticing.

@jmillikin-stripe
Contributor

I'm not planning to implement an HTTP version of the remote downloader code. Getting the gRPC version into Bazel took a large amount of work, and I do not have time to do the same for HTTP.

According to alibaba/tengine#672, Tengine supports HTTP/2. I believe you could use it to proxy gRPC, because gRPC is built directly on the HTTP/2 protocol. The Tengine changelog says gRPC is available in versions 2.3.0 and later. This would require adding gRPC handlers to your bazel-remote-cache implementation.

@xingxinghuo1000

xingxinghuo1000 commented Jul 3, 2020

Thank you for the solution @jmillikin-stripe

Now I am trying the new feature, but I got an error.

bazel-remote was built from today's source; the Bazel version is 3.3.1.


I tried it with a very simple demo project written in C++.

The first time I built my project with the new parameter, it worked. At that point the directory /home/admin/.cache/bazel/_bazel_admin/cache was old and already contained some files.

bazel build //...  --experimental_remote_downloader=grpc://127.0.0.1:9092  --remote_cache=grpc://127.0.0.1:9092 
INFO: Invocation ID: 5a478b83-133c-40ef-924c-f2dc03ef06e5
INFO: Analyzed 3 targets (21 packages loaded, 307 targets configured).
INFO: Found 3 targets...
INFO: Elapsed time: 0.692s, Critical Path: 0.16s
INFO: 22 processes: 22 remote cache hit.
INFO: Build completed successfully, 36 total actions

Then I deleted all cache content from the local disk:

rm -rf /home/admin/.cache/bazel/_bazel_admin

Then I built again and got an error:

bazel build //...  --experimental_remote_downloader=grpc://127.0.0.1:9092  --remote_cache=grpc://127.0.0.1:9092 
INFO: Invocation ID: 66db7d37-46d8-4369-93e2-e7e8a18ab74b
INFO: Repository rules_cc instantiated at:
  no stack (--record_rule_instantiation_callstack not enabled)
Repository rule http_archive defined at:
  /home/admin/.cache/bazel/_bazel_admin/c6a5c929336b1584a43833c19bad1c7a/external/bazel_tools/tools/build_defs/repo/http.bzl:336:31: in <toplevel>
ERROR: An error occurred during the fetch of repository 'rules_cc':
   java.io.IOException: io.grpc.StatusRuntimeException: UNIMPLEMENTED: unknown service build.bazel.remote.asset.v1.Fetch
ERROR: While resolving toolchains for target //:function_2_test: com.google.devtools.build.lib.packages.RepositoryFetchException: no such package '@rules_cc//cc': java.io.IOException: io.grpc.StatusRuntimeException: UNIMPLEMENTED: unknown service build.bazel.remote.asset.v1.Fetch
ERROR: Analysis of target '//:function_2_test' failed; build aborted: com.google.devtools.build.lib.packages.RepositoryFetchException: no such package '@rules_cc//cc': java.io.IOException: io.grpc.StatusRuntimeException: UNIMPLEMENTED: unknown service build.bazel.remote.asset.v1.Fetch
INFO: Elapsed time: 0.101s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
    currently loading: @bazel_tools//tools/cpp

Finally, I found the reason.

The bazel-remote startup log says:

experimental gRPC remote asset API: disabled

I had missed one configuration option.
In the YAML file, add this parameter:

experimental_remote_asset_api: true
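
The UNIMPLEMENTED error above only appears once a repository actually has to be fetched, so it can help to check the bazel-remote startup log before running the build. The sketch below assumes the log has been written to a file. The function name and the log path are hypothetical; the matched text is the startup line quoted above.

```shell
#!/bin/sh
# Returns 0 (success) when the bazel-remote startup log in file $1 reports
# that the remote asset API is disabled, and non-zero otherwise.
asset_api_disabled() {
    grep -qF 'experimental gRPC remote asset API: disabled' "$1"
}

# Typical usage (log path is a placeholder):
#   if asset_api_disabled /var/log/bazel-remote.log; then
#       echo "enable experimental_remote_asset_api in the bazel-remote config" >&2
#       exit 1
#   fi
```

Only the "disabled" line is matched, because the wording of the enabled variant is not shown in this thread.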

@philwo philwo removed the team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website label Nov 29, 2021
@mvgijssel

Is it a correct assessment that the repository execute action is never cached?

I was using rules_ruby both in CI and locally, and I was surprised that after every CI run (or bazel clean --expunge) Ruby was recompiled from source. I had assumed that --disk_cache and/or --repository_cache would cache the output of the Ruby compilation, but after some experimentation it looks like this is never the case.

If so, would it be an idea to add this to Bazel's remote caching capability as well?

@github-actions

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Jun 22, 2023
@AustinSchuh
Contributor

@bazelbuild/triage, I think this is still relevant.

@sgowroji sgowroji added not stale Issues or PRs that are inactive but not considered stale and removed stale Issues or PRs that are stale (no activity for 30 days) labels Jun 22, 2023