Clarify repository_cache documentation #8496

Open
keith opened this issue May 29, 2019 · 16 comments
Assignees
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Documentation Documentation improvements that cannot be directly linked to other team labels team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: documentation (cleanup)

Comments

@keith
Member

keith commented May 29, 2019

Currently in the user guide there is this sentence:

To do so, bazel caches all files downloaded in the repository cache which, by default, is located at ~/.cache/bazel/_bazel_$USER/cache/repos/v1/. The location can be changed by the --repository_cache option.

This implies to me that if you don't pass --repository_cache, the cache is enabled at this default path. Based on this logic:

RepositoryOptions repoOptions = env.getOptions().getOptions(RepositoryOptions.class);
if (repoOptions != null) {
  repositoryCache.setHardlink(repoOptions.useHardlinks);
  if (repoOptions.experimentalScaleTimeouts > 0.0) {
    skylarkRepositoryFunction.setTimeoutScaling(repoOptions.experimentalScaleTimeouts);
  } else {
    env.getReporter()
        .handle(
            Event.warn(
                "Ignoring request to scale timeouts for repositories by a non-positive"
                    + " factor"));
    skylarkRepositoryFunction.setTimeoutScaling(1.0);
  }
  if (repoOptions.experimentalRepositoryCache != null) {
    // A set but empty path indicates a request to disable the repository cache.
    if (!repoOptions.experimentalRepositoryCache.isEmpty()) {
      Path repositoryCachePath;
      if (repoOptions.experimentalRepositoryCache.isAbsolute()) {
        repositoryCachePath = filesystem.getPath(repoOptions.experimentalRepositoryCache);
      } else {
        repositoryCachePath =
            env.getBlazeWorkspace()
                .getWorkspace()
                .getRelative(repoOptions.experimentalRepositoryCache);
      }
      repositoryCache.setRepositoryCachePath(repositoryCachePath);
    }
  } else {
    Path repositoryCachePath =
        env.getDirectories()
            .getServerDirectories()
            .getOutputUserRoot()
            .getRelative(DEFAULT_CACHE_LOCATION);
    try {
      FileSystemUtils.createDirectoryAndParents(repositoryCachePath);
      repositoryCache.setRepositoryCachePath(repositoryCachePath);
    } catch (IOException e) {
      env.getReporter()
          .handle(
              Event.warn(
                  "Failed to set up cache at "
                      + repositoryCachePath.toString()
                      + ": "
                      + e.getMessage()));
    }
  }
it looks like this isn't the case, and that the cache is only enabled if some options are passed. Also, in my tests, not passing this flag does not result in this directory being created.
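For reference, the branch structure of the quoted excerpt can be paraphrased as a small shell function. This is only a sketch of one reading of that code, not Bazel's implementation: `resolve_repo_cache`, its arguments, and the `disabled` sentinel are hypothetical, the `cache/repos/v1` suffix is assumed from the paths quoted in this thread, and only Unix-style absolute paths are handled.

```shell
#!/bin/sh
# Sketch only: mirrors the branches of the quoted Java excerpt.
# resolve_repo_cache and the "disabled" sentinel are hypothetical names.
#
# Usage: resolve_repo_cache OUTPUT_USER_ROOT WORKSPACE [FLAG_VALUE]
#   - omit FLAG_VALUE to model a null option (flag never set)
#   - pass ""         to model a set-but-empty option
resolve_repo_cache() {
  output_user_root=$1
  workspace=$2

  if [ "$#" -lt 3 ]; then
    # Option is null: default location under the output user root
    # (suffix assumed to be cache/repos/v1, as in the paths in this thread).
    printf '%s\n' "$output_user_root/cache/repos/v1"
    return 0
  fi

  flag_value=$3
  case $flag_value in
    '') printf '%s\n' 'disabled' ;;              # set but empty: cache disabled
    /*) printf '%s\n' "$flag_value" ;;           # absolute path: used as given
    *)  printf '%s\n' "$workspace/$flag_value" ;; # relative: resolved against workspace
  esac
}
```

Under this reading, an unset flag falls through to the default-path branch rather than leaving the cache off; the path is only left unset when the flag is explicitly set to an empty value.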

What is the desired behavior here? Should we update the docs or update this logic to enable the cache by default? The latter sounds preferable to me, unless there is a known reason users wouldn't want this enabled by default.

@aiuto aiuto added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. untriaged labels May 31, 2019
@laurentlb laurentlb added P2 We'll consider working on this in future. (Assignee optional) type: documentation (cleanup) and removed untriaged labels Jul 29, 2019
@dhalperi
Contributor

This also bit me just now, for the exact reasons @keith reported. Any opposition to either updating the docs or changing the default?

@dhalperi
Contributor

On top of that, I can't figure out how to turn this on (at the default path) at all.

E.g., if I provide a no-op option like --noexperimental_repository_cache_hardlinks, that isn't enough for the code to kick in. The only thing I've found to work is to provide a --repository_cache myself, which defeats the purpose of having a nice default. (It also becomes something I can't put in a shared rc file, since $USER isn't allowed.)

Anyone have a better solution?
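One possible workaround for the shared-rc limitation is to build the flag in a wrapper script, where the shell expands the per-user part at call time. This is a minimal sketch, not an endorsed solution: `repo_cache_flag` and the `bazel-repo-cache` directory name are made up for illustration; only the `--repository_cache` flag itself comes from this thread.

```shell
#!/bin/sh
# Hypothetical helper: prints a --repository_cache flag rooted in the
# caller's home directory. The directory name is illustrative only.
repo_cache_flag() {
  # $HOME is expanded by the shell when the function runs, so nothing
  # user-specific has to appear in a checked-in rc file.
  printf '%s\n' "--repository_cache=${HOME}/.cache/bazel-repo-cache"
}

# Example invocation (not run here):
#   bazel build "$(repo_cache_flag)" //projects/bdd
```

The trade-off is that every developer has to invoke Bazel through the wrapper, which is exactly the kind of friction a working default would avoid.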

@dhalperi
Contributor

I can't find the author of the proposal on here, but cc: @aehlig who wrote much of the code.

@jin jin self-assigned this Mar 25, 2020
@jin jin added P1 I'll work on this now. (Assignee required) and removed P2 We'll consider working on this in future. (Assignee optional) labels Mar 25, 2020
@jin
Member

jin commented Mar 25, 2020

Looking.

@meisterT
Member

FWIW, I did the following and couldn't repro:

  • rm -rf ~/.cache/bazel
  • cd bazel; rm .bazelrc
  • bazel --bazelrc=/dev/null build //src:bazel

After this, I again see cached files in my local repository cache.

@dhalperi
Contributor

dhalperi commented Mar 25, 2020

@meisterT -- confirming everything, here's my repro:

Check out Batfish: https://github.com/batfish/batfish (but probably anything would work)

dan@dan batfish % bazel version
Bazelisk version: v1.3.0
Starting local Bazel server and connecting to it...
Build label: 2.2.0
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Mar 3 09:28:15 2020 (1583227695)
Build timestamp: 1583227695
Build timestamp as int: 1583227695
dan@dan batfish % bazel clean --expunge
INFO: Invocation ID: e105028e-51a8-4859-9e26-82a235336ca8
INFO: Starting clean.
dan@dan batfish % rm -rf ~/.cache
dan@dan batfish % bazel --ignore_all_rc_files build //projects/bdd
Starting local Bazel server and connecting to it...
INFO: Analyzed target //projects/bdd:bdd (21 packages loaded, 416 targets configured).
INFO: Found 1 target...
Target //projects/bdd:bdd up-to-date:
  bazel-bin/projects/bdd/libbdd.jar
INFO: Elapsed time: 22.235s, Critical Path: 6.58s
INFO: 5 processes: 4 darwin-sandbox, 1 worker.
INFO: Build completed successfully, 7 total actions
dan@dan batfish % ls -l ~/.cache
gls: cannot access '/Users/dan/.cache': No such file or directory

That jar depends on external dependencies from rules_jvm_external:

java_library(
    name = "bdd",
    srcs = glob([
        "src/main/java/**/*.java",
    ]),
    deps = [
        "@maven//:com_google_code_findbugs_jsr305",
    ],
)

@dhalperi
Contributor

dhalperi commented Mar 25, 2020

Repeating with only changing the --ignore_all_rc_files option:

dan@dan batfish % bazel clean --expunge                           
INFO: Invocation ID: bdca90eb-fcc0-4996-930c-d59946cebd73
INFO: Starting clean.
dan@dan batfish % rm -rf ~/.cache                                 
dan@dan batfish % bazel build //projects/bdd       
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 341cbf40-c0ee-40a8-a5e4-31f40c2e4be6
INFO: Analyzed target //projects/bdd:bdd (21 packages loaded, 416 targets configured).
INFO: Found 1 target...
Target //projects/bdd:bdd up-to-date:
  bazel-bin/projects/bdd/libbdd.jar
INFO: Elapsed time: 39.808s, Critical Path: 0.67s
INFO: 5 processes: 5 remote cache hit.
INFO: Build completed successfully, 7 total actions
dan@dan batfish % ls -l ~/.cache/bazel 
total 0
drwxr-xr-x 3 dan staff 96 Mar 25 12:51 _bazel_dan
dan@dan batfish % du -sh ~/.cache/bazel/_bazel_dan 
228M	/Users/dan/.cache/bazel/_bazel_dan

@jin
Member

jin commented Mar 26, 2020

The location of the repository cache on macOS is:

$ bazel info repository_cache
Starting local Bazel server and connecting to it...
/var/tmp/_bazel_jingwen/cache/repos/v1

This seems like a documentation issue rather than an implementation one.
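To make the platform difference concrete, here is a rough sketch of the two default locations mentioned in this thread. `default_repo_cache` and its arguments are hypothetical, the Linux branch simply restates the path from the user guide, and other platforms or environment overrides are deliberately not modelled. Running `bazel info repository_cache` remains the reliable way to find the real location.

```shell
#!/bin/sh
# Sketch only: default_repo_cache is a hypothetical helper that restates
# the default locations quoted in this thread.
#
# Usage: default_repo_cache OS USER HOME   (OS as printed by `uname -s`)
default_repo_cache() {
  os=$1
  user=$2
  home=$3

  case $os in
    Darwin) root="/var/tmp/_bazel_$user" ;;            # as reported by `bazel info` on macOS
    Linux)  root="$home/.cache/bazel/_bazel_$user" ;;  # as stated in the user guide
    *)      return 1 ;;                                # not covered by this sketch
  esac

  printf '%s\n' "$root/cache/repos/v1"
}

# Example (not run here):
#   default_repo_cache "$(uname -s)" "$USER" "$HOME"
```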

@dhalperi
Contributor

Interesting! Do you know why? Certainly, I'd prefer a default location that is persistent.

@meisterT
Member

It is set here:
https://github.com/bazelbuild/bazel/blob/master/src/main/cpp/blaze_util_darwin.cc#L99
(compare with https://github.com/bazelbuild/bazel/blob/master/src/main/cpp/blaze_util_linux.cc#L63).

It was changed in
54b21d4

There were two issues mentioned in the commit:

  1. Isn't created by anything, and
  2. Generates too long a path for the name of a Unix domain socket

The first is not a real blocker (if it's still true).
The second is no longer relevant since we are using Grpc these days.

I don't have a Mac, but perhaps a rollback of the above commit is already a good enough fix.

@jin
Member

jin commented Mar 27, 2020

@meisterT Would you consider changing the default output root (and hence the output base, execroot, etc.) a breaking change?

@meisterT
Member

Oh, that's a good question. On the one hand it's a bug fix, on the other hand there may be people relying on it. Cc @dslomov

A reasonable compromise would be to guard this change behind a flag, default the flag to true, and inform bazel-discuss.

@meisterT
Member

I wonder if this is just a documentation issue.

While the repository cache is in a different location than people think, it should be persisted across reboots IIUC (see https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s15.html). Since I don't have a Mac, someone else needs to confirm.

@dhalperi @keith when you say it bit you, is it possible that you were just surprised because it's in a different folder or did you actually see a cache miss where you expected a hit?

@dhalperi
Contributor

dhalperi commented Apr 3, 2020

I just rebooted as a test; it did persist.

How I was affected: I was looking for the directory to mount into Docker for reuse, and it wasn't there. Then I tried ten options to create it, and none worked.
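For the Docker case, a rough sketch of what I was trying to do: ask Bazel for the real cache location and mount that, instead of guessing the path from the docs. `docker_cache_args`, the container path `/repo-cache`, and `IMAGE` are placeholders I made up; this is untested against any particular image.

```shell
#!/bin/sh
# Sketch only: docker_cache_args is a hypothetical helper, and /repo-cache
# is an arbitrary mount point inside the container.
#
# Usage: docker_cache_args HOST_CACHE_DIR
docker_cache_args() {
  host_cache=$1
  printf '%s\n' "--volume=$host_cache:/repo-cache"
}

# Example (not run here; IMAGE is a placeholder):
#   docker run "$(docker_cache_args "$(bazel info repository_cache)")" IMAGE \
#     bazel build --repository_cache=/repo-cache //...
```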

I can't recall the exact runtime effects, but they made me think that there was no local repository cache. One plausible contributing factor is that you can't (or at least I can't; is there a way?) distinguish downloading from other work during the fetching stages. I usually look at network activity as a (crappy) indicator.

@philwo philwo added the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Jun 15, 2020
@sventiffe sventiffe added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Nov 9, 2020
@philwo philwo added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed P2 We'll consider working on this in future. (Assignee optional) labels Dec 8, 2020
@philwo philwo removed the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Nov 29, 2021
@ShreeM01 ShreeM01 added the team-Documentation Documentation improvements that cannot be directly linked to other team labels label Dec 5, 2022
@guyboltonking

guyboltonking commented Jan 24, 2023

I can confirm that the behaviour seen in this comment is what I see on macOS using Bazel 5.2.0, i.e. a default value is used on macOS, and the location is filled with cache entries:

❯ bzl info repository_cache
[...]
/var/tmp/_bazel_guy.boltonking/cache/repos/v1
❯ fd --type f . $(bzl info repository_cache)
[...]
/var/tmp/_bazel_guy.boltonking/cache/repos/v1/content_addressable/sha256/0052d452af7742c8f3a4e0929763388a66403de363775db7e90adecb2ba4944b/file
/var/tmp/_bazel_guy.boltonking/cache/repos/v1/content_addressable/sha256/00b0bef5b7f9e0df16536d3961cfb6e84331c065b4066afb39768d0e319411f7/file
/var/tmp/_bazel_guy.boltonking/cache/repos/v1/content_addressable/sha256/0161cfe9544b3656ed0de67d8937828101859e94bcd0caaf58d21ac7011eabd4/file
[...]
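The entries above follow a content-addressable layout, with one directory per SHA-256 digest. As a sketch under that assumption, the expected entry for a downloaded file could be computed like this; `sha256_of` and `cas_entry_path` are hypothetical helpers, not part of Bazel.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and assume the layout shown in
# the listing above: <cache>/content_addressable/sha256/<digest>/file

# Print the SHA-256 digest of a file, using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Usage: cas_entry_path CACHE_ROOT FILE
cas_entry_path() {
  cache_root=$1
  file=$2
  printf '%s\n' "$cache_root/content_addressable/sha256/$(sha256_of "$file")/file"
}

# Example (not run here; archive.tar.gz is a placeholder):
#   ls -l "$(cas_entry_path "$(bazel info repository_cache)" archive.tar.gz)"
```

Checking whether that path exists is one way to tell whether a given download was served from (or stored in) the local cache.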

Could this issue be closed? I don't think there is a documentation issue (aside from the docs using a Linux-specific location for the default). In its current form, the issue creates confusion: it suggests that the --repository_cache flag must be used to enable the repository cache, which I do not think is true, since the cache is enabled by default.


Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Mar 30, 2024
10 participants