go_repository: git repositories should be stored in the cache #549
Comments
'gazelle fix' and 'gazelle update' now accept -repo_config, the path to a file from which information about repositories can be loaded. By default, this is WORKSPACE in the repository root directory. 'gazelle fix' and 'gazelle update-repos' still update the WORKSPACE file in the repository root directory when this flag is set.
go_repository passes the path to @//:WORKSPACE to -repo_config. go_repository resolves @//:WORKSPACE and any files mentioned in '# gazelle:repository_macro' directives. When these files change, all go_repository rules will be invalidated. It should not be necessary to download cached repositories again (except vcs repositories; see #549). On a MacBook Pro, it takes about 22.5s to re-evaluate 70 cached, invalidated go_repository rules for github.com/gohugoio/hugo. If this becomes a problem for large projects, we can provide a way to disable or limit this behavior in the future.
go_repository_tools and go_repository_cache are moved to their own .bzl files. Changes in go_repository.bzl should not invalidate these in the future.
Fixes #529
There is also a different approach used here: bazelbuild/bazel#7424 and some relevant discussion here: https://groups.google.com/forum/#!searchin/bazel-dev/buchgr%7Csort:date/bazel-dev/7N_6-RbqBf4/bOggKYkUBgAJ |
Nice. I hope that gets merged at some point. I filed bazelbuild/bazel#5086, but I've given up hope of it ever being implemented. |
I'm finding that go_repository rules are being fetched when I'm not expecting it. For example, if I make a trivial change to the WORKSPACE file, such as inserting whitespace at the end of the file, the go_repository is fetched again. Additionally, I'm finding the same repository being downloaded multiple times in a row, extending fetch times. This is most evident when downloading in urls mode without a sha256, although, judging by the duration of the fetch, I believe it also happens with a sha256 and in git mode. Are these all symptoms of this issue, or should I open another issue? |
@mariusgrigoriu Sorry for the delay. Was at GopherCon last week.
This may be working as intended. Gazelle, as run by
Use a sha256. In HTTP mode, HTTP downloads are cached by Bazel. The cache key is the expected sha256, which means downloads without sha256 are not cached. It's also important to do this to ensure builds are authentic and reproducible. VCS downloads are currently not cached, which is this issue. I'd strongly encourage use of module mode instead though. |
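To illustrate why a missing sha256 defeats caching, here is a minimal Python sketch of a download cache keyed by the expected digest. It is not Bazel's implementation; `fetch`, `cache_dir` and the `download` callback are hypothetical names chosen for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def fetch(url: str, sha256: Optional[str], cache_dir: Path,
          download: Callable[[str], bytes]) -> bytes:
    """Return the bytes for url, consulting a cache keyed by expected sha256.

    Hypothetical helper: `download` stands in for a real HTTP client.
    """
    if sha256 is None:
        # No expected digest means no cache key, so every call downloads.
        return download(url)

    entry = cache_dir / sha256
    if entry.exists():
        # Cache hit: the digest alone identifies the content.
        return entry.read_bytes()

    data = download(url)
    actual = hashlib.sha256(data).hexdigest()
    if actual != sha256:
        raise ValueError(f"checksum mismatch for {url}: got {actual}")
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_bytes(data)
    return data
```

With a digest, the second call for the same archive is served from disk; without one, each call goes back to the network, which matches the repeated downloads described above.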
Using modules sounds good when they work. We're importing Terraform and parts of k8s, neither of which seem to play nicely with module mode. Since k8s already uses Bazel, I think switching to |
Be aware that the main Kubernetes repo, |
Understood. This is all because we're consuming e2e tests. Not sure we can do much until kubernetes/kubernetes#74352 moves the e2e framework into staging. (Apologies for hijacking this thread.) |
while not a fix, most go dependencies we use are hosted on github which supplies tar archives over http given a sha. we wrote this as a wrapper to convert existing doesn't work with the more recent go mod stuff though |
@rickypai Archives served by GitHub do not have stable SHA-256 sums. They haven't changed in a couple years, but it's broken us in the past. Use at your own risk. |
On the NixOS side, we have been using GitHub's archives for a few years now with no issues with regard to sha256 stability:
|
@kalbasit I don't think they've changed anything since fall of 2017. However, I spoke with GitHub support ~6 months ago. They're aware of the issue, but they confirmed archives returned from those endpoints are not guaranteed to have stable hashes. These breaks were really painful. If the hashes change, it retroactively breaks deterministic builds that depend on the old hashes. At the time, it broke every version of rules_go, since we were using |
What do you think about introducing a flag to the |
@kalbasit Not sure I follow. |
We experienced several changing sha256 hashes over the last few months for a subset of archives. In one case, a hash change was only experienced by some people on the team, and it eventually reverted. |
What versions of Gazelle, rules_go and Go should make this work? I'm currently using Bazel 0.28.0, rules_go 0.20.2, Gazelle 0.19.0 and Go 1.12.9, and I'm still seeing it clone with Git. An archive from GOPROXY would definitely help. Does it go in the cache as described in https://bazel.build/designs/2016/09/30/repository-cache.html? |
@kalbasit That's a new enough version of Gazelle. Make sure none of your
Module zips don't get stored in Bazel's cache, but they do get stored in a separate cache within an internal repository. It's a bit of a hack, but it means they don't need to be downloaded whenever WORKSPACE changes. Bazel can't cache them for the same reason as the GitHub archives: module zips don't promise SHA-256 stability. The sums are hashes of the contents, not of the zip files themselves. |
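The difference between hashing a zip file and hashing its contents can be shown with a short Python sketch. This is only an illustration: `content_hash` is a simplified scheme invented for the example and is not Go's actual module sum algorithm, and the file names are made up.

```python
import hashlib
import io
import zipfile


def make_zip(files: dict, date_time: tuple) -> bytes:
    """Build an in-memory zip whose entries all carry the given timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            zf.writestr(zipfile.ZipInfo(name, date_time=date_time), files[name])
    return buf.getvalue()


def file_hash(zip_bytes: bytes) -> str:
    """SHA-256 of the archive bytes, metadata included."""
    return hashlib.sha256(zip_bytes).hexdigest()


def content_hash(zip_bytes: bytes) -> str:
    """SHA-256 over sorted (file digest, file name) lines, ignoring zip metadata."""
    summary = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            digest = hashlib.sha256(zf.read(name)).hexdigest()
            summary.update(f"{digest}  {name}\n".encode())
    return summary.hexdigest()


files = {"go.mod": b"module example.com/m\n", "m.go": b"package m\n"}
a = make_zip(files, (2019, 1, 1, 0, 0, 0))
b = make_zip(files, (2020, 6, 1, 0, 0, 0))
print(file_hash(a) == file_hash(b))        # False: timestamps differ
print(content_hash(a) == content_hash(b))  # True: same files, same bytes
```

A cache keyed on the archive's own SHA-256 would treat `a` and `b` as different downloads even though they unpack to identical trees, which is why a content-based sum cannot serve as that cache key.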
They are using Here's my |
You can run There are some |
If a go_repository rule is invalidated but @go_repository_cache is not, we shouldn't need to clone a Git repository again. The first time we fetch a repository, we could clone it and store a zip file in the cache. When go_repository is invalidated, we would just need to extract the cached zip.
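The proposal above could be sketched roughly as follows in Python. This is not how go_repository is implemented; `materialize`, the cache key format, and the `clone` callback (a stand-in for a real `git clone`) are all assumptions made for illustration.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def materialize(key: str, dest: Path, cache_dir: Path,
                clone: Callable[[Path], None]) -> None:
    """Populate dest from a cached zip, cloning only on a cache miss.

    key is a hypothetical cache key, e.g. derived from remote and commit.
    clone(directory) stands in for a real `git clone` into that directory.
    """
    archive = cache_dir / (key + ".zip")
    if not archive.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        with tempfile.TemporaryDirectory() as tmp:
            checkout = Path(tmp) / "checkout"
            checkout.mkdir()
            clone(checkout)
            # make_archive appends ".zip" to the base name itself.
            shutil.make_archive(str(cache_dir / key), "zip", root_dir=str(checkout))
    # Invalidation only costs an extraction, not another clone.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```

Calling `materialize` twice with the same key and different destinations clones once; the second call only unpacks the cached zip.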