Want a way to re-host dependencies easily #6342

Open
AustinSchuh opened this issue Oct 9, 2018 · 13 comments
Labels
P3 (We're not considering working on this, but happy to review a PR; no assignee), team-ExternalDeps (External dependency handling, remote repositories, WORKSPACE file), type: feature request

Comments


AustinSchuh commented Oct 9, 2018

For reproducibility and control of the lifecycle of our artifacts, we need a way to re-host all our dependencies. Most rules (See https://github.com/bazelbuild/rules_go/blob/master/go/private/repositories.bzl for example) fetch dependencies from external domains.

@philwo thought this wasn't crazy.

I'd like to be able to rewrite https://codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b to http://build-deps.peloton-tech.com/codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b for example in Bazel. The same needs to work for git repositories and any other URLs.
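
For concreteness, the workaround today is to hand-edit every repository rule so its `urls` point at the re-hosted copy. A sketch (the repository name and sha256 are placeholders):

```python
# Hand-edited WORKSPACE entry pointing at the re-hosted copy of the archive.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "org_golang_x_tools",  # placeholder name
    sha256 = "<sha256 of the zip>",  # placeholder
    urls = ["http://build-deps.peloton-tech.com/codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b"],
)
```

What I want is a way to get the same effect without editing every rule by hand.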

@irengrig added the team-ExternalDeps (External dependency handling, remote repositories, WORKSPACE file), untriaged, and type: feature request labels Oct 11, 2018
@dslomov added the P3 label (We're not considering working on this, but happy to review a PR; no assignee) and removed the untriaged label Dec 3, 2018

dslomov commented Dec 3, 2018

What would be the design for this?


philwo commented Dec 3, 2018

What about bazel build --repository_mirror=https://mirror.bazel.build, which would automatically rewrite URLs the way Austin suggested: https://example.com/download.zip to https://mirror.bazel.build/example.com/download.zip (basically just prefix them)?
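
The rewrite itself would be nothing more than prefixing. A sketch of that hypothetical behavior (--repository_mirror is not an existing Bazel flag):

```python
# Hypothetical prefixing rule for a --repository_mirror flag (not an existing
# Bazel option): https://example.com/download.zip
#   -> https://mirror.bazel.build/example.com/download.zip
from urllib.parse import urlparse

def mirror_url(url, mirror="https://mirror.bazel.build"):
    parsed = urlparse(url)
    return "{}/{}{}".format(mirror.rstrip("/"), parsed.netloc, parsed.path)
```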

Open questions:

  • How do the files get to the mirror in the first place? Do we need a little external tool that can do this or should it be part of Bazel? (A possible shape for such a tool is sketched after this list.)
  • Should Bazel add the mirror URL as a preferred source, but still use the original one when the mirror gives 404? Or should it replace all URLs? (I fear the answer is: "Let's add a flag for this!" :))
  • How does this interact with @buchgr's idea to use the already existing Remote Cache infrastructure also as a repository cache? Bazel could then first check the cache for a hit and, if it gets a cache miss, just download from the original URL and upload to the remote cache. That'd be quite elegant, but wouldn't solve the problem where people just want to use an old-school, manually maintained mirror server. But maybe it's simple enough to set up that, if we have that, we no longer need the above-mentioned URL-rewriting feature?
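
On the first question, the upload side could be a small external tool that preserves the original host and path, so that prefixed URLs resolve on the mirror. A rough sketch, assuming the mirror just serves a directory tree (none of this is an existing Bazel feature; the mirror root is made up):

```python
# Hypothetical mirror-populating helper: copy an upstream archive into a
# path-preserving tree so <mirror>/<host>/<path> serves the same bytes.
import hashlib
import pathlib
import urllib.parse
import urllib.request

def mirror_one(url, mirror_root="/srv/mirror"):
    parsed = urllib.parse.urlparse(url)
    dest = pathlib.Path(mirror_root) / parsed.netloc / parsed.path.lstrip("/")
    dest.parent.mkdir(parents=True, exist_ok=True)
    data = urllib.request.urlopen(url).read()
    dest.write_bytes(data)
    # Print the sha256 so the WORKSPACE entry can pin the mirrored artifact.
    print(dest, hashlib.sha256(data).hexdigest())

mirror_one("https://example.com/download.zip")
```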


buchgr commented Dec 3, 2018

If it's just about hosting publicly available files somewhere closer to home, then I think the solution for this problem is to implement support for using the remote cache as a repository cache. Both are content-addressable, and running a remote cache is no harder than running your own caching mirror.

However, in general it seems to me that the proper solution for this is to have a --repository_proxy=PROXY flag, just like we have a --remote_proxy flag for remote caching / execution. Bazel wouldn't rewrite the URLs but would properly proxy them through PROXY. This is a more generic solution than --repository_mirror and solves all kinds of additional problems that will pop up eventually (just like they did for remote caching), such as authentication, name resolution / service discovery, load balancing, etc.

AustinSchuh commented

@buchgr, my requirement is that I need to be able to go back in time and fully re-create an artifact in 5 years. That means that I need to properly track all the dependencies that are downloaded and make sure they are going to be available on my timeline, not someone else's. Our current solution is to edit every dependency we add to change the URLs to point to a server under our control, and block outside access for the CI machines, which is a pain and not sustainable.

For cache locality, we've set up an NGINX proxy next to the build machines, which DNS resolves to instead of our dependency server. There are enough knobs today to make that all work.

I tried HTTP_PROXY before and that also proxies the metadata requests on GCP, which breaks remote builds. A --repository_proxy would solve that.

@philwo, y'all might have a different desired level of polish, but from my point of view:

1. Happy to do this by hand.
2. Whichever is easiest. I have other ways of blocking outside access. Flags are cool, but a minimum viable feature is fine.
3. It should definitely get uploaded to the cache, but the cache will drop the dependency at some point. For final production builds, I'm also required to build locally without a cache. :(


aiuto commented Dec 5, 2018

+1 to @AustinSchuh's comment about going back in time. It is an absolute requirement for many organizations to check downloaded dependencies in to their source tree - even if they do not vendor them. Virtually every company building products with long life cycles (e.g. embedded systems, flight control software, factory automation) checks in the compilers and entire build tool chain so that they can patch very old releases of their products. 10-15 year life cycles are not uncommon.


buchgr commented Dec 13, 2018

@AustinSchuh

> I tried HTTP_PROXY before and that also proxies the metadata requests on GCP, which breaks remote builds. A --repository_proxy would solve that.

I believe this problem is typically solved by setting NO_PROXY, but having --repository_proxy seems good to me, while the --repository_mirror functionality doesn't. So it looks like we are on the same page?


ob commented Mar 4, 2019

> Our current solution is to edit every dependency we add to change the URLs to point to a server under our control, and block outside access for the CI machines, which is a pain and not sustainable.

We do the same... lots of hackery. I agree that repository dependencies should be added to the cache, but this is not sufficient.

A --repository_proxy flag seems like it could solve the issue by letting us put the logic behind a service.

AustinSchuh commented

@philsc FYI, this was the ticket I was referencing.

@philwo added the team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website) label Jun 15, 2020

sitaktif commented Apr 6, 2021

I think this feature was addressed by the downloader rewrite feature: #12170
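
For reference, a minimal rewrite config of the kind that feature reads (passed via --experimental_downloader_config); the mirror host below is the one from the original request, and the exact directive syntax should be checked against the current Bazel docs:

```
# Route GitHub codeload downloads through the internal mirror; URLs that no
# rewrite matches are fetched from their original location.
rewrite codeload.github.com/(.*) build-deps.peloton-tech.com/codeload.github.com/$1
```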

@philwo removed the team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website) label Nov 29, 2021
github-actions (bot) commented

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions bot added the stale label (Issues or PRs that are stale; no activity for 30 days) May 24, 2023
AustinSchuh commented

I'll claim this is still open.

The downloader rewrite feature helps a ton, but it doesn't work for Python packages or npm packages.

@github-actions bot removed the stale label (Issues or PRs that are stale; no activity for 30 days) May 29, 2023

matts1 commented Jun 6, 2023

Why doesn't it work for Python packages? I might be wrong, but I thought they just used http_archive.


philsc commented Jun 6, 2023

rules_python currently uses pip to download the packages. I am hoping to get started on a version of the rules that uses http_archive instead.
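
To illustrate the direction (everything below is a hypothetical example, not current rules_python behavior): a wheel is just a zip archive, so it can be fetched with http_archive and therefore re-hosted or rewritten like any other dependency; names, URL, and sha256 are placeholders.

```python
# Hypothetical WORKSPACE sketch: fetch a wheel with http_archive so the
# downloader rewrites / mirroring apply to it like any other dependency.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "pypi_six",  # placeholder name
    sha256 = "<sha256 of the wheel>",  # placeholder
    type = "zip",  # a .whl file is a zip archive
    urls = ["https://files.pythonhosted.org/packages/.../six-1.16.0-py2.py3-none-any.whl"],  # placeholder URL
    build_file_content = """
py_library(
    name = "six",
    srcs = glob(["**/*.py"]),
    visibility = ["//visibility:public"],
)
""",
)
```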
