Want a way to re-host dependencies easily #6342

Open
AustinSchuh opened this issue Oct 9, 2018 · 13 comments
Labels
P3 (We're not considering working on this, but happy to review a PR; no assignee), team-ExternalDeps (External dependency handling, remote repositories, WORKSPACE file), type: feature request

Comments


AustinSchuh commented Oct 9, 2018

For reproducibility and control of the lifecycle of our artifacts, we need a way to re-host all our dependencies. Most rules (See https://github.com/bazelbuild/rules_go/blob/master/go/private/repositories.bzl for example) fetch dependencies from external domains.

@philwo thought this wasn't crazy.

I'd like to be able to rewrite https://codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b to http://build-deps.peloton-tech.com/codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b for example in Bazel. The same needs to work for git repositories and any other URLs.
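
For concreteness, the workaround today is to hand-edit every repository rule so its `urls` point at the re-hosted copy. A sketch (the repository name and sha256 are placeholders):

```python
# Hand-edited WORKSPACE entry pointing at the re-hosted copy of the archive.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "org_golang_x_tools",  # placeholder name
    sha256 = "<sha256 of the zip>",  # placeholder
    urls = ["http://build-deps.peloton-tech.com/codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b"],
)
```

What I want is a way to get the same effect without editing every rule by hand.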

@irengrig added the team-ExternalDeps (External dependency handling, remote repositories, WORKSPACE file), untriaged, and type: feature request labels Oct 11, 2018
@dslomov added the P3 label (We're not considering working on this, but happy to review a PR; no assignee) and removed the untriaged label Dec 3, 2018

dslomov commented Dec 3, 2018

What would be the design for this?


philwo commented Dec 3, 2018

What about bazel build --repository_mirror=https://mirror.bazel.build, which would automatically rewrite URLs the way Austin suggested: https://example.com/download.zip to https://mirror.bazel.build/example.com/download.zip (basically just prefix them)?
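
The rewrite itself would be nothing more than prefixing. A sketch of that hypothetical behavior (--repository_mirror is not an existing Bazel flag):

```python
# Hypothetical prefixing rule for a --repository_mirror flag (not an existing
# Bazel option): https://example.com/download.zip
#   -> https://mirror.bazel.build/example.com/download.zip
from urllib.parse import urlparse

def mirror_url(url, mirror="https://mirror.bazel.build"):
    parsed = urlparse(url)
    return "{}/{}{}".format(mirror.rstrip("/"), parsed.netloc, parsed.path)
```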

Open questions:

  • How do the files get to the mirror in the first place? Do we need a little external tool that can do this or should it be part of Bazel? (A possible shape for such a tool is sketched after this list.)
  • Should Bazel add the mirror URL as a preferred source, but still use the original one when the mirror gives 404? Or should it replace all URLs? (I fear the answer is: "Let's add a flag for this!" :))
  • How does this interact with @buchgr's idea to use the already existing Remote Cache infrastructure also as a repository cache? Bazel could then first check the cache for a hit and, if it gets a cache miss, just download from the original URL and upload to the remote cache. That'd be quite elegant, but wouldn't solve the problem where people just want to use an old-school, manually maintained mirror server. But maybe it's simple enough to set up that, if we have that, we no longer need the above-mentioned URL-rewriting feature?
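
On the first question, the upload side could be a small external tool that preserves the original host and path, so that prefixed URLs resolve on the mirror. A rough sketch, assuming the mirror just serves a directory tree (none of this is an existing Bazel feature; the mirror root is made up):

```python
# Hypothetical mirror-populating helper: copy an upstream archive into a
# path-preserving tree so <mirror>/<host>/<path> serves the same bytes.
import hashlib
import pathlib
import urllib.parse
import urllib.request

def mirror_one(url, mirror_root="/srv/mirror"):
    parsed = urllib.parse.urlparse(url)
    dest = pathlib.Path(mirror_root) / parsed.netloc / parsed.path.lstrip("/")
    dest.parent.mkdir(parents=True, exist_ok=True)
    data = urllib.request.urlopen(url).read()
    dest.write_bytes(data)
    # Print the sha256 so the WORKSPACE entry can pin the mirrored artifact.
    print(dest, hashlib.sha256(data).hexdigest())

mirror_one("https://example.com/download.zip")
```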


buchgr commented Dec 3, 2018

If it's just about hosting publicly available files somewhere closer to home, then I think the solution for this problem is to implement support for using the remote cache as a repository cache. Both are content-addressable, and running a remote cache is no harder than running your own caching mirror.

However, in general it seems to me that the proper solution for this is to have a --repository_proxy=PROXY flag, just like we have a --remote_proxy flag for remote caching / execution. Bazel wouldn't rewrite the URLs but would properly proxy them through PROXY. This is a more generic solution than --repository_mirror and solves all kinds of additional problems that will pop up eventually (just like they did for remote caching), such as authentication, name resolution / service discovery, load balancing, etc.

AustinSchuh commented

@buchgr, my requirement is that I need to be able to go back in time and fully re-create an artifact in 5 years. That means that I need to properly track all the dependencies that are downloaded and make sure they are going to be available on my timeline, not someone else's. Our current solution is to edit every dependency we add to change the URLs to point to a server under our control, and block outside access for the CI machines, which is a pain and not sustainable.

For cache locality, we've set up an NGINX proxy next to the build machines, which DNS resolves to instead of our dependency server. There are enough knobs today to make that all work.

I tried HTTP_PROXY before and that also proxies the metadata requests on GCP, which breaks remote builds. A --repository_proxy would solve that.

@philwo, y'all might have a different desired level of polish, but from my point of view:

1. Happy to do this by hand.
2. Whichever is easiest. I have other ways of blocking outside access. Flags are cool, but a minimum viable feature is fine.
3. It should definitely get uploaded to the cache, but the cache will drop the dependency at some point. For final production builds, I'm also required to build locally without a cache. :(


aiuto commented Dec 5, 2018

+1 to @AustinSchuh's comment about going back in time. It is an absolute requirement for many organizations to check downloaded dependencies in to their source tree - even if they do not vendor them. Virtually every company building products with long life cycles (e.g. embedded systems, flight control software, factory automation) checks in the compilers and entire build tool chain so that they can patch very old releases of their products. 10-15 year life cycles are not uncommon.


buchgr commented Dec 13, 2018

@AustinSchuh

> I tried HTTP_PROXY before and that also proxies the metadata requests on GCP, which breaks remote builds. A --repository_proxy would solve that.

I believe this problem is typically solved by setting NO_PROXY, but having --repository_proxy seems good to me, while the --repository_mirror functionality doesn't. So it looks like we are on the same page?


ob commented Mar 4, 2019

> Our current solution is to edit every dependency we add to change the URLs to point to a server under our control, and block outside access for the CI machines, which is a pain and not sustainable.

We do the same... lots of hackery. I agree that repository dependencies should be added to the cache, but this is not sufficient.

A --repository_proxy flag seems like it could solve the issue by letting us put the logic behind a service.

AustinSchuh commented

@philsc FYI, this was the ticket I was referencing.

@philwo added the team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website) label Jun 15, 2020

sitaktif commented Apr 6, 2021

I think this feature was addressed by the downloader rewrite feature: #12170
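
For reference, a minimal rewrite config of the kind that feature reads (passed via --experimental_downloader_config); the mirror host below is the one from the original request, and the exact directive syntax should be checked against the current Bazel docs:

```
# Route GitHub codeload downloads through the internal mirror; URLs that no
# rewrite matches are fetched from their original location.
rewrite codeload.github.com/(.*) build-deps.peloton-tech.com/codeload.github.com/$1
```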

@philwo removed the team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website) label Nov 29, 2021
github-actions (bot) commented

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions bot added the stale label (Issues or PRs that are stale; no activity for 30 days) May 24, 2023
AustinSchuh commented

I'll claim this is still open.

The downloader rewrite feature helps a ton, but it doesn't work for Python packages or npm packages.

@github-actions bot removed the stale label (Issues or PRs that are stale; no activity for 30 days) May 29, 2023

matts1 commented Jun 6, 2023

Why doesn't it work for Python packages? I might be wrong, but I thought they just used http_archive.


philsc commented Jun 6, 2023

rules_python currently uses pip to download the packages. I am hoping to get started on a version of the rules that uses http_archive instead.
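
To illustrate the direction (everything below is a hypothetical example, not current rules_python behavior): a wheel is just a zip archive, so it can be fetched with http_archive and therefore re-hosted or rewritten like any other dependency; names, URL, and sha256 are placeholders.

```python
# Hypothetical WORKSPACE sketch: fetch a wheel with http_archive so the
# downloader rewrites / mirroring apply to it like any other dependency.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "pypi_six",  # placeholder name
    sha256 = "<sha256 of the wheel>",  # placeholder
    type = "zip",  # a .whl file is a zip archive
    urls = ["https://files.pythonhosted.org/packages/.../six-1.16.0-py2.py3-none-any.whl"],  # placeholder URL
    build_file_content = """
py_library(
    name = "six",
    srcs = glob(["**/*.py"]),
    visibility = ["//visibility:public"],
)
""",
)
```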
