Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resource: try to download from fast mirrors before resource URLs #13868

Closed
wants to merge 3 commits into from

Conversation

XuehaiPan
Copy link
Contributor

@XuehaiPan XuehaiPan commented Sep 14, 2022

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

Resolves Homebrew/glibc-bootstrap#6

Support fast_mirror semantic in resource blocks. For example:

resource "bootstrap-gcc" do
  url "https://github.com/Homebrew/glibc-bootstrap/releases/download/1.0.0/bootstrap-gcc-9.5.0.tar.gz"
  fast_mirror "${HOMEBREW_BOTTLE_DOMAIN}/homebrew/glibc-bootstrap/1.0.0/bootstrap-gcc-9.5.0.tar.gz"
  fast_mirror "${HOMEBREW_ARTIFACT_DOMAIN}/Homebrew/glibc-bootstrap/releases/download/1.0.0/bootstrap-gcc-9.5.0.tar.gz"
  fast_mirror "https://<hardcoded url>"
  mirror "https:/<hardcoded url>"
  sha256 "d549cf096864de5da77b4f068fab3741636206f3b7ace593b46a226d726f4538"
end

The word "fast“ means we should try it before url. For fast mirrors, we will:

  1. Substitute the ${HOMEBREW_BOTTLE_DOMAIN} and ${HOMEBREW_ARTIFACT_DOMAIN} in the URL to the corresponding environment variables (curly brackets are optional). If the environment variable is not set or set to the default value, the fast mirror will be ignored.
  2. The downloader will try to download from the fast mirror if any is present. The original url will treat as a fallback mirror. The priority of the URLs:
    1. fast_mirrors if any
    2. url
    3. mirrors if any
  3. Only CurlDownloadStrategy will use fast_mirrors.

resource.fast_mirror("bar")
resource.fast_mirror("baz")
expect(resource.url).to eq("foo")
expect(resource.downloader.mirrors).to eq(%w[baz foo])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attribute downloader.url is private so the corresponding test has not been added.

Copy link
Member

@carlocab carlocab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems too niche to add a DSL for.

I suggest finding a way to handle this appropriately when HOMEBREW_*_DOMAIN variables are set instead without requiring any changes from the formula.

@XuehaiPan
Copy link
Contributor Author

XuehaiPan commented Sep 15, 2022

This seems too niche to add a DSL for.

@carlocab No, fast_mirror does not require the URL to contain ${HOMEBREW_*_DOMAIN} variables. For example:

resource "some-resource-on-slow-ftp-server" do
  url "ftp://the-official-ftp.com/resource-tag.tar.gz"
  fast_mirror "https://github.com/the-official-mirror/release/tag/resource-tag.tar.gz"
  sha256 "..."
end

resource "pypi-resouce" do
  url "https://files.pythonhosted.org/packages/xx/xx/xxx...xxx/package-tag.tar.gz"
  fast_mirror "https://github.com/author/package/release/tag/package-tag.tar.gz"
  sha256 "..."
end

We will try to download from the fast_mirror prior to url (e.g., from GitHub mirror rather than the slow FTP server).

I suggest finding a way to handle this appropriately when HOMEBREW_*_DOMAIN variables are set instead

By now, we only have Homebrew/glibc-bootstrap, we can handle it by matching the URL strings.

However, this seems invisible to the user/maintainer of third-party mirror sites (documentation required). Also, if we vendor something in a new repo https://github.com/Homebrew/xxx in the future, similar issues like Homebrew/glibc-bootstrap#6 will raise again. We need to handle it in Homebrew/brew again, that's extra work.

Also, explicitly having ${HOMEBREW_*_DOMAIN} in the URL, the maintainers of third-party mirror sites instantly know how to mirror it with an appropriate path. No more documentation is needed. Otherwise, the maintainer should dive deeply into Homebrew/brew's source code or ask for documentation of a mirroring guideline.

In addition to the PyPI example above, having fast_mirror makes URL mapping easier. For example, we can pass the environment variable PIP_EXTRA_INDEX_URL to brew (maybe another PR after this one), then do:

class Resouce
  def url(val = nil, **specs)
    return @url if val.nil?

    specs = specs.dup
    # Don't allow this to be set.
    specs.delete(:insecure)

    @url = val
    @using = specs.delete(:using)
    @download_strategy = DownloadStrategyDetector.detect(url, using)
    @specs.merge!(specs)
    @downloader = nil
    @version = detect_version(@version)
+
+   return unless val.start_with?("https://files.pythonhosted.org/packages")
+   return unless Homebrew::EnvConfig.pip_extra_index_url.present?
+
+   filename = val.partition("/").last
+   Homebrew::EnvConfig.pip_extra_index_url.split(",").each do |extra_index_url|
+     fast_mirror "#{extra_index_url.chomp("/")}/#{name}/#{filename}"
+   end
  end

  def fast_mirror(val)
    fast_mirrors << val
  end
end

in Homebrew/brew. brew will download from PIP_EXTRA_INDEX_URL prior to url.

without requiring any changes from the formula.

I don't think it is a big problem, because it does even not requires a revision bump. We do not need to rebottle things. And only the glibc formula is required so far. We can update the formula after the changes in this PR are released/tagged. berw update always updates itself first before fetching the formula changes.

@carlocab
Copy link
Member

No, fast_mirror does not require the URL to contain ${HOMEBREW_*_DOMAIN} variables.

I understand, but I don't think we're interested in adding fast_mirror to the formula DSL.

If there is a static URL that is reproducibly faster than the existing one declared by the resource/formula from multiple different locations, then we should consider using that as the url and move the existing one to a mirror. I'd want a way to verify that the original source has the same checksum even when used as a mirror, though.

I don't think it is a big problem, because it does even not requires a revision bump. We do not need to rebottle things. We can update the formula after the changes in this PR are released/tagged. berw update always updates itself first before fetching the formula changes.

I disagree. I think adding a bunch of extra code to multiple formulae when it's unclear how many and by how much users will benefit, is a gigantic problem.

@MikeMcQuaid
Copy link
Member

This seems too niche to add a DSL for.

Agreed.

I suggest finding a way to handle this appropriately when HOMEBREW_*_DOMAIN variables are set instead without requiring any changes from the formula.

Agreed.


My suggestion would be just to special-case the https://github.com/Homebrew/glibc-bootstrap/releases.

Alternatively, would I fix things if these were just uploaded to GitHub Packages instead/as well? If so, we could consider just doing that.

@XuehaiPan
Copy link
Contributor Author

Alternatively, would I fix things if these were just uploaded to GitHub Packages instead/as well? If so, we could consider just doing that.

I think uploading the bootstrap packages to GitHub Packages and adding them to https://formulae.brew.sh/api/formula.json is the best solution. No changes are required in Homebrew/brew. We Only need to update the URLs in glibc.rb.

@MikeMcQuaid
Copy link
Member

MikeMcQuaid commented Sep 15, 2022

I think uploading the bootstrap packages to GitHub Packages

We can do this.

adding them to https://formulae.brew.sh/api/formula.json is the best solution.

We won't do this.


How are you handling e.g. portable-ruby:

"https://ghcr.io/v2/homebrew/portable-ruby/portable-ruby/blobs/sha256:${ruby_SHA}"
"https://github.com/Homebrew/homebrew-portable-ruby/releases/download/2.6.8_1/${ruby_FILENAME}"

@XuehaiPan XuehaiPan deleted the resource-fast-mirrors branch September 18, 2022 12:16
@github-actions github-actions bot added the outdated PR was locked due to age label Oct 19, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Third-party mirroring support
3 participants