Cannot download package possibly because of urllib defaults #218

Closed
colindean opened this issue Nov 29, 2021 · 8 comments · Fixed by #219

Comments

@colindean
Contributor

I'm trying to import fontawesome to a project using Peru 1.3.0.

imports:
  fontawesome: deps/font-awesome

curl module fontawesome:
  url: https://use.fontawesome.com/releases/v5.15.4/fontawesome-free-5.15.4-web.zip
  unpack: zip

But it errors:

$ peru sync
In target "fontawesome":
  Error fetching https://use.fontawesome.com/releases/v5.15.4/fontawesome-free-5.15.4-web.zip
  HTTP Error 403: Forbidden

I think it's a urllib problem:

from urllib.error import HTTPError
import urllib.request

import urllib3  # third-party package, installed separately

url = "https://use.fontawesome.com/releases/v5.15.4/fontawesome-free-5.15.4-web.zip"

# urllib: this request fails with HTTP 403 and prints the response headers.
try:
    resp = urllib.request.urlopen(url)
except HTTPError as e:
    print(e.headers)

# urllib3: the same request succeeds.
http = urllib3.PoolManager()
resp = http.request("GET", url)
print(resp.status)

The urllib request fails but urllib3 succeeds. I don't think urllib's HTTPError gives you access to the request headers, so I can't really tell what's wrong here. I'm able to retrieve the file with curl, wget, aria2c, and python via urllib3.

Running peru --verbose sync gave me a stack trace that simply emphasizes "yeah, that's a 403" and the headers I get back from the request don't really say what's wrong. I assume it's some kind of bot protection on Cloudflare…?
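As a rough illustration of the kind of debugging output discussed here, the sketch below summarizes what an `HTTPError` does expose: the status code, the reason, and the response headers. The helper name `describe_http_error` is hypothetical and is not part of peru or urllib, and the error object is built by hand (with a made-up `Server` header) so the example runs without any network access.

```python
import io
from email.message import Message
from urllib.error import HTTPError


def describe_http_error(err: HTTPError) -> str:
    """Summarize the status line and response headers of an HTTPError."""
    lines = [f"HTTP {err.code} {err.reason} for {err.filename}"]
    headers = err.headers
    if headers is not None:
        lines.extend(f"  {name}: {value}" for name, value in headers.items())
    return "\n".join(lines)


# Synthetic error so the sketch runs offline; the header value is invented.
fake_headers = Message()
fake_headers["Server"] = "cloudflare"
fake_error = HTTPError(
    "https://example.invalid/file.zip", 403, "Forbidden", fake_headers, io.BytesIO()
)
print(describe_http_error(fake_error))
```

In real use, the helper would be called from the `except HTTPError as e:` branch shown above. It only reports the response side; the headers urllib sent are not stored on the exception.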

@colindean
Contributor Author

Workaround: get the FontAwesome package from GitHub instead.

https://github.com/FortAwesome/Font-Awesome/releases/download/5.15.4/fontawesome-free-5.15.4-web.zip

@colindean
Contributor Author

I think it's still worth adding some more debugging information here, or at least figuring out whether the dependency on urllib is problematic when retrieving from Cloudflare-backed URLs.

@oconnor663
Member

Interesting, thanks for the detailed report. I've played with it a little bit, and it seems like the key detail is that the User-Agent header has to be set. Here's a minimized repro:

from urllib.request import Request, urlopen

url = "https://use.fontawesome.com/releases/v5.15.4/fontawesome-free-5.15.4-web.zip"
req = Request(url)
# Even with an empty string as the User-Agent header, the request succeeds. But
# if we remove this line, the request fails with error 403.
req.add_header("user-agent", "")
urlopen(req)

@colindean can you confirm this behavior?

I guess it's reasonable that Peru should set something for the UA. Any proposals for what that should be? @olson-sean-k?
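For context on the repro above, here is a small offline sketch of how the header can be inspected and overridden. As far as I know, urllib's `OpenerDirector` carries a default `User-agent` of the form `Python-urllib/x.y` and applies it only when the request has no such header, so the 403 may be a reaction to that default value rather than to a missing header; that reading is an assumption, not something confirmed in this thread. The function names are hypothetical.

```python
from urllib.request import OpenerDirector, Request


def default_user_agent() -> str:
    """Return the User-agent that a fresh urllib opener would send by default."""
    return dict(OpenerDirector().addheaders)["User-agent"]


def request_with_user_agent(url: str, user_agent: str) -> Request:
    """Build a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


print(default_user_agent())

# Request normalizes header names with str.capitalize(), hence "User-agent".
req = request_with_user_agent("https://example.invalid/file.zip", "")
print(req.has_header("User-agent"))
```

Passing the resulting `Request` to `urlopen` is equivalent to the `add_header` call in the repro; nothing here touches the network.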

@colindean
Contributor Author

Confirmed.

I'd suggest this:

urllib3 uses python-urllib3/{__version__} as its UA, so perhaps it's appropriate for Peru to use

f"peru/{peru_version} python-urllib/{urllib.request.__version__}"

If you look at the source for urllib.request, there are seemingly other ways to create requests that already have the Python default UA but I think the above meets the minimum requirement.
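A minimal sketch of the suggested format, assuming the peru version is available as a plain string. `make_user_agent` and its `tool_version` parameter are illustrative names, not peru's actual API; `urllib.request.__version__` is the module-level value that holds the Python major.minor version.

```python
import urllib.request


def make_user_agent(tool_version: str) -> str:
    """Combine a tool identifier with urllib's own version token."""
    return f"peru/{tool_version} Python-urllib/{urllib.request.__version__}"


print(make_user_agent("1.3.0"))
```

The exact value depends on the interpreter, since the urllib part tracks the running Python version.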

colindean added a commit to colindean/peru that referenced this issue Nov 30, 2021
The header will be of the form:

    peru/{version} Python-urllib/{version}

Fixes buildinspace#218

@colindean
Contributor Author

I put up #219 with a stab at a quick fix. It adds the header, at least. I haven't tested it with my config example above yet, though.

@colindean
Contributor Author

I tested #219 with the example config and it works! It will fix the problem.

@olson-sean-k
Member

Thanks for finding (and fixing) this, @colindean!

> I guess it's reasonable that Peru should set something for the UA. Any proposals for what that should be?

> urllib3 uses python-urllib3/{__version__} as its UA, so perhaps it's appropriate for Peru to use ...

I'm not too sure, but I think there are three reasonable options:

  1. Whatever Python and urllib use by default (e.g., Python-urllib/{__version__}).
  2. Something that identifies peru (e.g., peru/{__version__}).
  3. A combination of the above as @colindean suggested.

FWIW, I think I'd lean a bit toward the first option, but I don't have a strong opinion about it.

@colindean
Contributor Author

#219 goes with № 3 in that list and pulls the urllib UA directly from a core class that won't go away. urllib doesn't — IMO — sufficiently abstract this value 👎
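One way the combined value could be derived from urllib itself is sketched below. This is not necessarily how #219 does it; it assumes the default can be read from the opener's `addheaders` list, and `opener_with_prefixed_user_agent` is a hypothetical helper name.

```python
from urllib.request import OpenerDirector, build_opener


def opener_with_prefixed_user_agent(prefix: str) -> OpenerDirector:
    """Return an opener whose default User-agent is '<prefix> <urllib default>'."""
    opener = build_opener()
    opener.addheaders = [
        (name, f"{prefix} {value}" if name.lower() == "user-agent" else value)
        for name, value in opener.addheaders
    ]
    return opener


opener = opener_with_prefixed_user_agent("peru/1.3.0")
print(dict(opener.addheaders)["User-agent"])
```

Requests made through `opener.open(url)` would then carry the combined header unless the individual request sets its own User-Agent.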
