LazyRemoteFile sometimes raises 403 forbidden error because of urlretrieve headers #284

Closed
joeyjurjens opened this issue Feb 23, 2022 · 3 comments

Comments

@joeyjurjens
joeyjurjens commented Feb 23, 2022

First of all: it does not raise a 403 every time, but I've run into it quite a few times lately.

LazyRemoteFile uses urlretrieve to download an image from a given URL and save it to a file:

local_filename, _ = urlretrieve(self.url, self.name)

However, the User-Agent it sends by default (Python-urllib/<version>) seems to get blocked by quite a few websites.
Unfortunately, urlretrieve doesn't accept request headers as an argument.

If we want to pass headers with urllib, we could do so as follows:

import urllib.request
req = urllib.request.Request('http://www.example.com/')
req.add_header('User-Agent', 'Mozilla/5.0')
r = urllib.request.urlopen(req)
# Now we need to read the response content and save to a file
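
To fill in the last step, here is a minimal sketch of how the response could be streamed to a file using only the standard library. The names `build_request` and `download`, the `DEFAULT_USER_AGENT` value, and the timeout are assumptions made for illustration; this is not the code that ended up in the project.

```python
import shutil
import urllib.request

# Hypothetical default, chosen only for this sketch.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, filename, user_agent=DEFAULT_USER_AGENT, timeout=30):
    """Write the response body to `filename` and return the filename."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(filename, "wb") as target:
            # copyfileobj copies in chunks, so a large image is not
            # loaded into memory all at once.
            shutil.copyfileobj(response, target)
    return filename
```

Something like `download(self.url, self.name)` could then stand in for the `urlretrieve` call, assuming the caller only needs the local filename back.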

We could also use the requests library which would look a bit cleaner (in my opinion):

import requests
r = requests.get(self.url, headers={'User-Agent': 'Mozilla/5.0'})
# Now we need to read the response content and save to a file

Is this something I can make a PR for, and if so what method would be preferred?

@joeyjurjens changed the title from "LazyRemoteFile raises 403 forbidden error because of urlretrieve headers" to "LazyRemoteFile sometimes raises 403 forbidden error because of urlretrieve headers" on Feb 23, 2022
@specialunderwear
Member

Please use just urllib. We don't make many requests, and keeping the dependencies minimal is a goal of this project. Please make sure the User-Agent has a sane default that can be overridden by a setting, and provide some example settings in the documentation that emulate common browsers.
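
One way the requested default-plus-override could look is sketched below. The setting name `LAZY_REMOTE_FILE_USER_AGENT`, the helper `get_user_agent`, and the default value are all hypothetical and are not taken from the project. The browser strings are illustrative examples in the usual format, and real browsers change them with every release.

```python
from types import SimpleNamespace

# Hypothetical fallback used when no setting is configured.
DEFAULT_USER_AGENT = "Mozilla/5.0"

# Illustrative browser-like values that documentation could list as examples.
EXAMPLE_USER_AGENTS = {
    "firefox": (
        "Mozilla/5.0 (X11; Linux x86_64; rv:97.0) "
        "Gecko/20100101 Firefox/97.0"
    ),
    "chrome": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
    ),
}


def get_user_agent(settings):
    """Return the configured User-Agent, or the default if none is set.

    `settings` is any object with attribute access, for example
    django.conf.settings. The setting name is an assumption.
    """
    return getattr(settings, "LAZY_REMOTE_FILE_USER_AGENT", DEFAULT_USER_AGENT)


# Usage: no override configured, then an override emulating Firefox.
unconfigured = SimpleNamespace()
overridden = SimpleNamespace(
    LAZY_REMOTE_FILE_USER_AGENT=EXAMPLE_USER_AGENTS["firefox"]
)
print(get_user_agent(unconfigured))
print(get_user_agent(overridden))
```

Reading the setting with `getattr` and a fallback means existing installations would behave sensibly without any configuration change.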

@maerteijn
Member

☝️

@joeyjurjens
Author

Fixed in #288
