However, the user-agent it uses by default seems to get blocked by quite a few websites.
Unfortunately, urlretrieve doesn't let us set request headers.
If we want to pass headers with urllib, we could do so as follows:
    import urllib.request

    req = urllib.request.Request('http://www.example.com/')
    req.add_header('User-Agent', 'Mozilla/5.0')
    r = urllib.request.urlopen(req)
    # Now we need to read the response content and save to a file
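To make the "read the response content and save to a file" step concrete, here is a minimal sketch of what a urllib-only replacement for urlretrieve could look like. The function name `download_with_user_agent` and its parameters are made up for illustration and are not part of oscarapi; the `Mozilla/5.0` default is just the value from the snippet above.

```python
import shutil
import urllib.request


def download_with_user_agent(url, filename, user_agent="Mozilla/5.0"):
    """Fetch `url` with a custom User-Agent and stream the body to `filename`.

    Hypothetical helper, not an existing oscarapi function.
    """
    # Unlike urlretrieve, a Request object lets us attach headers.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Stream the response to disk in chunks instead of reading it all into memory.
    with urllib.request.urlopen(req) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

Something like `download_with_user_agent(self.url, local_path)` could then stand in for the urlretrieve call, assuming the caller picks the local path itself.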
We could also use the requests library which would look a bit cleaner (in my opinion):
    import requests

    # headers must be a dict (key: value), not a set
    r = requests.get(self.url, headers={'User-Agent': 'Mozilla/5.0'})
# Now we need to read the response content and save to a file
Is this something I can make a PR for, and if so, which method would be preferred?
joeyjurjens changed the title from "LazyRemoteFile raises 403 forbidden error because of urlretrieve headers" to "LazyRemoteFile sometimes raises 403 forbidden error because of urlretrieve headers" on Feb 23, 2022
Please use just urllib, we don't have a lot of requests we are doing and keeping the dependencies minimal is a goal of this project. Please make sure the User-Agent has a sane default, but can be overridden by a setting. Provide some example settings to emulate common browsers in the documentation.
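A rough sketch of how a default User-Agent with a settings override might look, using only urllib. The setting name `LAZY_REMOTE_FILE_USER_AGENT`, the `DEFAULT_USER_AGENT` constant, and the `build_request` helper are all hypothetical names chosen for this example, and `SimpleNamespace` only stands in for `django.conf.settings` so the snippet runs without Django. The Firefox string is an illustrative browser-style value, not a recommendation.

```python
import urllib.request
from types import SimpleNamespace

# Hypothetical default, used when the setting is absent.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def build_request(url, settings):
    """Build a Request whose User-Agent comes from settings, falling back to the default."""
    # LAZY_REMOTE_FILE_USER_AGENT is a made-up setting name for illustration.
    user_agent = getattr(settings, "LAZY_REMOTE_FILE_USER_AGENT", DEFAULT_USER_AGENT)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Stand-ins for django.conf.settings: one without the setting, one overriding it.
default_settings = SimpleNamespace()
firefox_settings = SimpleNamespace(
    LAZY_REMOTE_FILE_USER_AGENT=(
        "Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0"
    )
)

default_request = build_request("http://www.example.com/", default_settings)
firefox_request = build_request("http://www.example.com/", firefox_settings)
```

In the real project the lookup would presumably go through the existing settings mechanism rather than a bare `getattr`, but the fallback logic would be the same.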
First of all: it does not raise a 403 every time, but lately I've run into it quite a few times.
LazyRemoteFile uses urlretrieve to download an image from a given URL and save it to a file.
django-oscar-api/oscarapi/serializers/fields.py
Line 310 in aba90b0