LazyRemoteFile sometimes raises 403 forbidden error because of urlretrieve headers #284

joeyjurjens · 2022-02-23T15:18:46Z

To be clear, it does not raise a 403 every time, but lately I've run into it quite a few times.

LazyRemoteFile uses urlretrieve to download an image from a given URL and save it to a file.

django-oscar-api/oscarapi/serializers/fields.py

Line 310 in aba90b0

local_filename, _ = urlretrieve(self.url, self.name)

However, the default User-Agent it sends (Python-urllib/x.y) seems to get blocked by quite a few websites.
Unfortunately, urlretrieve doesn't let us set request headers.

If we want to pass headers with urllib, we could do so as follows:

import urllib.request
req = urllib.request.Request('http://www.example.com/')
req.add_header('User-Agent', 'Mozilla/5.0')
r = urllib.request.urlopen(req)
# Now we need to read the response content and save to a file
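The remaining step (reading the response and saving it to a file) could look roughly like the sketch below. It only illustrates the urllib approach and is not the code that ended up in the project; the function name `download_with_user_agent` and its `user_agent` parameter are hypothetical.

```python
import shutil
import urllib.request


def download_with_user_agent(url, filename, user_agent="Mozilla/5.0"):
    """Download `url` to `filename`, sending a custom User-Agent header.

    Hypothetical helper: a stand-in for urlretrieve(url, filename) that
    lets the caller control the User-Agent.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Stream the response body to disk instead of loading it into memory.
    with urllib.request.urlopen(req) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

Like urlretrieve, this still raises urllib.error.HTTPError if the server answers with a 403, so callers keep the same error handling.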

We could also use the requests library, which would look a bit cleaner (in my opinion):

import requests
# headers must be a dict (key: value), not a set
r = requests.get(self.url, headers={'User-Agent': 'Mozilla/5.0'})
# Now we need to read the response content and save to a file

Is this something I can make a PR for, and if so, which method would be preferred?

specialunderwear · 2022-02-23T15:23:01Z

Please use just urllib. We don't make a lot of requests, and keeping the dependencies minimal is a goal of this project. Please make sure the User-Agent has a sane default but can be overridden by a setting. Provide some example settings to emulate common browsers in the documentation.
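A minimal sketch of the "sane default, overridable by a setting" idea, assuming a settings object that is read with getattr the way Django settings usually are. The setting name `LAZY_REMOTE_FILE_USER_AGENT` and the helper names are hypothetical, not the names used in the project, and `SimpleNamespace` only stands in for `django.conf.settings` so the example runs on its own.

```python
import urllib.request
from types import SimpleNamespace

# Fallback used when the (hypothetical) setting is not defined.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def get_user_agent(settings):
    """Return the configured User-Agent, or the default if unset."""
    return getattr(settings, "LAZY_REMOTE_FILE_USER_AGENT", DEFAULT_USER_AGENT)


def build_request(url, settings):
    """Build a urllib Request carrying the configured User-Agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": get_user_agent(settings)}
    )


# No setting defined: the default is used.
default_req = build_request("http://www.example.com/", SimpleNamespace())

# Setting defined: the project-specific value wins.
custom_req = build_request(
    "http://www.example.com/",
    SimpleNamespace(LAZY_REMOTE_FILE_USER_AGENT="my-shop-importer/1.0"),
)
```

Note that urllib normalizes header names with str.capitalize(), so the stored key is "User-agent" when reading it back with get_header.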

maerteijn · 2022-02-23T15:49:05Z

☝️

joeyjurjens · 2022-05-09T13:55:10Z

Fixed in #288

joeyjurjens changed the title ~~LazyRemoteFile raises 403 forbidden error because of urlretrieve headers~~ LazyRemoteFile sometimes raises 403 forbidden error because of urlretrieve headers Feb 23, 2022

joeyjurjens closed this as completed May 9, 2022
