purl.obolibrary.org URLs return 403 when using urllib.request (Python) #923

nklsbckmnn · 2023-06-05T16:18:24Z

No description provided.

jamesaoverton · 2023-06-05T16:28:50Z

Please provide example PURLs and code so we can try to replicate.

nklsbckmnn · 2023-06-05T16:40:46Z

import urllib.request

url = "http://purl.obolibrary.org/obo/hp.owl" 

response = urllib.request.urlopen(url)

matentzn · 2023-06-05T16:52:15Z

Is this behaviour new?

This works:

from urllib.request import Request, urlopen

url = "https://purl.obolibrary.org/obo/hp.owl"
# Send a browser-like User-Agent instead of urllib's default one
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)

But I am wondering if the recent changes to the PURL system now cause the error described here:

https://www.pythonpool.com/urllib-error-httperror-http-error-403-forbidden

The wheregoes trace works:

https://wheregoes.com/trace/20232599746/

nklsbckmnn · 2023-06-05T16:55:49Z

Yes, it's new. I think it still worked on Friday. Maybe GitHub is blocking user agents it suspects of abuse?

nklsbckmnn · 2023-06-05T16:59:49Z

Although requesting https://github.com/obophenotype/human-phenotype-ontology/releases/latest/download/hp.owl or https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-04-05/hp.owl works.

matentzn · 2023-06-05T17:00:06Z

On Friday we changed something in our PURL config, cc @kltm, but it is a bit odd that every tool other than urllib.request works: wget, curl, and wheregoes.

Thanks for the report!

jamesaoverton · 2023-06-05T17:12:22Z

EDIT: Posted too soon, the following is incorrect.

I think the problem is GitHub, not the PURL server. The PURL server redirects http://purl.obolibrary.org/obo/hp.owl to https://github.com/obophenotype/human-phenotype-ontology/releases/latest/download/hp.owl ( https://github.com/OBOFoundry/purl.obolibrary.org/blob/master/config/hp.yml#LL9C11-L9C99). This code gives me a 403:

import urllib.request
url = "https://github.com/obophenotype/human-phenotype-ontology/releases/latest/download/hp.owl" 
response = urllib.request.urlopen(url)

matentzn · 2023-06-05T17:25:35Z

The plain requests library also works:

import requests

r = requests.get('http://purl.obolibrary.org/obo/omo.owl', allow_redirects=True)
# Use a context manager so the file is closed after writing
with open('omo.owl', 'wb') as f:
    f.write(r.content)

matentzn · 2023-06-05T17:30:19Z

Using @eliasweatherfield's code in a try/except also works:

import urllib.request

url = "http://purl.obolibrary.org/obo/omo.owl"

try:
    response = urllib.request.urlopen(url)
except:
    print("Ignore this error")

print(response.read(100))

This suggests that the request is successful, but the error is thrown regardless.

jamesaoverton · 2023-06-05T17:39:50Z

I can confirm the 403 described by @eliasweatherfield in Python 3.9 and 3.11. I think @matentzn is seeing an old response object, because I get a NameError: name 'response' is not defined error from the final line print(response.read(100)).
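The stale-object explanation can be checked without any network access. The sketch below is only an illustration: `failing_urlopen` and `response_is_bound_after_failure` are hypothetical names, and the stand-in simply raises the same kind of 403 that `urlopen` raised here. It shows that when the call inside `try` fails, `response` is never assigned, so a later read only succeeds if an earlier session had already defined it.

```python
import email.message
import io
import urllib.error


def failing_urlopen(url):
    # Hypothetical stand-in for urllib.request.urlopen: always raises a 403,
    # mimicking the behaviour reported in this issue without touching the network.
    raise urllib.error.HTTPError(
        url, 403, "Forbidden", email.message.Message(), io.BytesIO()
    )


def response_is_bound_after_failure(url):
    try:
        response = failing_urlopen(url)  # raises, so the assignment never happens
    except urllib.error.HTTPError:
        pass  # swallow the error, as the bare except in the snippet above does

    try:
        response  # inside a function this raises UnboundLocalError, a NameError subclass
        return True
    except NameError:
        return False


bound = response_is_bound_after_failure("http://purl.obolibrary.org/obo/omo.owl")
print(bound)
```

Catching `urllib.error.HTTPError` explicitly, rather than using a bare `except`, makes it harder to mistake a failed request for a successful one.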

jamesaoverton · 2023-06-05T17:51:29Z

Ok, now I think that Cloudflare is rejecting the request, which makes sense given the timing of this issue:

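import urllib.error
import urllib.request

url = "http://purl.obolibrary.org/obo/hp.owl"
try:
    response = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    # Print the status and the response headers to see which server rejected the request
    print(e)
    print(e.code)
    print(e.reason)
    print(e.headers)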

HTTP Error 403: Forbidden
403
Forbidden
Date: Mon, 05 Jun 2023 17:48:43 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 16
Connection: close
X-Frame-Options: SAMEORIGIN
Referrer-Policy: same-origin
Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 7d2a3f81995dcab8-YYZ
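The `Server: cloudflare` and `CF-RAY` headers in the output above are what point to Cloudflare rather than GitHub. As a rough sketch of that reasoning, the hypothetical helper `rejected_by_cloudflare` below applies the same check to a status code and a header mapping. It is a heuristic based only on the headers shown in this thread, not an official way to identify Cloudflare responses.

```python
def rejected_by_cloudflare(code, headers):
    # Heuristic: a 403 whose headers name Cloudflare as the server or carry a
    # CF-RAY request identifier was most likely answered by Cloudflare itself.
    server = (headers.get("Server") or "").lower()
    has_ray_id = headers.get("CF-RAY") is not None
    return code == 403 and (server == "cloudflare" or has_ray_id)


# Headers taken from the 403 response shown above
observed = {"Server": "cloudflare", "CF-RAY": "7d2a3f81995dcab8-YYZ"}
print(rejected_by_cloudflare(403, observed))
```

In an `except urllib.error.HTTPError as e` block, this could be called as `rejected_by_cloudflare(e.code, e.headers)`, since `e.headers` supports `.get()`.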

jamesaoverton · 2023-06-05T17:53:14Z

I think the cause is Cloudflare's Browser Integrity Check, which is a security setting that can be turned off: https://developers.cloudflare.com/support/firewall/settings/understanding-the-cloudflare-browser-integrity-check/
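For clients that cannot wait for a server-side change, the workaround reported earlier in this thread was to send an explicit User-Agent header. The sketch below wraps that in a small hypothetical helper, `build_request`. It assumes, without confirmation in this thread, that urllib's default User-Agent is what the check objects to. The default value `"Mozilla/5.0"` is simply the one that was reported to work above.

```python
from urllib.request import Request


def build_request(url, user_agent="Mozilla/5.0"):
    # Attach an explicit User-Agent so the request does not go out with
    # urllib's default "Python-urllib/x.y" value.
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://purl.obolibrary.org/obo/hp.owl")
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```

The resulting object can be passed to `urllib.request.urlopen(req)` in place of the bare URL.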

kltm · 2023-06-05T18:21:56Z

@jamesaoverton I believe that I've turned off BIC for this domain (Cloudflare docs are apparently wildly out of date and not great to begin with).

jamesaoverton · 2023-06-05T18:36:36Z

Thanks @kltm! I'm now getting a 200 response from the first test code posted above -- no more error.

@eliasweatherfield Can you confirm that this is now working for you?

nklsbckmnn · 2023-06-05T18:51:03Z

Yes, it's working again. Thanks everyone.

jamesaoverton · 2023-06-05T19:08:15Z

Thanks for the report!

nklsbckmnn changed the title ~~purl.obolibrary.org URLs return 403 when using urlib.request (Python)~~ purl.obolibrary.org URLs return 403 when using urllib.request (Python) Jun 5, 2023

jamesaoverton closed this as completed Jun 5, 2023
