-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use requests to download files instead of urllib2 AND extend SSL_VERIFY option to these requests (for testing) #145
Conversation
Perhaps @metaodi this is something you might have a chance to review? |
@davidread this looks very good and is a step in the right direction. If you like I could try this code next week on one of my test instances, but I see currently nothing that could break. |
Thanks for that, and trying it next week would be great!
…On Thu, 31 Aug 2017 at 19:00, Stefan Oderbolz ***@***.***> wrote:
@davidread <https://github.com/davidread> this looks very good and is a
step in the right direction. If you like I could try this code next week on
one of my test instances, but I see currently nothing that could break.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#145 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AASxnEXinMzW8IQSVf5R2mPBT0k7rKPZks5sdvTTgaJpZM4PJH2l>
.
|
@metaodi I don't suppose... |
@davidread I knew I forgot something... Sorry! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the KeyError
everything works fine as expected.
datapusher/jobs.py
Outdated
|
||
response = urllib2.urlopen(request, timeout=DOWNLOAD_TIMEOUT) | ||
except urllib2.HTTPError as e: | ||
cl = response.headers['content-length'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my instance this leads to a KeyError
, as not all servers provide the content-length header. use response.headers.get('content-length')
instead.
@davidread just ran into this again. Did you have a chance to fix the "typo" so we can merge this? |
@metaodi Thanks for doing the testing and reminder. I've added your fix for the issue you spotted - much appreciated. |
@smotornyuk is this something you'd be ok to merge for us? |
Oh, hold on I have another change |
…ant to encourage turning off verification by adding this. Noted in the docs that SSL_VERIFY only applies to CKAN API calls. Also noted that it doesnt work anyway...
Ok @smotornyuk, this is ready for review & merge, if that's ok? |
Oh wait, why would you not use SSL_VERIFY for the download? I actually would like this behaviour. I consider SSL_VERIFY useful for test envirionments with self-signed certificates. And this applies to both APIs and the download (e.g. for files uploaded to CKAN). I'd very much prefer if you'd revert your last change. |
Ah, that's a good point. |
…ses where self-certified certs are used.
Ok, how's this? |
Perfect, thanks! 👍 |
@smotornyuk this is ready for review - do you have time or shall I ask others? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@smotornyuk @davidread is there somebody who could review and/or merge this? Btw: I'd be happy to help here (all I need is write access 😉) |
@metaodi ask and you shall receive :) Thanks for your help |
Sorry if it was not clear, you have now write access @metaodi |
@amercader thank you! 🎉 |
It seems a no-brainer to use requests instead of urllib2 for this key step of downloading the data file that gets pushed because requests is superior in lots of ways.
Since python 2.7.9, urllib2 went from not verifying certificates at all to verifying them against the OS's built-in list. My experience of the latter is that it goes out of date and even big sites like github.com start getting rejected. Requests handles this much better imo because it updates more frequently via the certifi python module (installed with
requests[security]
. I've used this in ckanext-archiver in a similar for several years with success, but successfully checking HTTPS certs.I've checked the tests pass, but it's not tried in production or battle-hardened, so could do with double-checking.