Use requests to download files instead of urllib2 AND extend SSL_VERIFY option to these requests (for testing) #145

Merged: 4 commits merged into master from use-requests on Oct 12, 2017

Conversation

davidread
Contributor

It seems a no-brainer to use requests instead of urllib2 for this key step of downloading the data file that gets pushed because requests is superior in lots of ways.

Since Python 2.7.9, urllib2 went from not verifying certificates at all to verifying them against the OS's built-in list. My experience of the latter is that it goes out of date and even big sites like github.com start getting rejected. Requests handles this much better imo because it updates more frequently via the certifi python module (installed with requests[security]). I've used this approach in ckanext-archiver for several years with success, while still checking HTTPS certs.

I've checked the tests pass, but it's not tried in production or battle-hardened, so could do with double-checking.

@davidread
Contributor Author

Perhaps @metaodi this is something you might have a chance to review?

@metaodi
Member

metaodi commented Aug 31, 2017

@davidread this looks very good and is a step in the right direction. If you like, I could try this code next week on one of my test instances, but I currently see nothing that could break.

@davidread
Contributor Author

davidread commented Aug 31, 2017 via email

@davidread
Contributor Author

@metaodi I don't suppose...

@metaodi
Member

metaodi commented Sep 9, 2017

@davidread I knew I forgot something... Sorry!

Member

@metaodi metaodi left a comment


Apart from the KeyError everything works fine as expected.


response = urllib2.urlopen(request, timeout=DOWNLOAD_TIMEOUT)
except urllib2.HTTPError as e:
cl = response.headers['content-length']
Member


On my instance this leads to a KeyError, as not all servers provide the content-length header. Use response.headers.get('content-length') instead.
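For illustration, here is a minimal sketch of the defensive lookup suggested above. It is not the code from this PR: the helper names and the `MAX_CONTENT_LENGTH` value are hypothetical, and a plain mapping stands in for `response.headers` (the real requests header object is case-insensitive, a plain dict is not).

```python
# Hypothetical size limit, used only for this sketch.
MAX_CONTENT_LENGTH = 10 * 1024 * 1024


def declared_length(headers):
    """Return the declared Content-Length as an int, or None.

    None is returned when the header is absent or not a valid integer,
    so callers never hit a KeyError for servers that omit the header.
    """
    value = headers.get('content-length')
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def is_too_large(headers, limit=MAX_CONTENT_LENGTH):
    """True only when a length is declared and it exceeds the limit."""
    length = declared_length(headers)
    return length is not None and length > limit
```

With this shape, a missing header simply means "size unknown" rather than an error, so any size limit would have to be enforced some other way for those responses.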

@metaodi
Member

metaodi commented Sep 28, 2017

@davidread just ran into this again. Did you have a chance to fix the "typo" so we can merge this?

@davidread
Contributor Author

@metaodi Thanks for doing the testing and reminder. I've added your fix for the issue you spotted - much appreciated.

@davidread
Contributor Author

@smotornyuk is this something you'd be ok to merge for us?

@davidread
Contributor Author

Oh, hold on I have another change

@davidread davidread changed the title Use requests to download files instead of urllib2 [WIP] Use requests to download files instead of urllib2 Sep 29, 2017
davidread referenced this pull request in 6aika/datapusher Sep 29, 2017
…ant to encourage turning off verification by adding this. Noted in the docs that SSL_VERIFY only applies to CKAN API calls. Also noted that it doesnt work anyway...
@davidread davidread changed the title [WIP] Use requests to download files instead of urllib2 Use requests to download files instead of urllib2 Sep 29, 2017
@davidread
Contributor Author

Ok @smotornyuk, this is ready for review & merge, if that's ok?

@metaodi
Member

metaodi commented Sep 29, 2017

Oh wait, why would you not use SSL_VERIFY for the download? I actually would like this behaviour. I consider SSL_VERIFY useful for test environments with self-signed certificates. And this applies to both APIs and the download (e.g. for files uploaded to CKAN).

I'd very much prefer if you'd revert your last change.
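As a rough sketch of the behaviour requested here (not the code from this PR), the download can pass the same flag to requests through its `verify` argument. The `fetch` helper, its parameters, and the default timeout are hypothetical; `session` is assumed to be a `requests.Session` or anything with a compatible `get` method.

```python
def fetch(session, url, ssl_verify=True, timeout=30):
    """Download `url`, honouring an SSL_VERIFY-style flag.

    ssl_verify=False skips certificate verification, which is only
    appropriate for test environments with self-signed certificates.
    """
    response = session.get(
        url,
        verify=ssl_verify,  # same switch as used for the CKAN API calls
        stream=True,        # do not load the whole file into memory
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx responses
    return response


# Usage with requests (assumed installed), e.g. on a test instance:
#   import requests
#   response = fetch(requests.Session(), url, ssl_verify=False)
```

Taking the session as an argument keeps the helper easy to exercise in tests without network access.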

@davidread
Contributor Author

Ah, that's a good point.

@davidread davidread changed the title Use requests to download files instead of urllib2 Use requests to download files instead of urllib2 AND extend SSL_VERIFY option to these requests (for testing) Sep 29, 2017
@davidread
Contributor Author

Ok, how's this?

@metaodi
Member

metaodi commented Sep 29, 2017

Perfect, thanks! 👍

@davidread
Contributor Author

@smotornyuk this is ready for review - do you have time or shall I ask others?

Member

@metaodi metaodi left a comment


LGTM

@metaodi
Member

metaodi commented Oct 12, 2017

@smotornyuk @davidread is there somebody who could review and/or merge this? Btw: I'd be happy to help here (all I need is write access 😉)

@amercader
Member

@metaodi ask and you shall receive :)

Thanks for your help

@amercader
Member

Sorry if it was not clear, you now have write access @metaodi

@metaodi metaodi merged commit c258e29 into master Oct 12, 2017
@metaodi metaodi deleted the use-requests branch October 12, 2017 12:23
@metaodi
Member

metaodi commented Oct 12, 2017

@amercader thank you! 🎉


4 participants