Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latest release v1.40 has broken get_archive_link #830

Closed
TheSeubert opened this issue Jun 28, 2018 · 8 comments
Closed

Latest release v1.40 has broken get_archive_link #830

TheSeubert opened this issue Jun 28, 2018 · 8 comments
Labels

Comments

@TheSeubert
Copy link

TheSeubert commented Jun 28, 2018

Since the push of v1.40 and change to requests for HTTP instead of httplib, it has broken the get_archive_link method.

Simple Test:

import github
git = github.Github()
repo = git.get_repo('PyGithub/PyGithub')
print(repo.get_archive_link('zipball', 'master'))

Returns the error:

  File "/usr/local/lib/python3.5/site-packages/github/Repository.py", line 1219, in get_archive_link
    return headers["location"]
KeyError: 'location'

A quick debug reveals that the requests library is following the request from GET /repos/:owner/:repo/:archive_format/:ref, so instead of returning the header info with an expected location, it returns the actual archive zip file:

{'content-disposition': 'attachment; filename=PyGithub-PyGithub-v1.40-0-gd897e86.zip', 'x-content-type-options': 'nosniff', 'access-control-allow-origin': 'https://render.githubusercontent.com', 'x-geo-block-list': '', 'date': 'Thu, 28 Jun 2018 20:49:34 GMT', 'vary': 'Authorization,Accept-Encoding', 'content-length': '3232597', 'strict-transport-security': 'max-age=31536000', 'etag': '"d897e86aa9aa6469971b3e2201ead39cf1aebb62"', 'content-security-policy': "default-src 'none'; style-src 'unsafe-inline'; sandbox", 'content-type': 'application/zip', 'x-github-request-id': 'F53F:2415:309135:4E96ED:5B3549DE', 'x-frame-options': 'deny', 'x-xss-protection': '1; mode=block'}
@TheSeubert TheSeubert changed the title Latest Release v1.40 Has Broken get_archive_link Latest release v1.40 has broken get_archive_link Jun 28, 2018
@sfdye sfdye added the bug label Jun 29, 2018
@sfdye
Copy link
Member

sfdye commented Jun 29, 2018

ping @mikeage @mfonville

It seems this is caused by the requests's behavior to follow http redirect by default, while httplib won't (which is the case before 1.40).

One solution I can think of is to add a allow_redirects=False to the following lines:

https://github.com/PyGithub/PyGithub/blob/master/github/Requester.py#L107
https://github.com/PyGithub/PyGithub/blob/master/github/Requester.py#L133

What do you guys think?

@mfonville
Copy link
Contributor

It seems an easy fix. I am not sure if we would introduce bugs in other places that rely on automatic redirection. Though we already do our own redirect for 301 at https://github.com/PyGithub/PyGithub/blob/master/github/Requester.py#L411 but I don't see 302s handled anywhere.

After reading GitHub's API-documentation, where they indeed literally specify that the 'data' it will return is a header with 301 as target location, I am in doubt whether requestJsonAndCheck is actually the appropriate method for this.
Shouldn't we add a new method, e.g. requestHeaderAndCheck that could set and pass allow_redirects=False through the relevant methods within the Requester class?

@mikeage
Copy link
Contributor

mikeage commented Jun 29, 2018

I'd be surprised if we rely on automatic redirection given that httplib didn't do it.

I think adding allow_redirects=False is definitely the quickest fix. I'm not sure about requestJsonAndCheck vs requestHeaderAndCheck; it sounds logical, but I'd want to review a lot more of the code before I committed myself to that answer.

@sfdye
Copy link
Member

sfdye commented Jun 29, 2018

FYI 302 is handled in a few places, like:

if status == 302:

and
if headers.get('status') == '302 Found' and headers.get('location'):

I am thinking since we are trying to mimic httplib with requests, setting allow_redirects=False should have the minimal impact on existing behavior?

@mfonville
Copy link
Contributor

mfonville commented Jun 29, 2018

Thanks for pointing to the 302s. Mimicking httplib then with allow_redirects=False is indeed the way to go with the current codebase.

We just have to keep an eye we're not painting ourselves in a corner with over-instructing Requests. Because for a better implementation of #803 and some of the issues reported on proxies, shaky connections etc, the best way ahead is to allow passing of a Requests-parameters (or object) to PyGithub. So it would be best if PyGithub does not rely (too many) Requests-specific parameters. (Though I don't see that as an obstacle concerning the allow_redirects)

@mikeage
Copy link
Contributor

mikeage commented Jun 29, 2018

Also, for your consideration: mimicking httplib made this an easy drop in replacement, but long term (and I know this depends more on the testing framework than anything else), is that the right direction? The basic requests API is much easier than sticking to httplib, IMHO.

@djstein
Copy link

djstein commented Aug 7, 2018

currently experiencing: moving down to 1.39 has OpenSSL errors, 1.40 has this error. any thoughts on a resolution or has a pull request been created?

@djstein
Copy link

djstein commented Aug 7, 2018

created a pull request here.... #858

sfdye pushed a commit that referenced this issue Aug 19, 2018
Fixes #830

Latest release v1.40 has broken get_archive_link
#830
candrikos pushed a commit to candrikos/PyGithub that referenced this issue Sep 25, 2020
Fixes PyGithub#830

Latest release v1.40 has broken get_archive_link
PyGithub#830
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants