Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make fetching work when package is in Amazon S3 #2843

Closed
wants to merge 1 commit into from

Conversation

torarvid
Copy link

@torarvid torarvid commented Apr 5, 2024

Summary

When a package is hosted on Amazon S3 (sometimes the case when using Gemfury as a private repository), there might be a redirect from the Gemfury link to the S3 link. The S3 link will be different for HEAD and GET requests. For this reason, we need to use the original Gemfury link when passing arguments to the range reader.

Fixes #2025

Test Plan

Just tested locally where I had a repro of the issue.

When a package is hosted on Amazon S3 (sometimes the case when using
Gemfury as a private repository), there might be a redirect from the
Gemfury link to the S3 link. The S3 link will be different for HEAD and
GET requests. For this reason, we need to use the original Gemfury link
when passing arguments to the range reader.
@zanieb zanieb self-assigned this Apr 6, 2024
@zanieb
Copy link
Member

zanieb commented Apr 6, 2024

I'll have to think about this some more, we've very explicitly used the response URL over the request URL in the past. I worry this could break other user's workflows without further consideration.

cc @baszalmstra perhaps your team would find this an interesting async_http_range_reader edge case.

Do you know how pip handles this case? I guess they're just not doing HEAD requests.

@charliermarsh
Copy link
Member

pip does do a HEAD request: https://github.com/pypa/pip/blob/7c49d06ea4be4635561f16a524e3842817d1169a/src/pip/_internal/network/lazy_wheel.py#L52. But I think you need to pass --use-feature=fast-deps to enable range requests.

@charliermarsh
Copy link
Member

Oh, but they definitely don't use the response URL from the HEAD when making subsequent GET requests. So that part might be wrong?

@charliermarsh
Copy link
Member

My read is that we should consider passing the URL explicitly to async_http_range_reader, and not relying on the response URL in those places. We typically need to use the response URL when (e.g.) the response returns relative paths and there was a redirect. But this is a bit different: it should be the same URL. (I might be totally misunderstanding the issue.)

@baszalmstra
Copy link
Contributor

Thanks for the ping. I think there is a case for both options. I guess if a server reports a different redirect url based on the http method this could be problematic.

I would be happy accept a pr in async_http_range_reader whatever you decide.

@torarvid
Copy link
Author

torarvid commented Apr 6, 2024

@charliermarsh @zanieb Just in case you didn't see it, I wrote about what led me to this PR in the comments of #2025. To sum up: I don't think this is the one-and-only right way to fix this, it's merely a proof of concept that seemed to get me past the problem 😊

Having said that, I thought the api of that range reader was a little bit strange. To me, the part where you give it the headers of the response so that it can determine whether the server supports range requests seems perfectly natural. But it's less clear to me that the URL used to make those range requests need to be the url in the response (and not the url in the original request). I guess I'm somewhat biased since I've made this PR that hacks around the fact that this url can't be overridden 😊

@charliermarsh
Copy link
Member

Yeah, I’m fairly confident we should change it to reuse the original URL. We just need to verify that it won’t regress a few other cases where we explicitly need to use a response URL. (For example, if a registry returns relative paths, those need to be relative to the response URL in the event of a redirect. But that’s a different case than the range requests we’re doing here.)

@zanieb
Copy link
Member

zanieb commented Apr 8, 2024

Started the upstream changes at prefix-dev/async_http_range_reader#11

@charliermarsh
Copy link
Member

Superseded by #3460.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

uv pip install returning 403 from private pypi cloud instance backed by s3
4 participants