Make fetching work when package is in Amazon S3 #2843

torarvid · 2024-04-05T22:21:05Z

Summary

When a package is hosted on Amazon S3 (sometimes the case when using Gemfury as a private repository), there might be a redirect from the Gemfury link to the S3 link. The S3 link will be different for HEAD and GET requests. For this reason, we need to use the original Gemfury link when passing arguments to the range reader.
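To make the failure mode concrete, here is a minimal toy model with no real network calls. It assumes, as described above, that the index redirects to a storage URL whose signature is tied to the HTTP method of the request. All URLs, the `sig` query parameter, and the `fetch` helper are hypothetical and are not Gemfury's or S3's actual behavior or API.

```python
from dataclasses import dataclass

# Hypothetical index URL that redirects to signed storage.
INDEX_URL = "https://index.example/pkg-1.0-py3-none-any.whl"


@dataclass
class Response:
    status: int
    url: str  # final URL after following redirects


def signed_url(method: str) -> str:
    # Stand-in for a storage URL whose signature covers the HTTP method.
    return f"https://bucket.example/pkg-1.0-py3-none-any.whl?sig={method.lower()}"


def fetch(method: str, url: str) -> Response:
    """Toy client and server in one: follow the redirect, then check the signature."""
    if url == INDEX_URL:
        # The index redirects to a URL signed for the method of this request.
        return fetch(method, signed_url(method))
    if url == signed_url(method):
        return Response(200, url)
    # Signature was issued for a different method.
    return Response(403, url)


head = fetch("HEAD", INDEX_URL)

# Reusing the URL from the HEAD response for a GET fails in this model...
get_via_response_url = fetch("GET", head.url)

# ...while going back to the original request URL succeeds.
get_via_request_url = fetch("GET", INDEX_URL)
```

In this model the HEAD succeeds, the GET against the HEAD's response URL is rejected, and the GET against the original URL is redirected to a freshly signed link.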

Fixes #2025

Test Plan

Just tested locally where I had a repro of the issue.

zanieb · 2024-04-06T15:15:47Z

I'll have to think about this some more. We've very explicitly used the response URL over the request URL in the past, and I worry this could break other users' workflows without further consideration.

cc @baszalmstra perhaps your team would find this an interesting async_http_range_reader edge case.

Do you know how pip handles this case? I guess they're just not doing HEAD requests.

charliermarsh · 2024-04-06T15:23:38Z

pip does do a HEAD request: https://github.com/pypa/pip/blob/7c49d06ea4be4635561f16a524e3842817d1169a/src/pip/_internal/network/lazy_wheel.py#L52. But I think you need to pass --use-feature=fast-deps to enable range requests.

charliermarsh · 2024-04-06T15:24:33Z

Oh, but they definitely don't use the response URL from the HEAD when making subsequent GET requests. So that part might be wrong?

charliermarsh · 2024-04-06T15:25:58Z

My read is that we should consider passing the URL explicitly to async_http_range_reader, and not relying on the response URL in those places. We typically need to use the response URL when (e.g.) the response returns relative paths and there was a redirect. But this is a bit different: it should be the same URL. (I might be totally misunderstanding the issue.)

baszalmstra · 2024-04-06T16:00:26Z

Thanks for the ping. I think there is a case for both options. I guess if a server reports a different redirect url based on the http method this could be problematic.

I would be happy to accept a PR in async_http_range_reader, whichever you decide.

torarvid · 2024-04-06T20:23:21Z

@charliermarsh @zanieb Just in case you didn't see it, I wrote about what led me to this PR in the comments of #2025. To sum up: I don't think this is the one-and-only right way to fix this, it's merely a proof of concept that seemed to get me past the problem 😊

Having said that, I thought the API of that range reader was a little bit strange. To me, the part where you give it the headers of the response so that it can determine whether the server supports range requests seems perfectly natural. But it's less clear to me that the URL used to make those range requests needs to be the URL in the response (and not the URL in the original request). I guess I'm somewhat biased since I've made this PR that hacks around the fact that this URL can't be overridden 😊
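The separation described here can be sketched roughly as follows: the HEAD response headers decide whether range requests are possible, while the URL for the range request is passed in explicitly. This is a sketch under assumptions and not the async_http_range_reader API. The function names and the example URL are hypothetical; only the `Accept-Ranges`, `Content-Length`, and `Range` headers are standard HTTP.

```python
from typing import Dict, Mapping, Tuple


def supports_byte_ranges(headers: Mapping[str, str]) -> bool:
    """Return True if the response advertises `Accept-Ranges: bytes`."""
    normalized = {key.lower(): value for key, value in headers.items()}
    units = normalized.get("accept-ranges", "").lower().split(",")
    return "bytes" in (unit.strip() for unit in units)


def tail_range(content_length: int, tail_size: int) -> str:
    """Build a Range header value covering the last `tail_size` bytes."""
    if content_length <= 0 or tail_size <= 0:
        raise ValueError("content_length and tail_size must be positive")
    start = max(0, content_length - tail_size)
    return f"bytes={start}-{content_length - 1}"


def plan_range_request(
    request_url: str, head_headers: Mapping[str, str], tail_size: int
) -> Tuple[str, Dict[str, str]]:
    """Use HEAD headers for capability checks, but target the caller's URL."""
    if not supports_byte_ranges(head_headers):
        raise ValueError("server does not advertise byte-range support")
    normalized = {key.lower(): value for key, value in head_headers.items()}
    length = int(normalized["content-length"])
    return request_url, {"Range": tail_range(length, tail_size)}


# Hypothetical values for illustration.
url, headers = plan_range_request(
    "https://index.example/pkg-1.0-py3-none-any.whl",
    {"Accept-Ranges": "bytes", "Content-Length": "1000"},
    tail_size=100,
)
```

Here the caller chooses which URL to pass, so either the request URL or the response URL can be used without changing how the headers are interpreted.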

charliermarsh · 2024-04-06T20:28:39Z

Yeah, I’m fairly confident we should change it to reuse the original URL. We just need to verify that it won’t regress a few other cases where we explicitly need to use a response URL. (For example, if a registry returns relative paths, those need to be relative to the response URL in the event of a redirect. But that’s a different case than the range requests we’re doing here.)
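As a small illustration of the relative-path case, the sketch below resolves a relative link against both the request URL and the response URL using Python's standard `urllib.parse.urljoin`. The hostnames and paths are hypothetical, and the redirect is assumed rather than performed.

```python
from urllib.parse import urljoin

# Hypothetical: the client requested the index page here...
request_url = "https://index.example/simple/pkg/"
# ...and was redirected, so the response came from here.
response_url = "https://mirror.example/simple/pkg/"

# A relative link as it might appear in the returned index page.
relative_link = "../../packages/pkg-1.0-py3-none-any.whl"

# Resolving against the response URL points at the host that served the page.
resolved_from_response = urljoin(response_url, relative_link)

# Resolving against the request URL points at the original host instead,
# which may not serve the file at all.
resolved_from_request = urljoin(request_url, relative_link)
```

The two results differ only in host, which is why relative links need the response URL while the range requests discussed above do not.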

zanieb · 2024-04-08T21:08:03Z

Started the upstream changes at prefix-dev/async_http_range_reader#11

charliermarsh · 2024-05-08T14:45:38Z

Superseded by #3460.

zanieb self-assigned this Apr 6, 2024

zanieb mentioned this pull request Apr 8, 2024

feat: Allow the request URL to be used for subsequent requests prefix-dev/async_http_range_reader#11

Closed

zanieb mentioned this pull request Apr 24, 2024

403 Forbidden Error when downloading dependencies from AWS/Gemfury with uv pip compile #3255

Closed

charliermarsh mentioned this pull request May 8, 2024

Upgrade async_http_range_reader to v0.8.0 #3460

Merged

charliermarsh closed this in 18d229e May 8, 2024

charliermarsh closed this in #3460 May 8, 2024
