Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify PUT location with chunked upload #415

Open
rogpeppe opened this issue May 21, 2023 · 4 comments
Open

Clarify PUT location with chunked upload #415

rogpeppe opened this issue May 21, 2023 · 4 comments

Comments

@rogpeppe
Copy link
Contributor

When uploading a blob in chunks, the spec says:

To upload a chunk, issue a PATCH request to a URL path in the following format, and with the following headers and body:

URL path: <location>

[...]

The <location> refers to the URL obtained from the preceding POST request.

This implies to me that <location> can change with each successive POST, so there are potentially
n + 1 locations in play for n chunks (one for the initial POST and one for each PATCH).

The final PUT request is documented as follows:

To close the session, issue a PUT request to a url in the following format, and with the following headers (and optional body, depending on whether or not the final chunk was uploaded already via a PATCH request):

<location>?digest=<digest>

This doesn't make it clear which <location> should be used. Should it be the location returned by the most recent PATCH request, or the <location> returned by the original POST?

@sudo-bmitch
Copy link
Contributor

My understanding is the last <location> value is the one always used for the next request, and registries will reject the usage of older values. It's certainly worth clarifying.

@rogpeppe
Copy link
Contributor Author

That certainly seems to be the case in the registries I've tried. The other language that I believe could do with clarifying is this:

To get the current status after a 416 error, issue a GET request to a URL

The refers to the URL obtained from any preceding POST or PATCH request.

That "any" sounds like it's OK to use the location from any of the sequence of previous POST or PATCH requests, but that appears not to be the case: when experimenting with the docker registry, it requires the most recent location.

There's actually a potential problem with that AFAICS: if a client is in the middle of a large upload and a network outage prevents a response reaching a client, the client might not be able to resume because they won't have the latest location value as expected by the server. I guess it's too late to change that now.

@sudo-bmitch
Copy link
Contributor

I wouldn't say things are too late to change. See #366 that was adding this recently. Chunked uploads aren't well supported, the docker engine uses them with a single large chunk when pushing images. I'm not aware of any client tooling that defaults to chunked uploads. So this is one of the safer areas to clarify without risking breaking existing use cases.

@rogpeppe
Copy link
Contributor Author

@sudo-bmitch One other possible change to make in that area would be to specify that the 416 response itself could contain information sufficient to make another correct PATCH request; for example, it could contain Location and Range headers like the GET response. That would avoid the need for the extra round trip in most cases, AFAICS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants