
Proposal: JSON Registry API V2.1 #9015

Closed
3 of 9 tasks
stevvooe opened this issue Nov 6, 2014 · 142 comments
Labels
kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny

Comments

@stevvooe
Contributor

stevvooe commented Nov 6, 2014

Proposal: JSON Registry API V2.1

Abstract

The docker registry is a service to manage information about docker images and enable their distribution. While the current registry is usable, there are several problems with the architecture that have led to this proposal. For relevant details, please see the following issues:

The main driver of this proposal is a set of changes to the docker image format, covered in #8093. The new, self-contained image manifest simplifies the image definition and the underlying backend layout. To reduce bandwidth usage, the new registry will be architected to avoid uploading existing layers and will support resumable layer uploads.

While out of scope for this specification, the URI layout of the new API will be structured to support a rich Authentication and Authorization model by leveraging namespaces.

Furthermore, to bring the docker registry in line with docker core, the registry is written in Go.

Scope

This proposal covers the URL layout and protocols of the Docker Registry V2 JSON API. This will affect the docker core registry API and the rewrite of docker-registry.

This includes the following features:

  • Namespace-oriented URI Layout
  • PUSH/PULL registry server for V2 image manifest format
  • Resumable layer PUSH support
  • V2 Client library implementation

While authentication and authorization support will influence this specification, details of the protocol will be left to a future specification. Other features marked as next generation will be incorporated when the initial support is complete. Please see the road map for details.

Use Cases

For the most part, the use cases of the former registry API apply to the new version. Differentiating use cases are covered below.

Resumable Push

Company X's build servers lose connectivity to docker registry before completing an image layer transfer. After connectivity returns, the build server attempts to re-upload the image. The registry notifies the build server that the upload has already been partially attempted. The build server responds by only sending the remaining data to complete the image file.

Resumable Pull

Company X is having more connectivity problems but this time in their deployment datacenter. When downloading an image, the connection is interrupted before completion. The client keeps the partial data and uses http Range requests to avoid downloading repeated data.

Layer Upload De-duplication

Company Y's build system creates two identical docker layers from build processes A and B. Build process A completes uploading the layer before B. When process B attempts to upload the layer, the registry indicates that it's not necessary because the layer is already known.

If processes A and B upload the same layer at the same time, both operations will proceed and the first to complete will be stored in the registry (Note: we may modify this to prevent a dogpile with some locking mechanism).

Access Control

Company X would like to control which developers can push to which repositories. By leveraging the URI format of the V2 registry, they can control who is able to access which repository, who can pull images and who can push layers.

Dependencies

Initially, a V2 client will be developed in conjunction with the new registry service to facilitate rich testing and verification. Once this is ready, the new client will be used in docker to communicate with V2 registries.

Proposal

This section covers proposed client flows and details of the proposed API endpoints. All endpoints will be prefixed by the API version and the repository name:

/v2/<name>/

For example, for an API endpoint that works with the library/ubuntu repository, the URI prefix will be:

/v2/library/ubuntu/

This scheme will provide rich access control over various operations and methods using the URI prefix and HTTP methods that can be controlled in a variety of ways.

Classically, repository names have always been two path components where each path component is less than 30 characters. The V2 registry API does not enforce this. The rules for a repository name are as follows:

  1. A repository name is broken up into path components. A component of a repository name must be at least two characters and consist of lowercase alphanumeric segments, optionally separated by periods, dashes or underscores. More strictly, it must match the regular expression [a-z0-9]+(?:[._-][a-z0-9]+)* and the matched result must be 2 or more characters in length.
  2. The name of a repository must have at least two path components, separated by a forward slash.
  3. The total length of a repository name, including slashes, must be less than 256 characters.

These name requirements apply only to the registry API, which should accept a superset of what is supported by other docker community components.
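The three naming rules above can be checked mechanically. Here is a minimal Python sketch; the helper name valid_repository_name is hypothetical and not part of any registry codebase:

```python
import re

# The component rule comes straight from the proposal:
# [a-z0-9]+(?:[._-][a-z0-9]+)* with a matched length of 2 or more.
COMPONENT_RE = re.compile(r"^[a-z0-9]+(?:[._-][a-z0-9]+)*$")

def valid_repository_name(name: str) -> bool:
    # Rule 3: total length, including slashes, must be less than 256.
    if len(name) >= 256:
        return False
    components = name.split("/")
    # Rule 2: at least two path components, separated by a forward slash.
    if len(components) < 2:
        return False
    # Rule 1: each component matches the regex and is at least 2 chars.
    return all(COMPONENT_RE.match(c) and len(c) >= 2 for c in components)
```

For example, `valid_repository_name("library/ubuntu")` passes, while a single-component name or a one-character component fails.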

API Methods

A detailed list of methods and URIs is covered in the table below:

| Method | Path | Entity | Description |
|--------|------|--------|-------------|
| GET | /v2/ | Check | Check that the endpoint implements Docker Registry API V2. |
| GET | /v2/<name>/tags/list | Tags | Fetch the tags under the repository identified by name. |
| GET | /v2/<name>/manifests/<tag> | Manifest | Fetch the manifest identified by name and tag. |
| PUT | /v2/<name>/manifests/<tag> | Manifest | Put the manifest identified by name and tag. |
| DELETE | /v2/<name>/manifests/<tag> | Manifest | Delete the manifest identified by name and tag. |
| GET | /v2/<name>/blobs/<digest> | Blob | Retrieve the blob from the registry identified by digest. |
| HEAD | /v2/<name>/blobs/<digest> | Blob | Check if the blob is known to the registry. |
| POST | /v2/<name>/blobs/uploads/ | Blob Upload | Initiate a resumable blob upload. If successful, an upload location will be provided to complete the upload. Optionally, if the digest parameter is present, the request body will be used to complete the upload in a single request. |
| GET | /v2/<name>/blobs/uploads/<uuid> | Blob Upload | Retrieve status of upload identified by uuid. The primary purpose of this endpoint is to resolve the current status of a resumable upload. |
| HEAD | /v2/<name>/blobs/uploads/<uuid> | Blob Upload | Retrieve status of upload identified by uuid. This is identical to the GET request. |
| PATCH | /v2/<name>/blobs/uploads/<uuid> | Blob Upload | Upload a chunk of data for the specified upload. |
| PUT | /v2/<name>/blobs/uploads/<uuid> | Blob Upload | Complete the upload specified by uuid, optionally appending the body as the final chunk. |
| DELETE | /v2/<name>/blobs/uploads/<uuid> | Blob Upload | Cancel outstanding upload processes, releasing associated resources. If this is not called, unfinished uploads will eventually time out. |

All endpoints should support aggressive http caching, compression and range headers, where appropriate. Details of each method are covered in the following sections.

The new API will attempt to leverage HTTP semantics where possible but may break from standards to implement targeted features.

Errors

Actionable failure conditions, covered in detail in their relevant sections, will be reported as part of 4xx responses, in a json response body. One or more errors will be returned in the following format:

{
    "errors": [{
            "code": <error identifier>,
            "message": <message describing condition>,
            "detail": <unstructured>
        },
        ...
    ]
}

The code field will be a unique identifier, all caps with underscores by convention. The message field will be a human readable string. The optional detail field may contain arbitrary json data providing information the client can use to resolve the issue.

The error codes encountered via the API are enumerated in the following table:

| Code | Message | Description | HTTP Status Codes |
|------|---------|-------------|-------------------|
| UNKNOWN | unknown error | Generic error returned when the error does not have an API classification. | Any |
| DIGEST_INVALID | provided digest did not match uploaded content | When a blob is uploaded, the registry will check that the content matches the digest provided by the client. The error may include a detail structure with the key "digest", including the invalid digest string. This error may also be returned when a manifest includes an invalid layer digest. | 400, 404 |
| SIZE_INVALID | provided length did not match content length | When a layer is uploaded, the provided size will be checked against the uploaded content. If they do not match, this error will be returned. | 400 |
| NAME_INVALID | manifest name did not match URI | During a manifest upload, if the name in the manifest does not match the URI name, this error will be returned. | 400, 404 |
| TAG_INVALID | manifest tag did not match URI | During a manifest upload, if the tag in the manifest does not match the URI tag, this error will be returned. | 400, 404 |
| NAME_UNKNOWN | repository name not known to registry | This is returned if the name used during an operation is unknown to the registry. | 404 |
| MANIFEST_UNKNOWN | manifest unknown | This error is returned when the manifest, identified by name and tag, is unknown to the repository. | 404 |
| MANIFEST_INVALID | manifest invalid | During upload, manifests undergo several checks ensuring validity. If those checks fail, this error may be returned, unless a more specific error is included. The detail will contain information about the failed validation. | 400 |
| MANIFEST_UNVERIFIED | manifest failed signature verification | During manifest upload, if the manifest fails signature verification, this error will be returned. | 400 |
| BLOB_UNKNOWN | blob unknown to registry | This error may be returned when a blob is unknown to the registry in a specified repository. This can be returned with a standard GET or if a manifest references an unknown layer during upload. | 400, 404 |
| BLOB_UPLOAD_UNKNOWN | blob upload unknown to registry | If a blob upload has been cancelled or was never started, this error code may be returned. | 404 |

While the client can take action on certain error codes, the registry may add new error codes over time. All client implementations should treat unknown error codes as UNKNOWN, allowing future error codes to be added without breaking API compatibility. For the purposes of this specification, error codes will only be added and never removed.
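The "treat unknown codes as UNKNOWN" rule can be captured in a few lines of client code. A Python sketch, with the hypothetical helper name parse_errors:

```python
import json

# The codes enumerated in the error table above.
KNOWN_CODES = {
    "UNKNOWN", "DIGEST_INVALID", "SIZE_INVALID", "NAME_INVALID",
    "TAG_INVALID", "NAME_UNKNOWN", "MANIFEST_UNKNOWN", "MANIFEST_INVALID",
    "MANIFEST_UNVERIFIED", "BLOB_UNKNOWN", "BLOB_UPLOAD_UNKNOWN",
}

def parse_errors(body: str):
    """Parse a 4xx error body, demoting unrecognized codes to UNKNOWN
    so that new server-side codes never break the client."""
    parsed = []
    for err in json.loads(body).get("errors", []):
        code = err.get("code", "UNKNOWN")
        if code not in KNOWN_CODES:
            code = "UNKNOWN"
        parsed.append((code, err.get("message", ""), err.get("detail")))
    return parsed
```

A response carrying a future, unlisted code would then surface to the caller as UNKNOWN while its message and detail are preserved.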

API Version Check

A minimal endpoint, mounted at /v2/, will provide version support information based on its response statuses. The request format is as follows:

GET /v2/

If a 200 OK response is returned, the registry implements the V2(.1) registry API and the client may proceed safely with other V2 operations. Optionally, the response may contain information about the supported paths in the response body. The client should be prepared to ignore this data.

If a 401 Unauthorized response is returned, the client should take action based on the contents of the "WWW-Authenticate" header and try the endpoint again. Depending on access control setup, the client may still have to authenticate against different resources, even if this check succeeds.

If a 404 Not Found response, or other unexpected status, is returned, the client should proceed with the assumption that the registry does not implement V2 of the API.
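The decision logic above is small enough to sketch directly. The function name v2_support is hypothetical:

```python
def v2_support(status: int) -> str:
    """Interpret the status of a GET /v2/ version check.

    200 -> the registry implements the V2(.1) API; proceed.
    401 -> act on the WWW-Authenticate header and retry the check.
    404 or anything else -> assume the registry does not implement V2.
    """
    if status == 200:
        return "supported"
    if status == 401:
        return "authenticate"
    return "unsupported"
```

Note that even after a successful check, access control may still require the client to authenticate against individual resources.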

Pulling An Image

An "image" is a combination of a JSON manifest and individual layer files. The process of pulling an image centers around retrieving these two components.

The first step in pulling an image is to retrieve the manifest. For reference, the relevant manifest fields for the registry are the following:

| Field | Description |
|-------|-------------|
| name | The name of the image. |
| tag | The tag for this version of the image. |
| fsLayers | A list of layer descriptors (including tarsum) |
| signature | A JWS used to verify the manifest content |

For more information about the manifest format, please see docker/docker#8093.

When the manifest is in hand, the client must verify the signature to ensure the names and layers are valid. Once confirmed, the client will then use the tarsums to download the individual layers. Layers are stored as blobs in the V2 registry API, keyed by their tarsum digest.

The API details follow.

Pulling an Image Manifest

The image manifest can be fetched with the following url:

GET /v2/<name>/manifests/<tag>

The "name" and "tag" parameters identify the image and are required.

A 404 Not Found response will be returned if the image is unknown to the registry. If the image exists and the response is successful, the image manifest will be returned, with the following format (see #8093 for details):

{
   "name": <name>,
   "tag": <tag>,
   "fsLayers": [
      {
         "blobSum": <tarsum>
      },
      ...
   ],
   "history": <v1 images>,
   "signature": <JWS>
}

The client should verify the returned manifest signature for authenticity before fetching layers.
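After verification, the client walks the fsLayers array and fetches each blobSum. A minimal Python sketch of that extraction step (layer_digests is a hypothetical helper; JWS verification is assumed to have happened already and is out of scope here):

```python
import json

def layer_digests(manifest_json: str) -> list:
    """Collect the blobSum of every fsLayer in a manifest.

    Each digest is then fetched with GET /v2/<name>/blobs/<tarsum>.
    Signature verification must precede this step.
    """
    manifest = json.loads(manifest_json)
    return [layer["blobSum"] for layer in manifest.get("fsLayers", [])]
```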

Pulling a Layer

Layers are stored in the blob portion of the registry, keyed by tarsum digest. Pulling a layer is carried out by a standard http request. The URL is as follows:

GET /v2/<name>/blobs/<tarsum>

Access to a layer will be gated by the name of the repository but is identified uniquely in the registry by tarsum. The tarsum parameter is an opaque field, to be interpreted by the tarsum library.

This endpoint may issue a 307 (302 for <HTTP 1.1) redirect to another service for downloading the layer and clients should be prepared to handle redirects.

This endpoint should support aggressive HTTP caching for image layers. Support for Etags, modification dates and other cache control headers should be included. To allow for incremental downloads, Range requests should be supported, as well.

Pushing An Image

Pushing an image works in the opposite order as a pull. After assembling the image manifest, the client must first push the individual layers. When the layers are fully pushed into the registry, the client should upload the signed manifest.

The details of each step of the process are covered in the following sections.

Pushing a Layer

All layer uploads use two steps to manage the upload process. The first step starts the upload in the registry service, returning a URL to carry out the second step. The second step uses that upload URL to transfer the actual data. Uploads are started with a POST request, which returns a URL that can be used to push data and check upload status.

The Location header will be used to communicate the upload location after each request. While it won't change in this specification, clients should use the most recent value returned by the API.

Starting An Upload

To begin the process, a POST request should be issued in the following format:

POST /v2/<name>/blobs/uploads/

The only parameter of this request is the name of the image namespace under which the layer will be linked. Responses to this request are covered below.

Existing Layers

The existence of a layer can be checked via a HEAD request to the blob store API. The request should be formatted as follows:

HEAD /v2/<name>/blobs/<digest>

If the layer with the tarsum specified in digest is available, a 200 OK response will be received, with no actual body content (as the HTTP specification requires for HEAD requests). The response will look as follows:

200 OK
Content-Length: <length of blob>

When this response is received, the client can assume that the layer is already available in the registry under the given name and should take no further action to upload the layer. Note that the binary digests may differ for the existing registry layer, but the tarsums will be guaranteed to match.

Uploading the Layer

If the POST request is successful, a 202 Accepted response will be returned with the upload URL in the Location header:

202 Accepted
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>
Content-Length: 0

The rest of the upload process can be carried out with the returned URL, called the "Upload URL", from the Location header. All responses to the upload URL, whether sending data or getting status, will be in this format. Though the URI format (/v2/<name>/blobs/uploads/<uuid>) for the Location header is specified, clients should treat it as an opaque URL and should never try to assemble it. While the uuid parameter may be an actual UUID, this proposal imposes no constraints on the format and clients should never impose any.

Upload Progress

The progress and chunk coordination of the upload process will be coordinated through the Range header. While this is a non-standard use of the Range header, there are examples of similar approaches in heavily used APIs. For an upload that has just started, for example with a 1000 byte layer file, the Range header would be as follows:

Range: bytes=0-0

To get the status of an upload, issue a GET request to the upload URL:

GET /v2/<name>/blobs/uploads/<uuid>
Host: <registry host>

The response will be similar to the above, except it will return a 204 status:

204 No Content
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>

Note that HTTP Range header byte ranges are inclusive, and that convention will be honored even in non-standard use cases.
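To illustrate the inclusive-range convention, a small Python sketch (the helper name next_offset is hypothetical) turns the Range header from an upload status response into the offset where the next chunk should begin:

```python
import re

# Accepts both "bytes=0-99" and bare "0-99" forms seen in this proposal.
RANGE_RE = re.compile(r"^(?:bytes=)?(\d+)-(\d+)$")

def next_offset(range_header: str) -> int:
    """Return the offset of the next chunk, given an inclusive range.

    'bytes=0-99' means bytes 0 through 99 are stored, so the next
    chunk starts at 100. Note: the proposal's 'bytes=0-0' for a
    freshly started upload is ambiguous under inclusive semantics;
    this sketch does not special-case it.
    """
    m = RANGE_RE.match(range_header.strip())
    if not m:
        raise ValueError("unparseable Range header: %r" % range_header)
    start, end = int(m.group(1)), int(m.group(2))
    if start != 0:
        raise ValueError("upload status ranges always start at 0")
    return end + 1
```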

Monolithic Upload

A monolithic upload is simply a chunked upload with a single chunk and may be favored by clients that would like to avoid the complexity of chunking. To carry out a "monolithic" upload, one can simply put the entire content blob to the provided URL:

PUT /v2/<name>/blobs/uploads/<uuid>?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of layer>
Content-Type: application/octet-stream

<Layer Binary Data>

The "digest" parameter must be included with the PUT request. Please see the Completed Upload section for details on the parameters and expected responses.

Additionally, the upload can be completed with a single POST request to the uploads endpoint, including the "size" and "digest" parameters:

POST /v2/<name>/blobs/uploads/?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of layer>
Content-Type: application/octet-stream

<Layer Binary Data>

On the registry service, this should allocate an upload, accept and verify the data, and return the same response as the final chunk of an upload. If the POST request fails to collect the data in any way, the registry should attempt to return an error response to the client with the Location header providing a place to continue the upload.

The single POST method is provided for convenience and most clients should implement POST + PUT to support reliable resume of uploads.

Chunked Upload

To carry out an upload of a chunk, the client can specify a range header and only include that part of the layer file:

PATCH /v2/<name>/blobs/uploads/<uuid>
Content-Length: <size of chunk>
Content-Range: <start of range>-<end of range>
Content-Type: application/octet-stream

<Layer Chunk Binary Data>

There is no enforcement on layer chunk splits other than that the server must receive them in order. The server may enforce a minimum chunk size. If the server cannot accept the chunk, a 416 Requested Range Not Satisfiable response will be returned and will include a Range header indicating the current status:

416 Requested Range Not Satisfiable
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<last valid range>
Content-Length: 0

If this response is received, the client should resume from the "last valid range" and upload the subsequent chunk. A 416 will be returned under the following conditions:

  • Invalid Content-Range header format
  • Out of order chunk: the range of the next chunk must start after the "last valid range" from the last response.
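Recovering from a 416 amounts to re-issuing the PATCH starting at the byte after the last valid range. A minimal sketch of building the chunk request headers (chunk_headers is a hypothetical helper name, not part of the API):

```python
def chunk_headers(offset: int, chunk: bytes) -> dict:
    """Headers for a PATCH of one chunk starting at `offset`.

    The Content-Range end is inclusive, matching the proposal's
    usage, so a 50-byte chunk at offset 100 covers bytes 100-149.
    """
    return {
        "Content-Length": str(len(chunk)),
        "Content-Range": "%d-%d" % (offset, offset + len(chunk) - 1),
        "Content-Type": "application/octet-stream",
    }
```

After a 416 reporting a last valid range of 0-99, the client would call this with offset 100 and the remaining data.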

When a chunk is accepted as part of the upload, a 202 Accepted response will be returned, including a Range header with the current upload status:

202 Accepted
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>
Content-Length: 0
Completed Upload

For an upload to be considered complete, the client must submit a PUT request on the upload endpoint with a digest parameter. If it is not provided, the upload will not be considered complete. The format for the final chunk will be as follows:

PUT /v2/<name>/blobs/uploads/<uuid>?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of chunk>
Content-Range: <start of range>-<end of range>
Content-Type: application/octet-stream

<Last Layer Chunk Binary Data>

Optionally, if all chunks have already been uploaded, a PUT request with a digest parameter and zero-length body may be sent to complete and validate the upload. Multiple "digest" parameters may be provided with different digests. The server may verify none or all of them but must notify the client if the content is rejected.

When the last chunk is received and the layer has been validated, the client will receive a 201 Created response:

201 Created
Location: /v2/<name>/blobs/<tarsum>
Content-Length: 0

The Location header will contain the registry URL to access the accepted layer file.

Digest Parameter

The "digest" parameter is designed as an opaque parameter to support verification of a successful transfer. The initial version of the registry API will support a tarsum digest, in the standard tarsum format. For example, an HTTP URI parameter might be as follows:

tarsum.v1+sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b

Given this parameter, the registry will verify that the provided content does result in this tarsum. Optionally, the registry can support other digest parameters for non-tarfile content stored as a layer. A regular hash digest might be specified as follows:

sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b

Such a parameter would be used to verify the binary content (as opposed to the tar content) at the end of the upload process.

For the initial version, registry servers are only required to support the tarsum format.
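Although the parameter is formally opaque to the client, it can be useful to distinguish the two forms shown above. A loose Python sketch (classify_digest and the regexes are assumptions for illustration, not part of the spec):

```python
import re

# Loose shapes for the two digest forms shown above.
TARSUM_RE = re.compile(r"^tarsum(?:\.\w+)?\+(\w+):([0-9a-f]+)$")
PLAIN_RE = re.compile(r"^(\w+):([0-9a-f]+)$")

def classify_digest(param: str) -> str:
    """Best-effort classification of a digest parameter.

    Returns 'tarsum' for e.g. tarsum.v1+sha256:<hex>, 'plain' for
    e.g. sha256:<hex>, and 'unknown' otherwise.
    """
    if TARSUM_RE.match(param):
        return "tarsum"
    if PLAIN_RE.match(param):
        return "plain"
    return "unknown"
```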

Canceling an Upload

An upload can be cancelled by issuing a DELETE request to the upload endpoint. The format will be as follows:

DELETE /v2/<name>/blobs/uploads/<uuid>

After this request is issued, the upload uuid will no longer be valid and the registry server will dump all intermediate data. While uploads will time out if not completed, clients should issue this request if they encounter a fatal error but still have the ability to issue an http request.

Errors

If a 502, 503 or 504 error is received, the client should assume the failure is due to a temporary condition and that the upload can proceed, honoring the appropriate retry mechanism. Other 5xx errors should be treated as terminal.

If there is a problem with the upload, a 4xx error will be returned indicating the problem. After receiving a 4xx response (except 416, as called out above), the upload will be considered failed and the client should take appropriate action.

The following table covers the various error conditions that may be returned after completing a layer upload:

| Code | Message |
|------|---------|
| DIGEST_INVALID | provided digest did not match uploaded content |
| SIZE_INVALID | provided size did not match content size |

Note that the upload url will not be available forever. If the upload uuid is unknown to the registry, a 404 Not Found response will be returned and the client must restart the upload process.

Pushing an Image Manifest

Once all of the layers for an image are uploaded, the client can upload the image manifest. An image can be pushed using the following request format:

PUT /v2/<name>/manifests/<tag>

{
   "name": <name>,
   "tag": <tag>,
   "fsLayers": [
      {
         "blobSum": <tarsum>
      },
      ...
   ],
   "history": <v1 images>,
   "signature": <JWS>,
   ...
}

The name and tag fields of the request body must match those specified in the URL.

If there is a problem with pushing the manifest, a relevant 4xx response will be returned with a JSON error message. The following table covers the various error conditions and their corresponding codes:

| Code | Message |
|------|---------|
| NAME_INVALID | Manifest name did not match URI |
| TAG_INVALID | Manifest tag did not match URI |
| MANIFEST_INVALID | Returned when an invalid manifest is received |
| MANIFEST_UNVERIFIED | Manifest failed signature validation |
| BLOB_UNKNOWN | Referenced layer not available |

For the BLOB_UNKNOWN error, the detail field of the error response will have an "unknown" field with information about the missing layer. For now, that will just be the tarsum. There will be an error returned for each unknown blob. The response format will be as follows:

{
    "errors": [{
            "code": "BLOB_UNKNOWN",
            "message": "Referenced layer not available",
            "detail": {
                "unknown": {
                    "blobSum": <tarsum>
                }
            }
        },
        ...
    ]
}
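A client that receives this response needs to collect the missing tarsums so it knows exactly which layers to (re)upload before retrying the manifest push. A Python sketch (missing_layers is a hypothetical helper; it accepts both code spellings since the proposal writes this error as both BLOB_UNKNOWN and UNKNOWN_LAYER):

```python
import json

def missing_layers(body: str) -> list:
    """Extract the blobSum of every unknown layer from a manifest-push
    error response. Accepts both code spellings used in the proposal."""
    sums = []
    for err in json.loads(body).get("errors", []):
        if err.get("code") in ("BLOB_UNKNOWN", "UNKNOWN_LAYER"):
            detail = err.get("detail") or {}
            blob = detail.get("unknown", {}).get("blobSum")
            if blob:
                sums.append(blob)
    return sums
```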

Listing Image Tags

It may be necessary to list all of the tags under a given repository. The tags for an image repository can be retrieved with the following request:

GET /v2/<name>/tags/list

The response will be in the following format:

200 OK
Content-Type: application/json

{
    "name": <name>,
    "tags": [
        <tag>,
        ...
    ]
}

For repositories with a large number of tags, this response may be quite large, so care should be taken by the client when parsing the response to reduce copying.

Deleting an Image

An image may be deleted from the registry via its name and tag. A delete may be issued with the following request format:

DELETE /v2/<name>/manifests/<tag>

If the image exists and has been successfully deleted, the following response will be issued:

202 Accepted
Content-Length: None

If the image had already been deleted or did not exist, a 404 Not Found response will be issued instead.

Roadmap

  • Write Registry REST API V2 proposal
    • Solicit feedback
  • Implement V2 API server
    • Basic Layer API
    • Basic Image API
    • Resumable upload support
  • Implement V2 API client
  • Implement API compliance tests
  • Port docker core to use client from registry project for v2 pushes

Reviewers

@thaJeztah
Member

At a first glance, looks good! Will have a proper re-read at a later stage.

One thing I noticed are the proposed JSON error messages; perhaps they could be "namespaced" as well, by reversing the parts, ie

INVALID_TAG would become TAG_INVALID

Perhaps a more "rich" approach could be taken by combining "global" error-types with the namespace / object they affect, so that it is easier to handle. (e.g. format-error and tag)

Finally; the proposal describes returning a single error-code, which can be limiting. Being able to return multiple errors could offer more flexibility.

@stevvooe
Contributor Author

stevvooe commented Nov 7, 2014

@thaJeztah Thanks for the suggestion about namespacing the errors. I'll play around with it.

Do you have examples for the registry API use case where we'd like to see multiple errors returned? Or are you suggesting this as a measure of future-proofing? Either way, it's a good suggestion.

@thaJeztah
Member

Regarding the end points (first impression, will add more suggestions in a later stage); for consistency, a different approach could be taken;

| endpoint | description |
|----------|-------------|
| /v2/<name>/ | returns a list of all images in <name> |
| /v2/<name>/<image>/ | returns a list of all tags available for <image> |
| /v2/<name>/<image>/<tag>/ | returns the manifest of <tag> |

This will make <tag> a "required" part of the URL to fetch a manifest. I think that's actually a good thing, because some images don't have a :latest tag. Making the <tag> required will more clearly state what the intended manifest is.

@thaJeztah
Member

@stevvooe I think the Twitter API does this, but it's not the best example of a good API https://dev.twitter.com/overview/api/response-codes I can try to find better examples, I know I saw some when doing some research for an API I was working on.

@thaJeztah
Member

upload progress

Wondering if a separate endpoint/request is required to check upload progress. I'll need to dig a bit deeper into this for the technical side, but I think it would be possible to have the server respond with the current upload progress while uploading? For reference, see https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Using_XMLHttpRequest#Monitoring_progress and https://dvcs.w3.org/hg/progress/raw-file/tip/Overview.html

edit I wasn't thinking when adding those links, because those are pure client-side
implementations and require no feedback from the server.

Additionally, as @wking pointed out (#9015 (comment)), an endpoint is required for resuming
partial uploads.

@stevvooe
Contributor Author

stevvooe commented Nov 7, 2014

@thaJeztah

Errors

I think the Twitter API does this, but it's not the best example of a good API https://dev.twitter.com/overview/api/response-codes I can try to find better examples, I know I saw some when doing some research for an API I was working on.

that is simple enough and a valid addition.

Tag API Layout

The current desire for this api is to continue to support the notion of a default tag for a repository, aka "latest", so we won't be able to repurpose the tag-less URI to list tags. We may want to reconsider that but it might be better not to overload that API method.

Based on your response, it also seems the contents of <name> should be clarified in the specification or the specification for image name needs to be referenced (does anyone know where?). The <name> component, for the purpose of this API, represents the full "image name" and the contents of the "name" field in the new image manifest format. For a user, it would be something like stevvooe/hot-new-thing and for an image listed as ubuntu, it would be library/ubuntu.

This will make <tag> a "required" part of the URL to fetch a manifest. I think that's actually a good thing, because some images don't have a :latest tag. Making the <tag> required, will more clearly state what the intended manifest is.

As far as I understand, all images have a tag but if the tag is not specified, the default tag of "latest" is used. We may need to clarify the relationship between image, tag and repository.

Upload Progress

Upload progress is served via the GET method, while uploads are using PUT, so grabbing progress concurrently should be supported with separate requests. The progress will only be reported on the server after a chunk is accepted and only at the granularity of the chunk size. Otherwise, maintaining backend consistency of upload state would be problematic.

Keep in mind, the purpose of this feature is not to broadcast the upload progress to other consumers. Rather, the goal is to manage resumable uploads. There is no reason one could not use this feature with the progress monitoring capability of XMLHttpRequest, but extra work would be required if uploading multiple chunks.

@wking

wking commented Nov 7, 2014

On Thu, Nov 06, 2014 at 03:39:35PM -0800, Stephen Day wrote:

Pulling a Layer

Pulling a layer is carried out by a standard http request. The URL is as follows:

GET /v2/<name>/layer/<tarsum>


This endpoint should support aggressive HTTP caching for image
layers. Support for etags, modification dates and other cache
control headers should be included. To allow for incremental
downloads, Range requests should be supported, as well.

If I understand correctly, this is just the (possibly compressed?
docker-archive/docker-registry#694) tarball with a layer's filesystem changes.
I don't see why you need etags, modification dates, etc. while caching
that. It should be immutable, content-addressable data, so anyone can
cache it wherever they like for as long as they want without fear of
their cached value going stale.

The only places where you'd want to limit caching via ETags,
Last-Modified, and the like would be for mutable data, and that's just
manifests, tag values, and tag lists.
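That split could be expressed as a header policy along these lines. This is a sketch only: the path-based routing, cache lifetime, and ETag handling are assumptions for illustration, not part of the proposal.

```python
# Sketch of the caching split argued for above: immutable,
# content-addressed layers get an effectively permanent cache lifetime
# with no validator, while mutable resources (manifests, tag values,
# tag lists) must be revalidated. Paths and lifetimes are illustrative.
def cache_headers(path, etag=None):
    if "/layer/" in path:
        # Content under a tarsum never changes, so it can be cached
        # indefinitely without fear of going stale.
        return {"Cache-Control": "public, max-age=31536000"}
    headers = {"Cache-Control": "no-cache"}  # always revalidate mutable data
    if etag:
        headers["ETag"] = etag
    return headers

print(cache_headers("/v2/library/ubuntu/layer/tarsum.v1+sha256:abc"))
print(cache_headers("/v2/library/ubuntu/image/latest", etag='"v1"'))
```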

It would be nice if there was a way to upload descriptions for the
search engine too, but maybe that's part of a different spec? Or part
of the image metadata?

@wking

wking commented Nov 7, 2014

On Thu, Nov 06, 2014 at 05:16:34PM -0800, Stephen Day wrote:

This will make <tag> a required part of the URL to fetch a
manifest. I think that's actually a good thing, because some
images don't have a :latest tag. Making the <tag> required will
more clearly state what the intended manifest is.

As far as I understand, all images have a tag but if the tag is not
specified, the default tag of "latest" is used. We may need to
clarify the relationship between image, tag and repository.

I don't know what the Git wire protocol looks like for this, but we
could follow their lead and have a configurable, per-repository
default tag that just defaults to ‘latest’ (like ‘HEAD’ defaulting to
‘master’). Then folks without a ‘latest’ tag can:

PUT /v2/<name>/image

which would upload the image (if it wasn't already uploaded) like:

PUT /v2/<name>/image/<tag>

but it would also set the default tag (to whichever tag was in the
uploaded image's metadata).

@wking

wking commented Nov 7, 2014

On Thu, Nov 06, 2014 at 08:34:24PM -0800, W. Trevor King wrote:

I don't know what the Git wire protocol looks like for this, but we
could follow their lead and have a configurable, per-repository
default tag that just defaults to ‘latest’ (like ‘HEAD’ defaulting
to ‘master’). Then folks without a ‘latest’ tag can:

PUT /v2/<name>/image

And if you wanted to get really radical, you could have everyone do
this and drop explicit ‘latest’ tags entirely. Then you could have
immutable tags, and restrict the mutable information to just:

  • The default tag
  • The list of available tags

So folks performing an unqualified:

$ docker pull debian

would get library/debian:7.7 (or whatever the default tag was) without
the need for aliases (#8141).

On the other hand, you'd have to have separate names if you wanted
multiple mutable references (e.g. library/debian6 for the newest 6.x
release, and library/debian7 for the newest 7.x release). I'm not
sure how many docker repositories use multiple mutable references, so
I don't know whether the benefits to having immutable tags (where
library/debian6:6.0 always points to the same image, even if someone
pushes a library/debian6:6.0.10) outweigh the annoyance of names like
debian6 for those repositories.

@wking

wking commented Nov 7, 2014

On Thu, Nov 06, 2014 at 05:16:34PM -0800, Stephen Day wrote:

Keep in mind, the purpose of this feature is not to broadcast the
upload progress to other consumers. Rather, the goal is to manage
resumable uploads. There is no reason one could not use this feature
with the progress monitoring capability of XMLHttpRequest but
extra work would be required if uploading multiple chunks.

I haven't used XMLHttpRequest's ProgressEvents myself, but looking
over the suggested names [1], I get the impression that these are
purely client-side (loadstart = “I've started sending”, progress = “I
sent a chunk”), not server-generated progress updates. I don't think
that helps resumable uploads at all, because putting a chunk on the
wire doesn't mean the registry actually gets it. I agree with the
original spec that you need an independent way to request how much of
your previous upload (for session <uuid>) the registry received, so
you know where to start the next upload attempt.

@thaJeztah
Member

I get the impression that these are purely client-side

Yes, you are right. I realised after posting that the examples
I gave were completely bogus.

Basically, what I wanted to link to, is an example where
the server would "stream" progress information back during
the upload, so that "polling" the server or opening a second
request to get that information wouldn't be necessary.

I'm not really sure if that's possible, and the "resume" reason
is something I didn't include in my consideration.

@thaJeztah
Member

The current desire for this api is to continue to support the notion of a default tag for a repository, aka "latest"

But should this be a default that the repository uses, or the client that calls the repository? If I'm correct, the docker client currently requests the :latest tag automatically if no tag is specified; doing this in both places (client and repository) seems wrong. (And, as mentioned, not all images have a :latest tag?)

@ncdc
Contributor

ncdc commented Nov 7, 2014

I don't like that GET /v2/<name>/image gives you back the information about the latest tag while GET /v2/<name>/image/<tag> gives you information about a specific tag. I am strongly in favor of eliminating the former route that defaults to latest and only having the latter where you must explicitly specify the tag you want to retrieve. If you want to default things to latest, that can be done in the clients.

@ncdc
Contributor

ncdc commented Nov 7, 2014

Have you thought at all about how to implement quotas? What if I'm an admin and I want to limit each namespace to e.g. 1GB of unique layer content? Here's an example: my namespace is currently at 700MB and I'm pushing a new image/tag that has 400MB of unique layer content split between 2 layers (299MB and 101MB). First my client would push the 299MB layer (which keeps me under quota), then my client would attempt to push the 101MB layer (which should not be allowed because that would put me over quota). At this point, there's an orphaned 299MB layer that should be deleted. In an ideal case, the registry should never have allowed that layer to have been uploaded in the first place.

Would it be possible to take the overall size of the new layers into account at the beginning of a push?
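One way the check could work, sketched under the assumption that the client can declare all new layer sizes at the start of a push, is to evaluate the push as a whole before accepting any layer:

```python
# Hypothetical quota pre-check for the scenario above: the registry
# evaluates the total size of all new layers before accepting any of
# them, so it never stores a layer that a later layer in the same push
# would make over-quota (avoiding the orphaned 299MB layer).
def push_allowed(current_usage, new_layer_sizes, quota):
    return current_usage + sum(new_layer_sizes) <= quota

MB = 1024 ** 2
# 1GB quota, 700MB used, pushing 299MB + 101MB of unique layer content:
print(push_allowed(700 * MB, [299 * MB, 101 * MB], 1024 * MB))  # False
```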

@wking

wking commented Nov 7, 2014

On Fri, Nov 07, 2014 at 06:57:32AM -0800, Andy Goldstein wrote:

If you want to default things to latest, that can be done in the clients.

I agree, unless we want folks to be able to configure the default tag
that gets pulled on a per-repo basis (setting it to things other than
‘latest’ [1], like you can set Git's HEAD). That would have to
happen in the registry repository.

@ncdc
Contributor

ncdc commented Nov 7, 2014

I agree, unless we want folks to be able to configure the default tag
that gets pulled on a per-repo basis (setting it to things other than
‘latest’ [1], like you can set Git's HEAD). That would have to
happen in the registry repository.

That sounds fine to me.

Re [1], I'm not clear why you'd need multiple repos to support debian:6.0 and debian:7.0? And what's the motivation for immutable tags?

[1] #9015 (comment)

@wking

wking commented Nov 7, 2014

On Fri, Nov 07, 2014 at 08:37:19AM -0800, Andy Goldstein wrote:

Re [1], I'm not clear why you'd need multiple repos to support
debian:6.0 and debian:7.0?

You don't need immutable tags with a configurable default branch, but
my next comment [1] explains how I think having a single mutable
default-tag reference with immutable tags covers most of the use-cases
I can think of for ‘latest’. However, if you only have one mutable
tag-reference per repo, you can't have something sliding for 6.x and
something else sliding for 7.x unless you have two repositories.

And what's the motivation for immutable tags?

Predictable results for a given tag, no need for alias fetching, easy
caching, and content-addressable storage with a fixed address.

@ncdc
Contributor

ncdc commented Nov 7, 2014

Ah, I see what you're saying. But, I can definitely think of use cases where multiple mutable tags would be useful. For example, using tags to signify when an image is "QA-ready", what should be deployed to "staging" or "production", etc. I wouldn't want separate repos just to be able to have sliding tags for these different targets.

/cc @smarterclayton

@wking

wking commented Nov 7, 2014

On Fri, Nov 07, 2014 at 08:53:53AM -0800, Andy Goldstein wrote:

But, I can definitely think of use cases where multiple mutable tags
would be useful. For example, using tags to signify when an image is
"QA-ready", what should be deployed to "staging" or "production",
etc. I wouldn't want separate repos just to be able to have sliding
tags for these different targets.

Right. Hence my [1]:

“I'm not sure how many docker repositories use multiple mutable
references, so I don't know whether the benefits to having immutable
tags … outweigh the annoyance of names like debian6 for those
repositories.”

You could certainly have foo/bar-QA-ready, foo/bar-staging,
foo/bar-production, …. I don't even think it would be that hard to
maintain. You'd lose easy mass-push, but I doubt you'd be releasing
to multiple streams simultaneously unless you were populating a fresh
repository. What else would be more difficult with that workflow?

Still, immutable tags aren't that big a win. Folks who care about
predictable results from a given tag can just use the patch-level tags
and trust the maintainers not to mess with those ;). And a bit of
extra cache-checking to make sure you had a fairly recent version of
the tag isn't that hard to do.

“no need for alias fetching” [2] is actually a feature of having a
default-tag reference, so strike that from the list of benefits to
immutable tags.

@stevvooe
Contributor Author

stevvooe commented Nov 7, 2014

Tags

@wking @ncdc @thaJeztah

We are going to drop the notion of default "latest" from the registry API and will leave that "sugar" to the client to resolve. Concretely, if a user only specifies "stevvooe/foo", it would be up to the client to fill in "latest" as the default tag.

While the rest of the discussion about tags (aliases, immutable vs mutable) is constructive and you all make excellent points, changes to the tagging scheme are outside of the scope of this proposal.

HTTP Caching

@wking

The only places where you'd want to limit caching via ETags,
Last-Modified, and the like would be for mutable data, and that's just
manifests, tag values, and tag lists.

This section is indicating the immutable nature of the layer files should be leveraged at the HTTP caching layer, allowing docker clients to make a quick determination about the existence of the layer with a 304 response. Everything that can be done will be done to ensure that HTTP standard clients (read: proxies) will cache the content.

Any other caching support for tags and manifests will be implemented as needed, depending on the nature of the resource.

Quotas

@ncdc

This is an interesting request, but it's outside the scope of this first revision. Could you file a feature request issue in docker/docker-registry with the prefix "NG:"?

@wking

wking commented Nov 7, 2014

On Fri, Nov 07, 2014 at 11:15:15AM -0800, Stephen Day wrote:

We are going to drop the notion of default "latest" from the
registry API and will leave that "sugar" to the client to resolve.

In that case I agree with @thaJeztah [1] and @ncdc [2] that we should
probably eliminate any mention of ‘latest’ from the spec.

While the rest of the discussion about tags (aliases, immutable vs
mutable) is constructive and you all make excellent points, changes
to the tagging scheme are outside of the scope of this proposal.

Fair enough ;).

The only places where you'd want to limit caching via ETags,
Last-Modified, and the like would be for mutable data, and that's
just manifests, tag values, and tag lists.

This section is indicating the immutable nature of the layer files
should be leveraged at the HTTP caching layer, allowing docker
clients to make a quick determination about the existence of the
layer with a 304 response. Everything that can be done will be done
to ensure that HTTP standard clients (read: proxies) will cache the
content.

Right. Are you going to use all of that for caching immutable
stuff? Can't you just set:

Expires: Fri, 1 Jan 2038 03:14:07 GMT

and be done with it? I don't see why you'd also want to set ETags,
Last-Modified, ….

@ncdc
Contributor

ncdc commented Nov 7, 2014

@stevvooe re quotas, if you don't take it into account up front, I'm worried that it won't be possible going forward, at least not the ideal case where you disallow any layer push if the sum of the layers in the "transaction" would put you over quota. @dmp42 what are your thoughts on this?

@stevvooe
Contributor Author

stevvooe commented Nov 7, 2014

Changes:

  • added "errors" envelope to error responses and adjusted unknown layers response accordingly
  • dropped implicit "latest" tag from all API methods
  • length and checksum submission moved to end of layer upload so client doesn't have to precompute
  • added the ability to explicitly cancel an upload

@ncdc
Contributor

ncdc commented Nov 7, 2014

Added docker-archive/docker-registry#698 for tracking the quota request.

@stevvooe
Contributor Author

stevvooe commented Nov 7, 2014

@ncdc Thank you for filing the issue!

I don't think there is anything within this proposal that prevents quotas from being implemented. Enforcement can be within these core API methods, but a management API could be added such that the client can avoid hitting those quotas before making uploads.

We'll take the full discussion to docker-archive/docker-registry#698.

@wking

wking commented Nov 7, 2014

On Fri, Nov 07, 2014 at 12:00:20PM -0800, Stephen Day wrote:

  • length and checksum submission moved to end of layer upload so
    client doesn't have to precompute

You have to precompute the tarsum, so you might as well precompute
these while you're at it.

Also, the spec now has:

PUT PUT /v2/<name>/image/<tag>

which should just be:

PUT /v2/<name>/image/<tag>

Also, I think:

POST /v2/<name>/layer/<tarsum>

should be a PUT call, because you're pushing to the same URI you'll be
fetching from [1]:

“The fundamental difference between the POST and PUT requests is
reflected in the different meaning of the Request-URI. The URI in a
POST request identifies the resource that will handle the enclosed
entity. … In contrast, the URI in a PUT request identifies the
entity enclosed with the request…

@stevvooe
Contributor Author

stevvooe commented Nov 7, 2014

@wking Thank you again for your careful feedback! I'll make sure the typos are corrected.

You have to precompute the tarsum, so you might as well precompute these while you're at it.

This allows the client to be as lazy as possible.

Also, I think:

POST /v2/<name>/layer/<tarsum>

should be a PUT call, because you're pushing to the same URI you'll be
fetching from [1]:

“The fundamental difference between the POST and PUT requests is
reflected in the different meaning of the Request-URI. The URI in a
POST request identifies the resource that will handle the enclosed
entity. … In contrast, the URI in a PUT request identifies the
entity enclosed with the request…

From RFC 2616, section 9.5:

The actual function performed by the POST method is determined by the
server and is usually dependent on the Request-URI. The posted entity
is subordinate to that URI in the same way that a file is subordinate
to a directory containing it, a news article is subordinate to a
newsgroup to which it is posted, or a record is subordinate to a
database.

POST is used here because the resulting creation is subordinate to the layer
URI. Arguably, the following would be a better POST URI:

POST /v2/<name>/layer/<tarsum>/upload/

POST is also used here because the request is not idempotent, in that
multiple requests to the same endpoint will result in creating multiple
uploads. Use of PUT would be incorrect.

I'll add the "/upload/" suffix.
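That POST-then-upload shape can be mimicked with a toy model like the one below. The uuid-suffixed Location is an illustrative guess at how the subordinate upload resource might be named, not part of the proposal.

```python
# Toy model of why POST (not PUT) starts an upload: each POST creates a
# *new* subordinate upload resource under the layer URI, so repeating
# the request is not idempotent. The uuid-based Location scheme is an
# assumption for illustration only.
import uuid

class LayerUploadEndpoint:
    def __init__(self):
        self.sessions = {}

    def post(self, name, tarsum):
        # POST /v2/<name>/layer/<tarsum>/upload/ -> new upload resource,
        # whose URL would be returned to the client in a Location header.
        location = "/v2/%s/layer/%s/upload/%s" % (name, tarsum, uuid.uuid4())
        self.sessions[location] = b""
        return location

endpoint = LayerUploadEndpoint()
a = endpoint.post("stevvooe/foo", "tarsum+sha256:abc")
b = endpoint.post("stevvooe/foo", "tarsum+sha256:abc")
print(a != b)  # True: repeating the POST creates a second, distinct upload
```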

@nealmcb
Contributor

nealmcb commented Feb 25, 2015

Re: searching for images by tag etc:

@dmp42
search will be implemented as an extension (see docker-archive/docker-registry#613 and docker-archive/docker-registry#687).

Indexing image name, tag, and creation date will certainly be part of it.

Thanks. The issues you reference are closed, but it isn't clear to me if the resolutions include the ability to search by tag. Does anyone have an update on this that responds e.g. to this query? http://stackoverflow.com/questions/24481564/how-can-i-find-docker-image-with-specific-tag-in-docker-registry-in-docker-comma
Cheers.

@dmp42
Contributor

dmp42 commented Feb 25, 2015

Hi @nealmcb
These issues have been moved to their new home @ https://github.com/docker/distribution - specifically distribution/distribution#136 - although there is currently no specification for a new/revised search API.

Ideas/proposals are definitely welcome over there (docker/distribution).

@jessfraz jessfraz added Proposal kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny and removed Proposal labels Feb 28, 2015
@stevvooe
Contributor Author

stevvooe commented Mar 5, 2015

@jfrazelle Should we close this as accepted? The api spec lives in distribution now: https://github.com/docker/distribution/blob/master/doc/spec/api.md. Should we backport this into the docker core docs?

@jessfraz
Contributor

yes yayyyyy!

@thaJeztah
Member

@grexe

grexe commented Oct 2, 2015

I could not find an official (not even an unofficial;) JSON Schema for the new v2 Registry API.
This would make my life as a Java developer so much easier, because now I have to feed sample output from all REST calls to code-generators instead of using one canonical schema and deriving all POJOs from there...

@thaJeztah
Member

@grexe could you open a feature request for that in the https://github.com/docker/distribution issue tracker? I know the specs are actually generated from code, perhaps there's even something there already. If in doubt, you can ask in the #docker-dev or #docker-distribution IRC channels

@grexe

grexe commented Oct 2, 2015

wow @thaJeztah that was really fast! Did so, see distribution/distribution#1060. Now let's hope the implementation is also as fast,-)

@RichardScothern
Contributor

@grexe : documentation and specs for the v2 registry live here: https://github.com/docker/distribution/tree/master/docs/spec

There is a code generator but it is written in go.

@grexe

grexe commented Oct 2, 2015

thanks @RichardScothern but there is still no reference to a JSON schema, only canonicalization (which is not so important to me, personally,-).
But I have another question: is it planned to support creation of repositories (a PUT equivalent to GET _catalog) as mentioned in the spec on listing repositories?

This would allow me to create separate repositories (e.g. per customer/realm/...) in my private registry from code, without having to shut down the entire registry and alter configuration by hand just to add a new repository...

@RichardScothern
Contributor

You can create repositories by uploading an image and its layers using the REST API
https://github.com/docker/distribution/blob/master/docs/spec/api.md#pushing-an-image

@jlhawn
Contributor

jlhawn commented Oct 2, 2015

I realized about a year ago that "JSON Schema" is actually a draft-standard for specifying the structure of JSON objects/types used by your API and is not just examples of JSON forms/responses.

https://en.wikipedia.org/wiki/JSON#JSON_Schema
http://json-schema.org/

@grexe

grexe commented Oct 2, 2015

Thanks again @RichardScothern, it was not obvious to me that a new repository can be created just by specifying a non-existent name in the PUSH URI, but it's really there: completed upload specifically says that

The Location header will contain the registry URL to access the accepted layer file.

Seems to be exactly what I need, perfect, thanks!

@grexe

grexe commented Oct 2, 2015

exactly @jlhawn, I just stumbled over another snag where a Boolean was not correctly identified by a mapper because my sample output was not sufficient (String vs. Boolean, or even vs. Integer, was not possible to distinguish from the output).

@thaJeztah
Member

@jlhawn @grexe @RichardScothern docker compose recently added a schema for validation (docker/compose#2089). Plans are to use the same schema in libcompose docker/libcompose#34.

Just linking these to prevent duplicated work / research :-)

@stevvooe
Contributor Author

Please don't comment on closed tickets.

stevvooe added a commit to stevvooe/distribution-spec that referenced this issue Apr 6, 2018
As a baseline for the new registry API specification, we are checking in the
proposal as currently covered in moby/moby#9015. This will allow us to
trace the process of transforming the proposal into a specification. The goal
is to use api descriptors to generate templated documentation into SPEC.md. The
resulting product will be submitted into docker core as part of the client PR.
@thaJeztah
Member

👆 reported account for abuse (spam activity)

Let me lock the conversation on this ticket

@moby moby locked as spam and limited conversation to collaborators Sep 30, 2021