---
title: "Building a Docker Registry"
slug: "building-a-docker-registry"
publishedAt: "Sep 3rd, 2016"
status: "draft"
---
Containers are surging right now. This series of blog posts will
explore a small corner of that universe by building a Docker Registry
that adheres to the [Docker Registry HTTP V2 API][v2-api]. These posts
take a conceptual approach rather than a step-by-step one. The code is
available in full on GitHub as the
[Superhuman Registry][superhuman-registry].
# Intro

For this project we'll use Haskell as the implementation language so
that we can use Servant.

> Servant is a set of packages for declaring web APIs at the
> type-level and then using those API specifications to:

* write servers (this part of servant can be considered a web
  framework),
* obtain client functions (in Haskell),
* generate client functions for other programming languages,
* generate documentation

Servant allows us to specify everything from request bodies to headers
at the type level, which will help us be explicit as we explore
manifests, tags, and digests. Since the purpose of this series is
informative, the types will help ground our conversations.
# Getting Started

First, we'll need a new Haskell project. [Stack][stack] provides
nice templating functionality, so we'll use it to scaffold a new
project named `sr` from the `servant` template.

```shell
stack new sr servant --resolver nightly-2016-07-31
```
Since we aren't focusing on Servant itself in this series, we'll
skip a bunch of the boilerplate and backing code and focus on the
handlers and business logic. The code for this section *is*
[on GitHub][lib.hs-1] for those who want to investigate further.

## Routes

One of the benefits of working from a spec is that a full set of
routes is already laid out for us to implement, so we can achieve
compatibility with the wider ecosystem of tools, such as the Docker
Engine.

To start, we'll translate the routes fairly loosely. Then we'll go
back and fill in the return types as we write each of the route
handlers. Translating the [V2 API][v2-api] into types looks like the
following.
```haskell
type Head = Verb 'HEAD 200

type V2Base = "v2" :> Get '[JSON] (Headers '[
    Header "Docker-Distribution-API-Version" String
  ] NoContent)

-- | Main API Type
type API = V2Base :<|> "v2" :> V2API

-- | V2 API Definition
type V2API = Metadata
        :<|> "_catalog" :> Get '[JSON] NoContent

type Tags = "tags" :> "list" :> Get '[JSON] NoContent

type Metadata = Capture "name" Name :> (
    Tags :<|>
    "manifests" :> Manifests :<|>
    "blobs" :> Blobs
  )

type Blobs = Digests :<|> Upload

type Manifests = Capture "reference" Ref :> (
    Get    '[JSON] NoContent :<|>
    Put    '[JSON] NoContent :<|>
    Delete '[JSON] NoContent :<|>
    Head   '[JSON] NoContent
  )

type Digests = Capture "digest" Digest :> (
    Head   '[JSON] NoContent :<|>
    Get    '[JSON] NoContent :<|>
    Delete '[JSON] NoContent
  )

type Upload = "uploads" :> (
    Post '[JSON] NoContent :<|>
    Capture "uuid" UUID :> (
      Get    '[JSON] NoContent :<|>
      Patch  '[JSON] NoContent :<|>
      Put    '[JSON] NoContent :<|>
      Delete '[JSON] NoContent
    )
  )
```
This produces a set of routes that lays out as follows:

```
/
└─ v2/
   ├─•
   ┆
   ┆
   ├─ <capture>/
   │  ├─ blobs/
   │  │  ├─ <capture>/
   │  │  │  ├─•
   │  │  │  ┆
   │  │  │  ├─•
   │  │  │  ┆
   │  │  │  └─•
   │  │  ┆
   │  │  └─ uploads/
   │  │     ├─•
   │  │     ┆
   │  │     ┆
   │  │     └─ <capture>/
   │  │        ├─•
   │  │        ┆
   │  │        ├─•
   │  │        ┆
   │  │        ├─•
   │  │        ┆
   │  │        └─•
   │  ├─ manifests/
   │  │  └─ <capture>/
   │  │     ├─•
   │  │     ┆
   │  │     ├─•
   │  │     ┆
   │  │     ├─•
   │  │     ┆
   │  │     └─•
   │  └─ tags/
   │     └─ list/
   │        └─•
   ┆
   └─ _catalog/
      └─•
```
This matches up with the spec quite well and gives us a nice base for
writing more specific code without worrying about whether we'll miss a
route.

## The Types

If we take a closer look at the types we just wrote out, we see a
bunch of concepts, including `Name`, `Tags`, `Manifests`, `Blobs`, and
`Digests`. Interestingly, we don't see an `Image` or a `Container`
anywhere.
### Name

We use `Name` to represent a repository name. `Name`s must match a
specific regex (`[a-z0-9]+(?:[._-][a-z0-9]+)*`) and be fewer than 256
characters long. In plain English, from the spec:

> A repository name is broken up into path components. A component of
> a repository name must be at least one lowercase, alpha-numeric
> characters, optionally separated by periods, dashes or
> underscores.
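As a sketch of that rule, here's a component check in plain Haskell. The function names here are our own, not part of the registry spec or the Superhuman Registry code, and this only validates a single path component plus the length bound (a full repository name would also allow `/`-separated components):

```haskell
import Data.Char (isAsciiLower, isDigit)

-- Characters allowed inside a run: [a-z0-9]
isNameChar :: Char -> Bool
isNameChar c = isAsciiLower c || isDigit c

-- Separators that may appear between runs: [._-]
isSep :: Char -> Bool
isSep c = c `elem` "._-"

-- Check a single path component against [a-z0-9]+(?:[._-][a-z0-9]+)*
validComponent :: String -> Bool
validComponent s =
  let (run, rest) = span isNameChar s
  in not (null run) && case rest of
       []       -> True
       (c : cs) -> isSep c && validComponent cs

-- A name must also stay under 256 characters
validName :: String -> Bool
validName s = length s < 256 && validComponent s
```

Note that the structure of the regex (separators only *between* alphanumeric runs) means strings like `a..b` or a trailing `-` are rejected.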
### Tags

`Tags` are strings that reference images. For example, if we were
using `debian` and wanted only the `jessie` tag, we could pull using
the format `debian:jessie`.

### Manifests

An image manifest provides a configuration and a set of layers for a
container image. It looks like the following JSON:
```javascript
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 7023,
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 32654,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 16724,
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 73109,
      "digest": "sha256:ec4b8955958665577945c89419d1af06b5f7636b4ac3da7f12184802ad867736"
    }
  ]
}
```
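To have something concrete in mind for later, we can mirror that shape with plain Haskell records. This is a sketch of our own (the field-name prefixes avoid clashing accessors; wiring up actual JSON decoding is a later concern), not types from the registry code:

```haskell
-- A content descriptor: what a "config" or "layers" entry carries.
data Descriptor = Descriptor
  { dMediaType :: String  -- e.g. "application/vnd.docker.image.rootfs.diff.tar.gzip"
  , dSize      :: Int     -- size of the referenced content in bytes
  , dDigest    :: String  -- e.g. "sha256:e69241..."
  } deriving (Show, Eq)

-- A schema 2 image manifest: one config descriptor plus ordered layers.
data Manifest = Manifest
  { mSchemaVersion :: Int
  , mMediaType     :: String
  , mConfig        :: Descriptor
  , mLayers        :: [Descriptor]
  } deriving (Show, Eq)
```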
### Blobs & Digests

Layers are stored in the blob portion of the registry, keyed by
digest.
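Digests come over the wire as `<algorithm>:<hex>` strings (the `sha256:` values in the manifest above). A small, hypothetical parser for that shape, our own helper rather than anything from the spec or the registry code, might be:

```haskell
-- Split a digest like "sha256:e69241..." into (algorithm, hex).
-- Returns Nothing when either side is empty or the colon is missing.
parseDigest :: String -> Maybe (String, String)
parseDigest s = case break (== ':') s of
  (algo, ':' : hex) | not (null algo) && not (null hex) -> Just (algo, hex)
  _                                                     -> Nothing
```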
## Our First Handler

The first route we'll look at implementing is also the simplest: the
route that lets clients know that this registry implements the V2
APIs.

The type of the `/v2` route is

```haskell
type V2Base = "v2" :> Get '[JSON] (Headers '[
    Header "Docker-Distribution-API-Version" String
  ] NoContent)
```

which breaks down to a `GET` request with an `application/json`
content type. The response has a single header,
`Docker-Distribution-API-Version`, which is what lets a client know
which API version our registry implements. We also send back no body
content. Finally, we can dig a bit deeper into `Get`, which is a type
alias for `Verb 'GET 200`. This tells us that a successful response
will have a 200 status code.

The only other valid codes for this route are `401 Unauthorized` and
`429 Too Many Requests`, but since we haven't implemented
authorization or rate-limiting, we'll skip those for now.
```haskell
v2 :: App (Headers '[Header "Docker-Distribution-API-Version" String] NoContent)
v2 = do
  $(logTM) InfoS "registry/2.0"
  return $ addHeader "registry/2.0" NoContent
```
Our logging is pretty basic right now; we'll worry about bulking it up
later. For now, we're using Katip's default stdout output, which gives
us the time, log level, hostname (container id), thread id, and source
location:

```
[2016-08-08 21:47:50][superhuman-registry][Info][85f79bec070f][33][ThreadId 11][sr-0.1.0.0-61fGnb6tOFbKD5fzBNSxKr:Lib src/Lib.hs:63:5] registry/2.0
```
# Dealing with Layers

The primary purpose of a registry is to store layers and manifests so
that a client (such as a Docker Engine) can pull images. We'll avoid
supporting legacy versions of the registry for security and simplicity
reasons, which means our registry will only work with Docker 1.10 and
above. One benefit is that we never have to rewrite v2 manifests into
the v1 format.
## Docker Client

We need to figure out what the Docker engine does on a push. Since I'm
running Docker for Mac, booting a server to act as a registry is
pretty simple. We'll use `nc` for a first attempt.

```shell
docker run -itp 9000:9000 alpine nc -l 9000
```

Now that we have a server acting as a "registry", we need to tag and
push an image to it.

```shell
> docker tag hello-world localhost:9000/hello-world
> docker push localhost:9000/hello-world
The push refers to a repository [localhost:9000/hello-world]
Put http://localhost:9000/v1/repositories/hello-world/: EOF
```

Great! Our server is listening and the engine is pushing to the right
place. If it weren't, we would've seen something like this:

```shell
Put http://localhost:9000/v1/repositories/hello-world/: read tcp
[::1]:56492->[::1]:9000: read: connection reset by peer
```
There's a problem, though: `nc` doesn't implement the `/v2/` endpoint,
so the Docker client falls back to v1 of the API. Luckily, we've
implemented the v2 endpoint already, so we'll skip netcat and jump
back into Haskell.

We can use `Wai.Middleware.RequestLogger` to log everything Docker
tries to do to our registry. Using docker-compose to boot up our
registry and re-attempting the push yields:
```
api_1  | GET /v2/
api_1  |   Accept:
api_1  |   Status: 200 OK 0.000190373s
api_1  | Prelude.undefined
api_1  | CallStack (from HasCallStack):
api_1  |   error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
api_1  |   undefined, called at src/SR/Blobs.hs:26:14 in sr-0.1.0.0-5isXdkrmBvbJTFdJY734op:SR.Blobs
api_1  | Prelude.undefined
api_1  | CallStack (from HasCallStack):
api_1  |   error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
api_1  |   undefined, called at src/SR/Blobs.hs:26:14 in sr-0.1.0.0-5isXdkrmBvbJTFdJY734op:SR.Blobs
```
From this output, we see that the `/v2/` route is working as expected,
but we hit `undefined` at `src/SR/Blobs.hs:26:14`, which is totally
expected because we haven't implemented `uploadBlob` yet. Notice that
the engine retries the upload request.

If the route didn't exist, we would have seen a 404 in the logs:

```
api_1  | GET /v2/
api_1  |   Accept:
api_1  |   Status: 200 OK 0.011277359s
api_1  | POST /v2/hello-world/blobs/uploads/
api_1  |   Accept:
api_1  |   Status: 404 Not Found 0.000028066s
```

This matches what we know about the
[upload process](https://github.com/docker/distribution/blob/dea554fc7cce2f2e7af5b1e1d38e28c5e96e1d9e/docs/spec/api.md#starting-an-upload).
## uploadBlob

We can throw in a couple of print statements to replace the
`undefined`, like so:

```haskell
uploadBlob :: Namespace -> Name -> App NoContent
uploadBlob namespace' name' = do
  liftIO $ print namespace'
  liftIO $ print name'
  return NoContent
```
That yields some progress when we try to push again:

```
api_1  | GET /v2/
api_1  |   Accept:
api_1  |   Status: 200 OK 0.004299263s
api_1  | Namespace "lib"
api_1  | Name "hello-world"
api_1  | POST /v2/lib/hello-world/blobs/uploads/
api_1  |   Accept:
api_1  |   Status: 200 OK 0.000105174s
```
This is good progress, but we clearly have some issues, since the
Docker engine is still retrying the endpoint.

There are two approaches to blob upload:
[monolithic][v2-monolithic-upload] and
[resumable][v2-resumeable-upload]. The docs for
`/v2/<name>/blobs/uploads` explain that the `digest` query param is
the differentiator between monolithic and resumable uploads.

> Initiate a resumable blob upload. If successful, an upload location
> will be provided to complete the upload. Optionally, if the digest
> parameter is present, the request body will be used to complete the
> upload in a single request.
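In other words, the handler's branching comes down to whether the optional `digest` query parameter was supplied. A minimal sketch of that decision (the `UploadMode` type and `uploadMode` function are ours, not Servant's or the spec's):

```haskell
-- Monolithic: the request body is the whole blob, identified by the digest.
-- Resumable: no digest yet; the client will PATCH chunks to an upload URL.
data UploadMode
  = Monolithic String  -- the value of the digest query parameter
  | Resumable
  deriving (Show, Eq)

uploadMode :: Maybe String -> UploadMode
uploadMode (Just d) = Monolithic d
uploadMode Nothing  = Resumable
```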
Let's take a look at an implementation for the `uploadBlob`
(`<name>/blobs/uploads`) route. We modify the type to reflect the
various headers and response codes (the Docker engine is a picky
client). All of the relevant information is communicated through
headers, so we return `NoContent` as well. `PostAccepted` is a
shortcut for `202` responses.
```haskell
PostAccepted '[JSON] (Headers '[
    Header "Location" URI,
    Header "Range" String,
    Header "Docker-Upload-UUID" UUID
  ] NoContent)
```
Now the handler code. We generate a new UUID to send back in the
response. This first pass just tries to get the Docker client to
continue to the next request; in the future we should do something
with the UUID so we can respond to status requests. `uploadAPI` might
look scary, but it's just specifying the route we want to generate for
the `Location` header. We do this so that Servant will automatically
check that the route is valid for the `api` we are serving, and we get
a compile error if it doesn't typecheck.

We add three headers, setting the `Range` to `"0-0"` because we are
only responding to resumable upload requests for now (otherwise we'd
have to handle the case of an extra query string parameter). Once we
generate the `Location` and the `Docker-Upload-UUID`, we send them
back so the Docker engine can start uploading blobs at the specified
`Location`.
```haskell
uploadBlob :: Namespace -> Name -> App (Headers '[
    Header "Location" URI,
    Header "Range" String,
    Header "Docker-Upload-UUID" UUID
  ] NoContent)
uploadBlob namespace' name' = do
  uuid <- liftIO nextRandom
  let uploadAPI = Proxy :: Proxy ("v2" :> Capture "namespace" Namespace :> Capture "name" Name :> "blobs" :> "uploads" :> Capture "uuid" UUID :> Put '[JSON] NoContent)
      mkURI = safeLink api uploadAPI
      uri = mkURI namespace' name' uuid
      response = addHeader uri
               $ addHeader "0-0"
               $ addHeader uuid NoContent
  $(logTM) InfoS (logStr $ show $ getHeaders response)
  return response
```
We also need a couple of instances which allow us to render types like
`UUID` into path components and headers. (Note: these are orphan
instances, but we could fix that by using newtypes and declaring the
instances on those instead.)

```haskell
instance FromHttpApiData UUID where
  parseUrlPiece text = case fromText text of
    Nothing   -> Left $ T.append "Invalid UUID: " text
    Just uuid -> Right uuid

instance ToHttpApiData UUID where
  toUrlPiece = toText
  toHeader   = toASCIIBytes

instance ToByteString UUID where
  builder = lazyByteString . toLazyASCIIBytes

instance ToByteString URI where
  builder = lazyByteString . pack . show
```
We push again to test the route:

```
docker push localhost:9000/lib/hello-world
```

And voilà, we get the desired effect. The Docker engine accepts the
UUID and tries to upload blobs to `PATCH
/v2/lib/hello-world/blobs/uploads/v2/lib/hello-world/blobs/uploads/aeab6f5e-4b80-4c7b-9027-616b1cbe6a55`. That's
not the right URI, though: we've accidentally used a relative URI in
our `Location` header. We'll fix that shortly. :)
```
GET /v2/
  Accept:
  Status: 200 OK 0.000458567s
[2016-08-30 18:12:49][superhuman-registry][Info][b4fa86e02706][6258][ThreadId 15][sr-0.1.0.0-4uXdy03mbpG98AEB4xHfiO:SR.Blobs src/SR/Blobs.hs:47:5] [("Location","v2/lib/hello-world/blobs/uploads/aeab6f5e-4b80-4c7b-9027-616b1cbe6a55"),("Range","0-0"),("Docker-Upload-UUID","aeab6f5e-4b80-4c7b-9027-616b1cbe6a55")]
POST /v2/lib/hello-world/blobs/uploads/
  Accept:
  Status: 202 Accepted 0.000303745s
PATCH /v2/lib/hello-world/blobs/uploads/v2/lib/hello-world/blobs/uploads/aeab6f5e-4b80-4c7b-9027-616b1cbe6a55
  Accept:
  Status: 404 Not Found 0.000041239s
```
## patchBlob

The next route, as shown in the logs above, is the `PATCH` to the
`Location` header we sent back down. The type for the `PATCH` route
changes to:

```haskell
ReqBody '[OctetStream] ByteString :>
  Header "range" String :>
  PatchNoContent '[JSON] (Headers '[
    Header "Location" URI,
    Header "Range" String,
    Header "Docker-Upload-UUID" UUID
  ] NoContent)
```

We need to accept and echo back the `Range` header, while the request
body comes in as an `OctetStream`. For now, we take this information
and just write the `OctetStream` out to a file.
```haskell
patchBlob :: Namespace
          -> Name
          -> UUID
          -> ByteString
          -> Maybe String
          -> App (Headers '[
               Header "Location" URI,
               Header "Range" String,
               Header "Docker-Upload-UUID" UUID
             ] NoContent)
patchBlob namespace' name' uuid' blob range' = do
  liftIO $ Data.ByteString.writeFile ("./tmp/" ++ toString uuid') blob
  mkHeaders range' uuid' namespace' name'
```
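The `mkHeaders` helper isn't shown in this snippet; one plausible piece of it is deciding what `Range` value to send back. A hedged sketch of that piece only, assuming (this is our guess, not the registry's actual code) that we echo the client's range when present and otherwise report the bytes just written:

```haskell
-- Echo the client's range if present; otherwise report bytes 0..len-1
-- for the chunk we just wrote. A sketch of one part of mkHeaders,
-- not code from the Superhuman Registry itself.
rangeValue :: Maybe String -> Int -> String
rangeValue (Just r) _   = r
rangeValue Nothing  len = "0-" ++ show (max 0 (len - 1))
```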
With this code (and another upload attempt from the engine), we can
see that the next request is a `PUT`, which indicates the last request
for this layer.

```
GET /v2/
  Accept:
  Status: 200 OK 0.005978465s
[2016-09-03 20:21:39][superhuman-registry][Info][b4fa86e02706][130][ThreadId 14][sr-0.1.0.0-7Q5s7SCyVcbA5o0UAD7J0W:SR.Blobs src/SR/Blobs.hs:74:5] [("Location","http://localhost:9000/v2/lib/hello-world/blobs/uploads/f525cc29-b588-417b-aac2-85c5752ce07b"),("Range","0-0"),("Docker-Upload-UUID","f525cc29-b588-417b-aac2-85c5752ce07b")]
POST /v2/lib/hello-world/blobs/uploads/
  Accept:
  Status: 202 Accepted 0.000442272s
PATCH /v2/lib/hello-world/blobs/uploads/f525cc29-b588-417b-aac2-85c5752ce07b
  Accept:
  Status: 204 No Content 0.012519558s
PUT /v2/lib/hello-world/blobs/uploads/f525cc29-b588-417b-aac2-85c5752ce07b
  Params: [("digest","sha256:a9d36faac0fe2a855f798346f33bd48917bf3af9b6e4b77870ef8862fee8a8a3")]
  Accept:
  Status: 200 OK 0.000077408s
```
The full details are in the [Patch Blob Upload docs](https://github.com/docker/distribution/blob/41f383fb9a3b4e3ff428a92db4f7836f8053058b/docs/spec/api.md#patch-blob-upload).
[v2-api]: https://github.com/docker/distribution/blob/bfa0a9c0973b5026d2e942dec29115c120e7f731/docs/spec/api.md
[v2-monolithic-upload]: https://github.com/docker/distribution/blob/b1b100cf011b037b8821e8d0ae4f5ab3e2222c48/docs/spec/api.md#initiate-monolithic-blob-upload
[v2-resumeable-upload]: https://github.com/docker/distribution/blob/b1b100cf011b037b8821e8d0ae4f5ab3e2222c48/docs/spec/api.md#initiate-resumable-blob-upload
[superhuman-registry]: https://github.com/ChristopherBiscardi/superhuman-registry
[stack]: https://www.haskellstack.org/
[lib.hs-1]: https://github.com/ChristopherBiscardi/superhuman-registry/blob/861b20d317132d3ea43dc05cc03d507ca325d3e0/src/Lib.hs