Skip to content

Commit

Permalink
Merge branch 'ingest_doc' into 2.6
Browse files Browse the repository at this point in the history
  • Loading branch information
jblomer committed Mar 25, 2019
2 parents a0fe056 + 9fd53ad commit 2c31575
Show file tree
Hide file tree
Showing 3 changed files with 220 additions and 0 deletions.
191 changes: 191 additions & 0 deletions cpt-ducc.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,191 @@
.. _cpt_ducc:

==================================================
Working with DUCC and Docker Images (Experimental)
==================================================

DUCC (Daemon that Unpacks Container Images into CernVM-FS) helps in publishing
container images in CernVM-FS. The daemon publishes images in their extracted
form in order for clients to benefit from CernVM-FS' on-demand loading of files.
The DUCC service is deployed as an extra package and supposed to be co-located
with a publisher node having the ``cvmfs-server`` package installed.

Converted images are usable with Docker through the :ref:`CernVM-FS docker graph
driver <cpt_graphdriver>` and with container engines that can use a flat root
file system from CernVM-FS such as Singularity and runc. For use with Docker,
DUCC will upload a so-called "thin image" to the registry for every converted
image. Only the thin image makes an image available through CernVM-FS.

Vocabulary
==========

The following section introduces the terms used in the context of DUCC
publishing container images.

**Registry** A Docker image registry such as:

* https://registry.hub.docker.com
* https://gitlab-registry.cern.ch

**Image Repository** This specifies a group of images. Each image in an image
repository is addressed by tag or by digest. Examples are:

* library/redis
* library/ubuntu

The term **image repository** is unrelated to a CernVM-FS repository.

**Image Tag** An image tag identifies an image inside an image repository.
Tags are mutable and may refer to different container images over time.
Examples are:

* 4
* 3-alpine

**Image Digest** A digest is an immutable identifier for a container image.
Digests are calculated based on the result of a hash function to the content of
the image. Examples are:

* sha256:2aa24e8248d5c6483c99b6ce5e905040474c424965ec866f7decd87cb316b541
* sha256:d582aa10c3355604d4133d6ff3530a35571bd95f97aadc5623355e66d92b6d2c


To uniquely identify an image, we need to provide:
1. registry
2. image repository
3. image tag or image digest (or both)

We use a slash (`/`) to separate the `registry` from the `repository`, a
colon (`:`) to separate the `repository` from the `tag` and the at (`@`) to
separate the `digest` from the tag or from the `repository`. The syntax is

::

REGISTRY/REPOSITORY[:TAG][@DIGEST]

Examples of fully identified images are:

* https://registry.hub.docker.com/library/redis:4
* https://registry.hub.docker.com/minio/minio@sha256:b1e5dd4a7be831107822243a0675ceb5eabe124356a9815f2519fe02beb3f167
* https://registry.hub.docker.com/wurstmeister/kafka:1.1.0@sha256:3a63b48894bce633fb2f0d2579e162163367113d79ea12ca296120e90952b463


**Thin Image** A Docker image that contains only a reference to the image
contents in CernVM-FS. Requires the CernVM-FS Docker graph driver in order to
start.


Image Whish List
=================

The user specifices the set of images supposed to be published on CernVM-FS
in the form of a whish list. The whish list consists of triplets of input image,
the output thin image and the cvmfs destination repository for the unpacked
data.

::

wish => (input_image, output_thin_image, cvmfs_repository)

The input image in your wish should unambigously specify an image as decribed
above.


Whish List Syntax v1
********************

The whish list is provided as YAML file. An example of a whish list containing
four images is show below.

::

version: 1
user: smosciat
cvmfs_repo: unpacked.cern.ch
output_format: '$(scheme)://registry.gitlab.cern.ch/thin/$(image)'
input:
- 'https://registry.hub.docker.com/econtal/numpy-mkl:latest'
- 'https://registry.hub.docker.com/agladstein/simprily:version1'
- 'https://registry.hub.docker.com/library/fedora:latest'
- 'https://registry.hub.docker.com/library/debian:stable'

**version**: whish list version; at the moment only `1` is supported.

**user**: the account that will push the thin images into the docker registry.
The password must be stored in the ``DOCKER2CVMFS_DOCKER_REGISTRY_PASS``
environment variable.

**cvmfs_repo**: the target CernVM-FS repository to store the layers and the
flat root file systems.

**output_format**: how to name the thin images. It accepts a few variables that
reference to the input image.

* $(scheme), the image url protocol, most likely `http` or `https`

* $(registry), the Docker registry of the input image, in the case of the
example it would be `registry.hub.docker.com`

* $(repository), the image repository of the input image, like
`library/ubuntu` or `atlas/athena`

* $(tag), the tag of the image, which could be `latest`, `stable` or
`v0.1.4`

* $(image), combines $(repository) and $(tag)

**input**: list of docker images to convert

The current whish list format reauires all the images to be stored in the same
CernVM-FS repository and have the same thin output image format.

DUCC Commands
=============

DUCC supports the following commands.

convert
*******

The `convert` command provides the core functionality of DUCC:

::

ducc convert whishlist.yaml


where `whishlist.yaml` is the path of a whish list file.

This command will try to ingest all the specified images into CernVM-FS.

The process consists of downloading the manifest of the image, downloading
and ingesting the layers that compose each image, uploading the thin image,
creating the flat root file system necessary to work with Singularity and
writing DUCC specific metadata in the CernVM-FS repository next to the unpacked
image data.

The layers are stored in the `.layer` subdirectory in the CernVM-FS repository,
while the flat root file systems are stored in the `.flat` subdirectory.

loop
****

The `loop` comman continously executes the `convert` command. For each
iteration, the whish list file is read again in order to pick up changes.

::

ducc loop recipe.yaml



Incremental Conversion
======================

The `convert` command will extract image contents into CernVM-FS only where
necessary. In general, some parts of the wish list will be already converted
while others will need to be converted ex-novo.

An image that has been already unpacked in CernVM-FS will be skipped. For
unconverted images, only the missing layers will be unpacked.

28 changes: 28 additions & 0 deletions cpt-repo.rst
Original file line number Diff line number Diff line change
Expand Up @@ -558,6 +558,34 @@ for the currently active repository state, like
attr -g revision /cvmfs/cernvm-prod.cern.ch


.. _sct_tarball:

Tarball Publishing
~~~~~~~~~~~~~~~~~~

Tarballs can be directly published in a repository without the need to extract
them first. The ``ingest`` command can be used to publish the contents of a
tarball at a given subdirectory:

::

cvmfs_server ingest --tar_file <tarball.tar> --base_dir <path/where/extract/> <repository name>

The optional ``--catalog`` switch of the ``ingest`` command is used to
automatically create a nested file catalog at the base directory where the
tarball is extracted. (See :ref:`sct_nestedcatalogs`)

The ``ingest`` command can also be used for the reverse operation of recursively
removing a directory tree:

::

cvmfs_server ingest --delete <path/to/delete> <repository name>

The ``ingest`` command internally opens and closes a transaction. Therefore,
it can only run if no other transactions are currently open.


.. _sct_grafting:

Grafting Files
Expand Down
1 change: 1 addition & 0 deletions part-advanced.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ Advanced Topics
cpt-tracer
cpt-hpc
cpt-graphdriver
cpt-ducc
cpt-xcache
cpt-large-scale
cpt-shrinkwrap
Expand Down

0 comments on commit 2c31575

Please sign in to comment.