ovl: overlayfs snapshots documentation
Document the overlayfs snapshots feature.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
amir73il committed Dec 8, 2016
1 parent b934af8 commit 89b2801
Showing 1 changed file with 221 additions and 0 deletions.
221 changes: 221 additions & 0 deletions Documentation/filesystems/overlayfs-snapshots.txt
@@ -0,0 +1,221 @@
Written by: Amir Goldstein

See Documentation/filesystems/overlayfs.txt for required background.

Overlayfs Snapshots
===================

This document describes the overlayfs snapshots feature.

Snapshot overlay
----------------

A 'snapshot overlay' may be thought of as a 'reverse overlay'.
It looks exactly like a regular overlay mount with one 'lower' layer
and one 'upper' layer, combined into a unified view, e.g.:

mount -t overlay snap0 -olowerdir=/lower,upperdir=/upper/0,\
workdir=/work /snap/0

Although the mount looks the same as a regular overlay mount and has
similar characteristics, it is used in an unconventional way, for a
different use case.

With a regular overlay mount, the lower layer is expected to remain
unchanged, while the upper layer is modified to contain all changes
performed on the union overlay mount.

With a snapshot overlay mount, the lower layer is allowed to change,
while the upper layer is modified to 'cover up' these changes by
receiving copies of the original objects before they are modified
in the lower layer.

The result is that the content of the snapshot overlay remains
constant, so it can be used as a point-in-time snapshot of the lower
layer as it was when the snapshot overlay was mounted.

As with regular overlay, the st_dev and st_ino fields of an
object in the snapshot overlay may change during the lifetime
of that object, but its content shall remain constant.
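
The 'cover up' rule can be illustrated without any kernel support. The
following is a minimal user-space sketch, assuming plain directories stand
in for the lower and upper layers; cow_write and snap_read are hypothetical
helpers that only mimic the behavior described above and are not part of
overlayfs:

```shell
# User-space emulation only: no overlayfs or snapshot mount is involved.
demo=$(mktemp -d)
mkdir "$demo/lower" "$demo/upper"
echo "original" > "$demo/lower/file"

# Hypothetical helper: copy the original object to upper (once),
# then modify the object in lower.
cow_write() {
    [ -e "$demo/upper/$1" ] || cp -p "$demo/lower/$1" "$demo/upper/$1"
    echo "$2" > "$demo/lower/$1"
}

# Hypothetical helper: the snapshot view prefers the copy in upper
# and falls back to the unmodified object in lower.
snap_read() {
    if [ -e "$demo/upper/$1" ]; then
        cat "$demo/upper/$1"
    else
        cat "$demo/lower/$1"
    fi
}

cow_write file "modified"
snap_read file            # content as of snapshot time
cat "$demo/lower/file"    # current content of lower
```

The snapshot view keeps returning the original content, however many
times the object in lower is rewritten, because the copy to upper happens
only before the first modification.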


Snapshot mount
--------------

The secret sauce that is responsible for 'covering up' before the lower
layer changes is the 'snapshot mount'. A 'snapshot mount', although similar
in name, is not the same as a 'snapshot overlay'. In fact, it is not an
overlay at all.

The snapshot mount acts as a shim over the lower layer: it intercepts
filesystem operations that would modify lower layer objects and precedes
each of them with a "copy up" to the upper layer.

A snapshot mount takes 2 mount options: 'snapshot=' and 'upperdir='.
The 'snapshot=' mount option points to a snapshot overlay mount point.
The 'upperdir=' mount option points to the lower dir of the snapshot overlay.
For example:

mount -t snapshot current -oupperdir=/lower,snapshot=/snap/0 /lower

In this example, the snapshot mount is mounted at /lower, on top of the
underlying filesystem, so any future access to the /lower directory will
not go unnoticed.
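
Since the 'snapshot=' option refers to the snapshot overlay mount point,
the snapshot overlay has to exist before the snapshot mount is created.
A dry-run sketch of that order, assuming the /lower, /upper/N, /work and
/snap/N paths used in the examples of this document; snapshot_cmds is a
hypothetical helper that only prints the commands and does not run them:

```shell
# Dry run: print, but do not execute, the two mount commands for snapshot N.
snapshot_cmds() {
    n=$1
    # 1. The snapshot overlay that will hold the point-in-time view.
    echo "mount -t overlay snap$n -olowerdir=/lower,upperdir=/upper/$n,workdir=/work /snap/$n"
    # 2. The snapshot mount, placed over /lower and pointing at that overlay.
    echo "mount -t snapshot current -oupperdir=/lower,snapshot=/snap/$n /lower"
}

snapshot_cmds 0
```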

Notice that the filesystem type used for the snapshot mount is 'snapshot'
and not 'overlay'. This distinction is merely a way to identify the role
of the mount. Under the hood, the snapshot mount super block operations
are somewhat different from the standard overlayfs super block operations,
because they serve a different purpose.

The most notably different operation is d_real(). Like the standard
overlayfs d_real(), it triggers copy up before any change to an object.
Unlike the standard overlayfs d_real(), it always returns the same dentry
it was given as input. So when an application opens a file in /lower, it
always gets a direct handle to the file in lower.

As a result, filesystem operations on the snapshot mount should not exhibit
any of the non-standard behavior patterns of overlayfs.


Underlying filesystem
---------------------

The upper and lower directories of a snapshot overlay must be on the
same underlying filesystem. The underlying filesystem must be supported
for an overlay upper layer, so it must be writable and must be a local
filesystem with extended attributes support.

On top of these standard overlayfs requirements, the underlying filesystem
must also support NFS export operations, so that it can use the
"redirect_fh" feature (see the "Renaming directories" section).


Unlink and rmdir
----------------

One of the core functions of a snapshot mount is that write access to *any*
object in the filesystem is preceded by copy up to upper. This is similar
to, but not the same as, regular overlay. With regular overlay, unlink() and
rmdir() do not copy up the removed object, but only its parents.

With snapshot mount, unlink() and rmdir() in lower first copy up the
object to upper, and then the object is removed from lower. The result is
that the object exists only in upper, which is the desired outcome.

The case of recursive directory removal is no different, except that the
removed directories are already in upper before the rmdir(), because they
had to be copied up to contain the files copied up by earlier unlink() calls.
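
The ordering described above can be sketched in user space. This is an
emulation under the assumption that plain directories stand in for the
layers; snap_unlink and snap_rmdir are hypothetical helpers, not the
kernel implementation:

```shell
# User-space emulation only: no overlayfs or snapshot mount is involved.
demo=$(mktemp -d)
mkdir -p "$demo/lower/dir" "$demo/upper"
echo "data" > "$demo/lower/dir/a"

# Hypothetical helper: copy up the parents and the object itself,
# then remove the object from lower.
snap_unlink() {
    mkdir -p "$demo/upper/$(dirname "$1")"
    [ -e "$demo/upper/$1" ] || cp -p "$demo/lower/$1" "$demo/upper/$1"
    rm "$demo/lower/$1"
}

# Hypothetical helper: make sure the directory exists in upper,
# then remove it from lower.
snap_rmdir() {
    mkdir -p "$demo/upper/$1"
    rmdir "$demo/lower/$1"
}

# Recursive removal: files first, then the now empty directory.
snap_unlink dir/a
snap_rmdir dir
```

After the two calls, dir and dir/a exist only under upper. The mkdir in
snap_rmdir is a no-op here, because dir was already created in upper by
the earlier snap_unlink call.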


Readdir
-------

Readdir from a snapshot overlay is very similar to readdir from a regular
overlay of a single upper and a single lower, with one exception: with
snapshot overlay, the lower directory may have been deleted.

With regular overlay, upper dir with no lower dir means this is a new 'pure'
upper directory, so readdir from overlay is a native readdir of the upper dir.

With snapshot overlay, upper dir with no lower means that upper was copied
from lower and then lower was deleted. In this case there may be residual
whiteouts in the upper directory, so readdir from overlay must hide them,
as it does when reading a merged upper+lower directory.

Readdir from the snapshot mount is a native readdir of the lower dir.


Renaming directories
--------------------

When renaming a directory in the lower layer, snapshot mount can handle it
in two different ways:

1. Return an EXDEV error: this error is returned by rename(2) when trying to
   move a file or directory across filesystem boundaries. Hence
   applications are usually prepared to handle this error (mv(1), for
   example, recursively copies the directory tree). This is the default
   behavior.

2. If the "redirect_fh" feature is enabled, then the file handle of the lower
   directory will be stored in an extended attribute "trusted.overlay.fh" on
   the copied up directory. The file handle is then used to look up the lower
   directory when reading from the snapshot overlay. This lookup method is
   invariant to lower directory renames.
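
The idea behind the second option, a lookup that does not depend on the
directory name, can be illustrated in user space. In this sketch the inode
number is a rough stand-in for a file handle (a real file handle is an
opaque, filesystem-specific value), and it is kept in a plain file rather
than in the "trusted.overlay.fh" extended attribute; lookup_lower is a
hypothetical helper:

```shell
# User-space emulation only: inode number used in place of a file handle.
demo=$(mktemp -d)
mkdir -p "$demo/lower/docs" "$demo/upper/docs"

# On "copy up", record an identifier of the lower directory next to
# the upper copy (the feature itself uses an extended attribute).
ls -di "$demo/lower/docs" | awk '{print $1}' > "$demo/upper/docs.fh"

# The lower directory is renamed after the snapshot was taken.
mv "$demo/lower/docs" "$demo/lower/renamed"

# Hypothetical helper: find the lower directory by the recorded
# identifier instead of by name.
lookup_lower() {
    find "$demo/lower" -inum "$(cat "$demo/upper/$1.fh")" -print
}

lookup_lower docs    # prints the path of the renamed lower directory
```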


Implicit opaque directory
-------------------------

With regular overlay, when a new directory is created in upper on top of
a whited out object, that directory is marked as opaque to prevent merging
it with lower directories of the same name.

With snapshot overlay, a similar result is achieved implicitly by the
"redirect_fh" feature. When a lower directory has been deleted and a new
object of the same name created in its place, the file handle stored in
the upper directory, which is used to look up the lower directory, becomes
stale.

When the snapshot overlay lookup reaches a stale directory file handle, it
treats the upper directory as if it were opaque, which gives the expected
result of not exposing the new objects in lower in the snapshot overlay.


Explicit whiteouts
------------------

In order to support create and mkdir in lower without the risk of those
objects being exposed in the snapshot overlay, whiteouts need to be created
in upper prior to creating objects in lower.

An explicit whiteout is requested from the overlay mount by passing a
negative dentry and nonzero open flags to d_real(). d_real() is
normally used to request a copy up of a file from lower to upper before
opening the file for write. Similarly, the new API means "copy nothing
to upper before it changes to something".


Multiple snapshots
------------------

An overlayfs mount may be stacked on top of another (lower) overlayfs
mount, but only a single level of nesting is allowed. Together with
the underlying filesystem at level 0, this amounts to a filesystem stack
depth of 2, the maximum allowed by VFS (FILESYSTEM_MAX_STACK_DEPTH).

To get a view of anything but the latest snapshot overlay, a single
overlayfs mount is stacked on top of the latest snapshot overlay and
the historic upper layers are used as lower layers in reverse order,
oldest upper layer on top. For example, to get a view at time 2 from
the latest snapshot overlay at time 4:

mount -t overlay snap2 -olowerdir=/upper/2:/upper/3:/snap/4 /snap/2

As the example shows, "upperdir=" and "workdir=" are omitted, so the
stacked overlay mount is read-only.

Similarly, we could mount more nested snapshot overlays to get a view
of the lower dir at any other snapshot time, e.g.:

mount -t overlay snap3 -olowerdir=/upper/3:/snap/4 /snap/3

Note that all these mounts will become stale once /snap/4 is no
longer the latest snapshot, and they will have to be remounted with
the new latest snapshot as the lowest layer in order to revalidate
their content, e.g.:

mount -t overlay snap3 -olowerdir=/upper/3:/upper/4:/snap/5 /snap/3
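
The layer ordering used in the examples above can be generated
mechanically. A minimal sketch, assuming the /upper/N and /snap/N naming
used in this document; snap_lowerdir is a hypothetical helper and not
part of overlayfs:

```shell
# Build the lowerdir= value for a view at snapshot time $1, given that
# the latest snapshot overlay is at time $2. Historic upper layers come
# first (oldest on top), and the latest snapshot overlay is the lowest
# layer. For $1 equal to $2 no stacking is needed: /snap/$2 is the view.
snap_lowerdir() {
    n=$1
    latest=$2
    layers=""
    i=$n
    while [ "$i" -lt "$latest" ]; do
        layers="${layers}/upper/${i}:"
        i=$((i + 1))
    done
    printf '%s/snap/%s\n' "$layers" "$latest"
}

snap_lowerdir 2 4    # prints /upper/2:/upper/3:/snap/4
snap_lowerdir 3 5    # prints /upper/3:/upper/4:/snap/5
```

When a new snapshot is taken, calling the helper again with the new latest
snapshot number gives the value needed to remount each historic view.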


Testsuite
---------

A fork of the testsuite developed by David Howells, with support for
testing overlayfs snapshots, is available at:

https://github.com/amir73il/unionmount-testsuite.git

Run as root:

# cd unionmount-testsuite
# ./run --sn
