ovl: overlayfs snapshots documentation

Document the overlayfs snapshots feature. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
amir73il · Dec 8, 2016 · 89b2801 · 89b2801
1 parent b934af8
commit 89b2801
Showing 1 changed file with 221 additions and 0 deletions.
diff --git a/Documentation/filesystems/overlayfs-snapshots.txt b/Documentation/filesystems/overlayfs-snapshots.txt
@@ -0,0 +1,221 @@
+Written by: Amir Goldstein
+
+See Documentation/filesystems/overlayfs.txt for required background.
+
+Overlayfs Snapshots
+===================
+
+This document describes the overlayfs snapshots feature.
+
+Snapshot overlay
+----------------
+
+A 'snapshot overlay' may be thought of as a 'reverse overlay'.
+It looks exactly like a regular overlay mount with one 'lower' layer
+and one 'upper' layer, combined into a unified view, e.g.:
+
+  mount -t overlay snap0 -olowerdir=/lower,upperdir=/upper/0,\
+  workdir=/work /snap/0
+
+Although the mount looks the same and has similar characteristics to
+a regular overlay mount, it is used in a non conventional way for a
+different use case.
+
+With a regular overlay mount, the lower layer is expected to remain
+unchanged, while upper layer is modified to contain all changes
+performed on the union overlay mount.
+
+With a snapshot overlay mount, lower layer is allowed to change,
+while upper layer is modified to 'cover up' on these changes by
+creating copies of the original objects, before they are modified
+in the lower layer.
+
+The result is that the content of the snapshot overlay remains
+constant and therefore, can be used as a snapshot in time of the
+lower layer at the time that the snapshot overlay was mounted.
+
+As with regular overlay, the st_dev and st_ino fields of an
+object in the snapshot overlay may change during the life time
+of that object, but its content shall remain constant.
+
+
+Snapshot mount
+--------------
+
+The secret sauce that is responsible of 'covering up' before lower layer
+changes is the 'snapshot mount'. A 'snapshot mount', although similar by
+name, is not the same as a 'snapshot overlay'. In fact, it is not an
+overlay at all.
+
+The snapshot mount acts as a shim over the lower layer to intercept
+filesystem operations before modifying the lower layer objects and precede
+those operations with "copy up" to upper layer.
+
+A snapshot mount takes 2 mount options: 'snapshot=' and 'upperdir='.
+The 'snapshot=' mount option points to a snapshot overlay mount point.
+The 'upperdir=' mount option points to the lower dir of the snapshot overlay.
+For example:
+
+  mount -t snapshot current -oupperdir=/lower,snapshot=/snap/0 /lower
+
+In this example, the snapshot mount is mounted at /lower, on top of the
+underlying filesystem, so any future access to /lower directory will not go
+unnoticed.
+
+Notice that the file system type used for the snapshot mount is 'snapshot'
+and not 'overlay'. This distinction is merely a way to identify the role
+of the mount. Under the hood, the snapshot mount super block operations
+are somewhat different then the standard overlayfs super block operations,
+because they serve a different purpose.
+
+The most notably different operation is d_real().  Like, the standard
+overlayfs d_real() it will trigger copy up, before any change to an object.
+Unlike standard overlayfs d_real(), it always returns the same dentry
+is was given as input. So when an application opens a file in /lower it
+will really always get a direct handle to the file is lower.
+
+As a result, filesystem operations on the snapshot mount should not exhibit
+any of the overlayfs non standard behavior patterns.
+
+
+Underlying filesystem
+---------------------
+
+The upper and lower directories of a snapshot overlay must be on the
+same underlying filesystem.  The underlying filesystem must be supported
+for an overlay upper layer, so it must be writable and must be a local
+filesystem with extended attributes support.
+
+On top of these standard overlayfs requirements, the underlying filesystem
+must also support NFS export operations, so it could use the "redirect_fh"
+feature (see "Renaming directories" section).
+
+
+Unlink and rmdir
+----------------
+
+One of the core functions of a snapshot mount is that write access to *any*
+object in the filesystem is preceded by copy up to upper. This is similar,
+but not the the same as regular overlay. with regular overlay, unlink() and
+rmdir() do not copy up the removed object, but only its parents.
+
+With snapshot mount, unlink() and rmdir() in lower first copies up the
+object to upper and then the object is removed from lower. The result is that
+the object is only in upper, which is the desired outcome.
+
+The case of recursive directory remove is not any different, except, the
+removed directories are already in upper before the rmdir(), because they had
+to be copied up to contain copied up files on earlier unlink() calls.
+
+
+Readdir
+-------
+
+Readdir from a snapshot overlay is very similar to readdir from a regular
+overlay of single upper and single lower with one exception - with snapshot
+overlay, the lower directory may have been deleted.
+
+With regular overlay, upper dir with no lower dir means this is a new 'pure'
+upper directory, so readdir from overlay is a native readdir of the upper dir.
+
+With snapshot overlay, upper dir with no lower means that upper was copied
+from lower and then lower was deleted. In this case there may be residue
+whiteouts in the upper directory, so readdir from overlay must hide them
+like it does when reading a merged upper+lower directory.
+
+Readdir from the snapshot mount is a native readdir of the lower dir.
+
+
+Renaming directories
+--------------------
+
+When renaming a directory in the lower layer, snapshot mount can handle it
+in two different ways:
+
+1. return EXDEV error: this error is returned by rename(2) when trying to
+   move a file or directory across filesystem boundaries.  Hence
+   applications are usually prepared to handle this error (mv(1) for example
+   recursively copies the directory tree).  This is the default behavior.
+
+2. If the "redirect_fh" feature is enabled, then the file handle of the lower
+   directory will be stored in an extended attribute "trusted.overlay.fh" on
+   the copied up directory.  The file handle is then used to lookup the lower
+   directory when reading from the snapshot overlay.  This lookup method is
+   invariant to lower directory renames.
+
+
+Implicit opaque directory
+-------------------------
+
+With regular overlay, when a new directory is created in upper on top of
+a whited out object, that directory is marked as opaque to prevent merging
+it with lower directories of the same name.
+
+With snapshot overlay, a similar result is achieved implicitly from the
+"redirect_fh" feature. When a lower directory has been deleted and a new
+object of the same name created in its place, the file handle stored in
+the upper directory, that used to lookup the lower directory becomes stale.
+
+When the snapshot overlay lookup reaches a stale directory file handle, it
+treats it as if the upper directory is opaque to get the expected result
+of not exposing the new objects in lower in the snapshot overlay.
+
+
+Explicit whiteouts
+------------------
+
+In order to support create and mkdir in lower without the risk of those
+objects being exposed in the snapshot overlay, whiteouts need to be created
+in upper prior to creating objects in lower.
+
+Explicit whiteout is requested from the overlay mount by passing a
+negative dentry and non zero open flags to d_real().  d_real() is
+normally used to request a copy up of a file from lower to upper before
+opening the file for write.  Similarly, the new API means "copy nothing
+to upper before it changes to something".
+
+
+Multiple snapshots
+------------------
+
+An overlayfs mount may be stacked on top of another (lower) overlayfs
+mount, but only a single level of nesting is allowed. Together with
+the underlying filesystem at level 0, this amounts to filesystem stack
+depth of 2, the maximum allowed by VFS (FILESYSTEM_MAX_STACK_DEPTH).
+
+To get a view of anything but the latest snapshot overlay, a single
+overlayfs mount is stacked on top of the latest snapshot overlay and
+the historic upper layers are used as lower layers in reverse order,
+oldest upper layer on top. For example, to get a view at time 2 from
+the latest snapshot overlay at time 4:
+
+  mount -t overlay snap2 -olowerdir=/upper/2:/upper/3:/snap/4 /snap/2
+
+As the example shows, "upperdir=" and "workdir=" are omitted, so the
+stacked overlay mount is read-only.
+
+Similarly, we could mount more nested snapshot overlays to get a view
+of the lower dir at any other snapshot time, e.g.:
+
+  mount -t overlay snap3 -olowerdir=/upper/3:/snap/4 /snap/3
+
+NOTE, that all these mounts will become stale once /snap/4 is no
+longer the latest snapshot and they will have to be remounted with
+the new latest snapshot as the lowest layer in order to revalidate
+their content, e.g.:
+
+  mount -t overlay snap3 -olowerdir=/upper/3:/upper/4:/snap/5 /snap/3
+
+
+Testsuite
+---------
+
+There is a fork of the testsuite developed by David Howells, with support
+for testing overlayfs snapshots at:
+
+  https://github.com/amir73il/unionmount-testsuite.git
+
+Run as root:
+
+  # cd unionmount-testsuite
+  # ./run --sn