BTRFS backend instead of AUFS? #443

Closed
shykes opened this Issue Apr 20, 2013 · 18 comments

@shykes
Contributor
shykes commented Apr 20, 2013

I have heard a lot of requests for supporting BTRFS as an alternative layering backend.

I heard the following arguments:

  • "BTRFS is part of upstream kernel, which means docker would be usable on more systems"
  • "Performance is better in use case X/Y/Z"
  • My existing production system uses BTRFS, it would be easier to start using docker
  • BTRFS's design is more elegant and sustainable than AUFS, which patches the vfs layer
  • It would solve the "stale NFS handle" issue

If you have an opinion on the matter, let me know!

@jpetazzo
Contributor

I don't see any downside to btrfs (or zfs). It just makes some things more
difficult; not necessarily because of btrfs itself, but also because aufs
and btrfs would have to coexist.

  1. "rebasing" a layer would be more difficult: we would have to extract
    changed files and re-apply the diff on another image, instead of merely
    stacking the layer with aufs. It means that injecting something into an
    image (when containers are using the image) would be impossible.
  2. We would have to write a formal spec for the layer format (the current
    spec is "we use whatever aufs uses").
  3. It might make the aufs format look bad in some cases (i.e. when making
    small changes to big files, a btrfs diff will be small, while an aufs diff
    will include the whole changed files), prompting people to request
    multiple formats (and the mess that comes with it).
  4. Commands to list diffs would be a bit more complicated, and maybe
    costlier (note: I saw that you could list added/changed files, but didn't
    see about deleted files).
  5. We might have to deal with file corruption and data loss instead of
    "stale NFS handle" ;-)

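To put a number on point 3, here is a minimal sketch comparing what a file-based layer and a block-based diff would each have to store after a one-byte change to a large file. The 4096-byte block size and both function names are assumptions made for illustration; real btrfs extents and aufs layers are more involved than this.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def file_level_diff_size(old: bytes, new: bytes) -> int:
    """Bytes stored by a file-based layer: the whole file if anything changed."""
    return len(new) if old != new else 0


def block_level_diff_size(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Bytes stored by a block-based diff: only the blocks that differ."""
    stored = 0
    for offset in range(0, max(len(old), len(new)), block_size):
        old_block = old[offset:offset + block_size]
        new_block = new[offset:offset + block_size]
        if old_block != new_block:
            stored += len(new_block)
    return stored


# A 400 KiB file in which a single byte changes.
old = bytes(100 * BLOCK_SIZE)
modified = bytearray(old)
modified[5000] = 1
new = bytes(modified)

print(file_level_diff_size(old, new))   # 409600
print(block_level_diff_size(old, new))  # 4096
```

The gap grows with file size, which is why database files and similar large, frequently touched files are the case where a file-based format looks worst.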

@passy
Contributor
passy commented Apr 30, 2013

It looks like AUFS won't be part of the 3.9 kernel series in Ubuntu, so it would be a good opportunity to look into alternatives. BTRFS being available upstream is one giant plus.

@shykes
Contributor
shykes commented Apr 30, 2013

So far I have run into 3 issues with btrfs. I'm a btrfs newbie, so all 3
might be solvable.

  1. Multi-layer union mounts. I didn't find a way to do union mounts with an
    arbitrary number of layers.

  2. File-based layers. Docker stores and moves layers as tarballs with a
    full copy of all changed files, plus special "whiteout" metadata to express
    file removal. On the other hand, btrfs stores snapshots at the block level.
    I think I know how to extract well-formed layers from btrfs snapshots
    (get list of files changed on a given snapshot, use that as a filter to
    create a partial tarball of the mountpoint), as well as the reverse
    operation (create snapshot, apply tarball). But I haven't tested it yet. It
    also poses the problem of imperfect interop. Untar-ing layers on top of
    each other is not exactly the same as mounting them on top of each other
    with aufs.

  3. Freedom of block-level filesystem. One great thing about aufs: it
    doesn't care how your block devices are formatted. You can just drop
    /var/lib/docker on whatever filesystem happens to be there, and it should
    just work. Contrast this with btrfs, which requires a duly formatted block
    device. Would this mean Docker would only work on btrfs-formatted
    devices? Seems like a harsh barrier to entry. Possible solutions:

     a) Support both aufs and btrfs (interop would have to be perfect)
     b) Support a degraded mode with regular copy (layers could not be
        reused, but this could be used for final use in production for
        example, or on older kernels with no COW support)
     c) Dev mode with loop-mounted sparse file. This could be great for
        builds & dev usage where disk IO is not critical.

All in all, it looks doable, although it would be non-trivial. But if we
want to expand the number of servers which can run Docker, we might not
have any other choice.
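The round trip described in point 2 (snapshot to tarball, tarball back onto a tree) might look roughly like the sketch below. It assumes the lists of changed and deleted paths are already known, uses the aufs `.wh.` filename prefix for whiteouts, and only handles regular files. `build_layer`, `apply_layer`, and `write` are hypothetical helpers, not Docker functions.

```python
import io
import os
import tarfile
import tempfile

WHITEOUT_PREFIX = ".wh."  # aufs-style marker meaning "this file was deleted"


def build_layer(root, changed, deleted):
    """Return tar bytes with full copies of `changed` and whiteouts for `deleted`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for rel in changed:
            tar.add(os.path.join(root, rel), arcname=rel, recursive=False)
        for rel in deleted:
            head, tail = os.path.split(rel)
            # An empty file named .wh.<name> records the removal of <name>.
            tar.addfile(tarfile.TarInfo(os.path.join(head, WHITEOUT_PREFIX + tail)))
    return buf.getvalue()


def apply_layer(layer, target):
    """Untar a layer on top of `target`, honouring whiteout entries."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
        for member in tar:
            head, tail = os.path.split(member.name)
            if tail.startswith(WHITEOUT_PREFIX):
                victim = os.path.join(target, head, tail[len(WHITEOUT_PREFIX):])
                if os.path.lexists(victim):
                    os.remove(victim)
            elif member.isfile():
                dest = os.path.join(target, member.name)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                with tar.extractfile(member) as src, open(dest, "wb") as out:
                    out.write(src.read())


def write(root, rel, text):
    with open(os.path.join(root, rel), "w") as handle:
        handle.write(text)


# Demo: `base` plays the parent image, `snapshot` the modified tree.
base = tempfile.mkdtemp()
snapshot = tempfile.mkdtemp()
write(base, "a.txt", "old")
write(base, "b.txt", "to be deleted")
write(snapshot, "a.txt", "new")
write(snapshot, "c.txt", "added")

layer = build_layer(snapshot, changed=["a.txt", "c.txt"], deleted=["b.txt"])
apply_layer(layer, base)
print(sorted(os.listdir(base)))  # ['a.txt', 'c.txt']
```

Directories, symlinks, ownership, and permissions are ignored here, and those are the places where untar-ing and union-mounting can diverge.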


@tobert
tobert commented Apr 30, 2013

I've been using the "docker pattern" for years on btrfs, ZFS, and LVM snapshots. The big advantage of using CoW filesystems is that only block changes are tracked, which makes things like database snapshots and other large file systems viable.

I haven't personally bothered to optimize image shipping. With (my) app roots well under a gigabyte, rsync tends to be a perfectly usable way to manage layers. I've messed with send/recv, git repos (works better than you'd think), and mostly just use rsync similar to this: https://gist.github.com/tobert/5491155

In other words, maybe you want to find a way to manage layers independent of various filesystem implementations (the LCD) then worry about specific optimizations after that's available. Maybe aufs is required to build images, but they could be deployed to any FS with varying degrees of optimization?

If you want to look at other prior art, look at Solaris Zones with ZFS. It has been around for many years and is used in production all over the place. Joyent and OmniTI won't shut up about it :)

https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/zoneadm/zfs.c

@jpetazzo
Contributor

Re/ BTRFS:

  1. Multi-layer union mounts

BTRFS (and ZFS) have different concepts. You can do this:

  • put a ubuntu base image somewhere
  • create a read-only snapshot of that "somewhere": now that's an image
  • out of that r-o image, create a r-w snapshot: that's a container rootfs
    (repeat the previous step as many times as necessary to instantiate
    multiple containers)
  • after running (/modifying) a container, create a r-o snapshot: that's a
    new image
  • rinse and repeat as many times as necessary (you can stack as many
    snapshots as you like)

Note: the images can also be r-w, no problem with that. But when you modify
an image, it obviously won't automatically modify the containers based on
the image.
In other words: with AUFS, you CAN modify the base image (and changes will
be visible in containers); with BTRFS, you CANNOT (well, you can, but
changes won't propagate).
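The snapshot workflow above could be scripted roughly as follows. This is a dry-run sketch: it only prints the `btrfs subvolume snapshot` commands (`-r` makes the snapshot read-only). The directory layout under `ROOT` and the image and container names are assumptions made for illustration, and the source of each snapshot has to be a btrfs subvolume already.

```python
import subprocess

ROOT = "/var/lib/docker/btrfs"  # hypothetical layout on a btrfs filesystem


def snapshot_cmd(source, dest, readonly=False):
    """Build the argv for `btrfs subvolume snapshot [-r] <source> <dest>`."""
    cmd = ["btrfs", "subvolume", "snapshot"]
    if readonly:
        cmd.append("-r")
    return cmd + [source, dest]


def run(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False (needs root)."""
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)


steps = [
    # r-o snapshot of the unpacked base system: an image
    snapshot_cmd(f"{ROOT}/base", f"{ROOT}/images/ubuntu", readonly=True),
    # r-w snapshot of the image: a container rootfs
    snapshot_cmd(f"{ROOT}/images/ubuntu", f"{ROOT}/containers/c1"),
    # r-o snapshot of the modified container: a new image
    snapshot_cmd(f"{ROOT}/containers/c1", f"{ROOT}/images/ubuntu-modified", readonly=True),
]

for step in steps:
    run(step)
```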
  2. File-based layers

Yes, that's the main interop point; creating AUFS-style tarballs out of
BTRFS subvolume snapshots (with find-new) or ZFS snapshots (with "zfs
diff"). If we leave pseudo-links and external inode translation issues on
the side, it should work like a charm ("...famous last words!"). We don't
have a real-world use-case for plinks and xino but we could think about it
to see if we're about to lose something important.
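As a filesystem-independent reference point, the change list needed for such a tarball can also be computed by walking two trees and comparing them. The sketch below does that for regular files only. It is a slow stand-in and not what `find-new` or `zfs diff` do internally; `list_files` and `diff_trees` are hypothetical names.

```python
import filecmp
import os
import tempfile


def list_files(root):
    """Return the set of regular-file paths under `root`, relative to it."""
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def diff_trees(old_root, new_root):
    """Return (added, changed, deleted) relative paths between two trees."""
    old, new = list_files(old_root), list_files(new_root)
    added = sorted(new - old)
    deleted = sorted(old - new)
    changed = sorted(
        rel for rel in old & new
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False)
    )
    return added, changed, deleted


def write(root, rel, text):
    with open(os.path.join(root, rel), "w") as handle:
        handle.write(text)


# Demo: one file kept, one changed, one deleted, one added.
before = tempfile.mkdtemp()
after = tempfile.mkdtemp()
write(before, "same.txt", "x")
write(before, "edit.txt", "v1")
write(before, "gone.txt", "bye")
write(after, "same.txt", "x")
write(after, "edit.txt", "v2")
write(after, "new.txt", "hi")

print(diff_trees(before, after))  # (['new.txt'], ['edit.txt'], ['gone.txt'])
```

Deleted files fall out of the set difference here, which is the part that was unclear for the btrfs tooling earlier in the thread.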

  3. Freedom of block-level FS

Yes, btrfs requires its own block device. I think there is a kind of choice
between:

  • running on my existing system but requiring custom kernel with AUFS
  • running my existing kernel but needing /var/lib/docker to be BTRFS

I'm sure that we can find very vocal supporters for both propositions :)

@jpetazzo
Contributor

For the record—current work in progress on BTRFS integration: https://github.com/jpetazzo/docker/tree/btrfs

@titanous
Contributor

Any plans to implement pluggable storage backends? (eg. aufs, btrfs, zfs, etc)

@jpetazzo
Contributor

Yes, absolutely. If only because it will make the aufs/btrfs transition (or
cohabitation) easier :-)
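To make "pluggable" concrete, one possible shape for such a backend interface is sketched below. Everything in it is an assumption made for illustration: the `StorageDriver` and `CopyDriver` names and their methods are hypothetical and not taken from Docker. The plain-copy driver corresponds to the "degraded mode with regular copy" idea raised earlier in the thread; an aufs or btrfs driver would implement the same three methods differently.

```python
import abc
import os
import shutil
import tempfile


class StorageDriver(abc.ABC):
    """Hypothetical minimal contract a layering backend would satisfy."""

    @abc.abstractmethod
    def create(self, layer_id, parent_id=None):
        """Create a new writable layer, optionally based on a parent."""

    @abc.abstractmethod
    def path(self, layer_id):
        """Return the directory where the layer's filesystem is visible."""

    @abc.abstractmethod
    def remove(self, layer_id):
        """Delete the layer."""


class CopyDriver(StorageDriver):
    """Degraded backend: no copy-on-write, every layer is a full copy."""

    def __init__(self, home):
        self.home = home
        os.makedirs(home, exist_ok=True)

    def path(self, layer_id):
        return os.path.join(self.home, layer_id)

    def create(self, layer_id, parent_id=None):
        if parent_id is None:
            os.makedirs(self.path(layer_id))
        else:
            shutil.copytree(self.path(parent_id), self.path(layer_id), symlinks=True)
        return self.path(layer_id)

    def remove(self, layer_id):
        shutil.rmtree(self.path(layer_id))


# Demo: a container layer starts as a copy of its image and then diverges.
driver = CopyDriver(tempfile.mkdtemp())
image = driver.create("image")
with open(os.path.join(image, "hello.txt"), "w") as handle:
    handle.write("from image")

container = driver.create("container", parent_id="image")
with open(os.path.join(container, "hello.txt"), "w") as handle:
    handle.write("changed in container")
```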

@dmacvicar

See #172 (comment) for overlayfs

@jefferai

FWIW, having used both BTRFS and ZFS on Linux (ZoL) quite a bit over the past few years, I'd be much happier if ZoL was natively supported along with BTRFS. I've never had ZoL corrupt data, whereas that's happened multiple times to me with BTRFS.

These days, the capabilities relevant to Docker are nearly the same; just a matter of how they're invoked/used. But if you're looking at transitioning capability from AUFS, I'd ask that you take care to think about doing it in a way that makes both filesystems viable, because ZoL would be my clear preference, even though it is not shipped in an upstream kernel.

@shykes
Contributor
shykes commented Jun 27, 2013

Jeff, absolutely. I am keeping that in mind and would like to see zfs support as well.


@keeb
Contributor
keeb commented Aug 15, 2013

This is a plugin use case for 1.0. Tagged and closing.

@keeb closed this Aug 15, 2013

@danielkza

Is the plugin API out in 1.0? Is there any documentation for it, or a roadmap?

@tianon
Member
tianon commented Feb 25, 2014

We are only up to 0.8.1, with 0.9 coming next month. A plugin system is in development but hasn't merged yet. Also, recent versions do include an experimental btrfs driver.

@danielkza

Thank you, I understand it now. I was actually looking at using ZFS. Apparently there is an existing port by @gurjeet, but are there any plans to have something officially supported? I would actually be interested in doing something myself. Should I wait until the plugin API is out before making any plans?

@rektide
rektide commented Nov 7, 2014

I take it this has never happened? Why would this be closed just because it might be handleable as a plugin? Is there any relevant follow-up anywhere on this ticket?

@jessfraz
Contributor
jessfraz commented Nov 7, 2014

it has been implemented as a storage driver :) I use it and love it

@rektide
rektide commented Nov 7, 2014

Whoop whoop! First announced in 0.8 (February 2014). Thanks all. Sorry, was getting some noisiness in my search results that took some more time to pick through.
