EOVERFLOW error when using an osxfs mount inside an overlay filesystem #3643

Closed
1 of 2 tasks
aarcamp opened this issue Apr 30, 2019 · 9 comments

Comments

@aarcamp

aarcamp commented Apr 30, 2019

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID:

Expected behavior

I'm trying to use overlayfs to create a union mount out of two filesystems, as follows:

(1) fuse.osxfs at /srv/share (lower layer; read-only)
(2) ext4 Docker volume at /data (upper layer; read-write)

Imagine that /srv/share contains software that I want to build in the container from source. Furthermore, I want to use overlayfs to merge this source filesystem with a separate writable volume such that during compilation, all build artifacts are written to the Docker volume at the /data mount (as opposed to polluting the pristine source tree through the shared /srv mount).

Note: Just to dispel any potential confusion, I'm aware that Docker itself works in part by leveraging overlayfs as a storage driver. This is a separate use case.

Actual behavior

It mostly works, except after any change to the filesystem on the host side (e.g., touching a new file), a subsequent readdir() on the container side results in EOVERFLOW ("Value too large for defined data type"), as follows:

AaronsMPMid2010:~ aaron$ touch share/foo
AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo ls /data/merged 2>&1 | head -1
ls: reading directory /data/merged: Value too large for defined data type
AaronsMPMid2010:~ aaron$

Please see below for a simple procedure to reproduce. I believe it to be highly reproducible, as I was able to replicate the issue easily on two separate machines, and my co-workers have witnessed this behavior as well.

Information

AaronsMPMid2010:~ aaron$ uname -a
Darwin AaronsMPMid2010.x41s 17.7.0 Darwin Kernel Version 17.7.0: Wed Feb 27 00:43:23 PST 2019; root:xnu-4570.71.35~1/RELEASE_X86_64 x86_64
AaronsMPMid2010:~ aaron$ docker --version
Docker version 18.09.2, build 6247962
AaronsMPMid2010:~ aaron$

Steps to reproduce the behavior

With a fresh install of Docker for Mac 2.0.0.3 (Engine 18.09.2):

AaronsMPMid2010:~ aaron$ mkdir -p share
AaronsMPMid2010:~ aaron$ touch share/foo
AaronsMPMid2010:~ aaron$ docker volume create writelayer
writelayer
AaronsMPMid2010:~ aaron$ docker create --cap-add SYS_ADMIN --interactive --tty --mount source=writelayer,target=/data --name eoverflow-demo --volume /Users/aaron/share:/srv/share:consistent,ro centos /sbin/init
feb7cc4506205cbd79b277d5611b02d0fe800775326826f8404dbb6931020b86
AaronsMPMid2010:~ aaron$ docker start eoverflow-demo
eoverflow-demo
AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo mkdir -p /data/{upper,work,merged}
AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo mount -t overlay -o lowerdir=/srv/share,workdir=/data/work,upperdir=/data/upper none /data/merged
AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo ls /data/merged
foo
AaronsMPMid2010:~ aaron$ touch share/bar
AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo ls /data/merged 2>&1 | head -5
ls: reading directory /data/merged: Value too large for defined data type
ls: reading directory /data/merged: Value too large for defined data type
ls: reading directory /data/merged: Value too large for defined data type
ls: reading directory /data/merged: Value too large for defined data type
ls: reading directory /data/merged: Value too large for defined data type
AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo ls /data/merged 2>&1 | head -5
ls: reading directory /data/merged: Value too large for defined data type
ls: reading directory /data/merged: Value too large for defined data type
ls: reading directory /data/merged: Value too large for defined data type
ls: reading directory /data/merged: Value too large for defined data type
ls: reading directory /data/merged: Value too large for defined data type
AaronsMPMid2010:~ aaron$

To get out of this state, we can make a stat(2) system call on the shared filesystem mount (accomplished here with the mountpoint(1) tool):

AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo mountpoint /srv/share
/srv/share is a mountpoint
AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo ls /data/merged 2>&1 | head -5
bar
foo
AaronsMPMid2010:~ aaron$
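
The recovery step above can be wrapped in a small helper. This is only a sketch under assumptions: the function name `ls_with_recovery` is hypothetical, it retries exactly once, and it relies on the observation above that a stat(2) on the shared mount clears the error state, which may not hold in every case.

```shell
# ls_with_recovery LOWER MERGED  (hypothetical helper name)
# Lists MERGED. If the listing fails (for example with EOVERFLOW), stat the
# lower layer LOWER once, as mountpoint(1) did above, and retry the listing.
ls_with_recovery() {
    lower=$1
    merged=$2
    # Capture the output first so a partial listing is never printed twice.
    if out=$(ls "$merged" 2>/dev/null); then
        [ -n "$out" ] && printf '%s\n' "$out"
        return 0
    fi
    # Assumption: a stat(2) on the shared mount is enough to recover.
    stat "$lower" >/dev/null 2>&1
    ls "$merged"
}
```

Inside the container this would be called as `ls_with_recovery /srv/share /data/merged`. If the retry also fails, the error from the second `ls` is passed through unchanged.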

I believe the root cause of this bug is related to the integrated support for FSEvents/inotify translation. I developed this theory by noticing that whenever I make a change to the source on the host side, it results in some FUSE lookup traffic:

AaronsMPMid2010:~ aaron$ ( /Applications/Docker.app/Contents/MacOS/com.docker.osxfs trace & ); sleep 1; touch share/xyzzy
AaronsMPMid2010:~ aaron$
15645818553008: returning UNKNOWN SUCCESS of length 144 from 8
15645819230981: 9 (2) FUSE_LOOKUP.p1191.u0.g0 share
15645819342671: returning nodeid=0.3 valid=0.000000000s attr={ino=0 size=-1 blocks=0 atime=0 mtime=0 ctime=0 mode=drwxrwxrwx (0x41ff) nlink=1 uid=0 gid=0 rdev=0 blksize=512} from 9
15645819981104: 10 (3) FUSE_LOOKUP.p1191.u0.g0 xyzzy
15645820030614: returning err [ ENOENT ] from 10
15645820506541: 11 (3) FUSE_MKNOD.p1191.u0.g0 mode=-rw------- (8180) rdev=0 umask=0o22 xyzzy
15645820597166: returning nodeid=0.4 valid=0.000000000s attr={ino=0 size=-1 blocks=0 atime=0 mtime=0 ctime=0 mode=-rwxrwxrwx (0x81ff) nlink=1 uid=0 gid=0 rdev=0 blksize=512} from 11

To test my theory, I sent the STOP signal to the macOS fseventsd daemon, and this prevented the issue:

AaronsMPMid2010:~ aaron$ sudo kill -STOP $(pgrep fseventsd)
Password:
AaronsMPMid2010:~ aaron$ touch share/baz
AaronsMPMid2010:~ aaron$ docker exec eoverflow-demo ls /data/merged 2>&1 | head -5
bar
baz
foo
AaronsMPMid2010:~ aaron$

Since the osxfs drivers are closed-source, I cannot debug any further.

Furthermore, I haven't found a way to disable the FSEvents feature. Is it possible? Any other suggested workarounds you can think of would be highly appreciated.

@guillaumerose
Contributor

@dgageot it seems related to your issue: #3203

@docker-robott
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@william-perry

All of our developers on OSX are seeing this exact same issue. Avoiding the overlay in our case is not really feasible because things like the linux builds do not like case-insensitive filesystems. Any suggestions on what we can do to provide more information to help get this issue resolved?

@william-perry

/remove-lifecycle stale

codefromthecrypt pushed a commit to openzipkin-attic/docker-zipkin that referenced this issue Oct 17, 2019
Under certain circumstances, the `ls` command can't be invoked in mac.
This avoids it by moving the `ls` to build time.

See docker/for-mac#3643
Fixes #232

@docker-robott
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@william-perry

/remove-lifecycle stale

This is still happening daily. Anybody have any cycles?

@aarcamp
Author

aarcamp commented Dec 6, 2019

Apologies for taking so long to follow up here.

We're no longer using this setup, but I don't think it's a Docker bug.

Disabling FSEvents gave the illusion (for a while) of fixing the problem, but did not completely solve it. For posterity, I'll note that I did find a funny hack for disabling FSEvents for a specific folder on macOS. If the path on the host contains a subfolder named ".ubd" (e.g., /Users/aaron/.ubd/share), changes to any files underneath will be ignored with respect to FSEvents (I figured this out by disassembling fseventsd).

Sadly, that's insufficient. Here's the rub, from the overlayfs documentation:

Changes to underlying filesystems
---------------------------------

...

Changes to the underlying filesystems while part of a mounted overlay
filesystem are not allowed.  If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock.

Basically, if you share a folder from the host into a container and use it as a lower layer in an overlay mount, then any subsequent changes to that folder on the host will produce undefined behavior. This seems to be exacerbated by FSEvents, but even if you disable them, overlayfs can end up in a bad state. FWIW, in our case this would usually manifest in ESTALE errors.
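
One way to catch this setup before mounting is to check the filesystem type of the intended lower layer. The following is a rough sketch, not a complete safeguard: the function names `fstype_of` and `safe_lowerdir` are hypothetical, the check only matches a mount point exactly (not paths nested inside one), and it assumes the shared folder shows up with type `fuse.osxfs` as described at the top of this issue.

```shell
# fstype_of MOUNTPOINT [MOUNTS_FILE]  (hypothetical helper name)
# Prints the filesystem type of the entry mounted exactly at MOUNTPOINT.
# MOUNTS_FILE defaults to /proc/mounts (fields: device mountpoint fstype ...).
# If several entries match, the last one (the topmost mount) wins.
fstype_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# safe_lowerdir MOUNTPOINT [MOUNTS_FILE]  (hypothetical helper name)
# Returns non-zero and warns if MOUNTPOINT is a host-shared osxfs mount,
# since host-side changes to an overlay lower layer are undefined behavior.
safe_lowerdir() {
    case $(fstype_of "$1" "${2:-/proc/mounts}") in
        fuse.osxfs)
            echo "warning: $1 is an osxfs share; host changes may break the overlay" >&2
            return 1
            ;;
    esac
    return 0
}
```

In the container from the reproduction steps, `safe_lowerdir /srv/share` would be expected to warn, while `safe_lowerdir /data` would not.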

We've concluded that overlayfs is not well suited to this use case. To avoid file syncing, I would suggest using osxfs with an out-of-source build setup instead, similar to what CMake recommends, i.e.:

To maintain a pristine source tree, perform an out-of-source build by using
a separate dedicated build tree. An in-source build in which the build tree is
placed in the same directory as the source tree is also supported, but discouraged.

This is more elegant than having to sync on every change (there are lots of existing solutions out there for that), but may require a lot of rework depending on your project.
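
As a minimal sketch of what that could look like for a CMake project: the source tree is shared read-only at /srv/share and the build tree lives on the Docker volume at /data/build. The function name `build_out_of_source`, the `DOCKER` override variable, and the image argument are all hypothetical, and the image is assumed to contain CMake 3.13 or later (for the `-S` and `-B` options) plus a toolchain.

```shell
# Command used to invoke Docker; override it (e.g. DOCKER=echo) for a dry run.
DOCKER=${DOCKER:-docker}

# build_out_of_source SRC_DIR VOLUME IMAGE  (hypothetical helper name)
# SRC_DIR: host source directory, shared read-only at /srv/share
# VOLUME:  Docker volume mounted read-write at /data (holds the build tree)
# IMAGE:   an image assumed to provide cmake and a compiler
build_out_of_source() {
    src=$1
    vol=$2
    image=$3
    "$DOCKER" run --rm \
        --volume "$src":/srv/share:consistent,ro \
        --mount "source=$vol,target=/data" \
        "$image" \
        sh -c 'cmake -S /srv/share -B /data/build && cmake --build /data/build'
}
```

For example, `build_out_of_source /Users/aaron/share writelayer some-build-image` configures and builds into the volume without writing to the shared source tree, so no overlay mount is involved.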

@aarcamp aarcamp closed this as completed Dec 6, 2019
@william-perry

Well, !@#%!@ - thanks for the info Aaron. Looks like some poor slob (me) will get to go rewrite large chunks of our build system. Or we ditch MacOS... not sure which is more expensive. My time or a few dozen beefy linux boxes for the developers instead of their Macs...

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Jul 1, 2020