Overlay2 mount fails for larger Dockerfiles when length of data-root is 24 #1012
Comments
I suspect you're hitting
So what's weird is that if length is 23 or 25 it works. I only see the problem when the length is exactly 24. I suspect that there is a faulty edge case somewhere. (Note that the length of
That's definitely... interesting 🤔 No direct ideas 😂
After I lost several days of productive time to this bug, I decided it was worth dedicating a full day to making it reproducible inside a VM. I succeeded, so if this piques the interest of a curious developer, it would be great if they could follow my reproducing instructions and track down the bug. (Unfortunately I can't do it myself since I'm only really proficient in Python atm.) A lot of other people seem to be afflicted by similar bugs (see references), so it's not just some theoretical curiosity.
I can repro this on RedHat 7.6. One clue is that the output of
Note that the pathname is truncated. I suspect the combination of many layers (61) and a
So the problem wasn't in the kernel; it was an off-by-one bug in Docker's overlay2 driver.
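To make the off-by-one concrete, here is a minimal Python sketch. It is not the driver's actual Go code: it assumes a 4096-byte page and assumes the driver decides how to mount by comparing the length of the option string against the page size. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed; a real driver would query the page size at runtime


def exceeds_page_buggy(mount_data: str, page_size: int = PAGE_SIZE) -> bool:
    # Assumed pre-fix check: forgets that a C string needs one extra byte
    # for its terminating NUL.
    return len(mount_data) > page_size


def exceeds_page_fixed(mount_data: str, page_size: int = PAGE_SIZE) -> bool:
    # Assumed post-fix check: reserves one byte for the terminator.
    return len(mount_data) > page_size - 1


# Compare both checks just below, at, and just above the page size.
for length in (PAGE_SIZE - 1, PAGE_SIZE, PAGE_SIZE + 1):
    data = "x" * length
    print(length, exceeds_page_buggy(data), exceeds_page_fixed(data))
```

The two checks disagree only when the option string is exactly one page long, which would be consistent with a failure that shows up at a single `data-root` length and disappears one character either side.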
This is a fix for docker/for-linux#1012.

The code was not considering that C strings are NULL-terminated so we need to leave one extra byte.

Without this fix, the testcase in docker/for-linux#1012 fails with

```
Step 61/1001 : RUN echo 60 > 60
 ---> Running in dde85ac3b1e3
Removing intermediate container dde85ac3b1e3
 ---> 80a12a18a241
Step 62/1001 : RUN echo 61 > 61
error creating overlay mount to /23456789112345678921234/overlay2/d368abcc97d6c6ebcf23fa71225e2011d095295d5d8c9b31d6810bea748bdf07-init/merged: no such file or directory
```

with the output of `dmesg -T` as:

```
[Sat Dec 19 02:35:40 2020] overlayfs: failed to resolve '/23456789112345678921234/overlay2/89e435a1b24583c463abb73e8abfad8bf8a88312ef8253455390c5fa0a765517-init/wor': -2
```

with this fix, you get the expected:

```
Step 126/1001 : RUN echo 125 > 125
 ---> Running in 2f2e56da89e0
max depth exceeded
```

Signed-off-by: Oscar Bonilla <6f6231@gmail.com>
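The `wor` at the end of the `dmesg` path (presumably meant to be `work`) is what one would expect if the option string were exactly one page long and its last byte were replaced by the string terminator. Below is a rough Python simulation of that effect. It assumes a 4096-byte page, uses a made-up option string, and `kernel_view` is a hypothetical helper; this is a model of the behaviour, not kernel code.

```python
PAGE_SIZE = 4096  # assumed page size; real systems may differ


def kernel_view(mount_data: bytes, page_size: int = PAGE_SIZE) -> bytes:
    """Hypothetical model: keep at most one page, force the last byte to NUL,
    and return what a C-string reader would then see."""
    page = bytearray(mount_data[:page_size].ljust(page_size, b"\0"))
    page[page_size - 1] = 0  # the terminator overwrites the final slot
    return bytes(page).split(b"\0", 1)[0]


# Made-up option string padded so that it is exactly one page long
# and ends in ".../work".
suffix = b",workdir=/data/overlay2/abc-init/work"
head = b"lowerdir="
options = head + b"x" * (PAGE_SIZE - len(head) - len(suffix)) + suffix

seen = kernel_view(options)
print(len(options), seen[-9:])
```

With an option string one byte shorter, the same model leaves the trailing `work` intact, so only the exact-page-size case loses its last character.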
@maresb Thanks a lot for providing a script for reproducing. It was super helpful in debugging this.
Wow, @ob, that's amazing work!!! I am so glad that you were able to fix this, and that my report was helpful, especially since it took me so much time. I didn't notice the
For my own knowledge, I'm trying to understand the source code around this point. From what I can tell, this
I feel relieved that you managed to fix this, since I had a bit of a feeling that I was alone and that nobody would be able to reproduce this. Thank you!!!
@maresb Yeah, your description of the bug is spot-on; that was exactly what was happening.
This is a fix for docker/for-linux#1012.

The code was not considering that C strings are NULL-terminated so we need to leave one extra byte.

Without this fix, the testcase in docker/for-linux#1012 fails with

```
Step 61/1001 : RUN echo 60 > 60
 ---> Running in dde85ac3b1e3
Removing intermediate container dde85ac3b1e3
 ---> 80a12a18a241
Step 62/1001 : RUN echo 61 > 61
error creating overlay mount to /23456789112345678921234/overlay2/d368abcc97d6c6ebcf23fa71225e2011d095295d5d8c9b31d6810bea748bdf07-init/merged: no such file or directory
```

with the output of `dmesg -T` as:

```
[Sat Dec 19 02:35:40 2020] overlayfs: failed to resolve '/23456789112345678921234/overlay2/89e435a1b24583c463abb73e8abfad8bf8a88312ef8253455390c5fa0a765517-init/wor': -2
```

with this fix, you get the expected:

```
Step 126/1001 : RUN echo 125 > 125
 ---> Running in 2f2e56da89e0
max depth exceeded
```

Signed-off-by: Oscar Bonilla <6f6231@gmail.com>
Upstream-commit: c923f6ac3bf61c8eb369a978b55a5d3f1fad0fbb
Component: engine
moby/moby#41830 was merged and is included in Docker 20.10.2, so this should be fixed in that version.
@maresb Thank you for the description! For those who cannot upgrade: adding a
Expected behavior
Regardless of `data-root`, when building a large Dockerfile such as
I would expect either a successful build, or the error
Actual behavior
Steps to reproduce the behavior
`universe` apt source, from which `Docker version 19.03.6, build 369ce74a3c` can be installed.)
Output is
Notes:
`docker run` on the layer `80a12a18a241` produces the same error.
`len(data-root)` is exactly 24.
`Docker version 19.03.6, build 369ce74a3c` to the latest stable `Docker version 19.03.9, build 9d988398e7` following these instructions.
`data-root` and reproduce it consistently in a VM.
Related:
#791
docker/for-mac#1396
docker/for-mac#1974
Environment details
Output of `docker version`:
Output of `docker info`:
Additional environment details (AWS, VirtualBox, physical, etc.)
Present in both my physical machine and in QEMU/KVM virtualization.
Processor: Core i7-6700 BX80662I76700