Disk/image corruption with large numbers of files #2625

Closed
tristanpemble opened this issue Feb 21, 2018 · 25 comments

@tristanpemble

Expected behavior

Generating large numbers of files in the container will not corrupt the files

Actual behavior

Files are corrupted

Information

Diagnose & Feedback:

Docker for Mac: version: 18.02.0-ce-mac53 (60e488931a12eeba773efb9e92be068a3c081758)
macOS: version 10.13.3 (build: 17D47)
logs: /tmp/BD6AFC3D-1322-4D56-956C-65EDDD22BE75/20180220-180841.tar.gz
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     kubernetes
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     menubar
[OK]     disk

Dockerfile:

FROM alpine
WORKDIR /work

# Generate 100,000 files with 10,000 alphanumeric characters each (around a gigabyte of data)
RUN for i in `seq 1 100000`; do cat /dev/urandom | tr -dc 'A-Za-z0-9' | fold -w 10000 | head -n 1 > $i.txt ; done

# Look for non-alphanumeric characters.. should output no files
RUN grep -l -e '[^A-Za-z0-9]' *.txt
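
For checking files that have been copied out of a container, the same test as the grep step can be sketched in Python. This is a minimal sketch, not part of the original report: `find_corrupt_files` is a hypothetical helper name, and it assumes each file contains only alphanumeric characters plus the trailing newline that `head -n 1` writes.

```python
import re
from pathlib import Path

# Any byte other than A-Z, a-z, 0-9 or a newline counts as corruption.
# The newline is allowed because each generated file ends with one.
NON_ALNUM = re.compile(rb"[^A-Za-z0-9\n]")


def find_corrupt_files(directory):
    """Return (file name, offset of first bad byte) for each corrupt .txt file."""
    corrupt = []
    for path in sorted(Path(directory).glob("*.txt")):
        match = NON_ALNUM.search(path.read_bytes())
        if match:
            corrupt.append((path.name, match.start()))
    return corrupt
```

Unlike `grep -l`, this also reports where in each file the first unexpected byte sits, which may help when comparing corrupted files with each other.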

Steps to reproduce the behavior

  1. First I reset Docker for Mac to factory defaults.
  2. Immediately after resetting to factory defaults (and copying diagnose & feedback information above), using the above Dockerfile, I get this output from docker build .:
    λ docker build .
    Sending build context to Docker daemon  2.048kB
    Step 1/4 : FROM alpine
    latest: Pulling from library/alpine
    ff3a5c916c92: Pull complete
    Digest: sha256:7df6db5aa61ae9480f52f0b3a06a140ab98d427f86d8d5de0bedab9b8df6b1c0
    Status: Downloaded newer image for alpine:latest
     ---> 3fd9065eaf02
    Step 2/4 : WORKDIR /work
    Removing intermediate container f0915c5abd00
     ---> 30821123326c
    Step 3/4 : RUN for i in `seq 1 100000`; do cat /dev/urandom | tr -dc 'A-Za-z0-9' | fold -w 10000 | head -n 1 > $i.txt ; done
     ---> Running in 8211bc831d57
    Removing intermediate container 8211bc831d57
     ---> 74ae74bd63f0
    Step 4/4 : RUN grep -l -e '[^A-Za-z0-9]' *.txt
     ---> Running in 0e98eafb4c78
    6390.txt
    63900.txt
    63901.txt
    63902.txt
    63903.txt
    63904.txt
    63905.txt
    63906.txt
    63907.txt
    63908.txt
    63909.txt
    6391.txt
    63910.txt
    63911.txt
    63912.txt
    63913.txt
    63914.txt
    63915.txt
    63916.txt
    63917.txt
    63918.txt
    63919.txt
    6392.txt
    63920.txt
    63921.txt
    63922.txt
    63923.txt
    63924.txt
    63925.txt
    63926.txt
    63927.txt
    63928.txt
    63929.txt
    6393.txt
    63930.txt
    63931.txt
    63932.txt
    63933.txt
    63934.txt
    63935.txt
    63936.txt
    63937.txt
    63938.txt
    63939.txt
    6394.txt
    63940.txt
    63941.txt
    63942.txt
    63943.txt
    63944.txt
    63945.txt
    63946.txt
    63947.txt
    63948.txt
    63949.txt
    6395.txt
    63950.txt
    63951.txt
    63952.txt
    63953.txt
    63954.txt
    63955.txt
    63956.txt
    63957.txt
    63958.txt
    63959.txt
    6396.txt
    63960.txt
    63961.txt
    63962.txt
    63963.txt
    63964.txt
    63965.txt
    63966.txt
    63967.txt
    63968.txt
    63969.txt
    6397.txt
    63970.txt
    63971.txt
    63972.txt
    63973.txt
    63974.txt
    63975.txt
    63976.txt
    63977.txt
    64360.txt
    64361.txt
    64362.txt
    64363.txt
    64364.txt
    64365.txt
    64366.txt
    64367.txt
    64368.txt
    64369.txt
    6437.txt
    64370.txt
    64371.txt
    64372.txt
    64373.txt
    64374.txt
    64375.txt
    64376.txt
    64377.txt
    64378.txt
    64379.txt
    6438.txt
    64380.txt
    64381.txt
    64382.txt
    64383.txt
    64384.txt
    64385.txt
    64386.txt
    64387.txt
    64388.txt
    64389.txt
    6439.txt
    64390.txt
    64391.txt
    64392.txt
    64393.txt
    64394.txt
    64395.txt
    64396.txt
    64397.txt
    64398.txt
    64399.txt
    644.txt
    6440.txt
    64400.txt
    64401.txt
    64402.txt
    64403.txt
    64404.txt
    64405.txt
    64406.txt
    64407.txt
    64408.txt
    64409.txt
    6441.txt
    64410.txt
    64411.txt
    64412.txt
    64413.txt
    64414.txt
    64415.txt
    64416.txt
    64417.txt
    64418.txt
    64419.txt
    6442.txt
    64420.txt
    64421.txt
    64422.txt
    64423.txt
    64424.txt
    64425.txt
    64426.txt
    64427.txt
    64428.txt
    64429.txt
    6443.txt
    64430.txt
    64431.txt
    64432.txt
    64433.txt
    64434.txt
    64435.txt
    64436.txt
    64437.txt
    64438.txt
    64439.txt
    6444.txt
    64440.txt
    64441.txt
    64442.txt
    64443.txt
    64444.txt
    64445.txt
    64446.txt
    64447.txt
    64448.txt
    64449.txt
    6445.txt
    64450.txt
    64451.txt
    64452.txt
    64453.txt
    64454.txt
    64455.txt
    64456.txt
    64457.txt
    64458.txt
    64459.txt
    6446.txt
    64460.txt
    64461.txt
    64462.txt
    64463.txt
    64464.txt
    64465.txt
    64466.txt
    64467.txt
    64468.txt
    64469.txt
    6447.txt
    64470.txt
    64471.txt
    64472.txt
    64473.txt
    64474.txt
    64475.txt
    64476.txt
    64477.txt
    64478.txt
    64479.txt
    6448.txt
    64480.txt
    64481.txt
    64482.txt
    64483.txt
    64484.txt
    64485.txt
    64486.txt
    64487.txt
    64488.txt
    64489.txt
    6449.txt
    64490.txt
    64491.txt
    64492.txt
    64493.txt
    64494.txt
    64495.txt
    64496.txt
    64497.txt
    64498.txt
    64499.txt
    645.txt
    6450.txt
    64500.txt
    64501.txt
    64502.txt
    64503.txt
    64504.txt
    64505.txt
    64506.txt
    64507.txt
    64508.txt
    64509.txt
    6451.txt
    64510.txt
    64511.txt
    64512.txt
    64513.txt
    64514.txt
    64515.txt
    64516.txt
    64517.txt
    64518.txt
    64519.txt
    6452.txt
    64520.txt
    64521.txt
    64522.txt
    64523.txt
    64524.txt
    64525.txt
    64526.txt
    64527.txt
    64528.txt
    64529.txt
    6453.txt
    64530.txt
    64531.txt
    64532.txt
    64533.txt
    64534.txt
    64535.txt
    64536.txt
    64537.txt
    64538.txt
    64539.txt
    6454.txt
    64540.txt
    64541.txt
    64542.txt
    64543.txt
    64544.txt
    64545.txt
    64546.txt
    64547.txt
    64548.txt
    64549.txt
    6455.txt
    64550.txt
    64551.txt
    64552.txt
    64553.txt
    64554.txt
    64555.txt
    64556.txt
    64557.txt
    64558.txt
    64559.txt
    6456.txt
    64560.txt
    64561.txt
    64562.txt
    64563.txt
    64564.txt
    64565.txt
    64566.txt
    64567.txt
    64568.txt
    64569.txt
    6457.txt
    64570.txt
    64571.txt
    64572.txt
    64573.txt
    64574.txt
    64575.txt
    64576.txt
    64577.txt
    64578.txt
    64579.txt
    6458.txt
    64580.txt
    64581.txt
    64582.txt
    64583.txt
    64584.txt
    64585.txt
    64586.txt
    64587.txt
    64588.txt
    64589.txt
    6459.txt
    64590.txt
    64591.txt
    64592.txt
    64593.txt
    64594.txt
    64595.txt
    64596.txt
    64597.txt
    64598.txt
    64599.txt
    646.txt
    6460.txt
    64600.txt
    64601.txt
    64602.txt
    64603.txt
    64604.txt
    64605.txt
    64606.txt
    64607.txt
    64608.txt
    64609.txt
    6461.txt
    64610.txt
    64611.txt
    64612.txt
    64613.txt
    64614.txt
    64615.txt
    64616.txt
    64617.txt
    64618.txt
    64619.txt
    6462.txt
    64620.txt
    64621.txt
    64622.txt
    64623.txt
    64624.txt
    64625.txt
    64626.txt
    64627.txt
    64628.txt
    64629.txt
    6463.txt
    64630.txt
    64631.txt
    64632.txt
    64633.txt
    64634.txt
    64635.txt
    64636.txt
    64637.txt
    64638.txt
    64639.txt
    6464.txt
    64640.txt
    64641.txt
    64642.txt
    64643.txt
    64644.txt
    64645.txt
    64646.txt
    64647.txt
    64648.txt
    64649.txt
    6465.txt
    64650.txt
    64651.txt
    64652.txt
    64653.txt
    64654.txt
    64655.txt
    64656.txt
    64657.txt
    64658.txt
    64659.txt
    6466.txt
    64660.txt
    64661.txt
    64662.txt
    64663.txt
    64664.txt
    64665.txt
    64666.txt
    64667.txt
    64668.txt
    64669.txt
    6467.txt
    64670.txt
    64671.txt
    64672.txt
    64673.txt
    64674.txt
    64675.txt
    64676.txt
    64677.txt
    64678.txt
    64679.txt
    6468.txt
    64680.txt
    64681.txt
    64682.txt
    64683.txt
    64684.txt
    64685.txt
    64686.txt
    64687.txt
    64688.txt
    64689.txt
    6469.txt
    64690.txt
    64691.txt
    64692.txt
    64693.txt
    64694.txt
    64695.txt
    64696.txt
    64697.txt
    64698.txt
    64699.txt
    647.txt
    6470.txt
    64700.txt
    64701.txt
    64702.txt
    64703.txt
    64704.txt
    64705.txt
    64706.txt
    64707.txt
    64708.txt
    64709.txt
    6471.txt
    64710.txt
    64711.txt
    64712.txt
    64713.txt
    64714.txt
    64715.txt
    64716.txt
    64717.txt
    64718.txt
    64719.txt
    6472.txt
    64720.txt
    64721.txt
    64722.txt
    64723.txt
    64724.txt
    64725.txt
    64726.txt
    64727.txt
    64728.txt
    64729.txt
    6473.txt
    64730.txt
    64731.txt
    64732.txt
    64733.txt
    64734.txt
    64735.txt
    64736.txt
    64737.txt
    64738.txt
    64739.txt
    6474.txt
    64740.txt
    64741.txt
    64742.txt
    64743.txt
    64744.txt
    64745.txt
    64746.txt
    64747.txt
    64748.txt
    64749.txt
    6475.txt
    64750.txt
    64751.txt
    64752.txt
    64753.txt
    64754.txt
    64755.txt
    64756.txt
    64757.txt
    64758.txt
    64759.txt
    6476.txt
    64760.txt
    64761.txt
    64762.txt
    64763.txt
    64764.txt
    64765.txt
    64766.txt
    64767.txt
    64768.txt
    64769.txt
    6477.txt
    64770.txt
    64771.txt
    64772.txt
    64773.txt
    64774.txt
    64775.txt
    64776.txt
    64777.txt
    64778.txt
    64779.txt
    6478.txt
    64780.txt
    64781.txt
    64782.txt
    64783.txt
    64784.txt
    64785.txt
    64786.txt
    64787.txt
    64788.txt
    64789.txt
    6479.txt
    64790.txt
    64791.txt
    64792.txt
    64793.txt
    64794.txt
    64795.txt
    64796.txt
    64797.txt
    64798.txt
    64799.txt
    648.txt
    6480.txt
    64800.txt
    64801.txt
    64802.txt
    64803.txt
    64804.txt
    64805.txt
    64806.txt
    64807.txt
    64808.txt
    64809.txt
    6481.txt
    64810.txt
    64811.txt
    64812.txt
    64813.txt
    64814.txt
    64815.txt
    64816.txt
    64817.txt
    64818.txt
    64819.txt
    6482.txt
    64820.txt
    64821.txt
    Removing intermediate container 0e98eafb4c78
     ---> 9968c5d7fc72
    Successfully built 9968c5d7fc72
    
  3. Examine one of the files in the list with docker run --rm 9968c5d7fc72 cat 6482.txt (truncated output):
     λ docker run --rm 9968c5d7fc72 cat 6482.txt
     �wa\�X�E=IΪm�SP�	��-F�#ɕaQ1�A��R#�*9P�#�j���z�X�2��be�ԵG}Jc#�.�-���kC6�ЗE���N��A�,���1:��R�������R�?A}B��:�QK��GU�5'F�^]cQ�S�Y��9��D�L�r9��P4�#�
     ��y�(f>h��=Hҍs"o{�?\���Y�T���:%������A;��H��% �g� 8w�\�1)�?k�v-0֪�k�;������O����4B%И�FA�	��`�IL�s!��L��h���8ِ���;aV�+��mXi� �vD�g�K~���݆�#l���'�Q��IS|�&Xi�
                                       s�	����/���v�Dj�����F�}B�����%򡏱���{Dfg=�į�~��I���|L[)�I�
@tristanpemble
Author

This issue has been most problematic in the real world scenario of an unfortunately large npm install generating around 130,000 files. docker build will fail randomly and unpredictably, and the only way to get around it is to reset Docker for Mac and try again.
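
The grep step in the build output above lists files in shell glob (lexicographic) order, so `6390.txt` appears next to `63900.txt` even though the loop created them far apart. Below is a small sketch, not from the original report, that regroups such names into numeric runs so the affected stretches are easier to see. `collapse_ranges` is a hypothetical helper name.

```python
def collapse_ranges(file_names):
    """Sort names like '6390.txt' numerically and merge consecutive numbers."""
    numbers = sorted(int(name.split(".")[0]) for name in file_names)
    ranges = []
    for n in numbers:
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n          # extend the current run
        else:
            ranges.append([n, n])      # start a new run
    return [(start, end) for start, end in ranges]


# A few names taken from the build output above, in the order grep printed them.
sample = ["6390.txt", "63900.txt", "63901.txt", "6391.txt", "644.txt", "6440.txt"]
print(collapse_ranges(sample))
# prints [(644, 644), (6390, 6391), (6440, 6440), (63900, 63901)]
```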

@tristanpemble
Author

I believe this is related to a myriad of other issue reports (#2567 #2542 etc, just search "corrupt") but it's not clear to me which. This is the only report I can find that has a reliable strategy to reproduce the issue.

@zhm

zhm commented Feb 21, 2018

I'm seeing the same behavior of random corrupt files on 17.12.0-ce-mac49. Out of curiosity I ran your test case and I was able to also see garbage binary data in some of the files. From the other issues, it seems like the temporary workaround is to switch to the old image format as mentioned here #2327 (comment).

@tristanpemble
Author

Thanks. I tried switching to the qcow image format, and was unable to reproduce the issue with this Dockerfile.

@pgayvallet

Hi,

Thanks for the reproduction scenario. We are actively looking for a fix on this issue.

@rn

rn commented Feb 22, 2018

@tristanpemble I tried to repro this (one of my colleagues was able to), but so far I have had no luck, and I'm trying on several machines in parallel.

One question, how full is your disk? Is it close to the limit or is there plenty of free space?

Thanks

@tristanpemble
Author

@rn:

  • Docker for Mac VM's disk image is allocated to 64GiB, and this is after a reset to factory defaults
  • My MacBook's SSD is at 60% capacity:
    Filesystem      Size   Used  Avail Capacity iused               ifree %iused  Mounted on
    /dev/disk1s1   420Gi  237Gi  162Gi    60% 2334929 9223372036852440878    0%   /
    

@tristanpemble
Author

Some additional information: several of my coworkers are seeing the same issues in other scenarios, and saw them go away with qcow. When trying to use this Dockerfile just now, we were unable to reliably reproduce, unfortunately.

I've been able to reproduce every time:

  • MacBook Pro (15-inch, 2017)
    macOS 10.13.3 (17D47)

One coworker could not, but saw the issue in other scenarios before switching to qcow:

  • MacBook Pro (15-inch, 2017)
    macOS 10.13.3 (17D102)

Another coworker, on Sierra, was unable to reproduce (since Sierra will default to the qcow2 format).

@djs55
Contributor

djs55 commented Feb 22, 2018

This was reproducing for me reliably on 2 laptops yesterday:

  • MacBook Pro (13-inch, Early 2015) running a 10.13.4 Beta build
  • MacBook Pro (15-inch, Early 2013) running macOS 10.13.3 (17D102)

I updated the beta laptop to 10.13.4 Beta (17E160e) and now I can't reproduce it on that machine, but can still on the other one. Furthermore in the beta build another tangentially-related APFS bug was fixed (a bug in F_PUNCHHOLE as used by TRIM). Perhaps Apple have fixed several APFS bugs in the latest beta?

@tristanpemble
Author

tristanpemble commented Feb 22, 2018

This is pretty frustrating. I'm actually unable to reproduce anymore. Here are the steps I've taken:

  1. [2 DAYS AGO] Reset to factory defaults, using raw
  2. Reproduced with that Dockerfile numerous times (using --no-cache to avoid layer caching)
  3. Confirmed switching to qcow by editing config file fixes it
  4. [TODAY] Came back today, switched back to raw by editing the config file
  5. Was unable to reproduce at all with a variety of RAM/CPU adjustments to VM
  6. BUT THEN — I reset to factory defaults again just now, and reproduced it twice in a row

I'm wondering if there's something about switching back and forth between qcow/raw that makes it disappear?

@tristanpemble
Author

tristanpemble commented Feb 22, 2018

I had another suspicion, so I ran docker image rm alpine and re-built this Dockerfile.

Before removing the image, it reproduced.
After removing the image, it did not.

I'm starting to think that this Dockerfile is surfacing a symptom of the image being corrupted at pull time

UPDATE:

  1. Reset factory settings
  2. docker build --no-cache . – Reproduced once
  3. docker build --no-cache . – Reproduced again
  4. docker image rm alpine – Deleted alpine image
  5. docker build --no-cache . – Did not reproduce
  6. docker build --no-cache . – Reproduced again

Never mind on that one. Sorry for all of the comments; I'm trying to give as much info as I can since this is a pretty unpredictable bug.

@znerd

znerd commented Feb 28, 2018

Note that the release notes for Docker 17.12.0-ce-mac55 reference this ticket:

Revert the default disk format to qcow2 for users running macOS 10.13 (High Sierra). There are confirmed reports of file corruption using the raw format which uses sparse files on APFS. Note this change only takes effect after a reset to factory defaults (from the Whale menu -> Preferences -> Reset). Related to #2625

@agodoroja

Could this be related to an issue that I am seeing with mounting volumes? I am using Docker to run the toolchain for building a project in Ubuntu. I have all of the source in the Mac filesystem and the tools installed in a Docker container. I need a case-sensitive filesystem, so I initially created one on the Mac using a sparse bundle. When I ran my Docker container with the Mac disk image mounted as a volume, the make process failed because it created many files of apparently the right size, but filled with 0x00. Changing to a sparse image didn't help, but it seems to work when I created a .dmg file with the total amount pre-allocated by Disk Utility. I am using 17.12.0-ce-mac49 (21995) on High Sierra.
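
One way to look for the symptom described above (files of plausible size that contain nothing but 0x00) is to scan the build tree. This is a minimal sketch under that assumption only; `is_zero_filled` and `find_zero_filled` are hypothetical helper names, and a zero-filled file is not by itself proof of corruption, since some tools create such files on purpose.

```python
from pathlib import Path


def is_zero_filled(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is 0x00."""
    seen_data = False
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            if chunk.count(b"\x00") != len(chunk):
                return False           # found a non-zero byte
    return seen_data


def find_zero_filled(root):
    """List regular files under root that consist solely of zero bytes."""
    return sorted(
        str(p) for p in Path(root).rglob("*") if p.is_file() and is_zero_filled(p)
    )
```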

@djs55
Contributor

djs55 commented Mar 1, 2018

Sorry I forgot to comment here when we released the update switching back to qcow2.

@agodoroja I suspect that is the same (or very similar) issue. It seems that when APFS allocates blocks for sparse files, they can occasionally end up with corrupt contents. I suspect the bug has been fixed on the macOS developer beta (10.13.4) recently and I hope the fix will be released as a regular update soon. In the meantime we've switched back to qcow2 by default for safety. When we're convinced the underlying bug has been fixed we'll switch back to raw again.

Thanks all (especially @tristanpemble for the test case)
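
To illustrate the kind of access pattern under suspicion (writing scattered blocks into a sparse file and reading them back), here is a minimal sketch. It is not the test used by Docker or Apple. The 4 KiB block size, the stride, and the function names are arbitrary assumptions, and a read-back straight after the write may be served from the page cache rather than from disk, so a clean result proves little.

```python
import hashlib
import os


def pattern(block_index, block_size):
    """Deterministic, block-specific bytes so stale or zeroed data is detectable."""
    seed = hashlib.sha256(str(block_index).encode()).digest()
    return (seed * (block_size // len(seed) + 1))[:block_size]


def verify_sparse_file(path, file_size=64 * 1024 * 1024, block_size=4096, stride=257):
    """Write every `stride`-th block of a sparse file, then read each one back.

    Returns the list of block indexes whose contents did not match.
    """
    blocks = range(0, file_size // block_size, stride)
    with open(path, "wb") as handle:
        handle.truncate(file_size)            # extend the file without writing data
        for index in blocks:
            handle.seek(index * block_size)
            handle.write(pattern(index, block_size))
        handle.flush()
        os.fsync(handle.fileno())             # ask the OS to push data to storage
    bad = []
    with open(path, "rb") as handle:
        for index in blocks:
            handle.seek(index * block_size)
            if handle.read(block_size) != pattern(index, block_size):
                bad.append(index)
    return bad
```

An empty list means every written block read back intact on that run.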

@davidthornton

@djs55 will this be added directly to stable or incubated in edge first?

@RoyHP

RoyHP commented Apr 2, 2018

macOS 10.13.4 has been released. Have we confirmed that it is safe to switch back to the raw format? I don't know if Apple fixed this issue with sparsebundle allocation.

@djs55
Contributor

djs55 commented Apr 2, 2018

I confirmed the bug still reproduces on my 10.13.3 machine and then updated to 10.13.4. I re-ran the test 5 times in a row and the bug didn't reproduce. It's hard to prove the bug has been fixed (there's no mention in the release notes). Perhaps someone else who has a machine where it reproduces could try updating, switching to raw mode (by editing the file extension in settings.json in the ~/Library/Group Containers/group.com.docker directory, restarting the app, and verifying that a raw file is created), and let us know what happens?

I suspect we'll want to incubate the change in edge before moving to stable. We'll also need to add a version check somewhere to keep qcow2 for 10.13.0-10.13.3.
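
The version check mentioned above could look roughly like the sketch below. It is only an illustration of the rule as stated in this thread (qcow2 on Sierra and on 10.13.0 through 10.13.3, raw from 10.13.4), not Docker's actual code; `parse_version` and `default_disk_format` are hypothetical names.

```python
def parse_version(text):
    """Turn '10.13.3' into (10, 13, 3); missing parts default to 0."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def default_disk_format(macos_version):
    """Return 'raw' for macOS 10.13.4 and later, otherwise 'qcow2'."""
    if parse_version(macos_version) >= (10, 13, 4):
        return "raw"
    return "qcow2"


for version in ("10.12.6", "10.13.3", "10.13.4"):
    print(version, default_disk_format(version))
```

Comparing tuples of integers avoids the string-comparison trap where "10.9" would sort after "10.13".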

@davidthornton

My 10.13.4 took a while to install with a few more reboots than normal which leads me to believe an APFS update was included. If it was, and the bug was patched (I never experienced it on my machine) then I think the lack of transparency from Apple regarding APFS is a little bit concerning. Say what you want about keeping features secret, but filesystem bugs should surely be reported.

That being said, do we know when k8s support will enter the stable release track?

@rn

rn commented Apr 6, 2018

FWIW, I just updated to 10.13.4 and switched back to raw. On 10.13.3 I had some file corruption when untar'ing linux kernel source trees, maybe 1 in 3 or 4. I've now done about 12 iterations without any noticeable corruption. Definitely an improvement and another datapoint that this might be fixed in APFS
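
As a rough sanity check on how much 12 clean iterations say, assume each run fails independently with the rate observed on 10.13.3 (1 in 3 or 1 in 4). Independence is an assumption made here, not something established in this thread. Under it, the chance of 12 clean runs in a row on an unfixed system would be about 0.8% or 3.2% respectively:

```python
def chance_of_clean_streak(failure_rate, runs):
    """Probability of `runs` consecutive clean runs if each fails independently."""
    return (1.0 - failure_rate) ** runs


for rate in (1 / 3, 1 / 4):
    print(f"failure rate {rate:.2f}: {chance_of_clean_streak(rate, 12):.4f}")
```

So a streak of that length is unlikely, though not impossible, if nothing had changed.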

@cascornelissen

cascornelissen commented Apr 6, 2018

I personally can't seem to reproduce it with the testcase provided in this issue on my end (D4M stable 18.03.0-ce-mac60 (23751), High Sierra 10.13.4 (17E199)). Haven't tested it on 10.13.3 though.

Not sure if this is related to the APFS bug described here: https://bombich.com/blog/2018/02/15/macos-may-lose-data-on-apfs-formatted-disk-images but it notes the issue persists in the 10.13.4 release 🤔

Update March 30, 2018: This issue persists on macOS 10.13.4 (17E199)

@glyph

glyph commented Apr 26, 2018

I just saw the edge channel update suggesting that this is the default again, linking me to this issue, but I don't see any indication that the underlying issue is fixed. Does anyone have a reference on the fix? Do I need to update macOS?

@gondalez

gondalez commented Apr 27, 2018

Hi @glyph, you need to make sure you're on macOS 10.13.4 and reset Docker to factory defaults. I believe 10.13.3 introduced the bug.

Re-enable raw as the default disk format for users running macOS >>>10.13.4 and higher<<<. Note this change only takes effect after a >>>“reset to factory defaults” or “remove all data”<<< (from the Whale menu -> Preferences -> Reset). Related to #2625

About the reference to the fix; docker is closed source so you won't see a commit link here. I assume the docker team will resolve this one once the fix makes its way to stable.

I personally can attest to raw working in macos 10.13.4, I have been using raw in stable using the settings.json workaround for weeks with no corruption issues. The performance improvement is significant :)

It's been talked about above but in case it helps others, here are the exact steps to use raw with docker stable (18.03.0-ce-mac60):

  • edit ~/Library/Group Containers/group.com.docker/settings.json
  • change the extension in diskPath from qcow -> raw
  • open docker and factory reset
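
The edit in the steps above can be sketched as a short script. This assumes settings.json is plain JSON with a top-level `diskPath` key (the key name comes from the steps above) and that the old extension is spelled `.qcow` or `.qcow2`. `switch_disk_path_to_raw` is a hypothetical helper name. Quit Docker and back up the file before trying anything like this.

```python
import json
from pathlib import Path

# Location quoted in this thread; adjust if your installation differs.
SETTINGS = Path.home() / "Library/Group Containers/group.com.docker/settings.json"


def switch_disk_path_to_raw(settings_file=SETTINGS):
    """Rewrite diskPath so its extension is .raw and return the resulting value."""
    settings_file = Path(settings_file)
    settings = json.loads(settings_file.read_text())
    disk_path = Path(settings["diskPath"])
    if disk_path.suffix in (".qcow", ".qcow2"):   # assumed spellings of the old extension
        settings["diskPath"] = str(disk_path.with_suffix(".raw"))
        settings_file.write_text(json.dumps(settings, indent=2) + "\n")
    return settings["diskPath"]
```

The script only changes the path; the factory reset from the last step is still needed for the new disk image to be created.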

@rn

rn commented Apr 27, 2018

Yes, it looks like Apple has fixed the issue in APFS which caused disk corruption with sparse files. We had a number of tests, including the one kindly provided by @tristanpemble, and none of them triggered any corruption on 10.13.4. So the latest edge enables raw disks again.

We did not see any mention of this in Apple's changelog, and we noticed some other APFS-related changes too.

@markruys

markruys commented Jun 4, 2018

I've run @tristanpemble's test on both 10.13.4 and 10.13.5 with a raw backing store. In neither case were corrupt files created.

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Jun 27, 2020