
Filesystem under load occasionally reports mounted files as not existing #2870

Closed
nwalters512 opened this issue May 2, 2018 · 8 comments

@nwalters512

commented May 2, 2018

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have submitted Diagnostics
  • I have included the Diagnostics ID in this Issue
  • macOS Version: 10.13.4 (17E199)

Expected behavior

When multiple processes run a tight loop that checks whether a file exists and then reads it, the file should never be reported as missing, assuming no other process deletes it.

Actual behavior

Node code (and likely other code as well, though I haven't yet tested other languages) that does what's described above will occasionally see that the file in question does not exist, even though it definitely does.

Information

This bug was observed when I was trying to compile the same EJS template in several worker processes at once. Occasionally, one of the .ejs files included from the main template I was rendering would be reported as not existing, even though it definitely did. The included template was used many times in the main file, so EJS was frequently checking whether it existed and reading it.

Eventually, I determined that this failure only occurred if the following conditions
were met:

  • The code is being run inside a Docker container
  • Docker is being run on a Mac
  • The file whose existence we're checking is on a mounted volume

If any of those conditions is not met (the code is run directly on the host, the file exists in the image itself, or Docker is run on a Windows/Linux machine), the bug disappears. For this reason, I believe the bug is in Docker for Mac, and not in Node or EJS.

Diagnostic ID: 15D5E555-47BB-4C80-9B0C-BEFAF3F272AF

Docker for Mac: version: 18.03.1-ce-mac65 (dd2831d4b7421cf559a0881cc7a5fdebeb8c2b98)
macOS: version 10.13.4 (build: 17E199)
logs: /tmp/15D5E555-47BB-4C80-9B0C-BEFAF3F272AF/20180502-163929.tar.gz
[OK]     vpnkit
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     kubernetes
[OK]     files
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     disk

Steps to reproduce the behavior

I was able to reproduce this with a simple program that spawns two children with the threads package. Each child repeatedly checks whether a file exists and reads it in a tight loop. On my Mac, I observe that the file sometimes appears not to exist.

Given that the reproducible case requires more than just a simple Dockerfile (several other files, dependencies, and a specific folder), I have set up a self-contained example in the repo at https://github.com/nwalters512/docker-file-bug. I've copy-pasted the Dockerfile and Node code here.

Dockerfile

FROM node:carbon

WORKDIR /usr/src/app

COPY package*.json ./

RUN npm install

COPY . .

CMD [ "node", "index.js" ]

index.js

// Spawn a pool of worker threads using the threads package.
const Pool = require('threads').Pool;
const pool = new Pool();

pool.run((opts, done) => {
    const fs = require('fs');
    // exist.data lives on the mounted volume and is never deleted,
    // so the existence check should always succeed.
    const file = '/volume/exist.data';
    for (let i = 0; i < 100; i++) {
        if (!fs.existsSync(file)) {
            console.log('File does not exist!');
        } else {
            console.log('File exists.');
        }
        fs.readFileSync(file);
    }
    done();
});

// Run the loop in two workers at once to trigger the concurrent I/O.
for (let i = 0; i < 2; i++) {
    pool.send('test');
}

pool.on('finished', () => pool.killAll());
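
For reference, here is a sketch of how one might build and run the repro by hand, assuming the repo is checked out and contains a volume directory holding exist.data (the image tag and host path are illustrative; the repo's README is authoritative):

docker build -t docker-file-bug .
docker run --rm -v "$(pwd)/volume:/volume" docker-file-bug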

@djs55
Contributor

commented May 3, 2018

Thanks for the detailed repro instructions. I'll give them a go later and see what happens!

@djs55
Contributor

commented May 9, 2018

This reproduces for me with the latest build -- I'll investigate further.

@djs55 djs55 closed this May 10, 2018

@djs55
Contributor

commented May 10, 2018

@nwalters512 thanks again for the detailed report and repro case -- it was very useful.

I've found and fixed a race condition in com.docker.osxfs that is triggered (as you noticed) by concurrent I/O to the same file. When a file handle is opened in a container, we also open a file handle on the host. Calls to read, write, close etc. are passed through. Calls to stat (used by fs.existsSync) are translated into FUSE GETATTR requests, which target an inode referencing the file itself, not a handle to it.

In com.docker.osxfs there was logic to "borrow" a file descriptor associated with an inode/file in order to call fstat, but unfortunately this raced with a concurrent file close, which invalidated the file descriptor. There was logic to handle this case... but unfortunately it was flawed. I was able to track this down thanks to your nice reproduction. I've also written a regression test for the CI so it shouldn't happen again.
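
To make the hazard concrete, here is a minimal Node sketch (illustrative only: the actual race is inside com.docker.osxfs, not Node, and the path below is made up). It performs sequentially what the two osxfs code paths could do concurrently: once a close invalidates a descriptor, an fstat on the "borrowed" descriptor fails with EBADF, and mishandling that error can surface as "file does not exist":

const fs = require('fs');

// One code path opens a handle, as osxfs does on the host when a
// container opens a file.
const fd = fs.openSync('/tmp/exist.data', 'w');

// Another code path closes it, invalidating the descriptor...
fs.closeSync(fd);

// ...so a stat that "borrows" the same descriptor now fails with
// EBADF instead of returning the file's attributes.
try {
    fs.fstatSync(fd);
} catch (err) {
    console.log(err.code); // EBADF
}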

If you'd like to try a development build, this one should have the fix:
https://download-stage.docker.com/mac/bysha1/d2e134eea43e57a7fe7234add9dfa1eef984761c/Docker.dmg

On my machine your test passes with the fix, and reliably fails without it. Note that this build is only suitable for testing, not production.

The fix should be released in the next edge build, due early next week.

Thanks again for your report!

@djs55
Contributor

commented May 10, 2018

Sorry for prematurely closing this issue -- I accidentally tagged it in a PR elsewhere. I'll re-open it until the fix is released next week.

@djs55 djs55 reopened this May 10, 2018

@YRM64

commented May 14, 2018

Thanks @djs55 for your contributions. Please keep us informed as soon as the fix is released in the next edge build.

@nwalters512
Author

commented May 14, 2018

Thanks @djs55, glad to hear the report was useful! Looking forward to seeing this fixed 🎉

@docker-desktop-robot
Collaborator

commented Aug 12, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with a /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now, please do so.

Send feedback to the Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@djs55
Contributor

commented Aug 13, 2018

The fix has been released in the latest stable and edge channels so I'm closing the issue.

If there are further problems, please open a fresh issue (or issues). Thanks again for all your help.
