`docker: Stale file handle` after updating Container Linux to alpha #2152

Closed
jrave9000 opened this Issue Sep 13, 2017 · 9 comments

jrave9000 commented Sep 13, 2017

Bug

After updating from the latest stable to the latest alpha, I get 'Stale file handle' errors when using apt (dpkg) in Ubuntu-based Docker containers (tested on 16.04 and 14.04). Debian-based containers work fine.
Totally reproducible.

Everything works on a fresh alpha install, so this may be an update issue. I'd be grateful for ANY advice, because moving everything to another instance would be far more tedious.

Container Linux Version

cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1520.1.0
VERSION_ID=1520.1.0
BUILD_ID=2017-09-05-2146
PRETTY_NAME="Container Linux by CoreOS 1520.1.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

Google Compute Engine

Reproduction Steps

  1. Create an instance with latest stable Container Linux.
  2. Update it to alpha.
  3. Run the following command in ubuntu docker container:
    $ apt update && apt install software-properties-common

Error:

dpkg: error: error removing old backup file '/var/lib/dpkg/status-old': Stale file handle

nathankleyn commented Sep 15, 2017

We are also seeing this issue on the alpha release, in a pretty much identical scenario to the one described above. Happy to provide more info if needed!

Member

bgilbert commented Sep 16, 2017

One other step is needed to reproduce: between steps 1 and 2, run a container.

Container Linux stable currently has docker 1.12, which defaults to the overlay storage driver. docker 17.06 defaults to overlay2 unless it finds existing data from the overlay driver, in which case it uses overlay. The Stale file handle problem apparently only occurs with overlay.

As a workaround, you can stop docker.service and delete /var/lib/docker. This will, of course, delete all your Docker data. Afterward, you should be using the overlay2 storage driver; you can verify this in docker info.

I don't see any recent Stale file handle reports in moby/moby; they may be interested in knowing about the issue as well.
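The workaround above could be scripted roughly as follows. This is only a sketch: the function names `storage_driver_from_info` and `reset_docker_storage` are made up for illustration, and the parser assumes `docker info` prints a `Storage Driver:` line in its plain-text output. `reset_docker_storage` is destructive (it removes all Docker data) and needs root, so it is only defined here, never called.

```sh
#!/bin/sh
# Sketch of the workaround described above (assumed helper names).
# WARNING: reset_docker_storage deletes every image, container, and
# volume stored under /var/lib/docker.

# Read `docker info` text on stdin and print the storage driver name.
storage_driver_from_info() {
    sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p' | head -n 1
}

# Stop Docker, wipe its data directory, start it again, and print the
# storage driver now in use. Run as root on the affected host.
reset_docker_storage() {
    systemctl stop docker.service || return 1
    rm -rf /var/lib/docker || return 1
    systemctl start docker.service || return 1
    docker info 2>/dev/null | storage_driver_from_info
}
```

To check the current driver without changing anything, run `docker info | storage_driver_from_info`. After `reset_docker_storage` on docker 17.06, the printed driver should be `overlay2`, per the comment above.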

Member

bgilbert commented Sep 16, 2017

  • Reproduced on CL alpha 1535.1.0 on AWS.
  • Reproduced on CL beta 1520.3.0, which has kernel 4.13.2 and docker 1.12.6.
  • Not reproduced on CL stable 1465.7.0, which has kernel 4.12.10 and docker 1.12.6.

So it looks as though this is a problem with the overlay driver + kernel 4.13, and not specifically with docker 17.06. On 17.06, overlay2 is at least available to switch to, but no such luck on 1.12.

apt-get update && apt-get install pv is sufficient to provoke the problem, but only once per container. dpkg -i <package> does not provoke the problem. I've captured a partial strace by copying in a /usr/bin/strace and attaching to dpkg processes as they show up. The obvious unusual sequence of events is the create/unlink/link/rename of /var/lib/dpkg/status{,-old,-new}, but replicating that did not immediately reproduce the problem.

There are three overlayfs commits that add ESTALE returns in 4.13. Two of them only affect mount time, and the third is torvalds/linux@b9ac5c2. Other 4.13 overlayfs changes may have introduced new calls to existing functions returning ESTALE, but I haven't checked.
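For anyone who wants to experiment with the create/unlink/link/rename sequence on the status files mentioned above, here is a minimal sketch. The helper name `rotate_status` and its arguments are hypothetical, and it works on any directory rather than the real `/var/lib/dpkg`. As noted above, replaying this sequence did not immediately reproduce the error, so treat it as a starting point for investigation, not a reproducer.

```sh
#!/bin/sh
# Replay the status-file rotation in directory $1, writing $2 as the
# new contents. The directory must already contain a `status` file.
rotate_status() {
    dir=$1
    printf '%s\n' "$2" > "$dir/status-new" || return 1  # create the new file
    rm -f "$dir/status-old" || return 1                  # unlink the old backup
    ln "$dir/status" "$dir/status-old" || return 1       # hard-link current to backup
    mv -f "$dir/status-new" "$dir/status"                # rename new over current
}
```

Inside a container on an affected host, something like `mkdir /tmp/t && echo v1 > /tmp/t/status && rotate_status /tmp/t v2` exercises the sequence on the container's own filesystem. Running it twice in a row matters, since the unlink step only has a backup file to remove on the second pass.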

Contributor

euank commented Sep 18, 2017

The upstream patch https://patchwork.kernel.org/patch/9955803/ fixes this issue for me.
After applying it locally, apt-get update && apt-get install pv on 17.06 configured with -s overlay no longer reproduces the issue.

weikinhuang commented Sep 19, 2017

This has been happening to me too on the beta channel for the past 4 days.

cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1520.3.0
VERSION_ID=1520.3.0
BUILD_ID=2017-09-15-2017
PRETTY_NAME="Container Linux by CoreOS 1520.3.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Reproduced with:

docker run -it --rm centos:7 bash
Unable to find image 'centos:7' locally
7: Pulling from library/centos
d9aaf4d82f24: Pull complete
Digest: sha256:eba772bac22c86d7d6e72421b4700c3f894ab6e35475a34014ff8de74c10872e
Status: Downloaded newer image for centos:7
[root@cc3077c5252a /]# yum install -y -q openssh-clients
warning: /var/cache/yum/x86_64/7/base/packages/fipscheck-lib-1.4.1-6.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
Public key for fipscheck-lib-1.4.1-6.el7.x86_64.rpm is not installed
Public key for openssh-7.4p1-12.el7_4.x86_64.rpm is not installed
Importing GPG key 0xF4A80EB5:
 Userid     : "CentOS-7 Key (CentOS 7 Official Signing Key) <security@centos.org>"
 Fingerprint: 6341 ab27 53d7 8a78 a7c2 7bb1 24c6 a8a7 f4a8 0eb5
 Package    : centos-release-7-4.1708.el7.centos.x86_64 (@CentOS)
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in <module>
    yummain.user_main(sys.argv[1:], exit_code=True)
  File "/usr/share/yum-cli/yummain.py", line 370, in user_main
    errcode = main(args)
  File "/usr/share/yum-cli/yummain.py", line 276, in main
    return_code = base.doTransaction()
  File "/usr/share/yum-cli/cli.py", line 783, in doTransaction
    resultobject = self.runTransaction(cb=cb)
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 1943, in runTransaction
    self.verifyTransaction(resultobject, vTcb)
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 2006, in verifyTransaction
    po.yumdb_info.releasever = self.conf.yumvar['releasever']
  File "/usr/lib/python2.7/site-packages/yum/rpmsack.py", line 1930, in __setattr__
    self._write(attr, value)
  File "/usr/lib/python2.7/site-packages/yum/rpmsack.py", line 1854, in _write
    misc.unlink_f(fn + '.tmp')
  File "/usr/lib/python2.7/site-packages/yum/misc.py", line 955, in unlink_f
    os.unlink(filename)
OSError: [Errno 116] Stale file handle: '/var/lib/yum/yumdb/f/cb7e013b0931dc495c9295d40ffbd0f49e31484b-fipscheck-lib-1.4.1-6.el7-x86_64/releasever.tmp'

Member

bgilbert commented Sep 20, 2017

This should be fixed in alpha 1535.2.0 and beta 1520.4.0, due shortly. Thanks for reporting.

@bgilbert bgilbert closed this Sep 20, 2017

nathankleyn commented Sep 21, 2017

Many thanks @bgilbert and @euank for the amazingly quick fix - you guys make OSS look easy! 👍

ccureau commented Oct 12, 2017

👍 We ran into this exact issue today, and updating the kernel to 4.13.6-1.el7.elrepo solved the issue. Thanks!!!

hordemark commented Nov 22, 2017

Awesome! Resolved it by upgrading the kernel to 4.14.1-1.el7.elrepo.
