Avoid lazy umount MNT_DETACH for NFS mounts as this causes system hang during reboot #48

Open
rhmcruiser opened this issue Mar 6, 2020 · 0 comments

I have raised a PR as well; please check.

lazy umounts in src/oci-umount.c

Background of the issue:
A lazy unmount of an NFS mount leaves several tasks blocked on NFS IO and also blocks the reboot process; the vmcore details below show this. The issue is reproducible. We observed the NFS IO tasks in a blocked state, waiting to commit IO to the lazily unmounted file system, and the reboot/shutdown task blocked in UN state, waiting to sync inodes on the NFS superblock before shutdown. The NFS tasks never make progress. Although a lazy unmount with umount2(mnt_path, MNT_DETACH) removes the mount from the mount table, the tasks' references to its superblock remain active: s_count and s_active on the superblock show that it is still in use by the tasks doing NFS IO. In general, lazy unmount causes undefined behavior and is not recommended.

MNT_DETACH does not actually unmount a file system that is in use; it only detaches the mount from the visible file system tree, which makes it difficult to see which processes are still using the mount. Because access to the mount continues, normal system shutdown is prevented.

The issue is confirmed to happen only in Docker/container environments with NFS mounts.
If the NFS mounts are unmounted manually before reboot, the issue is not observed. But when oci-umount is left to lazily unmount them during a reboot, the NFS tasks performing IO are blocked, and the reboot task attempting sync_inodes_sb() on this NFS superblock is blocked as well.
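
To illustrate the kind of change being asked for, here is a minimal sketch and not the actual code in src/oci-umount.c or in the PR. It assumes the caller already knows the file system type string of the mount (for example, from the fstype field of /proc/self/mountinfo). The helper names umount_flags_for_fstype and unmount_path are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: choose umount2() flags from the file system type.
 * NFS mounts get a regular unmount (flags == 0) so the call fails with
 * EBUSY while the mount is in use instead of silently detaching it.
 * Everything else keeps the lazy MNT_DETACH behavior. */
int umount_flags_for_fstype(const char *fstype)
{
    assert(fstype != NULL);
    /* Exact matches only, so the "nfsd" pseudo file system is not caught. */
    if (strcmp(fstype, "nfs") == 0 || strcmp(fstype, "nfs4") == 0)
        return 0;
    return MNT_DETACH;
}

/* Hypothetical wrapper: unmount path using the flags chosen above.
 * Returns the result of umount2(): 0 on success, -1 with errno set. */
int unmount_path(const char *path, const char *fstype)
{
    return umount2(path, umount_flags_for_fstype(fstype));
}
```

With this sketch, unmount_path("/mnt/data", "nfs4") would attempt a regular unmount and report EBUSY if tasks are still using the mount, so the caller can log or retry rather than leave a detached superblock behind. The path /mnt/data is only an example.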

== Details, snippet from vmcore analysis:
The output below shows the NFS superblock still holding references although it is no longer in the mount table. The superblock fields indicate tasks waiting to write to it.

crash> mount | grep ffff9a0018ae2000
crash> <<
^^ although mount is removed from filesystem tree due to lazy umount,
the superblock fields have references and tasks waiting for nfs IO onto this superblock.

crash> p ((struct super_block*)0xffff9a0018ae2000)->s_op
$13 = (const struct super_operations *) 0xffffffffc0901b60 <nfs4_sops>

crash> p ((struct super_block*)0xffff9a0018ae2000)->s_count
$12 = 2 << usage count is still positive

crash> p ((struct super_block*)0xffff9a0018ae2000)->s_active
$14 = {
counter = 4 << 4 blocked tasks still holding reference to this nfs superblock
}
The 4 blocked tasks attempting IO on that lazily unmounted NFS file system were:
crash> ps -m | grep UN
[0 00:10:44.049] [UN] PID: 2136 TASK: ffff99f3fef64f10 CPU: 17 COMMAND: "poweroff" => blocked performing sync_inodes_sb( )
[0 00:10:55.262] [UN] PID: 31589 TASK: ffff99ff78ba0000 CPU: 7 COMMAND: "java" => blocked for nfs_file_write( )
[0 00:10:55.345] [UN] PID: 62574 TASK: ffff99f44baa0000 CPU: 17 COMMAND: "java" => blocked for nfs_file_write( )
[0 00:11:02.028] [UN] PID: 63909 TASK: ffff99ed7cffaf70 CPU: 10 COMMAND: "prometheus" => blocked for nfs_file_write( )

< downsized the debug details >
