
core dump directory is bind-mounted 32767 times #119

Closed
iameli opened this issue Dec 19, 2022 · 5 comments
iameli commented Dec 19, 2022

I have no idea who to report this bug to, so I'm going to duplicate the report in a few places.
local-path-provisioner: rancher/local-path-provisioner#287
kubernetes: kubernetes/kubernetes#114583

Environmental Info:
core-dump-handler image: quay.io/icdh/core-dump-handler:v8.2.0
K3s Version:

root@dp2426:~# k3s -v
k3s version v1.23.4+k3s1 (43b1cb48)
go version go1.17.5

I have also seen this behavior on a different node running a more recent version

root@dp7744:~# k3s -v
k3s version v1.25.3+k3s1 (f2585c16)
go version go1.19.2

Node(s) CPU architecture, OS, and Version:

root@dp2426:~# uname -a
Linux dp2426 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
6 Server Nodes

Describe the bug:
I'm running core-dump-handler on a few nodes. When core-dump-handler comes under load — we had a service elsewhere that was malfunctioning and segfaulting many times per second — its directory gets bind-mounted over and over again. I do not know what is creating the mounts.

# mount | grep core
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
/dev/md1 on /home/data/core-dump-handler/cores type ext4 (rw,relatime,stripe=256)
[...]

# mount | grep core | wc -l
32767

Steps To Reproduce:
No idea how to reproduce this in an isolated environment, but I'll give it a shot as I continue debugging.

Here's core-dump-handler's DaemonSet configuration file and the PVCs that back it. The pertinent volumes section:

      volumes:
      - name: host-volume
        persistentVolumeClaim:
          claimName: host-storage-pvc
      - name: core-volume
        persistentVolumeClaim:
          claimName: core-storage-pvc
[...]
        volumeMounts:
        - mountPath: /home/data/core-dump-handler
          mountPropagation: Bidirectional
          name: host-volume
        - mountPath: /home/data/core-dump-handler/cores
          mountPropagation: Bidirectional
          name: core-volume

Possibly a problem with bind-mounting one directory inside another...?
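
One way to check whether shared mount propagation is involved (a diagnostic sketch only; findmnt's PROPAGATION column shows whether each mount is shared, private, or slave):

# list the duplicated mounts together with their propagation flag
findmnt -o TARGET,SOURCE,PROPAGATION | grep core-dump-handler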

No9 (Collaborator) commented Dec 19, 2022

Thanks for the report @iameli - This is a tricky one alright

Can you provide logs from the agent container and the composer log in /home/data/core-dump-handler/composer.log?
It would be good to confirm if it's the composer on the host or the agent in the container that's causing the issue.

From 10,000 feet it seems like /home/data/core-dump-handler/cores isn't getting mounted on provision but on first use, or it's causing issues when accessed from the host, and this is leading to a race condition of some sort.

You may also want to try creating a file in the cores folder outside of the core dump process with touch /home/data/core-dump-handler/cores/RANDOM_NAME on the host, to see if that forces a mount.

As an aside can you also confirm that a single core dump is working?

kubectl run -it segfaulter --image=quay.io/icdh/segfaulter --restart=Never
kubectl delete pod segfaulter
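
Afterwards you can check whether a dump actually landed in the cores directory on the host (just a sketch; the exact file names depend on what the composer writes):

ls -l /home/data/core-dump-handler/cores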

No9 (Collaborator) commented Jan 27, 2023

@iameli This has been open for a month with no further feedback, so I'm closing it off as I think the issue is with the local-path-provisioner. If I'm wrong or you have more information, please feel free to re-open this. I have also subscribed to the rancher/local-path-provisioner issue to make sure I get updates.

No9 closed this as completed Jan 27, 2023
gonzalesraul (Contributor) commented Aug 7, 2023

@No9 it looks like it's an easy issue to simulate, and it might be related to the Bidirectional mode the spec uses.

See this warning in the docs: https://kubernetes.io/docs/concepts/storage/volumes/#local

Warning: Bidirectional mount propagation can be dangerous. It can damage the host operating system and therefore it is allowed only in privileged containers. Familiarity with Linux kernel behavior is strongly recommended. In addition, any volume mounts created by containers in pods must be destroyed (unmounted) by the containers on termination.

On every new corefile generated, a new mount can appear on the underlying machine.
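
For comparison, this is roughly what the cores mount would look like with one-way propagation (a sketch only, not a confirmed fix; the agent may genuinely need Bidirectional so that the mount it creates is visible to the host side):

        volumeMounts:
        - mountPath: /home/data/core-dump-handler/cores
          # HostToContainer: mounts made on the host are visible inside the
          # container, but mounts created by the container do not propagate back
          mountPropagation: HostToContainer
          name: core-volume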

philipp1992 commented

We are facing the same issue, with thousands of mounts for ...cores being created.
Is Bidirectional mode required?

philipp1992 commented

It seems like every restart of the core dump handler causes the mounts to double; restarting the host resets this.
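
A possible way to clear the accumulated mounts without a reboot (a hypothetical sketch; it assumes the duplicates are identical bind mounts of the cores path and that the DaemonSet is scaled down first so nothing is using them):

# unmount the topmost duplicate until only a single mount remains
while [ "$(mount | grep -c '/home/data/core-dump-handler/cores')" -gt 1 ]; do
  umount /home/data/core-dump-handler/cores
done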
