Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: async (background) mount may cause problems #1088

Merged
merged 30 commits into from
Mar 22, 2023

Conversation

cvvz
Copy link
Contributor

@cvvz cvvz commented Mar 19, 2023

Fix

#1079
Pod deletion will cause global mount point to disappear. The root cause is csi driver will bind mount the global mount point into Pod volume immediately after blobfuse2 mount return successfully. However, fuse is still mounting at background, so actually bind mount and fuse mount happened at the same time, this will lead to a problem that unmount the bind mount (when deleting a Pod) will cause the original mount point unmounted as well.
#1081
Mount is actually failed, but the errno returned by blobfuse2 is 0 and no error log (both terminal and log file).

Why

  1. libfuse provide -f option to decide whether running the process in foreground or as a daemon process. Daemonizaion is based on fork.
  2. blobfuse v2 uses cgo to compile blobfuse v2 and libfuse. However, golang cannot handle damonization well, so v2 uses go-daemon as a workaround and always pass -f option to avoid damonizing in libfuse.
  3. The daemonization in libfuse actually happens AFTER mount completed, so the errorno and error message during mount stage can be returned successfully. Blobfuse v1 is written in C++, which can leverage the daemonization in libfuse, so blobfuse v1 can work normally.
  4. The difference in blobfuse v2 is that it uses go-daemon to daemonize BEFORE fuse_main (the entry point of libfuse), which makes the mount operation and all the other operations before libfuse daemonization in background now, thus makes the mount error (and the other operations errors) invisible from user/csi driver.

How

  1. The libfuse_init callback can be an indicator that the mount operation has been successfully completed. At this point child process can send a signal to parent process to notify it that mount has been completed successfully and it can exit normally.
  2. When parent process receives sigchild signal, it means the child process exited unexpectedly, and the parent process will exit with error code and error message sent by child.
  3. Libfuse(child) print error log to stderr, so I redirect the child process stderr to a temp file. By reading this file before exiting, parent process can print error message in terminal. BTW, go-daemon uses forkExec to fork and exec a new child process, so it's impossible to use PIPE for communicating since after exec, they won't share any opened file descriptor.
  4. Updated: We can pass the opened fd as arguments when exec the new child process. so, it is still possible to communicate between child and parent process through PIPE. The problem is that go-daemon does not support to set child stderr as an opened PIPE fd very well, I've filed a PR to the project.

Test

I've done test manually and it can solve the problem in both #1079 and #1081.

csi driver

To be safe, blob csi driver has to use IsLikelyNotMountPoint or MountedFast after executing blobfuse2 mount to make sure the mount point is really mounted successfully. Refer: kubernetes-sigs/blob-csi-driver#852

cmd/mount.go Outdated Show resolved Hide resolved
cmd/mount.go Outdated Show resolved Hide resolved
cmd/mount.go Outdated Show resolved Hide resolved
cmd/mount.go Outdated Show resolved Hide resolved
cmd/mount.go Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
common/util.go Outdated Show resolved Hide resolved
cmd/mount.go Show resolved Hide resolved
cmd/mount.go Outdated Show resolved Hide resolved
@cvvz
Copy link
Contributor Author

cvvz commented Mar 21, 2023

Since this PR has been merged by the maintainer of go-daemon, I think we can use PIPE now. @vibhansa-msft

@cvvz
Copy link
Contributor Author

cvvz commented Mar 21, 2023

LGTM

cmd/mount.go Outdated Show resolved Hide resolved
@vibhansa-msft
Copy link
Member

These changes are impacting the UT and code-coverage a lot :(

@vibhansa-msft vibhansa-msft merged commit 8f655cf into Azure:main Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can not mount blob to AKS cluster via Python image using blobfuse2.
3 participants