Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process hangs when writing to mount point served by the same process #264

Closed
tomyl opened this issue Nov 13, 2020 · 2 comments
Closed

Process hangs when writing to mount point served by the same process #264

tomyl opened this issue Nov 13, 2020 · 2 comments

Comments

@tomyl
Copy link

tomyl commented Nov 13, 2020

I was writing unit tests for my filesystem implementation when I noticed that the test process sometimes hanged. It seems like everything works fine when my binary serves the mount point and other processes access it but when the same process is both serving and using the mount point something gets stuck. Example code:

package main

import (
	"context"
	"io/ioutil"
	"log"
	"os"
	"os/exec"
	"path/filepath"

	"bazil.org/fuse"
	fusefs "bazil.org/fuse/fs"
)

type FS struct{}

func (fs FS) Root() (fusefs.Node, error) {
	return &Dir{}, nil
}

type Dir struct{}

func (d *Dir) Attr(ctx context.Context, a *fuse.Attr) error {
	return nil
}

func (d *Dir) Create(ctx context.Context, req *fuse.CreateRequest, resp *fuse.CreateResponse) (fusefs.Node, fusefs.Handle, error) {
	return &File{}, &WriteHandle{}, nil
}

type File struct{}

func (f *File) Attr(ctx context.Context, a *fuse.Attr) error {
	return nil
}

func (f *File) Open(ctx context.Context, req *fuse.OpenRequest, resp *fuse.OpenResponse) (fusefs.Handle, error) {
	return &WriteHandle{}, nil
}

type WriteHandle struct{}

var (
	_ fusefs.FS          = (*FS)(nil)
	_ fusefs.Node        = (*Dir)(nil)
	_ fusefs.NodeCreater = (*Dir)(nil)
	_ fusefs.Node        = (*File)(nil)
	_ fusefs.NodeOpener  = (*File)(nil)
)

func spam(mountpoint string) error {
	name := filepath.Join(mountpoint, "myfile")

	for i := 0; i < 10000; i++ {
		log.Printf("create %d", i)
		f, err := os.OpenFile(name, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0666)
		if err != nil {
			return err
		}
		if err := f.Close(); err != nil {
			return err
		}
	}

	return nil
}

func run(fork bool) error {
	mountpoint, err := ioutil.TempDir("", "fusetest")
	if err != nil {
		return err
	}

	log.Printf("mountpoint %s", mountpoint)
	defer os.RemoveAll(mountpoint)

	conn, err := fuse.Mount(
		mountpoint,
		fuse.FSName("test"),
		fuse.Subtype("testfs"),
	)
	if err != nil {
		return err
	}

	go func() {
		if err := fusefs.Serve(conn, FS{}); err != nil {
			panic(err)
		}
	}()

	defer func() {
		if err := fuse.Unmount(mountpoint); err != nil {
			panic(err)
		}
		if err := conn.Close(); err != nil {
			panic(err)
		}
	}()

	if fork {
		log.Printf("forking")
		cmd := exec.Command(os.Args[0])
		cmd.Env = []string{"MOUNTPOINT=" + mountpoint}
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			return err
		}
	} else if err := spam(mountpoint); err != nil {
		return err
	}

	return nil
}

func main() {
	mountpoint := os.Getenv("MOUNTPOINT")
	fork := os.Getenv("FORK") == "1"
	if mountpoint != "" {
		if err := spam(mountpoint); err != nil {
			panic(err)
		}
	} else if err := run(fork); err != nil {
		panic(err)
	}
}

Running FORK=1 go run main.go fires up the mount point and accesses it from a different process. This works fine:

2020/11/13 22:46:58 mountpoint /tmp/fusetest131726489
2020/11/13 22:46:58 forking
2020/11/13 22:46:58 create 0
...
2020/11/13 22:46:59 create 9999

However running FORK=0 go run main.go hangs most of the time. In output below I send SIGABRT to the hanged main process:

2020/11/13 22:47:04 mountpoint /tmp/fusetest044385943
2020/11/13 22:47:04 create 0
...
2020/11/13 22:47:04 create 807
SIGABRT: abort
PC=0x46a4c1 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x648e48, 0x80, 0x0, 0x0, 0x0, 0x0, 0x7fff00000000, 0x1, 0x7fffbc7adf38, 0x40c49f, ...)
        /home/tomyl/sdk/go/src/runtime/sys_linux_amd64.s:587 +0x21
runtime.futexsleep(0x648e48, 0x7fff00000000, 0xffffffffffffffff)
        /home/tomyl/sdk/go/src/runtime/os_linux.go:45 +0x46
runtime.notesleep(0x648e48)
        /home/tomyl/sdk/go/src/runtime/lock_futex.go:159 +0x9f
runtime.stopm()
        /home/tomyl/sdk/go/src/runtime/proc.go:1910 +0xc5
runtime.findrunnable(0xc00002f000, 0x0)
        /home/tomyl/sdk/go/src/runtime/proc.go:2471 +0xa7f
runtime.schedule()
        /home/tomyl/sdk/go/src/runtime/proc.go:2669 +0x2d7
runtime.park_m(0xc0000e0480)
        /home/tomyl/sdk/go/src/runtime/proc.go:2837 +0x9d
runtime.mcall(0x0)
        /home/tomyl/sdk/go/src/runtime/asm_amd64.s:318 +0x5b
goroutine 1 [runnable]:
syscall.SetNonblock(0x3, 0x557f01, 0x4, 0x1)
        /home/tomyl/sdk/go/src/syscall/exec_unix.go:106 +0xc8
os.newFile(0x3, 0xc000018720, 0x1d, 0x1, 0xc0000001b6)
        /home/tomyl/sdk/go/src/os/file_unix.go:165 +0x165
os.openFileNolog(0xc000018720, 0x1d, 0x241, 0x1b6, 0x0, 0x0, 0xa)
        /home/tomyl/sdk/go/src/os/file_unix.go:224 +0x1a5
os.OpenFile(0xc000018720, 0x1d, 0x241, 0x1b6, 0x1, 0x0, 0x0)
        /home/tomyl/sdk/go/src/os/file.go:327 +0x65
main.spam(0xc000018380, 0x16, 0xc00006a180, 0x2)
        /home/tomyl/git/toli/sandbox/fusetest/main.go:56 +0x159
main.run(0x557e00, 0x0, 0x0)
        /home/tomyl/git/toli/sandbox/fusetest/main.go:110 +0x4f8
main.main()
        /home/tomyl/git/toli/sandbox/fusetest/main.go:124 +0xb9

goroutine 1 [runnable]:
syscall.SetNonblock(0x3, 0x557f01, 0x4, 0x1)
        /home/tomyl/sdk/go/src/syscall/exec_unix.go:106 +0xc8
os.newFile(0x3, 0xc000018720, 0x1d, 0x1, 0xc0000001b6)
        /home/tomyl/sdk/go/src/os/file_unix.go:165 +0x165
os.openFileNolog(0xc000018720, 0x1d, 0x241, 0x1b6, 0x0, 0x0, 0xa)
        /home/tomyl/sdk/go/src/os/file_unix.go:224 +0x1a5
os.OpenFile(0xc000018720, 0x1d, 0x241, 0x1b6, 0x1, 0x0, 0x0)
        /home/tomyl/sdk/go/src/os/file.go:327 +0x65
main.spam(0xc000018380, 0x16, 0xc00006a180, 0x2)
        /home/tomyl/git/toli/sandbox/fusetest/main.go:56 +0x159
main.run(0x557e00, 0x0, 0x0)
        /home/tomyl/git/toli/sandbox/fusetest/main.go:110 +0x4f8
main.main()
        /home/tomyl/git/toli/sandbox/fusetest/main.go:124 +0xb9

goroutine 8 [runnable]:
syscall.Syscall(0x0, 0x6, 0xc0002a2000, 0x21000, 0x40, 0x21000, 0x0)
        /home/tomyl/sdk/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x6, 0xc0002a2000, 0x21000, 0x21000, 0x40, 0x0, 0x0)
        /home/tomyl/sdk/go/src/syscall/zsyscall_linux_amd64.go:686 +0x5a
syscall.Read(...)
        /home/tomyl/sdk/go/src/syscall/syscall_unix.go:187
bazil.org/fuse.(*Conn).ReadRequest(0xc00006a180, 0x562618, 0xc000090000, 0x587600, 0xc00058a050)
        /home/tomyl/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:580 +0xcf
bazil.org/fuse/fs.(*Server).Serve(0xc000090000, 0x586020, 0x677e10, 0x0, 0x0)
        /home/tomyl/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:501 +0x373
bazil.org/fuse/fs.Serve(0xc00006a180, 0x586020, 0x677e10, 0x55ace5, 0x12)
        /home/tomyl/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:522 +0xdf
main.run.func1(0xc00006a180)
        /home/tomyl/git/toli/sandbox/fusetest/main.go:87 +0x45
created by main.run
        /home/tomyl/git/toli/sandbox/fusetest/main.go:86 +0x21f

rax    0xca
rbx    0x648d00
rcx    0x46a4c3
rdx    0x0
rdi    0x648e48
rsi    0x80
rbp    0x7fffbc7adf00
rsp    0x7fffbc7adeb8
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x3
r13    0x6487a0
r14    0x57fc69
r15    0x30
rip    0x46a4c1
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

Am I missing anything? Happy to help debugging this if I get some pointers. Thanks.

$ go version
go version go1.15.2 linux/amd64
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
$ dpkg -l fuse
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                            Version                      Architecture                 Description
+++-===============================================-============================-============================-===================================================================================================
ii  fuse                                            2.9.7-1ubuntu1               amd64                        Filesystem in Userspace
$ uname -a
Linux helium 5.4.0-53-generic #59~18.04.1-Ubuntu SMP Wed Oct 21 12:14:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
@tv42
Copy link
Member

tv42 commented Nov 14, 2020

You're not missing anything; that just simply won't work, due to how Go runtime is doing things. If there's a way around the epoll deadlock, I'm not aware of it. Go assumes certain syscalls don't block, FUSE makes them block, and thus Go doesn't let another goroutine service the FUSE request to let the first one complete.

This has already been "fixed" as much as we can; by basically just not doing that. See the following commit:

commit 3f5ad90
Author: Tommi Virtanen tv@eagain.net
Date: 2019-12-12 12:54:57 -0700

Add framework for running test helpers in subprocesses

Separating the FUSE client and server into two Go runtimes is at this
time the only way to truly fix (well, workaround) the Go epoll syscall
deadlock in https://github.com/golang/go/issues/21014

We explicitly reject any changes that merely decrease the duration of
the race window, while also preventing FUSE poll support forever.

Add a framework that runs a command in a subprocess, most commonly
acting as the FUSE client. Run a HTTP server (over UNIX domain
sockets) to let tests inspect and control the subprocess.

Tests will be rewritten to use this framework in the following
commits. Users of this library are encouraged to rewrite any tests
that mount a filesystem and then access it from the same process, to
avoid flaky hanging tests.

Fixes https://github.com/bazil/fuse/issues/183

(Repeat for Github linking: #183 )

@tv42 tv42 closed this as completed Nov 14, 2020
@tv42 tv42 added the duplicate label Nov 14, 2020
@tomyl
Copy link
Author

tomyl commented Nov 16, 2020

Thanks for the explanation and the pointer to the framework!

evgmik added a commit to sahib/brig that referenced this issue Jan 8, 2021
Fixes #78
The fuse tests were sporadically failing often leaving <defunct>
processes which were loading process queue. I observed load of 30
for several our until reboot, though there we no negative effects.

After investigation, we found that this is due to `go` (v1.9 and up)
ways of handling poll requests. Here are the outcomes:

The routines which manage fuse layer
and OS dependent code (os.Open, and similar)
MUST BE SEPARATE OS PROCESSES.  Note: not different `go` routines but processes!
See bazil/fuse#264 (comment)
This separation happens automatically during normal brig operations, but
TESTING FUSE LAYER IN GO IS VERY TRICKY.
See brig relevant discussion at
#77 (comment)
However, this problem is general for any go program from version 1.9,
as can be seen in references to the issue.

bazil/fuse offers "bazil.org/fuse/fs/fstestutil/spawntest"
infrastructure which helps run tests in different communicating via socket processes.

This commit rewrites fuse test to use methods from `spawntest`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants