Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestMemNodeFsWrite: content mismatch: got "", want "hello" #479

Open
navytux opened this issue Jun 14, 2023 · 1 comment
Open

TestMemNodeFsWrite: content mismatch: got "", want "hello" #479

navytux opened this issue Jun 14, 2023 · 1 comment

Comments

@navytux
Copy link
Contributor

navytux commented Jun 14, 2023

Hello everyone.

I'm hitting TestMemNodeFsWrite failure in tests. Even though this test lives in deprecated nodefs, it might be due to 2021 kernel regression which might be affecting whole go-fuse and other FUSE libraries. Please read below for details.

The test fails reliably and quickly when run in a loop:

GOMAXPROCS=1 go test -run TestMemNodeFsWrite -count 1000 -failfast
(neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/github.com/hanwen/go-fuse/fuse/nodefs$ GOMAXPROCS=1 go test  -run TestMemNodeFsWrite -count 1000 -failfast
16:33:29.722233 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:29.773813 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:29.799427 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:29.848899 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:29.952907 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:30.002003 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:30.027920 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
--- FAIL: TestMemNodeFsWrite (0.00s)
    memnode_test.go:67: content mismatch: got "", want "hello"
FAIL
exit status 1
FAIL    github.com/hanwen/go-fuse/fuse/nodefs   0.517s

(neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/github.com/hanwen/go-fuse/fuse/nodefs$ GOMAXPROCS=1 go test  -run TestMemNodeFsWrite -count 1000 -failfast
16:33:31.362206 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:31.362356 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:31.436443 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:31.565719 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
--- FAIL: TestMemNodeFsWrite (0.00s)
    memnode_test.go:67: content mismatch: got "", want "hello"
FAIL
exit status 1
FAIL    github.com/hanwen/go-fuse/fuse/nodefs   0.523s

(neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/github.com/hanwen/go-fuse/fuse/nodefs$ GOMAXPROCS=1 go test  -run TestMemNodeFsWrite -count 1000 -failfast
16:33:32.804827 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:32.804902 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:32.979033 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.059869 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.159474 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.288119 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.366934 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.468874 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.519913 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.654345 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.679886 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.743035 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.819909 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.845533 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.941402 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:33.996450 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.022149 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.072047 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.114008 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.140888 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.140927 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.217649 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.225709 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.367711 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
16:33:34.572854 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASE
--- FAIL: TestMemNodeFsWrite (0.00s)
    memnode_test.go:67: content mismatch: got "", want "hello"
FAIL
exit status 1
FAIL    github.com/hanwen/go-fuse/fuse/nodefs   1.967s

Note: for this test to fail it experimentally turned out to be important for other fuse filesystems to be unmounted and /sys/fs/fuse/connections to be empty.

Here is debug log of a failure:

=== RUN   TestMemNodeFsWrite
14:29:30.963701 callFusermount: executing ["/usr/bin/fusermount3" "/tmp/go-fuse-memnode_test230041599/mnt" "-o" "subtype=nodefs.memNode,max_read=131072"]
14:29:30.966811 rx 2: INIT n0 {7.37 Ra 131072 ASYNC_READ,POSIX_LOCKS,ATOMIC_O_TRUNC,EXPORT_SUPPORT,BIG_WRITES,DONT_MASK,SPLICE_WRITE,SPLICE_MOVE,SPLICE_READ,FLOCK_LOCKS,IOCTL_DIR,AUTO_INVAL_DATA,READDIRPLUS,READDIRPLUS_AUTO,ASYNC_DIO,WRITEBACK_CACHE,NO_OPEN_SUPPORT,PARALLEL_DIROPS,HANDLE_KILLPRIV,POSIX_ACL,ABORT_ERROR,MAX_PAGES,CACHE_SYMLINKS,NO_OPENDIR_SUPPORT,EXPLICIT_INVAL_DATA,0x70000000} "\x03\x00\x00\x00\x00\x00\x00\x00"... 48b
14:29:30.966881 tx 2:     OK, {7.28 Ra 131072 ASYNC_READ,BIG_WRITES,AUTO_INVAL_DATA,READDIRPLUS,NO_OPEN_SUPPORT,PARALLEL_DIROPS,MAX_PAGES 0/0 Wr 131072 Tg 0 MaxPages 32}
14:29:30.967154 rx 4: ACCESS n1 {u=1000 g=1000 r}
14:29:30.967183 tx 4:     38=function not implemented
14:29:30.967210 rx 6: LOOKUP n1 [".go-fuse-epoll-hack"] 20b
14:29:30.967229 tx 6:     OK, {n18446744073709551615 g0 tE=0s tA=0s {M0100644 SZ=0 L=1 0:0 B0*0 i0:18446744073709551615 A 0.000000 M 0.000000 C 0.000000}}
14:29:30.967275 rx 8: LOOKUP n1 [".Trash"] 7b
14:29:30.967293 tx 8:     2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
14:29:30.967318 rx 10: OPEN n18446744073709551615 {O_RDONLY,0x8000}
14:29:30.967335 tx 10:     OK, {Fh 18446744073709551615 }
14:29:30.967354 rx 12: LOOKUP n1 [".Trash-1000"] 12b
14:29:30.967371 tx 12:     2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
14:29:30.967402 rx 14: POLL n18446744073709551615
14:29:30.967447 tx 14:     38=function not implemented
14:29:30.967468 rx 16: FLUSH n18446744073709551615 {Fh 18446744073709551615}
14:29:30.967473 tx 16:     OK
14:29:30.967517 rx 18: RELEASE n18446744073709551615 {Fh 18446744073709551615 0x8000  L0}
14:29:30.967527 tx 18:     OK
14:29:30.968109 rx 20: LOOKUP n1 ["test"] 5b
14:29:30.968191 tx 20:     2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
14:29:30.968341 rx 22: CREATE n1 {0100644 [WRONLY,CREAT,TRUNC,0x8000] (022)} ["test"] 5b
14:29:30.968539 tx 22:     OK, {n3 g2 {M0100644 SZ=0 L=1 0:0 B0*0 i0:3 A 1686742170.968376 M 1686742170.968376 C 1686742170.968376} &{2 0 0}}
14:29:30.968690 rx 24: GETXATTR n3 {sz 0} ["security.capability"] 20b
14:29:30.968986 tx 24:     61=no data available
14:29:30.969040 rx 26: WRITE n3 {Fh 2 [0 +5)  L 0 WRONLY,NONBLOCK,0x8000} "hello" 5b
14:29:30.969060 tx 26:     OK
14:29:30.969120 rx 30: GETATTR n3 {Fh 0 }
14:29:30.969162 tx 30:     OK, {tA=0.1s {M0100644 SZ=0 L=1 0:0 B0*0 i0:3 A 1686742170.968376 M 1686742170.968376 C 1686742170.968376}}
14:29:30.969142 rx 28: FLUSH n3 {Fh 2}
14:29:30.969362 tx 28:     OK
14:29:30.969452 rx 34: OPEN n3 {O_RDONLY,0x8000}
14:29:30.969510 tx 34:     OK, {Fh 3 }
14:29:30.969585 rx 36: FLUSH n3 {Fh 3}
14:29:30.969605 tx 36:     OK
    memnode_test.go:67: content mismatch: got "", want "hello"
14:29:30.969818 unmount: executing ["/usr/bin/fusermount3" "-u" "/tmp/go-fuse-memnode_test230041599/mnt"]
14:29:30.970508 rx 32: RELEASE n3 {Fh 2 WRONLY,NONBLOCK,0x8000  L0}
14:29:30.970527 tx 32:     OK
14:29:30.970738 rx 38: RELEASE n3 {Fh 3 NONBLOCK,0x8000  L0}
14:29:30.970749 tx 38:     OK
14:29:30.972261 received ENODEV (unmount request), thread exiting
14:29:30.972323 received ENODEV (unmount request), thread exiting
14:29:30.972336 received ENODEV (unmount request), thread exiting
--- FAIL: TestMemNodeFsWrite (0.01s)

This test(*) first writes hello into /test file and then wants to read it back. The file is ultimately served by loopback filesystem, which delegates getattr, write and read to underlying os fs.

The write happens in rx 26 WRITE n3 {Fh 2 [0 +5) L 0 WRONLY,NONBLOCK,0x8000} "hello" 5b. It goes ok. Then the kernel issues GETATTR request. This request - in this particular run - is probably issued because attributes happen to become timed out on the kernel side (attributes TTL in this test is 100ms). The getattr is served by loopbackfs by statting the underlying os file which still stats as with st_size=0. This seems to be correct behaviour since FLUSH was not yet called. Then the FLUSH is called - when file handle opened for writing is closed. Then the file is opened again for reading. This happens on rx 34: OPEN n3 {O_RDONLY,0x8000}. The kernel does not send new GETATTR request to refresh its view of file size, and since previous getattr returned st_size=0 the kernel serves all reads from its pagecache directly without querying the filesystem with any READ request. Which leads to failure since the file reads as empty instead of hello.

There is no documentation for FUSE protocol anywhere (semantic of it, not messages serialization), but the bug seems to be due to that the kernel does not invoke GETATTR after FLUSH.

This behaviour was introduced in 2021 in https://git.kernel.org/linus/fa5eee57e33e (Linux 5.16) which changed fuse_flush to invalidate only st_blocks, instead of all attributes, in particularly stopping to invalidate st_size.

Kirill

/cc @szmi, @Nikratio, @amir73il

(*)

func TestMemNodeFsWrite(t *testing.T) {
wd, _, clean := setupMemNodeTest(t)
defer clean()
want := "hello"
err := ioutil.WriteFile(wd+"/test", []byte(want), 0644)
if err != nil {
t.Fatalf("WriteFile failed: %v", err)
}
content, err := ioutil.ReadFile(wd + "/test")
if string(content) != want {
t.Fatalf("content mismatch: got %q, want %q", content, want)
}
}

go-fuse: v2.3.0-11-g255ab74
os: Linux deca 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux
go: go version go1.20.5 linux/amd64

@navytux
Copy link
Contributor Author

navytux commented Jun 14, 2023

/cc @hanwen, @rfjakob

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant