weed mount memory leak #805

Closed
Hacky-DH opened this Issue Dec 28, 2018 · 25 comments

Hacky-DH commented Dec 28, 2018

Sponsor SeaweedFS via Patreon: https://www.patreon.com/seaweedfs

Describe the bug
I installed weed on one host with a filer, and mounted it to the local dir ./mnt.
Then I ran a benchmark (you can find the details of this tool here)
perftest -n 5m -c 32
This writes 5,000,000 files of 1 KiB each with a concurrency of 32.
But after writing 4,681 files, the weed mount process was killed by the kernel:

kernel: Out of memory: Kill process 24472 (weed) score 856 or sacrifice child
kernel: Killed process 24472 (weed) total-vm:58118120kB, anon-rss:57921524kB, file-rss:0kB

The weed mount process occupied about 55.4 GiB of RAM.
I suspect a memory leak: 4,681 files of 1 KiB each amount to under 5 MiB of data, which can't account for 55.4 GiB of RAM.
What do you think?

System Setup

weed version 1.14 linux amd64
weed server -dir data/ -filer
weed mount -dir mnt

Expected behavior
The benchmark runs correctly.

chrislusf commented Dec 28, 2018

Thanks a lot! Your tool is very helpful!

Checked in a fix 308ac1d for this

Running 5-minute tests now. Here are my 1-minute results.

chris$ perftest -n 5m -c 32 -period 1m
2018/12/28 03:24:16 client1 period 1m0s 8 concurrent 5000000 files in 8 dirs with 625000 segs
2018/12/28 03:24:16           stage        max_tps        min_tps        avg_tps     period
2018/12/28 03:25:16   create_write       2163.550       2163.550       2163.514 1m0.004707678s
2018/12/28 03:26:16      open_read      21502.033      21502.033      21472.398 1m0.083182482s
2018/12/28 03:27:16           open      20770.717      20770.717      20760.080 1m0.03112627s
2018/12/28 03:28:16          utime      20462.733      20462.733      20462.670 1m0.00057793s
2018/12/28 03:29:16         rename       9389.650       9389.650       9386.792 1m0.019118716s

Hacky-DH commented Dec 28, 2018

Thanks, I will check it.

chrislusf commented Dec 28, 2018

On the SeaweedFS mount folder:

chris$ perftest -n 5m -c 32
2018/12/28 14:41:15 client1 period 5m0s 8 concurrent 5000000 files in 8 dirs with 625000 segs
2018/12/28 14:41:15           stage        max_tps        min_tps        avg_tps     period
2018/12/28 14:46:15   create_write       2193.983       2003.983       2094.375 5m0.160148422s
2018/12/28 14:51:15      open_read      14414.033      14234.267      14309.764 5m0.143948551s
2018/12/28 14:56:15           open      15063.650      14418.083      14689.647 5m0.028519139s
2018/12/28 15:01:15          utime      17090.367      16589.150      16786.934 5m0.084570642s
2018/12/28 15:06:15         rename       7509.917       6730.700       7069.078 5m0.050728304s

On local disk:

chris$ perftest -n 5m -c 32
2018/12/28 03:29:47 client1 period 5m0s 8 concurrent 5000000 files in 8 dirs with 625000 segs
2018/12/28 03:29:47           stage        max_tps        min_tps        avg_tps     period
2018/12/28 03:34:48   create_write       9676.333       5397.483       6504.089 5m0.77138632s
2018/12/28 03:39:48      open_read      21269.967      14763.433      16839.301 5m0.105511465s
2018/12/28 03:44:48           open      39672.517      20090.217      35417.248 5m0.062052439s
2018/12/28 03:49:48          utime      17504.117      15478.383      16444.789 5m0.092323675s
2018/12/28 03:54:49         rename       5876.150       4926.217       5328.695 5m0.287394569s

chrislusf commented Dec 28, 2018

Release 1.16 has the fix.

Hacky-DH commented Dec 29, 2018

I can't build from source because some dependencies couldn't be downloaded.
Thanks for the release; I will run the benchmark as soon as possible.

Hacky-DH commented Dec 29, 2018

Sad news: the weed mount process was still killed by the kernel!
The message from the kernel:

Out of memory: Kill process 25468 (weed) score 862 or sacrifice child
kernel: Killed process 25468 (weed) total-vm:105301776kB, anon-rss:58274536kB, file-rss:0kB

This time the weed mount process occupied about 100.4 GiB of virtual memory (total-vm). The memory leak still exists.

The perftest tool ran for only 2 hours 16 minutes and then halted after the open stage:

perftest -n 5m -c 100 -id 1 -interval 1m -period 1h
2018/12/29 16:09:32 client1 period 1h0m0s 32 concurrent 5000000 files in 32 dirs with 156250 segs
2018/12/29 16:09:32           stage        max_tps        min_tps        avg_tps     period
2018/12/29 16:26:09   create_write       8408.033       4509.767       5016.486 16m36.707233177s
2018/12/29 17:26:09      open_read       5416.467       2111.367       4637.814 1h0m0.00555132s
2018/12/29 18:26:09           open     617178.133          0.167      90265.168 1h0m0.008047451s
2018/12/29 18:26:09 Error: mkdir client1: transport endpoint is not connected

chrislusf commented Dec 29, 2018

Changed to an updated FUSE implementation.

perftest -n 5m -c 32
2018/12/29 11:26:05 client1 period 5m0s 8 concurrent 5000000 files in 8 dirs with 625000 segs
2018/12/29 11:26:05           stage        max_tps        min_tps        avg_tps     period
2018/12/29 11:31:05   create_write       2336.100       2206.950       2248.892 5m0.074847796s
2018/12/29 11:36:05      open_read      16721.833      14791.250      16039.434 5m0.008349003s
2018/12/29 11:41:05           open      16122.583      14802.217      15455.518 5m0.006387436s
2018/12/29 11:46:06          utime      17246.417      15996.900      16420.475 5m0.076041191s
2018/12/29 11:51:06         rename       7740.467       6741.217       7311.280 5m0.152905183s

chrislusf commented Dec 30, 2018

Results for the 1-hour test, with release 1.17:

perftest -n 5m -c 100 -id 1 -interval 1m -period 1h
2018/12/29 12:00:22 client1 period 1h0m0s 8 concurrent 5000000 files in 8 dirs with 625000 segs
2018/12/29 12:00:22           stage        max_tps        min_tps        avg_tps     period
2018/12/29 12:37:16   create_write       2372.417       2132.433       2257.643 36m54.69569374s
2018/12/29 13:37:16      open_read       4962.933       4269.483       4784.662 1h0m0.113587625s
2018/12/29 14:37:17           open       5299.433       4425.450       5016.863 1h0m0.173118983s
2018/12/29 15:37:17          utime       6697.067       6262.233       6530.296 1h0m0.101864747s
2018/12/29 16:37:17         rename       7136.983       1835.867       3782.821 1h0m0.083824065s

chrislusf commented Dec 30, 2018

Another round:

/Users/chris/Uber/fs-benchmark/meta-benchmark/src/perftest/perftest -n 5m -c 100 -id 1 -interval 1m -period 1h
2018/12/29 18:28:19 client1 period 1h0m0s 8 concurrent 5000000 files in 8 dirs with 625000 segs
2018/12/29 18:28:19           stage        max_tps        min_tps        avg_tps     period
2018/12/29 19:06:40   create_write       2370.000       1887.000       2173.328 38m20.61525234s
2018/12/29 20:06:41      open_read       4906.033       4572.050       4727.563 1h0m1.050923532s
2018/12/29 21:06:41           open       4842.083       4698.700       4770.991 1h0m0.005625916s
2018/12/29 22:06:41          utime       6534.717       6161.450       6396.131 1h0m0.222595532s
2018/12/29 22:39:53         rename       2600.100       2423.467       2510.283 33m11.804032549s
2018/12/29 23:00:35         unlink       4174.567       3712.517       4025.431 20m42.100988664s

Hacky-DH commented Dec 30, 2018

It seems OK. Did you monitor the memory of the weed mount process?
What did you do to fix this?

chrislusf commented Dec 30, 2018

Memory is stable, peaking at about 1~2 GB. Right now it is running with "-period=2h", in the "rename" stage, using 690 MB of memory.

The release fixed bugs in the underlying FUSE library.

Let me know your results so that I can close this.

chrislusf commented Dec 30, 2018

perftest -n 50m -c 100 -id 1 -interval 1m -period 2h
2018/12/30 01:58:37 client1 period 2h0m0s 8 concurrent 50000000 files in 8 dirs with 6250000 segs
2018/12/30 01:58:37           stage        max_tps        min_tps        avg_tps     period
2018/12/30 03:58:37   create_write       2796.683       2353.450       2697.094 2h0m0.185081551s
2018/12/30 05:58:37      open_read       9239.500       8269.050       9001.612 2h0m0.228869768s
2018/12/30 07:58:37           open      15049.900      13886.200      14625.826 2h0m0.029321498s
2018/12/30 09:58:38          utime      10950.567      10209.833      10535.036 2h0m0.042987616s
2018/12/30 11:58:38         rename       1888.683       1635.900       1809.096 2h0m0.00967121s
2018/12/30 13:58:38         unlink       5860.183       1917.883       4189.748 2h0m0.32449328s

Hacky-DH commented Dec 31, 2018

Thanks for the reply. I will share my results on Linux in a day or so,
because of the New Year's holiday.

Hacky-DH commented Jan 2, 2019

Happy New Year, Chris!
I reran the perftest benchmark using weed 1.18 linux amd64,
but it failed:

cd ./test
perftest -n 100m -c 100 -id 1 -interval 1m -period 2h
2019/01/02 14:14:17 client1 period 2h0m0s 32 concurrent 100000000 files in 32 dirs with 3125000 segs
2019/01/02 14:14:17           stage        max_tps        min_tps        avg_tps     period
2019/01/02 15:42:19 Error: close client1/dir0000000030/file0000735665: input/output error

./test is the weed mount point, and the same error occurs when listing it:

ls test/
ls: reading directory test/: Input/output error

Here is the relevant part of the error log:

I0102 15:42:19 19978 dirty_page.go:178] assign volume failure count:1 : rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dirty_page.go:178] assign volume failure count:1 : rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dirty_page.go:178] assign volume failure count:1 : rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dirty_page.go:178] assign volume failure count:1 : rpc error: code = Unavailable desc = transport is closing
E0102 15:42:19 19978 filehandle.go:191] flush /test/client1/dir0000000031/file0000740066: filerGrpcAddress assign volume: rpc error: code = Unavailable desc = transport is closing
E0102 15:42:19 19978 filehandle.go:191] flush /test/client1/dir0000000024/file0000735753: filerGrpcAddress assign volume: rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dirty_page.go:178] assign volume failure count:1 : rpc error: code = Unavailable desc = transport is closing
E0102 15:42:19 19978 filehandle.go:191] flush /test/client1/dir0000000007/file0000736560: filerGrpcAddress assign volume: rpc error: code = Unavailable desc = transport is closing
E0102 15:42:19 19978 filehandle.go:191] flush /test/client1/dir0000000027/file0000736225: filerGrpcAddress assign volume: rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dirty_page.go:178] assign volume failure count:1 : rpc error: code = Unavailable desc = transport is closing
E0102 15:42:19 19978 filehandle.go:191] flush /test/client1/dir0000000012/file0000736035: filerGrpcAddress assign volume: rpc error: code = Unavailable desc = transport is closing
E0102 15:42:19 19978 filehandle.go:191] flush /test/client1/dir0000000025/file0000736201: filerGrpcAddress assign volume: rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dirty_page.go:178] assign volume failure count:1 : rpc error: code = Unavailable desc = transport is closing
E0102 15:42:19 19978 filehandle.go:191] flush /test/client1/dir0000000014/file0000735920: filerGrpcAddress assign volume: rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dirty_page.go:178] assign volume failure count:1 : rpc error: code = Unavailable desc = transport is closing
E0102 15:42:19 19978 filehandle.go:191] flush /test/client1/dir0000000008/file0000735626: filerGrpcAddress assign volume: rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = transport is closing
I0102 15:42:19 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:42:40 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = transport is closing
I0102 15:42:40 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = transport is closing
I0102 15:43:01 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = transport is closing
I0102 15:43:01 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:43:47 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = transport is closing
I0102 15:43:47 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = transport is closing
I0102 15:43:47 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = transport is closing
I0102 15:43:47 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:44:14 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = transport is closing
I0102 15:44:14 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = transport is closing
I0102 15:44:14 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:44:14 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:44:14 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:44:14 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:44:17 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:44:17 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:44:44 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = transport is closing
I0102 15:44:44 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = transport is closing
I0102 15:44:47 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:44:47 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:45:20 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = transport is closing
I0102 15:45:20 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:45:23 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:45:23 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
I0102 15:46:31 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = transport is closing
I0102 15:46:31 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = transport is closing
I0102 15:46:31 19978 wfs.go:144] reading filer stats ttl:"0s" : rpc error: code = Unavailable desc = transport is closing
I0102 15:46:31 19978 wfs.go:157] filer Statistics: rpc error: code = Unavailable desc = transport is closing
I0102 15:46:31 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = transport is closing
I0102 15:46:31 19978 dir.go:72] read dir /test attr directory:"/" name:"test" : rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>

I tried restarting the weed mount process and remounting, but the I/O error from ls did not recover.
After restarting all weed processes, it was OK.
Could you help me fix this?

chrislusf commented Jan 2, 2019

Cannot tell for now. It seems related to gRPC itself.

Hacky-DH commented Jan 3, 2019

Hi Chris,
I reran the perftest:

perftest -n 10m -c 100 -id 1 -interval 1m -period 2h
2019/01/02 18:42:19 client1 period 2h0m0s 32 concurrent 10000000 files in 32 dirs with 312500 segs
2019/01/02 18:42:19           stage        max_tps        min_tps        avg_tps     period
2019/01/02 19:46:59   create_write       8866.000       2107.717       2577.600 1h4m39.566187415s
2019/01/02 21:46:59      open_read       3112.067        543.033       2280.534 2h0m0.008349476s
2019/01/02 23:46:59           open       2608.900          0.000       2206.763 2h0m0.007859083s
2019/01/03 01:46:59          utime    1380794.900          0.000     787478.056 2h0m0.000150942s
2019/01/03 01:46:59 Error: mkdir client1: transport endpoint is not connected

The weed mount process went down, but I can't find any messages about weed in syslog.
Here is part of the weed log; the log files are huge, 9.4 GiB:

I0103 00:07:22 18197 topology_vacuum.go:151] check vacuum on collection: volume:6
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x149c84d, 0x16)
        /home/travis/.gimme/versions/go/src/runtime/panic.go:617 +0x72
runtime.sysMap(0xceb4000000, 0x4000000, 0x2262db8)
        /home/travis/.gimme/versions/go/src/runtime/mem_linux.go:165 +0xc7
runtime.(*mheap).sysAlloc(0x2249b80, 0x8000, 0x2249b90, 0x4)
        /home/travis/.gimme/versions/go/src/runtime/malloc.go:633 +0x1cd
runtime.(*mheap).grow(0x2249b80, 0x4, 0x0)
        /home/travis/.gimme/versions/go/src/runtime/mheap.go:1105 +0x42
runtime.(*mheap).allocSpanLocked(0x2249b80, 0x4, 0x2262de0, 0x0)
        /home/travis/.gimme/versions/go/src/runtime/mheap.go:1033 +0x37f
runtime.(*mheap).allocManual(0x2249b80, 0x4, 0x2262de0, 0x2021f6120)
        /home/travis/.gimme/versions/go/src/runtime/mheap.go:960 +0x53
runtime.stackpoolalloc(0x2260601, 0x0)
        /home/travis/.gimme/versions/go/src/runtime/stack.go:179 +0x76
runtime.stackcacherefill(0x7feb95c48610, 0x185f901)
        /home/travis/.gimme/versions/go/src/runtime/stack.go:262 +0x44
runtime.stackalloc(0xcc00001000, 0xccba7fd000, 0x0)
        /home/travis/.gimme/versions/go/src/runtime/stack.go:364 +0x143
runtime.copystack(0xc0ed965c80, 0x1000, 0xc000000001)
        /home/travis/.gimme/versions/go/src/runtime/stack.go:830 +0x7b
runtime.newstack()
        /home/travis/.gimme/versions/go/src/runtime/stack.go:1050 +0x2fd
runtime.morestack()
        /home/travis/.gimme/versions/go/src/runtime/asm_amd64.s:429 +0x8f

goroutine 2620056809 [copystack]:
github.com/seaweedfs/fuse/fs.(*Server).dropNode(0xc0004320c0, 0x14de05d, 0x1, 0x0)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:526 +0x344 fp=0xccba7fa308 sp=0xccba7fa300 pc=0x10d2124
github.com/seaweedfs/fuse/fs.(*Server).handleRequest(0xc0004320c0, 0x173a1a0, 0xc1c5168f40, 0x1714960, 0xc224e586e0, 0xc196d22100, 0x1730ba0, 0xca1b115500, 0xccba7fa6f8, 0x0, ...)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:1186 +0x1e8c fp=0xccba7fa610 sp=0xccba7fa308 pc=0x10d5f3c
github.com/seaweedfs/fuse/fs.(*Server).serve(0xc0004320c0, 0x1730ba0, 0xca1b115500)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:882 +0x2b8 fp=0xccba7fa7a0 sp=0xccba7fa610 pc=0x10d3b08
github.com/seaweedfs/fuse/fs.(*Server).Serve.func1(0xc0004320c0, 0x1730ba0, 0xca1b115500)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:424 +0x6e fp=0xccba7fa7c8 sp=0xccba7fa7a0 pc=0x10d859e
runtime.goexit()
        /home/travis/.gimme/versions/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xccba7fa7d0 sp=0xccba7fa7c8 pc=0x45a691
created by github.com/seaweedfs/fuse/fs.(*Server).Serve
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:422 +0x399

goroutine 1 [running]:
        goroutine running on other thread; stack unavailable

goroutine 18 [chan receive]:
github.com/chrislusf/seaweedfs/weed/glog.(*loggingT).flushDaemon(0x2242cc0)
        /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/glog/glog.go:883 +0x8b
created by github.com/chrislusf/seaweedfs/weed/glog.init.0
        /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/glog/glog.go:410 +0x272

goroutine 19 [syscall, 341 minutes]:
os/signal.signal_recv(0x0)
        /home/travis/.gimme/versions/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
        /home/travis/.gimme/versions/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
        /home/travis/.gimme/versions/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 5 [chan receive, 341 minutes]:
github.com/chrislusf/seaweedfs/weed/util.init.1.func1()
        /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/util/signal_handling.go:29 +0x88
created by github.com/chrislusf/seaweedfs/weed/util.init.1
        /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/util/signal_handling.go:28 +0x17b

goroutine 6 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc00003b080)
        /home/travis/gopath/src/go.opencensus.io/stats/view/worker.go:147 +0xdd
created by go.opencensus.io/stats/view.init.0
        /home/travis/gopath/src/go.opencensus.io/stats/view/worker.go:29 +0x57

goroutine 33 [select, 341 minutes]:
github.com/chrislusf/seaweedfs/weed/stats.(*ServerStats).Start(0xc0003c7080)
        /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/stats/stats.go:92 +0x273
created by github.com/chrislusf/seaweedfs/weed/server.init.0
        /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/server/common.go:32 +0x57

goroutine 2626855912 [semacquire]:
sync.runtime_SemacquireMutex(0xc0004320f4, 0x42c200)
        /home/travis/.gimme/versions/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc0004320f0)
        /home/travis/.gimme/versions/go/src/sync/mutex.go:134 +0xff
github.com/seaweedfs/fuse/fs.(*Server).serve(0xc0004320c0, 0x1730ba0, 0xc2abd36b60)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:786 +0xdb
github.com/seaweedfs/fuse/fs.(*Server).Serve.func1(0xc0004320c0, 0x1730ba0, 0xc2abd36b60)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:424 +0x6e
created by github.com/seaweedfs/fuse/fs.(*Server).Serve
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:422 +0x399

goroutine 2624018129 [semacquire]:
sync.runtime_SemacquireMutex(0xc0004320f4, 0x42c200)
        /home/travis/.gimme/versions/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc0004320f0)
        /home/travis/.gimme/versions/go/src/sync/mutex.go:134 +0xff
github.com/seaweedfs/fuse/fs.(*Server).serve(0xc0004320c0, 0x1730ba0, 0xc0d71f4ee0)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:786 +0xdb
github.com/seaweedfs/fuse/fs.(*Server).Serve.func1(0xc0004320c0, 0x1730ba0, 0xc0d71f4ee0)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:424 +0x6e
created by github.com/seaweedfs/fuse/fs.(*Server).Serve
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:422 +0x399

goroutine 68 [select, 341 minutes]:
google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0003f0480)
        /home/travis/gopath/src/google.golang.org/grpc/balancer_conn_wrappers.go:122 +0x110
created by google.golang.org/grpc.newCCBalancerWrapper
        /home/travis/gopath/src/google.golang.org/grpc/balancer_conn_wrappers.go:113 +0x14f
......

chrislusf commented Jan 3, 2019

Hacky-DH commented Jan 3, 2019

@chrislusf Here is a one-hour test:

weed version 1.18 linux amd64
perftest -n 10m -c 100 -id 1 -interval 1m -period 1h
2019/01/03 14:55:51 client1 period 1h0m0s 32 concurrent 10000000 files in 32 dirs with 312500 segs
2019/01/03 14:55:51           stage        max_tps        min_tps        avg_tps     period
2019/01/03 15:55:51   create_write       8638.833       1163.067       2593.634 1h0m0.007404494s
2019/01/03 16:55:51      open_read       2880.900          0.000       2336.249 1h0m0.007468724s
2019/01/03 17:55:51           open       2903.133          0.017       2292.545 1h0m0.007554009s
2019/01/03 18:55:51          utime    1350836.700          0.000     412250.407 1h0m0.000161413s
2019/01/03 18:55:51 Error: mkdir client1: transport endpoint is not connected

At I0103 18:09:34 in the weed log:

I0103 18:09:34 12937 topology_vacuum.go:151] check vacuum on collection: volume:4
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x149c84d, 0x16)
        /home/travis/.gimme/versions/go/src/runtime/panic.go:617 +0x72
runtime.sysMap(0xce6c000000, 0x4000000, 0x2262db8)
        /home/travis/.gimme/versions/go/src/runtime/mem_linux.go:165 +0xc7
runtime.(*mheap).sysAlloc(0x2249b80, 0x2000, 0x2249b90, 0x1)
        /home/travis/.gimme/versions/go/src/runtime/malloc.go:633 +0x1cd
runtime.(*mheap).grow(0x2249b80, 0x1, 0x0)
        /home/travis/.gimme/versions/go/src/runtime/mheap.go:1105 +0x42
runtime.(*mheap).allocSpanLocked(0x2249b80, 0x1, 0x2262dc8, 0x20339a00000000)
        /home/travis/.gimme/versions/go/src/runtime/mheap.go:1033 +0x37f
runtime.(*mheap).alloc_m(0x2249b80, 0x1, 0x2a, 0x7f486e0fafff)
        /home/travis/.gimme/versions/go/src/runtime/mheap.go:860 +0xc2
runtime.(*mheap).alloc.func1()
        /home/travis/.gimme/versions/go/src/runtime/mheap.go:931 +0x4c
runtime.(*mheap).alloc(0x2249b80, 0x1, 0x7f486e01002a, 0x2249b80)
        /home/travis/.gimme/versions/go/src/runtime/mheap.go:930 +0x8a
runtime.(*mcentral).grow(0x224a980, 0x0)
        /home/travis/.gimme/versions/go/src/runtime/mcentral.go:256 +0x95
runtime.(*mcentral).cacheSpan(0x224a980, 0x444330)
        /home/travis/.gimme/versions/go/src/runtime/mcentral.go:106 +0x2f0
runtime.(*mcache).refill(0x7f491ef11610, 0x2a)
        /home/travis/.gimme/versions/go/src/runtime/mcache.go:135 +0x86
runtime.(*mcache).nextFree(0x7f491ef11610, 0x12a, 0x178, 0x1474d40, 0xc00030bea8)
        /home/travis/.gimme/versions/go/src/runtime/malloc.go:786 +0x88
runtime.mallocgc(0x180, 0x1474d40, 0x1, 0x21a4360)
        /home/travis/.gimme/versions/go/src/runtime/malloc.go:939 +0x76d
runtime.newobject(0x1474d40, 0x451bc6)
        /home/travis/.gimme/versions/go/src/runtime/malloc.go:1068 +0x38
runtime.malg(0xc000000800, 0x0)
        /home/travis/.gimme/versions/go/src/runtime/proc.go:3220 +0x31
runtime.newproc1(0x14f1278, 0xc1b2c69c20, 0x18, 0xc000000180, 0x10d13a9)
        /home/travis/.gimme/versions/go/src/runtime/proc.go:3273 +0x443
runtime.newproc.func1()
        /home/travis/.gimme/versions/go/src/runtime/proc.go:3244 +0x4f
runtime.systemstack(0x6)
        /home/travis/.gimme/versions/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
        /home/travis/.gimme/versions/go/src/runtime/proc.go:1153

goroutine 1 [running]:
runtime.systemstack_switch()
        /home/travis/.gimme/versions/go/src/runtime/asm_amd64.s:311 fp=0xc1b2c69bc8 sp=0xc1b2c69bc0 pc=0x458740
runtime.newproc(0xc000000018, 0x14f1278)
        /home/travis/.gimme/versions/go/src/runtime/proc.go:3243 +0x6f fp=0xc1b2c69c10 sp=0xc1b2c69bc8 pc=0x43691f
github.com/seaweedfs/fuse/fs.(*Server).Serve(0xc0001ca180, 0x1714980, 0xc0001241e0, 0x0, 0x0)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:422 +0x399 fp=0xc1b2c69ce0 sp=0xc1b2c69c10 pc=0x10d13a9
github.com/seaweedfs/fuse/fs.Serve(0xc0005f8060, 0x1714980, 0xc0001241e0, 0xc0005ee1f0, 0xf)
        /home/travis/gopath/src/github.com/seaweedfs/fuse/fs/serve.go:434 +0xba fp=0xc1b2c69d20 sp=0xc1b2c69ce0 pc=0x10d16fa
github.com/chrislusf/seaweedfs/weed/command.runMount(0x2231900, 0xc00010e030, 0x0, 0x0, 0x0)
        /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/command/mount_std.go:92 +0x709 fp=0xc1b2c69ee8 sp=0xc1b2c69d20 pc=0x1110579
main.main()
        /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/weed.go:67 +0x31b fp=0xc1b2c69f98 sp=0xc1b2c69ee8 pc=0x111a6cb
runtime.main()
        /home/travis/.gimme/versions/go/src/runtime/proc.go:200 +0x20c fp=0xc1b2c69fe0 sp=0xc1b2c69f98 pc=0x42ef0c
runtime.goexit()
        /home/travis/.gimme/versions/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc1b2c69fe8 sp=0xc1b2c69fe0 pc=0x45a691
...

I don't know why the weed log is so big, 9.6 GiB this time.
Is this still a problem in the underlying FUSE module?

chrislusf commented Jan 3, 2019

Hacky-DH commented Jan 3, 2019

Maybe; the "fatal error: runtime: out of memory" line has no timestamp.
Another observation: the min_tps of the open_read, open, and utime stages is near 0.
This is not normal.

chrislusf commented Jan 4, 2019

Made a change aac11fb

Running 2h tests now. Looks promising.

For your convenience, you can use this binary file:
weed.gz

chrislusf commented Jan 4, 2019

$ perftest -n 10m -c 100 -id 1 -interval 1m -period 1h
2019/01/04 00:11:49 client1 period 1h0m0s 8 concurrent 10000000 files in 8 dirs with 1250000 segs
2019/01/04 00:11:49           stage        max_tps        min_tps        avg_tps     period
2019/01/04 01:11:49   create_write       2558.867       1821.283       2381.071 1h0m0.305844818s
2019/01/04 02:11:49      open_read       5145.583       4380.100       5002.525 1h0m0.132701341s
2019/01/04 03:11:50           open       8948.683       8458.550       8823.641 1h0m0.245537464s
2019/01/04 04:11:50          utime       4632.133       4280.917       4514.461 1h0m0.061837828s
2019/01/04 05:11:51         rename       2216.717       1983.183       2098.020 1h0m0.858602248s
2019/01/04 06:11:51         unlink       5837.550       1969.567       2529.293 1h0m0.157982428s

Hacky-DH commented Jan 7, 2019

@chrislusf It seems OK this time. I will run with larger parameters for a longer time.

perftest -n 10m -c 100 -interval 1m -period 1h
2019/01/07 11:35:51 client1 period 1h0m0s 32 concurrent 10000000 files in 32 dirs with 312500 segs
2019/01/07 11:35:51           stage        max_tps        min_tps        avg_tps     period
2019/01/07 12:35:51   create_write       6920.967        704.933        834.026 1h0m0.041583082s
2019/01/07 13:35:51      open_read        900.083        728.933        757.530 1h0m0.040541027s
2019/01/07 14:35:51           open       1009.583        719.867        758.982 1h0m0.038280065s
2019/01/07 15:35:51          utime        774.900        726.967        752.247 1h0m0.041435321s
2019/01/07 16:35:51         rename        254.550        224.967        244.802 1h0m0.093278965s
2019/01/07 17:35:51         unlink        713.583        212.667        266.991 1h0m0.039195534

Hacky-DH commented Jan 8, 2019

@chrislusf I ran perftest for 2 hours with 100 million files:

perftest -n 100m -c 100 -interval 1m -period 2h
2019/01/07 20:31:36 client1 period 2h0m0s 32 concurrent 100000000 files in 32 dirs with 3125000 segs
2019/01/07 20:31:36           stage        max_tps        min_tps        avg_tps     period
2019/01/07 22:31:36   create_write       7392.750        693.850        802.191 2h0m0.045976107s
2019/01/08 00:31:36      open_read        832.833        745.050        810.303 2h0m0.037693415s
2019/01/08 02:31:36           open        839.517        767.400        806.916 2h0m0.03607769s
2019/01/08 04:31:37          utime        848.883        692.633        813.032 2h0m0.036830462s
2019/01/08 06:31:37         rename        269.467        225.100        259.904 2h0m0.080311303s
2019/01/08 08:31:37         unlink        656.583        239.600        268.350 2h0m0.040927848s

The weed mount process's memory is about 4.859 GiB.
You can close this.

chrislusf commented Jan 8, 2019

Thanks for testing this!

chrislusf closed this Jan 8, 2019
