
Filer hangs or restarts on deleting large buckets #1715

Closed
divanikus opened this issue Dec 29, 2020 · 30 comments

@divanikus
Contributor

Describe the bug
I'm running a single-master, single-filer setup with an S3 gateway. The filer's store is leveldb2.

I store lots of small files (up to 1 MB) in separate per-day buckets. Files are stored in nested directories (not one dir for all). It works pretty well, until I try to drop a bucket, either through the API or via bucket.delete. The filer might just hang, with other components spitting rpc error: code = Unavailable desc = transport is closing, or it simply restarts. Deleting collections is blazingly fast though. Am I missing something so I can painlessly delete a whole bucket in one operation? Or should I move to another filer store?
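
For reference, the two ways I'm deleting a bucket look roughly like this (the bucket name is just an example, and localhost:8888 is my filer address):

# via the filer HTTP API: recursive delete of the bucket directory
curl -X DELETE "http://localhost:8888/buckets/pp-cache-2020-12-25?recursive=true"

# via weed shell
s3.bucket.delete -name=pp-cache-2020-12-25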

System Setup

  • Debian 10
  • version 30GB 2.16 6912bf9 linux amd64
@chrislusf
Collaborator

Could you please try running 2.18 with this option?

weed s3 -allowEmptyFolder

@divanikus
Contributor Author

divanikus commented Dec 30, 2020

I have updated my setup to 2.18. I'm running s3 with the filer:

weed filer -s3

Should I add this option there?

OK, I've added it as

weed filer -s3 -s3.allowEmptyFolder

I'll check whether it helps.

@chrislusf
Collaborator

How is it going?

@chrislusf
Collaborator

Could you test how fast it is to remove the bucket? E.g., aws s3 rb s3://thebucket

@divanikus
Contributor Author

Hi Chris.

Unfortunately, it seems like it didn't help. It has failed several times already. I'll try to collect some logs on the next failure.

@divanikus
Contributor Author

Here's what just happened. I tried to delete a bucket at 00:25. After a while the filer became unresponsive from both its API and weed shell. I waited a little bit and killed it at 01:00 with kill -9.

My setup is a "master" (virtual) machine with 8 GB of RAM running the master and filer, and two "volume" machines with the same 8 GB of RAM running just volume servers. By the end of the incident the "master" machine had become rather unresponsive, the filer process had eaten 4 GB of RAM, and CPU usage was at 100%.

The logs don't seem to be very informative though.

Master:

Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83319 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83304 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83303 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83324 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83311 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83302 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83323 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83312 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83322 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83321 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:114] master see deleted volume 83317 from 192.168.75.17:8080
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:155] master send to filer@192.168.75.15:8888: url:"192.168.75.17:8080" public_url:"192.168.75.17:8080" deleted_vids:83314 deleted_vids:83316 deleted_vids:83325 del
eted_vids:83320 deleted_vids:83313 deleted_vids:83318 deleted_vids:83315 deleted_vids:83319 deleted_vids:83304 deleted_vids:83303 deleted_vids:83324 deleted_vids:83311 deleted_vids:83302 deleted_vids:83323 deleted_vids:83312 deleted_vids:83
322 deleted_vids:83321 deleted_vids:83317 data_center:"dc1"
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:155] master send to adminShell@192.168.75.15:46432: url:"192.168.75.17:8080" public_url:"192.168.75.17:8080" deleted_vids:83314 deleted_vids:83316 deleted_vids:833
25 deleted_vids:83320 deleted_vids:83313 deleted_vids:83318 deleted_vids:83315 deleted_vids:83319 deleted_vids:83304 deleted_vids:83303 deleted_vids:83324 deleted_vids:83311 deleted_vids:83302 deleted_vids:83323 deleted_vids:83312 deleted_v
ids:83322 deleted_vids:83321 deleted_vids:83317 data_center:"dc1"
Jan 12 00:31:55 wm-00 weed[1684]: I0112 00:31:55  1684 master_grpc_server.go:155] master send to master@192.168.75.15:46432: url:"192.168.75.17:8080" public_url:"192.168.75.17:8080" deleted_vids:83314 deleted_vids:83316 deleted_vids:83325 d
eleted_vids:83320 deleted_vids:83313 deleted_vids:83318 deleted_vids:83315 deleted_vids:83319 deleted_vids:83304 deleted_vids:83303 deleted_vids:83324 deleted_vids:83311 deleted_vids:83302 deleted_vids:83323 deleted_vids:83312 deleted_vids:
83322 deleted_vids:83321 deleted_vids:83317 data_center:"dc1"
Jan 12 00:36:48 wm-00 weed[1684]: I0112 00:36:48  1684 master_grpc_server.go:250] + client adminShell@[::1]:43310
Jan 12 00:37:01 wm-00 weed[1684]: I0112 00:37:01  1684 master_grpc_server.go:266] - client adminShell@[::1]:43310
Jan 12 00:41:31 wm-00 weed[1684]: I0112 00:41:09  1684 master_grpc_server.go:266] - client filer@192.168.75.15:8888
Jan 12 00:42:36 wm-00 weed[1684]: I0112 00:40:50  1684 masterclient.go:120] master masterClient failed to receive from 192.168.75.15:9333: rpc error: code = Unavailable desc = transport is closing
Jan 12 00:43:04 wm-00 weed[1684]: I0112 00:42:58  1684 master_grpc_server.go:266] - client master@192.168.75.15:46432
Jan 12 00:43:45 wm-00 weed[1684]: I0112 00:43:08  1684 masterclient.go:120] adminShell masterClient failed to receive from 192.168.75.15:9333: rpc error: code = Unavailable desc = transport is closing
Jan 12 00:44:05 wm-00 weed[1684]: I0112 00:44:03  1684 master_grpc_server.go:266] - client adminShell@192.168.75.15:46432
Jan 12 00:53:51 wm-00 weed[1684]: I0112 00:52:56  1684 master_server.go:257] executing: lock []
Jan 12 00:54:45 wm-00 weed[1684]: W0112 00:54:31  1684 master_grpc_server.go:170] SendHeartbeat.Send response to to 192.168.75.16:8080 rpc error: code = Unavailable desc = transport is closing
Jan 12 00:56:39 wm-00 weed[1684]: I0112 00:56:29  1684 topology_event_handling.go:57] Removing Volume 4240 from the dead volume server 192.168.75.16:8080
Jan 12 00:57:48 wm-00 weed[1684]: I0112 00:57:16  1684 volume_layout.go:366] Volume 4240 has 1 replica, less than required 2
Jan 12 00:58:42 wm-00 weed[1684]: I0112 00:58:20  1684 volume_layout.go:340] Volume 4240 becomes unwritable
Jan 12 00:59:27 wm-00 weed[1684]: I0112 00:59:04  1684 topology_event_handling.go:57] Removing Volume 545 from the dead volume server 192.168.75.16:8080
Jan 12 01:00:14 wm-00 weed[1684]: I0112 01:00:14  1684 volume_layout.go:366] Volume 545 has 1 replica, less than required 2
Jan 12 01:00:14 wm-00 weed[1684]: I0112 01:00:14  1684 volume_layout.go:340] Volume 545 becomes unwritable
Jan 12 01:00:14 wm-00 weed[1684]: I0112 01:00:14  1684 topology_event_handling.go:57] Removing Volume 1324 from the dead volume server 192.168.75.16:8080
Jan 12 01:00:14 wm-00 weed[1684]: I0112 01:00:14  1684 volume_layout.go:366] Volume 1324 has 1 replica, less than required 2
Jan 12 01:00:14 wm-00 weed[1684]: I0112 01:00:14  1684 volume_layout.go:340] Volume 1324 becomes unwritable

Filer (not posting the whole stack trace because it was just complaints about me killing it with signal 9):

Jan 12 00:25:56 wm-00 weed[1683]: E0112 00:25:56  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:25:56.388061357 +0300 MSK read 7955 [1972987,827606833) from [0,4194072)
Jan 12 00:25:56 wm-00 weed[1683]: E0112 00:25:56  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:25:56.388061357 +0300 MSK: resume
Jan 12 00:25:59 wm-00 weed[1683]: E0112 00:25:59  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:25:59.597718793 +0300 MSK read 2548 [632385,929761959) from [0,4194076)
Jan 12 00:25:59 wm-00 weed[1683]: E0112 00:25:59  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:25:59.597718793 +0300 MSK: resume
Jan 12 00:26:02 wm-00 weed[1683]: E0112 00:26:02  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:26:02.751238706 +0300 MSK read 7960 [1974487,976038346) from [0,4194070)
Jan 12 00:26:02 wm-00 weed[1683]: E0112 00:26:02  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:26:02.751238706 +0300 MSK: resume
Jan 12 00:26:05 wm-00 weed[1683]: E0112 00:26:05  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:26:05.954663937 +0300 MSK read 3510 [874500,3470088744) from [0,4194248)
Jan 12 00:26:05 wm-00 weed[1683]: E0112 00:26:05  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:26:05.954663937 +0300 MSK: resume
Jan 12 00:26:09 wm-00 weed[1683]: E0112 00:26:09  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:26:09.123860577 +0300 MSK read 4690 [1163186,826535144) from [0,4194149)
Jan 12 00:26:09 wm-00 weed[1683]: E0112 00:26:09  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:26:09.123860577 +0300 MSK: resume
Jan 12 00:26:12 wm-00 weed[1683]: E0112 00:26:12  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:26:12.34916261 +0300 MSK read 13188 [3262701,547090781) from [0,4194115)
Jan 12 00:26:12 wm-00 weed[1683]: E0112 00:26:12  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:26:12.34916261 +0300 MSK: resume
Jan 12 00:26:15 wm-00 weed[1683]: E0112 00:26:15  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:26:15.654401029 +0300 MSK read 6430 [1599735,1669385768) from [0,4194243)
Jan 12 00:26:15 wm-00 weed[1683]: E0112 00:26:15  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:26:15.654401029 +0300 MSK: resume
Jan 12 00:37:50 wm-00 weed[1683]: E0112 00:37:50  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:37:50.118156274 +0300 MSK read 4832 [1214260,1914894073) from [0,4194249)
Jan 12 00:37:50 wm-00 weed[1683]: E0112 00:37:50  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:37:50.118156274 +0300 MSK: resume
Jan 12 00:37:53 wm-00 weed[1683]: E0112 00:37:53  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:37:53.298713238 +0300 MSK read 7749 [1950317,18336603) from [0,4194087)
Jan 12 00:37:53 wm-00 weed[1683]: E0112 00:37:53  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:37:53.298713238 +0300 MSK: resume
Jan 12 00:37:56 wm-00 weed[1683]: E0112 00:37:56  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:37:56.501152496 +0300 MSK read 628 [159359,1698211255) from [0,4194301)
Jan 12 00:37:56 wm-00 weed[1683]: E0112 00:37:56  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:37:56.501152496 +0300 MSK: resume
Jan 12 00:37:59 wm-00 weed[1683]: E0112 00:37:59  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:37:59.633826058 +0300 MSK read 12396 [3105758,577924256) from [0,4194210)
Jan 12 00:37:59 wm-00 weed[1683]: E0112 00:37:59  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:37:59.633826058 +0300 MSK: resume
Jan 12 00:38:03 wm-00 weed[1683]: E0112 00:38:03  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:38:02.981930907 +0300 MSK read 14384 [3601462,2890728695) from [0,4194071)
Jan 12 00:38:03 wm-00 weed[1683]: E0112 00:38:03  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:38:02.981930907 +0300 MSK: resume
Jan 12 00:38:06 wm-00 weed[1683]: E0112 00:38:06  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:38:06.28482004 +0300 MSK read 725 [181687,859175152) from [0,4194053)
Jan 12 00:38:06 wm-00 weed[1683]: E0112 00:38:06  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:38:06.28482004 +0300 MSK: resume
Jan 12 00:38:09 wm-00 weed[1683]: E0112 00:38:09  1683 log_read.go:58] LoopProcessLogData: read buffer 2021-01-12 00:38:09.429274045 +0300 MSK read 12249 [3084567,811549285) from [0,4194222)
Jan 12 00:38:09 wm-00 weed[1683]: E0112 00:38:09  1683 filer_grpc_server_sub_meta.go:96] processed to 2021-01-12 00:38:09.429274045 +0300 MSK: resume
Jan 12 00:55:22 wm-00 weed[1683]: E0112 00:45:03  1683 upload_content.go:234] upload  36024 bytes to http://192.168.75.17:8080/83952,24913b162969cc30: Post "http://192.168.75.17:8080/83952,24913b162969cc30": EOF
Jan 12 00:55:55 wm-00 weed[1683]: E0112 00:49:31  1683 s3api_object_handlers.go:311] post to filer: Put "http://192.168.75.15:8888/buckets/pp-cache-2021-01-12/5b2/d9434ff": EOF
Jan 12 00:58:03 wm-00 weed[1683]: E0112 00:54:06  1683 s3api_object_handlers.go:311] post to filer: Put "http://192.168.75.15:8888/buckets/pp-cache-2021-01-12/273/d37a864": EOF
Jan 12 00:59:12 wm-00 weed[1683]: goroutine 3421246 [running]:
Jan 12 00:59:12 wm-00 weed[1683]: runtime/debug.Stack(0x30383a37312e3537, 0x32353933382f3038, 0x316233313934322c)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/runtime/debug/stack.go:24 +0x9f
Jan 12 00:59:12 wm-00 weed[1683]: runtime/debug.PrintStack()
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/runtime/debug/stack.go:16 +0x25
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/operation.upload_content(0xc0cff981e0, 0x30, 0xc013465260, 0x0, 0x0, 0x0, 0x8cb8, 0x218735f, 0x12, 0x0, ...)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/operation/upload_content.go:235 +0xd79
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/operation.doUploadData(0xc0cff981e0, 0x30, 0x0, 0x0, 0x82858d4235942900, 0xc0bd000000, 0x8cb8, 0xfe00, 0x0, 0x218735f, ...)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/operation/upload_content.go:169 +0x490

Volume:

Jan 12 00:40:55 wv-00 weed[17057]: I0112 00:40:55 17057 volume_grpc_client_to_master.go:208] Volume Server Failed to talk with master 192.168.75.15:9333: EOF
Jan 12 00:40:55 wv-00 weed[17057]: I0112 00:40:55 17057 volume_grpc_client_to_master.go:76] heartbeat error: EOF
Jan 12 00:44:24 wv-00 weed[17057]: I0112 00:44:24 17057 volume_grpc_client_to_master.go:114] Heartbeat to: wm-00.old:9333
Jan 12 00:45:14 wv-00 weed[17057]: I0112 00:45:14 17057 volume_grpc_client_to_master.go:208] Volume Server Failed to talk with master wm-00.old:9333: EOF
Jan 12 00:45:14 wv-00 weed[17057]: I0112 00:45:14 17057 volume_grpc_client_to_master.go:76] heartbeat error: EOF
Jan 12 01:02:16 wv-00 weed[17057]: I0112 01:02:16 17057 volume_grpc_client_to_master.go:114] Heartbeat to: wm-00.old:9333
Jan 12 01:02:16 wv-00 weed[17057]: I0112 01:02:16 17057 store.go:122] In dir /var/lib/weed/volume adds volume:83958 collection:pp-cache-2021-01-12 replicaPlacement:001 ttl:
Jan 12 01:02:16 wv-00 weed[17057]: I0112 01:02:16 17057 volume_loading.go:128] loading leveldb /var/lib/weed/volume/pp-cache-2021-01-12_83958.ldb
Jan 12 01:02:16 wv-00 weed[17057]: I0112 01:02:16 17057 volume_grpc_client_to_master.go:135] Volume Server found a new master newLeader: 192.168.75.15:9333 instead of wm-00.old:9333
Jan 12 01:02:17 wv-00 weed[17057]: I0112 01:02:17 17057 store.go:126] add volume 83958
Jan 12 01:02:19 wv-00 weed[17057]: I0112 01:02:19 17057 volume_grpc_client_to_master.go:114] Heartbeat to: 192.168.75.15:9333

@divanikus
Contributor Author

divanikus commented Jan 11, 2021

A typical bucket looks something like the following, ~3.5M files:

        volume id:83326 size:5257464080 collection:"pp-cache-2020-12-25" file_count:130914 delete_count:11011 deleted_byte_count:549794262 replica_placement:1 version:3 modified_at_second:1608929885
        volume id:83327 size:5257216560 collection:"pp-cache-2020-12-25" file_count:130739 delete_count:10872 deleted_byte_count:543103320 replica_placement:1 version:3 modified_at_second:1608929893
        volume id:83328 size:5248888368 collection:"pp-cache-2020-12-25" file_count:130372 delete_count:11125 deleted_byte_count:556456185 replica_placement:1 version:3 modified_at_second:1608929399
        volume id:83335 size:5249614648 collection:"pp-cache-2020-12-25" file_count:120104 delete_count:10637 deleted_byte_count:531445506 replica_placement:1 version:3 modified_at_second:1608929869
        volume id:83336 size:5246068408 collection:"pp-cache-2020-12-25" file_count:119972 delete_count:10725 deleted_byte_count:536837903 replica_placement:1 version:3 modified_at_second:1608929850
        volume id:83337 size:5267552416 collection:"pp-cache-2020-12-25" file_count:120533 delete_count:10857 deleted_byte_count:543662528 replica_placement:1 version:3 modified_at_second:1608929879
        volume id:83338 size:5244122384 collection:"pp-cache-2020-12-25" file_count:116661 delete_count:9197 deleted_byte_count:462619501 replica_placement:1 version:3 modified_at_second:1608905661
        volume id:83339 size:5252092456 collection:"pp-cache-2020-12-25" file_count:116832 delete_count:9358 deleted_byte_count:471237058 replica_placement:1 version:3 modified_at_second:1608927777
        volume id:83340 size:5255925040 collection:"pp-cache-2020-12-25" file_count:116915 delete_count:9357 deleted_byte_count:470641939 replica_placement:1 version:3 modified_at_second:1608927520
        volume id:83341 size:5256768312 collection:"pp-cache-2020-12-25" file_count:116122 delete_count:9471 deleted_byte_count:463380754 replica_placement:1 version:3 modified_at_second:1608893910
        volume id:83342 size:5247570544 collection:"pp-cache-2020-12-25" file_count:116009 delete_count:9584 deleted_byte_count:467846021 replica_placement:1 version:3 modified_at_second:1608894018
        volume id:83343 size:5253284104 collection:"pp-cache-2020-12-25" file_count:115943 delete_count:9380 deleted_byte_count:457702981 replica_placement:1 version:3 modified_at_second:1608907847
        volume id:83344 size:5245621768 collection:"pp-cache-2020-12-25" file_count:106397 delete_count:12520 deleted_byte_count:599349325 replica_placement:1 version:3 modified_at_second:1608925541
        volume id:83345 size:5257325464 collection:"pp-cache-2020-12-25" file_count:106922 delete_count:12514 deleted_byte_count:600912335 replica_placement:1 version:3 modified_at_second:1608912477
        volume id:83346 size:5251017496 collection:"pp-cache-2020-12-25" file_count:106411 delete_count:12473 deleted_byte_count:597467568 replica_placement:1 version:3 modified_at_second:1608903653
        volume id:83347 size:5256472776 collection:"pp-cache-2020-12-25" file_count:104822 delete_count:11950 deleted_byte_count:557424362 replica_placement:1 version:3 modified_at_second:1608894356
        volume id:83348 size:5249906976 collection:"pp-cache-2020-12-25" file_count:104293 delete_count:12043 deleted_byte_count:560921794 replica_placement:1 version:3 modified_at_second:1608901282
        volume id:83349 size:5254525744 collection:"pp-cache-2020-12-25" file_count:104633 delete_count:11995 deleted_byte_count:557593812 replica_placement:1 version:3 modified_at_second:1608907991
        volume id:83350 size:5255632272 collection:"pp-cache-2020-12-25" file_count:105900 delete_count:13817 deleted_byte_count:636074261 replica_placement:1 version:3 modified_at_second:1608929389
        volume id:83351 size:5250032424 collection:"pp-cache-2020-12-25" file_count:105598 delete_count:13820 deleted_byte_count:639482541 replica_placement:1 version:3 modified_at_second:1608899974
        volume id:83352 size:5249276896 collection:"pp-cache-2020-12-25" file_count:106014 delete_count:13985 deleted_byte_count:641308815 replica_placement:1 version:3 modified_at_second:1608894738
        volume id:83353 size:5245209400 collection:"pp-cache-2020-12-25" file_count:107102 delete_count:13058 deleted_byte_count:617783163 replica_placement:1 version:3 modified_at_second:1608923854
        volume id:83354 size:5257121016 collection:"pp-cache-2020-12-25" file_count:107328 delete_count:12975 deleted_byte_count:613953604 replica_placement:1 version:3 modified_at_second:1608924595
        volume id:83355 size:5244335288 collection:"pp-cache-2020-12-25" file_count:106669 delete_count:12864 deleted_byte_count:609733548 replica_placement:1 version:3 modified_at_second:1608927536
        volume id:83356 size:5257308656 collection:"pp-cache-2020-12-25" file_count:110088 delete_count:19797 deleted_byte_count:983885802 replica_placement:1 version:3 modified_at_second:1608929891
        volume id:83357 size:5247025752 collection:"pp-cache-2020-12-25" file_count:109911 delete_count:20044 deleted_byte_count:996211637 replica_placement:1 version:3 modified_at_second:1608929790
        volume id:83358 size:5245055648 collection:"pp-cache-2020-12-25" file_count:109866 delete_count:20008 deleted_byte_count:994887442 replica_placement:1 version:3 modified_at_second:1608929235
        volume id:83359 size:5245040032 collection:"pp-cache-2020-12-25" file_count:121993 delete_count:13124 deleted_byte_count:672110168 replica_placement:1 version:3 modified_at_second:1608929986
        volume id:83360 size:5244358664 collection:"pp-cache-2020-12-25" file_count:122062 delete_count:13360 deleted_byte_count:683943911 replica_placement:1 version:3 modified_at_second:1608929995
        volume id:83361 size:5244982024 collection:"pp-cache-2020-12-25" file_count:122168 delete_count:13223 deleted_byte_count:677093739 replica_placement:1 version:3 modified_at_second:1608929985
        volume id:83362 size:3641164072 collection:"pp-cache-2020-12-25" file_count:88596 delete_count:982 deleted_byte_count:52294366 replica_placement:1 version:3 modified_at_second:1608929999
        volume id:83363 size:3659655016 collection:"pp-cache-2020-12-25" file_count:89011 delete_count:1006 deleted_byte_count:53557692 replica_placement:1 version:3 modified_at_second:1608929999
        volume id:83364 size:3682091752 collection:"pp-cache-2020-12-25" file_count:89698 delete_count:982 deleted_byte_count:52340282 replica_placement:1 version:3 modified_at_second:1608929999

@chrislusf
Collaborator

Please show the stack trace that was truncated.

@divanikus
Contributor Author

Here it is.

Jan 12 00:59:12 wm-00 weed[1683]: goroutine 3421246 [running]:
Jan 12 00:59:12 wm-00 weed[1683]: runtime/debug.Stack(0x30383a37312e3537, 0x32353933382f3038, 0x316233313934322c)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/runtime/debug/stack.go:24 +0x9f
Jan 12 00:59:12 wm-00 weed[1683]: runtime/debug.PrintStack()
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/runtime/debug/stack.go:16 +0x25
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/operation.upload_content(0xc0cff981e0, 0x30, 0xc013465260, 0x0, 0x0, 0x0, 0x8cb8, 0x218735f, 0x12, 0x0, ...)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/operation/upload_content.go:235 +0xd79
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/operation.doUploadData(0xc0cff981e0, 0x30, 0x0, 0x0, 0x82858d4235942900, 0xc0bd000000, 0x8cb8, 0xfe00, 0x0, 0x218735f, ...)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/operation/upload_content.go:169 +0x490
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/operation.retriedUploadData(0xc0cff981e0, 0x30, 0x0, 0x0, 0x82858d4235942900, 0xc0bd000000, 0x8cb8, 0xfe00, 0x40b700, 0x0, ...)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/operation/upload_content.go:96 +0x1ba
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/operation.doUpload(0xc0cff981e0, 0x30, 0x0, 0x0, 0x7fce12f74e00, 0x2526ce0, 0xc0bcf284c0, 0x216dc00, 0x0, 0x0, ...)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/operation/upload_content.go:90 +0x11a
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/operation.Upload(...)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/operation/upload_content.go:75
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/server.(*FilerServer).doUpload(0xc000269420, 0xc0cff981e0, 0x30, 0x255aea0, 0xc0bc535340, 0xc0327cc700, 0x2526ce0, 0xc0bcf284c0, 0x0, 0x0, ...)
Jan 12 00:59:12 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/server/filer_server_handlers_write_autochunk.go:277 +0x1c9
Jan 12 00:59:12 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/server.(*FilerServer).uploadReaderToChunks(0xc000269420, 0x255aea0, 0xc0bc535340, 0xc0327cc700, 0x7fce12f85070, 0xc0bc541ac0, 0x2000000, 0x0, 0x0, 0x0, ...)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/server/filer_server_handlers_write_autochunk.go:231 +0x31b
Jan 12 00:59:14 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/server.(*FilerServer).doPutAutoChunk(0xc000269420, 0x2565660, 0xc00004e038, 0x255aea0, 0xc0bc535340, 0xc0327cc700, 0xc002000000, 0xc0bca3caa0, 0xa6944c, 0xc00012b9b0, ...
)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/server/filer_server_handlers_write_autochunk.go:116 +0xcd
Jan 12 00:59:14 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/server.(*FilerServer).autoChunk(0xc000269420, 0x2565660, 0xc00004e038, 0x255aea0, 0xc0bc535340, 0xc0327cc700, 0xc0bca3caa0)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/server/filer_server_handlers_write_autochunk.go:57 +0x1e5
Jan 12 00:59:14 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/server.(*FilerServer).PostHandler(0xc000269420, 0x255aea0, 0xc0bc535340, 0xc0327cc700)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/server/filer_server_handlers_write.go:68 +0x1a5
Jan 12 00:59:14 wm-00 weed[1683]: github.com/chrislusf/seaweedfs/weed/server.(*FilerServer).filerHandler(0xc000269420, 0x255aea0, 0xc0bc535340, 0xc0327cc700)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/gopath/src/github.com/chrislusf/seaweedfs/weed/server/filer_server_handlers.go:40 +0x436
Jan 12 00:59:14 wm-00 weed[1683]: net/http.HandlerFunc.ServeHTTP(0xc0027cf190, 0x255aea0, 0xc0bc535340, 0xc0327cc700)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/net/http/server.go:2042 +0x44
Jan 12 00:59:14 wm-00 weed[1683]: net/http.(*ServeMux).ServeHTTP(0xc00045d580, 0x255aea0, 0xc0bc535340, 0xc0327cc700)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/net/http/server.go:2417 +0x1ad
Jan 12 00:59:14 wm-00 weed[1683]: net/http.serverHandler.ServeHTTP(0xc003135c00, 0x255aea0, 0xc0bc535340, 0xc0327cc700)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/net/http/server.go:2843 +0xa3
Jan 12 00:59:14 wm-00 weed[1683]: net/http.(*conn).serve(0xc0bc537860, 0x2565620, 0xc0bc541900)
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/net/http/server.go:1925 +0x8ad
Jan 12 00:59:14 wm-00 weed[1683]: created by net/http.(*Server).Serve
Jan 12 00:59:14 wm-00 weed[1683]:         /home/travis/.gimme/versions/go1.15.6.linux.amd64/src/net/http/server.go:2969 +0x36c
Jan 12 01:00:14 wm-00 systemd[1]: seaweedfs@filer.service: Main process exited, code=killed, status=9/KILL
Jan 12 01:00:14 wm-00 systemd[1]: seaweedfs@filer.service: Failed with result 'signal'.
Jan 12 01:00:15 wm-00 systemd[1]: seaweedfs@filer.service: Service RestartSec=100ms expired, scheduling restart.
Jan 12 01:00:15 wm-00 systemd[1]: seaweedfs@filer.service: Scheduled restart job, restart counter is at 2.

@chrislusf
Collaborator

chrislusf commented Jan 12, 2021

The latest git tip has added a new filer store, leveldb3. Dropping a large bucket should be instant.

See 629c996.
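
To switch a filer over, the leveldb3 store is enabled in filer.toml, roughly like the following. This is a minimal sketch; the exact key names and the dir path are assumptions here, so check the commit above for the authoritative settings:

[leveldb3]
enabled = true
dir = "./filerldb3"    # assumed path; leveldb3 keeps a separate folder per bucket in this directory

with the old [leveldb2] section set to enabled = false.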

@divanikus
Contributor Author

Oh, cool. Would it be included in the next release?

@chrislusf
Collaborator

Try the latest version here: https://github.com/chrislusf/seaweedfs/releases/tag/dev

@divanikus
Contributor Author

Thanks, I'll give it a shot and write back.

@divanikus
Contributor Author

So, I have migrated my setup to leveldb3. It took a lot of time, 6 hours or so. In the end, no, it's not deleting buckets instantly; the filer still hangs on deletion, with both the API and s3.bucket.delete.

@chrislusf
Collaborator

chrislusf commented Jan 13, 2021

Can you please share a screenshot of ls -l in the leveldb3 directory that you configured?

@divanikus
Contributor Author

Sure
[screenshot of the leveldb3 directory listing]

@divanikus
Contributor Author

It looks almost the same as the previous version. BTW, I have updated only the master and filer; should I upgrade the volume servers too?

> s3.bucket.delete -name=pp-cache-2020-12-28
error: rpc error: code = Unavailable desc = transport is closing
> I0113 23:42:31  4671 masterclient.go:120] adminShell masterClient failed to receive from localhost:9333: rpc error: code = Unavailable desc = transport is closing

@chrislusf
Collaborator

Should be some bug, need to debug. Deleting a bucket should just be removing the volume collection and the leveldb folder, e.g. the pp-cache-2020-12-28 or vh-2020-12-28 folder.
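
Concretely, the intended fast path is roughly the following (only s3.bucket.delete is the actual command from this thread; the directory layout is an assumption from the leveldb3 design):

# dropping the bucket removes its volume collection on the volume servers...
s3.bucket.delete -name=pp-cache-2020-12-28

# ...and on the filer side the bucket's metadata lives in its own leveldb3 folder,
# so it should amount to removing a single directory such as
# <filer leveldb3 dir>/pp-cache-2020-12-28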

@divanikus
Contributor Author

divanikus commented Jan 13, 2021

Yeah, I guess so, but after issuing those commands the whole shell hangs and usually spits some error after a while, or I just have to kill it manually.

@chrislusf
Collaborator

Added a fix f17aa1d

Just need to update the filer.

@divanikus
Contributor Author

divanikus commented Jan 14, 2021

Things are definitely better with that fix. The shell still hangs for a while, but at least it's not that crucial: the filer doesn't eat memory, stays at the bare minimum, and definitely isn't freezing.

The delay now seems to be caused by removing collections. I can reproduce it by manually deleting them through collection.delete. I think it is caused by a busy volume server, which uses dedicated 10 TB HDDs and might not be that responsive. At least I get a little hangup even when running ls on it.

@chrislusf
Collaborator

I did not expect deleting volumes could be that slow. How many volumes are there, and how large are they? And was the master started with the -volumePreallocate option?

@divanikus
Contributor Author

divanikus commented Jan 15, 2021

I have 2 volume servers with a 10 TB HDD each, and 972 volumes (volumeSizeLimitMB=5000) on each of them. I'm not using -volumePreallocate since the volume servers are on btrfs (single disk), so I think it wouldn't help much on a CoW filesystem. Correct me if I'm wrong.

@divanikus
Contributor Author

divanikus commented Jan 15, 2021

Some logs from the master:

Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83798 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83806 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83793 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83802 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83795 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83808 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83803 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83787 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83815 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:114] master see deleted volume 83809 from 192.168.75.17:8080
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:155] master send to filer@192.168.75.15:8888: url:"192.168.75.17:8080"  public_url:"192.168.75.17:8080"  deleted_vids:83818  deleted_vids:83797  deleted_vids:83794
  deleted_vids:83800  deleted_vids:83788  deleted_vids:83799  deleted_vids:83817  deleted_vids:83811  deleted_vids:83796  deleted_vids:83810  deleted_vids:83805  deleted_vids:83813  deleted_vids:83789  deleted_vids:83819  deleted_vids:83816
  deleted_vids:83812  deleted_vids:83807  deleted_vids:83804  deleted_vids:83814  deleted_vids:83801  deleted_vids:83798  deleted_vids:83806  deleted_vids:83793  deleted_vids:83802  deleted_vids:83795  deleted_vids:83808  deleted_vids:83803
  deleted_vids:83787  deleted_vids:83815  deleted_vids:83809  data_center:"dc1"
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:155] master send to adminShell@192.168.75.15:40700: url:"192.168.75.17:8080"  public_url:"192.168.75.17:8080"  deleted_vids:83818  deleted_vids:83797  deleted_vids
:83794  deleted_vids:83800  deleted_vids:83788  deleted_vids:83799  deleted_vids:83817  deleted_vids:83811  deleted_vids:83796  deleted_vids:83810  deleted_vids:83805  deleted_vids:83813  deleted_vids:83789  deleted_vids:83819  deleted_vids
:83816  deleted_vids:83812  deleted_vids:83807  deleted_vids:83804  deleted_vids:83814  deleted_vids:83801  deleted_vids:83798  deleted_vids:83806  deleted_vids:83793  deleted_vids:83802  deleted_vids:83795  deleted_vids:83808  deleted_vids
:83803  deleted_vids:83787  deleted_vids:83815  deleted_vids:83809  data_center:"dc1"
Jan 15 16:02:06 wm-00 weed[5632]: I0115 16:02:06  5632 master_grpc_server.go:155] master send to master@192.168.75.15:40700: url:"192.168.75.17:8080"  public_url:"192.168.75.17:8080"  deleted_vids:83818  deleted_vids:83797  deleted_vids:837
94  deleted_vids:83800  deleted_vids:83788  deleted_vids:83799  deleted_vids:83817  deleted_vids:83811  deleted_vids:83796  deleted_vids:83810  deleted_vids:83805  deleted_vids:83813  deleted_vids:83789  deleted_vids:83819  deleted_vids:838
16  deleted_vids:83812  deleted_vids:83807  deleted_vids:83804  deleted_vids:83814  deleted_vids:83801  deleted_vids:83798  deleted_vids:83806  deleted_vids:83793  deleted_vids:83802  deleted_vids:83795  deleted_vids:83808  deleted_vids:838
03  deleted_vids:83787  deleted_vids:83815  deleted_vids:83809  data_center:"dc1"
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 data_node.go:89] Deleting volume id: 1746
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 data_node.go:89] Deleting volume id: 1748
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 data_node.go:89] Deleting volume id: 1747
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 topology.go:185] removing volume info:Id:1746, Size:202247160, ReplicaPlacement:001, Collection:vh-2020-10-16, Version:3, FileCount:3510, DeleteCount:856, DeletedByteCount:47910133, Rea
dOnly:false
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 volume_layout.go:340] Volume 1746 becomes unwritable
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 topology.go:185] removing volume info:Id:1748, Size:203112576, ReplicaPlacement:001, Collection:vh-2020-10-16, Version:3, FileCount:3616, DeleteCount:874, DeletedByteCount:48046349, Rea
dOnly:false
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 volume_layout.go:340] Volume 1748 becomes unwritable
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 topology.go:185] removing volume info:Id:1747, Size:204551240, ReplicaPlacement:001, Collection:vh-2020-10-16, Version:3, FileCount:3553, DeleteCount:870, DeletedByteCount:49113389, Rea
dOnly:false
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 volume_layout.go:340] Volume 1747 becomes unwritable
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 master_grpc_server.go:114] master see deleted volume 1746 from 192.168.75.16:8080
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 master_grpc_server.go:114] master see deleted volume 1748 from 192.168.75.16:8080
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 master_grpc_server.go:114] master see deleted volume 1747 from 192.168.75.16:8080
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 master_grpc_server.go:155] master send to filer@192.168.75.15:8888: url:"192.168.75.16:8080"  public_url:"192.168.75.16:8080"  deleted_vids:1746  deleted_vids:1748  deleted_vids:1747  d
ata_center:"dc1"
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 master_grpc_server.go:155] master send to adminShell@192.168.75.15:40700: url:"192.168.75.16:8080"  public_url:"192.168.75.16:8080"  deleted_vids:1746  deleted_vids:1748  deleted_vids:1
747  data_center:"dc1"
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00  5632 master_grpc_server.go:155] master send to master@192.168.75.15:40700: url:"192.168.75.16:8080"  public_url:"192.168.75.16:8080"  deleted_vids:1746  deleted_vids:1748  deleted_vids:1747
 data_center:"dc1"
Jan 15 16:06:01 wm-00 weed[5632]: I0115 16:06:01  5632 data_node.go:89] Deleting volume id: 1746
Jan 15 16:06:01 wm-00 weed[5632]: I0115 16:06:01  5632 data_node.go:89] Deleting volume id: 1748
Jan 15 16:06:01 wm-00 weed[5632]: I0115 16:06:01  5632 data_node.go:89] Deleting volume id: 1747

You might notice the 4-minute gap between operations. Also, I've logged in to the volume server too, and it doesn't look overloaded: load average: 0.45, 0.74, 1.00. And yes, I can ls on it fairly quickly now.

The volume logs were silent at that moment.

@chrislusf
Collaborator

Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00 5632 data_node.go:89] Deleting volume id: 1746
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00 5632 data_node.go:89] Deleting volume id: 1748
Jan 15 16:06:00 wm-00 weed[5632]: I0115 16:06:00 5632 data_node.go:89] Deleting volume id: 1747

Why is this repeated twice in the logs? Can I see the full logs?

@divanikus
Contributor Author

Sure, here it is: https://gist.github.com/divanikus/cfd311591400c0b700de1315f0922bce

My cleanup script kicked in somewhere around 15:58, I guess. It simply deletes buckets one by one.

@chrislusf
Collaborator

How did you delete the bucket? Please share your script.

@chrislusf
Collaborator

The 4-minute gap is just the heartbeat timing difference between the two volume servers.

@divanikus
Contributor Author

Not the full code, but I guess this is enough:

require 'json'

# send_http and time_difference are helper methods defined elsewhere in the script.
# List up to 1000 buckets from the filer.
resp = send_http("http://localhost:8888/buckets", "Get", {'Accept' => 'application/json'}, {limit: 1000})

JSON.parse(resp.body)["Entries"].each do |bucket|
  # time_difference decides whether this bucket is old enough to delete.
  if time_difference(bucket["Collection"])
    begin
      # Recursive delete of the whole bucket through the filer HTTP API.
      resp = send_http("http://localhost:8888/buckets/#{bucket["Collection"]}", "Delete", {}, {recursive: 'true'})

      if resp.code == "204"
        puts "bucket #{bucket["Collection"]} deleted!"
      else
        puts "bucket #{bucket["Collection"]} delete failed!"
      end
    rescue
      puts "Http request error!"
    end
  end
end

@chrislusf
Collaborator

Seems to be working fine now.
