weed mount stalls some read operations while writing #2263

Closed
PeterCxy opened this issue Aug 16, 2021 · 9 comments

PeterCxy (Contributor) commented Aug 16, 2021

Describe the bug

It seems that on my setup, a weed mount process will stall some chunk read requests while writing a large file. After the write operation finishes, the stalled reads resume and all pending read requests are processed. It does not appear to be a volume server performance issue, because other weed mount processes and the S3 gateway work completely fine while the one doing the writes is stuck.

To reproduce, try to read a large file while writing another large file to the mountpoint, e.g. using dd. If the file being read is not in the local cache, then the reading process may get stuck for a long time.

System Setup

SeaweedFS version: 2.62
Mount command: weed mount -filer=<redacted> -filer.path=<redacted> -collection=<redacted> -dir=<redacted> -concurrentWriters=1 -allowOthers=true -chunkSizeLimitMB=1 -cacheCapacityMB=10240 -cacheDir=<redacted>

Bandwidth is not saturated during the write operation. In fact, that is why I set concurrentWriters=1 and chunkSizeLimitMB=1: I thought the problem might be my network being unable to handle the default 32 concurrent writers uploading chunks at the same time, but these settings did not change anything.

Expected behavior

The FUSE mount should continue to process read requests while the write request is ongoing.

PeterCxy (Contributor Author) commented Aug 16, 2021

Okay, it seems that because the volume server uses an RWMutex, a single volume can never be written to and read from at the same time (I am not sure this is the root cause of my problem, but the collection where this happens does have only a small number of volumes). Is that really necessary at all times? Since volumes are normally append-only except during compaction, I believe it is reasonable to allow reads while writes are ongoing (though precautions need to be taken not to return data that is still being written or is only partially written).

PeterCxy (Contributor Author) commented:

According to the Go documentation, a blocked Lock call prevents any new RLock calls from acquiring the lock. It might be that multiple goroutines are trying to acquire the write lock on a volume at the same time, thus preventing any reads from happening.
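
A minimal, self-contained sketch of the documented behavior (not SeaweedFS code): once a writer is blocked waiting for the lock, a new RLock() call also blocks until that writer has come and gone, even though the lock is currently only held by readers.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	var wg sync.WaitGroup

	// A long-running read holds the read lock.
	wg.Add(1)
	go func() {
		defer wg.Done()
		mu.RLock()
		defer mu.RUnlock()
		time.Sleep(1 * time.Second)
	}()
	time.Sleep(50 * time.Millisecond) // let the reader grab the lock first

	// A writer queues up behind the active reader and blocks on Lock().
	wg.Add(1)
	go func() {
		defer wg.Done()
		mu.Lock()
		defer mu.Unlock()
		fmt.Println("write lock acquired")
	}()
	time.Sleep(50 * time.Millisecond) // let the writer start waiting

	// This new reader now blocks until the pending writer has acquired and
	// released the lock, even though only readers currently hold it.
	start := time.Now()
	mu.RLock()
	fmt.Printf("second RLock waited %v\n", time.Since(start))
	mu.RUnlock()

	wg.Wait()
}
```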

PeterCxy changed the title from "weed mount stalls all read operations while writing" to "weed mount stalls some read operations while writing" on Aug 17, 2021
PeterCxy (Contributor Author) commented Aug 17, 2021

This only seems to happen when there are a lot of ongoing write and read requests -- at some point in the request queue, all reads begin to get stuck and only writes are processed. Take this debug output for example (I added log output around the RLock() call in readNeedle in volume_read.go):

I0817 13:46:32 51473 volume_read.go:17] reading needle 1,0ae3122338, acquiring read lock
I0817 13:46:32 51473 volume_read.go:19] reading needle 1,0ae3122338, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0b2e0c8014, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0b2e0c8014, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0b2e0c8014, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0b2e0c8014, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0c346954e1, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0c346954e1, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0c346954e1, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0c346954e1, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0d4157506e, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0d4157506e, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0d4157506e, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0d4157506e, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0eda71d03e, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0eda71d03e, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0eda71d03e, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0eda71d03e, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0ff5a33599, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0ff5a33599, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,0ff5a33599, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,0ff5a33599, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,10533a013f, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,10533a013f, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,10533a013f, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,10533a013f, read lock okay
I0817 13:46:33 51473 volume_read.go:17] reading needle 1,119233bb06, acquiring read lock
I0817 13:46:33 51473 volume_read.go:19] reading needle 1,119233bb06, read lock okay
I0817 13:47:04 51473 volume_read.go:17] reading needle 1,119233bb06, acquiring read lock
I0817 13:47:04 51473 volume_read.go:19] reading needle 1,119233bb06, read lock okay

Notice the irregular 31-second gap between the last two invocations. During that gap only write requests are processed: if I add the same debug output to the Lock() calls in the write path, it keeps logging while the reads are stuck. Also note that the reads do not get stuck while acquiring the lock itself, so this may have nothing to do with the RWMutex after all.
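
For reference, the instrumentation was roughly of this shape (a stand-alone sketch, not the actual readNeedle code, which uses glog and the volume server's own types): one log line before RLock() and one after, so time spent waiting for the lock can be told apart from time spent doing the actual read.

```go
package main

import (
	"log"
	"sync"
	"time"
)

// volume is a stand-in for the volume server's volume struct,
// which guards data file access with a sync.RWMutex.
type volume struct {
	mu sync.RWMutex
}

// readNeedle mimics the debug output added around the read lock.
func (v *volume) readNeedle(id string) {
	log.Printf("reading needle %s, acquiring read lock", id)
	v.mu.RLock()
	log.Printf("reading needle %s, read lock okay", id)
	defer v.mu.RUnlock()
	time.Sleep(5 * time.Millisecond) // stand-in for the on-disk needle read
}

func main() {
	v := &volume{}
	v.readNeedle("1,0ae3122338")
}
```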

chrislusf (Collaborator) commented:
How busy is the traffic? There is a new volume server option, -concurrentDownloadLimitMB=256, in 2.62. Do you think it may be related?

PeterCxy (Contributor Author) commented Aug 17, 2021

@chrislusf I thought of that just now as well, so I tried removing all the checks on in-flight download data in volume_server_handlers.go, but the stalls still happen.

My current way to reproduce it is to write one 1 GB file, then cat it while writing another, and observe the cat process getting stuck partway through the read.

chrislusf (Collaborator) commented:
Can you reproduce it without using mount? Mount has its own caching and write strategies, so skipping it can help pinpoint the problem.

PeterCxy (Contributor Author) commented:
@chrislusf I have not tested without mount, but I believe the problem lies in the volume server implementation, because in the metrics exposed to Prometheus those stuck intervals are counted towards volume server response times (unless the response times are not collected from the volume servers themselves, in which case I may be wrong).

PeterCxy (Contributor Author) commented:
@chrislusf I have found the cause of the issue -- it's in your fork of fuse. In particular, it is this commit: seaweedfs/fuse@b6b8dbb. Removing this or drastically increasing N temporarily fixed the issue for me.

It distributes work round-robin to N per-worker queues, disregarding the fact that some requests take much longer to finish than others. As a result, a request can end up stuck behind a backlog of slow requests in its assigned queue while all of the other workers sit idle. I'd suggest either reverting the commit or implementing some sort of shared work queue or work stealing to avoid this situation (a sketch of the shared-queue idea follows below).

I'm going to run some tests locally to see if a shared work queue is a better solution, or maybe this should just be reverted, since the goroutine scheduler itself implements work queuing and stealing.
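
For context, here is what the shared-queue alternative looks like in outline (a simplified sketch under assumed request and worker types, not the fuse fork's actual code): all workers pull from one channel, so a slow request ties up only the worker handling it instead of blocking everything queued behind it on that worker's private channel.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// request is a hypothetical stand-in for a FUSE request.
type request struct {
	id   int
	cost time.Duration // how long the handler takes; slow writes vs. fast reads
}

func main() {
	const numWorkers = 4

	// One shared, unbuffered queue: any idle worker picks up the next request.
	queue := make(chan request)
	var wg sync.WaitGroup

	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for req := range queue {
				time.Sleep(req.cost)
				fmt.Printf("worker %d finished request %d\n", id, req.id)
			}
		}(w)
	}

	for id := 0; id < 16; id++ {
		cost := 10 * time.Millisecond
		if id%4 == 0 {
			cost = 200 * time.Millisecond // simulate a slow write request
		}
		queue <- request{id: id, cost: cost}
	}
	close(queue)
	wg.Wait()
}
```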

chrislusf (Collaborator) commented:
Nice! I actually wonder how you found it out so quickly. :)

PeterCxy added a commit to PeterCxy/fuse that referenced this issue Aug 17, 2021
The original round-robin job queue distribution can cause starvation
when a request is distributed to a worker which has a long backlog of
slow requests to process. This is especially prevalent when there are a
lot of concurrent read and write requests, as mentioned in issue
seaweedfs/seaweedfs#2263.

This commit uses `reflect.Select()` so that the request is only ever
sent to a channel in the ready state. As part of this change, all the
channels are now unbuffered instead of buffered, and the main FUSE
serving loop will block when no idle worker is available. There is no
point building up a job queue when no goroutine can process it -- the
filesystem operation will block anyway, and the backlog only increases
memory usage for no benefit. Just do not accept a new request when it
cannot be processed in time.

I am not sure if using the dynamic `select` provided by `reflect` is
faster than just creating a goroutine for each request, but doing this
at least allows us to still limit the maximum number of concurrent
requests to a reasonable level.

I'd also like to propose a customizable worker pool size `N` (and expose
it to the SeaweedFS side), but this is a topic for another day.