weed mount stalls some read operations while writing #2263
Comments
Okay, it seems that by using
According to the Go documentation, a blocked Lock call prevents any RLocks from being acquired. It might be the case that multiple goroutines are trying to acquire the write lock to a volume at the same time, thus preventing any reads from happening.
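As a minimal, self-contained illustration of that documented behavior (a standalone sketch, not SeaweedFS code), the following Go program shows that once a writer is blocked in `Lock()`, a later `RLock()` also blocks until the writer has acquired and released the lock:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	mu.RLock() // an existing reader holds the lock

	go func() {
		fmt.Println("writer: waiting for Lock()")
		mu.Lock() // blocks behind the existing reader
		fmt.Println("writer: acquired")
		time.Sleep(100 * time.Millisecond)
		mu.Unlock()
	}()

	time.Sleep(50 * time.Millisecond) // give the writer time to start waiting

	go func() {
		fmt.Println("new reader: waiting for RLock()")
		// Blocks as well, even though only readers currently hold the lock,
		// because a writer is already waiting in Lock().
		mu.RLock()
		fmt.Println("new reader: acquired (only after the writer is done)")
		mu.RUnlock()
	}()

	time.Sleep(200 * time.Millisecond)
	mu.RUnlock() // release the original read lock; the writer, then the new reader, proceed
	time.Sleep(500 * time.Millisecond)
}
```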
Changed the title from "weed mount stalls all read operations while writing" to "weed mount stalls some read operations while writing"
This only seems to happen when there are a lot of ongoing write and read requests -- at some point in the request queue, all reads begin to get stuck and only writes are processed. Take this debug output for example: (I added log output around the
Notice the irregular time gap between the last two invocations (during this time, only write requests are processed, because if I add the same debug output to the
How busy is the traffic? There is a new volume server option
@chrislusf I thought of that just now as well, so I tried to remove all checks of in-flight download data in

My current way to reproduce it is to just write one 1GB file, try to
Can you reproduce without using mount? Mount has its own caching and write strategies, and skipping it can help to pinpoint the problem.
@chrislusf I've not tested without mount, but I believe the problem should be in the volume server implementation, because in the metrics exposed to Prometheus, those stuck intervals are counted towards volume server response times (unless the response times are not collected from the volume servers themselves, in which case I may be wrong).
@chrislusf I have found the cause of the issue -- it's in your fork of

It distributes work to

I'm going to run some tests locally to see if a shared work queue is a better solution, or maybe this should just be reverted, since the goroutine scheduler itself implements work queuing and stealing.
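For illustration only (a sketch of the pattern described above, not the fork's actual code; the worker count and request type are assumptions), round-robin dispatch onto per-worker buffered channels can leave a request stuck behind one slow worker even while the other workers are idle:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const numWorkers = 4
	chans := make([]chan int, numWorkers)
	for i := range chans {
		chans[i] = make(chan int, 16) // buffered: each worker accumulates its own backlog
		go func(id int, ch <-chan int) {
			for req := range ch {
				if id == 0 {
					// Worker 0 is slow (think: busy with large writes); requests
					// queued behind it stall even though workers 1-3 are idle.
					time.Sleep(100 * time.Millisecond)
				}
				fmt.Printf("worker %d handled request %d\n", id, req)
			}
		}(i, chans[i])
	}

	// Round-robin distribution: request i always goes to worker i % numWorkers,
	// regardless of whether that worker is busy.
	for i := 0; i < 20; i++ {
		chans[i%numWorkers] <- i
	}

	time.Sleep(2 * time.Second) // crude wait so the output is visible before exit
}
```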
Nice! I actually wonder how you found it out so quickly. :)
The original round-robin job queue distribution can cause starvation when a request is distributed to a worker which has a long backlog of slow requests to process. This is especially prevalent when there are a lot of concurrent read and write requests, as mentioned in issue seaweedfs/seaweedfs#2263.

This commit uses `reflect.Select()` so that the request is only ever sent to a channel in the ready state. As part of this change, all the channels are now unbuffered instead of buffered, and the main FUSE serving loop will block when no idle worker is available. There is no point building up a job queue when no goroutine can process them -- it will block the filesystem operation anyway, and increase memory usage for no benefit. Just do not accept any new request when it cannot be processed in time.

I am not sure if using the dynamic `select` provided by `reflect` is faster than just creating a goroutine for each request, but doing this at least allows us to still limit the maximum number of concurrent requests to a reasonable level. I'd also like to propose a customizable worker pool size `N` (and expose it to the SeaweedFS side), but this is a topic for another day.
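A minimal sketch of the approach the commit describes (the `request`/`worker` types here are illustrative assumptions, not the fork's actual API): build one send case per worker channel and let `reflect.Select()` deliver the request to whichever worker is ready, with unbuffered channels so no per-worker backlog can form:

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
	"time"
)

type request struct{ id int }

func worker(id int, ch <-chan request, wg *sync.WaitGroup) {
	defer wg.Done()
	for req := range ch {
		time.Sleep(10 * time.Millisecond) // simulate a filesystem operation
		fmt.Printf("worker %d handled request %d\n", id, req.id)
	}
}

func main() {
	const numWorkers = 4
	var wg sync.WaitGroup

	// Unbuffered channels: a send only succeeds when a worker is actually
	// ready to receive, so the dispatcher blocks instead of queueing work.
	chans := make([]chan request, numWorkers)
	for i := range chans {
		chans[i] = make(chan request)
		wg.Add(1)
		go worker(i, chans[i], &wg)
	}

	cases := make([]reflect.SelectCase, numWorkers)
	for id := 0; id < 20; id++ {
		// Rebuild the send cases with the current request as the value to send.
		for i, ch := range chans {
			cases[i] = reflect.SelectCase{
				Dir:  reflect.SelectSend,
				Chan: reflect.ValueOf(ch),
				Send: reflect.ValueOf(request{id: id}),
			}
		}
		// Blocks until some worker is ready; the request is never parked
		// behind a slow worker's backlog.
		reflect.Select(cases)
	}

	for _, ch := range chans {
		close(ch)
	}
	wg.Wait()
}
```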
Describe the bug
It seems that on my setup, a `weed mount` process will stall some chunk read requests while writing a large file. After the write operation is finished, the process is unstalled and all read requests are processed. It does not seem like a volume server performance issue, because other `weed mount` processes and the S3 gateway work completely fine while the one doing writes is stuck.

To reproduce, try to read a large file while writing another large file to the mountpoint, e.g. using `dd`. If the file being read is not in the local cache, then the reading process may get stuck for a long time.

System Setup
SeaweedFS version: 2.62
Mount command:
weed mount -filer=<redacted> -filer.path=<redacted> -collection=<redacted> -dir=<redacted> -concurrentWriters=1 -allowOthers=true -chunkSizeLimitMB=1 -cacheCapacityMB=10240 -cacheDir=<redacted>
The bandwidth is not saturated during the write operation (in fact, this is why I have `concurrentWriters=1` and `chunkSizeLimitMB=1`: I thought it might be due to my network not being able to handle the default 32 concurrent writers uploading chunks at the same time, but this did not change anything).

Expected behavior
The FUSE mount should continue to process read requests while the write request is ongoing.