
[mount/s3/filer] spurious I/O timeout when reading from volume servers #1907

Closed
PeterCxy opened this issue Mar 16, 2021 · 21 comments

@PeterCxy
Contributor

Describe the bug

Some time ago, my weed mount mountpoint stopped working properly, and the logs showed a lot of I/O timeouts such as:

read http://172.22.1.5:8080/14,0aa7c3291471 failed, err: fetch http://172.22.1.5:8080/14,0aa7c3291471?readDeleted=true: read tcp 192.168.201.199:37870->172.22.1.5:8080: i/o timeout

This caused the daemon to get stuck retrying, making zero progress. It looked like an issue with the volume servers; however, I then tried to curl the failing URL manually on the exact same machine where the error was reported:

curl http://172.22.1.5:8080/14,0aa7c3291471?readDeleted=true > test.bin

It returned immediately with the expected content of the file dumped into test.bin, yet the weed mount daemon kept reporting I/O timeouts even after my manual curl had clearly succeeded. In case I was just getting lucky, I retried the command several times while the I/O timeout errors were continuing, and none of the curl requests failed.

System Setup

If there is anything non-standard about my setup, it's probably the interconnect between the nodes -- I use ZeroTier to form a virtual private network between all the nodes to save myself some trouble. But I have run tests on the network, and no other program, including curl, seems to have any issue with ZeroTier.

Expected behavior

SeaweedFS should not time out when curl clearly didn't.

Additional context

I suspect that some TCP connection parameter in either Go's HTTP library or SeaweedFS is at play here. I don't expect this bug to be very reproducible, but any insight into the weird behavior of SeaweedFS is appreciated.

Speaking of the timeout, my Nginx reverse proxy at the master node can also get stuck halfway through receiving data from the volume servers. I think it is the same issue.

@chrislusf
Collaborator

Seems related to the memory leak problem?

@PeterCxy
Contributor Author

@chrislusf: But curl behaves correctly, so do you mean it is an issue with the weed mount client-side implementation? I tried rolling back to 2.2x and 2.30-31, but every single version had the same issue.

Also, when I checked older logs, it seems these I/O timeouts have been happening for a while, both with weed mount and with filer itself; it's just that they had never completely stalled the server before.

@PeterCxy
Contributor Author

It seems that the issue is unrelated to ZeroTier, because my setup does not work properly even with ZeroTier removed. Unfortunately I am not able to reliably reproduce the issue outside of my production setup.

@PeterCxy
Contributor Author

Interestingly, whenever the timeout happens, the volume server spams messages like this in the logs:

2021/03/16 05:53:13 http: superfluous response.WriteHeader call from github.com/chrislusf/seaweedfs/weed/server.processRangeRequest (common.go:243)

@kmlebedev
Contributor

Interestingly, whenever the timeout happens, the volume server spams messages like this in the logs:

2021/03/16 05:53:13 http: superfluous response.WriteHeader call from github.com/chrislusf/seaweedfs/weed/server.processRangeRequest (common.go:243)

This is a known issue, and it is clear how to solve it: #1903

@PeterCxy
Contributor Author

@kmlebedev But is it the cause of timeouts / read failures? Or is it just a cosmetic error?

@kmlebedev
Contributor

@kmlebedev But is it the cause of timeouts / read failures? Or is it just a cosmetic error?

This is caused by not being able to change the response code of an HTTP response once it has already been written, and it is not the cause of your problem.
I would advise you to capture this error with tcpdump.

@PeterCxy
Contributor Author

I think the problem might simply be that SeaweedFS is giving up on the connection too early. SeaweedFS seems to start reporting the timeout error only about 5-10 seconds after I initiate the I/O operation, which means the first timeout happens very early. In addition, this error seems to only happen with large files, which can actually take more than 5-10 seconds to download due to network latency and TCP slow start.

@PeterCxy
Contributor Author

Yeah, I believe Filer / Mount started timing out just ~9s after the request was initiated, while the full download would take ~16s. But I assumed Filer / Mount is supposed to stream the file data, i.e. it should not wait until the full request is completed before sending data back to the client?

@PeterCxy
Contributor Author

According to tcpdump, the volume server did send the data when Filer or Mount requested the file.

@PeterCxy
Contributor Author

@chrislusf I think this could be the issue: https://github.com/chrislusf/seaweedfs/blob/10164d0386460c1c39ed8b5ee5c434704a2b28fd/weed/util/fasthttp_util.go#L17

According to the documentation of fasthttp, this is

Maximum duration for full response reading (including body).

So if the body took longer than this to read, the request would time out (?).
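
To make that concrete, here is a minimal sketch (not the actual SeaweedFS code) of a fasthttp client configured with a short ReadTimeout; the 5-second value is an assumption for illustration, and the URL is just the one from the error above:

    package main

    import (
        "fmt"
        "time"

        "github.com/valyala/fasthttp"
    )

    func main() {
        // In fasthttp, ReadTimeout covers the FULL response, body included, so a
        // large chunk that streams for longer than this value fails with an i/o
        // timeout even though the server is still sending data.
        client := &fasthttp.Client{
            ReadTimeout:  5 * time.Second, // illustrative value, not the real setting
            WriteTimeout: 5 * time.Second,
        }

        status, _, err := client.Get(nil, "http://172.22.1.5:8080/14,0aa7c3291471?readDeleted=true")
        fmt.Println(status, err) // err is a timeout whenever the body takes longer than 5s to transfer
    }

That would match the observation above: the full download needs ~16s, but the client gives up once the much shorter full-response timeout expires.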

@chrislusf
Collaborator

Ok. We may need to remove the usage of the fasthttp package.

@PeterCxy
Contributor Author

@chrislusf I believe just increasing ReadTimeout / WriteTimeout will be enough, though fasthttp does not seem to have a way to set a connection timeout that excludes the body.

@PeterCxy
Contributor Author

PeterCxy commented Mar 16, 2021

I changed the timeouts locally to time.Minute and it seems at least for now the timeout messages have gone away.

@PeterCxy
Contributor Author

PeterCxy commented Mar 16, 2021

@chrislusf According to valyala/fasthttp#299, to set the TCP dial timeout (instead of the timeout of the full request), one needs to provide a custom Dial function to fasthttp.Client.

The ReadTimeout and WriteTimeout should probably be removed because SeaweedFS has no hard limit on the size of files / chunks, so reading a chunk can take arbitrarily long. Or maybe set those to a sane upper bound, such as several minutes.
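
As a hedged sketch of that suggestion (the Dial, ReadTimeout, and WriteTimeout fields are fasthttp's; the specific timeout values here are my assumptions, not whatever ends up committed):

    package main

    import (
        "net"
        "time"

        "github.com/valyala/fasthttp"
    )

    func main() {
        client := &fasthttp.Client{
            // Bound only the TCP connect phase with a custom Dial function.
            Dial: func(addr string) (net.Conn, error) {
                return fasthttp.DialTimeout(addr, 10*time.Second)
            },
            // Keep the full-response timeouts generous so a large chunk body is
            // not cut off mid-transfer.
            ReadTimeout:  10 * time.Minute,
            WriteTimeout: 10 * time.Minute,
        }
        _ = client // use the client as usual, e.g. client.Get / client.DoTimeout
    }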

@kmlebedev
Contributor

@chrislusf According to valyala/fasthttp#299, to set the TCP dial timeout (instead of the timeout of the full request), one needs to provide a custom Dial function to fasthttp.Client.

The ReadTimeout and WriteTimeout should probably be removed because SeaweedFS has no hard limit on the size of files / chunks, so reading a chunk can take arbitrarily long. Or maybe set those to a sane upper bound, such as several minutes.

We have a limited chunk size
https://github.com/chrislusf/seaweedfs/blob/df72dc206d9064daa79439ef9f3fb83f491eebd8/weed/server/filer_server_handlers_write_autochunk.go#L39

@PeterCxy
Contributor Author

@kmlebedev But that can be customized, and there doesn't seem to be a sane way to calculate a maximum timeout from a given chunk size. So we probably need to either get rid of the body read timeout or set it to a high value like 5 or 10 minutes.

@kmlebedev
Contributor

@kmlebedev But that can be customized, and there doesn't seem to be a sane way to calculate a maximum timeout from a given chunk size. So we probably need to either get rid of the body read timeout or set it to a high value like 5 or 10 minutes.

This is hardly reasonable, since the chunk size can be set as high as 1 GB.
It seems worth taking the chunk size into account when choosing the timeout.

@PeterCxy
Contributor Author

Or maybe just assume that the interconnect between SeaweedFS nodes will not be slower than, say, 10Mbps (~1.25 MiB/s), and calculate the timeout based on this. If users still run into timeout issues, a customizable option could be provided to set the timeout even higher.
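
To illustrate the idea (timeoutForChunk is a hypothetical helper, not a SeaweedFS function; the ~1.25 MiB/s floor and the 30-second minimum are assumptions):

    package main

    import (
        "fmt"
        "time"
    )

    // timeoutForChunk derives a read timeout from the chunk size, assuming the
    // inter-node link never drops below ~1.25 MiB/s (10 Mbps), with a floor so
    // small chunks still get a reasonable minimum.
    func timeoutForChunk(chunkSizeBytes int64) time.Duration {
        const minBytesPerSecond = 1.25 * 1024 * 1024
        const floor = 30 * time.Second

        t := time.Duration(float64(chunkSizeBytes)/minBytesPerSecond) * time.Second
        if t < floor {
            return floor
        }
        return t
    }

    func main() {
        fmt.Println(timeoutForChunk(4 * 1024 * 1024))    // 4 MiB chunk  -> 30s floor
        fmt.Println(timeoutForChunk(1024 * 1024 * 1024)) // 1 GiB chunk -> ~13m39s
    }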

@kmlebedev
Contributor

Or maybe just assume that the interconnect between SeaweedFS nodes will not be slower than, say, 10Mbps (~1.25 MiB/s), and calculate the timeout based on this. If users still run into timeout issues, a customizable option could be provided to set the timeout even higher.

On a volume server the bottleneck is the disk, especially an HDD, so the read speed may drop to zero.
Here I would like to detect the high disk utilization and try another volume, but that is a different issue.

@PeterCxy
Contributor Author

I agree that smart load-balancing is nice to have, but for now, to resolve this issue, we either need to get rid of the timeout (which could result in Filer getting stuck forever), or set the timeout to some higher value. Not being able to read anything is a bigger problem than load-balancing.

chrislusf added a commit that referenced this issue Mar 16, 2021