
[mount/s3/filer] spurious I/O timeout when reading from volume servers #1907

Closed
PeterCxy opened this issue Mar 16, 2021 · 21 comments

@PeterCxy
Contributor

Describe the bug

Some time ago, my weed mount mountpoint stopped working properly, and the logs showed a lot of I/O timeouts such as:

read http://172.22.1.5:8080/14,0aa7c3291471 failed, err: fetch http://172.22.1.5:8080/14,0aa7c3291471?readDeleted=true: read tcp 192.168.201.199:37870->172.22.1.5:8080: i/o timeout

This caused the daemon to get stuck retrying, making zero progress. It looked like an issue with the volume servers; however, I then tried to curl the failing URL manually on the exact same machine where the error was reported:

curl http://172.22.1.5:8080/14,0aa7c3291471?readDeleted=true > test.bin

It returned immediately with the expected content of the file dumped into test.bin, yet the weed mount daemon kept reporting I/O timeouts even after my manual curl had clearly succeeded. In case I was just getting lucky, I retried the command several times while the I/O timeout errors were continuing, and none of the curl requests failed.

System Setup

If there is anything non-standard about my setup, it's probably the interconnect between the nodes -- I use ZeroTier to form a virtual private network between all the nodes to save myself some trouble. But I have run tests on the network, and no other program, including curl, seems to have any issue with ZeroTier.

Expected behavior

SeaweedFS should not time out when curl clearly didn't.

Additional context

I suspect that some TCP connection parameter in either Go's HTTP library or SeaweedFS is at play here. I don't expect this bug to be very reproducible, but any insight into the weird behavior of SeaweedFS is appreciated.

Speaking of the timeout, my Nginx reverse proxy at the master node can also get stuck halfway through receiving data from the volume servers. I think it is the same issue.

@chrislusf
Collaborator

Seems related to the memory leak problem?

@PeterCxy
Contributor Author

@chrislusf: But curl behaves correctly, so do you mean it is an issue with the weed mount client-side implementation? I tried rolling back to 2.2x and 2.30-31, but every single version had the same issue.

Also, when I checked older logs, it seems these I/O timeouts have been happening for a while, both with weed mount and with filer itself; it's just that they had never completely stalled the server before.

@PeterCxy
Contributor Author

It seems that the issue is unrelated to ZeroTier, because my setup does not work properly even with ZeroTier removed. Unfortunately I am not able to reliably reproduce the issue outside of my production setup.

@PeterCxy
Contributor Author

Interestingly, whenever the timeout happens, the volume server spams messages like this in the logs:

2021/03/16 05:53:13 http: superfluous response.WriteHeader call from github.com/chrislusf/seaweedfs/weed/server.processRangeRequest (common.go:243)

@kmlebedev
Contributor

Interestingly, whenever the timeout happens, the volume server spams messages like this in the logs:

2021/03/16 05:53:13 http: superfluous response.WriteHeader call from github.com/chrislusf/seaweedfs/weed/server.processRangeRequest (common.go:243)

This is a known issue, and it is clear how to solve it: #1903

@PeterCxy
Contributor Author

@kmlebedev But is it the cause of timeouts / read failures? Or is it just a cosmetic error?

@kmlebedev
Contributor

@kmlebedev But is it the cause of timeouts / read failures? Or is it just a cosmetic error?

This is caused by not being able to change the response code of an HTTP response once it has already been written, and it is not the cause of your problem.
I would advise you to capture this error with tcpdump.

@PeterCxy
Contributor Author

I think the problem might simply be that SeaweedFS is giving up on the connection too early. SeaweedFS seems to start reporting the timeout error only about 5-10 seconds after I initiate the I/O operation, which means the first timeout happens very early. In addition, this error seems to only happen with large files, which can actually take more than 5-10 seconds to download due to network latency and TCP slow start.

@PeterCxy
Contributor Author

Yeah, I believe Filer / Mount started timing out just ~9s after the request was initiated, while the full download would take ~16s. But I assumed Filer / Mount is supposed to stream the file data, i.e. it should not wait until the full request is completed before sending data back to the client?

@PeterCxy
Contributor Author

According to tcpdump, the volume server did send the data when Filer or Mount requested the file.

@PeterCxy
Contributor Author

@chrislusf I think this could be the issue: https://github.com/chrislusf/seaweedfs/blob/10164d0386460c1c39ed8b5ee5c434704a2b28fd/weed/util/fasthttp_util.go#L17

According to the documentation of fasthttp, this is

Maximum duration for full response reading (including body).

So if the body took longer than this to read, the request would time out (?).
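
To make that concrete, here is a minimal sketch (not the actual SeaweedFS code) of a fasthttp client configured with a short ReadTimeout; the 5-second value is an assumption for illustration, and the URL is just the one from the error above:

    package main

    import (
        "fmt"
        "time"

        "github.com/valyala/fasthttp"
    )

    func main() {
        // In fasthttp, ReadTimeout covers the FULL response, body included, so a
        // large chunk that streams for longer than this value fails with an i/o
        // timeout even though the server is still sending data.
        client := &fasthttp.Client{
            ReadTimeout:  5 * time.Second, // illustrative value, not the real setting
            WriteTimeout: 5 * time.Second,
        }

        status, _, err := client.Get(nil, "http://172.22.1.5:8080/14,0aa7c3291471?readDeleted=true")
        fmt.Println(status, err) // err is a timeout whenever the body takes longer than 5s to transfer
    }

That would match the observation above: the full download needs ~16s, but the client gives up once the much shorter full-response timeout expires.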

@chrislusf
Collaborator

Ok. We may need to remove the usage of the fasthttp package.

@PeterCxy
Contributor Author

@chrislusf I believe just increasing ReadTimeout / WriteTimeout will be enough, though fasthttp does not seem to have a way to set a connection timeout that excludes the body.

@PeterCxy
Contributor Author

PeterCxy commented Mar 16, 2021

I changed the timeouts locally to time.Minute and it seems at least for now the timeout messages have gone away.

@PeterCxy
Contributor Author

PeterCxy commented Mar 16, 2021

@chrislusf According to valyala/fasthttp#299, to set the TCP dial timeout (instead of the timeout of the full request), one needs to provide a custom Dial function to fasthttp.Client.

The ReadTimeout and WriteTimeout should probably be removed because SeaweedFS has no hard limit on the size of files / chunks, so reading a chunk can take arbitrarily long. Or maybe set those to a sane upper bound, such as several minutes.
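
As a hedged sketch of that suggestion (the Dial, ReadTimeout, and WriteTimeout fields are fasthttp's; the specific timeout values here are my assumptions, not whatever ends up committed):

    package main

    import (
        "net"
        "time"

        "github.com/valyala/fasthttp"
    )

    func main() {
        client := &fasthttp.Client{
            // Bound only the TCP connect phase with a custom Dial function.
            Dial: func(addr string) (net.Conn, error) {
                return fasthttp.DialTimeout(addr, 10*time.Second)
            },
            // Keep the full-response timeouts generous so a large chunk body is
            // not cut off mid-transfer.
            ReadTimeout:  10 * time.Minute,
            WriteTimeout: 10 * time.Minute,
        }
        _ = client // use the client as usual, e.g. client.Get / client.DoTimeout
    }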

@kmlebedev
Contributor

@chrislusf According to valyala/fasthttp#299, to set the TCP dial timeout (instead of the timeout of the full request), one needs to provide a custom Dial function to fasthttp.Client.

The ReadTimeout and WriteTimeout should probably be removed because SeaweedFS has no hard limit on the size of files / chunks, so reading a chunk can take arbitrarily long. Or maybe set those to a sane upper bound, such as several minutes.

We have a limited chunk size
https://github.com/chrislusf/seaweedfs/blob/df72dc206d9064daa79439ef9f3fb83f491eebd8/weed/server/filer_server_handlers_write_autochunk.go#L39

@PeterCxy
Contributor Author

@kmlebedev But that can be customized, and there doesn't seem to be a sane way to calculate a maximum timeout from a given chunk size. So we probably need to either get rid of the body read timeout or set it to a high value like 5 or 10 minutes.

@kmlebedev
Contributor

@kmlebedev But that can be customized, and there doesn't seem to be a sane way to calculate a maximum timeout from a given chunk size. So we probably need to either get rid of the body read timeout or set it to a high value like 5 or 10 minutes.

This is hardly reasonable, since the chunk size can be set as high as 1 GB.
It seems worth taking the chunk size into account when choosing the timeout.

@PeterCxy
Contributor Author

Or maybe just assume that the interconnect between SeaweedFS nodes will not be slower than, say, 10Mbps (~1.25 MiB/s), and calculate the timeout based on this. If users still run into timeout issues, a customizable option could be provided to set the timeout even higher.
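
To illustrate the idea (timeoutForChunk is a hypothetical helper, not a SeaweedFS function; the ~1.25 MiB/s floor and the 30-second minimum are assumptions):

    package main

    import (
        "fmt"
        "time"
    )

    // timeoutForChunk derives a read timeout from the chunk size, assuming the
    // inter-node link never drops below ~1.25 MiB/s (10 Mbps), with a floor so
    // small chunks still get a reasonable minimum.
    func timeoutForChunk(chunkSizeBytes int64) time.Duration {
        const minBytesPerSecond = 1.25 * 1024 * 1024
        const floor = 30 * time.Second

        t := time.Duration(float64(chunkSizeBytes)/minBytesPerSecond) * time.Second
        if t < floor {
            return floor
        }
        return t
    }

    func main() {
        fmt.Println(timeoutForChunk(4 * 1024 * 1024))    // 4 MiB chunk  -> 30s floor
        fmt.Println(timeoutForChunk(1024 * 1024 * 1024)) // 1 GiB chunk -> ~13m39s
    }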

@kmlebedev
Contributor

Or maybe just assume that the interconnect between SeaweedFS nodes will not be slower than, say, 10Mbps (~1.25 MiB/s), and calculate the timeout based on this. If users still run into timeout issues, a customizable option could be provided to set the timeout even higher.

On a volume server the bottleneck is the disk, especially an HDD, so the read speed may drop to zero.
Here I would like to detect the high disk utilization and try another volume, but that is a different issue.

@PeterCxy
Contributor Author

I agree that smart load-balancing is nice to have, but for now, to resolve this issue, we either need to get rid of the timeout (which could result in Filer getting stuck forever), or set the timeout to some higher value. Not being able to read anything is a bigger problem than load-balancing.

chrislusf added a commit that referenced this issue Mar 16, 2021