Occasional command timeouts on cluster connection. #28
mstyura changed the title from "Accidental command timeouts on cluster connection." to "Occasional command timeouts on cluster connection." (May 16, 2023)
mstyura added a commit to mstyura/rustis that referenced this issue (May 18, 2023)
mcatanzariti pushed a commit that referenced this issue (May 18, 2023)
This seems to have been closed automatically, even though the referenced MR does not fix it.
mstyura added a commit to mstyura/rustis that referenced this issue (May 18, 2023)
mcatanzariti pushed a commit that referenced this issue (May 18, 2023)
Co-authored-by: Yury Yarashevich <yura.yaroshevich@gmail.com>
mstyura added a commit to mstyura/rustis that referenced this issue (May 18, 2023): …ds to facilitate logs inspection.
mstyura added a commit to mstyura/rustis that referenced this issue (May 18, 2023): …ds to facilitate logs inspection.
mcatanzariti pushed a commit that referenced this issue (May 18, 2023)
mstyura added a commit to mstyura/rustis that referenced this issue (May 19, 2023)
mstyura added a commit to mstyura/rustis that referenced this issue (May 19, 2023): …d to different nodes.
mstyura added a commit to mstyura/rustis that referenced this issue (May 19, 2023): … to different nodes.
mcatanzariti pushed a commit that referenced this issue (May 19, 2023)
Problem description
The client application, which uses rustis to connect to a Redis cluster, sometimes doesn't receive a response to a command, or the command execution timeout is triggered.
Steps to reproduce
Use rustis to connect to a Redis cluster.
Actual result
The last commands do not receive a response or time out.
Expected result
All commands run to completion without timeout
More details
The problem feels like some `Future` is not polled properly, and computation only proceeds when new commands are sent to Redis.

I did some debugging and I think I found the problem, but I can't figure out an easy fix for it.
I believe there is a problem with this network loop:
rustis/src/network/network_handler.rs
Lines 136 to 145 in a53ac5d
According to the Tokio tutorial, `select!` drops the futures that are not "selected" (i.e. that did not complete). The future returned by `self.msg_receiver.next()` is safe to drop because it does not perform any work of its own (AFAIK), while the future returned by `self.connection.read()` is not, in the case of `ClusterConnection`:
rustis/src/network/cluster_connection.rs
Lines 426 to 552 in a53ac5d
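Why dropping a half-finished `read()` future matters can be reproduced without Redis or Tokio. The following is a minimal, std-only sketch under stated assumptions, not rustis code: `MockSocket`, `read_frame`, `YieldOnce` and the hand-written `poll_once_then_drop`/`block_on` drivers are hypothetical stand-ins. `poll_once_then_drop` plays the role of `select!` dropping the branch that did not complete, and the reply `+PONG\r\n` is assumed to arrive in two separate reads.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing: enough to poll futures by hand in this demo.
unsafe fn vt_clone(_: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &VTABLE)
}
unsafe fn vt_noop(_: *const ()) {}
static VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_noop, vt_noop, vt_noop);

fn noop_waker() -> Waker {
    // SAFETY: every vtable function ignores the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Pending on the first poll, like a socket with no new bytes yet.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Hypothetical stand-in for the Redis socket: one chunk per read.
struct MockSocket {
    chunks: VecDeque<&'static [u8]>,
}

/// NOT cancel safe: the partial frame lives in a local variable of the
/// future, so it is destroyed together with the future.
async fn read_frame(sock: &mut MockSocket) -> Vec<u8> {
    let mut buf = Vec::new();
    loop {
        if let Some(chunk) = sock.chunks.pop_front() {
            buf.extend_from_slice(chunk);
        }
        if buf.ends_with(b"\r\n") {
            return buf;
        }
        YieldOnce(false).await; // suspension point: wait for more bytes
    }
}

/// Polls once and drops the future if it is still pending, which is
/// what `select!` does to the branch that did not complete.
fn poll_once_then_drop<F: Future>(fut: F) -> Option<F::Output> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None, // `fut` is dropped here
    }
}

/// Busy-polls a future to completion (acceptable only in a demo).
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn demo_lost_bytes() -> (Option<Vec<u8>>, Vec<u8>) {
    // One reply, "+PONG\r\n", arriving in two reads.
    let mut sock = MockSocket {
        chunks: VecDeque::from(vec![&b"+PO"[..], &b"NG\r\n"[..]]),
    };
    // Loop iteration 1: another branch wins; the read future is dropped
    // after it has already consumed "+PO".
    let first = poll_once_then_drop(read_frame(&mut sock));
    // Loop iteration 2: a fresh read future starts from an empty buffer.
    let second = block_on(read_frame(&mut sock));
    (first, second)
}
```

Calling `demo_lost_bytes()` yields `(None, b"NG\r\n")`: the first future consumed `+PO` and was dropped, so the second one only ever sees the tail of the reply.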
In `ClusterConnection::read`, execution might be interrupted in the middle, somewhere between
rustis/src/network/cluster_connection.rs
Line 431 in a53ac5d
and the `return`s; between the first `await` and the exits from the method there are multiple suspension points (`await`s).

So on the next iteration of `NetworkHandler::network_loop`, `select!` will create two new futures, for `MsgReceiver::next` and `ClusterConnection::read`. The new future produced by `ClusterConnection::read` will first wait for new bytes from the Redis socket, while it should instead "resume" handling the response that the previous, cancelled future had already started to process.

As a workaround, I tried locally to move the
rustis/src/network/cluster_connection.rs
Lines 451 to 459 in a53ac5d
`loop` inside `ClusterConnection::read`, and it seems to make `ClusterConnection::read` behave like a "resumable" operation. But I'm not sure the change I made is correct at all, even though it seems to fix the app I tested it with.

The open question is what to do next. I see two options:
1. `Connection` looks like a kind of `Stream`, so it could be combined with `MsgReceiver` using `select` to produce a combined stream, which could then be handled with a `while` loop with a `match` inside;
2. `Connection::read` on all types of connections must be cancel safe, so a new call to `read` must "resume" a previously cancelled read. The question here is how to make this robust to further changes, i.e. how to prevent a cancel-safe future from silently becoming unsafe again.
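The second option can be illustrated with a minimal, std-only sketch; all names here (`Connection`, `MockSocket`, `read_frame`, `YieldOnce`, the drivers) are hypothetical and not rustis APIs. The only change from a naive reader is that the partially read frame is kept in a field of the connection rather than in a local variable of the future, so a `read_frame` call that is dropped at its `.await` leaves the consumed bytes behind for the next call. Frames are assumed to be `\r\n`-terminated lines, a simplification of real RESP parsing.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// No-op waker so futures can be polled by hand.
unsafe fn vt_clone(_: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &VTABLE)
}
unsafe fn vt_noop(_: *const ()) {}
static VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_noop, vt_noop, vt_noop);

fn noop_waker() -> Waker {
    // SAFETY: every vtable function ignores the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Pending on the first poll, like a socket with no new bytes yet.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Hypothetical stand-in for the Redis socket: one chunk per read.
struct MockSocket {
    chunks: VecDeque<&'static [u8]>,
}

/// Hypothetical connection that keeps the partial frame in `self.buf`.
struct Connection {
    sock: MockSocket,
    buf: Vec<u8>,
}

impl Connection {
    /// Cancel safe: at the only `.await`, every consumed byte is already
    /// stored in `self.buf`, so dropping the future loses nothing.
    async fn read_frame(&mut self) -> Vec<u8> {
        loop {
            if let Some(chunk) = self.sock.chunks.pop_front() {
                self.buf.extend_from_slice(chunk);
            }
            // Hand out one complete "\r\n"-terminated frame, keep the rest.
            if let Some(pos) = self.buf.windows(2).position(|w| w == &b"\r\n"[..]) {
                let rest = self.buf.split_off(pos + 2);
                return std::mem::replace(&mut self.buf, rest);
            }
            YieldOnce(false).await;
        }
    }
}

/// Polls once, then drops the future if pending (like a losing `select!` branch).
fn poll_once_then_drop<F: Future>(fut: F) -> Option<F::Output> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}

/// Busy-polls a future to completion (acceptable only in a demo).
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn demo_resumed_read() -> (Option<Vec<u8>>, Vec<u8>, usize) {
    let sock = MockSocket {
        chunks: VecDeque::from(vec![&b"+PO"[..], &b"NG\r\n"[..]]),
    };
    let mut conn = Connection { sock, buf: Vec::new() };
    // Iteration 1: the read future is dropped after consuming "+PO"...
    let first = poll_once_then_drop(conn.read_frame());
    // ...but "+PO" is still in `conn.buf`, so iteration 2 resumes from it.
    let second = block_on(conn.read_frame());
    (first, second, conn.buf.len())
}
```

`demo_resumed_read()` returns `(None, b"+PONG\r\n", 0)`. Note that nothing in the type system enforces this property: reintroducing a local buffer in `read_frame` would still compile and silently bring the bug back, which is exactly the robustness concern raised in the second option.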