client: Handle sending/receiving in separate goroutines #94
Conversation
Changes the TTRPC client logic so that sending and receiving with the server are in completely independent goroutines, with shared state guarded by a mutex. Previously, sending/receiving were tied together by reliance on a coordinator goroutine. This led to issues where if the server was not reading from the connection, the client could get stuck sending a request, causing the client to not read responses from the server. See [1] for more details.

The new design sets up separate sending/receiving goroutines. These share state in the form of the set of active calls that have been made to the server. This state is encapsulated in the callMap type and access is guarded by a mutex.

The main event loop in `run` previously handled a lot of state management for the client. Now that most state is tracked by the callMap, it mostly exists to notice when the client is closed and take appropriate action to clean up.

Also did some minor code cleanup. For instance, the code was previously written to support multiple receiver goroutines, though this was not actually used. I've removed this for now, since the code is simpler this way, and it's easy to add back if we actually need it in the future.

[1] containerd#72

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
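For illustration, here is a minimal, hypothetical sketch of the shape described above, not the actual ttrpc code: a callMap guarded by a mutex, which the independent sender and receiver goroutines use to record and complete active calls. The call type, the channel-based response delivery, and all helper names other than callMap are assumptions made for the example.

```go
package main

import "sync"

// call is a hypothetical stand-in for an in-flight request awaiting a response.
type call struct {
	resp chan []byte // the receiver goroutine delivers the response payload here
}

// callMap tracks calls that have been sent but not yet answered. All access
// goes through the mutex so the independent sender and receiver goroutines
// can share it safely.
type callMap struct {
	mu    sync.Mutex
	calls map[uint32]*call // active calls, keyed by stream ID
}

func newCallMap() *callMap {
	return &callMap{calls: make(map[uint32]*call)}
}

// set records a call before its request is written to the connection.
func (m *callMap) set(id uint32, c *call) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.calls[id] = c
}

// take removes and returns the call for a stream ID once its response arrives.
func (m *callMap) take(id uint32) (*call, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	c, ok := m.calls[id]
	if ok {
		delete(m.calls, id)
	}
	return c, ok
}

func main() {
	cm := newCallMap()
	c := &call{resp: make(chan []byte, 1)}

	// Sender side: record the call, then write the request to the connection.
	cm.set(1, c)

	// Receiver side: match an incoming response to its call and complete it.
	if pending, ok := cm.take(1); ok {
		pending.resp <- []byte("response payload")
	}
	<-c.resp
}
```

Because neither goroutine ever waits on the other, a blocked write on the sender side cannot prevent the receiver from draining responses, which is the property the PR is after.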
Codecov Report
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
+ Coverage   72.88%   73.35%   +0.47%
==========================================
  Files          11       11
  Lines         708      728      +20
==========================================
+ Hits          516      534      +18
+ Misses        155      153       -2
- Partials       37       41       +4
Continue to review full report at Codecov.
What did validation for this consist of? The main reason I'm asking is that the sample program linked in the issue takes me to 404 land :/
That repo changed to private in the last few months. I've put up a newer version of the test tool here: https://github.com/kevpar/ttrpc-deadlock. For validation I've run my test tool (which repros the deadlock) for ~10 minutes and seen no deadlock occur. Other than that, the main validation is what we get through CI.
LGTM. I can't speak on removing the multiple receiver goroutines as I have no history here.
// Receives calls from dispatch, adds them to the set of active calls, and sends them
// to the server.
go func() {
	var streamID uint32 = 1
This is not entirely new, but since you've modified this code and may have it fresh in memory: how does this work if streamID grows bigger than what a uint32 can represent?
I think it will only be a problem if a request with a previous ID is still pending. Given that a uint32 means we can do 100 requests per second for 500 days before we overflow, I think it's not an issue in practice.
Sorry, I forgot we only use odd numbers for requests, so it's 250 days, not 500. Still a long time though. :)
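As a quick sanity check on that estimate (a back-of-the-envelope sketch; the 100 requests/second rate is just the hypothetical figure from the comment above):

```go
package main

import "fmt"

// The client uses only odd stream IDs, so a uint32 yields 2^31 usable IDs
// before the counter wraps around.
func main() {
	const oddIDs = uint64(1) << 31 // 2,147,483,648 odd values fit in a uint32
	const reqPerSec = 100.0        // hypothetical request rate

	days := float64(oddIDs) / reqPerSec / 86400
	fmt.Printf("~%.0f days at 100 requests/sec before the stream ID wraps\n", days)
	// Prints: ~249 days at 100 requests/sec before the stream ID wraps
}
```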
LGTM
I think some further refactoring could be done to make the code a bit easier to follow, and the number of goroutines still seems higher than necessary given the addition of a shared locking object.
Late last year I created #73 in an attempt to solve #72. This PR had some feedback that I never had a chance to properly address. I've now revisited this area and taken a different approach, which I think is better to put in a fresh PR, so that's what we have here.
Originally I had changed both the client and server code to prevent this deadlock on both ends. However, after thinking on this recently, I think the server code is alright as is. The core issue is that the client would block sending to the server, and the server would block sending back to the client, and thus neither would read from the other and we would deadlock. This is definitely undesirable behavior from the client, but on the server end I think it is reasonable to block reading from a client if it is not draining its end of the connection. So this new approach only changes the client to allow sending and receiving to operate independently.
I intend to close out #73 assuming this new approach is reasonable.
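To make the failure mode described above concrete, here is a toy illustration (not ttrpc code, and the connection is just a synchronous in-memory pipe): each side blocks writing because the other side is not reading, which is the shape of the deadlock in #72.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// Both ends of a synchronous pipe try to write without reading, so neither
// write can ever complete and the two goroutines block forever.
func main() {
	clientConn, serverConn := net.Pipe()

	done := make(chan string, 2)
	go func() {
		clientConn.Write([]byte("request")) // blocks: the server side never reads
		done <- "client write finished"
	}()
	go func() {
		serverConn.Write([]byte("response")) // blocks: the client side never reads
		done <- "server write finished"
	}()

	select {
	case msg := <-done:
		fmt.Println(msg)
	case <-time.After(time.Second):
		fmt.Println("deadlocked: both sides are blocked writing")
	}
}
```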
Commit description
Changes the TTRPC client logic so that sending and receiving with the
server are in completely independent goroutines, with shared state
guarded by a mutex. Previously, sending/receiving were tied together by
reliance on a coordinator goroutine. This led to issues where if the
server was not reading from the connection, the client could get stuck
sending a request, causing the client to not read responses from the
server. See [1] for more details.
The new design sets up separate sending/receiving goroutines. These
share state in the form of the set of active calls that have been made
to the server. This state is encapsulated in the callMap type and access
is guarded by a mutex.
The main event loop in `run` previously handled a lot of state
management for the client. Now that most state is tracked by the
callMap, it mostly exists to notice when the client is closed and take
appropriate action to clean up.
Also did some minor code cleanup. For instance, the code was previously
written to support multiple receiver goroutines, though this was not
actually used. I've removed this for now, since the code is simpler this
way, and it's easy to add back if we actually need it in the future.
[1] #72
Signed-off-by: Kevin Parsons <kevpar@microsoft.com>