-
Notifications
You must be signed in to change notification settings - Fork 541
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1114 from haaawk/stream_ids_fix
Stop reusing stream ids of requests that have timed out due to client-side timeout (#1114) * ResponseFuture: do not return the stream ID on client timeout When a timeout occurs, the ResponseFuture associated with the query returns its stream ID to the associated connection's free stream ID pool - so that the stream ID can be immediately reused by another query. However, that it incorrect and dangerous. If query A times out before it receives a response from the cluster, a different query B might be issued on the same connection and stream. If response for query A arrives earlier than the response for query B, the first one might be misinterpreted as the response for query B. This commit changes the logic so that stream IDs are not returned on timeout - now, they are only returned after receiving a response. * Connection: fix tracking of in_flight requests This commit fixes tracking of in_flight requests. Before it, in case of a client-side timeout, the response ID was not returned to the pool, but the in_flight counter was decremented anyway. This counter is used to determine if there is a need to wait for stream IDs to be freed - without this patch, it could happen that the driver throught that it can initiate another request due to in_flight counter being low, but there weren't any free stream IDs to allocate, so an assertion was triggered and the connection was defuncted and opened again. Now, requests timed out on the client side are tracked in the orphaned_request_ids field, and the in_flight counter is decremented only after the response is received. * Connection: notify owning pool about released orphaned streams Before this patch, the following situation could occur: 1. On a single connection, multiple requests are spawned up to the maximum concurrency, 2. We want to issue more requests but we need to wait on a condition variable because requests spawned in 1. took all stream IDs and we need to wait until some of them are freed, 3. All requests from point 1. time out on the client side - we cannot free their stream IDs until the database node responds, 4. Responses for requests issued in point 1. arrive, but the Connection class has no access to the condition variable mentioned in point 2., so no requests from point 2. are admitted, 5. Requests from point 2. waiting on the condition variable time out despite there are stream IDs available. This commit adds an _on_orphaned_stream_released field to the Connection class, and now it notifies the owning pool in case a timed out request receives a late response and a stream ID is freed by calling _on_orphaned_stream_released callback. * HostConnection: implement replacing overloaded connections In a situation of very high overload or poor networking conditions, it might happen that there is a large number of outstanding requests on a single connection. Each request reserves a stream ID which cannot be reused until a response for it arrives, even if the request already timed out on the client side. Because the pool of available stream IDs for a single connection is limited, such situation might cause the set of free stream IDs to shrink to a very small size (including zero), which will drastically reduce the available concurrency on the connection, or even render it unusable for some time. In order to prevent this, the following strategy is adopted: when the number of orphaned stream IDs reaches a certain threshold (e.g. 75% of all available stream IDs), the connection becomes marked as overloaded. Meanwhile, a new connection is opened - when it becomes available, it replaces the old one, and the old connection is moved to "trash" where it waits until all its outstanding requests either respond or time out. This feature is implemented for HostConnection but not for HostConnectionPool, which means that it will only work for clusters which use protocol v3 or newer. This fix is heavily inspired by the fix for JAVA-1519. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>
- Loading branch information
Showing
6 changed files
with
158 additions
and
30 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.