Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

SqlClient make managed connection more async #36667

Closed
wants to merge 1 commit into from

Conversation

Wraith2
Copy link
Contributor

@Wraith2 Wraith2 commented Apr 6, 2019

Related to https://github.com/dotnet/corefx/issues/25742

SNITcpHandle currently uses synchronous dns queries and socket open calls in the ctor which causes blocking behaviour which can be long running. This PR changes the ctor to initiate an async task to establish the connection and then makes sure that the task is correctly cleaned up and the status resolved when any sync apis request that information. async dns and connect are used where possible. This should allow more connections to be started concurrently by changing wait characteristics to non-blocking.

Notes:

  • Various logic in the library expects sync behaviour and it would be a major undertaking to remove that assumption. I have attempted to leave the resolution of status as late as possible by putting the new ResolveConnectionStatus on the two paths that are always used to check the status but this method has to intersect the sync and async worlds and must be serialized so that only one completion is possible, lock and async aren't compatible so it uses a sync wait on an async task. I don't like this but I don't have a better way
  • I can't test any of the parallel code path. it compiles and the logic is unchanged but it will need validation by someone with access to enterprise style test infrastructure. I find the code on this path slightly baffling.
  • the SNIProxy singleton feels like a really bad idea for async. Again it would be a major rewrite to remove it.

/cc owners @afsanehr, @tarikulsabbir, @Gary-Zh , @David-Engel and interested people @divega @saurabh500

{
cancellationTokenSource.CancelAfter(timeout);
}
connectTask.Wait(cancellationTokenSource.Token);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync Wait in async method? Can't you just await it?

Also you could pickup one of the Timeout patterns from the TaskTimeoutExtensions in tests

public static async Task TimeoutAfter(this Task task, int millisecondsTimeout)
{
var cts = new CancellationTokenSource();
if (task == await Task.WhenAny(task, Task.Delay(millisecondsTimeout, cts.Token)).ConfigureAwait(false))
{
cts.Cancel();
await task.ConfigureAwait(false);
}
else
{
throw new TimeoutException($"Task timed out after {millisecondsTimeout}ms");
}
}
public static async Task<TResult> TimeoutAfter<TResult>(this Task<TResult> task, int millisecondsTimeout)

/cc @stephentoub

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't test any of the parallel branch and the code in the ParallelAsyncHelper is a bit beyond me. If there's a way to safely await it and preserve behaviour without risk i'm open to suggestions.
Similarly if there's a good way to avoid the async wait in ResolveConnectionStatus it would be good but I don't see one without an async lock.

@saurabh500
Copy link
Contributor

@Wraith2 The change you are making is something SQLClient used to do earlier and had to be changed to sync network calls.
To understand this better, you have to see how the Connection Pool works. The Connection pool starts a thread on which all connection fulfilment requests are queued. The caller i.e. SqlConnection.Open or OpenAsync is notified via semaphores about the completion status of the connection open request.
The connection pool thread, serially fulfils all the connection.open/openasync calls.

With the change you have made a connection.OpenAsync() will cause new threads to be launched for the network operations which will ultimately lead to thread pool exhaustion.

In Asp.Net core apps we saw, with the behavior that you are recommending that, it was hard to open more than 20 connections and with the sync calls we could open more than 10k connections.

Ref PR #26200
Issue https://github.com/dotnet/corefx/issues/25620

@saurabh500
Copy link
Contributor

saurabh500 commented Apr 6, 2019

Since the direction of this approach can lead to scalability issues, I recommend closing this and if sync APIs need to be used for networking, then they should be plumbed all the way down from the connection pool without changing the guarantees of the connection pool behavior.

From what I understand, the connection pool itself has some issues where it returns open connections slowly, and I haven't dived deeper into it. That is an area that should be attacked for connection open, vs the managed SNI layer.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 7, 2019

@Wraith2 The change you are making is something SQLClient used to do earlier and had to be changed to sync network calls.

Same calls, different way of using them. The previous version was using async methods but not awaiting them and was using .Result in the ctor so despite using async methods everything was happening in sync and the caller thread was blocked. So what it did was create a task which probably ran on a threadpool thread and then waited for it.

This PR changes the ctor so the work of connection is immediately farmed out to a detached task and left to run. The ctor caller gets returned to immediately and is not blocked. The connection task is left alone until something requires knowledge of the connection state and at that point it must block to resolve the state, hopefully at this point the connection task has had enough time to do some of the work required to connect and if we're lucky it'll have finished.

Fundamentally if a new connection is requested there is a set of actions that must take place for it to happen, there isn't a way to avoid the sequential nature of the dns lookup and then tcp connect. What this does is make it a little more concurrent friendly so it won't block unless it needs to and more connections can be started.

The only way to avoid the real problem is to change the connection pooling to be async only throughout. That's a big bit of work.

@saurabh500
Copy link
Contributor

This PR changes the ctor so the work of connection is immediately farmed out to a detached task and left to run.

This is the problem. This shouldn't happen. This will need a new thread to complete the connection task, which is what we want to avoid.

@saurabh500
Copy link
Contributor

And also the changes will affect the Sync Open as well.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 7, 2019

This PR changes the ctor so the work of connection is immediately farmed out to a detached task and left to run.

This is the problem. This shouldn't happen. This will need a new thread to complete the connection task, which is what we want to avoid.

It'll need a thread to run the task on. The difference is that the task doing the connection work will not block that thread indefinitely because it isn't being synchronously awaited until state needs to be resolved. In the old model caller thread starts a task on the threadpool and then blocks caller and pool threads waiting for the answer in the ctor. This version starts a task returns caller and defers resolution of the task until later allowing many awaiting tasks to be present without blocking.

And also the changes will affect the Sync Open as well.

Yes. This is because the open action was being performed in the ctor and was always blocking, whether that be the calling thread or a pool thread the .Result call forced state resolution before exit. That has been moved to calling .Status or CheckConnection() which should be later (though unlikely to be ms required for a network dns query) so there's some concurrency now instead of none. As I said you can't be fully concurrent without making the connection pool async all the way from the top.

Essentially I think this is better but can't be perfect.

I'll close as requested but I'd like to continue the discussion to get feedback on the problem to make sure I'm using the right mental model when working with async, hopefully @benaadams or @stephentoub can comment on that.

@Wraith2 Wraith2 closed this Apr 7, 2019
@saurabh500
Copy link
Contributor

The difference is that the task doing the connection work will not block that thread indefinitely because it isn't being synchronously awaited until state needs to be resolved.

The task which is doing is the connection work is not blocked already. SqlConnection.OpenAsync is not blocking. This is because it requests the Connection Pool thread to queue its request. Connection pool thread tries to drain out its queue in serial manner. I.e. if it has 10 requests, then 1 followed by 2 followed by 3 will be fulfilled, and 1 2 3 ... 10 will not lead to 10 parallel TCP connection open.
Also this is intentional and by design, that the connections to the same server be opened in parallel.

This approach takes up more threads for the work it needs to do and reduces the scalability of opening connections.

Even if the connection open was not in the constructor and in a different method, with async connection open on connection pool thread, more connections would have been opened. Right now a single thread from connection pool is used to fulfill the connection open.

What really is the problem that you are facing for which this is the solution is something I am not able to understand. Is the change purely based on reading the code and seeing that there is no async path in managed SNI for opening connections?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 7, 2019

It was suggested it might be worth taking a look at. I can't see ConnectAsync going async It returns a task but that isn't the same thing, it's hard to follow so I must be missing the Task.Run or await somewhere.

@saurabh500
Copy link
Contributor

The code for async connection open was built on top of the way sync Open was happening. This statement doesn't imply that Async over sync pattern is being used.

When a connection.OpenAsync() is called, then

  1. The connection pool is checked to see if there are any available connections in the pool at https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/ProviderBase/DbConnectionPool.cs#L1050 Note the allowCreate = false and the pool doesn't create new connections. It waits on any already signaled handles.

  2. If no connection is found at step 1, the connection request is queued in a ConcurrentPool at https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/ProviderBase/DbConnectionPool.cs#L1066

This queue is dequed by the connection pool thread (which is launched in case there are pending open requests and there is no thread running), and it dequeues the Concurrent Queue at https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/ProviderBase/DbConnectionPool.cs#L965

Then a connection request is made synchronously on the connection pool thread at https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/ProviderBase/DbConnectionPool.cs#L991

At this line the allowCreate is set to true. Which means that the request can allow the creation of new connections if the pool doesn't have any connections that can be returned as is. That ultimately calls the SNITcpHandle constructor to establish a TCP connection.

When the connection is created, then the TaskCompletionSource for the SqlConnection.Open result is set at https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/ProviderBase/DbConnectionPool.cs#L1009 The TaskCompletionSource (tcs) is allocated during SqlConnection.OpenAsync() at https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlConnection.cs#L1002

This will explain how the SqlConnection.OpenAsync is async. It doesn't block on any real TCP connection Open. The actual TCP connection establishment happens on a ConnectionPool thread, of which only one is launched per connection pool and all the connection requests for opening connections are fulfilled on the connection pool thread.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Apr 8, 2019

Very detailed thank you and i see the confusion now, i hadn't found the producer-consumer thread because i was working up from the tcp handle constructor and not going far enough up into the provider architecture.

So then the only outstanding question i have is whether it is desired for tcp handle ctor to be blocking? As mentioned in the suggestion thread the connection only needs to be finished opening in CheckConnection and my change affects that path which would allow multiple handles to be created more quickly and to block slightly later on when finding one that is open a possible minor speed increase and less blocking on the connection pool thread.

It'd be good to quantify this but i couldn't work out how to setup your docker repro. Would simply benchmarking opening and closing of connections in a loop be sufficient?

@saurabh500
Copy link
Contributor

whether it is desired for tcp handle ctor to be blocking?

Yes it is desired for the TCPHandle Ctor to be blocking.

Lets do a deeper dive into this.

When SqlConnection.OpenAsync was called, the request was sent to the Connection Pool thread, which ends up calling SqlConnectionFactory.CreateConnection
This in turns instantiates SqlInternalConnectionTds at https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/SqlClient/SqlConnectionFactory.cs#L98

SqlConnection.OpenAsync provides either an open authenticated connection or an exception.

Hence the job of the Ctor of SqlInternalConnectionTds is to then calls OpenLoginEnlist which calls LoginNoFailover
LoginNoFailover does 3 things

  1. Establish a TCP connection, by ultimately using SNITcpHandle ctor,
  2. Send the Pre-login message
  3. Login to Sql server.

The TCP connection of step 1 is created using TdsParserStateObject.CreatePhysicalSNIHandle which calls the overridden TdsParserStateObjectManaged.CreatePhysicalSNIHandle which in turn, instantiates the SNITcpHandle.

As of now, returning from TdsParserStateObjectManaged.CreatePhysicalSNIHandle means that a new TCP connection was created or an error occurred while doing so. There is no CheckConnection in the picture so far. Now the second step has to happen where Pre-Login message to Sql has to be sent.

This is accomplished via TdsParser.SendPreLoginHandshake which calls TdsParserStateObject.WritePacket which then calls TdsParserStateObject.WriteSni which will utimately end up calling SNITcpHandle.Send(packet) which is a sync write to the underlying stream.
This code which sends the pre-login handshake has assumed that the connection is open. In your testing, you might be testing in low latency scenarios where the connections to SQL server are established almost immediately, that by the time the data is sent to SQL server (in this case the prelogin message), the stream is already available. There is no CheckConnection call in the above mentioned scenarios. The CheckConnection call usually happens when there is a connection retry or if there is an connection being returned from the connection pool where we poll the socket to check if it is still alive after a period of inactivity.

So that could be a bug in the approach in this PR.

You could end up adding a check after SNITcpHandle ctor is created, to check if the underlying task has completed. There are 2 caveats.

  1. A task being queued in the constructor will complete on a separate thread than the one that calls the constructor. This means additional resource utilization.
  2. Connection pool, performance wouldn't improve even if the Ctor returns quickly enough, because the connection still needs to send data over the established TCP connection, to authenticate the connection.

Essentially, for connection to open, we need a guaranteed TCP connection here so that the rest of the steps of connection open can be done successfully. Whether the constr returns early or later, it wouldn't matter and connection pool wouldn't be able to continue unless the current connection being processed, is driven to completion.

@saurabh500
Copy link
Contributor

@Wraith2 In case you need more information with respect to this PR, let's continue the conversation here. Else to brainstorm perf of the connection pool we could discuss on #30430

@Wraith2 Wraith2 deleted the sqlfix-asyncopenblocking branch April 11, 2019 09:52
@karelz karelz added this to the 3.0 milestone May 22, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants