-
Notifications
You must be signed in to change notification settings - Fork 312
Description
This was originally posted in dotnet runtime repo but it was actually meant to be posted here.
Description
Repro steps:
With a sql connection configured with MultipleActiveResultSets=True
, execute a simple SELECT
query.
Ex: SELECT * FROM FooTable WHERE UserId='XXX'
The problem is more evident with a higher network latency between the client and server (ex: >=10ms ping) and a bigger payload to return (ex: >=5MB)
Actual results:
In my case, MARS-enabled request took ~2.2-2.3 seconds longer to complete vs the same request with MARS-disabled (~75-100ms).
Note that I believe that this number would increase linearly with ping time between client and server. See analysis below.
Expected results:
I would expect the request latency to be equivalent to the same SELECT
request executed with MultipleActiveResultSets=False
.
Configuration
- Which version of .NET is the code running on? 3.1.19
- What OS version, and what distro if applicable? Linux/Debian
- What is the architecture (x64, x86, ARM, ARM64)? x64
- If relevant, what are the specs of the machine? It doesn't seem to be related to a specific configuration. Tested on Windows and same results.
Regression?
Unknown
Data
Simple SELECT query returning 2000 rows (~4.8MB, ~600 packets of 8KB)
Round-trip time (ping) between client and server: ~11ms
===========================================================
System.Data.SqlClient (MARS enabled)
-----------------------------------------------------------
Execution #1: PacketSize=8192, Elapsed=2857 ms
Execution #2: PacketSize=8192, Elapsed=2368 ms
Execution #3: PacketSize=8192, Elapsed=2404 ms
Execution #4: PacketSize=8192, Elapsed=2356 ms
Execution #5: PacketSize=8192, Elapsed=2337 ms
===========================================================
System.Data.SqlClient (MARS disabled)
-----------------------------------------------------------
Execution #1: PacketSize=8192, Elapsed=260 ms
Execution #2: PacketSize=8192, Elapsed=83 ms
Execution #3: PacketSize=8192, Elapsed=68 ms
Execution #4: PacketSize=8192, Elapsed=73 ms
Execution #5: PacketSize=8192, Elapsed=76 ms
===========================================================
Microsoft.Data.SqlClient (MARS enabled)
-----------------------------------------------------------
Execution #1: PacketSize=8192, Elapsed=2783 ms
Execution #2: PacketSize=8192, Elapsed=2362 ms
Execution #3: PacketSize=8192, Elapsed=2366 ms
Execution #4: PacketSize=8192, Elapsed=2358 ms
Execution #5: PacketSize=8192, Elapsed=2368 ms
===========================================================
Microsoft.Data.SqlClient (MARS disabled)
-----------------------------------------------------------
Execution #1: PacketSize=8192, Elapsed=257 ms
Execution #2: PacketSize=8192, Elapsed=105 ms
Execution #3: PacketSize=8192, Elapsed=101 ms
Execution #4: PacketSize=8192, Elapsed=105 ms
Execution #5: PacketSize=8192, Elapsed=112 ms
===========================================================
Analysis
I sniffed the network and here is the pattern I think is problematic. This image represents a small part where server is transferring data to the client:
Or in other words:
...
SQLServer -> SqlClient: [TDS] Data (8K)
SQLServer -> SqlClient: [TDS] Data (8K)
SQLServer -> SqlClient: [TDS] Data (8K)
SqlClient -> SQLServer: [SMP] Ack (Window size = 3)
~11 ms later...
SQLServer -> SqlClient: [TDS] Data (8K)
SQLServer -> SqlClient: [TDS] Data (8K)
SQLServer -> SqlClient: [TDS] Data (8K)
SqlClient -> SQLServer: [SMP] Ack (Window size = 3)
~11 ms later
SQLServer -> SqlClient: [TDS] Data (8K)
...
From what I understand of the SMP protocol, there is a flow control mechanism in place that work like this:
The server, after sending X amount of packets to the client, will stop sending more until it receives an ACK
message from the client that indicates the new amount of packets that can be sent before stalling again. (I am paraphrasing, but I believe that's the gist of it).
In the implementation of SqlClient, the client only sends the ACK
after receiving the last allowed packet in the window. The server will therefore stall for the equivalent of the network latency between client and server (5.5ms in my case) before sending another packet. And before the packet can reach the client, the same delay applies (another 5.5ms). So a total of 11ms between the reception of the two packets.
We seem to use a window size of 3 in SqlClient. So every 3 packets we receive, we have to wait a full round-trip (11ms in my case) before receiving another packet. When you have 600 packets to receive, it would mean 200 stalls of 11ms or a total of 2.2 seconds that would be added to the full request latency.
Even if we had a 1ms ping between client and server, that would still mean a 200ms delay added that could be avoided (if I am correct with my analysis).
From the code, we don't seem to have the possibility to adjust this algorithm with some configurations.
If I disable MARS, SMP is removed from the protocol stack (reference) which make the problem disappear as we don't have this flow control mechanism enabled.
There are probably good reasons why the MARS algorithm is implemented the way it is, but it seems problematic to stall the transfer while waiting for an ACK
from the client. Am I wrong? Any idea if this could be avoided?