Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

SNIPacket performance improvement #27187

Merged
merged 2 commits into from
Feb 26, 2018
Merged

SNIPacket performance improvement #27187

merged 2 commits into from
Feb 26, 2018

Conversation

geleems
Copy link

@geleems geleems commented Feb 16, 2018

In SNIPacket, memory for _data byte array is allocated and released everytime SqlClient receives packets from server. I made it to reuse existing byte array if new allocation size is the same as existing byte array length. In TdsParserStateObjectManaged, SNIPacket object is newly created everytime it sends packets to server. I made it to reuse existing SNIPacket object if it is already created previously.
This fix improves SqlClient (managedSNI) by around 1000 tps both in sync and async in TechEmpower benchmark testing.(https://github.com/aspnet/benchmarks)

@@ -141,11 +141,11 @@ private SNIPacket GetSMUXEncapsulatedPacket(SNIPacket packet)
{
uint xSequenceNumber = _sequenceNumber;
byte[] headerBytes = null;
GetSMUXHeaderBytes(packet.Length, (byte)SNISMUXFlags.SMUX_DATA, ref headerBytes);
GetSMUXHeaderBytes(packet.DatatLength, (byte)SNISMUXFlags.SMUX_DATA, ref headerBytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo on Data

private byte[] _data;
private int _length;
private int _capacity;
private byte[] _dataBuffer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep these variable names the same as earlier to keep the extent of changes less?
The name _data was good and its type pointed at the fact that it was a buffer.
Similar comment for _length change to _dataLength

@saurabh500
Copy link
Contributor

Please send two PRs

  1. Focussing purely on perf side of things.
  2. Focussing on cleanup.

The reason for the ask is that the code changes are easier to track when walking the history of the code when the grouping is logical.

}
}
catch (Exception e)
Task writeTask = packet.WriteToStreamAsync(_stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change really help with performance? How much gain do you see?


return packet;
SNIPacket clonedPacket = new SNIPacket();
clonedPacket._dataBuffer = new byte[_dataBuffer.Length];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a behavior change. The _length was pointing to the length of the available amount of data.
However the change allocates a new byte array with the complete length of the old array. Was this a bug earlier or can this introduce a new bug now? Or does it not matter?

{
SNILoadHandle.SingletonInstance.LastError = new SNIError(SNIProviders.TCP_PROV, 0, SNICommon.ConnTerminatedError, string.Empty);
SNILoadHandle.SingletonInstance.LastError = new SNIError(SNIProviders.TCP_PROV, SNICommon.InternalExceptionError, e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the exception being changed?

@@ -17,8 +17,6 @@ internal class TdsParserStateObjectManaged : TdsParserStateObject
internal SNIPacket _sniAsyncAttnPacket = null; // Packet to use to send Attn
private readonly Dictionary<SNIPacket, SNIPacket> _pendingWritePackets = new Dictionary<SNIPacket, SNIPacket>(); // Stores write packets that have been sent to SNI, but have not yet finished writing (i.e. we are waiting for SNI's callback)

private readonly WritePacketCache _writePacketCache = new WritePacketCache(); // Store write packets that are ready to be re-used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this? From what I understood from our offline discussion is that we need to make sure that the WritePacketCache.Add code path is not being utilized and that causes new write packets to be created everytime.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant in the offline discussion was the is no code path that triggers _writePacketCache.Add(). The only spot that executes the method in TdsParserStateObjectNative is RemovePacketFromPendingList(), and there is no implementation for the method in TdsParserStateObjectManaged

}
}

public bool IsEmpty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsInvalid is a better name that IsEmpty the reason being, a packet with a null buffer is considered Invalid. However a buffer with a non-null buffer but with buffer with no data will be considered empty.

@@ -48,29 +47,26 @@ internal class SNITCPHandle : SNIHandle
/// </summary>
public override void Dispose()
{
lock (this)
if (_sslOverTdsStream != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the lock removal provide any perf benefits? If yes, can you provide a measurement?

Copy link
Author

@geleems geleems Feb 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is slight perf improvement without locks (about 100~300 tps).
I ran benchmark test sync / async, with / without locks for 5 times each.
Here is the result:

n sync with lock sync without lock async with lock async without lock
1 12115 tps 12786 tps 12616 tps 12620 tps
2 11979 tps 12454 tps 12318 tps 12591 tps
3 11720 tps 12552 tps 12475 tps 12551 tps
4 12141 tps 12486 tps 12040 tps 12233 tps
5 12317 tps 12344 tps 12617 tps 12643 tps

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the purpose of the lock? I like seeing pointless locks removed, as locks incur both overhead even for single-threaded usage as well as serve as a point of contention and scalability bottleneck. But was it pointless?

@@ -360,6 +354,12 @@ public override uint EnableSsl(uint options)
{
_validateCert = (options & TdsEnums.SNI_SSL_VALIDATE_CERTIFICATE) != 0;

if (_sslOverTdsStream == null || _sslStream == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you move the ssl initialization to Connect() ? What happens to the ssl needs of the socket coming from ParallelConnect in case of MSF?

@saurabh500
Copy link
Contributor

All the locks have been removed from the send and receive routines. Can you explain why they are not necessary and why they will not cause any problems?

_writeTaskFactory.StartNew(() =>
{
try
Task writeTask = packet.WriteToStreamAsync(_stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much is this change helping in terms of perf?

{
packet.WriteToStream(_stream);
Exception e = t.Exception;
SNILoadHandle.SingletonInstance.LastError = new SNIError(SNIProviders.TCP_PROV, SNICommon.InternalExceptionError, e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code can be reduced.

SNIAsyncCallback cb = callback ?? _sendCallback;

uint status = TdsEnums.SNI_SUCCESS;

if(t.IsFaulted) 
{
      SNILoadHandle.SingletonInstance.LastError = new SNIError(SNIProviders.TCP_PROV, SNICommon.InternalExceptionError, t.Exception);
      status = TdsEnums.SNI_ERROR;
}

cb(packet, status);

@@ -183,7 +176,8 @@ public SNITCPHandle(string serverName, int port, long timerExpire, object callba
_status = TdsEnums.SNI_SUCCESS;
}

private static Socket Connect(string serverName, int port, TimeSpan timeout)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Extra line

@geleems
Copy link
Author

geleems commented Feb 16, 2018

@saurabh500
There are 2 reasons for removing locks in SNITcpHandle

  1. Each SqlConnection has it own SNIHandle. lock is not needed in SNIHandle unless more than one thread utilize the same SqlConnection object concurrently. That is not the scenario we support. Each thread should have its own SqlConnection.
  2. I could not find semaphore lock in native SNI for the operations that SNITcpHandle provides.

The only thing I am concerning is locking when connection returns back to pool.
We can leave the locks as they were for now for safety reason, and dig into it later to find unexpected scenario.

@geleems geleems changed the title Reuse existing byte array for perf improvement (and minor code cleanup) Reuse existing byte array for perf improvement Feb 16, 2018
@saurabh500
Copy link
Contributor

Thanks for the lock removal explanation @geleems

@geleems geleems changed the title Reuse existing byte array for perf improvement SNIPacket performance improvement Feb 16, 2018
/// </summary>
/// <param name="capacity">Bytes to allocate</param>
public void Allocate(int capacity)
{
_capacity = capacity;
_data = new Byte[capacity];
if (_data == null || _data.Length != capacity)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be < rather than != ? That way you can use the existing array as long as it's at least as large as is needed, rather than exactly what's needed.

@@ -217,9 +205,7 @@ public int TakeData(byte[] buffer, int dataOffset, int size)
/// </summary>
public void Release()
{
_length = 0;
_capacity = 0;
_data = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are packets guaranteed to be released? If yes, you could use ArrayPool rather than directly allocating.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ArrayPool is great idea, but it cause Manual Test failure in Mars connection test. I will rollback to my original implementation for now.

@@ -249,7 +237,7 @@ public void ReadFromStreamAsync(Stream stream, SNIAsyncCallback callback, bool i
options |= TaskContinuationOptions.LongRunning;
}

stream.ReadAsync(_data, 0, _capacity).ContinueWith(t =>
stream.ReadAsync(_data, 0, _data.Length).ContinueWith(t =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to your change, but looking at this code...

I assume this ReadFromStreamAsync method is on a hot path? As it stands, it's always going to queue a continuation, even if the stream.ReadAsync completes synchronously, which in general is a common thing (I don't know about this specific code and usage patterns), and it's always going to incur a closure, so every time ReadFromStreamAsync is called, it looks like there's at least four allocations of additional overhead happening here just from the way the method is constructed. It'd be great to look at rewriting the async code paths through the library to use async/await, as I expect that'll help significantly with maintainability and also help with perf, but if that's too much work for the short-term, you might look at rewriting just this method to use async/await (it could be an async void method).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stephentoub
@geleems Lets take this as a fix to a new issue that I have opened here https://github.com/dotnet/corefx/issues/27222
I will be worth considering all I/O calls in SqlClient managed SNI and change them to use async/await instead of creating continuation tasks.

@@ -48,29 +47,26 @@ internal class SNITCPHandle : SNIHandle
/// </summary>
public override void Dispose()
{
lock (this)
if (_sslOverTdsStream != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the purpose of the lock? I like seeing pointless locks removed, as locks incur both overhead even for single-threaded usage as well as serve as a point of contention and scalability bottleneck. But was it pointless?

else
Task writeTask = null;
writeTask = packet.WriteToStreamAsync(_stream);
writeTask.ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nop. ConfigureAwait is a helper that returns a struct that influences how await operates; it doesn't mutate the task on which it's called at all. You can simply delete this line.

That said, related to my previous comment about overheads in the reading method, even more than reads writes often complete synchronously. It'd be nice to look at removing the additional overheads here as well.

@@ -425,25 +423,22 @@ public override void SetBufferSize(int bufferSize)
/// <returns>SNI error code</returns>
public override uint Send(SNIPacket packet)
{
lock (this)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it ok to remove these locks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you already answered this at #27187 (comment). Thanks

@@ -532,42 +526,21 @@ public override void SetAsyncCallbacks(SNIAsyncCallback receiveCallback, SNIAsyn
/// <returns>SNI error code</returns>
public override uint SendAsync(SNIPacket packet, SNIAsyncCallback callback = null)
{
SNIPacket newPacket = packet;

_writeTaskFactory.StartNew(() =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change the fields

private readonly TaskScheduler _writeScheduler;
private readonly TaskFactory _writeTaskFactory;

are not needed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@geleems
Copy link
Author

geleems commented Feb 22, 2018

  • The change was passed for manual tests in Linux except the ones that failed without this change in master.
  • EFCore SQL Server specific tests were passed with this change.

@@ -19,7 +19,7 @@ internal sealed class TdsParserStateObjectFactory
//private static bool shouldUseLegacyNetorking;
//public static bool UseManagedSNI { get; } = AppContext.TryGetSwitch(UseLegacyNetworkingOnWindows, out shouldUseLegacyNetorking) ? !shouldUseLegacyNetorking : true;

public static bool UseManagedSNI { get; } = false;
public static bool UseManagedSNI { get; } = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you want to check in this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true was only for testing only. Rolled back to false

@@ -424,6 +424,7 @@
</ItemGroup>
<ItemGroup Condition="'$(IsPartialFacadeAssembly)' != 'true' AND '$(OSGroup)' != 'AnyOS'">
<Reference Include="Microsoft.Win32.Primitives" />
<Reference Include="System.Buffers" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? This probably came in because of ArrayPool usage and I don't see the code using this dependency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just removed..

@@ -205,8 +208,7 @@ internal override void ClearAllWritePackets()
{
if (_sniPacket != null)
{
_sniPacket.Dispose();
_sniPacket = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this line? The Garbage collector will see a null reference and perform a cleanup. Also I would recommend using the Dispose() call because it seems more natural to Dispose and set to null. (even though Dispose and Release do the same thing)

Copy link
Author

@geleems geleems Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With those lines, _sniPacket object is recreated over and over again, which is pointless. It can reuse existing _sniPacket

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't follow. The design looks like _sniPacket is taken from the cache and not allocated over and over.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And _sniPacket object is disposed when TdsParserStateObject is disposed.

_sniAsyncAttnPacket = attnPacket;
SetPacketData(attnPacket, SQL.AttentionHeader, TdsEnums.HEADER_LEN);
return attnPacket;
if (_sniAsyncAttnPacket == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to remove the local attnPacket ? It is always better to work with a local before putting a reference on a field.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous code creates the same _sniAsyncAttnPacket object over and over again. It should be created once at start (if (_sniAsyncAttnPacket == null)).

Copy link
Author

@geleems geleems Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local variable _attnPacket points the same memory address as _sniAsyncAttnPacket in the line#170 in previous code. No need to create duplicated variable that has the same reference.

@geleems
Copy link
Author

geleems commented Feb 26, 2018

@dotnet-bot test Linux x64 Release Build
please.

@saurabh500 saurabh500 merged commit cb00bee into dotnet:master Feb 26, 2018
saurabh500 pushed a commit to saurabh500/corefx that referenced this pull request Mar 1, 2018
saurabh500 added a commit that referenced this pull request Mar 1, 2018
@karelz karelz added this to the 2.1.0 milestone Mar 10, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* SNIPacket Perf Improvement

* comment fix


Commit migrated from dotnet/corefx@cb00bee
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants