SqlClient remove primitive boxing in SqlDataReader.GetFieldValue #34999

Wraith2 · 2019-01-31T19:35:51Z

When reading primitive types through a data reader with the GetGuid and GetFieldValue<Guid> methods, the guid data was read from the TDS stream into a Guid, which was then placed in a SqlGuid. SqlGuid keeps its value in a heap-allocated byte array, so every read allocated. This made reading 16 bytes of data quite expensive and caused a lot of GC noise.

This PR changes the guid behaviour by adding Guid to the SqlBuffer.Storage union structure, so the guid is stored in space the SqlBuffer has already allocated, with no extra allocation or boxing. All uses of the StorageType enum have been audited to ensure that Guid and SqlGuid are both handled correctly.
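To illustrate the union idea, here is a minimal sketch assuming an explicit-layout struct similar in spirit to SqlBuffer.Storage. StorageSketch and its field names are hypothetical, not the actual SqlBuffer code. Because every field starts at offset 0, the struct is only as large as its largest member (Guid, 16 bytes), and a Guid written into it lives inline in whatever holds the struct rather than in a separate heap object.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a storage union: all fields overlap at offset 0,
// so only one of them is meaningful at a time (a separate tag records which).
[StructLayout(LayoutKind.Explicit)]
internal struct StorageSketch
{
    [FieldOffset(0)] internal int Int32;
    [FieldOffset(0)] internal long Int64;
    [FieldOffset(0)] internal double Double;
    [FieldOffset(0)] internal Guid Guid;
}

internal static class StorageSketchDemo
{
    // Size of the union: the size of its largest member, Guid.
    internal static int SizeInBytes() => Marshal.SizeOf<StorageSketch>();

    internal static void Main()
    {
        var storage = new StorageSketch();
        Guid id = Guid.NewGuid();
        storage.Guid = id;                      // stored inline, no heap allocation
        Console.WriteLine(storage.Guid == id);  // True
        Console.WriteLine(SizeInBytes());       // 16
    }
}
```

A real implementation also needs a StorageType-style tag, as the PR description mentions, because nothing in the struct itself records which field is live.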

The existing TdsParser code allocates a new byte array each time a guid is read from the input stream. I have special-cased this with a stack-allocated span in netcore builds and a rented buffer in other builds.
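A minimal sketch of the span-based read on .NET Core, assuming the 16 guid bytes arrive on a plain Stream in the same layout that Guid.ToByteArray() produces. GuidReader and ReadGuid are hypothetical names, not the TdsParser API, and the rented-buffer variant for other builds is not shown.

```csharp
using System;
using System.IO;

internal static class GuidReader
{
    // Hypothetical helper: reads 16 bytes into a stack buffer instead of a new byte[16].
    internal static Guid ReadGuid(Stream stream)
    {
        Span<byte> buffer = stackalloc byte[16];
        int total = 0;
        while (total < buffer.Length)
        {
            // Stream.Read may return fewer bytes than requested, so keep reading.
            int read = stream.Read(buffer.Slice(total));
            if (read == 0)
            {
                throw new EndOfStreamException("Stream ended before 16 guid bytes were read.");
            }
            total += read;
        }
        // Guid(ReadOnlySpan<byte>) uses the same byte layout as Guid.ToByteArray().
        return new Guid(buffer);
    }

    internal static void Main()
    {
        Guid expected = Guid.NewGuid();
        using var stream = new MemoryStream(expected.ToByteArray());
        Console.WriteLine(ReadGuid(stream) == expected); // True
    }
}
```

The stack buffer only lives for the duration of the call, so nothing is left behind for the GC to collect.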

In SqlDataReader, fetching any value through GetFieldValue assigns the value to an object variable, which boxes value types. I have used two JIT behaviours to eliminate this boxing for some common value types.

RyuJIT compiles a separate specialization of a generic method for each value-type T and treats typeof(T) as a constant at JIT time, so it can eliminate branches that cannot be reached, effectively creating type-specific implementations.
RyuJIT also recognises the sequence (T)(object)variable, where variable is already of type T, and removes the box/unbox pair. The cast through object is what makes the code acceptable to the C# compiler, while the JIT still generates efficient, non-boxing code. I have annotated this in the code so anyone who finds it will understand why it is doing strange-looking things. (sharplab demo)
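The two JIT behaviours combine into a pattern like the following minimal sketch. ValueHolder, StorageKind and their members are hypothetical simplifications of SqlBuffer and SqlDataReader.GetFieldValue, not the code in this PR. The results are correct whatever the JIT does; the boxing is only eliminated when RyuJIT compiles a value-type specialization and folds the typeof(T) checks as described above.

```csharp
using System;

// Hypothetical tag recording which field of the holder is live.
internal enum StorageKind { Empty, Int32, Int64, Guid }

internal sealed class ValueHolder
{
    private StorageKind _kind;
    private int _int32;
    private long _int64;
    private Guid _guid;

    internal void SetInt32(int value) { _kind = StorageKind.Int32; _int32 = value; }
    internal void SetInt64(long value) { _kind = StorageKind.Int64; _int64 = value; }
    internal void SetGuid(Guid value) { _kind = StorageKind.Guid; _guid = value; }

    // Old-style path: every value-type result is boxed into an object.
    internal object GetValueBoxed()
    {
        switch (_kind)
        {
            case StorageKind.Int32: return _int32;
            case StorageKind.Int64: return _int64;
            case StorageKind.Guid: return _guid;
            default: return DBNull.Value;
        }
    }

    internal T GetFieldValue<T>()
    {
        // typeof(T) == typeof(X) is a JIT-time constant in each value-type
        // specialization, so branches for other types can be dropped; the
        // _kind comparison is the runtime check that remains.
        if (typeof(T) == typeof(int) && _kind == StorageKind.Int32)
        {
            // (T)(object) satisfies the C# compiler; the JIT can elide the box/unbox.
            return (T)(object)_int32;
        }
        if (typeof(T) == typeof(long) && _kind == StorageKind.Int64)
        {
            return (T)(object)_int64;
        }
        if (typeof(T) == typeof(Guid) && _kind == StorageKind.Guid)
        {
            return (T)(object)_guid;
        }
        // Fallback: box, then cast to T. Asking for the wrong value type
        // (e.g. long when an int is stored) throws InvalidCastException.
        return (T)GetValueBoxed();
    }
}

internal static class ValueHolderDemo
{
    internal static void Main()
    {
        var holder = new ValueHolder();
        holder.SetInt32(42);
        Console.WriteLine(holder.GetFieldValue<int>()); // 42
    }
}
```

Keeping the original boxed path as the fallback means requests for any other type still behave exactly as before.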

The auto and manual tests pass in debug and release. The methods affected by these changes are covered by the existing DataStreamTest coverage and I didn't see a need to add more, but let me know if you see a gap.

Some performance measurements with my usual SqlBench project show an encouraging improvement. Each benchmark reads 1000 guids with a single data reader and repeats that often enough for the measured time to be representative. Both GetFieldValue<Guid> and GetGuid are tested:

Before:

| Method | Mean | Error | StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|
| SyncReadGetField | 381.1 ms | 0.6447 ms | 0.5715 ms | 30000.0000 | - | - | 92.41 MB |
| SyncReadGetGuid | 328.5 ms | 1.5653 ms | 1.3876 ms | 20000.0000 | - | - | 61.9 MB |

After:

| Method | Mean | Error | StdDev | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|
| SyncReadGetField | 107.0 ms | 2.067 ms | 2.123 ms | - | - | - | 882.81 KB |
| SyncReadGetGuid | 106.3 ms | 1.164 ms | 1.031 ms | - | - | - | 882.81 KB |

I have only benched guids because they were the target of my original investigation. The code changes cover all supported value types, so they should show the same behaviour (with memory savings scaled to the size of each type's box, of course; Guid is the largest of them).

/cc @afsanehr @saurabh500 @tarikulsabbir @David-Engel and @MarkPflug who brought this to my attention in https://github.com/dotnet/corefx/issues/31595#issuecomment-458789444

add ConstructGuid to TdsParser with netcore/netstd implementations add generic accessors for SqlBuffer storage fields and consume from SqlDataReader

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlDataReader.cs

Wraith2 · 2019-01-31T19:40:32Z

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlBuffer.cs

```diff
+        //   the jit will emit all the cast operations as written. this will put the value into a box and then attempt to
+        //   cast it, because it is an object even compatible casts will generate the desired InvalidCastException so users
+        //   cannot widen a short to an int preserving external expectations that requests for any type other than the correct
+        //   one will fail without conversion being considered
```


This comment explains the new generic methods with code which looks wrong.
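The behaviour that diff comment relies on can be seen in isolation in this small sketch, where ThroughObject is a hypothetical helper. Going through object means an unbox, and unboxing only succeeds for the exact boxed type, so a short is never silently widened to an int.

```csharp
using System;

internal static class CastThroughObject
{
    // Box the short, then unbox to T. Unboxing requires the exact value type,
    // so no numeric conversion (such as short -> int) is ever applied.
    internal static T ThroughObject<T>(short value) => (T)(object)value;

    internal static void Main()
    {
        Console.WriteLine(ThroughObject<short>(5)); // 5
        try
        {
            Console.WriteLine(ThroughObject<int>(5));
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException: a boxed short cannot be unboxed as int");
        }
    }
}
```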

MarkPflug · 2019-01-31T20:10:44Z

Another thing I would point out is the oddity of the SqlGuid type itself. I would have expected it to contain a Guid as a field, but instead it uses a byte array to hold the guid value (so an allocation on the heap). I think this should be changed to hold the guid value in value-type fields (like a Guid). I suspect it might have been implemented that way because of the odd-ball way SQL Server sorts guids: it uses a different byte order than .NET does for comparison. This probably gets even more complicated because Guid itself is endianness-aware, so there are two levels of byte reshuffling to take into account: one for endianness at the Guid layer, and one for SQL-style sorting at the SqlGuid layer. It looks like the change you've made will avoid that SqlGuid allocation. I suspect use of SqlGuid is pretty limited, but it is another place where an unnecessary heap allocation could be avoided.
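For context on the ordering MarkPflug mentions, here is a minimal sketch of SQL Server's uniqueidentifier ordering, assuming the byte order used by SqlGuid's comparison on the Guid.ToByteArray() layout: bytes 10 to 15 first, then 8-9, 6-7, 4-5 and finally 0-3. SqlGuidOrder and Compare are hypothetical names; the demo cross-checks the result against SqlGuid.CompareTo.

```csharp
using System;
using System.Data.SqlTypes;

internal static class SqlGuidOrder
{
    // Positions in the Guid.ToByteArray() layout, most significant first for
    // SQL Server ordering: last 6 bytes, then the 2-byte groups, then the first 4.
    private static readonly int[] s_order = { 10, 11, 12, 13, 14, 15, 8, 9, 6, 7, 4, 5, 0, 1, 2, 3 };

    internal static int Compare(Guid x, Guid y)
    {
        Span<byte> a = stackalloc byte[16];
        Span<byte> b = stackalloc byte[16];
        x.TryWriteBytes(a); // same layout as ToByteArray(), without allocating
        y.TryWriteBytes(b);
        foreach (int i in s_order)
        {
            if (a[i] != b[i])
            {
                return a[i] < b[i] ? -1 : 1;
            }
        }
        return 0;
    }

    internal static void Main()
    {
        var x = new Guid("00000000-0000-0000-0000-000000000001");
        var y = new Guid("01000000-0000-0000-0000-000000000000");
        Console.WriteLine(x.CompareTo(y) < 0);                           // True: Guid compares the first field first
        Console.WriteLine(Compare(x, y));                                // 1: SQL order starts at byte 10
        Console.WriteLine(new SqlGuid(x).CompareTo(new SqlGuid(y)) > 0); // True
    }
}
```

The same two guids sort in opposite orders under the two schemes, which is the reshuffling SqlGuid has to account for.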

Another potential place for improvement would be to read short length binary values (<= 16 bytes) into the Storage struct as well. Example: rowversion columns are pretty common and are returned in the same way as a binary(8) column, as a byte array. The value is small enough to be stored as a long (or a fixed length buffer) in the Storage struct as well.

Wraith2 · 2019-01-31T20:39:15Z

I don't see a way to put an array into a struct without having it as separate fields (in which case accessor perf would hurt) or using Unsafe which isn't appropriate in this library IMO. I thought rowversion could be nop converted to timestamp which can probably be made cheap if this set of changes passes review.

In general I assume that whoever designed and implemented the Sql* types had good reasons for what they did, and that I shouldn't assume they're wrong or misguided. I'm happy to change perf and work on internal improvements, but changes like reversing field orders are likely to give someone somewhere a really bad day, and I've been on the receiving end of that too many times to do so lightly. Also, I just made guids free and you want me to make them backwards now?! 😀

MarkPflug · 2019-01-31T21:50:49Z

This library already has the unsafe flag set; I'm not sure whether it is actually used. But yes, I was thinking of a fixed-length array, which would require unsafe, but the "unsafety" would all be encapsulated in the Storage struct. My experience with rowversion (which, to my understanding, timestamp is just a SQL Server-specific alias for) is that it can only be accessed via GetBytes(); GetInt64() throws. To me this implies that it lands in an array. Is that array being reused or allocated? I don't know.

I don't really care about SqlGuid, and I suspect very few people do. I was just commenting that I thought it was odd that it used an array instead of a Guid internally, and speculating why that might have been done.

Those numbers above are pretty impressive!

Wraith2 · 2019-01-31T22:54:33Z

If you can give me an example setup I can profile and investigate for sql server I can look into it and see if there's anything I can do to improve it. I'm not working to any particular plan I'm just spending my spare time improving things that I happen to find.

By Unsafe I mean the Unsafe type, which is used to do various IL-level trickery that languages don't allow but the runtime does. It's powerful and dangerous, hence the name. The use of the unsafe keyword in SqlClient is all to do with interop, and while I could mark the Storage struct as unsafe, I'd rather first investigate whether there's a cleaner, safe way to accomplish whatever is needed.

MarkPflug · 2019-01-31T23:17:54Z

Actually, it seems it should be possible with "safe" code. I just discovered these MemoryMarshal methods recently, I think they use the "Unsafe" class internally to do their work.

```csharp
using System;
using System.Runtime.InteropServices;

class Program
{
    static void Main()
    {
        // create a span "input" containing 16 bytes of data.
        Span<byte> input = stackalloc byte[16];
        Guid.NewGuid().TryWriteBytes(input);

        Byte16 f;
        // copy the bytes from "input" into "f"
        MemoryMarshal.TryRead(input, out f);

        Span<byte> output = stackalloc byte[16];
        // read the data out of "f" into "output"
        MemoryMarshal.TryWrite(output, ref f);
    }
}

[StructLayout(LayoutKind.Explicit, Size = 16)]
struct Byte16
{
    // totally opaque struct, just a fixed-sized block of memory
}
```

So, you could put a Byte16 inside the Storage and write the binary results to and from it with MemoryMarshal.TryRead/TryWrite. TryRead/TryWrite return false when the span is smaller than the struct, so each value size needs its own struct type; it might be necessary to handle specific sizes: 8 bytes and 16 bytes. Those seem likely to be the most common sizes for short fixed-length binary values. 8 covers RowVersion, which is probably the most common, and 16 … because the software I work on uses it extensively 😜.

MarkPflug · 2019-01-31T23:51:12Z

Never mind. There doesn't appear to be a problem with binary/rowversion. I thought I was seeing new arrays being allocated for the values stored behind the _object field, but I think I was misreading the code, or it got optimized away somehow. I profiled reading millions of binary(16) values and rowversions and it didn't seem to be allocating anything significant (according to the VS diagnostic tools). Here's the gist I was using: https://gist.github.com/MarkPflug/b46cdc748bdf71258a4812dd4d809858
It uses the sql localdb, creates a new empty database, fills a table with a million rows, then reads those rows back out in a loop.

I was basing my beliefs on my reading of the code, not actual diagnostics. Lesson learned.

Wraith2 · 2019-01-31T23:53:32Z

MemoryMarshal is netstandard2.1 or core only. This library has to be backwards compatible. The same applies to Unsafe and a lot of other useful new functions. If you look at my changes there are two implementations of ConstructGuid one for core which takes a span and another for all other runtimes which does things the old allocatey way.

I'm not sure the PR discussion is the right place to have a conversation about possible additions, that's more something an issue should be used for. Feel free to open one and tag me.

add datetime decimal and datetimeoffset support

Wraith2 · 2019-02-02T19:44:36Z

Added the date/time and decimal types, which completes the set of value types accepted by GetFieldValue (and by the async version, which calls through to it, according to the code comments).
I also made the type checks more stringent.

omariom · 2019-02-14T01:35:52Z

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlDataReader.cs

```diff
-            if (_typeofINullable.IsAssignableFrom(typeofT))
+            // this block of type specific shortcuts uses RyuJIT jit behaviours to achieve fast implementations of the primative types
+            // RyuJIT will be able to determine at compilation time that the typeof(T)==typeof(<primative>) options are constant
+            // and be able to remove all implementations which cannot be reached. this will eliminate non-specialized code for value types
```


@Wraith2
Have you checked the assumption?
dataType is known at runtime only, so JIT won't be able to do the optimization.

The typeof(T) == typeof(byte) should eliminate the branches; it will then leave behind dataType == typeof(byte)

I assume that's why the double checks with typeof(T) being first?

It does. I answered in the main thread rather than this branch, but sharplab confirms the elimination because one of the conditions is constantly false.

@Wraith2
Hmm.. yes, you are right.
Wouldn't it then be more efficient to compare directly with the StorageType enum rather than creating an instance of Type just for the comparison? Types are cached, but the cache is finite and you could save a call.

It does the storage type checks first and, if a direct match is found, returns the value as directly as possible; if there's a mismatch, it falls back to the original path of doing the type check. So yes, it's better to do what you said, and it does.

I believe since it's generic that typeof(T) is low cost because the jit can fold in the type handle directly not requiring a method call. The other part is a readonly static to avoid the lookup cost.

Wraith2 · 2019-02-14T08:57:39Z

Yes, see this sharplab code and note the differences in assembly between the Guid and Int32 versions. The check isn't identical since I can't do explicit layout in sharplab, but it proves the point: you can see that the int specialization route is present but the short and guid ones aren't.

The result of typeof(T) == typeof(<primitive>) is still constant, so even though the instance check is added, the combination of the two can never be true for most T's, and the branches that can never be true are removed. The one that can be true is kept, and the instance check is done as expected. The JIT is really very clever 😁

grant-d · 2019-02-19T00:22:45Z

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlDataReader.cs

```diff
+                                // If the value was actually null, then we should throw a SqlNullValue instead
+                                throw SQL.SqlNullValue();
+                            }
+                            else
```


nit: don't need the else

Agreed, it could just fall through with equivalent functionality, but it's less obvious imo, and this is the original layout so I stuck with it. Since it's dealing with exceptions, perf is not a concern, so I didn't see a need to remove it. Is there formatting guidance that applies to this case?

I think your point about existing code makes sense

grant-d · 2019-02-19T00:33:57Z

types are not primates

Ha. Good one

roji · 2019-02-27T17:46:04Z

Posting this a bit late, but maybe it can still be helpful: see here for how Npgsql constructs its Guids without any allocations. Rather than using a pooled byte array instance as in the current PR, it uses a union struct.

grant-d · 2019-02-27T17:58:16Z

See here for how ...

Heads-up that it may not be a good idea to link to code with different OSS licenses

roji · 2019-02-27T18:02:36Z

@grant-d thanks for the heads up, I really have no idea how that works... Npgsql is under the PostgreSQL license, which should be very compatible with MIT, but I'm not an expert...

grant-d · 2019-02-27T18:05:00Z

Me neither, so I'd rather err on the side of extreme caution.

Wraith2 · 2019-02-27T18:47:28Z

I'm going to assume it's the explicit struct overlay trick. If so, that's used elsewhere in the library and I'm not a fan. I know it works and the layout of Guid is unlikely to ever change, but I'd much prefer to rely on public API surface. I haven't degraded perf on netfx, and the span path on core gets equivalent perf to the overlay path.

roji · 2019-02-28T09:20:49Z

Fair enough, and in any case the netcoreapp version uses span directly, which is the best option.

karelz · 2019-03-04T05:36:34Z

@afsanehr @tarikulsabbir @Gary-Zh @David-Engel this is another "lost PR" - sorry for tagging you late. Can you please check it out and see what are next steps (given it flew under the radar for 1+ month)? Thanks!

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlBuffer.cs

AfsanehR-zz · 2019-03-14T17:39:31Z

@Wraith2 Could you update this pr with the latest from master please?

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlBuffer.cs

Wraith2 · 2019-03-14T19:50:21Z

Addressed feedback and updated to master.

AfsanehR-zz · 2019-03-18T17:18:35Z

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlDataReader.cs

```diff
+            {
+                return (T)(object)data.Decimal;
+            }
+            else if (typeof(T) == typeof(DateTimeOffset) && dataType == typeof(DateTimeOffset) && _typeSystem > SqlConnectionString.TypeSystem.SQLServer2005 && !metaData.IsNewKatmaiDateTimeType)
```


Why are we checking for !metaData.IsNewKatmaiDateTimeType here?

When I traced through how datetime was handled, I found that there is special casing for older type systems, and that it's checked using this flag. According to the code, Katmai date/time values are surfaced as strings when the connection uses the SQL Server 2005 type system, so to return a datetime you have to know whether the value can be returned directly or has to go through that reinterpretation.

In that line in particular, I took the logic used in GetSqlValueFromSqlBufferInternal and checked whether I really have a datetime that I can return easily. If I do, I return it; if not, I follow the old path to ensure compatibility.

Thanks. Now, what would happen if the SQL Server version is greater than 2005 and the data type is datetimeoffset?
Wouldn't metaData.IsNewKatmaiDateTimeType return true, so that !metaData.IsNewKatmaiDateTimeType ends up false?

You're right, I've inverted both parts when I only need to invert the version check. It should (and now does) check that the type system supports date/time/offset and that the column is one of those types. This would have forced dates down the compatibility path which wouldn't have been faster but also wouldn't be slower.
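The fix can be summarised as a boolean predicate. In this minimal sketch, TypeSystemSketch and the method names are hypothetical stand-ins for SqlConnectionString.TypeSystem and the checks in SqlDataReader; the compatibility condition is the one described for GetSqlValueFromSqlBufferInternal. Because a datetimeoffset column is always one of the Katmai types, the original condition could never be true for it, which is why dates were pushed down the slower compatibility path.

```csharp
using System;

// Hypothetical stand-in for the connection's type system version setting.
internal enum TypeSystemSketch { SqlServer2000 = 2000, SqlServer2005 = 2005, SqlServer2008 = 2008 }

internal static class KatmaiChecks
{
    // Compatibility path: with the 2005 type system, Katmai date/time columns
    // are surfaced as strings rather than as their native CLR types.
    internal static bool ReturnsAsString(TypeSystemSketch typeSystem, bool isKatmaiType) =>
        typeSystem <= TypeSystemSketch.SqlServer2005 && isKatmaiType;

    // Condition as first written in the diff: both operands inverted.
    internal static bool OriginalFastPath(TypeSystemSketch typeSystem, bool isKatmaiType) =>
        typeSystem > TypeSystemSketch.SqlServer2005 && !isKatmaiType;

    // Corrected condition: only the type system check is inverted.
    internal static bool FixedFastPath(TypeSystemSketch typeSystem, bool isKatmaiType) =>
        typeSystem > TypeSystemSketch.SqlServer2005 && isKatmaiType;

    internal static void Main()
    {
        // A datetimeoffset column is always a Katmai type, so isKatmaiType is true.
        Console.WriteLine(OriginalFastPath(TypeSystemSketch.SqlServer2008, true)); // False
        Console.WriteLine(FixedFastPath(TypeSystemSketch.SqlServer2008, true));    // True
        Console.WriteLine(FixedFastPath(TypeSystemSketch.SqlServer2005, true));    // False
    }
}
```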

Thanks. I will rerun the CI.

AfsanehR-zz · 2019-03-18T17:29:03Z

@Wraith2 There is one other question I had on the check for IsNewKatmaiDateTimeType. Otherwise, everything else LGTM. Once that's finalized, we should be good with merging. Tests also passed.

Ported changes from [PR 34999](dotnet/corefx#34999) : SqlClient remove primitive boxing in SqlDataReader.GetFieldValue Ported changes from [PR 35023](dotnet/corefx#35023) : Remove stale warning 420 pragmas

SqlClient remove primitive boxing in SqlDataReader.GetFieldValue Commit migrated from dotnet/corefx@5430d51

Wraith2 added 2 commits January 31, 2019 00:07

add Guid type to SqlBuffer

3db2b54

add ConstructGuid to TdsParser with netcore/netstd implementations add generic accessors for SqlBuffer storage fields and consume from SqlDataReader

clarify generic codegen explanation

cf668fa

Wraith2 commented Jan 31, 2019

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlDataReader.cs (outdated, resolved)

Wraith2 commented Jan 31, 2019

Wraith2 added 2 commits February 2, 2019 01:26

add more complete data type checking

dc6cac5

add datetime decimal and datetimeoffset support

Merge remote-tracking branch 'dotnet/master' into sqlperf-guidread

131cf60

Wraith2 changed the title from "SqlClient remove most primative boxing in SqlDataReader.Get*" to "SqlClient remove primative boxing in SqlDataReader.GetFieldValue" Feb 3, 2019

Wraith2 added 3 commits February 7, 2019 19:04

resolve GetFieldValueFromSqlBufferInternal conflict

6dae0ff

resolve GetFieldValueFromSqlBufferInternal conflict

4cc7175

Merge remote-tracking branch 'dotnet/master' into sqlperf-guidread

39211b2

omariom reviewed Feb 14, 2019

grant-d reviewed Feb 19, 2019

spelling, types are not primates

d74449f

Wraith2 changed the title from "SqlClient remove primative boxing in SqlDataReader.GetFieldValue" to "SqlClient remove primitive boxing in SqlDataReader.GetFieldValue" Feb 22, 2019

Merge remote-tracking branch 'dotnet/master' into sqlperf-guidread

a2c8b92

roji mentioned this pull request Feb 28, 2019

Remove unsafe code from UuidHandler npgsql/npgsql#2310

Merged

karelz added the area-System.Data.SqlClient label Mar 4, 2019

karelz assigned Wraith2, AfsanehR-zz, David-Engel, Gary-Zh and tarikulsabbir Mar 4, 2019

AfsanehR-zz reviewed Mar 14, 2019

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlBuffer.cs (outdated, resolved)

AfsanehR-zz reviewed Mar 14, 2019

src/System.Data.SqlClient/src/System/Data/SqlClient/SqlBuffer.cs (outdated, resolved)

AfsanehR-zz added this to the 3.0 milestone Mar 14, 2019

Wraith2 added 2 commits March 14, 2019 19:48

Merge remote-tracking branch 'dotnet/master' into sqlperf-guidread

a08768d

address formatting feedback

de24817

AfsanehR-zz reviewed Mar 18, 2019

fix KatmaiNewDateTimeType logic inversion

c9d2331

AfsanehR-zz approved these changes Mar 19, 2019

AfsanehR-zz merged commit 5430d51 into dotnet:master Mar 19, 2019

Wraith2 deleted the sqlperf-guidread branch March 19, 2019 19:52

Wraith2 mentioned this pull request Dec 21, 2019

SqlBulkCopy - generics to avoid boxing dotnet/SqlClient#358

Closed

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Merge pull request dotnet/corefx#34999 from Wraith2/sqlperf-guidread

fa84ab2

SqlClient remove primitive boxing in SqlDataReader.GetFieldValue Commit migrated from dotnet/corefx@5430d51
