
Heavy GC thrashing in TdsParser #1864

Closed
masonwheeler opened this issue Dec 7, 2022 · 16 comments · Fixed by #1866

Comments

@masonwheeler

masonwheeler commented Dec 7, 2022

Describe the bug

While running and processing a large (multi-GB) query, the debugger shows the GC running almost constantly. After reducing allocations as much as possible in my own code, I ran the process through JetBrains dotMemory, which reported that it was allocating almost 5 GB of char[] buffers in TdsParser.TryReadPlpUnicodeChars. The stack trace shows that this is called while reading data from a result set, apparently to fill a character buffer and turn it into a string.

Would it be possible to cache and reuse the char buffer rather than (I assume; the relevant code doesn't appear to be in this repo) allocating a new buffer for each TryReadPlpUnicodeChars invocation?
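
Something along the lines of the rent/return pattern from System.Buffers is what I have in mind. The sketch below is purely illustrative; the helper name and parameters are made up rather than the actual parser internals:

```csharp
using System;
using System.Buffers;

static class PlpReadSketch
{
    // Hypothetical helper illustrating the rent/return pattern: the buffer comes from
    // the shared pool instead of a fresh char[] per call, and goes back to the pool
    // once the string has been materialized.
    public static string ReadPlpChars(Func<char[], int, int> readInto, int charsNeeded)
    {
        char[] buffer = ArrayPool<char>.Shared.Rent(charsNeeded);
        try
        {
            int read = readInto(buffer, charsNeeded); // fill the rented buffer
            return new string(buffer, 0, read);       // copy out only what was read
        }
        finally
        {
            ArrayPool<char>.Shared.Return(buffer);    // reuse instead of leaving it to the GC
        }
    }
}
```

The idea is simply that the buffer only lives for the duration of one read and then goes back to the pool instead of becoming garbage.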

To reproduce

using Microsoft.Data.SqlClient;

// Reads every row of a large table one row at a time; memory use should stay flat.
void Process(SqlConnection conn)
{
	using var cmd = conn.CreateCommand();
	cmd.CommandText = "select * from dbo.Posts";
	using var reader = cmd.ExecuteReader();
	var buffer = new object[reader.FieldCount];
	while (reader.Read())
	{
		reader.GetValues(buffer);
	}
}

Expected behavior

As we're processing one row at a time, memory usage should remain fairly low.

Observed behavior

Constant GC pressure. Profiling shows the vast majority of it comes from char buffer allocations as described above.

Further technical details

Microsoft.Data.SqlClient version: nuget 4.1.0
.NET target: .NET 7
SQL Server version: SQL Server 15.0.2095.3
Operating system: Windows 10

lcheunglci added this to Needs triage in SqlClient Triage Board via automation Dec 7, 2022
@lcheunglci
Contributor

Thanks for bringing this to our attention. We'll get back to you soon.

@Wraith2
Contributor

Wraith2 commented Dec 8, 2022

I'm not sure a 10 GiB compressed dataset counts as a "minimal" reproduction 😁

@masonwheeler
Author

Fair enough. This is the testing dataset I was using when I ran across this problem; I used it here because it clearly shows that this is a big, serious problem when working with large amounts of data. The thing is, that can't easily be demonstrated without actually working with a large amount of data, so it's admittedly somewhat at odds with the notion of a simple, minimal reproduction.

Honestly, any dataset that contains a lot of VARCHARs would show the issue.

@Wraith2
Contributor

Wraith2 commented Dec 8, 2022

I'm downloading the sample dataset to try it; it just makes my connection cry a little bit. I'll see if it's an issue I know about and whether I can do anything about it at the moment.

The internals of the library are from the .NET 1.0 era in places, and updating them to use more modern memory management patterns can be complicated because of the tendency to just pass data and buffers back to the user.

@masonwheeler
Author

updating them to use more modern memory management patterns can be complicated because of the tendency to just pass data and buffers back to the user.

Fair enough, though I don't think that will be a problem in this specific case, because we're dealing with a char[] that appears to get used as an intermediate buffer while reading strings.

@Wraith2
Contributor

Wraith2 commented Dec 8, 2022

In this case you're right that it does. However, the second use of the function that is doing the allocations is inside the SqlDataReader implementation, and that one keeps hold of the char buffer, so handling a rented buffer has to be done at the call site. Fortunately there are only two uses of the function, so that's not hard to do and I don't have to track down how the array is passed around inside the reader.

I can remove the intermediate allocations, old:
image

new:
image

Over a rough, manually timed 20 seconds.

An interesting result of this is that it gets slower. Not a lot slower, but over the entire lifetime of that query it goes from 2:55 to 3:10 on my hardware, and I seem to be memory and disk bandwidth limited. I suspect we're trading time in the extremely optimized GC against time spent in user code managing the array rental.

So for memory the change would be a win; for perf, a slight regression. Since this library always has to co-exist with higher-level callers, my opinion is that I'd make the change, because it makes us a better co-operative part of the larger process.
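
The trade-off can be seen in isolation with a rough micro-measurement of the two costs involved (allocation left to the GC versus inline pool management). This is only an illustrative sketch, not the benchmark behind the timings above:

```csharp
using System;
using System.Buffers;
using System.Diagnostics;

static class RentVsAllocSketch
{
    // Rough comparison of the two costs being traded: a fresh char[] per iteration
    // (the GC does the work later) versus renting and returning from the shared pool
    // (the cost is paid inline). Numbers from a loop like this are only indicative.
    public static void Compare(int size, int iterations)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var buffer = new char[size];
            buffer[0] = 'x';
        }
        Console.WriteLine($"new char[]: {sw.Elapsed}, allocated {GC.GetAllocatedBytesForCurrentThread() - before} bytes");

        before = GC.GetAllocatedBytesForCurrentThread();
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            char[] buffer = ArrayPool<char>.Shared.Rent(size);
            buffer[0] = 'x';
            ArrayPool<char>.Shared.Return(buffer);
        }
        Console.WriteLine($"ArrayPool:  {sw.Elapsed}, allocated {GC.GetAllocatedBytesForCurrentThread() - before} bytes");
    }
}
```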

@masonwheeler
Author

Interesting. Can you re-run the profiling against the code I suggested, which should take somewhere in the neighborhood of 5 minutes if your hardware is similar to mine, to see if the same effect occurs on a big run? The whole reason I posted this was that I was seeing constant GC thrashing and figured this would be a way to improve performance. If that turns out not to be the case then forget about it.

@Wraith2
Contributor

Wraith2 commented Dec 8, 2022

I was profiling your example. I just let it run for 20 seconds because there's no point in gathering more data: I could explain every allocation, and the ones we were looking at are the intermediate char[] buffers, which you can see move from being GC-handled to being kept by the shared ArrayPool.

The times were for a full run of the entire example you gave, so it looks like I'm running slightly faster hardware, which means slower hardware would see a larger slowdown. Any realistic application would not simply be spinning reading data in this way, so as I said, I think being a better co-operative library in this case likely outweighs the fairly small overall speed change.

It's also worth noting that this will be sped up by my existing PR #1544, so with both changes it might be a win in all scenarios.

@masonwheeler
Author

The times were for a full run of the entire example you gave

All right. I'm pretty sure the dotMemory screenshot you showed wasn't; it should have shown at least 2 GB of char[] buffers. That's why I thought this wasn't a full run.

@Wraith2
Contributor

Wraith2 commented Dec 9, 2022

The dotMemory screens were for 20-second samples. They are there to show that the highest allocation can be removed, which will have a good impact on GC frequency. I didn't judge there to be any benefit to going longer; it's clear that you're right and that I can make an improvement to that behaviour.

The times I gave, 3:10 for the new version and 2:55 for the current version, were for running the example query to completion. I had expected the new approach to be faster, so I was a little surprised when it wasn't, though I can understand why.

I'll open PR for this and see what the MS team think.

@masonwheeler
Author

Ah, I see what you mean now. Thanks.

@masonwheeler
Author

masonwheeler commented Dec 9, 2022

I must say, looking at your PR, I'm a little bit aggravated. I said at the start of this issue that "the relevant code doesn't appear to be in this repo." This is because I'd run a GitHub search for TryReadPlpUnicodeChars on the repo and it returned 0 code results.

But there it is, right there in the file you edit. GitHub, what in the world is wrong with you?!?

lcheunglci moved this from Needs triage to Under Investigation in SqlClient Triage Board Dec 14, 2022
SqlClient Triage Board automation moved this from Under Investigation to Closed Feb 1, 2023
@Havunen

Havunen commented Aug 11, 2023

We started experiencing this same issue in our application.

The column in the database is of type NVARCHAR(max) and contains JSON data of approximately 1.2 MB. Fetching that row from the database 5 times causes about 1.6 GB of memory pressure in the application code.

Here is a screenshot of the memory pressure from this library when fetching JSON data from our MS SQL Server.

Microsoft.Data.SqlClient v5.1.1
image

Microsoft.Data.SqlClient v5.2.0-preview3.23201.1
image

The situation is better in v5.2.0, but in my opinion it's still not good enough. Is there anything else that can be done here to reduce the memory allocations?
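
One consumer-side option that may reduce the pressure for very large NVARCHAR(MAX) values, independent of any fix inside the library, is to stream the column with CommandBehavior.SequentialAccess and GetTextReader instead of materializing the whole string. A minimal sketch, with made-up table and column names:

```csharp
using System.Buffers;
using System.Data;
using Microsoft.Data.SqlClient;

static class StreamLargeValueSketch
{
    // Streams a large NVARCHAR(MAX) column in fixed-size chunks through GetTextReader
    // instead of asking the reader to build the whole string. The table and column
    // names are illustrative only.
    public static void ReadPayloads(SqlConnection conn)
    {
        using var cmd = conn.CreateCommand();
        cmd.CommandText = "select Payload from dbo.Documents";
        // SequentialAccess makes column data available as a stream rather than a buffered row.
        using var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
        char[] chunk = ArrayPool<char>.Shared.Rent(8192);
        try
        {
            while (reader.Read())
            {
                if (reader.IsDBNull(0))
                    continue;
                using var text = reader.GetTextReader(0);
                int read;
                while ((read = text.Read(chunk, 0, chunk.Length)) > 0)
                {
                    // Hand the chunk to a streaming JSON parser, a writer, etc.
                }
            }
        }
        finally
        {
            ArrayPool<char>.Shared.Return(chunk);
        }
    }
}
```

Whether this helps depends on how the value is consumed; it avoids building the full string on the application side but still reads all the data, in chunks.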

@Havunen

Havunen commented Aug 11, 2023

As you can see from the screenshots, anything else the application does is irrelevant from a memory-usage perspective, due to the high number of allocations in System.Buffers.TlsOverPerCoreLockedStacksArrayPool.Rent.

> System.Char[]
  Objects : n/a
  Bytes   : 1751536296

>99,9%  Rent • 1,63 GB / 1,63 GB • System.Buffers.TlsOverPerCoreLockedStacksArrayPool<T>.Rent(Int32)
  >99,9%  TryReadPlpUnicodeChars • 1,63 GB / - • Microsoft.Data.SqlClient.TdsParser.TryReadPlpUnicodeChars(Char[], Int32, Int32, TdsParserStateObject, Int32, Boolean, Boolean)
    >99,9%  TryReadSqlStringValue • 1,63 GB / - • Microsoft.Data.SqlClient.TdsParser.TryReadSqlStringValue(SqlBuffer, Byte, Int32, Encoding, Boolean, TdsParserStateObject)
      >99,9%  TryReadSqlValue • 1,63 GB / - • Microsoft.Data.SqlClient.TdsParser.TryReadSqlValue(SqlBuffer, SqlMetaDataPriv, Int32, TdsParserStateObject, SqlCommandColumnEncryptionSetting, String, SqlCommand)
        >99,9%  TryReadColumnInternal • 1,63 GB / - • Microsoft.Data.SqlClient.SqlDataReader.TryReadColumnInternal(Int32, Boolean, Boolean)
          >99,9%  ReadAsyncExecute • 1,63 GB / - • Microsoft.Data.SqlClient.SqlDataReader.ReadAsyncExecute(Task, Object)
            >99,9%  ContinueAsyncCall • 1,63 GB / - • Microsoft.Data.SqlClient.SqlDataReader.ContinueAsyncCall<T>(Task, SqlDataReader+SqlDataReaderBaseAsyncCallContext<T>)
              >99,9%  InnerInvoke • 1,63 GB / - • System.Threading.Tasks.ContinuationResultTaskFromResultTask<TAntecedentResult, TResult>.InnerInvoke()
                >99,9%  <.cctor>b__272_0 • 1,63 GB / - • System.Threading.Tasks.Task+<>c.<.cctor>b__272_0(Object)
                  >99,9%  RunFromThreadPoolDispatchLoop • 1,63 GB / - • System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object)
                    >99,9%  ExecuteWithThreadLocal • 1,63 GB / - • System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task, Thread)
                      >99,9%  ExecuteEntryUnsafe • 1,63 GB / - • System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread)
                        >99,9%  ExecuteFromThreadPool • 1,63 GB / - • System.Threading.Tasks.Task.ExecuteFromThreadPool(Thread)
                          >99,9%  Dispatch • 1,63 GB / - • System.Threading.ThreadPoolWorkQueue.Dispatch()
                            >99,9%  WorkerThreadStart • 1,63 GB / - • System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
                              >99,9%  StartCallback • 1,63 GB / - • System.Threading.Thread.StartCallback()
                                >99,9%  [AllThreadsRoot] • 1,63 GB / -
  <0,01%  Grow • 104,4 KB / - • System.Text.ValueStringBuilder.Grow(Int32)

@Wraith2
Contributor

Wraith2 commented Aug 11, 2023

Odd. Can you open a new issue rather than replying to this closed one, please?

@Havunen

Havunen commented Aug 11, 2023

OK, I opened a new issue: #2120. It also contains a link to a sample console application where the issue can be reproduced.
