
Heavy GC thrashing in TdsParser #1864

Closed
masonwheeler opened this issue Dec 7, 2022 · 16 comments · Fixed by #1866

Comments

@masonwheeler

masonwheeler commented Dec 7, 2022

Describe the bug

While running and processing a large (multi-GB) query, the debugger shows the GC running almost constantly. After reducing allocations as much as possible in my own code, I ran the process through JetBrains dotMemory, which reported that it was allocating almost 5 GB of char[] buffers in TdsParser.TryReadPlpUnicodeChars. The stack trace shows that this is called while reading data from a result set, apparently to fill a character buffer and turn it into a string.

Would it be possible to cache and reuse the char buffer rather than (I assume; the relevant code doesn't appear to be in this repo) allocating a new buffer for each TryReadPlpUnicodeChars invocation?
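
Something along the lines of the rent/return pattern from System.Buffers is what I have in mind. The sketch below is purely illustrative; the helper name and parameters are made up rather than the actual parser internals:

```csharp
using System;
using System.Buffers;

static class PlpReadSketch
{
    // Hypothetical helper illustrating the rent/return pattern: the buffer comes from
    // the shared pool instead of a fresh char[] per call, and goes back to the pool
    // once the string has been materialized.
    public static string ReadPlpChars(Func<char[], int, int> readInto, int charsNeeded)
    {
        char[] buffer = ArrayPool<char>.Shared.Rent(charsNeeded);
        try
        {
            int read = readInto(buffer, charsNeeded); // fill the rented buffer
            return new string(buffer, 0, read);       // copy out only what was read
        }
        finally
        {
            ArrayPool<char>.Shared.Return(buffer);    // reuse instead of leaving it to the GC
        }
    }
}
```

The idea is simply that the buffer only lives for the duration of one read and then goes back to the pool instead of becoming garbage.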

To reproduce

using Microsoft.Data.SqlClient;

// Reads every row of a large table one row at a time; memory use should stay flat.
void Process(SqlConnection conn)
{
	using var cmd = conn.CreateCommand();
	cmd.CommandText = "select * from dbo.Posts";
	using var reader = cmd.ExecuteReader();
	var buffer = new object[reader.FieldCount];
	while (reader.Read())
	{
		reader.GetValues(buffer);
	}
}

Expected behavior

As we're processing one row at a time, memory usage should remain fairly low.

Observed behavior

Constant GC pressure. Profiling shows the vast majority of it comes from char buffer allocations as described above.

Further technical details

Microsoft.Data.SqlClient version: nuget 4.1.0
.NET target: .NET 7
SQL Server version: SQL Server 15.0.2095.3
Operating system: Windows 10

lcheunglci added this to Needs triage in SqlClient Triage Board via automation Dec 7, 2022
@lcheunglci
Contributor

Thanks for bringing this to our attention. We'll get back to you soon.

@Wraith2
Contributor

Wraith2 commented Dec 8, 2022

I'm not sure a 10 GiB compressed dataset counts as a "minimal" reproduction 😁

@masonwheeler
Author

Fair enough. This is the testing dataset I was using when I ran across this problem; I used it here because it clearly shows that this is a big, serious problem when working with large amounts of data. The thing is, that can't easily be demonstrated without actually working with a large amount of data, so it's admittedly somewhat at odds with the notion of a simple, minimal reproduction.

Honestly, any dataset that contains a lot of VARCHARs would show the issue.

@Wraith2
Contributor

Wraith2 commented Dec 8, 2022

I'm downloading the sample dataset to try it; it just makes my connection cry a little bit. I'll see if it's an issue I know about and whether I can do anything about it at the moment.

The internals of the library are from the .NET 1.0 era in places, and updating them to use more modern memory management patterns can be complicated because of the tendency to just pass data and buffers back to the user.

@masonwheeler
Author

updating them to use more modern memory management patterns can be complicated because of the tendency to just pass data and buffers back to the user.

Fair enough, though I don't think that will be a problem in this specific case, because we're dealing with a char[] that appears to get used as an intermediate buffer while reading strings.

@Wraith2
Contributor

Wraith2 commented Dec 8, 2022

In this case you're right that it does. However, the second use of the function that is doing the allocations is inside the SqlDataReader implementation, and that one keeps hold of the char buffer, so handling a rented buffer has to be done at the call site. Fortunately there are only two uses of the function, so that's not hard to do and I don't have to track down how the array is passed around inside the reader.

I can remove the intermediate allocations, old:
image

new:
image

Over a rough, manually timed 20 seconds.

An interesting result of this is that it gets slower. Not a lot slower, but over the entire lifetime of that query it goes from 2:55 to 3:10 on my hardware, and I seem to be memory and disk bandwidth limited. I suspect we're trading time in the extremely optimized GC against time spent in user code managing the array rental.

So for memory the change would be a win; for perf, a slight regression. Since this library always has to co-exist with higher-level callers, my opinion is that I'd make the change, because it makes us a better co-operative part of the larger process.
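
The trade-off can be seen in isolation with a rough micro-measurement of the two costs involved (allocation left to the GC versus inline pool management). This is only an illustrative sketch, not the benchmark behind the timings above:

```csharp
using System;
using System.Buffers;
using System.Diagnostics;

static class RentVsAllocSketch
{
    // Rough comparison of the two costs being traded: a fresh char[] per iteration
    // (the GC does the work later) versus renting and returning from the shared pool
    // (the cost is paid inline). Numbers from a loop like this are only indicative.
    public static void Compare(int size, int iterations)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var buffer = new char[size];
            buffer[0] = 'x';
        }
        Console.WriteLine($"new char[]: {sw.Elapsed}, allocated {GC.GetAllocatedBytesForCurrentThread() - before} bytes");

        before = GC.GetAllocatedBytesForCurrentThread();
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            char[] buffer = ArrayPool<char>.Shared.Rent(size);
            buffer[0] = 'x';
            ArrayPool<char>.Shared.Return(buffer);
        }
        Console.WriteLine($"ArrayPool:  {sw.Elapsed}, allocated {GC.GetAllocatedBytesForCurrentThread() - before} bytes");
    }
}
```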

@masonwheeler
Author

Interesting. Can you re-run the profiling against the code I suggested, which should take somewhere in the neighborhood of 5 minutes if your hardware is similar to mine, to see if the same effect occurs on a big run? The whole reason I posted this was that I was seeing constant GC thrashing and figured this would be a way to improve performance. If that turns out not to be the case then forget about it.

@Wraith2
Contributor

Wraith2 commented Dec 8, 2022

I was profiling your example. I just let it run for 20 seconds because there's no point in gathering more data: I could explain every allocation, and the ones we were looking at are the intermediate char[] buffers, which you can see move from being GC-handled to being kept by the shared ArrayPool.

The times were for a full run of the entire example you gave, so it looks like I'm running slightly faster hardware, which means slower hardware would see a larger slowdown. Any realistic application would not simply be spinning reading data in this way, so as I said, I think being a better co-operative library in this case likely outweighs the fairly small overall speed change.

It's also worth noting that this will be sped up by my existing PR #1544, so with both changes it might be a win in all scenarios.

@masonwheeler
Author

The times were for a full run of the entire example you gave

All right. I'm pretty sure the dotMemory screenshot you showed wasn't; it should have shown at least 2 GB of char[] buffers. That's why I thought this wasn't a full run.

@Wraith2
Contributor

Wraith2 commented Dec 9, 2022

The dotMemory screens were for 20-second samples. They are there to show that the highest allocation can be removed, which will have a good impact on GC frequency. I didn't judge there to be any benefit to going longer; it's clear that you're right and that I can make an improvement to that behaviour.

The times I gave, 3:10 for the new version and 2:55 for the current version, were for running the example query to completion. I had expected the new approach to be faster, so I was a little surprised when it wasn't, though I can understand why.

I'll open PR for this and see what the MS team think.

@masonwheeler
Author

Ah, I see what you mean now. Thanks.

@masonwheeler
Author

masonwheeler commented Dec 9, 2022

I must say, looking at your PR, I'm a little bit aggravated. I said at the start of this issue that "the relevant code doesn't appear to be in this repo." This is because I'd run a GitHub search for TryReadPlpUnicodeChars on the repo and it returned 0 code results.

But there it is, right there in the file you edit. GitHub, what in the world is wrong with you?!?

lcheunglci moved this from Needs triage to Under Investigation in SqlClient Triage Board Dec 14, 2022
SqlClient Triage Board automation moved this from Under Investigation to Closed Feb 1, 2023
@Havunen

Havunen commented Aug 11, 2023

We started experiencing this same issue in our application.

The column in the database is of type NVARCHAR(max) and contains JSON data of approximately 1.2 MB. Fetching that row from the database 5 times causes about 1.6 GB of memory pressure in the application code.

Here is a screenshot of the memory pressure from this library when fetching JSON data from our MS SQL Server.

Microsoft.Data.SqlClient v5.1.1
image

Microsoft.Data.SqlClient v5.2.0-preview3.23201.1
image

The situation is better in v5.2.0, but in my opinion it's still not good enough. Is there anything else that can be done here to reduce the memory allocations?
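
One consumer-side option that may reduce the pressure for very large NVARCHAR(MAX) values, independent of any fix inside the library, is to stream the column with CommandBehavior.SequentialAccess and GetTextReader instead of materializing the whole string. A minimal sketch, with made-up table and column names:

```csharp
using System.Buffers;
using System.Data;
using Microsoft.Data.SqlClient;

static class StreamLargeValueSketch
{
    // Streams a large NVARCHAR(MAX) column in fixed-size chunks through GetTextReader
    // instead of asking the reader to build the whole string. The table and column
    // names are illustrative only.
    public static void ReadPayloads(SqlConnection conn)
    {
        using var cmd = conn.CreateCommand();
        cmd.CommandText = "select Payload from dbo.Documents";
        // SequentialAccess makes column data available as a stream rather than a buffered row.
        using var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
        char[] chunk = ArrayPool<char>.Shared.Rent(8192);
        try
        {
            while (reader.Read())
            {
                if (reader.IsDBNull(0))
                    continue;
                using var text = reader.GetTextReader(0);
                int read;
                while ((read = text.Read(chunk, 0, chunk.Length)) > 0)
                {
                    // Hand the chunk to a streaming JSON parser, a writer, etc.
                }
            }
        }
        finally
        {
            ArrayPool<char>.Shared.Return(chunk);
        }
    }
}
```

Whether this helps depends on how the value is consumed; it avoids building the full string on the application side but still reads all the data, in chunks.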

@Havunen

Havunen commented Aug 11, 2023

As you can see from the screenshots, anything else the application does is irrelevant from a memory-usage perspective, due to the high number of allocations in System.Buffers.TlsOverPerCoreLockedStacksArrayPool.Rent.

> System.Char[]
  Objects : n/a
  Bytes   : 1751536296

>99,9%  Rent • 1,63 GB / 1,63 GB • System.Buffers.TlsOverPerCoreLockedStacksArrayPool<T>.Rent(Int32)
  >99,9%  TryReadPlpUnicodeChars • 1,63 GB / - • Microsoft.Data.SqlClient.TdsParser.TryReadPlpUnicodeChars(Char[], Int32, Int32, TdsParserStateObject, Int32, Boolean, Boolean)
    >99,9%  TryReadSqlStringValue • 1,63 GB / - • Microsoft.Data.SqlClient.TdsParser.TryReadSqlStringValue(SqlBuffer, Byte, Int32, Encoding, Boolean, TdsParserStateObject)
      >99,9%  TryReadSqlValue • 1,63 GB / - • Microsoft.Data.SqlClient.TdsParser.TryReadSqlValue(SqlBuffer, SqlMetaDataPriv, Int32, TdsParserStateObject, SqlCommandColumnEncryptionSetting, String, SqlCommand)
        >99,9%  TryReadColumnInternal • 1,63 GB / - • Microsoft.Data.SqlClient.SqlDataReader.TryReadColumnInternal(Int32, Boolean, Boolean)
          >99,9%  ReadAsyncExecute • 1,63 GB / - • Microsoft.Data.SqlClient.SqlDataReader.ReadAsyncExecute(Task, Object)
            >99,9%  ContinueAsyncCall • 1,63 GB / - • Microsoft.Data.SqlClient.SqlDataReader.ContinueAsyncCall<T>(Task, SqlDataReader+SqlDataReaderBaseAsyncCallContext<T>)
              >99,9%  InnerInvoke • 1,63 GB / - • System.Threading.Tasks.ContinuationResultTaskFromResultTask<TAntecedentResult, TResult>.InnerInvoke()
                >99,9%  <.cctor>b__272_0 • 1,63 GB / - • System.Threading.Tasks.Task+<>c.<.cctor>b__272_0(Object)
                  >99,9%  RunFromThreadPoolDispatchLoop • 1,63 GB / - • System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object)
                    >99,9%  ExecuteWithThreadLocal • 1,63 GB / - • System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task, Thread)
                      >99,9%  ExecuteEntryUnsafe • 1,63 GB / - • System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread)
                        >99,9%  ExecuteFromThreadPool • 1,63 GB / - • System.Threading.Tasks.Task.ExecuteFromThreadPool(Thread)
                          >99,9%  Dispatch • 1,63 GB / - • System.Threading.ThreadPoolWorkQueue.Dispatch()
                            >99,9%  WorkerThreadStart • 1,63 GB / - • System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
                              >99,9%  StartCallback • 1,63 GB / - • System.Threading.Thread.StartCallback()
                                >99,9%  [AllThreadsRoot] • 1,63 GB / -
  <0,01%  Grow • 104,4 KB / - • System.Text.ValueStringBuilder.Grow(Int32)

@Wraith2
Contributor

Wraith2 commented Aug 11, 2023

Odd. Can you open a new issue rather than replying to this closed one, please?

@Havunen

Havunen commented Aug 11, 2023

OK, I opened a new issue: #2120. It also contains a link to a sample console application where the issue can be reproduced.
