Perf followup for Pinned Object Heap #34215
Conversation
```cpp
// skip buckets that cannot possibly fit "size" and return the next one
// there is always such bucket since the last one fits everything
unsigned int first_suitable_bucket(size_t size)
```
> unsigned int first_suitable_bucket(size_t size)
have you actually observed any perf difference doing this? I doubt it.
On a microbenchmark doing a lot of POH allocations this becomes noticeable. I saw about 10% improvement from these changes.
The hot spots that I saw: taking the msl lock, iterating through buckets, and splitting a free chunk in two (when the chunk is big, make_unused_array may cause cache misses).
The others are hard to avoid or improve though. This one was fairly easy.
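For illustration, the bucket-skipping idea can be sketched as below. This is an assumption-laden sketch, not the actual gc.cpp change: the names `NUM_BUCKETS`, `BASE_BITS`, and `highest_set_bit` are made up for the example, and the mapping simply derives the first candidate bucket from the highest set bit of the requested size instead of scanning buckets linearly.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch only - names and constants (NUM_BUCKETS, BASE_BITS,
// highest_set_bit) are assumptions for the example, not the actual gc.cpp code.
static const unsigned NUM_BUCKETS = 7;   // like NUM_LOH_ALIST
static const unsigned BASE_BITS   = 15;  // highest bit of (64*1024 - 1);
                                         // bucket 0 holds sizes below 64*1024

// zero-based index of the highest set bit, as BitScanReverse would report it
static unsigned highest_set_bit(size_t v)
{
    unsigned idx = 0;
    while (v >>= 1)
        idx++;
    return idx;
}

// map a size directly to the first bucket that could possibly fit it,
// instead of iterating over buckets one by one
static unsigned first_suitable_bucket(size_t size)
{
    unsigned hb = highest_set_bit(size);
    if (hb <= BASE_BITS)
        return 0;                          // size < 64*1024
    unsigned bucket = hb - BASE_BITS;      // one bucket per extra power of two
    // the last bucket fits everything
    return bucket < NUM_BUCKETS ? bucket : NUM_BUCKETS - 1;
}
```

With this shape the per-allocation cost is a single bit scan rather than a loop whose length grows with the number of buckets.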
can you share your benchmark? what sort of perf testing was done on this in general? are you running the default perf scenarios with the infra? this new scenario should be added to gcperfsim.
In reply to: 399581040
An example:

```csharp
private static void AllocateUninitializedPinned()
{
    System.Console.Write("AllocateUninitializedPinned: ");
    var r = new Random(1234);
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < iters; i++)
    {
        int size = r.Next(2, 10000);
        var arr = GC.AllocateUninitializedArray<long>(size, pinned: true);
        Add(arr); // no-op when not retaining, store in a static list when we do
    }
    sw.Stop();
    System.Console.WriteLine(sw.ElapsedMilliseconds);
}
```
I had a variety of these - with not retaining arrays, retaining until after the iteration is over, or retaining randomly in an array, like:
```csharp
private static void Add(long[] arr)
{
    if (r.Next(100) > 50)
        staticArray[arr.Length & 2047] = arr;
}
```
The not retaining anything scenario was the most sensitive to the cost of the allocation codepath.
I ran the standard infra with these changes, but I mostly see noise. If there are improvements, they are too small in the scenarios measured.
Yes, we should add POH scenarios to infra. At least for regression testing.
We may need to settle on what scenario to look at - how big the allocations are, how much is retained, etc.
If we track scenarios for LOH, POH scenarios could be very similar, except for size.
LOH and POH are not necessarily that similar - LOH is compacted by default for hardlimit and POH can't be compacted; LOH has in general much larger object sizes. let's sync up next week on what scenarios we want to test.
Yes. For regression testing we need specific scenarios.
The tweaks I did here were just for something that stood out from looking at the code and VTune.
src/coreclr/src/gc/gc.cpp (Outdated)

```cpp
{
    return (Interlocked::CompareExchange(&spin_lock->lock, 0, -1) < 0);
}

inline
static void enter_spin_lock (GCSpinLock* spin_lock)
```
what's the justification of doing this? also I would not do this with this PR.
We take locks very frequently and they are nearly always uncontended. The codepath that handles the contention is very large. It seems wasteful to inline that into callers when the code is not expected to run.
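A minimal sketch of the fast-path/slow-path split being described, using `std::atomic` instead of the runtime's `Interlocked` wrapper. The names, the `noinline` attribute (GCC/Clang syntax), and the structure are illustrative assumptions, not the actual `GCSpinLock` code; the free state of `-1` and held state of `0` follow the `CompareExchange(&spin_lock->lock, 0, -1)` call in the diff above.

```cpp
#include <atomic>
#include <cassert>

// Illustrative spinlock sketch (not the real GCSpinLock):
// -1 means free, 0 means held.
struct SpinLock
{
    std::atomic<int> lock{-1};
};

// Contended path: large and cold, deliberately kept out of line so it is
// not inlined into every caller.
__attribute__((noinline))
static void enter_spin_lock_slow(SpinLock* sl)
{
    int expected;
    do
    {
        expected = -1;
        // the real code would spin, yield, and back off here
    } while (!sl->lock.compare_exchange_weak(expected, 0));
}

// Uncontended fast path: a single compare-exchange, taken nearly always,
// cheap enough to inline everywhere.
inline void enter_spin_lock(SpinLock* sl)
{
    int expected = -1;
    if (!sl->lock.compare_exchange_strong(expected, 0))
        enter_spin_lock_slow(sl);
}

inline void leave_spin_lock(SpinLock* sl)
{
    sl->lock.store(-1);
}
```

The point of the split is exactly what the comment says: callers inline only one CAS, while the rarely-executed contention handling stays in a single out-of-line copy.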
I tried to reduce the cost of taking these locks in most common scenarios. These locks are particularly expensive in POH allocations.
@Maoni0 - I think I have addressed all the PR concerns. Please take a look. Thanks!
src/coreclr/src/gc/gcpriv.h (Outdated)

```
@@ -3982,26 +4012,25 @@ class gc_heap
#endif //SYNCHRONIZATION_STATS

#define NUM_LOH_ALIST (7)
#define BASE_LOH_ALIST (64*1024)
// bucket 0 contains sizes less than 64*1024
#define BASE_LOH_ALIST_BITS (14)
```
this should be 15
Right!!! Because it is the highest bit in 64*1024 - 1 (zero-based, as in BitScanReverse). I'll add a comment as well.
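The arithmetic can be checked directly. `highest_bit` below is an illustrative stand-in for BitScanReverse (not runtime code): the zero-based highest set bit of 64*1024 - 1 is 15, while 64*1024 itself already has bit 16 set - hence the constant should be 15, not 14.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for BitScanReverse: zero-based index of the
// highest set bit of a nonzero value.
static unsigned highest_bit(uint32_t v)
{
    unsigned i = 0;
    while (v >>= 1)
        i++;
    return i;
}
```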
the rest LGTM!
LGTM!
A follow up for items tracked by #13739
Verified, adjusted or otherwise addressed the following items:
Since POH covers the whole possible range of object sizes, we need a few more buckets than in SOH or LOH - to cover the combined SOH and LOH range.
POH has behaviors similar to LOH here - for example, it is not very likely that a GC will reduce the generation size (since there is no compaction and objects are relatively long-lived).
LOH tuning seems to be reasonable enough.
I did not see obvious reasons for the above to be different from LOH.
It is possible we could be ok with a lower initial/minimum segment size, but that seems to make very little difference, since this is not about committed size. We commit in smaller chunks (2 OS pages) anyway.
Same as above.
Also looked in debugger and VTune for any obvious perf issues.
Apart from a few minor tweaks, it all looked as expected.