This repository was archived by the owner on Nov 27, 2018. It is now read-only.

Link Generator perf improvements. #788

Merged: rynowak merged 1 commit into release/2.2 from rynowak/linkgenerator-perf on Sep 14, 2018
@rynowak rynowak commented Sep 7, 2018

Will be sending batches of small-ish improvements to LinkGenerator perf here. I'll be using LinkGenerationGithubBenchmark - this isn't a great benchmark in its current state, but it's useful enough since everything I'm doing so far is low-hanging fruit.

Note that since this benchmark is using TreeRouter as its baseline, anything we do that improves the TemplateBinder will also improve the LinkGenerator as this piece of infrastructure is shared.


ORIGINAL

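| Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TreeRouter | 2.158 us | 0.0136 us | 0.0121 us | 463,336.5 | 1.00 | 0.00 | 0.0114 | 1.14 KB |
| EndpointRouting | 4.115 us | 0.0383 us | 0.0340 us | 242,988.8 | 1.91 | 0.02 | 0.0229 | 2.16 KB |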

Add caching of TemplateBinder to LinkGenerator

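| Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TreeRouter | 2.231 us | 0.0443 us | 0.0885 us | 448,318.6 | 1.00 | 0.00 | 0.0114 | 1.14 KB |
| EndpointRouting | 3.583 us | 0.0086 us | 0.0062 us | 279,108.8 | 1.61 | 0.06 | 0.0191 | 1.77 KB |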

This is an important step, because most of the remaining optimizations rely on caching more data up front in TemplateBinder. Note that the older routing code paths already cache TemplateBinder instances.
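
As a rough illustration of the caching idea, here is a minimal sketch. It is not the code from this PR: `BuildBinder`, `GetBinder`, and keying the cache on the template string are all assumptions made for the example, and a plain `string[]` stands in for the real TemplateBinder.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for an expensive-to-build binder: here it is just the
// route template split into segments. The real TemplateBinder does far more.
int buildCount = 0;
string[] BuildBinder(string template)
{
    buildCount++;                      // count how often the expensive path runs
    return template.Split('/');
}

// Cache keyed by template text (an assumption for this sketch; a real cache
// would more likely key on the endpoint or route object).
var cache = new ConcurrentDictionary<string, string[]>();

string[] GetBinder(string template) => cache.GetOrAdd(template, BuildBinder);

var first = GetBinder("Home/Index/{id}");
var second = GetBinder("Home/Index/{id}");

Console.WriteLine(ReferenceEquals(first, second)); // both calls share one instance
Console.WriteLine(buildCount);                     // how many times the binder was built
```

Once the binder is reused across calls, any extra work done while building it is paid once rather than per link.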


Add a synthetic baseline

| Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 374.0 ns | 6.109 ns | 5.714 ns | 2,674,044.4 | 1.00 | 0.00 | 0.0010 | 112 B |
| TreeRouter | 2,280.8 ns | 45.307 ns | 50.359 ns | 438,445.5 | 6.10 | 0.16 | 0.0114 | 1168 B |
| EndpointRouting | 3,620.2 ns | 30.286 ns | 25.290 ns | 276,228.7 | 9.68 | 0.16 | 0.0153 | 1808 B |

Adding a baseline based on RVD (RouteValueDictionary) + string.Format. This helps with context, because it is similar to what someone would naively do if they hand-rolled link generation. As you can see, we're 6-10x slower (but with a lot more features). I doubt we'll beat this, but we should try to get within 2-3x.
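
For readers who want a picture of what such a hand-rolled baseline might look like, here is a sketch. It is an assumption, not the benchmark's actual code: a `Dictionary<string, object>` stands in for RouteValueDictionary so the example has no ASP.NET Core dependency, and `GenerateLink` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Dictionary<string, object> stands in for RouteValueDictionary in this sketch.
var values = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase)
{
    ["controller"] = "Home",
    ["action"] = "Index",
    ["id"] = 17,
};

// Hand-rolled link generation: read each value and format it into a fixed shape.
// No constraints, defaults, ambient values, or URL encoding are handled here.
string GenerateLink(Dictionary<string, object> v) =>
    string.Format("/{0}/{1}/{2}", v["controller"], v["action"], v["id"]);

var link = GenerateLink(values);
Console.WriteLine(link);
```

The missing features listed in the comment are roughly what the 6-10x gap pays for.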


Move required keys to constructor

| Method | Mean | Error | StdDev | Median | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 357.7 ns | 7.068 ns | 12.56 ns | 351.6 ns | 2,795,693.2 | 1.00 | 0.00 | 0.0010 | 112 B |
| TreeRouter | 2,230.5 ns | 43.497 ns | 44.67 ns | 2,218.8 ns | 448,329.7 | 6.24 | 0.24 | 0.0153 | 1168 B |
| EndpointRouting | 3,513.3 ns | 23.199 ns | 21.70 ns | 3,511.8 ns | 284,635.8 | 9.83 | 0.33 | 0.0191 | 1720 B |

The change here just looks like noise, but since this data is totally static, this is a change I wanted to make anyway.


Inline RouteValueDictionary.Enumerator.MoveNext()

Before

| Method | Mean | Error | StdDev | Op/s | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- |
| AddSingleItem | 25.674 ns | 0.6754 ns | 0.7507 ns | 38,949,914.8 | 0.0014 | 128 B |
| AddThreeItems | 47.450 ns | 0.9756 ns | 0.8648 ns | 21,075,037.0 | 0.0014 | 128 B |
| ForEachThreeItems_Array | 37.850 ns | 0.8199 ns | 1.1223 ns | 26,420,054.0 | - | 0 B |
| ForEachThreeItems_Properties | 70.011 ns | 0.6198 ns | 0.5494 ns | 14,283,368.7 | - | 0 B |
| GetThreeItems_Array | 35.798 ns | 0.5689 ns | 0.5321 ns | 27,934,391.6 | - | 0 B |
| GetThreeItems_Properties | 102.160 ns | 1.9926 ns | 2.0463 ns | 9,788,545.5 | - | 0 B |
| SetSingleItem | 26.321 ns | 0.5732 ns | 0.6371 ns | 37,992,393.4 | 0.0014 | 128 B |
| SetExistingItem | 8.059 ns | 0.2318 ns | 0.5686 ns | 124,089,448.2 | - | 0 B |
| SetThreeItems | 55.578 ns | 1.2162 ns | 1.4478 ns | 17,992,752.5 | 0.0013 | 128 B |
| TryGetValueThreeItems_Array | 42.096 ns | 0.6717 ns | 0.5955 ns | 23,755,317.1 | - | 0 B |
| TryGetValueThreeItems_Properties | 108.665 ns | 1.4041 ns | 1.3134 ns | 9,202,624.2 | - | 0 B |

After

| Method | Mean | Error | StdDev | Op/s | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- |
| AddSingleItem | 25.299 ns | 0.5758 ns | 0.6161 ns | 39,527,730.0 | 0.0014 | 128 B |
| AddThreeItems | 46.090 ns | 0.1929 ns | 0.1611 ns | 21,696,905.9 | 0.0014 | 128 B |
| ForEachThreeItems_Array | 28.433 ns | 0.6276 ns | 0.5870 ns | 35,170,335.6 | - | 0 B |
| ForEachThreeItems_Properties | 86.650 ns | 1.0807 ns | 1.0109 ns | 11,540,619.4 | - | 0 B |
| GetThreeItems_Array | 34.724 ns | 0.7203 ns | 0.6386 ns | 28,798,117.2 | - | 0 B |
| GetThreeItems_Properties | 105.932 ns | 1.6163 ns | 1.4328 ns | 9,439,976.9 | - | 0 B |
| SetSingleItem | 25.635 ns | 0.4527 ns | 0.3780 ns | 39,008,981.5 | 0.0015 | 128 B |
| SetExistingItem | 7.581 ns | 0.0414 ns | 0.0387 ns | 131,912,083.0 | - | 0 B |
| SetThreeItems | 53.223 ns | 0.3683 ns | 0.3075 ns | 18,788,914.3 | 0.0013 | 128 B |
| TryGetValueThreeItems_Array | 41.432 ns | 0.8117 ns | 0.6778 ns | 24,135,851.9 | - | 0 B |
| TryGetValueThreeItems_Properties | 112.407 ns | 1.2656 ns | 1.1838 ns | 8,896,226.5 | - | 0 B |

ThrowHelper for RVD keys

| Method | Mean | Error | StdDev | Op/s | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- |
| AddSingleItem | 26.249 ns | 0.6003 ns | 1.0029 ns | 38,096,944.7 | 0.0015 | 128 B |
| AddThreeItems | 47.206 ns | 1.0000 ns | 1.3003 ns | 21,183,860.8 | 0.0014 | 128 B |
| ForEachThreeItems_Array | 27.826 ns | 0.3924 ns | 0.3671 ns | 35,937,015.3 | - | 0 B |
| ForEachThreeItems_Properties | 83.445 ns | 0.5780 ns | 0.4179 ns | 11,983,932.1 | - | 0 B |
| GetThreeItems_Array | 37.869 ns | 0.1254 ns | 0.1111 ns | 26,407,170.4 | - | 0 B |
| GetThreeItems_Properties | 108.817 ns | 0.7426 ns | 0.5369 ns | 9,189,773.1 | - | 0 B |
| SetSingleItem | 26.168 ns | 0.5936 ns | 1.1148 ns | 38,215,340.8 | 0.0015 | 128 B |
| SetExistingItem | 7.518 ns | 0.1806 ns | 0.1690 ns | 133,021,088.3 | - | 0 B |
| SetThreeItems | 49.023 ns | 0.9677 ns | 0.9052 ns | 20,398,701.3 | 0.0014 | 128 B |
| TryGetValueThreeItems_Array | 39.660 ns | 0.6776 ns | 0.6338 ns | 25,214,193.4 | - | 0 B |
| TryGetValueThreeItems_Properties | 109.138 ns | 0.4628 ns | 0.3613 ns | 9,162,729.9 | - | 0 B |

There's a pretty good improvement getting/setting items directly. A lot of this seems like noise, but it's a good change to make.

@rynowak rynowak requested a review from JamesNK September 7, 2018 22:49
// The uncommon case is that the propertyStorage is in use
if (dictionary._propertyStorage == null && ((uint)_index < (uint)dictionary._count))
{
Current = _dictionary._arrayStorage[_index];

Reviewer (Member):
Should this use the local variable? dictionary._arrayStorage[_index];

If there is a reason not to then explain in a comment

rynowak (Member Author):
ooh spicy, yeah this is a mistake.

var dictionary = _dictionary;
if (dictionary._propertyStorage != null && ((uint)_index < (uint)dictionary._count))
{
var storage = _dictionary._propertyStorage;

Reviewer (Member):
Local variable?

rynowak (Member Author):
yeah, that's another one!

return true;
}

_index = _dictionary._count;

Reviewer (Member):
one more
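
A side note on the `(uint)_index < (uint)dictionary._count` comparison in the enumerator snippets above: it folds the two checks `index >= 0` and `index < count` into a single unsigned comparison. The sketch below shows the trick in isolation; `InRange` is a hypothetical helper, not something from the PR.

```csharp
using System;

// One unsigned comparison replaces `index >= 0 && index < count`:
// a negative int reinterpreted as uint becomes a value above int.MaxValue,
// so it can never be less than a non-negative count.
bool InRange(int index, int count) => unchecked((uint)index < (uint)count);

Console.WriteLine(InRange(0, 3));   // first valid index
Console.WriteLine(InRange(2, 3));   // last valid index
Console.WriteLine(InRange(3, 3));   // one past the end
Console.WriteLine(InRange(-1, 3));  // negative index
```

The trick only holds when `count` is itself non-negative, which is the case for a collection's item count.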


// If we're converting from properties, it's likely due to an 'add' to make sure we have at least
// the default amount of space.
capacity = Math.Max(DefaultCapacity, Math.Max(storage.Properties.Length, capacity));

rynowak (Member Author), Sep 8, 2018:
I found this issue while testing the new TryAdd method. We want to always have a minimum of 4. The properties -> array path could cause you to end up with an array of size 1, which would immediately get resized to 2, then next time you add... you get the idea.
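
To make the effect of the `Math.Max` line concrete, here is a small sketch. `InitialCapacity` is a hypothetical helper that mirrors the expression under review, and the value 4 for `DefaultCapacity` is taken from the comment above.

```csharp
using System;

const int DefaultCapacity = 4; // the minimum the comment above asks for

// Same shape as the line under review: never go below the default, and never
// below what the existing properties or the requested capacity need.
int InitialCapacity(int propertyCount, int requested) =>
    Math.Max(DefaultCapacity, Math.Max(propertyCount, requested));

Console.WriteLine(InitialCapacity(1, 0)); // one property, nothing requested
Console.WriteLine(InitialCapacity(1, 2)); // one property, room for one add
Console.WriteLine(InitialCapacity(6, 2)); // more properties than the default
```

With the floor in place, a dictionary converted from a single property starts with room for several adds instead of resizing on each one.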

rynowak commented Sep 8, 2018

Adding RVD.TryAdd

| Method | Mean | Error | StdDev | Op/s | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- |
| AddSingleItem | 27.157 ns | 0.6063 ns | 0.7884 ns | 36,822,777.6 | 0.0015 | 128 B |
| AddThreeItems | 52.742 ns | 1.0700 ns | 1.0988 ns | 18,960,350.6 | 0.0015 | 128 B |
| ConditionalAdd_ContainsKeyAdd | 44.228 ns | 0.9189 ns | 1.2881 ns | 22,610,045.1 | - | 0 B |
| ConditionalAdd_TryAdd | 31.000 ns | 0.3510 ns | 0.3111 ns | 32,257,925.3 | - | 0 B |
| ForEachThreeItems_Array | 29.312 ns | 0.1200 ns | 0.0937 ns | 34,115,287.1 | - | 0 B |
| ForEachThreeItems_Properties | 86.397 ns | 1.5924 ns | 1.4895 ns | 11,574,424.1 | - | 0 B |
| GetThreeItems_Array | 41.358 ns | 0.8785 ns | 1.2877 ns | 24,178,907.6 | - | 0 B |
| GetThreeItems_Properties | 116.323 ns | 2.3833 ns | 3.0989 ns | 8,596,739.5 | - | 0 B |
| SetSingleItem | 27.056 ns | 0.5854 ns | 0.6012 ns | 36,960,761.4 | 0.0014 | 128 B |
| SetExistingItem | 8.573 ns | 0.2228 ns | 0.1739 ns | 116,643,503.0 | - | 0 B |
| SetThreeItems | 50.656 ns | 1.1139 ns | 3.2138 ns | 19,740,867.8 | 0.0014 | 128 B |
| TryGetValueThreeItems_Array | 43.504 ns | 0.9272 ns | 1.8302 ns | 22,986,549.2 | - | 0 B |
| TryGetValueThreeItems_Properties | 107.692 ns | 4.1687 ns | 4.9625 ns | 9,285,738.1 | - | 0 B |

Notice that the new TryAdd is significantly faster than the old pattern.
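
The two patterns being compared look roughly like this. The sketch uses `Dictionary<string, object>.TryAdd` (available in .NET Core 2.0 and later) as a stand-in, on the assumption that the new RVD method has the same shape; it is not the benchmark code itself.

```csharp
using System;
using System.Collections.Generic;

var values = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase)
{
    ["controller"] = "Home",
};

// Old pattern: two lookups when the key is absent (ContainsKey, then Add).
if (!values.ContainsKey("action"))
{
    values.Add("action", "Index");
}

// TryAdd: one call; returns false and leaves the entry alone if the key exists.
bool addedId = values.TryAdd("id", 17);
bool addedController = values.TryAdd("controller", "Admin");

Console.WriteLine(addedId);
Console.WriteLine(addedController);
Console.WriteLine(values["controller"]);
```

Avoiding the second key search is the likely source of the gap between ConditionalAdd_ContainsKeyAdd and ConditionalAdd_TryAdd in the table above.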

rynowak commented Sep 8, 2018

Use registers for RVD array accesses

| Method | Mean | Error | StdDev | Median | Op/s | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AddSingleItem | 26.665 ns | 0.6040 ns | 1.1638 ns | 26.457 ns | 37,501,751.5 | 0.0014 | 128 B |
| AddThreeItems | 52.663 ns | 1.1245 ns | 1.3810 ns | 52.048 ns | 18,988,676.1 | 0.0014 | 128 B |
| ConditionalAdd_ContainsKeyAdd | 40.976 ns | 0.8786 ns | 1.8145 ns | 40.109 ns | 24,404,551.5 | - | 0 B |
| ConditionalAdd_TryAdd | 28.928 ns | 0.6439 ns | 1.0398 ns | 28.967 ns | 34,568,660.7 | - | 0 B |
| ForEachThreeItems_Array | 28.222 ns | 0.6237 ns | 1.1086 ns | 28.173 ns | 35,433,723.5 | - | 0 B |
| ForEachThreeItems_Properties | 86.009 ns | 1.7731 ns | 2.9133 ns | 84.871 ns | 11,626,688.6 | - | 0 B |
| GetThreeItems_Array | 39.507 ns | 0.5708 ns | 0.5339 ns | 39.641 ns | 25,311,751.7 | - | 0 B |
| GetThreeItems_Properties | 110.002 ns | 1.2838 ns | 1.1380 ns | 110.307 ns | 9,090,736.0 | - | 0 B |
| SetSingleItem | 26.825 ns | 0.7262 ns | 0.9184 ns | 26.603 ns | 37,278,651.0 | 0.0014 | 128 B |
| SetExistingItem | 8.049 ns | 0.2207 ns | 0.2453 ns | 7.948 ns | 124,244,539.7 | - | 0 B |
| SetThreeItems | 50.941 ns | 1.1584 ns | 1.9982 ns | 50.245 ns | 19,630,564.1 | 0.0014 | 128 B |
| TryGetValueThreeItems_Array | 39.181 ns | 0.4336 ns | 0.3136 ns | 39.250 ns | 25,522,890.6 | - | 0 B |
| TryGetValueThreeItems_Properties | 112.249 ns | 2.0146 ns | 1.8845 ns | 112.823 ns | 8,908,765.0 | - | 0 B |

rynowak commented Sep 8, 2018

Inlining for RVD TryGetValue

| Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 339.8 ns | 2.371 ns | 2.218 ns | 2,942,933.5 | 1.00 | 0.00 | 0.0010 | 112 B |
| TreeRouter | 2,125.0 ns | 8.415 ns | 6.085 ns | 470,594.7 | 6.25 | 0.04 | 0.0114 | 1168 B |
| EndpointRouting | 3,581.5 ns | 69.821 ns | 77.606 ns | 279,211.2 | 10.54 | 0.23 | 0.0153 | 1720 B |

This actually makes our baseline much faster.

_acceptedValues.Add(key, value);
}
#endif

Reviewer (Member):
nit: too liney

rynowak commented Sep 12, 2018

Rewrite of TemplateBinder.GetValues

| Method | Mean | Error | StdDev | Op/s | Scaled | ScaledSD | Gen 0 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 531.1 ns | 3.050 ns | 2.704 ns | 1,882,801.6 | 1.00 | 0.00 | 0.0010 | 112 B |
| TreeRouter | 2,558.8 ns | 13.973 ns | 12.387 ns | 390,806.9 | 4.82 | 0.03 | 0.0114 | 1136 B |
| EndpointRouting | 3,827.5 ns | 17.777 ns | 14.845 ns | 261,269.6 | 7.21 | 0.04 | 0.0153 | 1600 B |

TreeRouter is getting much faster with these changes too, since I'm improving the shared code.

rynowak commented Sep 12, 2018

@JamesNK - I plan to sort of keep iterating on this for the next day or so before merging any of this stuff. I'll probably fork the RVD changes into a separate PR since that's pretty isolated.

@rynowak rynowak force-pushed the rynowak/linkgenerator-perf branch from b308edc to c08cee2 Compare September 12, 2018 07:40
metadata: new[] { new RouteValuesAddressMetadata(routeName: null, new RouteValueDictionary(new { controller = "Home", action = "In?dex", })) });
"Home/Index/{id}",
defaults: new { controller = "Home", action = "Index", },
metadata: new[] { new RouteValuesAddressMetadata(routeName: null, new RouteValueDictionary(new { controller = "Home", action = "Index", })) });

rynowak (Member Author):
These tests were invalid as-written. It should not be possible to have a route value appear as both a parameter and a required key.

rynowak commented Sep 12, 2018

@JamesNK - OK I think I've shaken out most of the obvious stuff here. Could you take another look?

rynowak added a commit to aspnet/HttpAbstractions that referenced this pull request Sep 12, 2018
Porting changes from perf work in
aspnet/Routing#788

Includes porting/adding the RVD benchmarks, as well as a new TryAdd
method.

var canCopyParameterAmbientValues = true;
// Make a new copy of the slots array, we'll use this as 'scratch' space.
var slots = new KeyValuePair<string, object>[_slots.Length];

Reviewer (Member):
Could this be stackalloced?

Reviewer (Member):
Add a comment that it will be used to back a RVD

rynowak (Member Author):
You can't stackalloc this anyway since it contains reference types 😁

// At this point we've captured all of the 'known' route values, but we haven't
// handled any extra route values that were provided in 'values'. These all
// need to be included in the accepted values.
var acceptedValues = RouteValueDictionary.FromArray(slots);

Reviewer (Member):
Ah, I guess it can't be stackalloced since it is used to back the RVD
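
A short sketch of why the scratch array cannot be stack-allocated. It is an illustration only: `RuntimeHelpers.IsReferenceOrContainsReferences<T>()` is a real runtime API (.NET Core 2.0 and later), but the variable names and the array size are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// stackalloc requires an unmanaged element type, so this compiles...
Span<int> scratch = stackalloc int[4];
scratch[0] = 42;

// ...while `stackalloc KeyValuePair<string, object>[4]` would be a compile
// error, because the struct holds references the GC has to track.
bool kvpHasRefs = RuntimeHelpers.IsReferenceOrContainsReferences<KeyValuePair<string, object>>();
bool intHasRefs = RuntimeHelpers.IsReferenceOrContainsReferences<int>();

// A heap array is the fallback, and unlike stack memory it can outlive the
// method that created it, which matters if it later backs a dictionary.
var slots = new KeyValuePair<string, object>[4];
slots[0] = new KeyValuePair<string, object>("controller", "Home");

Console.WriteLine(kvpHasRefs);
Console.WriteLine(intHasRefs);
```

Both objections raised in the thread apply: the element type contains references, and the array has to survive as the backing store of the returned RVD.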

}
}
[DebuggerDisplay("explicit null")]
private class SentinullValue

Reviewer (Member):
I lolled IRL

Reviewer (Member):
There are comments where this is used but it would be good to have a sentence explaining what this is for here
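
The thread does not spell out what SentinullValue is for. Going only by the name and the `[DebuggerDisplay("explicit null")]` attribute, one plausible reading is a sentinel object that marks "the caller explicitly passed null" so it can be told apart from "no value supplied". The sketch below illustrates that general pattern under that assumption; `explicitNull`, `Supply`, and `Describe` are hypothetical names, not code from the PR.

```csharp
using System;

// Hypothetical sentinel: a unique object that means "the caller passed null",
// so a null slot can keep meaning "nothing was supplied".
var explicitNull = new object();
var slots = new object[2];            // slot 0: "id", slot 1: "page"

// Store the sentinel in place of a null value.
void Supply(int slot, object value) => slots[slot] = value ?? explicitNull;

// Distinguish the three states a slot can be in.
string Describe(int slot) =>
    slots[slot] == null ? "missing"
    : ReferenceEquals(slots[slot], explicitNull) ? "explicit null"
    : slots[slot].ToString();

Supply(0, null);                      // caller explicitly passed id = null

Console.WriteLine(Describe(0));
Console.WriteLine(Describe(1));
```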

rynowak added a commit to aspnet/HttpAbstractions that referenced this pull request Sep 13, 2018
Porting changes from perf work in
aspnet/Routing#788

Includes porting/adding the RVD benchmarks, as well as a new TryAdd
method.
@rynowak rynowak force-pushed the rynowak/linkgenerator-perf branch from 0726ccd to 88455cf Compare September 14, 2018 02:02
@rynowak rynowak merged commit 426a48a into release/2.2 Sep 14, 2018
@rynowak rynowak deleted the rynowak/linkgenerator-perf branch September 14, 2018 02:11