Cached type shallow copy availability #1586
f115b09 to 6baf0ce
static readonly Dictionary<RuntimeTypeHandle, string> typeNameCache = new Dictionary<RuntimeTypeHandle, string>();
static readonly Dictionary<RuntimeTypeHandle, string> typeKeyStringCache = new Dictionary<RuntimeTypeHandle, string>();
static readonly Dictionary<RuntimeTypeHandle, byte[]> typeKeyCache = new Dictionary<RuntimeTypeHandle, byte[]>();

static TypeUtilities()
{
    shallowCopyableValueTypes[typeof(Decimal).TypeHandle] = true;
TypeHandle uses the underlying system type in its GetHashCode() method.
Sorry, I don't understand this comment. Do you mean that using typeof() instead of typeof().TypeHandle as a key here is more efficient? I don't understand why.
Yes, calling GetHashCode() on RuntimeTypeHandle is ~10% slower than on the Type, and the cost of accessing the TypeHandle property adds an additional 20%.
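To make the comparison above concrete, here is a minimal sketch of the two keying options. The names TypeKeyExample, LookupByHandle and LookupByType are hypothetical and not part of the PR. The sketch only shows where the extra TypeHandle property access happens on each lookup and makes no timing claims of its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not code from the PR.
static class TypeKeyExample
{
    // Keyed by RuntimeTypeHandle: each lookup first reads Type.TypeHandle,
    // then hashes the handle.
    static readonly Dictionary<RuntimeTypeHandle, bool> byHandle =
        new Dictionary<RuntimeTypeHandle, bool>();

    // Keyed by Type: the Type instance is hashed directly.
    static readonly Dictionary<Type, bool> byType =
        new Dictionary<Type, bool>();

    static TypeKeyExample()
    {
        byHandle[typeof(decimal).TypeHandle] = true;
        byType[typeof(decimal)] = true;
    }

    public static bool LookupByHandle(Type t)
    {
        bool result;
        return byHandle.TryGetValue(t.TypeHandle, out result) && result;
    }

    public static bool LookupByType(Type t)
    {
        bool result;
        return byType.TryGetValue(t, out result) && result;
    }
}
```

Both lookups return the same answer for the same type; the difference discussed in the review is only the per-call cost.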
I like it. The one thing left is to try going back to a regular dictionary instead of the concurrent dictionary: keep everything you did (cache primitives dynamically), but use a regular dictionary with a lock. Alternatively, in addition to what you did with the order, we can have a smarter two-dictionary approach: a fast one read without a lock and a second one under a lock, periodically moving all items from the second one to the first.
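The two-dictionary idea suggested above could look roughly like the following sketch. It is an assumption about what was meant, not code from the PR: TwoLevelCache, its promoteThreshold parameter and the copy-on-promote strategy are all hypothetical. The fast dictionary is never mutated after it is published, which is what makes the lock-free read safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a "fast without lock + slow under lock" cache.
sealed class TwoLevelCache<TKey, TValue>
{
    // Read without a lock; replaced wholesale, never mutated after publication.
    private volatile Dictionary<TKey, TValue> fast = new Dictionary<TKey, TValue>();
    // Receives new entries; only touched under the lock.
    private readonly Dictionary<TKey, TValue> slow = new Dictionary<TKey, TValue>();
    private readonly object sync = new object();
    private readonly int promoteThreshold;

    public TwoLevelCache(int promoteThreshold = 16)
    {
        this.promoteThreshold = promoteThreshold;
    }

    public int FastCount { get { return fast.Count; } }

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        TValue value;
        if (fast.TryGetValue(key, out value)) return value; // lock-free hit

        lock (sync)
        {
            // The key may have been promoted while we waited for the lock.
            if (fast.TryGetValue(key, out value)) return value;

            if (!slow.TryGetValue(key, out value))
            {
                value = factory(key);
                slow[key] = value;
            }

            if (slow.Count >= promoteThreshold)
            {
                // Build a new snapshot and publish it in one reference write.
                var merged = new Dictionary<TKey, TValue>(fast);
                foreach (var pair in slow) merged[pair.Key] = pair.Value;
                slow.Clear();
                fast = merged;
            }
            return value;
        }
    }
}
```

The trade-off in this sketch is that promotion copies the whole fast dictionary, so it only pays off when the set of keys stabilizes quickly, as a set of cached types typically would.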
I've already tried simple dictionary implementations with locks and with
Really? You tried exactly the same code as in this PR, with a concurrent dictionary and with a regular dictionary with a lock, and the regular dictionary with a lock was 2-3 times slower in single-threaded execution?! Interesting. OK.
Yes, the regular dictionary with locks was 2 times slower in single-threaded execution than the concurrent one. Perhaps because of the lock on each read, while the concurrent one implements almost lock-free reads. As for difference between simple dictionary without locks and
multithreaded without locks is incorrect, so makes no sense, right?
If the dictionary content doesn't change, it can be accessed concurrently without locks.
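The two variants being compared in this exchange can be sketched as follows. This is a hypothetical reduction, not the benchmarked code: LockedCache and ConcurrentCache are illustrative names. The first takes a lock on every call, including pure reads; the second relies on ConcurrentDictionary, whose reads are documented as lock-free.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical: regular dictionary, every access (even a read) takes the lock.
static class LockedCache
{
    static readonly Dictionary<Type, bool> cache = new Dictionary<Type, bool>();
    static readonly object sync = new object();

    public static bool GetOrAdd(Type t, Func<Type, bool> compute)
    {
        lock (sync)
        {
            bool value;
            if (!cache.TryGetValue(t, out value))
            {
                value = compute(t);
                cache[t] = value;
            }
            return value;
        }
    }
}

// Hypothetical: concurrent dictionary, reads do not take a lock.
static class ConcurrentCache
{
    static readonly ConcurrentDictionary<Type, bool> cache =
        new ConcurrentDictionary<Type, bool>();

    public static bool GetOrAdd(Type t, Func<Type, bool> compute)
    {
        // Under contention the delegate may run more than once for the same key,
        // so it should be side-effect free.
        return cache.GetOrAdd(t, compute);
    }
}
```

A dictionary that is fully populated up front and never written afterwards needs neither mechanism, which is the point made in the last comment above.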
OK, this change looks legit to me. You would still want to do a perf check before merging, @sergeybykov, to be sure.
Also, the price of the
16 ms for a million operations seems very, very low to me. I wonder if optimizing that is even worth our time.
I like it. Although any reason to have this new one use I would expect to have consistency among them, unless there's a specific reason not to. I do agree that having immutable dictionaries (not really using
8583f1a to 45d47c4
Updated to have consistency among cache collections.
I'm running perf tests to confirm that there is no perf regression here. Looking at the code I can't imagine there would be.
Interestingly enough, I see no perf impact on one of the ping tests, but a significant increase in throughput in the other one. PingLoadTest_LocalReentrant: 58,984 vs. 58,416 last night. Maybe @gabikliot has an idea how this is possible. I'm dumbfounded so far.
@sergeybykov I'd suggest rerunning those tests, but if you believe in their reliability then I don't have an explanation either.
@dVakulen I will rerun them. However, the numbers on these tests have been very stable. I looked several days back. That's why I'm puzzled by the sudden jump. It's a good thing, of course. Just uncomfortable that I don't understand how this is possible.
@sergeybykov Which builds do these two results correspond to?
@sergeybykov, not sure how big the jump is, but I was expecting an increase in throughput. We were locking: even without contention, locking is very costly. Nevertheless, having many cores, I would expect this to be contended normally for reading, depending on the scenario.
NightlyLoadTest: 175,165 vs. 166,106 last night - ~5% gain. @jdom I'm not as much surprised by the gain as by the difference between PingLoadTest_LocalReentrant and PingLoadTest_RandomReentrant_MultiSilos.
ActivationCollectorStressTest: 205,393 vs. 189,404 = ~8% gain. I think I'll merge it now because there is clearly no perf regression, and will rerun the tests against master.
@dVakulen Obviously, big thanks! IIRC this is the highest perf gain for a single PR!
@sergeybykov Glad to hear.
I think I understand the diff between PingLoadTest_LocalReentrant and The other question is why this improvement was so big. I would speculate that it is not so much due to ConcurrentDic vs Dictionary plus lock, but more due to direct check in the dictionary vs. checking all the if I remember correctly, the PingLoadTest send Great job @dVakulen !
@gabikliot Makes sense.
Basically, the takeaway I think is that operations on
Addresses #1305,
Checking whether a type can be shallow-copied takes a considerable amount of CPU:
Benchmark, 3 000 000 repeats:
Benchmark source code:
https://gist.github.com/dVakulen/f34be5ba5024d008d3d2#file-orleansshallowcopyablebenchmark
Expected performance improvement: 3x for the best case (multithreaded checking of a primitive), 67x for the worst one (checking of a class with ImmutableAttribute). Note that for lightweight classes making a full copy may actually be faster than checking for the existence of the ImmutableAttribute.
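As a rough illustration of the kind of caching described here, the sketch below memoizes a per-type "is shallow-copyable" answer so that the reflection-based attribute check runs at most once per type in the common case. It is not the PR's implementation: ShallowCopyCache, the stand-in ImmutableAttribute defined locally, the sample Point class and the simplified rules in Compute are all assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in attribute so the example is self-contained (assumption, not the Orleans type).
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct)]
sealed class ImmutableAttribute : Attribute { }

// Hypothetical sample type marked as immutable.
[Immutable]
sealed class Point
{
    public readonly int X;
    public readonly int Y;
    public Point(int x, int y) { X = x; Y = y; }
}

static class ShallowCopyCache
{
    static readonly ConcurrentDictionary<Type, bool> cache =
        new ConcurrentDictionary<Type, bool>();

    public static bool IsShallowCopyable(Type t)
    {
        // After the first call for a type, this is a plain dictionary read.
        return cache.GetOrAdd(t, Compute);
    }

    // Simplified rules for illustration only.
    static bool Compute(Type t)
    {
        if (t.IsPrimitive || t.IsEnum || t == typeof(string) || t == typeof(decimal))
            return true;
        // The expensive part: a reflection lookup for the attribute.
        return t.IsDefined(typeof(ImmutableAttribute), false);
    }
}
```

With this shape, the worst case named above (a class carrying ImmutableAttribute) pays for the reflection call only on the first check.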