Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid Box<TKey> and number dereference in dict. implementations... #23

Merged
merged 5 commits into from
Jul 8, 2023

Conversation

neon-sunset
Copy link
Contributor

@neon-sunset neon-sunset commented Jul 1, 2023

By the time we reach _keyComparer.Equals(key.Value, entryKey.Value) in TryClaimSlotForPut/Copy where entryKey is ref Box<TKey>, the reference might have been changed, leading to catching null reference exception or observing wrong value. Solve this by passing initially dereferenced entryKeyValue instead, as was probably intended in the first place. Fix this for number primitive implementations too.

Additionally:

  • Speed up tests by using Parallel.For over Parallel.ForEach(enumerable)
  • Slow down tests back by increasing concurrent iterations count for reliability, reduce code duplication
  • Fix CLI builds by removing IsTrimmable, dotnet sdk seems to have gotten unhappy about it when targeting NS2.0 or 2.1
  • Do not cache typeof(TValue).IsValueType because it is a JIT constant
  • Remove TC off from builds, it has never really been an issue with default configuration and JIT can now do either OSR or classic TC when there is no loop, enable DynamicPGO too

Fixes neon-sunset/fast-cache#52

- By the time we reach _comparer(key.Value, entryKey.Value) where entryKey is `ref Box<TKey>`, the reference might have been changed, leading to catching null reference exception or obvserving wrong value. Solve this by de-referencing `key` exactly once within method body. Also solve this for number primitive implementations.
- Speed up tests by using Parallel.For over Parallel.ForEach(enumerable)
- Slow down tests back by increasing concurrent iterations count for reliability, reduce code duplication
- Fix CLI builds by removing IsTrimmable, dotnet sdk seems to have gotten unhappy about it when targeting NS2.0 or 2.1
- Do not cache typeof(TValue).IsValueType because it is a JIT constant
- Remove TC off from builds, it has never really been an issue with default configuration and JIT can now do either OSR or classic TC when there is no loop, enable DynamicPGO too
@neon-sunset
Copy link
Contributor Author

With all previous comments resolved, is there anything else I need to address?

@VSadov
Copy link
Owner

VSadov commented Jul 8, 2023

I am getting somewhat weird results with dynamic PGO enabled.

without PGO I see roughly

======== TryGet NonBlocking object->string 1M Ops/sec:
54.74 109.6 153.1 219.7 274.8 319.4 375.3 426.2 481.0 532.8 584.7 634.4 674.1 720.6 762.9 752.2 761.7 777.2 778.8 791.8 798.6 807.6 813.6 809.4 . . .

with PGO I see

======== TryGet NonBlocking object->string 1M Ops/sec:
38.65 35.29 25.69 22.49 21.86 20.64 20.78 20.05 641.2 715.4 777.6 833.0 898.6 948.4 968.4 999.3 1016 1038 1050 1067 1074 1084 1077 1086 1081

I.E. It looks like we get a huge improvement in the throughput from the PGO, but it takes tens of seconds and millions of hashtable Get operations before the benefits kick in. Before that happens the perf is pretty poor.

I am inclined to keep PGO enabled in the benchmark project, since it results in a better perf and thus should be the desired mode for which we optimize. Besides it is just the default, which is easy to change and is often changed during perf investigations anyways, but I wonder if what I see is a normal behavior.

CC: @EgorBo - is it normal for PGO effects to be so delayed?

@VSadov
Copy link
Owner

VSadov commented Jul 8, 2023

With all previous comments resolved, is there anything else I need to address?

@neon-sunset I do not see anything else that needs changing.
Let me run a few tests/benchmarks, to be sure all is ok, and I will merge if nothing too surprising comes up.

@EgorBo
Copy link

EgorBo commented Jul 8, 2023

CC: @EgorBo - is it normal for PGO effects to be so delayed?

Can you try again with DOTNET_TC_CallCountingDelayMs=0?

@VSadov
Copy link
Owner

VSadov commented Jul 8, 2023

Can you try again with DOTNET_TC_CallCountingDelayMs=0?

does not seem to have any effect

C:\Program Files\Microsoft Visual Studio\2022\Preview>C:\NonBlocking\NonBlocking\BenchmarksCore\bin\Release\net7.0\BenchmarksCore.exe
Timer: High Resolution
======== TryGet NonBlocking object->string 1M Ops/sec:
39.13 31.35 21.25 19.81 19.79 19.37 18.32 18.13 636.9 708.0 766.0 839.0 895.3 ^C
C:\Program Files\Microsoft Visual Studio\2022\Preview>
C:\Program Files\Microsoft Visual Studio\2022\Preview>
C:\Program Files\Microsoft Visual Studio\2022\Preview>
C:\Program Files\Microsoft Visual Studio\2022\Preview>set DOTNET_TC_CallCountingDelayMs=0

C:\Program Files\Microsoft Visual Studio\2022\Preview>C:\NonBlocking\NonBlocking\BenchmarksCore\bin\Release\net7.0\BenchmarksCore.exe
Timer: High Resolution
======== TryGet NonBlocking object->string 1M Ops/sec:
39.80 31.26 22.84 20.78 19.52 18.84 18.08 17.51 656.6 703.4 793.1 ^C
C:\Program Files\Microsoft Visual Studio\2022\Preview>

- Don't cache typeof(T).IsValueType in a duplicate FromObjectValue either
- Simplify throw helper code
@EgorBo
Copy link

EgorBo commented Jul 8, 2023

Can you try again with DOTNET_TC_CallCountingDelayMs=0?

does not seem to have any effect

Ouch, but I see it's net7, right? We've landed many improvements for startup in net8 - can you try with at least net8.0 preview 5 (or later) ?

@VSadov
Copy link
Owner

VSadov commented Jul 8, 2023

can you try with at least net8.0 preview 5 (or later) ?

Right. 8.0 does not have the issue!

And it seems the overall performance is a bit better too.
(these numbers are the throughput for different number of threads and the higher the better)

C:\Program Files\Microsoft Visual Studio\2022\Preview>C:\NonBlocking\NonBlocking\BenchmarksCore\bin\Release\net7.0\BenchmarksCore.exe
Timer: High Resolution
======== TryGet NonBlocking object->string 1M Ops/sec:
40.86 32.82 21.85 20.42 20.21 19.41 19.27 18.99 670.8 738.8 818.4 883.7 920.5 970.4 988.5 1002 1022 1041 1044 1056 1066 ^C
C:\Program Files\Microsoft Visual Studio\2022\Preview>
C:\Program Files\Microsoft Visual Studio\2022\Preview>C:\NonBlocking\NonBlocking\BenchmarksCore\bin\Release\net8.0\BenchmarksCore.exe
Timer: High Resolution
======== TryGet NonBlocking object->string 1M Ops/sec:
59.96 124.0 179.0 222.0 291.9 351.1 411.2 464.6 737.8 815.8 897.6 975.6 1045 1077 1126 1124 1122 1130 1151 1170 1177 1184 ^C

Copy link
Owner

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!!

@VSadov VSadov merged commit 5e8f686 into VSadov:main Jul 8, 2023
@neon-sunset neon-sunset deleted the fix-invalid-deref-and-other branch July 8, 2023 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NonBlocking] NullReferenceException
3 participants