Read method without locals #65
Comments
It's weird - the distinction between object reference and managed pointer is of little value. An object reference is a managed pointer to a method table. The math should work; only the verifier could complain. But, .NET itself uses Calculating the array data offset and storing it in a However, we could change
On my current noisy machine, where lots of stuff is running, this gives a very significant throughput improvement for current master in Need to check I will send a PR for that. |
Brr, these throughput numbers mean very little with different batching on a noisy machine. Need more precise measurement and some extra work 😄 |
Yes, I know about that, but the spec is pretty explicit about it:
Though the current code is storing an |
You should calculate this using an array instead of a regular object ( |
@ltrzesniewski |
Exactly. Don't we want the offset between the first element and the method table, thus skipping the length slot? |
We can calculate it, but we cannot make it a JIT constant in an easy way. So now we have on x64: |
Oh, ok, I see 👍
But that's exactly what |
By using Unsafe and not |
Oh, ok, sorry, I misunderstood what you were saying earlier 👍 |
Also this comment about managed pointers to zero: dotnet/coreclr#20386 So I'm confused. |
The current implementation is optimal for x-plat. For .NET Core it works even with simple |
I suppose the reason for having both But I'm very interested in the answer to your linked question. 🙂 |
I've compared the Read method with the Unsafe implementation I used to use before. It's faster than Unsafe, but I found cases in my code where perf dropped by 9%. Without locals the perf improved as expected by 5% vs original or 14% vs the version with locals. For the Disruptor benchmark the performance is identical. My theory is that it's basically the same stuff, but with too many IL locals the JIT sometimes gives up on optimizing, or something of that kind. This particular benchmark was always very sensitive to locals even in the inlined method it does not directly go through.