Span<T> wrapped in a struct isn't performing as fast as it could be #68797
Tagging subscribers to this area: @dotnet/area-system-memory

Issue Details

Brief intro: I'm developing a binary serialization library with two major requirements:

For requirement #1, I'm heavily using `Span`. For requirement #2, I'm trying to prevent users from accidentally passing the wrong `Span` to a method.

The problem: I noticed that a regular `Span<byte>` is performing 100% better than a `Span` wrapped in an otherwise-empty struct in some scenarios (many, small, chained method calls).

The method using `Span<byte>`:

```csharp
public MethodHost_Span WriteInt32_Span(ref Span<byte> span, int value)
{
    MemoryMarshal.Cast<byte, int>(span)[0] = value;
    span = span.Slice(sizeof(int));
    // Return of zero-size struct is needed for method chaining.
    return default(MethodHost_Span);
}
```

The method using `WrappedSpan`:

```csharp
public MethodHost_WrappedSpan WriteInt32_WrappedSpan(ref WrappedSpan wrapper, int value)
{
    MemoryMarshal.Cast<byte, int>(wrapper.Span)[0] = value;
    wrapper.Span = wrapper.Span.Slice(sizeof(int));
    // Return of zero-size struct is needed for method chaining.
    return default(MethodHost_WrappedSpan);
}

public ref struct WrappedSpan
{
    public Span<byte> Span;
}
```

Actual benchmark method is here.

More details: `DOTNET_TieredPGO` doesn't impact the performance.

BenchmarkDotNet results with different data types:

Expected: I would expect the wrapping struct to have no impact on the generated machine code. In other words, `WrappedSpan` performs just as fast as a regular `Span`.

My questions
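To make the comparison concrete, here is a minimal, self-contained sketch of the two writer shapes from the issue, assuming (as the comment "needed for method chaining" suggests) that each write method lives on its zero-size `MethodHost_*` struct so calls can be chained. It uses `BinaryPrimitives.WriteInt32LittleEndian` in place of `MemoryMarshal.Cast`, which fixes the byte order and bounds-checks the destination; the `WriterDemo` class is hypothetical. The point is that both variants do identical work.

```csharp
using System;
using System.Buffers.Binary;

// Zero-size host for the raw-Span variant: the cursor lives in the ref parameter,
// and the returned (empty) host value only exists to allow chaining.
public readonly struct MethodHost_Span
{
    public MethodHost_Span WriteInt32_Span(ref Span<byte> span, int value)
    {
        // Writes 4 bytes little-endian; throws if span.Length < 4.
        BinaryPrimitives.WriteInt32LittleEndian(span, value);
        span = span.Slice(sizeof(int)); // advance the cursor
        return default;
    }
}

// Distinct wrapper type so a raw Span<byte> cannot be passed by mistake.
public ref struct WrappedSpan
{
    public Span<byte> Span;
    public WrappedSpan(Span<byte> span) => Span = span;
}

// Zero-size host for the wrapped variant: same work, one extra struct layer.
public readonly struct MethodHost_WrappedSpan
{
    public MethodHost_WrappedSpan WriteInt32_WrappedSpan(ref WrappedSpan wrapper, int value)
    {
        BinaryPrimitives.WriteInt32LittleEndian(wrapper.Span, value);
        wrapper.Span = wrapper.Span.Slice(sizeof(int));
        return default;
    }
}

public static class WriterDemo
{
    // Writes the same three values through both variants and returns both buffers.
    public static (byte[] Raw, byte[] Wrapped) WriteBoth()
    {
        byte[] raw = new byte[12];
        byte[] wrapped = new byte[12];

        Span<byte> span = raw;
        default(MethodHost_Span)
            .WriteInt32_Span(ref span, 1)
            .WriteInt32_Span(ref span, 2)
            .WriteInt32_Span(ref span, 3);

        var wrapper = new WrappedSpan(wrapped);
        default(MethodHost_WrappedSpan)
            .WriteInt32_WrappedSpan(ref wrapper, 1)
            .WriteInt32_WrappedSpan(ref wrapper, 2)
            .WriteInt32_WrappedSpan(ref wrapper, 3);

        return (raw, wrapped);
    }
}
```

Both buffers end up with the same 12 bytes, so any speed difference between the variants comes from how the JIT handles the extra struct layer, not from the algorithm.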
|
In the assembly you linked there are a lot of … In sharplab almost the same code is generated.

FWIW 1: Instead of …

FWIW 2: Instead of having

```diff
-public MethodHost_Span WriteInt32_Span(ref Span<byte> span, int value);
+public MethodHost_Span WriteInt32_Span(Span<byte> span, int value, out int bytesWritten);
```

So the consumer of that method has to slice the span. |
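A minimal sketch of the `out int bytesWritten` shape suggested above. It assumes a static helper class `SpanWriter` (hypothetical name), drops the chaining return type for brevity, and writes little-endian via `BinaryPrimitives`. The writer never advances anything itself; the caller owns the offset and slices before each call.

```csharp
using System;
using System.Buffers.Binary;

public static class SpanWriter
{
    // Writes value at the start of span and reports how many bytes were used.
    // The span is taken by value, so the caller's span is not modified.
    public static void WriteInt32(Span<byte> span, int value, out int bytesWritten)
    {
        BinaryPrimitives.WriteInt32LittleEndian(span, value);
        bytesWritten = sizeof(int);
    }

    // Example consumer: tracks the offset and slices before each write.
    public static int WriteAll(Span<byte> destination, ReadOnlySpan<int> values)
    {
        int offset = 0;
        foreach (int v in values)
        {
            WriteInt32(destination.Slice(offset), v, out int written);
            offset += written;
        }
        return offset; // total bytes written
    }
}
```

The trade-off is that the caller must carry `offset` between calls, which does not fit naturally into a fluent chain.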
Hey @gfoidl, I am not sure why …

I didn't know about …

As for your second point, having an …

```csharp
Span<byte> w = buffer.Span;
BookDetailsMessage bookDetails = BookDetailsMessage.Build()
    .WithIdentifier(ref w)
    .WithTitle(ref w, book.Title)
    .WithISBN(ref w, book.ISBN)
    .WithPageCount(ref w, book.PageCount)
    .WithYear(ref w, book.Year);
```

The next method in the chain is still expecting a …

Finally, I don't think comparing … |
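One way to read the chained API above is as a sequence of zero-size step types, each exposing only the next legal call, with the cursor carried in `ref Span<byte> w`. The sketch below assumes that design; the step type names, `BookDetailsChain`, the message identifier constant, and the length-prefixed UTF-8 title encoding are all hypothetical and not taken from the library.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical zero-size step types: each exposes only the next field,
// so the compiler enforces the write order.
public readonly struct NeedsIdentifier
{
    private const int MessageId = 1; // hypothetical message-type identifier

    public NeedsTitle WithIdentifier(ref Span<byte> w)
    {
        BinaryPrimitives.WriteInt32LittleEndian(w, MessageId);
        w = w.Slice(sizeof(int));
        return default;
    }
}

public readonly struct NeedsTitle
{
    public NeedsPageCount WithTitle(ref Span<byte> w, string title)
    {
        // Assumed encoding: 4-byte length prefix followed by UTF-8 bytes.
        int byteCount = Encoding.UTF8.GetBytes(title, w.Slice(sizeof(int)));
        BinaryPrimitives.WriteInt32LittleEndian(w, byteCount);
        w = w.Slice(sizeof(int) + byteCount);
        return default;
    }
}

public readonly struct NeedsPageCount
{
    public Done WithPageCount(ref Span<byte> w, int pageCount)
    {
        BinaryPrimitives.WriteInt32LittleEndian(w, pageCount);
        w = w.Slice(sizeof(int));
        return default;
    }
}

public readonly struct Done { }

public static class BookDetailsChain
{
    // Entry point of the chain; returns the first step.
    public static NeedsIdentifier Build() => default;
}
```

Because every step type is empty, the returned values carry no data: the type system alone enforces the call order, and all state lives in `w`.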
Note: comment put inside a details block so as not to distract too much from your main concern. Re: "As for your second point"

I don't know if this is the concrete usage, or just a demo that you showed. But by using a more standard builder pattern, this can be accomplished quite nicely (and very efficiently):

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;
using System.Text;

using BookDetailsMessageBuilder bookDetailsMessageBuilder = new(stackalloc byte[256]);
BookDetailsMessage bookDetailsMessage = bookDetailsMessageBuilder
    .WithTitle(".NET is amazing Pt. 7")
    .WithPageCount(42)
    .Build();

public class BookDetailsMessage
{
    private readonly byte[] _binaryData;

    // Or however it should be used.
    public BookDetailsMessage(byte[] binaryData) => _binaryData = binaryData;
}

public ref struct BookDetailsMessageBuilder
{
    private int _curPos;
    private Span<byte> _buffer;
    private byte[]? _bufferFromPool;

    public BookDetailsMessageBuilder()
    {
        _curPos = 0;
        _buffer = _bufferFromPool = ArrayPool<byte>.Shared.Rent(1024);
    }

    public BookDetailsMessageBuilder(Span<byte> initialBuffer)
    {
        _curPos = 0;
        _buffer = initialBuffer;
        _bufferFromPool = null;
    }

    public void Dispose()
    {
        if (_bufferFromPool is not null)
        {
            ArrayPool<byte>.Shared.Return(_bufferFromPool);
            _bufferFromPool = null;
        }
    }

    private Span<byte> CurrentBuffer => _buffer.Slice(_curPos);

    public BookDetailsMessageBuilder WithTitle(string title)
    {
        // Validate that into the CurrentBuffer can be written, otherwise grow the buffer.
        // E.g. by renting a larger buffer from the ArrayPool, and copying over the
        // current contents. Omitted here for brevity.
        int written = Encoding.UTF8.GetBytes(title, CurrentBuffer);
        _curPos += written;
        // Returns a copy of the builder that carries the updated position.
        return this;
    }

    public BookDetailsMessageBuilder WithPageCount(int pageCount)
    {
        // Validate that into the CurrentBuffer can be written, otherwise grow the buffer.
        // E.g. by renting a larger buffer from the ArrayPool, and copying over the
        // current contents. Omitted here for brevity.
        MemoryMarshal.Write(CurrentBuffer, ref pageCount);
        _curPos += sizeof(int);
        // Returns a copy of the builder that carries the updated position.
        return this;
    }

    public BookDetailsMessage Build()
    {
        return new BookDetailsMessage(_buffer[.._curPos].ToArray());
    }
}
```

For the "grow" you can take inspiration from the code of … |
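For the "grow" step the builder above omits, a common approach is to rent a larger array from `ArrayPool<byte>.Shared`, copy the bytes written so far, and return the previously rented array. A minimal sketch, assuming a doubling growth policy; `GrowableBuffer`, `GetSpan`, `Advance`, and `GrowDemo` are hypothetical names. It mutates a plain local and disposes it in `finally` instead of returning copies: in a chain where each `WithX` returns a copy, a copy that grew would be the only one aware of the new pooled array, so the original variable would not return it on `Dispose`.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.Buffers.Binary;

public ref struct GrowableBuffer
{
    private Span<byte> _buffer;
    private byte[]? _bufferFromPool;
    private int _curPos;

    public GrowableBuffer(Span<byte> initialBuffer)
    {
        _buffer = initialBuffer;
        _bufferFromPool = null;
        _curPos = 0;
    }

    public ReadOnlySpan<byte> WrittenSpan => _buffer.Slice(0, _curPos);

    // Returns a span with at least sizeHint free bytes, growing if necessary.
    public Span<byte> GetSpan(int sizeHint)
    {
        if (_buffer.Length - _curPos < sizeHint)
            Grow(sizeHint);
        return _buffer.Slice(_curPos);
    }

    public void Advance(int count) => _curPos += count;

    private void Grow(int sizeHint)
    {
        // Double the capacity, but never less than what is needed right now.
        int newSize = Math.Max(_buffer.Length * 2, _curPos + sizeHint);
        byte[] newArray = ArrayPool<byte>.Shared.Rent(newSize);
        _buffer.Slice(0, _curPos).CopyTo(newArray);

        // Swap in the new array, then return the old one if it came from the pool.
        byte[]? toReturn = _bufferFromPool;
        _buffer = _bufferFromPool = newArray;
        if (toReturn is not null)
            ArrayPool<byte>.Shared.Return(toReturn);
    }

    public void Dispose()
    {
        byte[]? toReturn = _bufferFromPool;
        _buffer = default;
        _bufferFromPool = null;
        _curPos = 0;
        if (toReturn is not null)
            ArrayPool<byte>.Shared.Return(toReturn);
    }
}

public static class GrowDemo
{
    // Starts on a small stack buffer and grows into pooled arrays as needed.
    public static byte[] WriteMany(int count)
    {
        var buffer = new GrowableBuffer(stackalloc byte[8]);
        try
        {
            for (int i = 0; i < count; i++)
            {
                Span<byte> dest = buffer.GetSpan(sizeof(int));
                BinaryPrimitives.WriteInt32LittleEndian(dest, i);
                buffer.Advance(sizeof(int));
            }
            return buffer.WrittenSpan.ToArray();
        }
        finally
        {
            buffer.Dispose();
        }
    }
}
```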
It's worth noting that …

While I haven't done the investigation to look at why there are differences, there are several known cases where having wrapper structs may be less efficient. This ranges from …

Some of this will only be possible to fix if we deviate from the native ABI, and native code (C, C++, Rust compilers, etc.) would likewise pessimize the result when inlining cannot happen. |
Tagging subscribers to this area: @JulieLeeMSFT
|
As Tanner notes, wrapping structs around types may not give the same performance as passing the underlying type. This is especially true when the data being wrapped is itself a multi-field struct (like `Span<T>`).

Structs wrapping single-field structs are optimized in many but not all cases; see e.g. dotnet/coreclr#22867.

There is a general concept of "recursive struct promotion", where optimizations like struct promotion can be applied to arbitrary structs with arbitrary nesting, which we hope to address someday. There's no specific issue open for this and no timeline for when we might get around to working on it. |
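One mitigation that may be worth measuring while nested structs are not fully promoted: keep `WrappedSpan` as the public, type-safe parameter, but copy its `Span<byte>` into a local once per batch of writes and store it back at the end, so the hot loop works on a plain `Span<byte>` local, the shape the issue's benchmarks show to be fast. This is an assumption about what helps the JIT, not something confirmed in this thread; `BatchWriter` and `WriteInt32Batch` are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;

public ref struct WrappedSpan
{
    public Span<byte> Span;
    public WrappedSpan(Span<byte> span) => Span = span;
}

public static class BatchWriter
{
    // The public API still takes the wrapper, keeping the compile-time safety.
    public static void WriteInt32Batch(ref WrappedSpan wrapper, ReadOnlySpan<int> values)
    {
        // Unwrap once: the loop below only touches a plain Span<byte> local.
        Span<byte> span = wrapper.Span;
        foreach (int value in values)
        {
            BinaryPrimitives.WriteInt32LittleEndian(span, value);
            span = span.Slice(sizeof(int));
        }
        // Store the advanced cursor back into the wrapper once.
        wrapper.Span = span;
    }
}
```

Whether this closes the gap depends on the JIT version; with newer promotion work the plain wrapper may already be enough, so both shapes are worth benchmarking.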
With the same modifications as in #32415 (comment), physical promotion gets us:

Double-checked a few of the cases and the codegen was equal between the … |
Great to see so much progress on this. Thanks @jakobbotsch! Are the changes in #32415 (comment) already merged to main? I'm hoping to get a daily build to give this a try. Even without |
@essoperagma Not yet. I can let you know once I submit the PR to fix the remaining problem here, but it may still be a bit before that happens.
Nothing I'm aware of that would affect this benchmark, but it's likely various other improvements in the JIT affected it. Sadly it's a bit hard to check what exactly is the cause of the improvement. |
Brief intro:

I'm developing a binary serialization library with two major requirements:

For requirement #1, I'm heavily using `Span`. I'm quite happy with the performance and how clean the internal serializer implementation is.

For requirement #2, I'm trying to prevent users from accidentally passing the wrong `Span` to a method. As a solution, I'm wrapping `Span` in my own struct: `WrappedSpan`. All my methods request a `WrappedSpan` and won't accept a `Span`. This helps prevent mistakes via compile-time errors when the user passes a `Span` instead of a `WrappedSpan` to any of the serialization methods.

The problem:

I noticed that a regular `Span<byte>` is performing 100% better than a `Span` wrapped in an otherwise-empty struct in some scenarios (many, small, chained method calls).

The method using `Span<byte>`:

The method using `WrappedSpan`:

Actual benchmark method is here.

More details:

`DOTNET_TieredPGO` doesn't impact the performance.

BenchmarkDotNet results with different data types:

Expected

I would expect the wrapping struct to have no impact on the generated machine code. In other words, `WrappedSpan` performs just as fast as a regular `Span`.

My questions
category:cq
theme:structs
skill-level:expert
cost:large
impact:medium