API Proposal: SequenceReader<T>.TryRead overloads to read a specified number of elements #30778
We should consider adding |
The problem I'm seeing with this approach over my other suggestion here (https://github.com/dotnet/corefx/issues/40885) is the code flow.
Also this does not allow elegant parsing code like in the "Example 2" here: https://github.com/dotnet/corefx/issues/40885 |
The
I don't think we should be turning the reader into a random access reader. The APIs are specifically designed around performance, skipping around to random
Only our positioning APIs throw, and they never should if your code is written correctly. I don't think we'll want to change the design so that some read methods throw and some don't. It also makes writing high performance consuming code more difficult as
While you can't write an inline call with the out, it allows the no-throw design for high performance. The current design is pretty consistent, I don't think we want to mix another usage pattern in. |
I have written high performance parsing code and the bottom line was that I was faster by copying to a byte array and parsing from there than by using the insufficient crippled API of the SequenceReader in its current form.
You're exactly hitting the problem here: "part of it" Now tell me, which code should a consumer of this API write, when he doesn't need to peek for a byte, but for an unsigned 32bit integer? |
No API is going to address every use case. I'm not sure what you're suggesting here.
We have binary reader extensions. We didn't put peek versions of those as there was no request. Please feel free to create a proposal if you think it's useful. You can build the same sort of helpers in the meantime. |
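As a rough illustration of the "build the same sort of helpers in the meantime" suggestion, here is a minimal sketch of a peek helper for an unsigned 32-bit value. The class and method names (`SequenceReaderPeekExtensions`, `TryPeekUInt32BigEndian`) are hypothetical and not part of the framework; the sketch assumes only the existing `SequenceReader<T>.TryCopyTo` and `BinaryPrimitives.ReadUInt32BigEndian`.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

// Hypothetical helper for illustration; not part of the BCL.
public static class SequenceReaderPeekExtensions
{
    // Peeks a big-endian UInt32 at the current position without advancing the reader.
    public static bool TryPeekUInt32BigEndian(ref this SequenceReader<byte> reader, out uint value)
    {
        Span<byte> scratch = stackalloc byte[sizeof(uint)];

        // TryCopyTo copies from the current position (across segments if needed)
        // and leaves the reader's position unchanged.
        if (!reader.TryCopyTo(scratch))
        {
            value = default;
            return false;
        }

        value = BinaryPrimitives.ReadUInt32BigEndian(scratch);
        return true;
    }
}
```

This always copies four bytes to the stack; a tuned version would probably check `UnreadSpan` first and only fall back to the copy at a segment boundary.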
What I'm proposing here is the functionality that any of those helper methods would need to implement. So why should we propose a hundred helper methods when a single method would be sufficient? Those 6 extension methods that you are referencing are just a minimal subset of methods available from Also, .NET Core recently added hundreds of various TryParse(..) overloads working on spans (mostly ReadOnlySpan<char>). The single method that I'm proposing here allows using all of them. Are you seriously suggesting that I should create proposals for duplicating all those TryParse and binary extension methods?
It's not about a specific single use case. It's about the ability to make this usable in a hundred of cases. That single method that I'm proposing would allow that. |
No. I'm suggesting you create
I don't understand how your proposal provides more functionality than this proposal or the "peek" option that doesn't advance the reader. Yes, you can do it inline, but I've already gone into the reasons why we are unlikely to do that. If you have examples to demonstrate where your proposal provides functionality this does not, please share them. |
For the proposal example, here is what you currently would have to write:

    void Parse(ref ReadOnlySequence<byte> buffer)
    {
        var reader = new SequenceReader<byte>(buffer);

        // Read the length prefix, validate we have enough content left
        if (!reader.TryReadBigEndian(out int length) || reader.Remaining < length)
        {
            return;
        }

        ProcessPayload(reader.Sequence.Slice(reader.Position, length));
    }

With this proposal it becomes:

    void Parse(ref ReadOnlySequence<byte> buffer)
    {
        var reader = new SequenceReader<byte>(buffer);

        // Read the length prefix, then read the payload
        if (!reader.TryReadBigEndian(out int length) || !reader.TryRead(length, out ReadOnlySequence<byte> payload))
        {
            return;
        }

        ProcessPayload(payload);
    }

It's more awkward for a span in the current code as you'd have to use |
With my proposal (https://github.com/dotnet/corefx/issues/40885) it's a one-liner:
PS: I need UInt32, not Int32 - what would I do currently (except casting)? |
With your proposal what gets returned when length exceeds the bounds? |
Just cast; there is another discussion about adding the unsigned overloads. There is no cost to casting, which is why we didn't add them originally. We were just trying to be deliberate in adding API so we didn't end up with a really huge surface area. #30580 |
An IndexOutOfRangeException (or similar). I have written parsing code that is processing large volumes of data and there's just a single try catch at the very top of the call stack. |
There is also some cost to adding the throw in the reader as well. I spent over two months getting the fast path as fast as possible; adding a single line or method call can be measurable in some of the cases. Perf notwithstanding, we didn't go that route with the design here, so it would not align with the rest of the API. |
The check for the condition ("buffer long enough") is required in either case. |
When we do throw we use throw "helpers" in these critical paths which put the throw in another method as a throw prevents the method from getting inlined. Just having another method call can also impact the ability of the JIT to inline / optimize in other ways. For the fast paths the overhead is very, very small so seemingly trivial things can have a measurable impact. |
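The throw "helper" pattern described here can be sketched as follows. This is a generic illustration, not code from `SequenceReader<T>`; the names `ThrowHelperSketch`, `GetChecked` and `ThrowIndexOutOfRange` are made up for the example.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Illustrative only; the names here are hypothetical.
public static class ThrowHelperSketch
{
    // Hot path: contains no throw statement itself, so the JIT is
    // more likely to consider it for inlining.
    public static int GetChecked(ReadOnlySpan<int> data, int index)
    {
        // A single unsigned comparison covers both index < 0 and index >= Length.
        if ((uint)index >= (uint)data.Length)
        {
            ThrowIndexOutOfRange();
        }

        return data[index];
    }

    // Cold path: the throw lives in its own method, away from the caller's body.
    [DoesNotReturn]
    private static void ThrowIndexOutOfRange() =>
        throw new ArgumentOutOfRangeException("index");
}
```

Even with this split, the hot path still pays for the extra call site, which is the cost the comment above is pointing at.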
I understand, but I'm not sure if the impact can be really significant considering that even the most elementary members are throwing exceptions - just think of an array's item accessor. |
Sure, bounds checking, period, has a cost, and it is enough to measure in the fast paths here where you're effectively just doing Span operations. There are some incredibly subtle JIT interactions we're considering in the implementation here: matching arguments to sub methods, having a single final return, etc. As I mentioned, we spent months flogging this code, running perf measurements and looking at the output IL and final JIT'ted code. It was quite the slog trying to squeeze every last drop of perf we could out of it (without resorting to unsafe code if possible). |
A few months ago - when I was struggling trying to make some reasonable use of ReadOnlySequence - I've been reading through a lot of discussions here on that subject involving @stephentoub and @davidfowl and a few others from which I gained the impression that - under acknowledgement of the limited accessibility of the ReadOnlySequence API - there should be introduced something allowing some real-world use of this - namely the SequenceReader. I had assumed that the low-level kind of use would be using ReadOnlySequence directly and the purpose of the SequenceReader would be to allow using it in a more convenient way. Or have I misunderstood the intended purpose of the SequenceReader? |
I don’t think it’s fair to say it doesn’t make using the ReadOnlySequence easier. That may not be the case for your specific scenario, but it certainly does make it easier to use for a majority of other scenarios and also makes several nasty edge cases completely disappear. |
It might also be good to have an API like this on ReadOnlySequence directly |
The intent was to make it easier to handle cross segment reading, with a high priority on performance given the desire to use this in performance critical situations (e.g. handling web requests). While we didn't purposefully want to make it hard to use, it was specifically designed to be high performance no matter how you use it. That meant that it had to be somewhat aligned with the reality of the data structure it was working with- which doesn't fall directly in line with models people are used to such as We're trying to be very deliberate with adding/expanding API to try and make sure that we're adding APIs that will stand up to the test of time. These APIs will be around for decades after all. :) |
I think the problem is that you haven't yet recognized that there's a much wider range of potential for this technique. You have focused on and optimized for just a small range of specific use cases. But there's more, even cases where performance doesn't play a primary role. That's why I think that API proposals should not only be judged by how they would perform in those specific scenarios that you are currently focusing on. Of course, each public method should do its job with the best possible performance. |
I personally don't like the first overload. It's deceptively simple, and the only sensible approach when it comes up to a boundary would be to fail. For example:

    // Sequence:
    // [\x08 1 2 3 4] [5 6 7 8]
    bool TryReadMessage(ref ReadOnlySequence<byte> buffer, out string message)
    {
        var reader = new SequenceReader<byte>(buffer);

        if (!reader.TryRead(out byte length) || !reader.TryRead(length, out ReadOnlySpan<byte> data))
        {
            message = null;
            return false;
        }

        message = Encoding.UTF8.GetString(data);
        return true;
    }

I would expect this method to return false, as the read could not read the specified number of bytes due to the boundary in the sequence. This, in my opinion, means that more code will be written to only use the second ReadOnlySequence overload, ignoring the first overload entirely. |
It wouldn't fail, it would allocate an array. |
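To make the "it would allocate an array" behaviour concrete, here is a minimal sketch of how a span-returning read could behave at a segment boundary. It is an assumption about the general shape, not the proposed implementation: `TryReadSpan` and the `Segment` test class are hypothetical names, and only existing members (`UnreadSpan`, `Remaining`, `TryCopyTo`, `Advance`) are used.

```csharp
using System;
using System.Buffers;

// Hypothetical extension for illustration; not the BCL implementation.
public static class SequenceReaderSpanExtensions
{
    public static bool TryReadSpan(ref this SequenceReader<byte> reader, int length, out ReadOnlySpan<byte> value)
    {
        if (length < 0 || reader.Remaining < length)
        {
            value = default;
            return false;
        }

        ReadOnlySpan<byte> unread = reader.UnreadSpan;
        if (unread.Length >= length)
        {
            // Fast path: the data is contiguous in the current segment, no allocation.
            value = unread.Slice(0, length);
        }
        else
        {
            // Slow path: the data straddles segments, so copy it into a new array.
            byte[] copy = new byte[length];
            reader.TryCopyTo(copy);
            value = copy;
        }

        reader.Advance(length);
        return true;
    }
}

// Hypothetical minimal segment type, only needed to build a multi-segment sequence.
public sealed class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(byte[] data) => Memory = data;

    public Segment Append(byte[] data)
    {
        var next = new Segment(data) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}
```

With the sequence from the example above (`[\x08 1 2 3 4] [5 6 7 8]`), the read of eight bytes would take the slow path and succeed with a copied buffer rather than return false.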
That seems a bit odd to me considering most of these APIs seem to be designed to avoid that. |
There's already prior art with |
I see. Thanks for the clarification. |
Approved: https://github.com/dotnet/corefx/issues/40962#issuecomment-530060857

    public bool TryRead(int length, out ReadOnlySpan<T> value);
    public bool TryRead(long length, out ReadOnlySequence<T> value);

|
@FiniteReality - You might be interested in my comment https://github.com/dotnet/corefx/issues/40871#issuecomment-530088885 regarding allocation probability. |
See https://github.com/dotnet/corefx/issues/40871#issuecomment-542928968
|
public partial ref struct SequenceReader<T>
{
bool TryReadExact(int count, out ReadOnlySequence<T> value);
} |
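For reference, the length-prefixed example from earlier in the thread could look roughly like this with the `TryReadExact` shape shown above. This is a sketch that assumes a runtime where `SequenceReader<T>.TryReadExact(int, out ReadOnlySequence<T>)` is available; `LengthPrefixedParser` and `TryParseFrame` are hypothetical names.

```csharp
using System;
using System.Buffers;

// Hypothetical parser for illustration.
public static class LengthPrefixedParser
{
    // Reads one frame: a 4-byte big-endian length followed by that many payload bytes.
    // On success, 'buffer' is sliced past the consumed frame.
    public static bool TryParseFrame(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> payload)
    {
        var reader = new SequenceReader<byte>(buffer);

        if (!reader.TryReadBigEndian(out int length)
            || length < 0
            || !reader.TryReadExact(length, out payload))
        {
            // Not enough data yet (or a nonsensical length); leave 'buffer' untouched.
            payload = default;
            return false;
        }

        buffer = buffer.Slice(reader.Position);
        return true;
    }
}
```

Because the payload comes back as a `ReadOnlySequence<byte>`, nothing is copied even when the frame spans several segments.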
We talked about this API (and some others) more in #30807 (comment). Specifically, we discussed putting the On the topic of the Span overload, I think that keeping the overload is also important for having a congruent API surface. There are existing APIs like Of course, having some type of alternative which doesn't necessarily allocate is useful, especially since the buffer size can be much more predictable. But I think that the matching APIs should be added for this method, and likewise anything new we come up with should be added to |
Moved this to future and marked as up-for-grabs. @davidfowl if this is needed for 6.0.0, please let us know. |
This is fine for .NET 7 |
Hi, I would like to try this and learn. I have raised a draft PR and will change its status when it is ready for review, thanks! |
PR is ready for review now, please review, thanks. |
The use case is to return a contiguous ReadOnlySpan<byte> or a ReadOnlySequence<byte> from the current position. One use case for this type of method is length prefixed network protocols e.g.