Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory and ReadOnlyMemory validation errors not matching #23670

Closed
Drawaes opened this issue Sep 28, 2017 · 52 comments
Closed

Memory and ReadOnlyMemory validation errors not matching #23670

Drawaes opened this issue Sep 28, 2017 · 52 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@Drawaes
Copy link
Contributor

Drawaes commented Sep 28, 2017

Updated as per the suggestion of a create method
Updated with design from @stephentoub which allows all cases to be covered
Updated with design change from @benaadams to allow type inference for T
Put in namespace and removed the Create to leave only the Dangerous Create
Added question around moving current Dangerous Create Method

Rationale

A major use case of [ReadOnly]Span/Memory is to replace handing around array buffers and their offsets and count.

One of the major benifits of the design as I see it as it moves bounds checks out to where the buffer is created which is excellent. However when upgrading legacy code there seems to be a blocker in that stream defines the triplet to be

public int SomeMethod(byte[] buffer, int offset, int count);

This is normally then teamed up with checks on

  1. "is buffer null" == null argument exception
  2. "is offset negative or past the end of the buffer?" == argument out of range exception with offset as field referenced
  3. "is count > than buffer.length, or < 0 or count +offset > buffer.length" == argument out of range exception

The issue with the way it currently is, that for anything that takes that triplet you have to manually do validation on the inputs before creating a Memory or risk having exceptions with names that don't match.

This causes double validation to take place, once in the legacy code and once in the Memory creation. As Memory is often used on the "hot paths" of code it is a penalty for using memory.

Proposal

Add "unsafe" methods to create memory and readonly memory that avoid the extra checks. Then the checks can be maintained in the code with the same error messages and the double check penalty doesn't have to be paid.

namespace System.Runtime.InteropServices
{
    public static class Span
    {
        public static Span<T> DangerousCreate(T[] array, int start, int length);
    ...
    }
    // ... same for ReadOnlySpan<T>

    public static class Memory
    {
        public static Memory<T> DangerousCreate(T[] array, int start, int length);
    }

    // ... same for ReadOnlyMemory<T>
}

Usage

byte[] buffer;

var span = Span.DangerousCreate(buffer, offset, count);
// vs
var span = Span<byte>.DangerousCreate(buffer, offset, count);

Outstanding Questions

Should the existing method

[MethodImpl(MethodImplOptions.AggressiveInlining)]
[EditorBrowsable(EditorBrowsableState.Never)]
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)

Be moved to the new types and should the IDE hiding be removed?

References

dotnet/corefx#24295 The Code this came up in (It means an extra validation path without it)

Below no longer matters as the legacy code can maintain it's own named validation.



|Class|ArrayName|Start|Length|
|---|---|---|---|
|FileStream|buffer|offset|count|
|NetworkStream|buffer|offset|count|
|BufferedStream|buffer|offset|count|
|MemoryStream|buffer|offset|count|
|StreamReader|buffer|index|count|
|StreamWriter|buffer|index|count|
|Stream|buffer|offset|count|
|SslStream|buffer|offset|count|
@benaadams
Copy link
Member

Currently its

 ReadOnlyMemory(T[] array, int start, int length)

Change to

 ReadOnlyMemory(T[] buffer, int offset, int count)

Seems good

cc: @ahsonkhan, @KrzysztofCwalina

@ahsonkhan
Copy link
Member

ahsonkhan commented Sep 28, 2017

Would the same apply to Span/ReadOnlySpan?
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Span.cs#L71-L78

This should go through API review to make sure we want this change and apply it in other places too (to remain consistent).

@benaadams
Copy link
Member

Would the same apply to Span/ReadOnlySpan?

/cc @stephentoub

@Drawaes
Copy link
Contributor Author

Drawaes commented Sep 28, 2017

Yeah it looks from a quick tour around that it is the same for the non async methods that take span.

@ahsonkhan If you want I can update the top comment to be in proper API review form?

@ahsonkhan
Copy link
Member

ahsonkhan commented Sep 28, 2017

If you want I can update the top comment to be in proper API review form?

Yes please. Tyvm.
We have other APIs in Span that use start and length too, like Slice:

public Span<T> Slice(int start, int length)

@Drawaes
Copy link
Contributor Author

Drawaes commented Sep 28, 2017

Updated

@KrzysztofCwalina
Copy link
Member

We cannot rename Span.Length to Span.Count. Span is like an array and arrays have Length property.

I also don't want the ctor parameter to be called "count" but the property it initialized being called "Length"

To solve the problem outlined in this issue, could we add a helper method that would create Span out of T[], offset, count while validating the three parameters and throwing the right exception?

@Drawaes
Copy link
Contributor Author

Drawaes commented Sep 28, 2017

That seems like a reasonable idea to me.

@stephentoub
Copy link
Member

Is the suggestion to have two different APIs for constructing a {ReadOnly}Span/Memory<T> from a T[]/int/int triplet, just with different parameter names and thus different exceptions that get thrown?

@benaadams
Copy link
Member

benaadams commented Sep 28, 2017

To solve the problem outlined in this issue, could we add a helper method that would create Span out of T[], offset, count while validating the three parameters and throwing the right exception?

Precedent is the (Value)Tuple Create methods, so you don't always have to specify type

var t = ValueTuple.Create(array);

rather than

var t = new ValueTuple<int[]>(array);

So

public static class Span
{
    public static Span<T> Create<T>(T[] buffer, int offset, int count) 
    {
        // Validate
        return Span<T>.DangerousCreate(buffer, offset, count);
    }
}

For

var span = Span.Create(array, offset, count);

Rather than

var span = new Span<int>(array, offset, count);

Obv Tuple is more annoying as it has many type params rather than just one

@Drawaes
Copy link
Contributor Author

Drawaes commented Sep 28, 2017

Removed my comment as it was out of order due to my phone ;) Added @benaadams / @KrzysztofCwalina 's idea as the main API request above.

@benaadams
Copy link
Member

Would need to be in a new non-generic static class; else you'd have to specify the type anyway and it would loose some advantages; so would be a bit of a weird addition other than the change in names of params

e.g.

Span<int>.Create(array, offset, count);

rather than

Span.Create(array, offset, count);

@Drawaes
Copy link
Contributor Author

Drawaes commented Sep 28, 2017

Would it work to add an unsafe unchecked internal method/constructor in coreclr, then add the same to the typeforwarded types in corefx. Finally then a helper method could be added to CoreFX.

This would only need to be internal as most streams are in corefx very few external custom ones I imagine. That way there are no external API additions or changes?

@benaadams
Copy link
Member

benaadams commented Sep 28, 2017

DangerousCreate is there for Span anyway, though its checking length (but not other params) https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Span.cs#L108-L113

@Drawaes
Copy link
Contributor Author

Drawaes commented Sep 28, 2017

So would just need something like that for memory. Would pay for a double length check but it would be less than a full check on all the params.

@stephentoub
Copy link
Member

stephentoub commented Oct 9, 2017

could we add a helper method that would create Span out of T[], offset, count while validating the three parameters and throwing the right exception?

How about just adding a Dangerous/Unsafe factory for {ReadOnly}Memory<T> that doesn't do any argument validation at all, leaving it up to the call site? Then I can do my own argument validation if needed, and places where I know the arguments are valid I can skip the checks entirely. There are a bunch of places I'm seeing where, as I'm switching over from using array-based to memory-based calls, I'm having to construct {ReadOnly}Memory<T> instances from T[]/offset/count triplets that I know are all valid.

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 9, 2017

Seems like that is going to be the most flexible option.

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 9, 2017

@stephentoub do you have a shape in mind for this factory (don't call it factory please :P ) I will happily update the top comment with the idea.

@stephentoub
Copy link
Member

do you have a shape in mind for this factory

public ref struct Span<T>
{
    public static Span<T> DangerousCreate(T[] array, int start, int length);
    ...
}
// ... same for ReadOnlySpan<T>

public struct Memory<T>
{
    public static Memory<T> DangerousCreate(T[] array, int start, int length);
    ...
}
// ... same for ReadOnlyMemory<T>

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 9, 2017

Updated top issue.

@benaadams
Copy link
Member

benaadams commented Oct 9, 2017

Push it out to a static class and can get type inference and throw in the Create with renamed parameter validation?

public static class Span
{
    public static Span<T> Create(T[] buffer, int offset, int count)
    {
         // validate params
        return DangerousCreate(buffer, offset, count);
    }

    public static Span<T> DangerousCreate(T[] buffer, int offset, int count);
    ...
}
// ... same for ReadOnlySpan<T>

public static class Memory
{
    public static Memory<T> Create(T[] buffer, int offset, int count)
    {
         // validate params
        return DangerousCreate(buffer, offset, count);
    }

    public static Memory<T> DangerousCreate(T[] buffer, int offset, int count);
    ...
}
// ... same for ReadOnlyMemory<T>
byte[] array;

var span = Span.DangerousCreate(array, start, length);
// vs
var span = Span<byte>.DangerousCreate(array, start, length);

// drop in validation Exception matching
var memory = Memory.Create(array, start, length);

Also then the Throws can be in non-generic class

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 9, 2017

I prefer that, but @stephentoub your the one that gets to be in the API reviews, whats your thinking? (Apart from the fact we could never use the var in corefx ;) )

@stephentoub
Copy link
Member

Seems reasonable. @KrzysztofCwalina, @ahsonkhan, was there strong reasoning for the ctor-based span/memory creation vs factory-based span/memory creation?

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 10, 2017

@benaadams @stephentoub updated top issue to match Bens design

@KrzysztofCwalina
Copy link
Member

I am not sure it's worth adding two types to System root namespace just so we can have the factory methods.
Also, I don't think we should have Dangerous APIs on such types. We decided to move all dangerous APIs to Marshal-like type in System.Runtime.InteropServices.

@benaadams
Copy link
Member

benaadams commented Oct 10, 2017

Move the entire classes into System.Runtime.InteropServices the argument rename is for interop also :)

namespace System.Runtime.InteropServices
{
    public static class Span
    {
        public static Span<T> Create(T[] buffer, int offset, int count);
        public static Span<T> DangerousCreate(T[] buffer, int offset, int count);

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 10, 2017

I am easy on location, and on naming, the Type Inference is more "friendly" if you want it moved @KrzysztofCwalina what would you call the class if it went to CompilierServices?

public static class Unsafe
{
    public static Span<T> DangerousCreateSpan<T>(T[] buffer, int offset, int count);
    public static ReadonlySpan<T> DangeriousCreateReadonlySpan(T[] buffer, int offset, int count);
    public static Memory<T> DangerousCreateMemory<T>(T[] buffer, int offset, int count);
    public static ReadOnlyMemory<T> DangerousCreateReadonlyMemory<T>(T[] buffer, int offset, int count);
}

Means no new type, and they all show up in the IDE side by side. You also have made it super clear what the story is here?

@ahsonkhan
Copy link
Member

Move the entire classes into System.Runtime.InteropServices the argument rename is for interop also :)

@Drawaes, can you please update the original post with the namespace specified?

the Type Inference is more "friendly"

Don't we still get the type inference regardless of whether it lives in System or System.Runtime.InteropServices?

Do we need the Create methods? For the original concern about error matching, isn't DangerousCreate sufficient?

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 28, 2017

Does that look better?

  1. We have the namespace moved to System.Runtime.InteropServices
  2. We get type Inference
  3. Dropped the Create methods, not sure why there were still there as you are correct the DangerousCreate seems to cover all of issues I had.

:shipit:

@ahsonkhan
Copy link
Member

ahsonkhan commented Oct 28, 2017

Do the argument names in DangerousCreate have to be different than what we use in the Span constructor or can we continue to use index/length?

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 28, 2017

you are correct Sir they should match the Span names, as it doesn't matter what they are called for my purpose now, and .. consistency

I fixed it ( I left the example code as Array, offset to make the point about what they are likely to be ;) )

@ahsonkhan
Copy link
Member

ahsonkhan commented Oct 28, 2017

Sorry for the mistake. I meant start/length.

  • index -> start?
  • buffer -> array?

https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L46

public Span(T[] array, int start, int length)

Edit: Also, should we also add an overload that only takes an array (which skips the null and co-variance checks)?

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 28, 2017

No problem, I was being lazy and didn't check as well 🤣 (fixed)

@benaadams
Copy link
Member

benaadams commented Oct 28, 2017

Question would be, should the existing Span overload

[MethodImpl(MethodImplOptions.AggressiveInlining)]
[EditorBrowsable(EditorBrowsableState.Never)]
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)

Also move to System.Runtime.InteropServices? T is specified by ref T objectData

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 28, 2017

Seems reasonable, the only query would be around shipping status and back compat.

@benaadams
Copy link
Member

Seems reasonable, the only query would be around shipping status and back compat.

Question for the api reviewers :)

@Drawaes
Copy link
Contributor Author

Drawaes commented Oct 28, 2017

Above our pay grades... :lol:

@terrajobst
Copy link
Member

terrajobst commented Oct 31, 2017

Video

We went back and forth, but it seems we're in agreement that these can be useful. We shouldn't expose static types Span and Memory in InteropServices. Instead, we should add them to the new MemoryMarshal type.

@ahsonkhan ahsonkhan removed their assignment Nov 2, 2017
@Drawaes
Copy link
Contributor Author

Drawaes commented Nov 2, 2017

Started a PR in coreclr dotnet/coreclr#14833

@jkotas
Copy link
Member

jkotas commented Nov 2, 2017

I do not think these APIs are good addition.

  • Memory is "slow" type. We are losing security and gaining nothing by having a constructor that omits bounds checks.
  • I am ok with having the low level constructor for Span, but it should be just DangerousCreate moved from the Span type to MemoryMarshal. I do not think we should have convenience constructors that take arrays to make skipping the bounds check easy.

@Drawaes
Copy link
Contributor Author

Drawaes commented Nov 2, 2017

The reason mainly is

Most of the stream types run the validation checks already. I couldn't remove these checks when updating to take memory from an array (or span for that matter) because it would change the exception message for the field that fails, making it a breaking change.

That means there are plenty of times that instead of the 5 checks (null, index < length & index > 0 using overflow for 1 check, index + count < length, length > 0) you end up with 10 checks everytime you call.

This might seem like a "slow" operation but often these async methods can return sync and with ValueTask there is no allocations. So say SslStream reads 16k frames, and then the client "sips" 2k from it, then the next call for 2k will be a sync very quick return and the 5 extra checks start to show.

@ahsonkhan
Copy link
Member

I do not think these APIs are good addition.

The reason mainly is

Would having these methods as internal only solve both concerns?

@jkotas
Copy link
Member

jkotas commented Nov 2, 2017

the 5 extra checks start to show

Memory(T[], int, int) has these precondinditions:

            if (array == null)
                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
            // This one should be completely optimized out for `Memory<byte>`.
            if (default(T) == null && array.GetType() != typeof(T[]))
                ThrowHelper.ThrowArrayTypeMismatchException();
            if ((uint)start > (uint)array.Length || (uint)length > (uint)(array.Length - start))
                ThrowHelper.ThrowArgumentOutOfRangeException();

So I see 3 extra checks - they will be like 7 well predicted instructions (and JIT may optimize some of it out). They should cost nothing compared to other overhead associated with Memory<T>.

Could you please share an example of realworld code that shows the problem?

@omariom
Copy link
Contributor

omariom commented Nov 3, 2017

@jkotas
Copy link
Member

jkotas commented Nov 3, 2017

I meant to e.g. have some measurement that shows how much faster a real world async Stream Read method will get if it has this special constructor. (Also, this is not what the code will look like because of the actual code is written using non-inlineable ThrowHelper.)

For better or worse, the framework APIs are designed with explicit argument checks. Yes, these argument checks add extra code everywhere. But the execution cost of these extra argument checks is relatively small. We have done experiments in the past with "fast and crash" modes where we removed things like bounds checking everywhere - the performance benefits of such modes were surprisingly small because of these checks tend to execute pretty fast. I am wondering whether we know what we are really getting out of these duplicated APIs.

I guess I should wait for the link to the API discussion to be posted to see what arguments were used to justify these APIs.

@ahsonkhan
Copy link
Member

I guess I should wait for the link to the API discussion to be posted to see what arguments were used to justify these APIs.

https://youtu.be/bHwCPVNQLwo?t=1h16m53s

@omariom
Copy link
Contributor

omariom commented Nov 3, 2017

@jkotas
I meant to illustrate that the comparison, as you said, is a few instructions with a couple of well predicted branches.

    mov esi, [ebp+0xc]
    test edx, edx
    jz L001e
    mov eax, [edx+0x4]
    cmp eax, esi
    jb L0049
    sub eax, esi
    cmp eax, [ebp+0x8]
    jb L0049

@jkotas
Copy link
Member

jkotas commented Nov 3, 2017

I have listened to the discussion.

My take is that if we are worried about redunant checks, we should tech the JIT to eliminate them. The JIT does elimate some of the duplicate checks already, e.g. null checks. For example, the code for this method:

static int FetchElementUsingSpan(int[] x)
{
    if (x == null)
        throw new ArgumentNullException();
    ReadOnlySpan<int> s = new ReadOnlySpan<int>(x);
    return s[0];
}

Is this:

test!My.FetchElementUsingSpan(Int32[]):
00007ff9`ea1108d0 56              push    rsi
00007ff9`ea1108d1 4883ec20        sub     rsp,20h
00007ff9`ea1108d5 4885c9          test    rcx,rcx
00007ff9`ea1108d8 7414            je      test!My.FetchElementUsingSpan(Int32[])+0x1e (00007ff9`ea1108ee)
00007ff9`ea1108da 488d4110        lea     rax,[rcx+10h]
00007ff9`ea1108de 8b4908          mov     ecx,dword ptr [rcx+8]
00007ff9`ea1108e1 83f900          cmp     ecx,0
00007ff9`ea1108e4 762b            jbe     test!My.FetchElementUsingSpan(Int32[])+0x41 (00007ff9`ea110911)
00007ff9`ea1108e6 8b00            mov     eax,dword ptr [rax]
00007ff9`ea1108e8 4883c420        add     rsp,20h
00007ff9`ea1108ec 5e              pop     rsi
00007ff9`ea1108ed c3              ret
...

Notice that there are two null checks: one in my method and other one in the ReadOnlySpan implementation itself. However, the JIT figured out that the second check is redundant and optimized it out. The JIT is not as good in eliminating the redundant checks for bounds checks today. It is something to fix in the JIT and not optimize API design around.

I do not think we should be adding duplicate APIs that differ just in having vs. not having argument validation. Having two ways to do a thing and making people to think about it is negative value for the framework. As always, I can be convinced that this make sense by data.

@Drawaes
Copy link
Contributor Author

Drawaes commented Nov 3, 2017

If the nulls are already removed and there is scope to look at either removal via the JIT then I would be good with removing the API review. It's been through the review process though so others would need to make the final choice.

How about this, I will do some measurements in an end to end scenario, if the measurements are in the margin of error we could remove It? If they aren't then there is something more to discuss.

@jkotas
Copy link
Member

jkotas commented Nov 3, 2017

@Drawaes Sounds good. Thanks!

@karelz
Copy link
Member

karelz commented Nov 21, 2017

FYI: The API review discussion was recorded - see https://youtu.be/bHwCPVNQLwo?t=4584 (23 min duration)

@tarekgh
Copy link
Member

tarekgh commented Jan 10, 2018

I am closing this review per the latest discussion on this issue. here are some comments:

  • We already have Span.DangerousCreate and we have another overlapping issue dotnet/corefx#26139 which proposing moving the Span.DangerousCreate to MemoryMarshal.DangerousCreate
  • @Drawaes can still provide his E2E scenario measurements and we can consider it if it is worth doing something more (e.g. adding MemoryMarshal.DangerousCreate which support creating Memory objects too). Or suggest any change in the dotnet/corefx#26139. @ahsonkhan is already aware of that and he can update dotnet/corefx#26139 as needed. @Drawaes, feel free to post your measurement numbers here or in dotnet/corefx#26139

@tarekgh tarekgh closed this as completed Jan 10, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests