Proposal: Fixed-size buffers enhancements #126

Closed
stephentoub opened this Issue Jan 28, 2015 · 21 comments

Projects

None yet
@stephentoub
Member

Background

Interop with code implemented with C and C++ often involves fixed-size buffers. For multiple releases C# has supported the creation of fixed-size arrays on the stack, using the ‘stackalloc’ keyword:

static unsafe void SomeMethod()
{
    int* block = stackalloc int[100];   // array of 100 integers on the stack
    …
}

It’s also supported the creation of fixed-size arrays embedded within structures, using the ‘fixed’ keyword:

internal unsafe struct MyBuffer
{
    public fixed char FixedBuffer[128]; // array of 128 chars in the struct
}

Problem

In both of these cases, however, the creation of the fixed-size array may only happen in an unsafe context, hence the usage of the 'unsafe' keyword on the method and struct, respectively, in each of these examples. Further, in the case of fixed-size buffers in structs, the type can only be of a limited set: bool, byte, short, int, long, char, sbyte, ushort, uint, ulong, float or double.

As C#'s reach is extended and interop becomes even more common-place, it’s desirable to be able to work with fixed-size arrays, in particular embedded in structures, while still getting the benefits of safe code, such as bounds checking.

Solution

If the language supports ref returns (#118), C# can support this without further syntax, with simply relaxing the existing constraints on fixed-size buffers and changing the code generated for these types. For example, consider the following type:

internal struct SYSTEM_POWER_LEVEL
{
    public byte Enable;
    public fixed byte Spare[3]; // inline array, as in C
    public uint BatteryLevel;
    public POWER_ACTION_POLICY PowerPolicy;
    public SYSTEM_POWER_STATE MinSystemState;
}

Exactly how this is implemented is up to the compiler, but logically the support could be thought of as being implemented as an anonymous struct with a field for every element and an indexer for accessing them:

internal struct SYSTEM_POWER_LEVEL
{
    public byte Enable;
    public <>s__ArrayStruct1 Spare;
    public uint BatteryLevel;
    public POWER_ACTION_POLICY PowerPolicy;
    public SYSTEM_POWER_STATE MinSystemState;
}

internal struct <>s__ArrayStruct1
{
    public byte Item0, Item1, Item2;
    public ref byte this[int index]
    {
        get {
            switch(index) {
                case 0: return ref Item0;
                case 1: return ref Item1;
                case 2: return ref Item2;
                default: Environment.FailFast();
            }
        }
    }
}

In the past, developers have resorted to hand-crafting such a solution, implementing helper types like this manually. Instead, the compiler would be capable of implementing all of that boilerplate for the developer in as efficient a manner as possible. (If there were concerns around the efficiency of the new code generation, it could potentially only be used when using a type that was previously unsupported or when using a fixed-size buffer in a safe context. Or alternatively new syntax for expressing this new form could be devised.)

@HaloFour

Seems to me that it would be preferable if fixed just required that the struct include a StructLayoutAttribute with an explicit Size value. That would allow the compiler to generate the buffer helper class to be the correct size and also to calculate out the appropriate offsets when dealing with a specific member of one of the structs.

@stephentoub
Member

@HaloFour, wouldn't you then need to define a new type for every different desired size?

Or maybe I'm just not understanding the proposal... can you provide a code example?

@mikedn
mikedn commented Jan 29, 2015

AFAIR simply setting the size via the StructLayoutAttribute doesn't work in certain interop scenarios. Basically it works only if the type containing the fixed array is blittable. If actual marshaling needs to be performed then nothing gets copied because the marshaler copies only the declared fields, not the entire struct.

@HaloFour

@stephentoub You should be at least specifying StructLayoutAttribute for all of the structs that you intend to use with interop anyway otherwise the layout of the struct in memory is entirely up to the runtime which could do such things as reorder the fields in order to pack them better, etc.

If the size and layout of the struct is known at compile time then I think there should be no reason that C# couldn't support fixed arrays of any struct. All the C# compiler does today is emit a helper struct containing a single member of the primitive type and explicitly sets the size of that helper struct to the known size of the entire fixed buffer. Then when attempting to reference an element from that buffer it calculates out the byte offset from the start position of the buffer.

[StructLayout(LayoutKind.Sequential)]
public unsafe struct Foo {
    public int X;
    public fixed int Y[10];
}

// turns effectively into

[StructLayout(LayoutKind.Sequential)]
public unsafe struct Foo {
    [StructLayout(LayoutKind.Sequential, Size = 40, Pack = 0)] // buffer size is known to be sizeof(int) * 10
    public struct FooFixedBufferHelper {
       public int FixedElementField;
    }

    public int X;
    public Foo.FooFixedBufferHelper Y;
}

// and

Foo f = new Foo();
f.Y[5] = 1234;

// turns into

Foo f = new Foo();
int* p = f.Y;
p[5] = 1234; // IL contains an offset of 20, or sizeof(int) * 5

If we also want to discuss making it easier to write blittable structs I'd be fine with that conversation. I'm just starting with what the C# compiler does today, and I think that with some minor enhancements the C# compiler could handle fixed buffers of arbitrary structs as long as they are annotated so that their size and offsets are predictable. There are other struct tricks that the CLR allows and can be accomplished manually with C# that could be nice like unions also.

@stephentoub
Member

You should be at least specifying StructLayoutAttribute for all of the structs that you intend to use with interop anyway otherwise the layout of the struct in memory is entirely up to the runtime which could do such things as reorder the fields

If you don't specify a StructLayoutAttribute on a struct, the C# compiler will emit one for you to specify Sequential.

just starting with what the C# compiler does today

Thanks; I understand what the compiler does today. This proposal wasn't just about supporting arbitrary structs, but about doing so with safe code.

@HaloFour

Thanks; I understand what the compiler does today. This proposal wasn't just about supporting arbitrary structs, but about doing so with safe code.

Hrm, okay. To me the very concept of trying to directly manipulate memory for interop purposes is inherently "unsafe", even if you manage to avoid the keyword. Something like this feels like it meets the letter of the law while not meeting the spirit. With what are you proposing to "interop" with? Just looking for more background as to the use cases.

@stephentoub
Member

To me the very concept of trying to directly manipulate memory for interop purposes is inherently "unsafe", even if you manage to avoid the keyword

At some level most code one writes in C# is doing interop; the question is how far down you can push and isolate the unsafeness. Yes, with this proposal I can still mess up my DllImport signature and cause problems, but if I get the DllImport correct, then the resulting struct can be used in safe code, with bounds-checked accesses to its contained fixed-size arrays, etc. In contrast, today even if I get the DllImport correct, I'm still required to use unsafe code to access the resulting struct, and I'm still able to cause serious problems when I use the resulting struct because I really am just indexing into whatever memory I want, without bounds checks, etc.

@HaloFour

Well you also have MarshalAs(UnmanagedType.ByvalArray) which gives you that functionality but at the cost of performance since in managed code you have to endure the allocation of the array and, IIRC, the marshaller copies the data before and after the P/Invoke call.

I guess that there probably isn't a cleaner way to implement an inline fixed-length array of arbitrary structs with safe code. Seems kludgy having to emit out all of those synthetic fields in place of each of the elements but the net result is no different than .size. The indexer bit still irks me, maybe because your bounds checking example uses Environment.FailFast. I wonder what the perf penalty is to that over pointer arithmetic. I still need to fully wrap my head around the prereq proposal of ref return.

This is again one of those places where I feel that the language is doing something "un-clean" to work around limitations that could be fixed in the runtime proper.

@xen2
xen2 commented Feb 2, 2015

If we also want to discuss making it easier to write blittable structs I'd be fine with that conversation. I'm just starting with what the C# compiler does today, and I think that with some minor enhancements the C# compiler could handle fixed buffers of arbitrary structs as long as they are annotated so that their size and offsets are predictable. There are other struct tricks that the CLR allows and can be accomplished manually with C# that could be nice like unions also.

I am also quite interested in having fixed buffer for arbitrary (blittable) struct.

When thinking about it, the problem I found with current construct (struct with custom size) was that using IntPtr would result in different StructLayout Size depending on if we are in 32 or 64 bits. Not sure if there is any way to work around that in emitted MSIL (to avoid runtime change).
Of course, not allowing IntPtr in blittable/fixable struct might be good enough for most use cases.

Also, "blittable" could become a generic constraint, so that we could use "T*" in unsafe generic methods.

@gafter gafter added the 1 - Planning label Feb 2, 2015
@MadsTorgersen MadsTorgersen was assigned by gafter Feb 2, 2015
@RobertCvn

Will this work in conjuction with MemoryMapped IO to read arrays of structures directly from disk?

@jakobbotsch

I asked about why 'fixed' did not support arbitrary types on Stackoverflow a while ago: http://stackoverflow.com/questions/18839085/why-can-fixed-size-buffers-only-be-of-primitive-types

I don't see why your solution with the ref returns could not be supported though, albeit it does not look very efficient. To support it efficiently, would it not need a change to the CLI spec (as Hans Passant quotes)?

@lorcanmooney
Contributor

I ported the SHA-3 algorithm to C# recently, and this would have made it a bit easier. The algorithms state is modelled with a 5x5 array of 64-bit integers, so supporting multidimensional arrays is something I'd like to see. On the surface at least, it seems like it would be a fairly easy transformation back to a 1-dimensional offset.

@gafter
Member
gafter commented Oct 17, 2015

@lorcanmooney Your wish is my command. The following now works in C# 6.

        ulong[,] array = new ulong[5, 5];
        array[1, 2] += array[2, 3];
@lorcanmooney
Contributor

Ah, classic @gafter.

@HaloFour

@gafter Given the issue I had assumed that @lorcanmooney meant multidimensional fixed buffers.

@lorcanmooney
Contributor

@HaloFour You are of course correct. My terminology was ... imprecise. On further reflection, maybe I should open a separate feature request for the multidimensional part, since it may not be directly tied to this proposal.

@HaloFour

@lorcanmooney Well, the "enhancements" in the title is also quite vague. 😀

Unless I'm missing something multidimensional fixed buffers seems like it would just be a question of math, emitting the correct multiplier per rank.

@gafter gafter added 0 - Backlog and removed 1 - Planning labels Nov 20, 2015
@MadsTorgersen MadsTorgersen was unassigned by gafter Nov 20, 2015
@asvishnyakov
Contributor

👍 It's maybe helpful not only for interop. When you write algorithms, like SHA in example above, you may want to use fixed-sized arrays.

@andresantacruz

Any progress here? Deciding if I go with C# or C++ and this is a critical issue getting me away from C#.

@MadsTorgersen
Contributor

This feels like something that should be done as a CLR change. The code gen proposed is worrisome in its complexity.

Instead the CLR could allow fixed buffers with bounds checking, and the compiler could then target that.

@xen2
xen2 commented Aug 16, 2016 edited

Would definitely like it better as a runtime feature too.

Should this discussion continue on coreclr then?
Reopen dotnet/coreclr#735 or a new issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment