Champion: fixed-sized buffers #1314

Open
1 of 5 tasks
jcouv opened this issue Feb 7, 2018 · 98 comments

Comments

@jcouv
Member

jcouv commented Feb 7, 2018

Introduce a pattern that would allow types to participate in fixed statements.

LDM history:

Tagging @VSadov @jaredpar

@jcouv jcouv added this to the 8.0 candidate milestone Feb 7, 2018
@glenn-slayden

public fixed DXGI_RGB GammaCurve[1025];

// ...
DXGI_RGB my_rgb;

How is this different from...

[StructLayout(LayoutKind.Explicit, Size = 1025)]
public struct DXGI_RGB { }

// ...
DXGI_RGB my_rgb;

@tannergooding
Member

The latter is a struct which is 1025 bytes in size.

The former is an inline array of 1025 elements, where each element is a DXGI_RGB struct instance.
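To make the distinction concrete, here is a minimal sketch that compares the two sizes. It is only an illustration: the `fixed DXGI_RGB GammaCurve[1025]` syntax discussed here does not compile in shipped C#, so the sketch substitutes the `[InlineArray]` attribute (C# 12 / .NET 8) as a stand-in, and the names `OpaqueBlob`, `GammaCurveBuffer`, and `SizeComparison` are hypothetical. `DXGI_RGB` is assumed to be three 32-bit floats.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Assumed shape of the native DXGI_RGB struct: three 32-bit floats (12 bytes).
public struct DXGI_RGB
{
    public float Red, Green, Blue;
}

// An opaque blob: the runtime is only told that the struct occupies 1025 bytes.
[StructLayout(LayoutKind.Explicit, Size = 1025)]
public struct OpaqueBlob { }

// Stand-in for the proposed syntax: 1025 DXGI_RGB elements stored inline.
[InlineArray(1025)]
public struct GammaCurveBuffer
{
    private DXGI_RGB _element0;
}

public static class SizeComparison
{
    // Size in bytes of the explicit-size struct.
    public static int OpaqueBytes() => Unsafe.SizeOf<OpaqueBlob>();

    // Size in bytes of the inline buffer (element count times element size).
    public static int BufferBytes() => Unsafe.SizeOf<GammaCurveBuffer>();

    // Size in bytes of a single element.
    public static int ElementBytes() => Unsafe.SizeOf<DXGI_RGB>();
}
```

Under these assumptions, `SizeComparison.OpaqueBytes()` should report 1025, while `SizeComparison.BufferBytes()` should report 1025 times the element size, which is the difference described above.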

@portal-chan

Shouldn't this

public fixed DXGI_RGB GammaCurve[1025];

be

public fixed DXGI_RGB[1025] GammaCurve;

The array thingy [] in C# always goes after the type name, not the variable name like in C/C++ so it looks kinda wrong to have it the other way around here. Unless I'm misunderstanding some part of this syntax 🤷‍♀️

@IS4Code

IS4Code commented Feb 11, 2018

The array thingy [] in C# always goes after the type name, not the variable name like in C/C++ so it looks kinda wrong to have it the other way around here. Unless I'm misunderstanding some part of this syntax

No, fixed-size buffers use this syntax to distinguish them from standard fields. GammaCurve is a fixed-size buffer field, not a field having an array type. DXGI_RGB[1025] is not a valid type name.

However, since a special type is always generated for such a field, why not introduce a new special type syntax fixed T[size] that refers to this type?

public static fixed int[5] GetValues() => default(fixed int[5]);

This is effectively producing a by-val array. Then the syntax fixed DXGI_RGB[1025] GammaCurve; would be reasonable and valid.

@tannergooding
Member

Yes. Fixed-size buffers are an existing language syntax (which this proposal is extending) and are different from normal arrays (which are heap allocated and tracked by the GC). The syntax difference ensures the two features (arrays and fixed sized buffers) can be differentiated and updated independently without worrying about possible ambiguities.

@tannergooding
Member

However, since a special type is always generated for such a field, why not introduce a new special type syntax fixed T[size] that refers to this type?

@IllidanS4, see ref returns and Span<T>. However, that requires a stackalloc to allow use within functions and has various limitations to ensure that you aren't accessing memory from a stack frame that no longer exists.

@IS4Code

IS4Code commented Feb 11, 2018

No, there is a difference between my idea and Span<T>. Span<T> is a reference to an arbitrary location in the memory, but fixed T[size] is a by-value array. Moving a value of this type would move the whole array. In essence, fixed T[2] is like (T, T), fixed T[3] is like (T, T, T) and so on.
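The by-value versus by-reference distinction can be sketched with features that shipped later. This is not the `fixed T[size]` syntax suggested in this comment; it assumes the C# 12 `[InlineArray]` attribute as the closest available equivalent, and `Int3` and `ByValueDemo` are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;

// Three ints stored inline; behaves like a value, roughly like (int, int, int).
[InlineArray(3)]
public struct Int3
{
    private int _element0;
}

public static class ByValueDemo
{
    // Assigning the struct copies every element, so the copy is independent.
    public static (int Original, int Copy) CopyThenMutate()
    {
        Int3 a = default;
        a[0] = 1;
        Int3 b = a;   // copies all three elements
        b[0] = 99;    // does not affect a
        return (a[0], b[0]); // (1, 99)
    }

    // A Span<int> over the same struct is only a view: writes go to the original.
    public static int MutateThroughSpan()
    {
        Int3 a = default;
        Span<int> view = a; // implicit conversion, no copy
        view[0] = 42;
        return a[0]; // 42
    }
}
```

The first method shows the value semantics described here; the second shows why a `Span<T>` is a different thing: it refers to storage that lives elsewhere.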

@tannergooding
Member

That would be a completely separate/distinct feature and would need its own proposal.

@airbreather

airbreather commented Feb 24, 2018

How insane would it be to "just" allow consumers to use the existing fixed stuff as Span<T>?

Picturing...

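unsafe struct MyStruct
{
    public fixed ulong Values[4];
}

class Program
{
    static void Main()
    {
        MyStruct s = default(MyStruct);
        // Hypothetical: this conversion is the idea being proposed, not existing behavior.
        Span<ulong> values = s.Values;
        Console.WriteLine(values.Length); // 4
    }
}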

Seems like "all it would take" (heh) is an attribute that tells the compiler the size of the field, and then the offset to the start of the buffer from the start of the struct could be either derived from metadata if sequential / explicit layout or computed live if auto layout (maybe at JIT time? I don't know how this one would work, sorry)... combine a ref to the struct instance itself, the offset to start of fixed-size buffer, and the length of that fixed-size buffer (from the attribute), and that's enough to make what I think is a perfectly safe Span<T>.

Is this insane?

@airbreather

airbreather commented Feb 24, 2018

Hmm, to answer my own question, without runtime support, that (edit: "that" = just exposing an easy safe way for callers to get Span<T> over the existing fixed-size buffer stuff) doesn't feel like enough to support the "any element type" part of the proposal, since if the buffer contains reference types, the GC would need to see each element of the fixed-size buffer as a separate reference to track. So it seems like there would still be a need to do something like the transformation in the current proposal, at least for reference types.

Right?

@IS4Code

IS4Code commented Feb 24, 2018

No, because there is nothing unsafe in the layout of the actual compiler-generated struct. Each element is represented by a separate field with the correct type, and since the whole struct is a field of the containing type, the runtime has perfect knowledge of all the references which may be stored inside.
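As a rough illustration of this point, a buffer of references can be written by hand as ordinary fields, which the GC already knows how to track. The sketch below is an assumption about what such a struct could look like, not the proposal's actual generated code; `StringBuffer3`, `Holder`, and `GcVisibleDemo` are hypothetical names.

```csharp
using System;

// Hand-written approximation of a three-element buffer of references.
// Each element is a normal field, so the runtime sees three tracked references.
public struct StringBuffer3
{
    private string _item0;
    private string _item1;
    private string _item2;

    public string this[int index]
    {
        get => index switch
        {
            0 => _item0,
            1 => _item1,
            2 => _item2,
            _ => throw new IndexOutOfRangeException(),
        };
        set
        {
            switch (index)
            {
                case 0: _item0 = value; break;
                case 1: _item1 = value; break;
                case 2: _item2 = value; break;
                default: throw new IndexOutOfRangeException();
            }
        }
    }
}

// The buffer is embedded by value in its containing type.
public sealed class Holder
{
    public StringBuffer3 Names;
}

public static class GcVisibleDemo
{
    // Stores a reference, forces a collection, and reads the reference back.
    public static string RoundTrip()
    {
        var holder = new Holder();
        holder.Names[1] = "tracked";
        GC.Collect();
        return holder.Names[1];
    }
}
```

Nothing here needs `unsafe` code, which is the argument being made: the layout is plain fields, only the indexing is awkward to write by hand.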

@airbreather

airbreather commented Feb 24, 2018

No, because there is nothing unsafe in the layout of the actual compiler-generated struct. Each element is represented by a separate field with the correct type, and since the whole struct is a field of the containing type, the runtime has perfect knowledge of all the references which may be stored inside.

Sorry, my comment was a bit ambiguous. I was positing a problem with my own Span<T> idea w.r.t. reference types and identifying this as a reason that we would still need something like the current proposal's compiler-generated struct with separate fields.

@johnwason

A lot of C++ libraries have support for specifying the size of fixed arrays using template arguments. The fixed arrays are still allocated on the stack, but the size is specified at compile time. For instance, the armadillo library has the mat::fixed<n_rows, n_cols> class. (Reference) Having this capability in C# would be very useful for linear algebra applications.

@4creators

A lot of C++ libraries have support for specifying the size of fixed arrays using template arguments.

@johnwason See [Proposal] Const blittable parameter as a generic type parameter - constexpr used for blittable type parametrization

@Thealexbarney

I've played around with working around these limitations in order to operate directly on blittable data structures. It becomes more work to maintain once you start nesting structures or arrays of structures, but it's functional.

Simple example:

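using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public class BlittableStructReader
{
    // Backing storage; must be at least 0x800 bytes to cover Layout.
    private Memory<byte> _data;

    public BlittableStructReader(Memory<byte> data) => _data = data;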

    private ref Layout Data => ref Unsafe.As<byte, Layout>(ref _data.Span[0]);

    public Span<byte> FixedArray1 => _data.Span.Slice(0, 0x100);
    public Span<byte> FixedArray2 => _data.Span.Slice(0x100, 0x100);

    public uint Field1
    {
        get => Data.Field1;
        set => Data.Field1 = value;
    }

    public uint Field2
    {
        get => Data.Field2;
        set => Data.Field2 = value;
    }
    
    [StructLayout(LayoutKind.Explicit, Size = 0x800)]
    private struct Layout
    {
        [FieldOffset(0x200)] public uint Field1;
        [FieldOffset(0x204)] public uint Field2; // uint to match the Field2 property above
    }
}

@MadsTorgersen MadsTorgersen moved this from 8.0 Candidate (not started) to 9.0 Candidate in Language Version Planning Apr 29, 2019
@gafter gafter modified the milestones: 8.0 candidate, 9.0 candidate Apr 29, 2019
@gafter gafter removed this from the 9.0 candidate milestone Jun 5, 2019
@DeafMan1983

DeafMan1983 commented Jan 30, 2024

Hello, why do you use Span?
I don't really know much about Span.
Why not use a fixed buffer, as in this example?
In C:

typedef struct
{
    float X, Y;
} Vec2;
...
typedef struct
{
    Vec2 vert[4];
    float distFromCamera;
    int planeIdInPoly;
} ScreenSpacePoly;

In C#:

public struct Vec2
{
    public float X, Y;
}
...
public struct ScreenSpacePoly
{
    [FixedBuffer(typeof(Vec2), 4)] // <- the compiler reports an error on this line
    public FixedBufferOfVec2_4<Vec2> vert;
    [CompilerGenerated, UnsafeValueType, StructLayout(LayoutKind.Sequential, Pack = 0)]
        public struct FixedBufferOfVec2_4<T>
        {
            private T value;
            public T this[int i] => i <= 4 ? Unsafe.Add(ref Unsafe.AsRef(in value), i) : throw new IndexOutOfRangeException();
            public Span<T> Span => MemoryMarshal.CreateSpan(ref value, 4);
        }

    public float distFromCamera;
    public int planeIdInPoly;
}

Is this correct or wrong? And how do I limit the index, for example to no more than 4 elements?

Thanks!

@DeafMan1983

@DeafMan1983 You have to use "InlineArray" type in C# 12 or above. MS is afraid to update their poorly outdated IL feature set so they revert to C# lang hacks instead of introducing proper C# feature improvements. The reason behind this is from library loading/linking between libs compiled with different C# versions.

For example, if I have a new project which loads an older C# lib. Fixed fields build out as a fixed buffer size not handled by the JIT to say its size (which was a mistake). Now if I change how fixed field buffers work, the old lib may have code laid out in a way that's no longer compatible. But really the C# lang should have been updated to deal with fixed field primitives differently from fixed field custom types instead of adding in a hacky attribute type. At min this should have been auto generated for you.

When .NET Core was being made they should have just broken old library support from .NETFW libs. This poor choice now means we are left with poorly designed choices from the early 2000's in an array of areas. It would have been so much nicer to have been forced to re-compile stuff with newer compilers for newer runtime models.

What do you mean by InlineArray()?

I find it horrible, and I don't know how to pass the data: with fixed struct ptrs[4], or just ... = new struct[4]?

I'd like to know.

@sgf

sgf commented Feb 1, 2024

The implementation of InlineArray is of little value, and it may even be unnecessary to implement it, since you can use a source code generator to generate a structure replacement yourself.

The source code generator looks like this:

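// Assumes size, sizeOfStruct, BaseStructType and spc (the SourceProductionContext)
// are already defined by the surrounding generator code.
StringBuilder sb = new();
for (int i = 0; i < size; i++)
    sb.AppendLine($"    [FieldOffset({i * sizeOfStruct})] public {BaseStructType} p{i};");

// Interpolated verbatim string; literal braces in the generated code are doubled.
var inlineArrayCode = $@"
using System.Runtime.InteropServices;

namespace xxx;

[StructLayout(LayoutKind.Explicit, Pack = 1)]
struct Buffer_{BaseStructType}_{size}
{{
{sb}
}}";

spc.AddSource("FixedSizeBuffer.g.cs", inlineArrayCode);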

@tannergooding
Member

The source code generator looks like this

This is not strictly equivalent at the ABI level and may break for interop on some future platforms, whereas InlineArray is specially supported by the runtime to do the correct things.

Additionally, it is missing the general language integration, optimizations, and other features that are present for InlineArray, making it an overall worse option.

@DeafMan1983

@tannergooding you are right. But I wish C# (language version 13) had fixed buffers of structs. Why do I need to use InlineArray? How do you handle a struct like System.Numerics.Vector2 as a public field of a struct,
for example public fixed Vector2 vert[4];, instead of declaring a separate struct? Why does C# make this complicated? I expected C and C# to be nearly the same here. Thanks for everything!

@zezba9000

zezba9000 commented Feb 1, 2024

@DeafMan1983 You have to do this. Yes it sucks and is verbose but at least it works.

using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[InlineArray(4)]
struct FixedArray
{
	private Vector2 element;
}

[StructLayout(LayoutKind.Sequential)]
struct Buff
{
	public FixedArray a;
}

@tannergooding
Member

The reason that simply private fixed Vector2 a[4]; doesn't work is because the pre-existing fixed-size buffers feature has existed for many years (back to C# 1.0, afair) and thus has existing semantics, meaning, typing, etc.

Changing that meaning 20 years into the game would've been breaking, especially for the large amounts of foundational interop code that has existed and been in use up until this point. It's unfortunate, but that's simply how all programming languages work. You cannot simply break back-compat on a whim, due to the potential for large downstream breaks and other issues.

The new InlineArray feature works, has the necessary language integration, and avoids many of the complications that the older fixed-size buffers feature had. Given that the primary usage of this is in low-level or foundational code, the minor additional verbosity is worth the tradeoff.

The language is then free to get newer features in the future which may build upon InlineArray or use it behind the scenes as an implementation detail. This might be used for params Span<T> or some types of collection expressions, for example.

@sgf

sgf commented Feb 1, 2024

The reason that simply private fixed Vector2 a[4]; doesn't work is because the pre-existing fixed-size buffers feature has existed for many years (back to C# 1.0, afair) and thus has existing semantics, meaning, typing, etc.

But they are syntactically compatible.

It can even be said that from a grammatical perspective, the C# designers took this into consideration from the beginning and reserved the possibility of implementation.

@CyrusNajmabadi
Member

But they are syntactically compatible.

But not semantically. We don't want literal identical syntactic constructs to have different semantic meaning like this. We considered that as part of the work, and quickly determined there were too many problems going down that route.

@sgf

sgf commented Feb 1, 2024

This reminds me of a problem that I have not understood for a long time. Custom unmanaged structs are obviously similar to CLR-predefined structures (Int16, Int32, ...), but why does the compiler prevent them from being defined as constants?
But strings are reference types and can be defined as constants.
Perhaps this is the difference between a custom unmanaged struct and a CLR-predefined structure.

In other words, the compiler does some special processing for them. But why can’t this special treatment be smoothed over and treated equally? Make custom unmanaged structures first-class citizens.

@sgf

sgf commented Feb 1, 2024

But they are syntactically compatible.

But not semantically. We don't want literal identical syntactic constructs to have different semantic meaning like this. We considered that as part of the work, and quickly determined there were too many problems going down that route.

I'm not sure what the semantic difference is.
Because in my opinion, there is no difference in memory between a custom unmanaged structure and a CLR-predefined structure.
Essentially, they are just fixed-size blocks of memory.

@DeafMan1983

DeafMan1983 commented Feb 1, 2024

@DeafMan1983 You have to do this. Yes it sucks and is verbose but at least it works.

using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[InlineArray(4)]
struct FixedArray
{
	private Vector2 element;
}

[StructLayout(LayoutKind.Sequential)]
struct Buff
{
	public FixedArray a;
}

How do I assign to it from an array of Vector2 or a pointer to Vector2?
Example:

    [InlineArray(4)]
    public struct FixedBufferVec2_4
    {
        private Vector2 vert;
    }

    public struct ScreenViewport
    {
        public FixedBufferVec2_4 vert;
    }

    static void Main()
    {
        Vector2[] verts = [
            new Vector2(-40, 40),
            new Vector2(40, 40),
            new Vector2(40, -40),
            new Vector2(-40, -40)
        ];

        ScreenViewport svp = new()
        {
            vert = verts // still error
        };

        fixed (Vector2* vertptrs = verts)
        {
            svp.vert = verts; // still error, too
        } 
    }

How am I supposed to work with the fixed buffer? Do you mean I need to go through Span and ReadOnlySpan?
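One possible way to fill the buffer from an array, sketched under the assumption of C# 12 / .NET 8, is to go through the implicit conversion from an inline array to `Span<T>` and copy the elements. The `ViewportFactory` class and `FromArray` method are hypothetical helpers, not part of any library; no pointers or `unsafe` code are needed for this approach.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

[InlineArray(4)]
public struct FixedBufferVec2_4
{
    private Vector2 _element0;
}

public struct ScreenViewport
{
    public FixedBufferVec2_4 vert;
}

public static class ViewportFactory
{
    // Copies the first four vertices of the array into the inline buffer.
    public static ScreenViewport FromArray(Vector2[] verts)
    {
        ScreenViewport svp = default;
        Span<Vector2> destination = svp.vert;   // view over the four inline elements
        verts.AsSpan(0, 4).CopyTo(destination); // throws if verts has fewer than 4 items
        return svp;
    }
}
```

A direct assignment such as `vert = verts` cannot work because an array is a heap reference while the inline buffer is a value; the elements have to be copied. Individual elements can also be set by index, for example `svp.vert[0] = new Vector2(-40, 40);`.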

@CyrusNajmabadi
Member

CyrusNajmabadi commented Feb 1, 2024

but why does the compiler prevent them from being defined as constants?

Because from the language level, they are not constants. Someone would have to spec out what this means at the lang level in order for the compiler to allow it. That hasn't been done yet. Why? Because no one has done it. :)

But strings are reference types and can be defined as constants.

From the language level it's totally fine for reference types to be constants.

But why can’t this special treatment be smoothed over and treated equally?

Because no one has written a spec for how that would work. We'd need to start with a discussion on the lang design there. Then it would need to be championed. Then a good spec would have to be made. Then it would have to get implemented.

Nothing is stopping this, except the enormous amount of work above.

@sgf

sgf commented Feb 1, 2024

but why does the compiler prevent them from being defined as constants?

Because from the language level, they are not constants. Someone would have to spec out what this means at the lang level in order for the compiler to allow it. That hasn't been done yet. Why? Because no one has done it. :)

But strings are reference types and can be defined as constants.

From the language level it's totally fine for reference types to be constants.

But why can’t this special treatment be smoothed over and treated equally?

Because no one has written a spec for how that would work. We'd need to start with a discussion on the lang design there. Then it would need to be championed. Then a good spec would have to be made. Then it would have to get implemented.

Nothing is stopping this, except the enormous amount of work above.

Drawing on the history of other languages: once a language goes astray during its development, it must either correct course as soon as possible, or keep the mistake and continue to patch around it.
Judging from the development of C++, keeping mistakes and patching around them is obviously not a good choice.
Judging from the development of Python (Python 2 to Python 3), it may be a good idea to overturn some wrong things in stages and start anew.

Keeping bugs and patching around them leads to counterintuitive behavior.

Although C# is rich in features, it also has many implicit rules. Most of them are issues left over from history.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Feb 1, 2024

@sgf I don't really have any idea what you're asking for :)

We have no intent on breaking the world with our new language versions. At the same time, we are continuing to improve and develop on the features we have (including iterating on ones recently shipped). If there are changes you want to see, please open discussions on them and we can follow the process we have for doing language development and improvements.

@sgf

sgf commented Feb 1, 2024

Whether this feature is implemented or not, I no longer care. Thanks to @CyrusNajmabadi for answering all my questions.
I have created an open source library: https://github.com/sgf/BitX
This repository contains two features that are not being actively implemented in C#, but are indeed very important:

  1. Bit-Field
  2. Custom Struct Fixed-Size-Buffer

Unfortunately, the repository has not yet been fully tested and is only a prototype.
I currently don't have enough time to polish it up and write all the tests.
But if anyone likes it, feel free to refer to it.

A key part of the repository, the use of the source code generator, was completed under the guidance of @CyrusNajmabadi. Thanks to him for teaching me how to use source code generators.

@tannergooding
Member

Because in my opinion, there is no difference in memory between a custom unmanaged structure and a CLR-predefined structure.
Essentially, they are just fixed-size blocks of memory.

To be a little blunt here, there isn't an opinion on this topic and you're fundamentally incorrect.

Every single architecture (such as x86 vs x64 vs Arm32 vs Arm64 vs RISC-V vs LoongArch vs PowerPC vs ...) and Operating System (such as Windows vs Linux vs MacOS vs Android vs iOS vs ...) has what is known as an ABI or Application Binary Interface that defines the fundamentals of what is considered a "primitive", the semantics of those primitive types, how those types can be composed together to form larger data structures, and the semantics of how that data is laid out in memory, passed into or returned from methods.

The base platform ABI is typically defined by and around C and every other language in the world must interop with this base ABI in some fashion. Higher level languages, including C++, which have additional concepts may then expand that ABI with additional details/semantics on how they behave (this can include exception handling, memory handling, initialization of global or static state, etc).

Some languages largely don't care about interop with the base platform ABI and go and do effectively their own thing. This can be particularly prevalent when working with higher level language constructs. Other languages care deeply about interop and utilize this to allow what is typically fast and near seamless integration between the higher level language and the base ABI.

C# tends closer to the latter camp and is itself designed not only to be familiar to the rest of the C family of languages, but also to allow easy interop with C.


Types like int, long, float, or double are primitives and have a well-defined constant size that cleanly maps into the underlying C ABI primitives defined for most platforms/architectures. However, not everything about them is constant and things like "natural alignment" or "natural packing" can differ from platform to platform.

Once you start getting into user-defined structs, you are composing multiple primitive types together and thus you start having to get into additional considerations around layout that can not be static and thus their size cannot be constant. For example, there are common ABIs where int M() and S M() given struct S { int x; } are semantically different and result in a different set of parameter passing rules. Almost all platforms treat int M() as returning int directly in register, however some platforms treat S M() even for a trivial wrapper type like struct S { int x; } as requiring a hidden "return buffer" and thus the underlying raw calling convention may look like ref S M(out S s).

Likewise, there is a difference between struct S { long x; int y; int z; } and struct S1 { long x; int y; } struct S2 { S1 s; int z; } because each struct must be considered in isolation so that arrays and other layouts are consistent. Thus, for S most 64-bit architectures will treat it as Size=16, Alignment=8, Packing=8. While for S1 you get Size=16, Alignment=8, Packing=8 (because it has to have 4 implicit padding bytes at the end to maintain the natural alignment of long x;) and S2 gets Size=24, Alignment=8, Packing=8 (because int z; has to start after S1 which was 16 bytes big and then the entire struct needs to add 4 more padding bytes at the end to maintain the natural alignment).

There are also cases like struct S { int x; long y; } where this can be Size=12, Alignment=4, Pack=4 on some 32-bit platforms but it can also be Size=16, Alignment=8, Pack=8 on other 32-bit platforms or on 64-bit platforms. There are even architectures where void M(int x); vs void M(uint x); have different ABI semantics because of implicit sign or zero-extension that parameter passing requires.
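The struct examples above can be checked on a given machine with a small probe. This is only a sketch: `LayoutProbe` is a hypothetical helper, the struct named `T` stands for the `struct S { int x; long y; }` case (renamed to avoid clashing with the first `S`), and the values returned reflect whatever layout the current runtime and platform choose rather than a fixed guarantee.

```csharp
using System.Runtime.CompilerServices;

// The layouts discussed above, declared with the default sequential layout.
public struct S  { public long x; public int y; public int z; }
public struct S1 { public long x; public int y; }
public struct S2 { public S1 s; public int z; }
public struct T  { public int x; public long y; }

public static class LayoutProbe
{
    // Each method reports the size, in bytes, that the current runtime assigns.
    public static int SizeOfS()  => Unsafe.SizeOf<S>();
    public static int SizeOfS1() => Unsafe.SizeOf<S1>();
    public static int SizeOfS2() => Unsafe.SizeOf<S2>();
    public static int SizeOfT()  => Unsafe.SizeOf<T>();
}
```

On a typical 64-bit platform the first three are expected to match the figures quoted above (16, 16, and 24), with `S2` larger than `S` even though both hold the same three fields. The `T` case is the one to compare across 32-bit and 64-bit targets.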


Effectively, there are a ton of rules that exist here and C# needs to support targeting platforms, like IL, where you might have an abstract virtual machine and where the actual layout is computed later (such as by the JIT, AOT, or even an object linker). Even C/C++ often operates the same way with its compilation process, as the source files are typically compiled to intermediate objects and then the linker takes in those object files and may do final layout computations and some architecture specific optimizations, trimming, etc.

So, C# can't just go and make fixed Vector2 x[4] work using the existing semantics of fixed-size buffers. It can't just trivially make it start working the same way as InlineArray either, due to the breaking changes that would cause. The language design team carefully considered what was feasible, what could cover the majority use-case for an already niche feature with a little bit of effort, how it would impact the ecosystem, etc.

The decision ended up with where InlineArray is today, and no, not everyone is going to be happy. There will be people who complain and say that the language should be more willing to take breaking changes, people who say the language is going in the wrong direction, people saying every negative comment you could think of.

But ultimately, there is always some subset of users that do this for every feature (and often different users each time). Even some of the most loved and uncontroversial C# features have had people coming in and complaining about them and saying it was done wrong. However, overall the language design team makes good decisions and the language continues to grow in popularity, general user love, and places it can be used. So they are likely getting most things consistently right ;)

@sgf

sgf commented Feb 1, 2024

Because in my opinion, there is no difference in memory between a custom unmanaged structure and a CLR-predefined structure.
Essentially, they are just fixed-size blocks of memory.

To be a little blunt here, there isn't an opinion on this topic and you're fundamentally incorrect.

I think languages should define their own standards system independently of the platform.
When interacting with the platform, make appropriate conversions (if necessary).
In this way, the structure of the language system itself is constant. When we discuss int, it must be int32 in little endian order, not others.

When I send network data using little-endian data on macOS, I don't need to convert it to big-endian because my sender (C# macOS app) and receiver (x86 Windows server) are both little-endian.
Even though my client is running on macOS, only the IP address and port need an endianness conversion, and only when setting them before sending (assuming I call the macOS API directly).

Of course as you said, the ABI needs to be taken into account when appropriate. This should be done at the OS-API interaction layer for interaction and encapsulation. But the interaction and encapsulation here should not affect the structure definition inside the language. (Of course, if I/O-intensive operations cause a performance problem, that needs to be dealt with specifically, such as by directly defining ABI-compatible structures inside the language and operating on them, so the language should also leave customization capabilities for such scenarios.)
If the library writer needs to interact with the API, then they should decide how to be OS-ABI compliant, which is probably the purpose of something like P/Invoke. This should result in a set of specifications and platform-specific processing tools.

As for assembly-level topics such as registers, I really don't know much about them. As far as I know, on x86, the compiler can decide how to pass parameters, because I know that Delphi and VC pass parameters slightly differently (it seems that Delphi uses registers, while VC uses push and pop). So I conclude that where it is not necessary to expose interaction points (for example, exported functions on the Windows x86 platform), the compiler and language internals can maintain their own architecture.

Of course, I'm not sure if this involves the underlying complexity of the compiler compiling IL to asm. I don't know enough about the underlying details; thanks for the guidance.

@tannergooding
Member

I think languages should define their own standards system independently of the platform.

Like I mentioned, some platforms do this. Even C#/.NET do this for some concepts that have no common mapping in C, like shared generics. However, this is typically limited to only concepts where that's required and I'll go into that a bit further down.

When interacting with the platform, make appropriate conversions (if necessary).

I think you grossly underestimate the overall impact of deviating from the underlying platform. Any call into native code must make a transition and fix up the handling. This can and typically does happen for almost any underlying operation including things like:

  • Memory Management
  • File IO
  • Networking
  • Device Interaction
  • Threading
  • etc

Thus, a typical runtime is making many of these transitions per second and the overhead can rapidly add up and lead to pessimizations of your code in terms of overall usability and perf. Additionally, you are more likely to run into fundamental incompatibilities with the underlying architecture and your code actually becomes less portable by trying to make it more stringent.

When I send network data using little-endian data on macOS, I don't need to convert it to big-endian because my sender (C# macOS app) and receiver (x86 Windows server) are both little-endian.

You're making an assumption about the sender/receiver here and in practice cannot do that. There are typically dozens of machines involved in any network request that need to inspect packet headers to ensure the data is passed from point A to point B correctly and efficiently.

We define a fixed standard because that ensures all machines interacting with this data can understand it correctly. We define it in a way that allows efficient interaction of the data with more primitive hardware. We define it in a way accounting for the overhead of network transfers and understanding that the minimal changes to fixup byte ordering are "free" on most modern CPUs and can be done as part of the load/store operation, so the endianness conversion is typically a non-issue.
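To make the fixed wire format concrete, here is a minimal sketch (not code from this thread) of writing a port and an address in network byte order regardless of the host's endianness. The class and method names are hypothetical; `BinaryPrimitives` is the existing `System.Buffers.Binary` API.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, for illustration only.
public static class NetworkOrderSketch
{
    // Encodes a 16-bit port followed by a 32-bit address in big-endian
    // (network) order, whatever the byte order of the host CPU is.
    public static byte[] Encode(ushort port, uint address)
    {
        var buffer = new byte[6];
        BinaryPrimitives.WriteUInt16BigEndian(buffer.AsSpan(0, 2), port);
        BinaryPrimitives.WriteUInt32BigEndian(buffer.AsSpan(2, 4), address);
        return buffer;
    }

    // Decodes the same layout back into host-order values.
    public static (ushort Port, uint Address) Decode(ReadOnlySpan<byte> buffer)
    {
        ushort port = BinaryPrimitives.ReadUInt16BigEndian(buffer.Slice(0, 2));
        uint address = BinaryPrimitives.ReadUInt32BigEndian(buffer.Slice(2, 4));
        return (port, address);
    }
}
```

Because the byte order is spelled out at the boundary, neither endpoint has to assume anything about the other's hardware.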

Of course as you said, the ABI needs to be taken into account when appropriate. This should be done at the OS-API interaction layer for interaction and encapsulation

As indicated above, the cost of this is non-trivial and incredibly frequent. Even for cases where you simply need to change the registers around, you're incurring a cost per argument/return value in order to shuffle everything around given typically limited register space. When you start talking about composed data structures, you're now going to incur a cost per-field in many cases and thus you're going to be allocating, copying, and freeing much more memory in order to execute what would otherwise be these simple ABI calls.

Unlike something such as endianness fixup, shuffling registers around and touching memory has a real cost, and that cost rapidly adds up. It can also confuse the CPU, which often has support for more typical coding patterns, and lead to less efficient execution of your code (energy and perf-wise).

As far as I know, on x86, the compiler can decide how to pass parameters

Not really. There are some cases where a non-public leaf method (that is, a method which calls nothing else) with few callers may get specialized. However, this also breaks debugging, stack traces, and other considerations, so it's not typically done. The perf benefits of it are also incredibly minute.

Some platforms, particularly legacy C code on 32-bit x86, do have multiple calling conventions available (cdecl vs stdcall vs thiscall vs fastcall vs varargs vs ...). However, this is mostly due to unimportant historical reasons and isn't something regularly encountered more broadly. Mixing these calling conventions is typically avoided where possible due to the costs mentioned above. Windows OS APIs on 32-bit x86, for example, almost exclusively use stdcall, and if you don't follow stdcall when invoking the function, the function won't be able to understand the data you pass in.


Now, it would be possible to define, for the purposes of constants over user-defined types, a fixed layout for the IL definitions of such data. But that must then be paired with a special runtime API that the compiler can defer to so that data can be recognized and fixed up on any platforms where the layout differs.

But, that is very different from treating something like sizeof(T) as a constant, which is notably largely unnecessary given that the JIT/AOT will already convert it to a constant instead. C#->IL is really only the first step of the compilation process and many of the interesting optimizations are really happening as part of the IL->asm phase done by the JIT/AOT/etc. This is very similar, at least in concept, to C->obj (compiler) and the obj->asm (linker).
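As a rough illustration of that distinction (a sketch with hypothetical type names, not code from this thread): `sizeof(int)` is a C# compile-time constant, while the size of a user-defined struct is only known once the runtime has computed the layout, so it is queried with `Unsafe.SizeOf<T>()`, which the JIT/AOT compiler can then treat as a constant.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical struct; its padded size is decided by the runtime's layout rules.
public struct Pair
{
    public byte Tag;
    public long Value;
}

public static class SizeSketch
{
    // Allowed: the size of a built-in primitive is a C# constant expression.
    public const int IntSize = sizeof(int);

    // Not allowed: sizeof(Pair) is not a constant expression in C#,
    // so it cannot initialize a const field.
    // public const int PairSize = sizeof(Pair);

    // The size is asked for at run time instead; the value may differ by platform.
    public static int PairSize() => Unsafe.SizeOf<Pair>();
}
```

The exact value returned by `PairSize()` depends on the target's alignment rules, which is why it is left unstated here.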

@DeafMan1983

If I understand correctly, the ABI works at the level of raw binary, like 00000011 (binary or hexadecimal).
Example: assembly (asm) such as nasm or fasm, etc.
A boot loader is similar: it starts and loads an image, like on 8-bit systems.

Of course, fixed-size types like float, double, byte and more (only numeric types) have an easy format, like @tannergooding said.

Example: in C style:
#define HWND unsigned long

For C# style: You need to create a struct with an implicit operator.

Imagine about fixed HWND

typedef struct {
    HWND wins[4];
    int n_wins;
} AllWindows;

In C#

public unsafe struct AllWindows
{
    public fixed HWND wins[4];
    public int n_wins;
}

If the C# analyzer doesn't like it, then we use InlineArray

[InlineArray(4)]
public struct FixedBufferWins
{
    private HWND hwnd;
    // ... ToSpan etc
}

public unsafe struct AllWindows
{
    public FixedBufferWins wins;
    public int n_wins;
}

Thanks for the longer explanations.
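A fuller version of the InlineArray variant might look like the sketch below. It assumes C# 12 on .NET 8 or later; the `HWND` wrapper, the `_element0` field name, and the `InlineArraySketch` helper are hypothetical and only stand in for whatever the real interop types would be.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a native window handle, with implicit operators.
public readonly struct HWND
{
    public readonly nint Value;
    public HWND(nint value) => Value = value;
    public static implicit operator HWND(nint value) => new HWND(value);
    public static implicit operator nint(HWND handle) => handle.Value;
}

// The runtime lays this out as 4 consecutive HWND elements.
[InlineArray(4)]
public struct FixedBufferWins
{
    private HWND _element0;
}

// No unsafe context is needed for the inline array field.
public struct AllWindows
{
    public FixedBufferWins wins;
    public int n_wins;
}

public static class InlineArraySketch
{
    public static AllWindows Create()
    {
        var all = new AllWindows();
        all.wins[0] = (nint)0x1000; // element access is provided by the language
        all.wins[1] = (nint)0x2000;
        all.n_wins = 2;
        return all;
    }

    public static long SumHandles(ref AllWindows all)
    {
        Span<HWND> span = all.wins; // inline arrays convert implicitly to Span<T>
        long sum = 0;
        for (int i = 0; i < all.n_wins; i++)
        {
            sum += span[i].Value;
        }
        return sum;
    }
}
```

The span conversion is what replaces a hand-written ToSpan helper: the buffer can be indexed, sliced, and passed to span-based APIs directly.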

@zezba9000

zezba9000 commented Feb 3, 2024

The one thing that makes zero sense to me is why this approach wasn't automated in a way that makes sense to 99% of people.
You could have added C#-level support for fixed arrays of classes and structs that just auto-generates this special FixedBuffer struct with the attribute for us (as so many other C# features do [closures, async, etc.]).

Older compilers would fail to compile new code, as has always been the case, and no IL changes would be needed, etc. Same approach, just with syntax sugar (which everyone wants, btw), making it WAY more practical with less confusing syntax. It also encourages its use. I knew this was going to happen. The new syntax expects you to understand underlying runtime details, which is horrible. I like understanding how things work too, but man, I only have so much time to learn every little quirk in something because of legacy IL flaws. I do encourage you to consider adding syntax sugar to your solution here.

Also I think you (i.e. C# devs) are assuming a chicken-and-egg problem here. You think people don't use it because it doesn't exist (so how could they), and now it exists in a way that's not useful for anything but the rarest of things because it's out of phase with code, when in reality this type of optimization is desired in many cases, not just interop. There have been many cases where just having a fixed array of classes on the stack or heap would be useful (forget interop). The approach now only continues to encourage allocations in situations where they are otherwise not needed, because the effort needed is annoying and clutters things up.

Anyway everyone knows who I am. I've voiced this for years. Voicing it again because I'm clearly not the only one here that thinks this. Stuff like this in C# is what separates it from other managed langs, giving it a balance of performance with practicality / productivity. In this case it's kinda failed IMO.

@HaloFour
Contributor

HaloFour commented Feb 3, 2024

@zezba9000

The one thing that makes zero sense to me is why this approach wasn't automated in a way that makes sense to 99% of people.

Because there would be too many places where the fact that they aren't arrays would cause problems:

https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-04-10.md#fixed-size-buffers
https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-05-01.md#fixed-size-buffers

That doesn't preclude that it may be considered again, but until those issues can be smoothed out, they're not going to introduce a syntax that only creates more confusion and friction.

@tannergooding
Member

The topic of why it was done this way was also covered in depth on the various issues and discussion threads, and specifically with you on the .NET Evolution discord.

There are future directions the language desires to go and this feature is effectively the raw backing runtime feature that makes them possible. Doing "more" around the feature, rather than the minimal support up front (much as is done for other core runtime features like say LayoutKind.Explicit, or P/Invoke, or ...) would have potentially delayed the feature for another release or more. It would have potentially burned syntax that was desired for such future features due to it becoming a breaking change (much as has already been covered as being breaking for fixed-size buffers, etc).

@CyrusNajmabadi
Member

The one thing that makes zero sense to me is why this approach wasn't automated in a way that makes sense to 99% of people.

All of this was documented. And you have been involved in discussions going over the aspects here. Furthermore, we've made it clear we're interested in continuing investments in this area, but that we needed the core support there to make many important scenarios possible, and to lay the foundation of that future work. You are seemingly acting as if the intent on our part was to only ship in the current form and never do anything else here. Hopefully stating this here, as well as in the other forum locations, will make this clear :)

@zezba9000

zezba9000 commented Feb 3, 2024

Ok guys, I know I'm annoying. But you need to realize you're literally in these talks 24/7. This isn't my day job as I've said before. Lots of people are confused about different aspects here for an array of reasons. I'm just more vocal about this. Most devs I know just move onto a different lang for reasons like this but I see C# as still the most useful lang today in my life. So sorry if I cry about it.

And please understand that most of what has been relayed to me over the years, in little bits every so often on this topic, is something along the lines of (solution found, no reason to pursue further, or it's not a feature worth looking into too much more because no one cares, it's a niche, etc). That's changed since years ago, and I understand I don't know all the little runtime details making this annoying for you guys to tackle at all. However, I'll read arguments for why something can't be done and it reads like it's impossible to improve it, or it just sounds fundamentally wrong to me because I'm missing yet another complex legacy IL or runtime flaw or complexity I just haven't thought about. If I'm wrong here I'm sorry for what comes off as pestering, but again I can't read people's minds and have to make some level of deduction from the way things are said when I read new information. I'm sure I've earned myself a dark star of disapproval here, but I've dug a hole far enough that a couple more scoops isn't going to matter.

With the links @HaloFour shared, what does "if a public fixed size buffer shrinks" mean? How would they shrink? Or does this just mean GC collected?

@CyrusNajmabadi
Member

CyrusNajmabadi commented Feb 3, 2024

Most devs I know just move onto a different lang for reasons like this

Honestly, please consider that your personal experiences may not at all reflect anything about the greater programming community. We do this sort of analysis, and we're ok with the plan here and how it is being executed. I get that you don't like it and you want different things, but continuing to harp on it, and continuing to ignore the very real and relevant reasons given for why things were done this way is not productive.

but I see C# as still the most useful lang today in my life. So sorry if I cry about it.

Instead of crying about it, please just constructively contribute to the future design discussions that overlap these areas.

but again I can't read people's minds

No one is asking for that. But it's not constructive to continually act as if there was no thought or information provided as to why things were done a particular way, after numerous conversations on the topic. We're continually open and transparent about the decision making that went into this. Including for the short and long terms. You may not like that reasoning, but rejecting it out of hand does not serve any sort of constructive purpose.

what does "if a public fixed size buffer shrinks" mean?

The code is updated to have a smaller fixed size buffer. This clearly would have significant downstream impacts on consumers (who have hardcoded knowledge about that length).

@zezba9000

your personal experiences may not at all reflect anything about the greater programming community.

100%. However, most people I know and have worked with don't engage with the "community" and withhold their feelings, to the point of it being detrimental to a product, like not reporting a bug they need fixed. I tend to bug report issues on products I use.

continuing to ignore the very real and relevant reasons given for why things were done this way is not productive.

Understood. I really do apologize for this. I really should hold back more than I do sometimes.

continually act as if there was no thought or information provided

Some info I've gotten in the past was not consistent and initially this feature came off as a hard no back in the day. Maybe that stuck with me wrong IDK.

Anyway thanks for your work. You guys really are a good team for all my distasteful criticism I give.

@sgf

This comment was marked as off-topic.
