Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide alignment guarantees for embedded binary data #61259

Open
lorcanmooney opened this issue May 11, 2022 · 13 comments
Open

Provide alignment guarantees for embedded binary data #61259

lorcanmooney opened this issue May 11, 2022 · 13 comments

Comments

@lorcanmooney
Copy link
Contributor

dotnet/runtime#60948 requires that embedded data should be aligned to match the pimitive data type, but even in the absence of that feature, it would be good to provide some sort of alignment guarantees for raw byte data. This would allow us to overlay a ReadOnlySpan of custom structs on the embedded data.

I imagine 8-byte alignment would be the minimum, but I can imagine anything up to 64-bytes being useful.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged Issues and PRs which have not yet been triaged by a lead label May 11, 2022
@Sergio0694
Copy link
Contributor

This would be very useful for blitting GUIDs over static spans, which is eg. done extensively in TerraFX.Interop.Windows (cc. @tannergooding). This also came up in microsoft/CsWinRT#1058 (comment).

@jcouv
Copy link
Member

jcouv commented Jun 15, 2022

We have a PR that's updating the compiler to make use of new CreateSpan(FieldHandle) API, which will allow optimizations on (ReadOnlySpan<TYPE>)new TYPE[] { ...constants... } to kick in for other types than byte.
As part of that PR, we're doing two things:
1. verifying the alignment of the blobs emitted (currently we always align to 8-bytes, but this could become more granular in the future, ie. compacting more while respecting the alignment required by a given type)
2. emitting .pack flags that capture the alignment requirement for each blob (this is used by IL rewriters)

I believe those two address your needs described above. Let me know if that's not the case.

@jcouv jcouv added this to the C# 11.0 milestone Jun 15, 2022
@jcouv jcouv removed the untriaged Issues and PRs which have not yet been triaged by a lead label Jun 15, 2022
@jcouv
Copy link
Member

jcouv commented Jun 15, 2022

Retracted my comment above. Upon re-reading, I think you're asking about alignment guarantees outside of this scenario.
Could you provide a specific example to make sure we're on the same page?

@Sergio0694
Copy link
Contributor

So, not sure about OP, but for my use case scenario, as well as TerraFX, consider this:

public static ref readonly Guid CLSID_D3D12Debug
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    get
    {
        ReadOnlySpan<byte> data = new byte[] {
            0xEB, 0x2A, 0x35, 0xF2,
            0x84, 0xDD,
            0xFE, 0x49,
            0xB9,
            0x7B,
            0xA9,
            0xDC,
            0xFD,
            0xCC,
            0x1B,
            0x4F
        };

        return ref Unsafe.As<byte, Guid>(ref MemoryMarshal.GetReference(data));
    }
}

This is used extensively in TerraFX (like, there's thousands of these), and it's a neat trick that can completely get rid of static constructors in cases like this, which greatly improves startup performance, and also reduces binary size. Now, this is technically undefined behavior given that a GUID should be aligned to 8 bytes boundaries. This can be worked around by generating the blittable sequence as a span of two long-s, as that'd then get the correct alignment for the RVA field (this idea was from @AaronRobinsonMSFT). This is kinda clunky thought, as you'd have to manually compute the right constants by combining groups of bytes into long-s, plus it wouldn't really be flexible (nor allow greater alignment if needed, say if someone wanted to build a complex lookup table to then load with SIMD instructions, where you might want alignment to 64, 128 or 256 bytes boundaries, or whatever).

As far as I understand, right now .NET 7/C# 11 will only provide alignment guarantees for the primitive used in the span, which means it's restricted to the supported primitive types, and it has the issue mentioned above in case you want a sequence of just bytes, but with an alignment greater than 1. It would be nice to be able to somehow just indicate the byte alignment for the generated .text field for a blitted span 🙂

cc. @tannergooding

@tannergooding
Copy link
Member

An example would be, for example, declaring the below, but providing some way for it to have 16-byte alignment

static ReadOnlySpan<uint> Data => new uint[] { 0, 1, 2, 3 };

This would allow it to be used with Vector128<uint> and aligned loads. There are other cases like Int128, UInt128, and other user-defined structs where one might want a higher than default guarantee on alignment.

I don't think this is strictly crucial for .NET 7, particularly given it would be hard to describe such arbitrary alignment support in all the relevant places. But it may be a good future consideration.

@tannergooding
Copy link
Member

Now, this is technically undefined behavior given that a GUID should be aligned to 8 bytes boundaries

Just noting @Sergio0694, Guid on Windows only needs/expects 4-byte alignment since it is a struct Guid { uint _x; ushort _y; ushort _z; fixed byte _w[8]; } effectively.

@Sergio0694
Copy link
Contributor

Sergio0694 commented Jun 15, 2022

Ah, well then we only need to use a ReadOnlySpan<int> and reinterpret that, though still clunky ahah 😄
It would really be nice to just be able to indicate arbitrary alignment in some way.

Completely spitballing here, but I can imagine just using an attribute, though there's the issue that C# currently doesn't support attributes on locals (even though AFAIK they're allowed in IL as per ECMA spec, right?). That is, something like this:

[Align(16)]
static ReadOnlySpan<uint> Data => new uint[] { 0, 1, 2, 3 };

// And for locals
public static ref readonly Guid CLSID_D3D12Debug
{
    get
    {
        [Align(4)]
        ReadOnlySpan<byte> data = new byte[] {
            0xEB, 0x2A, 0x35, 0xF2,
            0x84, 0xDD,
            0xFE, 0x49,
            0xB9,
            0x7B,
            0xA9,
            0xDC,
            0xFD,
            0xCC,
            0x1B,
            0x4F
        };

        return ref Unsafe.As<byte, Guid>(ref MemoryMarshal.GetReference(data));
    }
}

Just saying, it'd be neat if something like this was looked at in the future 😄

This would be kinda similar to how you can __declspec(align(N)) in C++ etc.

@jkotas
Copy link
Member

jkotas commented Jun 15, 2022

Ah, well then we only need to use a ReadOnlySpan and reinterpret that, though still clunky ahah

That still does not fully address the problem since the Guids created this way are non-portable. They won't work on big-endian machines.

dotnet/designs#46 proposal tried to fully address the problem. If we want to do something here, it should be something along these lines.

@Sergio0694
Copy link
Contributor

Sergio0694 commented Jun 15, 2022

"That still does not fully address the problem since the Guids created this way are non-portable. They won't work on big-endian machines."

Oh, yeah no, of course. To clarify, that specific comment was just meant to apply in the context of TerraFX.Interop.Windows, where big-endian is not an issue anyway as it's not one of the supported platforms. That would not be a general solution, absolutely 👍

@tannergooding
Copy link
Member

tannergooding commented Jun 15, 2022

On Windows the alignment is also not an issue, so I probably wouldn't try to drive a fix from that perspective. I can do whatever changes are necessary there, but the real fix would ideally be proper user-defined constants. Give to Int128, Vector2/3/4, Guid and other types what Decimal has in C# and DateTime has in VB ;)

@lorcanmooney
Copy link
Contributor Author

@jcouv My initial motivation was to be able to cast a ReadOnlySpan<Byte> to somthing like ReadOnlySpan<Language>, where I would manually define the type along these lines:

[StructLayout(LayoutKind.Explicit)]
struct Language
{
    [FieldOffset(0)] public Int64LE Value;
    [FieldOffset(8)] public Int16LE MacrolanguageIndex;
    [FieldOffset(10)] public Int16LE SuppressScriptIndex;
}

(The *LE types are just wrappers with implicit conversions to handle byte-order differences.)

Since then, my feelings on this have changed somewhat. I found maintaining the packing/layout to be tedious and error-prone, while interleaving all these fields didn't always match the access patterns I needed.

A secondary thought was that being able to specify alignment might help keep important data within cache lines, but this isn't something I've found the need to measure yet, so I'm probably the wrong person to comment.

@MichalPetryka
Copy link

Would guaranteeing a fixed alignment of for example 16 bytes (or maybe 32) for all types hurt much on binary sizes? This would mean SIMD loads from RVA statics would always be aligned (well at least until we get AVX512 or SVE support) and I assume it'd be enough for most usecases.

@jaredpar jaredpar modified the milestones: C# 11.0, 17.4 Jun 24, 2022
@jaredpar jaredpar modified the milestones: 17.4, C# 12.0 Aug 16, 2022
@jaredpar jaredpar modified the milestones: C# 12.0, Backlog Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants