[API Proposal]: Expose System.Runtime.Intrinsics.X86.Aes256 and Aes512 #86952

Open
e4m2 opened this issue May 31, 2023 · 7 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics

Comments

@e4m2

e4m2 commented May 31, 2023

Background and motivation

On some newer x86 CPUs, VAES provides wider variants of the encryption/decryption round instructions included in the older AES instruction set.

The 256-bit VEX-encoded variant (effectively operating on 2 AES blocks in parallel using a single instruction) has a separate CPUID flag and is not dependent on AVX512 support. Additionally, if AVX512F is supported, a 512-bit EVEX-encoded variant is available. As expected, EVEX-encoded 128 and 256-bit variants are available if AVX512VL is supported.

API Proposal

namespace System.Runtime.Intrinsics.X86;

[Intrinsic]
[CLSCompliant(false)]
public abstract class Aes256 : Aes
{
    internal Aes256() { }

    // This would depend on the VAES CPUID bit for VEX encoding and VAES + AVX512VL for EVEX
    public static new bool IsSupported { get => IsSupported; }

    [Intrinsic]
    public new abstract class X64 : Aes.X64
    {
        internal X64() { }

        public static new bool IsSupported { get => IsSupported; }
    }

    /// <summary>
    /// __m256i _mm256_aesdec_epi128(__m256i a, __m256i RoundKey)
    ///   VAESDEC ymm1, ymm2, ymm3/m256
    /// </summary>
    public static Vector256<byte> Decrypt(Vector256<byte> value, Vector256<byte> roundKey);

    /// <summary>
    /// __m256i _mm256_aesdeclast_epi128(__m256i a, __m256i RoundKey)
    ///   VAESDECLAST ymm1, ymm2, ymm3/m256
    /// </summary>
    public static Vector256<byte> DecryptLast(Vector256<byte> value, Vector256<byte> roundKey);

    /// <summary>
    /// __m256i _mm256_aesenc_epi128(__m256i a, __m256i RoundKey)
    ///   VAESENC ymm1, ymm2, ymm3/m256
    /// </summary>
    public static Vector256<byte> Encrypt(Vector256<byte> value, Vector256<byte> roundKey);
    
    /// <summary>
    /// __m256i _mm256_aesenclast_epi128(__m256i a, __m256i RoundKey)
    ///   VAESENCLAST ymm1, ymm2, ymm3/m256
    /// </summary>
    public static Vector256<byte> EncryptLast(Vector256<byte> value, Vector256<byte> roundKey);
}

[Intrinsic]
[CLSCompliant(false)]
public abstract class Aes512 : Aes
{
    internal Aes512() { }

    // This would depend on the VAES + AVX512F CPUID bits
    public static new bool IsSupported { get => IsSupported; }

    [Intrinsic]
    public new abstract class X64 : Aes.X64
    {
        internal X64() { }

        public static new bool IsSupported { get => IsSupported; }
    }
    
    /// <summary>
    /// __m512i _mm512_aesdec_epi128(__m512i a, __m512i RoundKey)
    ///   VAESDEC zmm1, zmm2, zmm3/m512
    /// </summary>
    public static Vector512<byte> Decrypt(Vector512<byte> value, Vector512<byte> roundKey);
    
    /// <summary>
    /// __m512i _mm512_aesdeclast_epi128(__m512i a, __m512i RoundKey)
    ///   VAESDECLAST zmm1, zmm2, zmm3/m512
    /// </summary>
    public static Vector512<byte> DecryptLast(Vector512<byte> value, Vector512<byte> roundKey);

    /// <summary>
    /// __m512i _mm512_aesenc_epi128(__m512i a, __m512i RoundKey)
    ///   VAESENC zmm1, zmm2, zmm3/m512
    /// </summary>
    public static Vector512<byte> Encrypt(Vector512<byte> value, Vector512<byte> roundKey);

    /// <summary>
    /// __m512i _mm512_aesenclast_epi128(__m512i a, __m512i RoundKey)
    ///   VAESENCLAST zmm1, zmm2, zmm3/m512
    /// </summary>
    public static Vector512<byte> EncryptLast(Vector512<byte> value, Vector512<byte> roundKey);
}

Note VAES doesn't include round key assist or inverse mix columns instructions.
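
Since the wider instruction set has no round key assist or inverse mix columns forms, one plausible approach is to prepare round keys with the existing 128-bit `Aes` intrinsics and then duplicate each key into both lanes of a wider vector. The following is a minimal sketch under that assumption; `WideRoundKeys`, `Broadcast`, and `BroadcastForDecrypt` are hypothetical names used only for illustration and are not part of the proposal.

```csharp
using System;
using System.Runtime.Intrinsics;
using X86Aes = System.Runtime.Intrinsics.X86.Aes;

static class WideRoundKeys
{
    // Hypothetical helper: copies one 128-bit round key into both
    // 128-bit lanes of a 256-bit vector.
    public static Vector256<byte> Broadcast(Vector128<byte> roundKey)
        => Vector256.Create(roundKey, roundKey);

    // Hypothetical helper: applies the existing 128-bit AESIMC intrinsic
    // to a round key, then widens the result to 256 bits.
    public static Vector256<byte> BroadcastForDecrypt(Vector128<byte> roundKey)
    {
        if (!X86Aes.IsSupported)
            throw new PlatformNotSupportedException("AES intrinsics are not available.");

        return Broadcast(X86Aes.InverseMixColumns(roundKey));
    }
}
```

The key schedule itself would still be computed once at 128 bits (for example with `Aes.KeygenAssist`); only the finished round keys are widened.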

API Usage

Same as AES intrinsics, except using wider vector types.
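
To make the "wider vector types" point concrete, the sketch below emulates what a 256-bit `Encrypt` would compute by running the existing 128-bit `Aes.Encrypt` on each lane independently. This is only an illustration of the lane-wise semantics, not the proposed implementation: `WideAesSketch` and `Encrypt256` are hypothetical names, and a real VAES-backed intrinsic would do this in a single instruction.

```csharp
using System;
using System.Runtime.Intrinsics;
using X86Aes = System.Runtime.Intrinsics.X86.Aes;

static class WideAesSketch
{
    // Hypothetical stand-in for a 256-bit Encrypt: performs one AES
    // encryption round on two blocks, one per 128-bit lane, each with
    // the round key held in the matching lane of roundKey.
    public static Vector256<byte> Encrypt256(Vector256<byte> value, Vector256<byte> roundKey)
    {
        if (!X86Aes.IsSupported)
            throw new PlatformNotSupportedException("AES intrinsics are not available.");

        Vector128<byte> lower = X86Aes.Encrypt(value.GetLower(), roundKey.GetLower());
        Vector128<byte> upper = X86Aes.Encrypt(value.GetUpper(), roundKey.GetUpper());
        return Vector256.Create(lower, upper);
    }
}
```

A caller would typically pack two independent blocks into `value` and place the same round key in both lanes of `roundKey`, which is how two blocks are processed in parallel.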

Alternative Designs

No response

Risks

No response

References

https://en.wikichip.org/wiki/x86/vaes
https://en.wikipedia.org/wiki/AVX-512#VAES
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#othertechs=VAES

@e4m2 e4m2 added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label May 31, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 31, 2023
@ghost

ghost commented May 31, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

On some newer x86 CPUs, VAES provides wider variants of the encryption/decryption round instructions included in the older AES instruction set.

The 256-bit VEX-encoded variant (effectively operating on 2 AES blocks in parallel using a single instruction) has a separate CPUID flag and is not dependent on AVX512 support. Additionally, if AVX512F is supported, a 512-bit EVEX-encoded variant is available. As expected, EVEX-encoded 128 and 256-bit variants are available if AVX512VL is supported.

API Proposal

namespace System.Runtime.Intrinsics.X86;

public abstract class Vaes : Aes
{
    public static new bool IsSupported { get; }

    public new abstract class X64 : Aes.X64
    {
        public static new bool IsSupported { get; }
    }

    public static Vector256<byte> Decrypt(Vector256<byte> value, Vector256<byte> roundKey);
    public static Vector256<byte> DecryptLast(Vector256<byte> value, Vector256<byte> roundKey);

    public static Vector256<byte> Encrypt(Vector256<byte> value, Vector256<byte> roundKey);
    public static Vector256<byte> EncryptLast(Vector256<byte> value, Vector256<byte> roundKey);
}

public abstract class Avx512Vaes : Avx512F
{
    public static new bool IsSupported { get; }

    public new abstract class X64 : Avx512F.X64
    {
        public static new bool IsSupported { get; }
    }

    public new abstract class VL : Avx512F.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte> Decrypt(Vector128<byte> value, Vector128<byte> roundKey);
        public static Vector128<byte> DecryptLast(Vector128<byte> value, Vector128<byte> roundKey);

        public static Vector256<byte> Encrypt(Vector256<byte> value, Vector256<byte> roundKey);
        public static Vector256<byte> EncryptLast(Vector256<byte> value, Vector256<byte> roundKey);
    }

    public static Vector512<byte> Decrypt(Vector512<byte> value, Vector512<byte> roundKey);
    public static Vector512<byte> DecryptLast(Vector512<byte> value, Vector512<byte> roundKey);

    public static Vector512<byte> Encrypt(Vector512<byte> value, Vector512<byte> roundKey);
    public static Vector512<byte> EncryptLast(Vector512<byte> value, Vector512<byte> roundKey);
}

Note VAES doesn't include round key assist or inverse mix columns instructions.

API Usage

Same as AES intrinsics, except using wider vector types.

References

https://en.wikichip.org/wiki/x86/vaes
https://en.wikipedia.org/wiki/AVX-512#VAES
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#othertechs=VAES

Alternative Designs

No response

Risks

No response

Author: e4m2
Assignees: -
Labels:

api-suggestion, area-System.Runtime.Intrinsics, untriaged

Milestone: -

@MichalPetryka
Contributor

    public new abstract class VL : Avx512F.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte> Decrypt(Vector128<byte> value, Vector128<byte> roundKey);
        public static Vector128<byte> DecryptLast(Vector128<byte> value, Vector128<byte> roundKey);

        public static Vector256<byte> Encrypt(Vector256<byte> value, Vector256<byte> roundKey);
        public static Vector256<byte> EncryptLast(Vector256<byte> value, Vector256<byte> roundKey);
    }

What's the benefit of exposing the EVEX variants separately?

@colejohnson66

Technically, VAES and AVX512-F only indicate 512-bit operation; AVX512-VL is required to use 128-bit and 256-bit vectors, hence the dedicated subclass. If you're asking why they exist when the VEX forms exist, it's probably just to allow the user to choose which prefix to use, or for consistency.

@tannergooding
Member

If you're asking why they exist when the VEX forms exist, it's probably just to allow the user to choose which prefix to use, or for consistency.

Users don't get to pick the prefix; the JIT picks the optimal form. For V512, EVEX is required. For V128/V256, it will pick VEX if only the lower 16 SIMD registers are used. If LSRA must allocate an extended SIMD register (one of the upper 16), or decides that it can take advantage of another EVEX-only feature such as embedded broadcast or embedded masking, then it may use EVEX instead (assuming the hardware is capable, of course).

We intentionally do not duplicate APIs needlessly, so we shouldn't need them under Avx512Vaes.VL.


Given that, given the future for Avx10, and given what we had previously opted for with VPCLMULQDQ (#95772), we should likely name these Aes256 and Aes512, respectively.

However, depending on how we decide to do Avx10, it may be "better" to have these in nested V256/V512 classes under Aes and Pclmulqdq instead.

@tannergooding
Member

@e4m2, could you update to follow the same general pattern as Pclmulqdq for now and then I can get this reviewed after or as part of the Avx10 work, at which point we'll know the desired pattern?

@e4m2 e4m2 changed the title [API Proposal]: Expose System.Runtime.Intrinsics.X86.Vaes and Avx512Vaes [API Proposal]: Expose System.Runtime.Intrinsics.X86.Aes256 and Aes512 Feb 8, 2024
@e4m2
Author

e4m2 commented Feb 8, 2024

Thanks for the input. Updated!

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Feb 8, 2024
@terrajobst
Member

terrajobst commented Feb 29, 2024

Video

  • Looks good as proposed. We opted to change the surface area slightly to match the approach approved for Avx10, where we have nested V256 and V512 classes.

namespace System.Runtime.Intrinsics.X86;

public abstract class Aes
{
    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<byte> Decrypt(Vector256<byte> value, Vector256<byte> roundKey);
        public static Vector256<byte> DecryptLast(Vector256<byte> value, Vector256<byte> roundKey);
        public static Vector256<byte> Encrypt(Vector256<byte> value, Vector256<byte> roundKey);   
        public static Vector256<byte> EncryptLast(Vector256<byte> value, Vector256<byte> roundKey);
    }

    public abstract class V512
    {
        public static Vector512<byte> Decrypt(Vector512<byte> value, Vector512<byte> roundKey);   
        public static Vector512<byte> DecryptLast(Vector512<byte> value, Vector512<byte> roundKey);
        public static Vector512<byte> Encrypt(Vector512<byte> value, Vector512<byte> roundKey);
        public static Vector512<byte> EncryptLast(Vector512<byte> value, Vector512<byte> roundKey);
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 29, 2024