Discussion for new ISA extensions #132

LekKit · 2024-03-25T10:52:12Z

This issue is a place for discussions about newer unprivileged ISA extensions.

Each extension should be evaluated for the following qualities:

Mandatory for a known usecase
Benefit for guest (Better performance or more features)
Benefit for advanced user (Better debugging capabilities inside/outside the guest, testing guest compliance)
Ease of implementation
Emulation performance
JITability (If a new extension is too complicated to JIT, yet the guest uses it instead of the usual JITed ALU ops, this might cause a performance hit)
Conformance tests available (Implementing extensions without a standardized test suite is very error-prone)
Compiler support

Extensible list of ratified ISA extensions beyond rv64imafdc

Bitmanip family

Zba - Bitmanip address generation

Mandatory for a known usecase
Benefit for guest
Benefit for user (Testing guest compliance)
Ease of implementation: Moderate
Emulation performance
JITability (May fallback to ALU lowering without significant perf hit)
Tests available (https://github.com/LekKit/riscv-tests)
Compiler support

Zbb - Bitmanip basic bit-manipulation

Mandatory for a known usecase
Benefit for guest
Benefit for user (Testing guest compliance)
Ease of implementation: Moderate
Emulation performance
JITability (May fallback to ALU lowering without significant perf hit)
Tests available (https://github.com/LekKit/riscv-tests)
Compiler support

Zbc - Bitmanip carry-less multiplication

Mandatory for a known usecase
Benefit for guest: Unknown
Benefit for user (Testing guest compliance)
Ease of implementation
Emulation performance
JITability: Unknown
Tests available (https://github.com/LekKit/riscv-tests)
Compiler support

Zbs - Bitmanip single-bit instructions

Mandatory for a known usecase
Benefit for guest
Benefit for user (Testing guest compliance)
Ease of implementation: Moderate
Emulation performance
JITability (May fallback to ALU lowering without significant perf hit)
Tests available (https://github.com/LekKit/riscv-tests)
Compiler support

Floating-point family

Q - 128-bit IEEE754 floating point

Zfh - 16-bit IEEE754 floating point, Zfhmin - minimal half-precision subset (loads, stores, conversions)

Mandatory for a known usecase
Benefit for guest
Benefit for user (Testing guest compliance)
Ease of implementation: Significant complexity
Emulation performance
JITability
Tests available (https://github.com/LekKit/riscv-tests)
Compiler support

Zfa - Additional floating-point instructions

Vector family

V - Vector Operations

Mandatory for a known usecase
Benefit for guest
Benefit for user (Testing guest compliance)
Ease of implementation: Extremely complex
Emulation performance: Unknown, might be fast enough with big vector sizes
JITability: Likely none
Tests available
Compiler support

K - Vector Cryptography

Mandatory for a known usecase
Benefit for guest: Unknown (But things like OpenSSL are likely to use it if they aren't already)
Benefit for user (Testing guest compliance)
Ease of implementation: Very complex
Emulation performance: Unknown, might be fast enough with big vector sizes
JITability: Likely none
Tests available
Compiler support

Memory/atomics related extensions

Zawrs - Wait on memory reservation (Almost like a hint)

Mandatory for a known usecase
Benefit for guest (Linux spinlock optimization merged into 6.9)
Benefit for user (Testing guest compliance)
Ease of implementation: Relatively easy, could be implemented as a pause on some hosts
Emulation performance: Unlikely to have significant impact
JITability: Not needed
Tests available
Compiler support (Usable in assembly)

Zacas - Compare and Swap

Mandatory for a known usecase
Benefit for guest: Unknown
Benefit for user (Testing guest compliance)
Ease of implementation: Easy
Emulation performance: Unlikely to have significant impact
JITability: Probably not needed
Tests available
Compiler support
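
To illustrate why Zacas is rated easy above, here is a minimal sketch of how a 64-bit compare-and-swap could be emulated with C11 atomics. The function name `emu_amocas_d` and the idea that guest memory is reachable through a host `_Atomic` pointer are assumptions made for this example, not code from the project.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: emulate a 64-bit amocas on host memory.
 * Returns the value that was in memory before the operation,
 * which is what the guest sees in rd afterwards. */
static uint64_t emu_amocas_d(_Atomic uint64_t *addr, uint64_t expected, uint64_t desired)
{
    /* On failure, 'expected' is overwritten with the current memory value;
     * on success it already equals the old memory value. */
    atomic_compare_exchange_strong(addr, &expected, desired);
    return expected;
}
```

The default sequentially consistent ordering is stricter than some amocas variants need; a real implementation could relax it based on the aq/rl bits.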

Zicbom, Zicboz - Cache management (flush, invalidate, prefetch, zero cache block)

Mandatory for a known usecase
Benefit for guest
Benefit for user (Testing guest compliance)
Ease of implementation: Relatively easy
Emulation performance: Unlikely to have significant impact
JITability: Probably not needed
Tests available (https://github.com/LekKit/riscv-tests)
Compiler support

Hints

Zihintpause - Pause hint

Extra

Zicond - Integer Conditional operations

Zcb - Code size reduction extension

LekKit · 2024-03-25T13:03:53Z

Implemented Zihintpause in 133e45f.

LekKit · 2024-03-25T18:44:31Z

Implemented Zba (interpreter only for now) in 2a57cff

LekKit · 2024-03-25T19:42:22Z

Implemented Zbs (interpreter only for now) in dcb7021

LekKit · 2024-03-25T23:03:18Z

Implemented Zbb (interpreter only for now) in a6d4593

X547 · 2024-03-26T09:41:33Z

Why is implementing the vector extension considered "extremely complex"?

LekKit · 2024-03-26T09:46:22Z

I am not even sure I understand it entirely, even after reading the spec several times. It also seems to duplicate every usual scalar instruction, but vectorized?
Some FPU instructions are already complex to emulate, and now we need to copy-paste them and wrap them in a scalar loop.

Secondly, the hardest part for me is the JIT. But maybe I'll invent some better ways, or maybe the interpreter will be fast enough that guests with interpreted vectors won't be slower than ones using a JITed scalar loop.

Be aware that "extremely complex" != "I won't implement". It means it will likely take a lot of time, and that it might be imperfect in regard to perf/quality for even longer.

LekKit · 2024-03-26T09:48:19Z

Anyhow, Bitmanip, Zicond and Zcb seem like very good candidates: they are much easier to implement in both the interpreter & JIT, and they are already supported very well in GCC.

Vector is more of a "we'll get there eventually" target right now.

LekKit · 2024-03-26T09:56:47Z

If you want to work on it then no problem. I am simply focusing on other things.

It would help a lot if there were a test suite for V instructions similar to riscv-tests.

LekKit · 2024-03-26T14:08:55Z

Implemented Zbc (interpreter only, probably no JIT planned) in 08094c5

Now the entire Bitmanip family is supported in the interpreter
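
For readers unfamiliar with Zbc, here is a minimal reference sketch of what carry-less multiplication computes, written as a plain bit loop. It is an illustration of the instruction semantics under the assumption of XLEN = 64, not the code from the commit above; the function names are made up for this example.

```c
#include <stdint.h>

/* clmul: low 64 bits of the carry-less product.
 * Like schoolbook multiplication, but partial products are XORed, not added. */
static uint64_t clmul64(uint64_t a, uint64_t b)
{
    uint64_t res = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) res ^= a << i;
    }
    return res;
}

/* clmulh: high 64 bits of the 128-bit carry-less product. */
static uint64_t clmulh64(uint64_t a, uint64_t b)
{
    uint64_t res = 0;
    for (int i = 1; i < 64; i++) {
        if ((b >> i) & 1) res ^= a >> (64 - i);
    }
    return res;
}
```

The data-dependent 64-iteration loop gives a sense of why this is awkward to lower into a handful of generic ALU operations.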

LekKit · 2024-03-27T11:17:40Z

Implemented Zcb with partial JIT support in 1f41839

TODO: Test this properly

LekKit · 2024-03-27T11:24:47Z

Implemented Zicond (interpreter only for now) in fc406a9

TODO: Test this properly
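
As a starting point for such testing, here is a small reference model of the two Zicond instructions in C. The function names are hypothetical and this is only a sketch of the instruction semantics, not the interpreter code from the commit.

```c
#include <stdint.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1 */
static uint64_t rv_czero_eqz(uint64_t rs1, uint64_t rs2)
{
    uint64_t keep = (uint64_t)0 - (uint64_t)(rs2 != 0); /* all ones if rs2 != 0 */
    return rs1 & keep;
}

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1 */
static uint64_t rv_czero_nez(uint64_t rs1, uint64_t rs2)
{
    uint64_t keep = (uint64_t)0 - (uint64_t)(rs2 == 0); /* all ones if rs2 == 0 */
    return rs1 & keep;
}

/* A full select (cond ? a : b) is built from both instructions plus an OR. */
static uint64_t rv_select(uint64_t cond, uint64_t a, uint64_t b)
{
    return rv_czero_eqz(a, cond) | rv_czero_nez(b, cond);
}
```

Comparing interpreter results against a model like this over a few edge values (zero, one, all ones) would catch swapped-condition mistakes.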

LekKit · 2024-03-27T18:07:39Z

Overview on possible Zawrs implementation:

It is highly similar to the x86 monitor/mwait instructions, however those are usually only usable in ring 0.
Some AMD chips (starting from Bulldozer?) have monitorx/mwaitx, which are supposed to be accessible from userspace.
Quoting an LLVM commit: "The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29." I am able to use those instructions on a Zen 1 machine.
Current consumer Intel chips don't have this (I receive SIGILL on i5 6100U). I don't see any potential replacement except umwait, which is only available starting from around 12th gen CPUs.
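
The CPUID check quoted above could be done at runtime roughly like this. This is a sketch that assumes a GCC or Clang toolchain (for `<cpuid.h>` and `__get_cpuid`); the function name `host_has_monitorx` is made up for the example, and non-x86 hosts simply report false.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang-specific header */
#endif

/* Hypothetical helper: true if the host advertises MONITORX/MWAITX. */
static bool host_has_monitorx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid returns 0 if the requested leaf is not supported */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) return false;
    return ((ecx >> 29) & 1) != 0; /* CPUID 8000_0001, ECX, bit 29 */
#else
    return false;
#endif
}
```

Checking the bit before executing the instructions avoids the SIGILL described above on hosts without them.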

ARM64 seems to have WFE, which is very similar to the WRS.NTO instruction that Zawrs provides on RISC-V. I don't know if it's usable in userland. One problem is that it only works for a tiny exclusive reservation sequence.

All things considered, a better way to implement it would be to improve dirty memory tracking together with LR/SC handling. Or maybe not implement it at all if the implementation won't be efficient.

LekKit · 2024-03-28T17:11:14Z

There are scalar crypto extensions that build on top of Bitmanip. It might make sense to implement them, although my initial evaluation of their JITability is fairly low unless we just start inlining generic ALU lowering everywhere.

LekKit · 2024-03-29T08:40:59Z

Implemented Zkr (entropy source CSR).

LekKit · 2024-03-30T21:14:01Z

Overview on how new extensions could be JITed: godbolt link

TLDR:

Zba compiles fairly well to lea rd, [rs2 + rs1 * 2] variations on x86 (2 insns are needed for uw variants), and compiles 1:1 on arm64
Zbb andn/orn/xnor have no replacement on x86, but have a 1:1 replacement on arm64
Zbb clz/ctz are workable but have some nuances
Zbb popcount (cpop) has no replacement on generic x86, and the arm64 variant is weird (need to throw in vector regalloc)
Zbb max/min/maxu/minu have no replacement and compile to conditional moves (some codegen could be shared with Zicond)
Zbb sext.b, zext.h, sext.h compile fairly well. We already JIT zext.h r0, r1 -> andi r0, r1, 0xFFFF in IR, since the IR imm is i32, and a special-case peephole optimization could be added too
Zbb bit rotations should compile fairly well (no rol on arm64, but we can always use rbit or neg, etc.)
Zbs bext has no replacement but easily lowers into srli r0, r1, r2; andi r0, r0, 1, and the same could be done with bexti
Other Zbs instructions have a 1:1 replacement on x86 (bts/btr/btc), but only imm variants are 1:1 replaceable on arm64
Zicond is basically cmov
Zbc is not JITable at all and there is probably no point
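
To make a few of the points in the list above concrete, here is a sketch of portable C equivalents for clz/ctz, cpop and rotations. One nuance with clz/ctz is presumably the zero input: RISC-V defines the result as XLEN, while x86 bsf/bsr leave the destination undefined for zero and `__builtin_clzll(0)` is undefined in C. The function names are made up for this example, and the `__builtin_*` calls assume GCC or Clang.

```c
#include <stdint.h>

/* clz/ctz: RISC-V returns 64 for a zero input, so zero is special-cased */
static uint64_t rv_clz64(uint64_t x) { return x ? (uint64_t)__builtin_clzll(x) : 64; }
static uint64_t rv_ctz64(uint64_t x) { return x ? (uint64_t)__builtin_ctzll(x) : 64; }

/* cpop without a host popcount instruction: classic SWAR bit counting */
static uint64_t rv_cpop64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (x * 0x0101010101010101ULL) >> 56;
}

/* rol/ror: the shift amount is masked to 6 bits, as the ISA specifies */
static uint64_t rv_rol64(uint64_t x, uint64_t sh)
{
    sh &= 63;
    return (x << sh) | (x >> ((64 - sh) & 63));
}

static uint64_t rv_ror64(uint64_t x, uint64_t sh)
{
    sh &= 63;
    return (x >> sh) | (x << ((64 - sh) & 63));
}
```

The rotate form also shows the neg trick: rotating left by sh is the same as rotating right by the negated amount modulo 64.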

All of those instructions will have a generic IR lowering for less advanced backends, then x86_64 & arm64 backends will incorporate 1:1 variants to actually speed up code which uses those RISC-V extensions.
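
The generic lowering mentioned above can be pictured as plain ALU expressions. The following C sketch spells out a few of them for XLEN = 64; it illustrates the instruction semantics only, the function names are invented for this example, and it says nothing about how the actual IR is structured.

```c
#include <stdint.h>

/* Zba: shift-and-add, plus the zero-extending .uw form */
static uint64_t rv_sh1add(uint64_t rs1, uint64_t rs2) { return rs2 + (rs1 << 1); }
static uint64_t rv_add_uw(uint64_t rs1, uint64_t rs2) { return rs2 + (uint64_t)(uint32_t)rs1; }

/* Zbb: logic with negate, min/max as a compare plus select */
static uint64_t rv_andn(uint64_t rs1, uint64_t rs2) { return rs1 & ~rs2; }
static uint64_t rv_min(uint64_t rs1, uint64_t rs2)
{
    return (int64_t)rs1 < (int64_t)rs2 ? rs1 : rs2; /* signed compare */
}
static uint64_t rv_maxu(uint64_t rs1, uint64_t rs2) { return rs1 > rs2 ? rs1 : rs2; }

/* Zbb: sign/zero extension */
static uint64_t rv_sext_b(uint64_t rs1) { return (uint64_t)(int64_t)(int8_t)rs1; }
static uint64_t rv_zext_h(uint64_t rs1) { return rs1 & 0xFFFF; }

/* Zbs: single-bit ops, bit index taken from the low 6 bits of rs2 */
static uint64_t rv_bext(uint64_t rs1, uint64_t rs2) { return (rs1 >> (rs2 & 63)) & 1; }
static uint64_t rv_bset(uint64_t rs1, uint64_t rs2) { return rs1 | (1ULL << (rs2 & 63)); }
static uint64_t rv_bclr(uint64_t rs1, uint64_t rs2) { return rs1 & ~(1ULL << (rs2 & 63)); }
static uint64_t rv_binv(uint64_t rs1, uint64_t rs2) { return rs1 ^ (1ULL << (rs2 & 63)); }
```

Each of these is one to three generic operations, which is why a fallback lowering is unlikely to cost much even on backends without 1:1 host instructions.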

TODO: Consider scalar crypto instructions

LekKit · 2024-04-08T14:08:01Z

Optimized the orc.b instruction implementation (used in the interpreter) in f760ee2. This could also be inlined in the JIT.
This instruction is heavily used to accelerate string operations, so having a fast implementation for it is important.
This patch already improves the Zbb-optimized Dhrystone score, even though it's interpreter-only so far.
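
For reference, orc.b turns every nonzero byte into 0xFF and leaves zero bytes as 0x00. Below is one portable, branch-free way to compute that in C. It is a sketch written for this discussion and is not claimed to match the code in f760ee2; the function name is hypothetical.

```c
#include <stdint.h>

/* Portable SWAR orc.b: each nonzero byte becomes 0xFF, each zero byte stays 0x00 */
static uint64_t orc_b_swar(uint64_t val)
{
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    /* Adding 0x7F to the low 7 bits sets bit 7 if any of them is set;
     * the sum is at most 0xFE, so nothing carries into the next byte. */
    uint64_t t = (val & low7) + low7;
    t |= val;                     /* also account for the original bit 7 */
    t &= 0x8080808080808080ULL;   /* keep one flag bit per byte */
    t >>= 7;                      /* 0x01 for nonzero bytes, 0x00 otherwise */
    return t * 0xFF;              /* expand 0x01 -> 0xFF without carries */
}
```

A strlen-style loop can then find a NUL byte by checking whether the result differs from all ones.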

LekKit · 2024-04-24T08:52:20Z

Probably the best possible orc.b implementation for x86_64: 6a37001
A similar implementation is probably possible on ARM64 using the vceqq_u8 intrinsic

UPD: ARM64 neon implementation 3563cbf

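#include <arm_neon.h>
#include <stdint.h>

// orc.b: set each byte to 0xFF if it is nonzero, else 0x00
static inline uint64_t bit_orc_b(uint64_t val)
{
    uint8x8_t in = vreinterpret_u8_u64(vcreate_u64(val));
    // vtst sets all bits of a lane when (in & in) != 0
    uint8x8_t orc = vtst_u8(in, in);
    return vget_lane_u64(vreinterpret_u64_u8(orc), 0);
}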

bit_orc_b_neon:
    fmov    d0, x0
    cmtst   v0.8b, v0.8b, v0.8b
    fmov    x0, d0
    ret

LekKit added the enhancement (New feature or request), documentation (Improvements or additions to documentation) and discussion (Debate for improvement) labels on Mar 25, 2024
