feature request - support for Zc* extensions #633

Open
biosbob opened this issue Jun 21, 2023 · 12 comments
Labels: enhancement, good first issue, help wanted, HW (hardware-related)

Comments

@biosbob (Collaborator) commented Jun 21, 2023:

the proposed Zc* extensions described here have recently been ratified....

the Zca extension is of particular interest in "small cores" with limited memory resources....

just a placeholder for what appears to be a non-trivial improvement....

@stnolting (Owner) commented:

The standard C extension is already implemented (-> CPU_EXTENSION_RISCV_C generic). In this particular case C = Zcf, but all the compressed floating-point operations are mapped to normal integer load/store because Zfinx is used instead of F.
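
A minimal C sketch of that Zfinx mapping (illustrative only; actual code generation depends on the compiler and register allocation):

```c
/* With Zfinx, float operands live in the integer register file, so a float
 * memory access compiles to a plain integer load -- the same (compressible)
 * lw/c.lw that Zca already covers. With a separate F register file it would
 * be an flw/c.flw instead, which is what Zcf compresses. */
float scale_first(const float *p)
{
    return p[0] * 2.0f;   /* Zfinx: c.lw + fmul.s on x-registers */
}
```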

I had a closer look at the Zc* ISA extensions. I think they are quite promising. However, in terms of the NEORV32, I am not fully convinced that it would be a good idea to implement all of them.

  • Zca - this is what we have if the FPU is disabled (Zfinx disabled)
  • Zcf - this is what we have if the FPU is enabled
  • Zcd makes no sense as the FPU is single-precision only
  • Zcmp (list-based push/pop similar to ARM) is quite interesting, but would require a lot of additional hardware. Furthermore, precise exception trapping is complex here as there are several memory loads/stores invoked by a single instruction.
  • Zcmt (table-based jumps) might be a nice thing to have. But this would have a high latency - so the only gain would be further code size reduction (not performance).
  • Zcb - I really like this extension because it adds 16-bit variants of common operations (like multiplication; see the sketch below). This would be quite easy to implement, I think. So, yeah, maybe this sub-extension might be integrated in the future. 😉

This is just my opinion. Any thoughts?
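
For reference, a small C sketch of the kind of code Zcb targets (the comments mark which 16-bit Zcb encodings a supporting compiler could use; whether it actually does depends on register allocation):

```c
#include <stdint.h>

/* Each marked operation has a 16-bit Zcb encoding, provided its operands end
 * up in the x8-x15 registers favored by the compressed instruction formats. */
uint32_t mix(const uint8_t *buf, uint16_t *out, uint32_t seed)
{
    uint32_t b = buf[2];            /* c.lbu    : compressed byte load        */
    uint32_t s = (uint8_t)seed;     /* c.zext.b : compressed zero-extension   */
    uint32_t x = b * s;             /* c.mul    : compressed multiplication   */
    out[0] = (uint16_t)x;           /* c.sh     : compressed halfword store   */
    return ~x;                      /* c.not    : compressed bitwise NOT      */
}
```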

stnolting added the enhancement and HW (hardware-related) labels on Jun 23, 2023
@biosbob (Collaborator, Author) commented Jun 23, 2023:

Zcb looks promising, in that it is relatively easy to implement.... with disciplined declarations of integer types in EM (uint8, int16, uint32, etc) this would mesh quite well.... future CPU implementations that use (say) an internal 8- or 16-bit ALU would also benefit; reducing ALU width obviously saves gates....

as for Zcmp, the EM runtime would generally place some common push/pop code fragments (used by LLVM) into the boot ROM.... even with the smallest boot ROM (say, 2K), there is plenty of "free and fast" instruction memory....

i'm not entirely sure what motivates Zcmt.... but honestly, reducing code size without a performance gain doesn't seem worth it anyway....

bottom line -- Zcb would be first, when we get around to it....

@stnolting (Owner) commented:

> bottom line -- Zcb would be first, when we get around to it....

I agree! But before we start implementing that, we should wait for GCC support. Unfortunately, the upcoming GCC 13(.1) does not include Zcb (https://gcc.gnu.org/gcc-13/changes.html).

@biosbob (Collaborator, Author) commented Jun 30, 2023:

i'm finding that LLVM is much more current with risc-v extensions.... looks like they have lots of Zb* support -- as requested in #640

since EM supports both compilers, comparative benchmarks are trivial....

@kimstik commented Jan 23, 2024:

As per the benchmark, Zcmp saves up to 35% (~6.5% on average) of code footprint.
GCC looks set to adopt it too.

@stnolting (Owner) commented:

Interesting results! Thanks for sharing!

35% would be quite amazing, but I'm not sure what the "cost" of that might be (additional hardware resources, impact on the critical path, etc.). Zcmp adds push and pop operations that would require modifying the CPU's pipeline, as several memory accesses are triggered by a single instruction.

But the NEORV32's execution stage is a multi-cycle architecture... so maybe the additional hardware overhead would be quite small... I think I'll need to have a closer look at this again.
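
To make the "several memory accesses per instruction" point concrete, here is a rough behavioral sketch (not the actual NEORV32 RTL, and with the register save order simplified) of what a multi-cycle execute stage would have to sequence for a Zcmp-style push:

```c
#include <stdint.h>

typedef struct {
    uint32_t x[32];   /* integer register file (x1 = ra, x2 = sp) */
} cpu_state_t;

/* Behavioral model of a cm.push-like operation: a single 16-bit instruction
 * that stores ra plus the first 'n_sregs' callee-saved registers below sp and
 * then adjusts sp. In hardware this means one bus write per saved register,
 * issued back-to-back by the execute stage -- and a bus fault on, say, the
 * third write must still be trapped precisely, which is the expensive part. */
void model_zcmp_push(cpu_state_t *cpu, uint32_t *mem /* word-addressed RAM */,
                     int n_sregs, uint32_t stack_adj)
{
    /* ABI register numbers: s0/s1 = x8/x9, s2..s11 = x18..x27 */
    static const uint8_t sregs[12] = {8, 9, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27};
    uint32_t addr = cpu->x[2];              /* sp before adjustment          */

    addr -= 4;
    mem[addr / 4] = cpu->x[1];              /* store ra                      */
    for (int i = 0; (i < n_sregs) && (i < 12); i++) {
        addr -= 4;
        mem[addr / 4] = cpu->x[sregs[i]];   /* one memory write per s-reg    */
    }
    cpu->x[2] -= stack_adj;                 /* final stack-pointer adjust    */
}
```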

@kimstik commented Jan 26, 2024:

Moreover, Zcb has become mandatory for the RVA23 profile.

@stnolting (Owner) commented:

Oh, I did not expect that. However, RVA is the application-class profile (MMU, 64-bit, ...), which is out of scope of this project right now 🙈

I had another look at the Zcb specs. Basically, it just adds 11 new compressed instructions. Adding the memory operations should be quite easy and I think the performance benefit might be noticeable. Adding the remaining instructions (bit-manip, multiplication, inversion) is a little bit more complex but still doable.

Anybody volunteering to do a PR? 😅

stnolting added the help wanted and good first issue labels on Jan 28, 2024
@kimstik commented Apr 30, 2024:

btw, I tested LLVM 18 with/without Zcmp:
[image]

@stnolting (Owner) commented:

Zcmp's push/pop instructions are quite powerful, as they can "compress" up to 13 loads/stores and an addition into a single 16-bit instruction! They might even increase performance a little bit, as there will be less traffic on the CPU's instruction fetch interface.

However, the big problem with these two instructions is that they do not decompress into a single 32-bit counterpart. Instead, they expand into several different instructions, which would require a lot of hardware overhead. So I think that the "costs" clearly exceed the benefits here.

What do you think? 🤔

@kimstik commented May 7, 2024:

The main advantages, from my biased point of view, are reducing the load on the instruction fetch channel, less cache pollution, and a positive impact on interrupt handler latency. However, the most valuable aspect is the reduction of code size to at least the level of the Cortex-M0.

I think technical difficulties are unavoidable, and it's hard to evaluate the trade-offs objectively until they come into play :).
If the overhead is truly enormous, the number of configurations where Zcmp would be useful will shrink to a minimum. But I'd like to hope that the overhead won't be so huge that it ruins the whole idea.

@stnolting (Owner) commented:

> The main advantages, from my biased point of view, are reducing the load on the instruction fetch channel

That's true! In its best case, this instruction saves up to 27 further 16-bit words from being fetched.

> less cache pollution

Also true. However, embedded single-core systems might not need any kind of cache if you use fast on-chip memory.

> and a positive impact on interrupt handler latency

I'm not sure about this. Execution time would be identical. However, with less cache pollution / bus congestion there might be a relevant speedup.

> However, the most valuable aspect is the reduction of code size to at least the level of the Cortex-M0.

Maybe, but technically such a complex instruction isn't "RISC" anymore, right? 😅

> If the overhead is truly enormous, the number of configurations where Zcmp would be useful will shrink to a minimum. But I'd like to hope that the overhead won't be so huge that it ruins the whole idea.

I think there are several benchmark examples provided by the people who designed these extended compressed instructions. The benefit (looking purely at code size and performance) is quite impressive!
