Support multicore configurations #85

Closed
mithro opened this issue Sep 15, 2019 · 8 comments

@mithro
Contributor

mithro commented Sep 15, 2019

It would be good to support VexRISCV in multicore configurations.

With the low resource usage of VexRISCV, supporting 2- or 4-core complexes on cheap boards would be very possible. We could then use it at litex-hub/linux-on-litex-vexriscv#47.

As VexRISCV is now being used to run Linux, SMP support would hopefully improve performance.

@SpinalHDL / @Dolu1990 - What would be needed to make this happen? I assume a bunch of stuff around atomics and cache coherence?

@Dolu1990
Member

As far as I know, these would be the changes required:

  • As you said, a coherent data cache. Possibly adding write-back behaviour in addition to the current write-through, to avoid producing too many write transactions (see the sketch after this list).
  • No changes required for atomics: basically, if the data cache is coherent, the atomics, which are done in the data cache itself, become coherent as well.
  • It would also require a coherent memory interconnect, maybe a coherent L2 depending on the design chosen.
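
A back-of-the-envelope plain-Scala sketch of the write-traffic point: the object and numbers here are hypothetical, not VexRiscv code; it only counts how many memory write transactions a write-through D$ produces versus a write-back one for stores that all hit the same line.

```scala
// Hypothetical traffic model, not VexRiscv code: count memory write
// transactions produced by stores that all hit the same cache line.
object WritePolicySketch extends App {
  val lineBytes = 32
  // 4 passes writing every word of the same 32-byte line (32 stores total).
  val stores: Seq[Int] = (0 until 4).flatMap(_ => 0 until lineBytes by 4)

  // Write-through: every store is also forwarded to memory.
  val writeThroughTraffic = stores.size

  // Write-back: stores only dirty the line; memory sees it once,
  // when the dirty line is eventually evicted (or probed by another core).
  val writeBackTraffic = stores.map(_ / lineBytes).distinct.size

  println(s"stores executed       : ${stores.size}")       // 32
  println(s"write-through traffic : $writeThroughTraffic") // 32 memory writes
  println(s"write-back traffic    : $writeBackTraffic")    // 1 memory write (on eviction)
}
```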

Basically, here is a list of things to do on VexRiscv to improve various aspects:

  • Increasing the data width of the I$ and D$ memory busses to allow faster miss refill; currently they are only 32 bits wide. This would boost each core, as the main bottleneck currently seems to be the I$ miss penalty. This kind of thing does not show up when running benchmarks such as Dhrystone and CoreMark, as they work on a very small code base, but running Linux is an I$ hell ^^
  • Data cache coherency for SMP
  • Having xtval, xtvec, xepc and xscratch in a block RAM instead of raw registers; this would save area and avoid having to reduce the feature set in machine mode (done to save area)
  • FPU

Currently I'm slammed by third-party obligations, but in March I should have much more free time to move things forward.

@WillGreen

Would you mind elaborating on the use-case for SMP @mithro?

My gut feeling is that SMP support would be impressive, but not that useful. There are more straightforward ways to boost performance that benefit a wide range of uses, not just SMP-aware operating systems. For example, increasing the width of cache busses, as @Dolu1990 has already mentioned.

I see CPUs on FPGAs as providing control and performing high-complexity operations, such as division and square root. If you’re doing something parallel and performance-critical, you can do it in a co-processor or separate logic block, without complicating the core VexRiscv design. If there is going to be a substantial addition of functionality, then I think an FPU is more desirable than additional integer cores.

Another approach is to place multiple VexRiscv cores on one FPGA and link them with a bus. Such a design wouldn’t “just work” in Linux, but would allow custom designs to perform more CPU operations in parallel if required.

A big part of what makes VexRiscv unique is its clean, elegant design. I’d hate to lose that.

@Dolu1990
Member

Dolu1990 commented Nov 7, 2019

I'm currently crushed by non-SpinalHDL/VexRiscv obligations :(
So my hope is that in March I will be free of most of them and can move things forward.

@WillGreen @mithro Maybe SMP would be possible in some kind of FPGA-friendly / clean way:

  • Keeping the memory bus untouched by the SMP stuff
  • Having one coherency directory which would manage and track cache line states (invalid, shared, unique)
  • To write, a CPU should have the line in the unique state
  • To read, a CPU should have the line in a shared or unique state

To negotiate line states, there would be 2 channels between each CPU and the directory:

Channel 1

  • Directory -> CPU: to ask to invalidate a line (allowing the line to be made unique for another CPU) or to make it shared (which would imply the CPU isn't allowed to keep the line dirty)
  • CPU -> Directory: to notify when one of the above transactions is completed

Channel 2

  • CPU -> Directory: to request a line in the unique or shared state
  • Directory -> CPU: to notify when one of the above transactions is done

Channel 1 should have priority over channel 2.
The memory channel shares no logic with channels 1 and 2, as it isn't connected in any way to the directory, so nothing specific there.

Then all data transactions would still be done by the same bus as currently. This would allow the system to be brought onto all sorts of memory systems without specific requirements.
It would also avoid duplicating the data path, as I would like to avoid TileLink-like things.

That's just my current unrefined idea of how SMP could be done, but I'm not an expert in that field XD
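
A behavioral plain-Scala sketch of the directory idea above, just to make the state transitions concrete. Everything here (`LineState`, the `probe`/`acquire` functions, two CPUs) is a hypothetical model, not VexRiscv code: channel 1 corresponds to the `probe` call, channel 2 to the `acquire` call.

```scala
// Hypothetical behavioral model of the directory proposal, not VexRiscv code.
object DirectorySketch extends App {
  sealed trait LineState
  case object Invalid extends LineState
  case object Shared  extends LineState
  case object Unique  extends LineState

  val cpuCount = 2
  // The directory's view of each cache line's state, per CPU.
  val state = Array.fill(cpuCount)(
    collection.mutable.Map[Long, LineState]().withDefaultValue(Invalid)
  )

  // Channel 1: directory -> CPU probe, asking to invalidate a line or to
  // downgrade it to shared; the CPU acks once its cache has applied it.
  def probe(cpu: Int, line: Long, toShared: Boolean): Unit =
    state(cpu)(line) = if (toShared) Shared else Invalid

  // Channel 2: CPU -> directory request for a line in shared or unique state.
  // The directory first probes the other CPUs (channel 1 has priority over
  // channel 2), then grants the request.
  def acquire(cpu: Int, line: Long, unique: Boolean): Unit = {
    for (other <- 0 until cpuCount if other != cpu && state(other)(line) != Invalid) {
      if (unique) probe(other, line, toShared = false)                            // others must drop the line
      else if (state(other)(line) == Unique) probe(other, line, toShared = true)  // others may keep a shared copy
    }
    state(cpu)(line) = if (unique) Unique else Shared
  }

  // CPU 0 writes line 0x40 (needs Unique), then CPU 1 reads it (needs Shared).
  acquire(cpu = 0, line = 0x40L, unique = true)
  acquire(cpu = 1, line = 0x40L, unique = false)
  println(s"cpu0: ${state(0)(0x40L)}, cpu1: ${state(1)(0x40L)}") // cpu0: Shared, cpu1: Shared
}
```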

@Dolu1990
Member

Dolu1990 commented Nov 7, 2019

Another solution would be more AXI ACE-like, with 3 channels:

Channel 1 (write) :

  • cpu -> interconnect : write cmd + data
  • interconnect -> cpu : ack

Channel 2 (read / reserve) :

  • cpu -> interconnect : ask to read and/or make a given address unique/shared
  • interconnect -> cpu : read data
  • cpu -> interconnect : ack

Channel 3 (probe) :

  • interconnect -> cpu : address to invalidate / make shared
  • cpu -> interconnect : ack

This would result in 3 streams carrying an address and 2 streams carrying data.
The CPU can write the data back to memory when it gets a probe hit.
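
A rough SpinalHDL-flavoured sketch of how those streams might be declared, assuming one stream per direction per channel; all bundle and field names here are invented for illustration and are not the interface VexRiscv actually adopted (the real draft appears later in this thread).

```scala
// Illustrative SpinalHDL sketch; bundle/field names are assumptions,
// not the actual VexRiscv SMP memory interface.
import spinal.core._
import spinal.lib._

// Channel 1 (write): cmd + data from the CPU, ack back.
case class WriteCmd() extends Bundle { val address = UInt(32 bits); val data = Bits(32 bits) }
case class WriteRsp() extends Bundle { val error = Bool() }
// Channel 2 (read / reserve): ask to read and/or make an address unique/shared.
case class ReadCmd()  extends Bundle { val address = UInt(32 bits); val unique = Bool() }
case class ReadRsp()  extends Bundle { val data = Bits(32 bits); val last = Bool() }
case class ReadAck()  extends Bundle { val dummy = Bool() } // placeholder bit for the final ack
// Channel 3 (probe): interconnect asks to invalidate or share an address.
case class ProbeCmd() extends Bundle { val address = UInt(32 bits); val toShared = Bool() }
case class ProbeRsp() extends Bundle { val hit = Bool() }

case class CoherentBus() extends Bundle with IMasterSlave {
  val writeCmd = Stream(WriteCmd())   // channel 1: cpu -> interconnect
  val writeRsp = Stream(WriteRsp())   // channel 1: interconnect -> cpu
  val readCmd  = Stream(ReadCmd())    // channel 2: cpu -> interconnect
  val readRsp  = Stream(ReadRsp())    // channel 2: interconnect -> cpu
  val readAck  = Stream(ReadAck())    // channel 2: cpu -> interconnect
  val probeCmd = Stream(ProbeCmd())   // channel 3: interconnect -> cpu
  val probeRsp = Stream(ProbeRsp())   // channel 3: cpu -> interconnect

  override def asMaster(): Unit = {   // directions as seen from the CPU side
    master(writeCmd, readCmd, readAck, probeRsp)
    slave(writeRsp, readRsp, probeCmd)
  }
}
```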

@Dolu1990
Member

Dolu1990 commented Nov 9, 2019

A variation of the above proposal could be that a CPU cache reacts to a probe request in one of the following ways (sketched in code below):

  • cache miss => channel 3 rsp ack
  • cache hit => write back to memory on channel 1 + channel 3 rsp ack
  • cache hit (and this one is special) => channel 1 probe response with the related data, no channel 3 rsp ack. This would allow data to be moved between caches at a cheap cost while avoiding adding a new datapath.
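
A tiny plain-Scala transcription of those three reactions, just to spell out the decision; `ProbeReaction`, `onProbe` and the `canForward` flag are invented names, and whether forwarding is permitted would be a property of the interconnect, which is not modelled here.

```scala
// Hypothetical model of the probe-reaction variation, not VexRiscv code.
object ProbeVariation extends App {
  sealed trait ProbeReaction
  case object AckOnly                      extends ProbeReaction // channel 3 rsp ack only
  case class  WriteBackThenAck(data: Long) extends ProbeReaction // channel 1 write-back + channel 3 ack
  case class  ForwardData(data: Long)      extends ProbeReaction // channel 1 probe response carrying the data, no channel 3 ack

  def onProbe(hit: Boolean, canForward: Boolean, lineData: Long): ProbeReaction =
    if (!hit)            AckOnly                     // cache miss
    else if (canForward) ForwardData(lineData)       // special case: cache-to-cache transfer on the existing datapath
    else                 WriteBackThenAck(lineData)  // plain write-back, then ack

  println(onProbe(hit = true, canForward = true, lineData = 0x12345678L)) // ForwardData(305419896)
}
```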

@Dolu1990
Member

Dolu1990 commented Feb 14, 2020

There is a draft of the coherency interface spec here:
https://github.com/SpinalHDL/VexRiscv/blob/dev/doc/smp/smp.md

@Dolu1990 Dolu1990 pinned this issue Apr 10, 2020
@Dolu1990
Member

Dolu1990 commented Apr 10, 2020

Quite some progress here.

dev branch :
https://github.com/SpinalHDL/VexRiscv/tree/smp

Basically, the aim is to implement a write-through invalidate coherency protocol for the CPU L1 D$.
There are a few reasons not to adopt (yet) a write-back based coherency protocol for the L1:

  • Write-back penalties / workaround cost
  • Write-allocate penalties / workaround cost
  • Write-allocate virtually reduces the cache size / number of ways (e.g. a single-way D$ doing a memory copy can end up in severe cache thrashing if source and destination are aligned; see the sketch after this list)
  • Interconnect complexity; not that it is particularly heavy for a write-back proposal, but still, quite some complexity added
  • Latency added on the interconnect
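
A small plain-Scala model of the third bullet (invented names, not VexRiscv code): a direct-mapped (single-way) D$ copies one cache-sized buffer to a destination that maps onto the same sets, and we count line refills with and without write-allocate.

```scala
// Hypothetical thrashing model, not VexRiscv code: a 1-way D$ copying a
// buffer to a destination that maps onto the same cache sets.
object WriteAllocateThrashing extends App {
  val cacheBytes = 4096
  val lineBytes  = 32
  val sets       = cacheBytes / lineBytes
  def setOf(addr: Int): Int = (addr / lineBytes) % sets

  val src = 0x00010000 // the buffers are a multiple of the cache size apart,
  val dst = 0x00020000 // so src+i and dst+i always land in the same set

  def memcpyRefills(writeAllocate: Boolean): Int = {
    val tags    = Array.fill(sets)(-1) // line currently held in each set (-1 = empty)
    var refills = 0
    def access(addr: Int, isStore: Boolean): Unit = {
      val line = addr / lineBytes
      val set  = setOf(addr)
      val hit  = tags(set) == line
      if (!hit && (!isStore || writeAllocate)) { // a store miss only refills when write-allocate is on
        refills += 1
        tags(set) = line
      }
    }
    for (i <- 0 until cacheBytes by 4) { // word-by-word copy of one cache-sized buffer
      access(src + i, isStore = false)
      access(dst + i, isStore = true)
    }
    refills
  }

  println(s"line refills with write-allocate   : ${memcpyRefills(writeAllocate = true)}")  // 2048: every access misses
  println(s"line refills without write-allocate: ${memcpyRefills(writeAllocate = false)}") // 128: only the source lines are fetched
}
```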

Currently, all the CPU-side stuff is implemented.
In a single-core config, with random invalidations coming from the testbench, it can boot Linux.

LR/SC now use the memory bus "exclusive" feature, something similar to the AXI4 one, while AMOs are emulated in the CPU hardware using those same LR/SC memory bus requests.
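
A behavioral plain-Scala sketch of that scheme: an AMO expressed as an LR/SC retry loop over exclusive memory accesses. The reservation model and the names (`loadReserved`, `storeConditional`, `amoAdd`) are illustrative assumptions, not the actual VexRiscv implementation.

```scala
// Hypothetical behavioral model, not VexRiscv code: AMO emulated with
// LR/SC-style exclusive accesses on the memory bus.
object AmoViaLrSc extends App {
  val hartCount   = 2
  val mem         = collection.mutable.Map[Long, Int]().withDefaultValue(0)
  val reservation = Array.fill[Option[Long]](hartCount)(None) // one exclusive monitor per hart

  def write(hart: Int, addr: Long, data: Int): Unit = {
    mem(addr) = data
    // Any store kills other harts' reservations on that address.
    for (h <- 0 until hartCount if h != hart && reservation(h).contains(addr)) reservation(h) = None
  }

  def loadReserved(hart: Int, addr: Long): Int = { reservation(hart) = Some(addr); mem(addr) }

  def storeConditional(hart: Int, addr: Long, data: Int): Boolean = {
    val ok = reservation(hart).contains(addr)
    if (ok) write(hart, addr, data)
    reservation(hart) = None
    ok
  }

  // amoadd emulated as an LR/SC retry loop, retrying until the store-conditional succeeds.
  def amoAdd(hart: Int, addr: Long, increment: Int): Int = {
    var old = loadReserved(hart, addr)
    while (!storeConditional(hart, addr, old + increment)) old = loadReserved(hart, addr)
    old
  }

  write(hart = 1, addr = 0x100L, data = 40)
  println(amoAdd(hart = 0, addr = 0x100L, increment = 2)) // returns the old value: 40
  println(mem(0x100L))                                    // memory now holds 42
}
```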

Synthesis looks good: FMax is only 5% lower (without any time spent to improve that), and LUT occupancy is 2% higher.

The only requirements for the interconnect are to implement exclusive accesses and to propagate write requests as invalidation requests to the other CPUs.

@Dolu1990 Dolu1990 unpinned this issue May 2, 2020
@Dolu1990
Member

Dolu1990 commented May 2, 2020

Done :)

https://github.com/enjoy-digital/litex_vexriscv_smp
https://github.com/SpinalHDL/VexRiscv/blob/smp/src/main/scala/vexriscv/demo/smp/VexRiscvSmpCluster.scala#L21

But I have to document things.

@Dolu1990 Dolu1990 closed this as completed May 2, 2020