
RV32IMC with IBusCachedPlugin #93

Open
MarekPikula opened this issue Oct 16, 2019 · 11 comments

@MarekPikula
Contributor

Hi, first of all, congratulations on your impressive work. I'm currently evaluating different RISC-V cores, and VexRiscv is my favourite so far.

I have a problem when trying to configure Briey for the IMC ISA. I tried Murax before with IBusSimplePlugin(compressedGen = true) and it worked just fine, but compressed support doesn't seem to work with IBusCachedPlugin. When I set compressedGen = true in Briey, I get the following error:

[Runtime] SpinalHDL v1.3.6    git head : 9bf01e7f360e003fac1dd5ca8b8f4bffec0e52b8
[Runtime] JVM max memory : 2444.5MiB
[Runtime] Current date : 2019.10.16 15:19:49
[Progress] at 0.000 : Elaborate components
PcManagerSimplePlugin is now useless

**********************************************************************************************
[Warning] Elaboration failed (0 error).
          Spinal will restart with scala trace to help you to find the problem.
**********************************************************************************************

[Progress] at 1.023 : Elaborate components
PcManagerSimplePlugin is now useless
Exception in thread "main" java.lang.Exception: Missing inserts : INSTRUCTION_ANTICIPATED
	at vexriscv.Pipeline$class.build(Pipeline.scala:95)
	at vexriscv.VexRiscv.build(VexRiscv.scala:86)
	at vexriscv.Pipeline$$anonfun$1.apply$mcV$sp(Pipeline.scala:161)
	at vexriscv.Pipeline$$anonfun$1.apply(Pipeline.scala:161)
	at vexriscv.Pipeline$$anonfun$1.apply(Pipeline.scala:161)
	at spinal.core.ClockDomain.apply(ClockDomain.scala:306)
	at spinal.core.Component$$anonfun$prePop$1.apply(Component.scala:124)
	at spinal.core.Component$$anonfun$prePop$1.apply(Component.scala:123)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at spinal.core.Component.prePop(Component.scala:123)
	at spinal.core.Component.delayedInit(Component.scala:138)
	at vexriscv.VexRiscv.<init>(VexRiscv.scala:86)
	at vexriscv.demo.Briey$$anon$3$$anon$4.<init>(Briey.scala:400)
	at vexriscv.demo.Briey$$anon$3.delayedEndpoint$vexriscv$demo$Briey$$anon$3$1(Briey.scala:395)
	at vexriscv.demo.Briey$$anon$3$delayedInit$body.apply(Briey.scala:346)
	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at spinal.core.ClockingArea.delayedInit(Area.scala:84)
	at vexriscv.demo.Briey$$anon$3.<init>(Briey.scala:346)
	at vexriscv.demo.Briey.delayedEndpoint$vexriscv$demo$Briey$1(Briey.scala:346)
	at vexriscv.demo.Briey$delayedInit$body.apply(Briey.scala:270)
	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at spinal.core.Component.delayedInit(Component.scala:131)
	at vexriscv.demo.Briey.<init>(Briey.scala:270)
	at vexriscv.demo.Briey$$anonfun$main$1.apply(Briey.scala:497)
	at vexriscv.demo.Briey$$anonfun$main$1.apply(Briey.scala:496)
	at spinal.core.internals.PhaseCreateComponent.impl(Phase.scala:1920)
	at spinal.core.internals.PhaseContext.doPhase(Phase.scala:195)
	at spinal.core.internals.SpinalVerilogBoot$$anonfun$singleShot$10.apply(Phase.scala:2156)
	at spinal.core.internals.SpinalVerilogBoot$$anonfun$singleShot$10.apply(Phase.scala:2154)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at spinal.core.internals.SpinalVerilogBoot$.singleShot(Phase.scala:2154)
	at spinal.core.internals.SpinalVerilogBoot$.apply(Phase.scala:2090)
	at spinal.core.Spinal$.apply(Spinal.scala:311)
	at spinal.core.SpinalConfig.generateVerilog(Spinal.scala:142)
	at vexriscv.demo.Briey$.main(Briey.scala:496)
	at vexriscv.demo.Briey.main(Briey.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMainV2.main(AppMainV2.java:131)

I'm using the current master with the IntelliJ IDE.

@Dolu1990
Member

Thanks :)

It isn't really a bug, but a combination of incompatible features.
Basically, with the default Briey configuration, the cache takes two cycles (https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/demo/Briey.scala#L69), which delivers the instruction in the decode stage. There is also INSTRUCTION_ANTICIPATED, which provides the future value of the decode-stage instruction; it allows the register file to use a synchronous RAM read, with INSTRUCTION_ANTICIPATED as the address, to produce RS1/RS2 in the decode stage.

The issue with RVC in that case is that RVC decompression is only done in the decode stage, which does not allow the INSTRUCTION_ANTICIPATED value used by the register file to be produced.
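The timing constraint can be sketched in SpinalHDL terms. This is an illustrative fragment only, not the actual VexRiscv plugin code; `instructionAnticipated` and `decodeInstruction` are hypothetical signal names standing in for the real pipeline stageables:

```scala
import spinal.core._

// Hypothetical signals: next decode instruction vs. current decode instruction.
val instructionAnticipated = Bits(32 bits)
val decodeInstruction      = Bits(32 bits)

val regFile = Mem(Bits(32 bits), wordCount = 32)

// Sync read (block RAM): the address is sampled on the clock edge and the
// data arrives one cycle later. To have RS1 ready in the decode stage, the
// address must come from the *next* decode instruction, i.e. INSTRUCTION_ANTICIPATED.
val rs1Sync = regFile.readSync(instructionAnticipated(19 downto 15).asUInt)

// Async read (distributed RAM): the decode-stage instruction itself is enough,
// which is why switching the RegFilePlugin to ASYNC sidesteps the problem.
val rs1Async = regFile.readAsync(decodeInstruction(19 downto 15).asUInt)
```

Bits 19 downto 15 are the RS1 field of a 32-bit RISC-V instruction, which is why the (decompressed) instruction has to exist before the synchronous read can be issued.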

There are multiple ways to work around that.
If you are using an FPGA with distributed-RAM capability, I would just switch the RegFilePlugin from SYNC to ASYNC.
Alternatively, you can set the IBusCachedPlugin twoCycleRam and twoCycleCache options to false.
Or you can set https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/plugin/IBusCachedPlugin.scala#L36 to true in the CPU config, which adds an additional stage to the fetch pipeline and allows INSTRUCTION_ANTICIPATED to be generated.
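As a rough sketch, the workarounds look like this in a Briey/VexRiscv plugin list. The cache sizing values are illustrative, and the exact placement of the parameters (plugin vs. InstructionCacheConfig) may differ between VexRiscv versions, so check them against your checkout:

```scala
// Config fragment (illustrative), assuming the parameter names discussed above.
val cpuPlugins = ArrayBuffer(
  new IBusCachedPlugin(
    resetVector = 0x80000000L,
    compressedGen = true,            // enable RVC decompression
    config = InstructionCacheConfig(
      cacheSize = 4096,
      bytePerLine = 32,
      wayCount = 1,
      addressWidth = 32,
      cpuDataWidth = 32,
      memDataWidth = 32,
      catchIllegalAccess = true,
      catchAccessFault = true,
      asyncTagMemory = false,
      twoCycleRam   = false,         // workaround: single-cycle cache RAM
      twoCycleCache = false          // workaround: single-cycle cache
    )
  ),
  new RegFilePlugin(
    regFileReadyKind = plugin.ASYNC, // alternative workaround: distributed-RAM read
    zeroBoot = false
  )
  // ... remaining plugins unchanged
)
```

With an ASYNC register file, the SYNC two-cycle cache settings can stay as they are; pick whichever trade-off suits your FPGA's RAM primitives.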

@MarekPikula
Contributor Author

After disabling twoCycleRam and twoCycleCache it works like a charm 👍

I'd propose adding a note about this to the README, and possibly adding a cached IMC variant to the standard configurations and the size/performance breakdown, since IMC is quite a popular configuration.

Besides, what makes IMC less performant than IM? I'm running CoreMark and it's about 5% slower. It's not a huge difference, but I'm curious, especially since for the Syntacore scr1 it's the other way around (IM is about 2% less performant than IMC). These are not huge differences, but it's still interesting to know the cause. VexRiscv is the first SpinalHDL project I've looked at, so I don't have enough competence to dig through the code and figure it out myself.

@Dolu1990
Member

A performance hit due to RVC which should impact most implementations (and at least VexRiscv) is that a branch/jump to an unaligned 32-bit instruction forces the fetcher to fetch two words before it can deliver that 32-bit instruction.

You can imagine other cases where it requires reading two different cache lines, or even two different MMU TLB entries.
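The alignment cases can be sketched with a small helper (illustrative Scala, not VexRiscv code):

```scala
// With RVC enabled, instructions are halfword-aligned. A 32-bit instruction
// whose PC ends in 0b10 starts in the upper half of a 32-bit fetch word and
// spills into the next one, so the fetcher needs two word fetches (and
// possibly two cache lines, or even two TLB entries) before delivering it.
def needsTwoFetches(pc: Long): Boolean = (pc & 0x3) == 2

assert(!needsTwoFetches(0x80000000L)) // word-aligned 32-bit op: one fetch
assert( needsTwoFetches(0x80000002L)) // halfword-aligned 32-bit op: two fetches
```

Without RVC, every instruction is word-aligned, so the straddling case never occurs.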

I don't really know the Syntacore scr1 architecture.
The RVC performance improvement on CoreMark could come from op-fusion? But otherwise I don't really see any reason, except maybe less i$ thrashing, although CoreMark should normally fit entirely into a 4 KB i$/d$. But scr1 is cacheless, right?

@MarekPikula
Contributor Author

To be honest, I didn't dig too deep into the scr1 architecture, but from what I can see there is no caching.

FYI, I'm using scr1 inside PULPissimo with their logarithmic interconnect for memory, which is very fast (so caching is not as necessary). I'll check whether there is any performance difference for Ibex, which I was also evaluating.

@MarekPikula
Contributor Author

Same story for Ibex: IM is about 1.5% less performant than IMC. If I find some time, maybe I'll try to fit a cacheless VexRiscv into PULPissimo and run some benchmarks to see what the difference is.

@Dolu1990
Member

logarithmic interconnect for memory

What is that :D ?

IM is about 1,5% less performant than IMC

That's weird, I'm curious to understand why XD
I mean, technically speaking, RVC is a subset of RV. So maybe that's down to compiler creativity?

@MarekPikula
Contributor Author

You can read more about the log interconnect in the article "Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices" (here is a copy I've found, if you don't have access to IEEE).

Compiler creativity has nothing to do with it in this particular case, because on all the platforms I'm testing I execute the exact same code, compiled with the exact same compiler and the exact same flags, so the only difference is the core and the SoC platform.

@tomverbeure
Contributor

I noticed a similar case on the picorv32: when RVC is enabled, Dhrystone performance is slightly lower when running an identical RV32IM binary.

@MarekPikula
Contributor Author

For me it's exactly the same core (so full IMC), running either IMC or IM code, so the only variable is the compiled code.

@MarekPikula
Contributor Author

@Dolu1990 for your reference, here is the log interconnect code: pulp-platform/L2_tcdm_hybrid_interco.

@MarekPikula
Contributor Author

And more about log interconnect: https://iis-people.ee.ethz.ch/~arahimi/papers/DATE11.pdf

xobs added a commit to xobs/VexRiscv-verilog that referenced this issue Nov 12, 2019
According to SpinalHDL/VexRiscv#93 compressed
mode doesn't work with a sync register file, and with
twoCycleRam/twoCycleCache.  Fix compressed instructions by moving to an
async register file.

Signed-off-by: Sean Cross <sean@xobs.io>