Rocket cosim framework #3271

midnighter95 · 2023-02-24T10:28:31Z

This PR provides a Rocket-Spike cosim simulation framework. A standalone rocket core is extracted with its memory port exposed to the top-level and is compiled to c++ model by Verilator . The framework drives rocket model to run testcase while spike runs the same test at the same time. During the test, the rocket result will be compared to the spike for each instruction.

Framework Outline

control the overall test flow.
stimulate Spike with testcase, record the result.
stimulate rocket model.
deal with all the memory access from the core (acts as the memory of rocket core)
peek rocket result and compare it with Spike.

Features

standalone Rocket core with decouple bus interfaces expose to users.
use spike as reference model
capture architecture events, for each cycles, comparing event with architecture events generated from spike
fast binary load and start-up

Uarch Coverpoint

Scalar register file write
Float register file write
CSR write

Completed tests:

Test Flow

nix develop -c make riscvtests

jerryz123

Have you seen ucb-bar/chipyard#1323?

The model for cosim should be

Support (almost) arbitrary configuration of DUT
Match DUT arch spec with spike configuration
For every instruction committed by DUT, compare against single-stepped spike

The only challenge with rocket is the OOO write-back, but that can be solved easily by adding a giant ROB model to the core (used only for cosim with traces, not for implementation).

sequencer

Basically this framework was the initial version from my vector project. really appreciate @midnighter95 for trying to migrate it to RC.

For RC, we need a more carefully designed and general test framework.

You should use VPI or DPI to make Chisel be able to hook data, like the https://github.com/sequencer/vector/blob/master/tests/elaborate/src/GrandCentral.scala interface with DPI.
Tests should fully compatible with the native RocketChip Configs. Spike should read from the RocketCore parameter to config itself.
For the difference between model&impl, there should be a hook to inject to model, since there exist some undefined behavior, which is allowed but behavior differs between model and RTL(this is a little tricky, you can do it in the future PR)
I personally don't like the Monitor implementation in the RC, I always believe all verification codes should be injected remotely, like systemverilog bind + Cross Module Reference, or Aspect from Chisel. I strongly suggest switch to DPI directly and use the DPI + GrandCenteral magic.

Anyway, thanks for this work, I understand this PR is hard to implement, but this should be the key feature for the future RocketCore CI(far more faster than the in-tree version).

sequencer · 2023-02-24T19:21:48Z

flake.nix

+          libargs glog fmt zlib
+          gnused coreutils gnugrep which
+          parallel protobuf antlr4 numactl


Only need to add necessary dependencies.

src/main/scala/tile/Interrupts.scala

src/main/scala/tile/RocketTile.scala

sequencer · 2023-02-24T19:26:59Z

tests/cosim/elaborate/src/DUT.scala

+    val masterNode = TLManagerNode(Seq(TLSlavePortParameters.v1(
+      Seq(TLSlaveParameters.v1(
+        address = List(AddressSet(0x0, 0xffffffffL)),
+        regionType = RegionType.UNCACHED,
+        executable = true,
+        supportsGet = TransferSizes(1, 64),
+        supportsAcquireT = TransferSizes(1, 64),
+        supportsAcquireB = TransferSizes(1, 64),
+        supportsPutPartial = TransferSizes(1, 64),
+        supportsPutFull = TransferSizes(1, 64),
+        supportsLogical = TransferSizes(1, 64),
+        supportsArithmetic = TransferSizes(1, 64),
+        fifoId = Some(0))),
+      beatBytes = 8,
+      endSinkId = 4,
+      minLatency = 1
+    )))
+    masterNode :=* rocketTile.masterNode
+    val memory = InModuleBody {
+      masterNode.makeIOs()
+    }
+
+    val intNode = IntSourceNode(IntSourcePortSimple())
+    rocketTile.intInwardNode :=* intNode
+    val intIn = InModuleBody {
+      intNode.makeIOs()
+    }
+
+    val haltNode = IntSinkNode(IntSinkPortSimple())
+    haltNode :=* rocketTile.haltNode
+    val haltOut = InModuleBody {
+      haltNode.makeIOs()
+    }
+
+    val ceaseNode = IntSinkNode(IntSinkPortSimple())
+    ceaseNode :=* rocketTile.ceaseNode
+    val ceaseOut = InModuleBody {
+      ceaseNode.makeIOs()
+    }
+
+    val wfiNode = IntSinkNode(IntSinkPortSimple())
+    wfiNode :=* rocketTile.wfiNode
+    val wfiOut = InModuleBody {
+      wfiNode.makeIOs()
+    }
+    val resetVectorNode = BundleBridgeSource(() => UInt(32.W))
+    rocketTile.resetVectorNode := resetVectorNode
+    val resetVector = InModuleBody {
+      resetVectorNode.makeIO()
+    }
+    val hartidNode = BundleBridgeSource(() => UInt(4.W))
+    rocketTile.hartIdNode := hartidNode
+    InModuleBody {
+      hartidNode.bundle := 0.U
+    }
+    val nmiNode = BundleBridgeSource(Some(() => new NMI(32)))
+    rocketTile.nmiNode := nmiNode
+    val nmi = InModuleBody {
+      nmiNode.makeIO()
+    }


These logic is already implemented in the InstantiatesTiles, please use it.

sequencer · 2023-02-24T19:27:17Z

tests/cosim/elaborate/src/DUT.scala

+  chisel3.experimental.DataMirror.fullModulePorts(
+    // instantiate the LazyModule
+    Module(ldut.module)
+  ).filterNot(_._2.isInstanceOf[Aggregate]).foreach { case (name, ele) =>
+    if (!(name == "clock" || name == "reset")) {
+      chisel3.experimental.DataMirror.directionOf(ele) match {
+        case ActualDirection.Output =>
+          val io = IO(Output(chiselTypeOf(ele))).suggestName(name)
+          println(s"output $name")
+          io := ele
+        case ActualDirection.Input =>
+          val io = IO(Input(chiselTypeOf(ele))).suggestName(name)
+          println(s"input $name")
+          ele := io
+      }
+    }
+  }


IIRC this hack should be fixed.

sequencer · 2023-02-24T19:28:53Z

tests/cosim/elaborate/src/Main.scala

+private[freechips] final class RocketChiselStage extends ChiselStage {
+
+  override val targets = Seq(
+    Dependency[chisel3.stage.phases.Checks],
+    Dependency[chisel3.stage.phases.Elaborate],
+    Dependency[freechips.rocketchip.stage.phases.GenerateROMs],
+    Dependency[chisel3.stage.phases.AddImplicitOutputFile],
+    Dependency[chisel3.stage.phases.AddImplicitOutputAnnotationFile],
+    Dependency[chisel3.stage.phases.MaybeAspectPhase],
+    Dependency[chisel3.stage.phases.Emitter],
+    Dependency[chisel3.stage.phases.Convert]
+  )
+
+}
+
+class RocketChipStage extends Stage with PreservesAll[Phase] {
+
+  override val shell = new Shell("rocket-chip") with RocketChipCli with ChiselCli with FirrtlCli
+  val targets: Seq[PhaseDependency] = Seq(
+    Dependency[freechips.rocketchip.stage.phases.Checks],
+    Dependency[freechips.rocketchip.stage.phases.TransformAnnotations],
+    Dependency[freechips.rocketchip.stage.phases.PreElaboration],
+    Dependency[RocketChiselStage],
+    Dependency[freechips.rocketchip.stage.phases.GenerateFirrtlAnnos],
+    Dependency[freechips.rocketchip.stage.phases.AddDefaultTests],
+    Dependency[freechips.rocketchip.stage.phases.GenerateTestSuiteMakefrags],
+    Dependency[freechips.rocketchip.stage.phases.GenerateArtefacts]
+  )
+
+  private val pm = new PhaseManager(targets)
+
+  override def run(annotations: AnnotationSeq): AnnotationSeq = pm.transform(annotations)
+
+}
+
+object Generator extends StageMain(new RocketChipStage)


Don't use RC Stage here.
I draft a minimized Main object here, I will find some time to finish it.

sequencer · 2023-02-24T19:29:24Z

tests/cosim/elaborate/src/Main.scala

+class cosimConfig extends Config((site, here, up) => {
+  case MonitorsEnabled => false
+  case freechips.rocketchip.tile.XLen => 64
+  case freechips.rocketchip.tile.XLen => 64
+  case freechips.rocketchip.tile.MaxHartIdBits => 4
+  case freechips.rocketchip.tile.MaxHartIdBits => 4
+  case freechips.rocketchip.rocket.PgLevels => if (site(freechips.rocketchip.tile.XLen) == 64) 3 else 2
+  case freechips.rocketchip.rocket.PgLevels => if (site(freechips.rocketchip.tile.XLen) == 64) 3 else 2
+  case RocketTileParamsKey => RocketTileParams(
+    core = RocketCoreParams(mulDiv = Some(MulDivParams(
+      mulUnroll = 8,
+      mulEarlyOut = true,
+      divEarlyOut = true))),
+    dcache = Some(DCacheParams(
+      rowBits = site(SystemBusKey).beatBits,
+      nMSHRs = 0,
+      blockBytes = site(CacheBlockBytes))),
+    icache = Some(ICacheParams(
+      rowBits = site(SystemBusKey).beatBits,
+      blockBytes = site(CacheBlockBytes))))
+  case SystemBusKey => SystemBusParams(
+    beatBytes = site(freechips.rocketchip.tile.XLen) / 8,
+    blockBytes = site(CacheBlockBytes))
+  case DebugModuleKey => None
+})


You should be able to use RC default configs.

sequencer · 2023-02-24T19:30:45Z

tests/cosim/emulator/src/tl_interface.h

+// the following macro helps us to access tl interface by dynamic index
+#define TL_INTERFACE(type, name)                      \
+  [[nodiscard]] inline type &get_tl_##name(VV &top) { \
+    return top.memory_0_##name;                       \


I don't this using . from verilator is a good idea. it is hardcoded. Please switch to VPI or DPI instead.

sequencer · 2023-02-24T19:47:58Z

The only challenge with rocket is the OOO write-back, but that can be solved easily by adding a giant ROB model to the core (used only for cosim with traces, not for implementation).

I personally prefer to maintain it inside the emulator. what do you think?

sequencer · 2023-02-24T19:50:33Z

Support (almost) arbitrary configuration of DUT

I think we only need to support single core. Since multi-core cosim will be bitten by consistency.
And I don't think we should test the topology of SoC, which is another domain of verification.

jerryz123 · 2023-02-24T20:08:49Z

You should use VPI or DPI to make Chisel be able to hook data, like the https://github.com/sequencer/vector/blob/master/tests/elaborate/src/GrandCentral.scala interface with DPI.

I think a verilog blackbox with DPI is sufficient for this.

Tests should fully compatible with the native RocketChip Configs. Spike should read from the RocketCore parameter to config itself.

Agreed.

For the difference between model&impl, there should be a hook to inject to model, since there exist some undefined behavior, which is allowed but behavior differs between model and RTL(this is a little tricky, you can do it in the future PR)

Yes... I just add the overrides in a ad-hoc manner in https://github.com/ucb-bar/chipyard/blob/main/generators/chipyard/src/main/resources/csrc/cospike.cc. API for this would be better, but I agree that is harder to come up with.

I personally don't like the Monitor implementation in the RC, I always believe all verification codes should be injected remotely, like systemverilog bind + Cross Module Reference, or Aspect from Chisel. I strongly suggest switch to DPI directly and use the DPI + GrandCenteral magic.

I think injecting DPI within the module hierarchy makes some flows more difficult. For example, cosim with DUT on FPGA (or firesim). I prefer to keep all DPI outside the DUT.

I personally prefer to maintain it inside the emulator. what do you think?

I guess you mean outside the DUT. Then you have to define a new interface for carrying the OOO WB information. I prefer to keep interfaces from core to outside implementation-agnostic, so that interface would need to be carefully designed as well.
Adding a ROB model seems to be much easier ... I am doing this in another project.

Personally, I also want a ROB model inside Rocket so ucb-bar/chipyard#1323 can work with Rocket.

I think we only need to support single core. Since multi-core cosim will be bitten by consistency.
And I don't think we should test the topology of SoC, which is another domain of verification.

I agree. I think some uncore things ought-to-be captured in cosim ... for example... bootrom contents, dts contents, memory map.

sequencer · 2023-02-24T20:25:39Z

I think injecting DPI within the module hierarchy makes some flows more difficult. For example, cosim with DUT on FPGA (or firesim). I prefer to keep all DPI outside the DUT.

Yes, those DPI should only exist in the Testbench and inspect RTL with XMR. Design codes should never have any DPI.

I guess you mean outside the DUT. Then you have to define a new interface for carrying the OOO WB information.

Not really if we have XMR and can peek any signal we like.

I think some uncore things ought-to-be captured in cosim ... for example... bootrom contents, dts contents, memory map.

Yes, for verification on those components, how about use SpikeTile to test them? i believe it can speed up a lot.

jerryz123 · 2023-02-24T21:11:21Z

Yes, those DPI should only exist in the Testbench and inspect RTL with XMR. Design codes should never have any DPI.

Not really if we have XMR and can peek any signal we like.

I'm not sure restricting DPI to harness and using XMR to bring signals out solves every use case. For example, with DUT on FPGA, you need to bring the trace signals out of the core to the top level, and then interface it with the host somehow.
Having a uniform traced-instruction port also enables some other interesting use-cases, beyond cosim. You can do analysis of the trace, or even reconstruct architectural state at an arbitrary point in the trace.

I agree for many other use cases, XMR is preferable to punching everything out. In this case, specifically, extending the existing TracedInstruction infrastructure is more useful than XMR.

Yes, for verification on those components, how about use SpikeTile to test them? i believe it can speed up a lot.

What I mean is that the spike cosim should have the same memory map/bootrom as the DUT. Spike natively generates its own bootrom/dts that will differ from the target.

I want regressions of Linux Boot in RTL sim in Chipyard with SpikeTile. Right now, giant memsets and memcpys in boot really slow the system down, since the L2/buses are simulated on every L1 cache miss.

sequencer · 2023-02-25T06:05:23Z

I'm not sure restricting DPI to harness and using XMR to bring signals out solves every use case. For example, with DUT on FPGA, you need to bring the trace signals out of the core to the top level, and then interface it with the host somehow.

Yep, agree the case on FPGA is quite useful, I think in that case, we should migrate the current custom trace function to riscv-trace-spec, so that we can use the software stack.

What I mean is that the spike cosim should have the same memory map/bootrom as the DUT. Spike natively generates its own bootrom/dts that will differ from the target.

Yeah, I think that is a bad design, I personally prefer using processor_t, and disable I$ and D$.

I want regressions of Linux Boot in RTL sim in Chipyard with SpikeTile. Right now, giant memsets and memcpys in boot really slow the system down, since the L2/buses are simulated on every L1 cache miss.

Maybe a custom linking script while maintaining a memory in emulator may help.

jerryz123 · 2023-02-25T07:35:39Z

Yep, agree the case on FPGA is quite useful, I think in that case, we should migrate the current custom trace function to riscv-trace-spec, so that we can use the software stack.

It should support both. A full trace for cases where spec-compliance is not necessary, and a spec-compliant trace. In fact, this is already the case. TraceCoreInterface should provide the spec-compliant trace, while TracedInstruction is instruction-level trace.

Yeah, I think that is a bad design, I personally prefer using processor_t, and disable I$ and D$.

Why do you think it is bad design? Cosim should test I$ and D$ as well... especially since D$ microarchitecture is quite complicated, and implements things like atomics, lr/sc, etc.

That approach is simple, and works ok. It requires no special harness, and does not rely on any details of the DUT microarchitecture. Thus it is easy to maintain, and microarchitecture-agnostic.

sequencer · 2023-02-25T11:53:32Z

Why do you think it is bad design?
I mean the spike cache behavior model is quite strange, we should disable that.

To make it more clear: I think we should only use process_t to construct a spike core and use SoC RTL directly.

jerryz123 · 2023-02-25T20:09:16Z

Why do you think it is bad design?

I mean the spike cache behavior model is quite strange, we should disable that.

To make it more clear: I think we should only use process_t to construct a spike core and use SoC RTL directly.

What is the advantage of this approach over having spike model the entire system.

sequencer · 2023-02-26T04:13:13Z

What is the advantage of this approach over having spike model the entire system.

Basically I think the main purpose to introduce SpikeTile is testing the SoC, especially cache system and periphery.

In this case, we just put text segment to C++, without creating IF transaction to RTL. This can avoid the I$ issue you described. And the SpikeTile will be eventually used as the Load/Store generator for generating transaction, and L1 Cache emulator for Cache system testing.

jerryz123 · 2023-02-26T04:15:29Z

I think two separate concerns are getting mixed together. Integrating SpikeTile can be left to another discussion. I'm more concerned at the moment with the design of spike-cosim.

I believe spike-cosim should be performed by having spike model the entire SoC, as is done in Chipyard's implementation. What are the disadvantages you see with that approach?

sequencer · 2023-02-26T04:23:43Z

I believe spike-cosim should be performed by having spike model the entire SoC

This is basically what I'm concerning, I think the cosim framework in this PR should only for RocketCore, Eventually we should split Rocket Core out from RC, while still be able to test it standalone.
SoC level cosim will force Rocket Core continue depending on the RocketChip. My approach is emulating the necessary off-core components in the C++ to remove the dependency to RocketChip.

jerryz123 · 2023-02-26T04:39:23Z

Ok, to summarize, in this proposed cosim design:

REF uses a c++ uncore and the processor_t as the core
DUT uses the same c++ uncore as REF, and the RocketCore RTL as the core

I feel like this approach introduces a lot of unnecessary complexity, you have to write a interface between spike's bus architecture model, and the RocketCore's tilelink interface, which implies writing a TileLink coherence manager model in C++. On the other hand, using RC+Rocket as the DUT enables a trivial (~200 LoC) implementation of cosim.

Standalone cosim of Rocket should be done in the split-out rocket repo, if it is truly expected to be independent of rocket-chip.

sequencer · 2023-02-26T08:54:13Z

I feel like this approach introduces a lot of unnecessary complexity, you have to write a interface between spike's bus architecture model, and the RocketCore's tilelink interface, which implies writing a TileLink coherence manager model in C++. On the other hand, using RC+Rocket as the DUT enables a trivial (~200 LoC) implementation of cosim.

Yes, I agree that's pain. However it's necessary for splitting RocketCore out. The test framework will only be used for the core test. We will continue keep the current test framework.

This test framework will only work for the rocket core, rather than the entire SoC, and will be eventually split out with the rocket core.

jerryz123 · 2023-02-27T01:33:27Z

Ok. In that case I would like to mark this PR DO_NOT_MERGE.

I do not believe we should merge things that will be split out later.

sequencer · 2023-02-27T01:40:41Z

Ok. In that case I would like to mark this PR DO_NOT_MERGE. I do not believe we should merge things that will be split out later.

So how do you think we can start to split? creating another repo or branch?

jerryz123 · 2023-02-27T01:44:37Z

I think you can create another repo, then just push the latest dev branch into it. Then someone can work on modifying the version of rocket in that repo to be standalone.

We should not lose commit history.

sequencer · 2023-02-27T01:47:11Z

I think you can create another repo, then just push the latest dev branch into it.

Sure! chipsalliance/rocket has been created. @midnighter95 please submit PR there.

midnighter95 added 4 commits February 24, 2023 18:25

add draft

0f2de5b

delete ResourceBinding in tile

8670dc0

update elaborate src

ce190c9

update myelaborate src

dc2a4d6

jerryz123 self-requested a review February 24, 2023 18:55

jerryz123 requested changes Feb 24, 2023

View reviewed changes

sequencer requested changes Feb 24, 2023

View reviewed changes

restore rocket tile rtl

6dc7f5f

midnighter95 force-pushed the cosim_dev branch from b9ebec7 to 6dc7f5f Compare February 25, 2023 03:00

sequencer closed this Feb 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rocket cosim framework #3271

Rocket cosim framework #3271

midnighter95 commented Feb 24, 2023

jerryz123 left a comment

sequencer left a comment

sequencer Feb 24, 2023

sequencer Feb 24, 2023

sequencer Feb 24, 2023

sequencer Feb 24, 2023

sequencer Feb 24, 2023

sequencer Feb 24, 2023

sequencer commented Feb 24, 2023

sequencer commented Feb 24, 2023

jerryz123 commented Feb 24, 2023

sequencer commented Feb 24, 2023

jerryz123 commented Feb 24, 2023

sequencer commented Feb 25, 2023

jerryz123 commented Feb 25, 2023

sequencer commented Feb 25, 2023 •

edited

Loading

jerryz123 commented Feb 25, 2023

sequencer commented Feb 26, 2023

jerryz123 commented Feb 26, 2023

sequencer commented Feb 26, 2023 •

edited

Loading

jerryz123 commented Feb 26, 2023 •

edited

Loading

sequencer commented Feb 26, 2023

jerryz123 commented Feb 27, 2023

sequencer commented Feb 27, 2023

jerryz123 commented Feb 27, 2023

sequencer commented Feb 27, 2023 •

edited

Loading

Rocket cosim framework #3271

Rocket cosim framework #3271

Conversation

midnighter95 commented Feb 24, 2023

Framework Outline

Features

Uarch Coverpoint

Completed tests:

Test Flow

jerryz123 left a comment

Choose a reason for hiding this comment

sequencer left a comment

Choose a reason for hiding this comment

sequencer Feb 24, 2023

Choose a reason for hiding this comment

sequencer Feb 24, 2023

Choose a reason for hiding this comment

sequencer Feb 24, 2023

Choose a reason for hiding this comment

sequencer Feb 24, 2023

Choose a reason for hiding this comment

sequencer Feb 24, 2023

Choose a reason for hiding this comment

sequencer Feb 24, 2023

Choose a reason for hiding this comment

sequencer commented Feb 24, 2023

sequencer commented Feb 24, 2023

jerryz123 commented Feb 24, 2023

sequencer commented Feb 24, 2023

jerryz123 commented Feb 24, 2023

sequencer commented Feb 25, 2023

jerryz123 commented Feb 25, 2023

sequencer commented Feb 25, 2023 • edited Loading

jerryz123 commented Feb 25, 2023

sequencer commented Feb 26, 2023

jerryz123 commented Feb 26, 2023

sequencer commented Feb 26, 2023 • edited Loading

jerryz123 commented Feb 26, 2023 • edited Loading

sequencer commented Feb 26, 2023

jerryz123 commented Feb 27, 2023

sequencer commented Feb 27, 2023

jerryz123 commented Feb 27, 2023

sequencer commented Feb 27, 2023 • edited Loading

sequencer commented Feb 25, 2023 •

edited

Loading

sequencer commented Feb 26, 2023 •

edited

Loading

jerryz123 commented Feb 26, 2023 •

edited

Loading

sequencer commented Feb 27, 2023 •

edited

Loading