Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rocket cosim framework #3271

Closed
wants to merge 5 commits into from
Closed

Conversation

midnighter95
Copy link
Contributor

This PR provides a Rocket-Spike cosim simulation framework. A standalone rocket core is extracted with its memory port exposed to the top-level and is compiled to c++ model by Verilator . The framework drives rocket model to run testcase while spike runs the same test at the same time. During the test, the rocket result will be compared to the spike for each instruction.

Framework Outline

  • control the overall test flow.
  • stimulate Spike with testcase, record the result.
  • stimulate rocket model.
  • deal with all the memory access from the core (acts as the memory of rocket core)
  • peek rocket result and compare it with Spike.

Features

  • standalone Rocket core with decouple bus interfaces expose to users.
  • use spike as reference model
  • capture architecture events, for each cycles, comparing event with architecture events generated from spike
  • fast binary load and start-up

Uarch Coverpoint

  • Scalar register file write
  • Float register file write
  • CSR write

Completed tests:

  • rv64mi-p except csr and breakpoint test
  • rv64mzicbo
  • rv64ssvnapot
  • rv64si
  • rv64ua
  • rv64uc
  • rv64ud
  • rv64uf
  • rv64ui
  • rv64um
  • rv64uzfh
  • rv64um

Test Flow

nix develop -c make riscvtests

@jerryz123 jerryz123 self-requested a review February 24, 2023 18:55
Copy link
Contributor

@jerryz123 jerryz123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen ucb-bar/chipyard#1323?

The model for cosim should be

  • Support (almost) arbitrary configuration of DUT
  • Match DUT arch spec with spike configuration
  • For every instruction committed by DUT, compare against single-stepped spike

The only challenge with rocket is the OOO write-back, but that can be solved easily by adding a giant ROB model to the core (used only for cosim with traces, not for implementation).

Copy link
Member

@sequencer sequencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically this framework was the initial version from my vector project. really appreciate @midnighter95 for trying to migrate it to RC.

For RC, we need a more carefully designed and general test framework.

  1. You should use VPI or DPI to make Chisel be able to hook data, like the https://github.com/sequencer/vector/blob/master/tests/elaborate/src/GrandCentral.scala interface with DPI.
  2. Tests should fully compatible with the native RocketChip Configs. Spike should read from the RocketCore parameter to config itself.
  3. For the difference between model&impl, there should be a hook to inject to model, since there exist some undefined behavior, which is allowed but behavior differs between model and RTL(this is a little tricky, you can do it in the future PR)
  4. I personally don't like the Monitor implementation in the RC, I always believe all verification codes should be injected remotely, like systemverilog bind + Cross Module Reference, or Aspect from Chisel. I strongly suggest switch to DPI directly and use the DPI + GrandCenteral magic.

Anyway, thanks for this work, I understand this PR is hard to implement, but this should be the key feature for the future RocketCore CI(far more faster than the in-tree version).

Comment on lines +26 to +28
libargs glog fmt zlib
gnused coreutils gnugrep which
parallel protobuf antlr4 numactl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only need to add necessary dependencies.

src/main/scala/tile/Interrupts.scala Outdated Show resolved Hide resolved
src/main/scala/tile/Interrupts.scala Outdated Show resolved Hide resolved
src/main/scala/tile/Interrupts.scala Outdated Show resolved Hide resolved
src/main/scala/tile/RocketTile.scala Outdated Show resolved Hide resolved
Comment on lines +19 to +78
val masterNode = TLManagerNode(Seq(TLSlavePortParameters.v1(
Seq(TLSlaveParameters.v1(
address = List(AddressSet(0x0, 0xffffffffL)),
regionType = RegionType.UNCACHED,
executable = true,
supportsGet = TransferSizes(1, 64),
supportsAcquireT = TransferSizes(1, 64),
supportsAcquireB = TransferSizes(1, 64),
supportsPutPartial = TransferSizes(1, 64),
supportsPutFull = TransferSizes(1, 64),
supportsLogical = TransferSizes(1, 64),
supportsArithmetic = TransferSizes(1, 64),
fifoId = Some(0))),
beatBytes = 8,
endSinkId = 4,
minLatency = 1
)))
masterNode :=* rocketTile.masterNode
val memory = InModuleBody {
masterNode.makeIOs()
}

val intNode = IntSourceNode(IntSourcePortSimple())
rocketTile.intInwardNode :=* intNode
val intIn = InModuleBody {
intNode.makeIOs()
}

val haltNode = IntSinkNode(IntSinkPortSimple())
haltNode :=* rocketTile.haltNode
val haltOut = InModuleBody {
haltNode.makeIOs()
}

val ceaseNode = IntSinkNode(IntSinkPortSimple())
ceaseNode :=* rocketTile.ceaseNode
val ceaseOut = InModuleBody {
ceaseNode.makeIOs()
}

val wfiNode = IntSinkNode(IntSinkPortSimple())
wfiNode :=* rocketTile.wfiNode
val wfiOut = InModuleBody {
wfiNode.makeIOs()
}
val resetVectorNode = BundleBridgeSource(() => UInt(32.W))
rocketTile.resetVectorNode := resetVectorNode
val resetVector = InModuleBody {
resetVectorNode.makeIO()
}
val hartidNode = BundleBridgeSource(() => UInt(4.W))
rocketTile.hartIdNode := hartidNode
InModuleBody {
hartidNode.bundle := 0.U
}
val nmiNode = BundleBridgeSource(Some(() => new NMI(32)))
rocketTile.nmiNode := nmiNode
val nmi = InModuleBody {
nmiNode.makeIO()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logic is already implemented in the InstantiatesTiles, please use it.

Comment on lines +80 to +96
chisel3.experimental.DataMirror.fullModulePorts(
// instantiate the LazyModule
Module(ldut.module)
).filterNot(_._2.isInstanceOf[Aggregate]).foreach { case (name, ele) =>
if (!(name == "clock" || name == "reset")) {
chisel3.experimental.DataMirror.directionOf(ele) match {
case ActualDirection.Output =>
val io = IO(Output(chiselTypeOf(ele))).suggestName(name)
println(s"output $name")
io := ele
case ActualDirection.Input =>
val io = IO(Input(chiselTypeOf(ele))).suggestName(name)
println(s"input $name")
ele := io
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this hack should be fixed.

Comment on lines +29 to +64
private[freechips] final class RocketChiselStage extends ChiselStage {

override val targets = Seq(
Dependency[chisel3.stage.phases.Checks],
Dependency[chisel3.stage.phases.Elaborate],
Dependency[freechips.rocketchip.stage.phases.GenerateROMs],
Dependency[chisel3.stage.phases.AddImplicitOutputFile],
Dependency[chisel3.stage.phases.AddImplicitOutputAnnotationFile],
Dependency[chisel3.stage.phases.MaybeAspectPhase],
Dependency[chisel3.stage.phases.Emitter],
Dependency[chisel3.stage.phases.Convert]
)

}

class RocketChipStage extends Stage with PreservesAll[Phase] {

override val shell = new Shell("rocket-chip") with RocketChipCli with ChiselCli with FirrtlCli
val targets: Seq[PhaseDependency] = Seq(
Dependency[freechips.rocketchip.stage.phases.Checks],
Dependency[freechips.rocketchip.stage.phases.TransformAnnotations],
Dependency[freechips.rocketchip.stage.phases.PreElaboration],
Dependency[RocketChiselStage],
Dependency[freechips.rocketchip.stage.phases.GenerateFirrtlAnnos],
Dependency[freechips.rocketchip.stage.phases.AddDefaultTests],
Dependency[freechips.rocketchip.stage.phases.GenerateTestSuiteMakefrags],
Dependency[freechips.rocketchip.stage.phases.GenerateArtefacts]
)

private val pm = new PhaseManager(targets)

override def run(annotations: AnnotationSeq): AnnotationSeq = pm.transform(annotations)

}

object Generator extends StageMain(new RocketChipStage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use RC Stage here.
I draft a minimized Main object here, I will find some time to finish it.

Comment on lines +69 to +93
class cosimConfig extends Config((site, here, up) => {
case MonitorsEnabled => false
case freechips.rocketchip.tile.XLen => 64
case freechips.rocketchip.tile.XLen => 64
case freechips.rocketchip.tile.MaxHartIdBits => 4
case freechips.rocketchip.tile.MaxHartIdBits => 4
case freechips.rocketchip.rocket.PgLevels => if (site(freechips.rocketchip.tile.XLen) == 64) 3 else 2
case freechips.rocketchip.rocket.PgLevels => if (site(freechips.rocketchip.tile.XLen) == 64) 3 else 2
case RocketTileParamsKey => RocketTileParams(
core = RocketCoreParams(mulDiv = Some(MulDivParams(
mulUnroll = 8,
mulEarlyOut = true,
divEarlyOut = true))),
dcache = Some(DCacheParams(
rowBits = site(SystemBusKey).beatBits,
nMSHRs = 0,
blockBytes = site(CacheBlockBytes))),
icache = Some(ICacheParams(
rowBits = site(SystemBusKey).beatBits,
blockBytes = site(CacheBlockBytes))))
case SystemBusKey => SystemBusParams(
beatBytes = site(freechips.rocketchip.tile.XLen) / 8,
blockBytes = site(CacheBlockBytes))
case DebugModuleKey => None
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use RC default configs.

// the following macro helps us to access tl interface by dynamic index
#define TL_INTERFACE(type, name) \
[[nodiscard]] inline type &get_tl_##name(VV &top) { \
return top.memory_0_##name; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't this using . from verilator is a good idea. it is hardcoded. Please switch to VPI or DPI instead.

@sequencer
Copy link
Member

The only challenge with rocket is the OOO write-back, but that can be solved easily by adding a giant ROB model to the core (used only for cosim with traces, not for implementation).

I personally prefer to maintain it inside the emulator. what do you think?

@sequencer
Copy link
Member

Support (almost) arbitrary configuration of DUT

I think we only need to support single core. Since multi-core cosim will be bitten by consistency.
And I don't think we should test the topology of SoC, which is another domain of verification.

@jerryz123
Copy link
Contributor

I think a verilog blackbox with DPI is sufficient for this.

  • Tests should fully compatible with the native RocketChip Configs. Spike should read from the RocketCore parameter to config itself.

Agreed.

  • For the difference between model&impl, there should be a hook to inject to model, since there exist some undefined behavior, which is allowed but behavior differs between model and RTL(this is a little tricky, you can do it in the future PR)

Yes... I just add the overrides in a ad-hoc manner in https://github.com/ucb-bar/chipyard/blob/main/generators/chipyard/src/main/resources/csrc/cospike.cc. API for this would be better, but I agree that is harder to come up with.

  • I personally don't like the Monitor implementation in the RC, I always believe all verification codes should be injected remotely, like systemverilog bind + Cross Module Reference, or Aspect from Chisel. I strongly suggest switch to DPI directly and use the DPI + GrandCenteral magic.

I think injecting DPI within the module hierarchy makes some flows more difficult. For example, cosim with DUT on FPGA (or firesim). I prefer to keep all DPI outside the DUT.

I personally prefer to maintain it inside the emulator. what do you think?

I guess you mean outside the DUT. Then you have to define a new interface for carrying the OOO WB information. I prefer to keep interfaces from core to outside implementation-agnostic, so that interface would need to be carefully designed as well.
Adding a ROB model seems to be much easier ... I am doing this in another project.

Personally, I also want a ROB model inside Rocket so ucb-bar/chipyard#1323 can work with Rocket.

I think we only need to support single core. Since multi-core cosim will be bitten by consistency.
And I don't think we should test the topology of SoC, which is another domain of verification.

I agree. I think some uncore things ought-to-be captured in cosim ... for example... bootrom contents, dts contents, memory map.

@sequencer
Copy link
Member

I think injecting DPI within the module hierarchy makes some flows more difficult. For example, cosim with DUT on FPGA (or firesim). I prefer to keep all DPI outside the DUT.

Yes, those DPI should only exist in the Testbench and inspect RTL with XMR. Design codes should never have any DPI.

I guess you mean outside the DUT. Then you have to define a new interface for carrying the OOO WB information.

Not really if we have XMR and can peek any signal we like.

I think some uncore things ought-to-be captured in cosim ... for example... bootrom contents, dts contents, memory map.

Yes, for verification on those components, how about use SpikeTile to test them? i believe it can speed up a lot.

@jerryz123
Copy link
Contributor

Yes, those DPI should only exist in the Testbench and inspect RTL with XMR. Design codes should never have any DPI.

Not really if we have XMR and can peek any signal we like.

I'm not sure restricting DPI to harness and using XMR to bring signals out solves every use case. For example, with DUT on FPGA, you need to bring the trace signals out of the core to the top level, and then interface it with the host somehow.
Having a uniform traced-instruction port also enables some other interesting use-cases, beyond cosim. You can do analysis of the trace, or even reconstruct architectural state at an arbitrary point in the trace.

I agree for many other use cases, XMR is preferable to punching everything out. In this case, specifically, extending the existing TracedInstruction infrastructure is more useful than XMR.

Yes, for verification on those components, how about use SpikeTile to test them? i believe it can speed up a lot.

What I mean is that the spike cosim should have the same memory map/bootrom as the DUT. Spike natively generates its own bootrom/dts that will differ from the target.

I want regressions of Linux Boot in RTL sim in Chipyard with SpikeTile. Right now, giant memsets and memcpys in boot really slow the system down, since the L2/buses are simulated on every L1 cache miss.

@sequencer
Copy link
Member

I'm not sure restricting DPI to harness and using XMR to bring signals out solves every use case. For example, with DUT on FPGA, you need to bring the trace signals out of the core to the top level, and then interface it with the host somehow.

Yep, agree the case on FPGA is quite useful, I think in that case, we should migrate the current custom trace function to riscv-trace-spec, so that we can use the software stack.

What I mean is that the spike cosim should have the same memory map/bootrom as the DUT. Spike natively generates its own bootrom/dts that will differ from the target.

Yeah, I think that is a bad design, I personally prefer using processor_t, and disable I$ and D$.

I want regressions of Linux Boot in RTL sim in Chipyard with SpikeTile. Right now, giant memsets and memcpys in boot really slow the system down, since the L2/buses are simulated on every L1 cache miss.

Maybe a custom linking script while maintaining a memory in emulator may help.

@jerryz123
Copy link
Contributor

Yep, agree the case on FPGA is quite useful, I think in that case, we should migrate the current custom trace function to riscv-trace-spec, so that we can use the software stack.

It should support both. A full trace for cases where spec-compliance is not necessary, and a spec-compliant trace. In fact, this is already the case. TraceCoreInterface should provide the spec-compliant trace, while TracedInstruction is instruction-level trace.

Yeah, I think that is a bad design, I personally prefer using processor_t, and disable I$ and D$.

Why do you think it is bad design? Cosim should test I$ and D$ as well... especially since D$ microarchitecture is quite complicated, and implements things like atomics, lr/sc, etc.

That approach is simple, and works ok. It requires no special harness, and does not rely on any details of the DUT microarchitecture. Thus it is easy to maintain, and microarchitecture-agnostic.

@sequencer
Copy link
Member

sequencer commented Feb 25, 2023

Why do you think it is bad design?
I mean the spike cache behavior model is quite strange, we should disable that.

To make it more clear: I think we should only use process_t to construct a spike core and use SoC RTL directly.

@jerryz123
Copy link
Contributor

Why do you think it is bad design?

I mean the spike cache behavior model is quite strange, we should disable that.

To make it more clear: I think we should only use process_t to construct a spike core and use SoC RTL directly.

What is the advantage of this approach over having spike model the entire system.

@sequencer
Copy link
Member

What is the advantage of this approach over having spike model the entire system.

Basically I think the main purpose to introduce SpikeTile is testing the SoC, especially cache system and periphery.

In this case, we just put text segment to C++, without creating IF transaction to RTL. This can avoid the I$ issue you described. And the SpikeTile will be eventually used as the Load/Store generator for generating transaction, and L1 Cache emulator for Cache system testing.

@jerryz123
Copy link
Contributor

I think two separate concerns are getting mixed together. Integrating SpikeTile can be left to another discussion. I'm more concerned at the moment with the design of spike-cosim.

I believe spike-cosim should be performed by having spike model the entire SoC, as is done in Chipyard's implementation. What are the disadvantages you see with that approach?

@sequencer
Copy link
Member

sequencer commented Feb 26, 2023

I believe spike-cosim should be performed by having spike model the entire SoC

This is basically what I'm concerning, I think the cosim framework in this PR should only for RocketCore, Eventually we should split Rocket Core out from RC, while still be able to test it standalone.
SoC level cosim will force Rocket Core continue depending on the RocketChip. My approach is emulating the necessary off-core components in the C++ to remove the dependency to RocketChip.

@jerryz123
Copy link
Contributor

jerryz123 commented Feb 26, 2023

Ok, to summarize, in this proposed cosim design:

  • REF uses a c++ uncore and the processor_t as the core
  • DUT uses the same c++ uncore as REF, and the RocketCore RTL as the core

I feel like this approach introduces a lot of unnecessary complexity, you have to write a interface between spike's bus architecture model, and the RocketCore's tilelink interface, which implies writing a TileLink coherence manager model in C++. On the other hand, using RC+Rocket as the DUT enables a trivial (~200 LoC) implementation of cosim.

Standalone cosim of Rocket should be done in the split-out rocket repo, if it is truly expected to be independent of rocket-chip.

@sequencer
Copy link
Member

I feel like this approach introduces a lot of unnecessary complexity, you have to write a interface between spike's bus architecture model, and the RocketCore's tilelink interface, which implies writing a TileLink coherence manager model in C++. On the other hand, using RC+Rocket as the DUT enables a trivial (~200 LoC) implementation of cosim.

Yes, I agree that's pain. However it's necessary for splitting RocketCore out. The test framework will only be used for the core test. We will continue keep the current test framework.

This test framework will only work for the rocket core, rather than the entire SoC, and will be eventually split out with the rocket core.

@jerryz123
Copy link
Contributor

Ok. In that case I would like to mark this PR DO_NOT_MERGE.

I do not believe we should merge things that will be split out later.

@sequencer
Copy link
Member

Ok. In that case I would like to mark this PR DO_NOT_MERGE. I do not believe we should merge things that will be split out later.

So how do you think we can start to split? creating another repo or branch?

@jerryz123
Copy link
Contributor

I think you can create another repo, then just push the latest dev branch into it. Then someone can work on modifying the version of rocket in that repo to be standalone.

We should not lose commit history.

@sequencer
Copy link
Member

sequencer commented Feb 27, 2023

I think you can create another repo, then just push the latest dev branch into it.

Sure! chipsalliance/rocket has been created. @midnighter95 please submit PR there.

@sequencer sequencer closed this Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants