Rocket cosim framework #3271
Have you seen ucb-bar/chipyard#1323?
The cosim model should:
- Support (almost) arbitrary configuration of DUT
- Match DUT arch spec with spike configuration
- For every instruction committed by DUT, compare against single-stepped spike
The only challenge with rocket is the OOO write-back, but that can be solved easily by adding a giant ROB model to the core (used only for cosim with traces, not for implementation).
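A minimal C++ sketch of that checking model follows. It assumes a hypothetical harness that reports each DUT commit (flagging results that are written back after commit, such as load misses or divides) and a callback that single-steps the reference model. Commits wait in a ROB-like queue and are compared against the model in program order once their data is known. All names (`CommitChecker`, `on_commit`, `on_late_writeback`) are illustrative. The sketch also assumes at most one outstanding late write per destination register, which a scoreboard that checks rd would enforce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// One instruction as committed by the DUT (hypothetical trace format).
struct Commit {
  uint64_t pc;
  std::optional<unsigned> rd;  // integer destination register, if any
  uint64_t wdata;              // write-back value; valid once !pending
  bool pending;                // still waiting for an out-of-order write-back
};

// Result of single-stepping the reference model (e.g. Spike) once.
struct RefStep {
  uint64_t pc;
  std::optional<unsigned> rd;
  uint64_t wdata;
};

class CommitChecker {
 public:
  explicit CommitChecker(std::function<RefStep()> step_ref)
      : step_ref_(std::move(step_ref)) {}

  // Called when the DUT commits; late_wb means the data will arrive later.
  void on_commit(uint64_t pc, std::optional<unsigned> rd, uint64_t wdata, bool late_wb) {
    assert(!late_wb || rd.has_value());
    rob_.push_back({pc, rd, wdata, late_wb});
    drain();
  }

  // Called when a long-latency result is finally written back.
  void on_late_writeback(unsigned rd, uint64_t wdata) {
    for (Commit& c : rob_) {
      if (c.pending && c.rd == rd) {
        c.wdata = wdata;
        c.pending = false;
        drain();  // may pop entries; c is not touched afterwards
        return;
      }
    }
    throw std::runtime_error("late write-back to x" + std::to_string(rd) +
                             " with no pending commit");
  }

  std::size_t checked() const { return checked_; }
  std::size_t outstanding() const { return rob_.size(); }

 private:
  // Retire completed entries in program order, stepping the model once per entry.
  void drain() {
    while (!rob_.empty() && !rob_.front().pending) {
      const Commit c = rob_.front();
      rob_.pop_front();
      const RefStep r = step_ref_();
      if (r.pc != c.pc || r.rd != c.rd || (c.rd && r.wdata != c.wdata))
        throw std::runtime_error("cosim mismatch at pc " + std::to_string(c.pc));
      ++checked_;
    }
  }

  std::function<RefStep()> step_ref_;
  std::deque<Commit> rob_;
  std::size_t checked_ = 0;
};
```

The younger instruction at `0x80000004` in such a trace is held until the older load's data arrives, so the model is never compared out of order.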
Basically, this framework started as the initial version from my vector project. I really appreciate @midnighter95 trying to migrate it to RC.
For RC, we need a more carefully designed and general test framework.
- You should use VPI or DPI so that Chisel is able to hook data, like the https://github.com/sequencer/vector/blob/master/tests/elaborate/src/GrandCentral.scala interface does with DPI.
- Tests should be fully compatible with the native RocketChip Configs. Spike should read the RocketCore parameters to configure itself.
- For differences between the model and the implementation, there should be a hook to inject into the model, since there exist some undefined behaviors that are allowed but differ between the model and RTL (this is a little tricky; you can do it in a future PR).
- I personally don't like the Monitor implementation in RC. I have always believed all verification code should be injected remotely, like SystemVerilog bind + Cross Module Reference, or Aspect from Chisel. I strongly suggest switching to DPI directly and using the DPI + GrandCentral magic.
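As a rough illustration of keeping DPI in the testbench rather than in design code, the C++ side of a DPI-C import could simply queue commit events for the checker. The SystemVerilog declaration in the comment and the names `cosim_commit` and `next_commit` are hypothetical; the argument types follow the standard DPI-C mapping (`longint` to `long long`, `int` to `int`).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// One committed instruction as reported by the testbench.
struct CommitEvent {
  uint64_t pc;
  int rd;          // -1 when no integer register is written
  uint64_t wdata;
};

namespace {
// Filled by the DPI callback, drained by the C++ checker.
std::deque<CommitEvent> commit_queue;
}  // namespace

// C side of a hypothetical import declared in the testbench, not in the DUT:
//   import "DPI-C" function void cosim_commit(
//       input longint pc, input int rd, input longint wdata);
// DPI-C passes longint as long long and int as int.
extern "C" void cosim_commit(long long pc, int rd, long long wdata) {
  assert(rd >= -1 && rd < 32);
  commit_queue.push_back({static_cast<uint64_t>(pc), rd, static_cast<uint64_t>(wdata)});
}

// Pops the next committed instruction, if the simulator has produced one.
std::optional<CommitEvent> next_commit() {
  if (commit_queue.empty()) return std::nullopt;
  CommitEvent e = commit_queue.front();
  commit_queue.pop_front();
  return e;
}
```

Because the callback only records data, the same checker could in principle be fed from a trace port instead, which matters for the FPGA flows discussed later in the thread.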
Anyway, thanks for this work. I understand this PR is hard to implement, but this should be the key feature for future RocketCore CI (far faster than the in-tree version).
libargs glog fmt zlib
gnused coreutils gnugrep which
parallel protobuf antlr4 numactl
Please add only the necessary dependencies.
val masterNode = TLManagerNode(Seq(TLSlavePortParameters.v1(
  Seq(TLSlaveParameters.v1(
    address = List(AddressSet(0x0, 0xffffffffL)),
    regionType = RegionType.UNCACHED,
    executable = true,
    supportsGet = TransferSizes(1, 64),
    supportsAcquireT = TransferSizes(1, 64),
    supportsAcquireB = TransferSizes(1, 64),
    supportsPutPartial = TransferSizes(1, 64),
    supportsPutFull = TransferSizes(1, 64),
    supportsLogical = TransferSizes(1, 64),
    supportsArithmetic = TransferSizes(1, 64),
    fifoId = Some(0))),
  beatBytes = 8,
  endSinkId = 4,
  minLatency = 1
)))
masterNode :=* rocketTile.masterNode
val memory = InModuleBody {
  masterNode.makeIOs()
}

val intNode = IntSourceNode(IntSourcePortSimple())
rocketTile.intInwardNode :=* intNode
val intIn = InModuleBody {
  intNode.makeIOs()
}

val haltNode = IntSinkNode(IntSinkPortSimple())
haltNode :=* rocketTile.haltNode
val haltOut = InModuleBody {
  haltNode.makeIOs()
}

val ceaseNode = IntSinkNode(IntSinkPortSimple())
ceaseNode :=* rocketTile.ceaseNode
val ceaseOut = InModuleBody {
  ceaseNode.makeIOs()
}

val wfiNode = IntSinkNode(IntSinkPortSimple())
wfiNode :=* rocketTile.wfiNode
val wfiOut = InModuleBody {
  wfiNode.makeIOs()
}
val resetVectorNode = BundleBridgeSource(() => UInt(32.W))
rocketTile.resetVectorNode := resetVectorNode
val resetVector = InModuleBody {
  resetVectorNode.makeIO()
}
val hartidNode = BundleBridgeSource(() => UInt(4.W))
rocketTile.hartIdNode := hartidNode
InModuleBody {
  hartidNode.bundle := 0.U
}
val nmiNode = BundleBridgeSource(Some(() => new NMI(32)))
rocketTile.nmiNode := nmiNode
val nmi = InModuleBody {
  nmiNode.makeIO()
}
This logic is already implemented in InstantiatesTiles; please use it.
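For reference, whatever drives the exposed memory port from C++ has to answer TileLink requests. The sketch below handles only single-beat TL-UL Get, PutFullData and PutPartialData against a sparse byte memory, with `beatBytes = 8` as in the manager node above. The struct and function names are hypothetical, and the Acquire traffic that this node also advertises (`supportsAcquireT`/`supportsAcquireB`) is deliberately not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// TL-UL opcode values from the TileLink specification (subset).
enum class AOp : uint8_t { PutFullData = 0, PutPartialData = 1, Get = 4 };
enum class DOp : uint8_t { AccessAck = 0, AccessAckData = 1 };

struct ABeat { AOp opcode; uint8_t size; uint8_t source; uint64_t address; uint8_t mask; uint64_t data; };
struct DBeat { DOp opcode; uint8_t size; uint8_t source; uint64_t data; };

constexpr unsigned kBeatBytes = 8;  // beatBytes = 8 in the manager node

// Sparse byte-addressed memory; unwritten bytes read as zero.
class ByteMemory {
 public:
  uint8_t read(uint64_t addr) const {
    auto it = bytes_.find(addr);
    return it == bytes_.end() ? uint8_t{0} : it->second;
  }
  void write(uint64_t addr, uint8_t value) { bytes_[addr] = value; }

 private:
  std::unordered_map<uint64_t, uint8_t> bytes_;
};

// Services one single-beat A-channel request (size <= log2(kBeatBytes)).
DBeat handle_a(const ABeat& a, ByteMemory& mem) {
  if (a.size > 3) throw std::runtime_error("multi-beat bursts are not modeled in this sketch");
  assert((a.address & ((uint64_t{1} << a.size) - 1)) == 0);  // TileLink requires size alignment
  const uint64_t lane0 = a.address & ~uint64_t{kBeatBytes - 1};  // address of byte lane 0
  switch (a.opcode) {
    case AOp::Get: {
      uint64_t data = 0;  // whole beat; lanes outside the request are don't-care
      for (unsigned i = 0; i < kBeatBytes; ++i)
        data |= uint64_t{mem.read(lane0 + i)} << (8 * i);
      return {DOp::AccessAckData, a.size, a.source, data};
    }
    case AOp::PutFullData:
    case AOp::PutPartialData:
      for (unsigned i = 0; i < kBeatBytes; ++i)
        if (a.mask & (1u << i)) mem.write(lane0 + i, static_cast<uint8_t>(a.data >> (8 * i)));
      return {DOp::AccessAck, a.size, a.source, 0};
  }
  throw std::runtime_error("unsupported A-channel opcode");
}
```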
chisel3.experimental.DataMirror.fullModulePorts(
  // instantiate the LazyModule
  Module(ldut.module)
).filterNot(_._2.isInstanceOf[Aggregate]).foreach { case (name, ele) =>
  if (!(name == "clock" || name == "reset")) {
    chisel3.experimental.DataMirror.directionOf(ele) match {
      case ActualDirection.Output =>
        val io = IO(Output(chiselTypeOf(ele))).suggestName(name)
        println(s"output $name")
        io := ele
      case ActualDirection.Input =>
        val io = IO(Input(chiselTypeOf(ele))).suggestName(name)
        println(s"input $name")
        ele := io
    }
  }
}
IIRC this hack should be fixed.
private[freechips] final class RocketChiselStage extends ChiselStage {

  override val targets = Seq(
    Dependency[chisel3.stage.phases.Checks],
    Dependency[chisel3.stage.phases.Elaborate],
    Dependency[freechips.rocketchip.stage.phases.GenerateROMs],
    Dependency[chisel3.stage.phases.AddImplicitOutputFile],
    Dependency[chisel3.stage.phases.AddImplicitOutputAnnotationFile],
    Dependency[chisel3.stage.phases.MaybeAspectPhase],
    Dependency[chisel3.stage.phases.Emitter],
    Dependency[chisel3.stage.phases.Convert]
  )

}

class RocketChipStage extends Stage with PreservesAll[Phase] {

  override val shell = new Shell("rocket-chip") with RocketChipCli with ChiselCli with FirrtlCli
  val targets: Seq[PhaseDependency] = Seq(
    Dependency[freechips.rocketchip.stage.phases.Checks],
    Dependency[freechips.rocketchip.stage.phases.TransformAnnotations],
    Dependency[freechips.rocketchip.stage.phases.PreElaboration],
    Dependency[RocketChiselStage],
    Dependency[freechips.rocketchip.stage.phases.GenerateFirrtlAnnos],
    Dependency[freechips.rocketchip.stage.phases.AddDefaultTests],
    Dependency[freechips.rocketchip.stage.phases.GenerateTestSuiteMakefrags],
    Dependency[freechips.rocketchip.stage.phases.GenerateArtefacts]
  )

  private val pm = new PhaseManager(targets)

  override def run(annotations: AnnotationSeq): AnnotationSeq = pm.transform(annotations)

}

object Generator extends StageMain(new RocketChipStage)
Don't use the RC Stage here. I drafted a minimized Main object here; I will find some time to finish it.
class cosimConfig extends Config((site, here, up) => {
  case MonitorsEnabled => false
  case freechips.rocketchip.tile.XLen => 64
  case freechips.rocketchip.tile.MaxHartIdBits => 4
  case freechips.rocketchip.rocket.PgLevels => if (site(freechips.rocketchip.tile.XLen) == 64) 3 else 2
  case RocketTileParamsKey => RocketTileParams(
    core = RocketCoreParams(mulDiv = Some(MulDivParams(
      mulUnroll = 8,
      mulEarlyOut = true,
      divEarlyOut = true))),
    dcache = Some(DCacheParams(
      rowBits = site(SystemBusKey).beatBits,
      nMSHRs = 0,
      blockBytes = site(CacheBlockBytes))),
    icache = Some(ICacheParams(
      rowBits = site(SystemBusKey).beatBits,
      blockBytes = site(CacheBlockBytes))))
  case SystemBusKey => SystemBusParams(
    beatBytes = site(freechips.rocketchip.tile.XLen) / 8,
    blockBytes = site(CacheBlockBytes))
  case DebugModuleKey => None
})
You should be able to use RC default configs.
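Along the same lines, Spike's configuration could be derived from the same core parameters rather than maintained by hand, as suggested earlier in the thread. A sketch, assuming a hypothetical C++ mirror of a few RocketCoreParams-like fields (exported from the Scala config by some mechanism not shown), that produces values for Spike's `--isa` and `--priv` options:

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ mirror of a few core parameters relevant to Spike.
struct CoreParams {
  unsigned xLen = 64;
  bool useMulDiv = true;      // M
  bool useAtomics = true;     // A
  unsigned fLen = 64;         // 0: no FPU, 32: F, 64: F and D
  bool useCompressed = true;  // C
  bool useSupervisor = true;
  bool useUser = true;
};

// Value for Spike's --isa option, e.g. "rv64imafdc".
std::string spike_isa(const CoreParams& p) {
  assert(p.xLen == 32 || p.xLen == 64);
  std::string isa = "rv" + std::to_string(p.xLen) + "i";
  if (p.useMulDiv) isa += 'm';
  if (p.useAtomics) isa += 'a';
  if (p.fLen >= 32) isa += 'f';
  if (p.fLen >= 64) isa += 'd';
  if (p.useCompressed) isa += 'c';
  return isa;
}

// Value for Spike's --priv option: "m", "mu", or "msu".
std::string spike_priv(const CoreParams& p) {
  if (p.useSupervisor) return "msu";  // supervisor mode implies user mode
  return p.useUser ? "mu" : "m";
}
```

Generating both sides from one source removes a class of false mismatches caused by the model and the DUT disagreeing about which extensions exist.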
// the following macro helps us to access tl interface by dynamic index
#define TL_INTERFACE(type, name) \
  [[nodiscard]] inline type &get_tl_##name(VV &top) { \
    return top.memory_0_##name; \
I don't think using "." from Verilator is a good idea; it is hardcoded. Please switch to VPI or DPI instead.
I personally prefer to maintain it inside the emulator. What do you think?
I think we only need to support a single core, since multi-core cosim will be bitten by consistency issues.
I think a Verilog blackbox with DPI is sufficient for this.
Agreed.
Yes... I just added the overrides in an ad-hoc manner in https://github.com/ucb-bar/chipyard/blob/main/generators/chipyard/src/main/resources/csrc/cospike.cc. An API for this would be better, but I agree that it is harder to come up with.
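One possible shape for such an API, sketched under assumptions: each rule is a predicate over a retired DUT instruction, and when a rule matches, the DUT's write-back value is copied into the reference model's register file instead of being compared. `OverrideRegistry`, `Retired` and `RefState` are hypothetical stand-ins, not the actual types used by Spike or cospike.cc. The example rule matches CSR instructions that access `mcycle` (CSR 0xB00), whose value depends on timing.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical record of an instruction retired by the DUT.
struct Retired {
  uint64_t pc;
  uint32_t insn;
  unsigned rd;
  uint64_t wdata;
};

// Minimal stand-in for the reference model's integer register file.
struct RefState {
  uint64_t xpr[32] = {};
};

class OverrideRegistry {
 public:
  using Rule = std::function<bool(const Retired&)>;

  void add(Rule rule) { rules_.push_back(std::move(rule)); }

  // If any rule matches, adopt the DUT's value into the model and return true,
  // telling the caller to skip the usual comparison for this instruction.
  bool apply(const Retired& dut, RefState& ref) const {
    assert(dut.rd < 32);
    for (const Rule& rule : rules_) {
      if (rule(dut)) {
        if (dut.rd != 0) ref.xpr[dut.rd] = dut.wdata;  // x0 stays hard-wired to zero
        return true;
      }
    }
    return false;
  }

 private:
  std::vector<Rule> rules_;
};

// Example rule: a Zicsr instruction (SYSTEM opcode, funct3 not 0 or 4) on mcycle.
bool accesses_mcycle(const Retired& r) {
  const uint32_t opcode = r.insn & 0x7f;
  const uint32_t funct3 = (r.insn >> 12) & 0x7;
  const uint32_t csr = r.insn >> 20;
  return opcode == 0x73 && funct3 != 0 && funct3 != 4 && csr == 0xB00;
}
```

Keeping the rules as data rather than scattered special cases makes it easier to see, per test, which divergences are being tolerated.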
I think injecting DPI within the module hierarchy makes some flows more difficult. For example, cosim with DUT on FPGA (or firesim). I prefer to keep all DPI outside the DUT.
I guess you mean outside the DUT. Then you have to define a new interface for carrying the OOO WB information. I prefer to keep interfaces from core to outside implementation-agnostic, so that interface would need to be carefully designed as well. Personally, I also want a ROB model inside Rocket so ucb-bar/chipyard#1323 can work with Rocket.
I agree. I think some uncore things ought to be captured in cosim, for example bootrom contents, DTS contents, and the memory map.
Yes, those DPI calls should only exist in the testbench and inspect the RTL with XMR. Design code should never have any DPI.
Not really if we have XMR and can peek any signal we like.
Yes, for verification on those components, how about using
I'm not sure restricting DPI to harness and using XMR to bring signals out solves every use case. For example, with DUT on FPGA, you need to bring the trace signals out of the core to the top level, and then interface it with the host somehow. I agree for many other use cases, XMR is preferable to punching everything out. In this case, specifically, extending the existing TracedInstruction infrastructure is more useful than XMR.
What I mean is that the Spike cosim should have the same memory map/bootrom as the DUT. Spike natively generates its own bootrom/DTS that will differ from the target. I want regressions of Linux boot in RTL sim in Chipyard with SpikeTile. Right now, giant memsets and memcpys in boot really slow the system down, since the L2/buses are simulated on every L1 cache miss.
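One way to keep the two sides consistent is a single memory-map description from which both the RTL harness and Spike's configuration are generated. The sketch below is hypothetical: the names are illustrative, and the addresses in any usage resemble common rocket-chip defaults but are assumptions. It also gives the checker a cheap way to recognize MMIO loads, whose values the model cannot reproduce.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class RegionKind { BootRom, Dram, Mmio };

struct Region {
  std::string name;
  uint64_t base;
  uint64_t size;
  RegionKind kind;
};

// Hypothetical single source of truth for the memory map.
class MemoryMap {
 public:
  explicit MemoryMap(std::vector<Region> regions) : regions_(std::move(regions)) {
    for (const Region& r : regions_) assert(r.size != 0);
  }

  // Region containing addr, if any (the subtraction avoids overflow near 2^64).
  std::optional<Region> find(uint64_t addr) const {
    for (const Region& r : regions_)
      if (addr >= r.base && addr - r.base < r.size) return r;
    return std::nullopt;
  }

  // Loads from MMIO cannot be reproduced by the model; the checker should adopt the DUT value.
  bool is_mmio(uint64_t addr) const {
    const std::optional<Region> r = find(addr);
    return r && r->kind == RegionKind::Mmio;
  }

 private:
  std::vector<Region> regions_;
};
```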
Updated from b9ebec7 to 6dc7f5f.
Yep, I agree the FPGA case is quite useful. In that case, I think we should migrate the current custom trace function to riscv-trace-spec so that we can use its software stack.
Yeah, I think that is a bad design; I personally prefer using
A custom linker script, while maintaining a memory in the emulator, may help.
It should support both. A full trace for cases where spec-compliance is not necessary, and a spec-compliant trace. In fact, this is already the case. TraceCoreInterface should provide the spec-compliant trace, while TracedInstruction is instruction-level trace.
Why do you think it is bad design? Cosim should test I$ and D$ as well... especially since the D$ microarchitecture is quite complicated and implements things like atomics, LR/SC, etc. That approach is simple and works OK. It requires no special harness and does not rely on any details of the DUT microarchitecture. Thus it is easy to maintain and microarchitecture-agnostic.
To make it more clear: I think we should only use
What is the advantage of this approach over having Spike model the entire system?
Basically, I think the main purpose to introduce In this case, we just put
I think two separate concerns are getting mixed together. Integrating SpikeTile can be left to another discussion. I'm more concerned at the moment with the design of spike-cosim. I believe spike-cosim should be performed by having Spike model the entire SoC, as is done in Chipyard's implementation. What are the disadvantages you see with that approach?
This is basically my concern. I think the cosim framework in this PR should only be for
Ok, to summarize, in this proposed cosim design:
I feel like this approach introduces a lot of unnecessary complexity: you have to write an interface between Spike's bus architecture model and the RocketCore's TileLink interface, which implies writing a TileLink coherence manager model in C++. On the other hand, using RC+Rocket as the DUT enables a trivial (~200 LoC) implementation of cosim. Standalone cosim of Rocket should be done in the split-out rocket repo, if it is truly expected to be independent of rocket-chip.
Yes, I agree that's a pain. However, it's necessary for splitting This test framework will only work for the Rocket core, rather than the entire SoC, and will eventually be split out with the Rocket core.
Ok. In that case I would like to mark this PR DO_NOT_MERGE. I do not believe we should merge things that will be split out later.
So how do you think we should start the split? By creating another repo or a branch?
I think you can create another repo, then just push the latest dev branch into it. Then someone can work on modifying the version of rocket in that repo to be standalone. We should not lose commit history.
Sure! chipsalliance/rocket has been created. @midnighter95, please submit the PR there.
This PR provides a Rocket-Spike cosim simulation framework. A standalone Rocket core is extracted with its memory port exposed at the top level and is compiled to a C++ model by Verilator. The framework drives the Rocket model to run a test case while Spike runs the same test at the same time. During the test, the Rocket result is compared against Spike for each instruction.
Framework Outline
Features
Uarch Coverpoint
Completed tests:
Test Flow
nix develop -c make riscvtests