#### Writing software to elaborate hardware (SpinalHDL)





#### Background / whoami

- Dolu1990 on GitHub
- Active on open/free projects
  - SpinalHDL (2015) / VexRiscv (2017) / NaxRiscv (2021) / VexiiRiscv (2023)
- Software / hardware background
  - Industrial systems / electronics degree

#### Let's use a software language

```
object Main extends App{
   println("Hello world")
}
```

#### Let's use a hardware description library (HDL)

```
import spinal.core._

class Timer extends Component {
  val increment = in(Bool())
  val counter   = Reg(UInt(8 bits)) init(0)
  val full      = out(counter === 255)

  when(increment){
    counter := counter + 1
  }
}

object Main extends App{
  SpinalVerilog(new Timer)
}
```

#### Let's use a hardware description library (HDL)

Verilog generated by `SpinalVerilog(new Timer)`:

```
module Timer (
  input  wire          increment,
  output wire          full,
  input  wire          clk,
  input  wire          reset
);

  reg        [7:0]    counter;

  assign full = (counter == 8'hff);
  always @(posedge clk or posedge reset) begin
    if(reset) begin
      counter <= 8'h00;
    end else begin
      if(increment) begin
        counter <= (counter + 8'h01);
      end
    end
  end

endmodule
```
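To make the behavior of the generated Verilog concrete, here is a minimal plain-Scala behavioral model of the `Timer`. It does not use SpinalHDL; `TimerModel`, `step` and `value` are hypothetical names, and each `step` call is assumed to stand for one rising edge of `clk`.

```scala
// Plain-Scala behavioral model of the Timer (hypothetical, no SpinalHDL).
// Assumption: one call to step() models one rising edge of clk.
final class TimerModel {
  private var counter: Int = 0               // models Reg(UInt(8 bits)) init(0)

  // Combinational output: full = (counter === 255)
  def full: Boolean = counter == 255

  // Models the reset branch: counter <= 8'h00
  def reset(): Unit = counter = 0

  // One clock edge: when(increment) { counter := counter + 1 }
  def step(increment: Boolean): Unit =
    if (increment) counter = (counter + 1) & 0xFF // 8-bit wrap-around

  def value: Int = counter
}
```

After 255 incrementing steps, `value` is 255 and `full` is true; one more incrementing step wraps the counter back to 0, exactly like the 8-bit register in the Verilog.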

#### Data structure / parameters

```
case class Pixel(width: Int) extends Bundle{
  val r, g, b = UInt(width bits)
}

case class Stream[T <: Data](dataType: HardType[T]) extends Bundle {
  val valid = Bool()
  val ready = Bool()
  val data = dataType()
}

val bus = Stream(Pixel(8))
bus.valid := False
bus.data.r := 0x11
bus.data.g := 0x22
bus.data.b := 0x33
```
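Because `Pixel` and `Stream` are ordinary Scala classes, their parameters are plain software values. The sketch below is a hypothetical plain-Scala model that only tracks bit widths (it is not SpinalHDL's `Bundle` API), to show how `Stream(Pixel(8))` ends up as 1 + 1 + 3 × 8 = 26 bits.

```scala
// Hypothetical width-only model of hardware types (not SpinalHDL's API)
sealed trait HwType { def width: Int }

case object BoolT extends HwType { val width = 1 }
final case class UIntT(width: Int) extends HwType

// Like Pixel(width): three color channels, each `channelBits` wide
final case class PixelT(channelBits: Int) extends HwType {
  val r, g, b = UIntT(channelBits)
  def width: Int = r.width + g.width + b.width
}

// Like Stream[T](dataType): valid + ready + a payload of any type T
final case class StreamT[T <: HwType](payload: T) extends HwType {
  def width: Int = BoolT.width + BoolT.width + payload.width
}
```

`StreamT(PixelT(8)).width` is 26, and the same `StreamT` works unchanged for any other payload type, which is the point of the `HardType[T]` parameter.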

#### Integrated linting

- Latches
- Combinatorial loops
- Unspecified clock crossing
- Undriven signals
- Width mismatch


```
// c <- b <- a <- c : a combinatorial loop
val a, b, c = Bool()

c := b; b := a; a := c
```

```
COMBINATORIAL LOOP:
(toplevel/a : Bool)
(toplevel/b : Bool)
(toplevel/c : Bool)
(toplevel/a : Bool)
```
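SpinalHDL can report this loop because it holds the whole netlist as a graph during elaboration. A generic way to run such a check is a depth-first search for a back edge; the plain-Scala sketch below illustrates that technique on the `a`, `b`, `c` example and is not SpinalHDL's actual implementation. The representation is an assumption: `drivers(dst)` lists the signals read by the combinational assignment to `dst`.

```scala
import scala.collection.mutable

object LoopCheck {
  // drivers(dst) = signals read by the combinational assignment to dst
  def findLoop(drivers: Map[String, Seq[String]]): Option[List[String]] = {
    val done   = mutable.Set[String]()           // explored, no loop through them
    val onPath = mutable.LinkedHashSet[String]() // current DFS path, in order

    def visit(n: String): Option[List[String]] =
      if (onPath.contains(n)) {
        // Back edge: the loop is the path from n's first visit, closed with n
        Some(onPath.toList.dropWhile(_ != n) :+ n)
      } else if (done.contains(n)) {
        None
      } else {
        onPath += n
        val found = drivers.getOrElse(n, Nil).iterator
          .map(visit)
          .collectFirst { case Some(loop) => loop }
        onPath -= n
        done += n
        found
      }

    drivers.keys.iterator.map(visit).collectFirst { case Some(loop) => loop }
  }
}
```

`LoopCheck.findLoop(Map("c" -> Seq("b"), "b" -> Seq("a"), "a" -> Seq("c")))` returns `Some(List(c, b, a, c))`: the same loop as in the report, read in the "is driven by" direction.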

#### Integrated linting


```
class Toplevel(cdA : ClockDomain, cdB : ClockDomain) extends Component {
  val regA = cdA(Reg(Bool()))
  val regB = cdB(Reg(Bool()))
  regB := regA
}
```


```
CLOCK CROSSING VIOLATION :
- Source : (toplevel/regA : Bool) spinal.tester.code.Toplevel....(PresentationDsl.scala:1494)
- Source clock : (cdA_clk : Bool)
- Destination : (toplevel/regB : Bool) spinal.tester.code.Toplevel....(PresentationDsl.scala:1495)
- Destination clock : (cdB_clk : Bool)
```
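The clock-crossing check rests on the same idea: every register knows its clock domain, so each assignment can be compared at elaboration time. Below is a hypothetical plain-Scala sketch of that rule. The names are invented, and the way a designer marks an intended crossing is simplified to a boolean.

```scala
// Hypothetical, simplified lint model: a register and the clock that drives it
final case class RegInfo(name: String, clock: String)

// dst := src, optionally marked by the designer as an intended crossing
final case class Assign(dst: RegInfo, src: RegInfo, crossingAllowed: Boolean = false)

object ClockCheck {
  // Report every unmarked assignment whose source and destination clocks differ
  def violations(assigns: Seq[Assign]): Seq[String] =
    assigns.collect {
      case Assign(dst, src, false) if dst.clock != src.clock =>
        s"CLOCK CROSSING VIOLATION: ${src.name} (${src.clock}) -> ${dst.name} (${dst.clock})"
    }
}
```

With `regA` on `cdA_clk` and `regB` on `cdB_clk`, `ClockCheck.violations(Seq(Assign(regB, regA)))` reports one violation. It disappears once the assignment is marked with `crossingAllowed = true`.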

#### Using software to elaborate your hardware

```
if(featureEnabled) {
  Reg(UInt(8 bits))
}
```

- Control flow: if / for
- Data collections: dynamic arrays / hash maps / hash sets
- Lambda functions: reduce / fold / map / filter / ...
- OOP: classes / software interfaces
- See https://spinalhdl.github.io/NaxRiscv-Rtd/main/NaxRiscv/abstraction/index.html
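In practice, the Scala program runs at elaboration time and decides which hardware exists. The sketch below is plain Scala that emits Verilog-like text lines instead of calling SpinalHDL. `ElaborationSketch` and its output format are hypothetical. It uses a collection, a `for`, an `if` and a `reduce` to shape the result.

```scala
// Hypothetical sketch: software constructs decide what "hardware" is described.
// The output is plain text, not SpinalHDL hardware.
object ElaborationSketch {
  def describe(featureEnabled: Boolean, lanes: Int): Seq[String] = {
    require(lanes > 0, "need at least one lane")
    val names = (0 until lanes).map(i => s"lane_$i")                   // collection
    val regs  = for (n <- names) yield s"reg [7:0] $n;"                 // for
    val extra = if (featureEnabled) Seq("reg [7:0] feature;") else Nil  // if
    val sum   = names.reduce((a, b) => s"($a + $b)")                    // reduce
    regs ++ extra :+ s"assign sum = $sum;"
  }
}
```

Changing `lanes` or `featureEnabled` changes the described hardware without touching any hand-written netlist. This is the same mechanism as the `if(featureEnabled)` and `for` examples, with SpinalHDL creating real registers instead of strings.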

#### Using software to elaborate your hardware

```
for(i <- 0 to 2) {
  Reg(UInt(8 bits))
}
```


#### Using software to elaborate your hardware

```
val pip = new Pipeline()

val PC = NamedType(UInt(32 bits))
pip(PC, 0) := 0x42

val CALC = NamedType(UInt(32 bits))
pip(CALC, 2) := pip(PC, 2) + 0x11

val x = pip(PC, 3) + 1
val y = pip(CALC, 4) + 0x22

pip.build()
```

```
wire       [31:0]   PC_0;
wire       [31:0]   PC_3;
wire       [31:0]   x;
wire       [31:0]   CALC_2;
wire       [31:0]   PC_2;
wire       [31:0]   CALC_4;
wire       [31:0]   y;
wire       [31:0]   PC_1;
reg        [31:0]   PC_0_regNext;
reg        [31:0]   PC_1_regNext;
reg        [31:0]   PC_2_regNext;
wire       [31:0]   CALC_3;
reg        [31:0]   CALC_2_regNext;
reg        [31:0]   CALC_3_regNext;

assign PC_0 = 32'h00000042;
assign x = (PC_3 + 32'h00000001);
assign CALC_2 = (PC_2 + 32'h00000011);
assign y = (CALC_4 + 32'h00000022);
assign PC_1 = PC_0_regNext;
assign PC_2 = PC_1_regNext;
assign PC_3 = PC_2_regNext;
assign CALC_3 = CALC_2_regNext;
assign CALC_4 = CALC_3_regNext;
always @(posedge clk) begin
  PC_0_regNext <= PC_0;
  PC_1_regNext <= PC_1;
  PC_2_regNext <= PC_2;
  CALC_2_regNext <= CALC_2;
  CALC_3_regNext <= CALC_3;
end
```

```
import scala.collection.mutable.LinkedHashMap
import spinal.core._

class Pipeline{
  //Define the pipeline data model
  val specs = LinkedHashMap[NamedType[Data], LinkedHashMap[Int, Data]]()

  //Define how we can access the pipeline
  def apply[T <: Data](what: NamedType[T], stageId: Int): T = {
    val spec = specs.getOrElseUpdate(what.asInstanceOf[NamedType[Data]], new LinkedHashMap[Int, Data])
    spec.getOrElseUpdate(stageId, what().setName(what.getName + "_" + stageId)).asInstanceOf[T]
  }

  //Translate specs into hardware
  def build(): Unit = {
    for ((what, nodes) <- specs) {
      //Chain each stage to the next one with a register
      for (i <- nodes.keys.min until nodes.keys.max) {
        apply(what, i + 1) := RegNext(apply(what, i))
      }
    }
  }
}
```




```
HashMap
                     hardware
specs
HashMap
  PC
 CALC
            HashMap
                      hardware
                       CALC 2
```
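The `Pipeline` class only manipulates ordinary Scala collections until `build()` runs. The plain-Scala model below (hypothetical, no SpinalHDL) keeps the same `specs` structure but stores signal names instead of hardware, and `build()` returns the register links it would create. Fed with the accesses of the `pip` example, it yields the same `RegNext` chain as the generated Verilog.

```scala
import scala.collection.mutable

// Hypothetical model of Pipeline: signal names stand in for hardware
final class PipelineModel {
  val specs = mutable.LinkedHashMap[String, mutable.LinkedHashMap[Int, String]]()

  // Same lookup-or-create logic as Pipeline.apply, returning a signal name
  def apply(key: String, stageId: Int): String =
    specs.getOrElseUpdate(key, mutable.LinkedHashMap[Int, String]())
      .getOrElseUpdate(stageId, s"${key}_$stageId")

  // Same loops as Pipeline.build: link each stage from min to max with a register
  def build(): Seq[String] =
    for {
      (key, nodes) <- specs.toSeq
      i            <- nodes.keys.min until nodes.keys.max
    } yield s"${apply(key, i + 1)} := RegNext(${apply(key, i)})"
}
```

Accessing `PC` at stages 0, 2, 3 and `CALC` at stages 2, 4 makes `build()` return `PC_1 := RegNext(PC_0)`, `PC_2 := RegNext(PC_1)`, `PC_3 := RegNext(PC_2)`, `CALC_3 := RegNext(CALC_2)` and `CALC_4 := RegNext(CALC_3)`. The missing intermediate stages are created on demand by `apply`.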


#### VexiiRiscv

- RISC-V softcore
- Very large design space
- Can run Linux (RV32 / RV64 IMACSU)
- Multiple issue
- Early + late ALU
- 5.24 CoreMark/MHz, 2.50 Dhrystone/MHz
- https://github.com/SpinalHDL/VexiiRiscv





```
pcPort.valid := alu.TAKEN
pcPort.pc := ..
```

## VexiiRiscv: Decoding / Scheduling



#### VexiiRiscv: Specification structure



```
case class OpSpec(
  encoding : String,
  mayFlush : Boolean,
  sideEffect: Boolean,
)
```

#### VexiiRiscv: Define instructions spec



```
case class OpSpec(
  encoding : String,
  mayFlush : Boolean,
  sideEffect: Boolean,
)

val ADD = OpSpec("000----0110011", false, false)
val CALL = OpSpec("----1101111", true, false)
val SW = OpSpec("100----1100011", false, true)
```

#### VexiiRiscv: Collect the specifications



```
case class OpSpec(
  encoding : String,
  mayFlush : Boolean,
  sideEffect: Boolean,
)

val ADD = OpSpec("000----0110011", false, false)
val CALL = OpSpec("----1101111", true, false)
val SW = OpSpec("100----1100011", false, true)
val opsSpec = List(ADD, CALL, SW)
```

#### VexiiRiscv: Decode

(Figure: pipeline stages F0, F1, F2, D0, E0, E1, E2; a STORE drives `bus.request` and a CALL drives `flush`, while `opsSpec` and `instruction` feed `Symplify` to produce `withSideEffects`.)

```
case class OpSpec(
  encoding : String,
  mayFlush : Boolean,
  sideEffect: Boolean,
)

val ADD  = OpSpec("000----0110011", false, false)
val CALL = OpSpec("----1101111", true, false)
val SW   = OpSpec("100----1100011", false, true)
val opsSpec = List(ADD, CALL, SW)

val instruction = in Bits(32 bits)
val withSideEffects = Symplify(
  input = instruction,
  trueTerms = opsSpec.filter(_.sideEffect).map(_.encoding),
  falseTerms = opsSpec.filter(!_.sideEffect).map(_.encoding)
)
```
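What `withSideEffects` has to compute can be modeled in plain Scala as a sum of products. The output is true when the instruction matches any encoding of an op flagged `sideEffect`, with `-` meaning "don't care". This is a hypothetical sketch with made-up 8-bit patterns. It does not reproduce `Symplify`, which, judging by its `trueTerms` / `falseTerms` parameters, also receives the encodings that must decode to false so it can simplify the logic (that minimization is not modeled here).

```scala
// Hypothetical software model of a pattern-based decoder (no minimization)
final case class OpSpec(encoding: String, mayFlush: Boolean, sideEffect: Boolean)

object DecodeSketch {
  // Does `bits` ('0'/'1', MSB first) match `pattern` ('0'/'1'/'-', MSB first)?
  def matches(pattern: String, bits: String): Boolean =
    pattern.length == bits.length &&
      pattern.zip(bits).forall { case (p, b) => p == '-' || p == b }

  // Sum of products: OR over the encodings of every op having the property
  def decode(ops: Seq[OpSpec], property: OpSpec => Boolean, bits: String): Boolean =
    ops.filter(property).map(_.encoding).exists(matches(_, bits))
}
```

Adding a new op to the list automatically updates every derived decoder (`sideEffect`, `mayFlush`, ...). This is why the specifications are collected into `opsSpec` rather than hand-written into each decoder.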

#### No more toplevel / design space

```
class VexiiRiscv(...) extends Component{
  val database = ...
  val host = ...
  ...
}

val plugins = ArrayBuffer[Hostable]()
plugins += new fetch.FetchPipelinePlugin()
plugins += new fetch.PcPlugin(resetVector)
plugins += new fetch.FetchL1Plugin(...)
plugins += new prediction.BtbPlugin(...)
plugins += new prediction.GSharePlugin(...)
plugins += new prediction.HistoryPlugin(...)
plugins += new execute.SimdAddPlugin(...)
val cpu = VexiiRiscv(plugins)
```
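The idea behind `plugins += ...` is that the CPU has no fixed toplevel: it is whatever set of plugins you pass in, and the plugins coordinate through shared data. A minimal plain-Scala sketch of that pattern follows. `Database`, `Plugin`, `ResetVectorPlugin`, `Add4Plugin` and `Cpu` are hypothetical stand-ins, not VexiiRiscv classes. The two-phase setup/build split is only a rough analogy to the `during setup` / `awaitBuild()` phases used by `SimdAddPlugin`.

```scala
import scala.collection.mutable

// Hypothetical stand-in: a shared key/value database between plugins
final class Database {
  private val entries = mutable.LinkedHashMap[String, Any]()
  def update(key: String, value: Any): Unit = entries(key) = value
  def get(key: String): Option[Any] = entries.get(key)
  def keys: Seq[String] = entries.keys.toSeq
}

// Every feature is a plugin; the CPU is just "whatever plugins you pass in"
trait Plugin {
  def setup(db: Database): Unit          // phase 1: publish what it provides
  def build(db: Database): Seq[String]   // phase 2: describe its "hardware"
}

final class ResetVectorPlugin(resetVector: Long) extends Plugin {
  def setup(db: Database): Unit = db("resetVector") = resetVector
  def build(db: Database): Seq[String] =
    Seq(s"pc reset = 0x${resetVector.toHexString}")
}

final class Add4Plugin extends Plugin {
  def setup(db: Database): Unit = db("decoder.ADD4") = true
  def build(db: Database): Seq[String] = Seq("ADD4 execute logic")
}

final class Cpu(plugins: Seq[Plugin]) {
  val db = new Database
  plugins.foreach(_.setup(db))           // every setup runs before any build
  val netlist: Seq[String] = plugins.flatMap(_.build(db))
}
```

Dropping `Add4Plugin` from the list removes both its decoder entry and its logic, without editing any toplevel: exploring the design space becomes editing a list.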



#### Custom instruction

(Figure: pipeline stages F0 to E2 next to the Decoder, Dispatcher and Writeback, with ADD4 plugged into them.)

```
object SimdAddPlugin {
  val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
}

class SimdAddPlugin(val layer: LaneLayer) extends ExecutionUnitElementSimple(layer) {
  val logic = during setup new Logic {
    awaitBuild()
    val wb = newWriteback(ifp, 0)
    val add4 = add(SimdAddPlugin.ADD4).spec
    add4.addRsSpec(RS1, executeAt = 0)
    add4.addRsSpec(RS2, executeAt = 0)
    uopRetainer.release()

    val process = new el.Execute(id = 0) {
      val rs1 = el(IntRegFile, RS1).asUInt
      val rs2 = el(IntRegFile, RS2).asUInt

      // Four independent 8-bit additions (no carry between lanes)
      val rd = UInt(32 bits)
      rd( 7 downto  0) := rs1( 7 downto  0) + rs2( 7 downto  0)
      rd(15 downto  8) := rs1(15 downto  8) + rs2(15 downto  8)
      rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
      rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)

      wb.valid := SEL
      wb.payload := rd.asBits
    }
  }
}
```
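A software golden model is a convenient way to check such a custom instruction in simulation. The plain-Scala `Add4Model.add4` below is a hypothetical helper, not part of VexiiRiscv. It computes four independent 8-bit sums packed into a 32-bit word and drops each lane's carry, just like the 8-bit slices in `process`.

```scala
// Hypothetical golden model of ADD4: four 8-bit lane additions in 32 bits
object Add4Model {
  def add4(rs1: Int, rs2: Int): Int =
    (0 until 4).foldLeft(0) { (rd, lane) =>
      val shift = lane * 8
      // Low 8 bits of the sum depend only on the low 8 bits of each operand,
      // so masking after the add drops the lane carry
      val sum = ((rs1 >>> shift) + (rs2 >>> shift)) & 0xFF
      rd | (sum << shift)
    }
}
```

For example, `add4(0x11223344, 0x01010101)` gives `0x12233445`, and a lane overflow such as `0xFF + 0x01` wraps to `0x00` without touching the next lane.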

#### How to lower barriers to hardware design?

- VHDL / [System]Verilog alternatives
  - SpinalHDL, Chisel, Migen, Amaranth, ...
- Hardware design can leverage software engineering
  - APIs / tooling / abstraction
  - A new pool of people and profiles
- Closed industry tools limit the scale of free/open-source hardware
  - Alternatives (Verilator, GHDL, Icarus Verilog, OpenROAD, ...)

# Questions?