#### Open CPU / SoC design, all the way up to Debian



#### TODO

- Why nax
- Perf?
- Tricky thing about CPU perf / optimisation
- Pipelining api example ?

### Background / whoami

- Dolu1990 on github
- Active on open / free projects
  - SpinalHDL / VexRiscv / NaxRiscv
- Software / Hardware background
  - Without computer science degree
  - With Industrial system / Electronic degree

#### Introduction

- This talk will be based on NaxRiscv
  - Opensource https://github.com/SpinalHDL/NaxRiscv
    - RISC-V softcore
    - out of order, multi-issue, multi core, memory coherent
- What the talk will do
  - Share experience
  - Give tips / keywords

# Digital Hardware description



- "There is no alternative" (VHDL / Verilog)
  - Scala based : SpinalHDL, Chisel
  - Python based : Migen, Amaranth

- ...



# It can be pretty / explicit



```
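import spinal.core._

// 8-bit counter that increments while `increment` is high
class Timer extends Component {
  val increment = in Bool()
  val counter = Reg(UInt(8 bits)) init(0)
  val full = counter === 255  // high when the counter reaches its maximum

  when(increment) {
    counter := counter + 1
  }
}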
```

```
if(featureEnabled) {
  Reg(UInt(8 bits))
}
```

- Control flow: if / for
- Data structures: dynamic array / hash map / hash set
- Lambda function: reduce / fold / map / filter / ...
- OOP: class / software interface
- See https://spinalhdl.github.io/NaxRiscv-Rtd/main/NaxRiscv/abstraction/index.html

```
for(i <- 0 to 2) {
  Reg(UInt(8 bits))
}
```


```
val flushes = ArrayBuffer[Flush]()
flushes += new Flush()
flushes += new Flush()
if (featureEnabled) {
  flushes += new Flush()
}
```

| Index | Value    |
|-------|----------|
| 0     | hardware |
| 1     | hardware |
| 2     | hardware |
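The "lambda function" bullet can be read together with the `flushes` array: once the hardware elements sit in a software collection, a `reduce` can build the OR tree over however many entries exist. The sketch below is plain Scala with toy `Sig` / `Or` node types (hypothetical names, not SpinalHDL classes); it only illustrates the elaboration-time idea.

```scala
import scala.collection.mutable.ArrayBuffer

object ReduceSketch {
  // Toy stand-ins for hardware signals (assumption: NOT the SpinalHDL types)
  sealed trait Node
  final case class Sig(name: String) extends Node
  final case class Or(left: Node, right: Node) extends Node

  // Render a node as a Verilog-like expression
  def emit(node: Node): String = node match {
    case Sig(name)       => name
    case Or(left, right) => s"(${emit(left)} | ${emit(right)})"
  }

  // Collect flush sources in a dynamic array, then OR-reduce them with a lambda
  def anyFlush(featureEnabled: Boolean): Node = {
    val flushes = ArrayBuffer[Node]()
    flushes += Sig("flush0")
    flushes += Sig("flush1")
    if (featureEnabled) flushes += Sig("flush2")
    flushes.reduce((a, b) => Or(a, b))
  }

  def main(args: Array[String]): Unit = {
    println(emit(anyFlush(featureEnabled = false)))
    println(emit(anyFlush(featureEnabled = true)))
  }
}
```

The generated expression grows or shrinks with the collection, so enabling a feature does not require touching the reduction code.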


```
val PC = Hardtype(UInt(32 bits))
val instruction = Hardtype(UInt(32 bits))
val fetch, decode, execute = HashMap[Hardtype, Hardware]()
fetch(PC) = U(42)
val target = execute(PC) + U(666)
autoPipeline(fetch, decode, execute)
// Pipeline stages: fetch -> decode -> execute
```


### ISA (instruction set architecture)



- A few open / free ISA (OpenRISC, RISC-V, ...)
- RISC-V
  - https://riscv.org/technical/specifications/
  - Secretly being embedded in ASICs
  - Mostly bloat-free (from an FPGA perspective)
  - GCC / LLVM / Qemu
  - Linux / Debian port

- ...

# CPU design (NaxRiscv)

#### 3 EU configuration

- Pipeline diagram, front to back:
  - PC, Fetch, with BTB and **GShare** fetch prediction
  - Fetch, Align, Decode, with RAS / target prediction
  - Allocate, Rename, Dependency
  - Shared issue queue (dynamic wake)
  - 3 execution units, each with its own register file read (Read RF)
    - Functions shown: Int, Shift, J/B, Mul, Div, CSR, Env, AGU, LSU
  - ROB-Commit (reschedule, trap)

https://github.com/SpinalHDL/NaxRiscv

# "Branch prediction"



https://spinalhdl.github.io/NaxRiscv-Rtd/main/NaxRiscv/misc/index.html#security

### Secret leak (Konata trace)















# Multi core requirements (RISC-V linux)

- Hardware cache coherency
- Inter-CPU software interrupts



# Software cache coherency

- Example: CPU0 writes RAM[0]



### Hardware cache coherency

with memory block copy and permissions



### Hardware cache coherency

And its side effects

# Cache coherency

- Open specification: TileLink
- Multiple open source implementations, e.g.:
  - https://github.com/SpinalHDL/SpinalHDL/tree/dev/lib/src/main/scala/spinal/lib/bus/tilelink
- Coherent L2 design
  - https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Libraries/Bus/tilelink/tilelink\_fabric.html
  - Exposes a lot of performance tricks
    - 1 pending transaction per 64-byte block
    - 64 bytes per burst max
    - Keep bursts aligned
    - Alignment Aliasing
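One way to picture the "64 bytes per burst max" and "keep bursts aligned" tricks: a transfer that starts at an arbitrary address is cut so that no burst crosses a 64-byte block boundary. The plain Scala sketch below is only an illustration under that assumption; `BurstSplit`, `Burst` and `split` are hypothetical names, not part of the TileLink library linked above.

```scala
object BurstSplit {
  val BlockBytes = 64L

  final case class Burst(address: Long, bytes: Int)

  // Split [address, address + length) so that no burst crosses a 64-byte boundary
  def split(address: Long, length: Int): List[Burst] = {
    require(address >= 0 && length >= 0)
    val bursts = List.newBuilder[Burst]
    var cursor = address
    val end = address + length
    while (cursor < end) {
      val blockEnd = (cursor / BlockBytes + 1) * BlockBytes // next block boundary
      val stop = math.min(blockEnd, end)
      bursts += Burst(cursor, (stop - cursor).toInt)
      cursor = stop
    }
    bursts.result()
  }

  def main(args: Array[String]): Unit =
    split(0x30L, 100).foreach(b => println(s"0x${b.address.toHexString} ${b.bytes} bytes"))
}
```

For a 100-byte transfer starting at 0x30 this yields three bursts (16, 64 and 20 bytes), each touching a single block, which fits the "1 pending transaction per 64-byte block" rule.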





# Memory / Peripherals / Deployment

- LiteX
  - Open-source SoC framework (in Python)
  - Integrates many peripherals
    - DDRx controller
    - SDCARD
    - Ethernet, USB host
    - SATA, PCIe, ...

- `python3 -m litex_boards.targets.digilent_nexys_video --cpu-type=naxriscv --bus-standard axi-lite --with-video-framebuffer --with-coherent-dma --with-sdcard --with-ethernet --xlen=64 --scala-args='rvc=true,rvf=true,rvd=true,alu-count=2,decode-count=2' --with-jtag-tap --sys-clk-freq 100000000 --cpu-count 2 --build --load`
- + stuff for USB / JTAG



### USB host support

- USB host support =>
  - Keyboard / mouse / audio / flash drive / sdcard / uart / ethernet / bluetooth
- Open specification :
  - Open Host Controller Interface Specification for USB (OHCI)
  - USB 12 Mb/s, up to 15 ports, 2 FPGA GPIO per port
- Open implementation :
  - https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Libraries/Com/usb\_ohci.html
  - https://github.com/SpinalHDL/SpinalHDL/blob/dev/lib/src/main/scala/spinal/lib/com/usb/ohci/UsbOhci.scala
  - https://github.com/litex-hub/pythondata-misc-usb\_ohci/tree/master/pythondata\_misc\_usb\_ohci/verilog

# Synthesis / Place / Route

- Some open source tools for
  - Xilinx 7-Series
  - Lattice ice40 / ecp5
  - ...
- Synthesis: Yosys
- Place and route : Nextpnr

# Running Linux



- Requirements on RISC-V
  - Hardware cache coherency (between CPUs + DMA)
  - Machine/Supervisor/User modes (riscv-privileged spec)
  - SBI (Supervisor Binary Interface, kinda like BIOS)
  - DTB (Device Tree Binary)



# SBI (Supervisor Binary Interface)

- OpenSBI
- https://github.com/riscv-non-isa/riscv-sbi-doc
- https://github.com/riscv-software-src/opensbi




### Running Debian

- Requirements on RISC-V
  - RV64IMAFDC
  - Linux kernel
  - Some storage (ex: SDCARD)
- Doc: https://github.com/SpinalHDL/NaxSoftware/tree/main/debian\_litex
- Video: https://x.com/dolu1990/status/1712848731349889108

```
  0[*                              0.6%]   Tasks: 28, 18 thr, 61 kthr; 1 running
   [                              13.8%]   Load average: 3.48 1.37 0.50
Mem[|||##*@@@@@@             46.2M/472M]   Uptime: 00:01:43
Swp[                           0K/4.00G]

[Main] [I/O]
  PID USER   PRI  NI   VIRT    RES   SHR S  CPU% MEM%    TIME+  Command
  512 root                    2776  2220 R  11.6  0.6  0:01.42  htop
  405 root                          5568 S   0.6       0:00.89  /usr/sbin/cup
    1 root         0  16768  10876  7524 S   0.0  2.2  0:28.04  /sbin/init
```



# Verification / Debugging

- May create rituals
- May exhaust your sanity
- May never end
- May crush your soul

### Debugging – example (a real one)

- Dual core Debian freeze (RCU error) after ~1h (360'000'000'000 cycles)
  - Impossible to reproduce in simulation (~40 days of runtime @ 100 kHz)
- Hope you get clues
  - Core was still alive on JTAG => OpenOCD
  - PC was in linux "\_raw\_spin\_lock"
  - Forcing it to unlock => Everything back on track
- What it was:
  - The CPU D\$ was breaking memory coherency
  - (it released its data permissions a second time after already being probed out)
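The "~1h" versus "~40 days" figures follow directly from the cycle count. A quick check in plain Scala, assuming the FPGA runs at 100 MHz (the `--sys-clk-freq 100000000` value from the LiteX command) and the simulation at 100 kHz; `ReproCost` and `seconds` are names made up for this sketch.

```scala
object ReproCost {
  val cyclesToFailure = 360000000000L // 360'000'000'000 cycles, from the slide

  // Wall-clock seconds needed to run `cycles` at a clock of `hz`
  def seconds(cycles: Long, hz: Long): Double = cycles.toDouble / hz

  def main(args: Array[String]): Unit = {
    val onFpga = seconds(cyclesToFailure, 100000000L) // assumed 100 MHz FPGA clock
    val inSim  = seconds(cyclesToFailure, 100000L)    // 100 kHz simulation speed
    println(f"FPGA:       ${onFpga / 3600}%.1f hours")
    println(f"Simulation: ${inSim / 86400}%.1f days")
  }
}
```

That is 3 600 s on the FPGA against 3 600 000 s (a bit under 42 days) in simulation, which is why the bug had to be chased on hardware through JTAG.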

#### Simulation

- A few open source simulation tools :
  - GHDL, IVerilog, Verilator, ...
- NaxRiscv simulation is based on Verilator (~150-200 kHz on a 4-year-old CPU)
- Lock-step checks against Spike / RVLS
- Konata traces
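The lock-step idea can be sketched as comparing what the simulated CPU commits against what a reference model commits, instruction by instruction, and stopping at the first divergence. The plain Scala below is a toy illustration only: the `Commit` record and `firstMismatch` helper are hypothetical, and it does not call Spike or RVLS (a real setup compares on the fly rather than on recorded traces).

```scala
object LockStep {
  // One retired instruction as seen at commit (assumed, simplified record)
  final case class Commit(pc: Long, rd: Int, value: Long)

  // Index of the first commit where DUT and reference disagree, if any
  def firstMismatch(dut: Seq[Commit], ref: Seq[Commit]): Option[Int] = {
    val idx = dut.zip(ref).indexWhere { case (d, r) => d != r }
    if (idx >= 0) Some(idx)
    else if (dut.length != ref.length) Some(math.min(dut.length, ref.length))
    else None
  }

  def main(args: Array[String]): Unit = {
    val ref = Seq(Commit(0x80000000L, 1, 42), Commit(0x80000004L, 2, 43))
    val dut = Seq(Commit(0x80000000L, 1, 42), Commit(0x80000004L, 2, 99))
    println(firstMismatch(dut, ref)) // the second commit wrote a different value
  }
}
```

Catching the divergence at the exact commit where it happens is what makes this approach so much cheaper than debugging from a crash millions of cycles later.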





#### 0xAA55

- I hope you had a good time
- If you are looking for open-source project funding :
  - NLnet Foundation (they helped me a lot)

# Speculative execution

- Execute as much as you can in advance
- Bad speculation → no side effect on the ISA states
  - x1, x2, ..., x31
  - CSR
  - Global memory (RAM)
- But what about all the other hardware states?
  - Branch predictors (BTB, RAS, GShare)
  - I\$ D\$ tags (is address X in the cache or not)
  - Those are easy to probe
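A toy model of the point above: squashing a bad speculation restores the registers, but the D\$ tags keep a trace of what was touched. This plain Scala sketch makes heavy assumptions (hypothetical `ToyCore`, a `Set` standing in for cache tags, `isCached` standing in for a timing measurement) and is not how NaxRiscv is implemented.

```scala
import scala.collection.mutable

object SpeculationLeak {
  val LineBytes = 64

  final class ToyCore(memory: Map[Long, Int]) {
    val regs = Array.fill(32)(0L)                 // ISA state: x0..x31
    private val cachedLines = mutable.Set[Long]() // hidden state: D$ tags

    private def load(address: Long): Long = {
      cachedLines += address / LineBytes          // every load fills a cache line
      memory.getOrElse(address, 0).toLong
    }

    // Execute two loads on a mispredicted path, then squash them
    def misSpeculate(secretAddress: Long, probeBase: Long): Unit = {
      val checkpoint = regs.clone()
      regs(1) = load(secretAddress)                   // speculative read of the secret
      regs(2) = load(probeBase + regs(1) * LineBytes) // secret-dependent access
      Array.copy(checkpoint, 0, regs, 0, regs.length) // squash: registers restored
      // cachedLines is NOT restored: the side effect survives
    }

    // Stand-in for "is this access fast?" as an attacker would time it
    def isCached(address: Long): Boolean = cachedLines.contains(address / LineBytes)
  }

  // Probe 256 candidate lines and return the one that ended up cached
  def recoverSecret(core: ToyCore, probeBase: Long): Option[Int] =
    (0 until 256).find(v => core.isCached(probeBase + v.toLong * LineBytes))
}
```

After `misSpeculate`, every register reads as if nothing happened, yet `recoverSecret` still finds the byte that was read speculatively.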

### Hardware cache coherency

without memory block ownership



### Hardware cache coherency

with memory block ownership

- Caches have to acquire memory block permissions
  - read [write] permissions
  - 64 bytes per block (typically), aligned
- Cache line states
  - Valid / Invalid
  - Clean / Dirty
  - Unique / Shared
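The three state pairs can be exercised with a small model of one 64-byte block shared by several caches: a read acquires shared permission, a write needs unique permission and probes the other copies out. This is a simplified plain Scala sketch with hypothetical names (`Block`, `acquireRead`, `write`); it is not message-accurate with respect to TileLink.

```scala
import scala.collection.mutable

object ToyCoherency {
  // One cache's view of a single 64-byte block
  final case class Line(valid: Boolean, dirty: Boolean, unique: Boolean)
  val Invalid = Line(valid = false, dirty = false, unique = false)
  val SharedClean = Line(valid = true, dirty = false, unique = false)

  final class Block(cacheCount: Int) {
    val lines = mutable.ArrayBuffer.fill(cacheCount)(Invalid)
    var writeBacks = 0

    // Probe the other caches: downgrade them to shared, or invalidate them
    private def probeOthers(requester: Int, toInvalid: Boolean): Unit =
      for (i <- lines.indices if i != requester && lines(i).valid) {
        if (lines(i).dirty) writeBacks += 1 // dirty data must be written back first
        lines(i) = if (toInvalid) Invalid else SharedClean
      }

    // Read: acquire read permission unless the block is already held
    def acquireRead(cache: Int): Unit =
      if (!lines(cache).valid) {
        probeOthers(cache, toInvalid = false)
        lines(cache) = SharedClean
      }

    // Write: needs unique permission, so every other copy is probed out
    def write(cache: Int): Unit = {
      if (!lines(cache).unique) probeOthers(cache, toInvalid = true)
      lines(cache) = Line(valid = true, dirty = true, unique = true)
    }
  }
}
```

In this model a dirty line is always unique, so at most one cache can hold modified data for a block at any time.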



# Boot sequence

- FPGA bitstream
- LiteX BIOS
- OpenSBI
- [U-Boot]
- Linux
- Buildroot / Debian / ...

```
val a, b, c = out UInt(8 bits)
val array = ArrayBuffer[UInt]()
array += a // By reference !
array += b
array += c
for(element <- array){
  element := 0
}

// Generated Verilog:
//   output [7:0] a,
//   output [7:0] b,
//   output [7:0] c,
//   ...
//   assign a = 8'h0;
//   assign b = 8'h0;
//   assign c = 8'h0;
```

- Data structures : dynamic array / hash map / hash set
- Lambda function: reduce / fold / map / filter / ...
- OOP: class / software interface
- See https://spinalhdl.github.io/NaxRiscv-Rtd/main/NaxRiscv/abstraction/index.html