



# Chisel — an Agile Hardware Description Language

An Introduction to Chisel

Shixin CHEN November 6, 2021

Dept. ESE, Nanjing University
Dept. CSE, Chinese University of Hong Kong

#### Outline

Preliminary

Instances of Chisel

Chisel for Super Resolution on FPGA

Q & A


# **Preliminary**

# **Integrated Circuit Design Flow**



Figure 1: IC Design Flow



(a) Chisel in Reality



**(b)** Chisel in IC Design

Figure 2: Chisel

### **Drawbacks of Verilog**

#### **Verilog**

The language has been dominant in IC design for more than 30 years.

- ► A classical but old-fashioned language
  - Incompatible with contemporary software development styles
- ► Unfriendly to design iterations
  - Low-level abstractions
  - Time-consuming development cycles

# Hardware Description Languages (HDL)<sup>1</sup>

- ► Register Transfer Level (RTL)
  - Verilog, VHDL
- ► Meta HDL and Transpilers
  - **C++** SystemC, VisualHDL
  - **Python** PyRTL, Pyrope
  - **Java** JHDL
  - **Scala** Chisel, SpinalHDL
- ► High-Level Synthesis (HLS)
  - HLS
  - LegUp

<sup>1</sup> https://github.com/drom/awesome-hdl

## Why Scala? Why Chisel?

#### Chisel

Constructing Hardware in a Scala Embedded Language

- ▶ **Scala**, an excellent host language for embedded Domain Specific Languages (DSLs)
  - Extensibility
  - Simplicity
- ▶ **Chisel**, embedded in Scala, facilitating digital design
  - Higher-level abstractions
  - Time savings and efficiency

#### How can Chisel help?

#### **Motivation**

Through circuit generators, developers can reuse the work of design experts and raise the level of design abstraction to keep pace with the evolving demands of IC design.

- ► Modern programming styles make it productive to compose IC elements such as Mux, Counter, and RAM
  - Parameterized types
  - Object-oriented programming
  - Functional programming
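
As a minimal sketch of how these three styles can combine (the module `SumN` and its parameters `n` and `w` are hypothetical and do not appear in the original slides), a generator can take ordinary Scala parameters, subclass `Module`, and build its datapath with a higher-order function:

```scala
import chisel3._

// Hypothetical generator: sums n unsigned inputs of w bits each.
// n and w are ordinary Scala parameters, fixed when the circuit is elaborated.
class SumN(n: Int, w: Int) extends Module {
  require(n > 0, "need at least one input")
  val io = IO(new Bundle {
    val in  = Input(Vec(n, UInt(w.W)))
    val out = Output(UInt(w.W))
  })
  // Functional style: reducing the Vec with + generates n - 1 adders.
  // Plain + keeps the operand width, so the sum wraps modulo 2^w.
  io.out := io.in.reduce(_ + _)
}
```

Changing `n` or `w` at the instantiation site, for example `Module(new SumN(4, 8))`, yields a different circuit from the same source.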

# **Chisel Types Tree**

Chisel provides a library of classes and objects for representing hardware:

• Types: UInt, SInt, Vec, Bundle, Clock

• Hardware: Reg, Wire, IO, Mux

• Structure: Module, BlackBox



# Instances of Chisel

#### **Parameterized Adder in Chisel:**

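```
import chisel3._

// Adder whose operand width n is a generator parameter
class ParamAdder(n: Int) extends Module {
  val io = IO(new Bundle {
    val a   = Input(UInt(n.W))
    val b   = Input(UInt(n.W))
    val out = Output(UInt((n + 1).W))
  })
  // +& is the width-expanding add, so the carry lands in bit n of out
  io.out := io.a +& io.b
}

// Instantiate (inside an enclosing Module)
val add8  = Module(new ParamAdder(8))
val add16 = Module(new ParamAdder(16))
```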

```
module Adder(
  input        clock,
  input        reset,
  input  [8:0] io_in_a,
  input  [8:0] io_in_b,
  output [9:0] io_out
);
  assign io_out = io_in_a + io_in_b;
endmodule

// instantiate
Adder add(
  .clock   (sys_clk),
  .reset   (sys_rst),
  .io_in_a (a),
  .io_in_b (b),
  .io_out  (out)
);
```

#### **Combinations in Chisel**

```
import chisel3._

// A Bundle defines the group of ports through which Modules are combined
class PORT_B extends Bundle {
  val bo = Input(UInt(2.W))
  val b1 = Input(UInt(2.W))
  val b3 = Output(UInt(2.W))
}

class PORT_C extends Bundle {
  val co = Input(UInt(2.W))
  val c1 = Output(UInt(2.W))
}

class PORT_A extends Bundle {
  // Flipped exchanges the Input and Output directions
  val interface_b = Flipped(new PORT_B)
  val interface_c = Flipped(new PORT_C)
  val ao = Input(UInt(4.W))
  val a1 = Output(UInt(4.W))
}
```

Figure 3 shows the *combinations* of *Modules*. Chisel keeps this kind of wiring concise, which makes the implementation efficient and time-saving.



Figure 3: Modules Combinations

#### **Combinations in Chisel**

```
class Module_A extends Module {
  val io = IO(new PORT_A)
  ... // Operations
}
... // Module_B, Module_C

class PortReuse extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(4.W))
    val out = Output(UInt(4.W))
  })
  val ma = Module(new Module_A()).io
  val mb = Module(new Module_B()).io
  val mc = Module(new Module_C()).io

  // <> matches each Input with the corresponding Output
  ma.interface_b <> mb
  ma.interface_c <> mc
  ma.ao  := io.in
  io.out := ma.a1
}
```

# Object-oriented programming in Chisel

```
class Payload extends Bundle {
  val data = UInt(16.W)
  val flag = Bool()
}

// A port that carries any Chisel data type T alongside an address
class Port[T <: Data](private val dt: T) extends Bundle {
  val address = UInt(8.W)
  val data = dt.cloneType
}

class NocRouter2[T <: Data](dt: T, n: Int) extends Module {
  val io = IO(new Bundle {
    val inPort  = Input(Vec(n, dt))
    val outPort = Output(Vec(n, dt))
  })
  // Route the payload according to the address
  // ...
}

val router = Module(new NocRouter2(new Port(new Payload), 2))
```

#### **Functional Abstraction in Chisel**

```
val (cnt, cnt_valid) = Counter(io.input_valid, 4)
// Counter is a generator provided by Chisel (chisel3.util)
// We can define our own Module generators as well
class Counter_pulse extends Module {
  val io = IO(new Bundle {
    val valid     = Input(Bool())
    val goal_num  = Input(UInt(8.W))
    val pulse     = Input(Bool())
    val cnt       = Output(UInt(8.W))
    val out_valid = Output(Bool())
  })
  ... // Counter operations
}

// Companion object: lets the Module be used like a function call
object Counter_pulse {
  def apply(valid: Bool, goal_num: UInt, pulse: Bool): (UInt, Bool) = {
    val inst = Module(new Counter_pulse())
    inst.io.valid    := valid
    inst.io.goal_num := goal_num
    inst.io.pulse    := pulse
    (inst.io.cnt, inst.io.out_valid)
  }
}

val (cnt2, cnt2_valid) = Counter_pulse(io.input_valid, 6.U, cnt_valid)
```

# **Chisel for Super Resolution on FPGA**

# **Conventional Convolution vs. Winograd Algorithm**

#### **Conventional Convolution Algorithm**

Each convolution computation consumes 9 DSPs, which is demanding on resource-limited FPGAs.



Figure 4: Classical Convolution Based on Matrix Multiplication

# Conventional Convolution vs. Winograd Algorithm

#### **Winograd Algorithm**

The algorithm consumes only 4 DSPs per computation on FPGA and serves as a replacement for the convolution operator.

$$S = A^{T} \left[ \left( G g G^{T} \right) \odot \left( B^{T} d B \right) \right] A \tag{1}$$

$$g = \begin{bmatrix} wt_{00} & wt_{01} & wt_{02} \\ wt_{10} & wt_{11} & wt_{12} \\ wt_{20} & wt_{21} & wt_{22} \end{bmatrix} \tag{2}$$

$$d = \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \tag{3}$$

$$B^{T} = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \tag{4}$$

$$G = \begin{bmatrix} 1 & 0 & 0 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & -0.5 & 0.5 \\ 0 & 0 & 1 \end{bmatrix} \tag{5}$$

$$A^{T} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix} \tag{6}$$
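
To make the formula concrete, the following plain-Scala sketch (software only, not Chisel hardware and not the author's implementation) evaluates $S = A^{T}[(GgG^{T}) \odot (B^{T}dB)]A$ with the transform matrices above and compares it against a direct sliding-window computation. The object and function names (`WinogradCheck`, `winograd`, `direct`) are hypothetical, and the direct version assumes the CNN-style convention in which the kernel is not flipped.

```scala
object WinogradCheck {
  type Mat = Array[Array[Double]]

  // Plain matrix product a * b
  def matmul(a: Mat, b: Mat): Mat =
    Array.tabulate(a.length, b(0).length) { (i, j) =>
      b.indices.map(k => a(i)(k) * b(k)(j)).sum
    }

  // Element-wise (Hadamard) product, the ⊙ in the formula
  def hadamard(a: Mat, b: Mat): Mat =
    Array.tabulate(a.length, a(0).length)((i, j) => a(i)(j) * b(i)(j))

  // Transform matrices copied from the slide
  val Bt: Mat = Array(
    Array(1.0,  0.0, -1.0,  0.0),
    Array(0.0,  1.0,  1.0,  0.0),
    Array(0.0, -1.0,  1.0,  0.0),
    Array(0.0,  1.0,  0.0, -1.0))
  val G: Mat = Array(
    Array(1.0,  0.0, 0.0),
    Array(0.5,  0.5, 0.5),
    Array(0.5, -0.5, 0.5),
    Array(0.0,  0.0, 1.0))
  val At: Mat = Array(
    Array(1.0, 1.0,  1.0,  0.0),
    Array(0.0, 1.0, -1.0, -1.0))

  // g: 3x3 kernel, d: 4x4 input tile, result: 2x2 output tile
  def winograd(g: Mat, d: Mat): Mat = {
    val u = matmul(matmul(G, g), G.transpose)   // G g G^T   (4x4)
    val v = matmul(matmul(Bt, d), Bt.transpose) // B^T d B   (4x4)
    matmul(matmul(At, hadamard(u, v)), At.transpose)
  }

  // Reference: slide the 3x3 kernel over the 4x4 tile (no kernel flip)
  def direct(g: Mat, d: Mat): Mat =
    Array.tabulate(2, 2) { (i, j) =>
      (for (r <- 0 until 3; c <- 0 until 3) yield d(i + r)(j + c) * g(r)(c)).sum
    }
}
```

Calling `WinogradCheck.winograd(g, d)` and `WinogradCheck.direct(g, d)` on the same kernel and tile should agree up to floating-point rounding.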

# **Winograd Algorithm Flow**

$$S = A^{T} \left[ \left( GgG^{T} \right) \odot \left( B^{T}dB \right) \right] A \tag{7}$$



**Figure 5:** Data Flow of the Winograd Algorithm



Figure 6: Pipeline of the Winograd Algorithm

#### **Comparison: Length of Code**



(a) Chisel Code (218 LOC)



(b) Verilog Code (3239 LOC)

Figure 7: Chisel-generated Verilog

# **Comparison: Computing Resources**

#### **Compromise**

DSPs are the bottleneck among computing resources, while FFs and LUTs are plentiful on most FPGA platforms.

Trading extra LUTs and FFs for fewer DSPs therefore makes the Winograd algorithm a rewarding strategy on FPGA.

| Resource | Conventional Convolution | Winograd Algorithm |
|----------|--------------------------|--------------------|
| DSPs     | 9                        | 4                  |
| LUTs     | 334                      | 650                |
| FFs      | 1110                     | 1500               |

**Table 1:** Resources of Conventional Convolution vs. Winograd Algorithm
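
One way to read the DSP row, offered as a hedged sketch: assume (this is not stated in the slides) that each multiplication occupies one DSP, that the Winograd variant is $F(2 \times 2, 3 \times 3)$, as the $4 \times 4$ input tile $d$ and the $3 \times 3$ kernel $g$ suggest, and that the figures are counted per output pixel. Then:

$$\text{Conventional: } \frac{(2 \cdot 2) \cdot (3 \cdot 3)}{2 \cdot 2} = 9 \qquad \text{Winograd: } \frac{4 \cdot 4}{2 \cdot 2} = 4$$

The 16 multiplications in the Winograd case are those of the element-wise product $\odot$. The transforms with $B^{T}$ and $A^{T}$ need only additions and subtractions, which is consistent with the higher LUT and FF counts.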

## Chisel: Trade-off between Verilog and HLS

Chisel unavoidably consumes more LUTs and FFs than hand-written Verilog when a Module is synthesized, because the higher level of abstraction costs extra resources.

- ► More agile than Verilog
- ► More controllable than HLS

| Resource | Verilog | Chisel |
|------|---------|--------|
| DSPs | 9       | 9      |
| LUTs | 180     | 229    |
| FFs  | 1156    | 1523   |

**Table 2:** Resources of Conventional Convolution Implementation

#### Future Plan on Chisel-SR

- ▶ Utilize Chisel to enclose Verilog-based modules as hardware generators.
- ► Formulate the design space of hardware generators, which accept provided parameters to produce RTL flexibly.

# Q & A