# Learn RISC-V CPU Implementation and BSV

(BSV: a High-Level Hardware Design Language)

Rishiyur S. Nikhil

L4: BSV: Combinational Circuits



#### Reminders

Please git clone or git pull: https://github.com/rsnikhil/Learn\_Bluespec\_and\_RISCV\_Design

```
./Book_BLang_RISCV.pdf
Slides/
    Slides_01_Intro.pdf
    Slides_02_ISA.pdf
Doc/Installing_bsc_Verilator_etc.{adoc,html}
Exercises/
    Ex_03_A_Hello_World/
    Ex_03_B_Top_and_DUT/
Code/
    src_Common/
    src_Drum/
    src_Fife/
    src_Top/
    . . .
```

To compile and run the code for exercises, Drum and Fife, please make sure you have installed:

- bsc compiler (see https://github.com/B-Lang-org/bsc)
- Verilator compiler (see https://www.verilator.org/)


# Two CPU implementations (microarchitectures): Drum and Fife



We start learning BSV by coding the fn\_XXX functions.

These are used in both Drum and Fife, and are all combinational circuits.

We start with fn\_Decode.

© R.S.Nikhil Learn CPU design & BSV L4: BSV: Combinational Circuits

#### Inputs to fn\_Decode

The inputs to the Decode stage (see diagram on previous slide) are:

- (From IMem ("instruction-memory")): A 32-bit piece of data—a RISC-V instruction—that has become available by reading it from memory at the PC address.<sup>1</sup>
- (Direct from Fetch stage): any additional information for this instruction that did not need to go to memory and back.

We will use a BSV "struct" type (to be described soon) whenever we carry multiple pieces of data together.

Example: a memory request will carry a request-code (such as READ) and an address together.

<sup>1</sup> When implementing the "C" RISC-V ISA extension ("compressed instructions"), instructions can also be 16 bits wide, but we ignore that for now.

#### Outputs from fn\_Decode

The outputs from the Decode stage, as shown in the diagram, are:

- Was the Fetch itself successful, or did it encounter a memory error; if so, what kind of memory error?
- Is it a legal 32-bit instruction?
- If legal, what is its broad classification: Control (Branch or Jump)? Integer Arithmetic or Logic? Memory Access? This will help in choosing the next stage to which we must dispatch to execute the instruction.
- Does it have zero, one or two input registers ("rs1" and "rs2")? If so, which ones? This will help the next stage in reading registers.
- Does it have zero or one output registers ("rd")? If so, which one? This will help the final Register Write stage in writing back a value to a register.

To compute these values, we will need to extract "slices" of the 32-bit instruction (opcode, funct3, rs1, rs2, rd, ...) and compare them with binary constants.

# **BSV**: Integer literals (constants)

Integer literals use the same notation as in Verilog and SystemVerilog:

```
3'b010  // Binary literal, 3 bits wide
7'b_110_0011  // Binary literal, 7 bits wide
5'h3  // Hex literal, 5 bits wide
32'h3  // Hex literal, 32 bits wide
32'h_efff_0f17  // Hex literal, 32 bits wide (an AUIPC instruction)
'h23  // Hex literal, context determines width
```

When the size is omitted, bsc will infer the required size from the context, and extend it if necessary (zero-extend if the context requires a Bit#(n), sign-extend if Int#(n)).

#### **BSV**: Identifiers and comments

**Identifiers:** any sequence of letters, digits, and "\_" (underscore) characters, beginning with a letter (same as in most programming languages).

The upper/lower case of the first letter is significant:

- Uppercase first letter: constants (value constants, type constants).
  Examples:
  - Value constants: True, False, MEM\_RSP\_OK, ...
  - Type constants: Bit, Int, Tuple2, Vector, ...
- Lowercase first letter: variables (value variables, type variables).
  Examples: x, y, tmp, pc, rg\_pc, has\_rs1, ...

**Comments:** same as in Verilog/SystemVerilog/C/C++:

- "//" introduces a comment until end-of-line
- "/\*" and "\*/" bracket an unlimited amount of comment text (can span multiple lines)

# **BSV**: Introduction to Types



- Programs (and hardware modules) compute with Values.
- We group values into sets, which we call Types.
- Types themselves have a "type" (Kind):
  - those representing actual values (Value Kind)
  - those that describe some "size" feature of a type (Numeric Kind, shown in red)

Note: the numeric type "3" (shown in red) is distinct from the numeric value "3" (shown in black). There is never any ambiguity because they occur in distinct contexts: type expressions vs. value expressions.

# **BSV**: Introduction to Types

**BSV** has very strong *type-checking*: every operator, function and method declaration in **BSV** specifies the types of its arguments and results, and these are checked strictly by *bsc*.

Every expression, statement, rule, module, ... in **BSV** is described by a *type expression* (or just "type" for short). Types can be nested to arbitrary depth:

```
type ::= type-constructor #( type, ..., type )
```

A type-constructor always begins with an upper-case letter (is a type constant).

For each *type-constructor*, each *type* argument (parameter) is fixed to be either of value kind or numeric kind. For example,

- In Bit #(n), n is always of numeric kind.
- In Vector #(n,t), n is always of numeric kind, t is always of value kind.
- In Tuple3 #(t1,t2,t3), all three parameters are always of value kind.

# **BSV**: Bit-vectors and declaring identifiers

- The basic type in any hardware design language is the bit-vector (a vector of *n* bits) to be treated as a single entity. Bit-vectors are carried on wires (*n*-bit vectors on *n* wires), stored in registers, memories and other state elements.
- The type of a bit-vector of n bits in **BSV** is written: Bit#(n).
- We can declare identifiers with a type just like in Verilog, SystemVerilog and C, with an initialization:

```
Bit #(32) pc_val = ?;
Bit #(32) pc_val = 32'h_8000_0000;
Bit #(32) pc_val = 'h_1000;
```

Line 1: we let bsc pick an initial value (usually picks 'h\_AAAA...\_AAAA to stand out during debugging).

Line 2: the initial value is specified as an exactly 32-bit value, which matches the declared type of the identifier.

Line 3: the constant does not specify a width; *bsc* will infer that it should be 32 bits, and will zero-extend accordingly. Note: *bsc* will not truncate a too-large constant; it will give an error message instead.



#### Exercise break

Please see directory Exercises/Ex\_04\_A\_Bit\_Vectors/ and its README.

# **BSV**: Extracting smaller bit-vectors ("slicing"), or individual bits, from a bit-vector

```
Bit #(12) page_offset = pc_val [11:0];
Bit #(1) pc_lsb = pc_val [0];
Bit #(1) pc_msb = pc_val [31];
```

bsc checks that the bit-widths match exactly and reports an error otherwise (there is no silent bit-extending or truncating).
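The same slicing notation extracts instruction fields for the Decode function. The following is a minimal sketch, assuming the standard RV32I field positions; the identifier names are chosen here for illustration and are not taken from the course code.

```bsv
// The AUIPC example from the integer-literals slide
Bit #(32) instr  = 32'h_efff_0f17;

// Field positions follow the standard RISC-V base instruction formats.
// Names are illustrative only.
Bit #(7)  opcode = instr [6:0];      // 7'b_001_0111 for this instruction
Bit #(5)  rd     = instr [11:7];
Bit #(3)  funct3 = instr [14:12];
Bit #(5)  rs1    = instr [19:15];
Bit #(5)  rs2    = instr [24:20];
Bit #(7)  funct7 = instr [31:25];
```

Not every instruction format uses every field (AUIPC has no funct3, rs1 or rs2, for example); the decoder looks only at the slices that are meaningful for the opcode at hand.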



#### Exercise break

Please see directory Exercises/Ex\_04\_B\_Bit\_Vectors\_Slicing/ and its README.

# BSV: Operators on bit-vectors

Left- and right-arguments must have same Bit#(n) type.

Comparison ops: result type is Bool

Arithmetic ops: result type is same as argument types

```
x = a + b - c * d; // add, subtract, multiply
```

Bitwise logic ops: result type is same as argument types:

Left- and Right-Shifts:

```
x = (a << 3) & (b >> 14);
```
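To make the comparison and bitwise-logic categories concrete, here is a small sketch with constant operands. The variable names and values are illustrative assumptions; the operators themselves are the standard BSV ones.

```bsv
Bit #(32) a = 32'h_0000_00f0;
Bit #(32) b = 32'h_0000_000f;

// Comparison ops: Bit#(32) arguments, Bool result
Bool eq = (a == b);          // False
Bool ne = (a != b);          // True
Bool lt = (a <  b);          // False (Bit#(n) values compare as unsigned)
Bool ge = (a >= b);          // True

// Bitwise logic ops: Bit#(32) arguments, Bit#(32) result
Bit #(32) and_ab = a & b;    // 32'h_0000_0000
Bit #(32) or_ab  = a | b;    // 32'h_0000_00ff
Bit #(32) xor_ab = a ^ b;    // 32'h_0000_00ff
Bit #(32) not_a  = ~ a;      // 32'h_ffff_ff0f
```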



# Explicit extension and truncation

```
y = zeroExtend (x);
y = signExtend (x);
y = extend (x);
x = truncate (y);
```

- x and y must both be Bit#(..) or both be Int#(..)
- Bit-width of y must be  $\geq$  bit-width of x
- extend will zero-extend for Bit#(..) and sign-extend for Int#(..)
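A short sketch of these rules, with illustrative names and values (not from the course code):

```bsv
Bit #(8)  x8  = 8'h_f0;
Bit #(32) y_z = zeroExtend (x8);    // 32'h_0000_00f0
Bit #(32) y_s = signExtend (x8);    // 32'h_ffff_fff0 (the MSB of x8 is 1)
Bit #(32) y_e = extend (x8);        // same as zeroExtend, since x8 is Bit#(..)
Bit #(8)  x_t = truncate (y_s);     // 8'h_f0 (keeps the low 8 bits)

Int #(8)  i8  = -16;
Int #(32) i_e = extend (i8);        // -16: extend sign-extends for Int#(..)
```

Because the widths come from the declared types on both sides, bsc works out how many bits to add or drop; nothing is passed as an explicit width argument.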

# **BSV**: the Bool type

Bool: the type of boolean values, written True and False.

#### Operators

- && (boolean/logical AND)
- || (boolean/logical OR)
- ! (boolean/logical NOT)

Bool, Bit#(1) and Int#(1) are distinct types, and cannot be mixed!

CAUTION:

The boolean/logical operators &&, || and ! operate on Bool types and are distinct from the bit-wise logic operators mentioned earlier (such as &), which operate on Bit#(n) types.

Comparison operators on bit-vectors, such as (a <= b), take Bit#(n) arguments and produce Bool results.
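The distinction can be sketched as follows. Names and values are illustrative; pack and unpack are the standard BSV conversion functions between a type and its bit representation.

```bsv
Bit #(4) a = 4'b_0101;
Bit #(4) b = 4'b_0011;

Bool     both_zero = (a == 0) && (b == 0);   // Bool && Bool: False
Bit #(4) masked    = a & b;                  // Bit#(4) & Bit#(4): 4'b_0001
Bool     a_le_b    = (a <= b);               // False (5 <= 3 does not hold)

// Moving between Bool and Bit#(1) must be done explicitly:
Bit #(1) flag_bit  = pack (a_le_b);          // 1'b0
Bool     flag_bool = unpack (flag_bit);      // False

// Bit #(1) bad = a_le_b;                    // rejected: Bool is not Bit#(1)
```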

# **BSV**: Integer types

```
Bit #(n) // bit-vectors, bounded to n bits
Int #(n) // signed integers, bounded to n bits
UInt #(n) // unsigned integers, bounded to n bits
Integer // Mathematical integers (unbounded, no bit-width limit)
```

- We rarely use UInt#(n) because it is almost the same as Bit#(n).
- Integer is used for values that are only meaningful at compile time and never represented in hardware (such as the size of a vector of interfaces or modules).

#### **BSV**: User-defined functions

Syntax of function declarations is conventional (similar to Verilog, SystemVerilog, C):

```
function Action print_BV_BV_Bool (String op, Bit #(4) a, Bit #(4) b, Bool result);
   $display (" %s: %04b %04b => %d or ", op, a, b, result, fshow (result));
endfunction
```

Syntax of function application is conventional (similar to Verilog, SystemVerilog, C):

```
...
print_BV_BV_Bool ("==", a, b, a == b);
...
```

In this example, the result type is Action. This is used for functions that are pure side-effects: they perform some action and don't return any value.



#### Exercise break

Please see directory Exercises/Ex\_04\_C\_Bit\_Vectors\_Operations/ and its README.

#### **BSV**: User-defined functions have zero incremental hardware cost

In software, functions have some "function-calling overhead" because they perform some actions dynamically (allocate/deallocate stack frame, save/restore registers, move values to and from argument and result registers, ...).

In BSV, functions are inlined wherever they are used, so there is no incremental hardware cost.

Takeaway: use functions liberally, to improve clarity, readability, reusability.

# Example: recognizing a legal BRANCH instruction: code

| 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 | Name |
|---------------|-----|-----|--------|--------------|---------|--------|
| imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode  | B-type |
| imm[12\|10:5] | rs2 | rs1 | 000    | imm[4:1\|11] | 1100011 | BEQ    |
| imm[12\|10:5] | rs2 | rs1 | 001    | imm[4:1\|11] | 1100011 | BNE    |
| imm[12\|10:5] | rs2 | rs1 | 100    | imm[4:1\|11] | 1100011 | BLT    |
| imm[12\|10:5] | rs2 | rs1 | 101    | imm[4:1\|11] | 1100011 | BGE    |
| imm[12\|10:5] | rs2 | rs1 | 110    | imm[4:1\|11] | 1100011 | BLTU   |
| imm[12\|10:5] | rs2 | rs1 | 111    | imm[4:1\|11] | 1100011 | BGEU   |
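Based on this encoding, a function recognizing a legal BRANCH instruction might look like the sketch below. The function names instr\_opcode, instr\_funct3 and is\_legal\_BRANCH appear elsewhere in these slides, but their bodies here are assumptions, and the constant opcode\_BRANCH is a hypothetical name modeled on opcode\_OP\_IMM; the repository code may differ.

```bsv
// Hypothetical constant name, modeled on opcode_OP_IMM
Bit #(7) opcode_BRANCH = 7'b_110_0011;

// Assumed helper bodies: slice out the standard field positions
function Bit #(7) instr_opcode (Bit #(32) instr);
   return instr [6:0];
endfunction

function Bit #(3) instr_funct3 (Bit #(32) instr);
   return instr [14:12];
endfunction

function Bool is_legal_BRANCH (Bit #(32) instr);
   let funct3 = instr_funct3 (instr);
   // funct3 values 3'b010 and 3'b011 are not assigned to any BRANCH instruction
   return ((instr_opcode (instr) == opcode_BRANCH)
           && (funct3 != 3'b010)
           && (funct3 != 3'b011));
endfunction
```

The function only compares slices with constants and combines the Bool results, so it describes a purely combinational circuit.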

# Example: recognizing a legal BRANCH instruction: schematic



Note: the schematic is at "RTL" level; it does not go down to the level of AND-OR-NOT gates, just to bit-vector operators which will be implemented in terms of such gates by a synthesis tool.



#### Exercise break

Please see directory Exercises/Ex\_04\_D\_is\_legal\_XXX/ and its README.

# Combinational circuits; pure *vs.* side-effecting functions; Action and ActionValue types

The function is\_legal\_BRANCH() is an example of a *combinational circuit*: an acyclic interconnection of primitive gates (such as AND, OR, NOT).

(More generally: interconnects of RTL-level binary operators on bit-vectors, since they are themselves combinational circuits).

- Combinational circuits do not have any side-effects—they do not modify any state elements. They are also said to be pure functions.
- We idealize combinational circuits as being "instantaneous" (zero time). In practice, because of physics, there will be a *propagation delay* for a change in an input signal to effect a change in the output, but as long as this is less than the clock period, we can regard it as instantaneous.
- Pure functions can be replicated or shared (un-replicated) without changing the functional meaning of the circuit (replication/un-replication may have non-functional implications: silicon size, combinational delay, power consumption, ...).
- In **BSV** circuits that may have a side-effect (may update a state element) always have type Action or ActionValue#(t). Conversely, if a circuit's type does not involve Action or ActionValue, it is guaranteed to be pure.

## StmtFSM: a useful facility for testbenches

Many simple testbenches just involve performing a sequence of actions (providing stimulus/input to the DUT, the Design Under Test). This is conveniently expressed using the following **BSV** idiom:
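A minimal sketch of such a testbench, assuming the mkAutoFSM module from the standard StmtFSM library (it runs the given sequence once and then finishes the simulation). The module name mkTop and the messages are placeholders, not taken from the course code.

```bsv
import StmtFSM :: *;

(* synthesize *)
module mkTop (Empty);
   mkAutoFSM (
      seq
         // Each statement in the seq block is performed in order, one after another
         $display ("Step 1: first stimulus");
         $display ("Step 2: second stimulus");
         // An action block groups several actions into a single step
         action
            $display ("Step 3: several actions ...");
            $display ("        ... performed together");
         endaction
      endseq);
endmodule
```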

We will discuss StmtFSM in more detail later (when we talk about the Drum CPU). For now, just use the above idiom as-is.





#### Exercise break

Please see directory Exercises/Ex\_04\_E\_FSM\_Testbench/ and its README.

# User-defined types: enum types



In the "execute" stage, we have several alternative paths:

- Direct (for SYSTEM instructions)
- Control
- Integer arithmetic and logic
- Memory

The Decode stage computes a code that indicates which path should be taken.

The code is defined as an enum type:

```
// src_Common/Inter_Stage.bsv: line 39 ...
typedef enum {OPCLASS_SYSTEM,     // EBREAK, ECALL, CSRRxx
              OPCLASS_CONTROL,    // BRANCH, JAL, JALR
              OPCLASS_INT,
              OPCLASS_MEM,        // LOAD, STORE, AMO
              OPCLASS_FENCE}      // FENCE
OpClass
deriving (Bits, Eq, FShow);
```

# User-defined types: enum types

- These are symbolic (more human-readable) constants for the alternatives
- Because of "deriving (Bits)" bsc will represent them in 3 bits (3'b\_000 ... 3'b\_100)<sup>2</sup>
- However, OpClass is a new type, distinct from Bit#(3)
- You can use "pack (OPCLASS\_MEM)" if you really want its Bit#(3) representation (3'b\_011)
- Because of "deriving (Eq)" you can directly compare two OpClass values for equality ("==") and inequality ("!=")
- Because of "deriving (FShow)" you can use "fshow()" on an OpClass value in \$display() statements to print the symbolic name (otherwise, it will print the Bit#(3) representation)
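These points can be sketched as follows, assuming the OpClass definition shown earlier. The variable names and the function print\_opclass are hypothetical.

```bsv
OpClass  c      = OPCLASS_MEM;
Bit #(3) c_bits = pack (c);            // 3'b_011, available because of deriving (Bits)
OpClass  c2     = unpack (3'b_001);    // OPCLASS_CONTROL
Bool     same   = (c == c2);           // False; "==" is available because of deriving (Eq)

// Hypothetical helper: prints the symbolic name, thanks to deriving (FShow)
function Action print_opclass (OpClass oc);
   $display ("opclass = ", fshow (oc));
endfunction
```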

# If-then-else (hardware multiplexers)

```
function OpClass instr_opclass (Bit #(32) instr);
    OpClass result;
    if (is_legal_BRANCH (instr) || is_legal_JAL (instr) || is_legal_JALR (instr))
        result = OPCLASS_CONTROL;
    else
        result = OPCLASS_INT;
    return result;
endfunction
```



#### Alternative notations for if-then-else

#### Conditional expressions:

```
function OpClass instr_opclass (Bit #(32) instr);
  return ((is_legal_BRANCH (instr) || is_legal_JAL (instr) || is_legal_JALR (instr))
      ? OPCLASS_CONTROL
      : OPCLASS_INT);
endfunction
```



See also "case-endcase" expressions in the book and BSV Reference Guide.

# Nested conditionals ⇒ cascaded multiplexers

```
function OpClass instr_opclass (Bit #(32) instr);
  OpClass result;
  if (is_legal_BRANCH (instr)
      || is_legal_JAL (instr)
      || is_legal_JALR (instr))
     result = OPCLASS_CONTROL;
  else if (is_legal_OP (instr)
           || is_legal_OP_IMM (instr)
           || is_legal_LUI (instr)
           || is_legal_AUIPC (instr))
     result = OPCLASS_INT;
  else if (is_legal_LOAD (instr)
           || is_legal_STORE (instr))
     result = OPCLASS_MEM;
  else if (is_legal_ECALL (instr)
           || is_legal_EBREAK (instr)
           || is_legal_MRET (instr)
           || is_legal_CSRRxx (instr))
     result = OPCLASS_SYSTEM;
  return result;
endfunction
```



# Parallel muxes (AND-OR muxes, balanced muxes)

Cascaded multiplexers form an "unbalanced tree". We can balance the tree for a multiplexer with shorter combinational paths.

**Note:** this relies on the conditions being mutually exclusive and complete (exactly one of them is true):
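One way to express such a balanced AND-OR multiplexer is sketched below. The function names fn\_gate and fn\_and\_or\_mux4 are hypothetical, and this is only one possible formulation: each input is gated (ANDed) with its select condition, and the gated values are then ORed together in a balanced tree.

```bsv
// Pass x through when sel is True, otherwise produce all zeros
// (in hardware: AND each bit of x with the select signal)
function Bit #(n) fn_gate (Bool sel, Bit #(n) x);
   return (sel ? x : 0);
endfunction

// 4-input AND-OR mux; assumes exactly one of s0..s3 is True
function Bit #(n) fn_and_or_mux4 (Bool s0, Bit #(n) x0,
                                  Bool s1, Bit #(n) x1,
                                  Bool s2, Bit #(n) x2,
                                  Bool s3, Bit #(n) x3);
   // The parentheses make the balanced OR tree explicit
   return (  (fn_gate (s0, x0) | fn_gate (s1, x1))
           | (fn_gate (s2, x2) | fn_gate (s3, x3)));
endfunction
```

If two select conditions were True at once, the output would be the bitwise OR of two inputs, and if none were True it would be zero; this is why the mutual-exclusion and completeness conditions matter. To select among OpClass values, the inputs could be passed as pack (OPCLASS\_CONTROL) and so on, with unpack applied to the result.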





#### Exercise break

Please see directory Exercises/Ex\_04\_F\_Enums\_Muxes/ and its README.

# Sharing code for RV32I and RV64I using type synonyms and macros

```
// type synonym: new name for numeric type 32
typedef 32 XLEN;

Bit #(XLEN) pc_val;
Bit #(XLEN) rs1_val;
Bit #(XLEN) rs2_val;
Bit #(XLEN) rd_val;
```

Edit  $32 \rightarrow 64$  for RV64

The following can automate the typedef of XLEN during compilation:

```
// in src_Common/Arch.bsv

`ifdef RV32
typedef 32 XLEN;
`elsif RV64
typedef 64 XLEN;
`endif

Integer xlen = valueOf (XLEN);
```

# Conditional compilation with values instead of `ifdef

For SLLI, SRLI and SRAI instructions, the "shift amount" (shamt):

- is 5 bits (instr[24:20]) in RV32I, and instr[25] must be 0
- is 6 bits (instr[25:20]) in RV64I, and instr[25] can be 0 or 1

If instr[25] is 1, it is illegal in RV32I. We can use xlen to test this in the decode function.

```
// in src_Common/Instr_Bits.bsv
function Bool is_legal_OP_IMM (Bit #(32) instr);
   let funct3 = instr_funct3 (instr);
   let funct7 = instr_funct7 (instr);
   Bool is_legal_SLLI = (   ((xlen == 32) && (funct7 == 7'b000_0000))
                         || ((xlen == 64) && (funct7 [6:1] == 6'b0)));
   Bool is_legal_SRxI = (   ((xlen == 32) && ((funct7 == 7'b010_0000)
                                              || (funct7 == 7'b000_0000)))
                         || ((xlen == 64) && ((funct7 [6:1] == 6'b01_0000)
                                              || (funct7 [6:1] == 6'b00_0000))));
   return ((instr_opcode (instr) == opcode_OP_IMM)
           && ((funct3 == funct3_SLLI)
               ? is_legal_SLLI
               : ((funct3 == funct3_SRxI)
                  ? is_legal_SRxI
                  : True)));
endfunction
```

#### Conditional compilation with values instead of `ifdef: zero cost

Conditional compilation with values instead of `ifdef is preferable for readability, and it avoids well-known problems with macros (scoping, inadvertent variable capture, surprises due to associativity of infix operators, and so on).

But is there a hardware cost (multiplexer for conditional)?

No, because an expression like "xlen==32" can be, and is, statically evaluated to True or False by *bsc*, and the whole conditional is reduced to just the relevant arm (the conditional disappears).

# End

