# Lab 1: Hardware Design and Verification CPE 333 - S23

In this lab, you will verify and debug designs for several hardware components. The primary objective of this lab is to introduce you to hardware verification methodologies using SystemVerilog. You will learn how to verify and design the following hardware components: An eight-bit multiplier, a synchronous FIFO, a content addressable memory (CAM), and a cacheline adapter.

#### What is Verification

When designing a digital circuit in a hardware description language (HDL), we are attempting to describe a hardware component whose behavior will comply with a high level description of an intended behavior (i.e. a specification). Hardware verification is a process which attempts to ensure that a design's behavior matches a specified behavior.

VERIFICATION IS HARD. Digital hardware verification is a hard <sup>1</sup> problem. For example, consider the collection of Boolean functions  $B_n = \{f | f : \{0,1\}^n \to \{0,1\}\}$ . These are the functions with n binary inputs and a binary output. How would you go about writing a program which takes as input an element of  $B_n$  (i.e. the specification of the function), and a SystemVerilog description of a digital circuit (i.e. the design), and then determine whether or not the design matches the specification? Can you come up with something significantly better than iterating through all  $2^n$  possible function inputs and ensuring that the output of the design matches the output of the specification? <sup>2</sup>

VERIFICATION IS NECESSARY. We all have experienced buggy software, which ultimately required significant effort for debugging. This is also true for hardware development. However, hardware verification can be even more challenging and expensive.

There are numerous reasons, including the following 3:

- Fabrication cost are much higher for hardware than for software
- Hardware bug fixes after delivery to customers are challenging/impossible.
- Quality expectations are usually higher for hardware than for software. (e.g. human safety is a risk if the hardware does not work properly).

<sup>1</sup> coNP-Complete

<sup>2</sup> If you can, please give CPE333 a shout out as you claim your \$1M prize

<sup>3</sup> Thomas Kropf, editor. Formal Hardware Verification - Methods and Systems in Comparison, volume 1287 of Lecture Notes in Computer Science, 1997. Springer. ISBN 3-540-63475-4. DOI: 10.1007/3-540-63475-4. URL https://doi.org/10.1007/3-540-63475-4

• Time to market severely affects potential revenue.

VERIFICATION IS NOT VALIDATION. Validation is a similar but different process. While verification is a process to ensure that a design matches **its** specification, validation is a process that ensures that a design matches **a** specification.

Consider the case where a truck is designed to meet a specification of being able to haul 20 tons of material. The truck designers at ACME Truck Co. must *verify* that their trucks can haul 20 tons. Likewise, ACE Hauling Co. requires a truck which can haul 25 tons. Thus the engineers at ACE Hauling Co. must *validate* that the ACME Truck Co.'s truck can haul 25 tons.

How to DO VERIFICATION. There are three main tasks to verification <sup>4</sup>:

- Stimulate a design by providing sequences of stimuli.
- Check that the design outputs results in accordance with the specification.
- Measure how much of a design's execution state space <sup>5</sup>

In this lab, you will complete these tasks using dynamic simulation <sup>6</sup>. You will use specifications to generate (sometimes random) sequences of input stimuli, create checkers which confirm that the output of the design under test (DUT) conforms to the specification, and record DUT accuracy and coverage (scoreboard).

A SIMPLE VERIFICATION EXAMPLE. To demonstrate dynamic simulation, we will use the simple combinational circuit shown in Listing 1. We will verify that the module *comfunction* actually implements the truth-table shown in the comments of Listing 1. A truth-table is an example of a specification which describes the intended behvario of the circuit <sup>7</sup>.

- <sup>4</sup> Erik Seligman, Tom Schubert, and M V Achutha Kiran Kumar. Formal Verification: An Essential Toolkit for Modern VLSI Design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2015. ISBN 9780128008157
- <sup>5</sup> The full space of all RTL state and input values.
- <sup>6</sup> In dynamic simulation, the design is simulated using cycle or gate level simulators, stimuli consist of sequences of input signals to the device under test, and outputs are verified against the specification using assertions. This is in contrast to formal verification techniques which use mathematical representations of the design, along with assumptions about possible inputs and states, to constrain the test space. By partitioning the execution state space into reachable space and unreachable space, formal verification techniques often drastically reduce the size of the space that needs to be tested and then use automated techniques to prove properties about the circuit.
- <sup>7</sup> In this case, the specification is a formal specification, written in a formal language. Often an initial specification will not be formalized so nicely.

```
input logic a_i,
input logic b_i,
input logic c_i,
output logic x_o
);

assign x_o = a_i ^ b_i ^ (a_i & c_i);
endmodule : combfuction
```

Listing 1: Simple Combinational Circuit

GENERATING THE STIMULUS. Combinational circuits have no initial or intermediate state, the size of the input and the output are fixed, and the runtime is constant. To verify the design, we can simply run through all possible inputs <sup>8</sup> and verify that the DUT generates the proper outputs, shown in Listing 2.

```
initial begin
for (int i = 0; i < 4'b1000; ++i) begin
{a_i, b_i, c_i} = i[2:0];
#1;
end
end</pre>
```

Listing 2: Generating the Stimulus

Modeling the Correct Behavior. In addition to generating all possible inputs, we also must create a model for the circuits specified behavior. Listing 3 shows an example model. For a larger circuit with more inputs, we could also implement the model by loading a truth table into a memory indexed by the inputs. <sup>9</sup>

```
function logic spec_output(logic a, logic b, logic c);

case ({a, b, c})

3'b000: return 0;

3'b011: return 0;

3'b010: return 1;

3'b100: return 0;

3'b100: return 1;

3'b100: return 1;

3'b101: return 0;

3'b101: return 0;

3'b111: return 1;

default: $error("Invalid input to spec_output function");

endcase
endfunction
```

Listing 3: Modeling the Correct Behavior

CHECKING THE OUTPUTS. Finally, we can use a for loop which generates the input stimuli to check that the output of the DUT matches the output of the model (Listing 4).

```
initial begin
for (int i = 0; i <= 4'b1000; ++i) begin</pre>
```

<sup>&</sup>lt;sup>8</sup> in time exponential to the number of inputs

<sup>&</sup>lt;sup>9</sup> Note that on line 11, we have a default case since logic encodes four-states. Thus if the input to the function is mistakenly *x* or *z* we can display an error showing our test bench is at fault, rather than the DUT.

Listing 4: Checking the Outputs

PUTTING THIS ALL TOGETHER, we can write our testbench to verify *combfunction* (Listing 5).

```
function logic spec_output(logic a, logic b, logic c);
         case ({a, b, c})
             3'b000: return 0;
             3'b001: return 0;
             3'b011: return 1;
            3'b010: return 1;
            3'b110: return 0;
             3'b100: return 1;
             3'b101: return 0;
9
             3'b111: return 1;
10
             default: $error("Invalid input to spec_output function");
11
         endcase
12
    endfunction
13
14
    module combfunction_tb;
15
         timeunit 1ns;
16
         timeprecision 1ns;
17
18
         logic a_i, b_i, c_i, x_o;
19
         combfunction dut(.*);
21
22
         initial begin
23
             reset = '1;
             // Generate sequence of inputs
25
             for (int i = 0; i <= 4'b1000; ++i) begin</pre>
26
                 // Set input values to the dut, and let combinational logic
27
        settle
                 \{a_{-}i, b_{-}i, c_{-}i\} = i[2:0];
                 #1;
29
                 reset = '0;
                 // Check dut output vs specification output
31
                 output\_equiv: assert (x_o == spec\_output(a_i, b_i, c_i))
                                else $error("With {a, b, c}=%b, dut outputs: %b
       while spec outputs: %b",
                                             {a_i, b_i, c_i}, x_o, spec_output(
34
        a_i, b_i, c_i));
             end
35
             $finish;
36
37
    endmodule : combfunction_tb
```

Listing 5: Testbench for combfunction

VERIFYING A SEQUENTIAL CIRCUIT. When verifying a small circuit we can exhaust all possible inputs simply by iterating through each possible input combination. Consider a sequential circuit that takes an arbitrarily large input serially. Clearly, verifying the circuit by simply monitoring the input and outputs is insufficient, since the circuit can potentially process infinitely many different input "strings". Listing 6 shows an example serial circuit.

```
module div5 (
        input logic clk,
        input logic rst,
        input logic serial_in,
        input logic run,
        output logic decision
    ):
    logic [2:0] state;
    localparam logic [2:0] initial_state = '1;
10
    always_ff @(posedge clk) begin
12
       if (rst) begin
            state <= initial_state;</pre>
14
        end
15
        else if (run) begin
16
          case (state)
17
                initial_state,
18
                3'b000: state <= serial_in ? 3'b001 : 3'b000;
19
                3'b001: state <= serial_in ? 3'b011 : 3'b010;
                3'b010: state <= serial_in ? 3'b000 : 3'b100;
21
                3'b011: state <= serial_in ? 3'b010 : 3'b001;
22
                3'b100: state <= serial_in ? 3'b100 : 3'b011;
23
            endcase
24
        end
        else begin
26
            state <= initial_state;</pre>
27
28
    assign decision = state == 3'b000;
31
32
    endmodule : div5
```

Listing 6: A Sequential circuit with binary string input

Listing 6 is a SystemVerilog representation of a sequential circuit <sup>10</sup> which "decides" if the input string is divisible by five. On completion of the input processing, the output *decision* should go high. Similarly, if the input string is not divisible by five, then on finishing the processing of the input, the output *decision* should go low.

Since there is no limit on how long input strings can be, if we test the functionality by looking only at inputs and outputs of the design module, then we can only give guarantees qualified by a certain input size (e.g. "all inputs of less than 16-bits produced the correct outputs"). Luckily, we can verify whether this design is functionally correct without qualifications. Rather than specifying that the design

<sup>&</sup>lt;sup>10</sup> specifically a Deterministic Finite Automaton (DFA)

produces a certain output signal based on the sequence of input signals, we instead specify that the design implements a specific DFA which we can prove decides the DIV5 problem (language). Therefore we only need to verify that the design implements the DFA.

The DFA that we implement has six states, five of which are labeled o through 4, which represent the value of the in-process inptu string mod 5. The sixth state is the initial state, labeled s. The next state transition function,  $\delta$ , which takes the current state i and the input bit b as follows:  $\delta(i,b) = (2i+b) mod 5$  if  $i \in \{0,1,2,3,4\}$  else b

An input string is divisible by five iff the DFA moves to state o upon processing the last (lsb) bit in the string. We consider the DFA to consume its input string from left-to-right (i.e. msb first). <sup>11</sup>

Therefore to verify the design, we must move the design into every possible state it can enter, and then ensure the transitions from these states are correct. <sup>12</sup>

GENERAL TESTBENCH COMPONENTS. In the prior examples, the verification steps of input stimulus generation, driving the DUT and mode, and comparing the results were implemented "in-place" using basic modules. As we scale the complexity of the designs, you may use a universal verification model (UVM), where we seperate functionality of the verification process into multiple indepdent parts:

- A *Sequencer* generates input stimuli, independent of the bus or interface used by the DUT.
- A *Driver* generates the bus or interface control input stimuli and transfers the sequencer's data to the DUT.
- A Monitor acts as a complement of the driver. The monitor observes and collects the inptus and outputs of the DUT to identify when a transaction is complete and ready to be evaluated by the scoreboard.
- A *Scoreboard* consumes the output of the monitor and evaluates whether the DUT produced the appropriate value. The scoreboard may also measure testing coverage.

In this lab, you will see designs which you will be able to fully cover (similar to *combfunction* above) and design whose execution state space is too large to fully cover.

<sup>11</sup> Given a binary w string, after every symbol is read, the DFA state is k = w mod 5, where w is interpreted as a binary number

Proof: Let w be an arbitrary binary string. Assume for every string x, s.t.  $1 \le |x| \le |w|$ , the DFA state is  $k = x \mod 5$ .

- Suppose w = 0. Then the DFA state o = w mod 5.
- Suppose w = 1. Then the DFA state  $1 = w \mod 5$ .
- Suppose w = {x,0}, for some binary string x, s.t. |x||w|. By the inductive assumption, the DFA is in state k = x mod 5. After processing w = {x,0} from left-to-right, the DFA's next state is:

$$\delta(k,0) = (2k+0) \mod 5$$

$$= (2(x \mod 5)) \mod 5$$

$$= 2x \mod 5$$

$$= w \mod 5$$
(1)

• Suppose  $w = \{x, 1\}$ , for some binary string x, s.t. |x||w|. By the inductive assumption, the DFA is in state  $k = x \mod 5$ . After processing  $w = \{x, 1\}$  from left-to-right, the DFA's next state is:

$$\delta(k,1) = (2k+1) \mod 5$$
=  $(2(x \mod 5 + 1)) \mod 5$ 
=  $(2x+1) \mod 5$ 
=  $w \mod 5$ 
(2)

Therefore the DFA state (k) is  $w \mod 5$  for any non-empty binary string w.

<sup>12</sup> These combinations of internal design state and input signals are called "coverage points" (or "coverages" of "covers').

### Verifying a Multiplier

In this part of the lab, you will verify a multiplier (i.e. an unsigned add-shift multiplier).

Specification. The multiplier has the following port listing:

- clk\_i is the clock which drives the sequential logic in the multiplier
- reset\_n\_i is an active low, synchronous reset signal. If this signal
  is asserted at @(posedge clk\_i), the multiplier should halt any
  ongoing multiplication and reset its state to allow for the start of a
  new multiplication.
- multiplicand\_i and multiplier\_i are the input operands for the
  multiplication. When a multiplication begins, these signals are
  registered in the multiplier and thus are not required to be continuously asserted throughout the multiplication.
- start\_i begins a new multiplication if it is asserted at @(posedge clk\_i) and the multiplier is in a 'ready' state. If the multiplier is not in a 'ready' state, assertion of this signal has no effect.
- ready\_o asserts that the multiplier is in a 'ready' state and can begin a new multiplication. product\_o contains the 2 \* width\_p bit output of the multiplication when the multiplier is in a 'done' state.
- done\_o is asserted when the multiplier is in a 'done' state. This
  occurs when multiplication is complete, meaning (product\_o
  contains the product of the registered input operands OR a synchronous reset has occurred), AND a new multiplication has not
  been started.

Figure 1 shows the timing diagram for this protocol. The number of cycles that the multiplier takes to complete the multiplication is not specified.



Figure 1: Multiplier Timing Diagram

COVERAGES. Your testbench must cover at least the following:

- From a 'ready' state (see types.sv), assert start\_i with every possible combination of multiplicand and multiplier, and without any resets until the multiplier enters a 'done' state (resets while the device is in a 'done' state are acceptable).
- For each 'run' state s, assert the start\_i signal while the multiplier is in state s.
- For each 'run' state s, assert the active-low reset\_n\_i signal while the multiplier is in state s.

ERROR REPORTING. Your testbench must detect the following errors (defined in types.sv):

- Upon entering the 'DONE' state, if the output signal product\_o holds an incorrect product, report a BAD\_PRODUCT error.
- If the ready\_o signal is not asserted after a reset, report a NOT\_READY error.
- If the ready\_o signal is not asserted upon completion of a multiplication, report a NOT\_READY error.
- To report an error, pass the appropriate error type to report\_error task defined in testbench.sv. An example is given below.

```
assert (/* your assertion here */)
else begin
serror ("%0d: %0t: BAD_PRODUCT error detected", '__LINE__, $time);
report_error (BAD_PRODUCT);
end
```

Listing 7: example for reporting error

CLOCKING BLOCKS. In SystemVerilog, clocking blocks are an abstraction used to capture precise timing information and allow the verification engineer to write verification code at the 'cycle' level. The clocking blocks allow you to specify input and output skews, but in this lab you only use them to specify clocks. When using a default clocking construct, signals should be assigned using non-blocking assignments. Further, you can insert a delay of N cycles using the syntax (N). To delay until some condition holds, use the 'if and only if' keyword: @(<clk> iff <condition>);.

CLOCKING BLOCKS. In order to facilitate checking of your testbench (i.e. autograding), your testbench should set signal values only at time o (the beginning of an initial block) or using the tb\_clk clock as described in the Clocking Blocks section above. Additionally, at time o, your testbench must assert the reset\_n\_i signal.

SAMPLING SIGNALS. All time delaying constructs should be associated with this default clock. That is, they should either be of the form ##(n), which waits for n cycles with respect to the clocking block, or @(tb\_clk [iff predicate>]) which delays for a single cycle, or delays until 'predicate' is evaluated true with samples taken with respect to the clocking block.

For example, the following are appropriate procedural blocks for your testbench:

```
initial reset_n = 0;  // initialize reset signal
initial begin

##(5);  // Ensure DUT is reset

reset_n <= 1;

multiplicand_i <= 16;

multiplier_i <= 32;  // NBA: signals still have their initial values

@(tb_clk);  // Wait for clock signal (could use '##(1)')

// Now, when the values get assigned

end

always @() begin

sdisplay("SystemVerilog Functions cannot block");
end</pre>
```

Listing 8: Example of appropriate procedural blocks for your testbench

Listing 9: Example of inappropriate procedural blocks

#### Verifying a FIFO

In this part of the lab, you will write a testbench to verify a synchronous FIFO with a single enqueuer and a single dequeuer. A FIFO is called 'synchronous' when the enqueue clock and the dequeue clock are the same. <sup>13</sup> The FIFO is described in fifo.sv. In this exercise, you will design a test bench to verify this design.

<sup>13</sup> If the clocks are distinct, then it is an asynchrnous FIFO, and much more complicated

Specification The FIFO implements a valid-ready enqueue protocol, and a valid-yumi dequeue protocol, and has the following port listing:

- clk\_i is the clock which drives the sequential logic in the fifo.
- reset\_n\_i is an active low, synchronous reset signal. If this signal is asserted at @(posedge clk\_i), the FIFO sets itself to 'empty'.
- The valid-ready protocol is:
  - data\_i contains the enqueued data word.
  - valid\_i is asserted by the enqueuer to enqueue data\_i into the FIFO.
  - ready\_o asserts that the FIFO is not full and has capacity to enqueue a word. The behavior when item valid\_i is asserted while the FIFO is full is undefined and should be avoided.
- The valid-yumi protocol is:
  - valid\_o asserts that the FIFO is not empty and that the value on data\_o is the oldest word stored in the FIFO.
  - yumi\_i is asserted by the dequeuer to signal to the FIFO that the word in data\_o must be removed from the FIFO.

Figure 2 shows a timing diagram for the fifo.



Figure 2: FIFO Timing Diagram

COVERAGES. Your testbench must cover at least the following for the FIFO with capacity cap\_p:

- You must enqueue words while the FIFO has size in [0, cap\_p-1].
- You must dequeue words while the FIFO has size in [1, cap\_p].
- You must simultaneously enqueue and dequeue while the FIFO has size in [1, cap\_p-1].

ERROR REPORTING. Your testbench must detect the following errors (defined in types.sv):

- Asserting reset\_n\_i at @(tb\_clk) should result in ready\_o being high at @(posedge clk\_i). If it is not, report the appropriate error.
- When asserting yumi\_i at @(tb\_clk) when data is ready, the value on data\_o is the CORRECT value. If not, report the appropriate error. Recall that asserting yumi\_i when the FIFO is empty results in undefined behavior, so avoid doing this.

To report an error, pass the appropriate error type to report\_error task defined in testbench.sv. An example is given below.

```
assert (/* your assertion here */)
else begin
serror ("%0d: %0t: %s error detected", '__LINE__, $time, err.name);
report_error (err);
end
```

Listing 10: Example error reporting

### Verifying a CAM

In this part of the lab, you will write a testbench to verify a content addressable memory (CAM). A CAM is similar to an 'associative array' abstract data type in software, with the distinction that a CAM is of fixed size. A CAM, then, is a collection of key-value pairs, and supports read and write operations. When reading a CAM, a key is provided, and the CAM responds with the appropriate value, or a signal indicating that there is no value associated with the key in the CAM. On a write, both a key and a value are provided, and these get stored into the CAM.

Since a CAM has a fixed number of entries (eight - for this lab), some type of 'replacement policy' must be used when writing a new key to a full CAM. The replacement policy used by the CAM in this lab is the 'least recently used' (LRU) policy, which evicts (removes) the entry whose key was least recently used by a read or write. On writes, a CAM takes different actions depending on whether the key is already present in the CAM, and whether the CAM is full.

- If the key is present in the CAM, the value associated with the key is updated.
- If the key is not present and the CAM is not full, then a new entry
  is allocated and both the key and value are stored into this new
  entry.
- If the key is not present and the CAM is full, then an entry is evicted, meaning the new key and value are written in the location of the previous entry.

In all write cases, metadata associated with the replacement policy is updated.

Specification. The CAM has the following port listing:

- **clk\_i** is the clock which drives the sequential logic in the CAM.
- reset\_n\_i is an active low, synchronous reset signal. If this signal is asserted at @(posedge clk\_i), the CAM resets itself to 'empty'.
- rw\_n\_i decides whether the operation is a read (if set to 1'b1) or a write (if set to 1'b0). This value has no effect on the CAM unless valid\_i is asserted.
- valid\_i is asserted when a read or write operation is performed.
- **key\_i** is the key input used by both read and write operations.
- val\_i is the value input used by write operations.
- val\_o is the output value on reads.

 valid\_o is asserted by the CAM on reads to assert that the value in val\_o is correct (that is, the CAM found a value associated with key\_i).

Write and read operations are serviced at the rising edge of clk\_i. That is, the CAM updates its internal state (both key-value pairs as well as LRU metadata) sequentially. Additionally, the CAM guarantees that val\_o and valid\_o show the correct value on a read at the rising edge of clk\_i.

COVERAGES. Your testbench must cover at least the following:

- The CAM must evict a key-value pair from each of its eight indices.
- The CAM must record a 'read-hit from each of its eight indices.
- You must perform writes of different values to the same key on consecutive clock cycles.
- You must perform a write then a read to the same key on consecutive clock cycles.

Error Reporting Your testbench must detect the following errors:

Assert a read error when the value read from the CAM is incorrect.

To report an error, pass the appropriate error type to itf.tb\_report\_dut\_error task defined in cam\_itf.sv. An example is given below:

```
1 @(clk);
2 assert (itf.val_o == val) else begin
3     itf.tb_report_dut_error(READ_ERROR);
4     $error("%0t TB: Read %0d, expected %0d", $time, itf.val_o, val);
5 end
```

Listing 11: Example error reporting

## References

Thomas Kropf, editor. Formal Hardware Verification - Methods and Systems in Comparison, volume 1287 of Lecture Notes in Computer Science, 1997. Springer. ISBN 3-540-63475-4. DOI: 10.1007/3-540-63475-4. URL https://doi.org/10.1007/3-540-63475-4.

Erik Seligman, Tom Schubert, and M V Achutha Kiran Kumar. *Formal Verification: An Essential Toolkit for Modern VLSI Design.* Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2015. ISBN 9780128008157.