j.schmaltz@tue.nl
www.win.tue.nl/~jschmalt/

This third assignment is about verifying a simple microprocessor using Cadence IFV and SystemVerilog Assertions (SVA). Hand in your solution (a .tar.gz archive containing your design, the necessary IFV files, and a short plain-text file describing what you did) before noon on Monday, October 24th, 2016.

The assignment follows two steps:

1. Design and verify an ALU
2. Extend the ALU to a small execution unit

The assignment will take you through these steps. Note that this is the third assignment, so it gives far fewer details than the previous ones. You are now the verification expert. It is up to you to fill most of the gaps!

# 1 Designing a simple ALU

### 1.1 Ripple Carry Adder

The main component of our ALU will be a simple Ripple Carry Adder. The main blocks used to build this adder are half-adder and full-adder modules. The half-adder is shown in Figure 1 and defined by the following equations:

$$HA_s(a, b) = a \oplus b$$

$$HA_c(a, b) = a \wedge b$$

where $\oplus$ denotes the exclusive or between Booleans.



Figure 1: Circuit for a half-adder.

The full-adder is shown in Figure 2 and defined by the following equations:

$$FA_s(a,b,c) \equiv HA_s(c, HA_s(a,b)) = c \oplus HA_s(a,b) = c \oplus a \oplus b$$

$$FA_c(a,b,c) \equiv HA_c(c, HA_s(a,b)) \vee HA_c(a,b) = (c \wedge HA_s(a,b)) \vee (a \wedge b) = (c \wedge (a \oplus b)) \vee (a \wedge b)$$

where $c$ is the carry input.



Figure 2: Circuit for a full-adder.
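Before writing any Verilog, it can help to have a small software reference model of the two blocks. The sketch below is in Python, not Verilog, and the function names `half_adder` and `full_adder` are ours. It simply transcribes the equations above and checks that the full-adder outputs encode the arithmetic sum of the three input bits.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits: HA_s = a XOR b, HA_c = a AND b."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three bits, built from two half-adders."""
    s1, c1 = half_adder(a, b)   # first half-adder: a and b
    s2, c2 = half_adder(c, s1)  # second half-adder: carry input and partial sum
    return s2, c1 | c2          # carry out if either half-adder produces a carry


# Exhaustive check: the pair (carry, sum) is the 2-bit value of a + b + c.
for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    assert 2 * cout + s == a + b + c
```

The same eight-row truth table is a natural first property to state about your Verilog modules.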

**Exercise 1.** Write a description of the half-adder and of the full-adder in Verilog. Start with a module for the half-adder and then instantiate two copies in the full adder.

Addition between two bit vectors of n bits is computed by cascading several full adders. Figure 3 shows an adder for 4-bit unsigned numbers. Because the carry is passed to the next computation, such a circuit is called a *ripple carry adder* or *carry chain adder*.



Figure 3: Circuit for a 4-bit ripple carry adder.
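The carry chain can be sketched in software as a loop over the bit positions. This is a Python reference model under our own naming (`ripple_carry_add`, parameter `n`), not the Verilog module you are asked to write. It is checked exhaustively against integer addition for the 4-bit case of Figure 3.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (c & (a ^ b)) | (a & b)


def ripple_carry_add(a: int, b: int, cin: int, n: int = 8) -> tuple[int, int]:
    """Add two n-bit unsigned values bit by bit; returns (sum, carry_out)."""
    carry, total = cin, 0
    for i in range(n):                       # bit 0 (least significant) first
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i                      # place sum bit i
    return total, carry


# Exhaustive check for the 4-bit adder: {carry_out, sum} equals a + b + cin.
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            s, cout = ripple_carry_add(a, b, cin, n=4)
            assert (cout << 4) | s == a + b + cin
```

The final assertion is the kind of statement you may want to reuse later as a correctness property of the adder.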

**Exercise 2.** Write the description of an 8-bit ripple carry adder in Verilog. Call this module rca8. Its inputs are two 8-bit numbers and an input carry. Its outputs are a carry bit and an 8-bit result.

You now have the description of the basic circuit of the ALU.



Figure 4: 4-bit Arithmetic Logic Unit

### 1.2 Arithmetic Logic Unit

An Arithmetic Logic Unit (ALU) is built from an adder and logic gates (AND, OR, etc.) to compute the result of logic and arithmetic functions. Figure 4 shows an example of a 4-bit ALU. The control bits of the ALU are $ALU.op = k\,i\,j\,c_{in}$. This 4-bit vector encodes the operations that the ALU can perform. These operations are summarised in Table 1.

| ALU.op  | operation          |
|---------|--------------------|
| 1 0 0 0 | $a$                |
| 1 0 0 1 | $a + 1$            |
| 1 0 1 0 | $a - 1$            |
| 1 1 0 0 | $a + b$            |
| 1 1 0 1 | $a + b + 1$        |
| 1 1 1 0 | $a + \overline{b}$ |
| 1 1 1 1 | $a - b$            |
| 0 0 0 0 | $a \wedge b$       |
| 0 0 1 0 | $a \oplus b$       |
| 0 1 0 0 | $a \lor b$         |

Table 1: Encoding of ALU operations.
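Table 1 can be turned directly into a small golden model of the result output. The Python sketch below assumes an 8-bit datapath and uses names of our own choosing (`ALU_OPS`, `alu_result`). It only transcribes the table and says nothing about how Figure 4 implements each operation.

```python
MASK = 0xFF  # 8-bit datapath

# Table 1, keyed by the 4-bit control word k i j c_in.
ALU_OPS = {
    0b1000: lambda a, b: a,
    0b1001: lambda a, b: a + 1,
    0b1010: lambda a, b: a - 1,
    0b1100: lambda a, b: a + b,
    0b1101: lambda a, b: a + b + 1,
    0b1110: lambda a, b: a + (~b & MASK),   # a plus the bitwise complement of b
    0b1111: lambda a, b: a - b,
    0b0000: lambda a, b: a & b,
    0b0010: lambda a, b: a ^ b,
    0b0100: lambda a, b: a | b,
}


def alu_result(op: int, a: int, b: int) -> int:
    """Return the 8-bit result for a control word listed in Table 1."""
    return ALU_OPS[op](a, b) & MASK


# In 2's complement, a - b is a + (complement of b) + 1; the two rows agree.
for a in range(256):
    for b in range(256):
        assert alu_result(0b1111, a, b) == (alu_result(0b1110, a, b) + 1) & MASK
```

Control words that do not appear in Table 1 raise a `KeyError` here. What your ALU does for them is a design decision the table leaves open.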

**Exercise 3.** Using your adder of the previous exercise, write the description of an 8-bit extension of the 4-bit ALU shown in Figure 4. The ALU should compute the following flags: C (carry), Z (zero), V (2's complement overflow), and N (negative). The main idea is to replace the cascade of full-adders in Figure 4 with your ripple carry adder. The equations to compute the different flags are as follows:

- $C \equiv s_8$ (that is, the carry out of the ripple carry adder)
- $Z \equiv (s == 0)$
- $V \equiv a_7 \cdot b_7 \cdot \overline{s_7} + \overline{a_7} \cdot \overline{b_7} \cdot s_7$
- $N \equiv s_7$
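As a sanity check on the flag equations, here is a minimal Python sketch for the addition case $a + b + c_{in}$. The function name `add_flags` and the dictionary layout are our own. The model computes the sum with ordinary integers and then applies the four equations literally.

```python
def add_flags(a: int, b: int, cin: int = 0) -> tuple[int, dict[str, int]]:
    """8-bit addition a + b + cin; returns (s, flags) with flags C, Z, V, N."""
    full = a + b + cin                  # 9-bit value: bit 8 is the carry out
    s = full & 0xFF
    a7, b7, s7 = (a >> 7) & 1, (b >> 7) & 1, (s >> 7) & 1
    flags = {
        "C": (full >> 8) & 1,           # C = s_8
        "Z": int(s == 0),               # Z = (s == 0)
        "V": (a7 & b7 & (s7 ^ 1)) | ((a7 ^ 1) & (b7 ^ 1) & s7),
        "N": s7,                        # N = s_7
    }
    return s, flags


# 0x7F + 0x01 leaves the signed range: s = 0x80 with V = 1, N = 1, C = 0, Z = 0.
s, flags = add_flags(0x7F, 0x01)
assert s == 0x80 and flags == {"C": 0, "Z": 0, "V": 1, "N": 1}
```

Comparing `V` against "the signed sum does not fit in the range -128 to 127" for all inputs is a useful cross-check, and the same comparison can be phrased as an assertion on your design.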

Note that you might need to describe some extra modules, e.g., a 2-input multiplexer mux2.

# 2 Proving correctness of the ALU

**Exercise 4.** Using Cadence IFV and SVA, specify the correctness of the ALU and prove that your design is correct.

# 3 Extension to a simple execution unit

You have all the basic blocks to build a complete processor! What you're missing is an execution unit that will fetch, decode, and execute instructions. You also need to add general purpose registers, an instruction register, a program counter, a status register, and some memory.

Before going further, Figure 5 shows a first extension of an ALU. This is a block diagram of the ALU together with:

- SREG: Status Register. This is a 4-bit register storing the value of the four flags of the ALU.
- IR: Instruction Register. This is a 16-bit register storing the current instruction. It has a control bit (IRie). This bit is such that IR is written only when this bit is high.
- GPR: General Purpose Register. This is a block containing the general purpose registers of the processor. We will start with a GPR module with only 4 registers.
- EX8: Execution Unit. This is the state-machine controlling the execution of the instructions.

The basic idea is that the execution unit reads the registers, the instruction, and the flags. Depending on those values, the unit sets the control bits so that the ALU outputs the correct result. Before going into more details, we first need to determine which instructions we will have and how they will be encoded.

### 3.1 Instruction Set Architecture

We will encode instructions over 16 bits. We will start with two formats: instructions working on two registers, and instructions working on one register and an 8-bit value. The format of a two-register instruction is:

The first 8 bits will encode the instruction. The remaining 8 bits will encode the two registers used by the instruction, namely, $R_d$ and $R_r$. The format of a one-register instruction is:

op op op op kkkk dddd kkkk



Figure 5: First extension of the ALU.

The first 4 bits will encode the instruction. Bits dddd encode the register used by the instruction. The remaining bits encode the 8-bit value. This way,  $R_d$  is always encoded by the same bits in all instructions.
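To make the bit layout concrete, the field extraction can be sketched in Python. Two things here are assumptions on our side: the leftmost bit of the format is taken to be bit 15, and, for the two-register format, $R_d$ is taken to sit in bits 7..4 (the same place as in the one-register format) with $R_r$ in bits 3..0. The function names are hypothetical.

```python
def decode_immediate(instr: int) -> tuple[int, int, int]:
    """Split a 16-bit word laid out as 'op op op op kkkk dddd kkkk'.

    Returns (opcode, d, k). Assumes the leftmost bit is bit 15 and the
    first kkkk group is the upper nibble of the 8-bit value.
    """
    opcode = (instr >> 12) & 0xF    # bits 15..12
    k_high = (instr >> 8) & 0xF     # bits 11..8
    d = (instr >> 4) & 0xF          # bits 7..4
    k_low = instr & 0xF             # bits 3..0
    return opcode, d, (k_high << 4) | k_low


def decode_two_register(instr: int) -> tuple[int, int, int]:
    """Split a 16-bit word into an 8-bit opcode and two 4-bit register fields.

    Returns (opcode, d, r). Assumption: R_d is in bits 7..4, R_r in bits 3..0.
    """
    return (instr >> 8) & 0xFF, (instr >> 4) & 0xF, instr & 0xF


# The word 1110 1010 0000 0101: opcode 1110, register field 0, value 0xA5.
assert decode_immediate(0b1110_1010_0000_0101) == (0b1110, 0, 0xA5)
```

Because the register field $R_d$ is read from the same bit positions in both functions, the decoder does not need to know the format before selecting the destination register.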

An Instruction Set Architecture (ISA) specifies all the operations of a processor. As an example, we will consider the AVR8 ISA. The link below points to the document. You should read only the relevant parts of it. We will only look at some of the instructions to get some inspiration.

http://www.atmel.com/images/atmel-0856-avr-instruction-set-manual.pdf

The AVR8 is also an 8-bit architecture. We will use the opcodes of some of the AVR instructions as a specification for our processor. This means that the AVR ISA specifies, for each opcode, what the expected outputs are.

**Note.** The main deviation between the AVR and our simple processor will be the number of cycles per instruction. Our processor will not execute as fast as the AVR. We also have only 4 flags, whereas the AVR has 8. The flags we have are C, Z, V, and N.

Already with our simple ALU, we can implement quite a number of instructions. We will start by only considering the following instructions:

#### ADC: Add with carry

See page 16 out of 160 of the AVR ISA. This instruction executes an addition with the carry flag. It updates all four flags. We do not have a program counter yet, but if we had one, it would be incremented by 1.

#### ADD: Add without carry

See page 17 out of 160. This instruction simply adds two registers.

#### AND: Logical AND

See page 19 out of 160 of the AVR ISA. This instruction computes the logical conjunction between two registers. Note that it clears the 2's complement overflow flag and leaves the carry bit unchanged.

#### ANDI: Logical AND with immediate

See page 20 out of 160 of the AVR ISA. This instruction computes the logical conjunction between a register and an 8-bit value. Here also, the 2's complement overflow flag is cleared.

#### LDI: Load Immediate

See page 94 out of 160 of the AVR ISA. This instruction loads an 8-bit value into a register. Note that it does not modify the flags.
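The intended effect of these five instructions on the registers and on our four flags can be summarised in a small behavioural model. This is a Python sketch under our own assumptions: the function name `execute`, the list/dictionary state, and the restriction to the C, Z, V, N flags are ours, and the AVR manual remains the reference for the actual semantics.

```python
def execute(mnemonic: str, regs: list[int], flags: dict[str, int],
            d: int, r: int = 0, k: int = 0) -> None:
    """Apply one instruction to regs (8-bit values) and flags, in place."""
    if mnemonic == "LDI":                       # load immediate, flags untouched
        regs[d] = k & 0xFF
        return
    a = regs[d]
    if mnemonic in ("ADD", "ADC"):
        b = regs[r]
        cin = flags["C"] if mnemonic == "ADC" else 0
        full = a + b + cin
        s = full & 0xFF
        a7, b7, s7 = a >> 7, b >> 7, s >> 7
        flags["C"] = full >> 8                  # carry out of bit 7
        flags["V"] = (a7 & b7 & (s7 ^ 1)) | ((a7 ^ 1) & (b7 ^ 1) & s7)
    elif mnemonic in ("AND", "ANDI"):
        b = regs[r] if mnemonic == "AND" else k & 0xFF
        s = a & b
        flags["V"] = 0                          # V is cleared, C is left unchanged
    else:
        raise ValueError(f"unsupported instruction: {mnemonic}")
    flags["Z"] = int(s == 0)
    flags["N"] = s >> 7
    regs[d] = s


regs, flags = [0, 0, 0, 0], {"C": 0, "Z": 0, "V": 0, "N": 0}
execute("LDI", regs, flags, d=0, k=0xF0)
execute("LDI", regs, flags, d=1, k=0x20)
execute("ADD", regs, flags, d=0, r=1)           # 0xF0 + 0x20 wraps to 0x10
assert regs[0] == 0x10 and flags["C"] == 1
```

A model like this gives you, for each instruction, the expected register and flag updates that the control signals of the execution unit must produce.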

### 3.2 Adding decode and execute stages

Figure 5 adds enough modules to be able to decode and execute the instruction stored in the IR register. It still needs memory modules to also implement instruction fetch. To keep things simple, we will treat the instruction as an input.

On the course webpage, you will find a template implementation of Figure 5 in a Verilog module named core8. Note that this file does not contain the alu8 module. You should use your own ALU. The remaining modules of core8 are fully implemented, except for the execution unit (ex8). The specification of each module is as follows:

- IR: Inputs are the top level input instr of the core8 module and an input enable signal (IRie). Its output is the 16-bit value of the instruction. This module is an instantiation of module spr16.
- GPR: Signal REGS\$ra specifies the register (from 0 to 3) driving REGS\$abus. The same holds for REGS\$rb and REGS\$bbus. When REGS\$aoe is high (resp. REGS\$boe), the output of REGS\$abus (resp. REGS\$bbus) is not driven (high impedance, value 'Z'). Finally, the register pointed to by REGS\$ra is updated with the value on cbus only when REGS\$sel is high. GPR is an instantiation of module gpr8.
- SREG is a 4-bit register storing the value of the flags. It is a register equipped with a clear vector to allow clearing any flag. SREG is an instantiation of module sreg4.
- ALU8: the 8-bit ALU you designed and proved correct.
- EX8: the execution unit. Currently, the unit goes through 2 states:
  - fetch (coded as 2'b00)
  - decode & execute (coded as 2'b01)

And that's it! All control signals are assigned a dummy value.
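Of the modules listed above, the GPR has the most control inputs, so a behavioural sketch may help when deciding which signals the execution unit must set. The Python model below follows the textual description only. The class name `Gpr8`, the method names, and the assumption that the write happens on a clock edge are ours. The template's gpr8 module remains the reference.

```python
class Gpr8:
    """Behavioural sketch of the 4-register GPR block (not the template code)."""

    def __init__(self) -> None:
        self.regs = [0, 0, 0, 0]

    def read(self, ra: int, rb: int, aoe: int, boe: int):
        """Return (abus, bbus); a bus reads 'Z' when its aoe/boe input is high."""
        abus = "Z" if aoe else self.regs[ra]
        bbus = "Z" if boe else self.regs[rb]
        return abus, bbus

    def clock(self, ra: int, sel: int, cbus: int) -> None:
        """Assumed clock edge: write cbus into register ra only when sel is high."""
        if sel:
            self.regs[ra] = cbus & 0xFF


gpr = Gpr8()
gpr.clock(ra=2, sel=1, cbus=0x5A)       # sel high: register 2 is updated
gpr.clock(ra=2, sel=0, cbus=0xFF)       # sel low: register 2 keeps its value
assert gpr.read(ra=2, rb=0, aoe=0, boe=0) == (0x5A, 0)
```

Note that the written register is selected by the same `ra` input that selects the register read on the a bus, which matches an instruction format where $R_d$ is both a source and the destination.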

You should perform the two exercises below in lock-step, that is, add one instruction, prove it correct, then add another instruction, prove it correct, etc.

**Exercise 5.** Complete the design of module ex8 to decode and execute the 5 instructions mentioned in the previous section. In the template, search for FIXME.

**Exercise 6.** Prove that your execution unit sets all the control signals correctly. Note that you should prove module ex8 correct, not its instantiation in module core8.