#### **Computer-Aided VLSI System Design**

# **Homework 2: Simple MIPS CPU**

Graduate Institute of Electronics Engineering, National Taiwan University



### Goal

- In this homework, you will learn
  - How to write a testbench
  - How to design an FSM
  - How to use IP
  - How to generate patterns for testing



#### Introduction



The central processing unit (CPU) is one of the most important cores in a computer system. In this homework, you are asked to design a simple CPU that consists of a program counter, an ALU, and a register file. The instruction set of the simple CPU is similar to the MIPS ISA (Instruction Set Architecture).

#### Instruction set



## **Block Diagram**





# Input/Output



| Signal Name    | I/O | Width | Simple Description                                                                |
|----------------|-----|-------|-----------------------------------------------------------------------------------|
| i_clk          | I   | 1     | Clock signal in the system.                                                       |
| i_rst_n        | I   | 1     | Active low asynchronous reset.                                                    |
| o_i_addr       | O   | 32    | Address from program counter (PC)                                                 |
| i_i_inst       | I   | 32    | Instruction from instruction memory                                               |
| o_d_wen        | O   | 1     | Write enable of data memory. Set low for reading mode and high for writing mode.  |
| o_d_addr       | O   | 32    | Address for data memory                                                           |
| o_d_wdata      | O   | 32    | Data input to data memory                                                         |
| i_d_rdata      | I   | 32    | Data output from data memory                                                      |
| o_status       | O   | 2     | Status of the core after processing each instruction                              |
| o_status_valid | O   | 1     | Set high if ready to output status                                                |

# **Specification (1)**



- All outputs should be synchronized to the rising edge of the clock.
- You should set all your outputs and the register file to zero when i\_rst\_n is low. An active low asynchronous reset is used.
- Instruction memory and data memory are provided. All values in memory are reset to zero.
- You should create 32 registers (each register is 32-bit) in the register file.
- After outputting o\_i\_addr to instruction memory, the core can receive the corresponding i\_i\_inst at the next rising edge of the clock.
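
The reset requirement for the register file can be sketched as follows. This is a minimal illustration, not the required implementation: the module name regfile\_sketch and the ports i\_wen, i\_waddr, i\_wdata, i\_raddr1, i\_raddr2, o\_rdata1, and o\_rdata2 are hypothetical. The sketch does not treat register \$0 specially, because the specification does not state that it is hard-wired to zero.

```verilog
// Hypothetical 32 x 32-bit register file (names are assumptions, not part of core.v)
module regfile_sketch (
    input         i_clk,
    input         i_rst_n,   // active low asynchronous reset
    input         i_wen,     // assumed write enable from the control FSM
    input  [4:0]  i_waddr,
    input  [31:0] i_wdata,
    input  [4:0]  i_raddr1,
    input  [4:0]  i_raddr2,
    output [31:0] o_rdata1,
    output [31:0] o_rdata2
);
    reg [31:0] regs [0:31];
    integer i;

    // Clear every register on reset; otherwise write on the rising clock edge
    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            for (i = 0; i < 32; i = i + 1)
                regs[i] <= 32'd0;
        end else if (i_wen) begin
            regs[i_waddr] <= i_wdata;
        end
    end

    // Combinational read ports
    assign o_rdata1 = regs[i_raddr1];
    assign o_rdata2 = regs[i_raddr2];
endmodule
```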

# **Specification (2)**



- To load data from the data memory, set o\_d\_wen to 0 and o\_d\_addr to the corresponding address. i\_d\_rdata can be received at the next rising edge of the clock.
- To save data to the data memory, set o\_d\_wen to 1, o\_d\_addr to the corresponding address, and o\_d\_wdata to the data to be written. At the next rising edge of the clock, the data is written to memory.
- Your o\_status\_valid should be high for exactly one cycle for each o\_status.
- The testbench samples your outputs at the negative clock edge and checks o\_status whenever o\_status\_valid is **high**.
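
The sampling rule in the last bullet can be sketched on the testbench side as follows. The module name status\_monitor\_sketch and its port names are hypothetical, and the provided testbed.v may implement the check differently.

```verilog
// Hypothetical monitor: samples o_status at the negative clock edge
module status_monitor_sketch (
    input       i_clk,
    input [1:0] i_status,        // connect to o_status of the core
    input       i_status_valid   // connect to o_status_valid of the core
);
    always @(negedge i_clk) begin
        // Only look at the status while the valid flag is high
        if (i_status_valid)
            $display("status = %0d at time %0t", i_status, $time);
    end
endmodule
```

Because the core updates its outputs at the rising edge and the monitor samples at the falling edge, a one-cycle valid pulse is observed exactly once.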

# Specification (3)



- When you set o\_status\_valid to high and o\_status to 3, stop processing. The testbed will then check the data in data memory against the golden data.
- If overflow happens, stop processing, raise o\_status\_valid to high, and set o\_status to 2. The testbed will then check the data in data memory against the golden data.
- Fewer than 1024 instructions are provided for each pattern.
- The whole processing time must not exceed 120000 cycles.

## **Program Counter**



The program counter controls the address sent to instruction memory.

\$pc = \$pc + 4 for every instruction (except for taken beq and bne branches)



Instruction Memory

| Addr. | Instruction     |
|-------|-----------------|
| 0     | addi \$1 \$0 20 |
| 4     | addi \$2 \$0 12 |
| ...   | ...             |
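
The next-PC rule can be sketched as combinational logic. This is an illustration under stated assumptions: the module and port names are hypothetical, the opcodes 6'd11 (beq) and 6'd12 (bne) are taken from the Instruction table, and im is zero-extended because the table marks the branches as unsigned operations.

```verilog
// Hypothetical next-PC logic (module and port names are assumptions)
module next_pc_sketch (
    input  [31:0] i_pc,
    input  [5:0]  i_opcode,
    input  [15:0] i_im,
    input  [31:0] i_s1,      // value of register $s1
    input  [31:0] i_s2,      // value of register $s2
    output [31:0] o_next_pc
);
    localparam [5:0] OP_BEQ = 6'd11;
    localparam [5:0] OP_BNE = 6'd12;

    wire [31:0] pc_plus4 = i_pc + 32'd4;

    // A branch is taken only when its comparison holds
    wire taken = ((i_opcode == OP_BEQ) && (i_s1 == i_s2)) ||
                 ((i_opcode == OP_BNE) && (i_s1 != i_s2));

    // Assumption: im is zero-extended for the PC-relative offset
    assign o_next_pc = taken ? (pc_plus4 + {16'd0, i_im}) : pc_plus4;
endmodule
```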

## Instruction mapping



#### R-type

| [31:26] | [25:21] | [20:16] | [15:11] | [10:0]   |
|---------|---------|---------|---------|----------|
| opcode  | \$s2    | \$s3    | \$s1    | Not used |

#### I-type

| [31:26] | [25:21] | [20:16] | [15:0] |
|---------|---------|---------|--------|
| opcode  | \$s2    | \$s1    | im     |

#### EOF

| [31:26] | [25:0]   |
|---------|----------|
| opcode  | Not used |
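
The three formats above share the same opcode position, so the fields can be extracted with plain bit slices. The following is a minimal sketch; the module name and output names are hypothetical, and selecting between the R-type and I-type interpretation of bits [20:16] is left to the decoder.

```verilog
// Hypothetical field extraction (names are assumptions)
module inst_fields_sketch (
    input  [31:0] i_inst,
    output [5:0]  o_opcode,  // [31:26], all formats
    output [4:0]  o_s2,      // [25:21], $s2 in R-type and I-type
    output [4:0]  o_r_s3,    // [20:16], $s3 in R-type
    output [4:0]  o_r_s1,    // [15:11], $s1 in R-type
    output [4:0]  o_i_s1,    // [20:16], $s1 in I-type
    output [15:0] o_im       // [15:0],  im in I-type
);
    assign o_opcode = i_inst[31:26];
    assign o_s2     = i_inst[25:21];
    assign o_r_s3   = i_inst[20:16];
    assign o_r_s1   = i_inst[15:11];
    assign o_i_s1   = i_inst[20:16];
    assign o_im     = i_inst[15:0];
endmodule
```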

#### Instruction



| Operation            | Assembly | Opcode | Type | Meaning                                                        | Note                              |
|----------------------|----------|--------|------|----------------------------------------------------------------|-----------------------------------|
| Add                  | add      | 6'd1   | R    | \$s1 = \$s2 + \$s3                                             | Signed operation                  |
| Subtract             | sub      | 6'd2   | R    | \$s1 = \$s2 - \$s3                                             | Signed operation                  |
| Add unsigned         | addu     | 6'd3   | R    | \$s1 = \$s2 + \$s3                                             | Unsigned operation                |
| Subtract unsigned    | subu     | 6'd4   | R    | \$s1 = \$s2 - \$s3                                             | Unsigned operation                |
| Add immediate        | addi     | 6'd5   | I    | \$s1 = \$s2 + im                                               | Signed operation                  |
| Load word            | lw       | 6'd6   | I    | \$s1 = Mem[\$s2 + im]                                          | Unsigned operation                |
| Store word           | sw       | 6'd7   | I    | Mem[\$s2 + im] = \$s1                                          | Unsigned operation                |
| AND                  | and      | 6'd8   | R    | \$s1 = \$s2 & \$s3                                             | Bit-wise                          |
| OR                   | or       | 6'd9   | R    | \$s1 = \$s2 \| \$s3                                            | Bit-wise                          |
| NOR                  | nor      | 6'd10  | R    | \$s1 = ~(\$s2 \| \$s3)                                         | Bit-wise                          |
| Branch on equal      | beq      | 6'd11  | I    | if(\$s1==\$s2), \$pc = \$pc + 4 + im;<br>else, \$pc = \$pc + 4 | PC-relative<br>Unsigned operation |
| Branch on not equal  | bne      | 6'd12  | I    | if(\$s1!=\$s2), \$pc = \$pc + 4 + im;<br>else, \$pc = \$pc + 4 | PC-relative<br>Unsigned operation |
| Set on less than     | slt      | 6'd13  | R    | if(\$s2<\$s3), \$s1 = 1;<br>else, \$s1 = 0                     | Signed operation                  |
| End of File          | eof      | 6'd14  | EOF  | Stop processing                                                | Last instruction in the pattern   |

Note: Use two's complement arithmetic for signed operations.
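
In Verilog, vectors are unsigned by default, so signed operations need explicit handling. The sketch below shows a signed comparison for slt and a sign-extended immediate for addi. The module and port names are hypothetical, and sign-extending im for addi is an assumption based on the "Signed operation" note in the table.

```verilog
// Hypothetical helpers for signed operations (names are assumptions)
module signed_ops_sketch (
    input  [31:0] i_s2,
    input  [31:0] i_s3,
    input  [15:0] i_im,
    output [31:0] o_slt,     // result of slt
    output [31:0] o_im_ext   // sign-extended immediate for addi
);
    // $signed makes the comparison use two's complement ordering
    assign o_slt = ($signed(i_s2) < $signed(i_s3)) ? 32'd1 : 32'd0;

    // Replicate the sign bit of im into the upper 16 bits
    assign o_im_ext = {{16{i_im[15]}}, i_im};
endmodule
```

Without \$signed, a value such as 32'hFFFFFFFF would compare as a large positive number instead of -1.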

## **Memory IP**



- Instruction memory
  - Size: 1024 $\times$ 32 bit
  - i\_addr[11:2] for address mapping in instruction memory
- Data memory
  - Size: 64 $\times$ 32 bit
  - i\_addr[7:2] for address mapping in data memory

```
module data_mem (
    input             i_clk,    // 1-bit
    input             i_rst_n,  // 1-bit
    input             i_wen,    // 1-bit
    input  [ 31 : 0 ] i_addr,   // 32-bit
    input  [ 31 : 0 ] i_wdata,  // 32-bit
    output [ 31 : 0 ] o_rdata   // 32-bit
);
```

### **Status**

4 types of o\_status

| o_status[1:0] | Definition     |
|---------------|----------------|
| 2'd0          | R_TYPE_SUCCESS |
| 2'd1          | I_TYPE_SUCCESS |
| 2'd2          | MIPS_OVERFLOW  |
| 2'd3          | MIPS_END       |

### **Overflow**



- Overflow may happen in two situations
  - Situation 1: Overflow happens in an arithmetic instruction (add, sub, addu, subu, addi)
  - Situation 2: The address for data/instruction memory is outside the memory size (you need not consider the case where the instruction address is beyond eof but still maps inside the instruction memory)
- Once an overflow happens, the testbed stops and checks the data memory
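
Both situations can be detected combinationally. The sketch below is one possible reading, not the reference behavior: it assumes that overflow for addu/subu means a carry or borrow out of bit 31, that signed overflow follows the usual two's complement rule, and that a data address is out of range when it is 256 or above (64 words of 4 bytes). All module and port names are hypothetical.

```verilog
// Hypothetical overflow detection (names and unsigned-overflow rule are assumptions)
module overflow_sketch (
    input  [31:0] i_a,          // first operand ($s2)
    input  [31:0] i_b,          // second operand ($s3 or extended im)
    input         i_is_sub,     // 1 for sub/subu, 0 for add/addu/addi
    input         i_is_signed,  // 1 for add/sub/addi, 0 for addu/subu
    input  [31:0] i_d_addr,     // byte address intended for data memory
    output [31:0] o_result,
    output        o_arith_ovf,
    output        o_addr_ovf
);
    // 33-bit arithmetic keeps the carry (add) or borrow (sub) in bit 32
    wire [32:0] sum  = {1'b0, i_a} + {1'b0, i_b};
    wire [32:0] diff = {1'b0, i_a} - {1'b0, i_b};
    wire [32:0] full = i_is_sub ? diff : sum;

    assign o_result = full[31:0];

    // Signed add overflows when equal-sign operands give a different-sign result;
    // signed sub overflows when different-sign operands flip the sign of i_a
    wire signed_ovf = i_is_sub
        ? ((i_a[31] != i_b[31]) && (o_result[31] != i_a[31]))
        : ((i_a[31] == i_b[31]) && (o_result[31] != i_a[31]));

    wire unsigned_ovf = full[32];

    assign o_arith_ovf = i_is_signed ? signed_ovf : unsigned_ovf;

    // Data memory holds 64 words, so valid byte addresses fit in bits [7:0]
    assign o_addr_ovf = |i_d_addr[31:8];
endmodule
```

The same range test applies to instruction memory with bits [31:12], since 1024 words occupy byte addresses 0 to 4095.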



Timing diagrams (figures not reproduced here):

- Status check
- Read instruction from instruction memory
- Load data from data memory
- Save data to data memory



#### core.v



```
// Don't modify interface
module core #(
    parameter ADDR_W = 32,
    parameter INST_W = 32,
    parameter DATA_W = 32
) (
    input                   i_clk,
    input                   i_rst_n,
    output [ ADDR_W-1 : 0 ] o_i_addr,
    input  [ INST_W-1 : 0 ] i_i_inst,
    output                  o_d_wen,
    output [ ADDR_W-1 : 0 ] o_d_addr,
    output [ DATA_W-1 : 0 ] o_d_wdata,
    input  [ DATA_W-1 : 0 ] i_d_rdata,
    output [        1 : 0 ] o_status,
    output                  o_status_valid
);
```

#### rtl.f



#### Filelist

```
// Simulation: HW2 simple mips CPU
// define files
../00_TESTBED/define.v
// testbench
../00_TESTBED/testbed.v
../00_TESTBED/inst_mem.vp
../00_TESTBED/data_mem.vp
// design files
./core.v
```

### **Command**



01\_run

ncverilog -f rtl.f +define+p0 +access+rw

99\_clean\_up

rm -rf INCA\_libs/ ncverilog.\* novas\*

## define.v



```
// opcode definition
`define OP_ADD  1
`define OP_SUB  2
`define OP_ADDI 3
`define OP_LW   4
`define OP_SW   5
`define OP_AND  6
`define OP_OR   7
`define OP_NOR  8
`define OP_BEQ  9
`define OP_BNE  10
`define OP_SLT  11
`define OP_EOF  12

// MIPS status definition
`define R_TYPE_SUCCESS 0
`define I_TYPE_SUCCESS 1
`define MIPS_OVERFLOW  2
`define MIPS_END       3
```

## testbed\_temp.v

- Things to add in your testbench
  - Clock
  - Reset
  - Waveform file
  - Function test
  - ...
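
The clock, reset, and waveform items can be sketched as below. This is a standalone illustration with assumed values: the module name testbed\_sketch, the 10 ns period, the reset timing, and the file name core.vcd are all hypothetical. It uses the standard \$dumpfile/\$dumpvars tasks; your simulator setup may use a different waveform format.

```verilog
`timescale 1ns/100ps
// Hypothetical testbench skeleton (period, timing, and names are assumptions)
module testbed_sketch;
    localparam CYCLE = 10.0;  // assumed clock period in ns

    reg clk;
    reg rst_n;

    // Clock: toggle every half period
    initial clk = 1'b0;
    always #(CYCLE/2.0) clk = ~clk;

    // Reset: pulse low shortly after start, then release
    initial begin
        rst_n = 1'b1;
        #(CYCLE*0.25) rst_n = 1'b0;
        #(CYCLE*2.0)  rst_n = 1'b1;
    end

    // Waveform file
    initial begin
        $dumpfile("core.vcd");
        $dumpvars(0, testbed_sketch);
    end

    // Stop the simulation after a fixed time so the sketch terminates
    initial begin
        #(CYCLE*20.0) $finish;
    end
endmodule
```

In a full testbench, clk and rst\_n would drive the i\_clk and i\_rst\_n ports of the core and both memories, and the fixed stop time would be replaced by the status check and the cycle limit.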



```
core u_core (
    .i_clk(),
    .i_rst_n(),
    .o_i_addr(),
    .i_i_inst(),
    .o_d_wen(),
    .o_d_addr(),
    .o_d_wdata(),
    .i_d_rdata(),
    .o_status(),
    .o_status_valid()
);
inst_mem u_inst_mem (
    .i_clk(),
    .i_rst_n(),
    .i_addr(),
    .o_inst()
);
data_mem u_data_mem (
    .i_clk(),
    .i_rst_n(),
    .i_wen(),
    .i_addr(),
    .i_wdata(),
    .o_rdata()
);
```

#### **Protected Files**



- The following files are protected
  - inst\_mem.vp
  - data\_mem.vp

```
module inst_mem (
    input             i_clk,
    input             i_rst_n,
    input  [ 31 : 0 ] i_addr,
    output [ 31 : 0 ] o_inst
 protected
Ndi5kSQH5DT^<D9i:i7T7ceFn3@o:C2]Ke:L;dfq^QGQOG?3K:ogIe8]1ge<gcg3
1CH3E]ekmLN<RVkKa1o39E7E21a; hJRSFMUb2pAgL?TeZdH>]^RK;KWYU@>G2G6
H[IMYG;D<[Z>[;0]] NbPoEAQM<_ZfDbp1HN@HmqSOQ<5[53C:9UD4^:Y44]9a^e
PDH[cdHb;HPi\R4k7mAlPdY8ZpI=4?nNZgQ2I>QUg[agM4j@cTl]hnMoC<i1F9DR
[kf;]ULlecpF`H;9L2DeZa>@LdfLgfB8l4bWgT:_P3?ENhifQW@_Ne;gMZE9@f0A
OERY:F4d68KqAIn]N1dj4LN7 8:Uigk?9UJ9JYQM4l=Lq\TEXDQO1>Zo^SJq=Cge
?kp68am:9p81Q1[<jSXm?;GhoPHHYKp\Q][2epXn_18k8LA5g=N7=D?=VOX<Ham8
[A:Qc;RlpO38>d9 Qk9cfk?:5hXP>LT3n=DP08A ]WPa6nA3cYZjGl32qB9]I4kp
>=:4m9P`dCB8@?ip`@VR7AahIggjNR:M1: \KXElBFOm<Bb@ZS[^W7EheJ18mX8;
?7F`Pg\CCA8igfFUoWY@k>Yq=U3 4>E50 nJ\`aUGcfWD 89dab]cUQfF<?2P?OG
qWglWC[\iqnjC<OipHHnb<T4Sg<:UORVSVocI g?<a@o <PQ493cZIE;7^Sp1AQ
G<cl7[]R\>VT]]LA\7?Uk=]\bG19MT9N;K<Y92[iKOged92EIkQZliW>qlG]QI?5
ST06RFN<KJl@VM1EWKSmB1B5U:BaX`E7of7mqOJBgO`9k$
 endprotected
endmodule
```

#### **PATTERN**



Files in PATTERN are for your reference.

#### inst\_assemble.dat

| R-type | \$s2 | \$s3 | \$s1 |
|--------|------|------|------|
| I-type | \$s2 | \$s1 | im   |
|        |      |      |      |
| and    | \$1  | \$3  | \$1  |
| lw     | \$3  | \$1  | 8    |
| bne    | \$2  | \$0  | 8    |
| add    | \$7  | \$1  | \$4  |
| slt    | \$6  | \$5  | \$4  |
| slt    | \$4  | \$1  | \$1  |
| lw     | \$1  | \$3  | 12   |
| lw     | \$7  | \$7  | 4    |
| bne    | \$6  | \$7  | 8    |
| lw     | \$6  | \$5  | 8    |
| lw     | \$5  | \$2  | 8    |

## **Error Messages**



Wrong status

```
Pattern: ../00_TESTBED/PATTERN/p0/inst.dat
MIPS Status Error! Status[ 2]: Golden = 01, Yours = 11
```

Wrong data

```
Pattern: ../00_TESTBED/PATTERN/p0/inst.dat

Error! Data[ 0]: Golden = 00000002, Yours = 00000000
Total error: 1
```

# **Grading Policy**



The TA will run your code with the following command:

ncverilog -f rtl.f +define+p0 +access+rw

- Pass the patterns to get full score
  - Provided pattern: 80%
    - 40% for each test (data from data memory: 20%, status check: 20%)
  - Hidden pattern: 20% (20 patterns in total)
    - 1% for each test (data & status both correct)
- Late submission
  - Within one day: (original score)\*0.6
  - Within two days: (original score)\*0.3
  - More than two days: 0 points for this homework
- You will lose 3 points for any violation of the naming rules or the submission format

## **Submission**

- Create a folder named studentID\_hw2 and put all of the following files into it
  - rtl.f (your file list)
  - core.v
  - all other design files in your file list (optional)
- Compress the folder studentID\_hw2 into a tar file named studentID\_hw2\_vk.tar (k is the version number, k = 1, 2, ...)

### Hint

- Design your FSM with the following states
  - 1. Idle
  - 2. Instruction fetching
  - 3. Instruction decoding
  - 4. ALU computing/Load data
  - 5. Data write-back
  - 6. Next PC generation
  - 7. Process end
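
The seven states above can be sketched as a two-process FSM. This is only one possible arrangement: the module name, the state encoding, and the inputs i\_is\_eof and i\_overflow are hypothetical, and a real design may need extra conditions or wait cycles for memory access.

```verilog
// Hypothetical control FSM (encoding and input flags are assumptions)
module fsm_sketch (
    input            i_clk,
    input            i_rst_n,
    input            i_is_eof,    // assumed flag: decoded instruction is eof
    input            i_overflow,  // assumed flag: overflow detected
    output reg [2:0] o_state
);
    localparam [2:0] S_IDLE    = 3'd0,
                     S_FETCH   = 3'd1,
                     S_DECODE  = 3'd2,
                     S_EXEC    = 3'd3,  // ALU computing / load data
                     S_WB      = 3'd4,  // data write-back
                     S_NEXT_PC = 3'd5,
                     S_END     = 3'd6;

    reg [2:0] next_state;

    // Next-state logic
    always @(*) begin
        case (o_state)
            S_IDLE:    next_state = S_FETCH;
            S_FETCH:   next_state = S_DECODE;
            S_DECODE:  next_state = i_is_eof   ? S_END : S_EXEC;
            S_EXEC:    next_state = i_overflow ? S_END : S_WB;
            S_WB:      next_state = S_NEXT_PC;
            S_NEXT_PC: next_state = S_FETCH;
            S_END:     next_state = S_END;   // stay here once processing stops
            default:   next_state = S_IDLE;
        endcase
    end

    // State register with active low asynchronous reset
    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n)
            o_state <= S_IDLE;
        else
            o_state <= next_state;
    end
endmodule
```

Separating the next-state logic from the state register keeps the reset behavior in one place and makes it easier to add conditions to individual transitions later.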