### **Computer-Aided VLSI System Design**

## **Homework 2: Simple MIPS CPU**

Graduate Institute of Electronics Engineering, National Taiwan University



#### Goal

- In this homework, you will learn
  - How to write a testbench
  - How to design an FSM
  - How to use IP
  - How to generate patterns for testing

#### Introduction



The central processing unit (CPU) is the core of a computer system. In this homework, you are asked to design a simple MIPS CPU that contains the basic modules: a program counter, an ALU, and a register file. The instruction set of this simple CPU is similar to that of MIPS.

#### Instruction set



## **Block Diagram**





# Input/Output



| Signal Name    | I/O | Width | Simple Description                                                                |
|----------------|-----|-------|-----------------------------------------------------------------------------------|
| i_clk          | I   | 1     | Clock signal in the system.                                                       |
| i_rst_n        | I   | 1     | Active low asynchronous reset.                                                    |
| o_i_addr       | O   | 32    | Address from program counter (PC)                                                 |
| i_i_inst       | I   | 32    | Instruction from instruction memory                                               |
| o_d_we         | O   | 1     | Write enable of data memory. Set low for reading mode and high for writing mode.  |
| o_d_addr       | O   | 32    | Address for data memory                                                           |
| o_d_wdata      | O   | 32    | Unsigned data input to data memory                                                |
| i_d_rdata      | I   | 32    | Unsigned data output from data memory                                             |
| o_status       | O   | 2     | Status of the core after processing each instruction                              |
| o_status_valid | O   | 1     | Set high if ready to output status                                                |

# **Specification (1)**



- All outputs should be synchronized to the rising edge of the clock.
- Set all your outputs and the register file to zero when i\_rst\_n is low. An active low asynchronous reset is used.
- Instruction memory and data memory are provided. All values in memory are reset to zero.
- Create 32 unsigned 32-bit registers in the register file.
- After outputting o\_i\_addr to the instruction memory, the core can receive the corresponding i\_i\_inst at the next rising edge of the clock.
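The reset and register file requirements above can be sketched as follows. This is only an illustration: the module name reg\_file and all of its port names are hypothetical, and the specification does not say whether register 0 must stay at zero, so no such rule is applied here.

```verilog
// Hypothetical 32 x 32-bit register file with active low asynchronous reset.
module reg_file (
    input         i_clk,
    input         i_rst_n,
    input         i_we,        // write enable (assumed control signal)
    input  [ 4:0] i_waddr,
    input  [31:0] i_wdata,
    input  [ 4:0] i_raddr_a,
    input  [ 4:0] i_raddr_b,
    output [31:0] o_rdata_a,
    output [31:0] o_rdata_b
);
    reg [31:0] regs [0:31];
    integer i;

    // All registers are cleared while i_rst_n is low.
    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            for (i = 0; i < 32; i = i + 1)
                regs[i] <= 32'd0;
        end else if (i_we) begin
            regs[i_waddr] <= i_wdata;
        end
    end

    // Combinational read ports.
    assign o_rdata_a = regs[i_raddr_a];
    assign o_rdata_b = regs[i_raddr_b];
endmodule
```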

# **Specification (2)**



- To load data from the data memory, set o\_d\_we to 0 and o\_d\_addr to the corresponding address. i\_d\_rdata can be received at the next rising edge of the clock.
- To save data to the data memory, set o\_d\_we to 1, o\_d\_addr to the corresponding address, and o\_d\_wdata to the data to be written.
- o\_status\_valid should be high for exactly one cycle for every o\_status.
- The testbench samples your outputs at the negative clock edge and checks o\_status whenever o\_status\_valid is high.

# Specification (3)



- When you set o\_status\_valid to high and o\_status to 3, stop processing. The testbench will check your data memory values against the golden data.
- If an overflow occurs, stop processing, raise o\_status\_valid to high, and set o\_status to 2. The testbench will check your data memory values against the golden data.
- Fewer than 1024 instructions are provided for each pattern.
- The whole processing time must not exceed 120000 cycles.

## **Program Counter**



- The program counter controls the address sent to the instruction memory.



Instruction memory:

| Addr. | Instruction     |
|-------|-----------------|
| 0     | addi \$1 \$0 20 |
| 4     | addi \$2 \$0 12 |
| ⋮     | ⋮               |
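A program counter that steps through word-aligned addresses (0, 4, ...) might look like the sketch below. The module name pc\_reg and the control inputs i\_update, i\_branch\_taken and i\_offset are assumptions made for illustration; i\_offset is taken to be an already sign-extended 32-bit branch offset.

```verilog
// Hypothetical PC register: +4 per instruction, or +offset on a taken branch.
module pc_reg (
    input             i_clk,
    input             i_rst_n,
    input             i_update,        // assumed: pulse when the PC may advance
    input             i_branch_taken,  // assumed: branch condition is true
    input      [31:0] i_offset,        // assumed: sign-extended immediate
    output reg [31:0] o_pc
);
    wire [31:0] pc_nxt = i_branch_taken ? (o_pc + i_offset) : (o_pc + 32'd4);

    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n)
            o_pc <= 32'd0;          // outputs are zero during reset
        else if (i_update)
            o_pc <= pc_nxt;
    end
endmodule
```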

## **Instruction mapping**



#### R-type

| [31:26] | [25:21] | [20:16] | [15:11] | [10:0]   |
|---------|---------|---------|---------|----------|
| opcode  | \$s2    | \$s3    | \$s1    | Not used |

#### I-type

| [31:26] | [25:21] | [20:16] | [15:0] |
|---------|---------|---------|--------|
| opcode  | \$s2    | \$s1    | im     |

#### EOF

| [31:26] | [25:0]   |
|---------|----------|
| opcode  | Not used |
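The field layouts above map directly onto bit slices of the 32-bit instruction. A minimal sketch, assuming a hypothetical helper module named inst\_fields whose port names are illustrative only:

```verilog
// Hypothetical helper: slices an instruction into R-type and I-type fields.
module inst_fields (
    input  [31:0] i_inst,
    output [ 5:0] o_opcode,  // [31:26], all types
    output [ 4:0] o_s2,      // [25:21], R-type and I-type
    output [ 4:0] o_r_s3,    // [20:16], valid for R-type
    output [ 4:0] o_r_s1,    // [15:11], valid for R-type
    output [ 4:0] o_i_s1,    // [20:16], valid for I-type
    output [15:0] o_im       // [15:0],  valid for I-type
);
    assign o_opcode = i_inst[31:26];
    assign o_s2     = i_inst[25:21];
    assign o_r_s3   = i_inst[20:16];
    assign o_r_s1   = i_inst[15:11];
    assign o_i_s1   = i_inst[20:16];
    assign o_im     = i_inst[15:0];
endmodule
```

Which of the outputs is meaningful depends on the instruction type selected by the opcode.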

### Instruction



| Operation           | Assembly | Opcode | Type | Meaning                                                    | Note                              |
|---------------------|----------|--------|------|------------------------------------------------------------|-----------------------------------|
| Add                 | add      | 6'd0   | R    | \$s1 = \$s2 + \$s3                                         | Signed Operation                  |
| Subtract            | sub      | 6'd1   | R    | \$s1 = \$s2 - \$s3                                         | Signed Operation                  |
| Add unsigned        | addu     | 6'd2   | R    | \$s1 = \$s2 + \$s3                                         | Unsigned Operation                |
| Subtract unsigned   | subu     | 6'd3   | R    | \$s1 = \$s2 - \$s3                                         | Unsigned Operation                |
| Add immediate       | addi     | 6'd4   | I    | \$s1 = \$s2 + im                                           | Signed Operation                  |
| Load word           | lw       | 6'd5   | I    | \$s1 = Mem[\$s2 + im]                                      | Signed Operation                  |
| Store word          | sw       | 6'd6   | I    | Mem[\$s2 + im] = \$s1                                      | Signed Operation                  |
| AND                 | and      | 6'd7   | R    | \$s1 = \$s2 & \$s3                                         | Bit-wise                          |
| OR                  | or       | 6'd8   | R    | \$s1 = \$s2 \| \$s3                                        | Bit-wise                          |
| XOR                 | xor      | 6'd9   | R    | \$s1 = \$s2 ^ \$s3                                         | Bit-wise                          |
| Branch on equal     | beq      | 6'd10  | I    | if(\$s1==\$s2), \$pc = \$pc + im;<br>else, \$pc = \$pc + 4 | PC-relative<br>Unsigned Operation |
| Branch on not equal | bne      | 6'd11  | I    | if(\$s1!=\$s2), \$pc = \$pc + im;<br>else, \$pc = \$pc + 4 | PC-relative<br>Unsigned Operation |

## Instruction (cont'd)



| Operation                           | Assembly | Opcode | Type | Meaning                                                    | Note                              |
|-------------------------------------|----------|--------|------|------------------------------------------------------------|-----------------------------------|
| Set on less than                    | slt      | 6'd12  | R    | if(\$s2<\$s3), \$s1 = 1;<br>else, \$s1 = 0                 | Signed Operation                  |
| Shift left logical                  | sll      | 6'd13  | R    | \$s1 = \$s2 << \$s3                                        | Unsigned Operation                |
| Shift right logical                 | srl      | 6'd14  | R    | \$s1 = \$s2 >> \$s3                                        | Unsigned Operation                |
| Branch on less than                 | blt      | 6'd15  | I    | if(\$s1<\$s2), \$pc = \$pc + im;<br>else, \$pc = \$pc + 4  | PC-relative<br>Signed Operation   |
| Branch on greater or equal          | bge      | 6'd16  | I    | if(\$s1>=\$s2), \$pc = \$pc + im;<br>else, \$pc = \$pc + 4 | PC-relative<br>Signed Operation   |
| Branch on less than unsigned        | bltu     | 6'd17  | I    | if(\$s1<\$s2), \$pc = \$pc + im;<br>else, \$pc = \$pc + 4  | PC-relative<br>Unsigned Operation |
| Branch on greater or equal unsigned | bgeu     | 6'd18  | I    | if(\$s1>=\$s2), \$pc = \$pc + im;<br>else, \$pc = \$pc + 4 | PC-relative<br>Unsigned Operation |
| End of File                         | eof      | 6'd19  | EOF  | Stop processing                                            | Last instruction in the pattern   |

Note: The immediate **im** in I-type instructions is expressed in **2's complement**.

Note: A signed operation means that the data in the register file are interpreted as 2's complement.
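The two notes can be illustrated with a small sketch. The module name signed\_ops and its ports are hypothetical; it shows sign extension of a 16-bit 2's complement immediate and the difference between a signed comparison (as in slt, blt) and an unsigned one (as in bltu).

```verilog
// Hypothetical illustration of 2's complement handling.
module signed_ops (
    input  [15:0] i_im,
    input  [31:0] i_a,
    input  [31:0] i_b,
    output [31:0] o_im_ext,       // im sign-extended to 32 bits
    output        o_lt_signed,    // i_a < i_b, both read as 2's complement
    output        o_lt_unsigned   // i_a < i_b, both read as unsigned
);
    // Replicate the sign bit of im into the upper 16 bits.
    assign o_im_ext      = {{16{i_im[15]}}, i_im};
    // $signed makes the comparison treat bit 31 as a sign bit.
    assign o_lt_signed   = $signed(i_a) < $signed(i_b);
    assign o_lt_unsigned = i_a < i_b;
endmodule
```

For example, 32'hFFFFFFFF is less than 1 when read as 2's complement (it is -1) but greater than 1 when read as unsigned.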

## **Memory IP**



- Instruction memory
  - Size: 1024 × 32 bit
  - i\_addr[11:2] for address mapping in instruction memory
- Data memory
  - Size: 64 × 32 bit
  - i\_addr[7:2] for address mapping in data memory
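The address mapping above drops the two byte-offset bits and keeps just enough bits to index each memory. The sketch below also flags addresses whose upper bits are nonzero; treating those as out of range is an assumption about what counts as an unmapped address, and the module name addr\_map and its ports are hypothetical.

```verilog
// Hypothetical address-mapping helper for the two memories.
module addr_map (
    input  [31:0] i_i_addr,      // byte address sent to instruction memory
    input  [31:0] i_d_addr,      // byte address sent to data memory
    output [ 9:0] o_i_index,     // word index, 1024 entries
    output [ 5:0] o_d_index,     // word index, 64 entries
    output        o_i_in_range,  // assumed: no bits set above [11:0]
    output        o_d_in_range   // assumed: no bits set above [7:0]
);
    assign o_i_index    = i_i_addr[11:2];
    assign o_d_index    = i_d_addr[7:2];
    assign o_i_in_range = (i_i_addr[31:12] == 20'd0);
    assign o_d_in_range = (i_d_addr[31:8]  == 24'd0);
endmodule
```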

### **Status**

o\_status takes one of four values:

| o_status[1:0] | Definition     |
|---------------|----------------|
| 2'd0          | R_TYPE_SUCCESS |
| 2'd1          | I_TYPE_SUCCESS |
| 2'd2          | MIPS_OVERFLOW  |
| 2'd3          | MIPS_END       |

### **Overflow**



- Overflow may occur in two situations.
  - Situation 1: An arithmetic instruction (add, sub, addu, subu, addi) overflows.
  - Situation 2: An output address maps to an unknown address in the data or instruction memory. (You do not need to consider the case where the instruction address is beyond eof but still within the size of the instruction memory.)
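For Situation 1, overflow of 32-bit addition and subtraction can be detected as sketched below. The module name ovf\_detect and its ports are hypothetical. The signed checks follow the usual 2's complement rule; reading unsigned overflow as a carry out (addu) or a borrow (subu) is an assumption, since the specification does not define it.

```verilog
// Hypothetical overflow detector for 32-bit add and subtract.
module ovf_detect (
    input  [31:0] i_a,
    input  [31:0] i_b,
    output [31:0] o_add,
    output [31:0] o_sub,
    output        o_add_ovf_s,  // signed add overflow
    output        o_sub_ovf_s,  // signed subtract overflow
    output        o_add_ovf_u,  // assumed: unsigned carry out
    output        o_sub_ovf_u   // assumed: unsigned borrow
);
    // One extra bit captures the carry or borrow.
    wire [32:0] add_ext = {1'b0, i_a} + {1'b0, i_b};
    wire [32:0] sub_ext = {1'b0, i_a} - {1'b0, i_b};

    assign o_add = add_ext[31:0];
    assign o_sub = sub_ext[31:0];

    // Add: operands share a sign, result has the other sign.
    assign o_add_ovf_s = (i_a[31] == i_b[31]) && (o_add[31] != i_a[31]);
    // Subtract: operands differ in sign, result sign differs from i_a.
    assign o_sub_ovf_s = (i_a[31] != i_b[31]) && (o_sub[31] != i_a[31]);

    assign o_add_ovf_u = add_ext[32];
    assign o_sub_ovf_u = sub_ext[32];
endmodule
```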



[Figure: Status Check]

[Figure: Read instruction from instruction memory]

[Figure: Load data from data memory]

[Figure: Save data to data memory]



#### core.v



```
module core #(
    parameter ADDR_W = 32,
    parameter INST_W = 32,
    parameter DATA_W = 32
)(
    input                   i_clk,
    input                   i_rst_n,
    output [ ADDR_W-1 : 0 ] o_i_addr,
    input  [ INST_W-1 : 0 ] i_i_inst,
    output                  o_d_we,
    output [ ADDR_W-1 : 0 ] o_d_addr,
    output [ DATA_W-1 : 0 ] o_d_wdata,
    input  [ DATA_W-1 : 0 ] i_d_rdata,
    output [        1 : 0 ] o_status,
    output                  o_status_valid
);
    // Your design goes here.
endmodule
```

#### rtl.f



#### Filelist

### **Command**



01\_run

ncverilog -f rtl.f +define+p0 +access+r

99\_clean\_up

rm -rf INCA\_libs/ ncverilog.\* novas\*

#### define.v





## testbed\_temp.v

- Things to add in your testbench
  - Clock
  - Reset
  - Waveform file
  - Function test
  - ...
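The first items on the list (clock, reset, waveform file) could be sketched as below. The module name tb\_skeleton, the 10 ns cycle time, the reset timing and the dump file name core.vcd are all assumptions, and the standard $dumpfile/$dumpvars tasks are used in place of any tool-specific waveform format. The watchdog reuses the 120000-cycle limit from the specification.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench skeleton: clock, reset, waveform dump, watchdog.
module tb_skeleton;
    localparam CYCLE     = 10.0;    // assumed clock period in ns
    localparam MAX_CYCLE = 120000;  // cycle limit from the specification

    reg clk;
    reg rst_n;

    // Clock: toggles every half cycle.
    initial clk = 1'b0;
    always #(CYCLE/2.0) clk = ~clk;

    // Reset: active low, asserted asynchronously for two cycles.
    initial begin
        rst_n = 1'b1;
        #(CYCLE*0.25) rst_n = 1'b0;
        #(CYCLE*2.0)  rst_n = 1'b1;
    end

    // Waveform file (standard VCD).
    initial begin
        $dumpfile("core.vcd");
        $dumpvars;
    end

    // Watchdog: stop the simulation if the cycle limit is reached.
    initial begin
        #(CYCLE*MAX_CYCLE);
        $display("Timeout: exceeded %0d cycles", MAX_CYCLE);
        $finish;
    end
endmodule
```

The function test would then sample o\_status at the negative clock edge whenever o\_status\_valid is high.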

```
core u_core (
    .i_clk(),
    .i_rst_n(),
    .o_i_addr(),
    .i_i_inst(),
    .o_d_we(),
    .o_d_addr(),
    .o_d_wdata(),
    .i_d_rdata(),
    .o_status(),
    .o_status_valid()
);

inst_mem u_inst_mem (
    .i_clk(),
    .i_rst_n(),
    .i_addr(),
    .o_inst()
);

data_mem u_data_mem (
    .i_clk(),
    .i_rst_n(),
    .i_we(),
    .i_addr(),
    .i_wdata(),
    .o_rdata()
);
```

#### **Protected Files**



- The following files are protected
  - inst\_mem.vp
  - data\_mem.vp

```
module inst_mem (
    input             i_clk,
    input             i_rst_n,
    input  [ 31 : 0 ] i_addr,
    output [ 31 : 0 ] o_inst
);
 protected
Ndi5kSQH5DT^<D9i:i7T7ceFn3@o:C2]Ke:L;dfq^QGQOG?3K:ogIe8]1ge<gcg3
1CH3E]ekmLN<RVkKa1o39E7E21a; hJRSFMUb2pAgL?TeZdH>]^RK;KWYU@>G2G6
H[IMYG;D<[Z>[;0]] NbPoEAQM<_ZfDbp1HN@HmqSOQ<5[53C:9UD4^:Y44]9a^e
PDH[cdHb;HPi\R4k7mAlPdY8ZpI=4?nNZgQ2I>QUg[agM4j@cTl]hnMoC<i1F9DR
[kf;]ULlecpF`H;9L2DeZa>@LdfLgfB8l4bWgT:_P3?ENhifQW@_Ne;gMZE9@f0A
OERY:F4d68KqAIn]N1dj4LN7 8:Uigk?9UJ9JYQM4l=Lq\TEXDQO1>Zo^SJq=Cge
?kp68am:9p81Q1[<jSXm?;GhoPHHYKp\Q][2epXn_18k8LA5g=N7=D?=VOX<Ham8
[A:Qc;RlpO38>d9 Qk9cfk?:5hXP>LT3n=DP08A ]WPa6nA3cYZjGl32qB9]I4kp
>=:4m9P`dCB8@?ip`@VR7AahIggjNR:M1: \KXElBFOm<Bb@ZS[^W7EheJ18mX8;
?7F`Pg\CCA8igfFUoWY@k>Yq=U3 4>E50 nJ\`aUGcfWD 89dab]cUQfF<?2P?OG
qWglWC[\iqnjC<OipHHnb<T4Sg<:UORVSVocI g?<a@o <PQ493cZIE;7^Sp1AQ
G<cl7[]R\>VT]]LA\7?Uk=]\bG19MT9N;K<Y92[iKOged92EIkQZliW>qlG]QI?5
ST06RFN<KJl@VM1EWKSmB1B5U:BaX`E7of7mqOJBgO`9k$
 endprotected
endmodule
```

#### **PATTERN**

Files in PATTERN are provided for your reference.

#### inst\_assemble.dat

| R-type | \$s2 | \$s3 | \$s1 |
|--------|------|------|------|
| I-type | \$s2 | \$s1 | im   |
|        |      |      |      |
| and    | \$1  | \$3  | \$1  |
| lw     | \$3  | \$1  | 8    |
| bne    | \$2  | \$0  | 8    |
| add    | \$7  | \$1  | \$4  |
| slt    | \$6  | \$5  | \$4  |
| slt    | \$4  | \$1  | \$1  |
| lw     | \$1  | \$3  | 12   |
| lw     | \$7  | \$7  | 4    |
| bne    | \$6  | \$7  | 8    |
| lw     | \$6  | \$5  | 8    |
| lw     | \$5  | \$2  | 8    |

# **Grading Policy**



The TA will run your code with the following command:

ncverilog -f rtl.f +define+p0 +access+r

- Pass the patterns to get the full score
  - Provided patterns: 80%
    - 40% for each test (data from data memory: 20%, status check: 20%)
  - Hidden patterns: 20% (20 patterns in total)
    - 1% for each test (data and status both correct)
- No late submission is allowed
- 3 points will be deducted for any violation of the naming rule or submission format

### **Submission**



- Create a folder named studentID\_hw2 and put all of the following files in it
  - rtl.f (your file list)
  - core.v
  - all other design files in your file list (optional)
- Compress the folder studentID\_hw2 into a tar file named studentID\_hw2\_vk.tar (k is the version number, k = 1, 2, ...)

### Hint

- Design your FSM with the following states
  - 1. Idle
  - 2. Instruction fetching
  - 3. Instruction decoding
  - 4. ALU computing/Load data
  - 5. Data write-back
  - 6. Next PC generation
  - 7. Process end
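The seven states above could be encoded as in the sketch below. Everything beyond the state list is an assumption: the module name fsm\_skeleton, the inputs i\_is\_eof and i\_overflow, the 3-bit encoding, and the simple one-cycle-per-state transition order. A real design will also need to drive the memory and status outputs in each state.

```verilog
// Hypothetical FSM skeleton following the hinted state list.
module fsm_skeleton (
    input            i_clk,
    input            i_rst_n,
    input            i_is_eof,    // assumed: decoded instruction is eof
    input            i_overflow,  // assumed: overflow detected in execute
    output reg [2:0] o_state
);
    localparam S_IDLE   = 3'd0;
    localparam S_FETCH  = 3'd1;
    localparam S_DECODE = 3'd2;
    localparam S_EXEC   = 3'd3;  // ALU computing / load data
    localparam S_WB     = 3'd4;  // data write-back
    localparam S_NEXTPC = 3'd5;
    localparam S_END    = 3'd6;

    reg [2:0] state_nxt;

    // Next-state logic.
    always @(*) begin
        case (o_state)
            S_IDLE:   state_nxt = S_FETCH;
            S_FETCH:  state_nxt = S_DECODE;
            S_DECODE: state_nxt = i_is_eof   ? S_END : S_EXEC;
            S_EXEC:   state_nxt = i_overflow ? S_END : S_WB;
            S_WB:     state_nxt = S_NEXTPC;
            S_NEXTPC: state_nxt = S_FETCH;
            S_END:    state_nxt = S_END;   // stay here once processing ends
            default:  state_nxt = S_IDLE;
        endcase
    end

    // State register with active low asynchronous reset.
    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) o_state <= S_IDLE;
        else          o_state <= state_nxt;
    end
endmodule
```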