# Project 3: Designing a 32-bit CPU

Adam Sumner - A20283081, Contribution - 25%

Bobby Unverzagt - A2028923, Contribution - 25%

Emilie Woog - A20265269, Contribution - 25%

Nash Kaminski - A20283999, Contribution - 25%

ECE 485

December 5, 2015

## 1 Introduction

The goal of this project is to design a stripped-down version of the MIPS processor. The processor is a 32-bit version of the processor discussed in class and in the textbook; however, its instruction set is a small subset of the full MIPS instruction set.

### 1.1 Background Information

#### 1.1.1 MIPS

MIPS is a reduced instruction set computer (RISC) instruction set architecture (ISA). It defines three instruction formats: R (register), I (immediate), and J (jump). The processor implemented in this project executes only R-type and I-type instructions. R-type instructions take both source operands from registers and write their result to a register. The format of an R-type instruction is:

| Bits[31:26] | Bits[25:21] | Bits[20:16] | Bits[15:11] | Bits[10:6] | Bits[5:0] |
|-------------|-------------|-------------|-------------|------------|-----------|
| opcode      | Rs          | Rt          | Rd          | shamt      | funct     |

For R-type instructions, the opcode field is always $000000_2$, and the function field (funct) determines which operation is carried out. Rs and Rt are the two source registers, and Rd is the destination register for the result. Shift instructions also need a shift amount, which is given explicitly in the shamt field.
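To make the field boundaries concrete, here is a minimal VHDL sketch that slices a 32-bit word into the six R-type fields. The entity name `rtype_fields` and its port names are hypothetical and do not appear in the listings; the processor in the appendix performs the same slicing inline. For the test bench encoding of `add $3, $1, $2`, `b"00000000001000100001100000100000"`, it should produce opcode `000000`, Rs `00001`, Rt `00010`, Rd `00011`, shamt `00000`, and funct `100000`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper (not part of the original design):
-- splits an R-type instruction word into its six fields.
entity rtype_fields is
  port (
    inst   : in  std_logic_vector(31 downto 0);
    opcode : out std_logic_vector(5 downto 0);   -- bits [31:26], 000000 for R-type
    rs     : out std_logic_vector(4 downto 0);   -- bits [25:21], first source
    rt     : out std_logic_vector(4 downto 0);   -- bits [20:16], second source
    rd     : out std_logic_vector(4 downto 0);   -- bits [15:11], destination
    shamt  : out std_logic_vector(4 downto 0);   -- bits [10:6], shift amount
    funct  : out std_logic_vector(5 downto 0));  -- bits [5:0], selects the operation
end rtype_fields;

architecture rtl of rtype_fields is
begin
  opcode <= inst(31 downto 26);
  rs     <= inst(25 downto 21);
  rt     <= inst(20 downto 16);
  rd     <= inst(15 downto 11);
  shamt  <= inst(10 downto 6);
  funct  <= inst(5 downto 0);
end rtl;
```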

I-type instructions carry a 16-bit immediate value in the instruction word itself, so the format must make room for it:

| Bits[31:26] | Bits[25:21] | Bits[20:16] | Bits[15:0] |
|-------------|-------------|-------------|------------|
| opcode      | Rs          | Rt          | immediate  |

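For I-type instructions, the opcode alone identifies the instruction. Rs is a source register, and the 16-bit immediate supplies the other operand (for lw and sw, the address offset). For arithmetic and logical instructions and for lw, Rt is the destination register that receives the result; for sw, Rt holds the value to be stored, and for beq, Rt is the second register being compared.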
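Before it reaches the ALU, the 16-bit immediate has to be widened to 32 bits. The sketch below, written as a hypothetical package named `imm_pkg`, shows both options: sign extension, which is what the appendix listing applies to every I-type instruction, and zero extension, which standard MIPS uses for andi and ori. For an immediate such as 0x00FF the two agree; for 0xFFF5 (a branch offset of -11) they differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: immediate-extension helpers for I-type instructions.
package imm_pkg is
  function sign_extend(imm : std_logic_vector(15 downto 0)) return std_logic_vector;
  function zero_extend(imm : std_logic_vector(15 downto 0)) return std_logic_vector;
end package imm_pkg;

package body imm_pkg is
  -- Copies bit 15 into bits 31..16, preserving the two's-complement value.
  function sign_extend(imm : std_logic_vector(15 downto 0)) return std_logic_vector is
  begin
    return std_logic_vector(resize(signed(imm), 32));
  end function;

  -- Fills bits 31..16 with zeros, treating the immediate as unsigned.
  function zero_extend(imm : std_logic_vector(15 downto 0)) return std_logic_vector is
  begin
    return std_logic_vector(resize(unsigned(imm), 32));
  end function;
end package body imm_pkg;
```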

#### 1.1.2 Datapath and Control

A datapath is the collection of functional units that perform data processing operations: the program counter, the register file, instruction memory, the ALU, and data memory. A control unit decodes each instruction and drives the datapath accordingly. Figure 1 shows a high-level overview of a simple datapath with control.



Figure 1: Datapath Overview

## 2 Design

### 2.1 Instruction Set

Table 1 lists the instructions implemented in the CPU, along with the opcode and function field of each instruction.

| Opcode [31:26] | Function Field [5:0] | Instruction | Example              |
|----------------|----------------------|-------------|----------------------|
| $100011_2$     |                      | lw          | lw \$t3, 200(\$s2)   |
| $101011_2$     |                      | sw          | sw \$t4, 100(\$t3)   |
| $000000_2$     | $100000_2$           | add         | add \$s3, \$t2, \$s2 |
| $000000_2$     | $110000_2$           | sub         | sub \$s3, \$t2, \$s2 |
| $000100_2$     |                      | beq         | beq \$s5, \$s2, 500  |
| $000000_2$     | $000001_2$           | nand        | nand \$5, \$1, \$2   |
| $000010_2$     |                      | andi        | andi \$6, \$2, 0x00FF |
| $000000_2$     | $000010_2$           | or          | or \$8, \$1, \$2     |
| $000011_2$     |                      | ori         | ori \$7, \$1, 0x00FF |

Table 1: CPU Instruction Set

Because only nine instructions had to be implemented, the 6-bit opcode and function fields offered far more codes (64 each) than were needed, so the team was free to choose its own values. The encodings for lw, sw, beq, and add match standard MIPS, while the codes for sub, nand, or, andi, and ori were chosen by the team (nand is not a standard MIPS instruction). For the R-type instructions, the function field values were chosen to be easy to tell apart, which made debugging easier for the team; the same approach was taken for the opcodes of the I-type instructions.
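The encodings in Table 1 can be captured as named constants, together with helpers that assemble complete instruction words. The package `isa_pkg` and its functions `encode_r` and `encode_i` are hypothetical (the test bench in the appendix writes its binary literals by hand); the sketch assumes the field layouts from Section 1.1.1 and leaves shamt at zero, since none of the implemented instructions shift.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package (not in the original listings): Table 1 encodings
-- plus helpers that assemble 32-bit instruction words.
package isa_pkg is
  -- Opcodes, bits [31:26]
  constant OP_RTYPE : std_logic_vector(5 downto 0) := "000000";
  constant OP_LW    : std_logic_vector(5 downto 0) := "100011";
  constant OP_SW    : std_logic_vector(5 downto 0) := "101011";
  constant OP_BEQ   : std_logic_vector(5 downto 0) := "000100";
  constant OP_ANDI  : std_logic_vector(5 downto 0) := "000010";
  constant OP_ORI   : std_logic_vector(5 downto 0) := "000011";
  -- Function fields, bits [5:0], for R-type instructions
  constant FN_ADD   : std_logic_vector(5 downto 0) := "100000";
  constant FN_SUB   : std_logic_vector(5 downto 0) := "110000";
  constant FN_NAND  : std_logic_vector(5 downto 0) := "000001";
  constant FN_OR    : std_logic_vector(5 downto 0) := "000010";

  function encode_r(rs, rt, rd : natural; funct : std_logic_vector(5 downto 0))
    return std_logic_vector;
  function encode_i(op : std_logic_vector(5 downto 0); rs, rt : natural; imm : integer)
    return std_logic_vector;
end package isa_pkg;

package body isa_pkg is
  function encode_r(rs, rt, rd : natural; funct : std_logic_vector(5 downto 0))
    return std_logic_vector is
    variable w : std_logic_vector(31 downto 0);
  begin
    w(31 downto 26) := OP_RTYPE;
    w(25 downto 21) := std_logic_vector(to_unsigned(rs, 5));
    w(20 downto 16) := std_logic_vector(to_unsigned(rt, 5));
    w(15 downto 11) := std_logic_vector(to_unsigned(rd, 5));
    w(10 downto 6)  := "00000";  -- shamt: unused by the Table 1 instructions
    w(5 downto 0)   := funct;
    return w;
  end function;

  function encode_i(op : std_logic_vector(5 downto 0); rs, rt : natural; imm : integer)
    return std_logic_vector is
    variable w : std_logic_vector(31 downto 0);
  begin
    w(31 downto 26) := op;
    w(25 downto 21) := std_logic_vector(to_unsigned(rs, 5));
    w(20 downto 16) := std_logic_vector(to_unsigned(rt, 5));
    -- to_signed gives the 16-bit two's-complement form, so negative
    -- branch offsets such as -11 are encoded correctly.
    w(15 downto 0)  := std_logic_vector(to_signed(imm, 16));
    return w;
  end function;
end package body isa_pkg;
```

For example, `encode_i(OP_LW, 0, 1, 1)` should reproduce the test bench word for `lw $1, 1($zero)`, and `encode_i(OP_BEQ, 1, 1, -11)` gives the backwards branch at the end of the test program.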

### 2.2 Memory

For this project, a memory covering the full 32-bit address space (4 GB, $2^{32}$ bytes) seemed unnecessary. Instead, each memory is an array of 256 32-bit words, and only the low 8 bits of an address are used to select a word. The memory size could easily be increased if needed, so this choice does not constrain the rest of the CPU design.
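As a sketch of how the memory size could be changed, the entity below generalizes the appendix's dataMem with a generic `ADDR_BITS`; the name `dataMemG` and the generic are hypothetical. With the default of 8 it should behave like the 256-word memory (reads are combinational, writes happen on the falling edge), except that it initializes every word to zero. Because only the low address bits are used, addresses such as 0x105 and 0x005 refer to the same word.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical parameterized variant of dataMem holding 2**ADDR_BITS words.
entity dataMemG is
  generic (ADDR_BITS : positive := 8);  -- 8 bits -> 256 words
  port (
    data     : out std_logic_vector(31 downto 0);
    sel      : in  std_logic_vector(31 downto 0);
    wData    : in  std_logic_vector(31 downto 0);
    memWrite : in  std_logic;
    clk      : in  std_logic);
end dataMemG;

architecture behavioral of dataMemG is
  type mem_arr is array (0 to 2**ADDR_BITS - 1) of std_logic_vector(31 downto 0);
  signal mData : mem_arr := (others => (others => '0'));  -- all words start at zero
  signal idx   : natural range 0 to 2**ADDR_BITS - 1;
begin
  -- Only the low ADDR_BITS bits of the 32-bit address select a word.
  idx  <= to_integer(resize(unsigned(sel), ADDR_BITS));
  data <= mData(idx);  -- combinational read

  wrProc : process(clk) is
  begin
    if falling_edge(clk) then  -- writes on the falling edge, as in the report
      if memWrite = '1' then
        mData(idx) <= wData;
      end if;
    end if;
  end process;
end behavioral;
```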

### 2.3 Datapath

Because this design is simple, the implemented datapath differs little from Figure 1: the single-cycle datapath from the textbook served as the skeleton of the final implementation. The ALU and register file had been implemented in earlier projects, so their functionality only had to be extended to handle 32-bit words. The register file holds 16 registers, so only the low four bits of each 5-bit register field are used, and register 0 always reads as zero. That left the data memory entity to be completed so that it could be included in the processor entity. As mentioned in Section 2.2, this entity contains an array of 256 words and supports both reading and writing.

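The processor entity combines all of the components into the desired datapath. It drives the instruction memory, data memory, and register file from a single external clock signal, so the entire system stays in sync. The program counter is updated on the rising edge of the clock, and all writes happen on the falling edge. While the stall input is asserted, the PC is loaded from the extPC input and the data memory takes its address, data, and write enable from the DMaddr, DMdata, and DMwrite inputs; the instruction memory is written at the current PC whenever IMwrite is asserted. The test bench uses these inputs to load the program and data. The processor relies on the control unit to carry out the instruction read from memory. Figure 2 shows the overview block diagram of the implemented datapath for the CPU.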
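The PC behaviour described above can be isolated into a small entity. This is a hypothetical restructuring (the name `pc_reg` does not appear in the listings) that follows the same priority as the processor's PC process: stall loads extPC, a taken branch adds the sign-extended 16-bit offset, and otherwise the PC advances by one because the memories are word-addressed. As in the listing, the offset is added to the branch instruction's own address rather than to PC + 4 as in standard MIPS, so a `beq $1, $1, -0x000B` at address 0xC should jump back to address 0x1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone version of the PC logic in the processor entity.
entity pc_reg is
  port (
    clk     : in  std_logic;
    stall   : in  std_logic;                      -- '1': load PC from extPC
    extPC   : in  std_logic_vector(31 downto 0);
    branch  : in  std_logic;                      -- '1': beq taken
    braAddr : in  std_logic_vector(15 downto 0);  -- signed offset in words
    PC      : out std_logic_vector(31 downto 0));
end pc_reg;

architecture behavioral of pc_reg is
  signal pc_q : unsigned(31 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then  -- the PC changes only on rising edges
      if stall = '1' then
        pc_q <= unsigned(extPC);
      elsif branch = '1' then
        -- sign-extend the offset; unsigned addition wraps modulo 2**32,
        -- which makes negative offsets work
        pc_q <= pc_q + unsigned(resize(signed(braAddr), 32));
      else
        pc_q <= pc_q + 1;  -- one word per instruction
      end if;
    end if;
  end process;

  PC <= std_logic_vector(pc_q);
end behavioral;
```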



Figure 2: Implemented Data Path Overview

### 2.4 Control

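Figure 2 also shows the control lines. The control unit decodes the opcode (and, for R-type instructions, the function field) read from instruction memory and asserts a small set of signals: branch, reg_dest, reg_write, ALU_src, ALU_op, mem_write, and mem_to_reg. Several of these act as the select lines of the datapath multiplexers, choosing which register is written, whether the ALU's second operand is a register or the immediate, and whether the ALU result or the memory data is written back; the others enable register and memory writes and select the ALU operation. Note that in this implementation mem_to_reg = '1' selects the ALU result, the opposite of the textbook convention. This unit therefore determines which units read or write and which operation the ALU performs.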
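The same decoding can be written as a single process with a case statement, which keeps each instruction's control values together. This is a hypothetical alternative (entity `control_case`), not the team's implementation: it omits the unused stall input and drives safe defaults (no branch, no register or memory write) for unsupported codes. The signal values follow the appendix listing, including mem_to_reg = '1' selecting the ALU result.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-process version of the control unit's decode.
entity control_case is
  port (
    inst_in    : in  std_logic_vector(5 downto 0);  -- opcode [31:26]
    func       : in  std_logic_vector(5 downto 0);  -- funct [5:0]
    branch     : out std_logic;
    reg_dest   : out std_logic;                     -- '1': write Rd, '0': write Rt
    reg_write  : out std_logic;
    ALU_src    : out std_logic;                     -- '1': ALU input B is the immediate
    ALU_op     : out std_logic_vector(2 downto 0);
    mem_write  : out std_logic;
    mem_to_reg : out std_logic);                    -- '1': write back the ALU result
end control_case;

architecture behavioral of control_case is
begin
  process(inst_in, func)
  begin
    -- safe defaults: no branch, no writes, ALU result written back
    branch <= '0'; reg_dest <= '0'; reg_write <= '0'; ALU_src <= '0';
    ALU_op <= "000"; mem_write <= '0'; mem_to_reg <= '1';

    case inst_in is
      when "000000" =>                        -- R-type: operation from funct
        reg_dest <= '1'; reg_write <= '1';
        case func is
          when "100000" => ALU_op <= "000";   -- add
          when "110000" => ALU_op <= "001";   -- sub
          when "000001" => ALU_op <= "010";   -- nand
          when "000010" => ALU_op <= "100";   -- or
          when others   => reg_write <= '0';  -- unknown funct: no write
        end case;
      when "100011" =>                        -- lw: address = Rs + imm
        reg_write <= '1'; ALU_src <= '1'; mem_to_reg <= '0';
      when "101011" =>                        -- sw: address = Rs + imm
        ALU_src <= '1'; mem_write <= '1';
      when "000100" =>                        -- beq: subtract, branch on zero
        branch <= '1'; ALU_op <= "001";
      when "000010" =>                        -- andi
        reg_write <= '1'; ALU_src <= '1'; ALU_op <= "011";
      when "000011" =>                        -- ori
        reg_write <= '1'; ALU_src <= '1'; ALU_op <= "100";
      when others => null;                    -- unsupported opcode: keep defaults
    end case;
  end process;
end behavioral;
```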

## 3 Analysis

Although this processor accomplishes the tasks specified in the business requirements document, it could still be improved. In its current stage, it can be considered a bare-bones prototype. To bring the design up to the level of current industry processors, a complete instruction set would have to be implemented. Pipelining would also be needed: in a single-cycle design, each functional unit sits idle for most of every clock cycle, and the clock period must be long enough for the slowest instruction. Once the datapath is pipelined, hazard detection and control would be required, and techniques such as forwarding could then be added to make the processor a truly efficient piece of hardware.

## 4 Simulation Results

Include blurb about test bench code and all figures of waveforms explaining each section of the test.

## 5 Conclusion

The design and implementation of the 32-bit CPU was a success. A set of nine instructions was implemented and verified with test bench code, and all requested functionality was achieved. This 32-bit CPU can now be used in further projects.

## Appendix

Listing 1: CPU Code

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity regFile is
  port(
    regA          : out std_logic_vector(31 downto 0);
    regB          : out std_logic_vector(31 downto 0);
    selA          : in  std_logic_vector(3 downto 0);
    selB          : in  std_logic_vector(3 downto 0);
    wData         : in  std_logic_vector(31 downto 0);
    registerWrite : in  std_logic;
    selW          : in  std_logic_vector(3 downto 0);
    clk           : in  std_logic);
end regFile;

architecture behavioral of regFile is
  type reg_arr is array (0 to 15) of std_logic_vector(31 downto 0);
  signal rData : reg_arr;
begin
  -- register 0 always reads as zero
  with selA select
    regA <= x"00000000" when b"0000",
            rData(to_integer(unsigned(selA))) when others;
  with selB select
    regB <= x"00000000" when b"0000",
            rData(to_integer(unsigned(selB))) when others;

  -- registers are written on the falling edge
  wrProc: process(clk) is
  begin
    if falling_edge(clk) then
      if registerWrite = '1' then
        rData(to_integer(unsigned(selW))) <= wData;
      end if;
    end if;
  end process;
end behavioral;

----

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity control is
  port (
    inst_in    : in  std_logic_vector(5 downto 0);  -- opcode, bits [31:26]
    func       : in  std_logic_vector(5 downto 0);  -- function field, bits [5:0]
    stall      : in  std_logic;                     -- not used by the decoder
    branch     : out std_logic;
    reg_dest   : out std_logic;                     -- '1': write Rd, '0': write Rt
    reg_write  : out std_logic;
    ALU_src    : out std_logic;                     -- '1': ALU input B is the immediate
    ALU_op     : out std_logic_vector(2 downto 0);
    mem_write  : out std_logic;
    mem_to_reg : out std_logic                      -- '1': write back ALU result, '0': memory data
  );
end control;

architecture behavioral of control is
  -- "_o" signals: decoded from the opcode (non-R-type instructions)
  signal branch_o, reg_dest_o, reg_write_o, ALU_src_o,
         mem_write_o, mem_to_reg_o : std_logic;
  signal ALU_op_o : std_logic_vector(2 downto 0);
  -- "_f" signals: decoded from the function field (R-type instructions)
  signal branch_f, reg_dest_f, reg_write_f, ALU_src_f,
         mem_write_f, mem_to_reg_f : std_logic;
  signal ALU_op_f : std_logic_vector(2 downto 0);
begin
  -- set intermediate signals in case of an R-type instruction
  with func select
    branch_f <= '0' when "100000",  -- add
                '0' when "110000",  -- sub
                '0' when "000001",  -- nand
                '0' when "000010",  -- or
                '0' when others;

  with func select
    reg_dest_f <= '1' when "100000",  -- add
                  '1' when "110000",  -- sub
                  '1' when "000001",  -- nand
                  '1' when "000010",  -- or
                  '0' when others;

  with func select
    reg_write_f <= '1' when "100000",  -- add
                   '1' when "110000",  -- sub
                   '1' when "000001",  -- nand
                   '1' when "000010",  -- or
                   '0' when others;    -- unsupported funct: no write

  with func select
    ALU_src_f <= '0' when "100000",  -- add
                 '0' when "110000",  -- sub
                 '0' when "000001",  -- nand
                 '0' when "000010",  -- or
                 '0' when others;

  with func select
    ALU_op_f <= "000" when "100000",  -- add
                "001" when "110000",  -- sub
                "010" when "000001",  -- nand
                "100" when "000010",  -- or
                "000" when others;

  with func select
    mem_write_f <= '0' when "100000",  -- add
                   '0' when "110000",  -- sub
                   '0' when "000001",  -- nand
                   '0' when "000010",  -- or
                   '0' when others;

  with func select
    mem_to_reg_f <= '1' when "100000",  -- add
                    '1' when "110000",  -- sub
                    '1' when "000001",  -- nand
                    '1' when "000010",  -- or
                    '1' when others;

  -- set intermediate signals in case of a non-R-type instruction
  with inst_in select
    branch_o <= '0' when "100011",  -- lw
                '0' when "101011",  -- sw
                '1' when "000100",  -- beq
                '0' when "000010",  -- andi
                '0' when "000011",  -- ori
                '0' when others;

  with inst_in select
    reg_dest_o <= '0' when "100011",  -- lw
                  '0' when "101011",  -- sw
                  '0' when "000100",  -- beq
                  '0' when "000010",  -- andi
                  '0' when "000011",  -- ori
                  '0' when others;

  with inst_in select
    reg_write_o <= '1' when "100011",  -- lw
                   '0' when "101011",  -- sw
                   '0' when "000100",  -- beq
                   '1' when "000010",  -- andi
                   '1' when "000011",  -- ori
                   '0' when others;    -- unsupported opcode: no write

  with inst_in select
    ALU_src_o <= '1' when "100011",  -- lw
                 '1' when "101011",  -- sw
                 '0' when "000100",  -- beq
                 '1' when "000010",  -- andi
                 '1' when "000011",  -- ori
                 '0' when others;

  with inst_in select
    ALU_op_o <= "000" when "100011",  -- lw: add offset
                "000" when "101011",  -- sw: add offset
                "001" when "000100",  -- beq: subtract
                "011" when "000010",  -- andi
                "100" when "000011",  -- ori
                "000" when others;

  with inst_in select
    mem_write_o <= '0' when "100011",  -- lw
                   '1' when "101011",  -- sw
                   '0' when "000100",  -- beq
                   '0' when "000010",  -- andi
                   '0' when "000011",  -- ori
                   '0' when others;    -- unsupported opcode: no write

  with inst_in select
    mem_to_reg_o <= '0' when "100011",  -- lw
                    '1' when "101011",  -- sw
                    '1' when "000100",  -- beq
                    '1' when "000010",  -- andi
                    '1' when "000011",  -- ori
                    '1' when others;

  -- select from intermediate signals
  with inst_in select
    branch <= branch_f when "000000",
              branch_o when others;
  with inst_in select
    reg_dest <= reg_dest_f when "000000",
                reg_dest_o when others;
  with inst_in select
    reg_write <= reg_write_f when "000000",
                 reg_write_o when others;
  with inst_in select
    ALU_src <= ALU_src_f when "000000",
               ALU_src_o when others;
  with inst_in select
    ALU_op <= ALU_op_f when "000000",
              ALU_op_o when others;
  with inst_in select
    mem_write <= mem_write_f when "000000",
                 mem_write_o when others;
  with inst_in select
    mem_to_reg <= mem_to_reg_f when "000000",
                  mem_to_reg_o when others;
end behavioral;

----

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dataMem is
  port (
    data     : out std_logic_vector(31 downto 0);
    sel      : in  std_logic_vector(31 downto 0);
    wData    : in  std_logic_vector(31 downto 0);
    memWrite : in  std_logic;
    clk      : in  std_logic);
end dataMem;

architecture behavioral of dataMem is
  type mem_arr is array (0 to 255) of std_logic_vector(31 downto 0);
  signal mData : mem_arr;
begin
  -- combinational read; only the low 8 address bits are used
  data <= mData(to_integer(resize(unsigned(sel), 8)));

  -- memory is written on the falling edge
  wrProc: process(clk) is
  begin
    if falling_edge(clk) then
      if memWrite = '1' then
        mData(to_integer(resize(unsigned(sel), 8))) <= wData;
      end if;
    end if;
  end process;
end behavioral;

----

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ALU is
  port (
    inA : in  std_logic_vector(31 downto 0);
    inB : in  std_logic_vector(31 downto 0);
    ctl : in  std_logic_vector(2 downto 0);
    res : out std_logic_vector(31 downto 0));
end ALU;

architecture behavioral of ALU is
  signal add     : std_logic_vector(31 downto 0);
  signal sub     : std_logic_vector(31 downto 0);
  signal andres  : std_logic_vector(31 downto 0);
  signal nandres : std_logic_vector(31 downto 0);
  signal orres   : std_logic_vector(31 downto 0);
begin
  add     <= std_logic_vector(signed(inA) + signed(inB));
  sub     <= std_logic_vector(signed(inA) - signed(inB));
  andres  <= inA and inB;
  nandres <= inA nand inB;
  orres   <= inA or inB;

  -- Multiplexer: ctl selects the result
  with ctl select
    res <= add         when "000",
           sub         when "001",
           nandres     when "010",
           andres      when "011",
           orres       when "100",
           x"00000000" when others;
end behavioral;

----

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity processor is
  port (
    extPC   : in std_logic_vector(31 downto 0);
    IMdata  : in std_logic_vector(31 downto 0);
    DMdata  : in std_logic_vector(31 downto 0);
    IMwrite : in std_logic;
    DMwrite : in std_logic;
    DMaddr  : in std_logic_vector(31 downto 0);
    stall   : in std_logic;
    clk     : in std_logic
  );
end processor;

architecture behavioral of processor is
  signal im_wrEn, im_clk : std_logic;
  signal im_data, im_addr, im_wData : std_logic_vector(31 downto 0);
  signal dm_wrEn, dm_clk : std_logic;
  signal dm_data, dm_addr, dm_wData : std_logic_vector(31 downto 0);
  signal PC : std_logic_vector(31 downto 0);
  signal regA, regB, wData : std_logic_vector(31 downto 0);
  signal selA, selB, selW : std_logic_vector(3 downto 0);
  signal aluCtl : std_logic_vector(2 downto 0);
  signal regWrite, regDest, regClk, dm_write, aluSrc, memtoreg : std_logic;
  signal aluA, aluB, aluRes : std_logic_vector(31 downto 0);
  signal branch, branchI, zero : std_logic := '0';
  signal braAddr : std_logic_vector(15 downto 0);
  signal op_code, func : std_logic_vector(5 downto 0);
begin
  IM   : entity work.dataMem port map(im_data, im_addr, im_wData, im_wrEn, im_clk);
  DM   : entity work.dataMem port map(dm_data, dm_addr, dm_wData, dm_wrEn, dm_clk);
  RF   : entity work.regFile port map(regA, regB, selA, selB, wData, regWrite, selW, regClk);
  ALU  : entity work.ALU port map(aluA, aluB, aluCtl, aluRes);
  CTRL : entity work.control port map(op_code, func, stall, branchI, regDest,
                                      regWrite, aluSrc, aluCtl, dm_write, memtoreg);

  -- all clocks synced
  im_clk <= clk;
  dm_clk <= clk;
  regClk <= clk;

  -- allow testbench to initialize the instruction memory
  im_wData <= IMdata;
  im_wrEn  <= IMwrite;

  -- PC register: updated on the rising edge
  process(clk)
  begin
    if rising_edge(clk) then
      if stall = '1' then
        PC <= extPC;  -- load PC from the test bench
      elsif branch = '1' then
        -- branch offset is in words, relative to the branch instruction
        PC <= std_logic_vector(unsigned(PC) + unsigned(resize(signed(braAddr), 32)));
      else
        PC <= std_logic_vector(unsigned(PC) + 1);
      end if;
    end if;
  end process;

  braAddr <= im_data(15 downto 0);
  im_addr <= PC;
  aluA    <= regA;

  -- ALU input B: register or sign-extended immediate
  with aluSrc select
    aluB <= regB when '0',
            std_logic_vector(resize(signed(im_data(15 downto 0)), 32)) when '1',
            x"00000000" when others;

  -- destination register: Rt (I-type) or Rd (R-type), low 4 bits only
  with regDest select
    selW <= im_data(19 downto 16) when '0',
            im_data(14 downto 11) when '1',
            "0000" when others;

  -- while stalled, the test bench drives the data memory
  with stall select
    dm_addr <= DMaddr when '1',
               aluRes when others;
  with stall select
    dm_wData <= DMdata when '1',
                regB when others;
  with stall select
    dm_wrEn <= DMwrite when '1',
               dm_write when others;

  -- write-back: memtoreg = '1' selects the ALU result
  with memtoreg select
    wData <= aluRes when '1',
             dm_data when others;

  with aluRes select
    zero <= '1' when x"00000000",
            '0' when others;

  branch <= branchI and zero;

  op_code <= im_data(31 downto 26);
  func    <= im_data(5 downto 0);
  selA    <= im_data(24 downto 21);
  selB    <= im_data(19 downto 16);
end behavioral;
```

Listing 2: Test Bench Code

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity test_bench is
end test_bench;

architecture behavioral of test_bench is
  signal clk : std_logic;
  signal extPC, IMdata, DMdata, DMaddr : std_logic_vector(31 downto 0) := x"00000000";
  signal IMwrite, DMwrite, stall : std_logic := '0';
begin
  cpu : entity work.processor port map(extPC, IMdata, DMdata, IMwrite, DMwrite, DMaddr, stall, clk);

  -- clk process: 2 ns period
  clkgen: process
  begin
    clk <= '1';
    wait for 1 ns;
    clk <= '0';
    wait for 1 ns;
  end process;

  tester: process
  begin
    -- init values
    stall   <= '1';
    IMwrite <= '0';
    DMwrite <= '1';

    -- put some data into the DM
    DMdata  <= x"AAAAAAAA";
    DMwrite <= '1';
    DMaddr  <= x"00000001";
    wait for 2 ns;
    -- DMdata is unchanged, so address 2 also receives x"AAAAAAAA"
    DMwrite <= '1';
    DMaddr  <= x"00000002";
    wait for 2 ns;
    DMdata  <= x"CCCCCCCC";
    DMwrite <= '1';
    DMaddr  <= x"00000003";
    wait for 2 ns;
    DMdata  <= x"FFFFFFFF";
    DMwrite <= '1';
    DMaddr  <= x"00000004";
    wait for 2 ns;
    DMdata  <= x"00000000";
    DMwrite <= '1';
    DMaddr  <= x"00000005";
    wait for 2 ns;

    -- Now load program, start from address 1
    DMwrite <= '0';
    IMwrite <= '1';
    -- lw $1, 1($zero)
    extPC  <= x"00000001";
    IMdata <= b"10001100000000010000000000000001";
    wait for 2 ns;
    -- sw $1, 6($zero)
    extPC  <= x"00000002";
    IMdata <= b"10101100000000010000000000000110";
    wait for 2 ns;
    -- lw $2, 2($zero)
    extPC  <= x"00000003";
    IMdata <= b"10001100000000100000000000000010";
    wait for 2 ns;
    -- add $3, $1, $2
    extPC  <= x"00000004";
    IMdata <= b"00000000001000100001100000100000";
    wait for 2 ns;
    -- sub $4, $2, $1
    extPC  <= x"00000005";
    IMdata <= b"00000000010000010010000000110000";
    wait for 2 ns;
    -- beq $1, $2, 100
    extPC  <= x"00000006";
    IMdata <= b"00010000001000100000000001100100";
    wait for 2 ns;
    -- lw $2, 4($zero)
    extPC  <= x"00000007";
    IMdata <= b"10001100000000100000000000000100";
    wait for 2 ns;
    -- nand $5, $1, $2
    extPC  <= x"00000008";
    IMdata <= b"00000000001000100010100000000001";
    wait for 2 ns;
    -- andi $6, $2, 0x00FF
    extPC  <= x"00000009";
    IMdata <= b"00001000010001100000000011111111";
    wait for 2 ns;
    -- ori $7, $1, 0x00FF
    extPC  <= x"0000000A";
    IMdata <= b"00001100001001110000000011111111";
    wait for 2 ns;
    -- or $8, $1, $2
    extPC  <= x"0000000B";
    IMdata <= b"00000000001000100100000000000010";
    wait for 2 ns;
    -- beq $1, $1, -0x000B (branches back to address 1)
    extPC  <= x"0000000C";
    IMdata <= b"00010000001000011111111111110101";
    wait for 2 ns;

    -- Begin execution here
    wait for 2 ns;
    IMwrite <= '0';
    extPC   <= x"00000000";
    wait for 2 ns;
    stall   <= '0';

    -- allow enough time for processor to execute instructions
    wait for 100 ns;
    wait;  -- suspend the tester instead of restarting the load sequence
  end process;
end behavioral;
```