# Single Cycle Datapath Processor using MIPS

# Piero Morales

Computer Science University Student
University of Engineering and Technology
Lima, Peru
piero.morales@utec.edu.pe

Abstract—Two undergraduates implemented a 32-bit datapath based on the RISC MIPS architecture for a computer architecture course. This datapath supports R-type, I-type and J-type instructions. The work included designing the architecture in Verilog and developing test bench modules for the implementation.

*Index Terms*—Computer architecture, RISC, Verilog, processor, big endian, microprocessor without interlocked pipeline stages

## I. INTRODUCTION

The form, design, and implementation of CPUs have changed over the course of their history, but their fundamental operation remains almost unchanged. The CPU has become the nerve center of any computer, from mobile devices to supercomputers. Since the beginning of the computer era, scientists have tried to improve processor performance not only by increasing the number of transistors, but also by improving the instructions that the processor executes. A major change for CPUs was the shift from single-core to multi-core designs, which significantly increased their performance. With this shift, Moore's law, which until then had traced the future of processors, was set aside.

## II. METHODOLOGY

For this project we use a Hardware Description Language (HDL) to design and simulate our processor and all its related components. The datapath was coded in Verilog. We chose Verilog [2] as the HDL because it is widely used in industry. The design was simulated and tested using test bench modules.

The goal of this project is to achieve a better understanding of the MIPS single-cycle datapath and to implement it, focusing on basic integer operations and covering R-type, I-type and J-type instructions of the 32-bit MIPS ISA:

TABLE I R Type

| Instructions       |                   |                     |
|--------------------|-------------------|---------------------|
| ADD                | Subtraction (SUB) | AND                 |
| NOR                | OR                | Set Less Than (SLT) |
| Jump Register (JR) |                   |                     |

# Angel Motta

Computer Science University Student
University of Engineering and Technology
Lima, Peru
angel.motta@utec.edu.pe

TABLE II I Type

| Instructions          |                                |                                                |
|-----------------------|--------------------------------|------------------------------------------------|
| Add Immediate (ADDI)  | Subtraction Immediate (SUBI)   | AND Immediate (ANDI)                           |
| OR Immediate (ORI)    | Set Less Than Immediate (SLTI) | Store Byte (SB)                                |
| Store Halfword (SH)   | Store Word (SW)                | Load Byte (LB)                                 |
| Load Halfword (LH)    | Load Word (LW)                 | Load Upper Immediate (LUI)                     |
| Branch On Equal (BEQ) | Branch On Not Equal (BNEQ)     | Branch On Greater Than or Equal to Zero (BGEZ) |

TABLE III J Type

| Instructions |                     |
|--------------|---------------------|
| Jump (J)     | Jump and Link (JAL) |

### A. Datapath

To support all the instructions listed above, we need to implement the following components:

- Arithmetic Logic Unit (ALU), one of the core components of the processor, which performs addition, subtraction, comparison between two numbers, logic AND, logic OR and logic NOR.
- Instruction Memory, which stores all the instructions to be read and executed according to the selected address.
- Program Counter (PC), a register that holds the address of the instruction currently being executed.
- Register File, which stores the 32 registers of the MIPS ISA, each 32 bits wide.
- Data Memory, which stores the data to support load and store instructions.
- 2-to-1 Multiplexer, which selects one of its 2 inputs based on a selector signal, e.g. when selecting among the PC, the branch and the jump.
- Adder, which computes PC + 4 to point to the following instruction and is also used to add the offset for branch instructions.
- Shift Left 2 and Shift Left 16, used to calculate the offset for the branch and to load a number of up to 32 bits, respectively.
- Sign Extend, used to extend the most significant bit of the number.

In addition, the control component supports all the instructions by deciding which signals to activate depending on the type of instruction and the operation.
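As a rough illustration of what these components compute, the following C sketch models the ALU operations, the sign extension, the two shifters and the multiplexer behaviourally. It is not taken from Datapath.v: the names `alu_op_t`, `alu`, `sign_extend16`, `branch_target`, `lui_value` and `mux2` are hypothetical, the operation codes are arbitrary, and the branch target formula assumes the standard MIPS convention of adding the shifted offset to PC + 4.

```c
#include <stdint.h>

/* Hypothetical operation codes: the real ALU control encoding is not given. */
typedef enum { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_NOR, ALU_SLT } alu_op_t;

/* Behavioural model of the ALU operations listed in the text. */
uint32_t alu(alu_op_t op, uint32_t a, uint32_t b) {
    switch (op) {
    case ALU_ADD: return a + b;      /* wraps modulo 2^32 */
    case ALU_SUB: return a - b;      /* wraps modulo 2^32 */
    case ALU_AND: return a & b;
    case ALU_OR:  return a | b;
    case ALU_NOR: return ~(a | b);
    case ALU_SLT:
        /* Signed comparison: flipping the sign bit maps two's complement
           order onto unsigned order. */
        return ((a ^ 0x80000000u) < (b ^ 0x80000000u)) ? 1u : 0u;
    }
    return 0u;
}

/* Sign Extend: copy bit 15 of the 16-bit immediate into bits 31..16. */
uint32_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : (uint32_t)imm;
}

/* Shift Left 2 plus Adder (assumed standard MIPS branch target):
   (PC + 4) + (sign-extended offset << 2). */
uint32_t branch_target(uint32_t pc, uint16_t offset) {
    return (pc + 4u) + (sign_extend16(offset) << 2);
}

/* Shift Left 16, as needed by LUI: the immediate becomes the upper halfword. */
uint32_t lui_value(uint16_t imm) {
    return (uint32_t)imm << 16;
}

/* 2-to-1 multiplexer: returns in1 when sel is non-zero, otherwise in0. */
uint32_t mux2(uint32_t in0, uint32_t in1, int sel) {
    return sel ? in1 : in0;
}
```

For example, `sign_extend16(0xFFFC)` yields `0xFFFFFFFC` (that is, -4), so under the stated assumption a branch with that offset located at PC `0x100` targets `0xF4`.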

The structure of the datapath, including all the components, is shown in Fig. 1.



Fig. 1. Datapath.

### B. Verilog

The Verilog implementation is in a single file called Datapath.v, which encodes the instruction set architecture. Three different files, each called instruction.txt, are used together with a file called Test\_datapath.v to test and validate the correct functionality of the implementation. In total the implementation is composed of the following 5 files:

```
Datapath.v (implementation)
Test_datapath.v (testing module)
instruction.txt (for R-type)
instruction.txt (for I-type)
instruction.txt (for J-type)
```

## III. EXPERIMENTAL SETUP

To test our datapath, we first test each component separately to isolate problems. Once all the components pass their respective tests, we continue with a complete test of the datapath. To do so, we group the previous instructions into 3 files to be loaded into the instruction memory.

Each test bench file contains the instructions and their respective operands (registers). Beforehand, for testing purposes, we load the register file with random values (Table IV).

In each test bench the clock signal stays high for 10 nanoseconds and low for 10 nanoseconds, which gives 20 nanoseconds in total per clock cycle.

To get a realistic view of how the processor is used, we also run the C code shown in Fig. 3.

Its translation into MIPS instructions is shown in Fig. 4.

```
// Design Name : regfile
// File Name   : regfile_lab5.v
// Function    : Store the MIPS 32 registers
// Coder       : Raúl Mosquera Pumaricra
module regfile(in1, in2, in3, wreg, clk, sel, out1, out2);
  input in1;
  input in2;
  input in3;
  input clk;
  input wreg;
  input sel;
  output out1;
  output out2;

  wire sel;
  wire [4:0] in1, in2, in3;
  wire [31:0] wreg;
  wire [31:0] out1, out2;
  reg [31:0] memory[31:0];
  // (the rest of the module is not visible in the figure)
```

Fig. 2. Register File module in ModelSim.

TABLE IV REGISTERS

| Register number | Number in decimal notation | Number in hexadecimal notation |
|-----------------|----------------------------|--------------------------------|
| 16              | 49527                      | 0000C177                       |
| 17              | 63767                      | 0000F917                       |
| 18              | 31778                      | 00007C22                       |
| 19              | 23198                      | 00005A9E                       |
| 20              | 917                        | 00000395                       |
| 21              | 24182                      | 00005E76                       |
| 22              | 52687                      | 0000CDCF                       |
| 23              | 20726                      | 000050F6                       |
| 29              | 150                        | 00000096                       |

The other registers (1 to 15, 24 to 28, 30 and 31) remain at zero.
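This initial state can be written down as a small C model, which may be handy as a reference when checking simulation results. It is a sketch rather than the authors' regfile module: `regfile_t`, `regfile_init`, `regfile_read` and `regfile_write` are hypothetical names, and ignoring writes to register 0 is an assumption based on standard MIPS behaviour.

```c
#include <stdint.h>

#define NUM_REGS 32

/* Register file model: 32 registers of 32 bits each. */
typedef struct {
    uint32_t regs[NUM_REGS];
} regfile_t;

/* Load the values of Table IV; every other register is cleared to zero. */
void regfile_init(regfile_t *rf) {
    static const struct { int number; uint32_t value; } initial[] = {
        {16, 0x0000C177u}, {17, 0x0000F917u}, {18, 0x00007C22u},
        {19, 0x00005A9Eu}, {20, 0x00000395u}, {21, 0x00005E76u},
        {22, 0x0000CDCFu}, {23, 0x000050F6u}, {29, 0x00000096u},
    };
    const int count = (int)(sizeof initial / sizeof initial[0]);

    for (int i = 0; i < NUM_REGS; i++)
        rf->regs[i] = 0u;
    for (int i = 0; i < count; i++)
        rf->regs[initial[i].number] = initial[i].value;
}

/* Read a register; only the low 5 bits of the register number are used. */
uint32_t regfile_read(const regfile_t *rf, unsigned number) {
    return rf->regs[number & 31u];
}

/* Write a register. Assumption: register 0 is hard-wired to zero,
   so writes to it are ignored. */
void regfile_write(regfile_t *rf, unsigned number, uint32_t value) {
    number &= 31u;
    if (number != 0u)
        rf->regs[number] = value;
}
```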

Because the factorial function uses most of the implemented instructions as well as recursion, it will be our test bench 4.

To calculate the CPU time [5] (processing time) for each test bench we use the following formula:

$$\text{Time} = \text{PI} \times \text{CPI} \times \text{Time per Clock Cycle}$$

where PI is the number of program instructions executed and CPI is the number of clock cycles per instruction.
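A minimal C sketch of this formula follows. The function and macro names are hypothetical. The 20 ns clock period comes from the test bench setup, and CPI = 1 is an assumption that holds for a single-cycle datapath, where each instruction completes in one clock cycle.

```c
#include <stdint.h>

/* Clock period from the test bench setup: 10 ns high + 10 ns low. */
#define CLOCK_PERIOD_NS 20u

/* Assumption: a single-cycle datapath completes one instruction per cycle. */
#define SINGLE_CYCLE_CPI 1u

/* CPU time in nanoseconds = instructions * CPI * time per clock cycle. */
uint64_t cpu_time_ns(uint64_t instructions, uint64_t cpi, uint64_t period_ns) {
    return instructions * cpi * period_ns;
}
```

With these values, a program that executes 11 instructions takes 11 × 1 × 20 = 220 nanoseconds.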

## IV. EVALUATION

We executed test benches 1, 2, 3 and 4, with the following results:

TABLE V TESTBENCH 1

| Instructions         |                                |                      |
|----------------------|--------------------------------|----------------------|
| ADD                  | Subtraction (SUB)              | AND                  |
| NOR                  | OR                             | Set Less Than (SLT)  |
| Add Immediate (ADDI) | Subtraction Immediate (SUBI)   | AND Immediate (ANDI) |
| OR Immediate (ORI)   | Set Less Than Immediate (SLTI) |                      |

TABLE VI TESTBENCH 2

| Instructions               |                     |                 |
|----------------------------|---------------------|-----------------|
| Store Byte (SB)            | Store Halfword (SH) | Store Word (SW) |
| Load Byte (LB)             | Load Halfword (LH)  | Load Word (LW)  |
| Load Upper Immediate (LUI) |                     |                 |

TABLE VII TESTBENCH 3

| Instructions          |                            |                                                |
|-----------------------|----------------------------|------------------------------------------------|
| Branch On Equal (BEQ) | Branch On Not Equal (BNEQ) | Branch On Greater Than or Equal to Zero (BGEZ) |
| Jump (J)              | Jump and Link (JAL)        | Jump Register (JR)                             |

Since we need to simulate the branches and the jumps between instructions, additional instructions were added, namely ADDI and SUB.

We ran the test bench in intervals of 100 nanoseconds, as can be seen in Fig. 5. For test bench 1 we get 100 nanoseconds + 100 nanoseconds + 20 nanoseconds, which gives 220 nanoseconds in total.

We apply the same procedure to test bench 2 (Fig. 8), and the result was 100 nanoseconds + 40 nanoseconds = 140 nanoseconds.

In test bench 3 (Fig. 9) we got 100 nanoseconds + 100 nanoseconds + 100 nanoseconds + 40 nanoseconds, for a total of 340 nanoseconds.

Finally, for test bench 4 (Fig. 6) we use intervals of 500 nanoseconds since the execution takes longer. We get 2610 nanoseconds in total, and the output of the factorial of 10 was 0x00375F00, which in decimal notation corresponds to 3628800.

Comparing the results of Fig. 5, Fig. 8 and Fig. 9 with Table VIII, we get the same number of clock cycles and the same time for each file. The results of the instructions are also as expected.

## V. CONCLUSION

- A team of 2 undergraduates designed, implemented and tested a 32-bit MIPS processor. The implementation was completed as part of a semester-long Computer Architecture course.
- This implementation of the single-cycle datapath is a close replica of the original from the early days of RISC architecture. Nowadays this approach shows performance limitations because it executes 1 instruction per cycle, and this implementation does not use pipelining to improve performance.
- The implementation successfully passed all test benches, including a factorial program that used many components of the architecture.

```
int factorial(int n) {
    if (n < 1)
        return 1;
    else
        return n * factorial(n - 1);
}

int variable;

int main(void) {
    variable = factorial(10);
    return 0;
}
```

Fig. 3. Factorial function - C code.

| Instruction | Operand 1 | Operand 2 | Operand 3 |
|-------------|-----------|-----------|-----------|
| ADDI        | \$a0      | \$0       | 10        |
| JAL         | factorial |           |           |
| ADD         | \$s0      | \$v0      | \$0       |
| SUBI        | \$sp      | \$sp      | 8         |
| SW          | \$a0      | \$sp      | 0         |
| SW          | \$ra      | \$sp      | 4         |
| SLTI        | \$t0      | \$a0      | 1         |
| BEQ         | \$t0      | \$0       | label1    |
| ADDI        | \$v0      | \$0       | 1         |
| ADDI        | \$sp      | \$sp      | 8         |
| JR          |           | \$ra      |           |
| SUBI        | \$a0      | \$a0      | 1         |
| JAL         | factorial |           |           |
| LW          | \$a0      | \$sp      | 0         |
| LW          | \$ra      | \$sp      | 4         |
| ADDI        | \$sp      | \$sp      | 8         |
| MULTI       | \$v0      | \$v0      | \$a0      |
| JR          |           | \$ra      |           |

Fig. 4. Factorial function - MIPS.
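The number of instructions that this program executes can be estimated by walking the listing once per recursive call. The C sketch below does this under two assumptions that Fig. 4 does not state explicitly: `label1` is taken to be the second SUBI instruction on \$a0 (the recursive case), and there are no branch delay slots. The function names are hypothetical.

```c
/* Instructions executed by one call of the factorial routine of Fig. 4,
   including everything executed by its nested calls. */
unsigned factorial_call_instructions(unsigned n) {
    if (n < 1) {
        /* Base case: SUBI, SW, SW, SLTI, BEQ (not taken), ADDI, ADDI, JR. */
        return 8u;
    }
    /* Recursive case:
       before the nested call: SUBI, SW, SW, SLTI, BEQ (taken), SUBI, JAL = 7
       after it returns:       LW, LW, ADDI, MULTI, JR                  = 5 */
    return 7u + factorial_call_instructions(n - 1u) + 5u;
}

/* Whole program: ADDI and JAL before the first call, ADD after it returns. */
unsigned program_instructions(unsigned n) {
    return 2u + factorial_call_instructions(n) + 1u;
}
```

For n = 10 this gives 2 + (10 × 12 + 8) + 1 = 131 executed instructions, which agrees with the expected count for test bench 4 in Table VIII.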

## VI. COMMENTS

- When simulating a component in ModelSim, no warnings should appear when the simulation starts. If they do, there is some error or unexpected behaviour. One common issue is connecting a wire or register to a module input or output of a different width.
- In ModelSim, if an identifier is not declared, Verilog assumes it is a wire.
- The Verilog compiler does not warn you that an instantiated module does not exist until you simulate it.
- One common problem is assuming that the code in the datapath components executes sequentially. That is not correct, since each always @ block may be triggered on the rising edge or on the falling edge of the clock.
- For those who are used to the conditional statements of programming languages, Verilog is a little difficult to use at first, because at the digital circuit level there are only AND, OR, XOR and the other gates.
- To find errors during the testing phase, we can navigate through the modules in the Objects window of ModelSim to locate the issue.

TABLE VIII CLOCK CYCLES

|              | Total instructions | Total executed instructions (Expected) | Clock Cycles | CPU Time (Nanoseconds) |
|--------------|--------------------|----------------------------------------|--------------|------------------------|
| Test bench 1 | 11                 | 11                                     | 11           | 220                    |
| Test bench 2 | 7                  | 7                                      | 7            | 140                    |
| Test bench 3 | 22                 | 17                                     | 17           | 340                    |
| Test bench 4 | 18                 | 131                                    | 131          | 2620                   |

Fig. 5. Execution results for test bench 1.

## REFERENCES

- [1] MIPS.com. (2016). MIPS® Architecture for Programmers Volume II-A: The MIPS32® Instruction Set Manual. [online] Available at: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf [Accessed 27 Nov. 2018].
- [2] IEEE Standard for Verilog Hardware Description Language. IEEE Standard 1364-2005 (Revision of IEEE Standard 1364-2001). http://dx.doi.org/10.1109/IEEESTD.2006.99495, 2006. Last access 26 November 2018.
- [3] Mentor.com. (2018). ModelSim PE Student Edition. [online] Available at: https://www.mentor.com/company/higher\_ed/modelsim-student-edition [Accessed 27 Nov. 2018].
- [4] Ashenden, P. (2008). Digital Design: An Embedded Systems Approach Using Verilog. Burlington, MA: Elsevier Science, pp. 22, 23.
- [5] Patterson, D., Hennessy, J. and Alexander, P. (2012). Computer organization and design. 4th ed. Waltham, Mass: Morgan Kaufmann, p. 35.
- [6] Wolframalpha.com (2018). Wolfram|Alpha: Making the world's knowledge computable. [online] Wolframalpha.com. Available at: https://www.wolframalpha.com/input/?i=factorial+10 [Accessed 28 Nov. 2018].

Fig. 6. Execution results of the test bench for the factorial.



Fig. 7. Wolfram Alpha - Factorial of 10. [6]

Fig. 8. Execution results for test bench 2.



Fig. 9. Execution results for test bench 3.