# Description

In this project you are going to get familiar with designing an out of order MIPS processor. The code for in-order processor with caches is provided. You are free to use your platform from the previous project if you made it pass all the test programs.<sup>1</sup> An illustrative example of the desired architecture is shown below:



| Fetch/Decode/Rename/Issue/Commit width | 1/1/1/1                    |
|----------------------------------------|----------------------------|
| Branch Prediction                      | Always not taken           |
| Decode Queue                           | 8-entry, FIFO              |
| Rename Queue                           | 8-entry, FIFO              |
| Renaming                               | Dual-RAT, No checkpointing |
| Issue Queue                            | 16-entry, compacting       |
| Load/Store Queue                       | 16-entry, FIFO             |
| Physical Register File                 | 64 entries                 |
| Functional Units                       | 1 ALU                      |
| ROB                                    | 64 entries                 |

# Requirements

In this project you are asked to complete the following tasks:

- Compile and run the provided code for all the test programs and immediately report if you see any problem.
- Convert the existing in-order code to out-of-order code.

 $<sup>^{1}</sup>$ If you do use your code from project 2, be sure to modify the Makefile to set PROJECT=4.

- Report all the issues regarding implementing out of order that you solve in your project (and how).
- Create a project report (details are enumerated below).

To test your design, you will run the test programs provided for this project. For full credit, the test applications in the list must all execute correctly. Substantial partial credit will be awarded for being able to get *any* test application to run correctly. Traditionally, noio is the easiest one to get working. Note that the cycle count is always higher than the instruction count. Please explain this discrepancy in your writeup, along with the cycle and instruction counts you observed.

| Application | Instruction Count |
|-------------|-------------------|
| noio        | 2081              |
| file        | 95215             |
| hello       | 95705             |
| class       | 97177             |
| sort        | 104924            |
| fact12      | 110820            |
| matrix      | 137286            |
| hanoi       | 201667            |
| ical        | 216479            |
| fib18       | 305749            |

## **Extra Credit**

This lab includes the opportunity to earn up to 200 bonus points for implementing optimizations to the required structures. (You can compare these relative to the project itself, which is worth 100 points.) Some of the possible things that you can do for bonus points include:

- Dual-issue superscalar processor
- Dynamic branch prediction
- Lock-up free/non-blocking caches
- Separate load and store queues, with load-store forwarding and speculation
- Checkpointing
- Exceptions
- Floating Point

Bonus points will be awarded based on the difficulty of the task and the quality of the implementation. For example, a full load-store forwarding scheme with speculation and a dependence predictor will be worth many more points than a simple forwarding scheme.

You can also earn the bonus points that were available in project 2. (Their values are the same as they were then, but because this project is worth more of your final grade, the point values have been adjusted.)

Four bonus points (2% of your total class grade) are available for implementing a shared L2 cache that is used by both L1 caches. It should be a 4MB cache that is 8-way set associative, with write-back and write-allocate policies. The block size is still 32 bytes, and the access latency is 3 cycles. Read requests that need to go to sim\_main will still need 9 additional cycles (12 cycles total). It is up to you to code the appropriate delays into your cache.

An additional four bonus points (2% of your total class grade) are available for implementing cache optimizations such as early restart and critical word first.

Finally, one bonus point (½% of your total class grade) is available for implementing Load Linked and Store Conditional, and running sim\_main with the -1 0 flag (to test the cpp programs without the LL/SC emulation).

Remember that you can only earn the bonus points for a given piece of extra credit once. (If you added an L2 to the cache project, and then also have an L2 in this project, you only get the bonus points once.)

Also, as a general policy, if you can come up with something else that you would like to add to the processor for bonus points (eg: exceptions, floating point, ...), please talk to Isaac. There will probably be bonus points for doing it.

### **Downloads**

## Verilog source

The source code for this project can be found on blackboard. The tarball includes the necessary c++ files (collectively called sim\_main) that emulate the operating system and memory, and are responsible for

driving your MIPS processor.

If you have trouble downloading the source tarball on an ECE machine, you can also use the following command to retrieve it:

```
cp /usr/ece/www/users/irichter/ece401/ece401-project3.tar.bz2 ~/
```

You should then find ece401-project3.tar.bz2 in your home directory.

To decompress the source tarball, you can use the following command:

```
tar xvjf ece401-project3.tar.bz2
```

After decompressing the source, you will have a directory named ece401-project1. In this directory will be the following contents:

#### verilog/

This folder contains the verilog source files for your processor. If you need to add any additional verilog source files, please put them in this folder.

#### sim\_main/

This folder contains the c++ source files for the simulator. You should not need to make any changes to them.

#### project3\_instructions.pdf

A copy of these instructions

#### decoder.csv

A list of MIPS instructions recognized by the decoder, and the flags that are set for each instruction. These flags are used by the Instruction Decode stage. This file is generated from decoder.v; changes made to it do not propagate.

#### Makefile

A make-compatible build script used to compile your processor. (See the Compile section below for how to use this file.)

## Verilator

You do not need to install Verilator on your system; the build system included with the product will download and compile a known-to-work version for you.

#### **Tests**

To test your processor, you will need compiled MIPS binaries; sim\_main accepts System V ELF<sup>2</sup>s. You can download and extract the tests automatically by running the following commands <sup>3</sup>:

```
#Enter the ece401-project3 folder
#If you are already in that folder, just
#skip to the next command.
cd ece401-project3

#Download and decompress the tests
make tests
```

<sup>&</sup>lt;sup>2</sup>ELFs are used on Linux the way the Portable Executables (EXEs) are used on Windows.

<sup>&</sup>lt;sup>3</sup>Alternatively, you can download the tests from: http://www.ece.rochester.edu/~irichter/ece401-tests.tar.xz and then run tar xvJf ece401-tests.tar.xz

You will then have a tests directory. Inside, there will be a cpp subfolder containing compiled versions of the c++ test applications. (These are the same test applications mentioned above in the requirements section.)

There will also be an asm subfolder containing various small tests intended to target specific issues that may be encountered while debugging your pipeline.

Each test will include the compiled ELF, a text file with the disassembly of the compiled version, and the source code used to generate the ELF.

## Compile

After getting the source files and decompressing them, you should have a directory named ece401-project3. To compile the source, enter this directory (type cd ece401-project3) and then run make. If necessary, the build system will download and compile Verilator; it will then compile your processor. If the build process does not complete successfully, there will probably be some error messages from Verilator.

By default, Verilator will treat warnings as fatal errors, and refuse to finish the compile if there are any warnings. You can edit the Makefile to tell Verilator that it should ignore the warnings and keep going. (There are comments in the makefile to indicate what should be changed.)

## Simulate

Once the build completes successfully, you should be able to run ./VMIPS, which is the compiled MIPS simulator using your verilog code. To run a test, you will need to tell VMIPS which test application to load. For example:

```
./VMIPS -f tests/cpp/noio #run the noio test
```

VMIPS supports many options. You can get a complete list by running ./VMIPS -h. Of particular interest will be the options below:

- -d [number] Number of cycles to simulate before halting. If, for example, you know that your simulation will run properly for 100 cycles, you can use this to speed through those first 100 cycles. If you do not want to stop after a certain number of cycles, set this to an extremely large positive number (1 million is good). After reaching the specified number of cycles, the simulator will drop into single-cycle mode.
- **-b** [number] A program counter to use as a breakpoint address (you can specify a hexadecimal number by prefixing it with "0x", eg: "0x13A4B2F4"). When the Instruction Fetch stage requests a given address, and that address matches this address, the simulator will drop into single-cycle mode, as if the cycle limit was reached. Once the cycle number specified by -d is reached, the simulator will pause, even if it hasnt reached this address. The simulator will also pause if the CPU attempts to execute an instruction at address 0x00000000.

Once the simulator reaches single-step mode, it will wait for further instruction. To step by a single cycle, just press enter. You can also provide another breakpoint address by specifying a new PC address. (As with the -b option, you can specify a hexadecimal address by prefixing it with "0x".)

sim\_main creates three output files when run:

#### stdout.txt

This contains anything that the test application wrote to standard output (file descriptor 1). sim\_main mirrors text output here in addition to outputting it to the terminal so that you can review it unencumbered by verbose \$display output. All test applications other than noio will contribute to this file.

#### stderr.txt

This contains anything that the test application wrote to standard error (file descriptor 2). Usually, attempts made to write to stderr are going to be due to bugs in the processor.

#### memwrite.txt

This will contain a list of reads and writes to main memory. Because sim\_main evaluates memory accesses twice per clock, each request will usually be shown twice.

#### cachewrite.txt

This will contain a list of reads and writes to the data cache. This is somewhat analogous to memwrite.txt, but is for requests that go to the caches. It makes use of the various <code>\_2DC</code> and <code>\_fDC</code> wires in MIPS.v, so you probably should be careful how they get modified. (To avoid compilation errors and ensure that the file contents are correct.)

## Some Advice

You may find the following advice useful for doing this project.

- 1. Start the project early; right now is a good time. Otherwise you won't be able to finish it before deadline.
- 2. Build the necessary data structures (FIFOs, RATs, Issue Queue, and ROB, Physical Register File) early, and extend them as needed
  - You will find base PhysRegFile and RAT modules. If you build off of them, sim\_main will provide correct register displays.
  - Note that there is already a RegRead module skeleton (which hosts an instance of a PhysRegFile).
  - Likewise, there is a RetireCommit module skeleton containing a RAT.
- 3. Iterate on modification and verification of the design using the provided test programs.

Note that sim\_main is configured by default to use the physical register file and R-RAT to obtain access the architectural registers. If you wish to run the in-order processor, you must run the following command:

```
make clean && PROJECT=3 make
```

When you are ready to switch back to the out-of-order processor, you will need to run:

```
make clean & make
```

- 4. Use \$display to print out messages on the screen as needed.
- 5. Take advantage of the asm test programs to diagnose why the pipeline may be malfunctioning.
- 6. You may reduce the total of code you need to write by creating parameterized modules<sup>4</sup>.
- 7. System calls still require that the pipeline be fully flushed.
- 8. Both partners should contribute.

### Turn-in

Your submission should be in the form of a tarball (.tar or .tar.bz2). You must include all files necessary to compile and run your processor, and the accompanying report (see below). To create the tarball, you can use a command like this:

```
#Enter the ece401-project3 folder
#If you are already in that folder, just
#skip to the next command.
cd ece401-project3

#Compress the Verilog files, your report,
#sim_main, and the Makefile
tar -cvjf usernames.tar.bz2 verilog/ report.pdf sim_main/ Makefile
```

<sup>&</sup>lt;sup>4</sup>See http://www.asic-world.com/verilog/para\_modules1.html to learn how to parameterize a module.

The tarball you create should be submitted via blackboard. If you are working with a partner, the submission should specify who the partner is. Only one submission is needed per group.

You must provide a project report explaining what you did, and explaining how you did it. You should specify what is and is not working. If something does not work, try to explain why. If you have made any changes to sim\_main, you must justify them in your report.

The report must be submitted in PDF format.

## **Plagiarism**

What you submit must be your own work (or that of your partner, if you are working in a two-person group). It is permissible to use general code snippets having nothing to do with MIPS or microprocessors (eg: bit twiddling methods or module templates) that are found online as long as you comment the code to attribute its source. You are also permitted (and encouraged) to discuss ideas with other groups, but you must not share code to avoid accidental appropriation.