**CS 4290: Advanced Computer Organization**

**Project 3**

**Due: 07/21/2020, 11:55 pm, EST (via Canvas)**

**Important policies:**

1. Sharing of code between students is viewed as cheating and will receive appropriate action in accordance with University policy.
2. It is acceptable for you to compare your results, and only your results, with other students to help debug your program. It is not acceptable to collaborate either on the code development or on the final experiments.
3. All work must be done in the C or C++ programming language, written according to the C99 or C++11 standards and using only the standard libraries.
4. Unfortunately, experience has shown that there is a very high chance that there are errors in this project description. The online version will be updated as errors are discovered. It is your responsibility to check the website often and download new versions of this project description as they become available.

**Project Description**

On the surface, cache coherence seems straightforward: all caches simply must see all operations on a piece of data in the same order. Implementing coherence, however, is not so simple. In this project, you will create a simulator that maintains coherent caches for 4-, 8-, and 16-core CMPs. You will implement the **MSI**, **MOSI**, **MESI**, and **MOESI** protocols for a bus-based broadcast system.

**Specification of Simulator**

A simulator will be provided that is capable of simulating 4-, 8-, and 16-core CMP systems. Each core consists of a memory trace reader, which reads the trace files provided to you. The trace reader code is also provided.

Each core in the CMP has one level of cache. The cache is fully associative, has infinite size, and has a single-cycle lookup time. The base cache code is provided for you. You will only need to implement the protocol files needed to process requests at the cache (described later). The CMP has a single memory controller which can access the off-chip memory. This memory controller is provided for you and will respond to any query (GETS or GETM) placed on the bus with data after a 100-cycle delay.

The bus modeled is an atomic bus. This means that once a query is placed on the bus, the bus will not allow any other requests onto the bus until a DATA response is seen. Caches request the bus using the bus\_request() function. If the bus is not available, it will place the request on an arbitration queue to be scheduled in the future. This is done on a first come first served basis, with node 0 having the highest priority and node N having the lowest priority.
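The atomic-bus behavior described above can be pictured with a small standalone sketch. This is not the framework's bus: the names ToyAtomicBus, ToyBusMsg, and ToyMsgKind are hypothetical, and the sketch leaves out the node-number tie-break for requests that arrive in the same cycle.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical message kinds; the framework's own enum may differ.
enum class ToyMsgKind { GETS, GETM, DATA };

struct ToyBusMsg {
    ToyMsgKind kind;
    int src_node;
    std::uint64_t addr;
};

// Toy atomic bus: at most one query (GETS/GETM) is in flight at a time.
class ToyAtomicBus {
public:
    ToyAtomicBus() : busy_(false), current_() {}

    // Returns true if msg was placed on the bus now, false if it was queued.
    bool request(const ToyBusMsg& msg) {
        if (msg.kind == ToyMsgKind::DATA) {
            // DATA answers the in-flight query, so the bus is released
            // and the oldest waiting query (if any) is granted.
            busy_ = false;
            grant_next();
            return true;
        }
        if (busy_) {
            pending_.push_back(msg);  // first come, first served
            return false;
        }
        busy_ = true;
        current_ = msg;
        return true;
    }

    bool busy() const { return busy_; }
    // Only meaningful while busy() is true.
    const ToyBusMsg& current() const { return current_; }
    std::size_t queued() const { return pending_.size(); }

private:
    void grant_next() {
        if (pending_.empty()) return;
        current_ = pending_.front();
        pending_.pop_front();
        busy_ = true;
    }

    bool busy_;
    ToyBusMsg current_;
    std::deque<ToyBusMsg> pending_;
};
```

In this sketch a GETM issued while a GETS is in flight waits in the queue and is granted only once a DATA message has been seen.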

Each processor (trace reader) will have at most one outstanding memory request at a time. The processor will send a request to the cache and will wait until the cache responds with a DATA message. The cache responds using the send\_DATA\_to\_proc() function.

We have provided a full simulator framework in C++. This framework creates the simulator, reads in the traces, creates a basic cache structure, and creates the memory controller. All of this code is in the sim/ directory of the downloadable code. **You should not need to change ANY code in the sim/ directory.**

The protocols/ directory contains files you may need to modify. Most notably, you need to implement the protocol files. When a request comes from a processor to the cache, the cache finds the entry and then calls **process\_cache\_request()** in the protocol. It is in this function that you should look at the cache entry’s state and decide what messages (if any) should be sent, and what state the cache should transition to. When a request is snooped on the bus, the cache finds the entry and then calls **process\_snoop\_request()** in the protocol. It is in this function that you should look at the cache entry’s state and decide what messages (if any) should be sent, and what state the cache should transition to.
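The "look at the state, then decide on a message and a next state" pattern can be sketched outside the framework as follows. This is a textbook-style MSI decision table using stable states only, not the reference implementation. All type and function names here are hypothetical and do not match the framework's signatures. A working protocol also needs transient states for the period between sending a query and seeing DATA.

```cpp
// Stable MSI states only; a real implementation also needs transient
// states for the window between sending a request and seeing DATA.
enum class MsiState { I, S, M };
enum class ProcOp { Load, Store };
enum class BusQuery { None, GETS, GETM };

struct ProcAction {
    BusQuery send;  // query to place on the bus, if any
    bool hit;       // true if the access completes without using the bus
};

// Decision for a request coming from the processor.
ProcAction toy_on_processor_request(MsiState s, ProcOp op) {
    switch (s) {
    case MsiState::I:
        return { op == ProcOp::Load ? BusQuery::GETS : BusQuery::GETM, false };
    case MsiState::S:
        if (op == ProcOp::Load) return { BusQuery::None, true };
        // No upgrade message is available, so a store in S issues a GETM.
        return { BusQuery::GETM, false };
    case MsiState::M:
        return { BusQuery::None, true };
    }
    return { BusQuery::None, false };  // not reached for valid states
}

struct SnoopAction {
    MsiState next;     // state after observing the other cache's query
    bool supply_data;  // true if this cache should put DATA on the bus
};

// Decision for a query snooped from another cache.
SnoopAction toy_on_snoop(MsiState s, BusQuery q) {
    if (q == BusQuery::None) return { s, false };
    switch (s) {
    case MsiState::M:
        return { q == BusQuery::GETS ? MsiState::S : MsiState::I, true };
    case MsiState::S:
        return { q == BusQuery::GETM ? MsiState::I : MsiState::S, false };
    case MsiState::I:
        break;
    }
    return { MsiState::I, false };
}
```

Keeping the two decisions in separate functions mirrors the split between processor-side and snoop-side handling described above.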

To help you in understanding the framework, the MI protocol is already completed and given to you. You MUST fill in the following files: **MSI\_protocol.h/.cpp**, **MOSI\_protocol.h/.cpp**, **MESI\_protocol.h/.cpp**, **MOESI\_protocol.h/.cpp**.

In order to interface properly with the Simulator, do not change any of the class names or delete any functions. **You may however need to add additional functions, states, and/or messages** in order to complete the assignment. If you add additional states, they need to be placed in both the enum in the header file, as well as the array in the protocol’s dump() function.
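One way to keep a state enum and its name array in step is sketched below. The enum, the array, and the helper are hypothetical stand-ins rather than the framework's declarations. The sentinel entry TOY\_NUM\_STATES is an added assumption that lets the compiler check the two lists against each other.

```cpp
#include <string>

// Hypothetical state list; TOY_NUM_STATES must stay last.
enum ToyState { TOY_I = 0, TOY_IS, TOY_S, TOY_M, TOY_NUM_STATES };

// Names in the same order as the enum above.
static const char* const kToyStateNames[] = { "I", "IS", "S", "M" };

// Fails to compile if a state is added to the enum but not to the name table.
static_assert(sizeof(kToyStateNames) / sizeof(kToyStateNames[0]) == TOY_NUM_STATES,
              "state name table is out of sync with the enum");

// Returns the printable name of a state, or "?" for an out-of-range value.
std::string toy_state_name(ToyState s) {
    const int i = static_cast<int>(s);
    return (i >= 0 && i < TOY_NUM_STATES) ? kToyStateNames[i] : "?";
}
```

With a check like this, forgetting to update the name array becomes a compile error instead of a wrong or out-of-bounds entry in the dumped output.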

**Important Notes About Simulator Assumptions:**

1. All requests that are not DATA (GETS and GETM) always expect to have someone reply with DATA. To ensure this, the memory will always respond 100 cycles after the request with DATA unless another cache places DATA on the bus first.

   There are cases in the traditional protocols where certain messages do not expect replies (e.g. Bus\_Upgrade). These types of messages are not supported by the bus and memory, so you cannot use them. Instead, you should always send a query that expects a data response (GETS or GETM). This creates situations where the cache sending the GETS or GETM may be the one that should supply the data. In these cases, the cache should simply send DATA to itself on the bus. Given the difficulties in identifying your own GETS or GETM request, the validations assume that you only reply when you are already responsible for supplying the data.

2. In general, there is more than one way to implement each protocol. For this project, the reference implementations use simple logic and favor cache-to-cache transfers, while minimizing execution time.
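The rule that memory answers after 100 cycles unless a cache places DATA on the bus first can be expressed as a small calculation. This is only a sketch. ToyReply, toy\_resolve\_reply, and the cache\_delay parameter are hypothetical, and the framework decides the actual timing of cache replies.

```cpp
#include <cstdint>

const std::uint64_t kMemoryLatencyCycles = 100;  // stated in the text above

struct ToyReply {
    std::uint64_t data_cycle;  // cycle at which DATA appears on the bus
    bool from_cache;           // true counts as a cache-to-cache transfer
};

// cache_delay is a hypothetical number of cycles an owning cache needs to
// respond; it is ignored when no cache is responsible for the block.
ToyReply toy_resolve_reply(std::uint64_t request_cycle,
                           bool cache_owns_block,
                           std::uint64_t cache_delay) {
    if (cache_owns_block && cache_delay < kMemoryLatencyCycles) {
        return { request_cycle + cache_delay, true };
    }
    return { request_cycle + kMemoryLatencyCycles, false };
}
```

The from\_cache flag corresponds to the cache-to-cache transfer count, which is why favoring cache replies tends to shorten total run time.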

**How to run:**

In the root directory, type make and press Enter. This should build an executable in the root directory called **sim\_trace**.

Run the simulator using:

./sim\_trace -t trace\_directory -p protocol

As an example, right after download you can test your install by running:

./sim\_trace -t traces/4proc\_validation/ -p MI

The trace\_directory argument is the directory containing the trace you want to run. A trace directory consists of a trace file for each core in the machine and a config file that contains the number of cores for this trace. Each line in a trace file denotes one memory access and includes the action (read or write) and the address.
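The provided trace reader already handles the trace files, so no parsing code is needed for the project. Purely as an illustration of "one access per line", the sketch below parses a line under an assumed format: a single letter r or w followed by a hexadecimal address. The handout does not specify that format, and the names TraceAccess and parse\_trace\_line are hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

struct TraceAccess {
    bool is_write;
    std::uint64_t addr;
};

// Parses one line of the ASSUMED form "<r|w> <hex address>".
// Returns false if the line does not match that form.
bool parse_trace_line(const std::string& line, TraceAccess& out) {
    std::istringstream in(line);
    std::string op, addr_text;
    if (!(in >> op >> addr_text)) return false;

    if (op == "r" || op == "R") {
        out.is_write = false;
    } else if (op == "w" || op == "W") {
        out.is_write = true;
    } else {
        return false;
    }

    try {
        // Base 16 accepts an optional "0x" prefix.
        out.addr = std::stoull(addr_text, nullptr, 16);
    } catch (const std::exception&) {
        return false;  // not a valid hexadecimal number
    }
    return true;
}
```

Check the actual trace files before relying on any format assumption like this one.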

The protocol argument is the protocol you want to run. The options supported by the framework are:

* MI (already implemented)
* MSI (need to implement)
* MOSI (need to implement)
* MESI (need to implement)
* MOESI (need to implement)
* MOESIF (no need to implement)

**Statistics (output)**

The simulator outputs the following statistics after completion of the run:

1. Final cache coherence state (Already output by framework)
2. Number of cycles to complete execution (Already output by framework)
3. Number of cache misses (A miss can be either a cold miss or a coherence miss)
4. Number of cache accesses (Already output by framework)
5. Number of “silent upgrades” for the MESI protocol and its extensions
6. Number of Cache-to-cache transfers (This refers to the number of times data is not supplied by Memory)

**Validation**

Inside of each trace directory you will find multiple text files for the validation runs of each protocol. You should perform experiments on all of the provided experiment traces.

**Experiments**

1. For each program (trace) individually, which protocol (MSI, MESI, MOSI, MOESI) would you recommend, and why?
   1. Summarize key results/take-aways in an intuitive manner using appropriate data visualization techniques (such as plots and tables) to explain your reasoning.
   2. Hint: Compare each of the provided programs using the various protocols you implemented (MSI, MESI, MOSI, MOESI).
   3. Hint: Using the statistics above (and any other information you deem necessary), reason out why certain protocols perform better for certain traces.
2. If you were to architect a system where all the provided programs were equally important, which protocol (MSI, MESI, MOSI, MOESI) would you use, and why?
   1. Summarize key results/take-aways in an intuitive manner using appropriate data visualization techniques (such as plots and tables) to explain your reasoning.
   2. Hint: Recall lessons learnt on aggregation and performance evaluation in the early part of the course.
   3. Hint: Note that the traces have a mix of 4-core, 8-core and 16-core configurations. You may choose to propose one protocol for all three configurations, or one for each. Justify your reasoning.
3. What are the limitations of the simulator? What are some of the enhancements needed to make it more realistic? Limit your answer to a couple of paragraphs.
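For question 2 above, one common way to aggregate results across traces is the geometric mean of execution times normalized to a baseline protocol. The sketch below assumes that choice; it is not prescribed by this handout, and the function name and inputs are hypothetical. Whichever aggregation you use, justify it in your report.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of per-trace cycle counts normalized to a baseline protocol.
// cycles[i] and baseline[i] must refer to the same trace and be positive.
// Returns 0.0 if the inputs are empty or differ in length.
double toy_geomean_normalized(const std::vector<double>& cycles,
                              const std::vector<double>& baseline) {
    if (cycles.empty() || cycles.size() != baseline.size()) return 0.0;
    double log_sum = 0.0;
    for (std::size_t i = 0; i < cycles.size(); ++i) {
        log_sum += std::log(cycles[i] / baseline[i]);
    }
    return std::exp(log_sum / static_cast<double>(cycles.size()));
}
```

A result below 1.0 means the protocol finishes faster than the baseline on average across the traces, with each trace weighted equally regardless of its length.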

**Deliverables**

What to hand in via Canvas:

* **<gtusername>\_prj3.tar.gz**
  + This should be an archive that contains exactly one folder and one file in its root.
  + The commented source code for the protocols added to the simulator program. This is the protocols/ folder from your completed implementation.
  + A report (.pdf) that contains the design results as required by the Experiments section above.

Remember that your code must compile and run on a current variant of Linux (i.e., Debian, Red Hat, Ubuntu) running on an x86 architecture (i.e., Intel or AMD).

Late submissions will be penalized 25% per day.

**Grading Rubric**

0% You do not hand in anything

+50% Your simulator does not run or does not work, but you hand in significant commented code

+35% Your simulator matches the validation outputs

+15% You completed the experimental section