# **Group member summaries:**

#### David Michelman:

David wrote most of the pipeline code (though he did less of the debugging than Sean), wrote the pipeline documentation, and did some of the performance analysis.

#### Alex Zhao:

Wrote all of the cache implementation and the cache writeup.

#### Sean Rice:

Wrote a few of the pipeline functions. Primarily debugged and fixed issues and misconceptions in the pipeline stages. Made a script to run all tests for easier debugging. Did data analysis.

# **Implementation details:**

## Cache:

The cache is represented by 2^(index\_bits) dynamically allocated sets, each of which holds (cache\_assoc) cache entries stored in a dynamically allocated array. Each cache entry has a valid bit and a tag. The entry at array index 0 is the least recently used (LRU) entry, and the entry at index (cache\_assoc - 1) is the most recently used (MRU) entry.
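The layout above can be sketched in C as follows. This is a minimal illustration, not the project's code: the names cache\_entry\_t, cache\_t, cache\_create, and cache\_free are hypothetical, and tags are assumed to fit in 32 bits. Allocating each set with calloc starts every valid bit at 0.

```c
#include <stdint.h>
#include <stdlib.h>

/* One cache entry: a valid bit and a tag (hypothetical names). */
typedef struct {
    int valid;
    uint32_t tag;
} cache_entry_t;

/* 2^index_bits sets, each an array of assoc entries ordered LRU..MRU. */
typedef struct {
    unsigned index_bits;
    unsigned assoc;
    cache_entry_t **sets;
} cache_t;

/* Release every set, the set table, and the cache itself. */
void cache_free(cache_t *c)
{
    if (c == NULL)
        return;
    if (c->sets != NULL) {
        size_t num_sets = (size_t)1 << c->index_bits;
        for (size_t i = 0; i < num_sets; i++)
            free(c->sets[i]);          /* sets not yet allocated are still zeroed by calloc */
        free(c->sets);
    }
    free(c);
}

/* Allocate 2^index_bits sets of assoc entries, all invalid. Returns NULL on failure. */
cache_t *cache_create(unsigned index_bits, unsigned assoc)
{
    cache_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->index_bits = index_bits;
    c->assoc = assoc;

    size_t num_sets = (size_t)1 << index_bits;
    c->sets = calloc(num_sets, sizeof *c->sets);
    if (c->sets == NULL) {
        free(c);
        return NULL;
    }
    for (size_t i = 0; i < num_sets; i++) {
        /* calloc zeroes the entries, so every valid bit starts at 0 */
        c->sets[i] = calloc(assoc, sizeof(cache_entry_t));
        if (c->sets[i] == NULL) {
            cache_free(c);
            return NULL;
        }
    }
    return c;
}
```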

When the simulator accesses the cache, it first extracts the index and tag from the address. It then searches the entire array of the set selected by that index for an entry with a matching tag and a valid bit of 1. On a hit, it shifts every entry after the hit position one slot toward index 0 and places the accessed entry in the last element of the array (the MRU position). On a miss, it shifts every entry in the array one slot toward index 0, which discards the LRU entry, and writes the new entry into the last index.
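The following sketch shows the lookup and LRU update described above for a single set. It assumes 32-bit byte addresses split as tag | index | offset. This is the usual layout, but the writeup does not spell it out. The helper names split\_address and set\_access are hypothetical. New entries always enter at the MRU end, so invalid entries collect at the LRU end and are the first ones discarded on a miss.

```c
#include <stdint.h>

/* One way of a set: valid bit plus tag (hypothetical names). */
typedef struct {
    int valid;
    uint32_t tag;
} cache_entry_t;

/* Split a byte address into set index and tag.
   Assumes offset_bits = 2 + log2(block size in words) and index_bits + offset_bits < 32. */
void split_address(uint32_t addr, unsigned index_bits, unsigned offset_bits,
                   uint32_t *index, uint32_t *tag)
{
    *index = (addr >> offset_bits) & ((1u << index_bits) - 1u);
    *tag = addr >> (offset_bits + index_bits);
}

/* Access one set kept in LRU order: set[0] is LRU, set[assoc - 1] is MRU.
   Returns 1 on a hit and 0 on a miss. */
int set_access(cache_entry_t *set, int assoc, uint32_t tag)
{
    for (int i = 0; i < assoc; i++) {
        if (set[i].valid && set[i].tag == tag) {
            cache_entry_t hit = set[i];
            /* Hit: shift the entries after i one slot toward the LRU end. */
            for (int j = i; j < assoc - 1; j++)
                set[j] = set[j + 1];
            set[assoc - 1] = hit;          /* accessed entry becomes MRU */
            return 1;
        }
    }
    /* Miss: shift everything toward index 0, discarding set[0] (the LRU entry). */
    for (int j = 0; j < assoc - 1; j++)
        set[j] = set[j + 1];
    set[assoc - 1].valid = 1;              /* new entry goes in the MRU slot */
    set[assoc - 1].tag = tag;
    return 0;
}
```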

## Pipeline:

The pipeline is represented by a five-element array of structs, each representing the instruction currently in that stage of the pipeline. The functions iplc\_sim\_process\_pipeline\_rtype, iplc\_sim\_process\_pipeline\_lw, iplc\_sim\_process\_pipeline\_sw, iplc\_sim\_process\_pipeline\_branch, iplc\_sim\_process\_pipeline\_jump, iplc\_sim\_process\_pipeline\_syscall, and iplc\_sim\_process\_pipeline\_nop each write a newly decoded instruction into the pipeline's entry slot, the array element at index FETCH (a value of the pipeline-stage enum).
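Here is a minimal sketch of the pipeline array. Only FETCH and the instruction kinds come from the writeup. The other stage names, the slot fields, and the helpers advance\_pipeline and process\_rtype are hypothetical stand-ins for iplc\_sim\_push\_pipeline\_stage and iplc\_sim\_process\_pipeline\_rtype. The sketch assumes each process function advances the pipeline before it fills the FETCH slot.

```c
#include <stdint.h>

/* Only FETCH is named in the writeup; the other stage names are assumed. */
enum pipeline_stage { FETCH, DECODE, ALU, MEM, WRITEBACK, NUM_STAGES };

/* Instruction kinds handled by the process_pipeline_* functions. */
enum instr_type { NOP, RTYPE, LW, SW, BRANCH, JUMP, SYSCALL };

/* One pipeline slot: the instruction currently in that stage (fields assumed). */
typedef struct {
    enum instr_type itype;
    uint32_t instruction_address;
    int dest_reg;                 /* -1 when the instruction writes no register */
} pipeline_slot_t;

/* The five-element pipeline array, indexed by stage. */
static pipeline_slot_t pipeline[NUM_STAGES];

/* Move every instruction one stage forward; the WRITEBACK occupant retires
   and FETCH is left holding a NOP. */
static void advance_pipeline(void)
{
    for (int s = WRITEBACK; s > FETCH; s--)
        pipeline[s] = pipeline[s - 1];
    pipeline[FETCH] = (pipeline_slot_t){ .itype = NOP, .instruction_address = 0,
                                         .dest_reg = -1 };
}

/* Hypothetical process function: advance, then fill the FETCH slot. */
void process_rtype(uint32_t address, int dest_reg)
{
    advance_pipeline();
    pipeline[FETCH].itype = RTYPE;
    pipeline[FETCH].instruction_address = address;
    pipeline[FETCH].dest_reg = dest_reg;
}
```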

Calling iplc\_sim\_push\_pipeline\_stage moves each instruction to the next stage of the pipeline and adds the cycles taken to pipeline\_cycles. Under normal operation, pipeline\_cycles is incremented by 1 on every call. On a data cache miss from an lw or sw instruction it is incremented by 10 (instruction cache misses are handled elsewhere). When data from an lw instruction must be forwarded to the instruction in the ALU stage, pipeline\_cycles is also incremented by 1 to simulate an inserted NOP. Forwarding that would not cause a delay in a physical processor adds no cycles. The function iplc\_sim\_push\_pipeline\_stage also increments pipeline\_cycles one additional time when a branch was mispredicted, simulating the inserted stall. If a branch was predicted correctly, correct\_branch\_predictions is incremented by 1.
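The cycle rules in the previous paragraph can be summarized as a per-push cost function. This is a sketch under stated assumptions. The names push\_event\_t, push\_cycles, and CACHE\_MISS\_DELAY are hypothetical. The real function inspects the pipeline array rather than a summary struct. We also read "incremented by 10" as a data miss costing 10 cycles in place of the usual 1.

```c
#include <stdbool.h>

#define CACHE_MISS_DELAY 10   /* assumed: a data miss makes a push cost 10 cycles */

/* Hypothetical summary of what happens during one push. */
typedef struct {
    bool mem_is_load_or_store;  /* an lw or sw is accessing the data cache */
    bool data_hit;              /* result of that data cache access */
    bool load_use_hazard;       /* lw result must be forwarded to the ALU-stage instruction */
    bool branch_resolved;       /* a branch outcome is known during this push */
    bool branch_taken;          /* actual outcome of that branch */
} push_event_t;

/* Return the cycles added by one push; count correct static predictions. */
unsigned long push_cycles(push_event_t ev, bool predict_taken,
                          unsigned long *correct_predictions)
{
    unsigned long cycles = 1;                  /* normal push: one cycle */

    if (ev.mem_is_load_or_store && !ev.data_hit)
        cycles = CACHE_MISS_DELAY;             /* data miss: 10 cycles instead of 1 */

    if (ev.load_use_hazard)
        cycles += 1;                           /* simulated NOP for lw forwarding */

    if (ev.branch_resolved) {
        if (ev.branch_taken == predict_taken)
            (*correct_predictions)++;          /* correct prediction: no penalty */
        else
            cycles += 1;                       /* misprediction: one stall cycle */
    }
    return cycles;
}
```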

## **Performance Summary:**

After running the tests, our data closely mirrored the expected trends. Increasing the associativity decreased the number of cache misses, which in turn decreased the pipeline cycles and the CPI. We decided that the best measure of performance is execution time, which is modeled by the equation:

Time = (Instructions \* CPI) / Clock Rate

Since we assume that all tests are run on the same machine, the clock rate is constant across tests and can be ignored. The instruction count (34753) is also constant across tests, so it can be ignored as well. This leaves performance tied directly to the CPI: as the CPI goes up, execution time goes up and performance suffers. This is why our graphs show the CPI. In the graphs below, cache miss rates went down as the associativity and block size went up.
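The CPI and execution time relationships can be checked directly against the summary output. In this sketch the function names are hypothetical. The 1 GHz clock in the test is an arbitrary placeholder because it cancels out when two runs are compared.

```c
/* CPI = total cycles / total instructions. */
double cpi(unsigned long cycles, unsigned long instructions)
{
    return (double)cycles / (double)instructions;
}

/* Time = (Instructions * CPI) / Clock Rate, in seconds for clock_hz in Hz. */
double exec_time(unsigned long instructions, double cpi_value, double clock_hz)
{
    return (double)instructions * cpi_value / clock_hz;
}

/* With instruction count and clock rate fixed, the speedup of run B over
   run A reduces to CPI_A / CPI_B. */
double speedup(double cpi_a, double cpi_b)
{
    return cpi_a / cpi_b;
}
```

For example, cpi(37118, 34753) is about 1.068052, which is configuration 9 with branches predicted not taken. Compared with configuration 1 with branches predicted taken, speedup(cpi(53120, 34753), cpi(37118, 34753)) is about 1.431.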



The other factor that affected performance was branch prediction, since each incorrect prediction adds a stall cycle. Predicting branches as not taken yielded far more correct predictions than predicting them as taken. Out of 7044 branches, not-taken prediction was correct between 5726 and 5749 times, compared with between 1279 and 1291 times for taken prediction, depending on the cache configuration. Below we can see that this lowers the CPI by a small but significant amount, roughly 0.128 in every configuration.
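In the summary output, for any given cache configuration, the cycle gap between the two prediction policies equals the gap in correct predictions. For configuration 9, 41576 - 37118 = 4458 = 5749 - 1291. This is consistent with a one-cycle penalty per misprediction. The sketch below scores a static predictor and converts the difference into a CPI reduction. The helper names count\_correct and cpi\_reduction are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count how often a static predictor (always taken or always not taken)
   matches the actual branch outcomes. */
size_t count_correct(const bool *taken, size_t n, bool predict_taken)
{
    size_t correct = 0;
    for (size_t i = 0; i < n; i++)
        if (taken[i] == predict_taken)
            correct++;
    return correct;
}

/* With one stall cycle per misprediction, switching from policy A to policy B
   removes (correct_b - correct_a) cycles; return the resulting CPI reduction. */
double cpi_reduction(size_t correct_a, size_t correct_b, size_t instructions)
{
    return ((double)correct_b - (double)correct_a) / (double)instructions;
}
```

For configuration 9, cpi\_reduction(1291, 5749, 34753) is about 0.1283. This matches the reported drop from 1.196328 to 1.068052.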







The fastest configuration we found used 4 index bits, a block size of 4 words, an associativity of 4, and not-taken branch prediction. Under these settings, the instruction trace took 37118 pipeline cycles, yielding a CPI of 1.068. The following table lists the nine cache configurations we tested; configuration 9 was the fastest.

| Configuration #    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    |
|--------------------|------|------|------|------|------|------|------|------|------|
| Index Bits         | 7    | 6    | 5    | 6    | 5    | 4    | 6    | 5    | 4    |
| Block Size (words) | 1    | 1    | 1    | 2    | 2    | 2    | 4    | 4    | 4    |
| Associativity      | 1    | 2    | 4    | 1    | 2    | 4    | 1    | 2    | 4    |
| Cache Size (bits)  | 7168 | 7296 | 7424 | 5632 | 5696 | 5760 | 9664 | 9728 | 9792 |
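The Cache Size row is consistent with counting, for every line, 32 data bits per word plus the tag plus one valid bit. This assumes 32-bit addresses and 4-byte words, so a block of 1, 2, or 4 words has 2, 3, or 4 offset bits, as in the BlockOffSetBits output. This bit-counting rule is our reconstruction from the numbers, and it reproduces all nine configurations. The name cache\_size\_bits is hypothetical.

```c
/* Total cache size in bits: lines * (data bits + tag bits + valid bit).
   Assumes 32-bit addresses and 4-byte words. */
unsigned long cache_size_bits(unsigned index_bits, unsigned block_words, unsigned assoc)
{
    unsigned offset_bits = 2;                    /* byte offset within a 4-byte word */
    for (unsigned w = block_words; w > 1; w >>= 1)
        offset_bits++;                           /* plus log2(block size in words) */

    unsigned tag_bits = 32 - index_bits - offset_bits;
    unsigned long lines = (1UL << index_bits) * assoc;
    return lines * (32UL * block_words + tag_bits + 1);
}
```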

### Full summary output for all runs:

#### Summary output for branches predicted taken:

```
Cache Configuration
   Index: 7 bits or 128 lines
  BlockSize: 1
  Associativity: 1
  BlockOffSetBits: 2
   CacheSize: 7168
Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 1390
    Number of Cache Hits is 34473
    Cache Miss Rate is 0.038759
Pipeline Performance
    Total Cycles is 53120
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1279
    CPI is 1.528501
Cache Configuration
   Index: 6 bits or 64 lines
  BlockSize: 1
  Associativity: 2
  BlockOffSetBits: 2
  CacheSize: 7296
Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 502
    Number of Cache Hits is 35361
    Cache Miss Rate is 0.013998
Pipeline Performance
    Total Cycles is 45324
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1280
    CPI is 1.304175
Cache Configuration
  Index: 5 bits or 32 lines
  BlockSize: 1
  Associativity: 4
  BlockOffSetBits: 2
   CacheSize: 7424
Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 363
    Number of Cache Hits is 35500
    Cache Miss Rate is 0.010122
Pipeline Performance
    Total Cycles is 44117
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1284
    CPI is 1.269444
Cache Configuration
   Index: 6 bits or 64 lines
   BlockSize: 2
  Associativity: 1
  BlockOffSetBits: 3
   CacheSize: 5632
 Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 1301
    Number of Cache Hits is 34562
    Cache Miss Rate is 0.036277
Pipeline Performance
    Total Cycles is 52307
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1290
    CPI is 1.505107
Cache Configuration
   Index: 5 bits or 32 lines
  BlockSize: 2
  Associativity: 2
  BlockOffSetBits: 3
   CacheSize: 5696
Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 357
    Number of Cache Hits is 35506
    Cache Miss Rate is 0.009955
Pipeline Performance
    Total Cycles is 44021
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1290
    CPI is 1.266682
Cache Configuration
   Index: 4 bits or 16 lines
   BlockSize: 2
  Associativity: 4
  BlockOffSetBits: 3
   CacheSize: 5760
Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 200
    Number of Cache Hits is 35663
    Cache Miss Rate is 0.005577
Pipeline Performance
    Total Cycles is 42664
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1290
    CPI is 1.227635
Cache Configuration
   Index: 6 bits or 64 lines
  BlockSize: 4
  Associativity: 1
  BlockOffSetBits: 4
  CacheSize: 9664
Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 770
    Number of Cache Hits is 35093
    Cache Miss Rate is 0.021471
Pipeline Performance
    Total Cycles is 47646
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1291
    CPI is 1.370990
Cache Configuration
  Index: 5 bits or 32 lines
  BlockSize: 4
  Associativity: 2
  BlockOffSetBits: 4
  CacheSize: 9728
Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 144
    Number of Cache Hits is 35719
    Cache Miss Rate is 0.004015
Pipeline Performance
    Total Cycles is 42131
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1291
    CPI is 1.212298
Cache Configuration
  Index: 4 bits or 16 lines
  BlockSize: 4
  Associativity: 4
  BlockOffSetBits: 4
  CacheSize: 9792
Cache Performance
    Number of Cache Accesses is 35863
    Number of Cache Misses is 78
    Number of Cache Hits is 35785
    Cache Miss Rate is 0.002175
Pipeline Performance
    Total Cycles is 41576
    Total Instructions is 34753
    Total Branch Instructions is 7044
    Total Correct Branch Predictions is 1291
    CPI is 1.196328
```

#### Summary output for branches predicted not taken:

```
Cache Configuration
  Index: 7 bits or 128 lines
  BlockSize: 1
  Associativity: 1
  BlockOffSetBits: 2
  CacheSize: 7168
 Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 1390
       Number of Cache Hits is 34473
       Cache Miss Rate is 0.038759
Pipeline Performance
       Total Cycles is 48669
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5730
       CPI is 1.400426
Cache Configuration
   Index: 6 bits or 64 lines
  BlockSize: 1
  Associativity: 2
  BlockOffSetBits: 2
  CacheSize: 7296
Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 502
       Number of Cache Hits is 35361
       Cache Miss Rate is 0.013998
Pipeline Performance
       Total Cycles is 40865
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5739
       CPI is 1.175870
Cache Configuration
  Index: 5 bits or 32 lines
  BlockSize: 1
  Associativity: 4
  BlockOffSetBits: 2
  CacheSize: 7424
Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 363
       Number of Cache Hits is 35500
       Cache Miss Rate is 0.010122
Pipeline Performance
       Total Cycles is 39657
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5744
       CPI is 1.141110
Cache Configuration
   Index: 6 bits or 64 lines
  BlockSize: 2
  Associativity: 1
  BlockOffSetBits: 3
  CacheSize: 5632
Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 1301
       Number of Cache Hits is 34562
       Cache Miss Rate is 0.036277
Pipeline Performance
       Total Cycles is 47871
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5726
       CPI is 1.377464
Cache Configuration
  Index: 5 bits or 32 lines
  BlockSize: 2
  Associativity: 2
  BlockOffSetBits: 3
  CacheSize: 5696
 Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 357
       Number of Cache Hits is 35506
       Cache Miss Rate is 0.009955
Pipeline Performance
       Total Cycles is 39575
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5736
       CPI is 1.138751
Cache Configuration
   Index: 4 bits or 16 lines
  BlockSize: 2
   Associativity: 4
  BlockOffSetBits: 3
   CacheSize: 5760
 Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 200
       Number of Cache Hits is 35663
       Cache Miss Rate is 0.005577
Pipeline Performance
       Total Cycles is 38208
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5746
       CPI is 1.099416
Cache Configuration
  Index: 6 bits or 64 lines
   BlockSize: 4
  Associativity: 1
  BlockOffSetBits: 4
   CacheSize: 9664
 Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 770
       Number of Cache Hits is 35093
       Cache Miss Rate is 0.021471
Pipeline Performance
       Total Cycles is 43208
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5729
       CPI is 1.243288
Cache Configuration
   Index: 5 bits or 32 lines
   BlockSize: 4
  Associativity: 2
  BlockOffSetBits: 4
   CacheSize: 9728
Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 144
       Number of Cache Hits is 35719
       Cache Miss Rate is 0.004015
Pipeline Performance
       Total Cycles is 37673
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5749
       CPI is 1.084022
Cache Configuration
  Index: 4 bits or 16 lines
  BlockSize: 4
  Associativity: 4
  BlockOffSetBits: 4
  CacheSize: 9792
Cache Performance
       Number of Cache Accesses is 35863
       Number of Cache Misses is 78
       Number of Cache Hits is 35785
       Cache Miss Rate is 0.002175
Pipeline Performance
       Total Cycles is 37118
       Total Instructions is 34753
       Total Branch Instructions is 7044
       Total Correct Branch Predictions is 5749
       CPI is 1.068052
```