# ECE 511 Assignment 1 Part 1

#### Andrew Smith

January 29th, 2019

## 1 Introduction

This part of the assignment was intended to familiarize us with the popular computer architecture simulation tool Gem5. I set up three processor systems with different memory configurations and analyzed their performance on two benchmarks. The Fast Fourier Transform (FFT) and Correlation Medium benchmarks highlight the effects of adding an L3 cache and more processors. These systems are detailed in Section 2.

## 2 Systems

#### 2.1 Baseline System

The first system that I configured was a baseline system. This system was a dual-CPU ARM processor with a 16kB L1 instruction cache, a 64kB L1 data cache, and a single 256kB L2 cache. The topology is shown below in Figure 1.



Figure 1: Baseline system configuration

#### 2.2 L3 System

The second system was a dual-CPU ARM system with the same L1 caches as the baseline, but this time each CPU has a private L2 cache, backed by a 2MB L3 cache. The system configuration is shown below (Figure 2).



Figure 2: Baseline system with an L3 cache

#### 2.3 8 CPU System

The third system was an 8-CPU ARM system with the same L1 caches as the baseline and the same 2MB L3 cache as the L3 system, but with two shared 256kB L2 caches. The system configuration is shown below (Figure 3).



Figure 3: 8 CPU system with two L2 caches and an L3 cache

## 3 Results

| Correlation Medium Cache Miss Rates |     |          |          |          |
|-------------------------------------|-----|----------|----------|----------|
| Config\Cache                        | L1I | L1D      | L2       | L3       |
| Baseline                            | 0   | 0.005041 | 0.537363 | ~        |
| Baseline + L3                       | 0   | 0.005066 | 0.531795 | 0.022007 |
| 8 CPU + L3                          | 0   | 0.005066 | 0.531795 | 0.022007 |

Figure 4: Cache Miss rates for Correlation Medium

| FFT Cache Miss Rates |          |          |          |          |
|----------------------|----------|----------|----------|----------|
| Config\Cache         | L1I      | L1D      | L2       | L3       |
| Baseline             | 0.000174 | 0.002112 | 0.543761 | ~        |
| Baseline + L3        | 0.000174 | 0.002375 | 0.69136  | 0.624941 |
| 8 CPU + L3           | 0.000109 | 0.002585 | 0.695892 | 0.614157 |

Figure 5: Cache Miss rates for Fast Fourier Transform

| Correlation Medium Cache Hit Rates |     |          |          |          |
|------------------------------------|-----|----------|----------|----------|
| Config\Cache                       | L1I | L1D      | L2       | L3       |
| Baseline                           | 1   | 0.994959 | 0.462637 | ~        |
| Baseline + L3                      | 1   | 0.994934 | 0.468205 | 0.977993 |
| 8 CPU + L3                         | 1   | 0.994934 | 0.468205 | 0.977993 |

Figure 6: Cache Hit rates for Correlation Medium

| FFT Cache Hit Rates |          |          |          |          |
|---------------------|----------|----------|----------|----------|
| Config\Cache        | L1I      | L1D      | L2       | L3       |
| Baseline            | 0.999826 | 0.997888 | 0.456239 | ~        |
| Baseline + L3       | 0.999826 | 0.997625 | 0.30864  | 0.375059 |
| 8 CPU + L3          | 0.999891 | 0.997415 | 0.304108 | 0.385843 |

Figure 7: Cache Hit rates for Fast Fourier Transform
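Each hit rate in Figures 6 and 7 is simply one minus the corresponding miss rate in Figures 4 and 5. The short Python sketch below recomputes the FFT hit rates from the miss rates as a consistency check. The dictionary layout and the function names (`hit_rate`, `tables_consistent`) are illustrative choices made here, not a gem5 output format; `None` stands in for the "~" entries (cache level not present).

```python
# Consistency check: hit rate = 1 - miss rate for every cache in every config.
# Values are copied from the FFT miss-rate (Figure 5) and hit-rate (Figure 7)
# tables; None marks a cache level the configuration does not have ("~").

FFT_MISS = {
    "Baseline":      {"L1I": 0.000174, "L1D": 0.002112, "L2": 0.543761, "L3": None},
    "Baseline + L3": {"L1I": 0.000174, "L1D": 0.002375, "L2": 0.69136,  "L3": 0.624941},
    "8 CPU + L3":    {"L1I": 0.000109, "L1D": 0.002585, "L2": 0.695892, "L3": 0.614157},
}

FFT_HIT = {
    "Baseline":      {"L1I": 0.999826, "L1D": 0.997888, "L2": 0.456239, "L3": None},
    "Baseline + L3": {"L1I": 0.999826, "L1D": 0.997625, "L2": 0.30864,  "L3": 0.375059},
    "8 CPU + L3":    {"L1I": 0.999891, "L1D": 0.997415, "L2": 0.304108, "L3": 0.385843},
}


def hit_rate(miss_rate):
    """Return the hit rate implied by a miss rate (None if the cache is absent)."""
    if miss_rate is None:
        return None
    return 1.0 - miss_rate


def tables_consistent(miss_table, hit_table, tol=1e-9):
    """True if every published hit rate equals 1 - the published miss rate."""
    for config, caches in miss_table.items():
        for cache, miss in caches.items():
            expected = hit_rate(miss)
            published = hit_table[config][cache]
            if expected is None or published is None:
                # Both entries must be absent together.
                if expected is not published:
                    return False
            elif abs(expected - published) > tol:
                return False
    return True


print(tables_consistent(FFT_MISS, FFT_HIT))  # True
```

The same check applies unchanged to the Correlation Medium tables.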

| Correlation Medium Misc Stats |                 |                         |                   |
|-------------------------------|-----------------|-------------------------|-------------------|
| Config\Stat                   | IPC (inst/tick) | Simulation Time (ticks) | Cache Write Backs |
| Baseline                      | 0.0008083371    | 1842635153500           | 40147             |
| Baseline + L3                 | 0.0008322298    | 1789733326000           | 704321            |
| 8 CPU + L3                    | 0.0008322296    | 1789733326000           | 704321            |

Figure 8: Miscellaneous statistics for Correlation Medium

| FFT Misc Stats |                 |                         |                      |                      |
|----------------|-----------------|-------------------------|----------------------|----------------------|
| Config\Stat    | IPC (inst/tick) | Simulation Time (ticks) | Cache Write Backs L2 | Cache Write Backs L3 |
| Baseline       | 0.0011668012    | 530865809500            | 112812               | ~                    |
| Baseline + L3  | 0.0011681214    | 530418427000            | 68662                | 49853                |
| 8 CPU + L3     | 0.0014541478    | 425822208000            | 68418                | 47233                |

Figure 9: Miscellaneous statistics for Fast Fourier Transform

## 4 Conclusion

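The results for the three systems are fairly straightforward to analyze. On FFT, the 8 CPU system outperforms the other two systems, finishing in about 20% fewer ticks than the baseline (425,822,208,000 vs. 530,865,809,500), because of its collectively larger cache hierarchy and because its 8 CPUs can exploit more of the parallelism in the algorithm. On Correlation Medium, however, the 8 CPU system produced the same simulation time and cache statistics as the Baseline + L3 system (and essentially the same IPC), which suggests that this benchmark did not make use of the additional CPUs.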
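The relative performance of the three systems can be checked directly against the simulation times in Figures 8 and 9. The Python sketch below defines speedup as baseline ticks divided by configuration ticks; this assumes each configuration runs the same benchmark workload, and the names `SIM_TICKS` and `speedup` are hypothetical helpers introduced only for illustration.

```python
# Speedup of each configuration relative to the baseline, computed from the
# simulated time (in ticks) reported in Figures 8 and 9.
SIM_TICKS = {
    "Correlation Medium": {
        "Baseline": 1842635153500,
        "Baseline + L3": 1789733326000,
        "8 CPU + L3": 1789733326000,
    },
    "FFT": {
        "Baseline": 530865809500,
        "Baseline + L3": 530418427000,
        "8 CPU + L3": 425822208000,
    },
}


def speedup(baseline_ticks: int, config_ticks: int) -> float:
    """Speedup > 1 means the configuration finished in fewer ticks than the baseline."""
    return baseline_ticks / config_ticks


for benchmark, runs in SIM_TICKS.items():
    base = runs["Baseline"]
    for config, ticks in runs.items():
        print(f"{benchmark:20s} {config:15s} speedup = {speedup(base, ticks):.3f}")
```

This gives a speedup of about 1.25 for the 8 CPU system on FFT, but about 1.03 for both L3 configurations on Correlation Medium, so on that benchmark the extra CPUs add nothing over Baseline + L3.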

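It is interesting to note that the lower-level caches have much lower hit rates than the L1 caches. The L2 hit rate is below 0.5 in every configuration, and the L3 hit rate is about 0.98 for Correlation Medium but only about 0.38 for FFT. This is because the lower levels are accessed far less often than the L1 caches: the L2 sees only L1 misses and the L3 sees only L2 misses, so most of the accesses with good locality have already been served before they reach these caches.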
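gem5 reports miss rates per cache, so the per-level rates in Figures 4 to 7 are presumably local rates, measured against the accesses that reach that particular cache. Multiplying the local miss rates down the hierarchy gives a rough global miss rate: the fraction of L1 data-cache accesses that miss in every level. The Python sketch below assumes every L2 access comes from an L1D miss and every L3 access from an L2 miss; in the real runs the L2 also sees instruction misses and write-backs, so the result is only an approximation. The helper name `global_miss_rate` is hypothetical.

```python
from math import prod

# Local miss rates (L1D, L2, L3) for the Baseline + L3 configuration,
# copied from Figures 4 and 5.
LOCAL_MISS = {
    "Correlation Medium": [0.005066, 0.531795, 0.022007],
    "FFT": [0.002375, 0.69136, 0.624941],
}


def global_miss_rate(local_miss_rates):
    """Approximate fraction of L1D accesses that miss every cache level.

    Assumes each level is accessed only on a miss in the level above it,
    so the global rate is the product of the local rates.
    """
    return prod(local_miss_rates)


for benchmark, rates in LOCAL_MISS.items():
    print(f"{benchmark}: ~{global_miss_rate(rates):.2e} of L1D accesses reach memory")
```

Under this approximation, about 0.1% of FFT's L1D accesses and about 0.006% of Correlation Medium's reach main memory, so even FFT's low local L3 hit rate applies to only a small slice of the total traffic.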

Adding a cache level made the simulations run faster in terms of instructions per tick (Figures 8 and 9): adding the L3 cache raised IPC by about 3% for Correlation Medium and by about 0.1% for FFT. Presumably the L3 cache hid some of the memory latency, improving performance.
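The IPC gain from the L3 cache is the percentage change between the Baseline and Baseline + L3 rows of Figures 8 and 9. A minimal Python sketch of that arithmetic follows; `IPC` and `percent_change` are illustrative names introduced here, and IPC is kept in the document's instructions-per-tick units.

```python
# IPC values (instructions per tick) copied from Figures 8 and 9.
IPC = {
    "Correlation Medium": {"Baseline": 0.0008083371, "Baseline + L3": 0.0008322298},
    "FFT": {"Baseline": 0.0011668012, "Baseline + L3": 0.0011681214},
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for benchmark, ipc in IPC.items():
    gain = percent_change(ipc["Baseline"], ipc["Baseline + L3"])
    print(f"{benchmark}: adding the L3 changes IPC by {gain:+.2f}%")
```

The L3 helps Correlation Medium far more than FFT, which is plausibly related to its much higher L3 hit rate (about 0.98 versus about 0.38).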