# **Assignment 3: Exploring Memory Hierarchy Design in gem5**

## Pavani Chavali

Department of Computer Science, University of The Cumberlands

MSCS-531-A01: Computer Architecture and Design

Professor Vanessa Cooper

Date: 9/14/2025

## Part 2: Implementing and Analyzing Cache Configurations in gem5

## **Environment Setup:**

All the gem5 dependencies (Python, SCons, GCC, and Git) are installed, along with gem5 itself.

The output below shows each dependency and its installed version.

```
pavanichavali@Pavanis-MacBook-Pro ~ % python3 --version
Python 3.13.7
pavanichavali@Pavanis-MacBook-Pro ~ % scons --version
SCons by Steven Knight et al.:
        SCons: v4.9.1.39a12f34d532ab2493e78a7b73aeab2250852790, Thu, 27 Mar 2025
 11:44:24 -0700, by bdbaddog on M1Dog2021
        SCons path: ['/Library/Frameworks/Python.framework/Versions/3.13/lib/pyt
hon3.13/site-packages/SCons']
Copyright (c) 2001 - 2025 The SCons Foundation
pavanichavali@Pavanis-MacBook-Pro ~ % gcc --version
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
pavanichavali@Pavanis-MacBook-Pro ~ % clang --version
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
pavanichavali@Pavanis-MacBook-Pro ~ % git --version
git version 2.39.5 (Apple Git-154)
```

#### Built gem5 for X86 ISA:



Example "Hello world" program run with gem5 after the build:

./build/X86/gem5.opt configs/deprecated/example/se.py -c /Users/pavanichavali/gem5/configs/example/Test/hello

```
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 25.0.0.1
gem5 compiled Sep 5 2025 21:48:37
gem5 started Sep 7 2025 16:27:27
gem5 executing on Pavanis-MacBook-Pro.local, pid 39909
command line: ./build/X86/gem5.opt configs/deprecated/example/se.py -c /Users/pavanichavali/gem5/configs/example/Test/hello

Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:692: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
**** REAL SIMULATION ****
Hello world!
Exiting @ tick 6011500 because exiting with last active thread context
pavanichavali@Pavanis-MacBook-Pro gem5 %
```

#### **Configuration:**

A system with a CPU, a memory bus (membus), and a memory controller (mem\_ctrl) was created, and the configuration script was run in syscall emulation (SE) mode, since the focus is on simulating the CPU and the memory system.

```
pavanichavali@Pavanis-MacBook-Pro gem5 % ./build/X86/gem5.opt /Users/pavanichavali/ge
m5/configs/learning_gem5/part1/simple.py
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 25.0.0.1
gem5 compiled Sep 5 2025 21:48:37
gem5 started Sep 14 2025 10:48:01
gem5 executing on Pavanis-MacBook-Pro.local, pid 34760
command line: ./build/X86/gem5.opt /Users/pavanichavali/gem5/configs/learning_gem5/pa
rt1/simple.py
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:692: warn: DRAM device capacity (8192 Mbytes) does not matc
h the address range assigned (512 Mbytes)
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a
 stat that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
Beginning simulation!
Hello world!
Exiting @ tick 498755000 because exiting with last active thread context
```

## **Default Cache Configuration:**

To work with caches, L1I, L1D, and L2 caches were added to the simple.py configuration, and the result was saved as a new file named two\_level.py.

```
from m5.objects import Cache  # base class for the cache definitions below


class L1Cache(Cache):
    """Simple L1 Cache with default values"""

    assoc = 2
    tag_latency = 2
    data_latency = 2
    response_latency = 2
    mshrs = 4
    tgts_per_mshr = 20
```

The default cache sizes in the configuration file are already set to 32 KiB for L1ICache, 32 KiB for L1DCache, and 256 KiB for L2Cache.

```
class L1ICache(L1Cache):
    """Simple L1 instruction cache with default values"""

    # Set the default size
    size = "32KiB"
```

```
class L1DCache(L1Cache):
    """Simple L1 data cache with default values"""

    # Set the default size
    size = "32KiB"
```

```
class L2Cache(Cache):
    """Simple L2 Cache with default values"""

    # Default parameters
    size = "256KiB"
    assoc = 8
    tag_latency = 20
    data_latency = 20
    response_latency = 20
    mshrs = 20
    tgts_per_mshr = 12
```
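As a rough check on what these parameters imply, the number of sets in each cache follows from size, associativity, and line size. The sketch below is plain Python, not gem5 code; the helper name `num_sets` is hypothetical, and the 64-byte line size is an assumption, since the report does not state the line size that was used.

```python
def num_sets(size_kib, assoc, line_bytes=64):
    """Return the number of sets: capacity / (ways * line size).

    line_bytes=64 is an assumed cache line size, not a value from the report.
    """
    return (size_kib * 1024) // (assoc * line_bytes)


# L1 caches above: 32 KiB, 2-way set associative
l1_sets = num_sets(32, 2)

# L2 cache above: 256 KiB, 8-way set associative
l2_sets = num_sets(256, 8)

print(f"L1 sets: {l1_sets}, L2 sets: {l2_sets}")
```

Under that assumption, doubling the associativity at a fixed size halves the number of sets, which is why associativity and size are varied separately in the experiments.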

Performance metrics:

Command used for default cache simulation:

build/X86/gem5.opt /Users/pavanichavali/gem5/configs/learning\_gem5/part1/simple.py

```
pavanichavali@Pavanis-MacBook-Pro gem5 % build/X86/gem5.opt /Users/pavanichavali
/gem5/configs/learning_gem5/part1/simple.py
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 25.0.0.1
gem5 compiled Sep 5 2025 21:48:37
gem5 started Sep 14 2025 18:48:36
gem5 executing on Pavanis-MacBook-Pro.local, pid 37595
command line: build/X86/gem5.opt /Users/pavanichavali/gem5/configs/learning_gem5
/part1/simple.py
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and p
df.
src/mem/dram_interface.cc:692: warn: DRAM device capacity (8192 Mbytes) does not
 match the address range assigned (512 Mbytes)
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat
 is a stat that does not belong to any statistics::Group. Legacy stat is depreca
ted.
system.remote_gdb: Listening for connections on port 7000
Beginning simulation!
Hello world!
Exiting @ tick 498755000 because exiting with last active thread context
```

After running the simulation with the default cache settings, the values below were observed in the stats.txt file in the m5out folder: the hit and miss counts for the individual caches (from which the hit and miss rates follow) and the cycles per instruction (CPI).

#### Command used:

build/X86/gem5.opt /Users/pavanichavali/gem5/configs/deprecated/example/se.py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type=TimingSimpleCPU --l1d_size=32kB --l1i_size=32kB --caches

```
pavanichavali@Pavanis-MacBook-Pro gem5 % build/X86/gem5.opt /Users/pavanichavali/gem5/con
figs/deprecated/example/se.py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type
=TimingSimpleCPU --l1d_size=32kB --l1i_size=32kB --caches
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 25.0.0.1
gem5 compiled Sep 5 2025 21:48:37
gem5 started Sep 14 2025 20:04:35
gem5 executing on Pavanis-MacBook-Pro.local, pid 38642
command line: build/X86/gem5.opt /Users/pavanichavali/gem5/configs/deprecated/example/se.
py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type=TimingSimpleCPU --l1d_size
=32kB --l1i_size=32kB --caches
warn: Base 10 memory/cache size 32kB will be cast to base 2 size 32KiB.
warn: Base 10 memory/cache size 32kB will be cast to base 2 size 32KiB.
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:692: warn: DRAM device capacity (8192 Mbytes) does not match th
e address range assigned (512 Mbytes)
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a sta
t that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
**** REAL SIMULATION ****
Hello world!
Exiting @ tick 31497000 because exiting with last active thread context
pavanichavali@Pavanis-MacBook-Pro gem5 %
```

```
system.cpu.dcache.overallHits::total 1890 # number of overall hits (Count)
system.cpu.dcache.overallMisses::cpu.data 135 # number of overall misses (Count)
system.cpu.cpi 11.088554 # CPI: cycles per instruction (core level) ((Cycle/Count))
system.cpu.icache.overallHits::total 7051 # number of overall hits (Count)
system.cpu.icache.overallMisses::total 235 # number of overall misses (Count)
```
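The miss rate for each cache is misses divided by total accesses (hits plus misses). A minimal sketch of pulling these values out of stats.txt-style text is shown here; the helper names `parse_stats` and `miss_rate` are hypothetical, and the sample text simply repeats the five statistics reported above rather than a full stats.txt file.

```python
SAMPLE = """
system.cpu.dcache.overallHits::total 1890 # number of overall hits (Count)
system.cpu.dcache.overallMisses::cpu.data 135 # number of overall misses (Count)
system.cpu.cpi 11.088554 # CPI: cycles per instruction (core level) ((Cycle/Count))
system.cpu.icache.overallHits::total 7051 # number of overall hits (Count)
system.cpu.icache.overallMisses::total 235 # number of overall misses (Count)
"""


def parse_stats(text):
    """Map each statistic name to its numeric value."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing "# description" part, then split name and value.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip banner lines and other non-numeric rows
    return stats


def miss_rate(stats, hits_key, misses_key):
    """Return misses / (hits + misses) for the given pair of statistics."""
    hits = stats[hits_key]
    misses = stats[misses_key]
    return misses / (hits + misses)


stats = parse_stats(SAMPLE)
dcache_rate = miss_rate(
    stats,
    "system.cpu.dcache.overallHits::total",
    "system.cpu.dcache.overallMisses::cpu.data",
)
icache_rate = miss_rate(
    stats,
    "system.cpu.icache.overallHits::total",
    "system.cpu.icache.overallMisses::total",
)
print(f"dcache miss rate: {dcache_rate:.2%}")
print(f"icache miss rate: {icache_rate:.2%}")
```

To use it on a real run, the text could be read from the stats.txt file in the m5out folder instead of `SAMPLE`.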

build/X86/gem5.opt /Users/pavanichavali/gem5/configs/deprecated/example/se.py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type=TimingSimpleCPU --l1d_size=64kB --l1i_size=64kB --caches

```
pavanichavali@Pavanis-MacBook-Pro gem5 % build/X86/gem5.opt /Users/pavanichavali/gem5/con
figs/deprecated/example/se.py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type
=TimingSimpleCPU --l1d_size=64kB --l1i_size=64kB --caches
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 25.0.0.1
gem5 compiled Sep 5 2025 21:48:37
gem5 started Sep 14 2025 20:05:49
gem5 executing on Pavanis-MacBook-Pro.local, pid 38671
command line: build/X86/gem5.opt /Users/pavanichavali/gem5/configs/deprecated/example/se.
py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type=TimingSimpleCPU --l1d_size
=64kB --l1i_size=64kB --caches
warn: Base 10 memory/cache size 64kB will be cast to base 2 size 64KiB.
warn: Base 10 memory/cache size 64kB will be cast to base 2 size 64KiB.
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:692: warn: DRAM device capacity (8192 Mbytes) does not match th
e address range assigned (512 Mbytes)
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a sta
t that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
**** REAL SIMULATION ****
Hello world!
Exiting @ tick 31497000 because exiting with last active thread context
```

| Statistic                                 | Value     | Description                                              |
|-------------------------------------------|-----------|----------------------------------------------------------|
| system.cpu.dcache.overallHits::total      | 1948      | number of overall hits (Count)                           |
| system.cpu.dcache.overallMisses::cpu.data | 136       | number of overall misses (Count)                         |
| system.cpu.cpi                            | 11.088554 | CPI: cycles per instruction (core level) ((Cycle/Count)) |
| system.cpu.icache.overallHits::total      | 7853      | number of overall hits (Count)                           |
| system.cpu.icache.overallMisses::total    | 230       | number of overall misses (Count)                         |

build/X86/gem5.opt /Users/pavanichavali/gem5/configs/deprecated/example/se.py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type=TimingSimpleCPU --l1d_size=128kB --l1i_size=128kB --caches

```
pavanichavali@Pavanis-MacBook-Pro gem5 % build/X86/gem5.opt /Users/pavanichavali/gem5/con
figs/deprecated/example/se.py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type
=TimingSimpleCPU --l1d_size=128kB --l1i_size=128kB --caches
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 25.0.0.1
gem5 compiled Sep 5 2025 21:48:37
gem5 started Sep 14 2025 20:06:51
gem5 executing on Pavanis-MacBook-Pro.local, pid 38680
command line: build/X86/gem5.opt /Users/pavanichavali/gem5/configs/deprecated/example/se.
py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type=TimingSimpleCPU --l1d_size
=128kB --l1i_size=128kB --caches
warn: Base 10 memory/cache size 128kB will be cast to base 2 size 128KiB.
warn: Base 10 memory/cache size 128kB will be cast to base 2 size 128KiB.
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:692: warn: DRAM device capacity (8192 Mbytes) does not match th
e address range assigned (512 Mbytes)
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a sta
t that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
**** REAL SIMULATION ****
Hello world!
Exiting @ tick 31497000 because exiting with last active thread context
pavanichavali@Pavanis-MacBook-Pro gem5 %
```

```
system.cpu.dcache.overallHits::total 2008 # number of overall hits (Count)
system.cpu.dcache.overallMisses::cpu.data 138 # number of overall misses (Count)
system.cpu.icache.overallHits::total 8655 # number of overall hits (Count)
system.cpu.icache.overallMisses::total 240 # number of overall misses (Count)
```

build/X86/gem5.opt /Users/pavanichavali/gem5/configs/deprecated/example/se.py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type=TimingSimpleCPU --l1d_size=128kB --l1i_size=128kB --l1i_assoc=4 --caches

```
pavanichavali@Pavanis-MacBook-Pro gem5 % build/X86/gem5.opt /Users/pavanichavali/gem5/con
figs/deprecated/example/se.py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type
=TimingSimpleCPU --l1d_size=128kB --l1i_size=128kB --l1i_assoc=4 --caches
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 25.0.0.1
gem5 compiled Sep 5 2025 21:48:37
gem5 started Sep 14 2025 20:08:06
gem5 executing on Pavanis-MacBook-Pro.local, pid 38692
command line: build/X86/gem5.opt /Users/pavanichavali/gem5/configs/deprecated/example/se.
py --cmd=tests/test-progs/hello/bin/x86/linux/hello --cpu-type=TimingSimpleCPU --l1d_size
=128kB --l1i_size=128kB --l1i_assoc=4 --caches
warn: Base 10 memory/cache size 128kB will be cast to base 2 size 128KiB.
warn: Base 10 memory/cache size 128kB will be cast to base 2 size 128KiB.
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:692: warn: DRAM device capacity (8192 Mbytes) does not match th
e address range assigned (512 Mbytes)
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a sta
t that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
**** REAL SIMULATION ****
Hello world!
Exiting @ tick 31497000 because exiting with last active thread context
pavanichavali@Pavanis-MacBook-Pro gem5 %
```

| Statistic                                 | Value | Description                      |
|-------------------------------------------|-------|----------------------------------|
| system.cpu.dcache.overallHits::total      | 1885  | number of overall hits (Count)   |
| system.cpu.dcache.overallMisses::cpu.data | 130   | number of overall misses (Count) |
| system.cpu.icache.overallHits::total      | 7057  | number of overall hits (Count)   |
| system.cpu.icache.overallMisses::total    | 229   | number of overall misses (Count) |
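The four runs above differ only in the L1 size and, for the last one, the L1I associativity, so the command lines could be generated rather than typed by hand. The following is a sketch only: the flags are the ones echoed in the gem5 logs above, while the helper name `build_command` and the relative paths (which assume the commands are run from the gem5 root directory) are illustrative choices, not part of the original setup.

```python
GEM5 = "build/X86/gem5.opt"
SE_SCRIPT = "configs/deprecated/example/se.py"  # assumed relative to the gem5 root
BINARY = "tests/test-progs/hello/bin/x86/linux/hello"


def build_command(l1_size, l1i_assoc=None):
    """Return the gem5 argument list for one cache configuration."""
    cmd = [
        GEM5,
        SE_SCRIPT,
        f"--cmd={BINARY}",
        "--cpu-type=TimingSimpleCPU",
        f"--l1d_size={l1_size}",
        f"--l1i_size={l1_size}",
    ]
    if l1i_assoc is not None:
        cmd.append(f"--l1i_assoc={l1i_assoc}")
    cmd.append("--caches")
    return cmd


# Sizes and associativity as they appear in the logged command lines.
commands = [build_command(size) for size in ("32kB", "64kB", "128kB")]
commands.append(build_command("128kB", l1i_assoc=4))

for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` to launch the simulation. Note that every run writes its statistics to the same m5out folder, so the stats file needs to be copied aside between runs.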

#### **Observations and Analysis:**

The simulation results give a positive picture of the cache and memory hierarchy design for this specific workload. Rather than indicating a performance problem, the data suggests that the default cache configuration is already well suited to the task.

#### **Optimal Cache Sizing and Efficiency**

The main takeaway is the effectiveness of the 32 kB L1 caches. The data indicates that this size is sufficient to hold the working set of the "hello" program: the miss counts barely move as the L1 caches grow from 32 kB to 128 kB (135, 136, and 138 for the dcache; 235, 230, and 240 for the icache). Misses that do not shrink with capacity are most likely compulsory (first-access) misses rather than capacity misses. For a system architect, this indicates that a larger, more expensive, or more power-hungry L1 cache brings no measurable benefit for this type of workload.
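To compare the configurations on the same footing, the hit and miss counts reported above can be turned into miss rates. This is a small worked calculation; the dictionary layout and the run labels are illustrative, and the numbers are copied from the statistics listed for each run.

```python
# (hits, misses) per cache, copied from the statistics reported for each run.
RUNS = {
    "32kB": {"dcache": (1890, 135), "icache": (7051, 235)},
    "64kB": {"dcache": (1948, 136), "icache": (7853, 230)},
    "128kB": {"dcache": (2008, 138), "icache": (8655, 240)},
    "L1I 4-way": {"dcache": (1885, 130), "icache": (7057, 229)},
}


def miss_rate(hits, misses):
    """Return misses as a fraction of all accesses."""
    return misses / (hits + misses)


rates = {
    label: {cache: miss_rate(*counts) for cache, counts in caches.items()}
    for label, caches in RUNS.items()
}

for label, per_cache in rates.items():
    print(
        f"{label:>10}  dcache {per_cache['dcache']:.2%}"
        f"  icache {per_cache['icache']:.2%}"
    )
```

With these counts the dcache miss rate stays in a narrow band of roughly 6.4% to 6.7% across all four runs, which is consistent with the observation that extra capacity does not remove the remaining misses.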

#### **Successful Conflict Miss Mitigation**

The slight decrease in icache misses (to 229) when the L1I associativity was increased from its default to 4-way shows that the cache can be tuned further. The small reduction is consistent with higher associativity mitigating conflict misses, which occur when several blocks compete for the same set, and it suggests that the design can be adapted to more complex memory access patterns. Even in an already efficient configuration, minor, targeted optimizations can still yield small gains.
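The effect of associativity on conflict misses can be illustrated with a toy model. The sketch below is a simplified set-associative cache with LRU replacement written for illustration only; it is not gem5's cache model, and the 1 KiB size, 64-byte blocks, and three-address trace are made-up values chosen so that all three blocks map to the same set.

```python
from collections import OrderedDict


def count_misses(addresses, size_bytes, assoc, block_bytes=64):
    """Count misses in a toy set-associative cache with LRU replacement."""
    num_sets = size_bytes // (block_bytes * assoc)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(lines) >= assoc:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return misses


# Three blocks, 1 KiB apart, touched in a loop four times.
trace = [0, 1024, 2048] * 4

direct_mapped = count_misses(trace, size_bytes=1024, assoc=1)
two_way = count_misses(trace, size_bytes=1024, assoc=2)
four_way = count_misses(trace, size_bytes=1024, assoc=4)

print(direct_mapped, two_way, four_way)
```

In this trace the total capacity never changes, yet only the 4-way configuration can keep all three blocks in one set, so the misses drop from one per access to the three first-access misses. The reduction seen in the gem5 run is far smaller because few blocks of the "hello" program conflict in the first place.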

#### **Performance Bottleneck Is Not The Cache**

The unchanging CPI (cycles per instruction) of 11.088554 across all configurations, together with the identical exit tick of 31497000 in every run, is a positive sign for system predictability. It indicates that L1 cache size and associativity are not the performance bottleneck for this workload. Since the caches already hold the working set, the CPU is rarely stalled by misses that a larger L1 could avoid, and its performance is instead limited by other factors inherent to the CPU model itself. Overall, the results show a memory hierarchy that is well matched to its workload.
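The relationship between the exit tick, the cycle count, and the CPI can be worked through numerically. The tick frequency, exit tick, and CPI below come from the logs and statistics above; the 2 GHz CPU clock is an assumption (the report does not state the clock that was used), so the cycle and instruction figures are estimates under that assumption only.

```python
TICKS_PER_SECOND = 1_000_000_000_000  # "Global frequency" from the log
FINAL_TICK = 31_497_000               # exit tick shared by the cached runs
CPI = 11.088554                       # system.cpu.cpi from the statistics

ASSUMED_CLOCK_HZ = 2_000_000_000      # assumption: 2 GHz CPU clock

ticks_per_cycle = TICKS_PER_SECOND // ASSUMED_CLOCK_HZ
cycles = FINAL_TICK // ticks_per_cycle
estimated_instructions = cycles / CPI
simulated_seconds = FINAL_TICK / TICKS_PER_SECOND

print(f"ticks per cycle:        {ticks_per_cycle}")
print(f"cycles:                 {cycles}")
print(f"estimated instructions: {estimated_instructions:.0f}")
print(f"simulated time:         {simulated_seconds * 1e6:.3f} us")
```

Because the exit tick is the same in every cached run, the cycle count and therefore the CPI come out the same regardless of the L1 size, which matches the constant CPI reported above.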


**Virtual Memory Exploration:** 

gem5 was simulated with virtual memory enabled, using different page sizes and translation lookaside buffer (TLB) configurations.

The TLB miss rate for the default simulation is calculated as follows.

Data TLB (dTLB) Miss Rate:

To calculate the dTLB miss rate, we use the total number of accesses and misses for the data TLB.

**Total dTLB Accesses:** system.cpu.mmu.dtb.rdAccesses (1084) + system.cpu.mmu.dtb.wrAccesses (943) = 2027

**Total dTLB Misses:** system.cpu.mmu.dtb.rdMisses (11) + system.cpu.mmu.dtb.wrMisses (9) = 20

dTLB Miss Rate:

(Total Misses / Total Accesses) = 20/2027 ≈ 0.99%

This low miss rate shows the dTLB is very effective at caching page table entries for data accesses.

**Page Faults:** 

A page fault is an operating system event that occurs when a memory access uses a virtual address that has no corresponding physical page in main memory.

The number of TLB misses gives a close proxy for the number of page table walks. Each TLB miss requires the system to look up the address in the page table, and such a miss would lead to a page fault if the page were not present in memory.

From the stats, we can infer that a total of **57** page table walks were triggered (20 from the dTLB and 37 from the iTLB), representing the system's attempts to resolve virtual address translations.

TLB statistics from another simulation run:

### Data TLB (dTLB) Miss Rate

**Total dTLB Accesses:** system.cpu.mmu.dtb.rdAccesses (1133) + system.cpu.mmu.dtb.wrAccesses (953) = 2086

**Total dTLB Misses:** system.cpu.mmu.dtb.rdMisses (11) + system.cpu.mmu.dtb.wrMisses (9) = **20** 

#### dTLB Miss Rate:

(Total Misses / Total Accesses) = 20/2086 ≈ 0.96%

This is a very low miss rate, indicating the dTLB is highly effective.

## **Instruction TLB (iTLB) Miss Rate:**

**Total iTLB Accesses:** system.cpu.mmu.itb.rdAccesses (0) + system.cpu.mmu.itb.wrAccesses (8083) = 8083

**Total iTLB Misses:** system.cpu.mmu.itb.rdMisses (0) + system.cpu.mmu.itb.wrMisses (37) = 37

#### iTLB Miss Rate:

(Total Misses / Total Accesses) = 37/8083 ≈ 0.46%

The iTLB also shows a very low miss rate, which is typical for instruction access patterns.
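The TLB miss rates above all follow the same arithmetic: add the read and write counters, then divide misses by accesses. A short sketch of that calculation is given here; the function name `tlb_miss_rate` is hypothetical, and the counter values are the ones quoted in this section.

```python
def tlb_miss_rate(rd_accesses, wr_accesses, rd_misses, wr_misses):
    """Return total TLB misses as a fraction of total TLB accesses."""
    accesses = rd_accesses + wr_accesses
    misses = rd_misses + wr_misses
    return misses / accesses if accesses else 0.0


# dTLB counters from the default simulation
dtlb_default = tlb_miss_rate(1084, 943, 11, 9)

# dTLB and iTLB counters from the second set of statistics
dtlb_second = tlb_miss_rate(1133, 953, 11, 9)
itlb_second = tlb_miss_rate(0, 8083, 0, 37)

for name, rate in [
    ("dTLB (default run)", dtlb_default),
    ("dTLB (second run)", dtlb_second),
    ("iTLB (second run)", itlb_second),
]:
    print(f"{name}: {rate:.2%}")
```

The guard against zero accesses only matters for counters that were never exercised, such as a TLB that saw no traffic in a given run.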

# **Troubleshooting:**

I had a hard time working out how to pass options as command parameters, but the --help output made the cache size and associativity options clear.