# gem5's Standard Library



#### Why a Standard Library?

- With the stdlib, you don't need to write thousands to tens of thousands of lines to describe a system.
- Standard "components" reduce:
  - Duplicated code.
  - Error-prone configurations.
  - A lack of portability between different simulation setups.

Previously, there were `se.py` and `fs.py`:

- These tried to be everything to everyone
- "Spaghetti code"
- The default "interface" to gem5 was massive bash lines and hacks

Think of gem5 more like TensorFlow than a command line tool. **gem5** is a *framework* or *language*.



### What is the Standard Library (stdlib)?

The purpose of the gem5 Standard Library is to provide a set of predefined components that do the majority of the work of building a simulation for you.

For the remainder that is not supported by the standard library, APIs are provided that make it easy to extend the library for your own use.



### The metaphor: Plugging components together into a board





#### Main idea

Due to its modular, object-oriented design, gem5 can be thought of as a set of components that can be plugged together to form a simulation.

The types of components are *boards*, *processors*, *memory systems*, and *cache hierarchies*:

- **Board**: The "backbone" of the system. You plug components into the board. The board also contains the system-level things like devices, workload, etc. It's the board's job to negotiate the connections between other components.
- **Processor**: Processors connect to boards and have one or more *cores*.
- **Cache hierarchy**: A cache hierarchy is a set of caches that can be connected to a processor and memory system.
- **Memory system**: A memory system is a set of memory controllers and memory devices that can be connected to the cache hierarchy.
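
To make the plug-together idea concrete, here is a toy sketch in plain Python. It is not gem5 code: the classes below are hypothetical stand-ins that only mimic the roles described above, and the method name `connect_things` is used purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    num_cores: int  # a processor has one or more cores


@dataclass
class CacheHierarchy:
    description: str


@dataclass
class MemorySystem:
    description: str


@dataclass
class Board:
    """Toy backbone: components plug in here and the board wires them up."""

    processor: Processor
    cache_hierarchy: CacheHierarchy
    memory: MemorySystem
    connections: list = field(default_factory=list)

    def connect_things(self) -> None:
        # The board, not the components, decides who talks to whom.
        for core_id in range(self.processor.num_cores):
            self.connections.append((f"core{core_id}", "cache_hierarchy"))
        self.connections.append(("cache_hierarchy", "memory"))


board = Board(Processor(num_cores=2), CacheHierarchy("two-level"), MemorySystem("DDR4"))
board.connect_things()
print(board.connections)
```

The point of the sketch is the division of labor: each component only describes itself, and the board owns the wiring, so any one component can be swapped without touching the others.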



### Remember: gem5 software architecture





#### Python

Control the simulation

- `sim.start()`
- `sim.dump_stats()`
- `sim.stop_at_inst(...)`

Develop experiments with the key parameters



### **Exercise: Build your own board**

#### **Characteristics**

- A single core Arm CPU using a "simple" model.
- Two-level cache hierarchy with 32 KiB 8-way L1 and 512 KiB 16-way L2.
- Single channel DDR4 2400 memory.

Run BFS from the GAP benchmark suite.

#### **Questions**

- What is the average IPC?
- What is the total simulated time?
- What is the output of the simulated program?



#### Step 0

In `materials/02-Using-gem5/01-stdlib/01-components.py`, you'll see some imports already included for you.



### **Step 1: Instantiate a processor**

`processor = SimpleProcessor(cpu_type=CPUTypes.TIMING, isa=ISA.ARM, num_cores=1)`

`SimpleProcessor` is a component that allows you to customize the model for the underlying cores.

The `cpu_type` parameter specifies the type of CPU model to use.



### **Step 2: Instantiate a cache hierarchy**

`MESITwoLevelCacheHierarchy` is a component that represents a two-level MESI cache hierarchy. It uses the Ruby memory model.

The component for the cache hierarchy is parameterized with the sizes and associativities of the L1 and L2 caches.
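
A minimal sketch of the construction for the exercise's caches. It assumes the constructor accepts the keyword arguments `l1i_size`, `l1i_assoc`, `l1d_size`, `l1d_assoc`, `l2_size`, `l2_assoc`, and `num_l2_banks` (check this against your gem5 version), and it assumes a single L2 bank because the exercise does not give a bank count. The import guard is only there so the fragment also loads in plain Python, where it just prints the parameters.

```python
try:
    # Available when the script is run by a gem5 binary (e.g. gem5-mesi).
    from gem5.components.cachehierarchies.ruby.mesi_two_level_cache_hierarchy import (
        MESITwoLevelCacheHierarchy,
    )
except ImportError:
    MESITwoLevelCacheHierarchy = None  # plain Python: only show the parameters

# Exercise characteristics: 32 KiB 8-way L1s and a 512 KiB 16-way L2.
cache_params = {
    "l1i_size": "32KiB",
    "l1i_assoc": 8,
    "l1d_size": "32KiB",
    "l1d_assoc": 8,
    "l2_size": "512KiB",
    "l2_assoc": 16,
    "num_l2_banks": 1,  # assumption: the exercise does not give a bank count
}

cache_hierarchy = (
    MESITwoLevelCacheHierarchy(**cache_params)
    if MESITwoLevelCacheHierarchy is not None
    else None
)
print(cache_params)
```

In the exercise file the import is already provided, so only the construction call is needed there.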

### Step 3: Instantiate a memory system

`memory = SingleChannelDDR4_2400()`

This component represents a single-channel DDR4 memory system.

There is a `size` parameter that sets the memory capacity of the simulated system. You can reduce the size to save simulation time, or use the default for the memory type (e.g., one channel of DDR4 defaults to 8 GiB).

There are also multi-channel memories available.

We'll cover this more in Memory Systems.



### Step 4: Plug components into the board

A `SimpleBoard` is a board that can run any ISA in Syscall Emulation (SE) mode.

It is "Simple" due to the relative simplicity of SE mode.

Most boards are tied to a specific ISA and require more complex designs to run Full System (FS) simulation.

```
board = SimpleBoard(
    clk_freq="3GHz",
    processor=processor,
    memory=memory,
    cache_hierarchy=cache_hierarchy,
)
```



### **Step 5: Set up the workload**

`board.set_workload(obtain_resource("arm-gapbs-bfs-run"))`

The `obtain_resource` function downloads the files needed to run the specified workload. In this case, "arm-gapbs-bfs-run" is a BFS workload from the GAP Benchmark Suite.

#### This uses "gem5 resources"

Here we can search the available resources: <a href="https://resources.gem5.org/">https://resources.gem5.org/</a>.

Here is the arm-gapbs-bfs-run resource: <a href="https://resources.gem5.org/resources/arm-gapbs-bfs-run?version=1.0.0">https://resources.gem5.org/resources/arm-gapbs-bfs-run?version=1.0.0</a>.



### Step 6: Set up the simulation and run

Set up the simulation:

```
simulator = Simulator(board=board)
simulator.run()
```

(More on this later, but this is the object that controls the simulation loop.)

#### Run it

`gem5-mesi 01-components.py`



### **Output**

```
Generate Time:       0.00462
Build Time:          0.00142
Graph has 1024 nodes and 10496 undirected edges for degree: 10
Trial Time:          0.00010
Trial Time:          0.00008
Trial Time:          0.00008
Trial Time:          0.00008
Trial Time:          0.00008
Trial Time:          0.00009
Trial Time:          0.00008
Trial Time:          0.00008
Trial Time:          0.00008
Trial Time:          0.00011
Average Time:        0.00009
```

#### stats.txt

| Statistic  | Value      |
|------------|------------|
| simSeconds | 0.009093   |
| simTicks   | 9093461436 |
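
The two statistics are related by gem5's tick rate. The short worked check below assumes the default resolution of 10^12 ticks per simulated second (1 tick = 1 ps) and that the board's 3 GHz clock is rounded to a whole number of ticks per cycle; neither assumption is stated in the output itself.

```python
TICKS_PER_SECOND = 10**12  # assumption: gem5's default of 1 tick = 1 ps
CLK_FREQ_HZ = 3 * 10**9    # clk_freq="3GHz" from the board

sim_ticks = 9093461436     # simTicks from stats.txt

sim_seconds = sim_ticks / TICKS_PER_SECOND
ticks_per_cycle = round(TICKS_PER_SECOND / CLK_FREQ_HZ)  # 333 under the assumed rounding
cycles, leftover = divmod(sim_ticks, ticks_per_cycle)

print(f"simSeconds = {sim_seconds:.6f}")                   # simSeconds = 0.009093
print(f"cycles = {cycles}, leftover ticks = {leftover}")   # cycles = 27307692, leftover ticks = 0
```

The tick count divides evenly by 333, which is consistent with the assumed clock period. IPC is then the number of committed instructions divided by this cycle count; `stats.txt` reports it directly, so the division is only needed as a sanity check.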



#### **Exercise questions**

#### What is the average IPC?

See the `ipc` statistic in the `stats.txt` file.

**Answer**:

#### What is the total simulated time?

See the `simSeconds` statistic in the `stats.txt` file.

**Answer**: 0.009093 s

#### What is the output of the simulated program?

See the standard output of the program.

**Answer**: "Generate Time: 0.00462"... etc.
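
Reading these values by eye works, but a small parser helps when comparing runs. This sketch assumes each statistic line has the form `name value # description`; the function name `read_stats` is made up for illustration, and the sample text contains only the two values shown above (its description strings are illustrative).

```python
def read_stats(text: str) -> dict:
    """Map each statistic name to its first value, as a float."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing description
        if len(fields) < 2:
            continue  # blank lines
        try:
            stats.setdefault(fields[0], float(fields[1]))
        except ValueError:
            continue  # begin/end markers and other non "name value" lines
    return stats


sample = """
---------- Begin Simulation Statistics ----------
simSeconds    0.009093    # Number of seconds simulated (Second)
simTicks      9093461436  # Number of ticks simulated (Tick)
---------- End Simulation Statistics   ----------
"""

stats = read_stats(sample)
print(stats["simSeconds"], stats["simTicks"])
```

On a real run, the same function could be applied to the contents of `stats.txt` in the gem5 output directory and the result searched for names ending in `ipc`; the full statistic name depends on the board and processor, so it is not hard-coded here.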



### **Future things to consider**

- How to change the processor?
  - The number of cores
  - The type of CPU model
  - Details of the CPU model (e.g., pipeline depth)
  - The ISA
- How to change the cache hierarchy?
  - The sizes and associativities of the L1 and L2 caches
  - The number of L2 banks
  - The structure of the hierarchy (e.g., 3-level, 2-level, write-through)
- How to change the memory system?
  - The size of the memory
  - The type of memory (e.g., DDR3, DDR4, HBM)
  - The number of channels

We'll cover all of this in the coming sections.
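
Because a gem5 configuration is ordinary Python, knobs like these can be enumerated programmatically. The sketch below uses only the Python standard library; the parameter names and values are illustrative choices, not a list of what gem5 supports, and each resulting dictionary would still need to be turned into components in a config script.

```python
from itertools import product

# Illustrative knobs taken from the questions above.
num_cores_options = [1, 2, 4]
l2_size_options = ["256KiB", "512KiB"]
memory_options = ["DDR3", "DDR4"]

# One dictionary per point in the design space.
experiments = [
    {"num_cores": cores, "l2_size": l2, "memory": mem}
    for cores, l2, mem in product(num_cores_options, l2_size_options, memory_options)
]

print(len(experiments))  # 12
print(experiments[0])    # {'num_cores': 1, 'l2_size': '256KiB', 'memory': 'DDR3'}
```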



## Overview of stdlib components

A brief overview of the different kinds of components in the stdlib



### **Components included in gem5**

```
gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/memory
----/processors

gem5/src/python/gem5/prebuilt
----/demo/x86_demo_board
----/riscvmatched
```

- gem5 stdlib in src/python/gem5
- Two types
  - Prebuilt: full systems with set parameters
  - Components: Components to build systems
- Prebuilt
  - Demo: Just examples to build off of
  - riscvmatched: Model of SiFive Unmatched



#### **Components: Boards**

```
gem5/src/python/gem5/components
----/boards
    ----/simple
    ----/arm_board
    ----/riscv_board
    ----/x86_board
----/cachehierarchies
----/memory
----/processors
```

- Boards: Things to plug into
  - Have `set_workload` and `connect_things`
- Simple: SE-only, configurable
- Arm, RISC-V, and X86 versions for full system simulation



### **Components: Cache hierarchies**

- Have fixed interface to processors and memory
- Ruby: detailed cache coherence and interconnect
- CHI: Arm CHI-based protocol implemented in Ruby
- Classic caches: Hierarchy of crossbars with inflexible coherence



#### A bit more about cache hierarchies

- Quick caveat: You need different gem5 binaries for different protocols
- Any binary can use classic caches
- Only one Ruby protocol per gem5 binary

#### In your codespaces, we have some pre-built binaries

- `gem5`: CHI (Fully configurable; based on Arm CHI)
- `gem5-mesi`: `MESI_Two_Level` (Private L1s, Shared L2)
- `gem5-vega`: `GPU_VIPER` (CPU: Private L1/L2 core pairs, shared L3; GPU: Private L1, shared L2)



### **Components: Memory systems**

- Pre-configured (LP)DDR3/4/5 DIMMs
  - Single and multi channel
- Integration with DRAMSim and DRAMSys
  - Not needed for accuracy, but useful for comparisons
- HBM: An HBM stack



#### **Components: Processors**

```
gem5/src/python/gem5/components
----/boards
----/cachehierarchies
----/memory
----/processors
    ----/generators
    ----/simple
    ----/switchable
```

- Mostly "configurable" processors to build off of.
- Generators
  - Synthetic traffic, but act like processors.
  - Have linear, random, and more interesting patterns
- Simple
  - Only default parameters, one ISA.
- Switchable
  - We'll see this later, but you can switch from one to another during simulation.



#### More on processors

- Processors are made up of cores.
- Cores have a "BaseCPU" as a member. This is the actual CPU model.
- `Processor` is what interfaces with `CacheHierarchy` and `Board`.
- Processors are organized, structured sets of cores. They define how cores connect with each other, with outside components, and with the board through a standard interface.

#### gem5 has three (or four or five) different processor models

More details coming in the CPU Models section.

- `CPUTypes.TIMING`: A simple in-order CPU model
  - This is a "single cycle" CPU. Each instruction takes the time needed to fetch it and then executes immediately.
  - Memory operations take the latency of the memory system.
  - OK for doing memory-centric studies, but not good for most research.



### **CPU types**

#### Other options for CPU types

- `CPUTypes.O3`: An out-of-order CPU model
  - Highly detailed model based on the Alpha 21264.
  - Has ROB, physical registers, LSQ, etc.
  - Don't use `SimpleProcessor` if you want to configure this.
- `CPUTypes.MINOR`: An in-order core model
  - A high-performance in-order core model.
  - Configurable four-stage pipeline
  - Don't use `SimpleProcessor` if you want to configure this.
- `CPUTypes.ATOMIC`: Used in "atomic" mode (more later)
- `CPUTypes.KVM`: More later



#### Summary

- gem5's standard library is a set of components that do the majority of the work of building a simulation for you.
- The standard library is designed around *extension* and *encapsulation*.
- The main types of components are boards, processors, memory systems, and cache hierarchies.
- The standard library is designed to be modular and object-oriented.
- The Simulator object controls the simulation.

