# Storage: memory hierarchy; locality
_COSC 208, Introduction to Computer Systems, Fall 2024_

## Memory hierarchy

Q1: _For each of the following characteristics, circle the type(s) of memory to which the characteristic applies. (HDD = Hard Disk Drive; RAM = Random Access Memory; SSD = Solid State Drive)_

* Lowest monetary cost: HDD
* Fastest: Registers
* On CPU: Cache, Registers
* Volatile: Cache, RAM, Registers
* Size measured in megabytes (MB) in a present-day laptop: Cache
* Size measured in gigabytes (GB) in a present-day laptop: RAM
* Size measured in terabytes (TB) in a present-day laptop: HDD, SSD

| Characteristic | | | | | |
|-----|-|-|-|-|-|
| <br/>Lowest monetary cost<br/><br/> | Cache | HDD | RAM | Registers | SSD |
| <br/>Fastest<br/><br/> | Cache | HDD | RAM | Registers | SSD |
| <br/>On CPU<br/><br/> | Cache | HDD | RAM | Registers | SSD |
| <br/>Volatile<br/><br/> | Cache | HDD | RAM | Registers | SSD |
| <br/>Size measured in megabytes (MB)<br/>in a present-day laptop | Cache | HDD | RAM | Registers | SSD |
| <br/>Size measured in gigabytes (GB)<br/>in a present-day laptop | Cache | HDD | RAM | Registers | SSD |
| <br/>Size measured in terabytes (TB)<br/>in a present-day laptop | Cache | HDD | RAM | Registers | SSD |

* Access latency
    * Let's consider a 1 Hz CPU, which means 1 cycle = 1 second
    * Registers — 1 cycle = 1 second
    * Caches — ~10 cycles = ~10 seconds
    * Main memory — ~100 cycles = ~2 minutes
    * Solid-state drive — ~1 million cycles = ~11.5 days
    * Hard (i.e., traditional) disk drive — ~10 million cycles = ~115 days
    * Remote (i.e., network) storage – ~20 ms = ~60 million cycles on a ~3 GHz CPU = ~2 years
* Storage capacity
    * Let's assume 1 byte = 1mL
    * Registers — 30 * 8B = ~250mL = ~1 cup
    * Caches (Core i7 in MacBook Pro)
        * L1 — 32KB + 32KB = 64L = ~1 tank of gas
        * L2 — 512KB * 4 cores = 2048L = ~7 bathtubs
    * Main memory = 32GB (in MacBook Pro) = ~13 Olympic swimming pools
    * SSD = 1TB (in MacBook Pro) = ~Lake Moraine
* Monetary cost – due to the design/components required for each storage technology
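
The rescaling above is plain unit arithmetic, and a short C sketch makes it easy to check. The helper names below are made up for illustration. The ~3 GHz clock suggested for the network example is an assumption (the notes do not state the clock rate behind the ~2 years figure), and 2,500,000 L is the nominal volume of an Olympic pool.

```c
/* Hypothetical helpers for the analogies above; names are illustrative only. */

#define SECONDS_PER_DAY     86400.0
#define OLYMPIC_POOL_LITERS 2500000.0   /* nominal 50 m x 25 m x 2 m pool */

/* At 1 Hz, one cycle takes one second, so a cycle count is also a count of
 * seconds. Convert that count to days. */
double cycles_to_days(double cycles) {
    return cycles / SECONDS_PER_DAY;
}

/* Convert a real-world latency in milliseconds to cycles on a real CPU.
 * clock_ghz is supplied by the caller; 3.0 is an assumed, typical value. */
double ms_to_cycles(double latency_ms, double clock_ghz) {
    return latency_ms * 1e-3 * clock_ghz * 1e9;
}

/* With 1 byte = 1 mL, convert a capacity in bytes to liters. */
double bytes_to_liters(double bytes) {
    return bytes / 1000.0;
}
```

For example, `cycles_to_days(1e6)` comes out to roughly 11.6, matching the SSD line, and `bytes_to_liters(32e9) / OLYMPIC_POOL_LITERS` comes out to roughly 12.8 pools.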

<p style="height:20em;"></p>

## Efficiency

* Ideally, all data would be stored in registers, because these are the fastest type of storage
* _Why is this not practical?_ – limited storage capacity; high monetary cost (relative to other forms of memory)
* _Where are a program's values stored when they are not stored in registers?_ – on the stack and heap in main memory
* Recall: _How does data move between the CPU, main memory, and secondary storage in the von Neumann Architecture?_ — bus
* To make code run faster, we want to limit how frequently data has to be moved between the CPU and main memory
    * Better register assignment – can reduce number of loads and stores
    * Caches – temporarily store some regions of main memory in the CPU
    * Better code design – write/revise code with caches in mind

## Reducing loads and stores

* A load (and often the matching store) is unnecessary when the value of a register is not changed between a store and a load that involve the same register and memory address, because the register already holds the value
* Example load which is _unnecessary_
    ```
    str x0, [sp,#4]
    ldr x0, [sp,#4] // Can eliminate
    ```
    * If there are no other `ldr` instructions that load from `[sp,#4]`, then it is unnecessary to store a value at `[sp,#4]` and the `str` instruction can also be eliminated
* Example store which is _necessary_
    ```
    str w0, [sp,#4]
    mov w0, #0x1
    str w0, [sp]
    ldr w0, [sp,#4]
    ```
* Better register assignments to eliminate loads (and stores)
    ```
    str w0, [sp,#4]
    mov w1, #0x1
    str w1, [sp]
    ldr w0, [sp,#4] // Can eliminate
    ```
* Must preserve calling conventions
    * Parameters are stored in w/x0, w/x1, ...
    * Return value is stored in w/x0
    * Caller must store register values in its own stack frame before `bl` to the callee; this is only needed when the caller needs those values after the `bl` and the callee overwrites those registers

_Example_

```
000000000000088c <interest_due>:
    88c:    sub    sp, sp, #0x20
    890:    str    w0, [sp, #12]    XXXXX
    894:    str    w1, [sp, #8]     XXXXX
    898:    ldr    w0, [sp, #12]    XXXXX
    89c:    ldr    w1, [sp, #8]     XXXXX
    8a0:    mul    w0, w1, w0
    8a4:    str    w0, [sp, #20]
    8a8:    mov    w0, #0x4b0
    8ac:    str    w0, [sp, #24]    XXXXX
    8b0:    ldr    w1, [sp, #20]
    8b4:    ldr    w0, [sp, #24]    XXXXX
    8b8:    sdiv   w0, w1, w0
    8bc:    str    w0, [sp, #28]    XXXXX
    8c0:    ldr    w0, [sp, #28]    XXXXX
    8c4:    add    sp, sp, #0x20
    8c8:    ret
```

```
000000000000088c <interest_due>:
    88c:    sub    sp, sp, #0x20    XXXXX
    8a0:    mul    w0, w1, w0       
    8a4:    str    w0, [sp, #20]    XXXXX
    8a8:    mov    w0, #0x4b0       // mov w1, #0x4b0
    8b0:    ldr    w1, [sp, #20]    XXXXX
    8b8:    sdiv   w0, w1, w0       // sdiv w0, w0, w1
    8c4:    add    sp, sp, #0x20    XXXXX
    8c8:    ret
```

```
000000000000088c <interest_due>:
    88c:    sub    sp, sp, #0x20
    890:    str    w0, [sp, #12]
    894:    str    w1, [sp, #8]
    898:    ldr    w0, [sp, #12]
    89c:    ldr    w1, [sp, #8]
    8a0:    mul    w0, w1, w0
    8a4:    str    w0, [sp, #20]
    8a8:    mov    w0, #0x4b0
    8ac:    str    w0, [sp, #24]
    8b0:    ldr    w1, [sp, #20]
    8b4:    ldr    w0, [sp, #24]
    8b8:    sdiv   w0, w1, w0
    8bc:    str    w0, [sp, #28]
    8c0:    ldr    w0, [sp, #28]
    8c4:    add    sp, sp, #0x20
    8c8:    ret
```
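
For reference, the `interest_due` assembly above is consistent with a C function along the following lines. This is a reconstruction rather than the original source: the names `balance`, `rate`, `product`, `divisor`, and `due` are assumptions, and only the arithmetic (multiply, then divide by `0x4b0` = 1200) is taken from the assembly. Compilers run without optimization typically give every parameter and local variable its own stack slot, which is where the extra `str`/`ldr` pairs come from.

```c
/* Hypothetical source matching the unoptimized assembly; names are guesses. */
int interest_due(int balance, int rate) {
    int product = balance * rate;   /* mul  w0, w1, w0   (slot at [sp, #20]) */
    int divisor = 1200;             /* mov  w0, #0x4b0   (slot at [sp, #24]) */
    int due = product / divisor;    /* sdiv w0, w1, w0   (slot at [sp, #28]) */
    return due;                     /* return value travels in w0 */
}

/* Same computation with no named temporaries. Every value can stay in a
 * register, which mirrors the version with the loads and stores removed. */
int interest_due_compact(int balance, int rate) {
    return (balance * rate) / 1200;
}
```

Both functions return the same result for the same arguments, for instance 10 for a balance of 1000 and a rate of 12.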

<p style="height:15em;"></p>

Q3: _Cross out unnecessary loads and stores in the following assembly code:_

```
000000000000076c <divide>:                      
    76c:    d10083ff     sub    sp, sp, #0x20   
    770:    b9000fe0     str    w0, [sp, #12]
    774:    b9000be1     str    w1, [sp, #8]        XXXX
    778:    12800000     mov    w0, #0xffffffff     
    77c:    b9001fe0     str    w0, [sp, #28]       XXXX
    780:    b9400fe1     ldr    w1, [sp, #8]        XXXX
    784:    b9400be0     ldr    w0, [sp, #12]
    788:    1ac00c20     sdiv   w0, w1, w0      
    78c:    b9001fe0     str    w0, [sp, #28]       XXXX
    790:    b9401fe0     ldr    w0, [sp, #28]       XXXX
    794:    910083ff     add    sp, sp, #0x20   
    798:    d65f03c0     ret                    
```

```
000000000000076c <divide>:                      
    76c:    d10083ff     sub    sp, sp, #0x20
    770:    b9000fe0     str    w0, [sp, #12]
    774:    b9000be1     str    w1, [sp, #8]
    778:    12800000     mov    w0, #0xffffffff
    77c:    b9001fe0     str    w0, [sp, #28]
    780:    b9400fe1     ldr    w1, [sp, #8]
    784:    b9400be0     ldr    w0, [sp, #12]
    788:    1ac00c20     sdiv   w0, w1, w0
    78c:    b9001fe0     str    w0, [sp, #28]
    790:    b9401fe0     ldr    w0, [sp, #28]
    794:    910083ff     add    sp, sp, #0x20
    798:    d65f03c0     ret
```

Q4: _Cross out unnecessary loads and stores in the following assembly code:_

```
0000000000400584 <pow2>:
    400584:       d10043ff        sub     sp, sp, #0x10
    400588:       b9000fe0        str     w0, [sp, #12]     XXX
    40058c:       52800028        mov     w8, #0x1
    400590:       b9000be8        str     w8, [sp, #8]      XXX
    400594:       b9400fe8        ldr     w0, [sp, #12]     XXX
    400598:       7100011f        cmp     w0, #0x0
    40059c:       37000128        b.le    4005c0 <pow2+0x3c>
    4005a0:       b9400be8        ldr     w8, [sp, #8]      XXX
    4005a4:       52800049        mov     w9, #0x2
    4005a8:       1b097d08        mul     w8, w8, w9
    4005ac:       b9000be8        str     w8, [sp, #8]
    4005b0:       b9400fe8        ldr     w0, [sp, #12]     XXX
    4005b4:       71000508        subs    w0, w0, #0x1
    4005b8:       b9000fe8        str     w0, [sp, #12]     XXX
    4005bc:       17fffff5        b       400594 <pow2+0x10>
    4005c0:       b9400be0        ldr     w0, [sp, #8]
    4005c4:       910043ff        add     sp, sp, #0x10
    4005c8:       d65f03c0        ret
```

```
0000000000400584 <pow2>:
    400584:       d10043ff        sub     sp, sp, #0x10
    400588:       b9000fe0        str     w0, [sp, #12]
    40058c:       52800028        mov     w8, #0x1
    400590:       b9000be8        str     w8, [sp, #8]
    400594:       b9400fe8        ldr     w0, [sp, #12]
    400598:       7100011f        cmp     w0, #0x0
    40059c:       37000128        b.le    4005c0 <pow2+0x3c>
    4005a0:       b9400be8        ldr     w8, [sp, #8]
    4005a4:       52800049        mov     w9, #0x2
    4005a8:       1b097d08        mul     w8, w8, w9
    4005ac:       b9000be8        str     w8, [sp, #8]
    4005b0:       b9400fe8        ldr     w0, [sp, #12]
    4005b4:       71000508        subs    w0, w0, #0x1
    4005b8:       b9000fe8        str     w0, [sp, #12]
    4005bc:       17fffff5        b       400594 <pow2+0x10>
    4005c0:       b9400be0        ldr     w0, [sp, #8]
    4005c4:       910043ff        add     sp, sp, #0x10
    4005c8:       d65f03c0        ret
```
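
For reference, the `pow2` assembly above is consistent with a C function along the following lines. This is a reconstruction, not the original source: the names `n` and `result` are assumptions, while the loop structure follows the `cmp`, `b.le`, and `b` instructions. Both variables are read and written on every iteration, which is why keeping them in registers for the whole loop removes most of the loads and stores.

```c
/* Hypothetical source matching the pow2 assembly; names are guesses. */
int pow2(int n) {
    int result = 1;             /* mov w8, #0x1 (slot at [sp, #8]) */
    while (n > 0) {             /* cmp w0, #0x0 then b.le to the exit */
        result = result * 2;    /* mul w8, w8, w9 with w9 = 2 */
        n = n - 1;              /* subs w0, w0, #0x1 */
    }
    return result;              /* return value travels in w0 */
}
```

Calling `pow2(3)` returns 8, and any `n` that is zero or negative returns 1 because the loop body never runs.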

Q5: _Cross out unnecessary loads and stores in the following assembly code:_

```
000000000000071c <flip>:
    71c:    d10083ff     sub    sp, sp, #0x20
    720:    b9000fe0     str    w0, [sp, #12]   XXXX
    724:    12800000     mov    w1, #0xffffffff
    728:    b9001fe0     str    w1, [sp, #28]
    72c:    b9400fe0     ldr    w0, [sp, #12]   XXXX
    730:    7100001f     cmp    w0, #0x0
    734:    54000081     b.eq   740 <flip+0x28>
    738:    b9001fff     str    wzr, [sp, #28]
    73c:    14000002     b      748 <flip+0x2c>
    740:    52800020     mov    w0, #0x1
    744:    b9001fe0     str    w0, [sp, #28]
    748:    b9401fe0     ldr    w0, [sp, #28]
    74c:    910083ff     add    sp, sp, #0x20
    750:    d65f03c0     ret    
```

```
000000000000071c <flip>:
    71c:    d10083ff     sub    sp, sp, #0x20
    720:    b9000fe0     str    w0, [sp, #12]
    724:    12800000     mov    w1, #0xffffffff
    728:    b9001fe0     str    w1, [sp, #28]
    72c:    b9400fe0     ldr    w0, [sp, #12]
    730:    7100001f     cmp    w0, #0x0
    734:    54000081     b.eq   740 <flip+0x28>
    738:    b9001fff     str    wzr, [sp, #28]
    73c:    14000002     b      748 <flip+0x2c>
    740:    52800020     mov    w0, #0x1
    744:    b9001fe0     str    w0, [sp, #28]
    748:    b9401fe0     ldr    w0, [sp, #28]
    74c:    910083ff     add    sp, sp, #0x20
    750:    d65f03c0     ret    
```

<div style="page-break-after:always;"></div>

## Temporal vs. spatial locality

* _What is temporal locality?_
    * Access the same data repeatedly
    * E.g., for loop variable
* _What is spatial locality?_
    * Access data that is stored near other recently accessed data
    * E.g., next item in array
    * E.g., local variables/parameters, which are stored in the same stack frame
* Analogies for temporal and spatial locality
    * Book storage (Dive Into Systems Section 11.3.2)
        * Temporal locality — store most frequently used books at your desk, less frequently used books on your bookshelf, and least frequently used books at the library
        * Spatial locality — check out books on the same/nearby subjects when you go to the library
    * Groceries
        * Temporal locality — you store food you eat frequently in the front of the refrigerator, while you store food you eat infrequently in the back of the refrigerator
        * Spatial locality — you organize the items on your grocery list based on the aisle in which they are located
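
The same two ideas can be seen in a small piece of C. The sketch below is illustrative only: the function names and the 4x4 grid size are assumptions chosen to keep it short. It relies on the fact that C stores a two-dimensional array row by row, so elements of the same row sit next to each other in memory.

```c
#define ROWS 4
#define COLS 4

/* Row-major traversal: the inner loop walks along one row, so consecutive
 * accesses touch adjacent addresses (spatial locality). */
long sum_row_major(int grid[ROWS][COLS]) {
    long total = 0;                 /* total, r, and c are reused on every
                                       iteration (temporal locality) */
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            total += grid[r][c];    /* next access is sizeof(int) bytes away */
        }
    }
    return total;
}

/* Column-major traversal: same result, but consecutive accesses are
 * COLS * sizeof(int) bytes apart, so spatial locality is weaker. */
long sum_col_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (int c = 0; c < COLS; c++) {
        for (int r = 0; r < ROWS; r++) {
            total += grid[r][c];
        }
    }
    return total;
}
```

With a grid this small, both versions fit comfortably in cache, so any speed difference would only show up with much larger arrays. The sketch is meant to show the access order, not to serve as a benchmark.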

Q6: _For each of the following scenarios, indicate whether it is an example of temporal locality, spatial locality, or neither._

* Gates for flights on the same airline are located in the same airport terminal/concourse – spatial locality
* A grocery list is arranged in alphabetical order – neither
* Clothes in a closet are grouped into outfits, with a shirt and a pair of pants stored next to each other – spatial locality
* Boxes of cereal, bowls, and spoons are stored in adjacent kitchen cabinets/drawers – spatial locality
* You repeatedly check your phone for new messages – temporal locality
* A variable used in a for loop – temporal locality
* Variables used in different functions – neither
* A function's parameters, which are each used once within the function – spatial locality

* Gates for flights on the same airline are located in the same airport terminal/concourse
* A grocery list is arranged in alphabetical order
* Clothes in a closet are grouped into outfits, with a shirt and a pair of pants stored next to each other
* Boxes of cereal, bowls, and spoons are stored in adjacent kitchen cabinets/drawers
* You repeatedly check your phone for new messages
* A variable used in a for loop
* Variables used in different functions
* A function's parameters, which are each used once within the function