# Storage: memory hierarchy; locality
_COSC 208, Introduction to Computer Systems, 2024-04-24_

## Announcements
* Project 3 due today @ 11pm

## Outline
* Memory hierarchy
* Temporal vs. spatial locality
* CPU caches
* Cache replacement

## Warm-up: memory hierarchy

* Q1: _For each of the following characteristics, circle the type(s) of memory to which the characteristic applies. (HDD = Hard Disk Drive; RAM = Random Access Memory; SSD = Solid State Drive)_

| Characteristic | | | | | |
|-----|-|-|-|-|-|
| <br/>Cheapest<br/><br/> | Cache | HDD | RAM | Registers | SSD |
| <br/>Fastest<br/><br/> | Cache | HDD | RAM | Registers | SSD |
| <br/>On CPU<br/><br/> | Cache | HDD | RAM | Registers | SSD |
| <br/>Volatile<br/><br/> | Cache | HDD | RAM | Registers | SSD |
| <br/>Size measured in megabytes (MB)<br/>in a present day laptop | Cache | HDD | RAM | Registers | SSD |
| <br/>Size measured in gigabytes (GB)<br/>in a present day laptop | Cache | HDD | RAM | Registers | SSD |
| <br/>Size measured in terabytes (TB)<br/>in a present day laptop | Cache | HDD | RAM | Registers | SSD |

    Cheapest: HDD
    Fastest: Registers
    On CPU: Cache, Registers
    Volatile: Cache, RAM, Registers
    Size in MB: Cache
    Size in GB: RAM
    Size in TB: HDD, SSD

<div style="page-break-after:always;"></div>

## Data movement

* Recall: _How does data move between the CPU, main memory, and secondary storage in the von Neumann Architecture?_ — bus
* _Why does data move between registers and main memory?_ — not enough room in registers to store all values used by a program at runtime
* _How can we move less data (i.e., perform fewer loads and stores)?_
* Better use of registers: a load is unnecessary when the register still holds the value that was just stored to the same memory address, i.e., the register is not overwritten between the store and the load
    * Example load which is _unnecessary_
        ```
        str x0, [sp,#4]
        ldr x0, [sp,#4] // Can eliminate
        ```
    * Example load which is _necessary_ (`w0` is overwritten between the store and the load)
        ```
        str w0, [sp,#4]
        mov w0, #0x1
        str w0, [sp]
        ldr w0, [sp,#4]
        ```
    * Better register assignments to eliminate loads (and stores)
        ```
        str w0, [sp,#4]
        mov w1, #0x1
        str w1, [sp]
        ldr w0, [sp,#4] // Can eliminate
        ```
    * Must preserve calling conventions
        * Parameters are stored in w/x0, w/x1, ...
        * Return value is stored in w/x0
        * Caller must store register values into the caller's stack frame before `bl` to the callee; strictly, this is only needed when the caller still needs a value after the `bl` and the callee overwrites the register holding it
* Leverage locality
    * Add additional memory to the CPU — i.e., a cache
    * Optimize code to improve locality

## Temporal vs. spatial locality

* _What is temporal locality?_
    * Access the same data repeatedly
    * E.g., for loop variable
* _What is spatial locality?_
    * Access data stored at nearby memory addresses
    * E.g., next item in array
    * E.g., local variables/parameters, which are stored in the same stack frame
* Analogies for temporal and spatial locality
    * Book storage (Dive Into Systems Section 11.3.2)
        * Temporal locality — store most frequently used books at your desk, less frequently used books on your bookshelf, and least frequently used books at the library
        * Spatial locality: check out books on the same/nearby subjects when you go to the library
    * Groceries (pre-class questions 3 & 4)
        * Temporal locality — you store food you eat frequently in the front of the refrigerator, while you store food you eat infrequently in the back of the refrigerator
        * Spatial locality — you organize the items on your grocery list based on the aisle in which they are located
    * _With a partner, develop your own analogy for temporal and spatial locality_

## CPU caches

* _Why do we have caches on the CPU?_ — accessing main memory is ~100x slower than accessing a register
* Store instructions and data (stack, heap, etc.) from main memory
* Three levels — L1, L2, and L3
* Range in size from a few KB to a few MB; L1 is smallest and fastest, L3 is largest and slowest
* Cache line (i.e., cache entry) is typically larger than a word — e.g., 128 bytes
    * _Why?_ — spatial locality
* _What happens when we write to memory?_
    * Write-through cache: write to the cache and main memory
    * Write-back cache: initially write only to the cache; write to main memory when the entry is evicted from the cache
    * _What are the advantages of each approach?_
        * Write-through cache keeps main memory up to date, which makes it easier to keep CPU cores consistent
        * Write-back cache only incurs the overhead of accessing main memory when absolutely necessary
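
The difference between the two write policies can be sketched with a toy simulation. Everything here is an assumption made for illustration: the `toy_cache_t` type, a cache that holds a single one-word entry, and a 16-word "main memory". Real caches hold many multi-word lines. The sketch only counts how many writes reach main memory under each policy.

```C
#include <assert.h>
#include <stdbool.h>

#define MEM_WORDS 16  // size of the simulated main memory (illustrative)

typedef enum { WRITE_THROUGH, WRITE_BACK } policy_t;

/* Hypothetical toy cache with exactly one entry holding one word */
typedef struct {
    policy_t policy;
    bool valid;             // does the entry hold data?
    bool dirty;             // does the entry differ from main memory?
    int addr;               // which memory word is cached
    int value;              // the cached value
    int memory[MEM_WORDS];  // simulated main memory
    int mem_writes;         // number of writes that reached main memory
} toy_cache_t;

void cache_init(toy_cache_t *c, policy_t policy) {
    c->policy = policy;
    c->valid = false;
    c->dirty = false;
    c->addr = 0;
    c->value = 0;
    c->mem_writes = 0;
    for (int i = 0; i < MEM_WORDS; i++) {
        c->memory[i] = 0;
    }
}

/* Evict the entry, writing it to main memory only if it is dirty */
void cache_flush(toy_cache_t *c) {
    if (c->valid && c->dirty) {
        c->memory[c->addr] = c->value;
        c->mem_writes++;
    }
    c->valid = false;
    c->dirty = false;
}

void cache_write(toy_cache_t *c, int addr, int value) {
    assert(addr >= 0 && addr < MEM_WORDS);
    if (!c->valid || c->addr != addr) {
        cache_flush(c);  // make room: evict whatever is cached now
        c->addr = addr;
        c->valid = true;
    }
    c->value = value;
    if (c->policy == WRITE_THROUGH) {
        c->memory[addr] = value;  // every write also goes to main memory
        c->mem_writes++;
    } else {
        c->dirty = true;          // defer the memory write until eviction
    }
}
```

In this sketch, writing the same address five times costs five main-memory writes under write-through but only one under write-back, and that one write happens when the entry is evicted.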

## Extra practice
Q2: _Consider the following program:_

```C
/* 1*/  #include <ctype.h>
/* 2*/  #include <pthread.h>
/* 3*/  #include <stdio.h>
/* 4*/  #include <stdlib.h>
/* 5*/  #include <string.h>
/* 6*/  int count_upper(char *str) {
/* 7*/      int count = 0;
/* 8*/      for (int i = 0; i < strlen(str); i++) {
/* 9*/          if (isupper(str[i])) {
/*10*/              count++;
/*11*/          }
/*12*/      }
/*13*/      return count;
/*14*/  }
/*15*/  int main(int argc, char *argv[]) {
/*16*/      if (argc < 2) {
/*17*/          printf("Error: provide a string\n");
/*18*/          return 1;
/*19*/      }
/*20*/      char *str = argv[1];
/*21*/      pthread_t thr;
/*22*/      pthread_create(thr, NULL, &count_upper, str);
/*23*/      int count = 0;
/*24*/      pthread_join(thr, &count);
/*25*/      printf("There are %d uppercase letters\n", count);
/*26*/  }
```

<div style="page-break-after:always;"></div>

_Compiling this program results in the following warnings:_
```
buggy.c:22:20: warning: incompatible integer to pointer conversion passing 
'pthread_t' (aka 'unsigned long') to parameter of type 'pthread_t *' (aka 
'unsigned long *'); take the address with & [-Wint-conversion]
    pthread_create(thr, NULL, &count_upper, str);
                   ^~~
                   &
/usr/include/pthread.h:198:50: note: passing argument to parameter 
'__newthread' here
extern int pthread_create (pthread_t *__restrict __newthread,
                                                 ^

buggy.c:22:31: warning: incompatible function pointer types passing 
'int (*)(char *)' to parameter of type 'void *(*)(void *)' 
[-Wincompatible-function-pointer-types]
    pthread_create(thr, NULL, &count_upper, str);
                              ^~~~~~~~~~~~
/usr/include/pthread.h:200:15: note: passing argument to parameter 
'__start_routine' here
                           void *(*__start_routine) (void *),
                                   ^

buggy.c:24:23: warning: incompatible pointer types passing 
'int *' to parameter of type 'void **' [-Wincompatible-pointer-types]
    pthread_join(thr, &count);
                      ^~~~~~
/usr/include/pthread.h:215:49: note: passing argument to parameter 
'__thread_return' here
extern int pthread_join (pthread_t __th, void **__thread_return);
                                                ^
3 warnings generated.
```
_How would you change the code to fix these problems?_

* Need to pass `&thr` to `pthread_create` (instead of `thr`) on line 22
* Function executed by a thread must return `void *` and take a single `void *` parameter; replace lines 6-7 with:
    ```C
    void *count_upper(void *arg) {
        char *str = (char *)arg;
        int *count = malloc(sizeof(int));
        *count = 0;
    ```
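    Also replace line 10 with the following (the parentheses are required; `*count++` would advance the pointer rather than increment the value it points to):
    ```C
    (*count)++;
    ```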
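* Need to pass a double pointer (`void **`) to `pthread_join` on line 24, and free the memory allocated by the thread once its value has been used; replace lines 23-25 with:
    ```C
    int *count = NULL;
    pthread_join(thr, (void **)&count);
    printf("There are %d uppercase letters\n", *count);
    free(count);
    ```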
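
Putting the fixes together, a corrected version might look like the sketch below. The helper `count_upper_in_thread` is hypothetical; it stands in for the body of `main` so that the thread logic can be called with any string. The error checks and the `unsigned char` cast are additions that the exercise does not ask for.

```C
#include <assert.h>
#include <ctype.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Thread function: must take a void * and return a void * */
void *count_upper(void *arg) {
    char *str = (char *)arg;
    int *count = malloc(sizeof(int));  // on the heap so it outlives the thread
    if (count == NULL) {
        return NULL;
    }
    *count = 0;
    for (size_t i = 0; i < strlen(str); i++) {
        // isupper expects a value representable as unsigned char
        if (isupper((unsigned char)str[i])) {
            (*count)++;  // parentheses: increment the int, not the pointer
        }
    }
    return count;
}

/* Hypothetical helper: counts uppercase letters in a separate thread.
 * Returns the count, or -1 if the thread could not be run. */
int count_upper_in_thread(char *str) {
    assert(str != NULL);
    pthread_t thr;
    if (pthread_create(&thr, NULL, &count_upper, str) != 0) {
        return -1;
    }
    int *count = NULL;
    if (pthread_join(thr, (void **)&count) != 0 || count == NULL) {
        return -1;
    }
    int result = *count;
    free(count);  // the caller of pthread_join owns the returned memory
    return result;
}
```

In the original program, `main` would call `count_upper_in_thread(argv[1])` and print the result. Depending on the toolchain, the program may need to be compiled with `-pthread`.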