# Storage: cache replacement; loop interchange
_COSC 208, Introduction to Computer Systems, 2024-04-26_

## Announcements
* Attend VAP candidate talk today 11:15am-11:45am
* Project 3 revisions due next Friday @ 5pm
* Quiz 6 Wednesday – can start as early as 8:05am
    * 6.1: Describe the mechanisms operating systems use to allocate hardware resources and ensure errant processes do not take over the system
    * 6.2: Determine the possible outputs of C programs that involve multiple processes and/or threads
    * 6.3: Develop C programs that create and wait for processes and execute other programs
    * 6.4: Modify C programs to use threads
    * 6.5: Compare virtualization technologies
    * 5.1: Determine where code and data reside throughout a program’s lifecycle
    * 5.2: Determine how assembly and/or C code can be modified for more efficient use of storage
    * Optional:
        * 4.1: Interpret ARM assembly code
        * 4.2: Determine the relationship between C code and ARM assembly code

## Outline
* SETs
* Warm-up
* Cache replacement
* Loop interchange

## SETs

## Warm-up

Q1: _For each of the following scenarios, indicate whether it is an example of temporal locality, spatial locality, or neither._

* Gates for flights on the same airline are located in the same airport terminal/concourse – spatial locality
* A grocery list is arranged in alphabetical order – neither
* Clothes in a closet are grouped into outfits, with a shirt and a pair of pants stored next to each other – spatial locality
* Boxes of cereal, bowls, and spoons are stored in adjacent kitchen cabinets/drawers – spatial locality
* You repeatedly check your phone for new messages – temporal locality
* A variable used in a for loop – temporal locality
* Variables used in different functions – neither
* A function's parameters, which are each used once within the function – spatial locality

🛑 **STOP here** after completing the above question; if you have extra time please **skip ahead** to the extra practice.

## Cache replacement

* If a cache is full, then a cache entry must be removed so different data can be placed in the cache
* Cache replacement policy governs which data is removed
* _What should a good cache replacement policy do?_ — maximize the number of cache hits (or minimize the number of cache misses)
    * Evaluation metric: Hit ratio = number of hits / total number of memory accesses
* _How do we determine which cache entry to replace?_
* First-in First-out (FIFO)
    * Simple to implement
    * Ignores how recently or how often an entry is used, so even a heavily used entry is evicted once it becomes the oldest
* Optimal replacement policy – replace the entry that will be accessed furthest in the future
    * Impractical because we don’t know data access patterns a priori
* Least Recently Used (LRU)
    * LRU assumes an item that was accessed recently will be accessed again soon – temporal locality
    * Downside: lots of overhead to implement — need to store an ordered list of items and move an item up in the list whenever it’s accessed
    * Where does this go wrong? — when working-set size (i.e., number of repeatedly accessed entries) is (slightly) greater than size of the cache

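* Assume a cache can hold 3 entries and the following 15 data accesses occur (in the answers, `+x` inserts x into a free entry, `Hx` is a hit on x, and `-y/+x` evicts y and inserts x):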
```
3, 4, 4, 5, 3, 2, 3, 4, 1, 4, 4, 2, 5, 2, 4
```
* Q2: _What is the sequence of hits, insertions, and replacements that occur when an **optimal** cache replacement algorithm is used?_

```
+3, +4, H4, +5, H3, -5/+2, H3, H4, -3/+1, H4, H4, H2, -1/+5, H2, H4
Hit ratio = 9/15 = 60%
```

<p style="height:5em;"></p>

* Q3: _What is the sequence of hits, insertions, and replacements that occur when a **least recently used (LRU)** cache replacement algorithm is used?_

```
+3, +4, H4, +5, H3, -4/+2, H3, -5/+4, -2/+1, H4, H4, -3/+2, -1/+5, H2, H4
Hit ratio = 7/15 = 47%
```

<p style="height:5em;"></p>

🛑 **STOP here** after completing the above question; if you have extra time please **skip ahead** to the extra practice.

## Loop interchange

* Example

```c
#include <stdlib.h>
#include <stdio.h>
#define LEN 12
int main(void) {
    int *array = malloc(sizeof(int) * LEN);

    for (int i = 0; i < LEN; i++) {
        array[i] = i;
    }
    
    // Original order: j picks an offset within each group of 4, k jumps ahead 4 ints at a time
    int sum = 0;
    for (int j = 0; j < 4; j++) {
        for (int k = 0; k < LEN; k += 4) {
            sum += array[j+k];
        }
    }
    printf("%d\n", sum);
    free(array);
    return 0;
}
```

Output: `66`


* _Assume the values of all local variables are stored in registers (**not** the stack), each `int` is 4 bytes, and the value of `array` is `0x400`. What is the sequence of memory addresses that are accessed?_
    * First for loop: `0x400`, `0x404`, `0x408`, `0x40c`, `0x410`, `0x414`, `0x418`, `0x41c`, `0x420`, `0x424`, `0x428`, `0x42c`
    * Second for loop: `0x400`, `0x410`, `0x420`, `0x404`, `0x414`, `0x424`, `0x408`, `0x418`, `0x428`, `0x40c`, `0x41c`, `0x42c`
    * Notice that the first for loop accesses memory addresses in order, whereas the inner loop of the second for loop jumps 16 bytes (four `int`s) between consecutive accesses
* _Now assume the system uses a cache that holds 2 entries, each storing one 16-byte block (four `int`s) that starts at a multiple of 16. What is the sequence of hits and misses using a least recently used (LRU) replacement policy?_
    * First for loop: Miss (+0x400), Hit, Hit, Hit, Miss (+0x410), Hit, Hit, Hit, Miss (-0x400/+0x420), Hit, Hit, Hit
    * Second for loop: Miss (-0x410/+0x400), Miss (-0x420/+0x410), Miss (-0x400/+0x420), Miss (-0x410/+0x400), Miss (-0x420/+0x410), Miss (-0x400/+0x420), Miss (-0x410/+0x400), Miss (-0x420/+0x410), Miss (-0x400/+0x420), Miss (-0x410/+0x400), Miss (-0x420/+0x410), Miss (-0x400/+0x420)
    * Notice that the first for loop has three hits after each miss, whereas the second for loop is all misses: by the time it returns to a block, that block has already been evicted
* _How could we modify the code to achieve a higher hit ratio?_ – loop interchange, i.e., swap inner and outer loops

```c
#include <stdlib.h>
#include <stdio.h>
#define LEN 12
int main(void) {
    int *array = malloc(sizeof(int) * LEN);

    for (int i = 0; i < LEN; i++) {
        array[i] = i;
    }
    
    // Interchanged loops: k picks a group of 4 adjacent ints, j walks through that group
    int sum = 0;
    for (int k = 0; k < LEN; k += 4) {
        for (int j = 0; j < 4; j++) {
            sum += array[j+k];
        }
    }
    printf("%d\n", sum);
    free(array);
    return 0;
}
```

Output: `66`


* Q4: _Would loop interchange improve the efficiency of this code?_

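```c
#include <stdlib.h>

void hundreds(void) {
    int *nums = malloc(sizeof(int) * 1000);
    // Outer loop picks a run of 100 ints; inner loop fills that run in order
    for (int i = 0; i < 1000; i += 100) {
        for (int j = 0; j < 100; j++) {
            nums[i+j] = i;
        }
    }
    free(nums);
}
```

No – the inner loop already writes `nums[i]`, `nums[i+1]`, ..., `nums[i+99]` in order, so consecutive writes go to adjacent addresses. Swapping the loops would make consecutive writes 100 `int`s apart.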

<p style="height:2em;"></p>

* Q5: _Would loop interchange improve the efficiency of this code?_

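```c
// int grid[][] is not valid C: every dimension after the first must be known,
// so the sizes come first and grid is a variable-length array parameter (C99)
void multiplication(int rows, int cols, int grid[rows][cols]) {
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            grid[r][c] = c * r;
        }
    }
}
```

Yes – C stores 2-D arrays in row-major order, so `grid[r][c]` and `grid[r+1][c]` are `cols` `int`s apart. With `r` in the inner loop, consecutive writes jump a whole row; swapping the loops (outer `r`, inner `c`) makes consecutive writes go to adjacent addresses.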

<div style="page-break-after:always;"></div>

## Extra practice

Q6: _For each of the following instances of caching, indicate whether the caching is motivated by temporal or spatial locality._

* A CPU caches the first 32 instructions of a function when the function is called – spatial
* A CPU caches all of the instructions for a frequently called function – temporal
* A web browser caches the Moodle pages for your courses, which you view multiple times per week – temporal
* A content distribution network (CDN) caches a video that has gone viral – temporal
* A content distribution network (CDN) caches "recommended videos" related to a video – spatial

Q7: _Assume a cache can hold 3 entries and the following 15 data accesses occur:_ 
```
3, 4, 4, 5, 3, 2, 3, 4, 1, 4, 4, 2, 5, 2, 4
```
_What is the sequence of hits, insertions, and replacements that occur when a **first in first out (FIFO)** cache replacement algorithm is used?_

```
+3, +4, H4, +5, H3, -3/+2, -4/+3, -5/+4, -2/+1, H4, H4, -3/+2, -4/+5, H2, -1/+4
Hit ratio = 5/15 = 33%
```