# Efficiency: caching
_COSC 208, Introduction to Computer Systems, 2023-04-07_

## Announcements
* Project 4 due Thursday, April 13

## Outline
* Warm-up
* Instances of caching
* Cache replacement

## Warm-up

Q1: _Cross out unnecessary loads and stores in the following assembly code._

```
000000000040056c <volume>:                             
    40056c:    d10083ff     sub    sp, sp, #0x20       
    400570:    f9000bfe     str    x30, [sp, #16]      
    400574:    b9000fe0     str    w0, [sp, #12]       XXX
    400578:    b9000be1     str    w1, [sp, #8]        XXX
    40057c:    b90007e2     str    w2, [sp, #4]        
    400580:    b9400fe0     ldr    w0, [sp, #12]       XXX
    400584:    b9400be1     ldr    w1, [sp, #8]        XXX
    400588:    97ffffef     bl     400544 <multiply>   
    40058c:    b90003e0     str    w0, [sp]            XXX
    400590:    b94003e0     ldr    w0, [sp]            XXX
    400594:    b94007e1     ldr    w1, [sp, #4]        
    400598:    97ffffeb     bl     400544 <multiply>    
    40059c:    b90003e0     str    w0, [sp]            XXX
    4005a0:    b94003e0     ldr    w0, [sp]            XXX
    4005a4:    f9400bfe     ldr    x30, [sp, #16]      
    4005a8:    910083ff     add    sp, sp, #0x20       
    4005ac:    d65f03c0     ret                        
```

```
000000000040056c <volume>:                             
    40056c:    d10083ff     sub    sp, sp, #0x20       
    400570:    f9000bfe     str    x30, [sp, #16]      
    400574:    b9000fe0     str    w0, [sp, #12]       
    400578:    b9000be1     str    w1, [sp, #8]        
    40057c:    b90007e2     str    w2, [sp, #4]        
    400580:    b9400fe0     ldr    w0, [sp, #12]       
    400584:    b9400be1     ldr    w1, [sp, #8]        
    400588:    97ffffef     bl     400544 <multiply>    
    40058c:    b90003e0     str    w0, [sp]            
    400590:    b94003e0     ldr    w0, [sp]            
    400594:    b94007e1     ldr    w1, [sp, #4]        
    400598:    97ffffeb     bl     400544 <multiply>    
    40059c:    b90003e0     str    w0, [sp]            
    4005a0:    b94003e0     ldr    w0, [sp]            
    4005a4:    f9400bfe     ldr    x30, [sp, #16]      
    4005a8:    910083ff     add    sp, sp, #0x20       
    4005ac:    d65f03c0     ret                        
```

Q2: _Where are caches used in computer systems?_

* Operating systems
* Web browsers
* Web servers
* Domain Name System (DNS)
* Databases
* Central Processing Units (CPUs)
* Graphics Processing Units (GPUs)
* Hard Disk Drives (HDDs)
* Solid State Drives (SSDs)

<p style="height:10em;"></p>

🛑 **STOP here** after completing the above question; if you have extra time please **skip ahead** to the extra practice.

## Instances of caching

* CPU caches
    * _Why do we have caches on the CPU?_ — accessing main memory is ~100x slower than accessing a register
    * Store instructions and data (stack, heap, etc.) from main memory
    * Typically three levels: L1, L2, and L3
    * Range in size from tens of KB (L1) to several MB or more (L3)
    * Cache line (i.e., cache entry) is larger than a word: e.g., 64 or 128 bytes
        * _Why?_ — spatial locality
    * What happens when we write to memory?
        * Write-through cache: write to both the cache and main memory
        * Write-back cache: initially write only to the cache; write to main memory when the modified entry is evicted from the cache
        * What are the advantages of each approach?
            * Write-through cache keeps main memory up to date at all times, which makes it simpler to keep data consistent across CPU cores
            * Write-back cache only incurs the overhead of accessing main memory when necessary: repeated writes to the same cache line cost a single memory write when the line is evicted
* Web browser caches
    * _Why do web browsers have caches?_
        * Fetching data from a remote server over the network is often >50x slower than reading it from a local solid state drive (SSD)
        * Spatial locality: many resources used by one web page are also used by other pages on the same site (e.g., images, Cascading Style Sheets (CSS), JavaScript (JS))
        * Temporal locality: users often visit the same web page repeatedly (e.g., Google)
        * Internet Service Provider (ISP) may limit the amount of data downloaded/uploaded per month
    * Store static content (e.g., images, CSS, JS)
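    * Web browser caches are read-only: the browser does not modify cached content and write it back to the server, so the write-through vs. write-back question does not arise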
* Content distribution networks (CDNs)
    * Collection of geographically distributed servers that deliver content (e.g., streaming videos) to users
    * Users' computers contact a server that is "nearby"
        * Ideally measured in terms of latency, which is a function of geographic distance, network routes, and network load
        * Analogy: time it takes to drive somewhere is a function of geographic distance, the route you take, and the amount of traffic on the road
    * CDN servers fetch and cache content from origin servers
    * Popular content (e.g., image from the front page of the NY Times) is more likely to already be cached
* Other uses of caching in computer systems
    * Domain Name System (DNS): web browsers, operating systems, and/or DNS servers cache mappings from domain names (e.g., `portal.colgate.edu`) to Internet Protocol (IP) addresses (e.g., `149.43.134.29`)

## Cache replacement

* If a cache is full, then a cache entry must be removed so different data can be placed in the cache
* Cache replacement policy governs which data is removed
* _What should a good cache replacement policy do?_ — maximize the number of cache hits (or minimize the number of cache misses)
    * Evaluation metric: Hit ratio = number of hits / total number of memory accesses
* _How do we determine which cache entry to replace?_
* Optimal replacement policy: replace the entry whose next access is furthest in the future (an entry that will never be accessed again is the best choice)
    * Impractical because we don’t know data access patterns a priori
* Least Recently Used (LRU)
    * LRU assumes an item that was accessed recently will be accessed again soon (temporal locality)
    * Downside: lots of overhead to implement — need to store an ordered list of items and move an item up in the list whenever it’s accessed
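    * Where does this go wrong? When the working-set size (i.e., number of repeatedly accessed entries) is (slightly) greater than the size of the cache: if the entries are accessed in a repeating cycle, LRU evicts each entry just before it is needed again, so every access misses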
* First-in First-out (FIFO)
    * Simple to implement
    * Doesn’t consider how recently or how often an entry has been accessed
* Random
    * Even simpler to implement
    * Doesn’t consider how recently or how often an entry has been accessed

* Assume a cache can hold 3 entries and the following 15 data accesses occur: 
```
3, 4, 4, 5, 3, 2, 3, 4, 1, 4, 4, 2, 5, 2, 4
```
* Q3: _What is the sequence of hits, insertions, and replacements that occur when an **optimal** cache replacement algorithm is used?_

```
+3, +4, H4, +5, H3, -5/+2, H3, H4, -3/+1, H4, H4, H2, -1/+5, H2, H4
Hit ratio = 9/15 = 60%
```

<p style="height:8em;"></p>

* Q4: _What is the sequence of hits, insertions, and replacements that occur when a **first in first out (FIFO)** cache replacement algorithm is used?_

```
+3, +4, H4, +5, H3, -3/+2, -4/+3, -5/+4, -2/+1, H4, H4, -3/+2, -4/+5, H2, -1/+4
Hit ratio = 5/15 = 33%
```

<p style="height:8em;"></p>

* Q5: _What is the sequence of hits, insertions, and replacements that occur when a **least recently used (LRU)** cache replacement algorithm is used?_

```
+3, +4, H4, +5, H3, -4/+2, H3, -5/+4, -2/+1, H4, H4, -3/+2, -1/+5, H2, H4
Hit ratio = 7/15 = 47%
```

<p style="height:8em;"></p>

## Extra practice

* Q6: _Cross out unnecessary loads and stores in the following assembly code._

```
0000000000400544 <adjust>:                          
    400544:  d10043ff   sub  sp, sp, #0x10          
    400548:  b9000fe0   str  w0, [sp, #12]          
    40054c:  b9400fe8   ldr  w8, [sp, #12]          
    400550:  7100291f   cmp  w8, #0xa               
    400554:  540000ca   b.ge 40056c <adjust+0x28>   
    400558:  b9400fe8   ldr  w8, [sp, #12]          XXX
    40055c:  52800149   mov  w9, #0xa               
    400560:  1b097d08   mul  w8, w8, w9             
    400564:  b9000fe8   str  w8, [sp, #12]          
    400568:  14000005   b    40057c <adjust+0x38>   
    40056c:  b9400fe8   ldr  w8, [sp, #12]          XXX
    400570:  52800149   mov  w9, #0xa               
    400574:  1ac90d08   sdiv w8, w8, w9             
    400578:  b9000fe8   str  w8, [sp, #12]          
    40057c:  b9400fe0   ldr  w0, [sp, #12]          
    400580:  910043ff   add  sp, sp, #0x10          
    400584:  d65f03c0   ret                         
```

```
0000000000400544 <adjust>:                          
    400544:  d10043ff   sub  sp, sp, #0x10          
    400548:  b9000fe0   str  w0, [sp, #12]          
    40054c:  b9400fe8   ldr  w8, [sp, #12]          
    400550:  7100291f   cmp  w8, #0xa               
    400554:  540000ca   b.ge 40056c <adjust+0x28>   
    400558:  b9400fe8   ldr  w8, [sp, #12]          
    40055c:  52800149   mov  w9, #0xa               
    400560:  1b097d08   mul  w8, w8, w9             
    400564:  b9000fe8   str  w8, [sp, #12]          
    400568:  14000005   b    40057c <adjust+0x38>   
    40056c:  b9400fe8   ldr  w8, [sp, #12]          
    400570:  52800149   mov  w9, #0xa               
    400574:  1ac90d08   sdiv w8, w8, w9             
    400578:  b9000fe8   str  w8, [sp, #12]          
    40057c:  b9400fe0   ldr  w0, [sp, #12]          
    400580:  910043ff   add  sp, sp, #0x10          
    400584:  d65f03c0   ret                         
```

```
000000000000076c <divide_safe>:
    76c:    d10083ff     sub    sp, sp, #0x20
    770:    b9000fe0     str    w0, [sp, #12]
    774:    b9000be1     str    w1, [sp, #8]
    778:    12800000     mov    w0, #0xffffffff
    77c:    b9001fe0     str    w0, [sp, #28]
    780:    b9400be0     ldr    w0, [sp, #8]
    784:    7100001f     cmp    w0, #0x0
    788:    540000a0     b.eq   79c <divide_safe+0x30>
    78c:    b9400fe1     ldr    w1, [sp, #12]
    790:    b9400be0     ldr    w0, [sp, #8]    XXX
    794:    1ac00c20     sdiv   w0, w1, w0
    798:    b9001fe0     str    w0, [sp, #28]
    79c:    b9401fe0     ldr    w0, [sp, #28]
    7a0:    910083ff     add    sp, sp, #0x20
    7a4:    d65f03c0     ret
```

```
000000000000076c <divide_safe>:
    76c:    d10083ff     sub    sp, sp, #0x20
    770:    b9000fe0     str    w0, [sp, #12]
    774:    b9000be1     str    w1, [sp, #8]
    778:    12800000     mov    w0, #0xffffffff
    77c:    b9001fe0     str    w0, [sp, #28]
    780:    b9400be0     ldr    w0, [sp, #8]
    784:    7100001f     cmp    w0, #0x0
    788:    540000a0     b.eq   79c <divide_safe+0x30>
    78c:    b9400fe1     ldr    w1, [sp, #12]
    790:    b9400be0     ldr    w0, [sp, #8]
    794:    1ac00c20     sdiv   w0, w1, w0
    798:    b9001fe0     str    w0, [sp, #28]
    79c:    b9401fe0     ldr    w0, [sp, #28]
    7a0:    910083ff     add    sp, sp, #0x20
    7a4:    d65f03c0     ret
```