# Motivation

- databases are fast but have limits
    - e.g., MySQL 5.7 can process ~1M small queries per second on commodity hardware
- scaling a database across multiple servers is tricky
- many workloads are read-dominated, and can be accelerated effectively by caching the results of queries
- in-memory caching became a popular solution

# Memcached

- "an in-memory key-value store for small chunks of arbitrary data from results of database calls, API calls, or page rendering"
- cache hit: retrieve data from the cache if found
- cache miss: retrieve data from the database and store it in the cache

## Features

- cache management
    - data items can be assigned an expiration time by the application
    - data items can be updated or deleted explicitly by the application
    - eviction is implemented using the least recently used (LRU) policy
- partitioning
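    - the key-value structure stored by the cache can be scaled out across multiple servers by hash-partitioning on the key; in Memcached, the client library chooses the server for each key, and the servers do not communicate with each other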
    - a distributed cache can be used with a centralized database
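Both features can be illustrated with a toy model. The sketch below assumes each "server" is an in-process `LRUStore` (a hypothetical class) that evicts the least recently used item when full and drops items whose expiration time has passed, and a hypothetical `ShardedClient` that hash-partitions keys across those stores. Memcached's real LRU is more elaborate (it is maintained per slab class), and plain modulo hashing remaps most keys when a server is added or removed, which is why clients often use consistent hashing instead.

```python
import hashlib
import time
from collections import OrderedDict
from typing import List, Optional, Tuple


class LRUStore:
    """One cache server: bounded capacity, LRU eviction, optional per-item TTL."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # key -> (value, absolute expiry time or None); order tracks recency
        self.items: "OrderedDict[str, Tuple[str, Optional[float]]]" = OrderedDict()

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires = time.monotonic() + ttl if ttl is not None else None
        self.items[key] = (value, expires)
        self.items.move_to_end(key)            # now the most recently used
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)     # evict the least recently used

    def get(self, key: str) -> Optional[str]:
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and time.monotonic() >= expires:
            del self.items[key]                # lazily drop an expired item
            return None
        self.items.move_to_end(key)            # a hit refreshes recency
        return value

    def delete(self, key: str) -> None:
        self.items.pop(key, None)


class ShardedClient:
    """Client-side hash partitioning: each key lives on exactly one server."""

    def __init__(self, servers: List[LRUStore]) -> None:
        self.servers = servers

    def _server_for(self, key: str) -> LRUStore:
        # A stable hash (not Python's per-process randomized hash()) so that
        # every client maps the same key to the same server.
        digest = hashlib.md5(key.encode()).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        self._server_for(key).set(key, value, ttl)

    def get(self, key: str) -> Optional[str]:
        return self._server_for(key).get(key)

    def delete(self, key: str) -> None:
        self._server_for(key).delete(key)
```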
    
## Look-aside vs Look-through

- the L1/L2 cache in a CPU is called a **look-through** cache because on a cache miss, the cache itself fetches the data from main memory, without any additional work from the application program
- Memcached is called a **look-aside** cache because the data fetch has to be issued explicitly by the application

<img src="img/Snip20190924_3.png" width=80%/>
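The difference is where the miss is handled. In this minimal sketch, `look_aside_get` and `LookThroughCache` are hypothetical names and a dict stands in for the cache: with look-aside, the application code fetches and stores the value itself; with look-through, the cache is given a loader and performs the fetch on its own.

```python
from typing import Callable, Dict, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def look_aside_get(cache: Dict[K, V], key: K, load: Callable[[K], V]) -> V:
    """Look-aside (Memcached style): the application handles the miss."""
    if key in cache:
        return cache[key]
    value = load(key)      # the application issues the fetch...
    cache[key] = value     # ...and fills the cache itself
    return value


class LookThroughCache(Generic[K, V]):
    """Look-through (CPU-cache style): the cache handles the miss itself."""

    def __init__(self, load: Callable[[K], V]) -> None:
        self._load = load
        self._data: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if key not in self._data:
            self._data[key] = self._load(key)   # fetched by the cache, not the caller
        return self._data[key]
```

Memcached offers no loader hook, so code using it always follows the look-aside shape.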

## API

- storage commands
    - `set`: adds or overwrites a key-value pair
    - `add`: adds but does not overwrite
    - `replace`: overwrites but does not add
    - `append/prepend`: add data to the end/beginning of the value of an existing key-value pair
    - `cas`: check-and-set (or compare-and-swap), replaces a key-value pair but only if no one has updated it since the client last read it
- retrieval commands
    - `get`: retrieves a key-value pair
    - `gets`: similar to get but returns a numerical "CAS identifier" (representing a data version) for use with the cas operation
- deletion
    - `delete`: removes a key-value pair
- counter commands
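    - `incr/decr`: adds/subtracts a non-negative integer to/from the value of a key; the stored value must be the decimal representation of a 64-bit unsigned integer, and `decr` never goes below 0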
- statistics
    - `stats`: returns the values of various performance counters and settings
- flush
    - `flush_all`: causes all items in the cache to expire
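To make the semantics of these commands concrete, here is a toy, single-process model. `MiniMemcached` is a hypothetical class, not a real client library, and it simplifies replies to booleans or `None` (the real text protocol distinguishes replies such as `NOT_STORED`, `EXISTS`, and `NOT_FOUND`). Every modification assigns a fresh CAS identifier, which is what lets `cas` detect that another client wrote the item after it was read with `gets`.

```python
from typing import Dict, Optional, Tuple


class MiniMemcached:
    """Toy model of Memcached command semantics (no network, no expiry)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, int]] = {}  # key -> (value, CAS id)
        self._next_cas = 1

    def _store(self, key: str, value: bytes) -> bool:
        self._data[key] = (value, self._next_cas)
        self._next_cas += 1                    # every write gets a new CAS id
        return True

    def set(self, key: str, value: bytes) -> bool:      # add or overwrite
        return self._store(key, value)

    def add(self, key: str, value: bytes) -> bool:      # only if absent
        return key not in self._data and self._store(key, value)

    def replace(self, key: str, value: bytes) -> bool:  # only if present
        return key in self._data and self._store(key, value)

    def append(self, key: str, data: bytes) -> bool:
        return key in self._data and self._store(key, self._data[key][0] + data)

    def prepend(self, key: str, data: bytes) -> bool:
        return key in self._data and self._store(key, data + self._data[key][0])

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        return entry[0] if entry else None

    def gets(self, key: str) -> Optional[Tuple[bytes, int]]:
        return self._data.get(key)             # value plus its CAS id

    def cas(self, key: str, value: bytes, cas_id: int) -> bool:
        entry = self._data.get(key)
        if entry is None or entry[1] != cas_id:
            return False                       # changed or deleted since gets
        return self._store(key, value)

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None

    def incr(self, key: str, delta: int) -> Optional[int]:
        entry = self._data.get(key)
        if entry is None:
            return None
        new = (int(entry[0]) + delta) % 2**64  # stays within 64 unsigned bits
        self._store(key, str(new).encode())
        return new

    def decr(self, key: str, delta: int) -> Optional[int]:
        entry = self._data.get(key)
        if entry is None:
            return None
        new = max(int(entry[0]) - delta, 0)    # never goes below zero
        self._store(key, str(new).encode())
        return new
```

A typical use of `gets`/`cas` is a retry loop: read the value and its CAS id, compute the new value, and call `cas`; if it returns `False`, someone else wrote in between, so read again and retry.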

## Caveats

- keys must be named consistently throughout the application
    - one option is to use a human-readable hierarchical naming scheme (e.g., `user:42:email`)
- loading data from the database into the cache is non-transactional: the `SELECT` and the cache `set` are separate steps, so a concurrent update can make the data outdated by the time it enters the cache
    - protect the `SELECT`/`set` sequence with a lock

<img src="img/Snip20190924_4.png" width=80%/>
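A minimal sketch of both points, under stated assumptions: dicts stand in for the cache and the database, `make_key`, `read_email`, and `update_email` are hypothetical names, and a single in-process `threading.Lock` stands in for the lock (across machines a distributed lock would be needed). The sketch also assumes that writers take the same lock around their `UPDATE` and cache invalidation; otherwise a reader could still `SELECT` the old value, lose the race to a writer, and then `set` the stale value into the cache. The key check reflects Memcached's documented key rules (at most 250 bytes, no whitespace or control characters).

```python
import threading
from typing import Dict, Optional

cache: Dict[str, str] = {}
db: Dict[int, str] = {42: "alice@example.com"}
# Hypothetical single-process lock standing in for a real (distributed) one.
fill_lock = threading.Lock()


def make_key(*parts: object) -> str:
    """Build a hierarchical, human-readable key such as 'user:42:email'."""
    key = ":".join(str(p) for p in parts)
    # Memcached keys are limited to 250 bytes and may not contain
    # whitespace or control characters.
    if len(key.encode()) > 250 or any(
        c.isspace() or ord(c) < 32 or ord(c) == 127 for c in key
    ):
        raise ValueError(f"invalid memcached key: {key!r}")
    return key


def read_email(user_id: int) -> Optional[str]:
    key = make_key("user", user_id, "email")
    value = cache.get(key)
    if value is not None:
        return value
    with fill_lock:                  # SELECT and set run as one unit
        value = db.get(user_id)      # e.g. SELECT email FROM users WHERE id = ...
        if value is not None:
            cache[key] = value
    return value


def update_email(user_id: int, email: str) -> None:
    with fill_lock:                  # the writer takes the same lock
        db[user_id] = email          # e.g. UPDATE users SET email = ...
        cache.pop(make_key("user", user_id, "email"), None)  # invalidate
```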

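- slow queries can lead to **thundering herds**: when a popular key is missing (e.g., after it expires), many clients miss at the same time and all issue the same slow query to the database
    - similar to the stale data issue above, this is also solvable with a lock, so that only one client recomputes the value while the others wait for it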

<img src="img/Snip20190924_5.png" width=80%/>

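- values can be evicted independently, so related data stored under separate keys may be only partially present in the cache, which affects data integrity
    - solution: pack related values into one key-value pair so that they are cached and evicted together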
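A minimal sketch of packing, assuming a dict stands in for the cache and using JSON as the (hypothetical) serialization format; `put_profile` and `get_profile` are illustrative names. Because both fields live under one key, an eviction removes both or neither. The trade-offs are that updating one field rewrites the whole value and the packed value must still fit within Memcached's item size limit (1 MB by default).

```python
import json
from typing import Dict, Optional

cache: Dict[str, str] = {}


def put_profile(user_id: int, name: str, email: str) -> None:
    # One key for both fields: they are stored and evicted together.
    cache[f"user:{user_id}:profile"] = json.dumps({"name": name, "email": email})


def get_profile(user_id: int) -> Optional[dict]:
    raw = cache.get(f"user:{user_id}:profile")
    return json.loads(raw) if raw is not None else None
```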
