## Implementation of Dictionary in Python

### Dictionary

* An array / list allows access through positional indices
* A dictionary allows access through arbitrary keys
  - A collection of key-value pairs
  - Random access -- access time is the same for all the keys
* How is a dictionary implemented?
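
To make the contrast concrete, here is a small sketch of positional access versus key access. The names and marks are made up for illustration.

```python
# A list is indexed by position; a dictionary is indexed by an arbitrary key.
marks_list = [72, 85, 91]                            # positions 0, 1, 2
marks_dict = {"asha": 72, "ravi": 85, "meena": 91}   # hypothetical names

second = marks_list[1]      # access by positional index
ravi = marks_dict["ravi"]   # access by key

# Keys need not be strings; any hashable value works, for example a tuple.
grid = {(0, 0): "origin", (2, 3): "point"}
corner = grid[(2, 3)]
```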

### Implementing a Dictionary

* The underlying storage is an array
  - Given an offset `i`, find `A[i]` in constant time
* Keys have to be mapped to $\{0, 1, \ldots, n - 1\}$
  - Given a key `k`, convert it to offset `i`
* Hash function
  - $h: S \rightarrow X$ maps a set of values $S$ to a small range of integers $X = \{0, 1, \ldots, n - 1\}$
  - Typically, $|X| \ll |S|$, so collisions are unavoidable: $h(s) = h(s')$ for some $s \neq s'$
  - A good hash function will minimize collisions
  - SHA-256 is an industry-standard hash function whose output is 256 bits long
    - Used to hash large files -- avoid uploading duplicates to cloud storage
    - Also used in cryptography
    - The output is a 256-bit value, so there are $2^{256}$ possible hashes
    - An application:
      - In cloud storage systems such as Dropbox, uploading a large file sometimes finishes very quickly
      - Why does that happen?
      - The system computes the SHA-256 hash of the file, and if that hash is already present, it does not upload the file again. It simply records a pointer saying "one more copy of this file is uploaded"
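
The two ideas above, mapping a key to an offset and detecting duplicate files by their hash, can be sketched with Python's standard `hashlib` module. This is only an illustration: `offset` and `DedupStore` are hypothetical names, reducing a SHA-256 digest modulo `n` is just one possible choice of $h$, and real storage services are certainly more involved.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters (256 bits)."""
    return hashlib.sha256(data).hexdigest()


def offset(key: str, n: int) -> int:
    """Map a string key to an offset in {0, ..., n - 1}.

    Illustrative choice: read the SHA-256 digest as an integer and reduce
    it modulo n. Python's dict uses the built-in hash() instead.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n


class DedupStore:
    """Hypothetical store that keeps one copy per distinct digest."""

    def __init__(self):
        self.blobs = {}    # digest -> file contents
        self.copies = {}   # digest -> number of logical copies

    def upload(self, data: bytes) -> bool:
        """Return True only if the bytes actually had to be stored."""
        digest = sha256_hex(data)
        if digest in self.blobs:
            self.copies[digest] += 1   # just note one more copy
            return False
        self.blobs[digest] = data
        self.copies[digest] = 1
        return True


# Five keys squeezed into four offsets: at least two must collide.
offsets = [offset(k, 4) for k in ["a", "b", "c", "d", "e"]]

store = DedupStore()
first = store.upload(b"large file contents")    # stored
second = store.upload(b"large file contents")   # duplicate, not stored again
```

Because the range has only four offsets, the five keys cannot all land in distinct slots, which is the $|X| \ll |S|$ situation in miniature.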

### Hash Table

* An array `A` of size $n$ combined with a hash function $h$
* $h$ maps the keys to $\{0, 1, \ldots, n - 1\}$
* Ideally, when we create an entry for key `k`, `A[h(k)]` will be unused
  - What if there is already a value at that location?
* Dealing with collisions
  - Open addressing (closed hashing)
    - Probe (examine) a sequence of alternate slots in the same array
  - Open hashing (also called separate chaining; not to be confused with open addressing)
    - Each slot in the array points to a list of values
    - Insert into the list for the given slot
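* Dictionary keys in Python must be immutable (more precisely, hashable)
  - If a key's value could change, its hash would change too, and the entry could no longer be found in its original slot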
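
As a rough illustration of open addressing, the sketch below uses linear probing, which is only one of several possible probe sequences. The class name `LinearProbingTable` is hypothetical, the table has a fixed size and no deletion, and it is not how CPython's `dict` is actually laid out. The last few lines show the built-in `dict` rejecting a mutable key.

```python
class LinearProbingTable:
    """Minimal open-addressing table (fixed size, no deletion, no resizing)."""

    def __init__(self, n=8):
        self.n = n
        self.slots = [None] * n   # each slot is None or a (key, value) pair

    def _probe(self, key):
        """Yield slots h(k), h(k) + 1, ... (mod n), visiting each slot once."""
        start = hash(key) % self.n
        for step in range(self.n):
            yield (start + step) % self.n

    def put(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                break             # an empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        raise KeyError(key)


table = LinearProbingTable(n=4)
table.put(1, "one")
table.put(5, "five")   # 5 % 4 == 1 is taken, so the probe moves on to slot 2

# Python's own dict refuses a mutable (unhashable) key such as a list.
try:
    {[1, 2]: "value"}
    rejected = False
except TypeError:
    rejected = True
```

The example relies on small integers hashing to themselves in CPython, so keys `1` and `5` both start at slot 1 of a four-slot table.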

### Summary

* A dictionary is implemented as a hash table
  - An array plus a hash function
* Creating a good hash function is important (and difficult too)
* Need a strategy to deal with collisions
  - Open addressing / closed hashing -- probe for free space in the array
  - Open hashing -- each slot in the hash table points to a list of key-value pairs
  - Many heuristics / optimizations possible for dealing with collisions
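
For comparison with probing, here is a minimal sketch of open hashing, where each slot holds a list of key-value pairs. `ChainedTable` is a hypothetical name, and the sketch leaves out resizing and deletion.

```python
class ChainedTable:
    """Minimal open-hashing (separate chaining) table; no resizing."""

    def __init__(self, n=8):
        self.n = n
        self.buckets = [[] for _ in range(n)]   # one list of [key, value] per slot

    def put(self, key, value):
        bucket = self.buckets[hash(key) % self.n]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value       # key already present: overwrite
                return
        bucket.append([key, value])   # new key: extend the list for this slot

    def get(self, key):
        bucket = self.buckets[hash(key) % self.n]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)


chained = ChainedTable(n=4)
chained.put(1, "one")
chained.put(5, "five")   # collides with key 1; both live in the list at slot 1
chained.put(1, "uno")    # overwrites the existing entry for key 1
```

Here a collision never needs a second array slot: the colliding pairs simply share one list, at the cost of a linear scan within that list.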