# Hash
* hash function
* hash table
## Making a Hash: "Ingredient"
* universe U: set of all possible keys, |U| is very large
* hash table T: array of size m, each index is a bucket
* hash function h: U &rarr; {0, 1, ..., m-1}, maps each key to a bucket index

## Using a hash: "Recipe" 
* search(k): look in T[h(k)] (home bucket of k)
* insert(x): store x in T[h(x.key)]
* delete(x): remove x from T[h(x.key)]

If |U| is small enough, use one bucket per possible key: |U| = len(T) = m &rarr; direct access table
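
A minimal sketch of a direct access table, assuming the keys are small non-negative integers. The class name `DirectAccessTable` and the parameter `universe_size` are illustrative names, not part of the notes.

```python
class DirectAccessTable:
    """Direct access table: one slot per possible key, so h(k) = k."""

    def __init__(self, universe_size):
        # One slot for every key in U = {0, 1, ..., universe_size - 1}.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value

    def search(self, key):
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAccessTable(8)
table.insert(3, "three")
print(table.search(3))  # three
table.delete(3)
print(table.search(3))  # None
```

Every operation is a single array access, but the table needs |U| slots, which is why this only works when the universe is small.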

## Collisions: Closed Addressing

* chaining by linked list
* each bucket T[i] stores a collection of elements
* key k is always stored in T[h(k)]

* insert(x):
    * search the linked list at index h(x.key)
    * replace/insert: $\Theta(1)$
* delete(x):
    * search the linked list at index h(x.key)
    * remove node: $\Theta(1)$
* search(k):
    * compute h(k)
    * search for k in the list at T[h(k)]: $\Theta(n)$ in the worst case
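
The three operations above can be sketched as follows. This is an illustration under assumptions, not a reference implementation: each bucket is a Python list standing in for a linked list (so removal is not truly constant time), the hash function is Python's built-in `hash` reduced mod m, and the name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; each bucket is a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _home(self, key):
        # Assumed hash function: built-in hash reduced mod m.
        return hash(key) % self.m

    def search(self, key):
        # Scan only the home bucket of the key.
        for k, value in self.buckets[self._home(key)]:
            if k == key:
                return value
        return None

    def insert(self, key, value):
        bucket = self.buckets[self._home(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: replace
                return
        bucket.append((key, value))

    def delete(self, key):
        bucket = self.buckets[self._home(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable(4)
table.insert(1, "a")
table.insert(5, "b")      # 1 and 5 share home bucket 1 when m = 4
print(table.search(5))    # b
table.delete(1)
print(table.search(1))    # None
```

In this sketch x corresponds to the stored (key, value) pair and k to the key alone, which is what search receives.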

<span style="color:red">**Question: What is the difference between x and k?**</span>

### Worst Case:
* as long as |U| > m(n-1), it is possible to pick n keys that all hash to the same bucket (pigeonhole: m buckets holding at most n-1 keys each cover only m(n-1) keys). <span style="color:red"> **What is m(n-1) ?**</span>
* almost never encountered in practice
* how do we reconcile this with the worst case? Analyze the average case

### Analyze Average Search
Simple uniform hashing: "Spread out" (assumption)
* Hash table T has m buckets
* T has n elements

So $Pr[h(k) = i] = \frac{1}{m}$ for every $k \in U$ and $i = 0, 1, ..., m-1$

* Expect to have the same number of elements in each bucket &rarr; **Load Factor**: expected # of elements in each bucket

$\alpha = \frac{n}{m}$

$L_i$ = # of elements in bucket i

Number of elements: $n = \sum_{i = 0}^{m - 1} L_i$

N(k) = # of elements examined during search for k ($k \in U$)

$E[N(k)]$

$= \sum_{k \in U} Pr[k] \cdot N(k)$

$= \sum_{i = 0}^{m - 1} \sum_{k \in U, h(k) = i} Pr[k] \cdot N(k)$

A search for k with h(k) = i examines at most the whole bucket, so $N(k) \leq L_i$:

$\leq \sum_{i = 0}^{m - 1} \sum_{k \in U, h(k) = i} Pr[k] \cdot L_i$

$= \sum_{i = 0}^{m - 1} L_i \sum_{k \in U, h(k) = i} Pr[k]$

$= \sum_{i = 0}^{m - 1} L_i \cdot Pr[h(k) = i]$

$= \sum_{i = 0}^{m - 1} L_i \cdot \frac{1}{m}$

$= \frac{1}{m} \sum_{i = 0}^{m - 1} L_i$

$= \frac{n}{m} = \alpha$
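
As a small sanity check of the last two steps, the sketch below counts the bucket sizes for a concrete table and confirms that their average is n/m. The hash function h(k) = k mod m and the keys 0..11 are arbitrary illustrative choices.

```python
m = 5                       # number of buckets
keys = list(range(12))      # n = 12 keys, chosen only for illustration
n = len(keys)

# L[i] = number of keys whose home bucket is i, with the assumed h(k) = k mod m.
L = [0] * m
for k in keys:
    L[k % m] += 1

alpha = n / m
print(L)                    # [3, 3, 2, 2, 2]
print(sum(L) == n)          # True
print(sum(L) / m == alpha)  # True
```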

<span style="color: red"> **Why is it constant, isn't n the number of element in the hash?**</span>

## Collisions: Open Addressing 
* find another slot for k when T[h(k)] is taken.
* key k can be anywhere in T.

### Linear Probing
* store every element directly in T
* probe sequence: sequence of buckets to try
    * home bucket: h'(k)
    * h(k, i): bucket tried after i collisions when hashing k
* h'(k) &rarr; h'(k) + 1 &rarr; h'(k) + 2 &rarr; ...
* h(k, i) = (h'(k) + i) mod m
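
The linear probe sequence can be listed directly from the formula. In this sketch `home` stands for h'(k) and the function name is illustrative.

```python
def linear_probe_sequence(home, m):
    """Buckets tried by linear probing: (home + i) mod m for i = 0, ..., m-1."""
    return [(home + i) % m for i in range(m)]


print(linear_probe_sequence(5, 7))  # [5, 6, 0, 1, 2, 3, 4]
```

The sequence wraps around the end of the table and visits every bucket exactly once.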

### Quadratic Probing
* h(k, i) = (h'(k) + $i^2$) mod m
* issue: keys with the same home bucket follow the same probe sequence
* resolve: (h'(k) + a$i^2$ + bi) mod m, where a, b are constants
<span style="color: red">**Watch a video on it** </span>
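
A short sketch of the quadratic probe sequence, assuming integer constants a and b (the defaults a = 1, b = 0 give the plain $i^2$ form). The function name is illustrative.

```python
def quadratic_probe_sequence(home, m, a=1, b=0):
    """Buckets tried by quadratic probing: (home + a*i^2 + b*i) mod m."""
    return [(home + a * i * i + b * i) % m for i in range(m)]


print(quadratic_probe_sequence(0, 8))  # [0, 1, 4, 1, 0, 1, 4, 1]
print(quadratic_probe_sequence(3, 8))  # [3, 4, 7, 4, 3, 4, 7, 4]
```

With m = 8 the plain $i^2$ sequence only ever reaches three distinct buckets, and any two keys with the same home bucket get exactly the same sequence.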

### Double Hashing
* $h_1(k) \rightarrow h_1(k) + h_2(k) \rightarrow h_1(k) + 2h_2(k) \rightarrow ...$
* $h(k, i) = (h_1(k) + i h_2(k))$ mod m
    * $h_2(k)$ must be relatively prime to m
    * then the probe sequence $h(k, i)$ visits every bucket of T
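
The role of the relatively-prime condition can be seen by listing probe sequences for fixed values of $h_1(k)$ and $h_2(k)$. The numbers below are arbitrary illustrative choices, and the function name is hypothetical.

```python
from math import gcd


def double_hash_sequence(h1, h2, m):
    """Buckets tried by double hashing: (h1 + i*h2) mod m for i = 0, ..., m-1."""
    return [(h1 + i * h2) % m for i in range(m)]


m = 8
seq_coprime = double_hash_sequence(1, 3, m)  # gcd(3, 8) = 1
seq_shared = double_hash_sequence(1, 2, m)   # gcd(2, 8) = 2

print(gcd(3, m), seq_coprime)  # 1 [1, 4, 7, 2, 5, 0, 3, 6]
print(gcd(2, m), seq_shared)   # 2 [1, 3, 5, 7, 1, 3, 5, 7]
```

When the step size shares a factor with m, the sequence cycles early and part of the table is never probed.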
    
### Delete
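* a deleted slot is marked with a special value Del instead of NIL, so probe sequences that passed through it are not cut short
* when Del is encountered during search (or delete), the algorithm keeps probing until the key is found or NIL is reached
* when Del is encountered during insert, it can be treated like NIL and the slot reused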
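
A minimal sketch of how Del and NIL interact under linear probing. It assumes distinct integer keys that are not already in the table; `LinearProbingTable` is a hypothetical name, and representing Del by a unique sentinel object is a choice made for this example.

```python
NIL = None        # slot that has never been used
DEL = object()    # marker left behind by a deletion (sentinel for this sketch)


class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [NIL] * m

    def _probe(self, key):
        home = hash(key) % self.m
        for i in range(self.m):
            yield (home + i) % self.m

    def search(self, key):
        for j in self._probe(key):
            if self.slots[j] is NIL:
                return None            # NIL ends the search
            if self.slots[j] is not DEL and self.slots[j] == key:
                return j
            # DEL or a different key: keep probing
        return None

    def insert(self, key):
        for j in self._probe(key):
            if self.slots[j] is NIL or self.slots[j] is DEL:
                self.slots[j] = key    # a DEL slot is reused like NIL
                return j
        raise RuntimeError("table is full")

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DEL


t = LinearProbingTable(5)
t.insert(0)           # home bucket 0
t.insert(5)           # home bucket 0 is taken, lands in bucket 1
t.delete(0)           # bucket 0 becomes DEL, not NIL
print(t.search(5))    # 1
print(t.insert(10))   # 0
```

If the deletion had written NIL into bucket 0, the search for 5 would have stopped there and wrongly reported the key as missing.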

<span style="color: red"> **I don't understand this part** </span>

## Hash Functions
* ensure simple uniform hashing
* h(k) depends on every part of k
    * bad example: h(k) = k mod 10 depends only on the last decimal digit of k
* h(k) spreads out values for T <span style="color: red"> **I don't understand this part** </span>
* h(k) efficient to compute
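
A small illustration of why a hash function should use every part of the key and spread values out. The key set and the alternative modulus 7 are arbitrary choices for this sketch.

```python
keys = [10, 20, 30, 40, 50]

# h(k) = k mod 10 looks only at the last decimal digit, so these keys all collide.
print([k % 10 for k in keys])  # [0, 0, 0, 0, 0]

# With modulus 7 (an arbitrary illustrative choice) the same keys land in distinct buckets.
print([k % 7 for k in keys])   # [3, 6, 2, 5, 1]
```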

## Hashing vs AVL Trees

* Hashing performs better on average (search, insert, delete)
* Order-dependent operations are better served by AVL trees
    * find successor
    * traversing in sorted order

## Quicksort

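```python
def partition(array, pivot):
    """Split array into items <= pivot and items > pivot."""
    smaller = []
    bigger = []
    for item in array:
        if item <= pivot:
            smaller.append(item)
        else:
            bigger.append(item)
    # Return only after every item has been placed.
    return smaller, bigger


def quickSort(array):
    # Lists with fewer than 2 items are already sorted; return a copy.
    if len(array) < 2:
        return array[:]
    else:
        # Use the first element as the pivot and partition the rest.
        pivot = array[0]
        smaller, bigger = partition(array[1:], pivot)
        smaller = quickSort(smaller)
        bigger = quickSort(bigger)
        return smaller + [pivot] + bigger


data = [1, 7, 4, 1, 10, 9, -2]
print("Unsorted Array")
print(data)

# quickSort returns a new sorted list; it does not sort in place.
sorted_data = quickSort(data)

print('Sorted Array in Ascending Order:')
print(sorted_data)
```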

### Worst-case Analysis
#### Lower bound $\Omega(n^2)$
Input sorted in decreasing order, with the first element as pivot: S = [n, n-1, n-2, ..., 1]

p = n, S = [n-1, ..., 1] &rarr; n - 1 comparisons

p = n - 1, S = [n-2, ..., 1] &rarr; n - 2 comparisons

...

p = 2, S = [1] &rarr; 1 comparison

Total: $\sum_{i = 1}^{n - 1} i = \frac{n(n-1)}{2}$, so the worst case is $\Omega(n^2)$
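
The count above can be checked by instrumenting a first-element-pivot quicksort. This sketch counts one comparison per non-pivot element in each call; `count_comparisons` is an illustrative helper, and n = 10 is an arbitrary choice.

```python
def count_comparisons(array):
    """Count pivot comparisons made by quicksort with the first element as pivot."""
    if len(array) < 2:
        return 0
    pivot, rest = array[0], array[1:]
    smaller = [x for x in rest if x <= pivot]
    bigger = [x for x in rest if x > pivot]
    # One comparison per non-pivot element, plus the two recursive calls.
    return len(rest) + count_comparisons(smaller) + count_comparisons(bigger)


n = 10
worst = list(range(n, 0, -1))      # [n, n-1, ..., 1]
print(count_comparisons(worst))    # 45
print(n * (n - 1) // 2)            # 45
```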

#### Upper Bound $O(n^2)$

* Each element of S is a pivot at most once
* A pivot p is compared with at most all the other elements of S
* So every pair of elements of S is compared at most once

\# pairs = $\frac{n(n-1)}{2}$ &rarr; $O(n^2)$

#### Average Case analysis
* $S_n$ = {all permutations of [1, 2, ..., n]}
* uniform probability distribution
* T(s) = # comparisons performed on input $s \in S_n$
* $X_{i,j}$ = 1 if i and j are compared, 0 otherwise (for i < j), so $T(s) = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} X_{i,j}$

$Pr[X_{i,j} = 1]$ = Pr[i is compared with j]

= Pr[i or j is chosen as pivot before any element between i and j, while i and j are still in the same subset]

$= \frac{1}{j - i + 1} + \frac{1}{j - i + 1} = \frac{2}{j - i + 1}$

$E[T(s)] = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} E[X_{i,j}] = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{2}{j - i + 1}$

$= \Theta(n \log n)$
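
For small n the double sum of $\frac{2}{j - i + 1}$ can be compared with a brute-force average over all permutations. This is a sketch assuming first-element pivots and one comparison per non-pivot element per call; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def count_comparisons(array):
    """Count pivot comparisons made by quicksort with the first element as pivot."""
    if len(array) < 2:
        return 0
    pivot, rest = array[0], array[1:]
    smaller = [x for x in rest if x <= pivot]
    bigger = [x for x in rest if x > pivot]
    return len(rest) + count_comparisons(smaller) + count_comparisons(bigger)


def average_comparisons(n):
    """Exact average of the comparison count over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(count_comparisons(list(p)) for p in perms)
    return Fraction(total, len(perms))


def expected_by_formula(n):
    """Sum of 2 / (j - i + 1) over all pairs i < j."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n) for j in range(i + 1, n + 1))


for n in range(2, 7):
    print(n, average_comparisons(n), expected_by_formula(n))
```

Using `Fraction` keeps both values exact, so the two columns can be compared without rounding.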

#### Best Case
$\Theta(n \log n)$

## Quiz
* The deterministic quicksort algorithm has the same asymptotic average case and best case complexity
* Let $M_R$ be the maximum expected running time of the randomized quicksort over all inputs of size n, and define $M_D$ similarly for the deterministic quicksort. Then, comparing asymptotically, $M_R \leq M_D$ <span style="color: red">Why?</span>
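
One way to approach the last quiz item is to look at what randomization changes. The sketch below picks the pivot uniformly at random, so no fixed input can force the decreasing-order behaviour every time: the expected cost is $\Theta(n \log n)$ on every input, while the deterministic version has inputs costing $\Theta(n^2)$. The function name and the `rng` parameter are illustrative.

```python
import random


def randomized_quicksort(array, rng=random):
    """Quicksort with a uniformly random pivot; returns a new sorted list."""
    if len(array) < 2:
        return array[:]
    pivot_index = rng.randrange(len(array))
    pivot = array[pivot_index]
    # Everything except the chosen pivot.
    rest = array[:pivot_index] + array[pivot_index + 1:]
    smaller = [x for x in rest if x <= pivot]
    bigger = [x for x in rest if x > pivot]
    return (randomized_quicksort(smaller, rng)
            + [pivot]
            + randomized_quicksort(bigger, rng))


print(randomized_quicksort([1, 7, 4, 1, 10, 9, -2]))  # [-2, 1, 1, 4, 7, 9, 10]
```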
