In [3]:
import math
import logging
FORMAT = '[%(name)s:%(levelname)s]  %(message)s'
logging.basicConfig(level=logging.DEBUG, format=FORMAT)
logger = logging.getLogger('dbg')

def dprint(s):
    logger.debug(s)

def iprint(s):
    logger.info(s)

logger.setLevel(logging.INFO)

## Comparison Sorting

Comparison sorts order an array using only pairwise comparisons between elements, with no other information about the keys. All the sorting algorithms we have seen so far are examples:
* Insertion Sort
* Heapsort
* Quicksort

For all comparison sort algorithms the following lower bounds hold:

| Lower Bound for Worst Case | Lower Bound for Average Case |
| -- | -- |
| $\Omega( n \log n)$ | $\Omega( n \log n)$ |

Hence $O(n \log n)$ sorts such as **Heapsort** are *asymptotically optimal* in this class.

### Generalization of Comparison Sorts

A *comparison sort* takes an array $[a_0, ..., a_{n-1}]$ and only gains information by comparing elements, i.e. by asking "Is $a_i < a_j$?".

It outputs a permutation that puts the items in order. For simplicity, assume all elements are distinct (no repeats). There are then $n!$ possible permutations of the input, and a correct comparison sort must be able to produce **all** of these orderings.

A comparison sort corresponds to a full binary decision tree with $\geq n!$ leaves, where each leaf represents a permutation of the $n$ elements. The *worst case* number of comparisons is the length of the longest simple path from root to leaf (the height of the tree).

A binary decision tree with height $h$ has at most $2^h$ leaves, so to fit all the permutations we need $n! \leq 2^h$. Stirling's approximation for $n!$ gives:

$n! = \left( \frac{n}{e} \right)^n \sqrt{2 \pi n } \left( 1 + \Theta(1/n) \right) \geq  \left( \frac{n}{e} \right)^n$

$h \geq \log (n!) \geq \log \left( \frac{n}{e} \right)^n = n \log n - n \log e = \Omega(n \log n)$

Hence every comparison sort needs $\Omega(n \log n)$ comparisons on a *worst case* input.

What is the *best* possible average case? A completely balanced tree. Completely balanced means no two leaves differ in depth by more than 1, so each leaf has a depth of $\lfloor \log(n!) \rfloor$ or $\lceil \log(n!) \rceil$.

Hence every comparison sort needs $\Omega(n \log n)$ comparisons on an *average case* input.
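
As a quick numerical check of the bound (a sketch only; the helper name `min_worst_case_comparisons` is made up for this example), the snippet below compares $\lceil \log_2(n!) \rceil$ with the Stirling-based estimate $n \log_2 n - n \log_2 e$. Logs are taken base 2 on the assumption that each comparison has two outcomes.

```python
import math

def min_worst_case_comparisons(n):
    """Decision-tree lower bound on worst-case comparisons: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (4, 8, 16, 64):
    bound = min_worst_case_comparisons(n)
    # Weaker closed-form bound obtained from n! >= (n/e)^n
    stirling = n * math.log2(n) - n * math.log2(math.e)
    print(f"n={n:3d}  ceil(log2(n!))={bound:4d}  n*log2(n) - n*log2(e)={stirling:8.1f}")
```

The exact bound should always sit at or above the closed-form estimate, and both grow as $n \log n$.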


## Counting Sort [Stable]

Assumes keys are integers within a fixed range, and uses the *key values* themselves rather than comparisons.

For $n$ items with keys that are known to be in the range $0$ to $k-1$
- Running Time for any input $\Theta(n + k)$
- Storage $\Theta(n + k)$ + input $\Theta(n)$

Very fast when $k = O(n)$ -> run time $\Theta(n)$

* Create a length-$k$ counting array `counts`, and a length-$n$ output array `output`
* Loop through the array and tally each key by incrementing its index in `counts`
* Loop through `counts` and convert it to a running total, so that `counts[key]` is the number of elements $\leq$ `key`
* Loop through the array backwards, set `output[counts[key]-1]` to `key`, then decrement `counts[key]`
* This works because `counts[key]-1` is the last free slot for `key`; going backwards keeps equal keys in their input order, which is what makes the sort stable


In [28]:
def counting_sort(A, k = None):
    if k is None:
        k = max(A) + 1

    n = len(A)
    counts = [0 for x in range(k)]
    output = [0 for x in range(n)]

    for key in A:
        counts[key] += 1
    print(f"Counts: {counts}")
    for i in range(1, k):
        counts[i] += counts[i - 1]
    print(f"Running: {counts}")
    for key in reversed(A):
        output[counts[key]-1] = key
        counts[key] = counts[key] - 1
    return output

A = [12,6,4,1,5,2,7,8]
print(counting_sort(A))


Counts: [0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1]
Running: [0, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 8]
[1, 2, 4, 5, 6, 7, 8, 12]
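
Stability is not visible when sorting bare integers, so here is a small sketch that sorts `(key, label)` records instead. The function `counting_sort_by_key` and its `key` parameter are illustrative names, not part of the notebook above; the logic is the same three passes.

```python
def counting_sort_by_key(records, k, key=lambda r: r[0]):
    """Stable counting sort of records whose integer keys lie in [0, k)."""
    counts = [0] * k
    for r in records:                 # tally each key
        counts[key(r)] += 1
    for i in range(1, k):             # running totals
        counts[i] += counts[i - 1]
    output = [None] * len(records)
    for r in reversed(records):       # backwards pass keeps ties in input order
        counts[key(r)] -= 1
        output[counts[key(r)]] = r
    return output

records = [(2, "a"), (1, "b"), (2, "c"), (0, "d"), (1, "e")]
print(counting_sort_by_key(records, k=3))
```

Records with equal keys (`"b"`/`"e"` and `"a"`/`"c"`) come out in the same relative order they went in.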


## Radix Sort

An algorithm that orders by keys rather than comparisons. It applies to data that can be sorted lexicographically (integers, words etc). By default it uses counting sort as the stable inner sort.

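There are 2 variants: `LSD` (least significant digit first) and `MSD` (most significant digit first). For example, for $n$ items with $d$-digit keys, where each digit takes one of $k$ values: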

| LSD Running Time (any input) | LSD Storage |
| -- | -- |
| $\Theta(d(n + k))$ | $\Theta(n+k) + \Theta(dn)$|

* Assumes all keys have the same length
* Used in card-sorting machines
* Requires a stable sort

For MSD, the worst case runtime matches LSD. On average:

| MSD Average Running Time | MSD Storage |
| -- | -- |
| $\Theta(n \log_k n)$ | $\Theta(n+pk) + \Theta(dn)$|

* Allows for keys with different lengths
* Sorts by recursion and can exit early on any subarray where fewer than 2 keys share a digit
* Runtime depends on input distribution
* Can create inefficient subarray sorts

In [29]:
def gd(num, n): # 0 -> LSD
    return num // 10**n % 10

def stable_sort(A, k, dp):
    # use counting sort as a stable sort
    n = len(A)
    counts = [0 for x in range(k)]
    output = [0 for x in range(n)]

    for key in A:
        counts[gd(key, dp)] += 1

    for i in range(1, k):
        counts[i] += counts[i - 1]

    for key in reversed(A):
        output[counts[gd(key, dp)]-1] = key
        counts[gd(key, dp)] = counts[gd(key, dp)] - 1
    return output

def LSD_radix_sort(A, d, k):
    for digit_pos in range(d):
        A = stable_sort(A, k, digit_pos)
    return A

A = [314, 712, 632, 201, 111]
A = LSD_radix_sort(A, d=3, k=10)
print(A)


[111, 201, 314, 632, 712]
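
For comparison, the MSD variant could be sketched as below. This is an assumption-laden illustration rather than the notebook's method: it scatters into plain Python lists instead of using counting sort, treats shorter keys as zero-padded, and the names `digit` and `msd_radix_sort` are hypothetical.

```python
def digit(num, pos, k=10):
    """Base-k digit of num at position pos (0 = least significant)."""
    return num // k**pos % k

def msd_radix_sort(A, d, k=10):
    """Sort non-negative integers with at most d base-k digits, most significant digit first."""
    if len(A) < 2 or d == 0:
        return list(A)                # early exit: nothing left to distinguish
    buckets = [[] for _ in range(k)]
    for key in A:                     # scatter on the current most significant digit
        buckets[digit(key, d - 1, k)].append(key)
    result = []
    for bucket in buckets:            # recurse on the next digit within each bucket
        result.extend(msd_radix_sort(bucket, d - 1, k))
    return result

print(msd_radix_sort([314, 712, 632, 201, 111, 7, 42], d=3))
```

Each recursive call only looks at keys that already agree on the higher digits, which is where the early exit comes from.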


### Digit Size Selection

Runtime is $\Theta\left(\frac{b}{r}(n + 2^r)\right)$ for $b$-bit keys split into $r$-bit digits - Minimise over $r$:

**TBD**
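
Until the analysis is filled in, the trade-off can be explored numerically. The sketch below assumes the cost model $(b/r)(n + 2^r)$ for $b$-bit keys and $r$-bit digits, rounds the number of passes up to a whole number, and simply brute-forces $r$. The function names are made up for this example, and constant factors are ignored.

```python
def radix_cost(n, b, r):
    """Modelled cost: ceil(b/r) passes, each costing n + 2**r."""
    passes = -(-b // r)               # ceiling division
    return passes * (n + 2**r)

def best_digit_size(n, b):
    """Brute-force the r in 1..b that minimises the modelled cost."""
    return min(range(1, b + 1), key=lambda r: radix_cost(n, b, r))

n, b = 1_000_000, 32
r = best_digit_size(n, b)
print(f"n={n}, b={b}: best r={r}, cost={radix_cost(n, b, r)}")
```

Small $r$ means many cheap passes, large $r$ means few passes with a huge counting array; the minimum sits in between.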

### Vs Quicksort

Quicksort is $\Theta( n \log n)$ on average, whereas LSD Radix is $\Theta(dn)$ when $k = O(n)$.

Quicksort is more cache-friendly and in-place, whereas LSD Radix is not. Which is best depends on the machine, the available storage and the distribution of the data.

## Bucket Sort

When the data distribution is *known*, bucket sort achieves **average case** $\Theta(n)$.

Three stages:
1. **Scatter** - distribute keys to buckets
2. **Sort** - sort the keys within buckets
3. **Gather** - gather the sorted keys in order

Assuming uniformly distributed keys, $n$ keys and $b$ buckets, where $b \approx n$ :

Average Case $\rightarrow \Theta(n + \frac{n^2}{b} + b) \rightarrow \Theta(n)$

Worst Case $\rightarrow \Theta(n^2 + b)$

Storage $\rightarrow \Theta(n)$



In [34]:
def insertion_sort(S: list):
    for j in range(1, len(S)):
        ## start with the item at index 1 and go through each index
        key = S[j]
        i = j-1
        while i >= 0 and S[i] > key:
            ## loop backwards through the sorted region until the start or until an item is no greater than key
            ## shift items right to make space for the insert
            S[i+1] = S[i]
            i = i - 1
        S[i+1] = key

def bucket_sort(A):
    num_buckets = len(A)
    buckets = [[] for _ in range(num_buckets)]

    for key in A: # Scatter (assumes every key lies in [0, 1))
        buckets[int(num_buckets * key)].append(key)
    for bucket in buckets:
        insertion_sort(bucket)
    return [x for bucket in buckets for x in bucket] # gather

A = [0.9, 0.3, 0.1, 0.5, 0.45]
A = bucket_sort(A)
print(A)

[0.1, 0.3, 0.45, 0.5, 0.9]


### RTA Complexity

**Scatter** and **Gather** operations involve basic for loops and are both $\Theta(n)$.

The remaining cost arises from `insertion_sort` being called on $n$ buckets.

Let $m_i$ denote the number of keys in bucket $i$ where $\sum_i m_i = n$

$m_i$ has a binomial distribution $\mathrm{Bin}(n, p)$, where $p = 1/n$ is the probability that a key lands in bucket $i$

Cost of $n$ insertion sorts is $\sum_i O(m_i^2)$

Taking the expectation over the key distribution $\mathbb{E} \left[ \sum_i O(m_i^2) \right] = \sum_i \mathbb{E}[O(m_i^2)] $

$\mathbb{E}[m_i^2] = Var[m_i] + \mathbb{E}^2[m_i] = np(1-p) + (np)^2 = (1 - 1/n) + 1 = 2 - 1/n$

Expected cost of $n$ insertion sorts: $\sum_i O(2 - 1/n) = O(n)$

Hence the average case of bucket sort is $\Theta(n)$, while the worst case (all keys in 1 bucket) is $\Theta(n^2)$
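
The value $\mathbb{E}[m_i^2] = 2 - 1/n$ can be sanity-checked by simulation. This is a rough sketch under the same uniform-key assumption; `mean_squared_bucket_size`, the trial count and the seed are arbitrary choices for illustration.

```python
import random

def mean_squared_bucket_size(n, trials=20000, seed=0):
    """Estimate E[m_0^2] when n uniform keys are scattered into n buckets."""
    rng = random.Random(seed)
    total = 0
    for _trial in range(trials):
        # m0 = number of keys landing in bucket 0 (each with probability 1/n)
        m0 = sum(1 for _key in range(n) if rng.randrange(n) == 0)
        total += m0 * m0
    return total / trials

n = 20
print(f"simulated E[m^2] = {mean_squared_bucket_size(n):.3f}, predicted = {2 - 1/n:.3f}")
```

The simulated figure should land close to the prediction, with the gap shrinking as the number of trials grows.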