# Quicksort

**Quicksort** is a recursive method:

- Shuffle the array
- Partition it so that, for some `j`, entry `a[j]` is in its final place, no entry to the left of `j` is larger, and no entry to the right of `j` is smaller
- Recursively sort the left part and the right part

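Phase I (repeat until `i` and `j` cross):
- Choose the first element, `a[lo]`, as the partitioning item
- Scan `i` from left to right as long as `a[i] < a[lo]`
- Scan `j` from right to left as long as `a[j] > a[lo]`
- Exchange `a[i]` with `a[j]`

When the pointers cross, exchange `a[lo]` with `a[j]` to put the partitioning item in its final place.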
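
The scan-and-exchange loop can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `partition` and its signature are assumptions, and the final exchange that moves the partitioning item into position `j` is included so the function returns a usable split point.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] in place around a[lo]; return the pivot's final index."""
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        # Scan i rightward past entries smaller than the pivot (bounds check needed).
        while i <= hi and a[i] < pivot:
            i += 1
        # Scan j leftward past entries larger than the pivot.
        # a[lo] holds the pivot, so this scan always stops by j == lo.
        while a[j] > pivot:
            j -= 1
        if i >= j:  # pointers have crossed
            break
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    # Put the pivot between the two parts.
    a[lo], a[j] = a[j], a[lo]
    return j
```

For example, partitioning `[5, 3, 8, 1, 9, 2, 7]` over its full range rearranges it to `[1, 3, 2, 5, 9, 8, 7]` and returns index `3`.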

Phase II takes each half around the partitioning item and runs phase I on it (using the first item of that half as the new partitioning item).

Partitioning is typically done in place. You can use an extra array to make it easier (and stable), but it's usually not worth the cost. Being in place is one of quicksort's advantages over mergesort.

Two tricky parts are detecting when the indices `i` and `j` cross (to terminate the loop) and staying in bounds (the `j == lo` test is redundant, because `a[lo]` itself stops the scan, but the `i == hi` test isn't). The shuffle is necessary for the performance guarantee. Also, when there are duplicate keys, it's better to stop the scans on keys equal to the partitioning item's key.
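
Putting the pieces together, a complete sort might look like the sketch below. The names `quicksort`, `_sort`, and `_partition` are illustrative assumptions; the scans use strict comparisons, so they stop on keys equal to the partitioning item, and only the `i` scan carries an explicit bounds check.

```python
import random


def quicksort(a):
    """Sort the list a in place."""
    random.shuffle(a)  # probabilistic guard against the quadratic worst case
    _sort(a, 0, len(a) - 1)


def _sort(a, lo, hi):
    if hi <= lo:
        return
    j = _partition(a, lo, hi)
    _sort(a, lo, j - 1)   # left part
    _sort(a, j + 1, hi)   # right part


def _partition(a, lo, hi):
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        while i <= hi and a[i] < pivot:  # stops on keys equal to the pivot
            i += 1
        while a[j] > pivot:              # a[lo] stops this scan, no bounds check
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    a[lo], a[j] = a[j], a[lo]
    return j
```

Calling `quicksort(data)` on a list reorders it in place; because of the shuffle, the sequence of partitions differs from run to run even though the result is the same.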

In practice, quicksort is typically even faster than mergesort.

In the best case, each partition divides the array exactly in half, and quicksort is then similar to mergesort with ~$N \lg N$ compares. The worst case, which occurs without the shuffle when the array is already sorted, is that each partition just peels off the first item, which costs ~$\frac{1}{2} N^2$ compares. But with random shuffling, that's highly unlikely to happen.

The average case (by number of compares) is more interesting.

**Proposition**: the average number of compares $C_N$ to quicksort an array of $N$ distinct keys is ~$2N \ln N$ (and the average number of exchanges is ~$\frac{1}{3} N \ln N$).

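**The proof**: $C_N$ satisfies $C_0 = C_1 = 0$ and, for $N \geq 2$, the recurrence below: $N + 1$ compares for partitioning, plus the cost of sorting the left and right subarrays for each of the $N$ possible positions of the partitioning item, each weighted by its probability $\frac{1}{N}$: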

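$$
\begin{aligned}
C_N &= (N + 1) + \left( \frac{C_0 + C_{N-1}}{N} \right) + \left( \frac{C_1 + C_{N-2}}{N} \right) + \cdots + \left( \frac{C_{N-1} + C_0}{N} \right) \\
&\text{Multiply both sides by } N \text{ and collect terms:} \\
N C_N &= N(N + 1) + 2(C_0 + C_1 + \cdots + C_{N-1}) \\
&\text{Subtract the same equation for } N - 1: \\
N C_N - (N - 1) C_{N-1} &= 2N + 2 C_{N-1} \\
&\text{Rearrange and divide by } N(N + 1): \\
\frac{C_N}{N + 1} &= \frac{C_{N-1}}{N} + \frac{2}{N + 1} \\
&\text{Repeatedly apply the above equation, then approximate the sum by an integral:} \\
C_N &\sim 2(N + 1) \sum_{k=3}^{N+1} \frac{1}{k} \sim 2(N + 1) \int_3^{N+1} \frac{dx}{x} \sim 2(N + 1) \ln N \approx 1.39 N \lg N
\end{aligned}
$$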
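
As a numerical sanity check, the recurrence can be iterated directly and compared with the $2N \ln N$ estimate. The sketch below is an illustration only; the function name `average_compares` is an assumption, and it evaluates the recurrence exactly as stated (with $C_0 = C_1 = 0$) using a running sum.

```python
import math


def average_compares(n):
    """Return [C_0, ..., C_n] from C_N = (N + 1) + (2 / N) * (C_0 + ... + C_{N-1})."""
    c = [0.0] * (n + 1)
    running_sum = 0.0  # holds C_0 + ... + C_{k-1}
    for k in range(2, n + 1):
        running_sum += c[k - 1]
        c[k] = (k + 1) + 2.0 * running_sum / k
    return c


if __name__ == "__main__":
    c = average_compares(1_000_000)
    for n in (1_000, 1_000_000):
        # Ratio of the exact average to the leading-order estimate.
        print(n, c[n] / (2 * n * math.log(n)))
```

The ratio approaches 1 only slowly as $N$ grows, because the exact solution also contains a negative term that is linear in $N$.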

**Worst case**: the number of compares is quadratic (~$\frac{1}{2} N^2$), but that's extremely unlikely with a random shuffle (the shuffle is a probabilistic guarantee against the worst case, which is why it matters for performance). Many textbook implementations go quadratic if the array is already sorted or reverse sorted, or has many duplicates. On average quicksort uses about 39% more compares than mergesort, but it's faster in practice because of less data movement.

**Summary of properties**:

- **In place**: partitioning is done with constant extra space
- **Depth of recursion**: logarithmic extra space with high probability after the shuffle; logarithmic depth can be guaranteed by recursing on the smaller subarray before the larger one
- **Not stable**: long-range swaps can pull items out of relative order

Practical improvements:

- Quicksort has too much overhead for small subarrays. Cut off to insertion sort for subarrays of ~10 items, or skip small subarrays entirely and finish with a single insertion sort pass at the end
- Estimate the median instead of using the first item as the partitioning item. The best choice of partitioning item is the median; a decent estimate is the median of 3 random items (improves performance by ~10%)
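
Both improvements can be combined in one sketch. The names `CUTOFF`, `insertion_sort`, `median_of_three`, and `quicksort_tuned` are assumptions made for illustration, and the cutoff of 10 is just the rough figure mentioned above. Because the partitioning item is sampled at random, this version does not shuffle the array first.

```python
import random

CUTOFF = 10  # assumed threshold for switching to insertion sort


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place by insertion."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        k = i
        while k > lo and a[k - 1] > x:
            a[k] = a[k - 1]
            k -= 1
        a[k] = x


def median_of_three(a, i, j, k):
    """Return whichever of the indices i, j, k holds the median value."""
    return sorted((i, j, k), key=lambda idx: a[idx])[1]


def quicksort_tuned(a):
    _sort(a, 0, len(a) - 1)


def _sort(a, lo, hi):
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)  # small subarray: skip partitioning
        return
    # Move the median of three randomly sampled entries into a[lo].
    m = median_of_three(a, *random.sample(range(lo, hi + 1), 3))
    a[lo], a[m] = a[m], a[lo]
    j = _partition(a, lo, hi)
    _sort(a, lo, j - 1)
    _sort(a, j + 1, hi)


def _partition(a, lo, hi):
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        while i <= hi and a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    a[lo], a[j] = a[j], a[lo]
    return j
```

The best cutoff value depends on the machine and the data, so it is worth measuring rather than taking the constant here as given.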

In [None]:
# Quicksort implementa