# Quicksort

Quicksort is one of the most important algorithms of the 20th century and is widely used for system sorts and many other applications. It was invented in 1961 by Tony Hoare, who won the Turing Award in 1980 for this and other work. It is a recursive method whose basic idea is to do the work first and the recursion afterwards. It first randomly shuffles the array (an important step) and then partitions it so that, for some index `j`, the entry `vec[j]` is in its final position, with no larger entry to its left and no smaller entry to its right. Once the array is partitioned in that way, we recursively sort the two parts: first the left part, then the right part. After both subarrays are sorted, the whole array is sorted.

The image below illustrates how Quicksort partitioning works. The idea is to choose the first element as the partitioning element; because the array has been shuffled, this is effectively a random element of the array. We then maintain a pointer `i` that moves from left to right and a pointer `j` that moves from right to left.

<img src="https://cdn.rawgit.com/rogergranada/MOOCs/master/Coursera/Princeton/Algorithms-Part-1/Week%203/images/quick_partioning.svg" width="70%" align="center"/>

Thus, we can perform the following steps.

Repeat until `i` and `j` pointers cross.
- Scan `i` from left to right so long as (`vec[i] < vec[lo]`)
- Scan `j` from right to left so long as (`vec[j] > vec[lo]`)
- Exchange `vec[i]` with `vec[j]`

When pointers cross:
- Exchange `vec[lo]` with `vec[j]`

Below we present the code for partitioning.

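```python
def partition(vec, lo, hi):
    """Partition vec[lo..hi] (hi inclusive) around vec[lo]; return the pivot's final index."""
    i, j = lo, hi + 1
    while True:
        # Scan i to the right while items are smaller than the pivot.
        i += 1
        while i < hi and vec[i] < vec[lo]:
            i += 1
        # Scan j to the left while items are larger than the pivot.
        j -= 1
        while j > lo and vec[j] > vec[lo]:
            j -= 1
        if i >= j: break  # pointers crossed
        # Both pointers advance again on the next pass, so equal keys cannot loop forever.
        vec[i], vec[j] = vec[j], vec[i]
    vec[lo], vec[j] = vec[j], vec[lo]  # put the pivot into its final position
    return j

vec = [4, 1, 6, 2, 5, 7, 3]
print('Initial array:   {}'.format(vec))
print('Central element: {}'.format(vec[0]))
partition(vec, 0, len(vec) - 1)
print('Partition array: {}'.format(vec))
```

```
Initial array:   [4, 1, 6, 2, 5, 7, 3]
Central element: 4
Partition array: [2, 1, 3, 4, 5, 7, 6]
```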
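
The partitioning condition described earlier (no larger entry to the left of `j`, no smaller entry to its right) can be checked mechanically. The sketch below is only illustrative: `is_partitioned` is a hypothetical helper name that is not part of the original notebook, and it assumes `hi` is an inclusive index.

```python
def is_partitioned(vec, lo, j, hi):
    """Return True if vec[lo..hi] (hi inclusive) is partitioned around index j."""
    left_ok = all(vec[k] <= vec[j] for k in range(lo, j))
    right_ok = all(vec[k] >= vec[j] for k in range(j + 1, hi + 1))
    return left_ok and right_ok

# The partitioned array from the example above, with the pivot 4 at index 3.
print(is_partitioned([2, 1, 3, 4, 5, 7, 6], 0, 3, 6))  # True
# The original, unpartitioned array fails the check for the same index.
print(is_partitioned([4, 1, 6, 2, 5, 7, 3], 0, 3, 6))  # False
```

A check like this is handy as an assertion while experimenting with variations of the partitioning loop.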


Below is the complete implementation of the Quicksort algorithm:

```python
# Load the Integer and Shuffle helper classes (IPython magics; notebook only)
%run ./integer_class.py
%run ./shuffle_class.py

class QuickSort(object):
    def sort(self, vec, lo=None, hi=None):
        # Top-level call: shuffle once, then sort the whole array.
        if lo is None:
            Shuffle().shuffle(vec)
            print('Shuffled array: {}'.format(vec))
            lo, hi = 0, len(vec) - 1
        if hi <= lo:
            return vec
        j = self.partition(vec, lo, hi)
        self.sort(vec, lo, j - 1)
        self.sort(vec, j + 1, hi)
        return vec

    def less(self, v, w):
        return v.compare_to(w) < 0

    def is_sorted(self, vec):
        val = float('-inf')
        for i in range(len(vec)):
            if vec[i].v < val:
                return False
            val = vec[i].v
        return True

    def exchange(self, vec, i, j):
        vec[i], vec[j] = vec[j], vec[i]

    def partition(self, vec, lo, hi):
        # Partition vec[lo..hi] (hi inclusive) around vec[lo].
        i, j = lo, hi + 1
        while True:
            # Scan right; stop on an item >= the pivot or at the right end.
            i += 1
            while i < hi and self.less(vec[i], vec[lo]):
                i += 1
            # Scan left; stop on an item <= the pivot.
            j -= 1
            while j > lo and self.less(vec[lo], vec[j]):
                j -= 1
            if i >= j: break
            # Both pointers move on the next pass, so duplicate keys cannot loop forever.
            self.exchange(vec, i, j)
        self.exchange(vec, lo, j)
        return j


vec = [Integer(1), Integer(2), Integer(3), Integer(4), Integer(5),
       Integer(6), Integer(7), Integer(8), Integer(9), Integer(10)]
print('Initial array:  {}'.format(vec))
quick_alg = QuickSort()
sorted_vec = quick_alg.sort(vec)
print('Sorted array:   {}'.format(sorted_vec))
```

```
Initial array:  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Shuffled array: [8, 6, 2, 1, 10, 4, 5, 9, 7, 3]
Sorted array:   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```


The trace below visualizes Quicksort: the first row below the header contains the input and the second row contains the shuffled input. When `lo` and `hi` contain the same value (*e.g.*, row 7 (`lo=1`, `hi=1`) and row 8 (`lo=4`, `hi=4`)), the subarray has a single element and is not partitioned. The last row contains the result of the sort.

|  lo   |   j   |   hi  |   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |  10   |  11   |  12   |  13   |  14   |  15   |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  $-$  |  $-$  |  $-$  | **Q** | **U** | **I** | **C** | **K** | **S** | **O** | **R** | **T** | **E** | **X** | **A** | **M** | **P** | **L** | **E** |
|  $-$  |  $-$  |  $-$  | **K** | **R** | **A** | **T** | **E** | **L** | **E** | **P** | **U** | **I** | **M** | **Q** | **C** | **X** | **O** | **S** |
|   0   |  *5*  |  15   | **E** | **C** | **A** | **I** | **E** |  *K*  | **L** | **P** | **U** | **T** | **M** | **Q** | **R** | **X** | **O** | **S** |
|   0   |  *3*  |   4   | **E** | **C** | **A** |  *E*  | **I** |   K   |   L   |   P   |   U   |   T   |   M   |   Q   |   R   |   X   |   O   |   S   |
|   0   |  *2*  |   2   | **A** | **C** |  *E*  |   E   |   I   |   K   |   L   |   P   |   U   |   T   |   M   |   Q   |   R   |   X   |   O   |   S   |
|   0   |  *0*  |   1   |  *A*  | **C** |   E   |   E   |   I   |   K   |   L   |   P   |   U   |   T   |   M   |   Q   |   R   |   X   |   O   |   S   |
|   1   |  $-$  |   1   |   A   |  *C*  |   E   |   E   |   I   |   K   |   L   |   P   |   U   |   T   |   M   |   Q   |   R   |   X   |   O   |   S   |
|   4   |  $-$  |   4   |   A   |   C   |   E   |   E   |  *I*  |   K   |   L   |   P   |   U   |   T   |   M   |   Q   |   R   |   X   |   O   |   S   |
|   6   |  *6*  |  15   |   A   |   C   |   E   |   E   |   I   |   K   |  *L*  | **P** | **U** | **T** | **M** | **Q** | **R** | **X** | **O** | **S** |
|   7   |  *9*  |  15   |   A   |   C   |   E   |   E   |   I   |   K   |   L   | **M** | **O** |  *P*  | **T** | **Q** | **R** | **X** | **U** | **S** |
|   7   |  *7*  |   8   |   A   |   C   |   E   |   E   |   I   |   K   |   L   |  *M*  | **O** |   P   |   T   |   Q   |   R   |   X   |   U   |   S   |
|   8   |  $-$  |   8   |   A   |   C   |   E   |   E   |   I   |   K   |   L   |   M   |  *O*  |   P   |   T   |   Q   |   R   |   X   |   U   |   S   |
|  10   | *13*  |  15   |   A   |   C   |   E   |   E   |   I   |   K   |   L   |   M   |   O   |   P   | **S** | **Q** | **R** |  *T*  | **U** | **X** |
|  10   | *12*  |  12   |   A   |   C   |   E   |   E   |   I   |   K   |   L   |   M   |   O   |   P   | **R** | **Q** |  *S*  |   T   |   U   |   X   |
|  10   | *11*  |  11   |   A   |   C   |   E   |   E   |   I   |   K   |   L   |   M   |   O   |   P   | **Q** |  *R*  |   S   |   T   |   U   |   X   |
|  10   |  $-$  |  10   |   A   |   C   |   E   |   E   |   I   |   K   |   L   |   M   |   O   |   P   |  *Q*  |   R   |   S   |   T   |   U   |   X   |
|  14   | *14*  |  15   |   A   |   C   |   E   |   E   |   I   |   K   |   L   |   M   |   O   |   P   |   Q   |   R   |   S   |   T   |  *U*  | **X** |
|  15   |  $-$  |  15   |   A   |   C   |   E   |   E   |   I   |   K   |   L   |   M   |   O   |   P   |   Q   |   R   |   S   |   T   |   U   |  *X*  |
|  $-$  |  $-$  |  $-$  | **A** | **C** | **E** | **E** | **I** | **K** | **L** | **M** | **O** | **P** | **Q** | **R** | **S** | **T** | **U** | **X** |
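
The `lo`, `j`, and `hi` columns of the trace can be reproduced with a small script. This is a minimal sketch under stated assumptions: it works on plain characters rather than the notebook's `Integer` objects, it starts from the shuffled row of the table instead of shuffling, and `traced_sort` is a hypothetical helper name introduced here for illustration.

```python
def partition(vec, lo, hi):
    """Partition vec[lo..hi] (hi inclusive) around vec[lo]; return the pivot's final index."""
    pivot = vec[lo]
    i, j = lo, hi + 1
    while True:
        i += 1
        while i < hi and vec[i] < pivot:
            i += 1
        j -= 1
        while vec[j] > pivot:  # stops at j == lo at the latest
            j -= 1
        if i >= j:
            break
        vec[i], vec[j] = vec[j], vec[i]
    vec[lo], vec[j] = vec[j], vec[lo]
    return j

def traced_sort(vec, lo, hi, trace):
    """Quicksort without shuffling; record (lo, j, hi) for every partition."""
    if hi <= lo:
        return
    j = partition(vec, lo, hi)
    trace.append((lo, j, hi))
    traced_sort(vec, lo, j - 1, trace)
    traced_sort(vec, j + 1, hi, trace)

letters = list("KRATELEPUIMQCXOS")  # the shuffled row of the table
trace = []
traced_sort(letters, 0, len(letters) - 1, trace)
for lo, j, hi in trace:
    print(lo, j, hi)
print("".join(letters))
```

Single-element subarrays (the rows with no `j` in the table) return immediately and therefore do not appear in `trace`.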

An example of the algorithm working can be seen below:

<img src="https://i.gifer.com/ORSX.gif" width="60%">

Some implementation details about Quicksort:

- **Partitioning in-place**: Using an extra array makes partitioning easier (and stable), but is not worth the cost.
- **Terminating the loop**: Testing whether the pointers cross is a bit trickier than it might seem.
- **Staying in bounds**: The (`j == lo`) test is redundant, because the partitioning item at `vec[lo]` stops the leftward scan, but the (`i == hi`) test is not.
- **Preserving randomness**: Shuffling is needed for the performance guarantee.
- **Equal keys**: When duplicates are present, it is (counter-intuitively) better to stop on keys equal to the partitioning item's key.
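
The point about equal keys can be seen on an array whose keys are all the same. The sketch below is illustrative only; `stop_on_equal` is a hypothetical flag introduced here to switch between the two scanning rules, and it is not part of the original implementation.

```python
def partition(vec, lo, hi, stop_on_equal=True):
    """Partition vec[lo..hi] (hi inclusive) around vec[lo]; return the pivot's final index."""
    pivot = vec[lo]
    i, j = lo, hi + 1
    while True:
        i += 1
        # Optionally scan past keys equal to the pivot as well.
        while i < hi and (vec[i] < pivot or (not stop_on_equal and vec[i] == pivot)):
            i += 1
        j -= 1
        while j > lo and (vec[j] > pivot or (not stop_on_equal and vec[j] == pivot)):
            j -= 1
        if i >= j:
            break
        vec[i], vec[j] = vec[j], vec[i]
    vec[lo], vec[j] = vec[j], vec[lo]
    return j

print(partition([7] * 9, 0, 8, stop_on_equal=True))   # 4: balanced split
print(partition([7] * 9, 0, 8, stop_on_equal=False))  # 0: one side gets everything
```

Stopping on equal keys costs a few extra exchanges but keeps the two subarrays balanced; skipping them makes every partition of an all-equal array maximally lopsided, which is the quadratic behavior the bullet warns about.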

### Mathematical Analysis

In the best case, Quicksort performs $\sim N \lg N$ compares. In the worst case, which occurs when the array is already sorted at the time partitioning starts, the number of compares is $\sim \frac{1}{2} N^2$. Below we show the trace of Quicksort in the best case, where the first row below the header contains the input and the second row contains the array after shuffling (in this idealized example, the same arrangement). When `lo` and `hi` contain the same value, the subarray has a single element and is not partitioned. The last row contains the result of the sort.

|  lo   |   j   |   hi  |   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |  10   |  11   |  12   |  13   |  14   |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  $-$  |  $-$  |  $-$  | **H** | **A** | **C** | **B** | **F** | **E** | **G** | **D** | **L** | **I** | **K** | **J** | **N** | **M** | **O** |
|  $-$  |  $-$  |  $-$  | **H** | **A** | **C** | **B** | **F** | **E** | **G** | **D** | **L** | **I** | **K** | **J** | **N** | **M** | **O** |
|   0   |  *7*  |  14   | **D** | **A** | **C** | **B** | **F** | **E** | **G** |  *H*  | **L** | **I** | **K** | **J** | **N** | **M** | **O** |
|   0   |  *3*  |   6   | **B** | **A** | **C** |  *D*  | **F** | **E** | **G** |   H   |   L   |   I   |   K   |   J   |   N   |   M   |   O   |
|   0   |  *1*  |   2   | **A** |  *B*  | **C** |   D   |   F   |   E   |   G   |   H   |   L   |   I   |   K   |   J   |   N   |   M   |   O   |
|   0   |  $-$  |   0   |  *A*  |   B   |   C   |   D   |   F   |   E   |   G   |   H   |   L   |   I   |   K   |   J   |   N   |   M   |   O   |
|   2   |  $-$  |   2   |   A   |   B   |  *C*  |   D   |   F   |   E   |   G   |   H   |   L   |   I   |   K   |   J   |   N   |   M   |   O   |
|   4   |  *5*  |   6   |   A   |   B   |   C   |   D   | **E** |  *F*  | **G** |   H   |   L   |   I   |   K   |   J   |   N   |   M   |   O   |
|   4   |  $-$  |   4   |   A   |   B   |   C   |   D   |  *E*  |   F   |   G   |   H   |   L   |   I   |   K   |   J   |   N   |   M   |   O   |
|   6   |  $-$  |   6   |   A   |   B   |   C   |   D   |   E   |   F   |  *G*  |   H   |   L   |   I   |   K   |   J   |   N   |   M   |   O   |
|   8   | *11*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   | **J** | **I** | **K** |  *L*  | **N** | **M** | **O** |
|   8   |  *9*  |  10   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   | **I** |  *J*  | **K** |   L   |   N   |   M   |   O   |
|   8   |  $-$  |   8   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |  *I*  |   J   |   K   |   L   |   N   |   M   |   O   |
|  10   |  $-$  |  10   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |  *K*  |   L   |   N   |   M   |   O   |
|  12   | *13*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |   K   |   L   | **M** |  *N*  | **O** |
|  12   |  $-$  |  12   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |   K   |   L   |  *M*  |   N   |   O   |
|  14   |  $-$  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |   K   |   L   |   M   |   N   |  *O*  |
|  $-$  |  $-$  | $-$   | **A** | **B** | **C** | **D** | **E** | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |

The worst case is traced in the table below, where the array is already sorted initially (or after shuffling).

|  lo   |   j   |   hi  |   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |  10   |  11   |  12   |  13   |  14   |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  $-$  |  $-$  |  $-$  | **A** | **B** | **C** | **D** | **E** | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|  $-$  |  $-$  |  $-$  | **A** | **B** | **C** | **D** | **E** | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   0   |  *0*  |  14   |  *A*  | **B** | **C** | **D** | **E** | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   1   |  *1*  |  14   |   A   |  *B*  | **C** | **D** | **E** | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   2   |  *2*  |  14   |   A   |   B   |  *C*  | **D** | **E** | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   3   |  *3*  |  14   |   A   |   B   |   C   |  *D*  | **E** | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   4   |  *4*  |  14   |   A   |   B   |   C   |   D   |  *E*  | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   5   |  *5*  |  14   |   A   |   B   |   C   |   D   |   E   |  *F*  | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   6   |  *6*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |  *G*  | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   7   |  *7*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |  *H*  | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
|   8   |  *8*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |  *I*  | **J** | **K** | **L** | **M** | **N** | **O** |
|   9   |  *9*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |  *J*  | **K** | **L** | **M** | **N** | **O** |
|  10   | *10*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |  *K*  | **L** | **M** | **N** | **O** |
|  11   | *11*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |   K   |  *L*  | **M** | **N** | **O** |
|  12   | *12*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |   K   |   L   |  *M*  | **N** | **O** |
|  13   | *13*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |   K   |   L   |   M   |  *N*  | **O** |
|  14   | *14*  |  14   |   A   |   B   |   C   |   D   |   E   |   F   |   G   |   H   |   I   |   J   |   K   |   L   |   M   |   N   |  *O*  |
|  $-$  |  $-$  | $-$   | **A** | **B** | **C** | **D** | **E** | **F** | **G** | **H** | **I** | **J** | **K** | **L** | **M** | **N** | **O** |
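
The gap between the worst case and the typical case can be measured by counting compares. The sketch below is an illustration under assumptions: `quicksort_compares` is a hypothetical helper that deliberately skips the shuffle so that sorted input triggers the worst case, and the shuffled run uses a fixed seed only to make the numbers reproducible.

```python
import random

def quicksort_compares(vec):
    """Sort vec in place (no shuffle) and return the number of key compares."""
    compares = 0

    def less(a, b):
        nonlocal compares
        compares += 1
        return a < b

    def partition(lo, hi):
        i, j = lo, hi + 1
        while True:
            i += 1
            while i < hi and less(vec[i], vec[lo]):
                i += 1
            j -= 1
            while less(vec[lo], vec[j]):  # stops at j == lo at the latest
                j -= 1
            if i >= j:
                break
            vec[i], vec[j] = vec[j], vec[i]
        vec[lo], vec[j] = vec[j], vec[lo]
        return j

    def sort(lo, hi):
        if hi <= lo:
            return
        j = partition(lo, hi)
        sort(lo, j - 1)
        sort(j + 1, hi)

    sort(0, len(vec) - 1)
    return compares

N = 200
already_sorted = list(range(N))
worst = quicksort_compares(list(already_sorted))

shuffled = list(already_sorted)
random.Random(0).shuffle(shuffled)  # fixed seed, for reproducibility only
typical = quicksort_compares(shuffled)

print('sorted input, no shuffle:', worst)
print('shuffled input:          ', typical)
print('N^2 / 2:                 ', N * N // 2)
```

On sorted input every partition peels off a single item, so the count grows roughly like $\frac{1}{2} N^2$, while the shuffled run stays close to the linearithmic average.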

In the average case, which is what we get after shuffling the array, the average number of compares $C_N$ to quicksort an array of $N$ distinct keys is $\sim 2N \ln N$, and the number of exchanges is $\sim \frac{1}{3} N \ln N$. To prove that, note that $C_N$ satisfies the recurrence $C_0 = C_1 = 0$ and, for $N \ge 2$:

$$ 
C_N = (N + 1) + \left (\frac{C_0 + C_{N-1}}{N} \right ) + \left (\frac{C_1 + C_{N-2}}{N} \right ) + \ldots + \left (\frac{C_{N-1} + C_0}{N} \right )
$$ 

Multiplying both sides by $N$ and collecting terms, we have:

$$
NC_N = N(N+1) + 2(C_0 + C_1 + \ldots + C_{N-1})
$$

Subtracting the same equation for $N-1$ from this one:

$$
NC_N - (N-1)C_{N-1} = 2N + 2C_{N-1}
$$

Rearranging terms and dividing by $N(N+1)$, we have:

$$
\frac{C_N}{N+1} = \frac{C_{N-1}}{N} + \frac{2}{N+1}
$$

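Repeatedly applying the above equation (which holds for $N \ge 3$, since the subtraction uses the recurrence for $N-1 \ge 2$) until we reach $C_2 = 3$:

$$
\begin{aligned}
\frac{C_N}{N+1} &= \frac{C_{N-1}}{N} + \frac{2}{N+1} \\
&= \frac{C_{N-2}}{N-1} + \frac{2}{N} + \frac{2}{N+1} \\
&= \frac{C_{N-3}}{N-2} + \frac{2}{N-1} + \frac{2}{N} + \frac{2}{N+1} \\
&= \frac{C_2}{3} + \frac{2}{4} + \frac{2}{5} + \ldots + \frac{2}{N+1}
\end{aligned}
$$

Since $C_2/3 = 1$ is a constant, it does not affect the leading term. Approximating the sum by an integral, we have:

$$
\begin{aligned}
C_N &= (N+1) \left( 1 + \frac{2}{4} + \frac{2}{5} + \ldots + \frac{2}{N+1} \right) \\
&\sim 2(N+1) \left( \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \ldots + \frac{1}{N+1} \right) \\
&\sim 2(N+1) \int_{3}^{N+1} \frac{1}{x}\,dx
\end{aligned}
$$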

Finally, the desired result is:

$$
C_N \sim 2(N+1) \ln N \approx 1.39 N \lg N
$$ 
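
The result can be sanity-checked numerically by evaluating the recurrence exactly. This is a minimal sketch, not part of the original analysis: `average_compares` is a hypothetical helper name, and exact fractions are used so that the one-step relation from the derivation can be tested with `==`.

```python
from fractions import Fraction
from math import log

def average_compares(n_max):
    """Return [C_0, ..., C_n_max], computed exactly from the recurrence."""
    c = [Fraction(0), Fraction(0)]  # C_0 = C_1 = 0
    prefix = Fraction(0)            # C_0 + ... + C_{n-1}; starts as C_0
    for n in range(2, n_max + 1):
        prefix += c[n - 1]
        c.append(Fraction(n + 1) + 2 * prefix / n)
    return c

c = average_compares(1000)

# The one-step relation used in the telescoping argument (valid for N >= 3).
for n in range(3, 1001):
    assert c[n] / (n + 1) == c[n - 1] / n + Fraction(2, n + 1)

# Ratio of the exact average to the leading term 2 N ln N.
for n in (10, 100, 1000):
    print(n, float(c[n]), float(c[n]) / (2 * n * log(n)))
```

The ratio creeps towards 1 only slowly, because the terms dropped by the $\sim$ notation are linear in $N$ and shrink relative to $N \ln N$ at a logarithmic rate.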

In the average case, Quicksort has 39% more compares than Mergesort. On the other hand, it is faster than Mergesort in practice because of less data movement.

### Stability of the Algorithm

It is important to note that Quicksort is not stable, because partitioning performs long-distance exchanges that can move an item past other items with an equal key. The table below illustrates an example, where $C_1$ and $C_2$ change their relative order:

|   i   |   j   |   0   |   1   |   2   |   3   |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  $-$  |  $-$  | $B_1$ | $C_1$ | $C_2$ | $A_1$ |
|   1   |   3   | $B_1$ | $C_1$ | $C_2$ | $A_1$ |
|   1   |   3   | $B_1$ | $A_1$ | $C_2$ | $C_1$ |
|   0   |   1   | $A_1$ | $B_1$ | $C_2$ | $C_1$ |
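
The partitioning pass in the table can be replayed on `(key, label)` pairs to make the reordering visible. This is a sketch for illustration: `partition_by_key` is a hypothetical name, and it compares only the key (the letter) and ignores the label (the subscript).

```python
def partition_by_key(vec, lo, hi):
    """Partition (key, label) pairs in vec[lo..hi] around vec[lo], comparing keys only."""
    pivot = vec[lo][0]
    i, j = lo, hi + 1
    while True:
        i += 1
        while i < hi and vec[i][0] < pivot:
            i += 1
        j -= 1
        while vec[j][0] > pivot:  # stops at j == lo at the latest
            j -= 1
        if i >= j:
            break
        vec[i], vec[j] = vec[j], vec[i]
    vec[lo], vec[j] = vec[j], vec[lo]
    return j

records = [("B", 1), ("C", 1), ("C", 2), ("A", 1)]
j = partition_by_key(records, 0, len(records) - 1)
print(j, records)
# 1 [('A', 1), ('B', 1), ('C', 2), ('C', 1)]
```

After this single pass, `('C', 2)` precedes `('C', 1)`, which is the change of relative order shown in the last row of the table.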

### Practical Improvements

- **Using Insertion sort for small subarrays**. As with Mergesort, Quicksort has too much overhead for tiny subarrays, so it is better to switch to Insertion sort for them. The cutoff to Insertion sort should be about 10 items.

```python
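CUTOFF = 10  # subarrays with at most CUTOFF items go to Insertion sort

# Replacement for QuickSort.sort
def sort(self, vec, lo=None, hi=None):
    # Top-level call: shuffle once and cover the whole array.
    if lo is None:
        Shuffle().shuffle(vec)
        print('Shuffled array: {}'.format(vec))
        lo, hi = 0, len(vec) - 1
    # Small subarray: hand it to Insertion sort instead of recursing further.
    # (Insertion.sort(vec, lo, hi) is assumed to sort vec[lo..hi] in place.)
    if hi <= lo + CUTOFF - 1:
        Insertion.sort(vec, lo, hi)
        return vec
    j = self.partition(vec, lo, hi)
    self.sort(vec, lo, j - 1)
    self.sort(vec, j + 1, hi)
    return vec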
```

- **Using the median of a sample**. The best choice of pivot item is the median. You can estimate the true median by taking the median of a small sample (the median of 3 random items). This leads to $\sim \frac{12}{7} N \ln N$ compares (slightly fewer) and $\sim \frac{12}{35} N \ln N$ exchanges (slightly more).

```python
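# Replacement for QuickSort.sort
def sort(self, vec, lo=None, hi=None):
    # Top-level call: shuffle once and cover the whole array.
    if lo is None:
        Shuffle().shuffle(vec)
        print('Shuffled array: {}'.format(vec))
        lo, hi = 0, len(vec) - 1
    if hi <= lo:
        return vec
    # median_of_3 is assumed to return the index (lo, middle, or hi)
    # that holds the median of the three sampled items.
    m = median_of_3(vec, lo, lo + (hi - lo) // 2, hi)
    self.exchange(vec, lo, m)  # move the estimated median to the pivot slot
    j = self.partition(vec, lo, hi)
    self.sort(vec, lo, j - 1)
    self.sort(vec, j + 1, hi)
    return vec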
```
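
Both improvements can be combined in one self-contained function. The following is a minimal sketch on plain comparable values rather than the notebook's `Integer`, `Shuffle`, and `Insertion` classes; `insertion_sort`, `median_of_3`, and `quicksort` are hypothetical stand-ins written for this example, and the cutoff of 10 is the value suggested in the text.

```python
import random

CUTOFF = 10  # assumed cutoff; the text suggests about 10 items

def insertion_sort(vec, lo, hi):
    """Sort vec[lo..hi] (hi inclusive) in place."""
    for i in range(lo + 1, hi + 1):
        j = i
        while j > lo and vec[j] < vec[j - 1]:
            vec[j], vec[j - 1] = vec[j - 1], vec[j]
            j -= 1

def median_of_3(vec, a, b, c):
    """Return whichever of the indices a, b, c holds the median value."""
    return sorted((a, b, c), key=lambda k: vec[k])[1]

def partition(vec, lo, hi):
    """Partition vec[lo..hi] around vec[lo]; return the pivot's final index."""
    pivot = vec[lo]
    i, j = lo, hi + 1
    while True:
        i += 1
        while i < hi and vec[i] < pivot:
            i += 1
        j -= 1
        while vec[j] > pivot:  # stops at j == lo at the latest
            j -= 1
        if i >= j:
            break
        vec[i], vec[j] = vec[j], vec[i]
    vec[lo], vec[j] = vec[j], vec[lo]
    return j

def quicksort(vec, lo=None, hi=None):
    if lo is None:                     # top-level call: shuffle once
        random.shuffle(vec)
        lo, hi = 0, len(vec) - 1
    if hi <= lo + CUTOFF - 1:          # small subarray: insertion sort
        insertion_sort(vec, lo, hi)
        return vec
    m = median_of_3(vec, lo, lo + (hi - lo) // 2, hi)
    vec[lo], vec[m] = vec[m], vec[lo]  # move the estimated median to the pivot slot
    j = partition(vec, lo, hi)
    quicksort(vec, lo, j - 1)
    quicksort(vec, j + 1, hi)
    return vec

data = [random.randrange(1000) for _ in range(500)]
print(quicksort(list(data)) == sorted(data))  # True
```

Because the array is shuffled first, the items at `lo`, the middle, and `hi` act as a random sample of three, which is why fixed positions are used here.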





# Selection



# Questions

1. What is the expected running time of randomized quicksort to sort an array of `n` distinct keys when the array is already sorted?<br>

&#9744; linear<br>
&#9745; linearithmic<br>
&#9744; quadratic<br>
&#9744; exponential
