# QuickSort

QuickSort chooses a pivot, moves the elements smaller than the pivot to its left and the larger ones to its right, and then recursively does the same on the segments to the left and right of the pivot. Different partitioning schemes and pivot choices exist; here we use median-of-three pivot selection and partitioning to speed up the algorithm a little. In the best case, QuickSort is $O(n \log n)$, and in the worst case it is $O(n^2)$. It is not a stable algorithm, and an in-place implementation has a memory complexity of $O(\log n)$ on average for the recursion stack.
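
The two time bounds can be read off the usual partition recurrences. As a sketch, assume that partitioning $n$ elements costs $cn$ for some constant $c$ (the symbols $T$ and $c$ are introduced here purely for illustration). A pivot that always splits the sequence in half gives the best case, while a pivot that always lands at one end gives the worst case:

$$T_{\text{best}}(n) = 2\,T_{\text{best}}(n/2) + cn \;\Rightarrow\; T_{\text{best}}(n) = O(n \log n)$$

$$T_{\text{worst}}(n) = T_{\text{worst}}(n-1) + cn \;\Rightarrow\; T_{\text{worst}}(n) = c\sum_{k=1}^{n} k + T_{\text{worst}}(0) = O(n^2)$$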

## Generating a random sequence

In [100]:
import numpy as np

In [101]:
seed = 123
np.random.seed(seed)
n = 10
high = 100
low = 0
randseq = np.random.randint(low, high, n).tolist()
print(randseq)

[66, 92, 98, 17, 83, 57, 86, 97, 96, 47]


# The algorithm, step-by-step

## Median of 3

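First, we need a sub-algorithm that picks the first, middle and last elements of the sequence and sorts them in place, so that the median of the three ends up in the middle position. For sequences of even length, we take the element just before the midpoint (index n/2 - 1) as the middle element.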

In [102]:
def median_of_3(seq):
    # Getting length of sequence
    n = len(seq)
    # Raising an error if a sequence less than 3 is passed
    if n<3:
        raise ValueError("Input list for median of 3 must minimally be length 3")
    # We take the first, last and middle element
    first = seq[0]
    last = seq[n-1]
    # Case of even number of elements
    if n%2==0:
        # Taking the element before the middle
        mid = (n//2)-1
        middle = seq[mid]
    # Case of odd number of elements
    else:
        mid = n//2
        middle = seq[mid]

    ## Sorting into 1-m-n
    # Finding out the relative sizes of the elements
    smallest = min(first, last, middle)
    largest = max(first, last, middle)
    # The remaining number will be the median of 3
    all3 = [first, last, middle]
    small_big = [smallest, largest]
    for i in small_big:
        # If the element i matches either the smallest or biggest, remove it from the list of all3 since we want to get the remaining median
        if i in all3:
            all3.remove(i)
    
    # First element is the smallest
    seq[0] = smallest
    # last element is the largest
    seq[n-1] = largest
    # Middle element is neither the largest or smallest
    seq[mid] = all3[0]
    
    # Returning the sequence with sorted 1-m-n, as well as the middle index for pivot usage
    return (seq, mid)

In [103]:
copy = randseq.copy()
medo3 = median_of_3(copy)
medo3

([47, 92, 98, 17, 66, 57, 86, 97, 96, 83], 4)

The algorithm also works for sorting lists of exactly 3 elements, since the first, middle and last elements are then the whole list.

In [104]:
median_of_3([3,1,2])

([1, 2, 3], 1)
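
The same first/middle/last ordering can probably be written more compactly by sorting the three sampled values directly. The sketch below is an alternative formulation, not the function used in the rest of this notebook; the name `median_of_3_compact` is made up for illustration, and it relies on `(n - 1) // 2` giving the same middle index as the even/odd branches above.

```python
def median_of_3_compact(seq):
    """Order seq[0], seq[mid], seq[-1] in place and return (seq, mid).

    Hypothetical compact variant of median_of_3, for illustration only.
    """
    n = len(seq)
    if n < 3:
        raise ValueError("Input list for median of 3 must minimally be length 3")
    # (n - 1) // 2 is n//2 - 1 for even n and n//2 for odd n
    mid = (n - 1) // 2
    # Sort the three sampled values and write them back in order
    smallest, median, largest = sorted((seq[0], seq[mid], seq[n - 1]))
    seq[0], seq[mid], seq[n - 1] = smallest, median, largest
    return seq, mid


print(median_of_3_compact([66, 92, 98, 17, 83, 57, 86, 97, 96, 47]))
```

On the random sequence generated earlier this should reproduce the result of `median_of_3`, with 66 placed at index 4.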

## Case for array length 2

Next, we create a simple function to handle the case of 2 elements.

In [105]:
def sort_2(seq):
    if len(seq)!=2:
        raise ValueError("Input list must be of size 2")
    # If bigger element is before the smaller one, swap them
    if seq[0]>seq[1]:
        temp = seq[0]
        seq[0] = seq[1]
        seq[1] = temp

    return seq

In [106]:
sort_2([4,2])

[2, 4]

## Sub-algorithm to perform the pivot swap and partitioning swaps

We now develop the algorithm that performs the swapping after the median-of-3 pivot is selected.
1. The pivot is swapped to the second-last position, and the counters i and j start from the left end and from just before the pivot, respectively, moving towards each other.
2. i stops at the first element strictly larger than the pivot; j stops at the first element smaller than or equal to the pivot.
3. The elements at the two stopping positions are swapped, and this continues until i and j cross.
4. The pivot is then swapped into the position where i stopped after crossing j.

In [107]:
medo3

([47, 92, 98, 17, 66, 57, 86, 97, 96, 83], 4)

In [108]:
def swaps(medo3seq):
    # Extracting the array and middle index
    seq, mid = medo3seq[0], medo3seq[1]

    # Getting the length of the sequence
    n = len(seq)

    # swapping the pivot to the second last position
    pivot = seq[mid]
    seq[mid] = seq[n-2]
    seq[n-2] = pivot

    # Initialising the i and j pointers
    i = 0
    j = n-3

    # Keep going until i and j cross
    while i<=j:
        # Stop i only when an element strictly larger than the pivot is found
        while seq[i]<=pivot:
            i+=1
            print("i is",i)
            # i cannot be larger than n-2
            if i==n-2:
                break
        # Stop j only when an element smaller or equal to the pivot is found
        while seq[j]>pivot:
            j-=1
            print("j is", j)
            # j cannot be lower than 0
            if j==0:
                break
        # If i has already crossed j, break out of the loop
        if i>j:
            break
        # Swap the elements where the i and j pointers are
        temp = seq[i]
        seq[i] = seq[j]
        seq[j] = temp
        print(seq[i], seq[j])

    # Swap the pivot back with the element at index i
    swapper = seq[i]
    seq[i] = seq[n-2]
    seq[n-2] = swapper
    return seq,i

## Looking at how the above algorithm handles different sequences

In [109]:
swaps(medo3)

i is 1
j is 6
j is 5
57 92
i is 2
j is 4
j is 3
17 98
i is 3
j is 2


([47, 57, 17, 66, 96, 92, 86, 97, 98, 83], 3)

In [110]:
swaps(median_of_3([5,5,1,1,1,5,5,5,51,1]))

i is 1
j is 6
j is 5
j is 4
j is 3
1 5
i is 2
i is 3
j is 2


([1, 1, 1, 1, 51, 5, 5, 5, 5, 5], 3)

In [111]:
np.random.seed(333)
n = 10
high = 100
low = 0
randseq2 = np.random.randint(low, high, n).tolist()
print(randseq2)
swaps(median_of_3(randseq2))

[12, 77, 35, 51, 46, 60, 83, 29, 71, 23]
i is 1
j is 6
j is 5
j is 4
j is 3
j is 2
j is 1
j is 0


([12, 23, 35, 51, 71, 60, 83, 29, 77, 46], 1)

## Putting together the algorithm through recursion

First, we redefine the swaps algorithm, this time without the prints, for use in the overall algorithm.

In [112]:
def swaps(medo3seq):
    # Extracting the array and middle index
    seq, mid = medo3seq[0], medo3seq[1]

    # Getting the length of the sequence
    n = len(seq)

    # swapping the pivot to the second last position
    pivot = seq[mid]
    seq[mid] = seq[n-2]
    seq[n-2] = pivot

    # Initialising the i and j pointers
    i = 0
    j = n-3

    # Keep going until i and j cross
    while i<=j:
        # Stop i only when an element strictly larger than the pivot is found
        while seq[i]<=pivot:
            i+=1
            # i cannot be larger than n-2
            if i==n-2:
                break
        # Stop j only when an element smaller or equal to the pivot is found
        while seq[j]>pivot:
            j-=1
            # j cannot be lower than 0
            if j==0:
                break
        # If i has already crossed j, break out of the loop
        if i>j:
            break
        # Swap the elements where the i and j pointers are
        temp = seq[i]
        seq[i] = seq[j]
        seq[j] = temp

    # Swap the pivot back with the element at index i
    swapper = seq[i]
    seq[i] = seq[n-2]
    seq[n-2] = swapper
    return seq,i

Then, we build the general algorithm

In [113]:
def QuickSort(seq):
    # Getting the length of the sequence
    n = len(seq)
    # Base case 1: 1 or no elements. Simply return the sequence
    if n<=1:
        #print(seq, "Base case 1 reached")
        return seq
    # Base case 2: 2 elements. Return the result of the sort 2 algorithm
    elif n==2:
        #print(seq, "Base case 2 reached")
        return sort_2(seq)
    # Base case 3: 3 elements. Return the result of median of 3
    elif n==3:
        #print(seq, "Base case 3 reached")
        return median_of_3(seq)[0]
    # Else, perform the swaps algorithm and call the function again on the smaller problem
    else:
        med_of_3 = median_of_3(seq)
        swapped = swaps(med_of_3)
        # Resulting sequence after the swaps have been made
        swapped_seq = swapped[0]
        # Final index of the pivot
        pivot_idx = swapped[1]

        # Only the pivot is now in its final position. Sort the left and right parts recursively and join them as L + pivot + R
        return QuickSort(swapped_seq[:pivot_idx]) + [swapped_seq[pivot_idx]] + QuickSort(swapped_seq[pivot_idx+1:])
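
The function above builds new lists through slicing and concatenation at every level, so it uses extra memory proportional to the input. As a rough sketch of how the same median-of-3 partitioning could work on index bounds instead, the hypothetical `quicksort_in_place` below sorts the list in place. The name and the `lo`/`hi` parameters are assumptions introduced for illustration; this is not the implementation timed later in the notebook.

```python
def quicksort_in_place(seq, lo=0, hi=None):
    """Sort seq[lo..hi] in place (hypothetical variant, for illustration)."""
    if hi is None:
        hi = len(seq) - 1
    size = hi - lo + 1
    # Base case: 0 or 1 elements are already sorted
    if size <= 1:
        return seq
    # Base case: 2 elements, swap if out of order
    if size == 2:
        if seq[lo] > seq[hi]:
            seq[lo], seq[hi] = seq[hi], seq[lo]
        return seq
    # Median of 3: order first, middle and last so the median sits at mid
    mid = lo + (size - 1) // 2
    seq[lo], seq[mid], seq[hi] = sorted((seq[lo], seq[mid], seq[hi]))
    # Base case: 3 elements are fully sorted by the step above
    if size == 3:
        return seq
    # Park the pivot at the second-last position
    seq[mid], seq[hi - 1] = seq[hi - 1], seq[mid]
    pivot = seq[hi - 1]
    i, j = lo, hi - 2
    while i <= j:
        # i stops at the first element strictly larger than the pivot
        while i < hi - 1 and seq[i] <= pivot:
            i += 1
        # j stops at the first element smaller than or equal to the pivot
        while j > lo and seq[j] > pivot:
            j -= 1
        if i > j:
            break
        seq[i], seq[j] = seq[j], seq[i]
    # Move the pivot to its final position
    seq[i], seq[hi - 1] = seq[hi - 1], seq[i]
    # Recurse on both sides of the pivot
    quicksort_in_place(seq, lo, i - 1)
    quicksort_in_place(seq, i + 1, hi)
    return seq


print(quicksort_in_place([66, 92, 98, 17, 83, 57, 86, 97, 96, 47]))
```

This variant keeps the same stopping rules for i and j, so it should behave the same way on duplicates; only the memory use changes, not the recursion depth.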

Regenerating the random sequence

In [114]:
np.random.seed(seed)
n = 10
high = 100
low = 0
randseq = np.random.randint(low, high, n).tolist()
print(randseq)

[66, 92, 98, 17, 83, 57, 86, 97, 96, 47]


In [115]:
print(QuickSort(randseq))

[17, 47, 57, 66, 83, 86, 92, 96, 97, 98]


In [116]:
n = 100
high = 100
low = 0
randseq = np.random.randint(low, high, n).tolist()
print(randseq)

[73, 32, 46, 96, 25, 83, 78, 36, 96, 80, 68, 49, 55, 67, 2, 84, 39, 66, 84, 47, 61, 48, 7, 99, 92, 52, 97, 85, 94, 27, 34, 97, 76, 40, 3, 69, 64, 75, 34, 58, 10, 22, 77, 18, 15, 27, 30, 52, 70, 26, 80, 6, 14, 75, 54, 71, 1, 43, 58, 55, 25, 50, 84, 56, 49, 12, 18, 81, 1, 51, 44, 48, 56, 91, 49, 86, 3, 67, 11, 21, 89, 98, 3, 11, 3, 94, 6, 9, 87, 14, 83, 70, 12, 54, 27, 38, 17, 61, 74, 99]


In [117]:
print(QuickSort(randseq.copy()))

[1, 1, 2, 3, 3, 3, 3, 6, 6, 7, 9, 10, 11, 11, 12, 12, 14, 14, 15, 17, 18, 18, 21, 22, 25, 25, 26, 27, 27, 27, 30, 32, 34, 34, 36, 38, 39, 40, 43, 44, 46, 47, 48, 48, 49, 49, 49, 50, 51, 52, 52, 54, 54, 55, 55, 56, 56, 58, 58, 61, 61, 64, 66, 67, 67, 68, 69, 70, 70, 71, 73, 74, 75, 75, 76, 77, 78, 80, 80, 81, 83, 83, 84, 84, 84, 85, 86, 87, 89, 91, 92, 94, 94, 96, 96, 97, 97, 98, 99, 99]


In [118]:
print(sorted(randseq)==QuickSort(randseq.copy()))

True


In [119]:
randseq.count(76)

1

## Comparing the algorithm against `sorted` over many iterations

In [120]:
# Number of random sequences to test (kept separate from the sequence length n)
trials = 1_000_000
n = 10
high = 100
low = 0
for _ in range(trials):
    randseq = np.random.randint(low, high, n).tolist()
    myseq = randseq.copy()
    if QuickSort(myseq)!=sorted(randseq):
        print("Sorting did not match")

Since the loop above printed no mismatches, QuickSort seems to be working as intended.

# Timing the algorithm

In [121]:
def QuickSortTester(n, high, low=0):
    randseq = np.random.randint(low, high+1, n).tolist()
    return QuickSort(randseq)

In [122]:
print(QuickSortTester(100,100))

[1, 2, 6, 6, 7, 8, 10, 11, 11, 14, 14, 15, 15, 16, 16, 16, 17, 18, 19, 19, 20, 21, 21, 22, 23, 25, 25, 26, 26, 26, 27, 30, 31, 31, 33, 35, 35, 35, 35, 36, 37, 37, 37, 39, 40, 41, 41, 45, 46, 47, 48, 50, 50, 50, 52, 52, 52, 57, 57, 58, 58, 58, 59, 60, 62, 64, 65, 65, 67, 67, 67, 67, 68, 68, 70, 72, 72, 72, 74, 75, 77, 79, 81, 82, 83, 84, 86, 86, 87, 87, 90, 90, 91, 91, 94, 98, 98, 98, 99, 100]


## Standard Cases

In [123]:
%timeit QuickSortTester(10,10)

6.24 μs ± 91.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [124]:
%timeit QuickSortTester(100,100)

62.6 μs ± 3.7 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [125]:
%timeit QuickSortTester(10_000,10_000)

8.48 ms ± 50.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [126]:
# Since the algorithm's time complexity is good enough to handle very large n values, we try a special case here
%timeit QuickSortTester(1_000_000,1_000_000)

1.29 s ± 23.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
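
Note that `QuickSortTester` generates the random sequence inside the timed call, so the figures above include generation time as well as sorting time. As a sketch of how the two could be separated, and of how to time a sort outside IPython where `%timeit` is unavailable, the hypothetical helper `time_sort` below uses the standard-library `timeit` and `random` modules. The built-in `sorted` stands in for `QuickSort` only to keep the example self-contained.

```python
import random
import timeit


def time_sort(sort_fn, n, high, repeat=5, number=10):
    """Best per-call time in seconds for sort_fn on n random ints in [0, high].

    Hypothetical helper, for illustration only.
    """
    # Generate the data once, outside the timed region
    data = [random.randint(0, high) for _ in range(n)]
    # Copy inside the timed call so every run sorts the same unsorted input
    # (the copy itself is therefore included in the measurement)
    totals = timeit.repeat(lambda: sort_fn(data.copy()), repeat=repeat, number=number)
    return min(totals) / number


print(f"{time_sort(sorted, 10_000, 10_000) * 1e3:.2f} ms per call")
```

Passing `QuickSort` instead of `sorted` would time the notebook's implementation on the same kind of input.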


## Case of all elements being sorted

In [127]:
# Sequence of sorted elements
def sorted_case(n):
    # Named sorted_seq so that the built-in `sorted` is not shadowed
    sorted_seq = list(range(n))
    return QuickSort(sorted_seq)

In [128]:
%timeit sorted_case(10_000)

5.98 ms ± 71.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Case of all elements being sorted in reverse order

In [129]:
def rev_case(n):
    # Sequence of elements sorted in reverse
    rev_sorted = [i for i in range(n-1, -1, -1)]
    return QuickSort(rev_sorted)

In [130]:
%timeit rev_case(10_000)

8.03 ms ± 82.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Case with multiple duplicates

In [131]:
def duplicates(n):
    # Sequence of elements with many duplicates
    dup = [n]+[int(n/2) for i in range(n-2)] + [0]
    return QuickSort(dup)

In [132]:
%timeit duplicates(5_930)

286 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [133]:
def duplicates2(n):
    # Sequence of elements with many duplicates
    dup = [0]+[int(n/2) for i in range(n-2)] + [n]
    return QuickSort(dup)

In [134]:
%timeit duplicates2(5_930)

286 ms ± 2.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


As the name suggests, QuickSort is extremely fast in most cases, performing similarly to MergeSort but slightly faster. However, with many duplicates my implementation hits Python's maximum recursion depth, so I could only test it with 5,930 elements. Even at that size it already performs worse than MergeSort, and even InsertionSort is faster on `duplicates2`.

This is likely because the pivot ends up at the edge of the sequence after each partitioning step, so the partition does not really divide the problem in two but only breaks off a small chunk at the end of the sequence. QuickSort, at least as implemented here, therefore performs badly when there are many duplicates.
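
The explanation above can be checked with a small experiment. The sketch below re-implements the median-of-3 partition in compact form so that it is self-contained, then follows the left-hand partition down the recursion and counts the levels. The names `partition` and `left_chain_depth` are made up for illustration, and the compact partition is assumed to mirror `median_of_3` followed by `swaps` for inputs of length 4 or more.

```python
def partition(seq):
    """Median-of-3 partition in place; returns the final pivot index.

    Hypothetical compact version of median_of_3 + swaps, for len(seq) >= 4.
    """
    n = len(seq)
    mid = (n - 1) // 2
    # Order first, middle and last so the median sits at index mid
    seq[0], seq[mid], seq[n - 1] = sorted((seq[0], seq[mid], seq[n - 1]))
    # Park the pivot at the second-last position
    seq[mid], seq[n - 2] = seq[n - 2], seq[mid]
    pivot = seq[n - 2]
    i, j = 0, n - 3
    while i <= j:
        # i skips every element <= pivot, so copies of the pivot are skipped too
        while i < n - 2 and seq[i] <= pivot:
            i += 1
        while j > 0 and seq[j] > pivot:
            j -= 1
        if i > j:
            break
        seq[i], seq[j] = seq[j], seq[i]
    seq[i], seq[n - 2] = seq[n - 2], seq[i]
    return i


def left_chain_depth(seq):
    """Count partition steps needed while repeatedly keeping the left part."""
    depth = 0
    while len(seq) > 3:
        p = partition(seq)
        seq = seq[:p]
        depth += 1
    return depth


n = 10
dup = [0] + [n // 2] * (n - 2) + [n]
print(partition(dup.copy()))   # pivot index after the first partition
print(left_chain_depth(dup))   # number of nested partition steps
```

For this kind of input the pivot lands at index n - 2, so each call only removes two elements and the chain of recursive calls grows to roughly half the input length. That is consistent with the recursion limit being reached at a few thousand elements.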