# Sorting Algorithms
## Exercise – Insertion Sort, Merge Sort, Quick Sort

We implement three classic sorting algorithms: insertion sort, merge sort, and a non-recursive version of quick sort.

### Insertion Sort

In [1]:
def insertionSort(aList):

    for i in range(1,len(aList)):
        currentValue = aList[i]
        pos = i

        while pos > 0 and aList[pos-1] > currentValue:
            aList[pos] = aList[pos-1]
            pos = pos-1

        aList[pos] = currentValue

### Merge Sort
This is the version seen in the lectures, without the practical improvements.

In [2]:
def merge(aList, auxList, lo, mid, hi):
    for k in range(lo, hi+1):
        auxList[k] = aList[k]
    i = lo
    j = mid+1

    for k in range(lo, hi+1):
        if i > mid:
            aList[k] = auxList[j]
            j = j+1
        elif j > hi:
            aList[k] = auxList[i]
            i = i+1
        elif auxList[j] < auxList[i]:
            aList[k] = auxList[j]
            j = j+1
        else:
            aList[k] = auxList[i]
            i = i+1

def sort(aList, auxList, lo, hi): 
    if hi <= lo:
        return

    mid = lo + ((hi - lo) // 2)

    sort(aList, auxList, lo, mid)
    sort(aList, auxList, mid+1, hi)
    merge(aList, auxList, lo, mid, hi)
    
def mergeSort(aList):
    auxList = aList.copy()
    sort(aList, auxList, 0, len(aList)-1)
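
The cell above leaves out the practical improvements mentioned in the text. Which ones the lectures covered is not stated here. As an illustration only, the sketch below shows one commonly described refinement: skipping the merge when the two sorted halves are already in order. The names `merge_sort_skip` and `_sort`, and the merge counter, are assumptions made for this example.

```python
def merge(a, aux, lo, mid, hi):
    # Standard two-way merge of a[lo..mid] and a[mid+1..hi] through aux.
    aux[lo:hi + 1] = a[lo:hi + 1]
    i, j = lo, mid + 1
    for k in range(lo, hi + 1):
        if i > mid or (j <= hi and aux[j] < aux[i]):
            a[k] = aux[j]
            j += 1
        else:
            a[k] = aux[i]
            i += 1

def _sort(a, aux, lo, hi, merges):
    if hi <= lo:
        return
    mid = lo + (hi - lo) // 2
    _sort(a, aux, lo, mid, merges)
    _sort(a, aux, mid + 1, hi, merges)
    if a[mid] <= a[mid + 1]:
        return  # both halves are already in order: no merge needed
    merges[0] += 1
    merge(a, aux, lo, mid, hi)

def merge_sort_skip(a):
    # Hypothetical variant: sorts in place and returns how many merges ran.
    merges = [0]
    _sort(a, a.copy(), 0, len(a) - 1, merges)
    return merges[0]

print(merge_sort_skip(list(range(8))))         # already sorted -> 0 merges
print(merge_sort_skip(list(range(8, 0, -1))))  # reverse order  -> 7 merges
```

On already sorted input every merge is skipped, so this variant does only linear work in that case, while its behaviour on other inputs is unchanged.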


### Quick Sort
This is a non-recursive version of the algorithm seen in the lectures. We use an auxiliary stack to keep track of the subarrays to be partitioned next (i.e., we temporarily store their low/high indices on the stack).

In [3]:
def partition(aList, lo, hi): 
    i = lo - 1 
    x = aList[hi] 

    for j in range(lo, hi): 
        if aList[j] <= x: 
            i = i + 1
            aList[i], aList[j] = aList[j], aList[i] 

    aList[i + 1], aList[hi] = aList[hi], aList[i + 1] 
    return i+1 

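def quickSortIterative(aList, lo, hi): 

    # nothing to do for empty or single-element ranges
    # (this also avoids indexing past the end of the stack below)
    if hi <= lo:
        return

    # create and initialize an auxiliary stack 
    size = hi - lo + 1
    stack = [0] * size 
    top = -1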

    # push initial values of the indices lo and hi onto stack 
    top = top + 1
    stack[top] = lo 
    top = top + 1
    stack[top] = hi 

    # pop from stack while not empty 
    while top >= 0: 

        hi = stack[top] 
        top = top - 1
        lo = stack[top] 
        top = top - 1

        p = partition(aList, lo, hi) 

        # if there are elements on the left side of the pivot, 
        # push indices of left sub-array onto stack 
        if p-1 > lo: 
            top = top + 1
            stack[top] = lo 
            top = top + 1
            stack[top] = p - 1

        # if there are elements on the right side of the pivot, 
        # push indices of right sub-array onto stack 
        if p + 1 < hi: 
            top = top + 1
            stack[top] = p + 1
            top = top + 1
            stack[top] = hi 
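
The same stack idea can also be sketched with a Python list used as the stack, storing each pending subarray as one `(lo, hi)` tuple instead of two separate integers. This is only an illustrative variant, not the version from the lectures, and the name `quick_sort_with_list_stack` is made up for this example.

```python
def partition(a, lo, hi):
    # Partition around the last element, as in the cell above.
    pivot = a[hi]
    i = lo - 1
    for j in range(lo, hi):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1

def quick_sort_with_list_stack(a):
    # Each stack entry is the (lo, hi) range of a subarray still to be partitioned.
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue  # ranges with 0 or 1 elements are already sorted
        p = partition(a, lo, hi)
        stack.append((lo, p - 1))
        stack.append((p + 1, hi))

data = [39, 86, 49, 28, 80, 6, 52, 14, 5, 33]
quick_sort_with_list_stack(data)
print(data)
```

Checking `lo >= hi` after popping, rather than before pushing, keeps the loop short and makes empty and single-element lists work without a special case.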

In [4]:
import random

# driver code to test insertion sort 
testList = [random.randint(0, 100) for _ in range(20)] 
print ("Unsorted array:", testList) 
insertionSort(testList)
print ("Sorted array (insertion sort):", testList) 

# driver code to test merge sort 
testList = [random.randint(0, 100) for _ in range(20)] 
print ("\nUnsorted array:", testList) 
mergeSort(testList)
print ("Sorted array (merge sort):", testList) 

# driver code to test iterative quick sort 
testList = [random.randint(0, 100) for _ in range(20)] 
print ("\nUnsorted array:", testList) 
quickSortIterative(testList, 0, len(testList) -1)
print ("Sorted array (quick sort):", testList) 

Unsorted array: [43, 88, 67, 81, 39, 12, 94, 97, 42, 77, 31, 41, 64, 68, 69, 13, 14, 37, 83, 47]
Sorted array (insertion sort): [12, 13, 14, 31, 37, 39, 41, 42, 43, 47, 64, 67, 68, 69, 77, 81, 83, 88, 94, 97]

Unsorted array: [54, 55, 68, 95, 22, 82, 59, 9, 44, 38, 71, 33, 75, 75, 8, 1, 43, 77, 40, 66]
Sorted array (merge sort): [1, 8, 9, 22, 33, 38, 40, 43, 44, 54, 55, 59, 66, 68, 71, 75, 75, 77, 82, 95]

Unsorted array: [39, 86, 49, 28, 80, 6, 52, 14, 5, 33, 91, 8, 83, 31, 62, 61, 15, 72, 87, 59]
Sorted array (quick sort): [5, 6, 8, 14, 15, 28, 31, 33, 39, 49, 52, 59, 61, 62, 72, 80, 83, 86, 87, 91]
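
Reading the printed lists is one way to check the results. A complementary sketch, assuming a hypothetical helper `check_sort`, compares each in-place sort against Python's built-in `sorted` on many random lists, including empty and single-element ones.

```python
import random

def check_sort(sort_in_place, trials=100, max_len=50):
    # sort_in_place is any function that sorts a list in place.
    for _ in range(trials):
        length = random.randint(0, max_len)
        data = [random.randint(0, 100) for _ in range(length)]
        expected = sorted(data)
        sort_in_place(data)
        if data != expected:
            return False
    return True

# list.sort stands in here for insertionSort or mergeSort.
print(check_sort(list.sort))
```

A function with a different signature, such as `quickSortIterative`, would need a small wrapper, for example `lambda lst: quickSortIterative(lst, 0, len(lst) - 1)`.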


## Exercise – Bubble Sort & Bucket Sort
We implement two sorting algorithms not covered in the lectures: Bubble Sort (https://en.wikipedia.org/wiki/Bubble_sort) and Bucket Sort (https://en.wikipedia.org/wiki/Bucket_sort).

### Bubble Sort
Bubble Sort is often considered one of the least efficient sorting algorithms: it performs many "wasted" exchanges before an item reaches its final position. However, because Bubble Sort scans the entire unsorted portion of the list on each pass, it can terminate early when a pass makes no exchanges.

In [5]:
def bubbleSort(aList):
    
    hasExchanged = True
    passes = len(aList)-1
    
    while passes > 0 and hasExchanged:
        hasExchanged = False
        for i in range(passes):
            if aList[i] > aList[i+1]:
                hasExchanged = True
                aList[i], aList[i+1] = aList[i+1], aList[i]
        passes = passes - 1
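
To make the early termination visible, the sketch below repeats the same logic but returns the number of passes actually performed. The function name `bubble_sort_count_passes` is an assumption for this example.

```python
def bubble_sort_count_passes(a):
    # Same logic as bubbleSort above, instrumented to count passes.
    passes_left = len(a) - 1
    passes_done = 0
    exchanged = True
    while passes_left > 0 and exchanged:
        exchanged = False
        for i in range(passes_left):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                exchanged = True
        passes_left -= 1
        passes_done += 1
    return passes_done

print(bubble_sort_count_passes(list(range(10))))         # already sorted: 1 pass
print(bubble_sort_count_passes(list(range(10, 0, -1))))  # reverse order: 9 passes
```

On sorted input the first pass makes no exchanges, so the loop stops after a single linear scan. On reversed input every pass exchanges something and all `len(a) - 1` passes are needed.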

In [6]:
# driver code to test bubble sort 
testList = [75,11,26,30,3,76,7,6,83,51,4,3,27,12,49]
print ("Unsorted array:", testList) 
bubbleSort(testList)
print ("Sorted array (bubble sort):", testList) 

Unsorted array: [75, 11, 26, 30, 3, 76, 7, 6, 83, 51, 4, 3, 27, 12, 49]
Sorted array (bubble sort): [3, 3, 4, 6, 7, 11, 12, 26, 27, 30, 49, 51, 75, 76, 83]


### Bucket Sort
Bucket Sort is an example of a distribution sort algorithm, where the data to be sorted is distributed into multiple intermediate structures, which are then sorted before being gathered back and placed on the output. Note: distribution sort algorithms can be implemented as distributed algorithms, taking advantage of different processors, allowing external sorting of data that is too large to fit into a single computer's memory.

In [7]:
def bucketSort(aList): 
    bucketList = [] 
    numBuckets = 5 
    maxValue = max(aList) + 1
    
    for i in range(numBuckets): 
        bucketList.append([]) 
          
    # place each value in its bucket (assumes non-negative integers)
    for j in range(len(aList)): 
        bucketIndex = numBuckets * aList[j] // maxValue
        bucketList[bucketIndex].append(aList[j]) 
    
    # sort each bucket individually
    for i in range(numBuckets): 
        insertionSort(bucketList[i]) 
    
    # concatenate the result 
    k = 0
    for i in range(numBuckets): 
        for j in range(len(bucketList[i])): 
            aList[k] = bucketList[i][j] 
            k += 1
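
The distribution step is easiest to see in isolation. The sketch below applies the same index formula to the fixed test list used in this notebook and prints the contents of each bucket before any sorting. The helper name `bucket_index` is an assumption for this example.

```python
def bucket_index(value, num_buckets, max_value):
    # Same formula as in bucketSort: maps 0..max_value-1 onto 0..num_buckets-1.
    return num_buckets * value // max_value

values = [75, 11, 26, 30, 3, 76, 7, 6, 83, 51, 4, 3, 27, 12, 49]
num_buckets = 5
max_value = max(values) + 1  # 84

buckets = [[] for _ in range(num_buckets)]
for v in values:
    buckets[bucket_index(v, num_buckets, max_value)].append(v)

for i, bucket in enumerate(buckets):
    print(i, bucket)
# 0 [11, 3, 7, 6, 4, 3, 12]
# 1 [26, 30, 27]
# 2 [49]
# 3 [51]
# 4 [75, 76, 83]
```

Every value in bucket `i` is smaller than every value in bucket `i + 1`, which is why concatenating the sorted buckets gives a sorted list. The bucket sizes depend on how the values are spread: here almost half of the items land in bucket 0.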

In [8]:
# driver code to test bucket sort 
import random

#testList = [75,11,26,30,3,76,7,6,83,51,4,3,27,12,49]
testList = [random.randint(0, 1000) for i in range(30)] 


print ("Unsorted array:", testList) 
bucketSort(testList)
print ("Sorted array (bucket sort):", testList) 

Unsorted array: [761, 358, 429, 932, 730, 227, 412, 822, 880, 958, 250, 89, 565, 393, 353, 453, 707, 955, 611, 156, 590, 119, 992, 381, 186, 314, 778, 196, 293, 663]
Sorted array (bucket sort): [89, 119, 156, 186, 196, 227, 250, 293, 314, 353, 358, 381, 393, 412, 429, 453, 565, 590, 611, 663, 707, 730, 761, 778, 822, 880, 932, 955, 958, 992]


## Exercise – Performance comparison of sorting algorithms
We now compare the performance of the 5 sorting algorithms implemented above as a function of the input size. To measure how long each algorithm takes to run, we use the Python <tt>timeit</tt> module. To generate the input, we use the Python <tt>random</tt> module.

In [9]:
import timeit
import random

# input size 
size = [1000, 2000, 3000, 4000, 5000, 10000, 15000]

# 2D array where we store run-time performance of the 5 algorithms we are comparing, as we vary the input size
times = [[0 for j in range(len(size))] for i in range(5)]

# average case - random numbers between min_value and max_value to sort
min_value = 0
max_value = 50000

print("Average case scenario (random numbers to sort)")
print("Input size:", size,"\n")


# eval of insertion sort
for i in range(len(size)):
    unsortedListOfInts = [random.randint(min_value, max_value) for _ in range(size[i])]
    starttime = timeit.default_timer()
    insertionSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[0][i] = round(endtime-starttime,3)

print("Insertion Sort:", times[0])
      
# eval of merge sort
for i in range(len(size)):
    unsortedListOfInts = [random.randint(min_value, max_value) for _ in range(size[i])]
    starttime = timeit.default_timer()
    mergeSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[1][i] = round(endtime-starttime,3)

print("Merge Sort:", times[1])

# eval of quick sort
for i in range(len(size)):
    unsortedListOfInts = [random.randint(min_value, max_value) for _ in range(size[i])]
    starttime = timeit.default_timer()
    quickSortIterative(unsortedListOfInts, 0, size[i]-1)
    endtime = timeit.default_timer()
    times[2][i] = round(endtime-starttime,3)

print("Quick Sort:", times[2])

# eval of Bubble Sort
for i in range(len(size)):
    unsortedListOfInts = [random.randint(min_value, max_value) for _ in range(size[i])]
    starttime = timeit.default_timer()
    bubbleSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[3][i] = round(endtime-starttime,3)

print("Bubble Sort:", times[3])

# eval of Bucket Sort
for i in range(len(size)):
    unsortedListOfInts = [random.randint(min_value, max_value) for _ in range(size[i])]
    starttime = timeit.default_timer()
    bucketSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[4][i] = round(endtime-starttime,3)

print("Bucket Sort:", times[4])



Average case scenario (random numbers to sort)
Input size: [1000, 2000, 3000, 4000, 5000, 10000, 15000] 

Insertion Sort: [0.016, 0.068, 0.147, 0.258, 0.407, 1.591, 3.568]
Merge Sort: [0.001, 0.003, 0.004, 0.006, 0.007, 0.015, 0.024]
Quick Sort: [0.001, 0.001, 0.002, 0.003, 0.004, 0.009, 0.014]
Bubble Sort: [0.027, 0.116, 0.263, 0.536, 0.765, 3.073, 7.095]
Bucket Sort: [0.003, 0.011, 0.026, 0.046, 0.074, 0.328, 0.68]
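
One way to read these numbers is to look at how the running time grows when the input size doubles. The sketch below takes the timings for n = 5000 and n = 10000 from the output above and prints their ratio. A ratio near 4 is what quadratic growth would give, a ratio a little above 2 is what n log n growth would give. Since the timings are rounded to milliseconds, the ratios for the fastest algorithms are only rough.

```python
# Timings in seconds, copied from the output above: (n = 5000, n = 10000).
timings = {
    "Insertion Sort": (0.407, 1.591),
    "Merge Sort": (0.007, 0.015),
    "Quick Sort": (0.004, 0.009),
    "Bubble Sort": (0.765, 3.073),
    "Bucket Sort": (0.074, 0.328),
}

ratios = {name: t_10000 / t_5000 for name, (t_5000, t_10000) in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: time grows by a factor of about {ratio:.1f}")
```

Bucket Sort also lands in the "about 4" group. This is consistent with the implementation above, which uses a fixed number of buckets (5) and sorts each bucket with insertion sort, so each bucket still grows linearly with the input.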


We now compare the same 5 algorithms on an edge case: the input is already sorted. Note how this is the best-case scenario for some algorithms (which ones?), the worst-case scenario for others (which ones?), and essentially the same as the average-case scenario for others still (which ones?).

In [10]:
import timeit

size = [1000, 2000, 3000, 4000, 5000, 10000, 15000]
times = [[0 for j in range(len(size))] for i in range(5)]

print("Edge case scenario (numbers already sorted)")
print("Input size:", size, "\n")

# eval of insertion sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i]))
    starttime = timeit.default_timer()
    insertionSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[0][i] = round(endtime-starttime,3)

print("Insertion Sort:", times[0])
      
# eval of merge sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i]))
    starttime = timeit.default_timer()
    mergeSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[1][i] = round(endtime-starttime,3)

print("Merge Sort:", times[1])

# eval of quick sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i]))
    starttime = timeit.default_timer()
    quickSortIterative(unsortedListOfInts, 0, size[i]-1)
    endtime = timeit.default_timer()
    times[2][i] = round(endtime-starttime,3)

print("Quick Sort:", times[2])

# eval of Bubble Sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i]))
    starttime = timeit.default_timer()
    bubbleSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[3][i] = round(endtime-starttime,3)

print("Bubble Sort:", times[3])

# eval of Bucket Sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i]))
    starttime = timeit.default_timer()
    bucketSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[4][i] = round(endtime-starttime,3)

print("Bucket Sort:", times[4])

Edge case scenario (numbers already sorted)
Input size: [1000, 2000, 3000, 4000, 5000, 10000, 15000] 

Insertion Sort: [0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.001]
Merge Sort: [0.001, 0.002, 0.003, 0.005, 0.006, 0.013, 0.02]
Quick Sort: [0.025, 0.1, 0.229, 0.425, 0.662, 2.702, 5.973]
Bubble Sort: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001]
Bucket Sort: [0.0, 0.0, 0.0, 0.001, 0.001, 0.001, 0.002]
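
The gap between scenarios can also be seen without a timer by counting operations. The sketch below instruments insertion sort to count element comparisons. The function name `insertion_sort_comparisons` is an assumption for this example, and insertion sort is used only as one illustration.

```python
def insertion_sort_comparisons(a):
    # Same algorithm as insertionSort above, counting element comparisons.
    comparisons = 0
    for i in range(1, len(a)):
        current = a[i]
        pos = i
        while pos > 0:
            comparisons += 1
            if a[pos - 1] <= current:
                break
            a[pos] = a[pos - 1]
            pos -= 1
        a[pos] = current
    return comparisons

n = 1000
print(insertion_sort_comparisons(list(range(n))))         # already sorted: 999
print(insertion_sort_comparisons(list(range(n, 0, -1))))  # reverse order: 499500
```

For sorted input each item needs a single comparison, n - 1 in total. For reversed input item i is compared with all i items before it, n(n - 1)/2 in total.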


The last scenario we test is input in reverse order.

In [11]:
import timeit

size = [1000, 2000, 3000, 4000, 5000, 10000, 15000]
times = [[0 for j in range(len(size))] for i in range(5)]

print("Edge case scenario (numbers in reverse order)")
print("Input size:", size, "\n")

# eval of insertion sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i], 0, -1))
    starttime = timeit.default_timer()
    insertionSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[0][i] = round(endtime-starttime,3)

print("Insertion Sort:", times[0])
      
# eval of merge sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i], 0, -1))
    starttime = timeit.default_timer()
    mergeSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[1][i] = round(endtime-starttime,3)

print("Merge Sort:", times[1])

# eval of quick sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i], 0, -1))
    starttime = timeit.default_timer()
    quickSortIterative(unsortedListOfInts, 0, size[i]-1)
    endtime = timeit.default_timer()
    times[2][i] = round(endtime-starttime,3)

print("Quick Sort:", times[2])

# eval of Bubble Sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i], 0, -1))
    starttime = timeit.default_timer()
    bubbleSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[3][i] = round(endtime-starttime,3)

print("Bubble Sort:", times[3])

# eval of Bucket Sort
for i in range(len(size)):
    unsortedListOfInts = list(range(size[i], 0, -1))
    starttime = timeit.default_timer()
    bucketSort(unsortedListOfInts)
    endtime = timeit.default_timer()
    times[4][i] = round(endtime-starttime,3)

print("Bucket Sort:", times[4])

Edge case scenario (numbers in reverse order)
Input size: [1000, 2000, 3000, 4000, 5000, 10000, 15000] 

Insertion Sort: [0.028, 0.118, 0.272, 0.492, 0.766, 3.078, 6.94]
Merge Sort: [0.001, 0.002, 0.004, 0.005, 0.006, 0.013, 0.022]
Quick Sort: [0.019, 0.079, 0.17, 0.299, 0.497, 1.886, 4.162]
Bubble Sort: [0.033, 0.137, 0.354, 0.585, 0.967, 3.828, 8.683]
Bucket Sort: [0.005, 0.02, 0.049, 0.09, 0.142, 0.589, 1.33]
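
The three benchmark cells repeat the same timing loop for every algorithm and scenario. As a possible simplification, the sketch below factors that loop into one helper. The names `time_sort`, `random_input` and `reversed_input` are assumptions for this example, and taking the best of several runs is a choice made here to reduce noise, not something the cells above do.

```python
import random
import timeit

def time_sort(sort_in_place, make_input, sizes, repeats=3):
    # Returns, for each input size, the best time in seconds over `repeats` runs.
    results = []
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            data = make_input(n)  # fresh input for every run
            start = timeit.default_timer()
            sort_in_place(data)
            best = min(best, timeit.default_timer() - start)
        results.append(round(best, 3))
    return results

def random_input(n):
    return [random.randint(0, 50000) for _ in range(n)]

def reversed_input(n):
    return list(range(n, 0, -1))

sizes = [1000, 2000, 3000]

# list.sort stands in for any of the in-place sorts defined above.
print("random:  ", time_sort(list.sort, random_input, sizes))
print("reversed:", time_sort(list.sort, reversed_input, sizes))
```

With such a helper, each scenario reduces to one input generator, and each algorithm to one call.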
