# Project: Benchmarking Sorting Algorithms

## Introduction

Sorting is simply arranging data in ascending or descending order and arranging data in a sequence which makes searching easier.Humans recognise the importance of searching quickly with plenty of real world problems requiring the ability to search quickly e.g. a search for telephone numbers in telephone directory would be much slower if the data was kept unordered and unsorted, but fortunately the concept of sorting came into existence, making it easier for everyone to arrange data in an order. [1]

Space complexity is the amount of memory used by the algorithm (including the input values to the algorithm) to execute and produce the result. While executing an algorithm uses memory space for three reasons: 1. Instruction Space - the amount of memory used to save the compiled version of instructions; 2. Environmental Stack - Sometimes an algorithm(function) may be called inside another algorithm(function), in that situation the current variables are pushed onto the system stack, where they wait for further execution and then the call to the inside algorithm(function) is made. 3. Data Space - is the amount of space used by the variables and constants.

Time complexity of an algorithm signifies the total time required by the program to run until completion and is most commonly expressed using the big O notation. Time complexity is estimated by counting the number of elementary steps performed by any algorithm to finish execution, and since the algorithm's performance may vary with different types of input data for an algorithm we usually use the worst-case Time complexity of an algorithm because that is the maximum time taken for any input size.

The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N, as N approaches infinity. Big O(expression) is the set of functions that grow slower than or at the same rate as expression. It indicates the maximum time required by an algorithm for all input values and represents the worst case of an algorithm's time complexity.

The Omega(expression) is the set of functions that grow faster than or at the same rate as expression. It indicates the minimum time required by an algorithm for all input values. It represents the best case of an algorithm's time complexity.

Finally Theta(expression) consist of all the functions that lie in both O(expression) and Omega(expression). It indicates the average bound of an algorithm. It represents the average case of an algorithm's time complexity.

The different sorting algorithms all highlight how algorithm design has such a strong effect on program complexity, speed, and efficiency. The two main criteria used in the performance analysis of an alogrithim are time taken to sort the given data, and memory used to in order to sort. [1]

Sorting can be classified in two types on the basis of extra space used for sorting: In place sorting does not use any extra space for sorting purpose e.g. quick sort ,merge sort ,bubble sort; Out place sorting needs extra space for sorting purposes e.g. Merge sort. [2] 

A stable sorting algorithm maintains the relative order of the items with equal sort keys e.g. Insertion sort. An unstable sorting algorithm does not e.g. Selection sort. In other words, when a collection is sorted with a stable sorting algorithm, items with the same sort keys preserve their order after the collection is sorted. [3}

The QuickSort algorithm can be used to sort an array, and utilisises the comparator function which can take two arguments and contains logic to decide their relative order in a sorted output. The idea is to provide flexibility so that Quicksort can be used for any input type and to obtain any desired order (increasing, decreasing or any other).

A comparison-based sort algorithm uses comparisons between keys (part of a group of data by which it is sorted) to arrange items in a predetermined order and cannot perform better than O(n log n) in the average or worst case e.g. Quicksort.
With a non-comparison-based sort algorithm, the keys are not compared rather it sorts the keys by sorting individual digits which may share the same position e.g. bucket sort. [4] 




## Sorting Algorithms

### 1. A simple comparison-based sort: Selection Sort

![title](Selection.png)

The selection sort improves on the bubble sort by making only one exchange for every pass through the list. In order to do this, a selection sort looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location. [5].<br/>

Then it will find the second largest element and swap it with the element in the correct position, and it will keep on doing this until the entire array is sorted. The selection sort typically executes faster than a bubble sort because of the reduction in the number of exchanges.<br/>

The selection sort algorithm below uses two nested *for* loops to complete itself, one in the function 'selectionsort'; inside the first loop we make a call to another function 'positionOfmax' which has the second *for* loop. For a given input size *n*  the Space complexity for a selection sort algorithm is 0(1) meaning the amount of memory used by the algorithm does not depend on the size of the input, it should use the same amout of memeory for all inputs. The Time complexity for a selection sort algorithm is O(n<sup>2</sup>) or Quadratic time where time execution is proportional to the square of the input size so Selection sort is good for sorting arrays of small size but can become inefficient as imput size increases.


In [12]:
# reference: http://interactivepython.org/runestone/static/pythonds/SortSearch/TheSelectionSort.html

def selectionSort(alist):
   for fillslot in range(len(alist)-1,0,-1):
       positionOfMax=0
       for location in range(1,fillslot+1):
           if alist[location]>alist[positionOfMax]:
               positionOfMax = location

       temp = alist[fillslot]
       alist[fillslot] = alist[positionOfMax]
       alist[positionOfMax] = temp
    
# unsorted array given 
alist = [125,23,99,7,77,31,444,55,200]
selectionSort(alist)
print("The Sorted Array is")
print(alist)

The Sorted Array is
[7, 23, 31, 55, 77, 99, 125, 200, 444]


### 2. An efficient comparison-based: Merge Sort

![title](Merge.png)

The Merge Sort follows the rule of Divide and Conquer to sort a given set of data, it divides the input array into two halves and then merges the two sorted halves using the *merge() function*.

Merging is the process of taking two smaller sorted lists and combining them together into a single and sorted new list. In sorting n objects, merge sort has an average and worst-case performance of O(n log n). It's best, worst and average cases are very similar, making it a good choice for predictable running behaviour.[6] 

We know that we can divide a list in half *log n* times where *n* is the length of the list. The second process is the merge where ach item in the list will eventually be processed and placed on the sorted list. So the merge operation which results in a list of size *n* requires *n* operations. The result of this analysis is that *log n* splits, each of which costs n for a total of *n log n* operations. The *mergeSort* function requires extra space to hold the two halves as they are extracted with the slicing operations. This additional space can be a critical factor if the list is large and can make this sort problematic when working on large data sets.[7]



In [10]:
# ref. http://interactivepython.org/runestone/static/pythonds/SortSearch/TheMergeSort.html

def mergeSort(alist):
    print("Splitting ",alist)
    if len(alist)>1:
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        mergeSort(lefthalf)
        mergeSort(righthalf)

        i=0
        j=0
        k=0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                alist[k]=lefthalf[i]
                i=i+1
            else:
                alist[k]=righthalf[j]
                j=j+1
            k=k+1

        while i < len(lefthalf):
            alist[k]=lefthalf[i]
            i=i+1
            k=k+1

        while j < len(righthalf):
            alist[k]=righthalf[j]
            j=j+1
            k=k+1
    print("Merging ",alist)
    
# unsorted array given 
alist = [54,26,93,17,77,31,44,55,20]
mergeSort(alist)
print("The Final Sorted Array is")
print(alist)


Splitting  [54, 26, 93, 17, 77, 31, 44, 55, 20]
Splitting  [54, 26, 93, 17]
Splitting  [54, 26]
Splitting  [54]
Merging  [54]
Splitting  [26]
Merging  [26]
Merging  [26, 54]
Splitting  [93, 17]
Splitting  [93]
Merging  [93]
Splitting  [17]
Merging  [17]
Merging  [17, 93]
Merging  [17, 26, 54, 93]
Splitting  [77, 31, 44, 55, 20]
Splitting  [77, 31]
Splitting  [77]
Merging  [77]
Splitting  [31]
Merging  [31]
Merging  [31, 77]
Splitting  [44, 55, 20]
Splitting  [44]
Merging  [44]
Splitting  [55, 20]
Splitting  [55]
Merging  [55]
Splitting  [20]
Merging  [20]
Merging  [20, 55]
Merging  [20, 44, 55]
Merging  [20, 31, 44, 55, 77]
Merging  [17, 20, 26, 31, 44, 54, 55, 77, 93]
The Final Sorted Array is
[17, 20, 26, 31, 44, 54, 55, 77, 93]


### 3. A non-comparison sort: Counting sort

![title](counting.png)


Counting Sort is a stable non-comparison sorting technique based on keys between a specific range. It works by iterating through the input array, counting the number of times each item occurs, and using those counts to compute an items index in the final srted array. 

The algorithm assumes an input of size n where each item has a non-negative integer key with a reange of k. It is only suitable for direct se where the varuiation in keys is not significantly greater than the number of items. Counting sort is linear and assumes an integer k so it runs in O(n + k) in all cases. Counting sort is used to sort objects according to the keys that are small numbers. It counts the number of keys whose key values are same. 

The Worst, Average and Best case performances are O(n + k) with Space complexity of O(n + k). This sorting technique is effective when the difference between different keys are not so big, otherwise it can increase the space complexity. [8]

[13]

In [100]:
# Code ref.  https://www.geeksforgeeks.org/counting-sort/

# The main function that sort the given string arr[] in  
# alphabetical order 
def countSort(arr): 
  
    # The output character array that will have sorted arr 
    output = [0 for i in range(256)] 
  
    # Create a count array to store count of inidividul 
    # characters and initialize count array as 0 
    count = [0 for i in range(256)] 
  
    # For storing the resulting answer since the  
    # string is immutable 
    ans = ["" for _ in arr] 
  
    # Store count of each character 
    for i in arr: 
        count[ord(i)] += 1
  
    # Change count[i] so that count[i] now contains actual 
    # position of this character in output array 
    for i in range(256): 
        count[i] += count[i-1] 
  
    # Build the output character array 
    for i in range(len(arr)): 
        output[count[ord(arr[i])]-1] = arr[i] 
        count[ord(arr[i])] -= 1
  
    # Copy the output array to arr, so that arr now 
    # contains sorted characters 
    for i in range(len(arr)): 
        ans[i] = output[i] 
    return ans  

# Driver program to test above function 
arr = "geeksforgeeks"
ans = countSort(arr) 
print ("The sorted character array is:")
print (ans)

The sorted character array is:
['e', 'e', 'e', 'e', 'f', 'g', 'g', 'k', 'k', 'o', 'r', 's', 's']


### 4.Quick Sort

![title](Quicksort.png)

The quick sort uses divide and conquer to gain the same advantages as the merge sort, while not using additional storage. As a trade-off, however, it is possible that the list may not be divided in half. When this happens, we will see that performance is diminished.[9].

Quicksort is commonly used sorting algorithm due to its efficiency. It works by first performing a Pivot selection: Picking an element called a “pivot” from the array; then Partioning (or reorder)of the array elements with values < the pivot coming before it; and all elements with values ≥ than the pivot come after it. After this partioining the pivot is in its final position.[10]

Quick sort partitions an array and then calls itself recursively twice to sort the two resulting subarrays. This algorithm is quite efficient for large-sized data sets as its average and worst case complexity are of Ο(n2), where n is the number of items.

The time taken by QuickSort depends upon the input array and partition strategy. The worst case occurs when the partition process always picks the greatest or smallest element as the pivot, where the last element is always picked as pivot, the worst case would occur when the array is already sorted in increasing or decreasing order which would be O(n<sup>2</sup>). The best case occurs when the partition process always picks the middle element as pivot which would be *O(nLogn)*.  To get an idea of average case we can consider the case when partition puts *O(n/9)* elements in one set and *O(9n/10)* elements in other set with recurrence of *O(nLogn)*. 
Eventhough the worst case time complexity of QuickSort is more than many other sorting algorithms like Merge Sort, QuickSort is faster in practice because its inner loop can be efficiently implemented with most real-world data. QuickSort can be implemented in different ways by changing the choice of pivot, so that the worst case rarely occurs for a given type of data. However, merge sort is generally considered better when data is huge and stored in external storage. [11]


In [4]:
# ref: https://www.geeksforgeeks.org/quick-sort/

# This function takes last element as pivot, places 
# the pivot element at its correct position in sorted 
# array, and places all smaller (smaller than pivot) 
# to left of pivot and all greater elements to right 
# of pivot 
def partition(arr,low,high): 
    i = ( low-1 )         # index of smaller element 
    pivot = arr[high]     # pivot 
  
    for j in range(low , high): 
  
        # If current element is smaller than or 
        # equal to pivot 
        if   arr[j] <= pivot: 
          
            # increment index of smaller element 
            i = i+1 
            arr[i],arr[j] = arr[j],arr[i] 
  
    arr[i+1],arr[high] = arr[high],arr[i+1] 
    return ( i+1 ) 
  
# The main function that implements QuickSort 
# arr[] --> Array to be sorted, 
# low  --> Starting index, 
# high  --> Ending index 
  
# Function to do Quick sort 
def quickSort(arr,low,high): 
    if low < high: 
  
        # pi is partitioning index, arr[p] is now 
        # at right place 
        pi = partition(arr,low,high) 
  
        # Separately sort elements before 
        # partition and after partition 
        quickSort(arr, low, pi-1) 
        quickSort(arr, pi+1, high) 

# unsorted array given
arr = [10, 7, 8, 9, 1, 5, 19, 15, 33, 4] 
n = len(arr) 
quickSort(arr,0,n-1) 
print ("The Sorted array is:") 
for i in range(n): 
    print ("%d" %arr[i]), 

The Sorted array is:
1
4
5
7
8
9
10
15
19
33


### 5. Bubble sort

![title](Bubble.png)

Bubble Sort is the simplest sorting algorithm that works by repeatedly swapping the adjacent elements if they are in wrong order, starting at the first two elements and making multiple passes through an array or list until no more swaps are required.

This algorithm is not suitable for large data sets as its average and worst case complexity are of O(n<sup>2</sup>) and not efficient since it must exchange items before the final location is known. These “wasted” exchange operations are very costly. But because the bubble sort makes passes through the entire unsorted portion of the list, it has the capability to do something most sorting algorithms cannot do. If during a pass there are no exchanges, then we know that the list must be sorted. A bubble sort can be modified to stop early if it finds that the list has become sorted. This means that for lists that require just a few passes, a bubble sort may have an advantage in that it will recognize the sorted list and stop. [12]


In [5]:
# ref. https://www.geeksforgeeks.org/bubble-sort/

def bubbleSort(arr): 
    n = len(arr) 
  
    # Traverse through all array elements 
    for i in range(n): 
  
        # Last i elements are already in place 
        for j in range(0, n-i-1): 
  
            # traverse the array from 0 to n-i-1 
            # Swap if the element found is greater 
            # than the next element 
            if arr[j] > arr[j+1] : 
                arr[j], arr[j+1] = arr[j+1], arr[j] 

# unsorted array given
arr = [19, 44, 64, 34, 25, 12, 22, 11, 90, 65, 3] 
bubbleSort(arr) 
print ("The Sorted array is:") 
for i in range(len(arr)): 
    print ("%d" %arr[i]),  

The Sorted array is:
3
11
12
19
22
25
34
44
64
65
90


## Implementation and Benchmarking

The purpose of benchmarking is to establish how fast my code executes and identefy any bottlenecks are and to weak any parts of the code that are slowing it down. 

To benchmark my five sorting algorithms I will use arrays of randomly generated integers with different input sizes n; i.e. n=10,n=100,n=500,…,n=10,000 etc as per project suggestion. The running time (in milliseconds) for each algorithm will be measured 10 times, and the average of the 10 runs for each algorithm and each input size n will be output to the console when the program finishes executing. 

Just be aware that 𝑂(𝑛2) algorithms such as Bubble Sort may take a long time to run when using large values of n! 


In [39]:
# import Python Time module
# get current time
# returns the current system time in ticks since 12:00am, January 1, 1970(The UNIX epoch).

import time
import datetime
print(datetime.datetime.now())
ticks = time.time()
print ("Number of ticks since 12:00am, January 1, 1970:", ticks)


2019-05-03 10:20:45.557059
Number of ticks since 12:00am, January 1, 1970: 1556875245.5580626


#### First I generate the arrays with random numbers using randint from the python's random library:

In [42]:
# ref. Project.pdf

# importing Pythons random module
from random import *

# the function random_array() takes as input a value n and returns an array of n randomly generated integers 
# with values between 0 and 99
def random_array(n):
    # create an array variable
    array = []
    # if n = 5, 0,1,2,3,4
    for i in range(0, n, 1):
        # add to the array random integers between 0 and 100
        array.append(randint(0,100))
    return array


# assign the random array to alist
alist1 = random_array(100)
alist2 = random_array(250)
alist3 = random_array(500)
alist4 = random_array(750)
alist5 = random_array(1000)
alist6 = random_array(1250)
alist7 = random_array(2500)
alist8 = random_array(3750)
alist9 = random_array(5000)
alist10 = random_array(6250)
alist11 = random_array(7500)
alist12 = random_array(8750)
alist13 = random_array(10000)

# to show an example of one of the randonly generated arrays on the screen, is this case with 250 elements
print(alist2)



[3, 62, 59, 66, 20, 100, 77, 72, 0, 30, 28, 54, 37, 35, 67, 77, 5, 52, 45, 72, 31, 96, 35, 34, 21, 71, 72, 70, 14, 42, 52, 78, 100, 64, 27, 75, 11, 97, 53, 66, 58, 46, 29, 79, 16, 88, 29, 19, 88, 78, 57, 63, 26, 89, 100, 76, 47, 85, 83, 72, 13, 5, 2, 13, 12, 84, 66, 4, 82, 25, 57, 30, 15, 90, 72, 38, 65, 41, 66, 43, 53, 11, 90, 14, 66, 94, 27, 35, 29, 35, 39, 71, 35, 39, 94, 37, 39, 42, 16, 35, 32, 31, 2, 32, 0, 77, 75, 88, 46, 21, 24, 39, 52, 75, 99, 44, 57, 65, 53, 21, 95, 5, 40, 75, 67, 4, 73, 50, 14, 17, 97, 88, 57, 41, 15, 58, 98, 35, 70, 51, 39, 66, 56, 73, 79, 65, 26, 11, 12, 51, 19, 44, 5, 50, 97, 86, 42, 60, 93, 38, 29, 9, 5, 18, 85, 64, 6, 29, 27, 66, 77, 3, 89, 28, 35, 81, 72, 6, 27, 38, 24, 56, 66, 14, 97, 95, 75, 82, 28, 47, 15, 33, 24, 2, 97, 26, 97, 75, 19, 32, 46, 27, 42, 22, 73, 8, 35, 43, 61, 91, 52, 45, 20, 98, 67, 7, 15, 36, 53, 87, 0, 63, 60, 100, 66, 3, 28, 74, 61, 37, 60, 30, 35, 79, 50, 42, 64, 69, 11, 62, 92, 66, 33, 85, 99, 65, 50, 87, 83, 17]


### Now I benchmark each of my sorting algorithms by running them 10 times and get the average of the 10 runs using Python's Time module:




#### 1. Selection Sort

In [50]:
# ref. Project.pdf
# importing the random numbers
from random import *

# code source as above
def selectionSort(alist):
   for fillslot in range(len(alist)-1,0,-1):
       positionOfMax=0
       for location in range(1,fillslot+1):
           if alist[location]>alist[positionOfMax]:
               positionOfMax = location

       temp = alist[fillslot]
       alist[fillslot] = alist[positionOfMax]
       alist[positionOfMax] = temp

# import time module
import time

# benchmark Selection Sort function
global selection_avglist
selection_avglist = []
num_runs = 10
results = []

def benchmark_selection():
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist1)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)


    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist2)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)


    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist3)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist4)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist5)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist6)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist7)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist8)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist9)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist10)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist11)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist12)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## selectionsort
        selectionSort(alist13)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    selection_avglist.append(average)

   
    return selection_avglist

benchmark_selection()
print(selection_avglist)

[0.0009013175964355469, 0.005908846855163574, 0.020702481269836426, 0.05762684345245361, 0.12395658493041992, 0.22079236507415773, 0.6114988088607788, 1.4840110778808593, 3.067515754699707, 5.652431392669678, 9.188903570175171, 14.455015230178834, 21.27658154964447]


#### 2. Merge sort

In [56]:
# ref. Project.pdf
# importing the random numbers
from random import *

# code source as above
def mergeSort(alist):
    #print("Splitting ",alist)
    if len(alist)>1:
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        mergeSort(lefthalf)
        mergeSort(righthalf)

        i=0
        j=0
        k=0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                alist[k]=lefthalf[i]
                i=i+1
            else:
                alist[k]=righthalf[j]
                j=j+1
            k=k+1

        while i < len(lefthalf):
            alist[k]=lefthalf[i]
            i=i+1
            k=k+1

        while j < len(righthalf):
            alist[k]=righthalf[j]
            j=j+1
            k=k+1

# import time module
import time

# benchmark Merge Sort function
global merge_avglist
merge_avglist = []
num_runs = 10
results = []

def benchmark_merge():
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist1)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)


    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist2)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)


    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist3)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist4)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist5)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist6)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist7)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist8)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist9)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist10)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist11)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist12)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## mergesort
        mergeSort(alist13)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    merge_avglist.append(average)

   
    return merge_avglist

benchmark_merge()
print(merge_avglist)

[0.0005516290664672851, 0.002153944969177246, 0.005559611320495606, 0.011171269416809081, 0.01788167953491211, 0.027199482917785643, 0.046754074096679685, 0.07538654804229736, 0.11759648323059083, 0.17061128616333007, 0.23168983459472656, 0.3061146974563599, 0.3938493251800537]


#### 3. Counting sort

#### 4. Quick sort:

In [62]:
# importing the random numbers
from random import *

# http://interactivepython.org/runestone/static/pythonds/SortSearch/TheQuickSort.html
def quickSort(alist):
   quickSortHelper(alist,0,len(alist)-1)

def quickSortHelper(alist,first,last):
   if first<last:

       splitpoint = partition(alist,first,last)

       quickSortHelper(alist,first,splitpoint-1)
       quickSortHelper(alist,splitpoint+1,last)


def partition(alist,first,last):
   pivotvalue = alist[first]

   leftmark = first+1
   rightmark = last

   done = False
   while not done:

       while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
           leftmark = leftmark + 1

       while alist[rightmark] >= pivotvalue and rightmark >= leftmark:
           rightmark = rightmark -1

       if rightmark < leftmark:
           done = True
       else:
           temp = alist[leftmark]
           alist[leftmark] = alist[rightmark]
           alist[rightmark] = temp

   temp = alist[first]
   alist[first] = alist[rightmark]
   alist[rightmark] = temp


   return rightmark

# import time module
import time

num_runs = 10
results = []
quicksort_avglist = []

def benchmark_quick():
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist1)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist2)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist3)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist4)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist5)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist6)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist7)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist8)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist9)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist10)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist11)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## quicksort
        quickSort(alist12)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
        
    b = sum(results)
    average = (b/num_runs)
    quicksort_avglist.append(average)

    print(quicksort_avglist)

benchmark_quick()



[0.0007011175155639648, 0.0016026258468627929, 0.004206657409667969, 0.01062014102935791, 0.019650077819824217, 0.0316868782043457, 0.047838592529296876, 0.08152742385864258, 0.14024753570556642, 0.21969728469848632, 0.3245381355285645, 0.46246905326843263, 0.6331435918807984]


#### 5. Bubble Sort

In [69]:
# ref. Project.pdf
# importing the random numbers
from random import *

# code source as above
def bubbleSort(alist):
    for passnum in range(len(alist)-1,0,-1):
        for i in range(passnum):
            if alist[i]>alist[i+1]:
                temp = alist[i]
                alist[i] = alist[i+1]
                alist[i+1] = temp

# import time module
import time

# benchmark bubble function
global bubble_avglist
bubble_avglist = []
num_runs = 10
results = []


def benchmark_bubble():
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist1)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)


    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist2)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)


    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist3)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist4)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)

    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist5)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist6)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist7)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist8)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist9)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist10)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist11)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist12)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)
    
    for r in range(num_runs):
        # start timer
        start_time = time.time()
        ######## bubblesort
        bubbleSort(alist13)
        end_time = time.time()
        time_elapsed= end_time - start_time
        results.append(time_elapsed)
    b = sum(results)
    average = (b/num_runs)
    bubble_avglist.append(average)

   
    return bubble_avglist

benchmark_bubble()
print(bubble_avglist)

[0.0009013175964355469, 0.006108999252319336, 0.02562565803527832, 0.06907339096069336, 0.15039641857147218, 0.2734180212020874, 0.7826057195663452, 1.9269664764404297, 4.025764203071594, 7.446999144554138, 12.540170359611512, 19.09768795967102, 27.328396725654603]


### Using the average runtime data generated from the benchmarking functions I use the pandas library to create a table to display the data:

In [74]:
# ref. https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm
    
import pandas as pd
import numpy as np

df = pd.DataFrame(columns = ['Size','Selection sort', 'Merge sort', 'Quick sort', 'Bubble sort'])

df['Size'] = [100, 250, 500, 750, 1000, 1250, 2500, 3570, 5000, 6250, 7500, 8750, 10000]

df['Selection sort'] = selection_avglist
df['Merge sort'] = merge_avglist
df['Quick sort'] = quicksort_avglist
df['Bubble sort'] = bubble_avglist

df


Unnamed: 0,Size,Selection sort,Merge sort,Quick sort,Bubble sort
0,100,0.000901,0.000552,0.000701,0.000901
1,250,0.005909,0.002154,0.001603,0.006109
2,500,0.020702,0.00556,0.004207,0.025626
3,750,0.057627,0.011171,0.01062,0.069073
4,1000,0.123957,0.017882,0.01965,0.150396
5,1250,0.220792,0.027199,0.031687,0.273418
6,2500,0.611499,0.046754,0.047839,0.782606
7,3570,1.484011,0.075387,0.081527,1.926966
8,5000,3.067516,0.117596,0.140248,4.025764
9,6250,5.652431,0.170611,0.219697,7.446999


## Conclusion

Criteria for choosing a sorting algorithm:



Criteria 	Sorting algorithm
Small number of items to be sorted 	Insertion Sort
Items are mostly sorted already 	Insertion Sort
Concerned about worst-case scenarios 	Heap Sort
Interested in a good average-case behaviour 	Quicksort
Items are drawn from a uniform dense universe 	Bucket Sort
Desire to write as little code as possible 	Insertion Sort
Stable sorting required 	Merge Sort



As we have seen, there are many different sorting algorithms, each of which has it own specific strengths and weaknesses.

Comparison-based sorts are the most widely applicable; but are limited to n log n running time in the best case

Non-Comparison sorts can achieve linear n running time in the best case, but are less flexible

Hybrid sorting algorithms allow us to leverage the strengths of two or more algorithms (e.g. Timsort = Merge sort + insertion sort)

There is no single algorithm which is best for all input instances; therefore it is important to use what you know about the expected input when choosing an algorithm.



## References:
1. https://www.studytonight.com/data-structures/introduction-to-sorting
2. https://www.quora.com/What-are-in-place-and-out-of-place-sorting-algorithms
3. https://medium.freecodecamp.org/stability-in-sorting-algorithms-a-treatment-of-equality-fa3140a5a539
4. https://en.wikipedia.org/wiki/Sorting_algorithm#Stability
5. http://interactivepython.org/runestone/static/pythonds/SortSearch/TheSelectionSort.html
6. Source Lecture Notes: P.Mannion (2019) Computational Thinking with Algorithms - Week 10: Sorting Algorithms Part 3, GMIT.
7. http://interactivepython.org/runestone/static/pythonds/SortSearch/TheMergeSort.html
8. https://www.tutorialspoint.com/Counting-Sort
9. http://interactivepython.org/runestone/static/pythonds/SortSearch/TheQuickSort.html
10. Source Lecture Notes: P.Mannion (2019) Computational Thinking with Algorithms - Week 10: Sorting Algorithms Part 3, GMIT.
11. https://www.geeksforgeeks.org/quick-sort/
12. http://interactivepython.org/runestone/static/pythonds/SortSearch/TheBubbleSort.html
13. https://www.geeksforgeeks.org/counting-sort/


