# Sorting Algorithms

# 1. Bubble Sort (basic comparison-based sort)

Source : https://www.geeksforgeeks.org/bubble-sort/

Bubble sort is one of the simplest sorting algorithms: it works by repeatedly swapping adjacent elements that are in the wrong order.

**Working**

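  1. It starts at the beginning of the dataset and compares the first two elements; if the first element is greater than the second, it swaps them. It then moves one position along and repeats the comparison for every adjacent pair, so that after one pass the largest element has "bubbled" to the end.
  2. It repeats these passes until a pass makes no swaps, at which point the dataset is sorted.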

**Performance**

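Bubble sort has a worst-case and average-case time complexity of O(n*n), where n is the number of elements to be sorted. The worst case occurs when the dataset is sorted in reverse order.
When the dataset is already sorted (best case), the time complexity is only O(n), provided the algorithm stops after the first pass that makes no swaps; the implementation below always makes n - 1 full passes, so it takes O(n*n) time even then. Because of its quadratic running time, bubble sort is very inefficient and is rarely used in real-world scenarios.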
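The O(n) best case relies on stopping as soon as a pass makes no swaps. A minimal sketch of that variant follows; it also shortens each pass, because after every pass the largest remaining element is already in its final place. `bubble_sort_early_exit` is an illustrative name (not from the cited sources), and it returns the number of passes so the best and worst cases can be compared.

```python
def bubble_sort_early_exit(items):
    """Sort items in place and return the number of passes made.

    After pass i, the largest i + 1 values are in their final positions,
    so each pass can stop one element earlier than the previous one.
    """
    n = len(items)
    passes = 0
    for i in range(n - 1):
        passes += 1
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # no swaps in this pass: the list is already sorted
            break
    return passes


already_sorted = [1, 2, 3, 4, 5]
print(bubble_sort_early_exit(already_sorted))  # 1 (best case: a single pass)
reversed_list = [5, 4, 3, 2, 1]
print(bubble_sort_early_exit(reversed_list), reversed_list)  # 4 [1, 2, 3, 4, 5]
```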

# <font color = "dark cyan">Bubble Sort
Source : https://www.w3resource.com/php-exercises/searching-and-sorting-algorithm/searching-and-sorting-algorithm-exercise-6.php

![title](bubble-sort.png)

In [8]:
#code sourced from https://www.javatpoint.com/bubble-sort-in-python
# Creating a bubble sort function
def bubbleSort(list1):
    # Outer loop: make len(list1) - 1 passes over the list
    for i in range(0, len(list1) - 1):
        # Inner loop: compare each adjacent pair and swap it if out of order
        for j in range(len(list1) - 1):
            if list1[j] > list1[j + 1]:
                list1[j], list1[j + 1] = list1[j + 1], list1[j]

list1 = [5, 1, 4, 2, 8]
print("The unsorted list is: ", list1)
# Calling the bubble sort function
bubbleSort(list1)
print("The sorted list is  : ", list1)

The unsorted list is:  [5, 1, 4, 2, 8]
The sorted list is  :  [1, 2, 4, 5, 8]


# 2. Merge Sort (efficient comparison-based sort)

Source : https://www.geeksforgeeks.org/merge-sort/

Merge sort is a recursive divide-and-conquer algorithm. It divides the input array into two halves, calls itself on each half, and then merges the two sorted halves.

**Working**

  1. It starts by breaking down the array into subarrays until each subarray contains only one element.
  2. It then repeatedly merges the subarrays into new sorted subarrays until only one subarray is left: the sorted array.
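The two steps above can also be sketched without modifying the input. Unlike the in-place notebook cell below, this version returns a new list, and its merge takes from the left half on ties so that equal elements keep their original relative order (a stable sort). `merge` and `merge_sort` are illustrative names, not taken from the cited sources.

```python
def merge(left, right):
    """Merge two already-sorted lists into one new sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # <= takes from the left list on ties, which keeps the sort stable
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a new sorted list; the input list is left unchanged."""
    if len(items) <= 1:              # step 1: a list of 0 or 1 elements is sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    return merge(left, right)        # step 2: merge the two sorted halves


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```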

**Performance**

The time complexity of merge sort in the best, worst and average cases is O(n log n), where n is the number of elements in the dataset. Because the running time is the same in all three cases, merge sort is a good choice when predictable performance matters; the trade-off is that the usual array-based implementation needs O(n) extra memory to hold the halves being merged.
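The O(n log n) bound follows from the recurrence for the running time T(n). Assuming n is a power of two and that splitting and merging a list of size n costs at most cn for some constant c (with T(1) = c), the recurrence unrolls as below. The merge step does Θ(n) work whatever the order of the input, which is why the same bound holds in the best, worst and average cases.

$$
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 2^{k}\,T(n/2^{k}) + kcn \\
     &= n\,T(1) + cn\log_2 n \qquad (\text{taking } k = \log_2 n) \\
     &= cn + cn\log_2 n = O(n \log n)
\end{aligned}
$$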

# <font color = "dark cyan">Merge Sort
Source : https://www.w3resource.com/python-exercises/data-structures-and-algorithms/python-search-and-sorting-exercise-8.php

![title](merge_sort.png)

In [17]:
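# code sourced from http://interactivepython.org/runestone/static/pythonds/SortSearch/TheMergeSort.html
def mergeSort(alist):
    # A list of 0 or 1 elements is already sorted
    if len(alist) > 1:
        # Split the list into two halves (slicing makes copies)
        mid = len(alist) // 2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        # Recursively sort each half
        mergeSort(lefthalf)
        mergeSort(righthalf)

        # Merge the sorted halves back into alist
        i = 0  # index into lefthalf
        j = 0  # index into righthalf
        k = 0  # index into alist
        while i < len(lefthalf) and j < len(righthalf):
            # <= takes from the left half on ties, which keeps the sort stable
            if lefthalf[i] <= righthalf[j]:
                alist[k] = lefthalf[i]
                i = i + 1
            else:
                alist[k] = righthalf[j]
                j = j + 1
            k = k + 1

        # Copy any elements left over in the left half
        while i < len(lefthalf):
            alist[k] = lefthalf[i]
            i = i + 1
            k = k + 1

        # Copy any elements left over in the right half
        while j < len(righthalf):
            alist[k] = righthalf[j]
            j = j + 1
            k = k + 1

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
mergeSort(alist)
print("Sorted list ", alist)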

Sorted list  [17, 20, 26, 31, 44, 54, 55, 77, 93]


# 3. Quick Sort

Source : https://www.geeksforgeeks.org/quick-sort/

Quick sort is a recursive divide-and-conquer algorithm that is very commonly used because it is fast in practice.

**Working**

  1. Pivot selection: pick an element of the array, called the pivot. Common choices are the first, last or middle element, or a randomly chosen one.
  2. Partitioning: given an array and a pivot element x, put x at its correct position in the sorted array, with all elements smaller than x before it and all elements greater than x after it.
  3. Recursion: recursively apply both steps to the subarrays on either side of the pivot.

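**Performance**

The time complexity of quick sort is O(n log n) in the best and average cases and O(n*n) in the worst case. The worst case occurs when every partition is maximally unbalanced, for example when the rightmost element is used as the pivot and the input is already sorted.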
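To make the O(n*n) worst case concrete, this sketch reuses the rightmost-pivot (Lomuto) partition scheme from the cell below and reports how deeply the partition calls nest. On sorted input each partition removes only the pivot, so the nesting depth grows linearly with n and the total work quadratically. `lomuto_partition` and `partition_depth` are illustrative names.

```python
import random


def lomuto_partition(a, low, high):
    """Partition a[low..high] around a[high] (Lomuto scheme); return the pivot's index."""
    pivot = a[high]
    i = low - 1
    for j in range(low, high):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[high] = a[high], a[i + 1]
    return i + 1


def partition_depth(a, low, high):
    """Quick-sort a[low..high] in place; return the deepest nesting of partition calls."""
    if low >= high:
        return 0
    p = lomuto_partition(a, low, high)
    return 1 + max(partition_depth(a, low, p - 1), partition_depth(a, p + 1, high))


n = 200
print(partition_depth(list(range(n)), 0, n - 1))  # 199: each partition removes only the pivot
shuffled = random.sample(range(n), n)
print(partition_depth(shuffled, 0, n - 1))        # usually far smaller than 199
```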

# <font color = "dark cyan">Quick Sort
![title](quicksort.png)

In [10]:
#Code sourced from https://www.programiz.com/dsa/quick-sort
# function to find the partition position
def partition(array, low, high):
    # choose the rightmost element as pivot
    pivot = array[high]
    # pointer for the greater element
    i = low - 1

    # traverse all elements and compare each one with the pivot
    for j in range(low, high):
        if array[j] <= pivot:
            # an element not greater than the pivot was found:
            # swap it with the greater element pointed to by i
            i = i + 1
            (array[i], array[j]) = (array[j], array[i])

    # swap the pivot element with the greater element specified by i
    (array[i + 1], array[high]) = (array[high], array[i + 1])
    # return the position where the partition is done
    return i + 1

# function to perform quicksort
def quickSort(array, low, high):
    if low < high:
        # partition so that elements smaller than the pivot are on its left
        # and elements greater than the pivot are on its right
        pi = partition(array, low, high)
        # recursive call on the left of the pivot
        quickSort(array, low, pi - 1)
        # recursive call on the right of the pivot
        quickSort(array, pi + 1, high)

data = [8, 7, 2, 1, 0, 9, 6]
print("Unsorted Array: ", data)
size = len(data)
quickSort(data, 0, size - 1)
print('Sorted Array: ', data)


Unsorted Array:  [8, 7, 2, 1, 0, 9, 6]
Sorted Array:  [0, 1, 2, 6, 7, 8, 9]


# 4. Counting Sort

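Counting sort sorts the elements of an array by counting the number of occurrences of each unique element. The counts are stored in an auxiliary array indexed by element value, and the sorted output is produced by walking that array in index order and writing each value out as many times as it was counted. This requires the elements to be integers (or keys that map to integers) in a known range; the implementation below assumes non-negative integers.
Its time complexity is O(n + k) in all three (best, worst and average) cases, where n is the number of elements and k is the range of key values (here, the maximum value plus one).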
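The implementation below emits values directly from the counts, which works for plain non-negative integers. A common alternative, sketched here, shifts every key by the minimum key so negative values are allowed, and turns the counts into prefix sums so that each element is copied straight to its final position. Walking the input backwards makes the sort stable, which matters when sorting records by an integer key (for example, as one pass of radix sort). `counting_sort` and its `key` parameter are illustrative, not from the cited source.

```python
def counting_sort(items, key=lambda x: x):
    """Return a new list with items stably sorted by an integer key.

    Keys may be negative: they are shifted by the minimum key, so the count
    array has k = max_key - min_key + 1 slots and the cost is O(n + k).
    """
    if not items:
        return []
    keys = [key(x) for x in items]
    lo, hi = min(keys), max(keys)
    count = [0] * (hi - lo + 1)
    for kv in keys:                   # count occurrences of each key
        count[kv - lo] += 1
    for v in range(1, len(count)):    # prefix sums: count[v] = number of keys <= v + lo
        count[v] += count[v - 1]
    output = [None] * len(items)
    # Walk the input backwards so equal keys keep their original relative order
    for x, kv in zip(reversed(items), reversed(keys)):
        count[kv - lo] -= 1
        output[count[kv - lo]] = x
    return output


print(counting_sort([4, -2, 2, 8, 3, -2, 1]))  # [-2, -2, 1, 2, 3, 4, 8]
pairs = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]
print(counting_sort(pairs, key=lambda p: p[1]))  # [('a', 1), ('d', 1), ('b', 2), ('c', 2)]
```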

# <font color = "dark cyan">Counting Sort
![title](Countingsort.webp)

In [32]:
#code sourced from https://www.programiz.com/dsa/counting-sort
def countSort(array):
    # Counting sort for non-negative integers; sorts array in place
    if not array:
        return
    maxval = max(array) + 1           # count array covers values 0..max(array)
    count = [0] * maxval
    for a in array:
        count[a] += 1                 # count occurrences
    i = 0
    for a in range(maxval):           # emit values in ascending order
        for _ in range(count[a]):     # emit count[a] copies of a
            array[i] = a
            i += 1


data = [4, 2, 2, 8, 3, 3, 1, 19]
countSort(data)
print("Sorted Array in Ascending Order: ", data)

Sorted Array in Ascending Order:  [1, 2, 2, 3, 3, 4, 8, 19]


# 5. Insertion Sort

Source : https://www.programiz.com/dsa/insertion-sort

Insertion sort is a simple sorting algorithm that works similarly to the way you sort playing cards in your hands. The array is virtually split into a sorted part and an unsorted part; values from the unsorted part are picked one at a time and placed at the correct position in the sorted part.

**Working**
  1. The first element in the array is assumed to be sorted. Take the second element and store it separately in a variable called key.
  2. Compare key with the first element; if the first element is greater, shift it one position to the right and place key in front of it. Then take the third element and compare it with the elements on its left, shifting each greater element one position to the right, and place it just after the first element that is not greater than it, or at the front if there is no such element.
  3. Repeat this process to place every element at its correct position.

**Performance**

Its time complexity is O(n) in the best case (an already sorted array) and O(n*n) in the worst and average cases; the worst case is an array sorted in reverse order.
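The best and worst cases can be observed by counting how often the inner loop shifts an element. Each shift fixes exactly one inversion (a pair of elements in the wrong order), so a sorted array needs no shifts, while a reverse-sorted array of n elements needs n(n-1)/2. `insertion_sort_shifts` is an illustrative variant of the function below with a counter added.

```python
def insertion_sort_shifts(items):
    """Insertion-sort items in place and return how many element shifts were made."""
    shifts = 0
    for step in range(1, len(items)):
        key = items[step]
        j = step - 1
        while j >= 0 and key < items[j]:
            items[j + 1] = items[j]   # shift the larger element one slot to the right
            j -= 1
            shifts += 1
        items[j + 1] = key            # drop key into the gap
    return shifts


print(insertion_sort_shifts([1, 2, 3, 4, 5]))  # 0  (best case: no shifts)
print(insertion_sort_shifts([5, 4, 3, 2, 1]))  # 10 (worst case: 5 * 4 / 2 shifts)
```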

# <font color = "dark cyan">Insertion Sort

Source : https://media.geeksforgeeks.org/wp-content/uploads/insertionsort.png

![title](insertionsort.png)

In [12]:
#code sourced from https://www.programiz.com/dsa/insertion-sort
def insertionSort(array):
    for step in range(1, len(array)):
        key = array[step]
        j = step - 1
        # Compare key with each element on its left until an element not greater than it is found.
        # For descending order, change key < array[j] to key > array[j].
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]   # shift the larger element one position right
            j = j - 1
        # Place key just after the element that is not greater than it.
        array[j + 1] = key


data = [4, 3, 2, 10, 12, 1, 5, 6]
insertionSort(data)
print('Sorted Array in Ascending Order: ', data)

Sorted Array in Ascending Order:  [1, 2, 3, 4, 5, 6, 10, 12]


# <font color = "dark green">Implementation

So far, we have defined 5 sorting algorithms, namely:
1. Bubble Sort
2. Merge Sort
3. Quick Sort
4. Counting Sort
5. Insertion Sort

Next, we create arrays of random integers using the randint function from Python's random module.
(https://docs.python.org/2/library/random.html)

In [13]:
#code sourced from the aforementioned documentation
# Creating an array using randint
from random import *
# create a list of n random integers, each between 0 and 100 inclusive
def random_array(n):
    # start with an empty list
    array = []
    # i takes the values 0, 1, ..., n - 1 (for example 0 to 4 when n = 5)
    for i in range(0, n, 1):
        # add a random integer between 0 and 100 (inclusive) to the list
        array.append(randint(0, 100))
    return array

# create random lists of increasing size, from 100 to 10000 elements
alist = random_array(100)
alist1 = random_array(250)
alist2 = random_array(500)
alist3 = random_array(750)
alist4 = random_array(1000)
alist5 = random_array(1500)
alist6 = random_array(2000)
alist7 = random_array(3000)
alist8 = random_array(5000)
alist9 = random_array(6000)
alist10 = random_array(7500)
alist11 = random_array(8500)
alist12 = random_array(10000)

# <font color = "dark green"> Benchmarking Multiple Statistical Runs
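The cells in this section time each list with `time.time()`, append the timings for every list size to one shared `results` list, and call in-place sorts repeatedly on the same lists, so every run after the first receives a list that is already sorted. A sketch of a reusable helper that avoids both effects is shown here; `benchmark` is a hypothetical name, and it assumes the sort function takes a single list argument (sorting it in place or returning a result that can be ignored). It uses `time.perf_counter()`, which is intended for measuring short durations.

```python
import random
import time


def benchmark(sort_fn, arrays, runs=10):
    """Return the mean time in seconds (rounded to 3 decimals) of sort_fn on each array.

    Each run sorts a fresh copy of the input, so an in-place sort never sees a
    list that an earlier run has already sorted, and the timings are reset for
    every array so each average covers only that array's runs.
    """
    averages = []
    for data in arrays:
        timings = []                      # reset for each input size
        for _ in range(runs):
            trial = list(data)            # fresh copy of the unsorted input
            start = time.perf_counter()
            sort_fn(trial)
            timings.append(time.perf_counter() - start)
        averages.append(round(sum(timings) / runs, 3))
    return averages


# Example: time Python's built-in list.sort on three random lists (values 0..100)
sizes = [100, 1000, 5000]
arrays = [[random.randint(0, 100) for _ in range(n)] for n in sizes]
print(benchmark(lambda items: items.sort(), arrays))
```

In the notebook this could be called as `benchmark(bubbleSort, [alist, alist1, alist2])`; `quickSort` takes extra arguments, so it would need a one-argument wrapper such as `lambda a: quickSort(a, 0, len(a) - 1)`.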

1. Benchmark Bubble Sort

In [6]:
import time
bubbleSort_avgList = []  # average time (seconds) for each list size
runs = 10  # number of runs per list size
# results is not reset between list sizes, so each average also includes
# the timings recorded for the smaller lists before it
results = []
for data in [alist, alist1, alist2, alist3, alist4, alist5, alist6,
             alist7, alist8, alist9, alist10, alist11, alist12]:
    for r in range(runs):
        # start timer
        start_time = time.time()
        ######## bubblesort (sorts data in place)
        bubbleSort(data)
        end_time = time.time()
        time_elapsed = end_time - start_time
        results.append(time_elapsed)
    # average over runs, rounded to 3 decimals
    average = round(sum(results) / runs, 3)
    bubbleSort_avgList.append(average)
print(bubbleSort_avgList)

[0.001, 0.008, 0.036, 0.104, 0.224, 0.499, 0.99, 2.113, 5.207, 9.667, 16.673, 25.978, 25.978]


2. Benchmark Merge Sort

In [18]:
import time
mergeSort_avgList = []  # average time (seconds) for each list size
runs = 10  # number of runs per list size
# results is not reset between list sizes, so each average also includes
# the timings recorded for the smaller lists before it
results = []
for data in [alist, alist1, alist2, alist3, alist4, alist5, alist6,
             alist7, alist8, alist9, alist10, alist11, alist12]:
    for r in range(runs):
        # start timer
        start_time = time.time()
        ######## mergesort (sorts data in place)
        mergeSort(data)
        end_time = time.time()
        time_elapsed = end_time - start_time
        results.append(time_elapsed)
    # average over runs, rounded to 3 decimals
    average = round(sum(results) / runs, 3)
    mergeSort_avgList.append(average)
print(mergeSort_avgList)


[0.0, 0.001, 0.003, 0.006, 0.01, 0.017, 0.026, 0.04, 0.066, 0.097, 0.137, 0.184, 0.237]


3. Benchmark Quick Sort

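Running the same benchmark for quick sort raised a `RecursionError`. Because each sort in these cells works in place, every run after the first receives an already-sorted list; with the rightmost element as pivot, `quickSort` then recurses once per element, and for the larger lists this exceeds CPython's default recursion limit of 1000.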
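One common remedy, sketched here, is to choose the pivot at random and to recurse only into the smaller partition while looping over the larger one. The random pivot makes the O(n*n) case unlikely for any particular input order, and recursing into the smaller side bounds the recursion depth at about log2(n) calls even when a partition is unbalanced. `randomized_partition` and `randomized_quick_sort` are illustrative names; this is a variant, not the programiz implementation used above.

```python
import random


def randomized_partition(array, low, high):
    """Swap a randomly chosen element into array[high], then Lomuto-partition around it."""
    r = random.randint(low, high)                 # random pivot index in [low, high]
    array[r], array[high] = array[high], array[r]
    pivot = array[high]
    i = low - 1
    for j in range(low, high):
        if array[j] <= pivot:
            i += 1
            array[i], array[j] = array[j], array[i]
    array[i + 1], array[high] = array[high], array[i + 1]
    return i + 1


def randomized_quick_sort(array, low=0, high=None):
    """Sort array[low..high] in place, recursing only into the smaller partition."""
    if high is None:
        high = len(array) - 1
    while low < high:
        p = randomized_partition(array, low, high)
        if p - low < high - p:
            randomized_quick_sort(array, low, p - 1)   # left side is smaller
            low = p + 1                                # loop on the right side
        else:
            randomized_quick_sort(array, p + 1, high)  # right side is smaller
            high = p - 1                               # loop on the left side


sorted_input = list(range(10000))    # already-sorted input, as in the benchmark
randomized_quick_sort(sorted_input)
print(sorted_input == list(range(10000)))  # True
```

With many equal values (as in lists drawn from 0 to 100), Lomuto partitioning can still do quadratic work within each run of equal keys; a three-way partition is the usual fix for that case.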

4. Benchmark Counting Sort

In [34]:
import time
countSort_avgList = []  # average time (seconds) for each list size
runs = 10  # number of runs per list size
# results is not reset between list sizes, so each average also includes
# the timings recorded for the smaller lists before it
results = []
for data in [alist, alist1, alist2, alist3, alist4, alist5, alist6,
             alist7, alist8, alist9, alist10, alist11, alist12]:
    for r in range(runs):
        # start timer
        start_time = time.time()
        ######## countsort (sorts data in place)
        countSort(data)
        end_time = time.time()
        time_elapsed = end_time - start_time
        results.append(time_elapsed)
    # average over runs, rounded to 3 decimals
    average = round(sum(results) / runs, 3)
    countSort_avgList.append(average)
print(countSort_avgList)

[0.0, 0.0, 0.0, 0.001, 0.001, 0.002, 0.003, 0.005, 0.007, 0.01, 0.015, 0.019, 0.024]


5. Benchmark Insertion Sort