# Sorting - A practical exercise

# Objectives

The aim of this notebook is to help you understand different sorting algorithms and how they perform. The notebook contains a lot of supporting code that you do not need to worry about. The only parts you need to write are the three sorting algorithms listed below:
- Bubble sort
- Insertion sort
- Merge sort

Do not worry if you cannot complete these in the lesson. They are purely for you to get a bit more experience with different sorting algorithms. If you are really interested, these notebooks are accessible from home as well.

# Helper functions

Here we define a collection of functions that will be useful for the rest of the exercise. You do not need to worry about these functions.

You'll need to run this cell to get started.

In [None]:
# so our plots get drawn in the notebook
%matplotlib inline
from matplotlib import pyplot as plt
from random import randint
# time.clock was removed in Python 3.8; perf_counter is its replacement
from time import perf_counter
from random import randrange
import numpy as np

# a timer - runs the provided function and returns the
# elapsed run time in seconds
def time_f(f):
    before = perf_counter()
    f()
    after = perf_counter()
    return after - before

def reject_outliers(data):
    m = 1.5
    u = np.mean(data)
    s = np.std(data)
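    # keep values within m standard deviations of the mean; the bounds are
    # inclusive so that identical timings (s == 0) are not all rejected
    filtered = [e for e in data if (u - m * s <= e <= u + m * s)]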
    return filtered

def time_sorts(lower, upper, steps, sorts, sort_labels):
    # Create one list of times per sorting algorithm
    times = [[] for _ in range(len(sorts))]
    
    # Loop over list sizes from lower up to (but not including) upper
    for i in range(lower, upper, steps):
        # Apply each sort 
        for sort_index, sort in enumerate(sorts):
            unavg_time = []
            # Time 100 runs, each on a fresh random list of size i
            for j in range(100):
                rand_list_temp = [randrange(0, 2000) for _ in range(i)]
                runtime = time_f(lambda: sort(rand_list_temp))
                unavg_time.append(runtime)
            
            # Calculate and save the average runtime (without outliers)
            unavg_time = reject_outliers(unavg_time)
            avg_runtime = sum(unavg_time) / len(unavg_time)
            times[sort_index].append(avg_runtime)
    return times

def benchmark_sorts(sorts, lower, upper, steps):
    # Get list of sort names
    sort_labels = [sort.__name__ for sort in sorts]
    # Calculate sort times
    times = time_sorts(lower, upper, steps, sorts, sort_labels)
    
    # Plot each sorting algorithm with its name
    for index, sort_label in enumerate(sort_labels):
        plt.plot(range(lower, upper, steps), times[index], label=sort_label)
        
    # Add axis labels and legend
    plt.xlabel('n')
    plt.ylabel('time (s)')
    plt.legend(sort_labels)
    
    plt.show()

def test_sort(sort_func):
    
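    n = 100
    x = [randint(0, n) for _ in range(n)]
    # remember the correct answer before the sort can modify x
    expected = sorted(x)

    sort_name = sort_func.__name__
    result = sort_func(x)
    # the result must be in order and contain exactly the original elements
    if is_sorted(result) and sorted(result) == expected:
        print(sort_name + " works!")
    else:
        print(sort_name + " has failed.")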

def is_sorted(l):
    return all([l[i] <= l[i+1] for i in range(len(l)-1)])
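
The outlier filter used by the timing code can be tried out on its own. The following is a minimal sketch, not part of the exercise: `reject_outliers_sketch` and `sample_times` are hypothetical names, the timings are made up rather than measured, and the standard library's `statistics` module stands in for NumPy (`statistics.pstdev` computes the population standard deviation, which is what `np.std` computes by default).

```python
from statistics import mean, pstdev

def reject_outliers_sketch(data, m=1.5):
    # keep only values within m standard deviations of the mean
    u = mean(data)
    s = pstdev(data)
    return [e for e in data if u - m * s <= e <= u + m * s]

# hypothetical timings in seconds: four similar runs and one slow run
sample_times = [1.0, 1.1, 0.9, 1.0, 10.0]
filtered = reject_outliers_sketch(sample_times)
print(filtered)  # the slow 10.0 run is dropped
```

Dropping the occasional slow run like this keeps a single interruption (for example, another program using the CPU) from distorting the average.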


## Task 1: Bubble Sort

In this task you are asked to implement `bubble_sort`. You have been provided with a test to see if your implementation works. Do not change the function definition. Make sure you return the sorted list even if you choose to sort in place.
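
To show what "sort in place and still return the list" looks like without giving away any of the three tasks, here is a minimal sketch using a different algorithm, selection sort. The name `selection_sort_sketch` is hypothetical and is not used anywhere else in this notebook.

```python
def selection_sort_sketch(a):
    # sort the list in place by repeatedly swapping the smallest
    # remaining element to the front of the unsorted part
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    # return the same list object so the caller can check the result
    return a

example = [5, 2, 9, 1, 5]
result = selection_sort_sketch(example)
print(result)             # [1, 2, 5, 5, 9]
print(result is example)  # True: the input list itself was sorted
```

Your own functions should follow the same shape: take a list, and return a sorted list at the end.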

In [None]:
def bubble_sort(a):
    # your implementation goes here
    return a

Run this test to confirm your implementation is correct. Do not edit the test cases.

In [None]:
test_sort(bubble_sort)

## Task 2: Insertion Sort

In this task you are asked to implement `insertion_sort`. You have been provided with a test to see if your implementation works. Do not change the function definition. Make sure you return the sorted list even if you choose to sort in place.

In [None]:
def insertion_sort(a):
    # your implementation goes here
    return a

Run this test to confirm your implementation is correct. Do not edit the test cases.

In [None]:
test_sort(insertion_sort)


## Task 3: Merge Sort

In this task you are asked to implement `merge_sort`. You have been provided with a test to see if your implementation works. Do not change the function definition. Make sure you return the sorted list even if you choose to sort in place.

In [None]:
def merge_sort(a):
    # your implementation goes here
    # (the parameter is named a so it does not shadow the built-in list)
    return a

Use this test to confirm your implementation is correct.

In [None]:
test_sort(merge_sort)

# Analysing the running time of bubble sort, insertion sort and merge sort

Here we will compare the running times of insertion sort, bubble sort, and merge sort. You do not need to make any modifications to this code. Just run the code below.

In [None]:
benchmark_sorts([bubble_sort, merge_sort, insertion_sort], 1, 10, 1)
benchmark_sorts([bubble_sort, merge_sort, insertion_sort], 1, 120, 1)
benchmark_sorts([bubble_sort, merge_sort, insertion_sort], 1, 1000, 100)

You may notice that insertion sort performs better than merge sort for small lists. This is despite merge sort having the better asymptotic running time: O(n log n), against O(n²) for insertion sort in the worst case. If you want to know why, come speak to me!
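
One plausible contributing factor is the bookkeeping merge sort does around its comparisons. The sketch below is a rough illustration, not a measurement: it assumes a top-down merge sort that splits the list in half until each piece has at most one element, and counts how many function calls that takes, next to the worst-case number of comparisons for insertion sort. Both function names are hypothetical.

```python
def merge_sort_calls(n):
    # number of function calls a top-down merge sort makes on n items,
    # assuming it splits in half until sublists have at most one element
    if n <= 1:
        return 1
    half = n // 2
    return 1 + merge_sort_calls(half) + merge_sort_calls(n - half)

def insertion_sort_max_comparisons(n):
    # worst case for insertion sort: every element is compared with
    # all of the elements before it
    return n * (n - 1) // 2

for n in [4, 8, 16, 64]:
    print(n, merge_sort_calls(n), insertion_sort_max_comparisons(n))
```

For small `n` the two counts are of a similar size, but a function call (plus the list copying a merge usually involves) costs far more in Python than a single comparison. As `n` grows, the quadratic comparison count takes over and merge sort wins.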

# Never write your own sort!

Although we've just seen how you can, it's important to note that in real code you should never write your own sort! The inbuilt sort is almost always faster than anything we could write. In the standard Python interpreter (CPython), for example, it is implemented in C and uses Timsort, a hybrid of merge sort and insertion sort.
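
Python offers the inbuilt sort in two forms, and they behave differently in a way that is easy to trip over. A short sketch (the variable names are only for illustration):

```python
data = [3, 1, 2]

# sorted() leaves the input alone and returns a new sorted list
new_list = sorted(data)
print(new_list)  # [1, 2, 3]
print(data)      # [3, 1, 2]

# list.sort() sorts the list in place and returns None
returned = data.sort()
print(returned)  # None
print(data)      # [1, 2, 3]
```

Because `list.sort()` returns `None`, a wrapper function that needs to hand back the sorted list should either return the list after calling `sort()` on it, or return `sorted(a)`.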

In [None]:
def inbuilt_sort(a):
    # list.sort() sorts in place and returns None, so return the list itself
    a.sort()
    return a

Run the benchmark below to see the performance difference!

In [None]:
benchmark_sorts([merge_sort, inbuilt_sort], 1, 1000, 100)