# COMP20230 Tutorial/Lab: Sorting and Selection

## Sorting
In the lectures we looked at sorting numbers into ascending order using a variety of algorithms (bubble, insertion, selection, merge, quick). We saw that the input data can have a significant impact on an algorithm's performance, which is why both the expected and the worst-case big-$\mathcal{O}$ running times are important.
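
To make the effect of the input data concrete, the sketch below counts key comparisons made by insertion sort on an already sorted list and on a reverse-sorted list of the same length. The function name `insertion_sort_comparisons` and the choice of `n = 100` are illustrative assumptions, not part of the lecture code.

```python
def insertion_sort_comparisons(arr):
    """Sort a copy of arr with insertion sort; return (sorted_list, comparisons)."""
    data = list(arr)  # work on a copy so the caller's data is untouched
    comparisons = 0
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0:
            comparisons += 1  # one comparison of key against data[j]
            if key >= data[j]:
                break  # key is already in position relative to data[j]
            data[j + 1] = data[j]  # shift the larger element one place right
            j -= 1
        data[j + 1] = key
    return data, comparisons


n = 100
_, best = insertion_sort_comparisons(range(n))          # already sorted input
_, worst = insertion_sort_comparisons(range(n, 0, -1))  # reverse-sorted input
print(best, worst)  # 99 4950
```

The sorted input needs $n-1$ comparisons, while the reversed input needs $n(n-1)/2$: the same algorithm behaves linearly or quadratically depending only on the data.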


## Selection
Aside from sorting, there are other times we may want to exploit a total order relation on a list, such as computing rank statistics like the minimum, maximum or median element. More generally, this can be viewed as a selection problem: select the $k^{th}$ smallest element for an arbitrary $k$. Sorting can help with this: sort the list and then index into it at $k=0$ for the min, $k=n-1$ for the max, and $k=\lfloor n/2 \rfloor$ for the median.
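
A minimal sketch of selection by sorting, using Python's built-in ``sorted``. The helper name `select_by_sorting` is an assumption made for illustration; the cost is dominated by the sort, so it is $\mathcal{O}(n \log n)$.

```python
def select_by_sorting(values, k):
    """Return the k-th smallest element (0-indexed) by sorting a copy."""
    if not 0 <= k < len(values):
        raise IndexError("k must satisfy 0 <= k < len(values)")
    return sorted(values)[k]


data = [5, 2, 7, 1, 8]
n = len(data)
print(select_by_sorting(data, 0))       # minimum: 1
print(select_by_sorting(data, n - 1))   # maximum: 8
print(select_by_sorting(data, n // 2))  # median: 5
```

For even $n$, index $\lfloor n/2 \rfloor$ picks the upper of the two middle elements.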




## Sorting Algorithm Implementations

In [10]:
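def bubble_sort(arr):
    """Sort arr in place with a basic (unoptimised) bubble sort and return it."""
    arr_size = len(arr)
    for i in range(arr_size):
        # Compare adjacent pairs; each pass bubbles the largest remaining
        # element towards the end of the list.
        for j in range(arr_size - 1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr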


def insertion_sort(arr):
    arr_size = len(arr)
    for i in range(1, arr_size):
        key = arr[i]
        j = i-1
        while j >= 0 and key < arr[j]:
            arr[j+1] = arr[j]
            j -= 1
        arr[j+1] = key
    return arr


def selection_sort(arr):
    arr_size = len(arr)
    for i in range(arr_size):
        min_index = i
        for j in range(i, arr_size):
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
    return arr

In [11]:
# Python implementation of Quicksort.

def partition(arr, low, high):
    """Partition arr[low..high] around arr[high]; return the pivot's final index."""
    key = arr[high]
    pivot = low
    for i in range(low, high):
        if arr[i] <= key:
            arr[i], arr[pivot] = arr[pivot], arr[i]
            pivot += 1

    arr[pivot], arr[high] = arr[high], arr[pivot]
    return pivot


def quicksort(arr, low=0, high=None):
    if high is None:
        high = len(arr)-1
    
    if high < low:
        return
    else:
        pivot = partition(arr, low, high)
        quicksort(arr, low, pivot-1)
        quicksort(arr, pivot+1, high)


def quick_sort_simple(data):
    if len(data) > 1:
        less = []
        equal = []
        greater = []
        pivot = data[0]

        for x in data:
            if x < pivot:
                less.append(x)
            elif x == pivot:
                equal.append(x)
            else:
                greater.append(x)

        return quick_sort_simple(less) + equal + quick_sort_simple(greater)

    else:
        return data

In [12]:
# Merge sort implemented in Python

def merge(left, right):
    result = []
    left_idx, right_idx = 0, 0
    while left_idx < len(left) and right_idx < len(right):
        if left[left_idx] <= right[right_idx]:
            result.append(left[left_idx])
            left_idx += 1
        else:
            result.append(right[right_idx])
            right_idx += 1
 
    # One side is exhausted; append whatever remains of the other.
    result.extend(left[left_idx:])
    result.extend(right[right_idx:])
    return result


def mergesort(arr):
    if len(arr) <= 1:
        return arr
 
    mid = len(arr) // 2
    left = arr[:mid]
    right = arr[mid:]
 
    left = mergesort(left)
    right = mergesort(right)
    return merge(left, right)

## Unit Tests

For the unit tests, we create unsorted lists and then use ``assertEqual`` to compare the output of each sorting algorithm with the output of Python's built-in ``sorted`` function.

In [21]:
import unittest

class SortingTest(unittest.TestCase):
    def setUp(self):
        # Build fresh lists before every test, because several of the sorts
        # mutate their input in place.
        self.test_arr = [[1], [1, 2], [2, 1], [5, 2, 7, 1, 8], [1, 2, 5, 7, 8],
                         [10, 272, 100, -98, 876, 877754, 98124, 0, 1000000, -100]]
        self.test_text_arr = [['cat', 'dog', 'bird', 'aardvark'],
                              ['dublin', 'berlin', 'Cork']]

    def test_bubble_sort(self):
        """Test bubble sort."""
        for arr in self.test_arr:
            self.assertEqual(sorted(arr), bubble_sort(arr))

    def test_selection_sort(self):
        for arr in self.test_arr:
            expected = sorted(arr)  # reference copy taken before the in-place sort
            self.assertEqual(selection_sort(arr), expected)

    def test_insertion_sort(self):
        for arr in self.test_arr:
            expected = sorted(arr)
            self.assertEqual(insertion_sort(arr), expected)

    def test_quick_sort_inplace(self):
        for arr in self.test_arr:
            expected = sorted(arr)
            quicksort(arr)
            self.assertEqual(arr, expected)
            
    def test_merge_sort(self):
        for arr in self.test_arr:
            self.assertEqual(mergesort(arr), sorted(arr))

    def test_merge_text_sort(self):
        for arr in self.test_text_arr:
            print(mergesort(arr))
            self.assertEqual(mergesort(arr), sorted(arr))

                              
                              
            
if __name__ == '__main__':
    unittest.main(argv=['ignored', '-v'], exit=False)

test_bubble_sort (__main__.SortingTest)
Test bubble sort. ... ok
test_insertion_sort (__main__.SortingTest) ... ok
test_merge_sort (__main__.SortingTest) ... ok
test_merge_text_sort (__main__.SortingTest) ... ok
test_quick_sort_inplace (__main__.SortingTest) ... ok
test_selection_sort (__main__.SortingTest) ... ok

['aardvark', 'bird', 'cat', 'dog']
['Cork', 'berlin', 'dublin']

----------------------------------------------------------------------
Ran 6 tests in 0.005s

OK


## Questions

1) Add the improvements to bubble sort that we saw in the lecture notes to limit comparisons of previously sorted elements.

2) Bob has a set $A$ of $n$ nuts and a set $B$ of $n$ bolts, such that each nut in $A$ has a unique matching bolt in $B$. Unfortunately, the nuts in $A$ all look the same, and the bolts in $B$ all look the same as well. The only kind of comparison that Bob can make is to take a nut-bolt pair $(a, b)$, such that $a$ is in $A$ and $b$ is in $B$, and test it to see if the threads of $a$ are larger than, smaller than, or a perfect match for the threads of $b$. Describe and analyze an efficient algorithm for Bob to match up all of his nuts and bolts.

3) Why were the animals sorted into alphabetical order but the cities were not? How do we fix this?

4) Read the Python documentation to see how the ``sorted`` function handles text sorting (https://docs.python.org/3/library/functions.html#sorted).

5) Adapt one of the functions above to handle sorting strings and reversing the sort order, in a similar manner to Python's ``sorted`` function.

6) Suppose we are given an $n$-element sequence $S$ such that each element in $S$ represents a different vote for president, where each vote is given as an integer representing a particular candidate, yet the integers may be arbitrarily large (even if the number of candidates is not). Design an $\mathcal{O}(n \log n)$-time algorithm to see who wins the election $S$ represents, assuming the candidate with the most votes wins.

## Solutions 

1) Use the pseudocode from the lecture notes.

2) This problem can be solved using a divide-and-conquer approach. First, we choose a random bolt and partition the nuts around it into those that are smaller, those that are larger, and the one that matches. Then we take the nut that matches the chosen bolt and partition the remaining bolts around it. We then recurse on the smaller nuts and bolts and on the larger nuts and bolts until everything is matched up. In essence, we are running the randomized quicksort algorithm, so the expected running time is $\mathcal{O}(n \log n)$ and the worst case is $\mathcal{O}(n^2)$.

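3) Python compares strings character by character using their Unicode code points (which coincide with ASCII for these letters). Every uppercase letter has a smaller code point than every lowercase letter, so ``'Cork'`` sorts before ``'berlin'``. The animals are all lowercase, so their code-point order matches alphabetical order. If capitalisation should not matter, normalise the strings to one case before comparing, e.g. with ``str.upper()`` or ``str.lower()``.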

4) Note the ``key=str.lower`` parameter.

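6) First sort the sequence $S$ by candidate ID. Then walk through the sorted sequence, keeping the count for the current candidate ID and the best (ID, count) seen so far. Whenever you move on to a new ID, compare the finished count against the current maximum and replace the maximum if necessary; remember to do this check once more after the last element. Sorting costs $\mathcal{O}(n \log n)$ and the walk is $\mathcal{O}(n)$, so the total is $\mathcal{O}(n \log n)$.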
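
One possible sketch of the sort-then-scan idea for Question 6. The function name `election_winner` is hypothetical, the maximum is checked after every vote rather than only when the ID changes (a small variation that avoids a separate final check), and ties are broken in favour of the smallest ID, which is an assumption the question does not specify.

```python
def election_winner(votes):
    """Return (candidate_id, count) for the candidate with the most votes."""
    if not votes:
        raise ValueError("no votes cast")
    ordered = sorted(votes)  # O(n log n); equal IDs are now adjacent
    best_id, best_count = ordered[0], 0
    current_id, current_count = ordered[0], 0
    for vote in ordered:  # single O(n) pass over the sorted votes
        if vote == current_id:
            current_count += 1
        else:
            current_id, current_count = vote, 1  # start counting a new ID
        if current_count > best_count:
            best_id, best_count = current_id, current_count
    return best_id, best_count


print(election_winner([7, 10**12, 7, 3, 10**12, 7]))  # (7, 3)
```

Because only comparisons between IDs are used, the size of the integers does not matter.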

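
The quicksort-style matching described in Solution 2 could be sketched as follows. This is an illustration under assumptions: nuts and bolts are modelled as integers standing for thread sizes, the name `match_nuts_and_bolts` is hypothetical, and every comparison in the code is between a nut and a bolt, never between two nuts or two bolts.

```python
import random


def match_nuts_and_bolts(nuts, bolts):
    """Return a list of (nut, bolt) pairs, comparing only nuts against bolts."""
    if not nuts:
        return []
    pivot_bolt = random.choice(bolts)
    # Partition the nuts around the randomly chosen bolt.
    smaller_nuts = [a for a in nuts if a < pivot_bolt]
    larger_nuts = [a for a in nuts if a > pivot_bolt]
    pivot_nut = next(a for a in nuts if a == pivot_bolt)
    # Partition the bolts around the nut that matched the chosen bolt.
    smaller_bolts = [b for b in bolts if b < pivot_nut]
    larger_bolts = [b for b in bolts if b > pivot_nut]
    return (match_nuts_and_bolts(smaller_nuts, smaller_bolts)
            + [(pivot_nut, pivot_bolt)]
            + match_nuts_and_bolts(larger_nuts, larger_bolts))


print(match_nuts_and_bolts([3, 1, 4, 2], [2, 4, 1, 3]))
# [(1, 1), (2, 2), (3, 3), (4, 4)]
```

The pairs come out in order of size whichever pivots are chosen; only the running time depends on the random choices.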

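
For Question 5, one way to adapt merge sort is sketched below. The name `mergesort_by` is hypothetical; its `key` and `reverse` parameters are modelled on those of ``sorted``, and the merge is written so that elements with equal keys keep their original order in both directions, as ``sorted`` does.

```python
def mergesort_by(arr, key=None, reverse=False):
    """Return a new sorted list; key and reverse mimic those of sorted()."""
    if key is None:
        key = lambda item: item  # identity: compare the items themselves
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = mergesort_by(arr[:mid], key, reverse)
    right = mergesort_by(arr[mid:], key, reverse)

    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right only when it strictly precedes the left item,
        # so elements with equal keys keep their original order (stable sort).
        if reverse:
            take_right = key(right[j]) > key(left[i])
        else:
            take_right = key(right[j]) < key(left[i])
        if take_right:
            result.append(right[j])
            j += 1
        else:
            result.append(left[i])
            i += 1
    result.extend(left[i:])   # at most one of these two
    result.extend(right[j:])  # slices is non-empty
    return result


cities = ['dublin', 'berlin', 'Cork']
print(mergesort_by(cities))                               # ['Cork', 'berlin', 'dublin']
print(mergesort_by(cities, key=str.lower))                # ['berlin', 'Cork', 'dublin']
print(mergesort_by(cities, key=str.lower, reverse=True))  # ['dublin', 'Cork', 'berlin']
```

Passing ``key=str.lower`` changes only how items are compared; the original strings, with their capitalisation, are what end up in the result.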
