### Learning Objectives
* To be able to sort unsorted data efficiently.
* To distinguish between the different kinds of sorting algorithms.

### Instructions
Read and study the following sections, run their code examples and solve their challenges. This worksheet has the following challenges:
* [CHALLENGE 01](#ch01)
* [CHALLENGE 02](#ch02)
* [CHALLENGE 03](#ch03)
* [CHALLENGE 04](#ch04)

Run your coding challenges and fix any errors they might have before downloading and submitting your completed worksheet for grading. When done, open the menu **File >> Download as >> HTML (.html)** to download your worksheet in HTML format. **Submit the downloaded *.html* file via Canvas**.

# Sorting
In this worksheet, we study different ways of sorting datasets, that is, arranging data elements in ascending or descending order. We will cover six sorting algorithms, all of which are comparison-based: **bubble sort, selection sort, insertion sort, merge sort, quick sort**, and **heap sort**. The first three (bubble, selection, and insertion) are often described as elementary, while the last three (merge, quick, and heap) are considered fast.

We will start by generating a vector of random integers to sort. Keep in mind that these algorithms are applicable to more than integers; they work with any type whose values can be compared using `<`.

First, we declare some needed constants and a function for printing datasets.

In [1]:
#include <iostream>
#include <iomanip>
#include <string>
#include <random>
#include <vector>
#include <cstdlib>

const int MAX_SIZE = 13;
const int MIN_NUM = 101;
const int MAX_NUM = 299;


In [2]:
template<typename T>
void printData(const std::vector<T>& data){
    for(std::size_t i = 0; i < data.size(); i++){
        if(i != 0) std::cout << ", ";
        std::cout << data[i];
    }
    std::cout << std::endl;
}

Now we generate the vector and print its values.

In [3]:
std::default_random_engine en;
std::uniform_int_distribution<> dist{MIN_NUM, MAX_NUM};

std::vector<int> original_data;
for(int i = 0; i < MAX_SIZE; i++){
    original_data.push_back(dist(en));
}

std::cout << "Original: "; printData(original_data);

Original: 243, 170, 225, 135, 181, 189, 295, 227, 135, 211, 271, 177, 269


We are now ready for our first elementary algorithm: bubble sort.

## Bubble sort
Bubble sort works by repeatedly swapping adjacent elements that are out of order until the whole dataset is sorted. With a dataset such as

```
243, 170, 225, 135
```

we have two loops: an outer loop with index `i` and an inner loop with index `j`. In the beginning, both `i` and `j` are set to `0`. In the first outer loop iteration, bubble sort checks if 170 (j + 1) < 243 (j) and if so swaps them. This leads to:

```
170, 243, 225, 135
```
and `j` advances to `1`. It then checks if 225 (j + 1) < 243 (j) and if so swaps them. This leads to:

```
170, 225, 243, 135
```

Again `j` advances to `2` and it checks if 135 (j + 1) < 243 (j) and if so swaps them, which leads to:

```
170, 225, 135, 243
```

This is the end of the first outer loop iteration, and, as you can see, the largest element `243` is in its correct place. 

After this, `i` advances to `1` and `j` resets to `0`. At the end of the second outer loop iteration, the second largest element will be in its correct place, and so on.

Here is how this algorithm is implemented.

In [4]:
template<typename T>
void bubbleSort(std::vector<T>& data) {
    const int n = static_cast<int>(data.size()); // use the vector's own size, not MAX_SIZE
    for(int i = 0; i < n - 1; i++){
        for(int j = 0; j < n - i - 1; j++){
            if(data[j + 1] < data[j]){
                std::swap(data[j + 1], data[j]);
            }
        }
    }
}

Let's test this algorithm by first copying the original vector into a new one named `data` and then sorting it using `bubbleSort`.

In [5]:
std::vector<int> data = original_data; // copy original data to data

std::cout << "Before: "; printData(data);
bubbleSort(data);
std::cout << " After: "; printData(data);

Before: 243, 170, 225, 135, 181, 189, 295, 227, 135, 211, 271, 177, 269
 After: 135, 135, 170, 177, 181, 189, 211, 225, 227, 243, 269, 271, 295


Having two nested loops means that, in the worst case, bubble sort takes $O(n^2)$ time to run.
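
To make the quadratic growth concrete, the sketch below re-runs bubble sort while recording the dataset after each outer-loop pass and counting comparisons. The names `BubbleTrace` and `tracedBubbleSort` are hypothetical helpers introduced only for this illustration; they are not part of the worksheet's code.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper type: what a traced bubble sort run produced.
struct BubbleTrace {
    std::vector<std::vector<int>> passes; // snapshot after each outer-loop pass
    long comparisons = 0;                 // number of data[j + 1] < data[j] tests
};

// Sorts a copy of the input with bubble sort and returns the trace.
BubbleTrace tracedBubbleSort(std::vector<int> data) {
    BubbleTrace trace;
    const int n = static_cast<int>(data.size());
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - i - 1; j++) {
            trace.comparisons++;
            if (data[j + 1] < data[j]) {
                std::swap(data[j + 1], data[j]);
            }
        }
        trace.passes.push_back(data); // the largest remaining element is now in place
    }
    return trace;
}
```

For the four-element example `243, 170, 225, 135`, this should record the passes `170, 225, 135, 243`, then `170, 135, 225, 243`, then `135, 170, 225, 243`, using $3 + 2 + 1 = 6$ comparisons. In general the count is $n(n-1)/2$, which is $78$ for a 13-element vector.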

## Selection sort
Selection sort works by finding the smallest element and swapping it with the first element in the dataset. Then it finds the second smallest element and swaps it with the second element in the dataset. The algorithm goes on like this until all the data elements are sorted. Here is how this algorithm is implemented.

In [6]:
template<typename T>
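void selectionSort(std::vector<T>& data) {
    const int n = static_cast<int>(data.size()); // use the vector's own size, not MAX_SIZE
    for(int i = 0; i < n - 1; i++){
        // Find the minimum element in the unsorted part [i, n)
        int min_index = i;
        for(int j = i + 1; j < n; j++){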
            if(data[j] < data[min_index]){
                min_index = j;
            }
        }
        
        std::swap(data[min_index], data[i]);
    }
}

Let's test this algorithm by resetting the `data` vector to the original one before sorting it using `selectionSort`.

In [7]:
data = original_data; // reset data

std::cout << "Before: "; printData(data);
selectionSort(data);
std::cout << " After: "; printData(data);

Before: 243, 170, 225, 135, 181, 189, 295, 227, 135, 211, 271, 177, 269
 After: 135, 135, 170, 177, 181, 189, 211, 225, 227, 243, 269, 271, 295


### <a id="ch01">CHALLENGE 01</a>
**Q1**. What is the running time in Big-O notation of the above `selectionSort` function?

**Q2**. Complete the function below so that it sorts the vector in descending order.

In [8]:
template<typename T>
void reversedSelectionSort(std::vector<T>& data) {
    //TODO
}

## Insertion sort
Insertion sort works the way we typically sort cards in our hands. This algorithm uses two nested loops. The outer loop is controlled by the index `i`, which starts at `1`. The inner loop uses the index `j`, which starts at the same value as `i`. The `i` index increments by one with every outer loop iteration, and the `j` index decrements by one with every inner loop iteration.

Insertion sort works by dividing the dataset into two parts: sorted and unsorted.
- The sorted part starts with the element at index `0` and extends up to, but does not include, the element at index `i`. That is the interval $[0, i)$.
- The unsorted part contains the elements of the interval $[i, N)$, where $N$ is the size of the dataset.

With every outer loop iteration, the sorted part grows by one element while the unsorted part shrinks by one element.

Here is how this algorithm is implemented.

In [9]:
template<typename T>
void insertionSort(std::vector<T>& data) {
    const int n = static_cast<int>(data.size()); // use the vector's own size, not MAX_SIZE
    for(int i = 1; i < n; i++){
        for(int j = i; j > 0 && data[j] < data[j - 1]; j--){
            std::swap(data[j], data[j-1]);
        }
    }
}

Let's test this algorithm by resetting the `data` vector to the original one before sorting it using `insertionSort`.

In [10]:
data = original_data; // reset data

std::cout << "Before: "; printData(data);
insertionSort(data);
std::cout << " After: "; printData(data);

Before: 243, 170, 225, 135, 181, 189, 295, 227, 135, 211, 271, 177, 269
 After: 135, 135, 170, 177, 181, 189, 211, 225, 227, 243, 269, 271, 295
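
To watch the sorted part described above grow one element at a time, here is a minimal sketch that runs the same insertion sort logic on a copy and returns one line of text per outer-loop iteration, with `|` separating the sorted part from the unsorted part. The function name `tracedInsertionSort` and the output format are assumptions made for this illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: insertion sort on a copy, returning one text line
// per outer-loop iteration. A '|' marks the end of the sorted part.
std::vector<std::string> tracedInsertionSort(std::vector<int> data) {
    std::vector<std::string> lines;
    const int n = static_cast<int>(data.size());
    for (int i = 1; i < n; i++) {
        for (int j = i; j > 0 && data[j] < data[j - 1]; j--) {
            std::swap(data[j], data[j - 1]);
        }
        // After this iteration the sorted part is the interval [0, i + 1).
        std::string line;
        for (int k = 0; k < n; k++) {
            if (k == i + 1) line += "| ";
            line += std::to_string(data[k]);
            if (k + 1 < n) line += " ";
        }
        lines.push_back(line);
    }
    return lines;
}
```

For the dataset `243, 170, 225, 135`, the returned lines should be `170 243 | 225 135`, then `170 225 243 | 135`, and finally `135 170 225 243` (no separator, because nothing is left unsorted).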


### <a id="ch02">CHALLENGE 02</a>
**Q1**. What is the running time in Big-O notation of the above `insertionSort` function?

**Q2**. A sorting algorithm is said to be **stable** if two elements with equal values appear in the same order in the sorted vector as they appear in the unsorted vector. Knowing that the bubble sort algorithm above is **stable** and that the selection sort is **not stable**, is the insertion sort algorithm above stable?
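
The definition of stability is easier to see with elements that carry extra information besides the value being compared. The sketch below is only an illustration of the two facts stated in the question: it uses a hypothetical `Record` type (a key plus a one-letter tag, compared by key only) and local copies of the bubble sort and selection sort logic, here named `bubbleSortAny` and `selectionSortAny`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: ordered by key only; the tag remembers the
// original order of records that share a key.
struct Record {
    int key;
    char tag;
};

bool operator<(const Record& a, const Record& b) { return a.key < b.key; }

// Same logic as the worksheet's bubble sort, sized from the vector itself.
template<typename T>
void bubbleSortAny(std::vector<T>& data) {
    const int n = static_cast<int>(data.size());
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - i - 1; j++) {
            if (data[j + 1] < data[j]) std::swap(data[j + 1], data[j]);
        }
    }
}

// Same logic as the worksheet's selection sort, sized from the vector itself.
template<typename T>
void selectionSortAny(std::vector<T>& data) {
    const int n = static_cast<int>(data.size());
    for (int i = 0; i < n - 1; i++) {
        int min_index = i;
        for (int j = i + 1; j < n; j++) {
            if (data[j] < data[min_index]) min_index = j;
        }
        std::swap(data[min_index], data[i]);
    }
}

// Concatenates the tags in their current order.
std::string tagsOf(const std::vector<Record>& data) {
    std::string tags;
    for (const Record& r : data) tags += r.tag;
    return tags;
}
```

Starting from the records `(2,a), (2,b), (1,c)`, bubble sort should yield the tag order `cab` (the two records with key 2 keep their relative order), while selection sort should yield `cba` (their order is reversed by the long-distance swap).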

# Fast sorting algorithms
## Merge sort
Merge sort is a **divide-and-conquer** algorithm that works by recursively dividing the dataset in half and sorting each half before merging the sorted halves. For example, the dataset:

```
243, 170, 225, 135, 181, 189, 295, 227
```
is divided into `243, 170, 225, 135` and `181, 189, 295, 227`. Each half is sorted recursively using merge sort, and the two sorted halves are then merged to produce the final sorted dataset. Here is how division in half works for the above example:

```
            243, 170, 225, 135, 181, 189, 295, 227                  --> Tree root (unsorted)
                              |
             +----------------+----------------+
             |                                 |
     243, 170, 225, 135                181, 189, 295, 227
             |                                 |
      +--------------+                +--------+-------+
      |              |                |                |
  243, 170        225, 135         181, 189         295, 227
     |               |                |                |
 +---+----+      +---+----+       +---+----+       +---+----+
 |        |      |        |       |        |       |        |
243      170    225      135     181      189     295      227      --> Tree leaves
```
which looks like a tree. As we can see, the subsets keep being divided in half until there is only one element per set.

The actual sorting happens at the merging step, which starts at the leaves of the tree and moves up to the root. Here is how the example above is merged.

```
243      170    225      135     181      189     295      227     --> Tree leaves
 |        |      |        |       |        |       |        |
 +---+----+      +---+----+       +---+----+       +---+----+
     |               |                |                |
  170, 243        135, 225         181, 189         227, 295  
     |               |                |                |  
     +-------+-------+                +--------+-------+
             |                                 |
     135, 170, 225, 243                181, 189, 227, 295
             |                                 |     
             +----------------+----------------+
                              |
            135, 170, 181, 189, 225, 227, 243, 295                  --> Tree root (sorted)
```
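
As a small illustration of a single merging step from the diagram above, the following sketch merges two already-sorted vectors into a new one. The function name `mergeSorted` is hypothetical, and it works on two separate vectors rather than on index ranges of one vector, purely to keep the idea visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: merges two already-sorted vectors into a new sorted vector.
std::vector<int> mergeSorted(const std::vector<int>& left, const std::vector<int>& right) {
    std::vector<int> out;
    out.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        // Take from the right only when it is strictly smaller,
        // so on ties the element from the left half comes first.
        if (right[j] < left[i]) out.push_back(right[j++]);
        else out.push_back(left[i++]);
    }
    while (i < left.size()) out.push_back(left[i++]);    // leftovers of the left half
    while (j < right.size()) out.push_back(right[j++]);  // leftovers of the right half
    return out;
}
```

Merging `135, 170, 225, 243` with `181, 189, 227, 295` this way should give `135, 170, 181, 189, 225, 227, 243, 295`, the sorted root of the tree.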

Here is how this algorithm is implemented.

In [11]:
template<typename T>
void merge(std::vector<T>& data, std::vector<T>& aux, int lo, int mid, int hi){
  int i = lo;
  int j = mid + 1;

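  for(int k = lo; k <= hi; k++){
    if(i > mid) aux[k] = data[j++];        // first half exhausted
    else if(j > hi) aux[k] = data[i++];    // second half exhausted
    else if(data[j] < data[i]){
      aux[k] = data[j++];                  // second half has the strictly smaller element
    }else{
      aux[k] = data[i++];                  // on ties, take from the first half (keeps equal elements in order)
    }
  }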

  // copying aux back to data
  for(int i = lo; i <= hi; i++){
    data[i] = aux[i];
  }
}

template<typename T>
void mergeSort(std::vector<T>& data, std::vector<T>& aux, int lo, int hi){
  if (hi <= lo) return;            // The base case

  int mid = lo + (hi - lo)/2;      // Divide in half
  mergeSort(data, aux, lo, mid);   // Sort first half
  mergeSort(data, aux, mid+1, hi); // Sort second half
    
  merge(data, aux, lo, mid, hi);
}

template<typename T>
void mergeSort(std::vector<T>& data) {
  std::vector<T> aux(data.size());
  mergeSort(data, aux, 0, data.size()-1);
}

Let's test this algorithm by resetting the `data` vector to the original one before sorting it using `mergeSort`.

In [12]:
data = original_data; // reset data

std::cout << "Before: "; printData(data);
mergeSort(data);
std::cout << " After: "; printData(data);

Before: 243, 170, 225, 135, 181, 189, 295, 227, 135, 211, 271, 177, 269
 After: 135, 135, 170, 177, 181, 189, 211, 225, 227, 243, 269, 271, 295


Analyzing the running time of merge sort is interesting. It depends on the height (number of division levels) of the tree. In the example above, we have a vector of $8$ elements and the divide-and-conquer tree has $\log_2(8) = 3$ levels. In general, for a vector of size $n$, the height of the tree is about $\log_2(n)$. And because each level of the tree involves merging $n$ elements, the overall running time of merge sort is $O(n \log(n))$.
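
The same conclusion can be reached with a short recurrence. As a sketch, assume $n$ is a power of two and that merging $n$ elements costs $c\,n$ for some constant $c$ (both $T$ and $c$ are notation introduced here for illustration):

$$
\begin{aligned}
T(n) &= 2\,T(n/2) + c\,n, \qquad T(1) = c \\
     &= 4\,T(n/4) + 2\,c\,n \\
     &= 2^k\,T(n/2^k) + k\,c\,n \\
     &= n\,T(1) + c\,n\log_2(n) \qquad \text{(taking } k = \log_2(n)\text{)} \\
     &= c\,n + c\,n\log_2(n) = O(n \log(n)).
\end{aligned}
$$

For the 8-element example, this gives $8c + 8c \cdot 3 = 32c$.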

## Quick sort
As the name implies, quick sort is one of the fastest sorting algorithms in practice. Like merge sort, quick sort is a divide-and-conquer algorithm. It picks an element as a pivot and partitions the dataset into two parts:
- one part containing all the elements less than the pivot and
- another containing the rest of the elements; that is, those that are greater than or equal to the pivot.

Every partition is then recursively sorted using quick sort. 

Given a pivot element $p$, partitioning puts $p$ at its correct sorted position, puts all the elements less than it before it, and puts all the elements greater than or equal to it after it.

Partitioning is a key step in quick sort; it is where the actual sorting happens. There are many strategies for picking a pivot to partition the dataset around:
* First element is always the pivot
* Last element is always the pivot (implemented below)
* Randomly pick an element to be the pivot. This can improve performance; we will implement it in class.

Let's see an example in action. Given the dataset:

```
243, 170, 225, 135, 181, 189, 295, 227
```
Here is how quick sort recursively partitions it. The pivot elements used for partitioning and sub-partitioning are placed in parentheses.

```
                 243, 170, 225, 135, 181, 189, 295, (227)
                                        |
                +-----------------------+-------------+
                |                       |             |
        170, 225, 135, 181, (189)       |            295, (243)
                |                       |               |
         +---------------+-------+      |      +--------+-------+
         |               |       |      |      |                |
     170, 135, (181)     |       |      |      |                |
             |           |       |      |      |                |
    +------+-------+     |       |      |      |                |
    |              |     |       |      |      |                |
   170, (135)      |     |       |      |      |                |
      |            |     |       |      |      |                |
 +----+----+       |     |       |      |      |                |
 |         |       |     |       |      |      |                |
135       170     181   189     225    227    243              295

```

Here is how this algorithm is implemented.

In [13]:
template<typename T>
int partition(std::vector<T>& data, int lo, int hi){
    // i tracks the last index of the "less than pivot" region; the pivot is data[hi].
    // In the beginning, i is right before the first element.
    int i = lo - 1;
    for(int j = lo; j < hi; j++){
        if(data[j] < data[hi]){
            std::swap(data[++i], data[j]);
        }
    }
    // Place the pivot in its correct position.
    std::swap(data[i + 1], data[hi]);
    
    return i + 1;
}

template<typename T>
void quickSort(std::vector<T>& data, int lo, int hi){
  if(lo < hi){
      int m = partition(data, lo, hi);
      quickSort(data, lo, m - 1);
      quickSort(data, m + 1, hi);
  }
}

template<typename T>
void quickSort(std::vector<T>& data){
    quickSort(data, 0, data.size() - 1);
}

Let's test this algorithm by resetting the `data` vector to the original one before sorting it using `quickSort`.

In [14]:
data = original_data; // reset data

std::cout << "Before: "; printData(data);
quickSort(data);
std::cout << " After: "; printData(data);

Before: 243, 170, 225, 135, 181, 189, 295, 227, 135, 211, 271, 177, 269
 After: 135, 135, 170, 177, 181, 189, 211, 225, 227, 243, 269, 271, 295


As you can see from the example above, the performance of quick sort depends on the height of the partitioning tree. If we are lucky in our selection of pivot elements, the datasets and subsets split in halves, resulting in a balanced tree with a height of about $\log_2(n)$. In this case, the overall running time of quick sort will be $O(n \log(n))$.

In the worst case, we might get extremely unlucky with our pivot elements, so that every partitioning step produces two uneven parts: one that is empty or holds a single element and another with the rest of the elements. When this happens, the height of the tree grows to about $n$ and the overall running time of quick sort becomes $O(n^2)$. With the last element as the pivot, an already sorted dataset triggers exactly this behavior. Randomly selecting the pivot element makes this worst-case scenario highly unlikely.
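
The worst case can be observed by counting comparisons. The sketch below is an illustration under stated assumptions, not the in-class implementation: it copies the last-element partitioning logic into a hypothetical `countingPartition`, adds an optional random-pivot step that swaps a randomly chosen element into the last position, and sorts an already sorted input `0, 1, ..., n-1`. The names, the fixed seed `42`, and the choice of `std::mt19937` are all assumptions.

```cpp
#include <random>
#include <utility>
#include <vector>

// Partition around data[hi] (same scheme as above), counting comparisons.
template<typename T>
int countingPartition(std::vector<T>& data, int lo, int hi, long& comparisons) {
    int i = lo - 1;
    for (int j = lo; j < hi; j++) {
        comparisons++;
        if (data[j] < data[hi]) {
            std::swap(data[++i], data[j]);
        }
    }
    std::swap(data[i + 1], data[hi]);
    return i + 1;
}

// Quick sort that optionally moves a randomly chosen element to data[hi]
// before partitioning, so that the pivot is random.
template<typename T>
void countingQuickSort(std::vector<T>& data, int lo, int hi,
                       bool randomPivot, std::mt19937& rng, long& comparisons) {
    if (lo < hi) {
        if (randomPivot) {
            std::uniform_int_distribution<int> pick(lo, hi);
            std::swap(data[pick(rng)], data[hi]);
        }
        int m = countingPartition(data, lo, hi, comparisons);
        countingQuickSort(data, lo, m - 1, randomPivot, rng, comparisons);
        countingQuickSort(data, m + 1, hi, randomPivot, rng, comparisons);
    }
}

// Sorts the already sorted input 0, 1, ..., n-1 and returns the comparison count.
long comparisonsOnSortedInput(int n, bool randomPivot) {
    std::vector<int> data(n);
    for (int i = 0; i < n; i++) data[i] = i;
    std::mt19937 rng(42); // fixed seed (an arbitrary choice) for repeatable runs
    long comparisons = 0;
    countingQuickSort(data, 0, n - 1, randomPivot, rng, comparisons);
    return comparisons;
}
```

With the last element as the pivot, `comparisonsOnSortedInput(8, false)` should return $7 + 6 + \dots + 1 = 28$ and `comparisonsOnSortedInput(100, false)` should return $4950$, that is $n(n-1)/2$. With `randomPivot` set to `true`, the count for 100 elements is expected to be far lower, although the exact number depends on the random draws.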

### <a id="ch03">CHALLENGE 03</a>
Going back to the definition of a **stable sort**, is the quick sort implementation above stable?

## Heap sort
Heap sort involves first thinking of (or re-imagining) the dataset as a complete binary tree. For example, the dataset:

```
243, 170, 225, 135, 181, 189, 295, 227
```

can be thought of as the following **complete binary tree**.

```
        243
        / \
       /   \
      /     \
     /       \
    /         \
  170         225
   /\          /\
  /  \        /  \
 /    \      /    \
135  181    189  295
 |
 |
227
```

According to <a href="https://en.wikipedia.org/wiki/Binary_tree#Types_of_binary_trees">Wikipedia</a>, a **complete binary tree** is a binary tree (each parent has at most two children) in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.

That means, the first element of the dataset becomes the root element of the tree and the only element at the top level. The second level will have the next two elements, the third level the next 4 elements, the fourth level the next 8 elements, and so on. Given an element at index $p$, the left child of this element will be at index $2 \times p + 1$ while the right child of this element will be at index $2 \times p + 2$, assuming all indices are zero-based.
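
The index arithmetic can be written down directly. In the sketch below, `leftChild` and `rightChild` follow the formulas above; `parentOf` is an addition not mentioned in the text, obtained by inverting them with integer division. All three names are hypothetical.

```cpp
// Index arithmetic for a complete binary tree stored in a zero-based vector.
int leftChild(int p)  { return 2 * p + 1; }
int rightChild(int p) { return 2 * p + 2; }

// Inverse of the two formulas above; only meaningful for c > 0 (the root has no parent).
int parentOf(int c)   { return (c - 1) / 2; }
```

In the example dataset, the element `135` sits at index `3`, so its left child is at index `leftChild(3) = 7`, which holds `227`, while `rightChild(3) = 8` falls outside the 8-element dataset, meaning there is no right child.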

But being a complete binary tree is not enough for heap sort. The dataset must also be a **binary heap**, which is defined as a complete binary tree in which every parent element is smaller than or equal to (or greater than or equal to) each of its children. As you can see, the tree above is not a binary heap. Why?

There are two kinds of binary heaps. A **maximum binary heap** is one where every parent element is greater than or equal to its children. A **minimum binary heap**, on the other hand, is one where every parent element is smaller than or equal to its children.

Heap sort takes a dataset and converts it into a maximum (or minimum) binary heap. A binary heap is not necessarily a sorted dataset; it is just a start. In a maximum binary heap, the root element is the largest element.
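
A minimal sketch of a check for the maximum binary heap property follows. The function name `isMaxHeap` is hypothetical; it visits every non-root element and compares it with its parent at index `(c - 1) / 2`.

```cpp
#include <cstddef>
#include <vector>

// Returns true when no element is larger than its parent,
// i.e. when the vector is laid out as a maximum binary heap.
template<typename T>
bool isMaxHeap(const std::vector<T>& data) {
    for (std::size_t c = 1; c < data.size(); c++) {
        std::size_t p = (c - 1) / 2;          // index of the parent of c
        if (data[p] < data[c]) return false;  // a child larger than its parent breaks the heap
    }
    return true;
}
```

For the example dataset `243, 170, 225, 135, 181, 189, 295, 227` this check should return `false`, while for `295, 227, 243, 170, 181, 189, 225, 135` it should return `true`. The standard library offers `std::is_heap` in `<algorithm>` for the same purpose.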

The following is a heap sort implementation using a maximum binary heap. It works as follows:
- It builds a binary heap.
- It swaps the root element of the heap (the largest element) with the last element of the heap and shrinks the heap size by one.
- It fixes the heap by calling the recursive `heapify` function at index `0`.
- It repeats the previous two steps until the heap size shrinks to one.
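
As a cross-check of these steps, they can be sketched with the standard library's heap functions standing in for hand-written heap code. This is an assumption-laden illustration rather than the worksheet's implementation: `heapSortWithStd` is a hypothetical name, `std::make_heap` plays the role of building the heap, and each `std::pop_heap` call performs the swap-and-fix pair of steps.

```cpp
#include <algorithm>
#include <vector>

// Heap sort expressed with standard-library heap functions.
void heapSortWithStd(std::vector<int>& data) {
    std::make_heap(data.begin(), data.end()); // build a maximum binary heap
    for (auto end = data.end(); end - data.begin() > 1; --end) {
        // Moves the root (largest element) to position end - 1 and
        // restores the heap property on the range [begin, end - 1).
        std::pop_heap(data.begin(), end);
    }
}
```

Applied to `243, 170, 225, 135, 181, 189, 295, 227`, this should leave the vector as `135, 170, 181, 189, 225, 227, 243, 295`. The loop is equivalent to a single call to `std::sort_heap`.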

Here are the steps of building a binary heap for the above complete binary tree.

```
        243                                     243
        / \                                     / \
       /   \                                   /   \
      /     \                                 /     \
     /       \                               /       \
    /         \                             /         \
  170         225           =>            170         225          =>
   /\          /\                          /\          /\
  /  \        /  \                        /  \        /  \
 /    \      /    \                      /    \      /    \
135  181    189  295                    227  181    189  295
 |                                       |
 |                                       |
227                                     135

        243                                     243
        / \                                     / \
       /   \                                   /   \
      /     \                                 /     \
     /       \                               /       \
    /         \                             /         \
  170         295           =>            227         295          =>
   /\          /\                          /\          /\
  /  \        /  \                        /  \        /  \
 /    \      /    \                      /    \      /    \
227  181    189  225                    170  181    189  225
 |                                       |
 |                                       |
135                                     135

        295
        / \
       /   \
      /     \
     /       \
    /         \
  227         243
   /\          /\
  /  \        /  \
 /    \      /    \
170  181    189  225
 |
 |
135
```

Here is how this algorithm is implemented.

In [15]:
template<typename T>
void heapify(std::vector<T>& data, int p, int hsz){
    int left = 2 * p + 1;
    int right = left + 1;
    int largest = p;
    if(left < hsz && data[p] < data[left]) {
        largest = left;
    }

    if(right < hsz && data[largest] < data[right]) {
        largest = right;
    }

    if(largest != p){
        std::swap(data[p], data[largest]);
        heapify(data, largest, hsz);
    }
}

template<typename T>
void buildHeap(std::vector<T>& data){
    for(int i = data.size() / 2 - 1; i >= 0; i--){
        heapify(data, i, data.size());
    }
}

template<typename T>
void heapSort(std::vector<T>& data){
    buildHeap(data);
    for(int i = data.size()  - 1; i > 0; i--){
        std::swap(data[0], data[i]);
        heapify(data, 0, i);
    }
}

Let's test this algorithm by resetting the `data` vector to the original one before sorting it using `heapSort`.

In [16]:
data = original_data; // reset data

std::cout << "Before: "; printData(data);
heapSort(data);
std::cout << " After: "; printData(data);

Before: 243, 170, 225, 135, 181, 189, 295, 227, 135, 211, 271, 177, 269
 After: 135, 135, 170, 177, 181, 189, 211, 225, 227, 243, 269, 271, 295


Like merge sort, the running time of heap sort is $O(n \log(n))$. This is because the binary heap has a height of $O(\log(n))$, so each of the $n - 1$ calls to `heapify` in the sorting loop takes $O(\log(n))$ time.

### <a id="ch04">CHALLENGE 04</a>
Given the dataset:
```
35, 81, 89, 95, 27, 15, 21
```
**Q1**. Build a complete binary tree for this dataset in the cell below.

**Q2**. Build a maximum binary heap for the complete binary tree of **Q1**.

## Summary
All the sorting algorithms above are comparison-based; they use comparison operators such as `<` to establish the order of any two values. The slowest algorithms run in $O(n^2)$ and the fastest run in $O(n \log(n))$. As a matter of fact, it can be mathematically proved that:

<blockquote>
    A comparison-based sorting algorithm can do no better than $O(n \log(n))$ in the worst case.
</blockquote>
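
A sketch of the standard decision-tree argument behind this bound, assuming $n$ distinct elements (here $h$ is notation introduced for the worst-case number of comparisons): the algorithm must be able to produce any of the $n!$ possible orderings, and $h$ yes/no comparisons can distinguish at most $2^h$ outcomes, so

$$
2^h \ge n! \quad\Longrightarrow\quad h \ge \log_2(n!) \ge \log_2\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right) = \tfrac{n}{2}\log_2\!\left(\tfrac{n}{2}\right),
$$

which grows proportionally to $n \log(n)$. The middle inequality holds because the largest $n/2$ factors of $n!$ are each at least $n/2$.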