# Sorting

> I have some items
>
> With a defined ordering
>
> So I will sort them.

## To sort, or not to sort?

Why do we want sorted data?

- Improved searching
- Better human experience

## Bubble Sort 🫧

### Algorithm

Given a list of items:

- scan through the items from left to right
- if an item is larger than the item to its right, swap them (putting the larger item on the right)
- repeat the scan until a pass makes no swaps; at that point the items are sorted


### `bubble.h`
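
The header itself is not reproduced in these notes. As a rough sketch only, assuming a `std::vector` container and a function named `bubble_sort` (both are assumptions, not necessarily what the real `bubble.h` uses), the algorithm above might look like this:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: the name `bubble_sort` and the use of std::vector
// are assumptions made for illustration.
template <typename T>
void bubble_sort(std::vector<T>& items) {
    bool swapped = true;
    // Keep making passes until one completes without any swap.
    while (swapped) {
        swapped = false;
        // Scan left to right, comparing each item with its right-hand neighbor.
        for (std::size_t i = 0; i + 1 < items.size(); ++i) {
            if (items[i + 1] < items[i]) {
                std::swap(items[i], items[i + 1]);  // larger item moves right
                swapped = true;
            }
        }
    }
}
```

Calling `bubble_sort(v)` on a `std::vector<int>` holding `5 2 7 3 4` leaves it as `2 3 4 5 7`.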

### What is the complexity of Bubble Sort?

#### On Average
- Each iteration performs $O(n)$ comparisons
- In each iteration, the largest "unsorted" item in the list is moved into place
  - so after $n$ iterations, all the items will be in place
  - so $O(n)$ iterations
- $O(n) \cdot O(n) = O(n^2)$

#### Reverse Sorted
- Same as the average case: $O(n)$ passes of $O(n)$ comparisons, so $O(n^2)$

#### Pre-sorted
- No swaps are performed, so after a single pass through the data it's done!
- $O(n)$ performance on pre-sorted data 😎
  - but as soon as a single small value ends up at the back, we get $O(n^2)$ again. 😛
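
To make the pre-sorted claim concrete, here is a small sketch that counts how many passes bubble sort makes. The helper name `bubble_passes` is hypothetical, and it works on a copy of an `int` vector purely for illustration:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: bubble-sorts a copy of `items` and returns
// the number of passes made, including the final pass with no swaps.
inline std::size_t bubble_passes(std::vector<int> items) {
    std::size_t passes = 0;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        ++passes;
        for (std::size_t i = 0; i + 1 < items.size(); ++i) {
            if (items[i + 1] < items[i]) {
                std::swap(items[i], items[i + 1]);
                swapped = true;
            }
        }
    }
    return passes;
}
```

With this definition, `bubble_passes({1, 2, 3, 4, 5})` is 1, while `bubble_passes({2, 3, 4, 5, 1})` is 5: the lone small value moves only one position to the left per pass.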

## Selection Sort 👇🏻

### Algorithm

Given a list of items:
- Partition the list into "sorted" and "unsorted"
  - the "sorted" part starts empty
- Until the "unsorted" part is empty:
  - Find the **smallest** item in the "unsorted" list
  - Swap it with the **first** item in the "unsorted" list
  - The "sorted" list now includes the swapped item

Use selection-sort to sort: 

5 2 7 3 5

We'll use $p$ to describe the first index of the unsorted list.

$p = 0$

```
5 2 7 3 5
|--------
```

Smallest is 2 => swap with 5:

```
2 5 7 3 5
|--------
```

$p = 1$

```
2 5 7 3 5
  |------
```

Smallest is 3 => swap with 5

```
2 3 7 5 5
  |------
```

$p = 2$

```
2 3 7 5 5
    |----
```

Smallest is 5 => swap with 7

```
2 3 5 7 5
    |----
```

$p = 3$

```
2 3 5 7 5
      |--
```

Smallest is 5 => swap with 7

```
2 3 5 5 7
      |--
```

$p = 4$

```
2 3 5 5 7
        |
```
Smallest is 7 => swap with 7

```
2 3 5 5 7
        |
```

$p = 5$ means we've finished!

```
2 3 5 5 7
          ^
```

(Yes, we could stop once $p = \text{length}(\text{list}) - 1$: a single remaining item is already in place.)

### 👷🏼‍♀️ Activity

#### Selection Sort Algorithm

Given a list of items:
- Partition the list into "sorted" and "unsorted"
  - the "sorted" part starts empty
- Until the "unsorted" part is empty:
  - Find the **smallest** item in the "unsorted" list
  - Swap it with the **first** item in the "unsorted" list
  - The "sorted" list now includes the swapped item

Show each step of selection sort on:

```
5 2 7 3 4
```

```
5 2 7 3 4
|--------

2 5 7 3 4
  |------
  
2 3 7 5 4
    |----
```
```
2 3 4 5 7
      |--
      
2 3 4 5 7
        |
```

### Complexity of Selection Sort

```
// upper bounds are exclusive
for each i from 0 to size-1
  min = i
  for each j from i+1 to size
    if list[j] < list[min]
      min = j
  swap list[i] and list[min]
```

What is the time complexity (big-O) of selection sort?

Does the running time depend on what kind of input you give it?

- Random input?
- Sorted input?
- Reverse-sorted input?

#### Sorted data

```
1 2 3 4
```

- Find min (search 1 2 3 4) => 1
  - swap with self
- Find min (search 2 3 4) => 2
  - swap with self
- Find min (search 3 4) => 3
  - swap with self
- Find min (search 4) => 4
  - swap with self

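Selection sort still does $O(n^2)$ work on sorted data: every search for the minimum scans the entire unsorted part, even though no item ends up moving.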

#### Reversed data

4 3 2 1

- Find min (search 4 3 2 1) => 1
  - swap with 4 => 1 3 2 4
- Find min (search 3 2 4) => 2
  - swap with 3 => 1 2 3 4
- Find min (search 3 4) => 3
  - swap with self
- Find min (search 4) => 4
  - swap with self

Selection sort does $O(n^2)$ on reversed data.
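
One way to see why the input order makes no difference, assuming the loop structure of the pseudocode above: finding the minimum of an unsorted part with $k$ items takes $k - 1$ comparisons no matter how the items are arranged. For $n$ items the total is therefore

$$
(n-1) + (n-2) + \cdots + 1 + 0 = \frac{n(n-1)}{2} = O(n^2)
$$

For the four-item examples above that is $3 + 2 + 1 + 0 = 6$ comparisons, whether the input is sorted or reversed.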

What is the **space** complexity (big-O) of selection sort?

Selection sort makes no copy of the data; it only needs a constant amount of extra space for the loop indices and the temporary used when swapping items: $O(1)$

This assumes we aren't counting the space to store the original data. If we were to count that, the space complexity would be $O(n)$.

State and justify your assumptions!

### Selection sort

- Runtime: $O(n^2)$
  - Best case, worst case, average case
  - Doesn't matter if the input is sorted, reversed, or random
- Space: $O(1)$
  - Doesn't use any more space than the original container


### `selection.h`
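
The header itself is not included in these notes. The following is only a sketch of what the pseudocode above could look like in C++, assuming a `std::vector` container and a function named `selection_sort` (both are assumptions, not necessarily what the real `selection.h` contains):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: the name `selection_sort` and the use of
// std::vector are assumptions made for illustration.
template <typename T>
void selection_sort(std::vector<T>& items) {
    // `p` is the first index of the "unsorted" part.
    for (std::size_t p = 0; p + 1 < items.size(); ++p) {
        // Find the smallest item in the unsorted part.
        std::size_t min = p;
        for (std::size_t j = p + 1; j < items.size(); ++j) {
            if (items[j] < items[min]) {
                min = j;
            }
        }
        // Swap it with the first unsorted item (possibly with itself).
        std::swap(items[p], items[min]);
    }
}
```

The outer loop stops once a single unsorted item remains, since that item is already in place.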

## Insertion sort

### Algorithm

- Partition the collection into sorted and unsorted parts
  - The sorted part starts with the **first** item
- While the unsorted part is not empty
  - Take the first item in the unsorted part and **insert** it into its place in the sorted list
  - Increment the partition

Use insertion sort to sort:

```
5 2 7 3 4
```

$p = 1$

```
5 2 7 3 4
| ^
```

Item $p$ is 2. Insert it at the start, before the 5:

```
2 5 7 3 4
--|
```

$p = 2$

```
2 5 7 3 4
--| ^
```

Item $p$ is 7. Insert after the 5:

```
2 5 7 3 4
----|
```

$p = 3$

```
2 5 7 3 4
----| ^
```

Item $p$ is 3. Insert after the 2:

```
2 3 5 7 4
------|
```

$p = 4$

```
2 3 5 7 4
------| ^
```
Item $p$ is 4. Insert after the 3:

```
2 3 4 5 7
--------|
```

<div style='font-size: 40pt'>😴 🥱 😶‍🌫️ 🙃 😃</div>

### 👷🏽‍♀️ Activity

#### Algorithm

- Partition the collection into sorted and unsorted parts
  - The sorted part starts with the **first** item
- While the unsorted part is not empty
  - Take the first item in the unsorted part and **insert** it into its place in the sorted list
  - Increment the partition
  

Use insertion sort to sort:

```
8 5 2 7 3
```

```
8 5 2 7 3
| ^
 
5 8 2 7 3
--| ^
  
2 5 8 7 3
----| ^
```
```
2 5 7 8 3
------| ^
        
2 3 5 7 8
--------|
```

### Insertion Sort Pseudo-code
```
for each i from 1 to size
  j = i
  item = list[j]

  while j > 0 and list[j-1] > item
    list[j] = list[j-1]
    j = j - 1
  end while
  list[j] = item
end for

```

- For each position, starting at 1 (i.e. the partition index)
- Grab the item at position i and save it
- Shift all values greater than that item to the right
- Put the item in where it goes
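
The pseudocode translates fairly directly into C++. This is a minimal sketch, assuming a `std::vector` container and a function named `insertion_sort` (both are assumptions made for illustration):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: the name `insertion_sort` and the use of
// std::vector are assumptions made for illustration.
template <typename T>
void insertion_sort(std::vector<T>& items) {
    // `i` is the partition index: items[0..i) is already sorted.
    for (std::size_t i = 1; i < items.size(); ++i) {
        // Grab the item at position i and save it.
        T item = std::move(items[i]);
        std::size_t j = i;
        // Shift every sorted value greater than `item` one place right.
        while (j > 0 && item < items[j - 1]) {
            items[j] = std::move(items[j - 1]);
            --j;
        }
        // Put the item where it goes.
        items[j] = std::move(item);
    }
}
```

Shifting instead of swapping means each displaced value is written once per step, rather than the three assignments a swap needs.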

- What is the time and space complexity of insertion sort?
- Does the performance depend on the kind of input?
  - Random?
  - Sorted?
  - Reversed?

#### Sorted Input

```
1 2 3 4
```

- Grab 2, how many do I need to shift?
  - Nothing
- Grab 3, how many do I need to shift?
  - Nothing
- etc.

Insertion sort handles sorted data very nicely: $O(n)$! 😎

#### Reversed Input

4 3 2 1

- Grab 3, shift 4 => 3 4 2 1
- Grab 2, shift 3,4 => 2 3 4 1
- Grab 1, shift 2,3,4 => 1 2 3 4

Reversed data is the worst case for insertion sort: each new item forces the entire sorted part to shift, giving the full $O(n^2)$ behavior.

Insertion sort is still $O(n^2)$ for random data, but the constant factors will be smaller than for reversed data.
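
The difference between these inputs can be measured by counting shifts. The helper below is a hypothetical sketch (the name `insertion_shifts` is an assumption) that insertion-sorts a copy of an `int` vector and reports how many items were shifted:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: insertion-sorts a copy of `items` and returns
// the number of one-place shifts performed.
inline std::size_t insertion_shifts(std::vector<int> items) {
    std::size_t shifts = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        int item = items[i];
        std::size_t j = i;
        while (j > 0 && items[j - 1] > item) {
            items[j] = items[j - 1];
            --j;
            ++shifts;
        }
        items[j] = item;
    }
    return shifts;
}
```

For `1 2 3 4` this returns 0, and for `4 3 2 1` it returns 6, which is $1 + 2 + 3$ from the walkthrough above. A mixed input such as `3 1 4 2` lands in between with 3 shifts.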

### Insertion sort

- Time
  - Generally $O(n^2)$
  - Already-sorted $O(n)$ 👍🏻
  - Worst-case $O(n^2)$

- Space complexity is $O(1)$

## Key ideas

- Bubble, Selection, and Insertion sorts are all $O(n^2)$ in time (average and worst case) and $O(1)$ in extra space
  - Bubble sort is the slowest, but it is simple and runs in $O(n)$ on pre-sorted data
  - Selection sort is always $O(n^2)$
  - Insertion sort is $O(n)$ on pre-sorted data and has smaller constant factors in its $O(n^2)$ than the other two
    - Fewer checks and swaps in each iteration

## Appendix

### Can you sort bread? 🍞

```cpp
#include <string>
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
```

```cpp
vector<string> bread = {"🥯","🍞","🥖","🥨","🥐","🫓","🥪","🥙","🍩"};
```

```cpp
bread
```

```
{ "🥯", "🍞", "🥖", "🥨", "🥐", "🫓", "🥪", "🥙", "🍩" }
```

```cpp
sort(bread.begin(), bread.end());
```

```cpp
bread
```

```
{ "🍞", "🍩", "🥐", "🥖", "🥙", "🥨", "🥪", "🥯", "🫓" }
```

Yes. Yes you can.

<div class='big centered'> 🍞 </div>