## General idea

### Sorting 2 elements

In [72]:
a = 5
b = 2

if a > b:
    a, b = b, a

a, b

(2, 5)

### Sorting 3 elements

In [73]:
a = 5
b = 2
c = 1

if a > b:
    a, b = b, a

if b > c:
    b, c = c, b

if a > b:
    a, b = b, a

a, b, c

(1, 2, 5)
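
The three compare-and-swap steps above are not specific to the values 5, 2 and 1. As a minimal sketch (the helper name `sort3` is an assumption made for this illustration, not part of the notebook), the following applies the same steps to every ordering of three distinct values:

```python
from itertools import permutations

def sort3(a, b, c):
    # Hypothetical helper: the same three compare-and-swap steps as above.
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return a, b, c

# Collect the distinct results over all 6 orderings of (1, 2, 3).
results = {sort3(*p) for p in permutations((1, 2, 3))}
print(results)
```

If the steps are correct for every ordering, the set contains the single tuple `(1, 2, 3)`.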

## "Slow" sorting algorithms

### Bubble sort

Time complexity: O(n²)

Stable

In [74]:
arr = [9, 4, 3, 2, 6, 7, 1, 8, 5]

for i in range(len(arr) - 1):
    for j in range(len(arr) - 1 - i):
        if arr[j] > arr[j + 1]:
            arr[j], arr[j + 1] = arr[j + 1], arr[j]

arr

[1, 2, 3, 4, 5, 6, 7, 8, 9]

### Selection sort

Time complexity: O(n²)

Not stable

In [75]:
arr = [9, 4, 3, 2, 6, 7, 1, 8, 5]

for i in range(len(arr) - 1):
    min_index = i

    for j in range(i + 1, len(arr)):
        if arr[j] < arr[min_index]:
            min_index = j

    arr[min_index], arr[i] = arr[i], arr[min_index]

arr

[1, 2, 3, 4, 5, 6, 7, 8, 9]

### Insertion sort

Time complexity: O(n²)

Stable

In [76]:
arr = [9, 4, 3, 2, 6, 7, 1, 8, 5]

for i in range(1, len(arr)):
    key = arr[i]

    j = i - 1
    while j >= 0 and key < arr[j]:
        arr[j + 1] = arr[j]
        j -= 1
    arr[j + 1] = key

arr

[1, 2, 3, 4, 5, 6, 7, 8, 9]

## "Fast" sorting algorithms

### Merge sort

#### Merge algorithm

Time complexity: O(n)

Merges two sorted arrays into a single sorted array

In [77]:
left = [2, 5, 6]
right = [1, 3, 7]
arr = []

left_index = 0
right_index = 0

while left_index < len(left) and right_index < len(right):
    if left[left_index] <= right[right_index]:
        arr.append(left[left_index])
        left_index += 1
    else:
        arr.append(right[right_index])
        right_index += 1

while left_index < len(left):
    arr.append(left[left_index])
    left_index += 1

while right_index < len(right):
    arr.append(right[right_index])
    right_index += 1

arr

[1, 2, 3, 5, 6, 7]
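
For comparison, Python's standard library offers `heapq.merge`, which also merges already-sorted inputs. This is only a sketch to cross-check the hand-written loop above, not a replacement for it:

```python
import heapq

left = [2, 5, 6]
right = [1, 3, 7]

# heapq.merge returns a lazy iterator over the merged, sorted values,
# so it is wrapped in list() to materialize the result.
merged = list(heapq.merge(left, right))
print(merged)
```

Both inputs must already be sorted, exactly as with the manual merge.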

#### Merge sort

Time complexity: O(n * log n)

Stable

In [78]:
def merge(arr, left, right):
    left_index = 0
    right_index = 0
    index = 0

    while left_index < len(left) and right_index < len(right):
        if left[left_index] <= right[right_index]:
            arr[index] = left[left_index]
            index += 1
            left_index += 1
        else:
            arr[index] = right[right_index]
            index += 1
            right_index += 1

    while left_index < len(left):
        arr[index] = left[left_index]
        index += 1
        left_index += 1

    while right_index < len(right):
        arr[index] = right[right_index]
        index += 1
        right_index += 1

def merge_sort(arr):
    if len(arr) > 1:
        middle = len(arr) // 2
        left = arr[:middle]
        right = arr[middle:]
        merge_sort(left)
        merge_sort(right)
        merge(arr, left, right)

arr = [9, 4, 3, 2, 6, 7, 1, 8, 5]
merge_sort(arr)
arr

[1, 2, 3, 4, 5, 6, 7, 8, 9]

### Quick sort

Time complexity (average): O(n * log n)

Time complexity (worst case): O(n²)

Not stable

See: https://youtu.be/aXXWXz5rF64, https://youtu.be/es2T6KY45cA

In [79]:
arr = [9, 4, 3, 2, 6, 7, 1, 8, 5]

def partition(arr, low, high):
    pivot = arr[high]

    i = low

    for j in range(low, high):
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1

    arr[i], arr[high] = arr[high], arr[i]

    return i

def quick_sort(arr, low, high):
    if low < high:
        pi = partition(arr, low, high)
        quick_sort(arr, low, pi - 1)
        quick_sort(arr, pi + 1, high)

quick_sort(arr, 0, len(arr) - 1)
arr

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [80]:
arr = [9, 4, 3, 2, 6, 7, 1, 8, 5]

print("       Unsorted order:", arr)
print("Expected sorted order:", sorted(arr))
pi = partition(arr, 0, len(arr) - 1)
print("      Partition index:", pi)
print("Order after partition:", arr)
print()
print(f"Notice that the element at index pi = {pi} (arr[pi] = arr[{pi}] = {arr[pi]}) is in the correct place after partitioning")

       Unsorted order: [9, 4, 3, 2, 6, 7, 1, 8, 5]
Expected sorted order: [1, 2, 3, 4, 5, 6, 7, 8, 9]
      Partition index: 4
Order after partition: [4, 3, 2, 1, 5, 7, 9, 8, 6]

Notice that the element at index pi = 4 (arr[pi] = arr[4] = 5) is in the correct place after partitioning
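
To make the gap between the average and worst case concrete, here is a rough sketch that counts comparisons. It uses the same partition scheme as above, with an extra `counter` parameter and a `count_comparisons` helper that are assumptions added for this illustration. Because the pivot is always the last element, an already sorted input produces the most unbalanced splits possible.

```python
import random

def partition(arr, low, high, counter):
    # Same partition scheme as above; counter[0] tallies element comparisons.
    pivot = arr[high]
    i = low
    for j in range(low, high):
        counter[0] += 1
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[high] = arr[high], arr[i]
    return i

def quick_sort(arr, low, high, counter):
    if low < high:
        pi = partition(arr, low, high, counter)
        quick_sort(arr, low, pi - 1, counter)
        quick_sort(arr, pi + 1, high, counter)

def count_comparisons(values):
    # Hypothetical helper: sorts a copy and returns the number of comparisons.
    arr = list(values)
    counter = [0]
    quick_sort(arr, 0, len(arr) - 1, counter)
    assert arr == sorted(values)
    return counter[0]

n = 200
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

print("sorted input:  ", count_comparisons(already_sorted))  # n * (n - 1) / 2
print("shuffled input:", count_comparisons(shuffled))
```

On the sorted input, every call to `partition` strips off only one element, so the comparisons add up to (n - 1) + (n - 2) + ... + 1. The shuffled input should need far fewer.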


## Special sorting algorithms

### Counting sort

Time complexity: O(N + K)

Space complexity: O(N + K)

N = the number of items

K = the number of possible values

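Only works if the items have a known, finite set of possible values (of size K). The implementation below additionally assumes that the values are the non-negative integers `0` to `K - 1`, because each value is used directly as an index into `count`.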

In [81]:
arr = [9, 4, 3, 2, 6, 7, 1, 8, 5]

N = len(arr)
K = max(arr) + 1

output = [0] * N
count = [0] * K

# (1)
for i in range(N):
    count[arr[i]] += 1

# (2)
for i in range(1, K):
    count[i] += count[i - 1]

# (3)
for i in reversed(range(N)):
    output[count[arr[i]] - 1] = arr[i]
    count[arr[i]] -= 1

for i in range(N):
    arr[i] = output[i]

print(arr)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


Given: `[3, 2, 1, 2, 1, 6, 2]` with possible values `{0, 1, 2, 3, 4, 5, 6}`

The result should be: `[1, 1, 2, 2, 2, 3, 6]`

Notice that the result is just each possible value repeated as many times as it appears in the input. We can therefore produce the sorted array if we *count* how many of each possible value there are.

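(1) Create an array of counts, where `counts[v]` is how many times the value `v` appears. (In the code above this array is named `count`, and the `result` array used in the walkthrough is named `output`.)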

For the example, this is `counts = [0, 2, 3, 1, 0, 0, 1]`

`counts[0] = 0` (0 appears 0 times), `counts[1] = 2` (1 appears 2 times), `counts[2] = 3` (2 appears 3 times), etc.

(2) At which index should a given value `v` be placed? Each occurrence of a smaller value will "push" it forward.

For the example: the value `0` isn't in the array, so nothing is placed for it. The `1`s start at index 0 and stop at index 0 + 2 = 2 (not inclusive). The `2`s are "pushed" forward by the `1`s, so they start at index 2; there are 3 of them, so they stop at index 0 + 2 + 3 = 5 (not inclusive). The `3`s are "pushed" forward by the `2`s, so they start at index 5; there is 1 of them, so they stop at index 0 + 2 + 3 + 1 = 6 (not inclusive), etc.

So, every item in `counts` should become the sum of itself and all previous items. This is called a "prefix sum".

For the example, the new value is `counts = [0, 2, 5, 6, 6, 6, 7]`

`counts[1] = 2` (meaning, the `1`s stop at index 2, not including index 2), etc.

(3) Now, `counts` can tell us where to place a given item. If we see a `2`, we know we can place it at index `counts[2] - 1` (because `counts[2]` is not inclusive). Then, if we see another `2`, we should place it at index `counts[2] - 2`, etc. In other words, we can decrement `counts[v]` when inserting an element `v` to keep the `counts` array (which now stores target indexes, not counts) accurate.

Notice that this procedure fills the slots for each value from right to left: the first `2` that is inserted ends up at the last index for `2`s, the next `2` is inserted to the left of it, etc. By iterating over the original array in reverse order, the rightmost instance of `v` is inserted first and lands in the rightmost slot for `v`, the instance before it lands one slot to the left, etc. Equal items therefore keep their original relative order, which makes the algorithm stable.

For the example:

0: We have `arr = [3, 2, 1, 2, 1, 6, 2]`, `counts = [0, 2, 5, 6, 6, 6, 7]`, `result = [0, 0, 0, 0, 0, 0, 0]`

1: `i = 6`, `arr[i] = 2`. So: `counts = [0, 2, 4, 6, 6, 6, 7]`, `result = [0, 0, 0, 0, 2, 0, 0]`

2: `i = 5`, `arr[i] = 6`. So: `counts = [0, 2, 4, 6, 6, 6, 6]`, `result = [0, 0, 0, 0, 2, 0, 6]`

3: `i = 4`, `arr[i] = 1`. So: `counts = [0, 1, 4, 6, 6, 6, 6]`, `result = [0, 1, 0, 0, 2, 0, 6]`

4: `i = 3`, `arr[i] = 2`. So: `counts = [0, 1, 3, 6, 6, 6, 6]`, `result = [0, 1, 0, 2, 2, 0, 6]`

5: `i = 2`, `arr[i] = 1`. So: `counts = [0, 0, 3, 6, 6, 6, 6]`, `result = [1, 1, 0, 2, 2, 0, 6]`

6: `i = 1`, `arr[i] = 2`. So: `counts = [0, 0, 2, 6, 6, 6, 6]`, `result = [1, 1, 2, 2, 2, 0, 6]`

7: `i = 0`, `arr[i] = 3`. So: `counts = [0, 0, 2, 5, 6, 6, 6]`, `result = [1, 1, 2, 2, 2, 3, 6]`
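
The walkthrough above can be reproduced with a short script. This is a sketch that uses the names from the explanation (`counts`, `result`) rather than those from the code cell (`count`, `output`), and it hard-codes `K = 7` for the possible values `0` to `6`:

```python
arr = [3, 2, 1, 2, 1, 6, 2]
K = 7  # assumed: possible values are 0..6

# (1) Count how many times each value appears.
counts = [0] * K
for v in arr:
    counts[v] += 1
print("after (1):", counts)

# (2) Prefix sum: counts[v] becomes the (exclusive) end index for value v.
for v in range(1, K):
    counts[v] += counts[v - 1]
print("after (2):", counts)

# (3) Walk the input in reverse and place each item at counts[v] - 1.
result = [0] * len(arr)
for step, i in enumerate(reversed(range(len(arr))), start=1):
    v = arr[i]
    result[counts[v] - 1] = v
    counts[v] -= 1
    print(f"{step}: i = {i}, arr[i] = {v}, counts = {counts}, result = {result}")
```

Each printed line should match the corresponding numbered step in the walkthrough.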

### Counting sort on objects

In [82]:
class Student:
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade

    def __repr__(self):
        return f"Student({self.name!r}, {self.grade!r})"

arr = [
    Student("Rafael", 2),
    Student("Etta", 6),
    Student("Zane", 3),
    Student("Eric", 4),
    Student("Whitley", 5),
    Student("Rowen", 6),
    Student("Zoe", 4),
]

# Sorting by grade

N = len(arr)
K = max(student.grade for student in arr) + 1

output = [0] * N
count = [0] * K

# (1)
for i in range(N):
    key = arr[i].grade

    count[key] += 1

# (2)
for i in range(1, K):
    count[i] += count[i - 1]

# (3)
for i in reversed(range(N)):
    key = arr[i].grade

    output[count[key] - 1] = arr[i]
    count[key] -= 1

for i in range(N):
    arr[i] = output[i]

from pprint import pprint
pprint(arr)

[Student('Rafael', 2),
 Student('Zane', 3),
 Student('Eric', 4),
 Student('Zoe', 4),
 Student('Whitley', 5),
 Student('Etta', 6),
 Student('Rowen', 6)]


## Sorting stability

### Unstable sort

In [83]:
class Student:
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade

    def __repr__(self):
        return f"Student({self.name!r}, {self.grade!r})"

arr = [
    Student("Rafael", 2),
    Student("Etta", 6),
    Student("Zane", 3),
    Student("Eric", 4),
    Student("Whitley", 5),
    Student("Rowen", 6),
    Student("Zoe", 4),
]

# Selection sort
for i in range(len(arr) - 1):
    min_index = i

    for j in range(i + 1, len(arr)):
        if arr[j].grade < arr[min_index].grade:
            min_index = j

    arr[min_index], arr[i] = arr[i], arr[min_index]

from pprint import pprint
pprint(arr)

print()
print("Notice that Etta and Rowen are swapped")

[Student('Rafael', 2),
 Student('Zane', 3),
 Student('Eric', 4),
 Student('Zoe', 4),
 Student('Whitley', 5),
 Student('Rowen', 6),
 Student('Etta', 6)]

Notice that Etta and Rowen are swapped
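
For contrast, here is a minimal sketch using Python's built-in `sorted`, which is guaranteed to be stable. Students with equal grades keep their original relative order, so Etta stays ahead of Rowen. The `names` list is only there to make the result easy to read:

```python
from operator import attrgetter

class Student:
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade

    def __repr__(self):
        return f"Student({self.name!r}, {self.grade!r})"

arr = [
    Student("Rafael", 2),
    Student("Etta", 6),
    Student("Zane", 3),
    Student("Eric", 4),
    Student("Whitley", 5),
    Student("Rowen", 6),
    Student("Zoe", 4),
]

# sorted() is stable: ties on grade keep their input order.
stable = sorted(arr, key=attrgetter("grade"))
names = [student.name for student in stable]
print(names)
```

Here Eric still precedes Zoe and Etta still precedes Rowen, matching their order in the input.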


### Sorting by multiple columns with a stable sort

In [84]:
class Student:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __repr__(self):
        return f"Student({self.first_name!r}, {self.last_name!r})"

arr = [
    Student("Rafael", "Green"),
    Student("Eric", "Dickson"),
    Student("Zane", "Waller"),
    Student("Eric", "Baker"),
    Student("Etta", "Atkins"),
    Student("Whitley", "Kelley"),
    Student("Rowen", "Pitts"),
    Student("Zoe", "Lynch"),
]

import operator

# To sort by first name and break ties by last name, sort twice in reverse order of
# priority: first by last name (the tie-breaker), then by first name (the main key).
# Why this works: after the first sort, the items are ordered by last name. The second
# sort is stable (Python's list.sort is), so items with the same first name keep their
# relative order from the first pass, which is their order by last name.
arr.sort(key=operator.attrgetter("last_name"))
arr.sort(key=operator.attrgetter("first_name"))

from pprint import pprint
pprint(arr)

[Student('Eric', 'Baker'),
 Student('Eric', 'Dickson'),
 Student('Etta', 'Atkins'),
 Student('Rafael', 'Green'),
 Student('Rowen', 'Pitts'),
 Student('Whitley', 'Kelley'),
 Student('Zane', 'Waller'),
 Student('Zoe', 'Lynch')]
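
As a cross-check, the same ordering can be obtained in a single pass by sorting on a composite key. This sketch uses plain `(first_name, last_name)` tuples as a stand-in for the `Student` objects, which is a simplification made for this illustration:

```python
from operator import itemgetter

people = [
    ("Rafael", "Green"),
    ("Eric", "Dickson"),
    ("Zane", "Waller"),
    ("Eric", "Baker"),
    ("Etta", "Atkins"),
    ("Whitley", "Kelley"),
    ("Rowen", "Pitts"),
    ("Zoe", "Lynch"),
]

# Two stable passes: tie-breaker (last name) first, main key (first name) second.
two_pass = sorted(sorted(people, key=itemgetter(1)), key=itemgetter(0))

# One pass: itemgetter(0, 1) builds a (first_name, last_name) tuple as the key.
one_pass = sorted(people, key=itemgetter(0, 1))

print(one_pass == two_pass)
```

The two-pass approach remains useful when the passes need different settings, for example one key ascending and the other descending.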
