# Sorting algorithm

## What is sorting?

Sorting means arranging data in a particular order: either ascending or descending.

## Types of sorting

Sorting algorithms can be classified in two ways: by the **space they use** or by the **stability of the algorithm**.

* Sorting:
  * Space used
    * In place
    * Out of place
  * Stability
    * Stable
    * Unstable

### Space used

#### In-place sorting

Sorting algorithms that **do not require any extra space for sorting** (beyond a constant amount of working memory).

Example: *Bubble sort*


#### Out-of-place sorting

Sorting algorithms that require extra space for sorting.

Example: *Merge sort*

### Stability

#### Stable sorting

If a sorting algorithm **keeps equal elements in the same relative order in which they appeared before sorting**, it is called a stable sorting algorithm.

Example: *Insertion sort*

#### Unstable sorting

If a sorting algorithm **may change the relative order in which equal elements appeared before sorting**, it is called an unstable sorting algorithm.

Example: *Quick sort*
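To make stability concrete, here is a minimal sketch that sorts `(key, label)` pairs by key only. The helper names `insertion_sort_by_key` and `selection_sort_by_key` and the sample records are illustrative assumptions, not part of the original notes. Selection sort is used here as the unstable example because its behaviour is easy to trace by hand.

```python
def insertion_sort_by_key(records):
    """Stable insertion sort on (key, label) pairs, comparing keys only."""
    result = list(records)  # work on a copy so the input is left untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Strict '>' means records with equal keys are never moved past each other.
        while j >= 0 and result[j][0] > current[0]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def selection_sort_by_key(records):
    """Selection sort on (key, label) pairs; its long-distance swap can reorder equal keys."""
    result = list(records)
    for i in range(len(result)):
        min_index = i
        for j in range(i + 1, len(result)):
            if result[j][0] < result[min_index][0]:
                min_index = j
        result[i], result[min_index] = result[min_index], result[i]
    return result


records = [(2, "a"), (2, "b"), (1, "c")]
print(insertion_sort_by_key(records))  # [(1, 'c'), (2, 'a'), (2, 'b')]
print(selection_sort_by_key(records))  # [(1, 'c'), (2, 'b'), (2, 'a')]
```

Both results are sorted by key, but only the first keeps `(2, 'a')` ahead of `(2, 'b')`, which is the order they had in the input.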

## Sorting terminology

* **Increasing order**:
  * Each successive element **is greater than** the previous one
  * Example: *1, 3, 5, 7, 9, 11*
* **Decreasing order**:
  * Each successive element **is less than** the previous one
  * Example: *11, 9, 7, 5, 3, 1*
* **Non-increasing order**:
  * Each successive element **is less than or equal to** the previous element in the sequence
  * Example: *11, 9, 7, 5, 5, 3, 1*
* **Non-decreasing order**:
  * Each successive element **is greater than or equal to** the previous element in the sequence
  * Example: *1, 3, 5, 7, 7, 9, 11*

A useful rule of thumb: the ***non*** prefix means that the sequence is allowed to contain duplicate values.
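The difference between the strict and the *non* variants can be checked with a few lines of code. This is a small sketch; the function names `is_increasing` and `is_non_decreasing` are made up for illustration.

```python
def is_increasing(seq):
    """True if every element is strictly greater than the one before it."""
    return all(a < b for a, b in zip(seq, seq[1:]))


def is_non_decreasing(seq):
    """True if every element is greater than or equal to the one before it."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


print(is_increasing([1, 3, 5, 7, 9, 11]))         # True
print(is_increasing([1, 3, 5, 7, 7, 9, 11]))      # False, 7 is repeated
print(is_non_decreasing([1, 3, 5, 7, 7, 9, 11]))  # True
print(is_non_decreasing([1, 3, 5, 7, 9, 11]))     # True, duplicates are allowed but not required
```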


## Sorting algorithms

* Bubble sort
* Selection sort
* Insertion sort
* Bucket sort
* Merge sort
* Quick sort
* Heap sort

We need so many sorting algorithms because each one has its pros and cons, and different circumstances call for different algorithms. Before selecting a sorting algorithm, we analyze the requirements and use them to identify which one fits our case best. Some examples of requirements are:

* Stability
* Space efficiency
* Time efficiency

## Bubble sort

* Bubble sort is also referred to as sinking sort.
* We repeatedly compare each pair of adjacent items and swap them if they are in the wrong order.

In [1]:
# O(n^2) time | O(1) space
def bubble_sort(arr):
    for i in range(len(arr)):
        for j in range(len(arr)-1-i):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr

print(bubble_sort([2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31]))

[0, 1, 1, 2, 6, 8, 13, 15, 31, 32, 35, 42, 51, 55, 68, 112]


When to use it
* When the input is already sorted (few or no swaps are needed)
* When space is a concern
* When an easy implementation is preferred

When to avoid it
* When time is a concern, because the average time complexity is poor
## Selection sort

* In selection sort we repeatedly find the minimum element of the unsorted part of the array and move it to the end of the sorted part, until the unsorted part is empty.

In [2]:
# O(n^2) time | O(1) space
def selection_sort(arr):
    for i in range(len(arr)):
        min_index = i
        for j in range(i+1, len(arr)):
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
    return arr

print(selection_sort([2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31]))

[0, 1, 1, 2, 6, 8, 13, 15, 31, 32, 35, 42, 51, 55, 68, 112]



When to use it

* When we have insufficient memory
* When an easy implementation is preferred

When to avoid it
* When time is a concern

## Insertion sort

* Divide the given array into $2$ parts: a sorted part and an unsorted part
* Take the **first element from the unsorted part** and find **its correct position in the sorted part**
* Repeat until the unsorted part is empty

In [3]:
# O(n^2) time | O(1) space
def insertion_sort(arr):
    for i in range(1, len(arr)):
        j = i
        while j > 0 and arr[j] < arr[j-1]:
            arr[j], arr[j-1] = arr[j-1], arr[j]
            j -= 1
    return arr

print(insertion_sort([2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31]))

[0, 1, 1, 2, 6, 8, 13, 15, 31, 32, 35, 42, 51, 55, 68, 112]


In [4]:
# O(n^2) time | O(1) space
def insertion_sort2(arr):
    for i in range(1, len(arr)):  # arr[:i] is already sorted; insert arr[i] into it
        key = arr[i]    # element to insert
        j = i - 1       # index of the last element of the sorted part
        while j >= 0 and arr[j] > key:
            arr[j+1] = arr[j]   # shift larger elements one position to the right
            j -= 1
        arr[j+1] = key  # place the key in its correct position
    return arr


print(insertion_sort2([2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31]))

[0, 1, 1, 2, 6, 8, 13, 15, 31, 32, 35, 42, 51, 55, 68, 112]


When to use it

* When we have insufficient memory
* When an easy implementation is preferred
* When we have a continuous inflow of numbers and we want to keep them sorted

When to avoid it
* When time is a concern
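The "continuous inflow" case can be sketched with the standard library's `bisect.insort`, which inserts one value into an already sorted list. This swaps in a library call for the hand-written insertion step; the function name `insert_stream` and the sample values are assumptions for illustration.

```python
import bisect


def insert_stream(values):
    """Keep a list sorted while values arrive one at a time."""
    sorted_values = []
    for value in values:
        # Binary search finds the position; the list insertion itself is still O(n).
        bisect.insort(sorted_values, value)
        print(sorted_values)
    return sorted_values


insert_stream([15, 3, 42, 8, 3])
# [15]
# [3, 15]
# [3, 15, 42]
# [3, 8, 15, 42]
# [3, 3, 8, 15, 42]
```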


## Bucket sort

* **Create buckets** and distribute elements of array into buckets
* Sort buckets individually
* Merge buckets after sorting

$number\_of\_buckets = round(\sqrt{number\_of\_elements})$

$appropriate\_bucket = ceil(\frac{value \times number\_of\_buckets}{max\_value})$
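As a worked example, the two formulas can be evaluated for a 17-element input whose maximum is 112. The helper `bucket_number` is hypothetical. It also clamps the result to 1, an added assumption, because the formula yields 0 for a value of 0, which is not a valid 1-based bucket number.

```python
import math


def bucket_number(value, num_buckets, max_value):
    """1-based bucket number from the formula above, clamped so that 0 lands in bucket 1."""
    return max(1, math.ceil(value * num_buckets / max_value))


values = [0, 2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31]
num_buckets = round(math.sqrt(len(values)))  # round(sqrt(17)) = round(4.12...) = 4
max_value = max(values)                      # 112

print(num_buckets)  # 4
for v in (0, 35, 68, 112):
    print(v, bucket_number(v, num_buckets, max_value))
# 0 1
# 35 2
# 68 3
# 112 4
```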

In [5]:
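import math

# Assumes a non-empty list of non-negative numbers with a maximum greater than 0.
# Reuses insertion_sort defined above to sort each bucket.
def bucket_sort(arr):
    num_buckets = round(math.sqrt(len(arr)))
    max_value = max(arr)
    arr_buckets = [[] for _ in range(num_buckets)]

    for value in arr:
        # 1-based bucket number; clamp to 1 so that a value of 0 lands in the
        # first bucket instead of wrapping around to the last one (index -1).
        bucket_number = max(1, math.ceil(value * num_buckets / max_value))
        arr_buckets[bucket_number - 1].append(value)

    for j in range(num_buckets):
        arr_buckets[j] = insertion_sort(arr_buckets[j])

    # Concatenate the sorted buckets back into the original array
    k = 0
    for bucket in arr_buckets:
        for value in bucket:
            arr[k] = value
            k += 1
    return arr

print(bucket_sort([0, 2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31]))

[0, 0, 1, 1, 2, 6, 8, 13, 15, 31, 32, 35, 42, 51, 55, 68, 112]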


## Merge sort

* Merge sort is a divide and conquer algorithm
* Divide the input array into two halves, and keep halving recursively until the pieces are too small to be divided further
* Merge the halves back together in sorted order

It performs better than the algorithms seen so far, but it requires O(n) extra space.
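The merge step is the core of the algorithm: two lists that are already sorted are combined into one sorted list in a single pass. Here is a minimal standalone sketch; the name `merge_sorted` is an assumption, and unlike an index-based merge into an existing array it returns a new list.

```python
def merge_sorted(left, right):
    """Merge two sorted lists into a new sorted list in O(len(left) + len(right)) time."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # '<=' takes from the left list on ties, which keeps the merge stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sorted([1, 8, 35], [2, 6, 32]))  # [1, 2, 6, 8, 32, 35]
```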

In [6]:
# O(n) time
def merge(arr, l, m, r):
    n1 = m - l + 1      # number of element in 1st subarray
    n2 = r - m          # number of element in 2nd subarray
    # 2 temporary arrays
    L = [0] * (n1)
    R = [0] * (n2)

    # Copy data to temp arrays L[] and R[]
    for i in range(0, n1):
        L[i] = arr[l + i]
    for j in range(0, n2):
        R[j] = arr[m + 1 + j]

    # Merge the temp arrays back into arr[l..r]
    i = 0     # Initial index of first subarray
    j = 0     # Initial index of second subarray
    k = l     # Initial index of merged subarray
    while i < n1 and j < n2:
        if L[i] <= R[j]:
            arr[k] = L[i]
            i += 1
        else:
            arr[k] = R[j]
            j += 1
        k += 1
    while i < n1:
        arr[k] = L[i]
        i += 1
        k += 1
    while j < n2:
        arr[k] = R[j]
        j += 1
        k += 1

# O(n log n) time | O(n) space
def merge_sort(arr, l, r):
    if l < r:
        m = (l + r) // 2            # midpoint of the current range
        merge_sort(arr, l, m)       # T(n/2)
        merge_sort(arr, m + 1, r)   # T(n/2) => O(n log n) time complexity
        merge(arr, l, m, r)         # O(n) merge of the two sorted halves
    return arr

print(merge_sort([0, 2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31], 0, 16))

[0, 0, 1, 1, 2, 6, 8, 13, 15, 31, 32, 35, 42, 51, 55, 68, 112]


When to use it
* When you need a stable sort
* When an average time complexity of $O(n \log n)$ is required

When to avoid it
* When space is a concern

## Quick sort

Like merge sort, quick sort is a divide and conquer algorithm *(take a bigger problem, divide it into smaller problems, solve them, and then combine the results to form the final solution)*.

* Pick a pivot and make sure that smaller numbers are located to the left of the pivot and bigger numbers to the right of it.
* **Unlike merge sort, no extra array is required**; the only additional memory is the recursion stack.
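A single partition step can be traced on a small list. The sketch below uses the last element as the pivot (often called the Lomuto scheme); the name `lomuto_partition` and the sample data are chosen for illustration.

```python
def lomuto_partition(arr, low, high):
    """Place arr[high] (the pivot) at its final position and return that index."""
    pivot = arr[high]
    i = low - 1  # last index of the "less than or equal to pivot" region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


data = [8, 2, 35, 1, 6]
pivot_index = lomuto_partition(data, 0, len(data) - 1)
print(pivot_index, data)  # 2 [2, 1, 6, 8, 35]
```

After the call, the pivot `6` sits at index 2, everything to its left is smaller or equal, and everything to its right is larger. Neither side is sorted yet; that is left to the recursive calls.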

In [7]:
def partition(arr, low, high):  # low: first index, high: last index of the range
    pivot = arr[high]           # use the last element as the pivot
    i = low - 1                 # end of the "less than or equal to pivot" region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return i+1

# O(n log n) average time, O(n^2) worst case
# O(log n) average space, O(n) worst case, because of the recursive calls
def quick_sort(arr, low, high):
    if low < high:
        pi = partition(arr, low, high) # partition index: final position of the pivot
        quick_sort(arr, low, pi-1)     # T(n/2) when the partition is balanced
        quick_sort(arr, pi+1, high)    # T(n/2) => O(n log n) time for balanced partitions
    return arr


print(quick_sort([0, 2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31], 0, 16))

[0, 0, 1, 1, 2, 6, 8, 13, 15, 31, 32, 35, 42, 51, 55, 68, 112]


When to use it
* When an average time complexity of $O(n \log n)$ is required

When to avoid it
* When space is a concern (the recursion stack can grow to $O(n)$ in the worst case)
* When you need a stable sort

## Heap sort

It uses a binary heap to sort the array.
* Step $1$ : Insert the data into a binary heap
* Step $2$ : Extract the data from the binary heap
* It is **best suited to arrays**; it **does not work well with linked lists.**

A binary heap is a binary tree with one of the following ordering properties:

* The value of any given node must be less than or equal to the values of its children (*min heap*)
* The value of any given node must be greater than or equal to the values of its children (*max heap*)

A sample (min) binary heap:

                         5
                        / \
                      10   20
                     / \   / \
                    30 40 50 60
                   / \
                 70   80           
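A binary heap is usually stored in an array in level order, so that the children of the node at index `i` sit at indices `2 * i + 1` and `2 * i + 2`. The sketch below stores the sample tree that way and checks the min heap property; the helper name `is_min_heap` is an assumption for illustration.

```python
# Level-order layout of the sample tree: root, then each level from left to right.
heap = [5, 10, 20, 30, 40, 50, 60, 70, 80]


def is_min_heap(arr):
    """True if every node is less than or equal to each of its children."""
    n = len(arr)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and arr[i] > arr[child]:
                return False
    return True


print(is_min_heap(heap))                 # True
print(heap[2 * 1 + 1], heap[2 * 1 + 2])  # 30 40, the children of 10 (index 1)
print(heap[2 * 3 + 1], heap[2 * 3 + 2])  # 70 80, the children of 30 (index 3)
```

This index arithmetic is why arrays suit heaps well: parent and child lookups take constant time without storing any pointers.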

In [8]:
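def heapify(arr, n, i):
    # Sift arr[i] down within arr[:n] so that the subtree rooted at i is a max heap
    largest = i  # assume the root of this subtree is the largest so far

    l = 2 * i + 1  # index of the left child
    r = 2 * i + 2  # index of the right child
    if l < n and arr[largest] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

# O(n log n) time | O(1) space
def heap_sort(arr):
    n = len(arr)
    # Build a max heap; leaves are already heaps, so start at the last parent
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Repeatedly move the current maximum to the end and shrink the heap
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        heapify(arr, i, 0)
    return arr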

print(heap_sort([0, 2, 35, 1, 8, 6, 1, 32, 0, 15, 42, 112, 13, 55, 68, 51, 31]))

[0, 0, 1, 1, 2, 6, 8, 13, 15, 31, 32, 35, 42, 51, 55, 68, 112]
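The two steps listed at the start of this section (insert everything into a heap, then extract everything) can also be written directly with the standard library's `heapq` module, which implements a min heap on top of a list. This is a sketch with a made-up name, `heap_sort_with_heapq`; unlike an in-place heap sort it builds a separate heap and so uses O(n) extra space.

```python
import heapq


def heap_sort_with_heapq(values):
    """Sort by pushing every value onto a min heap and popping them back off."""
    heap = []
    for value in values:
        heapq.heappush(heap, value)  # Step 1: insert, O(log n) each
    # Step 2: extract; each pop returns the smallest remaining value
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort_with_heapq([2, 35, 1, 8, 6, 1, 0]))  # [0, 1, 1, 2, 6, 8, 35]
```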


## Comparison of sorting algorithms

|      Name      | Time complexity  | Space complexity | Stable |
| :------------: | :--------------: | :--------------: | :----: |
|  Bubble sort   |    $O(n^2)$      |      $O(1)$      |  Yes   |
| Selection sort |    $O(n^2)$      |      $O(1)$      |   No   |
| Insertion sort |    $O(n^2)$      |      $O(1)$      |  Yes   |
|  Bucket sort   | $O(n \log n)$    |      $O(n)$      |  Yes   |
|   Merge sort   | $O(n \log n)$    |      $O(n)$      |  Yes   |
|   Quick sort   | $O(n \log n)$    |      $O(n)$      |   No   |
|   Heap sort    | $O(n \log n)$    |      $O(1)$      |   No   |

Notes: quick sort's $O(n \log n)$ is its average case; it degrades to $O(n^2)$ time in the worst case, and its $O(n)$ space is the worst-case recursion depth. Bucket sort's cost depends on the algorithm used inside each bucket: with insertion sort, the worst case (all elements in one bucket) is $O(n^2)$.
