# Insertion Sort vs Bucket Sort: A few test cases to show where each one outperforms the other
This is a short Jupyter Notebook showing where each of these two sorting algorithms excels relative to the other. Insertion Sort has quadratic time complexity in the average and worst cases, while Bucket Sort can run in linear time as long as one condition holds: the elements of the array being sorted must be evenly distributed over their range. Both operate on the same numeric data types (integers, floats, decimals, etc.), and on such data Bucket Sort can be used in every case where Insertion Sort can, unlike Radix Sort, for example.

Below are Python definitions of both sorting algorithms, followed by test cases that give an idea of their performance under these conditions.

In [5]:
import itertools
import math

class SortingUtilities:

    # standard Insertion Sort implementation
    @staticmethod
    def insertion_sort(array):
        current_index = 1
        while current_index < len(array):
            other_index = current_index
            while other_index > 0 and array[other_index - 1] > array[other_index]:
                # swap the out-of-place element one position to the left
                array[other_index - 1], array[other_index] = array[other_index], array[other_index - 1]

                other_index = other_index - 1
            current_index = current_index + 1

        return array
    
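    # one possible implementation of Bucket Sort
    # a lower-level language such as C, C++ or Java could cut memory overhead and constant factors,
    # although the asymptotic complexity stays the same
    @staticmethod
    def bucket_sort(array, bucket_count):
        buckets = []
        for k in range(0, bucket_count):
            buckets.append([])

        array_length = len(array)
        if array_length == 0:
            return array

        # normalize by the value range so that every index lands in 0 .. bucket_count - 1
        # for any numeric input, including negative values and values larger than the array length
        min_value = min(array)
        value_range = max(array) - min_value
        for i in range(0, array_length):
            if value_range == 0:
                position = 0  # all values are equal
            else:
                position = math.floor(bucket_count * (array[i] - min_value) / value_range)
                position = min(position, bucket_count - 1)  # the maximum would otherwise map to bucket_count
            buckets[position].append(array[i])

        new_array = []
        for j in range(0, bucket_count):
            SortingUtilities.insertion_sort(buckets[j])
            new_array.extend(buckets[j])  # append in place instead of copying new_array each time

        # copy the sorted values back so the input list is sorted in place, like insertion_sort
        for i in range(0, array_length):
            array[i] = new_array[i]

        return array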


In [6]:
# quick sanity check that both implementations still sort correctly after code changes
array1 = [5, 1, 4, 2, 7, 1, 6, 8]
array2 = [4, 7, 1, 8, 9, 5, 9, 1, 5, 3, 3]
sorted1 = SortingUtilities.insertion_sort(array1)
sorted2 = SortingUtilities.bucket_sort(array2, 5)

print(array1 == [1, 1, 2, 4, 5, 6, 7, 8])
print(array2 == [1, 1, 3, 3, 4, 5, 5, 7, 8, 9, 9])
print(sorted1 == [1, 1, 2, 4, 5, 6, 7, 8])
print(sorted2 == [1, 1, 3, 3, 4, 5, 5, 7, 8, 9, 9])

True
True
True
True
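
The quick test above only uses small, non-negative integers. A broader randomized check, sketched below, compares against Python's built-in `sorted` on floats, negative numbers and values much larger than the array length. It is self-contained, so it re-declares a compact version of both algorithms; the name `bucket_sort_sketch` is hypothetical, and it assumes buckets are chosen by normalizing each value with the array's minimum and maximum.

```python
import math
import random

def insertion_sort(array):
    """In-place Insertion Sort, same algorithm as SortingUtilities.insertion_sort."""
    for current_index in range(1, len(array)):
        other_index = current_index
        while other_index > 0 and array[other_index - 1] > array[other_index]:
            array[other_index - 1], array[other_index] = array[other_index], array[other_index - 1]
            other_index -= 1
    return array

def bucket_sort_sketch(array, bucket_count):
    """Hypothetical compact Bucket Sort: bucket index from min/max normalization."""
    if not array:
        return []
    min_value, max_value = min(array), max(array)
    value_range = max_value - min_value
    buckets = [[] for _ in range(bucket_count)]
    for value in array:
        if value_range == 0:
            position = 0  # all values are equal
        else:
            position = math.floor(bucket_count * (value - min_value) / value_range)
            position = min(position, bucket_count - 1)  # the maximum would map to bucket_count
        buckets[position].append(value)
    result = []
    for bucket in buckets:
        result.extend(insertion_sort(bucket))
    return result

random.seed(42)
for _ in range(200):
    length = random.randint(0, 50)
    data = [random.uniform(-1000, 1000) for _ in range(length)]
    assert bucket_sort_sketch(data, random.randint(1, 10)) == sorted(data)
print("all random tests passed")
```

Because the bucket index never decreases as the value grows, concatenating the individually sorted buckets yields a fully sorted list, which is what the comparison with `sorted` checks.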


## Quick Estimation of Time Complexities for each Sorting Algorithm
### Insertion Sort
Let `n = len(array)`. The outer `while` loop always runs `n - 1` times.
```python
    while current_index < len(array):
        ...
        current_index = current_index + 1
```

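The inner `while` loop runs at most `current_index` times: it starts with `other_index = current_index`, decrements it on every swap, and stops once `other_index` reaches zero or the element is already in place. Since `current_index < n`, it never runs more than `n - 1` times. On random input, about half of the preceding elements are larger than the current one, so it runs roughly `current_index / 2` times on average.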
```python
    while other_index > 0 and array[other_index - 1] > array[other_index]:
        ...
        other_index = other_index - 1
```
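Summing over all outer iterations, the inner loop runs at most `1 + 2 + ... + (n - 1) = n(n - 1)/2` times, so Insertion Sort is `O(n^2)` in the worst case, and also on average, where it runs about half as many times. On already sorted input, however, the inner condition fails immediately for every element, so the best case is `O(n)`.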
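
The average-case claim can be made precise with a short derivation. This is a sketch that assumes distinct elements in uniformly random order: each inner-loop iteration performs exactly one swap and removes exactly one inversion (a pair of elements in the wrong order), so the total number of iterations equals the number of inversions in the input, and each of the `n(n-1)/2` pairs is inverted with probability 1/2.

$$
\begin{aligned}
T_{\text{worst}}(n) &= \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} \in O(n^2) \\
\mathbb{E}\left[T_{\text{avg}}(n)\right] &= \frac{1}{2}\binom{n}{2} = \frac{n(n-1)}{4} \in O(n^2)
\end{aligned}
$$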

Here is a table of complexity by case from Wikipedia (checked against my implementation):

| Case            | Complexity                         |
|-----------------|------------------------------------|
| Worst-case time | `O(n^2)` comparisons and swaps     |
| Best-case time  | `O(n)` comparisons, `O(1)` swaps   |
| Average time    | `O(n^2)` comparisons and swaps     |
| Space           | `O(n)` total, `O(1)` auxiliary     |

Auxiliary space is constant because, besides the input, we only store two indices and one temporary value per swap. Total space is `O(n)`, because the input list itself holds `n` references to Python objects.
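
The rows of the table can be checked empirically by counting work instead of timing it. The sketch below uses a hypothetical helper, `count_insertion_sort_work`, that follows the same loop structure as `insertion_sort` but counts one comparison per evaluation of `array[other_index - 1] > array[other_index]` and one swap per inner-loop iteration. It runs on sorted (best case), reverse-sorted (worst case) and random input.

```python
import random

def count_insertion_sort_work(array):
    """Run Insertion Sort on a copy of `array` and return (comparisons, swaps)."""
    array = list(array)
    comparisons = swaps = 0
    for current_index in range(1, len(array)):
        other_index = current_index
        while other_index > 0:
            comparisons += 1  # one evaluation of array[other_index - 1] > array[other_index]
            if array[other_index - 1] <= array[other_index]:
                break
            array[other_index - 1], array[other_index] = array[other_index], array[other_index - 1]
            swaps += 1
            other_index -= 1
    return comparisons, swaps

n = 100
print("sorted:  ", count_insertion_sort_work(range(n)))         # best case, (99, 0)
print("reversed:", count_insertion_sort_work(range(n, 0, -1)))  # worst case, (4950, 4950)

# average case: mean swaps over random permutations, compared with n(n-1)/4
random.seed(1)
trials = [count_insertion_sort_work(random.sample(range(n), n))[1] for _ in range(200)]
print("random:   mean swaps =", sum(trials) / len(trials), "vs n(n-1)/4 =", n * (n - 1) / 4)
```

Sorted input costs only `n - 1` comparisons and no swaps, while reversed input costs exactly `n(n - 1)/2` of each, matching the best- and worst-case rows.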

### Bucket Sort
There are four `for` loops. Let's inspect each one and then look at the space complexity.

The first `for` loop runs `k` times, where `k` is the number of buckets, passed in by the caller as `bucket_count`.
```python
    for k in range(0, bucket_count):
        ...
```

The second loop runs `n` times, placing each element into its bucket.
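
The key step of the second loop is turning a value into a bucket index. The sketch below uses a hypothetical helper, `bucket_index`, and assumes values are mapped linearly from the array's minimum and maximum onto the `k` buckets. It shows that evenly distributed input fills the buckets evenly, while clustered input with the same range piles almost everything into one bucket.

```python
import math
from collections import Counter

def bucket_index(value, min_value, max_value, bucket_count):
    """Hypothetical helper: map `value` linearly from [min_value, max_value]
    onto bucket indices 0 .. bucket_count - 1."""
    if max_value == min_value:
        return 0  # all values equal: a single bucket is enough
    position = math.floor(bucket_count * (value - min_value) / (max_value - min_value))
    return min(position, bucket_count - 1)  # max_value itself would otherwise map to bucket_count

even = list(range(100))         # evenly distributed over [0, 99]
clustered = [1] * 98 + [0, 99]  # same range, but almost every value is 1

print(sorted(Counter(bucket_index(v, 0, 99, 10) for v in even).items()))       # ten buckets of 10
print(sorted(Counter(bucket_index(v, 0, 99, 10) for v in clustered).items()))  # [(0, 99), (9, 1)]
```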

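The third loop runs `k` times and calls Insertion Sort on each bucket. If the elements are evenly distributed, each bucket receives about `n/k` of them, so sorting one bucket costs `O((n/k)^2)` and all buckets together cost `k * (n/k)^2 = n^2/k`. In the worst case, when every element lands in the same bucket, this step degenerates into a single `O(n^2)` Insertion Sort. Appending each sorted bucket to `new_array` should happen in place (e.g. with `list.extend`); rebuilding the list with `+` copies everything accumulated so far on every iteration, adding up to `O(n k)` extra work.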
```python
    for j in range(0, bucket_count):
        SortingUtilities.insertion_sort(buckets[j])
        ...
```
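
To see the `n^2/k` term and its worst case side by side, the sketch below counts the total Insertion Sort swaps across all buckets for evenly distributed and for clustered input. It assumes, as in the textbook formulation, that inputs lie in `[0, 1)`, so the bucket index is simply `int(k * value)`; the helper names are hypothetical.

```python
import random

def insertion_sort_swaps(bucket):
    """Sort `bucket` in place with Insertion Sort and return how many swaps it took."""
    swaps = 0
    for current_index in range(1, len(bucket)):
        other_index = current_index
        while other_index > 0 and bucket[other_index - 1] > bucket[other_index]:
            bucket[other_index - 1], bucket[other_index] = bucket[other_index], bucket[other_index - 1]
            swaps += 1
            other_index -= 1
    return swaps

def bucket_sort_swaps(array, bucket_count):
    """Bucket Sort for values in [0, 1); returns the total Insertion Sort swaps over all buckets."""
    buckets = [[] for _ in range(bucket_count)]
    for value in array:
        # assumes 0 <= value < 1; min() guards against rounding up to bucket_count
        buckets[min(int(bucket_count * value), bucket_count - 1)].append(value)
    return sum(insertion_sort_swaps(bucket) for bucket in buckets)

random.seed(0)
n, k = 1000, 50
uniform = [random.random() for _ in range(n)]              # about n/k = 20 values per bucket
clustered = [random.random() / (2 * k) for _ in range(n)]  # every value falls into bucket 0

print("uniform:  ", bucket_sort_swaps(uniform, k))    # on the order of n^2 / (4k)
print("clustered:", bucket_sort_swaps(clustered, k))  # on the order of n^2 / 4
```

The clustered input does roughly `k` times more work, because with a single non-empty bucket Bucket Sort is doing one plain Insertion Sort.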
The last loop copies the sorted values back into the original array, so the caller's list ends up sorted in place. It runs `n` times and can be dropped if returning a new list is enough.

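So, for evenly distributed input, we can expect a time complexity of `O(k + n + n^2/k)`. `k` can be any integer between one and `n`. With `k = 1`, Bucket Sort is just an Insertion Sort with extra overhead. With `k` proportional to `n`, the `n^2/k` term shrinks to `O(n)` and the whole sort runs in `O(n)`. So using a large number of buckets, on the order of `n`, can be advantageous in terms of time.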
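
To see how the choice of `k` plays out, the sketch below simply evaluates the cost model `k + n + n^2/k` for a few bucket counts and searches for the cheapest one. It is an operation-count model under the even-distribution assumption, not a timing measurement, and the function name `bucket_sort_cost` is hypothetical.

```python
def bucket_sort_cost(n, k):
    """Rough operation count k + n + n^2/k for evenly distributed input."""
    return k + n + n * n / k

n = 10_000
for k in (1, 10, 100, 1_000, 10_000):
    print(f"k = {k:>6}: ~{bucket_sort_cost(n, k):,.0f} operations")

# the k minimising the model, searched over 1 .. 2n
best_k = min(range(1, 2 * n + 1), key=lambda k: bucket_sort_cost(n, k))
print("best k:", best_k)
```

For `n = 10,000` the model drops from about 100 million operations at `k = 1` to 30,000 at `k = n`, and `best_k` comes out as `n`: ignoring constant factors, `k + n^2/k` is minimised at `k = n`, where the total is `3n`.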

Lastly, space complexity is `O(n + k)`: there is a list of `k` buckets, the buckets together hold `n` elements, and `new_array` holds another `n`.

TODO:
* experiment section
* find way to time from example report (notebook does it too)
* data generation
* code diagram
