# Sorting


## Bubble Sort
* makes multiple passes through the list
* on each pass, it compares adjacent items and exchanges the ones that are out of order
* called bubble sort because the largest remaining number "bubbles" to the end of the list with each pass

### Analysis of Bubble Sort
* Time complexity: $O(n^2)$
    * it needs to make n-1 passes to make sure all items are sorted
    * the first pass makes n-1 comparisons, and each later pass makes one fewer, until the last pass makes just 1 comparison
    * So, the total number of comparisons is an arithmetic series. The calculation below shows that it is $O(n^2)$
$$
T(n) = \sum_{i=1}^{n-1}i = \frac{(n-1)(n-1+1)}{2} = \frac{1}{2}n^2 - \frac{1}{2}n = O(n^2)
$$

* Bubble Sort is relatively inefficient, especially because it may exchange an item several times before that item's final location is known
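
As a quick check of the sum in the analysis, the sketch below counts the comparisons made by a bubble sort in which each pass makes one fewer comparison than the one before. The function name `count_bubble_comparisons` is hypothetical and only used for illustration; it sorts a copy so the input is left untouched.

```python
def count_bubble_comparisons(items):
    """Bubble-sort a copy of items and return (sorted_copy, comparisons)."""
    data = list(items)
    comparisons = 0
    # unsorted_len is the length of the part that is not yet in final position
    for unsorted_len in range(len(data), 1, -1):
        for j in range(1, unsorted_len):
            comparisons += 1
            if data[j] < data[j - 1]:
                data[j], data[j - 1] = data[j - 1], data[j]
    return data, comparisons


n = 6
result, comparisons = count_bubble_comparisons([5, 1, 4, 2, 6, 3])
print(result, comparisons, n * (n - 1) // 2)
```

For a list of 6 items the count should match $\frac{n(n-1)}{2} = 15$, regardless of the starting order.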

In [2]:
def bubbleSort(alist: list[int]):
    for i in range(len(alist), 0, -1):
        for j in range(1,i):
            if alist[j] < alist[j-1]:
                temp = alist[j]
                alist[j] = alist[j-1]
                alist[j-1] = temp
    return alist

print(bubbleSort([2,1,45,3]))


[1, 2, 3, 45]


## Selection Sort
* makes n-1 passes over a list of size n
* on pass i (counting from 1), it searches the unsorted sublist alist[:n-i+1], finds the largest number, and exchanges it into position alist[-i]

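### Analysis
* Time Complexity: $O(n^2)$
    * Each pass compares the items of the unsorted sublist to find the largest, and the sublist shrinks by one item per pass, so the total number of comparisons can be written as an arithmetic series
    * Unlike bubble sort, it makes at most one exchange per pass
$$
T(n) = \sum_{i=1}^{n-1}i = \frac{(n-1)(n-1+1)}{2} = \frac{1}{2}n^2 - \frac{1}{2}n = O(n^2)
$$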
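
To make the comparison count concrete, here is a small sketch that counts both comparisons and exchanges during a selection sort. The name `selection_sort_counts` is hypothetical, and the function works on a copy of the input; it is an illustration, not the notebook's implementation.

```python
def selection_sort_counts(items):
    """Selection-sort a copy of items; return (sorted_copy, comparisons, exchanges)."""
    data = list(items)
    comparisons = exchanges = 0
    # `last` is the final position of the unsorted part on this pass
    for last in range(len(data) - 1, 0, -1):
        greatest = 0
        for j in range(1, last + 1):
            comparisons += 1
            if data[j] > data[greatest]:
                greatest = j
        if greatest != last:
            data[last], data[greatest] = data[greatest], data[last]
            exchanges += 1
    return data, comparisons, exchanges


print(selection_sort_counts([5, 2, 1, 4, 3]))
```

The number of comparisons is always $\frac{n(n-1)}{2}$, while the number of exchanges never exceeds n-1.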

In [10]:
def selectionSort(alist: list[int]):
    for i in range(len(alist)-1,0,-1):
        greatest = i
        for j in range(i+1): 
            if alist[j] > alist[greatest]:
                greatest = j
        temp = alist[i]
        alist[i] = alist[greatest]
        alist[greatest] = temp
    return alist

print(selectionSort([5,2,1,4,3]))

[1, 2, 3, 4, 5]


## Insertion Sort
* makes n-1 passes
* On each pass i, it only looks at the sublist alist[:i+1], whose first i items are already sorted
* In that sublist, it finds the position pos where alist[i] belongs
* after finding pos, it shifts every item from pos through i-1 one place to the right, then inserts the original alist[i] at pos
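
The shifting step described above can also be done in place, without building a new list on each pass. The sketch below is one common way to write it; note that it scans from the right end of the sorted sublist rather than from the left, and the name `insertion_sort_in_place` is hypothetical.

```python
def insertion_sort_in_place(alist):
    """Sort alist in place by shifting larger items right, then return it."""
    for i in range(1, len(alist)):
        current = alist[i]
        pos = i
        # Shift items larger than `current` one place to the right
        # until the slot where `current` belongs is found.
        while pos > 0 and alist[pos - 1] > current:
            alist[pos] = alist[pos - 1]
            pos -= 1
        alist[pos] = current
    return alist


print(insertion_sort_in_place([54, 26, 93, 17, 77, 31, 44, 55, 20]))
```

Because the search and the shift happen in the same loop, a pass over an item that is already in place costs only one comparison.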

### Analysis
* Time Complexity: $O(n^2)$
    * On pass i, finding pos and shifting items takes up to i steps, so the total work can be written as an arithmetic series
$$
T(n) = \sum_{i=1}^{n-1}i = \frac{(n-1)(n-1+1)}{2} = \frac{1}{2}n^2 - \frac{1}{2}n = O(n^2)
$$

In [49]:
def insertionSort(alist: list[int]):
    for end in range(1,len(alist)):
        for pos in range(end):
            if alist[end] < alist[pos]:
                alist = alist[:pos] + [alist[end]] + alist[pos:end] + alist[end+1:]
                break
    return alist

print(insertionSort([54,26,93,17,77,31,44,55,20]))

[17, 20, 26, 31, 44, 54, 55, 77, 93]


## Shell Sort
1. for each iteration, it creates partitions of the list, starting with gap=$\lfloor\frac{n}{2}\rfloor$. Each partition is made up of the positions that are a gap apart.
    * e.g. [4,5,1,6,2,7,3] -> gap=3: [4,6,3],[5,2],[1,7]
1. it then sorts only the items within the same partition, using insertion sort
    * e.g. [4,6,3],[5,2],[1,7] -> [3,4,6],[2,5],[1,7]
1. When sorting within a partition, we are actually sorting the original list by switching the positions of the items that belong to that partition. So the list would actually end up like below
    * e.g. [3,4,6],[2,5],[1,7] -> [3,2,1,4,5,7,6]
1. On the next pass, gap=$\lfloor\frac{n}{4}\rfloor$. We keep halving the gap until gap=1. At that point, we just do a simple insertion sort for one pass
    * [3,2,1,4,5,7,6] -> gap=1
    * [1,2,3,4,5,7,6]
    * [1,2,3,4,5,6,7]
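
The partitioning in the steps above can be reproduced with extended slices, since `alist[start::gap]` picks out exactly the positions that are a gap apart. In this sketch the helper name `gap_pass` is hypothetical, and the built-in `sorted()` stands in for the per-partition insertion sort to keep the example short.

```python
def gap_pass(alist, gap):
    """Sort every gap-separated partition of alist in place.

    Returns the partitions as they were before sorting.
    """
    partitions = []
    for start in range(gap):
        partition = alist[start::gap]
        partitions.append(partition)
        # sorted() is a stand-in for insertion sort on this partition
        alist[start::gap] = sorted(partition)
    return partitions


data = [4, 5, 1, 6, 2, 7, 3]
print(gap_pass(data, 3))  # the three partitions for gap=3
print(data)               # the list after the gap=3 pass
gap_pass(data, 1)
print(data)               # the list after the final gap=1 pass
```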


### Analysis
* Time Complexity: 
    * Best Case: $O(n\log n)$; applies when the list is already sorted or nearly sorted
        * The number of gap values is about $\log_2(n)$, because the while loop in shellSort() halves "gap" on every iteration until it reaches 0
        * For each gap value, the calls to shellSorter do at least about n steps of work in total
            * even if the inner while loop never runs, we still have to iterate over the for loop
            * if gap=1 and start=0, the for loop iterates over every item in the list
            * for gap > 1, each call visits only about n/gap items, but there are gap calls (one per start), so the total is still about n
    * Average Case: often quoted as roughly $O(n^{3/2})$ (or more loosely, somewhere between $O(n)$ and $O(n^2)$)
        * This figure comes from observing how much work typical inputs need and depends on the gap sequence; it is not derived here
    * Worst Case: $O(n^2)$; applies when the passes with larger gaps make little to no progress, so the final gap=1 pass still has to move most of the elements
        * in shellSorter, if the while loop has to walk back through all the earlier items of the sublist on every iteration of the for loop, the work becomes an arithmetic series, which is $O(n^2)$
* Space Complexity: $O(1)$
  * no new arrays are created because all changes are made directly to the list
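
To see why the number of gap values grows like $\log_2(n)$, the sketch below lists the gaps produced by repeated halving with integer division. The function name `gap_sequence` is hypothetical; the halving rule is the same one used in the analysis above.

```python
import math


def gap_sequence(n):
    """Return the gaps n//2, n//4, ... down to 1 for a list of length n."""
    gaps = []
    gap = n // 2
    while gap > 0:
        gaps.append(gap)
        gap //= 2
    return gaps


for n in (7, 9, 16, 1000):
    # compare the number of gaps with floor(log2(n))
    print(n, gap_sequence(n), math.floor(math.log2(n)))
```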

In [1]:
## Implementation Shell Sort
def shellSorter(alist: list[int], start: int, gap: int):
    # gapped insertion sort over positions start, start+gap, start+2*gap, ...
    for pos in range(start+gap,len(alist), gap):
        current = pos
        # move the item back through its partition until the item before it is not larger
        while current >= gap and alist[current] < alist[current-gap]:
            alist[current], alist[current-gap] = alist[current-gap], alist[current]
            current = current-gap
    return alist

def shellSort(alist: list[int]):
    gap = len(alist)//2
    while gap > 0:
        for start in range(gap):
            shellSorter(alist, start, gap)
        gap = gap//2
    return alist

print(shellSort([54,26,93,17,77,31,44,55,20]))

[17, 20, 26, 31, 44, 54, 55, 77, 93]


## Merge Sort
* uses recursion to sort the list
* base case: len(alist)=N<=1, return alist
* recursive case: len(alist)=N>1, split the list in half before feeding each half back into a recursive call
* recursive call: a,b = mergeSort(alist[:N//2]), mergeSort(alist[N//2:])
    * after the recursive calls, a and b are each sorted, but you still need to merge them into one sorted list
    * you can do that with a modified insertion step: simply iterate over b until you find an element greater than a[0]. Then remove a[0] from a and insert it into b before that position (merging them)
    * we merge a into b because b is never the smaller sublist (it gets the extra item when N is odd)
    * keep doing the above until you have exhausted all elements in a
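
The merge step is more commonly written with one index into each sorted half instead of repeated insertions. The sketch below shows that two-index variant on its own; the helper name `merge` is hypothetical, and it assumes both inputs are already sorted.

```python
def merge(a, b):
    """Merge two already-sorted lists into a new sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front item of the two lists.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of these still has items left; they are already in order.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


print(merge([17, 26, 54, 93], [20, 31, 44, 55, 77]))
```

Each item is appended exactly once, so merging two halves with n items in total takes about n steps.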

### Analysis
* Time Complexity: $O(n\log(n))$
    * Each time you make a recursive call, you are halving the list 
    * meaning the height of the recursion stack ends up being about $\log_2(n)$
    * and at each layer of the stack, you merge n items in total across all the sublists
    * therefore, the total amount of work is $O(n\log(n))$
* Space Complexity: $O(n)$
    * every layer of the recursion creates new lists with n items in total, so over the whole run we allocate $O(n\log(n))$ items
    * but only one branch of the recursion is active at a time, so the lists alive at once total $O(n)$ (roughly $n + \frac{n}{2} + \frac{n}{4} + \dots$)
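
The time bound can also be read off the recurrence for the running time. The derivation below is a sketch under two simplifying assumptions that are not part of the notes above: n is a power of two, and merging n items costs $cn$ for some constant $c$.

$$
\begin{aligned}
T(n) &= 2\,T\!\left(\tfrac{n}{2}\right) + cn \\
     &= 4\,T\!\left(\tfrac{n}{4}\right) + 2cn \\
     &= 2^{k}\,T\!\left(\tfrac{n}{2^{k}}\right) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= O(n\log n)
\end{aligned}
$$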

In [35]:
# Implementation of merge sort (not space efficient, but simple to remember)

def mergeSort(alist):
    if len(alist) <= 1:
        return alist
    midpoint = len(alist)//2
    left,right = mergeSort(alist[:midpoint]), mergeSort(alist[midpoint:])
    newlist = []
    while left and right:
        if left[0] <= right[0]:
            newlist.append(left.pop(0))
        else:
            newlist.append(right.pop(0))
    return newlist + left + right


print(mergeSort([54,26,93,17,77,31,44,55,20]))


[17, 20, 26, 31, 44, 54, 55, 77, 93]


In [42]:
# implementation of a more space efficient mergesort (merges back into alist instead of building a new list)
def mergeSort(alist):
    if len(alist) <= 1:
        return alist
    midpoint = len(alist)//2
    left,right = alist[:midpoint], alist[midpoint:]

    # sort the two copies in place
    mergeSort(left)
    mergeSort(right)
    
    # merge the sorted halves back into alist
    leftindex = 0
    rightindex = 0
    for pos in range(len(alist)):
        if (rightindex >= len(right)) or (leftindex < len(left) and left[leftindex] <= right[rightindex]):
            alist[pos] = left[leftindex]
            leftindex += 1
        else:
            alist[pos] = right[rightindex]
            rightindex += 1
    return alist


print(mergeSort([54,26,93,17,77,31,44,55,20]))

[17, 20, 26, 31, 44, 54, 55, 77, 93]

## Quick Sort
* also a divide and conquer technique, similar to merge sort
* doesn't need additional lists because instead of splitting the list into copies, we partition it in place around a pivot
* The pivot starts at pivot=0 for list `alist` of length n
* we have a leftmark that starts at i_left=pivot+1, and a rightmark that starts at i_right=n-1
* i_left increases until alist[i_left] > alist[pivot], and i_right decreases until alist[i_right] < alist[pivot]
    * when both are true, switch alist[i_left] and alist[i_right]
* keep going until the marks meet or cross (i_right <= i_left). At that point i_right is the split point.
* if alist[pivot] > alist[i_right], switch them (when i_right=split_point)
* two sublists are created: alist[:i_right] and alist[i_left:]
    * alist[:i_right] contains only values less than or equal to the pivot value
    * alist[i_left:] contains only values greater than or equal to the pivot value
* then you recursively call quickSort on the two sublists
    * base case: start >= end, return alist without sorting
        * applies when the list has 1 or fewer items
        * also applies when a sublist we're quicksorting has 1 or fewer items (ends the recursion)
    * recursive case: when a sublist left after one round of partitioning has length greater than 1
    * recursive call:
        * quickSort(alist, start, rightmark-1)
        * quickSort(alist, leftmark, end)
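
The partitioning step on its own can be sketched as below. This is one common way to write it, not the notebook's exact loop: each mark is advanced by its own inner while loop, and the pivot is always moved to the split point at the end. The function name `partition` is hypothetical.

```python
def partition(alist, start, end):
    """Partition alist[start:end+1] around alist[start]; return the split point."""
    pivot_value = alist[start]
    leftmark, rightmark = start + 1, end
    while True:
        # advance leftmark past items that belong on the left side
        while leftmark <= rightmark and alist[leftmark] <= pivot_value:
            leftmark += 1
        # retreat rightmark past items that belong on the right side
        while rightmark >= leftmark and alist[rightmark] >= pivot_value:
            rightmark -= 1
        if rightmark < leftmark:
            break
        alist[leftmark], alist[rightmark] = alist[rightmark], alist[leftmark]
    # put the pivot into its final position
    alist[start], alist[rightmark] = alist[rightmark], alist[start]
    return rightmark


data = [54, 26, 93, 17, 77, 31, 44, 55, 20]
split = partition(data, 0, len(data) - 1)
print(split, data)
```

After the call, everything to the left of the split point is no larger than the pivot value and everything to its right is no smaller, so the two sides can be sorted independently.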

### Analysis
* Time Complexity: variable
    * Best case: $O(n\log(n))$
        * if the split point is always near the middle, the list gets roughly divided in half on each pass
        * so the recursion stack is about $\log_2(n)$ deep
        * and on each layer, you make about n comparisons in total
        * so $O(n\log(n))$
    * Worst case: $O(n^2)$
        * the split point always lands at the beginning or end of the list, e.g. when the first item is the pivot and the list is already sorted or reverse sorted
        * so the recursion stack is more or less n deep
        * and on each layer you make up to n comparisons
        * so $O(n^2)$
* Space Complexity:
    * Average Case: $O(\log(n))$
        * the sorting is done in place, so the extra space is the recursion stack
        * each time we call quickSort recursively, we are dividing the list at the split point
        * assuming that on average the list and sublists are divided roughly in half, the recursion depth is $O(\log(n))$
    * Worst Case: $O(n)$
        * the recursion depth is about n in the worst-case partitioning scenario: the split point repeatedly lands at the beginning or end of the list
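
The difference between the average and worst case can be observed by measuring the recursion depth directly. The sketch below is a simplified model, not the in-place algorithm: it partitions with list comprehensions around the first item and only reports how deep the recursion goes. The name `partition_depth` and the fixed random seed are assumptions made for this illustration.

```python
import random


def partition_depth(items):
    """Depth of nested partitioning when the first item is always the pivot."""
    if len(items) <= 1:
        return 0
    pivot = items[0]
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]
    return 1 + max(partition_depth(smaller), partition_depth(larger))


n = 50
already_sorted = list(range(n))
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)

print("sorted input depth:  ", partition_depth(already_sorted))
print("shuffled input depth:", partition_depth(shuffled))
```

On the sorted input every partition is as lopsided as possible, so the depth is n-1; on the shuffled input it should come out much closer to $\log_2(n)$.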
        



In [13]:
def quickSort(alist: list[int], start = 0, end = None):
    if end is None:
        end = len(alist)-1
    if start >= end:
        return alist

    pivot = start
    leftmark = pivot+1
    rightmark = end

    while rightmark > leftmark:
        if alist[rightmark] < alist[pivot] and alist[leftmark] > alist[pivot]:
            temp = alist[rightmark]
            alist[rightmark] = alist[leftmark]
            alist[leftmark] = temp
        if alist[rightmark] >= alist[pivot]:
            rightmark -= 1
        if alist[leftmark] <= alist[pivot]:
            leftmark += 1

    if alist[rightmark] < alist[pivot]:
        temp = alist[rightmark]
        alist[rightmark] = alist[pivot]
        alist[pivot] = temp
    
    quickSort(alist, start, rightmark-1)
    quickSort(alist, leftmark, end)
    return alist

print(quickSort([54,26,93,17,77,31,44,55,20]))


[17, 20, 26, 31, 44, 54, 55, 77, 93]
