# _12. Searching and Sorting_

Notebook follows along with the [twelfth video](https://www.youtube.com/watch?v=6LOwPhPDwVc&list=PLUl4u3cNGP63WbdFxL8giv4yhgdMGaZNA&index=39&t=0s) in MIT's 6.0001 Introduction to Computer Science and Programming in Python, Fall 2016.

### _Searching Algorithms_

- linear search
    - brute force search (aka British Museum algorithm)
    - list does not have to be sorted
- bisection search
    - list **MUST be sorted** to give correct answer
    - saw two different implementations of the algorithm

### _Linear Search on Unsorted List: Recap_

In [0]:
def linear_search(L, e):
    found = False
    for i in range(len(L)):
        if e == L[i]: # could return True here to speed up a little,
            found = True # but that doesn't change the worst case
    return found

- must look through all elements to decide it's not there
- _O(len(L))_ for the loop * _O(1)_ to test if `e == L[i]`
- overall complexity is **O(n) - where n is `len(L)`**
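
The comment in the code above mentions returning as soon as a match is found. A minimal sketch of that variant is below; the name `linear_search_early` is hypothetical and not from the lecture. The worst case (element absent) still examines every element, so it stays _O(n)_.

```python
def linear_search_early(L, e):
    """Linear search that stops at the first match (hypothetical variant)."""
    for item in L:
        if item == e:
            return True   # best case: match at index 0, one comparison
    return False          # worst case: e absent, all len(L) elements checked


print(linear_search_early([4, 1, 7, 3], 7))  # True
print(linear_search_early([4, 1, 7, 3], 9))  # False
```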

### _Linear Search on Sorted List: Recap_

In [0]:
def search(L, e):
    for i in range(len(L)):
        if L[i] == e:
            return True
        if L[i] > e:
            return False
    return False
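
To see how much the early exit on a sorted list helps, the sketch below counts how many elements are examined. The function name `search_with_count` and the sample list are illustrative assumptions. Searching for something larger than every element still scans the whole list, so the worst case remains _O(n)_.

```python
def search_with_count(L, e):
    """Linear search on a sorted list; also reports how many elements were examined."""
    examined = 0
    for item in L:
        examined += 1
        if item == e:
            return True, examined
        if item > e:          # everything after this is larger still
            return False, examined
    return False, examined


sorted_list = [2, 4, 6, 8, 10]
print(search_with_count(sorted_list, 3))   # (False, 2): stops at 4
print(search_with_count(sorted_list, 11))  # (False, 5): worst case, scans everything
```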

### _Bisection Search Implementation: Recap_

In [0]:
def bisect_search2(L, e):
    def bisect_search_helper(L, e, low, high):
        if high == low:
            return L[low] == e
        mid = (low + high)//2
        if L[mid] == e:
            return True
        elif L[mid] > e:
            if low == mid: # nothing left to search
                return False
            else:
                return bisect_search_helper(L, e, low, mid - 1)
        else:
            return bisect_search_helper(L, e, mid + 1, high)
    if len(L) == 0:
        return False
    else:
        return bisect_search_helper(L, e, 0, len(L) - 1)
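
The same halving idea can also be written without recursion. This is a sketch only, not one of the two implementations from the lecture; the name `bisect_search_iter` is hypothetical. It keeps the `low`/`high` bounds of the helper above and shrinks them in a loop.

```python
def bisect_search_iter(L, e):
    """Iterative bisection search on a sorted list (hypothetical variant)."""
    low, high = 0, len(L) - 1
    while low <= high:
        mid = (low + high) // 2
        if L[mid] == e:
            return True
        elif L[mid] > e:
            high = mid - 1   # discard the upper half
        else:
            low = mid + 1    # discard the lower half
    return False


data = [1, 2, 3, 5, 6, 7, 13, 18, 25]
print(bisect_search_iter(data, 13))  # True
print(bisect_search_iter(data, 4))   # False
```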

### _Searching a Sorted List -- n is len(L)_

- using linear search, search for an element is _O(n)_
- using binary search, can search for an element in _O(log n)_
    - assuming the list is sorted
- when does it make sense to sort first then search?
    - SORT + _O(log n)_ < _O(n)_ --> SORT < _O(n)_ - _O(log n)_
    - only when sorting is less than _O(n)_, which never happens: a sort must look at every element at least once

### _Amortized Cost_

- why bother sorting first?
- in some cases, may sort a list once then do many searches
- **AMORTIZE cost** of the sort over many searches
- SORT + K*_O(log n)_ < K*_O(n)_
    - for large K, sort time becomes irrelevant, if cost of sorting is small enough
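
A rough worked example of the inequality above. The cost model is an assumption made for illustration: one sort costs n·log2(n) steps, a bisection search log2(n), a linear search n, with all constant factors ignored. The function names are hypothetical.

```python
import math


def linear_total(n, k):
    """Model cost of k linear searches on n elements."""
    return k * n


def sort_then_bisect_total(n, k):
    """Model cost of one n*log2(n) sort followed by k bisection searches."""
    return n * math.log2(n) + k * math.log2(n)


n = 1024  # log2(n) is exactly 10
for k in (1, 10, 11, 100):
    print(k, linear_total(n, k), sort_then_bisect_total(n, k))
```

Under this model, with n = 1024, sorting first starts to pay off at K = 11 searches; for K = 10 or fewer, plain linear search is cheaper.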

### _Monkey Sort_

- aka bogosort, stupid sort, slowsort, permutation sort, shotgun sort

### _Complexity of bogo sort_

In [0]:
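import random

def is_sorted(L): # helper assumed by the notes; not shown in the original cell
    return all(L[i] <= L[i+1] for i in range(len(L) - 1))

def bogo_sort(L):
    while not is_sorted(L):
        random.shuffle(L)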

- best case: _O(n)_ where n is `len(L)` to check if sorted
- worst case: _O(?)_ it is unbounded if really unlucky
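
A small experiment, sketched under assumptions: the helper `is_sorted`, the counter, and the seeded generator are additions for illustration, not part of the lecture code. With distinct elements, each shuffle produces the sorted order with probability 1/n!, so on average roughly n! shuffles are needed even though no fixed bound exists.

```python
import random


def is_sorted(L):
    """True if L is in non-decreasing order."""
    return all(L[i] <= L[i + 1] for i in range(len(L) - 1))


def bogo_sort_count(L, rng):
    """Shuffle L in place until sorted; return the number of shuffles used."""
    shuffles = 0
    while not is_sorted(L):
        rng.shuffle(L)
        shuffles += 1
    return shuffles


rng = random.Random(0)           # seeded so repeated runs behave the same
data = [3, 1, 2, 4]
print(bogo_sort_count(data, rng), data)
```

Keep the input tiny when trying this: 4 elements already have 24 possible orderings, and 10 elements have more than 3.6 million.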

### _Bubble Sort_

- compare consecutive pairs of elements
- swap elements in pair such that smaller is first
- when reach end of list, start over again
- stop when no more swaps have been made
- largest unsorted element always at end after pass, so at most n passes

### _Complexity of Bubble Sort_

In [0]:
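def bubble_sort(L):
    swap = False # despite the name, True means the last pass made no swaps
    while not swap:
        print('bubble sort: ' + str(L))
        swap = True
        for j in range(1, len(L)):
            if L[j-1] > L[j]:
                swap = False # a swap happened, so another pass is needed
                temp = L[j]
                L[j] = L[j-1]
                L[j-1] = temp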

In [12]:
test_list = [1, 3, 5, 7, 2, 6, 25, 18, 13]

print(bubble_sort(test_list))
print(test_list)

bubble sort: [1, 3, 5, 7, 2, 6, 25, 18, 13]
bubble sort: [1, 3, 5, 2, 6, 7, 18, 13, 25]
bubble sort: [1, 3, 2, 5, 6, 7, 13, 18, 25]
bubble sort: [1, 2, 3, 5, 6, 7, 13, 18, 25]
None
[1, 2, 3, 5, 6, 7, 13, 18, 25]


- inner for loop is for doing the comparisons
- outer while loop is for doing multiple passes until no more swaps
- _O(n**2)_ where n is `len(L)` to do len(L)-1 comparisons and len(L)-1 passes
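
Since the largest unsorted element always ends up at the end after a pass, each pass can stop one position earlier. The sketch below is a hypothetical variant (`bubble_sort_counted` is not from the lecture) that uses this and counts passes and comparisons. The total is still quadratic: n(n-1)/2 comparisons on a reversed list.

```python
def bubble_sort_counted(L):
    """Bubble sort in place; returns (passes, comparisons)."""
    passes = comparisons = 0
    end = len(L)                 # elements at index >= end are already in place
    swapped = True
    while swapped and end > 1:
        swapped = False
        passes += 1
        for j in range(1, end):
            comparisons += 1
            if L[j - 1] > L[j]:
                L[j - 1], L[j] = L[j], L[j - 1]
                swapped = True
        end -= 1                 # largest remaining element has bubbled to the end
    return passes, comparisons


worst = [5, 4, 3, 2, 1]          # reverse order forces the most passes
print(bubble_sort_counted(worst), worst)  # (4, 10) [1, 2, 3, 4, 5]
```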

### _Selection Sort_

- first step
    - extract minimum element
    - swap it with element at index 0
- subsequent step
    - in remaining sublist, extract minimum element
    - swap it with the element at index 1
- keep the left portion of the list sorted
    - at `i`'th step, first i elements in list are sorted
    - all other elements are bigger than first i elements

### _Analyzing Selection Sort_

- loop invariant
    - given prefix of list `L[0:i]` and suffix `L[i:len(L)]`, the prefix is sorted and no element in the prefix is larger than the smallest element in the suffix
        1. base case: prefix empty, suffix whole list - invariant true
        2. induction step: move minimum element from suffix to end of prefix. Since invariant true before move, prefix sorted after append
        3. when exit, prefix is entire list, suffix empty, so sorted
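
The invariant can be checked mechanically. This is an illustrative sketch, taking the prefix as `L[:i]` and the suffix as `L[i:]`; the function name and the `assert` checks are assumptions added here, and the built-in `sorted` is used only to verify the prefix, not to do the sorting.

```python
def selection_sort_checked(L):
    """Selection sort in place, asserting the loop invariant before each step."""
    for i in range(len(L)):
        prefix, suffix = L[:i], L[i:]
        assert prefix == sorted(prefix)                      # prefix is sorted
        assert not prefix or not suffix or max(prefix) <= min(suffix)
        # move the minimum of the suffix to the end of the prefix
        min_idx = min(range(i, len(L)), key=L.__getitem__)
        L[i], L[min_idx] = L[min_idx], L[i]


data = [7, 2, 25, 1, 13]
selection_sort_checked(data)
print(data)  # [1, 2, 7, 13, 25]
```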

### _Complexity of Selection Sort_

In [0]:
def selection_sort(L):
    suffixSt = 0
    while suffixSt != len(L):
        for i in range(suffixSt, len(L)):
            if L[i] < L[suffixSt]:
                L[suffixSt], L[i] = L[i], L[suffixSt]
        suffixSt += 1

- outer loop executes `len(L)` times
- inner loop executes `len(L) - suffixSt` times
- complexity of selection sort is _O(n**2) where n is len(L)_
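
Adding up the inner-loop counts gives n + (n-1) + ... + 1 = n(n+1)/2, which is where the quadratic bound comes from. The sketch below instruments the same algorithm with a step counter; the name `selection_sort_steps` and the counter are illustrative additions.

```python
def selection_sort_steps(L):
    """Selection sort in place; returns the number of inner-loop iterations."""
    steps = 0
    suffix_start = 0
    while suffix_start != len(L):
        for i in range(suffix_start, len(L)):
            steps += 1
            if L[i] < L[suffix_start]:
                L[suffix_start], L[i] = L[i], L[suffix_start]
        suffix_start += 1
    return steps


# second and third columns should match: measured steps vs n(n+1)/2
for n in (4, 8, 16):
    print(n, selection_sort_steps(list(range(n, 0, -1))), n * (n + 1) // 2)
```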

### _Merge Sort_

- divide and conquer
- split list in half until have sublists of only 1 element
- merge such that sublists will be sorted after merge

### _Merging Sublists Step_

In [0]:
def merge(left, right):
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right): # left and right sublists are ordered
        if left[i] < right[j]: # move indices for sublists depending on which sublist holds next smallest element
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while (i < len(left)): # when right sublist is empty
        result.append(left[i])
        i += 1
    while (j < len(right)): # when left sublist is empty
        result.append(right[j])
        j += 1
    return result

### _Complexity of Merging Sublists Step_

- go through two lists, only one pass
- compare only smallest elements in each sublist
- _O(len(left) + len(right))_ copied elements
- _O(len(longer list))_ comparisons
- linear in length of the lists
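
To make the counts concrete, here is a sketch of the merge step with a comparison counter. The name `merge_counted` is hypothetical, and the two trailing `while` loops are replaced by `extend` on slices for brevity. Every element is copied exactly once, and the number of comparisons never exceeds `len(left) + len(right) - 1`.

```python
def merge_counted(left, right):
    """Merge two sorted lists; returns (merged list, number of comparisons)."""
    result, comparisons = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])    # at most one of these two is non-empty
    result.extend(right[j:])
    return result, comparisons


print(merge_counted([1, 3, 5, 7], [2, 6]))   # ([1, 2, 3, 5, 6, 7], 5)
print(merge_counted([1, 2], [10, 20, 30]))   # ([1, 2, 10, 20, 30], 2)
```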

### _Merge Sort Algorithm --  Recursive_

In [0]:
def merge_sort(L):
    print(f'merge sort: {str(L)}')
    if len(L) < 2: # base case
        return L[:]
    else:
        middle = len(L)//2
        left = merge_sort(L[:middle]) # divide
        right = merge_sort(L[middle:])
        return merge(left, right) # conquer with the merge step

- divide list successively into halves
- depth-first such that conquer smallest pieces down one branch first before moving to larger pieces

In [20]:
testList = [1,3,5,7,2,6,25,18,13]
print(merge_sort(testList))

merge sort: [1, 3, 5, 7, 2, 6, 25, 18, 13]
merge sort: [1, 3, 5, 7]
merge sort: [1, 3]
merge sort: [1]
merge sort: [3]
merge sort: [5, 7]
merge sort: [5]
merge sort: [7]
merge sort: [2, 6, 25, 18, 13]
merge sort: [2, 6]
merge sort: [2]
merge sort: [6]
merge sort: [25, 18, 13]
merge sort: [25]
merge sort: [18, 13]
merge sort: [18]
merge sort: [13]
[1, 2, 3, 5, 6, 7, 13, 18, 25]


### _Complexity of Merge Sort_

- at first recursion level
    - n/2 elements in each list
    - _O(n)_ + _O(n)_ = _O(n)_ where n is len(L)
- at second recursion level
    - n/4 elements in each list
    - two merges --> _O(n)_ where n is len(L)
- each recursion level is _O(n)_ where n is len(L)
- dividing list in half with each recursive call
    - _O(log(n))_ where n is len(L)
- overall complexity is _O(n log(n))_ where n is len(L)
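
The level-by-level argument can be checked by counting how many elements all the merge steps copy in total. This is a sketch with assumed names (`merge_sort_copies` is not from the lecture), and the merge is inlined so the block is self-contained. For n a power of 2, each of the log2(n) levels copies n elements, so the count comes out to exactly n·log2(n).

```python
import math


def merge_sort_copies(L):
    """Return (sorted copy of L, total elements copied by all merge steps)."""
    if len(L) < 2:
        return L[:], 0
    middle = len(L) // 2
    left, left_copies = merge_sort_copies(L[:middle])
    right, right_copies = merge_sort_copies(L[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged += left[i:] + right[j:]
    # this merge copied len(L) elements, on top of the work done below it
    return merged, left_copies + right_copies + len(L)


# second and third columns should match: measured copies vs n*log2(n)
for n in (8, 64, 1024):
    _, copies = merge_sort_copies(list(range(n, 0, -1)))
    print(n, copies, int(n * math.log2(n)))
```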

### _The Three A's of Computational Thinking_

- abstraction
    - choosing the right abstractions
    - operating in multiple layers of abstraction simultaneously
    - defining the relationships between abstraction layers
- automation
    - think in terms of mechanizing our abstractions
    - mechanization is possible - because we have precise and exacting notations and models; and because there is some "machine" that can interpret our notations
- algorithms
    - language for describing automated processes
    - also allows abstraction of details
    