<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Algorithm-complexity" data-toc-modified-id="Algorithm-complexity-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Algorithm complexity</a></span><ul class="toc-item"><li><span><a href="#Search-algorithm" data-toc-modified-id="Search-algorithm-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Search algorithm</a></span><ul class="toc-item"><li><span><a href="#Linear-search" data-toc-modified-id="Linear-search-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Linear search</a></span></li><li><span><a href="#Binary-search" data-toc-modified-id="Binary-search-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>Binary search</a></span></li></ul></li></ul></li><li><span><a href="#Sorting-algorithms" data-toc-modified-id="Sorting-algorithms-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Sorting algorithms</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Bubble-sort" data-toc-modified-id="Bubble-sort-2.0.1"><span class="toc-item-num">2.0.1&nbsp;&nbsp;</span>Bubble sort</a></span></li><li><span><a href="#Selection-sort" data-toc-modified-id="Selection-sort-2.0.2"><span class="toc-item-num">2.0.2&nbsp;&nbsp;</span>Selection sort</a></span></li><li><span><a href="#Insertion-sort" data-toc-modified-id="Insertion-sort-2.0.3"><span class="toc-item-num">2.0.3&nbsp;&nbsp;</span>Insertion sort</a></span></li><li><span><a href="#Merge-Sort" data-toc-modified-id="Merge-Sort-2.0.4"><span class="toc-item-num">2.0.4&nbsp;&nbsp;</span>Merge Sort</a></span></li><li><span><a href="#Quick-sort" data-toc-modified-id="Quick-sort-2.0.5"><span class="toc-item-num">2.0.5&nbsp;&nbsp;</span>Quick sort</a></span></li></ul></li></ul></li></ul></div>

Discussion and implementation of common abstract data structures and algorithms.

Content is heavily derived from: http://interactivepython.org/runestone/static/pythonds/index.html

In [35]:
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt
%matplotlib inline

# Algorithm complexity

`Big O notation` is used in computer science to describe how the execution time or the space used by an algorithm grows with the size of its input. It gives an upper bound on that growth and is most often quoted for the worst-case scenario.

To know more: http://interactivepython.org/runestone/static/pythonds/AlgorithmAnalysis/BigONotation.html

![selection](../imgs/bigo.png)

<img src="http://interactivepython.org/runestone/static/pythonds/_images/newplot.png">

Let's understand this through two different implementations of a search algorithm.

## Search algorithm

### Linear search

In [93]:
def linear_search(l, target):
    for e in l:
        if e == target:
            return True
    return False

In [94]:
l = np.arange(1000)

In [95]:
%%timeit
linear_search(l,999)

265 µs ± 5.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Time scales linearly with n. So Big-O is $O(n)$

### Binary search

Iterative algorithm (the input must already be sorted):

In [96]:
def binarySearchIterative(a, t):
    upper = len(a) - 1
    lower = 0
    while lower <= upper:
        middle = (lower + upper) // 2
        if t == a[middle]:
            return True
        else:
            if t < a[middle]:
                upper = middle - 1
            else:
                lower = middle + 1
    return False

In [97]:
l = np.arange(1000)

In [98]:
%%timeit
binarySearchIterative(l,999)

9.05 µs ± 419 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Time scales logarithmically with n, because every iteration halves the remaining search range. So Big-O is $O(\log n)$

For this 1000-element array, binary search is almost 30x faster than linear search (about 9 µs versus 265 µs).
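
Wall-clock timings depend on the machine, so a complementary view is to count how many elements each search inspects. The following is a minimal sketch, not part of the original notebook: the helper names `count_linear_steps` and `count_binary_steps` are hypothetical, and plain Python lists are assumed instead of NumPy arrays.

```python
def count_linear_steps(items, target):
    """Return how many elements a linear scan inspects before stopping."""
    steps = 0
    for e in items:
        steps += 1
        if e == target:
            break
    return steps


def count_binary_steps(items, target):
    """Return how many midpoints binary search inspects (items must be sorted)."""
    lower, upper = 0, len(items) - 1
    steps = 0
    while lower <= upper:
        steps += 1
        middle = (lower + upper) // 2
        if items[middle] == target:
            break
        if target < items[middle]:
            upper = middle - 1
        else:
            lower = middle + 1
    return steps


for n in (1_000, 10_000, 100_000):
    data = list(range(n))
    target = n - 1  # last element: the worst case for the linear scan
    print(n, count_linear_steps(data, target), count_binary_steps(data, target))
```

For the 1000-element case the linear scan inspects 1000 elements while binary search inspects 10. Multiplying n by 10 multiplies the linear count by 10 but adds only a few steps to the binary count.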

Binary search can also be written recursively:

In [100]:
def binarySearchRecursive(a, t):
    upper = len(a) - 1
    lower = 0
    if upper >= 0:
        middle = (lower + upper) // 2
        if t == a[middle]: return True
        if t < a[middle]: return binarySearchRecursive(a[:middle], t)
        else: return binarySearchRecursive(a[middle + 1:], t)
    return False

In [101]:
%%timeit
binarySearchRecursive(l,999)

15.6 µs ± 517 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
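
The recursive version above passes a slice to every call. With NumPy arrays a basic slice is a view, but with plain Python lists each slice copies its elements. As a hedged alternative, the sketch below recurses on index bounds instead of slices; the function name `binary_search_recursive_idx` and its `lower`/`upper` parameters are assumptions introduced here for illustration.

```python
def binary_search_recursive_idx(a, t, lower=0, upper=None):
    """Recursive binary search over a[lower..upper]; a must be sorted ascending."""
    if upper is None:
        upper = len(a) - 1  # first call: search the whole sequence
    if lower > upper:
        return False  # empty range: target is not present
    middle = (lower + upper) // 2
    if a[middle] == t:
        return True
    if t < a[middle]:
        return binary_search_recursive_idx(a, t, lower, middle - 1)
    return binary_search_recursive_idx(a, t, middle + 1, upper)


data = list(range(1000))
print(binary_search_recursive_idx(data, 999))   # True
print(binary_search_recursive_idx(data, 1000))  # False
```

Because only two integers change between calls, no copies of the data are made, and the recursion depth stays around $\log_2 n$.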


# Sorting algorithms

Source: http://interactivepython.org/runestone/static/pythonds/SortSearch/toctree.html

### Bubble sort

![bubble](../imgs/bubblepass.png)

$$\text{Complexity: } O(n^2)$$

A bubble sort is often considered the most inefficient sorting method since it must exchange items before the final location is known. These “wasted” exchange operations are very costly. However, because the bubble sort makes passes through the entire unsorted portion of the list, it has the capability to do something most sorting algorithms cannot. In particular, if during a pass there are no exchanges, then we know that the list must be sorted. A bubble sort can be modified to stop early if it finds that the list has become sorted. This means that for lists that require just a few passes, a bubble sort may have an advantage in that it will recognize the sorted list and stop.

In [589]:
l = [1,2,3,4,32,5,5,66,33,221,34,23,12]

In [590]:
def bubblesort(nums):
    n = len(nums)
    exchange_cnt = 1
    while exchange_cnt > 0:
        exchange_cnt = 0
        for i in range(1, n):
            if nums[i] < nums[i - 1]:
                exchange_cnt += 1
                nums[i - 1], nums[i] = nums[i], nums[i - 1]
        print(nums)
    return nums

In [591]:
bubblesort(l)

[1, 2, 3, 4, 5, 5, 32, 33, 66, 34, 23, 12, 221]
[1, 2, 3, 4, 5, 5, 32, 33, 34, 23, 12, 66, 221]
[1, 2, 3, 4, 5, 5, 32, 33, 23, 12, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 32, 23, 12, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 23, 12, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]


[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]
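
The early-stop idea described in this section can also be combined with shrinking the scanned range, since after each pass the largest remaining item is already in its final position. The following is a sketch under that assumption; `bubble_sort_passes` is a hypothetical helper that returns the number of passes so the early exit is visible.

```python
def bubble_sort_passes(nums):
    """Sort nums in place and return the number of passes that were needed."""
    passes = 0
    unsorted_end = len(nums)  # items at index >= unsorted_end are in final position
    swapped = True
    while swapped and unsorted_end > 1:
        swapped = False
        for i in range(1, unsorted_end):
            if nums[i] < nums[i - 1]:
                nums[i - 1], nums[i] = nums[i], nums[i - 1]
                swapped = True
        unsorted_end -= 1  # the largest unsorted item has bubbled to the end
        passes += 1
    return passes


print(bubble_sort_passes([1, 2, 3, 4, 5]))  # 1: a single pass finds no exchanges
print(bubble_sort_passes([5, 4, 3, 2, 1]))  # 4: reversed input needs n - 1 passes
```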

### Selection sort

![selection](../imgs/selectionsort.png)

$$\text{Complexity: } O(n^2)$$

The selection sort improves on the bubble sort by making only one exchange for every pass through the list. In order to do this, a selection sort looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location. As with a bubble sort, after the first pass, the largest item is in the correct place. After the second pass, the next largest is in place. This process continues and requires n−1 passes to sort n items, since the final item must be in place after the (n−1)st pass.

In [46]:
l = [1,2,3,4,32,5,5,66,33,221,34,23,12]

In [39]:
def selectionSort(l):
    n = len(l)
    end = n - 1
    for j in range(n):
        max_ = l[-1 - j]
        max_idx = -1 - j
        for i in range(end):
            if l[i] > max_:
                max_ = l[i]
                max_idx = i
            else:
                continue
        l[-1 - j], l[max_idx] = l[max_idx], l[-1 - j]
        end = end - 1
        print(l)
    return l

In [40]:
selectionSort(l)

[1, 2, 3, 4, 32, 5, 5, 66, 33, 12, 34, 23, 221]
[1, 2, 3, 4, 32, 5, 5, 23, 33, 12, 34, 66, 221]
[1, 2, 3, 4, 32, 5, 5, 23, 33, 12, 34, 66, 221]
[1, 2, 3, 4, 32, 5, 5, 23, 12, 33, 34, 66, 221]
[1, 2, 3, 4, 12, 5, 5, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 12, 5, 5, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]


[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]

The benefit of selection sort over bubble sort is that it makes at most one exchange per pass, whereas bubble sort can make many exchanges in a single pass. Both still perform $O(n^2)$ comparisons.
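
To make the difference in exchanges concrete, the sketch below counts swaps for both algorithms on the list used in this section. The helper names `bubble_sort_swaps` and `selection_sort_swaps` are hypothetical; both work on a copy of the input, and the selection variant skips the exchange when the maximum is already in place.

```python
def bubble_sort_swaps(nums):
    """Return (sorted copy, number of exchanges) using bubble sort."""
    nums = list(nums)
    swaps = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, len(nums)):
            if nums[i] < nums[i - 1]:
                nums[i - 1], nums[i] = nums[i], nums[i - 1]
                swaps += 1
                swapped = True
    return nums, swaps


def selection_sort_swaps(nums):
    """Return (sorted copy, number of exchanges) using selection sort."""
    nums = list(nums)
    swaps = 0
    for end in range(len(nums) - 1, 0, -1):
        max_idx = 0
        for i in range(1, end + 1):
            if nums[i] >= nums[max_idx]:  # >= keeps the rightmost maximum
                max_idx = i
        if max_idx != end:  # at most one exchange per pass
            nums[end], nums[max_idx] = nums[max_idx], nums[end]
            swaps += 1
    return nums, swaps


data = [1, 2, 3, 4, 32, 5, 5, 66, 33, 221, 34, 23, 12]
print("bubble swaps:   ", bubble_sort_swaps(data)[1])
print("selection swaps:", selection_sort_swaps(data)[1])
```

Selection sort can never make more than n−1 exchanges, while the number of bubble sort exchanges equals the number of out-of-order pairs in the input.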

### Insertion sort

![insertion](../imgs/insertionsort.png)

$$\text{Complexity: } O(n^2)$$

In [605]:
l = [1,2,3,4,32,5,5,66,33,221,34,23,12]

In [606]:
def insertionSort(l):
    for i in range(1, len(l)):
        cval = l[i]
        pos = i
        while pos > 0 and l[pos - 1] > cval:
            l[pos],l[pos-1] = l[pos - 1],l[pos]
            pos = pos - 1
        print(l)
    return l

In [607]:
insertionSort(l)

[1, 2, 3, 4, 32, 5, 5, 66, 33, 221, 34, 23, 12]
[1, 2, 3, 4, 32, 5, 5, 66, 33, 221, 34, 23, 12]
[1, 2, 3, 4, 32, 5, 5, 66, 33, 221, 34, 23, 12]
[1, 2, 3, 4, 32, 5, 5, 66, 33, 221, 34, 23, 12]
[1, 2, 3, 4, 5, 32, 5, 66, 33, 221, 34, 23, 12]
[1, 2, 3, 4, 5, 5, 32, 66, 33, 221, 34, 23, 12]
[1, 2, 3, 4, 5, 5, 32, 66, 33, 221, 34, 23, 12]
[1, 2, 3, 4, 5, 5, 32, 33, 66, 221, 34, 23, 12]
[1, 2, 3, 4, 5, 5, 32, 33, 66, 221, 34, 23, 12]
[1, 2, 3, 4, 5, 5, 32, 33, 34, 66, 221, 23, 12]
[1, 2, 3, 4, 5, 5, 23, 32, 33, 34, 66, 221, 12]
[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]


[1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]

### Merge Sort

![merge](../imgs/mergesort.png)

![merge1](../imgs/mergesortB.png)

$$\text{Complexity: } O(n \log n)$$
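
This bound can be motivated with a recurrence. As a sketch, assume $n$ is a power of two and that splitting and merging a list of length $n$ costs at most $cn$ for some constant $c$ (the symbols $T$ and $c$ are introduced here for illustration only):

$$T(n) = 2\,T(n/2) + cn, \qquad T(1) = c$$

$$T(n) = cn \log_2 n + cn = O(n \log n)$$

The list is halved $\log_2 n$ times, and every level of the recursion does $cn$ work in total, which is where the product $n \log n$ comes from.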

In [1]:
l = [1,2,3,4,32,5,5,66,33,221,34,23,12]

In [2]:
def mergeSort(alist):
    print("Splitting ", alist)
    if len(alist) > 1:
        mid = len(alist) // 2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        mergeSort(lefthalf)
        mergeSort(righthalf)

        i = 0
        j = 0
        k = 0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                alist[k] = lefthalf[i]
                i = i + 1
            else:
                alist[k] = righthalf[j]
                j = j + 1
            k = k + 1

        while i < len(lefthalf):
            alist[k] = lefthalf[i]
            i = i + 1
            k = k + 1

        while j < len(righthalf):
            alist[k] = righthalf[j]
            j = j + 1
            k = k + 1
    print("Merging ", alist)

In [3]:
mergeSort(l)

Splitting  [1, 2, 3, 4, 32, 5, 5, 66, 33, 221, 34, 23, 12]
Splitting  [1, 2, 3, 4, 32, 5]
Splitting  [1, 2, 3]
Splitting  [1]
Merging  [1]
Splitting  [2, 3]
Splitting  [2]
Merging  [2]
Splitting  [3]
Merging  [3]
Merging  [2, 3]
Merging  [1, 2, 3]
Splitting  [4, 32, 5]
Splitting  [4]
Merging  [4]
Splitting  [32, 5]
Splitting  [32]
Merging  [32]
Splitting  [5]
Merging  [5]
Merging  [5, 32]
Merging  [4, 5, 32]
Merging  [1, 2, 3, 4, 5, 32]
Splitting  [5, 66, 33, 221, 34, 23, 12]
Splitting  [5, 66, 33]
Splitting  [5]
Merging  [5]
Splitting  [66, 33]
Splitting  [66]
Merging  [66]
Splitting  [33]
Merging  [33]
Merging  [33, 66]
Merging  [5, 33, 66]
Splitting  [221, 34, 23, 12]
Splitting  [221, 34]
Splitting  [221]
Merging  [221]
Splitting  [34]
Merging  [34]
Merging  [34, 221]
Splitting  [23, 12]
Splitting  [23]
Merging  [23]
Splitting  [12]
Merging  [12]
Merging  [12, 23]
Merging  [12, 23, 34, 221]
Merging  [5, 12, 23, 33, 34, 66, 221]
Merging  [1, 2, 3, 4, 5, 5, 12, 23, 32, 33, 34, 66, 221]

### Quick sort

![quick](../imgs/quicksort.png)

$$\text{Average case: } O(n \log n)$$ $$\text{Worst case: } O(n^2)$$

In [57]:
def quickSort(alist):
    quickSortHelper(alist, 0, len(alist) - 1)


def quickSortHelper(alist, first, last):
    if first < last:

        splitpoint = partition(alist, first, last)

        quickSortHelper(alist, first, splitpoint - 1)
        quickSortHelper(alist, splitpoint + 1, last)


def partition(alist, first, last):
    pivotvalue = alist[first]

    leftmark = first + 1
    rightmark = last

    done = False
    while not done:

        while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
            leftmark = leftmark + 1

        while alist[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark = rightmark - 1

        if rightmark < leftmark:
            done = True
        else:
            temp = alist[leftmark]
            alist[leftmark] = alist[rightmark]
            alist[rightmark] = temp

    temp = alist[first]
    alist[first] = alist[rightmark]
    alist[rightmark] = temp

    return rightmark


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quickSort(alist)
print(alist)

[17, 20, 26, 31, 44, 54, 55, 77, 93]
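
The $O(n^2)$ worst case appears when the pivot is always the smallest or largest item, which happens with a first-element pivot on input that is already sorted. The sketch below counts comparisons to show this. It is an illustration only: `quicksort_comparisons` is a hypothetical helper, it uses a simpler single-scan (Lomuto-style) partition rather than the two-marker partition above, and it replaces recursion with an explicit stack so that sorted input does not hit Python's recursion limit.

```python
import random


def quicksort_comparisons(items):
    """Sort a copy of items with a first-element pivot; return (sorted, comparisons)."""
    items = list(items)
    comparisons = 0
    stack = [(0, len(items) - 1)]  # ranges still to be sorted
    while stack:
        first, last = stack.pop()
        if first >= last:
            continue
        pivot = items[first]
        boundary = first  # items[first+1 .. boundary] are smaller than the pivot
        for i in range(first + 1, last + 1):
            comparisons += 1
            if items[i] < pivot:
                boundary += 1
                items[boundary], items[i] = items[i], items[boundary]
        # Move the pivot between the smaller and the larger-or-equal items.
        items[first], items[boundary] = items[boundary], items[first]
        stack.append((first, boundary - 1))
        stack.append((boundary + 1, last))
    return items, comparisons


n = 200
sorted_input = list(range(n))
shuffled_input = sorted_input[:]
random.Random(0).shuffle(shuffled_input)  # fixed seed for repeatability

print(quicksort_comparisons(sorted_input)[1])    # 19900, i.e. n * (n - 1) / 2
print(quicksort_comparisons(shuffled_input)[1])  # far fewer on shuffled input
```

Choosing the pivot at random or as the median of a few sampled items is a common way to make the worst case unlikely.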


# References and useful links:

* Visualization of these concepts: https://visualgo.net/en
