In [2]:
import math
import logging
FORMAT = '[%(name)s:%(levelname)s]  %(message)s'
logging.basicConfig(level=logging.DEBUG, format=FORMAT)
logger = logging.getLogger('dbg')

def dprint(s):
    logger.debug(s)

def iprint(s):
    logger.info(s)

logger.setLevel(logging.INFO)

## Model of computation

Random-Access machine
1. Instructions executed sequentially
2. No concurrency
3. Constant time instructions
4. No memory hierarchy

## Data-structures

**Dynamic Sets**

Sets that can change over time and may have structure such as an ordering.
For a given set _S_:
- Search(_S_, _k_) - return a pointer _x_ to an element such that _x_.key = _k_ (lookup _k_ in _S_)
- Insert(_S_, _x_) - augments the set with the element pointed to by _x_
- Delete(_S_, _x_) - removes _x_
- Minimum(_S_) - query on a totally ordered set to find the element with the smallest key
- Maximum(_S_) - query on a totally ordered set to find the element with the largest key
- Successor(_S_, _x_) - returns a pointer to the next larger element in a totally ordered set
- Predecessor(_S_, _x_) - returns a pointer to the next smaller element in a totally ordered set

**Stack** - Dynamic set following Last-In First-Out (LIFO). Operated on by PUSH and POP.

**Queue** - Dynamic set following First-In First-Out (FIFO). Operated on by ENQUEUE and DEQUEUE.
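
A minimal sketch of both disciplines, assuming Python's built-in `list` as the stack and `collections.deque` as the queue (these container choices are illustrative, not prescribed by the notes):

```python
from collections import deque

# Stack: insertions and removals happen at the same end (LIFO)
stack = []
stack.append(1)          # PUSH
stack.append(2)
stack.append(3)
top = stack.pop()        # POP returns the most recently pushed item

# Queue: insert at the tail, remove from the head (FIFO)
queue = deque()
queue.append(1)          # ENQUEUE
queue.append(2)
queue.append(3)
head = queue.popleft()   # DEQUEUE returns the oldest item

print(top, head)         # 3 1
```

A `deque` is used for the queue because removing from the front of a plain `list` shifts every remaining element.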

**Linked List**

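Objects are arranged in a linear order determined by a pointer in each element rather than by array indices. In a doubly linked list each element holds two pointers, one to its predecessor and one to its successor. A circular linked list has no terminating NULL: the last element links back to the first. Reaching the $k$-th element requires following $k$ pointers, so access by position takes linear time, unlike Python's built-in `list`, which is array-backed and indexes in constant time.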
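
The pointer manipulation can be sketched as follows. The class and method names (`Node`, `DoublyLinkedList`, `keys`) are hypothetical, chosen here only for illustration:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None

    def insert(self, x):
        """Splice node x onto the front of the list in O(1)."""
        x.prev = None
        x.next = self.head
        if self.head is not None:
            self.head.prev = x
        self.head = x

    def search(self, k):
        """Linear scan for key k; returns the node or None."""
        x = self.head
        while x is not None and x.key != k:
            x = x.next
        return x

    def delete(self, x):
        """Unlink node x in O(1), given a pointer to it."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev

    def keys(self):
        """Collect keys from head to tail."""
        out = []
        x = self.head
        while x is not None:
            out.append(x.key)
            x = x.next
        return out


dll = DoublyLinkedList()
for k in (1, 4, 16, 9):
    dll.insert(Node(k))
dll.delete(dll.search(4))
print(dll.keys())  # [9, 16, 1]
```

Insert and delete are constant time once a pointer to the node is held; only the search walks the list.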

**Tree**

Structure with pointers to parents, children and siblings. Binary trees have only left and right children (no sibling pointers). Rooted trees with an unbounded number of children can store just parent, left-child and right-sibling pointers in each node, which keeps the memory per node constant, similar to a linked list.
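
A rough sketch of the left-child right-sibling representation, assuming a hypothetical `LCRSNode` class with illustrative helper methods:

```python
class LCRSNode:
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.left_child = None      # first child only
        self.right_sibling = None   # next child of the same parent

    def add_child(self, child):
        """Append child after the last existing child of this node."""
        child.parent = self
        if self.left_child is None:
            self.left_child = child
            return
        c = self.left_child
        while c.right_sibling is not None:
            c = c.right_sibling
        c.right_sibling = child

    def children(self):
        """Walk the sibling chain starting at the leftmost child."""
        c = self.left_child
        while c is not None:
            yield c
            c = c.right_sibling


root = LCRSNode("A")
for k in "BCD":
    root.add_child(LCRSNode(k))
child_keys = [c.key for c in root.children()]
print(child_keys)  # ['B', 'C', 'D']
```

Every node stores exactly three pointers regardless of how many children it has.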

<img src="https://www.researchgate.net/publication/234801069/figure/fig4/AS:670016219385862@1536755723210/An-example-left-child-right-sibling-LCRS-tree.png" alt="drawing" width="350"/>

**Dictionary** - Dynamic set that supports insert, delete and search.

## Proof of Correctness

A loop invariant can be used to prove algorithmic correctness, and is essentially bounded mathematical induction:
1. **Initialization** - the invariant is true before the first iteration.
2. **Maintenance** - if it is true before an iteration, it remains true before the next iteration.
3. **Termination** - unlike induction, there is a fixed termination point, at which the invariant implies the correctness of the algorithm.

### e.g. proof for insertion sort

1. **Loop Invariant** - Elements $S[:j]$ are sorted before the $j$-th iteration
2. **Initialisation** - True for $j=1$ as a single item list is sorted
3. **Maintenance** - Assume true for $j=k$ and show it must be true for $j=k+1$. Informally, the while loop shifts items up the sorted list until the correct spot for the key is found, hence $S[:k+1]$ is sorted.
4. **Termination** - When the for loop ends we have $j=n$, so $S[:n]$, the whole list, must be sorted
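
The invariant can also be checked mechanically. The sketch below is a hypothetical variant, `insertion_sort_checked`, that asserts the invariant before every outer iteration and the sortedness of the whole list at termination:

```python
def insertion_sort_checked(S):
    """Insertion sort that asserts the loop invariant as it runs."""
    for j in range(1, len(S)):
        # Invariant: the prefix S[:j] is sorted before iteration j
        assert all(S[i] <= S[i + 1] for i in range(j - 1)), "invariant violated"
        key = S[j]
        i = j - 1
        while i >= 0 and S[i] > key:
            S[i + 1] = S[i]   # shift larger items one slot right
            i -= 1
        S[i + 1] = key
    # Termination: the invariant now covers the entire list
    assert all(S[i] <= S[i + 1] for i in range(len(S) - 1))
    return S


result = insertion_sort_checked([4, 3, 1, 4, 2])
print(result)  # [1, 2, 3, 4, 4]
```

Passing assertions on sample inputs are evidence, not a proof; the inductive argument above is what establishes correctness.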


![image.png](media/isortpsuedo.png)

In [3]:
def insertion_sort(S: list):
    iprint(S)
    for j in range(1, len(S)):
        ## start with item index 1 and go through each index
        key = S[j]
        i = j-1
        dprint(f"Key: {key}")
        dprint(f"Sorted: {i} so far")
        while i >= 0 and S[i] > key:
            ## loop backwards through the sorted region until the start is reached
            ## or an item not greater than key is found, shifting items to make space for the insert
            S[i+1] = S[i]
            i = i - 1
        dprint(f"Insert at : {i+1}")
        S[i+1] = key
        iprint(S)

L = [4,3,1,4,2]
insertion_sort(L)
        

[dbg:INFO]  [4, 3, 1, 4, 2]
[dbg:INFO]  [3, 4, 1, 4, 2]
[dbg:INFO]  [1, 3, 4, 4, 2]
[dbg:INFO]  [1, 3, 4, 4, 2]
[dbg:INFO]  [1, 2, 3, 4, 4]



### Run-Time Analysis of Insertion Sort

![image.png](media/isortrta.png)

$t_j$ - number of times the while loop test executes for value $j$ where $j$ is the outermost loop variable.

For an input list of size $n$ the running time $T(n)$ can be generalized by summing the costs and iterations, noting that the exact time is clearly dependent on how sorted the list already is:

$ T(n) = c_1 n + c_2 (n-1) + c_4 (n -1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j -1) + c_8 (n -1) $

**Best case is when the list is already sorted** - in this case $t_j = 1$ for every $j$, so $\sum_{j=2}^{n} t_j = (n-1)$

$ T(n) = (c_1 + c_2 + c_4 + c_5 + c_8)n - (c_2 + c_4 + c_5 + c_8) $ :: **$\Theta(n)$**

**Worst case is when the list is reverse sorted** - in this case $t_j = j$, so $\sum_{j=2}^{n} t_j = \sum_{j=2}^{n} j =  0.5n^2 + 0.5n - 1$, i.e. each key is compared against every element of the sorted region on its way back to position 0.

$ T(n) = \frac{c_5 + c_6 + c_7}{2}n^2 + (c_1 + c_2 + c_4 + c_8 + \frac{c_5 - c_6 - c_7}{2})n - (c_2 + c_4 + c_5 + c_8) $ :: **$\Theta(n^2)$**
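
The two sums of $t_j$ can be confirmed by counting the while-loop tests directly. This is a small sketch; `while_test_counts` is a hypothetical helper that works on a copy of its input:

```python
def while_test_counts(S):
    """Return the list of t_j: while-loop tests per outer iteration."""
    S = list(S)               # work on a copy
    counts = []
    for j in range(1, len(S)):
        key = S[j]
        i = j - 1
        tests = 1             # the final test that exits the loop
        while i >= 0 and S[i] > key:
            S[i + 1] = S[i]
            i -= 1
            tests += 1        # one more test for each shift performed
        S[i + 1] = key
        counts.append(tests)
    return counts


n = 10
best = sum(while_test_counts(range(1, n + 1)))    # already sorted
worst = sum(while_test_counts(range(n, 0, -1)))   # reverse sorted
print(best, worst)  # 9 54
```

For $n = 10$ the totals are $n - 1 = 9$ and $0.5n^2 + 0.5n - 1 = 54$.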



In [4]:
# logger.setLevel(logging.DEBUG)
L = [1,2,3,4,5]
insertion_sort(L)
L = [5,4,3,2,1]
insertion_sort(L)
# logger.setLevel(logging.INFO)

[dbg:INFO]  [1, 2, 3, 4, 5]
[dbg:INFO]  [1, 2, 3, 4, 5]
[dbg:INFO]  [1, 2, 3, 4, 5]
[dbg:INFO]  [1, 2, 3, 4, 5]
[dbg:INFO]  [1, 2, 3, 4, 5]
[dbg:INFO]  [5, 4, 3, 2, 1]
[dbg:INFO]  [4, 5, 3, 2, 1]
[dbg:INFO]  [3, 4, 5, 2, 1]
[dbg:INFO]  [2, 3, 4, 5, 1]
[dbg:INFO]  [1, 2, 3, 4, 5]


**Worst Case**
- Longest time for any input of size n, a firm upper bound
- Common in some applications, e.g. a database search for a non-existent item

**Average Case**
- Average time per input (often assuming all inputs of the given size are equally likely)
- Often roughly as bad as the worst case

**Best Case** - shortest time, rarely used

**Amortized Cost**
- Average cost per operation over a worst-case sequence of operations, where an operation could be, for example, one loop iteration of insertion sort
- Amortization is the process of spreading a large fixed cost across a set of costs (e.g. a monthly membership)
- $T(n)/n$ is the average cost of an operation or loop in the worst case, which is taken as the amortized cost

Amortized cost analysis is often required when a sequence of operations is performed on some data structure. Directly applying worst-case analysis to each operation may lead to an overly pessimistic evaluation of the performance.

Amortized analysis instead averages the running times of the operations in a sequence over that sequence. For a given operation (e.g. <b>multipop</b>), certain situations (e.g. when the stack is empty) may mean that the cost is much smaller than a per-operation analysis suggests. For example, since each element can be popped off the stack at most once for each time it is pushed, the actual total worst-case running time of a sequence of $n$ operations is $O(n)$, giving an $O(1)$ amortized cost per operation. Hence, while a stack with multipop has a seemingly inefficient operation in <b>multipop</b>, considering its performance over a sequence of operations reveals that the data structure is in fact highly efficient.
    
Note that amortized cost analysis is not required when performing operations in a sequence does not change their cost compared to executing them in isolation (e.g. a stack with only <b>push</b> and <b>pop</b> operations).   
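
The multipop argument can be illustrated by counting elementary steps. In this sketch the class `MultipopStack`, its `cost` counter and the particular operation sequence are all assumptions made for illustration:

```python
class MultipopStack:
    def __init__(self):
        self._items = []
        self.cost = 0             # total elementary push/pop steps

    def push(self, x):
        self._items.append(x)
        self.cost += 1

    def multipop(self, k):
        """Pop up to k items; the cost is the number actually popped."""
        popped = []
        while self._items and k > 0:
            popped.append(self._items.pop())
            self.cost += 1
            k -= 1
        return popped


s = MultipopStack()
n_ops = 0
for i in range(100):
    s.push(i)
    n_ops += 1
    if i % 10 == 9:               # every tenth push is followed by a multipop
        s.multipop(7)
        n_ops += 1

print(n_ops, s.cost)              # 110 170
```

A single multipop costs up to 7 steps here, yet the total cost of 170 stays below $2 \times 110$ because no element is popped more often than it was pushed.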
    

**Expected Running Time**
- Expected running time of a randomized algorithm (one that makes random choices), obtained by a probabilistic analysis over those choices, e.g. randomized quicksort

## Asymptotic Notation & Order of Growth

![image.png](media/asymnot.png)

- Low order terms are insignificant for large n
- Coefficients are less significant than rate of growth
- Lower order worst-case run time is considered more efficient
- For small n, a higher order of growth _may_ be faster

**Big $O$ - Upper Bound**

A function will grow _no faster_ than a certain rate based on its highest-order term. If a function is $O(n^2)$ it is therefore also $O(n^3)$.

>For a function $g(n)$ we denote by $O(g(n))$ _the set of functions_:
>
>$ O(g(n)) = \{ f(n):\;$ there exist positive constants $c$ & $n_0\;$ s.t. $\;0 \leq f(n) \leq cg(n) \;\; \forall \; n \geq n_0\}$
>
>A function $f(n)$ belongs to the set $O(g(n))$ if there exists a positive constant $c$ such that $f(n) \leq cg(n)$ for sufficiently large $n$
>
> e.g. $4n^2 + 100n + 500 = O(n^2)$ despite the relative size of coefficients

The formal definition of $O$-notation shows that $n^3 - 100n^2 \neq O(n^2)$ even though the coefficient of $n^2$ is a large negative number. If $n^3 - 100n^2 = O(n^2)$ there would be $c$ and $n_0$ s.t. $n^3 - 100n^2 \leq cn^2 \; \; \forall \; \; n \geq n_0$. Divide by $n^2$, giving $n - 100 \leq c$. This inequality clearly does not hold for any value of $n > c + 100$.

**Big $\Omega$ - Lower Bound**

A function will grow _at least_ as fast as a certain rate based on its highest-order term. If a function is $\Omega(n^2)$ it is therefore also $\Omega(n)$.

>For a function $g(n)$ we denote by $\Omega(g(n))$ _the set of functions_:
>
>$ \Omega(g(n)) = \{ f(n):\;$ there exist positive constants $c$ & $n_0\;$ s.t. $\;0 \leq cg(n) \leq f(n) \;\; \forall \; n \geq n_0\}$
>
>A function $f(n)$ belongs to the set $\Omega(g(n))$ if there exists a positive constant $c$ such that $cg(n) \leq f(n)$ for sufficiently large $n$

e.g. $4n^2 + 100n + 500 = \Omega(n^2)$: we need $cn^2 \leq 4n^2 + 100n + 500$, and dividing both sides by $n^2$ gives $c \leq 4 + 100/n + 500/n^2$, which holds for all $n \geq n_0 = 1$ with $c = 4$
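
The witnesses in these examples can be spot-checked numerically. The upper-bound constant $c = 604$ is a choice made here (it works because $100n \leq 100n^2$ and $500 \leq 500n^2$ for $n \geq 1$), and a finite check is only a sanity test, not a proof:

```python
def f(n):
    return 4 * n**2 + 100 * n + 500


def g(n):
    return n**3 - 100 * n**2


# Upper bound witness: f(n) <= 604 n^2 for all n >= 1
upper_ok = all(f(n) <= 604 * n**2 for n in range(1, 10_000))
# Lower bound witness: 4 n^2 <= f(n) for all n >= 1
lower_ok = all(4 * n**2 <= f(n) for n in range(1, 10_000))

# For any candidate c, g(n) exceeds c n^2 once n > c + 100
c_try = 1000
n_bad = c_try + 101
violated = g(n_bad) > c_try * n_bad**2

print(upper_ok, lower_ok, violated)  # True True True
```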
 
**Big $\Theta$ - Tight Bound**

A function will grow _precisely_ at a certain rate based on its highest-order term. If a function is both $O(g(n))$ and $\Omega(g(n))$ then it is $\Theta(g(n))$.

>For a function $g(n)$ we denote by $\Theta(g(n))$ _the set of functions_:
>
>$ \Theta(g(n)) = \{ f(n):\;$ there exists positive constants $c_1$,$c_2$ & $n_0\;$ s.t. $\;0 \leq c_1g(n) \leq f(n) \leq c_2g(n) \;\; \forall \; n \geq n_0\}$


### Insertion Sort

It follows that the running time of `insertion_sort()` is $O(n^2)$ in every case. The running time is dominated by the inner loop. Each of the $n-1$ iterations of the outer loop causes the inner loop to iterate at most $j-1$ times, and because $j$ is at most $n$, the total number of iterations of the inner loop is at most $(n-1)(n-1) < n^2$. Since each iteration of the inner loop takes constant time, the total time spent in the inner loop is at most $cn^2$, or $O(n^2)$.

The worst-case running time of `insertion_sort()` is $\Omega(n^2)$ (lower bound). Saying that the worst-case running time of an algorithm is $\Omega(n^2)$ means that for every input size $n$ above a certain threshold, there is at least one input of size $n$ for which the algorithm takes at least $cn^2$ time. It does not necessarily mean that the algorithm takes at least $cn^2$ time for all inputs.


## Merge Sort RTA

<img src="media/msort.png" alt="drawing" width="350"/>
<img src="media/mfunc.png" alt="drawing" width="450"/>



In [5]:
logger.setLevel(logging.INFO)
## sort S in-place

def mergesort_merge(S, p, q, r):
    dprint(f"Merging {p}:{q} & {q+1}:{r}")
    ## extract copies of the two sorted sublists we're trying to merge
    ## (a slice already yields a one-item list when the range holds a single index)
    L = S[p:q+1]
    R = S[q+1:r+1]
    dprint(f"Combine {L} {R}")
    # trick to simplify the merge
    # if we append both with infinity, if one list ends 
    # the other will be appended without any logic change 
    L.append(float("inf"))
    R.append(float("inf"))
    i = 0 # L index
    j = 0 # R index
    # k is the index in the top level list we're replacing
    for k in range(p, r+1):
        dprint(f"Comparing {L[i]} <= {R[j]}")
        if L[i] <= R[j]:
            S[k] = L[i]
            i += 1
        else:
            S[k] = R[j]
            j += 1
    dprint(f"Sorted Region: {S[p:r+1]}")
    


def mergesort(S, p = None, r = None):
    if p is None:
        iprint(S)
        p = 0
    if r is None:
        r = len(S)-1
    if p < r:
        dprint(f"Recurse to {p}:{r}")
        # S is the list, p-r is the sub-array index to sort
        q = math.floor((p+r)/2) # midpoint
        dprint(f"Midpoint {q}")
        # sort these two sub-arrays
        mergesort(S, p, q)
        mergesort(S, q + 1, r)
        mergesort_merge(S, p, q, r)
    else:
        # zero or one item is already sorted - do nothing
        # (slice instead of S[p] so that an empty list does not raise IndexError)
        dprint(f"Single @ {p}:{r} Return {S[p:r+1]}")
    iprint(S)


L = [4,3,1,4,2]
dprint(L)
mergesort(L)

[dbg:INFO]  [4, 3, 1, 4, 2]
[dbg:INFO]  [4, 3, 1, 4, 2]
[dbg:INFO]  [4, 3, 1, 4, 2]
[dbg:INFO]  [3, 4, 1, 4, 2]
[dbg:INFO]  [3, 4, 1, 4, 2]
[dbg:INFO]  [1, 3, 4, 4, 2]
[dbg:INFO]  [1, 3, 4, 4, 2]
[dbg:INFO]  [1, 3, 4, 4, 2]
[dbg:INFO]  [1, 3, 4, 2, 4]
[dbg:INFO]  [1, 2, 3, 4, 4]


In [10]:
import time

L = [4,3,1,4,2,8,4,6,32,5,3,2,5,8,9,4,2,5,899,3,25,78,3224,67,4,64,2,356,2,5]*500
logger.setLevel(logging.ERROR)

st = time.process_time()
mergesort(L)
et = time.process_time()
res = et - st
print(f'Merge: {res:.3g} s')

L = [4,3,1,4,2,8,4,6,32,5,3,2,5,8,9,4,2,5,899,3,25,78,3224,67,4,64,2,356,2,5]*500
logger.setLevel(logging.ERROR)

st = time.process_time()
insertion_sort(L)
et = time.process_time()
res = et - st
print(f'Insertion: {res:.3g} s')


Merge: 0.312 s
Insertion: 6.76 s


Because merge sort is a divide-and-conquer algorithm, its running time can be analysed using a _recurrence relation_.

A recurrence is an equation that describes a function in terms of its value on other, typically smaller, arguments. 
Recurrences go hand in hand with the divide-and-conquer method because they give us a natural way to characterize
the running times of recursive algorithms mathematically.

- **Divide** the problem into one or more subproblems that are smaller instances of the same problem.
- **Conquer** the subproblems by solving them recursively.
- **Combine** the subproblem solutions to form a solution to the original problem.

_Aside:_
A recurrence $T(n)$ is algorithmic if, for every sufficiently large threshold constant $n_0 > 0$, the following two properties hold:
1. For all $n < n_0$, we have $T(n) = \Theta(1)$
2. For all $n \geq n_0$, every path of recursion terminates in a defined base case within a finite number of recursive invocations.

**For _Merge-Sort_**

**Unifying Constants:**

$T(n) \big\vert_{n=1} = c\;$, $\;\;T(n) \big\vert_{n>1} = 2T(\lfloor n/2 \rfloor) + cn + c$

**Asymptotic Notation:**

$T(n) \big\vert_{n=1} = \Theta(1)\;$, $\;\;T(n) \big\vert_{n>1} = 2T(\lfloor n/2 \rfloor) + \Theta(n)$


### Solving with the substitution method

1. Make a good guess for the form of the solution (e.g. $T(n) = O(n)$)
2. Use mathematical induction to find constants and show solution is valid

To apply the inductive hypothesis, you substitute the guessed solution for the function on smaller values, hence the name "substitution method".

_Let:_ $T(n) = 2T(\lfloor n/2 \rfloor) + n$  with  $T(1) = 1$

_Guess:_ $T(n) = O(n \log_2(n))$

_Proof:_ Assume the bound holds for every $m$ with $n_0 \leq m < n$

Adopt the inductive hypothesis $T(m) \leq cm \log_2 m$ for all $n_0 \leq m < n$, which is what $T(n) = O(n \log_2n)$ asserts

In particular, for $n \geq 2n_0$ the hypothesis applies to $m = \lfloor n/2 \rfloor$, yielding $T(\lfloor n/2 \rfloor) \leq c\lfloor n/2 \rfloor \log_2 \lfloor n/2 \rfloor$

Next this guess/inductive hypothesis is substituted into the recurrence equation:

$T(n) = 2T(\lfloor n/2 \rfloor) + n \\\\
\quad \quad \; \leq 2c\lfloor n/2 \rfloor \log_2 \lfloor n/2 \rfloor + n \\\\
\quad \quad \; \leq cn \log_2(n/2) + n \\\\
\quad \quad \; = cn \log_2(n) - cn \log_2(2) + n \\\\
\quad \quad \; = cn \log_2(n) - cn + n \\\\
\quad \quad \; \leq cn \log_2(n)$

The last step holds whenever $c \geq 1$, since then $cn \geq n$. More generally, constrain the constants $n_0$ and $c$ to be sufficiently large that for $n \geq 2n_0$ the quantity $cn$ dominates the term hidden by the asymptotic notation, and check that the bound also holds for the base cases. Here $T(1) = 1 > c \cdot 1 \cdot \log_2 1 = 0$, so the bound cannot start at $n = 1$; taking $n_0 = 2$ with base cases $T(2) = 4$ and $T(3) = 5$ works for any $c \geq 2$.
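
As a sanity check on the guess, the recurrence can be evaluated exactly and compared against the bound. The constants $c = 2$ and $n_0 = 2$ used below are one valid choice assumed for this sketch, and a finite range is evidence rather than a proof:

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """Exact value of T(n) = 2 T(floor(n/2)) + n with T(1) = 1."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n


c, n0 = 2, 2
holds = all(T(n) <= c * n * math.log2(n) for n in range(n0, 5000))
print(T(2), T(3), T(4), holds)  # 4 5 12 True
```

The bound is tight at $n = 2$, where $T(2) = 4 = 2 \cdot 2 \cdot \log_2 2$, which is why no smaller $c$ works with this $n_0$.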

The **induction must be mathematically concrete**, and should also validate the base cases.

Examples:

1. $f(n)= 3f(\frac{n}{3})+\Theta(n)$ for $n>1$ and $f(1) = \Theta(1)$.


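Guess that $f(n) = O(n\log_3 n)$, i.e. $f(n) \leq cn \log_3 n$, and let $d > 0$ be a constant such that the $\Theta(n)$ term is at most $dn$.

$f(n)= 3 f(\frac{n}{3})+ \Theta(n) \\\\
\quad \quad \; \leq 3 c \frac{n}{3} \log_3\frac{n}{3}+d n \\\\
\quad \quad \; = c n (\log_3 n -1)+d n \\\\
\quad \quad \; = c n \log_3 n - (c - d) n \\\\
\quad \quad \; \leq c n \log_3 n \text{ for } c \geq d$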

2. $f(n)= 2f(\frac{n}{2})+n\log_2 n +3n$ for $n>1$ and $f(1)=1$.

Guess that $f(n) = O(n (\log_2 n)^2)$

$f(n) = 2f(\frac{n}{2})+n\log_2 n +3n \\\\
\quad \quad \; \leq 2c \frac{n}{2} (\log_2 \frac{n}{2})^2  + n\log_2 n +3n \\\\
\quad \quad \; = cn (\log_2 n - \log_2 2)(\log_2 n - \log_2 2) +  n\log_2 n +3n \\\\
\quad \quad \; = cn (\log_2 n - 1)(\log_2 n - 1) +  n\log_2 n +3n \\\\
\quad \quad \; = cn ((\log_2 n)^2 - 2\log_2 n  + 1) +  n\log_2 n +3n \\\\
\quad \quad \; = cn (\log_2 n)^2 - 2cn \log_2 n + cn + n\log_2 n + 3n \\\\
\quad \quad \; = cn (\log_2 n)^2 + (1-2c)n \log_2 n + (3+c)n \\\\
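\quad \quad \; = cn (\log_2 n)^2 - \left(2c - 1 - \frac{3+c}{\log_2 n}\right)n \log_2 n \\\\
\quad \quad \; \leq  cn (\log_2 n)^2 \text{ for } 2c \geq 1 + \frac{3+c}{\log_2 n}$

$2c \geq 1 + (3+c)/\log_2 n$ => $c(2 \log_2 n - 1) \geq 3 + \log_2 n$ => $c \geq (3 + \log_2 n) / (2 \log_2 n - 1)$
 
The right-hand side decreases as $n$ grows, so larger $n$ lowers the bound on $c$. Choosing $n_0 = 4$ gives $(3 + 2)/(2 \cdot 2 - 1) = 5/3$, so any $c \geq 5/3$ satisfies the inductive step for all $n \geq n_0$. The base case $f(4) = 40 \leq c \cdot 4 \cdot (\log_2 4)^2 = 16c$ additionally needs $c \geq 5/2$, so for example $c = 3$ works.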
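
The second example can be checked on powers of two, where $n/2$ is always an integer. The sketch below works with the exponent $k = \log_2 n$ to stay in exact integer arithmetic; the constants $c = 3$ and $n_0 = 4$ are assumptions chosen for this check:

```python
def f(k):
    """f(2**k) for f(n) = 2 f(n/2) + n log2(n) + 3n with f(1) = 1."""
    if k == 0:
        return 1
    n = 2 ** k
    return 2 * f(k - 1) + n * k + 3 * n


c, k0 = 3, 2    # k0 = 2 corresponds to n0 = 4
# Bound to verify: f(n) <= c * n * (log2 n)^2, i.e. c * 2^k * k^2
holds = all(f(k) <= c * 2**k * k**2 for k in range(k0, 40))
print(f(1), f(2), holds)  # 10 40 True
```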


### Using the recursion tree method to generate a strong guess

Balanced trees are simple, for example for the recurrence relation:

$T(n) = 2T(\frac{n}{2}) + \Theta(n) \leq 2T(\frac{n}{2}) + c_2 n$

$T(1) = c_1$


Each node of subproblem size $n$ has cost $c_2 n$ and produces two child nodes, each of cost $c_2\frac{n}{2}$. The total cost is found by summing over all the nodes. Note that the base case of the piecewise definition adds an extra layer of leaves, each of cost $c_1$.
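
Summing the tree level by level can be sketched as follows. For simplicity this assumes $n$ is a power of two and uses a single constant `c` for both internal nodes and leaves (a unification made here, not part of the recurrence as stated); `level_costs` is a hypothetical helper:

```python
def level_costs(n, c=1):
    """Per-level cost of the recursion tree for T(n) = 2T(n/2) + cn."""
    costs = []
    nodes, size = 1, n
    while size >= 1:
        costs.append(nodes * c * size)   # nodes at this level times cost per node
        nodes *= 2                       # each node spawns two children
        size //= 2                       # each child handles half the input
    return costs


costs = level_costs(16)
print(costs, sum(costs))  # [16, 16, 16, 16, 16] 80
```

Every one of the $\log_2 n + 1$ levels costs $cn$, which suggests the guess $T(n) = O(n \log_2 n)$ to confirm by substitution.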

![alt](media/mtree3.png)
![alt](media/mtreen.png)



