## Proof of Correctness

Loop invariants help us understand why an algorithm is correct. When you’re
using a loop invariant, you need to show three things:

* **Initialization:** It is true prior to the first iteration of the loop.
* **Maintenance:** If it is true before an iteration of the loop, it remains true before the next iteration.
* **Termination:** The loop terminates, and when it does, the invariant (usually together with the reason the loop terminated) gives us a useful property that helps show that the algorithm is correct.

### Example: proof for insertion sort

1. **Invariant** - Elements $S[:j]$ are sorted before the $j$-th iteration
2. **Initialisation** - True for $j=1$, as a single-item list is sorted
3. **Maintenance** - Assume true for $j=k$; show it must be true for $j=k+1$. Informally, the while loop works by shifting items up the sorted list until the correct spot for $S[k]$ is found, hence $S[:k+1]$ is sorted.
4. **Termination** - At the end of the for loop we have $j=n$, so $S[:n]$, the whole list, must be sorted


![image.png](../media/isortpsuedo.png)
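The invariant can also be checked mechanically. The sketch below is a 0-indexed Python version of insertion sort (the function name `insertion_sort` and the indexing are assumptions and may differ from the pseudocode in the figure) that asserts the invariant before every iteration of the for loop.

```python
def insertion_sort(S):
    """Sort the list S in place, asserting the loop invariant as we go."""
    for j in range(1, len(S)):
        # Invariant: the prefix S[:j] is sorted before iteration j.
        assert all(S[i] <= S[i + 1] for i in range(j - 1))
        key = S[j]
        i = j - 1
        # Shift larger items one slot up until the spot for key is found.
        while i >= 0 and S[i] > key:
            S[i + 1] = S[i]
            i -= 1
        S[i + 1] = key
    return S
```

If the invariant were ever violated, the `assert` would fail before the offending iteration, which is the maintenance step of the proof expressed as a runtime check.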

### Proof of Dijkstra

Prove the following loop invariant over the minimum element of the min-priority queue:

* **Invariant** - at the start of each iteration of the while loop, the node $u$ with the minimum distance estimate $u.d$ among the nodes found but not yet explored already has its shortest-path distance from the start node $s$ computed, i.e. $u.d = \delta(s,u)$.
    
**Proof** - By contradiction.
* Assume $u$ is the first vertex extracted as the minimum of the min-priority queue for which $u.d > \delta(s,u)$.
* Note $u\neq s$, since $s.d = \delta(s,s)=0$.
* Take a shortest path $s\leadsto x \to y \leadsto u$ from $s$ to $u$, where $y$ is the first vertex on the path that has not yet been explored and its predecessor $x$ has already been explored (removed from the queue). Since $x$ was explored before $u$, $x.d=\delta(s,x)$, and the edge $(x,y)$ was relaxed at that point, so $y.d=\delta(s,y)$.
* If $y = u$ then $u.d=\delta(s,u)$, as vertex $u$ was relaxed after the vertex $x$ had been explored. This contradicts the assumption.
* If $y\neq u$ then $y.d = \delta(s,y) \leq \delta(s,u) < u.d$, since we only consider non-negative edges. This is a contradiction, as then we should have selected $y$ from the queue instead of $u$!
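For reference, here is a minimal sketch of the loop the invariant talks about, using Python's `heapq` as the min-priority queue. The function name `dijkstra` and the adjacency-dict input format are assumptions made for illustration; stale queue entries are skipped rather than decreased in place.

```python
import heapq

def dijkstra(graph, s):
    """graph maps each node to a list of (neighbour, non-negative weight) pairs."""
    dist = {s: 0}
    explored = set()
    queue = [(0, s)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in explored:
            continue  # stale entry left behind by an earlier relaxation
        # Invariant: d is already final here, i.e. d == delta(s, u).
        explored.add(u)
        for v, w in graph.get(u, []):
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w  # relax edge (u, v)
                heapq.heappush(queue, (dist[v], v))
    return dist
```

For example, with `graph = {"s": [("a", 4), ("b", 1)], "b": [("a", 2)], "a": [("t", 1)]}` the node `a` is first found at distance 4 but is only explored once its estimate has dropped to 3.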
 
### Proof of Bellman-Ford

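Consider the **path relaxation property**. 

This property states: 
* if $p=\langle v_0,v_1,\cdots,v_k \rangle$ is a shortest path from $s=v_0$ to $v_k$, and we relax all the edges of $p$ in the order $(v_0,v_1),(v_1,v_2),\cdots,(v_{k-1},v_k)$
* then $v_k.d=\delta(s,v_k)$. 
* This property holds regardless of any other relaxation steps that occur, even if they are intermixed with the relaxations of the edges of $p$. 

Applying it to Bellman-Ford:
* The outer loop runs $\lvert V \rvert - 1$ times and each pass relaxes every edge. In the absence of negative-weight cycles, any shortest path has no more than $\lvert V \rvert - 1$ edges, so pass $i$ relaxes the $i$-th edge of the path in the required order.
* Hence after running the Bellman-Ford algorithm, $v.d=\delta(s,v)$ for all $v\in V$ reachable from $s$.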
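The argument can be seen at work in a small sketch. The function name `bellman_ford` and the edge-list format are assumptions for illustration, and the code assumes there is no negative-weight cycle reachable from `s` (it does not run the usual detection pass).

```python
def bellman_ford(vertices, edges, s):
    """vertices: iterable of nodes; edges: list of (u, v, w) triples."""
    dist = {v: float("inf") for v in vertices}
    dist[s] = 0
    # |V| - 1 passes: pass i is guaranteed to relax the i-th edge of
    # every shortest path, whatever order the edges are listed in.
    for _ in range(len(dist) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w  # relax edge (u, v)
    return dist
```

With `edges = [("a", "b", -2), ("s", "a", 4), ("s", "b", 3)]` the edge `(a, b)` is listed before `(s, a)`, so the shortest path `s, a, b` only gets its second edge relaxed usefully on the second pass.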
 

## Asymptotic Notation & Order of Growth

>For a function $g(n)$ we denote by $O(g(n))$ _the set of functions_:
>
>$ O(g(n)) = \{ f(n):\;$ there exist positive constants $c$ & $n_0\;$ s.t. $\;0 \leq f(n) \leq cg(n) \;\; \forall \; n \geq n_0\}$

>For a function $g(n)$ we denote by $\Omega(g(n))$ _the set of functions_:
>
>$ \Omega(g(n)) = \{ f(n):\;$ there exist positive constants $c$ & $n_0\;$ s.t. $\;0 \leq cg(n) \leq f(n) \;\; \forall \; n \geq n_0\}$

>For a function $g(n)$ we denote by $\Theta(g(n))$ _the set of functions_:
>
>$ \Theta(g(n)) = \{ f(n):\;$ there exist positive constants $c_1$, $c_2$ & $n_0\;$ s.t. $\;0 \leq c_1g(n) \leq f(n) \leq c_2g(n) \;\; \forall \; n \geq n_0\}$
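To make the role of the constants concrete, the sketch below spot-checks a candidate pair $(c, n_0)$ for the $O$ definition over a finite range. The helper name `is_big_o_witness` and the example functions are assumptions for illustration, and a finite check is evidence, not a proof.

```python
def is_big_o_witness(f, g, c, n0, n_max=10_000):
    """Check 0 <= f(n) <= c*g(n) for every integer n0 <= n <= n_max."""
    return all(0 <= f(n) <= c * g(n) for n in range(n0, n_max + 1))

def f(n):
    return 3 * n * n + 10 * n

def g(n):
    return n * n
```

Here $3n^2 + 10n \leq 4n^2$ exactly when $n \geq 10$, so `is_big_o_witness(f, g, c=4, n0=10)` holds while the same $c$ with `n0=1` does not. No choice of $n_0$ rescues $c = 3$.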

## Recurrence Relations for Divide & Conquer

### Cheatsheet:

| Recurrence Relation | Guess | Comment |
| -- | -- | -- |
| $2T(\lfloor n/2 \rfloor) + n$ | $O(n \log_2(n))$ | |
| $3T(\frac{n}{3})+\Theta(n)$ | $O(n\log_3 n)$ | |
| $2T(\frac{n}{2})+n\log_2 n +3n$ | $O(n (\log_2 n)^2)$ | |
| $2T(\lfloor\frac{n}{2}\rfloor+1)+n$ | $O(n \log_2(n))$ | $T(n) \leq c n\log n - d$, $\frac{n}{2}+1\leq \frac{2n}{3}$ |
| $T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + 1$ | $O(n)$ | $T(n) \leq cn - d$ |

**For _Merge-Sort_**

**Unifying Constants:**

$T(n) \big\vert_{n=1} = c\;$, $\;\;T(n) \big\vert_{n>1} = 2T(\lfloor n/2 \rfloor) + cn + c$

**Asymptotic Notation:**

$T(n) \big\vert_{n=1} = \Theta(1)\;$, $\;\;T(n) \big\vert_{n>1} = 2T(\lfloor n/2 \rfloor) + \Theta(n)$


### Solving with the substitution method

1. Make a good guess for the form of the solution (e.g. $T(n) = O(n)$)
2. Use mathematical induction to find constants and show solution is valid

To apply the inductive hypothesis, you substitute the guessed solution for the function on smaller values, hence the name "substitution method".

_Let:_ $T(n) = 2T(\lfloor n/2 \rfloor) + n$  with  $T(1) = 1$

_Guess:_ $T(n) = O(n \log_2(n))$

_Proof:_ Assume the bound holds for every $m$ with $n_0 \leq m < n$

Adopt the inductive hypothesis $T(m) \leq cm \log_2 m$ for all such $m$, as implied by the guess $T(n) = O(n \log_2n)$

In particular, taking $m = \lfloor n/2 \rfloor$ yields $T(\lfloor n/2 \rfloor) \leq c\lfloor n/2 \rfloor \log_2 \lfloor n/2 \rfloor$

Next this guess/inductive hypothesis is substituted into the recurrence equation:

$T(n) = 2T(\lfloor n/2 \rfloor) + n \\\\
\quad \quad \; \leq 2c\lfloor n/2 \rfloor \log_2 \lfloor n/2 \rfloor + n \\\\
\quad \quad \; \leq cn \log_2(n/2) + n \\\\
\quad \quad \; = cn \log_2(n) - cn \log_2(2) + n \\\\
\quad \quad \; = cn \log_2(n) - cn + n \\\\
\quad \quad \; \leq cn \log_2(n) \\\\$

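The last step holds whenever $c \geq 1$, since then $-cn + n \leq 0$. The induction also needs a valid base case: $T(1) = 1$ but $c \cdot 1 \cdot \log_2 1 = 0$, so the bound fails at $n = 1$. Taking $n_0 = 2$ and using $T(2) = 4$ and $T(3) = 5$ as base cases (every $n \geq 4$ recurses onto some $m \geq 2$), any $c \geq 2$ gives $T(n) \leq cn \log_2 n$ for all $n \geq n_0$.

The **induction must be mathematically concrete**, and should validate the base cases as well.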
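A quick numerical spot-check of the guess can catch a bad choice of constants before the induction is written out. The sketch below evaluates the recurrence $T(n) = 2T(\lfloor n/2 \rfloor) + n$, $T(1) = 1$ directly; the helper name `bound_holds` and the tested range are assumptions, and passing the check is not a substitute for the proof.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(1) = 1 and T(n) = 2*T(floor(n/2)) + n for n > 1."""
    return 1 if n == 1 else 2 * T(n // 2) + n

def bound_holds(c, n0, n_max=4096):
    """Check T(n) <= c * n * log2(n) for every integer n0 <= n <= n_max."""
    return all(T(n) <= c * n * math.log2(n) for n in range(n0, n_max + 1))
```

`bound_holds(2, 2)` passes, whereas `bound_holds(2, 1)` fails because $\log_2 1 = 0$ and `bound_holds(1, 2)` fails because $T(2) = 4 > 2$. This is why the base case constrains both $c$ and $n_0$.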

Examples:

1. $f(n)= 3f(\frac{n}{3})+\Theta(n)$ for $n>1$ and $f(1) = \Theta(1)$.


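Guess that $f(n)= O(n\log_3 n)$, i.e. $f(n) \leq cn \log_3 n$, and let $c_2 n$ bound the $\Theta(n)$ term from above.

$f(n)= 3 f(\frac{n}{3})+ \Theta(n) \\\\
\quad \quad \; \leq 3 c \frac{n}{3} \log_3\frac{n}{3}+c_2 n \\\\
\quad \quad \; = c n (\log_3 n -1)+c_2 n \\\\
\quad \quad \; = c n\log_3 n - (c - c_2) n \\\\
\quad \quad \; \leq c n\log_3 n \text{ for } c \geq c_2$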

2. $f(n)= 2f(\frac{n}{2})+n\log_2 n +3n$ for $n>1$ and $f(1)=1$.

Guess that $f(n) = O(n (\log_2 n)^2)$

$f(n) = 2f(\frac{n}{2})+n\log_2 n +3n \\\\
\quad \quad \; \leq 2c \frac{n}{2} (\log_2 \frac{n}{2})^2  + n\log_2 n +3n \\\\
\quad \quad \; = cn (\log_2 n - \log_2 2)(\log_2 n - \log_2 2) +  n\log_2 n +3n \\\\
\quad \quad \; = cn (\log_2 n - 1)(\log_2 n - 1) +  n\log_2 n +3n \\\\
\quad \quad \; = cn ((\log_2 n)^2 - 2\log_2 n  + 1) +  n\log_2 n +3n \\\\
\quad \quad \; = cn (\log_2 n)^2 - 2cn \log_2 n + cn + n\log_2 n + 3n \\\\
\quad \quad \; = cn (\log_2 n)^2 + (1-2c)n \log_2 n + (3+c)n \\\\
\quad \quad \; \leq  cn (\log_2 n)^2 \text{ for } c > 0.5 \text{ and } n \text{ large enough that } (2c-1)\log_2 n \geq 3 + c$


A recurrence $T(n)$ is algorithmic if, for every sufficiently large threshold constant $n_0 > 0$, the following two properties hold:
1. For all $n < n_0$ , we have $T(n) = \Theta(1)$.
2. For all $n \geq n_0$ , every path of recursion terminates in a defined base case within a finite number of recursive invocations.

Whenever a recurrence is stated without an explicit base case, we assume that the recurrence is algorithmic.

### Example 2 revisited - strengthening the hypothesis with a lower-order term

2. $f(n)= 2f(\frac{n}{2})+n\log_2 n +3n$ for $n>1$ and $f(1)=1$. 

Guess that $f(n) = O(n (\log_2 n)^2)$ and strengthen the inductive hypothesis to $f(m) \leq c_0 m (\log_2 m)^2 - dm$ for $m < n$

$f(n) = 2f(\frac{n}{2})+n\log_2 n +3n \\\\
\quad \quad \; \leq 2 \left( c_0 \frac{n}{2} (\log_2 \frac{n}{2})^2 - d\frac{n}{2} \right) + n\log_2 n +3n \\\\
\quad \quad \; = c_0n (\log_2 n - 1)(\log_2 n - 1) +  n\log_2 n +3n - dn\\\\
\quad \quad \; = c_0n ((\log_2 n)^2 - 2\log_2 n  + 1) +  n\log_2 n +3n - dn\\\\
\quad \quad \; = c_0n (\log_2 n)^2 - 2c_0n \log_2 n + c_0n + n\log_2 n + 3n - dn\\\\
\quad \quad \; = c_0n (\log_2 n)^2 - dn + (1-2c_0)n \log_2 n + (3+c_0)n \\\\
\quad \quad \; \leq  c_0n (\log_2 n)^2 - dn \text{ whenever } (2c_0 - 1)\log_2 n \geq 3 + c_0$

The two recursive calls contribute $2 \cdot d\frac{n}{2} = dn$, so the linear term passes through the inductive step unchanged: it does not relax the condition on $c_0$, and only gives freedom to fit the base case.

Choose $c_0 = 1$; the inductive step then holds for $\log_2 n \geq 4$, i.e. $n \geq 16$, for any $d$

$f(n)\; \leq  n (\log_2 n)^2  - dn = O(n (\log_2 n)^2)$

This is mathematically concrete provided $d$ is chosen so that the base cases hold, or the recurrence is algorithmic


### An example with a constant subtraction needed to cancel 

$f(n)=\begin{cases}  1 & n \leq 2, \\2f(\lfloor\frac{n}{2}\rfloor+1)+n & n>2 \end{cases}$ 

Prove that $f(n)$ is asymptotically bounded by $O(n\log n)$.

Make the assumption (inductive hypothesis): $f(m) \leq c m\log m - d$ for $2 < m < n$, where $d$ is a lower-order constant term.

Choose $n_1$ such that $n\geq n_1$ implies $\frac{n}{2}+1\leq \frac{2n}{3}$, i.e. $n_1 = 6$

$f(n)=2f(\lfloor\frac{n}{2}\rfloor+1)+n \\\\
\quad \quad \; \leq 2(c(\frac{n}{2}+1)\log (\frac{n}{2} +1)-d)+n \\\\
\quad \quad \; = cn \log(\frac{n}{2}+1)+ 2c \log(\frac{n}{2}+1)-2d+n \\\\
\quad \quad \; \leq cn\log (\frac{2n}{3})+2 c \log(\frac{2n}{3})-2d+n \text{ for } n\geq n_1 = 6 \\\\
\quad \quad \; = cn\log (n) + cn\log (\frac{2}{3}) +  2 c \log(n) + 2 c \log(\frac{2}{3})-2d+n$

Choose $c$ such that $c\log (\frac{2}{3}) = -2$ and choose $d=-4$, so that $2c\log(\frac{2}{3}) = -4$ and $-2d = 4 - d$

$f(n) \leq cn\log (n) -2n +  2 c \log(n) -4 +4 -d + n \\\\
\quad \quad \; = cn\log (n) - n +  2 c \log(n) -d  \\\\
\quad \quad \; = cn\log (n) - d + ( 2 c \log(n) - n )$

Since $2 c \log(n) - n  \leq 0$ for all sufficiently large $n$, say $n \geq n_2$, removing this term can only increase the RHS

$f(n) \leq cn\log (n) - d$ and QED

Choose $n_0 = \max(n_1, n_2)$ so that both conditions hold for all $n \geq n_0$
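The constants can be sanity-checked numerically. The sketch below assumes the logarithm is base 2 (the text leaves the base open), takes $c = 2/\log_2(3/2)$ so that $c\log_2(2/3) = -2$, and takes $d = -4$; the names `C`, `D` and `bound` are introduced here for illustration only.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    """f(n) = 1 for n <= 2 and f(n) = 2*f(floor(n/2) + 1) + n otherwise."""
    return 1 if n <= 2 else 2 * f(n // 2 + 1) + n

C = 2 / math.log2(3 / 2)  # chosen so that C * log2(2/3) == -2
D = -4

def bound(n):
    """Right-hand side of the claimed bound, c*n*log2(n) - d."""
    return C * n * math.log2(n) - D
```

Comparing `f(n)` with `bound(n)` over a range of `n` gives a quick indication of whether the chosen constants are plausible before the inductive argument is relied upon.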


### Using the recursion tree method to generate a strong guess

Balanced trees are simple. For example, for the recurrence relation:

$T(n) = 2T(\frac{n}{2}) + \Theta(n) \leq 2T(\frac{n}{2}) + c_2 n$

$T(1) = c_1$


![alt](../media/mtreen.png)


The root has cost $c_2n$ and produces two child nodes, each with cost $c_2\frac{n}{2}$. The total cost is found by summing over all the nodes. Note the piecewise definition results in an extra layer: the leaves, each with cost $T(1) = c_1$.
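The level-by-level summation can be sketched in a few lines. The function name `tree_cost` and the default constants are assumptions for illustration, and $n$ is taken to be a power of two so the tree is perfectly balanced.

```python
def tree_cost(n, c1=1, c2=1):
    """Sum the recursion tree of T(n) = 2*T(n/2) + c2*n with T(1) = c1."""
    total, nodes, size = 0, 1, n
    while size > 1:
        total += nodes * c2 * size  # every internal level sums to c2 * n
        nodes, size = 2 * nodes, size // 2
    return total + nodes * c1       # extra layer: n leaves of cost c1 each
```

Each of the $\log_2 n$ internal levels contributes $c_2 n$ and the leaves contribute $c_1 n$, which suggests the guess $T(n) = c_2 n \log_2 n + c_1 n = O(n \log_2 n)$.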

### Example

$T(n) = 8 T(n/2) + \Theta(n) \leq 8 T(n/2) + c_2 n$

$T(1) = c_1$


Level 1 Tree:
```
   c_2 n
    |
  --------------------------------------------------
  |      |      |      |      |      |      |      |
T(n/2) T(n/2) T(n/2) T(n/2) T(n/2) T(n/2) T(n/2) T(n/2)
```

Level n Tree:
```
   c_2 n
    |
  ---------------------------------------------------------
  |       |       |       |       |       |       |       |
c_2.n/2 c_2.n/2 c_2.n/2 c_2.n/2 c_2.n/2 c_2.n/2 c_2.n/2 c_2.n/2
  |       |       |       |       |       |       |       |
  8xn/4   8xn/4   8xn/4   8xn/4   8xn/4   8xn/4   8xn/4   8xn/4
```



* Row 1: $c_2 n$
* Row 2: $8\cdot c_2 n/2 = 4 c_2 n$
* Row 3: $8\cdot8\cdot c_2 n/4 = 16 c_2 n$
* Row k: $4^{k-1} c_2 n$ = $2^{2(k-1)} c_2 n$

And there are $\log_2(n)$ rows above the leaves, hence their total cost is:

$\sum_{k=1}^{\log_2(n)} 4^{k-1} c_2 n$ 

Let $q = \log_2 n$ hence $n = 2^q$

$\sum_{k=1}^{q} 4^{k-1} c_2 2^q$ = $c_2 2^q \sum_{k=1}^{q} 4^{k-1} $ 

The sum is a geometric series, $ S = a(1-r^q)/(1-r)$, where $a = 1$, $r = 4$, so $S = (1-4^q)/(1-4) = (4^q -1)/3$

Hence:

$C_T =  c_2 2^q \cdot (2^{2q} -1)/3$

$C_T =  c_2 n \cdot (n^2 -1)/3 = O(n^3)$

The $8^{\log_2 n} = n^3$ leaves add a further $c_1 n^3$, which does not change the bound.
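The row sums can be reproduced with a short sketch, assuming (as the rows above do) that a subproblem of size $m$ costs $c_2 m$ and that $n$ is a power of two. The function name `level_costs` is hypothetical and the leaf level is deliberately excluded.

```python
def level_costs(n, branching=8, c2=1):
    """Cost of each internal level of T(n) = branching*T(n/2) + c2*n."""
    costs, nodes, size = [], 1, n
    while size > 1:
        costs.append(nodes * c2 * size)  # nodes on this level, each of cost c2*size
        nodes, size = branching * nodes, size // 2
    return costs
```

For `n = 8` this gives the rows `[8, 32, 128]`, each four times the previous one, and their sum matches the closed form $c_2 n (n^2 - 1)/3$.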



### Again! ... doesn't work simply with infinity either

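$T(n) = c_2 n + 4 c_2 n + 4^2 c_2 n + ... + 4^{\log_2(n)} c_2 n$

$ = \sum_{i = 0}^{\log_2(n)} 4^i c_2 n \leq \sum_{i = 0}^{\infty} 4^i c_2 n$

An infinite geometric series only converges when the common ratio is below 1. Here the ratio is 4, so the right-hand side diverges and gives no useful bound; the finite sum has to be evaluated directly, giving $c_2 n (4^{\log_2 n + 1} - 1)/3 = c_2 n (4n^2 - 1)/3 = O(n^3)$.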