# setup
from IPython.display import display,HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


## Work-Span model

- We define this model to analyze the costs of recursive functions

Definitions of **work** and **span**:

- **work**: total number of primitive operations performed by an algorithm

- **span**: longest sequence of dependent operations in computation
    - also called: *critical path length* or *computational depth*

**intuition**: 

**work**: total operations required by a computation  

**span**: minimum possible time that the computation requires (given unlimited processors); a measure of how *parallel* an algorithm is

For a given expression $e$ (for example, a call to a recursive function), we will analyze the work $W(e)$ and span $S(e)$.

## Composition

<img src="./figures/composition.png" width="50%"/>


-   $(e_1, e_2)$: Sequential Composition
    - Work is the sum of the work of both expressions
    - Span is the sum of the span of both expressions
    - Work and span follow the same rule

-   $(e_1 || e_2)$: Parallel Composition
    - Work is the sum of the work of both expressions
    - Span is the **max of the two** spans
    
    


## Rules of composition


|$\mathbf{e}$ (Expression)|     $\mathbf{W(e)}$ (Work)     |     $\mathbf{S(e)}$ (Span)     |
| --------------------- | ------------------------------ | ------------------------------ |
|       $v$ (value)             |              $1$               |             $1$                |
|$\mathtt{lambda}\:p\:.\:e$ (define new function)|              $1$            |             $1$                |
|     $(e_1, e_2)$  (sequential composition)    |     $1 + W(e_1) + W(e_2)$      |     $1 + S(e_1) + S(e_2)$      |
|    $(e_1 \|\| e_2)$  (parallel composition)   |     $1 + W(e_1) + W(e_2)$      |   $1 + \max(S(e_1), S(e_2))$   |
| $(e_1 e_2)$ (function application)      |  $W(e_1) + W(e_2) + W([\hbox{Eval}(e_2)/x]e_1) + 1$ | $S(e_1) + S(e_2) + S([\hbox{Eval}(e_2)/x]e_1) + 1$ |
|   `let val` $x=e_1$ `in` $e_2$ `end` (local binding)  |  $1 + W(e_1) + W([\hbox{Eval}(e_1)/x]e_2)$       |     $1 + S(e_1) + S([\hbox{Eval}(e_1)/x]e_2)$  |
| $\{e(x)\mid x\in A\}$ (set comprehension over $A$) |   $1+\sum_{x\in A} W(e(x))$    |   $1+\max_{x\in A} S(e(x))$    |
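The composition rows of this table can be read as a recursive cost function. Below is a minimal sketch, assuming a hypothetical encoding of expressions as Python dataclasses (`Val`, `Seq`, `Par`, `SetMap`, none of which come from the course); it covers only values, the two compositions, and set comprehension, since function application and `let` would also need an evaluator.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical expression encoding, used only for this sketch.
@dataclass(frozen=True)
class Val:            # a value v
    pass

@dataclass(frozen=True)
class Seq:            # (e1, e2): sequential composition
    e1: "Expr"
    e2: "Expr"

@dataclass(frozen=True)
class Par:            # (e1 || e2): parallel composition
    e1: "Expr"
    e2: "Expr"

@dataclass(frozen=True)
class SetMap:         # {e(x) | x in A}, with each e(x) listed explicitly
    items: Tuple["Expr", ...]

Expr = Union[Val, Seq, Par, SetMap]

def cost(e: Expr) -> Tuple[int, int]:
    """Return (work, span) of e using the rules in the table."""
    if isinstance(e, Val):
        return 1, 1
    if isinstance(e, Seq):
        w1, s1 = cost(e.e1)
        w2, s2 = cost(e.e2)
        return 1 + w1 + w2, 1 + s1 + s2          # spans add
    if isinstance(e, Par):
        w1, s1 = cost(e.e1)
        w2, s2 = cost(e.e2)
        return 1 + w1 + w2, 1 + max(s1, s2)      # spans take the max
    if isinstance(e, SetMap):
        costs = [cost(x) for x in e.items]       # assumes A is non-empty
        return 1 + sum(w for w, _ in costs), 1 + max(s for _, s in costs)
    raise TypeError(f"unknown expression: {e!r}")

v = Val()
print(cost(Seq(v, v)))             # (3, 3)
print(cost(Par(v, v)))             # (3, 2)
print(cost(SetMap((v, v, v, v))))  # (5, 2)
```

`Par` and `SetMap` differ from `Seq` only in the span: they take a maximum where `Seq` takes a sum, while the work is summed in every case.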




## Sequential composition: $e_1, e_2$

$W(e_1, e_2) = 1 + W(e_1) + W(e_2)$  
$S(e_1, e_2) = 1 + S(e_1) + S(e_2)$  


## Parallel composition: $e_1 || e_2$

$W(e_1 || e_2) = 1 + W(e_1) + W(e_2)$  
$S(e_1 || e_2) = 1 + \max(S(e_1), S(e_2))$  

This rule assumes that $e_1$ and $e_2$ are independent, so they can safely run at the same time.

In a *pure* functional language, two expressions can run in parallel whenever there is no explicit sequencing between them, because the language guarantees:

- no side effects
- data persistence
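A minimal sketch of why purity matters, using Python's standard `concurrent.futures` module: since `square` and `cube` (hypothetical example functions) have no side effects, running them in parallel gives exactly the same result as running them one after another. Note that in CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so this only illustrates that the result does not depend on evaluation order.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x: int) -> int:
    # pure: the output depends only on the input, and nothing is mutated
    return x * x

def cube(x: int) -> int:
    return x * x * x

def sequential(x: int) -> tuple:
    # (e1, e2): evaluate e1, then e2
    return square(x), cube(x)

def parallel(x: int) -> tuple:
    # (e1 || e2): submit both; neither waits on the other
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(square, x)
        f2 = pool.submit(cube, x)
        return f1.result(), f2.result()

print(sequential(3), parallel(3))  # (9, 27) (9, 27)
```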


## Function application: $e_1 e_2$

Apply $e_1$ to $e_2$. We understand what this means intuitively. We will write down what it means formally, but this is not "need to know" information. It was incorrect in previous versions of the notes. See Parallel and Sequential Algorithms page 89.

E.g., if $e_1=\mathtt{lambda} (\: x \: . ~ x * x)$ and $e_2=6/3$, then

$\mathtt{lambda} (\: x \: . ~ x * x)(6/3)=4$.


$W(e_1 e_2) = W(e_1) + W(e_2) + W([\hbox{Eval}(e_2)/x]e_1) + 1$

$S(e_1 e_2) = S(e_1) + S(e_2) + S([\hbox{Eval}(e_2)/x]e_1) + 1$



- $\hbox{Eval}(e)$ evaluates expression $e$. e.g. $\hbox{Eval}(6/3)=2$.


- $[v/x]e$: replace all free (unbound) occurrences of $x$ in $e$ with value $v$ 
  - e.g., $[10/x](x^2+10) = 10^2+10$, which evaluates to $110$
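Substitution only touches *free* occurrences: inside an inner `lambda` that rebinds the same name, $x$ is bound and must be left alone. A minimal sketch over a hypothetical tuple-based syntax tree (`("num", n)`, `("var", x)`, `("add", a, b)`, `("mul", a, b)`, `("lam", x, body)`); because the substituted value is always a number here, variable capture cannot occur and no renaming is needed.

```python
# Hypothetical AST: ("num", n) | ("var", x) | ("add", a, b) | ("mul", a, b) | ("lam", x, body)

def subst(v, x, e):
    """[v/x]e: replace free occurrences of variable x in e with the number v."""
    tag = e[0]
    if tag == "num":
        return e
    if tag == "var":
        return ("num", v) if e[1] == x else e
    if tag in ("add", "mul"):
        return (tag, subst(v, x, e[1]), subst(v, x, e[2]))
    if tag == "lam":
        # an inner lambda that rebinds x shadows it: x is not free below
        if e[1] == x:
            return e
        return ("lam", e[1], subst(v, x, e[2]))
    raise ValueError(f"unknown node {tag!r}")

def evaluate(e):
    """Evaluate a closed arithmetic expression (no variables or lambdas)."""
    tag = e[0]
    if tag == "num":
        return e[1]
    if tag == "add":
        return evaluate(e[1]) + evaluate(e[2])
    if tag == "mul":
        return evaluate(e[1]) * evaluate(e[2])
    raise ValueError(f"cannot evaluate {tag!r}")

# x^2 + 10, written as x * x + 10
x_sq_plus_10 = ("add", ("mul", ("var", "x"), ("var", "x")), ("num", 10))
print(evaluate(subst(10, "x", x_sq_plus_10)))  # 110

shadowed = ("lam", "x", ("var", "x"))
print(subst(10, "x", shadowed))                # ('lam', 'x', ('var', 'x'))
```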



- $W([\hbox{Eval}(e_2)/x]e_1)$ is the cost of evaluating $e_1$ after substituting $\hbox{Eval}(e_2)$ for every free occurrence of $x$ in $e_1$.



We won't get into the weeds of calculating $[\hbox{Eval}(e_2)/x]e_1$ in this course.

We can simplify the cost of function application by assuming that substitution takes constant time and by counting the cost of evaluating the function body as part of $W(e_1)$ and $S(e_1)$. Up to an additive constant:

$W(e_1 e_2) = W(e_1) + W(e_2)$

$S(e_1 e_2) = S(e_1) + S(e_2)$
<br>

e1 = lambda x: x**2 + 10
e2 = lambda y: 5*y

e1(
    e2(2)  # evaluate e2 to a value v
  )        # substitute v for x in e1
           # return result of e1

110


To evaluate $e_1 e_2$ when $e_1$ is a function $f$, we:
1. Evaluate $e_2$ to a value $v$.
2. Use lambda abstraction to form $f(x)$ from $f$.
3. Substitute $v$ for $x$ in $f(x)$.
4. Evaluate $f(v)$.


The expression $[\hbox{Eval}(e_2)/x]e_1$ is defined by replacing all free occurrences of $x$ in $e_1$ with $\hbox{Eval}(e_2)$.

Example: the increment function.

Let $f = \mathtt{lambda} \: x \: . \: x + 1$.

<br>

$W(f(1)) = W(f) + W(1) \in O(1)$.


$S(f(1)) = S(f)+S(1) \in O(1)$

**sequential composition:**

$W(f(1), f(2)) = $

$W(f(1)) + W(f(2)) + 1 = $

$O(1) + O(1) + 1 \in O(1)$.

<br>

$S(f(1), f(2)) = $

$S(f(1)) + S(f(2)) + 1 = $

$O(1) + O(1) + 1 \in O(1)$.

**parallel composition:**

$W(f(1) || f(2)) = $

$W(f(1)) + W(f(2)) + 1 =$

$O(1) + O(1) + 1 \in O(1)$.

<br>

$S(f(1) ||  f(2)) = $

$\max(S(f(1)), S(f(2))) + 1 =$

$\max(O(1),O(1)) + 1 \in O(1)$.

## Parallelism, revisited

How many processors can we use efficiently?


**average parallelism** is defined as the ratio of work to span:

$$
\overline{P} = \frac{W}{S}
$$
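For instance, taking Mergesort with a parallel merge, whose work is $O(n \log n)$ and whose span works out to $O(\log^2 n)$ later in these notes, the average parallelism grows like $n / \log n$. A quick numeric sketch, assuming all constant factors are $1$ (an assumption made only for illustration):

```python
import math

def average_parallelism(work: float, span: float) -> float:
    """Average parallelism P_bar = W / S."""
    return work / span

for n in [2**10, 2**20, 2**30]:
    lg = math.log2(n)
    w = n * lg        # assumed W(n) = n log n, constants dropped
    s = lg * lg       # assumed S(n) = log^2 n, constants dropped
    print(n, round(average_parallelism(w, s)))
```

Even at $n = 2^{10}$ this gives roughly $100$ processors' worth of parallelism, and the number keeps growing with $n$.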


<br><br>
To increase parallelism, we can



- decrease span
- increase work (but that is not desirable, since we want the overall cost to stay low)

<br>



**work efficiency**: a parallel algorithm is *work efficient* if it performs asymptotically the same work as the best known sequential algorithm for the problem.

So, we want a *work efficient* parallel algorithm with low span.


## Recurrences

Recurrences are a way to capture the behavior of recursive algorithms.

Key ingredients: 

- Base case ($n = b_c$): constant time 
- Inductive case ($n > b_c$): recurse on smaller instance and use output to compute solution

> $b_c$ is the size of the base case problems


We saw several recursive equations for runtime of the form $T(n)= aT(\frac{n}{b})+f(n)$. Let's derive the runtime when $a=b=2$ and $f(n)=n$.
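Before drawing the tree, we can sanity-check the answer numerically. A minimal sketch, assuming a base case $T(1) = 1$ and $n$ a power of two, that evaluates $T(n) = 2T(n/2) + n$ directly and compares it with $n \log_2 n$:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n: int) -> int:
    # assumed base case T(1) = 1; n must be a power of two
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

for k in range(1, 21, 5):
    n = 2 ** k
    print(n, T(n), T(n) / (n * math.log2(n)))
```

The ratio approaches $1$, consistent with $T(n) \in \Theta(n \log n)$. With this base case, the recurrence in fact solves exactly to $n \log_2 n + n$ for powers of two.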

## Drawing the Recursion Tree

<img src = "./figures/mergesort_tree_1.jpeg" width = "100%" class="center">

## Solving Recurrences with the Tree Method 

### Recipe: 
1. Expand the tree for two levels.
2. Determine the cost of each level $i$ ($i$ starts at $0$).
3. Determine the number of levels.
4. Cost = $\sum_{i=0}^{\hbox{num levels} - 1}$ (cost of level $i$)
   - This last step usually involves using properties of series.
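The recipe can be mirrored in code: compute the cost of each level of the tree, then add the levels up. A minimal sketch for a recurrence $W(n) = aW(n/b) + f(n)$, assuming $n$ is a power of $b$ and that each leaf (size $1$) costs $f(1)$; `level_costs` is a hypothetical helper name.

```python
def level_costs(a: int, b: int, f, n: int) -> list:
    """Cost of each level i of the recursion tree for W(n) = a W(n/b) + f(n).

    Level i has a**i nodes, each working on a subproblem of size n / b**i.
    Assumes n >= 1 is a power of b and that a leaf (size 1) costs f(1).
    """
    costs = []
    size, nodes = n, 1
    while True:
        costs.append(nodes * f(size))  # step 2: cost of level i
        if size == 1:                  # step 3: stop after the leaf level
            break
        size //= b
        nodes *= a
    return costs

costs = level_costs(2, 2, lambda m: m, 16)
print(costs)       # [16, 16, 16, 16, 16]
print(sum(costs))  # step 4: 80, i.e. 16 * (log2(16) + 1)
```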
  
<br>

## Solving the Summation

$$W(n) = \sum_{i=0}^{\log n} (c_1n + 2^i c_2)$$

$$= \sum_{i=0}^{\log n}c_1 n + \sum_{i=0}^{\log n} 2^i c_2$$

$$= c_1n \sum_{i=0}^{\log n} 1 + c_2 \sum_{i=0}^{\log n} 2^i$$


To solve this, we'll make use of bounds for **geometric series**. 

For $\alpha > 1$: $\:\:\: \sum_{i=0}^n \alpha^i  < \frac{\alpha}{\alpha - 1}\cdot\alpha^n$

e.g,

$\sum_{i=0}^{\log n} 2^i < \frac{2}{1} * 2^{\log n} = 2n$

<br>

For $0 < \alpha < 1$: $\:\:\: \sum_{i=0}^n \alpha^i  < \sum_{i=0}^\infty \alpha^i = \frac{1}{1-\alpha}$

e.g,

$\sum_{i=0}^{\log n} \frac{1}{2^i} < 2$
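A quick numeric check of both bounds for small $n$ (a sanity check, not a proof), using exact fractions from Python's `fractions` module to avoid rounding error:

```python
from fractions import Fraction

def geometric_sum(alpha, n: int):
    """Sum of alpha**i for i = 0..n."""
    return sum(alpha ** i for i in range(n + 1))

# alpha > 1: the sum is less than alpha / (alpha - 1) * alpha**n
alpha = 2
for n in range(1, 20):
    assert geometric_sum(alpha, n) < Fraction(alpha, alpha - 1) * alpha ** n

# 0 < alpha < 1: every finite partial sum stays below 1 / (1 - alpha)
half = Fraction(1, 2)
for n in range(1, 20):
    assert geometric_sum(half, n) < 1 / (1 - half)

print(geometric_sum(2, 10), geometric_sum(half, 10))  # 2047 2047/1024
```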


<br> Plugging in these bounds:

$$W(n) = c_1n \sum_{i=0}^{\log n} 1 + c_2 \sum_{i=0}^{\log n} 2^i$$

$$< c_1 n (\log n + 1) + 2 c_2 n$$

$$\in O(n \log n)$$


## What about the span?

The recurrence for the span of Mergesort is:

$$
\begin{align}
S(n) = \begin{cases}
  c_3, & \text{if $n=1$} \\
  S(n/2) + c_4 \log n, & \text{otherwise} 
  \end{cases}
\end{align}
$$

Why?


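The two recursive calls run in parallel, so the span takes the maximum of two equal spans rather than their sum: the effective branching factor is $1$.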

$S(n) = \pmb{S(n/2)} + c_4 \log n$

<img src = "figures/span_in_work_tree.jpeg" width = "100%" class="center">

$S(n) = S(n/2) + \pmb{c_4 \log n}$

In a span recurrence, this term is the span of the non-recursive work in each call; for Mergesort, that is the merge step.

A sequential merge algorithm requires $O(n)$ work and span.

However, there exists a [parallel merge algorithm](https://www.mcs.anl.gov/~itf/dbpp/text/node127.html) with:

- $W(n) \in O(n)$
- $S(n) \in O(\log n)$
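One well-known way to merge in parallel (not necessarily the algorithm at the link above) is divide and conquer: take the middle element of the longer list, binary-search its position in the other list, then merge the two left parts and the two right parts independently. The sketch below, with the hypothetical name `parallel_merge`, runs the two recursive merges one after another; they are the calls a parallel runtime could evaluate with `||`. Python list slicing copies data, so this sketch does more than linear work; an index-based version would avoid the copies.

```python
from bisect import bisect_left

def parallel_merge(a: list, b: list) -> list:
    """Merge two sorted lists by divide and conquer.

    The two recursive calls are independent, so a parallel runtime
    could run them at the same time; here they run sequentially.
    """
    if len(a) < len(b):
        a, b = b, a                  # make `a` the longer list
    if not a:                        # both lists are empty
        return []
    mid = len(a) // 2
    pivot = a[mid]
    j = bisect_left(b, pivot)        # b[:j] < pivot <= b[j:]
    left = parallel_merge(a[:mid], b[:j])         # independent ...
    right = parallel_merge(a[mid + 1:], b[j:])    # ... of this call
    return left + [pivot] + right

print(parallel_merge([1, 4, 9], [2, 3, 10, 11]))  # [1, 2, 3, 4, 9, 10, 11]
```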

### Solving the Span recurrence

$$
\begin{align}
S(n) = \begin{cases}
  c_3, & \text{if $n=1$} \\
  S(n/2) + c_4 \log n, & \text{otherwise} 
  \end{cases}
\end{align}
$$

<img src="figures/span_tree.jpeg" width="500">


Ignoring the constants $c_3$ and $c_4$:

$$
\begin{align}
S(n) &= \sum_{i=0}^{\log n} \log\frac{n}{2^i}\\
&= \sum_{i=0}^{\log n} (\log n - i)\\
&= \sum_{i=0}^{\log n} \log n - \sum_{i=1}^{\log n} i\\
&= (\log n + 1)\log n - \frac{1}{2}\log n\,(\log n + 1) \quad \left(\hbox{using}\ \sum_{i=1}^{n} i = \frac{n(n+1)}{2}\right)\\
&= \frac{1}{2}\log^2 n + \frac{1}{2}\log n\\
&\in O(\log^2 n)
\end{align}
$$
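A numeric check of this closed form, assuming $n$ is a power of two and choosing the constants $c_3 = 0$ and $c_4 = 1$, so that $S(n) = S(n/2) + \log_2 n$:

```python
def span(n: int) -> int:
    # assumed constants: c3 = 0 (base case) and c4 = 1; n must be a power of two
    if n == 1:
        return 0
    return span(n // 2) + (n.bit_length() - 1)  # bit_length() - 1 == log2(n) here

for k in range(1, 31):
    n = 2 ** k
    assert span(n) == (k * k + k) // 2  # (1/2) log^2 n + (1/2) log n

print(span(2 ** 20))  # 210
```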

### Divide and Conquer

A divide-and-conquer algorithm, at each step, divides the problem into subproblems, solves each, then combines the results to arrive at the final solution. These algorithms can be easily implemented using recursion.

Recurrences can easily be written and interpreted from the perspective of divide-and-conquer algorithms.


Merge Sort is an example of a divide-and-conquer algorithm that we will see on Friday (Week 4, day 2). Its work obeys a recurrence of the following form (for Merge Sort, $a = b = 2$ and $f(n) \in O(n)$).

$$
W(n) = \begin{cases}
  c_b, & \text{if $n=1$} \\
  aW(\frac{n}{b}) + f(n), & \text{otherwise} 
  \end{cases}
$$

- $a$ is the branching factor.
- $\frac{n}{b}$ is the sub-problem size at the next level.
- $f(n)$ is the work done in a call on an input of size $n$, excluding the recursive calls (e.g., dividing the input and combining the results).
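As a concrete instance of this pattern, here is a minimal sequential merge sort sketch (our own illustration, not necessarily the version shown later in the course): it splits the input into $a = 2$ halves of size $n/b = n/2$, and its non-recursive work $f(n)$ is the linear-time merge. The two recursive calls are independent, which is what lets them run in parallel in the span analysis.

```python
def merge(left: list, right: list) -> list:
    """Sequential merge of two sorted lists: O(n) work and span."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these two is non-empty
    out.extend(right[j:])
    return out

def merge_sort(xs: list) -> list:
    if len(xs) <= 1:                 # base case: constant work
        return xs[:]
    mid = len(xs) // 2
    left = merge_sort(xs[:mid])      # a = 2 subproblems ...
    right = merge_sort(xs[mid:])     # ... each of size n / 2
    return merge(left, right)        # f(n): linear-time combine

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```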

## General Recurrences

<img src="figures/general-recursion-tree.jpeg">

### More Practice

Another recurrence:
    
$$
W(n) = \begin{cases}
  c_b, & \text{if $n=1$} \\
  2W(n/2) + n^2, & \text{otherwise} 
  \end{cases}
$$

What is the asymptotic runtime?

$$W(n) = 2W(n/2) + n^2$$

<img width="110%" src="figures/quadratic-recurrence.jpeg"/>



Level $i$ has $2^i$ subproblems of size $\frac{n}{2^i}$, each costing $\left(\frac{n}{2^i}\right)^2$, so:

$W(n) = \sum_{i=0}^{\log n} \frac{n^2}{2^i}$

$= n^2 \sum_{i=0}^{\log n} \left(\frac{1}{2}\right)^i$



To solve this, we can again use **geometric series**.


For $0 < \alpha < 1$: $\:\:\: \sum_{i=0}^n \alpha^i  < \sum_{i=0}^\infty \alpha^i = \frac{1}{1-\alpha}$

e.g., $\sum_{i=0}^{\log n} \frac{1}{2^i} < 2$


Therefore $W(n) < 2 n^2$, so $W(n) \in O(n^2)$.

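What if the branching factor is not 2? For example, with four subproblems of half the size, $W(n) = 4W(n/2) + n^2$, level $i$ costs $4^i \left(\frac{n}{2^i}\right)^2 = n^2$: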

<img width="110%" src="figures/quadratic-recurrence_2.jpeg"/>


$$W(n) = \sum_{i=0}^{\log n} n^2$$

<br>

There are still $\log n$ levels. Why?

Because at every level we are dividing the problem size in half, so we still need $\log_2 n$ levels.


If we were dividing the problem size into thirds, we would need $\log_3 n$ levels.

$$W(n) = \sum_{i=0}^{\log n} n^2 = n^2 \sum_{i=0}^{\log n} 1$$

$$W(n) = n^2 (\log n + 1)$$

$$W(n) \in \Theta(n^2 \log n)$$
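Assuming the tree above comes from $W(n) = 4W(n/2) + n^2$ (four subproblems of half the size, which is what makes every level cost $n^2$) and a base case $W(1) = 1$, the recurrence can be checked directly against $n^2(\log_2 n + 1)$:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def W(n: int) -> int:
    # assumed: branching factor 4, halving, base case W(1) = 1; n a power of two
    if n == 1:
        return 1
    return 4 * W(n // 2) + n * n

for k in range(0, 21):
    n = 2 ** k
    assert W(n) == n * n * (k + 1)  # n^2 (log2 n + 1)

print(W(1024))  # 11534336
```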

<h3>The Brick Method and the Master Theorem</h3>

When we solve recurrences using the Tree method, we typically find $3$ cases.

1. (Root dominated) 
    - The sum of costs of the levels decays geometrically
    - The total cost is the cost of the root, or top level.
2. (Balanced) 
    - Each level has roughly the same cost.
    - The total cost is the cost of each level times the number of levels.
    - The number of levels is typically $\log(n)$, so we typically gain a factor of $\log n$.
3. (Leaf dominated) 
    - The sum of the costs of the levels grows geometrically.
    - The total cost is the cost of the leaves.

We can provide a cookie-cutter theorem that allows us to treat many recurrences uniformly. We call this the Master Theorem.


Master Theorem:
Suppose that our recurrence is of the form $T(n)=aT(\frac{n}{b})+f(n)$, where $f(n)=n^\alpha (\log(n))^k$ with constants $a \ge 1$, $b > 1$, $\alpha \ge 0$, and $k \ge 0$.

Then:
1. (Root Dominated) If $\log_b(a) < \alpha$, the total cost is $T(n)\in \Theta(f(n))$.
2. (Balanced) If $\log_b(a) = \alpha$, the total cost is $T(n) \in \Theta(f(n)\log(n))$. We incur an extra factor of $\log n$.
3. (Leaf Dominated) If $\log_b(a)>\alpha$, the total cost is $T(n) \in \Theta(n^{\log_b(a)})$.
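The three cases translate directly into a small classifier. A minimal sketch, with the hypothetical name `master_theorem`, that returns the case together with exponents $(p, q)$ meaning $T(n) \in \Theta(n^p \log^q n)$; `math.isclose` guards the balanced comparison against floating-point error in $\log_b(a)$.

```python
import math

def master_theorem(a: float, b: float, alpha: float, k: float = 0):
    """Classify T(n) = a T(n/b) + n**alpha * log(n)**k.

    Assumes a >= 1, b > 1, alpha >= 0, k >= 0.
    Returns (case, p, q), meaning T(n) is Theta(n**p * log(n)**q).
    """
    crit = math.log(a, b)                  # log_b(a)
    if math.isclose(crit, alpha):
        return ("balanced", alpha, k + 1)  # f(n) log n
    if crit < alpha:
        return ("root dominated", alpha, k)  # f(n)
    return ("leaf dominated", crit, 0)     # n^(log_b a)

print(master_theorem(2, 2, 1))     # ('balanced', 1, 1): Mergesort work, n log n
print(master_theorem(2, 2, 2))     # ('root dominated', 2, 0): n^2
print(master_theorem(4, 2, 2))     # ('balanced', 2, 1): n^2 log n
print(master_theorem(1, 2, 0, 1))  # ('balanced', 0, 2): Mergesort span, log^2 n
```

These examples reproduce the results derived by hand with the tree method earlier in these notes.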

Not every recurrence can be solved with the Master Theorem.
