In [3]:
# setup: hide the In/Out prompts and load the slide stylesheet
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
with open('rise.css') as f:
    display(HTML(f.read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Cost models 


## Language Based Models

- Define a language to specify algorithms
- Assign a cost to each expression
- The cost of an algorithm is built up from the costs of its expressions (work adds them up; span adds them up only for sequential steps)


**Work-Span Model**
 > For a given expression $e$ [a series of statements], we will analyze the work $W(e)$ and span $S(e)$ 


## SPARC

Our textbook uses a pseudo code language called **SPARC**
- based on [Standard ML](https://en.wikipedia.org/wiki/Standard_ML) [ML: Meta language]
- functional language

When possible, we will also show Python versions of key algorithms.




## Example SPARC program


<br><br>
<p> <span>\[\begin{array}{l}  
\texttt{let}\\   
~~~~x = 2 + 3\\  
~~~~f (w) = (w * 4, w - 2)\\  
~~~~(y,z) = f(x-1)\\  
\texttt{in}\\   
~~~~x + y + z\\  
\texttt{end}   
\end{array}\]</span></p>
<br><br>

<br><br>
**binding**: associate entities (data or code) with identifiers.

<br>

**let expression:**

**let**  
$\:\: b^+$  
**in**  
$\:\:e$  
**end**

Expression $e$ is evaluated using the bindings $b^+$ (one or more) defined between **let** and **in**.

<br><br>
**expression** *e*: describes a computation  
- **evaluating** an expression produces its value

<br><br>
$x = 2 + 3 = 5$  
$(y, z) = f(x - 1) = f(4) = (4 * 4, 4 - 2) = (16, 2)$  
$x + y + z = 5 + 16 + 2 = 23$
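For comparison, here is a rough Python rendering of the SPARC `let` expression above. Python has no `let ... in ... end`, so this sketch assumes that ordinary local assignments inside a function stand in for the bindings; the function name `example` is hypothetical.

```python
def example():
    # bindings: the part between `let` and `in`
    x = 2 + 3

    def f(w):
        # returns a pair, like f(w) = (w * 4, w - 2) in SPARC
        return (w * 4, w - 2)

    y, z = f(x - 1)  # pattern-match the pair returned by f(4)

    # body: the part between `in` and `end`; its value is the value of the whole let
    return x + y + z


print(example())  # 23
```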







### What does this do?

<p><span class="math display">\[\begin{array}{l}  
\texttt{let}\\  
~~~~f(i) = \texttt{if} ~(i < 2) ~\texttt{then}~ i ~\texttt{else}~ i  *   
f(i - 1) \\  
\texttt{in} \\   
~~~~f(5) \\  
\texttt{end}   
\end{array}\]</span> </p>



In [1]:
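def factorial(i):
    # direct translation of the SPARC function f above
    if i < 2:
        return i  # base case: f(1) = 1 (note that f(0) returns 0, while 0! = 1)
    else:
        return i * factorial(i - 1)  # recursive case: i * f(i - 1)

factorial(5)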

120

## Composition [SPARC]

<img src="figures/composition.png" width="50%" />


-   $(e_1, e_2)$: Sequential Composition

    -   Add work and span

-   $(e_1 || e_2)$: Parallel Composition

    -   Add work but **take the maximum span**
    
    

### Parallel composition: $(e_1 || e_2)$

- $W(e_1 || e_2) = 1 + W(e_1) + W(e_2)$  
- $S(e_1 || e_2) = 1 + \max(S(e_1), S(e_2))$  
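To make the two composition rules concrete, here is a small Python sketch that combines (work, span) pairs. The `Cost` type and the `seq`/`par` names are hypothetical, and the `1 +` in the sequential rule is an assumption chosen to mirror the parallel formulas above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """Work and span of an expression (hypothetical helper type)."""
    work: int
    span: int


def seq(c1: Cost, c2: Cost) -> Cost:
    # (e1, e2): e2 runs after e1, so both work and span add up
    # (the +1 charges the composition itself, mirroring the parallel rule)
    return Cost(1 + c1.work + c2.work, 1 + c1.span + c2.span)


def par(c1: Cost, c2: Cost) -> Cost:
    # (e1 || e2): work still adds up, but span is the longer of the two branches
    return Cost(1 + c1.work + c2.work, 1 + max(c1.span, c2.span))


a = Cost(work=10, span=10)
b = Cost(work=4, span=4)
print(seq(a, b))  # Cost(work=15, span=15)
print(par(a, b))  # Cost(work=15, span=11)
```

Both compositions do the same total work; only the span differs.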

Let's look at the specification for summing a list:

<p><span class="math display">\[\begin{array}{l}  
\mathit{sumList}~a =  
\\   
~~~~\texttt{if}~|a| = 0~\texttt{then}  
\\   
~~~~~~~~0  
\\  
~~~~\texttt{else if}~|a| = 1~\texttt{then}  
\\   
~~~~~~~~a[0]  
\\  
~~~~\texttt{else}  
\\   
~~~~~~~~\texttt{let}  
\\  
~~~~~~~~~~~~(l,r) = \mathit{splitMid}~a  
\\   
~~~~~~~~~~~~(l',r') = (\mathit{sumList}~l \mid\mid{} \mathit{sumList}~r)  
\\  
~~~~~~~~\texttt{in}  
\\   
~~~~~~~~~~~~l'+r'  
\\  
~~~~~~~~\texttt{end}  
\end{array}\]</span></p>
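A sequential Python sketch of `sumList` that also tallies work and span using the parallel-composition rule. Assumptions: each call is charged one unit for its own steps, and `splitMid` is treated as constant cost (as it would be on array views; Python list slicing actually copies, which this sketch ignores). Under those assumptions the recurrences are $W(n) = 2W(n/2) + O(1)$ and $S(n) = S(n/2) + O(1)$, which solve to $W(n) = O(n)$ and $S(n) = O(\lg n)$. The name `sum_list` is hypothetical, and the two recursive calls run one after the other here, not in parallel.

```python
def sum_list(a):
    """Sum list a by divide and conquer; return (total, work, span).

    Each call is charged 1 unit, and the two recursive calls are
    combined with the parallel-composition rule.
    """
    if len(a) == 0:
        return 0, 1, 1
    if len(a) == 1:
        return a[0], 1, 1
    mid = len(a) // 2
    # splitMid: treated as constant cost here (Python slicing actually copies)
    left, right = a[:mid], a[mid:]
    # (sumList l || sumList r), evaluated one after the other in this sketch
    ls, lw, lspan = sum_list(left)
    rs, rw, rspan = sum_list(right)
    work = 1 + lw + rw             # parallel composition: add work
    span = 1 + max(lspan, rspan)   # parallel composition: max of spans
    return ls + rs, work, span


total, work, span = sum_list(list(range(8)))
print(total, work, span)  # 28 15 4
```

For $n = 8$ the counts come out as $W = 2n - 1 = 15$ and $S = \lg n + 1 = 4$, in line with the recurrences.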





## Parallelism

How many processors can we use efficiently?


**average parallelism**: 

$$
\overline{P} = \frac{T_1}{T_\infty} = \frac{W}{S}
$$
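One way to read $W$ and $S$ off a computation DAG of unit-cost tasks: work is the number of tasks, and span is the number of tasks on the longest dependency chain. The sketch below assumes a hypothetical dictionary representation mapping each task to the tasks it depends on, and uses a pairwise sum of 4 numbers as the example DAG.

```python
from functools import lru_cache


def work_and_span(deps):
    """Return (W, S) for a DAG of unit-time tasks.

    deps maps each task to the list of tasks it depends on.
    """
    @lru_cache(maxsize=None)
    def depth(task):
        # number of tasks on the longest chain ending at `task`
        return 1 + max((depth(d) for d in deps[task]), default=0)

    work = len(deps)                    # every task costs 1
    span = max(depth(t) for t in deps)  # longest dependency chain
    return work, span


# pairwise sum of 4 numbers: 4 leaves, 2 partial sums, 1 final sum
sum4 = {
    "a0": [], "a1": [], "a2": [], "a3": [],
    "s01": ["a0", "a1"], "s23": ["a2", "a3"],
    "total": ["s01", "s23"],
}
W, S = work_and_span(sum4)
print(W, S)  # 7 3
print(f"average parallelism = {W / S:.2f}")  # 2.33
```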


<br><br>
To increase parallelism, we can either:
- decrease span, or
- increase work (but that's not really desirable, since we want the overall cost to be low)

<br>

**work efficiency**: a parallel algorithm is *work efficient* if it performs asymptotically the same work as the best known sequential algorithm for the problem.

So, we want a *work efficient* parallel algorithm with low span.


## Scheduling

A key issue for parallel algorithms is **scheduling**: which processor will run which task, and when?
- We typically have more tasks than processors.

Recall our parallel sum method:

![dag-sum](figures/dag-sum.png)  
[source](https://homes.cs.washington.edu/~djg/teachingMaterials/spac/sophomoricParallelismAndConcurrency.pdf)



We must decide when to run each part of the sum. There are dependencies that constrain the order.

## Scheduler
- For each task generated by a parallel algorithm, assign it to an available processor
- Goal: minimize execution time.





## Greedy Scheduler

Whenever there is a processor available and a task ready to execute, assign the task to the processor and start it immediately. 

Why might this not be optimal?

![greedy](figures/greedy.png)
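The greedy rule is easy to simulate when every task takes one time step. The sketch below assumes a hypothetical DAG dictionary (task mapped to its list of prerequisites); `greedy_schedule` and `sum_dag` are illustrative names, not from the textbook. Ties among ready tasks are broken by dictionary order, which is just one of many possible greedy choices.

```python
def greedy_schedule(deps, p):
    """Simulate a greedy scheduler on p processors for a DAG of unit-time tasks.

    deps maps each task to the list of tasks it depends on.
    Returns the number of time steps T_p.
    """
    done = set()
    steps = 0
    while len(done) < len(deps):
        # a task is ready once all of its prerequisites have finished
        ready = [t for t in deps
                 if t not in done and all(d in done for d in deps[t])]
        if not ready:
            raise ValueError("dependency cycle: no task can run")
        # greedy: start as many ready tasks as there are processors
        done.update(ready[:p])
        steps += 1
    return steps


def sum_dag(n):
    """DAG of a balanced pairwise sum over n leaves (n a power of two)."""
    level = [("leaf", i) for i in range(n)]
    deps = {t: [] for t in level}
    k = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            node = ("add", k)
            k += 1
            deps[node] = [level[i], level[i + 1]]
            nxt.append(node)
        level = nxt
    return deps


dag = sum_dag(8)  # W = 15 tasks, S = 4 levels
for p in (1, 2, 4, 8):
    print(p, greedy_schedule(dag, p))  # prints 1 15, 2 8, 4 5, 8 4 (one pair per line)
```

With $W = 15$ and $S = 4$, each of these times lies between $\max(\frac{W}{P}, S)$ and $\frac{W}{P} + S$.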

Greedy schedulers have an important property that is summarized by the **greedy scheduling principle**.

Assuming $P$ processors, the time $T_P$ to perform a computation with work $W$ and span $S$ is bounded by:

$$T_P < \frac{T_1}{P} + T_\infty = \frac{W}{P} + S$$


Because we know:  
- $T_P \ge \frac{W}{P}$, since even a perfect division of the work among the $P$ processors needs $\frac{W}{P}$ time
- $T_P \ge S = T_\infty$, by the definition of span

we can conclude that the best we can hope for is:

$$T_P \ge \mathrm{max}(\frac{W}{P},S)$$

Therefore, the time using a greedy scheduler is bounded by:

$$ \mathrm{max}(\frac{W}{P},S) \le T_P < \frac{W}{P} + S$$

<br>
How good is greedy? How close is $(\frac{W}{P} + S)$ to $\mathrm{max}(\frac{W}{P},S)$?


Actually, pretty close:

> $\frac{W}{P} + S \le 2 * \mathrm{max}(\frac{W}{P},S)$

(Why? Consider what the worst possible span is.)
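One way to see the factor of 2: each of the two terms is at most the larger of the two, so

$$
\frac{W}{P} + S \;\le\; \mathrm{max}\left(\frac{W}{P},S\right) + \mathrm{max}\left(\frac{W}{P},S\right) \;=\; 2 \cdot \mathrm{max}\left(\frac{W}{P},S\right)
$$

Since no schedule can beat $\mathrm{max}(\frac{W}{P},S)$, a greedy schedule is within a factor of 2 of optimal. The gap is largest when the two terms are equal, $\frac{W}{P} = S$.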

<br>

The greedy scheduler gets closer to optimal the more parallelism the algorithm has.

Recall average parallelism: $\overline{P} = \frac{W}{S}$

We can rewrite:

$$T_P < \frac{W}{P} + S= \frac{W}{P} + \frac{W}{\overline{P}}=\frac{W}{P}(1+\frac{P}{\overline{P}})$$


So, the greater $\overline{P}$ is than $P$, the closer to optimal we get.


E.g., recall our parallel sum method, which has

$W=O(n)$  
$S=O(\lg n)$

<br>
$\overline{P} = \frac{W}{S} = \frac{O(n)}{O(\lg n)}$

<br>

$ \mathrm{max}(\frac{W}{P},S) \le T_P < \frac{W}{P} + S$

<br>

So, if we have 2 processors:

$\mathrm{max}(\frac{O(n)}{2},O(\lg n)) \le T_2 < \frac{O(n)}{2} + O(\lg n)$

$\frac{O(n)}{2} \le T_2 < \frac{O(n)}{2} + O(\lg n)$

<br><br><br>

If we have $\lg n$ processors:


$\mathrm{max}(\frac{O(n)}{\lg n},O(\lg n)) \le T_{\lg n} < \frac{O(n)}{\lg n} + O(\lg n)$

$\frac{O(n)}{\lg n} \le T_{\lg n} < \frac{O(n)}{\lg n} + O(\lg n)$
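To put numbers on the two cases, the sketch below assumes the unit-cost counts of a balanced pairwise sum, $W = 2n - 1$ and $S = \lg n + 1$ (one choice of constants consistent with $W = O(n)$ and $S = O(\lg n)$), and evaluates the greedy bounds for $n = 2^{20}$. `greedy_bounds` is a hypothetical helper name.

```python
import math


def greedy_bounds(work, span, p):
    """Lower bound max(W/P, S) and greedy upper bound W/P + S on T_P."""
    lower = max(work / p, span)
    upper = work / p + span
    return lower, upper


n = 2 ** 20
W = 2 * n - 1                # assumed unit-cost work of the pairwise sum
S = int(math.log2(n)) + 1    # assumed span: lg n levels of adds plus the leaves
for p in (2, int(math.log2(n))):
    lower, upper = greedy_bounds(W, S, p)
    # when W/P >= S, upper / lower = 1 + P / (average parallelism)
    print(f"P={p}: {lower:.1f} <= T_P < {upper:.1f}  (ratio {upper / lower:.5f})")
```

In both cases the ratio of the upper to the lower bound is very close to 1, because $\overline{P} = W/S$ is far larger than $P$.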




The advantage of the Work-Span model:

- We can design parallel algorithms without worrying about scheduling details.

- In exchange, we ignore some overhead in the creation of the schedule itself.
  - This is acceptable since we are focused on asymptotics (just as in the RAM model).