In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})




# CMPS 2200
# Introduction to Algorithms


## Parallelism and Analysis


## Parallelism (aka parallel computing)

> ability to run multiple computations at the same time

- faster
- lower energy usage
- better hardware now available


### Pure Computation


<span style="color:orange">**Question**</span>: Does this function have side effects?


```python

def sum_list(mylist):
    result = 0
    for v in mylist:
        result += v
    return result

```



<br>

<span style="color:red">**Question**</span>: What is the running time? Note that the length of `mylist` is $n$.


#### Parallel Algorithm

```python 
def parallel_sum_list(mylist):
    result1, result2 = in_parallel(
        sum_list, mylist[:len(mylist)//2],
        sum_list, mylist[len(mylist)//2:]
    )
    # combine results
    return result1 + result2

```
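`in_parallel` is not defined on this slide. Here is a minimal sketch. It assumes `in_parallel` takes two (function, argument) pairs and returns both results as a tuple, and it uses the standard library's `multiprocessing.pool.ThreadPool`. This helper is hypothetical, not the course's own definition. In standard CPython builds the global interpreter lock (GIL) lets only one thread run Python bytecode at a time, so the sketch shows the structure of the algorithm rather than a measurable speedup.

```python
from multiprocessing.pool import ThreadPool


def sum_list(mylist):
    result = 0
    for v in mylist:
        result += v
    return result


def in_parallel(f1, arg1, f2, arg2):
    """Hypothetical helper: run f1(arg1) and f2(arg2) on two threads and return both results."""
    with ThreadPool(2) as pool:
        # submit both tasks before waiting, so they can run at the same time
        r1 = pool.apply_async(f1, [arg1])
        r2 = pool.apply_async(f2, [arg2])
        return r1.get(), r2.get()


def parallel_sum_list(mylist):
    mid = len(mylist) // 2
    # each half is summed on its own thread
    result1, result2 = in_parallel(sum_list, mylist[:mid], sum_list, mylist[mid:])
    # combine results
    return result1 + result2


print(parallel_sum_list(list(range(1, 21))))
```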

<span style="color:red">**Question**</span>: What is the running time when we have 2 processors? Note that the length of `mylist` is $n$.




The **speedup** of a parallel algorithm $P$ over a sequential algorithm $S$ is:
$$
\text{speedup}(P,S) = \frac{T(S)}{T(P)} 
$$
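As a rough illustration, take a hypothetical step-count model that counts one step per addition and ignores the cost of starting threads. Under that model, the two-processor `parallel_sum_list` reaches a speedup just under 2:

```python
def speedup(t_sequential, t_parallel):
    """speedup(P, S) = T(S) / T(P)."""
    return t_sequential / t_parallel


# Hypothetical step counts for a list of length n (thread start-up cost ignored):
n = 1_000_000
t_seq = n            # sum_list: one addition per element
t_par = n // 2 + 1   # both halves summed at the same time, then one combining addition
print(speedup(t_seq, t_par))
```

The extra `+ 1` for the final `result1 + result2` is why the ratio stays slightly below 2.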

<br>
<br>

<span style="color:blue">**Question**</span>: How should we analyze parallel algorithms? 


For a project, worker **A** can complete the work alone in 2 days, earning $200 per day. 

Now, suppose worker **B** joins, and **A** and **B** have the same capability. 

- How many days will it take for them to finish the project together? 

- What will be the total cost of paying both workers? 

- What about if we have more workers?

## Analyzing parallel algorithms

> **work**: total number of primitive operations performed by an algorithm


- On a sequential machine, work is simply the total running time.
- On a parallel machine, the work is divided among $P$ processors.



> **perfect speedup**: dividing $W$ work across $P$ processors yields total time $\frac{W}{P}$

> **span**: longest sequence of dependencies in computation
- time to run with an infinite number of processors
- measure of how "parallelized" an algorithm is
- also called: **critical path length** or **computational depth**


## More Intuition

>**work**: $T_1$ = time using one processor  
>**span**: $T_\infty$ = time using $\infty$ processors


>**work**: total energy consumed by a computation  
>**span**: minimum possible time that the computation requires


> **parallelism** = $\frac{T_1}{T_\infty}$  
> maximum possible speedup with unlimited processors

Summing can easily be parallelized by splitting the input list into two (or $k$) pieces.

> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
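Here is a minimal sketch of the splitting step. It assumes contiguous pieces of nearly equal size, and `split_into` is a hypothetical helper name. No partial sum depends on any other, which is what makes them safe to compute in parallel.

```python
def split_into(mylist, k):
    """Split mylist into k contiguous pieces whose sizes differ by at most one."""
    n = len(mylist)
    # piece i covers indices [i*n//k, (i+1)*n//k)
    bounds = [i * n // k for i in range(k + 1)]
    return [mylist[bounds[i]:bounds[i + 1]] for i in range(k)]


numbers = list(range(1, 21))
pieces = split_into(numbers, 4)
partial_sums = [sum(p) for p in pieces]   # each of these could run on its own thread
print(pieces)
print(partial_sums, sum(partial_sums))
```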


**What are the work and span of `parallel_n_sum_list`, which uses $n$ threads (one per element)?**


In [4]:
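from multiprocessing.pool import ThreadPool

def in_parallel_n(tasks):
    """
    generalize in_parallel for n threads.

    Params:
      tasks: list of (function, argument) tuples to run in parallel

    Returns:
      list of results, in the same order as tasks
    """
    with ThreadPool(len(tasks)) as pool:
        # submit every task first so they can all run concurrently
        results = []
        for func, arg in tasks:
            results.append(pool.apply_async(func, [arg]))
        # then wait for each task and collect its result
        return [r.get() for r in results]


def parallel_n_sum_list(mylist):
    # one thread per element: each thread sums a one-element list
    results = in_parallel_n([(sum, [v]) for v in mylist])
    # combine the n partial results sequentially
    result = 0
    for v in results:
        result += v
    return result

parallel_n_sum_list(range(10))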

45

- work: $O(n)$
- span: $O(n)$


**oops**, that didn't help: the final loop still adds the $n$ thread results one at a time, so the span is still $O(n)$.
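A hypothetical step-count model makes the problem concrete. Suppose $k$ threads each sum a piece of about $n/k$ elements at the same time, and a sequential loop then adds the $k$ partial results. The span is then roughly $n/k + k$. With $k = n$ the combining loop alone takes $n$ steps. Even the best choice, $k \approx \sqrt{n}$, still leaves a span of about $2\sqrt{n}$.

```python
import math


def k_way_cost(n, k):
    """Hypothetical step-count model: split n items across k threads,
    then combine the k partial sums with a sequential loop."""
    piece = math.ceil(n / k)   # steps for one thread to sum its piece
    work = n + k               # every element once, plus k combine steps
    span = piece + k           # one piece (all pieces run at once), then the combine loop
    return work, span


n = 1024
for k in (1, 2, 32, n):
    work, span = k_way_cost(n, k)
    print(f"k={k:5d}  work={work:5d}  span={span:5d}  parallelism={work / span:.1f}")
```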

<br>

Can we do better?

**Idea: Recursive Algorithm**
- let threads create threads recursively (for a refresher on recursion, see [Python recursion](https://www.programiz.com/python-programming/recursion))

- run the recursive calls on separate threads, in parallel




## Divide-and-Conquer

![dag-sum](figures/dag-sum.png)  
[source](https://homes.cs.washington.edu/~djg/teachingMaterials/spac/sophomoricParallelismAndConcurrency.pdf)


In [None]:
# recursive, serial
def sum_list_recursive(mylist):
    print('summing %s' % mylist)
    if len(mylist) == 1:
        return mylist[0]
    return (
        sum_list_recursive(mylist[:len(mylist)//2]) +
        sum_list_recursive(mylist[len(mylist)//2:])
    )

sum_list_recursive(range(10))

In [None]:
# recursive, parallel
def sum_list_recursive_parallel(mylist):
    print('summing %s' % mylist)
    if len(mylist) == 1:
        return mylist[0]

    # each thread spawns more threads
    result1, result2 = in_parallel(
        sum_list_recursive_parallel, mylist[:len(mylist)//2],
        sum_list_recursive_parallel, mylist[len(mylist)//2:]
    )
    print('>>>merging %s and %s' % (result1, result2))
    return result1 + result2

sum_list_recursive_parallel(range(10))
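To connect the recursion to work and span, the sketch below counts the nodes of the divide-and-conquer DAG pictured above. It uses a hypothetical `sum_with_costs` helper and runs serially. The DAG has one divide node per split, one leaf per element, and one combine node per merge. The two recursive calls are independent, so only the longer of their spans counts.

```python
def sum_with_costs(mylist):
    """Return (total, work, span) for the divide-and-conquer sum.

    work = number of nodes in the computation DAG (divide, leaf, and combine nodes)
    span = number of edges on the longest path through that DAG
    """
    if len(mylist) == 1:
        return mylist[0], 1, 0          # a single leaf node, no edges
    mid = len(mylist) // 2
    left_total, left_work, left_span = sum_with_costs(mylist[:mid])
    right_total, right_work, right_span = sum_with_costs(mylist[mid:])
    total = left_total + right_total
    # this call adds one divide node and one combine node
    work = left_work + right_work + 2
    # the halves run side by side, so take the longer one, plus one edge in
    # (from the divide node) and one edge out (to the combine node)
    span = max(left_span, right_span) + 2
    return total, work, span


for n in (2, 8, 1024):
    print(n, sum_with_costs(list(range(n))))
```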

## Computation Graph

![dag](figures/dag.png)  
[source](https://homes.cs.washington.edu/~djg/teachingMaterials/spac/sophomoricParallelismAndConcurrency.pdf)

- Directed acyclic graph (DAG) where
  - Each node is a unit of computation that takes $O(1)$ time
  - An edge is a **computational dependency**
    - An edge from node $A$ to node $B$ means $A$ must complete before $B$ begins
    
> **work**: total number of primitive operations performed by an algorithm  
> **span**: longest sequence of dependencies in computation



#### So, what are the work and span for `sum_list_recursive_parallel`?

**work**: number of nodes

**span**: length of longest path
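Both quantities can be computed directly from a DAG. This minimal sketch assumes the graph is stored as a dictionary from each node to its successors. The example is a hand-written graph for summing 4 numbers, with node names made up for illustration. Work counts the nodes. Span is the longest path, found by memoized depth-first search. Memoization is valid here because the graph has no cycles.

```python
from functools import lru_cache

# Hypothetical DAG for summing 4 numbers, written as node -> list of successors.
# "d*" nodes split the input, "x*" are leaves, "c*" nodes add two results.
dag = {
    "d0": ["d1", "d2"],
    "d1": ["x1", "x2"],
    "d2": ["x3", "x4"],
    "x1": ["c1"], "x2": ["c1"],
    "x3": ["c2"], "x4": ["c2"],
    "c1": ["c0"], "c2": ["c0"],
    "c0": [],
}


def work(graph):
    # one node = one O(1) unit of computation
    return len(graph)


def span(graph):
    @lru_cache(maxsize=None)
    def longest_from(node):
        # number of edges on the longest path starting at node
        return max((1 + longest_from(s) for s in graph[node]), default=0)
    return max(longest_from(node) for node in graph)


print(work(dag), span(dag))
```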

What is the height of a balanced binary tree with $n$ nodes?


> $\mathcal{O}(\log_2 n)$


- The number of leaf nodes in a perfect binary tree of height $h$ is $2^h$.
- To add an array of length $n$, the computation graph has $3n-2$ nodes: $n-1$ divide nodes, $n$ leaves, and $n-1$ combine nodes.

<br>

So,

**work**: $3n-2 \in \mathcal{O}(n)$  
**span**: $2 \log_2 n \in \mathcal{O}(\log_2 n)$

<br>

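**parallelism**: $\frac{T_1}{T_\infty} \in \mathcal{O}(\frac{n}{\log_2 n})$. The span drops from $\mathcal{O}(n)$ for the sequential sum to $\mathcal{O}(\log_2 n)$, an **exponential speedup** in span.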
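Plugging the exact counts into $T_1/T_\infty$ for a few powers of two shows the parallelism growing with $n$. This is a small sketch: `sum_dag_costs` is a hypothetical helper name and assumes $n$ is a power of two.

```python
import math


def sum_dag_costs(n):
    """Work and span of the divide-and-conquer sum DAG, assuming n is a power of two."""
    work = 3 * n - 2              # n - 1 divide nodes + n leaves + n - 1 combine nodes
    span = 2 * int(math.log2(n))  # log2(n) edges down to a leaf, log2(n) edges back up
    return work, span


for n in (2**4, 2**10, 2**20):
    work, span = sum_dag_costs(n)
    print(f"n={n:>8}  work={work:>8}  span={span:>3}  parallelism={work / span:,.0f}")
```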


