In [2]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 8)})


# CMPS 2200
# Introduction to Algorithms

## Asymptotic Analysis

- Finish the `Intro Survey`.
- Read the Diderot textbook.
- Create GitHub/Repls accounts.
- Thursday labs are in person (morning or afternoon session), starting in the third week.
- Announcements for quizzes, assignments, and labs will be sent when each is published.

## Algorithm

>  an explicit, precise, unambiguous, mechanically-executable sequence of elementary instructions, usually intended to accomplish a specific purpose.<br>

#### The same task can often be solved by different algorithms
- Analyze algorithms: methods to compute tight bounds on running time
- Design algorithms: approaches to designing efficient algorithms (lists, sequences, trees, graphs, ...)

#### This course focuses on **efficiency** in parallel computing. An efficient algorithm:
  - runs quickly  
  - requires little memory


### Analysis of Linear Search

- Assign a time cost $c_i$ to each line $i$.
- Count the number of times $n_i$ that each line $i$ is run.
- The total cost is the sum, over all lines, of the cost per line multiplied by the number of times it is run:


$
\hbox{Cost(linear-search, mylist, key)} = \sum_i c_i \cdot n_i
$

In [1]:

def linear_search(mylist, key):        #   cost         number of times run
    for i,v in enumerate(mylist):      #   c1               ?
        if v == key:                   #   c2               ?
            return i                   #   c3               ?
    return -1                          #   c4               ?
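One way to fill in the "number of times run" column empirically is to instrument the search. The sketch below uses a hypothetical helper, `count_line_runs` (not part of the course code), that mirrors `linear_search` and tallies how often each line executes, keyed by the cost labels `c1`..`c4`. It counts the `for` line once per iteration, matching the $c_1 n$ term in the worst-case formula.

```python
def count_line_runs(mylist, key):
    """Run linear search on mylist and return (result, counts), where counts
    maps each cost label of linear_search to how many times that line ran."""
    counts = {"c1": 0, "c2": 0, "c3": 0, "c4": 0}
    for i, v in enumerate(mylist):
        counts["c1"] += 1          # one loop iteration
        counts["c2"] += 1          # one comparison v == key
        if v == key:
            counts["c3"] += 1      # found: return the index
            return i, counts
    counts["c4"] += 1              # not found: return -1
    return -1, counts

# key present at index 2: the loop body runs 3 times
print(count_line_runs([5, 8, 3, 9, 1], 3))
# (2, {'c1': 3, 'c2': 3, 'c3': 1, 'c4': 0})

# key absent: the loop body runs len(mylist) times
print(count_line_runs([5, 8, 3, 9, 1], 7))
# (-1, {'c1': 5, 'c2': 5, 'c3': 0, 'c4': 1})
```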

### Best/Average/Worst case

The running time of an algorithm can depend on the input values, not just the input size. We can analyze the best, average, or worst case, but we focus on the worst case [Liebig's law of the minimum].
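For linear search, the three cases differ only in how many `v == key` comparisons run. The sketch below counts them with a small helper, `comparisons` (a name introduced here for illustration). The average case assumes the list holds distinct values and that `key` is present and equally likely to be at any position; under that assumption the average is $(n+1)/2$.

```python
def comparisons(mylist, key):
    """Number of `v == key` tests linear search performs on this input."""
    count = 0
    for v in mylist:
        count += 1
        if v == key:
            break
    return count

n = 10
mylist = list(range(n))                 # distinct values 0..n-1

best = comparisons(mylist, 0)           # key is the first element
worst = comparisons(mylist, -1)         # key is absent
# average case, assuming the key is present and equally likely
# to be at any of the n positions
average = sum(comparisons(mylist, k) for k in mylist) / n

print(best, worst, average)             # 1 10 5.5
```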


### Worst-case analysis of linear search

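Let $n =$ `len(mylist)`. In the worst case, `key` is not in `mylist`: the loop and the comparison each run $n$ times, `return i` never runs, and `return -1` runs once, so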


$ \hbox{Cost(linear-search, } n) = c_1n + c_2n + c_4$


### Big Idea: Asymptotic Analysis

- Ignore machine-dependent constants
- Focus on **growth** of running time
  - What happens in the [limit](https://en.wikipedia.org/wiki/Limit_(mathematics)) as $n \rightarrow \infty$

Different machines change the constants, but not the growth: $ c_1n + c_2n + c_4 = (c_1 + c_2)\,n + c_4 \in \mathcal{O}(n)$
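To see why the constants stop mattering, the sketch below plugs made-up values for $c_1, c_2, c_4$ into the worst-case cost (they stand in for machine-dependent costs on two hypothetical machines; they are not measurements) and prints the cost divided by $n$. As $n$ grows, the ratio settles at $c_1 + c_2$ (5 for machine A, 2 for machine B), so the cost grows linearly whichever machine we use.

```python
def worst_case_cost(n, c1, c2, c4):
    """Worst-case cost model for linear_search: c1*n + c2*n + c4."""
    return c1 * n + c2 * n + c4

# hypothetical per-line costs (c1, c2, c4) for two different machines
machines = {"machine A": (3, 2, 50), "machine B": (1, 1, 1000)}

for name, (c1, c2, c4) in machines.items():
    for n in [10, 1_000, 1_000_000]:
        ratio = worst_case_cost(n, c1, c2, c4) / n
        print(f"{name}: n={n:>9,}  cost/n = {ratio:.4f}")
```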

### Definition: Asymptotic dominance

Function $f(n)$ **asymptotically dominates** function $g(n)$ if **there exist** constants $c > 0$ and $n_0$ such that

$ g(n) \le c \cdot f(n)$ **for all** $n \ge n_0$

<br>

e.g., $f(n)= c_1n^2$ asymptotically dominates $g(n)=c_2n +c_3$ (for any constant $c_1 > 0$)
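The definition asks for *witnesses* $c$ and $n_0$. The sketch below is a numerical sanity check of the example with illustrative constants chosen here ($c_1 = 1$, $c_2 = 3$, $c_3 = 5$); the helper `dominates_on_range` is hypothetical. A finite check can only expose a bad choice of witnesses; it does not prove the inequality for all $n \ge n_0$. For that we need algebra: $n^2 - 3n - 5 \ge 0$ holds for all $n \ge 5$, so $c = 1$, $n_0 = 5$ work.

```python
def dominates_on_range(f, g, c, n0, n_max):
    """Check g(n) <= c * f(n) for every integer n in [n0, n_max].
    Return the first counterexample n, or None if none is found."""
    for n in range(n0, n_max + 1):
        if g(n) > c * f(n):
            return n
    return None

def f(n):
    return n ** 2        # f(n) = c1 * n^2 with c1 = 1

def g(n):
    return 3 * n + 5     # g(n) = c2 * n + c3 with c2 = 3, c3 = 5

print(dominates_on_range(f, g, c=1, n0=5, n_max=10_000))  # None: no counterexample
print(dominates_on_range(f, g, c=1, n0=1, n_max=10_000))  # 1: n0 = 1 is too small
print(dominates_on_range(f, g, c=8, n0=1, n_max=10_000))  # None: a larger c also works
```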


## Asymptotic Notation

$
\begin{align}
\hline
\mathcal{O} (f(n)) & = \{ g(n) \mid f(n) \hbox{ asymptotically dominates } g(n)\}\\
\Omega (f(n)) & =  \{ g(n) \mid  g(n) \hbox{ asymptotically dominates } f(n)\}\\
\Theta (f(n)) & =  \mathcal{O} (f(n)) \cap \Omega (f(n))\\
\hline
o(f(n)) &  = \mathcal{O}(f(n)) \setminus \Theta(f(n))\\
\omega (f(n)) &=\Omega(f(n)) \setminus \Theta(f(n))\\
\hline
\end{align}
$

Note that $\cap$ denotes set intersection and $\setminus$ means set difference.

In other words, $\mathcal{O}(f(n))$ gives an asymptotic upper bound, $\Omega(f(n))$ an asymptotic lower bound, and $\Theta(f(n))$ a tight bound.

## Example
![asymptotic notation example](figures/asym_exmp.png) 

- We often abuse notation, writing $2n = \mathcal{O}(n^2)$ or "$2n \mathrm{~is~} \mathcal{O}(n^2)$" to mean $2n \in \mathcal{O}(n^2)$
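As a worked example, the definitions above settle both directions of the $2n$ versus $n^2$ comparison. The witnesses $c = 2$, $n_0 = 1$ are one valid choice among many:

$$
\begin{aligned}
&2n \le 2 \cdot n^2 \text{ for all } n \ge 1 &&\Rightarrow\ 2n \in \mathcal{O}(n^2) \quad (c = 2,\ n_0 = 1)\\
&n^2 \le c \cdot 2n \iff n \le 2c \text{, which fails for every } n > 2c &&\Rightarrow\ 2n \notin \Omega(n^2)\\
&2n \in \mathcal{O}(n^2) \setminus \Theta(n^2) &&\Rightarrow\ 2n \in o(n^2)
\end{aligned}
$$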


## Analogy 

|$\mathcal{O}~~~$ | $\Omega~~~$ | $\Theta~~~$ | $o$ | $\omega$ |
|--------------|----------|----------|---------------|----------|
| $\leq~~~$       | $\geq~~~$   | $=~~~$      | $\lt~~~ $         | $\gt~~~$    |    

<br><br>
## Limit Method 
 - Uses [L'Hôpital's rule](https://en.wikipedia.org/wiki/L%27H%C3%B4pital%27s_rule)
![limit method](figures/an.png) 
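For reference, the usual form of the limit test, assuming $f$ and $g$ are positive and the limit exists, is sketched below, followed by one application of L'Hôpital's rule:

$$
\lim_{n \to \infty} \frac{g(n)}{f(n)} =
\begin{cases}
0 & \Rightarrow\ g(n) \in o(f(n))\\
c,\ 0 < c < \infty & \Rightarrow\ g(n) \in \Theta(f(n))\\
\infty & \Rightarrow\ g(n) \in \omega(f(n))
\end{cases}
\qquad\quad
\lim_{n \to \infty} \frac{\ln n}{n} = \lim_{n \to \infty} \frac{1/n}{1} = 0
\ \Rightarrow\ \ln n \in o(n)
$$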


## Parallelism

## What is parallelism? (aka parallel computing)

> ability to run multiple computations at the same time

## Why study parallel algorithms?

- faster
- lower energy usage
  + running a computation twice as fast sequentially (by doubling the clock frequency) requires roughly eight times as much power
  + power consumption is roughly a cubic function of clock frequency
- parallel hardware is now widely available
  + multicore processors are the norm
  + GPUs (graphics processing units)
  
For example, machines with more than **one million** cores are now possible:  
SpiNNaker (Spiking Neural Network Architecture), University of Manchester  
<img src="https://upload.wikimedia.org/wikipedia/commons/9/97/Spinn_1m_pano.jpg" alt="SpiNNaker"/>



## Example: Summing a list

Summing can easily be parallelised by splitting the input list into two (or $k$) pieces, summing each piece independently, and adding the partial sums.


In [2]:
def sum_list(mylist):
    result = 0
    for v in mylist:
        result += v
    return result

sum_list(range(10))

45

becomes

In [3]:
from multiprocessing.pool import ThreadPool

def parallel_sum_list(mylist):
    result1, result2 = in_parallel(
        sum_list, mylist[:len(mylist)//2],
        sum_list, mylist[len(mylist)//2:]
    )
    # combine results
    return result1 + result2

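def in_parallel(f1, arg1, f2, arg2):
    # Note: in standard CPython, the global interpreter lock (GIL) lets only one
    # thread execute Python bytecode at a time, so these threads show the
    # structure of a parallel computation but give little real speedup here.
    with ThreadPool(2) as pool:
        result1 = pool.apply_async(f1, [arg1])  # launch f1
        result2 = pool.apply_async(f2, [arg2])  # launch f2
        return (result1.get(), result2.get())   # wait for both to finish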

parallel_sum_list(list(range(10)))

45

- How much faster should the parallel version be?

Ideally, `parallel_sum_list` is twice as fast as `sum_list`

<br><br>
...almost. This ignores the **overhead** of setting up the parallel code and communicating/combining the results.

$\frac{n}{2}$ steps to sum the two halves simultaneously  
$+\ O(1)$ to combine results
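Splitting into $k$ pieces generalizes this: each piece has about $n/k$ elements, and combining the $k$ partial sums takes $k - 1$ additions. A minimal sketch, assuming a thread pool with one worker per piece; the name `parallel_sum_k` is hypothetical, not from the course code. As noted for threads in standard CPython, the GIL means this shows the structure of the computation rather than delivering a real speedup.

```python
from multiprocessing.pool import ThreadPool

def parallel_sum_k(mylist, k):
    """Sum mylist by splitting it into k contiguous pieces, summing the
    pieces in parallel threads, and adding the k partial sums."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(mylist)
    # piece i covers indices [i*n//k, (i+1)*n//k); sizes differ by at most 1
    pieces = [mylist[i * n // k:(i + 1) * n // k] for i in range(k)]
    with ThreadPool(k) as pool:
        partial_sums = pool.map(sum, pieces)   # independent work, one per piece
    return sum(partial_sums)                   # combine: k - 1 additions

print(parallel_sum_k(list(range(10)), 3))  # 45
```

Contiguous slices keep each piece independent of the others, which is exactly what lets them run at the same time.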


<br>
Some state-of-the-art results:

| application                            | sequential time | parallel time (32 cores) | speedup |
|----------------------------------------|-----------------|--------------------------|---------|
| sort $10^7$ strings                    | 2.9             | .095                     | 30x     |
| remove duplicates from $10^7$ strings  | .66             | .038                     | 17x     |
| min. spanning tree for $10^7$ edges    | 1.6             | .14                      | 11x     |
| breadth first search for $10^7$ edges  | .82             | .046                     | 18x     |

The **speedup** of a parallel algorithm $P$ over a sequential algorithm $S$, where $T(\cdot)$ denotes running time, is:
$$
speedup(P,S) = \frac{T(S)}{T(P)}
$$
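As a quick arithmetic check, the sketch below recomputes the speedup column from the two running-time columns of the table; `speedup` is simply the formula above written as a function. The recomputed ratios are close to the table's speedup column.

```python
def speedup(t_sequential, t_parallel):
    """Speedup of a parallel algorithm P over a sequential one S: T(S) / T(P)."""
    return t_sequential / t_parallel

# (sequential, parallel) running times copied from the table
results = {
    "sort 10^7 strings": (2.9, 0.095),
    "remove duplicates from 10^7 strings": (0.66, 0.038),
    "min. spanning tree for 10^7 edges": (1.6, 0.14),
    "breadth first search for 10^7 edges": (0.82, 0.046),
}

for app, (t_s, t_p) in results.items():
    print(f"{app}: {speedup(t_s, t_p):.1f}x")
# sort 10^7 strings: 30.5x
# remove duplicates from 10^7 strings: 17.4x
# min. spanning tree for 10^7 edges: 11.4x
# breadth first search for 10^7 edges: 17.8x
```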

## Parallel software

So why isn't all software parallel?

**Dependencies**

> The fundamental challenge of parallel algorithms is that computations must be **independent** to be performed in parallel.  Parallel computations should not depend on each other.

**What should this code output?**

Run this a few times and see whether the output changes...

In [4]:
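total = 0

def count(size):
    global total
    for _ in range(size):
        total += 1   # read-modify-write on shared state: not atomic
    
def race_condition_example():
    global total
    total = 0        # reset so each run starts from zero
    in_parallel(count, 1000000,
                count, 1000000)
    print(total)     # would be 2000000 if there were no race
    
race_condition_example()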

1525722
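One way to remove the race, in the spirit of `parallel_sum_list`, is to make the two computations independent: each thread counts into its own local variable and returns it, and the results are combined only after both finish. A minimal sketch; the names `count_local` and `independent_count_example` are hypothetical. (A lock around `total += 1` would also fix the race, but it forces the increments to happen one at a time.)

```python
from multiprocessing.pool import ThreadPool

def count_local(size):
    """Count to size using only a local variable: no shared state."""
    local_total = 0
    for _ in range(size):
        local_total += 1
    return local_total

def independent_count_example(size=1000000):
    with ThreadPool(2) as pool:
        r1 = pool.apply_async(count_local, [size])   # launch first count
        r2 = pool.apply_async(count_local, [size])   # launch second count
        total = r1.get() + r2.get()                  # combine after both finish
    print(total)
    return total

independent_count_example()  # prints 2000000 on every run
```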


#### Counting in parallel is hard!

- motivates functional programming (next class)

This course will focus on:
- understanding when things can run in parallel and when they cannot
- algorithms, not hardware specifics (though see CMPS 4760: Distributed Systems)
- runtime analysis