In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})




# CMPS 2200
# Introduction to Algorithms

## Parallelism - Functional Programming


## What is parallelism? (aka parallel computing)

> ability to run multiple computations at the same time


## Why study parallel algorithms?

- faster
- lower energy usage
  + performing a computation **twice** as fast sequentially requires roughly **eight** times as much energy
  + energy consumption is a cubic function of clock frequency
- better hardware now available
  + multicore processors are the norm
  + GPUs (graphics processing units)
  
E.g., machines with more than **one million** cores are now possible:  
SpiNNaker (Spiking Neural Network Architecture), University of Manchester  
<img src="https://upload.wikimedia.org/wikipedia/commons/9/97/Spinn_1m_pano.jpg" alt="SpiNNaker"/>



## Example: Summing a list

Summing can easily be parallelised by splitting the input list into two (or $k$) pieces.

- [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

In [3]:
def sum_list(mylist):
    result = 0
    for v in mylist:
        result += v
    return result

sum_list(range(10))

45

becomes

In [4]:
from multiprocessing.pool import ThreadPool

def in_parallel(f1, arg1, f2, arg2):
    with ThreadPool(2) as pool:
        result1 = pool.apply_async(f1, [arg1])  # launch f1
        result2 = pool.apply_async(f2, [arg2])  # launch f2
        return (result1.get(), result2.get())   # wait for both to finish

def parallel_sum_list(mylist):
    result1, result2 = in_parallel(
        sum_list, mylist[:len(mylist)//2],
        sum_list, mylist[len(mylist)//2:]
    )
    # combine results
    return result1 + result2

parallel_sum_list(list(range(10)))

45

- How much faster should the parallel version be?

`parallel_sum_list` is twice as fast as `sum_list`

<br><br>
...almost. This ignores the **overhead** to set up parallel code and communicate/combine results.

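$O(\frac{n}{2}) + O(1)$  
where $O(\frac{n}{2})$ is the time to sum each half (the two halves run at the same time) and $O(1)$ is the time to combine results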
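
The two-way split generalizes to $k$ pieces. The sketch below is one possible way to write it, not part of the original notebook: `parallel_sum_k` and its parameter `k` are hypothetical names, and it uses `ThreadPool.map` to sum each piece. Each piece costs $O(\frac{n}{k})$ and combining the partial sums costs $O(k)$.

```python
from multiprocessing.pool import ThreadPool

def parallel_sum_k(mylist, k):
    """Sum mylist by splitting it into at most k pieces (illustrative sketch)."""
    # ceiling division, so that at most k pieces are produced
    chunk = max(1, -(-len(mylist) // k))
    pieces = [mylist[i:i + chunk] for i in range(0, len(mylist), chunk)]
    with ThreadPool(k) as pool:
        partial_sums = pool.map(sum, pieces)  # one partial sum per piece
    # combine results
    return sum(partial_sums)

parallel_sum_k(list(range(10)), 4)
```

In standard CPython, threads share the global interpreter lock, so this shows the structure of the computation rather than a measured speedup.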





The **speedup** of a parallel algorithm $P$ over a sequential algorithm $S$ is:
$$
\text{speedup}(P,S) = \frac{T(S)}{T(P)} 
$$

<br>
<br>
<br>
<br>

Some current state-of-the-art results:


| application                           | sequential | parallel (<span style="color:red">**32 core**</span>) | speedup |
|---------------------------------------|------------|--------------------|---------|
| sort $10^7$ strings                   | 2.9        | .095               | 30x     |
| remove duplicates from $10^7$ strings | .66        | .038               | 17x     |
| min. spanning tree for $10^7$ edges   | 1.6        | .14                | 11x     |
| breadth first search for $10^7$ edges | .82        | .046               | 18x     |

<br>
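
As a quick check, the speedup formula can be applied to the timings in the table. This is a minimal sketch: the dictionary `timings` and the helper `speedup` are names introduced here for illustration, and the last column (speedup divided by the 32 cores) is an extra quantity not in the original table.

```python
CORES = 32

# (sequential time, parallel time) copied from the table above
timings = {
    "sort 10^7 strings": (2.9, 0.095),
    "remove duplicates from 10^7 strings": (0.66, 0.038),
    "min. spanning tree for 10^7 edges": (1.6, 0.14),
    "breadth first search for 10^7 edges": (0.82, 0.046),
}

def speedup(t_sequential, t_parallel):
    """speedup(P, S) = T(S) / T(P)"""
    return t_sequential / t_parallel

for application, (t_seq, t_par) in timings.items():
    s = speedup(t_seq, t_par)
    # fraction of the ideal 32x speedup that was actually achieved
    print(f"{application}: {s:.1f}x, {s / CORES:.0%} of ideal")
```

The computed ratios agree with the speedup column up to rounding, and none of them reaches the ideal factor of 32.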

<span style="color:red">**Question**</span>: Why are the speedups different and lower than 32?


<span style="color:blue">**Question**</span>: Why isn't all software parallel? Any examples in real life?



### Dependency

> The fundamental challenge of parallel algorithms is that computations must be **independent** to be performed in parallel.  Parallel computations should not depend on each other.

**What should this code output?**

 - run this a few times and see if output changes...

In [6]:

total = 0  # shared global counter; must be defined before count() runs

def count(size):
    global total
    for _ in range(size):
        total += 1
    
def race_condition_example():
    global total
    in_parallel(count, 100000,
                count, 100000)
    print(total)
    
race_condition_example()

140632


#### Counting in parallel is hard!

- motivates functional programming 

This course will focus on:
- understanding when things can run in parallel and when they cannot
- algorithms, not hardware specifics (though see CMPS 4760: Distributed Systems)



## Functional languages

In functional languages, functions act like mathematical functions.

Two key properties:

1. a function maps an input to an output, $f : X \to Y$
    - no **side effects**
    
    
2. functions can be treated as values
    - function A can be passed to function B

## Pure function

A function is **pure** if it maps an input to an output with no **side effects.**

A computation is **pure** if all of its functions are pure.


In [1]:
def double(value):
    return 2 * value

double(10)

20

We can view the `double` function as a mathematical function, defined by the mapping:

$$ \{(0, 0), (1, 2), (2, 4), \ldots \}$$

versus...

In [3]:
def append_sum(mylist):
    mylist.append(sum(mylist))  # mutates the caller's list; list.append returns None

mylist = [1,2,3]
append_sum(mylist)
mylist

[1, 2, 3, 6]

This has the side effect of changing (or *mutating*) `mylist`.

though compare with...

In [4]:
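def append_sum(mylist):
    new_list = list(mylist)        # copy, so the caller's list is untouched
    new_list.append(sum(mylist))
    return new_list                # return the new list instead of None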

mylist = [1,2,3]
append_sum(mylist)
mylist

[1, 2, 3]

Almost all "real" computations have some side effects. Consider:

In [5]:
def do_sum(mylist):
    total = 0
    for v in mylist:
        total += v
    return total

`do_sum` has the side effect of modifying `total`. But, this effect is not visible outside of `do_sum`, due to variable scoping.

> **benign effect:** a side-effect that is not observable from outside of the function.

A function with benign effects is still considered pure.

## Why is pure computation good for parallel programming?

  Recall our **race condition** example:

In [6]:
from multiprocessing.pool import ThreadPool

def in_parallel(f1, arg1, f2, arg2):
    with ThreadPool(2) as pool:
        result1 = pool.apply_async(f1, [arg1])  # launch f1
        result2 = pool.apply_async(f2, [arg2])  # launch f2
        return (result1.get(), result2.get())   # wait for both to finish
    
total = 0

def count(size):
    global total
    for _ in range(size):
        total += 1
    
def race_condition_example():
    global total
    in_parallel(count, 100000,
                count, 100000)
    print(total)
    
race_condition_example()

140632


The `count` function has a side-effect of changing the global variable `total`.


More generally, if we want to parallelize two functions $f(a)$ and $g(b)$, we want the same result **no matter which order they are run in.**

> Because of the lack of side-effects, pure functions satisfy this condition.
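
For comparison, here is a sketch of a pure version of the counting example. The names `pure_count` and `pure_count_example` are introduced here for illustration. Each call only touches its own local variable (a benign effect) and returns its result, so the answer does not depend on how the two calls are interleaved.

```python
from multiprocessing.pool import ThreadPool

def in_parallel(f1, arg1, f2, arg2):
    with ThreadPool(2) as pool:
        result1 = pool.apply_async(f1, [arg1])  # launch f1
        result2 = pool.apply_async(f2, [arg2])  # launch f2
        return (result1.get(), result2.get())   # wait for both to finish

def pure_count(size):
    local_total = 0          # local to this call; not shared between threads
    for _ in range(size):
        local_total += 1
    return local_total

def pure_count_example():
    count1, count2 = in_parallel(pure_count, 100000,
                                 pure_count, 100000)
    # combine results after both computations have finished
    return count1 + count2

pure_count_example()
```

This always evaluates to 200000, because no state is shared between the two computations.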

## Data Persistence

In pure computation, no data can ever be overwritten; only new data can be created.

Data is therefore always **persistent**: if you keep a reference to a data structure, it will always be there and in the same state as it started.


<span style="color:red">**Note:**</span> This is similar to **immutable types** in Python. <a href="https://www.geeksforgeeks.org/python/mutable-vs-immutable-objects-in-python/">[Link]</a>
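
A small sketch of persistence using Python's immutable `tuple` type. The function name `append_sum_persistent` is hypothetical. "Appending" creates a new tuple, and the original stays exactly as it was.

```python
def append_sum_persistent(values):
    """Return a new tuple with the sum appended; the input is never modified."""
    return values + (sum(values),)

original = (1, 2, 3)
extended = append_sum_persistent(original)

print(original)  # the original tuple is unchanged
print(extended)  # a new tuple that also contains the sum

# attempting to overwrite an element of a tuple raises a TypeError
try:
    original[0] = 99
except TypeError as err:
    mutation_error = err
    print("cannot mutate:", err)
```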


## Functions as values

Many languages allow functions to be passed to other functions.

Such functions are called "first-class values."

In [2]:
def double(value):
    return 2 * value

def double_and_sum(double_fn, vals):
    total = 0
    for v in vals:
        total += double_fn(v)
    return total

# pass the function double to the function double_and_sum
double_and_sum(double, [1,2,3]) 
# 1*2 + 2*2 + 3*2

12

`double_and_sum` is called a **higher-order function**, since it takes another function as input.

Why is this useful?

In [7]:
def map_function(function, values):
    for v in values:
        yield function(v)

list(map_function(double, [1,2,3]))

[2, 4, 6]

In [6]:
def square(value):
    return value * value

list(map_function(square, [1,2,3]))

[1, 4, 9]

In [7]:
list(map_function(double, map_function(square, [1,2,3])))

[2, 8, 18]

- If we know that `function` is pure, then we can trivially parallelize `map_function` for many inputs.



- By using higher-order functions, we can define a few primitive, higher-order functions that will make it easier to reason about and analyze the run-time of parallel computations.
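
A minimal sketch of what a parallel map could look like, assuming `function` is pure. The name `parallel_map` and the `workers` parameter are illustrative, not part of the original notebook. `ThreadPool.map` returns results in the same order as the inputs.

```python
from multiprocessing.pool import ThreadPool

def double(value):
    return 2 * value

def parallel_map(function, values, workers=4):
    """Apply a pure function to every value, using a pool of worker threads."""
    with ThreadPool(workers) as pool:
        # safe only because each call is independent of all the others
        return pool.map(function, values)

parallel_map(double, [1, 2, 3])
```

Because `double` has no side effects, the calls can run in any order, or at the same time, without changing the result.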

## Lambda Calculus 

Consists of expressions $e$ in one of three forms:

1. a **variable**, e.g., $x$
2. a **lambda abstraction**, e.g., $(\lambda \: x \: . \: e)$, where $x$ is the parameter and $e$ is the function body.
3. an **application**, written $(e_1 \: e_2)$ for expressions $e_1$, $e_2$: the function $e_1$ is applied to the argument $e_2$.
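
The three forms map loosely onto Python syntax. The sketch below is only an illustration: the names `increment`, `three`, `add`, and `seven` are made up here, and Python's built-in numbers and `+` are a convenience that the pure lambda calculus does not have.

```python
# 1. variable: a name such as x

# 2. lambda abstraction: (λ x . x + 1)
increment = lambda x: x + 1

# 3. application: apply the abstraction to the argument 2
three = (lambda x: x + 1)(2)

# the body of an abstraction can itself be an abstraction,
# which gives functions of several arguments: (λ x . (λ y . x + y))
add = lambda x: lambda y: x + y
seven = add(3)(4)
```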

```python
factorial = lambda i: 1 if i < 2 else i*factorial(i-1)  # base case 1, so factorial(0) == 1
factorial(8)
```

<a href='https://en.wikipedia.org/wiki/Lambda_calculus'>Source</a>

In [1]:
def factorial(i):
    if i < 2:
        return 1  # base case: 0! = 1! = 1
    else:
        return i * factorial(i - 1)
    

factorial(8)

40320

In [1]:
# lambda functions exist in Python.
# these are anonymous functions (no names)
# In the application square(10), e_1 is square and e_2 is the argument 10.
square = lambda x: x*x
square(10)

100

We can also chain functions together. E.g., $e_2$ can be another function.

In [2]:
def compose(g, f):
    """
    Returns a **function** that composes f and g
    """
    return lambda x: g(f(x))  # different from just: g(f(x))

def meter2cm(d):
    return d * 100

def cm2inch(d):
    return d / 2.54


# how many inches in a meter?
meter2inch = compose(cm2inch, meter2cm)
meter2inch(1)

39.37007874015748
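
The same idea extends to chaining any number of functions. The sketch below uses `functools.reduce` from the standard library. `compose_all` is a hypothetical helper, not part of the original notebook, and it applies its arguments right to left, matching `compose(g, f)` above.

```python
from functools import reduce

def compose_all(*functions):
    """
    Returns a function that applies the given functions right to left,
    e.g. compose_all(h, g, f)(x) == h(g(f(x))).
    With no arguments, returns the identity function.
    """
    return reduce(lambda g, f: lambda x: g(f(x)), functions, lambda x: x)

def meter2cm(d):
    return d * 100

def cm2inch(d):
    return d / 2.54

def inch2foot(d):
    return d / 12

# how many feet in a meter?
meter2foot = compose_all(inch2foot, cm2inch, meter2cm)
meter2foot(1)
```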