# Parallelism (this section is optional)

More efficient algorithms are more powerful algorithms in that they can solve larger problems. Many of our world's most important problems require tremendous computational power, more than any individual CPU could provide in any reasonable timeframe. 

A parallel algorithm is one in which independent parts of the problem can be solved concurrently and their solutions combined into a final solution. Parallel algorithms spread their work over multiple CPU cores, allowing computational resources to be used more effectively and larger problems to be solved.


## Key Concepts

### Independence

The idea of **independence** is crucial for parallel algorithms. For an algorithm to be parallelizable, it must contain parts which can be solved concurrently. If two parts of an algorithm depend on each other, then they must be executed one after another, limiting parallelizability. 



### Work

The **work** of an algorithm is its standard (sequential) runtime: the total number of operations required as a function of the problem size.



### Span

The **span** of an algorithm is a measure of how much dependency is built into an algorithm. An algorithm's span is the length of the longest chain of dependent operations.

Since dependency limits parallelizability, the *larger* the span, the *less* parallelizable an algorithm is.



### Parallelism

The parallelism of an algorithm is defined as:

$\text{parallelism} = \frac{\text{work}}{\text{span}}$

The smaller the span is with relation to the total work required, the more parallelizable an algorithm is.
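To make these three definitions concrete, here is a minimal sketch that computes them for a small dependency graph. The graph `deps` and the helper names `work` and `span` are hypothetical, chosen only for illustration, and every task is assumed to cost one operation.

```python
from functools import lru_cache

# Hypothetical dependency graph: each task maps to the tasks it must wait for.
deps = {
    "a": [],
    "b": [],
    "c": ["a", "b"],   # c needs both a and b
    "d": [],
    "e": ["c", "d"],   # e needs both c and d
}

def work(deps):
    """Total number of operations, assuming one operation per task."""
    return len(deps)

def span(deps):
    """Length of the longest chain of dependent tasks."""
    @lru_cache(maxsize=None)
    def chain(task):
        # A task's chain is itself plus the longest chain among its dependencies.
        return 1 + max((chain(d) for d in deps[task]), default=0)
    return max(chain(t) for t in deps)

print("work:", work(deps))
print("span:", span(deps))
print("parallelism:", work(deps) / span(deps))
```

Here the work is 5 and the span is 3 (one longest chain is `a`, `c`, `e`), so even with unlimited cores the tasks cannot finish in fewer than 3 steps.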



## Recursively Searching through a list

In the last set of notes, you considered a recursive implementation of linear search. Now let's consider a second recursive solution to the searching problem.

To be specific, the problem being solved is:



### The Search Problem

Given an unsorted list and a key, return `True` if the key is in the list and `False` otherwise.

#### Strategy

In linear search, we look through the list element by element to see if any of them match the key.

Instead of scanning element by element, we can use a divide and conquer approach. 

The basic idea is:

> If the key is in the left half **or** if the key is in the right half, then it is in the list.


#### Divide and Conquer Search Implementation

In [1]:
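def search_DC(lst, key):
    # base case: an empty list cannot contain the key
    if len(lst) == 0:
        return False
    # base case: a single element either matches the key or it doesn't
    if len(lst) == 1:
        return lst[0] == key

    # divide: search each half, then combine the two answers with `or`
    mid = len(lst)//2
    return search_DC(lst[:mid], key) or search_DC(lst[mid:], key)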

lst = [4, 5, 2, 6, 1, 0, 3, 7]
search_DC(lst, 3)

True

#### Parallel Implementation?

The details of how to implement parallel programs are beyond the scope of this material, but to show that it is possible, a parallel implementation is provided below.

In [2]:
from multiprocessing.pool import ThreadPool

def search_DC(lst, key):
    if len(lst) == 0:
        return False
    if len(lst) == 1:
        return lst[0] == key

    mid = len(lst)//2
    with ThreadPool(2) as pool:
        # launch the first recursive call
        result1 = pool.apply_async(search_DC, [lst[:mid], key])
        # launch the second recursive call
        result2 = pool.apply_async(search_DC, [lst[mid:], key])
        return result1.get() or result2.get() # wait for both to finish and return the answer

lst = [4, 5, 2, 6, 1, 0, 3, 7]
search_DC(lst, 3)

True

As you can see, parallelizing our implementation adds some complexity and clutter.

In order to better focus on the algorithms themselves, we will omit the actual parallelization in our implementations. In our future analysis, however, we will assume that the algorithm has been parallelized.

> In a future module, we will discuss threads and concurrency.

## Parallel Algorithm Analysis

To analyze `search_DC`, we need to determine its work and span.

First, let's think about its overall behavior.

At every step, it divides the list into two halves and recursively runs `search_DC` on each. Once the lists are down to size 1, it evaluates the base cases and then combines the results (through the `or` in the return statement).

We can visualize it as the following computation graph:

<img src = "figures/search_recursive.jpeg" width = "100%">

### Work

The work of the algorithm is the total work done across all function calls.

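Since every call performs constant work (two conditional checks, the calculation of `mid`, and the `or` in the return), the total work is proportional to the total number of calls. (This treats the slices `lst[:mid]` and `lst[mid:]` as constant-time. In Python each slice copies its elements, a cost that an implementation passing index bounds instead of slices would avoid.)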

We can observe that the number of function calls is the number of nodes in the **divide** and **base cases** in the computation graph.

When $n$ is a power of two, these nodes form a perfect binary tree, that is, a binary tree in which every level, including the bottom one, is full.

Drawn more abstractly:

<img src = "figures/binary_tree.jpeg" width = "75%">

Given its regularity, we can easily calculate the number of nodes in it.

For any perfect binary tree, the total number of nodes is:

$2 \cdot \text{num\_leaves} - 1$

where the leaves are the nodes along the bottom of the tree. Verify this above.
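One way to see why this holds is the short derivation below. The symbol $h$ for the tree's height is introduced here only for illustration. A perfect binary tree of height $h$ has $2^k$ nodes on level $k$, so:

$\text{num\_leaves} = 2^h$

$\text{num\_nodes} = 1 + 2 + 4 + \dots + 2^h = 2^{h+1} - 1 = 2 \cdot \text{num\_leaves} - 1$

For example, with $h = 3$ there are $8$ leaves and $1 + 2 + 4 + 8 = 15 = 2 \cdot 8 - 1$ nodes.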



For our search problem, how many leaves do we have?

Recalling from the computation graph, the leaves correspond to the base cases, and we will have one base case for every element in the list. Thus the number of leaves will be $n$, and the total **work** will be:

$2n-1 \in O(n)$
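As a sanity check on this count, here is a small sketch that instruments the search to count its calls. The helper `count_calls` is hypothetical and not part of the original implementation. It evaluates both halves unconditionally, as a parallel version would, because the sequential `or` can short-circuit and skip the right half once the key has been found on the left.

```python
def count_calls(lst, key):
    """Return (found, calls) for the divide-and-conquer search."""
    if len(lst) == 0:
        return False, 1
    if len(lst) == 1:
        return lst[0] == key, 1

    mid = len(lst) // 2
    # Evaluate both halves unconditionally (no short-circuiting).
    left_found, left_calls = count_calls(lst[:mid], key)
    right_found, right_calls = count_calls(lst[mid:], key)
    # This call itself counts as one, plus everything below it.
    return left_found or right_found, 1 + left_calls + right_calls

for n in (1, 2, 4, 8, 16):
    found, calls = count_calls(list(range(n)), -1)  # -1 is never in the list
    print(n, calls, 2 * n - 1)
```

For each list size, the measured number of calls in the second column should agree with $2n-1$ in the third.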

### Span

What about the span?

The span of an algorithm measures the dependency in that algorithm. In the computation graph, each node depends on the nodes that feed into it, so the span is the length of the longest path through the graph.



How do we calculate the length of the longest path? 

We can observe that the computation graph is essentially a binary tree (the divide phase) followed by an inverted binary tree (the combine phase). We can approximate the length of its longest path as:

$2 \cdot \text{binary\_tree\_height}$

The height of a perfect binary tree with $N$ nodes is $\log_2(N+1) - 1$, which is roughly $\log_2 N$.

> Our tree has $2n-1$ nodes, so its height works out to $\log_2(2n) - 1 = \log_2 n$. Being a little loose about the number of nodes only shifts the height by about $1$, and it won't affect our final analysis.

Our **span** is thus:

$2\log_2 n \in O(\log_2 n)$
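The span can also be computed directly from the recursion. The sketch below uses a hypothetical helper, `span_DC`, and assumes each divide step, base case, and combine step counts as one unit. Because the two halves run concurrently, only the longer of the two contributes to the chain.

```python
import math

def span_DC(n):
    """Longest chain of dependent steps for a list of n elements (n >= 1)."""
    if n <= 1:
        return 1  # a base case is a single step
    mid = n // 2
    # One divide step, then the slower of the two concurrent halves,
    # then one combine step (the `or`).
    return 1 + max(span_DC(mid), span_DC(n - mid)) + 1

for n in (2, 8, 1024):
    print(n, span_DC(n), 2 * math.log2(n))
```

Under these assumptions the measured chain is $2\log_2 n + 1$ when $n$ is a power of two, which matches the estimate above up to an additive constant.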

### Parallelism

We have calculated our work and span:

**Work:** $O(n)$

**Span:** $O(\log_2 n)$

Since our span is less than the work, we do have some parallelism here.

How much?

**Parallelism:** $O(\frac{n}{\log_2 n})$

With enough processors, this takes the running time from $O(n)$ down to $O(\log_2 n)$, an exponential improvement. This is great!

By taking a divide and conquer approach and parallelizing the calls, we are able to achieve a great speedup.
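To get a feel for the numbers, the sketch below plugs the exact expressions from this analysis ($2n-1$ for the work and $2\log_2 n$ for the span) into the parallelism formula. The function name `parallelism` is hypothetical, and the result is the idealized ratio, assuming as many cores as needed and no overhead for launching parallel calls.

```python
import math

def parallelism(n):
    """Idealized work/span ratio for search_DC on n elements (n >= 2)."""
    work = 2 * n - 1          # total number of calls
    span = 2 * math.log2(n)   # approximate longest dependent chain
    return work / span

for n in (2**10, 2**20, 2**30):
    print(f"n = {n:>10}  parallelism ~ {parallelism(n):,.0f}")
```

The ratio keeps growing with $n$, so larger lists leave more room for parallel speedup.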

## Summary

We have three important measures for the analysis of parallel algorithms.

The **work** of an algorithm is the total number of operations required by the algorithm.

The **span** of an algorithm is the length of the longest chain of dependent operations in it.

An algorithm's **parallelism** is the maximum speedup possible if it is parallelized. We calculate it as $\frac{\text{work}}{\text{span}}$.

The lower the span of an algorithm is, as compared to its work, the more parallelizable it is.

As we continue to discuss more algorithms, we will analyze both their work and their span so that we can get a handle on how much parallelism is possible.