<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg"
     align="right"
     width="30%"
     alt="Dask logo">

# Parallelize your Python code

In this notebook, we explain how Dask enables the parallelization of basic sequential programs through domain decomposition (data parallelism) and functional decomposition (task parallelism).

**Introduction**

Dask is a **library for parallel/distributed computing** in Python. 

* Python users can achieve **data parallelism** by using Dask libraries such as `Dask Array` (mimics NumPy), `Dask DataFrame` (mimics pandas), or `Dask Bag`.
* Users can also achieve **task parallelism** by using libraries such as `Dask Delayed`.

**Content**

1. Parallel programming
2. Problem decomposition strategies
3. Example

**Learning outcomes**
* Define parallel programming.
* Describe the most common problem decomposition strategies in parallel programming.
* Identify which Dask data structures enable domain and functional decomposition. 

## 1. Parallel programming

Parallel programming is the use of two or more processors or cores in combination to solve a single problem.

In order to use two or more processors or cores, **we must divide the problem**.

Parallel programming is about **divide and conquer**: partitioning the problem so that the partitions (chunks) can be fed to multiple processors in a single machine or a distributed system. The processors can then perform computations on these partitions at the same time.

| Sequential programming                                                                                        | Parallel programming                                                                                        |
|---------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| The problem was **NOT** divided by the programmer                                                            | The problem was divided by the programmer                                                                   |
| <img src="https://raw.githubusercontent.com/DonAurelio/dask-tutorial-2023/main/img/sequential_programming.png"> | <img src="https://raw.githubusercontent.com/DonAurelio/dask-tutorial-2023/main/img/parallel_programming.png"> |
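
The divide-and-conquer idea can be sketched with the Python standard library alone. This is a minimal illustration, not how Dask works internally: the names `partial_sum`, `data`, and `n_chunks` are hypothetical, and a thread pool is used only to keep the sketch short (for CPU-bound pure-Python work, processes would normally be needed to get a real speed-up).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Sum of squares for one partition of the problem (hypothetical helper)."""
    return sum(value * value for value in chunk)


# The whole problem: sum of squares of the numbers 0..999.
data = list(range(1_000))

# Divide: split the problem into equally sized partitions (chunks).
n_chunks = 4
chunk_size = len(data) // n_chunks
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Conquer: hand every chunk to a worker; map() keeps the input order.
with ThreadPoolExecutor(max_workers=n_chunks) as pool:
    partial_results = list(pool.map(partial_sum, chunks))

# Combine the partial results into the final answer.
total = sum(partial_results)
print(total)
```

The sequential version would simply call `partial_sum(data)`; the parallel version differs only in that the programmer divided the problem first.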

## 2. Problem decomposition strategies

Problem decomposition (or division) is how parallel computing is achieved. Two decomposition strategies dominate in scientific and data analysis computations: domain decomposition and functional decomposition.

**Domain decomposition or Data parallelism**

* The main idea is to **partition the domain of the problem**. In scientific computing, the problem domain is commonly represented via **vectors** and **matrices**, whilst in data analytics it is commonly represented via **data tables**.
* It implies partitioning data across processes such that *a single portion of data is assigned to a single core* [1].
* It implies the simultaneous execution of the same function (operation) across the elements of a dataset [2].

**Figure:** *[Left]* The domain is partitioned into chunks. Chunks are represented by colors. Every chunk is fed into a single core. *[Right]* Chunks in the figure are represented by subtables. Every subtable is fed into a single core.

| Array                                                                                                                             | Table                                                                                                                             |
|-----------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------|
| <img src="https://raw.githubusercontent.com/DonAurelio/dask-tutorial-2023/main/img/domain_decomposition_array.png" width="600px"> | <img src="https://raw.githubusercontent.com/DonAurelio/dask-tutorial-2023/main/img/domain_decomposition_table.png" width="600px"> |
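
As a rough illustration of the array case, the sketch below partitions a small NumPy matrix into four chunks, applies the same operation to every chunk, and reassembles the result. The 4x4 size and the names `domain`, `chunks`, and `squared_chunks` are assumptions chosen for readability. The chunks are processed in a plain loop here; in a parallel setting each one would be handed to a different core.

```python
import numpy as np

# The problem domain: a 4x4 matrix holding the values 0..15.
domain = np.arange(16).reshape(4, 4)

# Partition the domain into a 2x2 grid of chunks, each of shape (2, 2).
chunks = [
    block
    for row_band in np.array_split(domain, 2, axis=0)
    for block in np.array_split(row_band, 2, axis=1)
]

# Data parallelism: the same function (squaring) is applied to every chunk.
squared_chunks = [chunk ** 2 for chunk in chunks]

# Reassemble the chunk results into the full-size result.
result = np.block([
    [squared_chunks[0], squared_chunks[1]],
    [squared_chunks[2], squared_chunks[3]],
])
```

Because each chunk is squared independently of the others, the work on one chunk never has to wait for another chunk.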


**Functional decomposition or Task parallelism** 

* The main idea is to **divide the code** used to solve a problem. This is commonly achieved by identifying functions, operations, or sections of the code that are independent of the others.
* Implies partitioning code to processes such that *a single portion of code (e.g., a function) is assigned to a single core* [1].
* Implies the simultaneous execution of multiple and different functions across the same or different data sets [2].

**Figure:** *[Left]* The code represents the equation $z = x^2 + y^2$ for $x=2$ and $y=4$. *[Right]* If the program is executed sequentially, it takes 4 time steps. However, with functional decomposition, the same function `squared` can be applied in parallel to $2$ and $4$, since these values are independent. The number of time steps is then reduced to 3.

<img src="https://raw.githubusercontent.com/DonAurelio/dask-tutorial-2023/main/img/functional_decomposition.png">
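
The idea in the figure can be sketched with the standard library as follows. This is only an illustration under assumptions: the function `add` and the use of a thread pool are not taken from the figure, and for such tiny computations the scheduling overhead outweighs any gain.

```python
from concurrent.futures import ThreadPoolExecutor


def squared(a):
    return a ** 2


def add(a, b):
    return a + b


x, y = 2, 4

with ThreadPoolExecutor(max_workers=2) as pool:
    # squared(x) and squared(y) do not depend on each other,
    # so both tasks can be submitted at the same time.
    future_x = pool.submit(squared, x)
    future_y = pool.submit(squared, y)
    # add() depends on both results, so it has to wait for them.
    z = add(future_x.result(), future_y.result())

print(z)  # 20
```

Only the two `squared` calls can overlap; the final `add` is a dependency barrier, which is why the parallel version still needs more than one time step.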

## 3. Example

**Problem**. Consider the following equation for arrays of shape `(1000, 1000)`. Suppose $x$ and $y$ are both matrices of ones.

$$
  z = x^2 + y^2
$$

**Activities**
1. Sequential programming approach: Write a sequential program to find the value of $z$.
2. Parallel programming approach: Write a parallel version of the sequential program using the domain decomposition approach.
3. Parallel programming approach: Write a parallel version of the sequential program using the functional decomposition approach.

### Sequential programming approach

__1. Import required libraries, define required variables and functions__

In [None]:
import numpy as np 

__2. Write the sequential program to find the value of $z$__

In [None]:
%%time

# Size of the array
shape = (1000,1000)
# Create matrices of ones
x = np.ones(shape=shape)
y = np.ones(shape=shape)
# Describe the equation and compute result
z = (x**2) + (y**2)
z

### Parallel programming approach: Domain decomposition

In order to achieve parallelism, using the domain decomposition approach, we will use `Dask Array`. 

__1. Import required libraries, define required variables and functions__

In [None]:
import dask.array as da

shape = (10000,10000)
chunks = (100,100)

__2. Create the first array__ using `Dask Array` and perform domain decomposition. The `chunks` parameter tells Dask the size of each chunk used in the decomposition.

In [None]:
x = da.ones(shape=shape, chunks=chunks)
x

__3. Create the second array__ using `Dask Array` and perform domain decomposition. The `chunks` parameter tells Dask the size of each chunk used in the decomposition.

In [None]:
y = da.ones(shape=shape, chunks=chunks)
y

__4. Write the parallel version of the program__

*Hint: you just need to write the equation $z = x^2 + y^2$ in Python*

In [None]:
z = (x**2) + (y**2)
z

__5. Compute the result__ of the equation

_Questions: wondering why we need to use `compute` in Dask? Ask the speaker._

In [None]:
%%time

z.compute()
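
As a hint for the question above, Dask collections are lazy: writing the equation only describes the work, and `compute` triggers it. The following sketch shows the difference on a deliberately small array; the 4x4 shape is an assumption chosen so that the result is easy to inspect.

```python
import dask.array as da
import numpy as np

x = da.ones(shape=(4, 4), chunks=(2, 2))
y = da.ones(shape=(4, 4), chunks=(2, 2))

# Nothing is calculated here: z is a Dask array describing the work to do.
z = (x**2) + (y**2)
print(type(z))

# compute() runs the chunk-wise tasks and returns a concrete NumPy array.
result = z.compute()
print(type(result))
print(result)
```

Here every element of `result` is `2.0`, since $1^2 + 1^2 = 2$.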

### Parallel programming approach: Functional decomposition

In order to achieve parallelism using the functional decomposition approach, we will use `Dask Delayed`. However, **we first need to adjust our code** to be able to apply functional decomposition.

__1. Import required libraries, define required variables and functions__

In [None]:
import numpy as np
import dask

shape = (10000,10000)

def squared(a):
    return a**2

def add(a,b):
    return a + b

__2. Create arrays__

In [None]:
x = np.ones(shape=shape)
y = np.ones(shape=shape)

__3. Write the equation in terms of functions__

_Hint: you just need to write the equation $z = x^2 + y^2$ in Python, but using the functions `squared` and `add` in place of `**` and `+`._

In [None]:
%%time 

a = squared(x)
b = squared(y) 
c = add(a,b)
c

__4. Parallelize the previous program__ using Dask Delayed.

In [None]:
%%time 

a = dask.delayed(squared)(x)
b = dask.delayed(squared)(y) 
c = dask.delayed(add)(a,b)

__5. Visualize the parallel computation__ to be performed.

In [None]:
c.visualize()

__6. Execute the computation__

_Questions: wondering why we need to use `compute` in Dask? Ask the speaker._

In [None]:
c.compute()
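
As an alternative way of writing the same program, `dask.delayed` can also be applied as a decorator, so that every call to the function returns a lazy task instead of a value. The sketch below uses small 4x4 arrays (an assumption, to keep the result easy to check) and compares the outcome with the plain NumPy expression.

```python
import dask
import numpy as np


@dask.delayed
def squared(a):
    return a**2


@dask.delayed
def add(a, b):
    return a + b


x = np.ones(shape=(4, 4))
y = np.ones(shape=(4, 4))

# Each call only records a task; the two squared() tasks are independent.
a = squared(x)
b = squared(y)
c = add(a, b)

# compute() executes the task graph and returns the NumPy result.
result = c.compute()
print(np.array_equal(result, x**2 + y**2))  # True
```

The decorator form is convenient when a function is always meant to be lazy; wrapping at call time, as in step 4, keeps the original function usable in the sequential program as well.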

# References

1. Vitorović, A., Tomašević, M. V., & Milutinović, V. M. (2014). Manual parallelization versus state-of-the-art parallelization techniques: The SPEC CPU2006 as a case study. In Advances in Computers (Vol. 92, pp. 203-251). Elsevier.
2. Terrell, R. (2018). Concurrency in .NET: Modern patterns of concurrent and parallel programming. Simon and Schuster.