<a href="https://www.dask.org/" target="_blank">
<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg"
     align="right"
     width="30%"
     alt="Dask logo">
</a>

# Lab 2: Scalability

In this mini-lab, you will solve the problem described below using a High-Performance Computing System.

**Problem**. Consider the following equation for large arrays of shape `(100,100)`. Suppose $x$ and $y$ are both matrices of ones.

$$
  z = (x^2 + y^2) * (x^2 - y^2)
$$
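
For reference, the sequential program that the activities below ask you to parallelize could look like the following minimal sketch. It assumes NumPy as the sequential baseline and reads `*` as element-wise multiplication; neither choice is stated in the problem itself.

```python
import numpy as np

# Problem size from the statement above.
shape = (100, 100)

# x and y are both matrices of ones.
x = np.ones(shape)
y = np.ones(shape)

# Element-wise evaluation of z = (x^2 + y^2) * (x^2 - y^2)
# (assumption: `*` is element-wise, not a matrix product).
z = (x**2 + y**2) * (x**2 - y**2)

# With x = y = 1, every entry is (1 + 1) * (1 - 1) = 0.
print(z.shape, z.sum())  # (100, 100) 0.0
```

Because every entry of the result is zero, the sum is a quick sanity check for any parallel version.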

**Activities**

Solve the problem given above by applying domain and functional decomposition. Use a chunk size of `(10,10)` to get started.

1. Parallel programming approach: Write a parallel version of the sequential program using the **domain decomposition** approach.
   * Increase the size of the problem by considering two sizes from the following sequence `(100,100), (1000,1000), (10000,10000), ..., (1000000,100000)`. **NOTE:** `(1000000,100000)` will take several minutes.
   * You may consider adjusting the **chunk size** to reduce the resulting number of tasks to be scheduled.
   * Record the execution time for every `problem size vs chunk size` configuration you try.
  
2. Parallel programming approach: Write a parallel version of the sequential program using the **functional decomposition** approach.
   * Increase the size of the problem by considering the sizes from the following sequence `(100,100), (1000,1000)`. **NOTE:** `(1000,1000)` may take several minutes.
   * You may consider adjusting the **chunk size** to reduce the resulting number of tasks to be scheduled.
   * Record the execution time for every `problem size vs chunk size` configuration you try.
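
To get a feel for how the chunk size affects the amount of work to be scheduled, the sketch below counts the blocks an array is split into. The helper `count_chunks` is hypothetical (it is not part of Dask), and it rests on the assumption that the number of tasks grows roughly in proportion to the number of blocks.

```python
import math


def count_chunks(shape, chunks):
    """Return how many blocks an array of `shape` has when split into `chunks`."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))


# A few problem size vs chunk size configurations.
configurations = [
    ((100, 100), (10, 10)),
    ((1000, 1000), (10, 10)),
    ((1000, 1000), (100, 100)),
    ((10000, 10000), (1000, 1000)),
]

for shape, chunks in configurations:
    print(f"shape={shape} chunks={chunks} blocks={count_chunks(shape, chunks)}")
```

Keeping the chunk size at `(10,10)` while growing the problem to `(1000,1000)` yields 10000 blocks per array, whereas `(100,100)` chunks keep it at 100.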
 

**Questions**

1. Which approach, between domain and functional decomposition, demonstrates superior scalability? Please explain your reasoning.
2. What combination of problem size and chunk size yielded the optimal execution time? Please explain your reasoning.

**Instructions for answers**

1. Download the template [Dask Tutorial: Lab 2 - Scalability](../labs/Lab2.txt)
2. Open the template in Microsoft Word and fill in your answers.
3. Create a PDF document.
4. Share your PDF document in [Padlet](https://uniandes.padlet.org/aavivas/dask-tutorial-lab-2-scalability-2025-8xt530boe5e9851x)

## 1. Cluster in a High Performance Computing System

__1. Import required libraries, define required variables and functions__

In [None]:
# ...
# ...
# ...

__2. Create a Dask cluster__

In [None]:
cluster = SLURMCluster(
    #...
    #...
    #...
    #...
    #...
    scheduler_options={"dashboard_address": ":0"}
)

cluster

__3. Create a Dask Client and connect the client to the Dask cluster__

_Activities:_ 

1. Run the cell below
2. Select the option `Launch dashboard in JupyterLab` to display the Dask Dashboard.

In [None]:
client = # ...
client

__4. Deploy 2 workers for your Dask cluster__

_Hint: Start with 2 workers; you can later increase the number up to 4._

In [None]:
cluster.scale(
    jobs=# ...
)

__5. Solve the problem through Domain Decomposition__

Hint: You should perform the following steps

1. Import dask.array
2. Create the x and y arrays
3. Write the equation
4. Compute the result.
5. Adjust problem and chunk sizes, compute the result, and take notes of problem size, chunk size, and execution time.

**NOTE:** Please do not print the result. Store it instead, for instance with `result = z.compute()`. Displaying large arrays in the notebook can be slow and hard to read.
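
The steps above can be sketched as follows. This is one possible illustration, not the reference solution: the problem size, the chunk size, and the use of `time.perf_counter` for timing are assumptions, and the code runs on whatever scheduler is active (the default local scheduler if no client has been created).

```python
import time

import dask.array as da

shape = (1000, 1000)  # problem size (assumed; pick one from the sequence)
chunks = (100, 100)   # chunk size (assumed; larger chunks mean fewer tasks)

# Domain decomposition: each array is split into blocks of shape `chunks`.
x = da.ones(shape, chunks=chunks)
y = da.ones(shape, chunks=chunks)

# This only builds a lazy task graph; nothing is computed yet.
z = (x**2 + y**2) * (x**2 - y**2)

# Time the actual computation and keep the result in a variable.
start = time.perf_counter()
result = z.compute()
elapsed = time.perf_counter() - start

print(f"shape={shape} chunks={chunks} blocks={z.npartitions} time={elapsed:.3f}s")
```

The printed line has the pieces you need for your notes: problem size, chunk size, and execution time.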

In [None]:
%%time

# ... Import dask.array

# ... Create the x and y arrays

# ... Write the equation 

# ... Compute the result


__5. Solve the problem through Functional Decomposition__

Hint: You should perform the following steps

1. Import dask.array
2. Create the x and y arrays
3. Write the equation
4. Compute the result.
5. Adjust problem and chunk sizes, compute the result, and take notes of problem size, chunk size, and execution time.

**NOTE:** Please do not print the result. Store it instead, for instance with `result = z.compute()`. Displaying large arrays in the notebook can be slow and hard to read.
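
A minimal sketch of the functional idea with `dask.delayed` is shown below. The function names `square`, `add`, `subtract`, and `multiply` are hypothetical, and the sketch applies each function to whole NumPy arrays rather than to individual chunks, so it is likely simpler than the chunked version the lab expects.

```python
import dask
import numpy as np


# Each operation in the equation becomes its own delayed function (task).
@dask.delayed
def square(a):
    return a**2


@dask.delayed
def add(a, b):
    return a + b


@dask.delayed
def subtract(a, b):
    return a - b


@dask.delayed
def multiply(a, b):
    return a * b


shape = (100, 100)
x = np.ones(shape)
y = np.ones(shape)

# Build the task graph: the two squares are independent of each other,
# and so are the sum and the difference that consume them.
x2 = square(x)
y2 = square(y)
z = multiply(add(x2, y2), subtract(x2, y2))

# Trigger the computation and keep the result in a variable.
result = z.compute()
print(result.shape, result.sum())  # (100, 100) 0.0
```

Here the parallelism comes from independent functions, not from splitting the data, which is the main contrast with the domain decomposition approach.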

In [None]:
%%time

# ... Import dask

# ... Create the x and y arrays

# ... Write the equation in terms of functions, use the dask.delayed interface

# ... Compute the result


__6. Shut down the cluster__

_Hint: This is **MANDATORY**. Once you finish using a cluster, you must shut it down to release the computing resources it was using._

In [None]:
cluster.close()

__7. Close the connection between the client and the cluster__

In [None]:
client.close()

# [Index](../0.Introduction.ipynb)