## Parallel Computing with Dask
### Questions
- How do I start working with larger datasets in parallel? 

### Objectives
- Introduce Dask, a free and open-source library for parallel computing in Python

In [1]:
import dask
from dask.distributed import Client

# processes=False runs the scheduler and workers as threads in this process
client = Client(processes=False)
client

Client
  Scheduler: inproc://192.168.130.105/254/1
  Dashboard: /user/ajijohn/proxy/8787/status
Cluster
  Workers: 1
  Cores: 4
  Memory: 7.52 GB


In [2]:
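import xarray as xa

def calc_stats(url):
    """Return the (lazy) mean of a raster, read in 1024 x 1024 chunks per band."""
    # Note: xa.open_rasterio is deprecated in newer xarray releases
    # in favor of rioxarray.open_rasterio.
    bf = xa.open_rasterio(url, chunks={'band': 1, 'x': 1024, 'y': 1024})
    mean_band = bf.mean()
    return mean_band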

Test the function locally first:

In [3]:
url = 'http://landsat-pds.s3.amazonaws.com/c1/L8/227/065/LC08_L1TP_227065_20200608_20200626_01_T1/'
redband = url+'LC08_L1TP_227065_20200608_20200626_01_T1_B{}.TIF'.format(4)

redband

'http://landsat-pds.s3.amazonaws.com/c1/L8/227/065/LC08_L1TP_227065_20200608_20200626_01_T1/LC08_L1TP_227065_20200608_20200626_01_T1_B4.TIF'

In [4]:
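# Because the raster is opened in chunks, this builds a lazy result;
# call mean.compute() to get the actual value.
mean = calc_stats(redband)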

We will use client.submit to execute the computation on a distributed worker:

In [5]:
future = client.submit(calc_stats, redband)

In [6]:
future
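
Dask's futures interface extends the standard library's `concurrent.futures`, so the submit-then-wait pattern can be sketched without a cluster. The block below is only an illustration under that assumption: `ThreadPoolExecutor` stands in for the Dask client, and `slow_mean` is a hypothetical placeholder for `calc_stats`.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_mean(values):
    """Hypothetical stand-in for calc_stats: average a list of numbers."""
    return sum(values) / len(values)


# The executor plays the role of the Dask client in this sketch.
with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns a future immediately, without waiting for the work.
    future = executor.submit(slow_mean, [1, 2, 3, 4])
    # result() blocks until the task has finished and returns its value.
    result = future.result()

print(result)
```

With the Dask client from this notebook, the analogous call would be `future.result()` on the object returned by `client.submit`. Since `calc_stats` opens the raster in chunks, the returned `DataArray` is likely still lazy and would need `.compute()` to produce a number.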

Let's do three files

We are now ready to compute the mean across many files using distributed workers. We can use the map operation, which is non-blocking, so you can continue to work in the Python shell or notebook while the computations run.

In [7]:
# Build the URLs for bands 4, 5 and 6 of the same scene
template = 'LC08_L1TP_227065_20200608_20200626_01_T1_B{}.TIF'
filenames = [url + template.format(band) for band in (4, 5, 6)]

In [8]:
futures = client.map(calc_stats, filenames)

In [9]:
len(futures)

3

In [10]:
futures[:3]

[<Future: finished, type: xarray.DataArray, key: calc_stats-46b4ccfd7102696ba15314f40b4ebc48>,
 <Future: pending, key: calc_stats-86d1ca9ecd2aa13bdbfc7ad25e938be6>,
 <Future: pending, key: calc_stats-3302ef9f32549d9cd72f0f4903c0a72a>]
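
Once the futures exist, the results still have to be collected. As a rough sketch of that step, the block below again uses the standard library's `ThreadPoolExecutor` as a stand-in for the Dask client. `fake_band_mean` and the small pixel lists are hypothetical placeholders for `calc_stats` and the Landsat files.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_band_mean(pixels):
    """Hypothetical stand-in for calc_stats: average a list of pixel values."""
    return sum(pixels) / len(pixels)


# Three made-up "bands", each a short list of pixel values.
bands = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The executor plays the role of the Dask client in this sketch.
with ThreadPoolExecutor(max_workers=3) as executor:
    # One future per input; submission does not wait for the work to finish.
    futures = [executor.submit(fake_band_mean, pixels) for pixels in bands]
    # Collect the results in the same order as the inputs.
    results = [future.result() for future in futures]

print(results)
```

With Dask, `client.map(calc_stats, filenames)` already returns the list of futures, and `client.gather(futures)` collects their results in one call.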

In [11]:
from distributed import progress

In [13]:
progress(futures)
