# ODC and Dask (LocalCluster) <img align="right" src="../../resources/csiro_easi_logo.png">

- [Setup](#Setup)
- [Load data without dask](#Load-data-without-dask)
- [Exploring dask concepts with ODC](#Exploring-dask-concepts-with-ODC)
- [The impact of dask on ODC](#The-impact-of-dask-on-ODC)
- [Exploiting delayed tasks](#Exploiting-delayed-tasks)
- [Data and computational locality](#Data-and-computational-locality)
   - [With compute on the algorithm](#With-compute-on-the-algorithm)
   - [With selected measurements](#With-selected-measurements)
- [A quick check on the task graph](#A-quick-check-on-the-task-graph)

This notebook explores the use of ODC with Dask LocalCluster. The goal is to introduce fundamental concepts and the role Dask can serve with `datacube` and subsequent computation using `xarray`.

The example computation is fairly typical of an EO data processing pipeline. We'll be using a small area and time period to start with and progressively scaling this example. EO scientists may find some aspects of these examples unrealistic, but this isn't an EO science course &#9786;. 

The basic workflow is:
  1. Specify Region of Interest, Satellite product, EO satellite bands, Time range, desired CRS for the `datacube` query
  1. Load data using `datacube.load()`
  1. Mask valid data
  1. Visualisation of the ROI
  1. Compute NDVI
  1. Visualise NDVI
  
__NOTE__: Some cells in this notebook will take minutes to run, so please be patient. Also, some cells can load large datasets into Jupyter's memory (based on the defaults), which can exhaust the available memory and cause the kernel to crash. If this occurs: restart the kernel, run the [Setup](#Setup) cells, and then jump to the next section. The (text) outputs from previous sections can be retained between kernel restarts. Example times for each exercise:

| Section | Time slices | Example time |
|--|--|--|
| [Load data without dask](#Load-data-without-dask) | 7 | 3-3.5 mins |
| [Exploring dask concepts with ODC](#Exploring-dask-concepts-with-ODC) | 2 | 1 min |
| [The impact of dask on ODC](#The-impact-of-dask-on-ODC) | 22 | 1-2 mins |
| [Exploiting delayed tasks](#Exploiting-delayed-tasks) | 7 | 1 min |
| [Data and computational locality](#Data-and-computational-locality) | 7 | 30-50 secs |
| [With compute on the algorithm](#With-compute-on-the-algorithm) | 7 | 10 secs |
| [With selected measurements](#With-selected-measurements) | 7 | 10 secs |

## Setup

In [None]:
import git
import sys, os
from dateutil.parser import parse
from dateutil.relativedelta import relativedelta
from dask.distributed import Client, LocalCluster
import datacube
from datacube.utils import masking
from datacube.utils.aws import configure_s3_access

# EASI defaults
os.environ['USE_PYGEOS'] = '0'
repo = git.Repo('.', search_parent_directories=True).working_tree_dir
if repo not in sys.path: sys.path.append(repo)
from easi_tools import EasiDefaults, notebook_utils
easi = EasiDefaults()
client = None

The next cell sets out all the query parameters used in our `datacube.load()`.
For this run we keep the ROI quite small.

In [None]:
# Get the default latitude & longitude extents
study_area_lat = easi.latitude
study_area_lon = easi.longitude

# Or choose your own by uncommenting and modifying this section
###############################################################
# # Central Tasmania (near Little Pine Lagoon)
# central_lat = -42.019
# central_lon = 146.615

# # Set the buffer to load around the central coordinates
# # The buffer is a half-width in degrees, so the bounding box spans 2x buffer in each dimension
# buffer = 0.05

# # Compute the bounding box for the study area
# study_area_lat = (central_lat - buffer, central_lat + buffer)
# study_area_lon = (central_lon - buffer, central_lon + buffer)
###############################################################

# Data product
product = easi.product('landsat')
# product = 'landsat8_c2l2_sr'

# Set the date range to load data over
set_time = easi.time
# set_time = ("2021-01-01", "2021-01-31")

# Selected measurement names (used in this notebook). `None` will load all of them
alias = easi.aliases('landsat')
measurements = None
# measurements = [alias[x] for x in ['qa_band', 'red', 'nir']]

# Set the QA band name and mask values
qa_band = alias['qa_band']
qa_mask = easi.qa_mask('landsat')

# Set the resampling method for the bands
resampling = {qa_band: "nearest", "*": "average"}

# Set the coordinate reference system and output resolution
set_crs = easi.crs('landsat')                # If defined, else None
set_resolution = easi.resolution('landsat')  # If defined, else None
# set_crs = "epsg:3577"
# set_resolution = (-30, 30)

# Set the scene group_by method
group_by = "solar_day"

Now initialise the `datacube`.

In [None]:
dc = datacube.Datacube()

# Access AWS "requester-pays" buckets
# This is necessary for reading data from most third-party AWS S3 buckets such as for Landsat and Sentinel-2
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True);

## Load data without dask

Now load the data. This first `dc.load()` does not use Dask so it will take a few minutes.

We use `%%time` to keep track of how long things take to complete.

In [None]:
%%time
dataset = None # clear results from any previous runs
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            group_by=group_by,
        )

The result of the `datacube.load()` function is an `xarray.Dataset`.

Jupyter notebooks can render a description of the xarray `dataset` variable with a _lot of useful information_ about the structure of data.

In [None]:
dataset

Open the `Data variables` (click "&#8227; Data variables") and click on the stacked cylinders for one of them. You will see the actual data array is available and shown in summary form.

> __NOTE__ that you can see real numbers in the array when you do this. This will change when we start using Dask.

This graphical summary of an xarray variable will become increasingly important when dask is enabled and as scale out occurs, so take a moment now to just poke around the interface. Depending on your area of interest set above, you should have a relatively small area: perhaps around 300 to 400 pixels in each of the `x` and `y` dimensions and perhaps up to 10 time slices. This is a relatively small size and fine to do without using Dask.
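
To get a feel for why a selection of this size is comfortable without Dask, the following sketch estimates the in-memory size of a dataset from its dimensions. The helper `estimate_nbytes` is hypothetical, and the variable count (7) and 2-byte element size are assumptions chosen for illustration, not values read from the product.

```python
def estimate_nbytes(n_times, ny, nx, n_vars, itemsize):
    """Rough size in bytes of a dense (time, y, x) dataset with n_vars variables."""
    return n_times * ny * nx * n_vars * itemsize


# Assumed numbers: 10 time slices, 400 x 400 pixels,
# 7 data variables, 2 bytes per value (e.g. a 16-bit integer)
nbytes = estimate_nbytes(n_times=10, ny=400, nx=400, n_vars=7, itemsize=2)
print(f"{nbytes / 1024**2:.1f} MiB")
```

Scaling any one of the dimensions scales the total linearly, which is why a longer time range or a larger ROI can quickly exceed the memory available to the kernel.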

Next up, filter out pixels that are affected by clouds and other issues and compute the NDVI. Since we aren't specifying a time range, this will be performed for all images.

In [None]:
%%time
# Identify pixels that don't have cloud, cloud shadow or water
from datacube.utils import masking

cloud_free_mask = masking.make_mask(dataset[qa_band], **qa_mask)

# Apply the mask
cloud_free = dataset.where(cloud_free_mask)

# Calculate the components that make up the NDVI calculation
band_diff = cloud_free[alias['nir']] - cloud_free[alias['red']]
band_sum = cloud_free[alias['nir']] + cloud_free[alias['red']]
# Calculate NDVI
ndvi = None
ndvi = band_diff / band_sum

The result `ndvi` is an `xarray.DataArray`. Let's take a look at it. Again the notebook will render an html version of the data in summary form.
Notice again the actual data values are being shown and that there are the same number of time slices as above and the x and y dimensions are identical.

In [None]:
ndvi

Raw numbers aren't nice to look at so let's draw a time slice. We'll select just one of them to draw and pick one that didn't get masked out by cloud completely. You can see that all clouds and water have been masked out so that we are just looking at the NDVI of the land area.

In [None]:
ndvi.isel(time=1).plot()
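
As a recap of the masking and NDVI arithmetic used in this section, here is a minimal sketch on a tiny synthetic array. The reflectance values and the `clear` mask are made up for illustration, and plain `numpy` stands in for `xarray`; `xarray`'s `.where()` behaves in the same way, keeping values where the condition is true and filling with NaN elsewhere.

```python
import numpy as np

# Synthetic 2x2 reflectance values (made up for illustration)
red = np.array([[0.10, 0.20], [0.30, 0.40]])
nir = np.array([[0.50, 0.60], [0.30, 0.20]])

# Hypothetical "clear pixel" mask: True keeps the pixel, False drops it
clear = np.array([[True, False], [True, True]])

# Masked pixels become NaN, which then propagates through the arithmetic
red_clear = np.where(clear, red, np.nan)
nir_clear = np.where(clear, nir, np.nan)

# NDVI = (NIR - Red) / (NIR + Red)
ndvi = (nir_clear - red_clear) / (nir_clear + red_clear)
print(ndvi)
```

The masked pixel stays NaN in the result, which is why fully clouded time slices plot as empty images.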

## Exploring dask concepts with ODC

Let's set our time range to a few weeks, or approximately two passes of Landsat 8 for this ROI. Less data will allow us to explore how dask works with the `datacube` and `xarray` libraries.

In [None]:
set_time = (set_time[0], parse(set_time[0]) + relativedelta(weeks=3))
# set_time = ("2021-01-01", "2021-01-14")
set_time

In [None]:
%%time
dataset = None # clear results from any previous runs
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            group_by=group_by,
        )
dataset

As before, you can see the actual data in the results, but this time there should only be 1 or 2 observation times.

### Start a dask LocalCluster

Now let's create a `LocalCluster` as we did in the earlier notebook.

In [None]:
from dask.distributed import Client, LocalCluster

cluster = LocalCluster()
client = Client(cluster)
client

You may like to open up the dashboard for the cluster, although for this notebook we won't be talking about the dashboard (that's for a later discussion).

In [None]:
notebook_utils.localcluster_dashboard(client=client, server=easi.hub)

Now that we are using a cluster, even though it is local, we need to make sure that our cluster has the right configuration to use __Requester Pays__ buckets in AWS S3. To do this, we need to re-run the `configure_s3_access()` function that we ran earlier, but we need to pass the `client` to the function as well.

In [None]:
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True, client=client);

`datacube.load()` will use the default `dask` cluster (the one we just created) __if the `dask_chunks` parameter is specified__.

The chunk shape and memory size is a critical parameter in tuning `dask` and we will be discussing it in great detail as scale increases. For now we're simply going to specify that the `time` dimension should be individually chunked (`1` slice of time). By not specifying any chunking for the other dimensions, they will each form a single contiguous block.

If that made no sense whatsoever, that's fine because we will look at an example.

In [None]:
chunks = {"time":1}

In [None]:
%%time
dataset = None # clear results from any previous runs
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = chunks, ###### THIS IS THE ONLY LINE CHANGED. #####
            group_by=group_by,
        )
dataset

The first thing you probably noticed is that whilst only one line changed, the load time dropped to sub-seconds!
The second thing you probably noticed is that if you look at one of the `data variables` by clicking on the stacked cylinders icon as before, there is no data. Instead there is a diagram which shows you the __Dask Chunks__ for each measurement. It's really fast because it didn't actually load any data!

When `datacube` has `dask_chunks` specified, it still returns an `xarray.Dataset`, but each data variable is backed by a `dask.array` that is lazy loaded - this means that __no data is loaded until used__. If you look at one of the data variables you will see it now has `dask.array<chunksize=(....)>` rather than values, and the cylinder icon will show the Array _and_ Chunk parameters along with some statistics, not actual data.

The `datacube.load()` has used the `dask.Delayed` interface, which will not perform any `tasks` (Dask's name for `calculations`) until the _result_ of the `task` is actually required. We'll load the data in a moment but first let's take a look at the parameters in that pretty visualisation. Click on the cylinder for the `red` Data variable and look at the table and the figure. It should look similar to the image below.

<img src="../../resources/dask_array_example_small.png">

Looking at this image (yours may be different), you can see that:
  1. The Array is `221.92 kiB` in total size and is broken into Chunks which have size `110.96 kiB`
  2. The Array shape is `(2, 375, 303) (time, y, x)` but each chunk is `(1,375,303)` because we specified the `time` dimension should have chunks of length `1`.
  3. There are `2` chunk tasks, one for each time slice, and in this instance one graph layer. More complex calculations will have more layers in the graph.
  4. The Array type is `uint16` and is split up into chunks which are `numpy.ndarrays`.
  
The chunking has split the array loading into two Chunks. __Dask can execute these in parallel.__
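
The chunk layout described above can be reproduced with a synthetic `dask` array, without loading any satellite data. This is only a sketch: the array is filled with zeros, and the shape and dtype are copied from the example figures rather than from a real load.

```python
import dask.array as da

# Same shape as the example above: (time, y, x).
# chunks=(1, -1, -1) means: one time slice per chunk, and a single
# contiguous block along y and x (-1 = "the whole dimension").
arr = da.zeros((2, 375, 303), dtype="uint16", chunks=(1, -1, -1))

print(arr.chunks)     # chunk lengths along each dimension
print(arr.numblocks)  # number of chunks along each dimension
print(arr.chunksize)  # shape of the largest chunk
```

Passing `dask_chunks={"time": 1}` to `dc.load()` expresses the same intent by dimension name rather than by position.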

We can look at the delayed tasks and how they will be executed by visualising the task graph for one of the variables. We'll use the red band measurement.

In [None]:
dataset[alias['red']].data.visualize()

Details on the task graph can be found in the dask user guide, but what's clear is that you have two independent paths of execution which produce one time slice each: (0,0,0) and (1,0,0). These are the two chunks that the full array has been split into.

To retrieve the actual data we need to `compute()` the result. This will cause all the delayed tasks to be executed for the variable we are computing. Let's `compute()` the red variable.

In [None]:
%%time
actual_red = dataset[alias['red']].compute()
actual_red

As you can see we now have actual data (there are real numbers, not just Dask arrays). You can do the same thing for all arrays in the dataset in one go by computing the dataset itself.

In [None]:
%%time
actual_dataset = dataset.compute()
actual_dataset
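
The lazy behaviour seen in this section can also be demonstrated with `dask.delayed` directly. In this sketch `load_slice` is a hypothetical stand-in for reading one time slice, and `scheduler="synchronous"` is used only so the example runs in the calling process without needing a cluster.

```python
import dask

calls = []  # records which slices were actually "loaded"


@dask.delayed
def load_slice(i):
    """Hypothetical loader: pretend to read time slice i."""
    calls.append(i)
    return i * 10


# Building the tasks does not run them
tasks = [load_slice(i) for i in range(2)]
total = dask.delayed(sum)(tasks)
print("calls before compute:", calls)

# compute() executes the whole graph and returns a concrete value
result = total.compute(scheduler="synchronous")
print("calls after compute:", sorted(calls), "result:", result)
```

Nothing is recorded in `calls` until `compute()` is invoked, which mirrors why the `dc.load()` call with `dask_chunks` returned almost immediately.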

## The impact of dask on ODC

From the above we can see that specifying `dask_chunks` in `datacube.load()` splits up the `load()` operation into a set of `chunk` shaped arrays and `delayed` _tasks_. Dask can now perform those tasks in _parallel_. Dask will only _compute_ the results for those parts of the data we are using but we can force the computation of all the `delayed` _tasks_ using `compute()`.

There is a _lot_ more opportunity than described in this simple example but let's just focus on the impact of dask on ODC for this simple case.

The time period and ROI are far too small to be interesting so let's change our time range to a few months of data.

In [None]:
set_time = (set_time[0], parse(set_time[0]) + relativedelta(months=6))
# set_time = ("2021-01-01", "2021-06-30")
set_time

We skip loading this longer time range (larger data selection) without dask because it can take many minutes and may use more than the available memory in the Jupyter node.

Let's enable dask and then do the load. We're chunking by time (length one) so dask will be able to load each time slice in parallel. The data variables are also independent so will be done in parallel as well.

In [None]:
if client is None:
    cluster = LocalCluster()
    client = Client(cluster)
    configure_s3_access(aws_unsigned=False, requester_pays=True, client=client);
    display(notebook_utils.localcluster_dashboard(client=client, server=easi.hub))
else:
    client.restart()

In [None]:
%%time

chunks = {"time":1}

dataset = None # clear results from any previous runs
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = chunks, ###### THIS IS THE ONLY LINE CHANGED. #####
            group_by=group_by,
        )
dataset

Woah!! that was fast - but we didn't actually compute anything so no load has occurred and all tasks are pending.
Open up the Data Variables, click the stacked cylinders and take a look at the delayed task counts. These exist for every variable.

Let's visualise the _task graph_ for the `red` band.

In [None]:
dataset[alias['red']].data.visualize()

Well that's not as useful, is it!

You should just be able to make out that each of the _chunks_ is able to `load()` independently. The `time` _chunk_ is length 1 so these are individual times. This holds true for all the bands so dask can spread these out across multiple threads.

> __Tip__: Visualising task graphs is less effective as your task graph complexity increases. You may need to use simpler examples to see what is going on.

Let's get the actual data:

In [None]:
%%time
actual_dataset = dataset.compute()
actual_dataset

How fast this step is will depend on how many cores are in your Jupyter notebook's local cluster. In real-world scenarios on an 8-core cluster, this `datacube.load()` may take between 1/4 and 1/6 of the time it would take without `dask` (not shown), depending on many factors. This is great!

Why not 1/8 of the time?

Dask has overheads, and `datacube.load()` itself is IO limited. There are all sorts of things that result in limits and part of the art of parallel computing is tuning your algorithm to reduce the impact of these and achieve greater performance. As we scale up this example we'll explore some of these.

> __Tip__: recent updates to Dask have greatly improved performance and we are now seeing more substantial performance gains, more in line with the increase in cores.
>
> Do not always expect 8x as many cores to produce 8x the speed up. Algorithms can be tuned to perform better (or worse) as scale increases. This is part of the art of parallel programming. Dask does its best, and you can often do better.
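
One simplified way to reason about "why not 1/8 of the time" is Amdahl's law, which bounds the speed-up when only part of the work can run in parallel. This is an illustrative model brought in for this sketch, not something measured in this notebook; it ignores scheduler overheads and IO contention, and the parallel fractions below are assumed values.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Upper bound on speed-up when only part of the work parallelises."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Assumed fractions of the workload that can run in parallel
for fraction in (0.5, 0.9, 0.99):
    print(fraction, round(amdahl_speedup(fraction, 8), 2))
```

Under this model, even a small serial portion keeps an 8-worker cluster noticeably below an 8x speed-up.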

## Exploiting delayed tasks

Now let's repeat the full example, with NDVI calculation and masking, in a single cell with `dask` and `compute` to load the data in. We get the total time for later comparison.

Most of the time (not shown) is in the data load and the NDVI calculation is < 1 second.

To ensure comparable timings, we will `.restart()` the Dask cluster. This makes sure that we aren't just seeing performance gains from data caching.

> __Note__ that this may show some `Restarting worker` warnings. That is ok; it is just telling you that each of the four workers in the cluster is restarting.

In [None]:
if client is None:
    cluster = LocalCluster()
    client = Client(cluster)
    configure_s3_access(aws_unsigned=False, requester_pays=True, client=client);
    display(notebook_utils.localcluster_dashboard(client=client, server=easi.hub))
else:
    client.restart()

In [None]:
%%time

chunks = {"time":1}

dataset = None # clear results from any previous runs
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = chunks, 
            group_by=group_by,
        )
actual_dataset = dataset.compute() ### Compute the dataset ###

# Identify pixels that don't have cloud, cloud shadow or water
cloud_free_mask = masking.make_mask(actual_dataset[qa_band], **qa_mask)

# Apply the mask
cloud_free = actual_dataset.where(cloud_free_mask)

# Calculate the components that make up the NDVI calculation
band_diff = cloud_free[alias['nir']] - cloud_free[alias['red']]
band_sum = cloud_free[alias['nir']] + cloud_free[alias['red']]
# Calculate NDVI
ndvi = None
ndvi = band_diff / band_sum

Quicker but can do better...

## Data and computational locality

When `compute()` is called `dask` not only executes all the tasks but it consolidates all the distributed chunks back into a normal array on the client machine - in this case the notebook's kernel. In the previous cell we have two variables that both refer to the data we are loading:
1. _dataset_ refers to the `delayed` version of the data. The `delayed` _tasks_ and the _chunks_ that make it up will be __on the cluster__
2. _actual_dataset_ refers to the actual array in the notebook kernel memory after execution of the _tasks_. The _actual_dataset_ is a complete array in memory in the notebook kernel (__on the _client___).

So in the previous cell everything _after_ the `actual_dataset = dataset.compute()` line is computed in the Jupyter kernel and doesn't use the dask cluster at all for computation.

If we shift the location of this `compute()` call we can perform more _tasks_ in parallel on the dask cluster. 

> __Tip__: Locality is an important concept and applies to both data and computation
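
The difference between the two kinds of object can be seen on a small synthetic array. In this sketch `lazy` is a stand-in for a dask-backed band, and `scheduler="synchronous"` is used only so the example runs without a cluster.

```python
import dask.array as da
import numpy as np

# Stand-in for a dask-backed band: 4 time slices, one chunk per slice
lazy = da.ones((4, 100, 100), chunks=(1, -1, -1))

# Still lazy: this only extends the task graph
lazy_scaled = lazy * 2
print(type(lazy_scaled))

# compute() runs the tasks and gathers every chunk into one numpy array
local = lazy_scaled.compute(scheduler="synchronous")
print(type(local))

# From here on this is ordinary numpy work in the calling process
local_sum = local.sum()
print(local_sum)
```

Any operation applied before `compute()` becomes part of the graph that the workers execute; any operation applied after it runs on the client only.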

Now let's repeat the load and NDVI calculation, but this time rather than `compute()` on the full `dataset` we'll run the compute at the cloud masking step (`cloud_free = dataset.where(cloud_free_mask).compute()`) so the masking operation can be performed in parallel. Let's see what the impact is...


In [None]:
if client is None:
    cluster = LocalCluster()
    client = Client(cluster)
    configure_s3_access(aws_unsigned=False, requester_pays=True, client=client);
    display(notebook_utils.localcluster_dashboard(client=client, server=easi.hub))
else:
    client.restart()

In [None]:
%%time

chunks = {"time":1}

dataset = None # clear results from any previous runs
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = chunks, 
            group_by=group_by,
        )

# Identify pixels that don't have cloud, cloud shadow or water
cloud_free_mask = masking.make_mask(dataset[qa_band], **qa_mask)

# Apply the mask
cloud_free = dataset.where(cloud_free_mask).compute()    ### COMPUTE MOVED HERE ###

# Calculate the components that make up the NDVI calculation
band_diff = cloud_free[alias['nir']] - cloud_free[alias['red']]
band_sum = cloud_free[alias['nir']] + cloud_free[alias['red']]
# Calculate NDVI
ndvi = None
ndvi = band_diff / band_sum
actual_ndvi = ndvi

A few seconds quicker but not that different. This isn't too surprising since the masking operation is pretty quick (it's all numpy) and the data load is the bulk of the processing.

Dask can see the entire task graph for both load and mask computation. As a result _some_ of the computation can be performed concurrently with file IO, and CPUs are busier as a result, so it will be slightly faster in practice but with IO dominating we won't see much overall improvement.

### With compute on the algorithm

Perhaps doing more of the calculation on the cluster will help. Let's also move `ndvi.compute()` so the entire calculation is done on the cluster and only the final result returned to the client.

In [None]:
if client is None:
    cluster = LocalCluster()
    client = Client(cluster)
    configure_s3_access(aws_unsigned=False, requester_pays=True, client=client);
    display(notebook_utils.localcluster_dashboard(client=client, server=easi.hub))
else:
    client.restart()

In [None]:
%%time

chunks = {"time":1}

dataset = None # clear results from any previous runs
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = chunks, 
            group_by=group_by,
        )

# Identify pixels that don't have cloud, cloud shadow or water
cloud_free_mask = masking.make_mask(dataset[qa_band], **qa_mask)

# Apply the mask
cloud_free = dataset.where(cloud_free_mask)

# Calculate the components that make up the NDVI calculation
band_diff = cloud_free[alias['nir']] - cloud_free[alias['red']]
band_sum = cloud_free[alias['nir']] + cloud_free[alias['red']]
# Calculate NDVI
ndvi = None
ndvi = band_diff / band_sum
actual_ndvi = ndvi.compute()    ### COMPUTE MOVED HERE ###

Now we are seeing a huge difference!

You may be thinking "Hold on a sec, the NDVI calculation is pretty quick in this example with such a small dataset, why such a big difference?" - and you'd be right. There is more going on.

Remember that `dataset` is a _task graph_ with `delayed` tasks waiting to be executed __when the result is required__. In the example `dataset`, there are many data variables available but _only 3 are used_ to produce the `ndvi` (`qa_band`, `red` and `nir`). As a result _`dask` doesn't load the other variables_ and because computation time in this case is mostly IO related the execution time is a *lot* faster.
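
This "only load what the result depends on" behaviour can be checked with a toy example. Here `fake_load` is a hypothetical stand-in for reading one band from storage, a plain dict stands in for the `xarray.Dataset`, and the band names and values are made up for illustration.

```python
import dask
import dask.array as da
import numpy as np

loaded = []  # records which bands were actually read


def fake_load(name):
    """Hypothetical stand-in for reading one band from storage."""
    loaded.append(name)
    return np.full((2, 2), 4.0 if name == "nir" else 2.0)


def lazy_band(name):
    # Wrap the loader so it only runs when a result depends on it
    return da.from_delayed(dask.delayed(fake_load)(name), shape=(2, 2), dtype=float)


# Four lazy bands are available, but NDVI only needs two of them
bands = {name: lazy_band(name) for name in ("blue", "green", "red", "nir")}

ndvi = (bands["nir"] - bands["red"]) / (bands["nir"] + bands["red"])
result = ndvi.compute(scheduler="synchronous")

print(sorted(loaded))
```

The `blue` and `green` loaders never run because nothing in the `ndvi` graph refers to them.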

### With selected measurements

Of course we can save `dask` the trouble of figuring this out on our behalf and only `load()` the `measurements` we need in the first place. Let's check that now, we should see a similar performance figure.

In [None]:
if client is None:
    cluster = LocalCluster()
    client = Client(cluster)
    configure_s3_access(aws_unsigned=False, requester_pays=True, client=client);
    display(notebook_utils.localcluster_dashboard(client=client, server=easi.hub))
else:
    client.restart()

In [None]:
%%time

chunks = {"time":1}
measurements = [alias[x] for x in ['qa_band', 'red', 'nir']]

dataset = None # clear results from any previous runs
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = chunks, 
            group_by=group_by,
        )

# Identify pixels that don't have cloud, cloud shadow or water
cloud_free_mask = masking.make_mask(dataset[qa_band], **qa_mask)
# Apply the mask
cloud_free = dataset.where(cloud_free_mask)

# Calculate the components that make up the NDVI calculation
band_diff = cloud_free[alias['nir']] - cloud_free[alias['red']]
band_sum = cloud_free[alias['nir']] + cloud_free[alias['red']]
# Calculate NDVI
ndvi = None
ndvi = band_diff / band_sum
actual_ndvi = ndvi.compute()

Pretty similar as expected, but again a slight improvement, because now there are fewer overheads and a smaller task graph.
So it can pay to give `dask` a hand and not have the _task graph_ cluttered with tasks you are not going to use. Still, it's nice to see that `dask` can save you some time by only computing what is required when you need it.

## A quick check on the task graph

For completeness we will take a look at the _task graph_ for the full calculation, all the way to the NDVI result. Given the complexity of the full graph we'll simplify it to 2 time observations like we did when the task graph was introduced previously.


In [None]:
set_time = (set_time[0], parse(set_time[0]) + relativedelta(weeks=3))
# set_time = ("2021-01-01", "2021-01-14")
set_time

In [None]:
if client is None:
    cluster = LocalCluster()
    client = Client(cluster)
    configure_s3_access(aws_unsigned=False, requester_pays=True, client=client);
    display(notebook_utils.localcluster_dashboard(client=client, server=easi.hub))
else:
    client.restart()

In [None]:
%%time
dataset = None # clear results from any previous runs
measurements = [alias[x] for x in ['qa_band', 'red', 'nir']]
dataset = dc.load(
            product=product,
            x=study_area_lon,
            y=study_area_lat,
            time=set_time,
            measurements=measurements,
            resampling=resampling,
            output_crs=set_crs,
            resolution=set_resolution,
            dask_chunks = chunks, 
            group_by=group_by,
        )

# Identify pixels that don't have cloud, cloud shadow or water
cloud_free_mask = masking.make_mask(dataset[qa_band], **qa_mask)
# Apply the mask
cloud_free = dataset.where(cloud_free_mask)

# Calculate the components that make up the NDVI calculation
band_diff = cloud_free[alias['nir']] - cloud_free[alias['red']]
band_sum = cloud_free[alias['nir']] + cloud_free[alias['red']]
# Calculate NDVI
ndvi = None
ndvi = band_diff / band_sum

In [None]:
ndvi.data.visualize()

The computation flows from __bottom to top__ in the _task graph_. You can see there are two main paths, one for each time (since the time chunk is length 1). You can also see the three data sources are loaded independently. After that it gets a little more difficult to follow, but you can see `qa_band` being used to produce the mask (_and\__, _eq_). This is then combined via the `where` function with the other two data variables. Then finally comes the NDVI calculation - a sub, add and divide (truediv).

Dask has lots of internal optimizations that it uses to help identify the dependencies and parallel components of a task graph. Sometimes it will reorder or prune operations where possible to further optimise (for example, not loading _data variables_ that aren't used in the NDVI calculation).

> __Tip__: The _task graph_ can be complex but it is a useful tool in understanding your algorithm and how it scales.
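
When a graph is too large to draw, or graphviz is not available, its size can still be inspected programmatically. The sketch below builds an NDVI-style expression on synthetic arrays (the values are made up) and counts the tasks through the `__dask_graph__()` method of the dask collection interface.

```python
import dask.array as da

# Synthetic stand-ins for the red and nir bands, one chunk per time slice
red = da.ones((2, 4, 4), chunks=(1, -1, -1))
nir = da.full((2, 4, 4), 3.0, chunks=(1, -1, -1))

ndvi = (nir - red) / (nir + red)

# The graph is a mapping of task keys to task definitions
graph = ndvi.__dask_graph__()
print(f"{len(graph)} tasks feed {ndvi.npartitions} output chunks")
```

The task count grows with both the number of chunks and the number of operations, which is a quick way to see how an algorithm will scale before running it.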

## Be a good dask user - Clean up the cluster resources

In [None]:
client.close()

cluster.close()