# Dask Example - Average Colour of Australia

[Dask](https://dask.org/) is an open-source Python library for parallel computing. Parallel computation is essential when working with large satellite data sets, which are often too big to process in memory on a single machine. Dask integrates closely with Xarray, which underpins the Open Data Cube. This example demonstrates how to use Dask to compute the average colour of Australia, a calculation involving about 1.3 billion pixels.

## Import key packages for analysis

In [None]:
%matplotlib inline

import datacube
import numpy as np
import matplotlib.pyplot as plt

dc = datacube.Datacube(app='dc-visualize')

## Set up parallel computing with Dask

In [None]:
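import dask
from dask.distributed import Client

# Connect to an existing Dask scheduler; this address is specific to the
# environment this notebook was written for. Client() with no arguments
# starts a local cluster instead.
client = Client('dask-datacube-dask.labs:8786')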

## Select area for analysis
In this example, we'll use coordinates from the centre of Australia and select a 10 by 10 degree area around it (5 degrees either side of the centre in both latitude and longitude). These values can be changed to perform the calculation for other parts of Australia. To keep the calculation manageable, data is loaded for a single date only.
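As a quick check on the study area, the bounding box and a rough pixel count can be worked out by hand. The sketch below assumes a spherical Earth with about 111.32 km per degree of latitude and shrinks the east-west extent by the cosine of the centre latitude. The actual load reprojects to EPSG:3577, so the real pixel count will differ somewhat from this estimate.

```python
import math

# Centre coordinates and buffer (in degrees), as in the notebook
latitude, longitude = -25, 137
buffer = 5

latitude_range = (latitude - buffer, latitude + buffer)
longitude_range = (longitude - buffer, longitude + buffer)

# The box spans 2 * buffer degrees in each direction
box_area_sq_degrees = (2 * buffer) ** 2

# Rough approximation (assumption): ~111.32 km per degree of latitude on a
# spherical Earth, with degrees of longitude shrinking by cos(latitude)
metres_per_degree = 111_320
resolution_m = 30
height_m = 2 * buffer * metres_per_degree
width_m = 2 * buffer * metres_per_degree * math.cos(math.radians(latitude))

# Approximate number of 30 m pixels covering the box
approx_pixels = (height_m / resolution_m) * (width_m / resolution_m)

print(latitude_range, longitude_range)        # (-30, -20) (132, 142)
print(box_area_sq_degrees, "square degrees")  # 100 square degrees
print(f"~{approx_pixels / 1e9:.2f} billion pixels")  # ~1.25 billion pixels
```

The estimate lands close to the roughly 1.3 billion pixels mentioned in the introduction; the exact count comes from the loaded data.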

In [None]:
# Set the centre latitude and longitude coordinates
latitude, longitude = (-25, 137)

# Set the study area around the centre coordinates, where buffer is in degrees
buffer = 5
latitude_range = (latitude - buffer, latitude + buffer)
longitude_range = (longitude - buffer, longitude + buffer)

# Set the time frame
date_range = ('2017-01-01', '2017-01-01')

## Load and view the data

Data is loaded with the `dc.load()` function. The Landsat 8 Annual Geomedian product `ls8_nbart_geomedian_annual` has six bands, but this analysis only requires the `red`, `green`, and `blue` bands. The `dask_chunks` argument tells Dask how to split the data into chunks that can be processed in parallel (see the [xarray documentation](http://xarray.pydata.org/en/stable/dask.html)). With `dask_chunks` set, `dc.load()` returns the data lazily: pixel values are only read when a result is computed.
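To get a feel for what `dask_chunks` does, the sketch below counts how many chunks a single band would be split into and how much memory each full-sized chunk needs. The band dimensions are hypothetical (roughly the size of this study area), and 2 bytes per pixel (`int16`) is an assumption about how the band values are stored; `ds.red.dtype` shows the real type after loading.

```python
import math

# Hypothetical band dimensions for one time step (not taken from the real load)
n_x, n_y = 33_630, 37_107

# Chunk sizes passed to dc.load via dask_chunks
chunk_x, chunk_y = 4000, 4000

# Assumed storage type: int16, i.e. 2 bytes per pixel
bytes_per_pixel = 2

# Edge chunks may be smaller, so round the chunk count up
chunks_x = math.ceil(n_x / chunk_x)
chunks_y = math.ceil(n_y / chunk_y)
n_chunks = chunks_x * chunks_y

# Memory for one full-sized chunk of one band
chunk_mb = chunk_x * chunk_y * bytes_per_pixel / 1e6

print(f"{chunks_x} x {chunks_y} = {n_chunks} chunks per band")  # 9 x 10 = 90 chunks per band
print(f"{chunk_mb:.0f} MB per full chunk")                      # 32 MB per full chunk
```

Each chunk becomes a separate task that a Dask worker can process independently, so smaller chunks give more parallelism at the cost of more scheduling overhead.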

In [None]:
%%time

ds = dc.load(
    product='ls8_nbart_geomedian_annual',
    x=longitude_range,
    y=latitude_range,
    output_crs='epsg:3577',  # Australian Albers
    resolution=(-30, 30),    # 30 m pixels
    time=date_range,
    measurements=['red', 'green', 'blue'],
    # Split each band into 4000 x 4000 pixel chunks, one time step per chunk
    dask_chunks={'x': 4000, 'y': 4000, 'time': 1}
)

print(ds)

## Calculate the number of pixels used

Because only one time step is loaded, multiplying the sizes of the x and y dimensions gives the total number of pixels in each band used in the calculation. A number this large is why the work is split across Dask workers.
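A back-of-the-envelope estimate shows why the data is not simply loaded into memory in one go. The sketch assumes roughly 1.3 billion pixels (the figure from the introduction), three bands, one time step, and 2 bytes per value (`int16`); these are assumptions, not values read from `ds`.

```python
# Assumed inputs (not read from the dataset)
n_pixels = 1_300_000_000  # pixels per band, one time step
n_bands = 3               # red, green, blue
bytes_per_value = 2       # assuming int16 storage

total_bytes = n_pixels * n_bands * bytes_per_value
print(f"{total_bytes / 1e9:.1f} GB if loaded all at once")  # 7.8 GB if loaded all at once
```

Processing the data in 4000 by 4000 pixel chunks keeps the memory needed by each individual task far below this total.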

In [None]:
n_pixels = ds.sizes['x'] * ds.sizes['y']
n_pixels_billions = n_pixels / 10**9

print("{:2.2f} billion pixels".format(n_pixels_billions))

## Calculate the average colour

Calculating the average colour involves three steps. First, we calculate the average value of each of the `red`, `green`, and `blue` bands, including only values greater than 0. Second, the averages are scaled to account for how Landsat 8 data is stored: each pixel in each band is stored as a value from 0 to 10,000, with larger values corresponding to brighter surfaces. Because most surfaces have values well below 10,000, dividing by 10,000 would give a dark, low-contrast colour, so we divide each average by 3000 instead. This stretches the contrast for typical surfaces, and anything brighter than 3000 is clipped to full brightness in the final step. Finally, the scaled averages are converted to RGB values by clipping them to the range 0 to 1 and multiplying by 255. This final step is captured in the `get_rgb()` function in the next cell.
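A mean over data split into chunks can be assembled from per-chunk partial results. The NumPy sketch below illustrates the general idea (it is not Dask's internal code): each chunk contributes the sum and count of its values greater than 0, and these are combined into a single average. The small random array and the chunk size are hypothetical stand-ins for one band.

```python
import numpy as np

def chunked_positive_mean(band, chunk_size):
    """Mean of the values > 0 in a 2-D array, computed chunk by chunk."""
    total, count = 0.0, 0
    n_y, n_x = band.shape
    for y0 in range(0, n_y, chunk_size):
        for x0 in range(0, n_x, chunk_size):
            chunk = band[y0:y0 + chunk_size, x0:x0 + chunk_size]
            valid = chunk[chunk > 0]               # same mask as ds.where(ds > 0)
            total += valid.sum(dtype=np.float64)   # partial sum for this chunk
            count += valid.size                    # partial count for this chunk
    # Combine the partial results into one average
    return total / count

# Hypothetical band: integers from -999 to 3999, including non-positive values
rng = np.random.default_rng(0)
band = rng.integers(-999, 4000, size=(1000, 1200), dtype=np.int16)

chunked = chunked_positive_mean(band, chunk_size=256)
direct = band[band > 0].mean(dtype=np.float64)
print(chunked, direct)  # the two results agree
```

Averaging the per-chunk means directly would be wrong whenever chunks hold different numbers of valid pixels (for example, the smaller edge chunks), which is why sums and counts are carried separately.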

To see the resources being used when you run the operation, visit https://dask.sandbox.dea.ga.gov.au/status.
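The scaling and conversion steps can be checked on plain numbers. The sketch below applies the same arithmetic as `get_rgb()` (divide by 3000, clip to the range 0 to 1, multiply by 255 and truncate to an integer) to hypothetical band averages, using only the Python standard library. `to_rgb_hex` is a hypothetical helper name, not part of the notebook.

```python
def to_rgb_hex(averages, scaling_factor=3000):
    """Convert per-band averages (raw 0-10,000 values) to RGB values and a hex code."""
    rgb = []
    for band in ('red', 'green', 'blue'):
        scaled = averages[band] / scaling_factor
        clipped = min(max(scaled, 0.0), 1.0)  # clip to the range 0-1
        rgb.append(int(clipped * 255))        # truncate to an 8-bit integer
    return rgb, '#{:02x}{:02x}{:02x}'.format(*rgb)

# Hypothetical averages, all below 3000
print(to_rgb_hex({'red': 1500, 'green': 1200, 'blue': 900}))  # ([127, 102, 76], '#7f664c')
# Red above 3000 is clipped to full brightness
print(to_rgb_hex({'red': 3600, 'green': 1200, 'blue': 900}))  # ([255, 102, 76], '#ff664c')
```

Because `int()` truncates rather than rounds, a scaled value of 0.5 gives 127 rather than 128.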

In [None]:
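def get_rgb(scaled_ds):
    """Convert scaled band averages (nominally 0-1) to 8-bit RGB values and a hex code."""
    # Clip each band to [0, 1], then map it to the 0-255 range of 8-bit colour
    rgb_values = [int(np.clip(float(scaled_ds[band]), 0, 1) * 255)
                  for band in ['red', 'green', 'blue']]
    red, green, blue = rgb_values
    # Format as a hex colour string of the form '#rrggbb'
    hex_value = '#{:02x}{:02x}{:02x}'.format(red, green, blue)

    return rgb_values, hex_value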

In [None]:
%%time
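# Mask out values <= 0, then average each band over all pixels.
# .compute() runs the calculation on the Dask cluster and brings the results into memory.
average = ds.where(ds > 0).mean().compute()

# Divide by 3000 so that an average of 3000 maps to full brightness
scaling_factor = 3000
scaled_average = average / scaling_factor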

rgb, hex_colour = get_rgb(scaled_average)

print("Hex Code: {}".format(hex_colour))
print("Red: {}, Green: {}, Blue: {}".format(rgb[0], rgb[1], rgb[2]))

## Display the average colour

In [None]:
from matplotlib.patches import Rectangle

fig, ax = plt.subplots(figsize=(5,5))
ax.add_patch(Rectangle((0.0, 0.0), 1.0, 1.0, alpha=1, facecolor=hex_colour))
plt.show()