# Masking data

* **Products used:** 
[ls8_sr](https://explorer.digitalearth.africa/ls8_sr) 


## Background
In the past, remote sensing researchers would reject partly cloud-affected scenes in favour of cloud-free scenes. 
However, multi-temporal analysis techniques increasingly make use of every quality assured pixel within a time series of observations. 
The capacity to automatically exclude low quality pixels (e.g. clouds, shadows or other invalid data) is essential for enabling these new remote sensing methods.

Analysis-ready satellite data from Digital Earth Africa includes pixel quality information that can be used to easily "mask" data (i.e. keep only certain pixels in an image) to obtain a time series containing only clear or cloud-free pixels.

## Description
In this notebook, we show how to mask Digital Earth Africa satellite data using boolean masks. The notebook demonstrates how to:

1. Load in a time series of satellite data including the `pixel_quality` pixel quality band
2. Inspect the band's `flags_definition` attributes
3. Create a mask where pixels are cloudy, have cloud-shadow, or no-data
4. Apply binary morphological operators on the cloudy pixels to improve the mask
5. Apply the masks to the satellite data so we retain only the good quality observations, and plot the results
6. Use `load_ard` to mask poor quality pixels, while taking advantage of the morphological operators parameter

Digital Earth Africa provides wrapper functions that automatically return cloud-masked satellite data; more information can be found in the [Using_load_ard](./Using_load_ard.ipynb) notebook.

***

## Getting started
First we import relevant packages and connect to the datacube. 
Then we define our example area of interest and load in a time series of satellite data.

In [None]:
# !/env/bin/python3 -m pip install --upgrade pip
# !pip install pystac==1.0.0rc2 --upgrade
# !pip install odc-algo --extra-index-url="https://packages.dea.ga.gov.au" --upgrade

In [None]:
%matplotlib inline

import datacube
from pprint import pprint

from datacube.utils import masking
from odc.algo import mask_cleanup, erase_bad

from deafrica_tools.plotting import rgb
from deafrica_tools.datahandling import mostcommon_crs

### Connect to the datacube

In [None]:
dc = datacube.Datacube(app="Masking_data")

## Create a query and load satellite data

To demonstrate how to mask satellite data, we will load Landsat 8 surface reflectance RGB data along with a pixel quality classification band called `pixel_quality`.

In [None]:
# Region of interest
lat, lon = 35.7718, -5.8096
buffer = 0.03

# Create a reusable query
query = {
    'x': (lon-buffer, lon+buffer),
    'y': (lat+buffer, lat-buffer),
    'time': ('2016-08', '2016-10-04'),
}

# Identify the most common projection system in the input query
output_crs = mostcommon_crs(dc=dc, product='ls8_sr', query=query)

# Load data from Landsat 8
data = dc.load(product="ls8_sr",
               measurements=["blue", "green", "red", "pixel_quality"],
               output_crs=output_crs,
               resolution=(-30, 30),
               align=(15, 15),
               **query)
print(data)

Missing observations are indicated by a `nodata` value for each band, which is stored in the band's attributes (listed under **Attributes** when the returned `xarray.DataArray` is printed).

In [None]:
print(data.red.attrs)

We see that the `nodata` attribute reports the value `0`.
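Integer bands such as `red` cannot store `NaN`, so before computing statistics it is common to convert `nodata` pixels to `NaN` in floating point; datacube's `masking.mask_invalid_data` does this for a whole dataset. The sketch below shows the same idea with plain NumPy on made-up values, assuming `nodata = 0` as reported above.

```python
import numpy as np

# Made-up surface reflectance values for a 2 x 3 tile; 0 marks missing observations
red = np.array([[0, 812, 790],
                [655, 0, 701]], dtype=np.uint16)
nodata = 0  # value reported in red.attrs["nodata"]

# Cast to float so NaN can represent "no observation", then blank out nodata pixels
red_float = np.where(red == nodata, np.nan, red.astype(np.float32))

# NaN-aware statistics now ignore the missing pixels
mean_valid = np.nanmean(red_float)
```

Without this step, the zeros would be averaged in as if they were real (very dark) observations.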

We can find the classification scheme of the `pixel_quality` band in its flags definition.

In [None]:
flags_def = data.pixel_quality.attrs["flags_definition"]
pprint(flags_def)

The flags definition shows that `pixel_quality` records `nodata` pixels and, in addition to `cloud` and `cloud_shadow` pixels, also flags `snow` and `water` pixels.
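To see how the flags definition is used, it helps to decode a single `pixel_quality` value bit by bit. The minimal sketch below assumes a hypothetical subset of the Landsat Collection 2 `QA_PIXEL` layout (bit 0 nodata/fill, bit 2 cirrus, bit 3 cloud, bit 4 cloud shadow, bit 5 snow, bit 7 water); the `FLAG_BITS` dictionary and `decode_pixel_quality` helper are illustrative names, and the real bit positions should be read from the `bits` entries in `flags_def`.

```python
# Hypothetical subset of the pixel_quality bit layout (assumed Landsat Collection 2 QA_PIXEL);
# check the "bits" entries in flags_def before relying on these positions.
FLAG_BITS = {"nodata": 0, "cirrus": 2, "cloud": 3, "cloud_shadow": 4, "snow": 5, "water": 7}


def decode_pixel_quality(value):
    """Return the names of the flags whose bit is set in a single pixel_quality value."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]


# 0b0001_1000 has bits 3 and 4 set, i.e. cloud and cloud shadow
flags = decode_pixel_quality(0b0001_1000)
print(flags)
```

Each flag occupies its own bit, which is why several conditions (for example cloud and cloud shadow) can be recorded in one integer.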

## Creating a cloud and pixel quality mask

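We create a mask by specifying the conditions that mark a pixel as poor quality.
Because we want to flag a pixel if *any* of these conditions holds, we only need the bit mask returned by `masking.create_mask_value` (which bits to look at), not the matching value: a pixel is flagged wherever at least one of those bits is set.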

In [None]:
quality_flags = dict(
    cloud="high_confidence",         # True where there is cloud
    cirrus="high_confidence",        # True where there is cirrus cloud
    cloud_shadow="high_confidence",  # True where there is cloud shadow
    nodata=True,                     # True where there is no data
)

# cloud_mask: True where any of the flagged bits is set, False otherwise
mask, _ = masking.create_mask_value(flags_def, **quality_flags)
data['cloud_mask'] = (data['pixel_quality'] & mask) != 0
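The difference between testing for *any* flagged bit and testing for *all* of them can be shown with plain integers. In this sketch the bit positions are assumed (check them against `flags_def`) and the combined mask is built by hand rather than with `create_mask_value`. `(pq & mask) != 0` mirrors the expression used above, while `(pq & mask) == mask` is what `make_mask` would compute when every requested value corresponds to its bit being set.

```python
import numpy as np

# Assumed bit positions (verify against flags_def): nodata=0, cirrus=2, cloud=3, cloud_shadow=4
bits = {"nodata": 0, "cirrus": 2, "cloud": 3, "cloud_shadow": 4}

# Combine the individual bits into a single integer mask
mask = 0
for bit in bits.values():
    mask |= 1 << bit

# Four example pixel_quality values
pq = np.array([0b00000000,   # clear
               0b00001000,   # cloud only
               0b00011000,   # cloud + cloud shadow
               0b00000001],  # nodata
              dtype=np.uint16)

any_flag = (pq & mask) != 0      # OR: flagged if any condition holds
all_flags = (pq & mask) == mask  # AND: flagged only if every condition holds
print(any_flag, all_flags)
```

The AND form would flag almost nothing here, since no single pixel is simultaneously nodata, cirrus, cloud and cloud shadow; that is why the OR form is the right choice for a cloud mask.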

Below, we'll plot the mask along with the true colour satellite images.

Does the cloud mask exactly match the clouds you see in the RGB plots? Landsat's pixel quality algorithm has known limitations that result in bright objects, such as beaches and cities, mistakenly being classified as clouds.

In [None]:
# Plot the data
rgb(data, col="time", col_wrap=8)

In [None]:
# Plot the locations flagged as cloud, cirrus, cloud shadow or nodata
data['cloud_mask'].plot(col="time", col_wrap=8);

## Cloud mask morphological operators

We can reduce the false positives in Landsat's pixel quality mask by applying binary morphological image processing techniques (e.g. [binary_closing](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.ndimage.morphology.binary_closing.html#scipy.ndimage.morphology.binary_closing), [binary_erosion](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.ndimage.morphology.binary_erosion.html#scipy.ndimage.morphology.binary_erosion), etc.). The Open Data Cube library [odc-algo](https://github.com/opendatacube/odc-tools/tree/develop/libs/algo) provides a function, `odc.algo.mask_cleanup`, that can perform several of these operations. Below, we try to improve the cloud mask by applying a sequence of these filters.

Feel free to experiment with the values in `filters`.
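The effect of these operators on a boolean mask can be illustrated with a small NumPy-only sketch. It uses a square structuring element for simplicity, whereas `mask_cleanup` may use a differently shaped (e.g. disk) element, and the `dilate`, `erode` and `apply_filters` helpers are hypothetical names, not part of odc-algo. Opening (erosion then dilation) removes features smaller than the structuring element, such as isolated false-positive pixels, while closing (dilation then erosion) fills small holes.

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square footprint."""
    h, w = mask.shape
    # Pixels outside the image are treated as False during dilation
    padded = np.pad(mask, radius, constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask, radius):
    """Binary erosion: a pixel stays True only if its whole square neighbourhood is True."""
    # Erosion is the complement of dilating the complement,
    # so pixels outside the image are treated as True here
    return ~dilate(~mask, radius)


FILTERS = {
    "erosion": erode,
    "dilation": dilate,
    "opening": lambda m, r: dilate(erode(m, r), r),
    "closing": lambda m, r: erode(dilate(m, r), r),
}


def apply_filters(mask, filters):
    """Apply (operation, radius) pairs in order, following the form of the `filters` list."""
    for op, radius in filters:
        mask = FILTERS[op](mask, radius)
    return mask


# A 3 x 3 "real" cloud plus one isolated false-positive pixel
cloud = np.zeros((7, 7), dtype=bool)
cloud[1:4, 1:4] = True
cloud[5, 5] = True

opened = apply_filters(cloud, [("opening", 1)])
print(opened.astype(int))
```

With `radius=1`, opening keeps the 3 x 3 cloud intact but removes the isolated pixel at `(5, 5)`. Larger radii remove larger objects, so aggressive settings can also erase genuine small clouds.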


In [None]:
# Set the filters to apply, in order: each entry is (operation, radius),
# where the integer is the radius in pixels of the structuring element
filters = [("erosion", 5), ("closing", 2), ("opening", 2), ("dilation", 1)]

In [None]:
# Use the mask_cleanup function to apply the filters
data['cloud_mask_filtered'] = mask_cleanup(data['cloud_mask'], mask_filters=filters)

In [None]:
# Plot the results
data['cloud_mask_filtered'].plot(col="time", col_wrap=8);

### Applying the cloud-mask

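We can now obtain clear images by erasing the pixels flagged in the cloud mask (cloud, cirrus, cloud shadow and nodata) from the data. `erase_bad` replaces the flagged pixels with each band's `nodata` value.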

In [None]:
# Drop the mask variables so that only the bands are masked
bands = data.drop_vars(['cloud_mask_filtered', 'cloud_mask'])

# Erase pixels flagged by the original cloud mask
clear = erase_bad(bands, data['cloud_mask'])

# Erase pixels flagged by the filtered cloud mask
clear_filtered = erase_bad(bands, data['cloud_mask_filtered'])
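As we understand it, `erase_bad` sets pixels to `nodata` where the mask is `True`, while `odc.algo.keep_good_only` does the reverse, keeping pixels where the mask is `True`. The NumPy sketch below mimics both on a single made-up band; `erase_bad_np` and `keep_good_only_np` are hypothetical helpers, not odc-algo functions.

```python
import numpy as np


def erase_bad_np(band, bad, nodata):
    """Replace pixels where `bad` is True with the nodata value."""
    return np.where(bad, nodata, band)


def keep_good_only_np(band, good, nodata):
    """Keep pixels where `good` is True and replace the rest with the nodata value."""
    return np.where(good, band, nodata)


red = np.array([[812, 790],
                [655, 701]], dtype=np.uint16)
cloud_mask = np.array([[True, False],
                       [False, True]])

# Erasing the bad pixels is equivalent to keeping the inverse of the mask
clear_red = erase_bad_np(red, cloud_mask, nodata=0)
same = keep_good_only_np(red, ~cloud_mask, nodata=0)
print(clear_red)
```

Because the masked pixels are set to `nodata` rather than `NaN`, the band keeps its integer data type.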


### Plot the results of our 'clear' masking 

In [None]:
rgb(clear, col="time", col_wrap=8)

### Plot the results of our 'clear_filtered' masking 

As you can see, the morphological filtering operations have minimised the impact of the false positives in the cloud mask over the cities and along the coast.

In [None]:
rgb(clear_filtered, col="time", col_wrap=8)
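One way to check the effect of the filtering, rather than relying on the plots alone, is to compare the percentage of masked pixels in each time step before and after filtering. In the notebook this could be computed with `data['cloud_mask'].mean(dim=['x', 'y']) * 100`; the sketch below shows the same calculation with NumPy on made-up masks.

```python
import numpy as np

# Two time steps of a 4 x 4 cloud mask, ordered (time, y, x); values are made up
cloud_mask = np.zeros((2, 4, 4), dtype=bool)
cloud_mask[0, :2, :] = True  # first scene: top half flagged
cloud_mask[1, 0, 0] = True   # second scene: one isolated flagged pixel

# Suppose the filtered mask drops the isolated pixel in the second scene
cloud_mask_filtered = cloud_mask.copy()
cloud_mask_filtered[1, 0, 0] = False

# Percentage of pixels masked in each time step
pct_before = 100 * cloud_mask.mean(axis=(1, 2))
pct_after = 100 * cloud_mask_filtered.mean(axis=(1, 2))
print(pct_before, pct_after)
```

A large drop in masked percentage for a scene that looks clear in the RGB plot suggests the filters removed false positives; a large drop for a visibly cloudy scene would instead suggest the filters are too aggressive.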

## Morphological filtering with `load_ard`

---

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [Github](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Compatible datacube version:** 

In [None]:
print(datacube.__version__)

**Last Tested:**

In [None]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')