# Time Series Algorithm for Sentinel-2 Cloud Mask

## Overview
This product is a time series cloud and cloud shadow detection algorithm for Sentinel-2 surface reflectance data. It models time series of indices derived from surface reflectance and calculates time series abnormality coefficients for the pixels in the time series. It does not rely on predefined training data to build complex models with many rule sets; such models often work well on data similar to the training data but return poor results on data that differ from it. Instead, it identifies cloud and cloud shadow by detecting local abnormalities in temporal and spatial contexts from the abnormality coefficients.
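
To make the idea of an abnormality coefficient concrete, here is a minimal sketch. It is not the algorithm's actual formula, which lives in `tsmask_func`. It assumes a single pixel's index time series and scores each observation by its distance from the series median, scaled by the median absolute deviation. The function name `abnormality_coefficients` and the sample values are hypothetical.

```python
import numpy as np

def abnormality_coefficients(series):
    """Robust z-score of each observation relative to the whole series.

    Hypothetical illustration only: the real coefficients are computed
    inside tsmask_func and may use a different model.
    """
    series = np.asarray(series, dtype=float)
    median = np.nanmedian(series)
    # Median absolute deviation: a spread estimate that outliers barely affect
    mad = np.nanmedian(np.abs(series - median))
    if mad == 0:
        return np.zeros_like(series)
    return (series - median) / mad

# Made-up index values for one pixel; the fourth observation is much brighter
index_ts = [0.10, 0.12, 0.11, 0.85, 0.09, 0.13]
coeffs = abnormality_coefficients(index_ts)
# Position of the observation that deviates most from the rest of the series
suspect = int(np.argmax(np.abs(coeffs)))
```

An observation with a large absolute coefficient is a candidate for cloud (unusually bright) or cloud shadow (unusually dark); any threshold would be a further assumption.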



## Required Modules 

### DEA modules


The notebook requires functions from the DEA Datacube API. To load the DEA module, open a terminal and run the following commands:

`module use /g/data/v10/public/modules/modulefiles`

`module load dea`

### Other Python modules 

DEADataHandling is a module written by scientists and engineers from the DEA teams. It consists of wrapper functions for handling various DEA datasets and can be downloaded from https://github.com/GeoscienceAustralia/dea-notebooks.

tsmask_func is a module containing the functions used in this notebook.

In [30]:
## Start main program

%load_ext autoreload
%autoreload 2

## Import modules

import os
import sys
from multiprocessing import Pool

import datacube
import numpy as np

import tsmask_func as tsf


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload




## Workflow of the Program  

### Specify input parameters

In [31]:
## Specify input parameters

#lat_top, lat_bottom, lon_left, lon_right =  -35.144, -35.505, 148.985, 149.284
lat_top, lat_bottom, lon_left, lon_right =  -35.244, -35.344, 149.055, 149.155
start_of_epoch, end_of_epoch = '2017-01-01', '2018-12-31'
  


### Data loading

Load the Sentinel-2 NBART time series at 20 m x 20 m resolution, save it to an xarray dataset, and create the `tsmask` data array.


In [32]:
# Load surface reflectance data

dc = datacube.Datacube(app='load_clearsentinel')


# Call Function F1
# Load Sentinel-2 data 

s2_ds = tsf.load_s2_nbart_ts(dc, lat_top, lat_bottom, lon_left, lon_right, start_of_epoch, end_of_epoch)



Loading s2a pixel quality
    Loading 70 filtered s2a timesteps
Loading s2b pixel quality
    Loading 54 filtered s2b timesteps
Combining and sorting s2a, s2b data



### Time Series Cloud and Cloud shadow detection

Perform time series cloud and cloud shadow detection pixel by pixel, running the per-pixel function in parallel.




In [33]:
%%time

# Create a list of tuples as input to the cloud detection function.
# The starmap method of the Pool class from the multiprocessing module requires an iterable of argument tuples.

ts_tuples = tsf.create_ts_tuples(s2_ds)

results=[]

# Number of processes for the Pool object
number_of_workers=8
# Create a Pool object with that number of processes
p=Pool(number_of_workers)
# Run the cloud detection function on a pool of independent processes;
# starmap returns the results in the same order as ts_tuples
results=p.starmap(tsf.perpixel_filter_direct, ts_tuples)
# Stop accepting new tasks
p.close()
# Wait for the worker processes to exit
p.join()

# Save the cloud/shadow masks to the 'tsmask' dataarray in the s2_ds dataset
irow = s2_ds['y'].size
icol = s2_ds['x'].size
for y in np.arange(irow):
    for x in np.arange(icol):
        s2_ds['tsmask'].values[:, y, x]=results[y*icol+x]


CPU times: user 3min 15s, sys: 3.33 s, total: 3min 18s
Wall time: 26min 36s
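
The nested loop in the cell above copies the flat `results` list back into a `(time, y, x)` array. The same bookkeeping can be sketched with a NumPy reshape, assuming, as the index `y*icol+x` implies, that the results are ordered row by row and that each entry holds one value per time step. The array sizes and the dummy `results` below are made up for illustration.

```python
import numpy as np

ntime, irow, icol = 3, 2, 4

# Dummy per-pixel results: one length-ntime array per pixel, in row-major order
results = [np.full(ntime, y * icol + x) for y in range(irow) for x in range(icol)]

# Loop version, mirroring the notebook cell
mask_loop = np.zeros((ntime, irow, icol), dtype=np.uint8)
for y in range(irow):
    for x in range(icol):
        mask_loop[:, y, x] = results[y * icol + x]

# Vectorised version: stack to (npixels, ntime), fold the pixel axis
# into (y, x), then move the time axis to the front
stacked = np.asarray(results).reshape(irow, icol, ntime)
mask_vec = stacked.transpose(2, 0, 1).astype(np.uint8)
```

Both versions give the same array; the reshape avoids a Python-level loop over every pixel.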


### Spatial Filter

Filter out isolated (single-pixel) cloud and shadow detections.


In [34]:
%%time

results=[]

# Number of processes for the Pool object
number_of_workers=8
# Create a Pool object with that number of processes
p=Pool(number_of_workers)

# Create a list of scenes, one 2D mask per time step
paralist=[ s2_ds['tsmask'].values[i, :, :] for i in np.arange(s2_ds.time.size)]
# Run the spatial filter function on a pool of independent processes;
# map returns the results in the same order as paralist
results=p.map(tsf.spatial_filter, paralist)
# Stop accepting new tasks
p.close()
# Wait for the worker processes to exit
p.join()


# Save the cloud/shadow masks to the 'tsmask' dataarray in the s2_ds dataset
for i in np.arange(s2_ds.time.size):
    s2_ds['tsmask'].values[i, :, :] = results[i]

CPU times: user 227 ms, sys: 1.34 s, total: 1.56 s
Wall time: 1min 16s
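
The internals of `tsf.spatial_filter` are not shown in this notebook. As a rough sketch of what removing isolated pixels can look like, the hypothetical function `remove_isolated` below resets any cloud or shadow pixel that has no 8-connected neighbour of the same class to clear. The neighbourhood size and the choice to reset to clear are assumptions.

```python
import numpy as np

def remove_isolated(mask, clear_value=1):
    """Reset cloud/shadow pixels that have no 8-neighbour of the same class."""
    mask = np.asarray(mask)
    out = mask.copy()
    nrow, ncol = mask.shape
    # Pad with a value that never matches a class so edges need no special case
    padded = np.pad(mask.astype(np.int16), 1, constant_values=-1)
    for cls in (2, 3):  # 2 = cloud, 3 = cloud shadow
        is_cls = padded == cls
        # Count same-class neighbours by summing the eight shifted copies
        neighbours = np.zeros((nrow, ncol), dtype=np.int16)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                neighbours += is_cls[1 + dy:1 + dy + nrow, 1 + dx:1 + dx + ncol]
        out[(mask == cls) & (neighbours == 0)] = clear_value
    return out

# Made-up scene: one lone cloud pixel and a two-pixel shadow
scene = np.array([[1, 1, 1, 1, 1],
                  [1, 2, 1, 1, 1],
                  [1, 1, 1, 3, 3],
                  [1, 1, 1, 1, 1]])
filtered = remove_isolated(scene)
```

In this example the lone cloud pixel is relabelled as clear, while the two adjacent shadow pixels support each other and are kept.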


### Spatial Buffer

The program calls Function F3 to extend the cloud/shadow detections by a one-pixel buffer.


In [35]:
%%time

results=[]

# Number of processes for the Pool object
number_of_workers=8
# Create a Pool object with that number of processes
p=Pool(number_of_workers)

# Create a list of scenes, one 2D mask per time step
paralist=[ s2_ds['tsmask'].values[i, :, :] for i in np.arange(s2_ds.time.size)]
# Run the spatial_buffer function on a pool of independent processes;
# map returns the results in the same order as paralist
results=p.map(tsf.spatial_buffer, paralist)
# Stop accepting new tasks
p.close()
# Wait for the worker processes to exit
p.join()

# Save the cloud/shadow masks to the 'tsmask' dataarray in the s2_ds dataset
for i in np.arange(s2_ds.time.size):
    s2_ds['tsmask'].values[i, :, :] = results[i]


CPU times: user 122 ms, sys: 708 ms, total: 830 ms
Wall time: 41.3 s
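
How `tsf.spatial_buffer` grows the detections is not shown here. One possible reading of a one-pixel buffer is sketched below with a hypothetical function `buffer_one_pixel`: clear pixels that touch a cloud or shadow pixel (8-connected) take on that class. Letting cloud win over shadow where both buffers overlap, and leaving no-observation pixels untouched, are assumptions made for this sketch.

```python
import numpy as np

def buffer_one_pixel(mask):
    """Grow cloud (2) and shadow (3) regions by one pixel into clear (1) pixels."""
    mask = np.asarray(mask)
    out = mask.copy()
    nrow, ncol = mask.shape
    padded = np.pad(mask, 1, constant_values=0)
    # Shadow first, cloud second, so cloud wins where both buffers overlap
    for cls in (3, 2):
        is_cls = padded == cls
        near = np.zeros((nrow, ncol), dtype=bool)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                near |= is_cls[1 + dy:1 + dy + nrow, 1 + dx:1 + dx + ncol]
        # Only clear pixels are relabelled; no-observation pixels stay 0
        out[near & (mask == 1)] = cls
    return out

# Made-up scene: a single cloud pixel with a no-observation pixel next to it
scene = np.ones((5, 5), dtype=np.uint8)
scene[2, 2] = 2
scene[1, 1] = 0
buffered = buffer_one_pixel(scene)
```

The single cloud pixel grows into a 3 x 3 block, except for the neighbouring no-observation pixel, which keeps its value of 0.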


### Output Data

The program produces a cloud mask with the same dimensions as the input time series data. Each Sentinel-2 pixel is classified as one of four distinct categories:
 

| Category | Value |
| --- | --- |
| No observation | 0 |
| Clear | 1 |
| Cloud | 2 |
| Cloud shadow | 3 |
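
As a small usage sketch, the four category codes can be tallied for one scene of the mask with `np.bincount`. The helper `category_counts` and the sample scene are hypothetical; in the notebook the input would be one time slice of `s2_ds['tsmask'].values`.

```python
import numpy as np

CATEGORIES = {0: "No observation", 1: "Clear", 2: "Cloud", 3: "Cloud shadow"}

def category_counts(scene):
    """Return the number of pixels in each of the four tsmask categories."""
    counts = np.bincount(np.asarray(scene).ravel(), minlength=len(CATEGORIES))
    return {name: int(counts[value]) for value, name in CATEGORIES.items()}

# Made-up 2 x 3 scene containing all four categories
scene = np.array([[0, 1, 1],
                  [2, 2, 3]], dtype=np.uint8)
summary = category_counts(scene)
# Share of pixels that are usable (clear) in this scene
clear_fraction = summary["Clear"] / scene.size
```

A per-scene clear fraction like this is one way to rank time steps before further analysis.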


The S2 NBART time series and tsmask are saved as ENVI images and NetCDF files.

In [39]:
# Eliminate invalid attribute values

for key, variable in s2_ds.variables.items():
    if 'spectral_definition' in variable.attrs:
        del variable.attrs['spectral_definition']

In [40]:
## Output spectral data and tsmask as NetCDF file

dirc='/g/data/u46/pjt554/tmp/canberra-buffer.nc'
s2_ds.to_netcdf(dirc)

In [41]:
# Names of the dataarrays to export and of the output bands
bandsets = ['tsmask']
outbandnames = ['tsmask-buffer']
# The directory where the files will be saved
dirc='/g/data/u46/pjt554/tmp'
# Output the listed dataarrays as ENVI images
tsf.output_ds_to_ENVI(bandsets, outbandnames, dirc, s2_ds)

In [42]:
print("The End")

The End
