## THE Demo <a name="top"></a>

This script has two goals:
- Introduce new users to the [Open Data Cube](https://www.opendatacube.org/) through [Jupyter notebooks](https://jupyter.org/) by manipulating passive sensor data (Landsat or Sentinel 2, depending on DC availability).
- Present the **latest version** of the functions developed by the [Swiss Data Cube](https://www.swissdatacube.org/) (SDC) team, which are also presented in the demo_FUN_* notebooks in the current folder.

The script is structured as follows:
- **[standard script beginning](#standbeg)**: cells generally found at the beginning of a script. To run cells from other sections you need to run all cells of this section.

- **[load DC (Option A)](#loaddcopta)**: load the required DC information into memory based on configuration parameters, using the default DC function.

- **[load DC (Option B)](#loaddcoptb)**: load the required DC information into memory based on configuration parameters, using the SDC function.
    
- **[explore created xarray.Dataset](#explorexr)**: explore the created xarray.Dataset variable (dataset_clean).

- **[create, plot and export mosaic figure using default DC functions](#pngdef)**

- **[plot and export mosaic figure using SDC functions](#pngsdc)**

- **[export xarray.Dataset](#exportds)**

- **[create, plot and export xarray.DataArray](#dataarray)**

- **[single time water time series analysis](#waterts)**
    

---


### Standard script beginning <a name="standbeg"></a>

The cells in this section are generally found at the beginning of a script (and it is advised to follow this template).

To run cells from the sections below you need to run all cells of this section first.

- **Import dependencies**: import libraries, connect to the DC and silence warnings if required.
- **Configuration**: all variables you might need to change. Keep in mind that the bigger the dataset (in terms of geographical extent, time period and number of measurements (bands)), the slower the demo will run.
- **Functions**: all functions written in-script.
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Make sure the script is using the proper kernel
try:
    %run ../swiss_utils/assert_env.py
except:
    %run ./swiss_utils/assert_env.py

In [None]:
# Import modules

# reload module before executing code
%load_ext autoreload
%autoreload 2

# define modules locations (you might have to adapt define_mod_locs.py)
%run ../swiss_utils/define_mod_locs.py

# to plot figures
%matplotlib inline

# import full general libraries

# import general libraries and allocate them a specific name
import numpy as np # np.average
import pandas as pd # DataFrame
import matplotlib.pyplot as plt

# import specific functions from general libraries
from datetime import datetime
from IPython.display import Image, display, HTML
from matplotlib import colors

# import dedicated function of general libraries

# import ODC (default) functions
from utils.data_cube_utilities.dc_mosaic import create_hdmedians_multiple_band_mosaic
from utils.data_cube_utilities.dc_utilities import write_png_from_xr
from utils.data_cube_utilities.dc_water_classifier import wofs_classify

# import SDC functions
from swiss_utils.data_cube_utilities.sdc_utilities import ls_qa_clean, load_multi_clean, \
                                                          write_geotiff_from_xr, time_list
from swiss_utils.data_cube_utilities.sdc_advutils import oneband_fig, composite_fig

# connect to DC
import datacube
dc = datacube.Datacube()

# silence warning (not recommended during development)
import warnings
warnings.filterwarnings("ignore")

# The next cell contains the dataset configuration information:
- product
- geographical extent
- time period
- bands

You can generate it in three ways:
1. manually from scratch,
2. by manually copy/pasting the final cell content of the [config_tool](config_tool.ipynb) notebook,
3. by loading the final cell content of the [config_tool](config_tool.ipynb) notebook using the magic `# %load config_cell.txt`.

For the demo to run smoothly, it is advised to apply the following rules when generating the configuration cell:
- select a **small dataset** (geographical extent, time period and number of measurements (bands)) for faster processing,
- use a **Landsat product, but not Landsat 7** (as it has contained large nodata areas since 2003),
- select an **area covering only a small part of mountains** (as snow is generally confused with clouds and then considered as nodata),
- for the same reason, **avoid selecting a winter period**.

Finally **the following measurements are required**: `red, green, blue, nir, swir2 and pixel_qa`; and the geographical extent should contain some surface with a water body as water detection tools will be used.
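
As a rough sketch, a manually written configuration cell could look like the following. The variable names are the ones used by the loading cells of this notebook; every value (product name, coordinates, dates) is a hypothetical placeholder, and the dates are assumed to be `datetime` objects.

```python
from datetime import datetime

# Hypothetical values for illustration only: adapt them to your own Data Cube.
product = "my_landsat8_product"          # placeholder product name
measurements = ["red", "green", "blue", "nir", "swir2", "pixel_qa"]

# small geographical extent (decimal degrees), ideally containing a water body
min_lon, max_lon = 6.50, 6.60
min_lat, max_lat = 46.45, 46.55

# short time period outside of winter
start_date = datetime(2018, 6, 1)
end_date = datetime(2018, 8, 31)

# sanity checks before loading anything
required = {"red", "green", "blue", "nir", "swir2", "pixel_qa"}
missing = required - set(measurements)
assert not missing, f"missing measurements: {missing}"
assert min_lon < max_lon and min_lat < max_lat and start_date < end_date
```

The two assertions at the end are optional; they only catch an incomplete band list or inverted bounds before a potentially slow load.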

In [None]:
%load config_cell.txt

### Load DC (option A)<a name="loaddcopta"></a>

Load the requested DC information into memory (meaning an [xarray.Dataset](http://xarray.pydata.org/en/stable/index.html) variable will be created) based on configuration parameters, using the default DC function [dc.load](https://datacube-core.readthedocs.io/en/latest/dev/api/generate/datacube.Datacube.load.html).

This function simply loads a single product and does not clean it by applying a mask. We will then have to use the `ls_qa_clean` function, dedicated to cleaning Landsat xarray.Dataset (see [demo_FUN_ls_qa_clean.ipynb](demo_FUN_ls_qa_clean.ipynb) for more details).
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)
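
To give an idea of what cleaning with a quality band involves, here is a minimal NumPy sketch. It is not the implementation of `ls_qa_clean`: it assumes the Landsat Collection 1 `pixel_qa` bit layout (bit 1 = clear, bit 2 = water), and the function name `clear_or_water` is hypothetical.

```python
import numpy as np

# Assumed bit positions (Landsat Collection 1 Level-2 pixel_qa layout)
CLEAR_BIT = 1
WATER_BIT = 2

def clear_or_water(pixel_qa):
    """Return True where a pixel is flagged as clear land or water."""
    qa = np.asarray(pixel_qa, dtype=np.uint16)
    clear = (qa >> CLEAR_BIT) & 1
    water = (qa >> WATER_BIT) & 1
    return (clear | water).astype(bool)

# 1 = fill, 66 = clear, 68 = water, 72 = cloud shadow, 96 = cloud
qa = np.array([[1, 66, 68], [72, 96, 66]])
green = np.array([[-9999, 520, 310], [450, 4100, 610]], dtype=float)

mask = clear_or_water(qa)
# keep values where the mask is True, NaN elsewhere (same idea as Dataset.where)
green_clean = np.where(mask, green, np.nan)
print(mask)
print(green_clean)
```

Pixels flagged as fill, cloud or cloud shadow become NaN, which is why they disappear from histograms and plots once the mask is applied.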

In [None]:
# Load DC the default way (notice this function deals with a single product only)

dataset_in = dc.load(latitude = (min_lat, max_lat),
                     longitude = (min_lon, max_lon),
                     product = product,
                     time = [start_date, end_date],
                     measurements = measurements)

In [None]:
# visualize the DC
print(dataset_in)

In [None]:
# visualize the green band of the first time step (remember indexing starts at 0)

print(dataset_in.isel(time = 0).green)

In [None]:
# to illustrate how the mask is used, let's plot a histogram of the green band
dataset_in.green.plot.hist()

In [None]:
# let's generate and apply a mask on dataset_in using the ls_qa_clean function and compare with what
# we had before cleaning

clean_mask = ls_qa_clean(dataset_in.pixel_qa)
dataset_clean = dataset_in.where(clean_mask)

print(dataset_clean)
dataset_clean.green.plot.hist()

In [None]:
# Let's plot the green band for all time steps

dataset_clean.green.plot(x='longitude', y='latitude', col='time', col_wrap=5)

In [None]:
# You might notice several issues; let's get rid of them and plot the green band again

# 1. many plots are empty, meaning the full scene doesn't contain any data; let's remove them
dataset_clean = dataset_clean.dropna('time', how='all')

# 2. a few pixels have negative values (which shouldn't be the case with Landsat SR), let's remove them
dataset_clean = dataset_clean.where(dataset_clean >= 0)

# Let's replot using appropriate colors
dataset_clean.green.plot(x='longitude', y='latitude', col='time', col_wrap=5, cmap = 'Greens')
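
The two cleaning steps above, and the reason the mask has to be recomputed afterwards, can be sketched with a plain NumPy array. The toy values are made up; the array stands in for one band with dimensions (time, latitude, longitude).

```python
import numpy as np

# toy stack with dimensions (time, latitude, longitude)
stack = np.array([
    [[100.0, 200.0], [300.0, -50.0]],      # valid scene with one negative pixel
    [[np.nan, np.nan], [np.nan, np.nan]],  # scene entirely masked
    [[150.0, np.nan], [250.0, 350.0]],     # partially masked scene
])

# 1. keep only the time steps with at least one valid pixel
#    (what dropna('time', how='all') does on the xarray side)
keep = ~np.all(np.isnan(stack), axis=(1, 2))
stack = stack[keep]

# 2. replace negative values by NaN (what where(dataset_clean >= 0) does)
stack = np.where(stack >= 0, stack, np.nan)

# 3. the boolean mask has to follow the reduced time dimension
mask = ~np.isnan(stack)
print(stack.shape, mask.sum())
```

After step 1 the stack has one time step fewer, so a mask computed before the cleaning no longer has a matching shape.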

In [None]:
# Re-compute clean_mask as the number of time steps in dataset_clean was reduced
print(f"Original clean_mask shape: {clean_mask.shape}")

clean_mask = ~np.isnan(dataset_clean[measurements[0]].values)

print(f"Updated clean_mask shape: {clean_mask.shape}")

In [None]:
# if dataset_in is no longer needed, let's remove it to free the memory it uses

del dataset_in

# for the demo, check the variable does not exist anymore.
print(dataset_in)
# You should get an error message (NameError) as the dataset_in variable does not exist anymore
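
When the notebook is executed with "Run All", an intentional error like the one above stops the execution. A possible variant, sketched here with a stand-in variable (`big_variable` is hypothetical), catches the `NameError` so that the following cells still run:

```python
# a stand-in for a large variable
big_variable = list(range(1_000_000))
del big_variable

try:
    print(big_variable)
    still_defined = True
except NameError as err:
    # the name was removed from the namespace by del
    still_defined = False
    print(f"as expected: {err}")
```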

### Load DC (option B)<a name="loaddcoptb"></a>

Load the requested DC information into memory (meaning an [xarray.Dataset](http://xarray.pydata.org/en/stable/index.html) variable will be created) based on configuration parameters, using the [load_multi_clean](demo_FUN_load_multi_clean.ipynb) function developed within the frame of the SDC.

This function loads several products (in the same xarray.Dataset), automatically cleans them and generates a mask.

Various functions are available from default libraries and within notebooks to mask a **Landsat xarray.Dataset**, each one giving slightly different results. The function load_multi_clean was developed within the SDC context with several purposes:

- ability to process Landsat as well as **Sentinel 2** xarray.Dataset
- give **priority to snow** in case of low probability of clouds in Landsat xarray.Dataset
- load at once **several products**

**load_multi_clean** generates two outputs:

- a clean xarray.Dataset
- a boolean mask numpy.ndarray

Documentation for a given function can be accessed by adding `?` at the end of the function name in a cell (e.g. `load_multi_clean?`), or by selecting the function and pressing `Shift-Tab`.
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)
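
The internals of `load_multi_clean` are not shown in this notebook. As a general idea only, combining observations from several products amounts to concatenating them along time and sorting chronologically. The NumPy sketch below uses made-up data and a hypothetical helper name (`merge_along_time`).

```python
import numpy as np

def merge_along_time(times_a, stack_a, times_b, stack_b):
    """Concatenate two (time, y, x) stacks and sort the result chronologically."""
    times = np.concatenate([times_a, times_b])
    stack = np.concatenate([stack_a, stack_b], axis=0)
    order = np.argsort(times)
    return times[order], stack[order]

# two hypothetical products observed on alternating dates
times_a = np.array(["2018-06-01", "2018-06-17"], dtype="datetime64[D]")
times_b = np.array(["2018-06-09", "2018-06-25"], dtype="datetime64[D]")
stack_a = np.full((2, 2, 2), 7.0)   # all pixels of product A set to 7
stack_b = np.full((2, 2, 2), 8.0)   # all pixels of product B set to 8

times, stack = merge_along_time(times_a, stack_a, times_b, stack_b)
print(times)
print(stack[:, 0, 0])
```

The printed pixel values alternate between the two products, showing that the merged stack is ordered by acquisition date rather than by product.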

In [None]:
# Load DC using SDC load_multi_clean function which will immediately generate a clean dataset

dataset_clean, clean_mask = load_multi_clean(dc = dc,
                                             products = product ,
                                             time = [start_date, end_date],
                                             lon = (min_lon, max_lon),
                                             lat = (min_lat, max_lat),
                                             measurements = measurements)
print(dataset_clean)
print(clean_mask)

In [None]:
# let's plot a histogram of the green band again

dataset_clean.green.plot.hist()

### Explore the created xarray.Dataset variable (dataset_clean) <a name="explorexr"></a>
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# visualize dataset_clean
# adding '\n' as an extra argument inserts an empty line between the print outputs
print(dataset_clean, '\n')

# visualize dimensions
print('dimensions: %s\n' % (list(dataset_clean.dims)))

# visualize time
print('time: %s\n' % (dataset_clean.time))

# get the number of time steps
print('time count: %s\n' % (len(dataset_clean.time)))

In [None]:
# nicely display time values using pandas library
pd.DataFrame(dataset_clean.time.values, columns = ['date'])

In [None]:
# visualize the red band only
# an xarray.Dataset is made of xarray.DataArray variables (one per measurement)
print(dataset_clean.red)

In [None]:
# visualize specific red band for a given time index
# remember in Python indexing starts at 0
print(dataset_clean.red.isel(time=0))

In [None]:
# Let's plot the green band for all time steps

dataset_clean.green.plot(x='longitude', y='latitude', col='time', col_wrap=5, cmap='Greens')

In [None]:
# let's do it again but this time removing negative values

dataset_clean = dataset_clean.where(dataset_clean >= 0)
dataset_clean.green.dropna('time', how='all').plot(x='longitude', y='latitude', col='time',
                                                  col_wrap=5, cmap = "Greens")

In [None]:
# Let's plot composites in True color (red, green, blue)

dataset_clean[['red','green','blue']].isel(time = time_list(dataset_clean)).to_array().plot.imshow(x='longitude', y='latitude',col='time',col_wrap=5)

### Create, plot and export mosaic figure using default DC functions<a name="pngdef"></a>[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Create a mosaic
# several mosaic function (and options) are available:
# - create_mosaic(dataset_in, clean_mask)
# - create_mosaic(dataset_in.sortby('time', ascending = False), clean_mask)
# - create_mean_mosaic(dataset_in)
# - create_median_mosaic(dataset_in)
# - create_min_ndvi_mosaic(dataset_in, clean_mask)
# - create_max_ndvi_mosaic(dataset_in, clean_mask)
# - create_hdmedians_multiple_band_mosaic(dataset_in, clean_mask, operation='median')
# - create_hdmedians_multiple_band_mosaic(dataset_in, clean_mask, operation='medoid')

# we will apply the last one as it seems to offer the best balance between visual result and processing time
mosaic = create_hdmedians_multiple_band_mosaic(dataset_clean, clean_mask, operation='medoid')
mosaic
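
To make the difference between a median and a medoid mosaic more concrete, here is a simplified NumPy sketch. It is not the implementation used by `create_hdmedians_multiple_band_mosaic`; both function names are hypothetical and the medoid is computed for a single pixel only.

```python
import numpy as np

def median_mosaic(stack):
    """Per-pixel median through time of a (time, y, x) stack, ignoring NaN."""
    return np.nanmedian(stack, axis=0)

def medoid_index(observations):
    """Index of the medoid of a (time, band) array for a single pixel.

    The medoid is the observation minimizing the summed Euclidean
    distance to all other observations.
    """
    diff = observations[:, None, :] - observations[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))   # (time, time) matrix
    return int(np.argmin(distances.sum(axis=1)))

# one band, three dates, 1 x 2 pixels (the NaN is a masked observation)
stack = np.array([[[100.0, 400.0]], [[120.0, np.nan]], [[900.0, 420.0]]])
print(median_mosaic(stack))

# one pixel, three dates, two bands: the third date is an outlier
pixel = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
print(medoid_index(pixel))
```

A band-wise median may combine values from different dates in the same pixel, whereas the medoid always returns the bands of one actual observation.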

In [None]:
# Plot mosaic the default DC way

mosaic[['red','green','blue']].to_array().plot.imshow(x='longitude', y='latitude', robust=True)

In [None]:
# Export mosaic as composite png the default way

write_png_from_xr('demo_mosaic.png', mosaic ,['red', 'green', 'blue'])

# the png can be downloaded and visualized through the Home page of the Jupyter interface,
# but it can also be displayed directly in the notebook
Image('demo_mosaic.png')

In [None]:
# You might find the image too light (or too dark), so let's look at the distribution of the mosaic values

kwargs = dict(bins = 50, alpha = 0.3)

mosaic.red.plot.hist(color='red', **kwargs)
mosaic.green.plot.hist(color='green', **kwargs, stacked = True)
mosaic.blue.plot.hist(color='blue', **kwargs, stacked = True)
plt.xlabel('Value')

In [None]:
# improve rendering using scale option
# and display the png

write_png_from_xr('demo_mosaic_scaled.png', mosaic ,['red', 'green', 'blue'], scale = [(0,1000),(0,1000),(0,1000)])

Image('demo_mosaic_scaled.png')
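
The effect of a `(0, 1000)` display range can be illustrated with a plain linear stretch. This sketch does not reproduce the internals of `write_png_from_xr`; the function name `stretch_to_uint8` is hypothetical and NaN handling is left out.

```python
import numpy as np

def stretch_to_uint8(band, vmin, vmax):
    """Linearly map [vmin, vmax] to [0, 255], clipping values outside the range."""
    scaled = (np.asarray(band, dtype=float) - vmin) / (vmax - vmin)
    return (np.clip(scaled, 0.0, 1.0) * 255).round().astype(np.uint8)

# toy surface reflectance values
red = np.array([0, 250, 500, 1000, 3000])
stretched = stretch_to_uint8(red, 0, 1000)
print(stretched)
```

Every value above the upper bound is saturated to 255, so the range should be chosen from the histogram of the previous cell: a narrow range brightens the image, at the cost of saturating bright surfaces.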

### Plot and export mosaic figure using SDC functions<a name="pngsdc"></a>

In the context of the SDC, a function [composite_fig](https://sdc.unepgrid.ch:8080/notebooks/demo_FUN_figs.ipynb) was created to plot or export a 3-band composite png, plus a few extras (title, scalebar, ...).

For documentation run a cell containing:

`composite_fig?`
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)
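
Outside of Jupyter, where the `?` syntax is not available, similar information can be obtained with the standard `inspect` module. The function below is a hypothetical stand-in, not the real `composite_fig` signature.

```python
import inspect

def demo_function(dataset, bands, title=None, max_size=16):
    """Hypothetical function used only to illustrate documentation lookup."""
    return None

# signature and docstring, roughly what `demo_function?` displays in Jupyter
print(inspect.signature(demo_function))
print(inspect.getdoc(demo_function))
```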

In [None]:
# Export previous mosaic as composite png the SDC way

composite_fig(mosaic,
              bands = ['red', 'green', 'blue'],
              title = 'Demo composite',
              scalebar_color = 'white',
              max_size = 16)

In [None]:
# for the demo let's reduce the figure size and stretch the image histogram

composite_fig(mosaic,
              bands = ['red', 'green', 'blue'],
              title = 'Demo composite',
              scalebar_color = 'white',
              max_size = 10,
              hist_str = 'contr')

In [None]:
# to export the composite as png, simply add the fig_name parameter

composite_fig(mosaic,
              bands = ['red', 'green', 'blue'],
              title = 'Demo composite',
              scalebar_color = 'white',
              max_size = 10,
              hist_str = 'contr',
              fig_name = 'demo_composite.png')

# when a png is created the composite is not displayed, but it can be downloaded and visualized
# through the Home page of the Jupyter interface or added to the notebook with the command:
Image('demo_composite.png')

### Export xarray.Dataset <a name="exportds"></a>[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Export mosaic (xarray.Dataset) as a multi-band (containing all bands) NetCDF using default DC function

mosaic.to_netcdf('mosaic.nc')

# once the NetCDF file is created it can be downloaded through the Home page of the Jupyter
# interface. A direct link can also be added to the notebook with the command below, but users might have to use
# Shift + Right click to save the link.
display(HTML("""<a href="mosaic.nc" target="_blank" >download NetCDF</a>"""))

In [None]:
# As the default DC export_xarray_to_geotiff function has several weaknesses, in the context of the SDC a
# write_geotiff_from_xr function was created to improve and facilitate xarray.Dataset export by:
# - fixing the pixel shift bug of the default function
# - keeping the original band names
# - adding a compression option

# For documentation run a cell containing: `write_geotiff_from_xr?`

# As the CRS information was lost during mosaic creation, it has to be specified in the next function

write_geotiff_from_xr('mosaic.tif', mosaic, crs = dataset_clean.crs, compr = 'DEFLATE')

# add a direct link (user might have to use Shift + Right click to save the link).
display(HTML("""<a href="mosaic.tif" target="_blank" >download geotiff</a>"""))

### Create, plot and export xarray.DataArray <a name="dataarray"></a>

xarray.DataArray will be presented and manipulated using normalized difference indices.
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Let's start by computing NDVI for each time step

ndvi = (dataset_clean.nir - dataset_clean.red) / (dataset_clean.nir + dataset_clean.red)
print(ndvi)

In [None]:
# then compute mean NDVI through time
ndvi_mean = ndvi.mean(dim=['time'])
# delete the variable we do not need anymore
del ndvi
# replace +-Inf by nan
ndvi_mean = ndvi_mean.where(np.isfinite(ndvi_mean))
ndvi_mean
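
The normalized difference and its mean through time can be sketched with NumPy as follows. This is a variant rather than a copy of the cells above: non-finite values are replaced before averaging, so a single division by zero does not discard the whole pixel. The helper name and toy values are hypothetical.

```python
import numpy as np

def normalized_difference(a, b):
    """Compute (a - b) / (a + b), returning NaN where the result is not finite."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        index = (a - b) / (a + b)
    return np.where(np.isfinite(index), index, np.nan)

# toy (time, pixel) values; nir + red is zero for two observations
nir = np.array([[0.5, 0.0, 0.2], [0.4, 0.3, 0.3]])
red = np.array([[0.1, 0.0, -0.2], [0.2, 0.1, 0.3]])

ndvi = normalized_difference(nir, red)
ndvi_mean = np.nanmean(ndvi, axis=0)   # mean through time, ignoring NaN
print(ndvi)
print(ndvi_mean)
```

The same helper works for NDWI (green, nir) and NDBI (swir2, nir) by changing the two input bands.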

In [None]:
# plot ndvi_mean the default DC way (as in previous sections, but using custom NDVI colors
# and fixed extreme colors/values)

ndvi_mean.plot.imshow(x='longitude', y='latitude', vmin=-1, vmax=1,
                      cmap = colors.LinearSegmentedColormap.from_list('ndvi', ['darkblue','blue','lightblue', \
                                                                               'lightgreen','darkgreen'], N=256))


In [None]:
# equivalent plot the SDC way (oneband_fig function)

oneband_fig(ndvi_mean,
            leg = colors.LinearSegmentedColormap.from_list('ndvi', ['darkblue','blue','lightblue',
                                                                    'lightgreen','darkgreen'], N=256),
            title = 'NDVI mean with a gold scalebar',
            scalebar_color = 'gold',
            max_size = 16,
            v_min = -1,
            v_max = 1)

# this is an opportunity to compare the figure width/height ratio of the default DC and SDC outputs:
# notice how the x and y resolutions differ in the above figure.

In [None]:
# as with the composite_fig function, you can save the figure simply by adding the fig_name parameter

oneband_fig(ndvi_mean,
            leg = colors.LinearSegmentedColormap.from_list('ndvi', ['darkblue','blue','lightblue',
                                                                    'lightgreen','darkgreen'], N=256),
            title = 'NDVI mean without scalebar',
            fig_name = 'ndvi_mean_without_scalebar.png',
            max_size = 12,
            v_min = -1,
            v_max = 1)

# and display the figure (also available through the Home page of Jupyter):
Image('ndvi_mean_without_scalebar.png')

In [None]:
# Export as NetCDF

ndvi_mean.to_netcdf('ndvi_mean.nc')
display(HTML("""<a href="ndvi_mean.nc" target="_blank" >download NetCDF</a>"""))

# Export as geotiff
# an xarray.DataArray needs to be converted to an xarray.Dataset and the CRS has to be defined

write_geotiff_from_xr('ndvi_mean.tif', ndvi_mean.to_dataset(name = 'NDVI'), ['NDVI'],
                      crs = dataset_clean.crs, compr = 'DEFLATE')
display(HTML("""<a href="ndvi_mean.tif" target="_blank" >download geotiff</a>"""))

In [None]:
# compute NDWI and NDBI by chaining the index and mean computations (so there is no intermediate index to delete)

ndwi_mean = ((dataset_clean.green - dataset_clean.nir) / (dataset_clean.green + dataset_clean.nir)).mean(dim=['time'])
ndwi_mean = ndwi_mean.where(np.isfinite(ndwi_mean)) # replace +-Inf by nan
ndbi_mean = ((dataset_clean.swir2 - dataset_clean.nir) / (dataset_clean.swir2 + dataset_clean.nir)).mean(dim=['time'])
ndbi_mean = ndbi_mean.where(np.isfinite(ndbi_mean)) # replace +-Inf by nan

In [None]:
# for fun let's create a false color composite using Built, Vegetation and Water indexes

# create a dataset with the 3 bands
bvw_ds = ndbi_mean.to_dataset(name = 'ndbi').merge(ndvi_mean.to_dataset(name = 'ndvi')).merge(ndwi_mean.to_dataset(name = 'ndwi'))
# delete the variable we do not need anymore
del ndbi_mean
del ndvi_mean
del ndwi_mean
# fix nan issues
bvw_ds = bvw_ds.fillna(bvw_ds.min())
bvw_ds

In [None]:
bvw_ds.ndbi.plot.hist(bins = 50, color='red', alpha = 0.3)
bvw_ds.ndvi.plot.hist(bins = 50, color='green', alpha = 0.3, stacked = True)
bvw_ds.ndwi.plot.hist(bins = 50, color='blue', alpha = 0.3, stacked = True)

In [None]:
# finally create a figure with fixed display range (-1 to +1 as we are dealing with normalized indexes)
composite_fig(bvw_ds,
              bands = ['ndbi', 'ndvi', 'ndwi'],
              title = 'Demo BVW composite (with color range fixed to -1 to 1)',
              scalebar_color = 'white',
              max_size = 14,
              v_min = -1,
              v_max = 1,
              fig_name = 'demo_BVW_composite.png')

# and display it
Image('demo_BVW_composite.png')
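
What a fixed -1 to 1 display range means for a three-index composite can be sketched in NumPy. This is an illustration under assumptions, not the code of `composite_fig`; the function name `index_to_rgb` and the toy values are hypothetical.

```python
import numpy as np

def index_to_rgb(red_idx, green_idx, blue_idx, v_min=-1.0, v_max=1.0):
    """Stack three index arrays into a (y, x, 3) uint8 image using a fixed range."""
    layers = []
    for layer in (red_idx, green_idx, blue_idx):
        layer = np.asarray(layer, dtype=float)
        # fill NaN with the band minimum, as done with fillna(bvw_ds.min())
        layer = np.where(np.isnan(layer), np.nanmin(layer), layer)
        scaled = np.clip((layer - v_min) / (v_max - v_min), 0.0, 1.0)
        layers.append((scaled * 255).round().astype(np.uint8))
    return np.dstack(layers)

ndbi = np.array([[-1.0, 0.0], [1.0, np.nan]])
ndvi = np.array([[0.0, 1.0], [-1.0, 0.0]])
ndwi = np.array([[1.0, -1.0], [0.0, 0.0]])

rgb = index_to_rgb(ndbi, ndvi, ndwi)
print(rgb.shape)
print(rgb[1, 1])
```

With a fixed range, an index value of 0 always maps to mid-intensity, which keeps colors comparable between composites computed over different areas or periods.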

### Single time water time series analysis <a name="waterts"></a>[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# run the "Water Observation From Space" algorithm
# replace nodata values (-9999) by nan
# compute the percentage of time a pixel was detected as water

# by default this function displays a lot of warnings
# if not already done let's switch them off
import warnings
warnings.filterwarnings("ignore")

ts_water_classification = wofs_classify(dataset_clean, clean_mask = clean_mask)
ts_water_classification = ts_water_classification.where(ts_water_classification != -9999)
water_classification_percentages = (ts_water_classification.mean(dim = ['time']) * 100).wofs.rename('water_classification_percentages')
del ts_water_classification

# display water percentage
water_classification_percentages.plot()
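
The percentage computation can be checked on a small NumPy example. The classification values below are made up, assuming 1 stands for water, 0 for not water and -9999 for nodata.

```python
import numpy as np

NODATA = -9999

# toy classification stack (time, y, x): 1 = water, 0 = not water, -9999 = nodata
classification = np.array([
    [[1, 0], [NODATA, 1]],
    [[1, 0], [1, 0]],
    [[NODATA, 0], [1, 1]],
    [[1, 1], [0, 1]],
])

# nodata becomes NaN so it is ignored by the mean through time
valid = np.where(classification == NODATA, np.nan, classification.astype(float))
water_percent = np.nanmean(valid, axis=0) * 100
print(water_percent)
```

Because nodata observations are excluded from the mean, each pixel's percentage is relative to its own number of valid observations, not to the total number of time steps.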

In [None]:
# display values distribution

water_classification_percentages.plot.hist(bins = 20)