In [1]:
%%html
<style>
    .dothis{
    font-weight: bold;
    color: #ff7f0e;
    font-size:large
    }
</style>

## Demo of the Swiss Data Cube <a name="top"></a>

This notebook introduces you to the Swiss Data Cube. It has the following sections:

- **[Standard script/notebook beginning](#standbeg)**: To run cells from other sections you first need to run all cells of this section.

- **[Load a data cube](#loaddcoptb)**: loads a datacube (into an `xarray.Dataset`) for further analysis.
    
- **[Explore created data cube](#explorexr)**: explore the created `xarray.Dataset` variable (dataset_clean).

- **[Create, plot and export mosaic figure using default data cube functions](#pngdef)**

- **[Plot and export mosaic figure using Swiss Data Cube functions](#pngsdc)**

- **[Export `xarray.Dataset`](#exportds)**

- **[Create, plot and export `xarray.DataArray`](#dataarray)**

- **[Water time series analysis](#waterts)**

- **[Extracting time series at a point](#tsextract)**
    

---


### Standard script beginning <a name="standbeg"></a>

The cells in this section are generally found at the beginning of a script (and it is advised to re-use this template in all new notebooks you make).

To run cells from other sections, you first need to run all cells of this section.

- **import dependencies**: import libraries, connect to datacube.
- **Configuration**: all variables you might need to change. Keep in mind that the larger the data cube (in terms of geographical extent, time period and number of measurements (bands)), the slower it will load.
- **Functions**: all functions written in-script
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Make sure the script is using the correct kernel (see also the README)
try:
    %run ../swiss_utils/assert_env.py
except:
    %run ./swiss_utils/assert_env.py

In [None]:
# Import modules

# reload module before executing code
%load_ext autoreload
%autoreload 2

# define modules locations (you might have to adapt define_mod_locs.py)
%run ./swiss_utils/define_mod_locs.py

# to plot figures
%matplotlib inline

# import full general libraries

# import general libraries and allocate them a specific name
import numpy as np # np.average
import pandas as pd # DataFrame
import matplotlib.pyplot as plt

# import specific functions from general libraries
from datetime import datetime
from IPython.display import Image, display, HTML
from matplotlib import colors

# import dedicated function of general libraries

# import ODC (default) functions
from utils.data_cube_utilities.dc_mosaic import create_hdmedians_multiple_band_mosaic
from utils.data_cube_utilities.dc_utilities import write_png_from_xr
from utils.data_cube_utilities.dc_water_classifier import wofs_classify

# import SDC functions
from swiss_utils.data_cube_utilities.sdc_utilities import ls_qa_clean, load_multi_clean, \
                                                          write_geotiff_from_xr, time_list
from swiss_utils.data_cube_utilities.sdc_advutils import oneband_fig, composite_fig

# connect to DC
import datacube
dc = datacube.Datacube()

# silence warning (not recommended during development)
import warnings
warnings.filterwarnings("ignore")

### Set up the data cube
The next cell contains the data cube configuration information:

- product
- geographical extent
- time period
- bands

You can create it in three ways:
1. by loading the final cell content of the [config_tool](config_tool.ipynb) notebook using the magic `%load config_cell.txt`.
2. by manually copy/pasting the final cell content of the [config_tool](config_tool.ipynb) notebook,
3. manually by typing it out.
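
For the manual option, a configuration cell could look like the sketch below. The variable names match those used by the loading cell later in this notebook, but every value (product name, extent, dates) is a placeholder chosen for illustration, not actual config_tool output, and the exact types expected (list or string for `product`, `datetime` objects for dates) are assumptions.

```python
from datetime import datetime

# All values below are placeholders, not taken from config_tool output.
product = ['ls8_lasrc_swiss']          # hypothetical Landsat 8 product name
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']

# Small bounding box in decimal degrees (EPSG:4326); hypothetical extent.
min_lon, max_lon = 6.80, 6.90
min_lat, max_lat = 46.40, 46.47

# Short time period to keep loading fast.
start_date = datetime(2019, 6, 1)
end_date = datetime(2019, 9, 30)
```

Check the product names available in your own data cube before re-using such a cell.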

Apply the following rules when generating the configuration cell:
- select a **small dataset** (geographical extent, time period and number of measurements (bands)) for faster processing,
- select an **area covering only a small part of the mountains** (as snow is generally confused with clouds and then considered as nodata),
- if selecting winter, **be careful** as the chances of confusing clouds and snow are higher.

Specifically for this demo:
- Use **Landsat, but not any Landsat 7 product** (as Landsat 7 scenes contain large nodata gaps since 2003),
- **The following measurements are required**: `red, green, blue, nir, swir1, swir2` and `pixel_qa`
- the geographical extent should **contain some water/a lake** as water detection tools will be used.


Now:
<ul class="dothis">
    <li>Use the config_tool to create <tt>config_cell.txt</tt>.</li>
    <li>Execute the cell below to load the contents of <tt>config_cell.txt</tt>.</li>
    <li>Execute the cell below again so that Python reads/executes the variables.</li>
</ul>

In [None]:
%load config_cell.txt

### Load data cube<a name="loaddcoptb"></a>

Load requested data cube (meaning an [xarray.Dataset](http://xarray.pydata.org/en/stable/index.html) variable will be created) based on configuration parameters, using [load_multi_clean](demo_FUN_load_multi_clean.ipynb).

This function loads several products (into the same xarray.Dataset), cleans them and generates a mask.

Various masking functions are available from the Open Data Cube libraries and the SDC, each one giving slightly different results. The function `load_multi_clean`:

- can process Landsat as well as **Sentinel 2** data cubes
- with Landsat, gives **priority to snow** when there is a low probability of cloud cover
- can load **several products** at once.
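
To give an idea of what such masking involves, here is a minimal sketch that decodes a Landsat `pixel_qa` value. It assumes the Landsat Collection 1 Level-2 bit layout; `decode_pixel_qa` and `keep_pixel` are hypothetical helpers, and the rule in `keep_pixel` is only one possible reading of "priority to snow", not necessarily what `load_multi_clean` does.

```python
# Bit positions of the Landsat Collection 1 Level-2 pixel_qa band (assumed layout).
PIXEL_QA_BITS = {'fill': 0, 'clear': 1, 'water': 2,
                 'cloud_shadow': 3, 'snow': 4, 'cloud': 5}
CLOUD_CONFIDENCE = {0: 'none', 1: 'low', 2: 'medium', 3: 'high'}


def decode_pixel_qa(value):
    """Return the flags set in a single pixel_qa value (hypothetical helper)."""
    flags = {name: bool((value >> bit) & 1) for name, bit in PIXEL_QA_BITS.items()}
    # Bits 6-7 hold a two-bit cloud confidence level.
    flags['cloud_confidence'] = CLOUD_CONFIDENCE[(value >> 6) & 0b11]
    return flags


def keep_pixel(value):
    """Keep clear, water and snow pixels unless cloud confidence is above low."""
    flags = decode_pixel_qa(value)
    usable = flags['clear'] or flags['water'] or flags['snow']
    return usable and flags['cloud_confidence'] in ('none', 'low')


print(decode_pixel_qa(336))   # snow flag set, low cloud confidence
print(keep_pixel(336))        # snow with low cloud confidence is kept
print(keep_pixel(480))        # cloud with high confidence is rejected
```

Sentinel 2 products use a different quality layer, so this bit layout does not apply to them.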

**load_multi_clean** generates two outputs:

- a clean `xarray.Dataset`
- a boolean mask `numpy.ndarray`

Documentation for a given function can be accessed simply by adding ? at the end of the function in a cell. e.g. `load_multi_clean?` or by selecting the function and pressing `Shift-Tab`.
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Load a cube using SDC load_multi_clean function which will generate a clean dataset
# Sometimes this doesn't work the first time - if not, re-execute the %load config_cell.txt cell then try again!

dataset_clean, clean_mask = load_multi_clean(dc = dc,
                                             products = product ,
                                             time = [start_date, end_date],
                                             lon = (min_lon, max_lon),
                                             lat = (min_lat, max_lat),
                                             measurements = measurements 
                                             )

In [None]:
# Let's take a look at the contents of the datacube we've loaded
dataset_clean

In [None]:
# let's plot a histogram of the green band
dataset_clean.green.plot.hist()

### Explore the created xarray.Dataset variable (dataset_clean) <a name="explorexr"></a>
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Look at dimensions
dataset_clean.dims

In [None]:
# get the number of time points in the cube
print('time count: %s\n' % (len(dataset_clean.time)))

In [None]:
# Look at time dimension
dataset_clean.time

In [None]:
# nicely display time values using pandas library
pd.DataFrame(dataset_clean.time, columns=['date'])

In [None]:
# visualize specific red band
# an xarray.Dataset is made of xarray.DataArray variables (one per band)
dataset_clean.red

In [None]:
# visualize specific red band for a given time index
# remember in Python indexing starts at 0
dataset_clean.red.isel(time=0)

In [None]:
# Let's plot the green band for all time steps

dataset_clean.green.plot(x='longitude', y='latitude', col='time', col_wrap=5, cmap='Greens')

In [None]:
# Let's plot composites in True color (red, green, blue)
# robust=True sets the colour limits to the 2nd and 98th percentiles of the data instead of the extreme values.
dataset_clean[['red','green','blue']].to_array().plot.imshow(col='time',col_wrap=5, robust=True)

### Create, plot and export mosaic figure<a name="pngdef"></a>[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)
Across the whole geographical area, we can combine all our different satellite images into one composite/mosaic that represents the time period of interest. Here we export to the `png` format, which is very suitable for your project reports. Note that pngs do not have georeferencing information, so they cannot be read by GIS software such as QGIS. See later in this demo for how to create GeoTIFFs (`tif`).
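
The idea behind a composite can be sketched with plain NumPy on a toy band: unusable observations are hidden and the remaining values are combined per pixel along the time axis. This sketch uses a simple per-band median, which is an assumption made for illustration; it is not the geometric median or medoid computed by `create_hdmedians_multiple_band_mosaic`.

```python
import numpy as np

# Toy stack: 3 acquisition dates of a 2 x 2 pixel band, shaped (time, y, x).
stack = np.array([[[100., 200.], [300., 400.]],
                  [[120., 900.], [310., 420.]],
                  [[110., 210.], [305., 410.]]])

# True where the observation is usable (not cloud, shadow or nodata).
clean = np.array([[[True, True], [True, True]],
                  [[True, False], [True, True]],    # cloudy pixel on date 2
                  [[True, True], [False, True]]])   # nodata pixel on date 3

# Hide unusable observations, then take the per-pixel median over time.
masked = np.where(clean, stack, np.nan)
composite = np.nanmedian(masked, axis=0)
print(composite)
```

The cloudy value of 900 does not influence the result, because it was masked before the median was taken.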

In [None]:
# Create a mosaic
# several mosaic functions (and options) are available:
# - create_mosaic(dataset_in, clean_mask)
# - create_mosaic(dataset_in.sortby('time', ascending = False), clean_mask)
# - create_mean_mosaic(dataset_in)
# - create_median_mosaic(dataset_in)
# - create_min_ndvi_mosaic(dataset_in, clean_mask)
# - create_max_ndvi_mosaic(dataset_in, clean_mask)
# - create_hdmedians_multiple_band_mosaic(dataset_in, clean_mask, operation='median')
# - create_hdmedians_multiple_band_mosaic(dataset_in, clean_mask, operation='medoid')

# we will apply the last one as it seems to be the best balance between visual result and processing time
mosaic = create_hdmedians_multiple_band_mosaic(dataset_clean, clean_mask, operation='medoid')
mosaic

In [None]:
# Plot mosaic the default way
mosaic[['red','green','blue']].to_array().plot.imshow(x='longitude', y='latitude', robust=True)

In [None]:
# Export mosaic as composite png the default way
write_png_from_xr('demo_mosaic.png', mosaic ,['red', 'green', 'blue'])

# png can be downloaded and visualized through the Home page of the Jupyter interface
# but it can also be visualized in the notebook
Image('demo_mosaic.png')

In [None]:
# If the image looks too light (or too dark), check the distribution of the dataset values
kwargs = dict(bins = 50, alpha = 0.3)

mosaic.red.plot.hist(color='red', **kwargs)
mosaic.green.plot.hist(color='green', **kwargs, stacked = True)
mosaic.blue.plot.hist(color='blue', **kwargs, stacked = True)
plt.xlabel('Value')

In [None]:
# improve rendering using scale option
# and display the png

write_png_from_xr('demo_mosaic_scaled.png', mosaic ,['red', 'green', 'blue'], scale = [(0,2000),(0,2000),(0,2000)])

Image('demo_mosaic_scaled.png')
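
The effect of a `(min, max)` scale pair can be illustrated with a simple linear stretch to the 0-255 range of a png. This is a sketch in pure Python for clarity; `stretch_to_byte` is a hypothetical helper, and whether `write_png_from_xr` applies exactly this formula is an assumption.

```python
def stretch_to_byte(value, low, high):
    """Map value linearly from [low, high] to [0, 255], clipping outliers."""
    clipped = min(max(value, low), high)
    return round(255 * (clipped - low) / (high - low))


# With scale (0, 2000): bright pixels saturate, darker ones keep their contrast.
for reflectance in (0, 500, 2000, 6000):
    print(reflectance, '->', stretch_to_byte(reflectance, 0, 2000))
```

Choosing a smaller maximum brightens the image, at the cost of saturating more bright pixels (snow, clouds).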

### Plot and export mosaic figure the Swiss Data Cube way<a name="pngsdc"></a>

This adds bonus features such as a title, scale bar...

For documentation run a cell containing:

`composite_fig?`
[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Export previous mosaic as composite png the SDC way

composite_fig(mosaic,
              bands = ['red', 'green', 'blue'],
              title = 'Demo composite',
              scalebar_color = 'white',
              max_size = 16)

In [None]:
# for the demo let's reduce the figure size and stretch the image histogram

composite_fig(mosaic,
              bands = ['red', 'green', 'blue'],
              title = 'Demo composite',
              scalebar_color = 'white',
              max_size = 10,
              hist_str = 'contr')

In [None]:
# to export the composite as png, simply add the fig_name parameter

composite_fig(mosaic,
              bands = ['red', 'green', 'blue'],
              title = 'Demo composite',
              scalebar_color = 'white',
              max_size = 10,
              hist_str = 'contr',
              fig_name = 'demo_composite.png')

# when a png is created the composite is not displayed, but it can be downloaded and visualized
# through the Home page of the Jupyter interface or added to the notebook with the command:
Image('demo_composite.png')

### Export xarray.Dataset <a name="exportds"></a>[<div style="text-align: right; font-size: 24px"> &#x1F51D; </div>](#top)

In [None]:
# Export mosaic (xarray.Dataset) as a multi-band (containing all bands) NetCDF
mosaic.to_netcdf('mosaic.nc')

In [None]:
# You can re-load this later, which is very useful to avoid having to query the DataCube every time!
import xarray as xr
mosaic_from_disk = xr.open_dataset('mosaic.nc')
mosaic_from_disk

### Export a GeoTIFF - can be added straight into software like QGIS/ArcGIS.


In [None]:
# For documentation run a cell containing: `write_geotiff_from_xr?`

# As the CRS information was lost during mosaic creation, it has to be specified in the next function

write_geotiff_from_xr('mosaic.tif', mosaic, crs = dataset_clean.crs, compr = 'DEFLATE')

# add a direct link (user might have to use Shift + Right click to save the link).
display(HTML("""<a href="mosaic.tif" target="_blank" >download geotiff</a>"""))

### Computing Normalized Difference Indexes


In [None]:
# Let's start computing NDVI for each time

ndvi = (dataset_clean.nir - dataset_clean.red) / (dataset_clean.nir + dataset_clean.red)
ndvi

In [None]:
# then compute mean NDVI of the full time period
ndvi_mean = ndvi.mean(dim=['time'])

# replace +-Inf by nan
ndvi_mean = ndvi_mean.where(np.isfinite(ndvi_mean))
ndvi_mean
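
Why non-finite values need to be filtered can be seen on a toy example. The band values below are made up: a normalized difference yields `nan` when both bands are zero and `inf` when the bands sum to zero, which can happen when a band holds a negative value.

```python
import numpy as np

nir = np.array([4000., 0., 500.])
red = np.array([1000., 0., -500.])   # made-up values, including a negative one

# Suppress the divide-by-zero warnings; the bad values are handled below.
with np.errstate(divide='ignore', invalid='ignore'):
    index = (nir - red) / (nir + red)

print(index)          # second value is nan (0/0), third is inf (1000/0)

# np.isfinite is False for nan as well as for +-inf, so both become nan.
index_clean = np.where(np.isfinite(index), index, np.nan)
print(index_clean)
```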

In [None]:
# plot ndvi_mean the default way (as in previous sections, but using custom NDVI colors
# and fixed extreme colors/values)

ndvi_mean.plot.imshow(x='longitude', y='latitude', vmin=-1, vmax=1,
                      cmap = colors.LinearSegmentedColormap.from_list('ndvi', ['darkblue','blue','lightblue', \
                                                                               'lightgreen','darkgreen'], N=256))


In [None]:
# equivalent plot the SDC way (oneband_fig function)

oneband_fig(ndvi_mean,
            leg = colors.LinearSegmentedColormap.from_list('ndvi', ['darkblue','blue','lightblue',
                                                                    'lightgreen','darkgreen'], N=256),
            title = 'NDVI mean with a gold scalebar',
            scalebar_color = 'gold',
            max_size = 16,
            v_min = -1,
            v_max = 1)

# Compare the figure width/height ratio for default output and the Swiss Data Cube option.
# Notice how the x and y resolution differ in the above figure.

### Export a DataArray 

In [None]:
# Export as NetCDF

ndvi_mean.to_netcdf('ndvi_mean.nc')
display(HTML("""<a href="ndvi_mean.nc" target="_blank" >download NetCDF</a>"""))

In [None]:
# Export as geotiff
# an xarray.DataArray needs to be converted to an xarray.Dataset and the CRS needs to be defined

write_geotiff_from_xr('ndvi_mean.tif', ndvi_mean.to_dataset(name = 'NDVI'), ['NDVI'],
                      crs = dataset_clean.crs, compr = 'DEFLATE')
display(HTML("""<a href="ndvi_mean.tif" target="_blank" >download geotiff</a>"""))

In [None]:
# compute NDWI and NDBI, combining the index and mean computations in one command (so there is no intermediate index to delete)

ndwi_mean = ((dataset_clean.green - dataset_clean.nir) / (dataset_clean.green + dataset_clean.nir)).mean(dim=['time'])
ndwi_mean = ndwi_mean.where(np.isfinite(ndwi_mean)) # replace +-Inf by nan
ndbi_mean = ((dataset_clean.swir2 - dataset_clean.nir) / (dataset_clean.swir2 + dataset_clean.nir)).mean(dim=['time'])
ndbi_mean = ndbi_mean.where(np.isfinite(ndbi_mean)) # replace +-Inf by nan

In [None]:
# for fun let's create a false color composite using Built, Vegetation and Water indexes

# create a dataset with the 3 bands
bvw_ds = ndbi_mean.to_dataset(name = 'ndbi').merge(ndvi_mean.to_dataset(name = 'ndvi')).merge(ndwi_mean.to_dataset(name = 'ndwi'))
# delete the variable we do not need anymore
del ndbi_mean
del ndvi_mean
del ndwi_mean
# fix nan issues
bvw_ds = bvw_ds.fillna(bvw_ds.min())
bvw_ds

In [None]:
bvw_ds.ndbi.plot.hist(bins = 50, color='red', alpha = 0.3)
bvw_ds.ndvi.plot.hist(bins = 50, color='green', alpha = 0.3, stacked = True)
bvw_ds.ndwi.plot.hist(bins = 50, color='blue', alpha = 0.3, stacked = True)

In [None]:
# finally create a figure with fixed display range (-1 to +1 as we are dealing with normalized indexes)
composite_fig(bvw_ds,
              bands = ['ndbi', 'ndvi', 'ndwi'],
              title = 'Demo BVW composite (with color range fixed to -1 to 1)',
              scalebar_color = 'white',
              max_size = 14,
              v_min = -1,
              v_max = 1,
              fig_name = 'demo_BVW_composite.png')

# and display it
Image('demo_BVW_composite.png')

### Water time series analysis <a name="waterts"></a>

In [None]:
# run the "Water Observation From Space" algorithm
# replace nodata values (-9999) by nan
# compute percentage of time a pixel was detected as water

# by default this function displays several warnings, we are turning them off...
import warnings
warnings.filterwarnings("ignore")

ts_water_classification = wofs_classify(dataset_clean, clean_mask = clean_mask)
ts_water_classification = ts_water_classification.where(ts_water_classification != -9999)
water_classification_percentages = (ts_water_classification.mean(dim = ['time']) * 100).wofs.rename('water_classification_percentages')

# display water percentage
water_classification_percentages.plot()
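
The percentage computation can be checked on a toy array. The sketch below assumes the classification uses 1 for water, 0 for not water and -9999 for nodata, and uses NumPy's `nanmean` to mimic the way the xarray `mean` skips `nan` values by default.

```python
import numpy as np

# Toy classification for three pixels over 4 dates, shaped (time, x):
# 1 = water, 0 = not water, -9999 = nodata.
wofs = np.array([[1, 0, -9999],
                 [1, 0, 1],
                 [-9999, 0, 1],
                 [1, 1, -9999]], dtype=float)

# Replace nodata by nan so it is ignored by the mean.
wofs[wofs == -9999] = np.nan

# Share of valid observations classified as water, in percent.
water_percent = np.nanmean(wofs, axis=0) * 100
print(water_percent)
```

Note that the percentage is relative to the number of valid observations of each pixel, not to the total number of dates.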

In [None]:
# display values distribution

water_classification_percentages.plot.hist(bins = 20)

### BONUS: Extracting and plotting data through time <a name="tsextract"></a>
We will be covering time series analysis in much more detail on Friday morning!

In [None]:
# Let's show a map of the area covered by the current data cube.
from shapely.geometry import Polygon
from swiss_utils.data_cube_utilities.sdc_utilities import new_get_query_metadata
from swiss_utils.data_cube_utilities.sdc_advutils import draw_map

# We need the coordinate reference system of the product we are looking at.
mtd = new_get_query_metadata(dc, product)
crs = mtd['crs']

# Add an empty map you can draw on
m, drawn_features = draw_map([min_lat, max_lat], [min_lon, max_lon], 'epsg:4326', draw=False)
from ipyleaflet import DrawControl
draw_c = DrawControl(marker={'shapeOptions': {'color': '#0000FF'}},
                 polyline={},
                 circle={},
                 circlemarker={},
                 polygon={}
                )
m.add_control(draw_c)
print('Within the red rectangle, zoom, pan and then, using the Marker tool on the left, place a marker where you want to extract a time series:')
m

In [None]:
coords = draw_c.last_draw['geometry']['coordinates']

In [None]:
coords
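
Drawn features follow the GeoJSON convention, in which a point position is ordered `[longitude, latitude]`. The sketch below uses a made-up marker (the dictionary and its coordinates are hypothetical) to show how to unpack it safely.

```python
# A GeoJSON Point as a drawing tool might return it (hypothetical marker).
last_draw = {
    'type': 'Feature',
    'properties': {},
    'geometry': {'type': 'Point', 'coordinates': [6.85, 46.43]},
}

# GeoJSON positions are ordered [longitude, latitude] (x first, then y).
point_lon, point_lat = last_draw['geometry']['coordinates']
print(f'longitude={point_lon}, latitude={point_lat}')
```

Unpacking into named variables makes it harder to swap the two values by accident.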

In [None]:
# GeoJSON coordinates are ordered [longitude, latitude]
ndvi.sel(latitude=coords[1], longitude=coords[0], method='nearest').plot(marker='o', linestyle='none')
plt.ylabel('NDVI')

In [None]:
# Let's look at a certain time period in more detail
# (GeoJSON coordinates are ordered [longitude, latitude])
ndvi.sel(latitude=coords[1], longitude=coords[0], method='nearest').sel(time=slice('2021-02-01', '2021-04-25')).plot(marker='o', linestyle='none')
plt.ylabel('NDVI')

In [None]:
# We can convert our time series to a Pandas series for more examination
# (GeoJSON coordinates are ordered [longitude, latitude])
ndvi_at_point = ndvi.sel(latitude=coords[1], longitude=coords[0], method='nearest').to_pandas()

In [None]:
ndvi_at_point

In [None]:
# Let's resample to a monthly data series. Monthly values are calculated as the median of all values in the month.
ndvi_pt_monthly = ndvi_at_point.resample('1M').median()
ndvi_pt_monthly.plot(marker='x', linestyle='none')
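
What a monthly median does can be sketched with the standard library only. The observations below are made up; the point is that values are grouped by calendar month and that the median is little affected by a single outlier (for instance an undetected cloud).

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical NDVI observations at one point.
observations = [
    (date(2021, 2, 3), 0.21), (date(2021, 2, 19), 0.25),
    (date(2021, 3, 7), 0.30), (date(2021, 3, 15), 0.90), (date(2021, 3, 23), 0.34),
]

# Group values by (year, month), then take the median of each group.
by_month = defaultdict(list)
for day, value in observations:
    by_month[(day.year, day.month)].append(value)

monthly_median = {month: median(values) for month, values in sorted(by_month.items())}
print(monthly_median)
```

For March the median is 0.34, even though one observation is as high as 0.90.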

In [None]:
# And now let's export to comma-separated values (CSV) format - this can be opened by other programs like Excel.
ndvi_pt_monthly.to_csv('ndvi_at_pt.csv')

In [None]:
# We can also generate a time series plot of the whole datacube area
ndvi.median(dim=('longitude','latitude')).plot()

*****

# Reprojection

*****

All the operations above were carried out using a latitude/longitude CRS (coordinate reference system) called WGS84 (its code is *EPSG:4326*). You might have noticed that this CRS displays things in units of degrees of latitude and longitude, and that the images look compressed in the latitude dimension. Below is an example of how you can reproject the data to CH1903+ / LV95 (EPSG:2056), also known as "SwissGrid".
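
The distortion comes from the fact that, away from the equator, one degree of longitude covers less ground than one degree of latitude. The sketch below estimates both distances with a spherical Earth approximation; the radius, the chosen latitude and the `metres_per_degree` helper are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def metres_per_degree(latitude_deg):
    """Approximate ground distance covered by one degree of latitude and longitude."""
    per_degree_lat = math.pi * EARTH_RADIUS_M / 180
    per_degree_lon = per_degree_lat * math.cos(math.radians(latitude_deg))
    return per_degree_lat, per_degree_lon


lat_m, lon_m = metres_per_degree(46.8)   # a latitude within Switzerland
print(f'1 degree of latitude  ~ {lat_m / 1000:.0f} km')
print(f'1 degree of longitude ~ {lon_m / 1000:.0f} km')
```

Plotting degrees as if both axes had the same unit therefore distorts shapes, which a projected CRS in metres avoids.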


In [None]:
# rioxarray registers the `.rio` accessor used below
import rioxarray

# By default lat and lon use EPSG:4326, which is the CRS used to store SDC data.
# Let's reproject the NDVI xarray.DataArray into the Swiss CRS CH1903+ / LV95 (EPSG:2056).
dataset_CH = ndvi.rio.set_crs("epsg:4326").rio.reproject("epsg:2056")

# xarray.Dataset CRS metadata remains in previous CRS
# let's update metadata
dataset_CH.attrs['crs'] = 'EPSG:2056'
dataset_CH

Note how the dimensions have changed from `latitude, longitude, time` to `x, y, time`.

Plot the reprojected NDVI (first time step). We will see that the coordinate axes have changed and now represent the familiar SwissGrid. <span class='dothis'>Compare it</span> to the lat/lon images we made earlier in this notebook.

In [None]:
# Plot the first time step of the reprojected NDVI
plt.figure()
ax = plt.subplot(111, aspect='equal')
p = dataset_CH.isel(time=0).plot.imshow(robust=True)
# Make the x and y coordinates equally spaced.
plt.gca().set_aspect('equal')

In [None]:
dataset_CH

**Note how the coordinate units have changed from degrees to metres, compared to the previous plots.**

You cannot use the `write_geotiff_from_xr()` function to export datasets that are in SwissGrid; it will cause an error. Use the `rio.to_raster()` method instead, as in the next cell.

In [None]:
dataset_CH.isel(time=2).rio.to_raster("ndvi_swissgrid.tif")