# TS-1: Data preparation

*****

This notebook loads and pre-processes an SDC dataset and saves it to a NetCDF (.nc) file, so that your analysis notebooks can reload it quickly.

Things you should change:

* The config_cell variables
* The output filename of the NetCDF file (see the last cell).

*****


In [None]:
# Make sure the script is using the proper kernel
try:
    %run ../swiss_utils/assert_env.py
except:
    %run ./swiss_utils/assert_env.py

In [None]:
# Import modules

# reload module before executing code
%load_ext autoreload
%autoreload 2

# define modules locations (you might have to adapt define_mod_locs.py)
%run swiss_utils/define_mod_locs.py

import os
import shutil

import numpy as np

from datetime import datetime

from swiss_utils.data_cube_utilities.sdc_utilities import load_multi_clean

import datacube
dc = datacube.Datacube()

# silence warning
import warnings
warnings.filterwarnings("ignore")

The next cell contains the dataset configuration information:
- product
- geographical extent
- time period
- bands

You can generate it in three ways:
1. manually from scratch,
2. by manually copy/pasting the final cell content of the [config_tool](config_tool.ipynb) notebook,
3. by loading the final cell content of the [config_tool](config_tool.ipynb) notebook using the magic `# %load config_cell.txt`.

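For option 1, a hand-written configuration cell might look like the sketch below. Every value is a placeholder (the product name, extent, dates and band list are hypothetical), and using `datetime` objects for the dates is an assumption based on the `datetime` import above. Only the variable names come from this notebook: they are the ones read by the loading cell. Whether `product` should be a single name or a list depends on `load_multi_clean` and is not shown here.

```python
from datetime import datetime

# Placeholder values only: replace them with your own product, extent, period and bands.
product = 'my_product'  # hypothetical product name
measurements = ['red', 'green', 'nir', 'pixel_qa']

# Hypothetical geographical extent in decimal degrees
min_lon, max_lon = 6.10, 6.20
min_lat, max_lat = 46.15, 46.25

# Assumption: the time period is given as datetime objects
start_date = datetime(2018, 1, 1)
end_date = datetime(2018, 12, 31)
```
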
In [None]:
%load "config_cell.txt"

In [None]:
# Load the dataset and clean it

ds_in, clean_mask = load_multi_clean(dc = dc, products = product,
                                     time = [start_date, end_date],
                                     lon = [min_lon, max_lon], lat = [min_lat, max_lat],
                                     measurements = measurements)
del clean_mask
ds_in = ds_in.where(ds_in >= 0) # keep only non-negative values (negatives become NaN)
ds_in = ds_in.dropna('time', how='all') # drop scenes without data

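The effect of the two cleaning lines above can be checked on a small synthetic dataset. This is only a sketch: the toy array, its `-9999` nodata value and the dimension names are invented for illustration and do not come from `load_multi_clean`.

```python
import numpy as np
import xarray as xr

# Toy stand-in for ds_in: 3 scenes of 2x2 pixels, with -9999 as a made-up nodata value
red = np.array([
    [[100, 200], [300, -9999]],         # partly valid scene
    [[-9999, -9999], [-9999, -9999]],   # scene without any valid pixel
    [[50, 60], [70, 80]],               # fully valid scene
], dtype='float64')
times = np.array(['2020-01-01', '2020-01-17', '2020-02-02'], dtype='datetime64[ns]')
toy = xr.Dataset(
    {'red': (('time', 'latitude', 'longitude'), red)},
    coords={'time': times},
)

# Negative values become NaN, non-negative values are kept
cleaned = toy.where(toy >= 0)
# Scenes in which every pixel is NaN are removed
cleaned = cleaned.dropna('time', how='all')
```

Here the second scene is dropped, while the first one is kept with a single NaN pixel.
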
In [None]:
ds_in

In [None]:
# OPTIONAL CELL TO CALCULATE NDIs
# You can already calculate normalised difference indexes here to be saved with the measurements.
# To do this, uncomment the relevant line(s) below and/or add your own.

#ds_in['ndvi'] = (ds_in.nir - ds_in.red) / (ds_in.nir + ds_in.red)
#ds_in['ndwi'] = (ds_in.green - ds_in.nir) / (ds_in.green + ds_in.nir)
#ds_in['ndbi'] = (ds_in.swir2 - ds_in.nir) / (ds_in.swir2 + ds_in.nir)

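As an illustration of how a normalised difference index behaves, the sketch below computes NDVI on invented reflectance values. The toy dataset and its `y`/`x` dimension names are assumptions made for the example; the formula is the one from the cell above.

```python
import numpy as np
import xarray as xr

# Invented reflectance values for a 2x2 scene
toy = xr.Dataset({
    'red': (('y', 'x'), np.array([[0.1, 0.2], [0.3, 0.4]])),
    'nir': (('y', 'x'), np.array([[0.5, 0.2], [0.1, 0.4]])),
})

# Same formula as in the optional cell: (nir - red) / (nir + red)
toy['ndvi'] = (toy.nir - toy.red) / (toy.nir + toy.red)
```

The result is stored as an extra variable next to the bands, so it is written to the NetCDF file together with them. Values always fall between -1 and 1 for non-negative inputs.
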
In [None]:
## Some necessary small changes so that we can save this dataset to a NetCDF (.nc) file.

# Remove quality info attributes
for qa_band in ('pixel_qa', 'slc'):
    if qa_band in measurements:
        ds_in[qa_band].attrs['flags_definition'] = []

# Remove time attributes (works whether or not an index such as ndvi was calculated)
ds_in.time.attrs = {}

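The attribute clean-up can be tried on a toy dataset first. This is a sketch: the attribute contents below are invented, and the reason for the clean-up (presumably that nested dictionaries cannot be stored as NetCDF attributes) is an assumption, not something stated in this notebook.

```python
import numpy as np
import xarray as xr

times = np.array(['2020-01-01', '2020-01-17'], dtype='datetime64[ns]')
toy = xr.Dataset(
    {'pixel_qa': (('time', 'latitude', 'longitude'), np.ones((2, 2, 2), dtype='int16'))},
    coords={'time': times},
)

# Invented attributes standing in for what a loaded dataset may carry
toy.pixel_qa.attrs['flags_definition'] = {'cloud': {'bits': 5, 'values': {'0': 'no', '1': 'yes'}}}
toy.time.attrs['units'] = 'seconds since 1970-01-01 00:00:00'

# Same clean-up as in the cell above
toy.pixel_qa.attrs['flags_definition'] = []
toy.time.attrs = {}
```
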

In [None]:
# Save the file. Change the output filename to something useful!
output_filename = 'myfile.nc'
ds_in.to_netcdf(output_filename)

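To confirm that a saved file can be reused, it can be written and read back with `xarray.open_dataset`. The sketch below does this with a toy dataset in a temporary directory; the data and file name are invented, and a NetCDF backend (for example netCDF4 or scipy) is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

times = np.array(['2020-01-01', '2020-01-17'], dtype='datetime64[ns]')
toy = xr.Dataset(
    {'red': (('time', 'latitude', 'longitude'),
             np.arange(8, dtype='float64').reshape(2, 2, 2))},
    coords={'time': times, 'latitude': [46.2, 46.1], 'longitude': [6.1, 6.2]},
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy.nc')
    toy.to_netcdf(path)
    # Read the data into memory before the file is closed and deleted
    with xr.open_dataset(path) as ds:
        reloaded = ds.load()
```

In an analysis notebook the equivalent step would be `xr.open_dataset(output_filename)` with the filename chosen above.
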