# Welcome to EASI <img align="right" src="../resources/csiro_easi_logo.png">

This notebook introduces new users to working with EASI notebooks and the Open Data Cube (ODC).

It will demonstrate the following basic functionality:
- [Notebook setup](#Notebook-setup)
- [Select an EASI environment](#Select-an-EASI-environment)
- [Connect to the OpenDataCube](#Connect-to-the-OpenDataCube)
  - [List products](#List-products)
  - [List measurements and attributes](#List-measurements-and-attributes)
  - [Choose an area of interest](#Choose-an-area-of-interest)
  - [Load data](#Load-data)
  - [Plot the data](#Plot-the-data)
  - [Masking and scaling](#Masking-and-scaling)
  - [Perform a calculation on the data](#Perform-a-calculation-on-the-data)
  - [Save the results to file](#Save-the-results-to-file)
- [Summary](#Summary)
- [Be a good cloud citizen](#Be-a-good-cloud-citizen)
- [Further reading](#Further-reading)

## Notebook setup

A notebook consists of cells that contain either text descriptions or python code for performing operations on data.

1. Start by clicking on the cell below to select it.
1. Execute a selected cell, or each cell in sequence, by clicking the &#9654; button (in the notebook toolbar above) or pressing `Shift`+`Enter`.
1. Each cell will show an asterisk icon <font color='#999'>[*]:</font> when it is running. Once this changes to a number, the cell has finished.
1. The cell below imports packages to use and sets some formatting options.

In [None]:
# Basic plots
%matplotlib inline
import matplotlib.pyplot as plt
# plt.rcParams['figure.figsize'] = [12, 8]

# Common imports and settings
import os, sys
from IPython.display import Markdown
import pandas as pd
pd.set_option("display.max_rows", None)
import xarray as xr

# Datacube
import datacube
from datacube.utils.rio import configure_s3_access
from datacube.utils import masking
from datacube.utils.cog import write_cog
# https://github.com/GeoscienceAustralia/dea-notebooks/tree/develop/Tools
from dea_tools.plotting import display_map, rgb
from dea_tools.datahandling import mostcommon_crs

# EASI tools
repo = f'{os.environ["HOME"]}/easi-notebooks'  # No easy way to get repo directory
if repo not in sys.path: sys.path.append(repo)
from easi_tools import EasiNotebooks, xarray_object_size

## Select an EASI environment

Each EASI deployment has a different set of products in its opendatacube database. We introduce a set of defaults to allow these training notebooks to be used between EASI deployments.

For this notebook we select the default **Landsat** product.

In [None]:
this = EasiNotebooks('csiro')

family = 'landsat'
product = this.product(family)
display(Markdown(f'Default {family} product for "{this.name}": [{product}]({this.explorer}/products/{product})'))

## Connect to the OpenDataCube

The `Datacube()` API provides search, load and information functions for data products *indexed* in an ODC database. More information on the Open Data Cube software:

- https://datacube-core.readthedocs.io/en/latest/
- https://github.com/opendatacube/datacube-core

In [None]:
dc = datacube.Datacube()

# Access AWS "requester-pays" buckets
# This is necessary for reading data from most third-party AWS S3 buckets
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True);

### List products
Show all available products in the ODC database and list them along with selected properties.

The **ODC Explorer** also has this information and more: view available products, data coverage, product definitions, dimensions, metadata and paths to the files.

The product definitions include details about the *measurements* (or bands) in each product and, usually, the spatial resolution and CRS (if common to all member *datasets*).

> **Exercise**: Browse the ODC Explorer link and find the information described above.

In [None]:
display(Markdown(f'#### ODC Explorer: {this.explorer}'))

products = dc.list_products()  # Pandas DataFrame
products

### List measurements and attributes

The data arrays for each product are called **measurements**. In different data science domains these might also be called the "bands", "variables" or "parameters" of a product.

List the measurements of a product. The columns are selected attributes or metadata for each measurement.

> **Hint**: Measurements often have **aliases** defined. Any of the available alias names can be used in place of the measurement name when loading (reading) data. The *xarray* variable name will then be the alias name. Use this feature to help make your loaded data more consistent between products. We do this below.

In [None]:
measurements = dc.list_measurements()  # Pandas DataFrame, all products
measurements.loc[[product]]

### Choose an area of interest

Choose an area of interest with `latitude`/`longitude` bounds. The `display_map` function will draw a map with the bounding box highlighted. See also the ODC Explorer website for the available *latitude*, *longitude* and *time* ranges for each product.

> **Exercise**: Feel free to change the `latitude`/`longitude` or `time` ranges of the query below. 
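
If you know a centre point rather than the corner coordinates, the bounds can be derived from it. The sketch below is a minimal illustration only: `bounds_from_centre` and `approx_extent_km` are hypothetical helpers (not part of EASI or ODC), and the size estimate assumes roughly 111 km per degree of latitude.

```python
import math

KM_PER_DEGREE = 111.32  # Approximate length of one degree of latitude (assumption)


def bounds_from_centre(lat, lon, half_width_deg):
    """Return (latitude, longitude) as (min, max) tuples around a centre point."""
    if not -90 <= lat <= 90:
        raise ValueError("lat must be between -90 and 90 degrees")
    latitude = (lat - half_width_deg, lat + half_width_deg)
    longitude = (lon - half_width_deg, lon + half_width_deg)
    return latitude, longitude


def approx_extent_km(latitude, longitude):
    """Rough (height, width) of the bounding box in kilometres."""
    mid_lat = math.radians(sum(latitude) / 2)
    height_km = (latitude[1] - latitude[0]) * KM_PER_DEGREE
    width_km = (longitude[1] - longitude[0]) * KM_PER_DEGREE * math.cos(mid_lat)
    return height_km, width_km


latitude, longitude = bounds_from_centre(-36.05, 147.05, 0.25)
height_km, width_km = approx_extent_km(latitude, longitude)
print(latitude, longitude)
print(f"Approx. {height_km:.0f} km x {width_km:.0f} km")
```

The resulting tuples have the same `(min, max)` form as the `latitude` and `longitude` values used in the query.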

In [None]:
# Default area of interest

display(Markdown(f'#### Location: {this.location}'))
display(Markdown(f'See: {this.explorer}/products/{product}'))

latitude = this.latitude
longitude = this.longitude

# Or set your own latitude / longitude
# latitude = (-36.3, -35.8)
# longitude = (146.8, 147.3)

display_map(longitude, latitude)

### Load data 
Here we load product data for a given latitude, longitude and time range. The `datacube.load()` function returns an **xarray.Dataset** object.

Once you have an xarray object this can be used with many Python packages. Further information on **xarray**:

- https://tutorial.xarray.dev/overview/xarray-in-45-min.html
- https://xarray.pydata.org/en/stable/user-guide/data-structures.html

**What is the size of my dataset?**

The `display(data)` view is a convenient way to check the data request size, shape and attributes.

> **Exercise**: Click the various arrows and icons in the *xarray.Dataset* output from the load cell below to reveal information about your data.

The `data.nbytes` property returns the number of bytes in the xarray Dataset or DataArray. We have a function that formats this value for convenience.
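
To get a feel for these numbers before loading, the size can be estimated from the expected array shape. This is a plain-Python sketch, not the EASI `xarray_object_size` function: `format_bytes` and `estimated_nbytes` are hypothetical helpers, and the default of 2 bytes per value assumes 16-bit integer measurements.

```python
def format_bytes(nbytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(nbytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def estimated_nbytes(n_times, n_y, n_x, n_measurements, bytes_per_value=2):
    """Estimate the in-memory size of a (time, y, x) cube for several measurements."""
    return n_times * n_y * n_x * n_measurements * bytes_per_value


# Example: 10 time steps of 2000 x 2000 pixels with 4 measurements
example = estimated_nbytes(n_times=10, n_y=2000, n_x=2000, n_measurements=4)
print(example, "bytes =", format_bytes(example))  # 320000000 bytes = 305.2 MiB
```

Compare an estimate like this with the memory available in your JupyterLab session before choosing an area and time range.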

**Datacube.load() notes**

- Use `measurements=[measurement or alias names]` to only load the measurements you will use, and label them accordingly.
- The `output_crs` and `resolution` parameters allow for remapping to a new grid. These will be required if default values are not defined for the product (see *measurement* attributes).
- The `datacube.load()` function does not apply missing values or scaling attributes. These are left to the user's discretion and requirements.
- Setting `dask_chunks` returns **dask**-backed arrays that are loaded lazily. See the [EASI tutorial dask notebooks](dask/01_-_Introduction_to_Dask.ipynb) for information and examples.
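
When choosing a `dask_chunks` size it can help to work out how many chunks a given grid produces and how large each one is. The arithmetic below is a sketch under stated assumptions: `chunk_summary` is a hypothetical helper, the 5000 x 5000 grid is illustrative, and 2 bytes per value assumes 16-bit integer data.

```python
import math


def chunk_summary(n_y, n_x, chunk_y=2048, chunk_x=2048, bytes_per_value=2):
    """Return (number of spatial chunks, size of one full chunk in MiB).

    Both values are per time step and per measurement.
    """
    n_chunks = math.ceil(n_y / chunk_y) * math.ceil(n_x / chunk_x)
    chunk_mib = chunk_y * chunk_x * bytes_per_value / 1024**2
    return n_chunks, chunk_mib


n_chunks, chunk_mib = chunk_summary(n_y=5000, n_x=5000)
print(f"{n_chunks} chunks of up to {chunk_mib:.1f} MiB each")  # 9 chunks of up to 8.0 MiB each
```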

> **Exercise**: The default is to load all available measurements. Load a selected set of measurements or alias names and consider the result.<br>
> **Exercise**: The default is to load the data arrays onto a default grid that closely matches the source data. Change the target resolution or CRS and consider the result.

In [None]:
# A standard datacube.load() call.
# This may take a few minutes while the data are loaded into JupyterLab (so choose a small area and time range).

target_crs = this.crs(family)  # If defined, else None
target_res = this.resolution(family)  # If defined, else None

data = dc.load(
    product = product, 
    latitude = latitude,
    longitude = longitude,
    time = this.time,
    measurements = ['red', 'green', 'blue', 'nir'],  # List of selected measurement names or aliases
    
    output_crs = target_crs,                   # Target CRS
    resolution = target_res,                   # Target resolution
    # dask_chunks = {'x':2048, 'y':2048},      # Dask chunk size. Requires a dask cluster (see the "tutorials/dask" notebooks)
    group_by = 'solar_day',                    # Group by day method
)

display(data)

display(f'Number of bytes: {data.nbytes}')
display(xarray_object_size(data))

### Plot the data
Plot the measurement data for a set of timesteps. The xarray `.plot()` method can simplify the rendering of plots by using the labelled dimensions and data ranges automatically. 

See the [EASI tutorial visualisation notebook](03-visualisation.ipynb) for information and examples.

- The [robust](https://docs.xarray.dev/en/stable/user-guide/plotting.html#robust) option excludes outliers when calculating the colour limits for a more consistent result across subplots (time layers in this case).
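
With `robust=True`, xarray computes the colour range from the 2nd and 98th percentiles instead of the minimum and maximum. The sketch below reproduces that idea in plain Python on made-up values (99 ordinary values plus one outlier) to show why it helps; it is an illustration, not xarray's own code.

```python
from statistics import quantiles

# Illustrative data: values 1..99 plus a single extreme outlier
values = [float(v) for v in range(1, 100)] + [10_000.0]

# n=50 gives cut points every 2%, so the first and last are the 2nd and 98th percentiles
cuts = quantiles(values, n=50, method="inclusive")
robust_min, robust_max = cuts[0], cuts[-1]

print("Full range:  ", min(values), "to", max(values))
print("Robust range:", round(robust_min, 2), "to", round(robust_max, 2))
```

The single outlier stretches the full range to 10000, while the percentile-based range stays close to the bulk of the data.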

> **Exercise**: Change the data variable and perhaps the selection of time layers.

In [None]:
# Select a data variable (measurement) name
band = 'nir'

# Xarray simple array plotting
display(Markdown(f'#### Measurement: {band}'))
data[band].plot(col="time", robust=True, col_wrap=4);

### Masking and scaling

Most data products include a no-data value and/or a *data quality* array that can be used to mask (filter) the measurement arrays. For example, remote sensing quality arrays often include a "cloud" confidence flag that can be used to remove pixels affected by clouds from further analysis. Measurement arrays can also include *scale and offset factors* to transform the array values to scientific values.

This step is common to most data analysis problems so we encourage users to find and understand the relevant quality, scale and offset metadata for each product used and apply these in their applications. For example, here are the relevant product metadata pages for Landsat and Sentinel-2:
- https://www.usgs.gov/landsat-missions/landsat-science-products
- https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm

The opendatacube provides functions for creating mask arrays from quality measurements defined in the *product definition*. These are covered in various product-specific and example notebooks.
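
Quality measurements are often stored as bit flags, where each bit of an integer pixel value answers one yes/no question. The sketch below shows the underlying bit test in plain Python. It assumes the Landsat Collection 2 `QA_PIXEL` layout (bit 3 for cloud, bit 4 for cloud shadow); check the product definition for the flags that apply to your product. `is_flag_set` and `is_clear` are hypothetical helpers.

```python
CLOUD_BIT = 3         # Assumed bit position of the "cloud" flag
CLOUD_SHADOW_BIT = 4  # Assumed bit position of the "cloud shadow" flag


def is_flag_set(qa_value, bit):
    """Return True if the given bit is set in an integer quality value."""
    return bool((qa_value >> bit) & 1)


def is_clear(qa_value):
    """Return True if neither the cloud nor the cloud shadow flag is set."""
    return not (is_flag_set(qa_value, CLOUD_BIT) or is_flag_set(qa_value, CLOUD_SHADOW_BIT))


# Illustrative quality values: no cloud flags, cloud, cloud shadow, both
qa_pixels = [0b0100_0000, 0b0000_1000, 0b0001_0000, 0b0001_1000]
clear_mask = [is_clear(q) for q in qa_pixels]
print(clear_mask)  # [True, False, False, False]
```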

Here we use a simple function to create a mask using only the *no data* value of each measurement array.

> **Exercise**: What is the effect of applying, or not applying, the `valid_mask` to these data? (Hint: see the NDVI plot below.)
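
As a worked illustration of masking followed by scaling, the sketch below converts raw integer values to reflectance. The constants are assumptions taken from the USGS Landsat Collection 2 surface reflectance convention (no-data value 0, scale 0.0000275, offset -0.2); confirm the values for your product before using them. `to_reflectance` is a hypothetical helper, written in plain Python so it runs without the datacube.

```python
import math

NODATA = 0         # Assumed no-data value
SCALE = 0.0000275  # Assumed scale factor
OFFSET = -0.2      # Assumed offset


def to_reflectance(raw, nodata=NODATA, scale=SCALE, offset=OFFSET):
    """Mask the no-data value, then apply scale and offset."""
    if raw == nodata:
        return math.nan
    return raw * scale + offset


raw_values = [0, 10000, 20000]
reflectance = [to_reflectance(v) for v in raw_values]
print(reflectance)
```

The order matters: if the no-data value were scaled first, it would become an ordinary-looking number (here -0.2) and could no longer be recognised as missing.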

In [None]:
# Mask by nodata

# Under the hood: data != data.nodata -> bool
# Applies to each variable in an xarray.Dataset (including any bit-masks)
valid_mask = masking.valid_data_mask(data)

# Use xarray's where() method to apply a mask array to measurement arrays
valid_data = data.where(valid_mask)  # Default: where False replace with NaN -> integer dtypes are promoted to float

# Or provide a no-data value and retain the dtype
# nodata = -9999  # A new nodata value
# valid_data = data.where(valid_mask, nodata)  # Where False replace with nodata -> retain dtype if compatible

### Perform a calculation on the data
As a simple example, we calculate the [Normalized Difference Vegetation Index (NDVI)](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index) using the *near infra-red (NIR)* and *red* measurements of the product.

- **Note**: This may not be a realistic *NDVI* example if the measurements have not been scaled to science values.

See this DEA tools module for a set of other remote sensing band indices: https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Tools/dea_tools/bandindices.py

> **Exercise**: Calculate a different remote sensing band index, possibly with different measurements loaded into `data`.
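
The note about scaling can be checked with a little arithmetic. NDVI is `(NIR - Red) / (NIR + Red)`, and an offset in the scaling changes the result. The sketch below compares NDVI from raw values with NDVI from scaled values for a single pixel. The scale and offset are assumptions (the Landsat Collection 2 surface reflectance convention) and the pixel values are made up.

```python
SCALE, OFFSET = 0.0000275, -0.2  # Assumed scale and offset


def ndvi(nir, red):
    """Normalized difference of NIR and red; NaN where the sum is zero."""
    total = nir + red
    if total == 0:
        return float("nan")
    return (nir - red) / total


raw_nir, raw_red = 20000, 10000  # Illustrative raw pixel values

ndvi_raw = ndvi(raw_nir, raw_red)
ndvi_scaled = ndvi(raw_nir * SCALE + OFFSET, raw_red * SCALE + OFFSET)

print(f"NDVI from raw values:    {ndvi_raw:.3f}")     # 0.333
print(f"NDVI from scaled values: {ndvi_scaled:.3f}")  # 0.647
```

A pure scale factor would cancel in the ratio; it is the offset that makes the two results differ.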

In [None]:
# Get measurement or alias names corresponding to near-infrared (NIR) and Red bands.

# Calculate the NDVI
varname = 'ndvi'
band_diff = valid_data.nir - valid_data.red
band_sum = valid_data.nir + valid_data.red
calculation = band_diff / band_sum  # xarray.DataArray

# Convert to an xarray.Dataset
calculation = calculation.to_dataset(name=varname, promote_attrs=True)

# Plot the NDVI
display(Markdown(f'#### Calculation: {varname.upper()}'))
calculation[varname].plot(col="time", robust=True, col_wrap=4, vmin=0, vmax=0.7, cmap='RdYlGn');

### Save the results to file
We can save an `xarray.Dataset` to a file(s) that can then be imported into other applications for further analysis or publication if required.

In the code below, the file(s) will be saved to your home directory and appear in the File Browser panel to the left. You may need to select the `folder` icon to go to the top level (`$HOME`) and then `output/`.

Download a file by right-clicking it and selecting `Download`.

> **Exercise**: Use the Terminal to also list the files in the output directory.
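
The same check can be done from Python. The sketch below lists the files in a directory with their sizes. `list_outputs` is a hypothetical helper, and the `output` folder under your home directory is assumed to be where the results were saved.

```python
import os
from pathlib import Path


def list_outputs(directory):
    """Return sorted (file name, size in bytes) pairs; empty if the directory is missing."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in directory.iterdir() if p.is_file())


output_dir = Path(os.path.expanduser("~")) / "output"  # Assumed output location
for name, size in list_outputs(output_dir):
    print(f"{size:>12d}  {name}")
```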

In [None]:
# Xarray can save the data to a netCDF file
# See also: https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Frequently_used_code/Exporting_NetCDFs.ipynb

target = f'{os.environ["HOME"]}/output'
if not os.path.isdir(target):
    os.mkdir(target)

calculation.time.attrs.pop('units', None)  # Xarray re-applies this
calculation.to_netcdf(f'{target}/example_landsat_{varname}.nc')
calculation.close()

# Single-layer time slices can be written to Geotiff files
# See also: https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Frequently_used_code/Exporting_GeoTIFFs.ipynb

for i in range(len(calculation.time)):
    date = calculation[varname].isel(time=i).time.dt.strftime('%Y%m%d').data
    single = calculation[varname].isel(time=i)
    write_cog(geo_im=single, fname=f'{target}/example_landsat_{varname}_{date}.tif', overwrite=True)

## Summary

This notebook introduced the main steps for querying data (with OpenDataCube), and filtering, plotting, calculating and saving a "cube" of data (with **Xarray**).

There are plenty of details and options to explore, so please work through the other notebooks to learn more and refer back to these notebooks when required. We encourage you to create or bring your own notebooks, and adapt notebooks from other [open-license repositories](https://docs.asia.easi-eo.solutions/user-guide/users-guide/03-using-notebooks/#other-available-odc-notebooks).

In [None]:
rgb(valid_data.isel(time=2), ['red', 'green', 'blue'])

## Be a good cloud citizen

It is good practice to close your JupyterLab session when you have finished with it. Your home directory will be retained and in most cases your workspace of open notebooks will also be retained. These will be available when you return to EASI JupyterLab.

From the JupyterLab `File` menu select `Hub Control Panel`, then `Stop My Server`.
- Stop My Server: Your JupyterLab resources will be safely shut down.
- Log Out: Log out of your JupyterLab *browser* session. Your JupyterLab resources will remain active until the system cleans up.

![image](../resources/stop-my-server.png)

## Further reading 

#### JupyterLab
The JupyterLab website has excellent documentation and video instructions. We recommend users take a few minutes to orientate themselves with the use and features of JupyterLab.

> *Recommended level: Familiarity with notebooks.*

- Getting started: [https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html)
- Drag and drop upload of files: [https://jupyterlab.readthedocs.io/en/stable/user/files.html](https://jupyterlab.readthedocs.io/en/stable/user/files.html)

#### Python3
There are many options for learning Python from online resources or facilitated training. Some examples are offered here with no suggestion that EASI endorses any of them.

> *Recommended level: Basic Python knowledge and familiarity with array manipulations, __numpy__ and __xarray__. Familiarity with some plotting libraries (e.g., __matplotlib__) would also help.*

- Get started: [https://www.python.org/about/gettingstarted](https://www.python.org/about/gettingstarted/)
- Learn Python tutorials: [https://www.learnpython.org](https://www.learnpython.org/)
- Data Camp: [https://www.datacamp.com](https://www.datacamp.com/)
- David Beazley courses: [https://dabeaz-course.github.io/practical-python](https://dabeaz-course.github.io/practical-python/)
- Numpy: [https://numpy.org/doc/stable/user/quickstart.html](https://numpy.org/doc/stable/user/quickstart.html)
- Xarray: [http://xarray.pydata.org/en/stable/user-guide/data-structures.html](http://xarray.pydata.org/en/stable/user-guide/data-structures.html)
- Pandas: [https://pandas.pydata.org/docs/getting_started/index.html](https://pandas.pydata.org/docs/getting_started/index.html)

#### Git
Git is a document version control system. It retains a full history of changes to all files (including deleted ones) by tracking incremental changes and recording a history timeline of changes. The best way to learn Git is by practice and incrementally: start with simple, common actions and gain more knowledge as required.  

> *Recommended level: Basic understanding of Git repositories (e.g., github.com) and practices such as __clone__, __pull__/__push__ and __merging__ changes.*

- Getting started: [https://git-scm.com/doc](https://git-scm.com/doc)
- JupyterLab Git extension: [https://github.com/jupyterlab/jupyterlab-git#readme](https://github.com/jupyterlab/jupyterlab-git#readme)
- DEA Git guide: [https://github.com/GeoscienceAustralia/dea-notebooks/wiki/Guide-to-using-DEA-Notebooks-with-git](https://github.com/GeoscienceAustralia/dea-notebooks/wiki/Guide-to-using-DEA-Notebooks-with-git)
- Undoing things guide: [https://git-scm.com/book/en/v2/Git-Basics-Undoing-Things](https://git-scm.com/book/en/v2/Git-Basics-Undoing-Things)
- Understanding branches: [https://nvie.com/posts/a-successful-git-branching-model](https://nvie.com/posts/a-successful-git-branching-model)

#### Open Data Cube
The ODC is a Python library that allows the user to search for datasets in its database and return an **xarray** data array. There are convenience functions and methods for resampling, reprojecting and masking the data.

> *Recommended level: Overview of the design and intent of ODC or other datacubes*

- ODC website: [https://www.opendatacube.org](https://www.opendatacube.org)
- ODC API reference: [https://datacube-core.readthedocs.io](https://datacube-core.readthedocs.io)
- ODC Github code: [https://github.com/opendatacube](https://github.com/opendatacube)

#### Notebooks for EO data analysis 
There are growing collections of notebooks available from many organizations, most of which can be adapted to use with ODC and EASI.

> *Recommended level: Overview of available notebooks and selected EO applications*

- CSIRO EASI notebooks: [https://github.com/csiro-easi/easi-notebooks](https://github.com/csiro-easi/easi-notebooks)
- Digital Earth Australia: [https://github.com/GeoscienceAustralia/dea-notebooks](https://github.com/GeoscienceAustralia/dea-notebooks)
- Digital Earth Africa: [https://github.com/digitalearthafrica/deafrica-sandbox-notebooks](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks)
- CEOS SEO (NASA): [https://github.com/ceos-seo/data_cube_notebooks](https://github.com/ceos-seo/data_cube_notebooks)