# Welcome to EASI <img align="right" src="../resources/csiro_easi_logo.png">

This notebook introduces new users to working with EASI notebooks and the Open Data Cube.

It will demonstrate the following basic functionality:
- [Notebook setup](#Notebook-setup)
- [Select an EASI environment](#Select-an-EASI-environment)
- [Connect to the Open Data Cube](#Connect-to-the-Open-Data-Cube)
  - [List products](#List-products)
  - [List measurements](#List-measurements)
  - [Choose an area of interest](#Choose-an-area-of-interest)
  - [Load data](#Load-data)
  - [Plot the data](#Plot-the-data)
  - [Masking and Scaling](#Masking-and-Scaling)
  - [Perform a calculation on the data](#Perform-a-calculation-on-the-data)
  - [Save the results to a file](#Save-the-results-to-file)
- [Summary](#Summary)
- [Further reading](#Further-reading)

## Notebook setup

A notebook consists of cells that contain either text descriptions or Python code for performing operations on data.

1. Start by clicking on the cell below to select it.
1. Execute a selected cell, or each cell in sequence, by clicking the &#9654; button (in the notebook toolbar above) or pressing `Shift`+`Enter`.
1. Each cell will show an asterisk icon <font color='#999'>[*]:</font> when it is running. Once this changes to a number, the cell has finished.
1. The cell below imports packages to use and sets some formatting options.

In [None]:
# Basic plots
%matplotlib inline
import matplotlib.pyplot as plt
# plt.rcParams['figure.figsize'] = [12, 8]

# Common imports and settings
import os, sys
from IPython.display import Markdown
import pandas as pd
pd.set_option("display.max_rows", None)
import xarray as xr

# Datacube
import datacube
from datacube.utils.rio import configure_s3_access
from dea_tools.plotting import display_map, rgb  # https://github.com/GeoscienceAustralia/dea-notebooks/tree/develop/Tools

# EASI tools
repo = f'{os.environ["HOME"]}/easi-notebooks'  # No easy way to get repo directory
if repo not in sys.path: sys.path.append(repo)
from easi_tools import EasiNotebooks, xarray_object_size

## Select an EASI environment

Each EASI deployment has a different set of products in its opendatacube database. We introduce a set of defaults to allow these training notebooks to be used between EASI deployments.

For this notebook we select the default **Landsat** product.

In [None]:
this = EasiNotebooks('csiro')

product = this.product('landsat')
display(Markdown(f'Default landsat product for "{this.name}": [{product}]({this.explorer}/products/{product})'))

## Connect to the Open Data Cube

The `Datacube()` API provides search, load and information functions for data products *indexed* in an ODC database. More information on the Open Data Cube software:

- https://datacube-core.readthedocs.io/en/latest/
- https://github.com/opendatacube/datacube-core

In [None]:
dc = datacube.Datacube()

# Access AWS "requester-pays" buckets
# This is necessary for reading data from most third-party AWS S3 buckets
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True);

### List products
Show all available products in the ODC database and list them along with selected properties.

You can also browse the available products, data coverage, **product definitions**, dimensions, metadata and file paths on the corresponding **ODC Explorer** website.

In [None]:
display(Markdown(f'### See: {this.explorer}'))

products = dc.list_products()  # Pandas DataFrame
products

### List measurements

The data arrays for each product are called **measurements**. List the measurements of a product. The columns are selected attributes or metadata for each measurement.

In [None]:
measurements = dc.list_measurements()  # Pandas DataFrame, all products
measurements.loc[[product]]
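
The double brackets in `measurements.loc[[product]]` affect the shape of the result. The sketch below uses a small hand-built table to show the difference, assuming a two-level `(product, measurement)` index like the one `list_measurements()` returns. The product names, dtypes and nodata values are made up for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for the dc.list_measurements() output
rows = [
    ("product_a", "nir08", "uint16", 0),
    ("product_a", "red", "uint16", 0),
    ("product_b", "elevation", "float32", -9999),
]
table = pd.DataFrame(rows, columns=["product", "measurement", "dtype", "nodata"])
table = table.set_index(["product", "measurement"])

# A list of labels selects on the first index level and keeps both levels
one_product = table.loc[["product_a"]]

# A single label drops the product level, leaving measurement names as the index
by_name = table.loc["product_a"]

print(one_product)
print(by_name)
```

Either form works for inspection; keeping both index levels is convenient when comparing several products side by side.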

### Choose an area of interest

Choose an area of interest with `latitude`/`longitude` bounds. The `display_map` function will draw a map with the bounding box highlighted. See also the Explorer website for the available *latitude*, *longitude* and *time* ranges for each product.

> Feel free to change the `latitude`/`longitude` or `time` ranges of the query below. 

In [None]:
# Default area of interest

display(Markdown(f'### Location: {this.location}'))
display(Markdown(f'See: {this.explorer}/products/{product}'))

latitude = this.latitude
longitude = this.longitude

# Or set your own latitude / longitude
# latitude = (-36.3, -35.8)
# longitude = (146.8, 147.3)

display_map(longitude, latitude)
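
Before loading data it can help to estimate how large the chosen bounding box is. The following is a rough sketch only: `bbox_size_km` is a hypothetical helper (not part of EASI or ODC), it assumes about 111.32 km per degree of latitude, and the 30 m pixel size is an assumption that should be checked against the product.

```python
import math

def bbox_size_km(latitude, longitude):
    """Approximate (height_km, width_km) of a latitude/longitude bounding box."""
    km_per_degree = 111.32  # Approximate length of one degree of latitude
    mid_lat = math.radians(sum(latitude) / 2)
    height_km = abs(latitude[1] - latitude[0]) * km_per_degree
    # Degrees of longitude shrink with the cosine of the latitude
    width_km = abs(longitude[1] - longitude[0]) * km_per_degree * math.cos(mid_lat)
    return height_km, width_km

# The commented-out example bounds from the cell above
height_km, width_km = bbox_size_km((-36.3, -35.8), (146.8, 147.3))

# Rough pixel count for one band and one time step, assuming a 30 m grid
pixel_m = 30
n_pixels = round(height_km * 1000 / pixel_m) * round(width_km * 1000 / pixel_m)
print(f"{height_km:.1f} km x {width_km:.1f} km, about {n_pixels:,} pixels per band per time step")
```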

### Load data 
Load product data for a given latitude, longitude and time range. The `dc.load()` function (the `load()` method of the `Datacube` object created above) returns an **xarray.Dataset** object.

- The `output_crs` and `resolution` parameters allow for remapping to a new grid. These will be required if default values are not defined for the product (see *measurement* attributes).
- The `dc.load()` function does not apply missing values or scaling attributes. These are left to the user's discretion and requirements.

Once you have an xarray object this can be used with many Python packages.

> **Exercise**: We normally load only the measurements that we intend to use. Load a selected set of measurement names and consider the result.

**What is the size of my dataset?**

The `display(data)` view is a convenient way to check the data request size, shape and attributes.

- Click the various arrows and icons in the *xarray.Dataset* output of the load cell below to reveal information about your data.

The `data.nbytes` property returns the number of bytes in the xarray Dataset or DataArray. We have a function that formats this value for convenience.

Further information on **xarray**:

- https://tutorial.xarray.dev/overview/xarray-in-45-min.html
- https://xarray.pydata.org/en/stable/user-guide/data-structures.html

In [None]:
# A standard datacube.load() call.
# This may take a few minutes while the data are loaded into JupyterLab (so choose a small area and time range).

data = dc.load(
    product = product, 
    latitude = latitude,
    longitude = longitude,
    time = this.time,
    
    # measurements = ['red', 'green', 'blue'],  # List of selected measurement names (or aliases)
    # output_crs = 'epsg:3577',                 # Target CRS
    # resolution = (-30, 30),                   # Target resolution in (y, x) order
    # dask_chunks = {'x': 2048, 'y': 2048},     # Dask chunk size. Requires a dask cluster (see the "tutorials/dask" notebooks)
    group_by = 'solar_day',                    # Group by day method
)

display(data)

display(f'Number of bytes: {data.nbytes}')
display(xarray_object_size(data))
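
The raw byte count is hard to read at a glance, and it is often useful to estimate the size before loading anything. The sketch below shows one way to do both in plain Python. `format_bytes` and `estimate_nbytes` are hypothetical helpers written for this example; they are not the implementation of `xarray_object_size`, and the cube dimensions are made-up numbers.

```python
def format_bytes(nbytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(nbytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

def estimate_nbytes(n_time, n_y, n_x, n_measurements, bytes_per_value=2):
    """Size of a dense cube; 2 bytes per value corresponds to 16-bit integers."""
    return n_time * n_y * n_x * n_measurements * bytes_per_value

# Hypothetical request: 10 time steps, 2000 x 1500 pixels, 3 measurements
estimate = estimate_nbytes(n_time=10, n_y=2000, n_x=1500, n_measurements=3)
print(format_bytes(estimate))
```

If the estimate is larger than the memory available to the notebook, reduce the area, time range or number of measurements before calling `dc.load()`.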

### Plot the data
Plot the measurement data for a set of timesteps. The xarray `.plot()` method can simplify the rendering of plots by using the labelled dimensions and data ranges automatically.

- The [robust](https://docs.xarray.dev/en/stable/user-guide/plotting.html#robust) option excludes outliers when calculating the colour limits for a more consistent result across subplots (time layers in this case).

In [None]:
# Select a data variable (measurement) name
band = this.default_band(product)

# Xarray simple array plotting
data[band].plot(col="time", robust=True, col_wrap=4);
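
To see why `robust=True` helps, compare colour limits taken from the full data range with limits taken from the 2nd and 98th percentiles, which is what xarray uses for the robust option. This is a small illustration with NumPy on made-up values, not the plotting code itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reflectance-like values with a few very bright outliers
values = rng.uniform(0.0, 0.3, size=10_000)
values[:20] = 5.0

# Limits from the extremes are dominated by the outliers
full_range = (values.min(), values.max())

# Limits from the 2nd and 98th percentiles ignore them
robust_range = tuple(np.percentile(values, [2, 98]))

print("full range:  ", full_range)
print("robust range:", robust_range)
```

With the full range, most pixels would be squeezed into a narrow band of the colour map; the percentile-based limits spread the colours over the bulk of the data.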

### Masking and Scaling

Masking and scaling are examples of applying additional functions to the *xarray* data.

Many ODC products include a "quality" array that can be used to filter the measurement arrays. For example, most quality layers include a "cloud" confidence quality flag that can be used to remove pixels affected by clouds from further analysis. ODC products may also include scale and offset factors to transform the array values to scientific values.

Masking and scaling details are specific to each *product* and *measurement* chosen. We bypass this step in this introduction notebook, and instead we direct users to product-specific notebooks that contain the appropriate details.
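
As a generic illustration of the idea only, the sketch below masks and scales a tiny NumPy array. Every specific here is an assumption for the example: the nodata value, the bit position of the cloud flag, and the scale and offset factors all vary by product and must be taken from the product definition.

```python
import numpy as np

# Hypothetical raw values; 0 is assumed to be the nodata value
raw = np.array([[0, 8000], [12000, 20000]], dtype="uint16")

# Hypothetical quality array; bit 3 is assumed to flag cloud
quality = np.array([[0, 0], [0b1000, 0]], dtype="uint16")

NODATA = 0
CLOUD_BIT = 3
SCALE, OFFSET = 0.0000275, -0.2  # Assumed factors; check the product definition

is_valid = raw != NODATA
is_cloud = (quality & (1 << CLOUD_BIT)) != 0
keep = is_valid & ~is_cloud

# Convert to float, apply scale and offset, and set rejected pixels to NaN
scaled = np.where(keep, raw.astype("float32") * SCALE + OFFSET, np.nan)
print(scaled)
```

With an xarray object the same pattern is usually written with the `.where()` method, which keeps the coordinates and attributes attached to the result.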

### Perform a calculation on the data
As a simple example, we calculate the [Normalized Difference Vegetation Index (NDVI)](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index).

In [None]:
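# Calculate the NDVI
# Masking and scaling were skipped above, so use the loaded data directly.
# The values are unmasked and unscaled, so treat the result as indicative only.
# Cast to float first so that integer arrays do not wrap around on subtraction.
nir = data.nir08.astype('float32')
red = data.red.astype('float32')
ndvi = (nir - red) / (nir + red)

# Plot the NDVI
ndvi.plot(col="time", robust=True, col_wrap=4, vmin=0, vmax=0.7, cmap='RdYlGn');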
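
NDVI is defined as (NIR - Red) / (NIR + Red). One pitfall when computing it from raw arrays is that unsigned integer types cannot represent the negative differences that occur where Red exceeds NIR. The sketch below demonstrates this with NumPy on made-up pixel values.

```python
import numpy as np

# Hypothetical raw pixel values stored as unsigned 16-bit integers
nir = np.array([3000, 500], dtype="uint16")
red = np.array([1000, 800], dtype="uint16")

# Unsigned integers cannot hold negative numbers, so 500 - 800 wraps around
wrapped = nir - red
print(wrapped)

# Casting to float first gives the intended result
nir_f = nir.astype("float32")
red_f = red.astype("float32")
ndvi = (nir_f - red_f) / (nir_f + red_f)
print(ndvi)
```

The same cast applies to xarray objects through `.astype()`, since they wrap NumPy arrays.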

### Save the results to file
After processing the data, we can save the output to a file that can be imported into other applications for further analysis or publication.

The file will be saved to your home directory and appear in the File Browser panel to the left. You may need to select the `folder` icon to go to the top level (`/home/jovyan/`).

To download a file, right-click it in the File Browser and select `Download`.

In [None]:
# Xarray can save the data to a netCDF file

ndvi.time.attrs.pop('units', None)  # Xarray re-applies this
ndvi.to_netcdf("/home/jovyan/landsat8_sr_ndvi.nc")

In [None]:
# Or export to geotiff using rioxarray.

import rioxarray
ndvi.isel(time=0).rio.to_raster("/home/jovyan/landsat8_sr_ndvi.tif")  # Single time slices

## Summary

This notebook introduced the main steps for querying data (with `Datacube`), and filtering, plotting, calculating and saving a "cube" of data (with `Xarray`).

There are plenty of details and options to explore, so please work through the other notebooks to learn more and refer back to these notebooks when required. We encourage you to create or bring your own notebooks, and adapt notebooks from other [open-license repositories](https://docs.asia.easi-eo.solutions/user-guide/users-guide/03-using-notebooks/#other-available-odc-notebooks).

In [None]:
# True-colour view of a single time step (requires at least three time steps in the loaded data)
rgb(data.isel(time=2), ['red', 'green', 'blue'])

## Further reading 

#### JupyterLab

> *Recommended level: Familiarity with notebooks.*

The JupyterLab website has excellent documentation including video instructions. We recommend users take a few minutes to orientate themselves with the use and features of JupyterLab.

- Getting started: [https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html)
- Drag and drop upload of files: [https://jupyterlab.readthedocs.io/en/stable/user/files.html](https://jupyterlab.readthedocs.io/en/stable/user/files.html)

#### Open Data Cube

- [ODC documentation](https://datacube-core.readthedocs.io/en/latest)
- [ODC github](https://github.com/opendatacube)

#### Python

> *Recommended level: Basic Python knowledge and familiarity with array manipulations, __numpy__ and __xarray__. Familiarity with some plotting libraries (e.g., __matplotlib__) would also help.*

There are many options for learning Python from online resources to in-house or facilitated training. Some examples are offered here with no suggestion that EASI endorses any of them.

- [https://www.python.org/about/gettingstarted](https://www.python.org/about/gettingstarted/)
- Learn Python tutorials: [https://www.learnpython.org](https://www.learnpython.org/)
- Data Camp: [https://www.datacamp.com](https://www.datacamp.com/)
- David Beazley courses: [https://dabeaz-course.github.io/practical-python](https://dabeaz-course.github.io/practical-python/)
- Python Charmers: [https://pythoncharmers.com](https://pythoncharmers.com/)

Background for selected libraries:

- Numpy: [https://numpy.org/doc/stable/user/quickstart.html](https://numpy.org/doc/stable/user/quickstart.html)
- Xarray: [http://xarray.pydata.org/en/stable/user-guide/data-structures.html](http://xarray.pydata.org/en/stable/user-guide/data-structures.html)
- Xarray: [https://towardsdatascience.com/basic-data-structures-of-xarray-80bab8094efa](https://towardsdatascience.com/basic-data-structures-of-xarray-80bab8094efa)
- Pandas: [https://pandas.pydata.org/docs/getting_started/index.html](https://pandas.pydata.org/docs/getting_started/index.html)

#### Git

> *Recommended level: Basic understanding of concepts such as __clone__, __add__, __commit__ and __push__ would help.*

Git is a document version control system. It retains a full history of changes to all files (including deleted ones) by tracking incremental changes and recording a history timeline of changes. Changes you make append to the history timeline. Git allows you to copy ("clone") a repository, make changes to files, and "commit" and "push" these changes back to the source repository.

The best way to learn Git is through practice, building up incrementally: start with simple, common actions and gain more knowledge when required. Some useful Git links are:

- Getting started: [https://git-scm.com/doc](https://git-scm.com/doc)
- Using the JupyterLab Git extension: [https://annefou.github.io/jupyter_publish/02-git/index.html](https://annefou.github.io/jupyter_publish/02-git/index.html)
- DEA Git guide: [https://github.com/GeoscienceAustralia/dea-notebooks/wiki/Guide-to-using-DEA-Notebooks-with-git](https://github.com/GeoscienceAustralia/dea-notebooks/wiki/Guide-to-using-DEA-Notebooks-with-git)
- Undoing things guide: [https://git-scm.com/book/en/v2/Git-Basics-Undoing-Things](https://git-scm.com/book/en/v2/Git-Basics-Undoing-Things)
- Understanding branches: [https://nvie.com/posts/a-successful-git-branching-model](https://nvie.com/posts/a-successful-git-branching-model)