# Welcome to EASI <img align="right" src="../resources/csiro_easi_logo.png">

This notebook introduces new users to working with EASI notebooks and the Open Data Cube.

It will demonstrate the following basic functionality:
- [Notebook setup](#Notebook-setup)
- [Connect to the OpenDataCube](#Connect-to-the-OpenDataCube)
  - [List products](#List-products)
  - [List measurements](#List-measurements)
  - [Choose a region of interest](#Choose-a-region-of-interest)
  - [Load data](#Load-data)
  - [Plot data](#Plot-data)
  - [Masking and Scaling](#Masking-and-Scaling)
  - [Perform a calculation on the data](#Perform-a-calculation-on-the-data)
  - [Save the results to a file](#Save-the-results-to-file)
- [Summary](#Summary)
- [Further reading](#Further-reading)

## Notebook setup

A notebook consists of cells that contain either text descriptions or Python code for performing operations on data.

1. Start by clicking on the cell below to select it.
1. Execute a selected cell, or each cell in sequence, by clicking the "play" button (in the toolbar above) or pressing `Shift`+`Enter`.
1. Each cell will show an asterisk icon <font color='#999'>[*]:</font> when it is running. Once this changes to a number, the cell has finished.
1. The cell below imports the packages used in this notebook and sets some formatting options.

In [None]:
# Formatting for basic plots
%matplotlib inline
%config InlineBackend.rc = {}
import matplotlib.pyplot as plt
# plt.rcParams['figure.figsize'] = [12, 8]

# Formatting pandas table output
import pandas
pandas.set_option("display.max_rows", None)

# Datacube
import datacube
from datacube.utils import masking  # https://github.com/opendatacube/datacube-core/blob/develop/datacube/utils/masking.py
from odc.algo import enum_to_bool   # https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py
from datacube.utils.rio import configure_s3_access

# Notebook helper tools (in dea_tools or in this repo)
import sys
try:
    from dea_tools.plotting import display_map, rgb
except ImportError:
    # Local copy of selected dea_tools
    if 'tools/' not in sys.path:
        sys.path.append('tools/')
    from datacube_utils import display_map
    rgb = None  # Not copied or adapted yet

## Connect to the OpenDataCube

The `Datacube()` API provides search, load and information functions for data products *indexed* in the EASI ODC database. For further information see https://datacube-core.readthedocs.io/en/latest/.

In [None]:
dc = datacube.Datacube()

# Optional: Access AWS "requester-pays" buckets
# This is necessary for Landsat ("landsatN_c2l2_*") and Sentinel-2 ("s2_l2a") products
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True);

### List products
Get all available products in the ODC database and list them along with selected properties.

1. View available products and data coverage at the EASI Explorer: https://explorer.asia.easi-eo.solutions

In [None]:
products = dc.list_products()  # Pandas DataFrame
products

### List measurements

The data arrays available for each product are called "*measurements*".

1. List the "*measurements*" of the Landsat-8 surface reflectance product ("**landsat8_c2l2_sr**")
1. The columns are the "*attributes*" available for each "*measurement*"

In [None]:
measurements = dc.list_measurements()  # Pandas DataFrame, all products
measurements.loc['landsat8_c2l2_sr']

### Choose a region of interest

See the available `latitude`/`longitude` and `time` ranges in the [ODC Explorer](https://explorer.asia.easi-eo.solutions/products/landsat8_c2l2_sr).

1. Feel free to change the `latitude`/`longitude` or `time` ranges of the query below. 

In [None]:
# hub.asia.easi-eo.solutions - Lake Tempe, Indonesia
latitude = (-4.2, -3.9)
longitude = (119.8, 120.1)
display_map(longitude, latitude)
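
Before loading, it can help to estimate how many pixels a region covers. The sketch below is a rough approximation only: it assumes a spherical Earth with about 111.32 km per degree of latitude and a 30 m pixel (the nominal Landsat resolution). The helper name `approx_pixels` is made up for illustration and is not part of `datacube`.

```python
import math

def approx_pixels(latitude, longitude, pixel_size_m=30.0):
    """Roughly estimate (rows, cols) for a lat/lon box at a given pixel size."""
    m_per_deg = 111_320.0  # Assumed length of one degree of latitude, in metres
    mid_lat = math.radians(sum(latitude) / 2)
    height_m = abs(latitude[1] - latitude[0]) * m_per_deg
    # A degree of longitude shrinks with the cosine of the latitude
    width_m = abs(longitude[1] - longitude[0]) * m_per_deg * math.cos(mid_lat)
    return round(height_m / pixel_size_m), round(width_m / pixel_size_m)

rows, cols = approx_pixels((-4.2, -3.9), (119.8, 120.1))
print(rows, cols)  # Roughly 1100 x 1100 pixels per band per time step
```

Multiplying the pixel count by the number of bands, the number of time steps and the bytes per value gives an order-of-magnitude memory estimate.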

### Load data 
Load Landsat-8 surface reflectance data ("**landsat8_c2l2_sr**") for a given latitude, longitude and time range.

> This may take a few minutes while the data are loaded into JupyterLab (so choose a small area and time range!).

The `datacube.load()` function returns an `xarray.Dataset` object.

- The `display()` view is a convenient way to check the size, shape and attributes of the requested data.
- Once you have an `xarray` object, it can be used with many Python packages.

The `output_crs` and `resolution` parameters are dependent on the `product` chosen.

Further information on `xarray`: http://xarray.pydata.org/en/stable/user-guide/data-structures.html 

In [None]:
# A standard datacube.load() call
# Any SQLAlchemy warnings can be ignored.

data = dc.load(
    product = 'landsat8_c2l2_sr', 
    latitude = latitude,
    longitude = longitude,
    time=('2020-02-01', '2020-04-01'),
    
    output_crs="EPSG:32750",  # Target CRS (WGS 84 / UTM zone 50S, which covers Lake Tempe)
    resolution=(-30, 30),     # Target resolution as (y, x), in CRS units
    group_by='solar_day',     # Group by time method
)

display(data)
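
A projected `output_crs` is often the UTM zone that covers the region of interest. The sketch below estimates a WGS 84 / UTM EPSG code from a centre point using the standard 6-degree zone rule. `utm_epsg` is a hypothetical helper, not part of `datacube`, and it ignores the special zones around Norway and Svalbard.

```python
def utm_epsg(lon, lat):
    """Estimate the WGS 84 / UTM EPSG code for a longitude/latitude point."""
    # UTM zones are 6 degrees wide, numbered 1 to 60 starting at 180 W
    zone = min(int((lon + 180) // 6) + 1, 60)
    # EPSG 326xx are northern-hemisphere zones, 327xx are southern
    base = 32600 if lat >= 0 else 32700
    return base + zone

# Centre of the Lake Tempe region used in this notebook
print(utm_epsg(119.95, -4.05))  # 32750
```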

### Plot data
Plot the [Short-wave Infrared (SWIR)](https://www.usgs.gov/faqs/what-are-best-landsat-spectral-bands-use-my-research) (`"swir22"`) band data at each timestep.

The `datacube.load()` function does not apply the `nodata` (missing value) or scaling attributes. These are left to the user's discretion and requirements.

- The [robust](https://docs.xarray.dev/en/stable/user-guide/plotting.html#robust) option excludes outliers when calculating the colour limits, which gives a more consistent result.

In [None]:
# Xarray operations
band = 'swir22'
data[band].plot(col="time", robust=True, col_wrap=4);
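
For reference, `robust=True` in xarray sets the colour limits from the 2nd and 98th percentiles of the data instead of the minimum and maximum. A minimal numpy sketch of the same idea, using made-up values with one extreme outlier:

```python
import numpy as np

# 99 plausible values plus one saturated outlier (made-up data)
values = np.concatenate([np.linspace(8000, 12000, 99), [65535]])

# Limits from the full range are dominated by the outlier
full_min, full_max = values.min(), values.max()

# Percentile-based limits ignore the extreme tails
vmin, vmax = np.nanpercentile(values, [2, 98])

print(full_min, full_max)
print(vmin, vmax)
```

With the percentile-based limits, the colour scale spans the bulk of the data rather than stretching to the outlier.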

### Masking and Scaling

Many Earth observation products include a "quality" array that can be used to filter the measurement arrays. For example, most quality layers include a "cloud" confidence flag that can be used to remove pixels affected by clouds from further analysis. In addition:

- Data products usually include a `nodata` value that defines the "null" or "fill" value used in the array, and
- Some data products may also have "scaling" attributes.

More information on these techniques is covered in other notebooks. In particular, see the `datasets/*ipynb` notebooks for specific examples.

**Landsat** data require a scale factor and offset to convert the stored integer values to physical reflectance or temperature values. Once converted, Landsat surface reflectance values nominally range from 0 to 1 and surface temperature values are in kelvin. The scaling values differ between Landsat "Collections" and between the Surface Reflectance and Surface Temperature products. See https://www.usgs.gov/faqs/how-do-i-use-a-scale-factor-landsat-level-2-science-products.

In the cell below we:

1. Apply the `nodata` value
1. Define and apply the Landsat "Collection 2" surface reflectance scaling values.
1. Apply a cloud mask

Note the exclusion of cloud (if present) and the change in the colourbar range from the previous figure.

In [None]:
# Choose bands for further processing
bands = ['red', 'green', 'blue', 'nir08', 'swir22']

# Make a mask array for the nodata value
valid_mask = masking.valid_data_mask(data[bands])

# Define the scaling values (landsat8_c2l2_sr)
scale_factor = 0.0000275
add_offset = -0.2

# Make a scaled data array
scaled_data = data[bands] * scale_factor + add_offset
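
As a worked example of the scaling step, the sketch below applies the same `scale_factor` and `add_offset` to a few made-up integer values with plain numpy. It assumes a `nodata` value of 0 for illustration; check the product definition for the actual value.

```python
import numpy as np

# Made-up raw values; 0 is assumed to be the nodata (fill) value
raw = np.array([0, 7273, 10000, 43636], dtype=np.uint16)
nodata = 0

scale_factor = 0.0000275
add_offset = -0.2

# Scale to reflectance and set nodata pixels to NaN
scaled = np.where(raw == nodata, np.nan, raw * scale_factor + add_offset)
print(scaled)  # NaN, then values close to 0, 0.075 and 1
```

The result is a floating-point array, which is why missing pixels can be represented as NaN after scaling.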

In [None]:
# Make a cloud mask (landsat8_c2l2_sr)
from datacube.utils import masking

# Optional - Show the flag_definition information
# See also http://explorer.asia.easi-eo.solutions/products/landsat8_c2l2_sr#definition-doc
# display( masking.describe_variable_flags(data.qa_pixel) )

# Multiple flags are combined as logical AND (bitwise)
cloud_mask = masking.make_mask(data['qa_pixel'], 
    clear='clear',
)
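
Conceptually, a flag mask like this is a bitwise test on the integer quality values. The sketch below shows the idea with plain numpy. The bit positions (bit 3 for cloud, bit 6 for clear) and the sample values are assumptions based on the USGS Collection 2 `QA_PIXEL` layout; the product's flag definition remains the authoritative source.

```python
import numpy as np

# Sample QA values (assumed): clear land, cloud, fill, clear water
qa = np.array([21824, 22280, 1, 21952], dtype=np.uint16)

CLOUD_BIT = 3  # Assumed bit position of the cloud flag
CLEAR_BIT = 6  # Assumed bit position of the clear flag

# Shift the bit of interest to position 0, then test it
is_cloud = ((qa >> CLOUD_BIT) & 1).astype(bool)
is_clear = ((qa >> CLEAR_BIT) & 1).astype(bool)

print(is_cloud)  # [False  True False False]
print(is_clear)  # [ True False False  True]
```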

In [None]:
# Apply each of the masks
filtered_data = scaled_data.where(valid_mask & cloud_mask)

filtered_data['swir22'].plot(col="time", robust=True, col_wrap=4);

### Perform a calculation on the data
As a simple example, we calculate the Normalized Difference Vegetation Index (NDVI).

In [None]:
# Calculate the NDVI
band_diff = filtered_data.nir08 - filtered_data.red
band_sum = filtered_data.nir08 + filtered_data.red
ndvi = band_diff / band_sum

# Plot the masked NDVI
ndvi.plot(col="time", robust=True, col_wrap=4, vmin=0, vmax=0.7, cmap='RdYlGn');
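
For a single pixel the arithmetic is simple: with a near-infrared reflectance of 0.5 and a red reflectance of 0.1, NDVI = (0.5 - 0.1) / (0.5 + 0.1), or about 0.67. A small numpy sketch with made-up reflectance values, including one masked (NaN) pixel, shows how the calculation behaves element by element:

```python
import numpy as np

# Made-up reflectance values; NaN marks a masked pixel
nir = np.array([0.5, 0.3, np.nan, 0.05])
red = np.array([0.1, 0.3, 0.2, 0.10])

# NDVI = (NIR - red) / (NIR + red), in the range -1 to 1
ndvi_values = (nir - red) / (nir + red)
print(np.round(ndvi_values, 2))
```

Masked pixels stay NaN in the result, so the cloud and `nodata` masks carry through to the index without extra handling.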

### Save the results to file
After processing the data, we can save the output to a file that can be imported into other applications for further analysis or publication.

The file will be saved to your home directory and appear in the File Browser panel to the left. You may need to select the `folder` icon to go to the top level (`/home/jovyan/`).

To download a file, right-click it in the File Browser and select `Download`.

In [None]:
# Xarray can save the data to a netCDF file

ndvi.time.attrs.pop('units', None)  # Xarray re-applies this
ndvi.to_netcdf("/home/jovyan/landsat8_sr_ndvi.nc")

In [None]:
# Or export to geotiff using rioxarray.

import rioxarray
ndvi.isel(time=0).rio.to_raster("/home/jovyan/landsat8_sr_ndvi.tif")  # Export a single time slice

## Summary

This notebook introduced the main steps for querying data (with `Datacube`), and filtering, plotting, calculating and saving a "cube" of data (with `Xarray`).

There are plenty of details and options to explore, so please work through the other notebooks to learn more and refer back to these notebooks when required. We encourage you to create or bring your own notebooks, and adapt notebooks from other [open-license repositories](https://docs.asia.easi-eo.solutions/user-guide/users-guide/03-using-notebooks/#other-available-odc-notebooks).

In [None]:
if rgb:
    rgb(filtered_data.isel(time=2), ['red', 'green', 'blue'])

## Further reading 

#### Open Data Cube

- [ODC documentation](https://datacube-core.readthedocs.io/en/latest)
- [ODC github](https://github.com/opendatacube)

#### Python

> *Recommended level: Basic Python knowledge and familiarity with array manipulations, __numpy__ and __xarray__. Familiarity with some plotting libraries (e.g., __matplotlib__) would also help.*

There are many options for learning Python from online resources to in-house or facilitated training. Some examples are offered here with no suggestion that EASI endorses any of them.

- [https://www.python.org/about/gettingstarted](https://www.python.org/about/gettingstarted/)
- Learn Python tutorials: [https://www.learnpython.org](https://www.learnpython.org/)
- Data Camp: [https://www.datacamp.com](https://www.datacamp.com/)
- David Beazley courses: [https://dabeaz-course.github.io/practical-python](https://dabeaz-course.github.io/practical-python/)
- Python Charmers: [https://pythoncharmers.com](https://pythoncharmers.com/)

Background for selected libraries:

- Numpy: [https://numpy.org/doc/stable/user/quickstart.html](https://numpy.org/doc/stable/user/quickstart.html)
- Xarray: [http://xarray.pydata.org/en/stable/user-guide/data-structures.html](http://xarray.pydata.org/en/stable/user-guide/data-structures.html)
- Xarray: [https://towardsdatascience.com/basic-data-structures-of-xarray-80bab8094efa](https://towardsdatascience.com/basic-data-structures-of-xarray-80bab8094efa)

#### JupyterLab

> *Recommended level: Familiarity with notebooks.*

The JupyterLab website has excellent documentation, including video instructions. We recommend users take a few minutes to familiarise themselves with the use and features of JupyterLab.

- Getting started: [https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html)
- Drag and drop upload of files: [https://jupyterlab.readthedocs.io/en/stable/user/files.html](https://jupyterlab.readthedocs.io/en/stable/user/files.html)

#### Git

> *Recommended level: Basic understanding of concepts such as __clone__, __add__, __commit__ and __push__ would help.*

Git is a document version control system. It retains a full history of changes to all files (including deleted ones) by tracking incremental changes and recording a history timeline of changes. Changes you make append to the history timeline. Git allows you to copy ("clone") a repository, make changes to files, and "commit" and "push" these changes back to the source repository.

The best way to learn Git is by practice and incrementally; start with simple, common actions and gain more knowledge when required. Some useful Git links are:

- Getting started: [https://git-scm.com/doc](https://git-scm.com/doc)
- Using the JupyterLab Git extension: [https://annefou.github.io/jupyter_publish/02-git/index.html](https://annefou.github.io/jupyter_publish/02-git/index.html)
- DEA Git guide: [https://github.com/GeoscienceAustralia/dea-notebooks/wiki/Guide-to-using-DEA-Notebooks-with-git](https://github.com/GeoscienceAustralia/dea-notebooks/wiki/Guide-to-using-DEA-Notebooks-with-git)
- Undoing things guide: [https://git-scm.com/book/en/v2/Git-Basics-Undoing-Things](https://git-scm.com/book/en/v2/Git-Basics-Undoing-Things)
- Understanding branches: [https://nvie.com/posts/a-successful-git-branching-model](https://nvie.com/posts/a-successful-git-branching-model)