# Hello CEOS EAIL <img align="right" src="../resources/csiro_easi_logo.png">

This notebook will introduce new users to working with the Open Data Cube and CEOS EAIL Jupyter notebooks.
 
It will demonstrate the following basic functionality:
- [Notebook setup](#Notebook-setup)
  - [Dask computing environment](#Dask-computing-environment)
  - [Access public requester pays buckets](#Access-public-requester-pays-buckets)
- [Connect to the OpenDataCube](#Connect-to-the-OpenDataCube)
  - [List available products](#List-available-products)
  - [List all measurements](#List-all-measurements)
  - [Load some data](#Load-some-data)
  - [Plot the data](#Plot-the-data)
  - [Scaling factor (Landsat)](#Scaling-factor-(Landsat))
  - [Apply a cloud mask to the data](#Apply-a-cloud-mask-to-the-data)
  - [Perform a computation on the data](#Perform-a-computation-on-the-data)
  - [Save the results to file](#Save-the-results-to-file)
- [Resources](#Resources)

## Notebook setup

A notebook consists of cells that contain either text descriptions or Python code for performing operations on data.

Start by clicking on the cell below to select it. Then execute a selected cell (or each cell in sequence) by clicking the "play" button (in the toolbar above) or pressing `Shift`+`Enter`.

Each cell will show an asterisk icon <font color='#999'>[*]:</font> when it is running. Once this changes to a number, the cell has finished.

The cell below does some optional setup to format the outputs.
> Note: We use matplotlib here as a baseline example. Other notebooks in this tutorial may use interactive plotting libraries (e.g., Holoviews).

In [None]:
# Formatting for basic plots
%matplotlib inline
%config InlineBackend.rc = {}
import matplotlib.pyplot as plt
# plt.rcParams['figure.figsize'] = [12, 8]

# Formatting pandas table output
import pandas
pandas.set_option("display.max_rows", None)

In [None]:
# EASI tools
import git
import sys, os
os.environ['USE_PYGEOS'] = '0'
repo = git.Repo('.', search_parent_directories=True)
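# Add the repository's scripts directory to the import path (only once)
scripts_dir = os.path.join(repo.working_tree_dir, 'scripts')
if scripts_dir not in sys.path: sys.path.append(scripts_dir)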
from easi_tools import EasiDefaults
import notebook_utils
easi = EasiDefaults()

## Dask computing environment

In EASI, each notebook starts by defining a Dask cluster for the notebook to use.

> For more information regarding Dask, see [A2 - Dask](A2%20-%20Dask.ipynb).

There are two main methods for setting up your dask cluster:
1. **Local dask cluster**
    - Provides a dask multiprocessing environment on your Jupyter node. Useful for processing data volumes that don't exceed the Jupyter node limits, which are currently set at `cores = 8, memory = 32 GB` (2x large).


1. **Dask Gateway**
    - Provides a scalable compute cluster in EASI for your use. You can (and *should*) use the same cluster across each of your notebooks, because a separate cluster per notebook would unnecessarily use EASI resources.
    - For most notebooks and data analysis, start with `2 to 4 workers` (adaptive). Dask Gateway is limited to 20 workers per user.
    - It is normal for this step to take **3 to 5 minutes** if new computing nodes need to be generated.

**This notebook will just use a local cluster**

### Local dask cluster

For local cluster options, see https://docs.dask.org/en/latest/setup/single-distributed.html

The Dask Dashboard link shown after the following cell is a helpful resource to explore the activity and state of your dask cluster.

In [None]:
cluster, client = notebook_utils.initialize_dask(use_gateway=False)
display(cluster if cluster else client)
print(notebook_utils.localcluster_dashboard(client, server=easi.hub))

## Access public requester pays buckets

EASI OpenDataCube can index and use datasets stored in public S3 "requester pays" buckets. Requester pays means that use of the data is charged at the time of use. The charges are relatively low for normal exploratory analysis and within the same Data Center.

> For larger analyses or between Data Centers please contact us for advice as there may be more cost-effective ways to do your analysis that we can explore with you.

To use data in public requester pays buckets, run the following code (once per dask cluster):

**All Landsat (e.g. landsat5_c2l2_sr, landsat9_c2l2_st, etc) and Sentinel-2 (s2_l2a) products require this setting**

In [None]:
"""This function obtains credentials for S3 access and passes them on to
   processing threads, either local or on dask cluster.
   Note that AWS credentials may need to be renewed between sessions or
   after a period of time."""

from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True, client=client)

# If not using a dask cluster then remove 'client':
# configure_s3_access(aws_unsigned=False, requester_pays=True)

## Connect to the OpenDataCube
Your EASI Hub environment has been set up with default credentials to access the EASI OpenDataCube.

In [None]:
import datacube
dc = datacube.Datacube()
datacube.__version__

### List available products

Get all available products and list them along with selected properties.

> View available products and data coverage at the EAIL Explorer: https://explorer.eail.easi-eo.solutions/

In [None]:
products = dc.list_products()

# The output of list_products() changed between datacube versions 1.8.4 and 1.8.6
selected_columns = products.columns
if 'default_crs' not in selected_columns:
    selected_columns = ["name", "description", "platform", "crs", "resolution"]
products[selected_columns]

### List all measurements
Get all product measurements and list them along with selected properties.

In [None]:
measurements = dc.list_measurements()
measurements

### Load some data 
Load some data using the default configuration for this environment.

The resulting xarray `Dataset` contains data for all bands of the Surface Reflectance product within the specified location and time range.

Feel free to change the `latitude`/`longitude` or `time` ranges of the query below. The `output_crs` and `resolution` parameters are dependent on the `product` chosen.

In [None]:
# This configuration is read from the defaults for this system. 
# Examples are provided in a commented line to show how to set these manually.

study_area_lat = easi.latitude
# study_area_lat = (39.2, 39.3)

study_area_lon = easi.longitude
# study_area_lon = (-76.7, -76.6)

product = easi.product('landsat')
# product = 'landsat8_c2l2_sr'

set_time = easi.time
# set_time = ('2020-08-01', '2020-12-01')

set_crs = easi.crs('landsat')
# set_crs = 'EPSG:32618'

set_resolution = easi.resolution('landsat')
# set_resolution = (-30, 30)

In [None]:
# hub.eail.easi-eo.solutions - Baltimore
data = dc.load(
    product=product, 
    latitude=study_area_lat,
    longitude=study_area_lon,
    time=set_time,
    output_crs=set_crs,
    resolution=set_resolution,
    group_by='solar_day',
    dask_chunks={'time':1} # For more on this line, see "A2 - Dask"
)

display(data)

### Plot the data
Plot the `"swir22"` band data at each timestep

- At some timesteps data is only available for part of the specified area so we mask the missing data. 
- The `robust=True` option instructs the plotting function to ignore outliers in applying the colourmap.

In [None]:
# Xarray operations

valid_data = data["swir22"] != data["swir22"].nodata
data["swir22"].where(valid_data).plot(col="time", robust=True, col_wrap=4)
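
The two ideas in this cell, masking nodata pixels and choosing robust colour limits, can be sketched with plain NumPy. This is a minimal illustration on made-up values, not the notebook's data: the array contents and the nodata value of `0` are assumptions. xarray documents `robust=True` as using the 2nd and 98th percentiles for the colour range instead of the minimum and maximum.

```python
import numpy as np

# Hypothetical raw band values; 0 stands in for the band's nodata value (an assumption)
nodata = 0
band = np.array([[0, 8000, 8200],
                 [8100, 0, 60000],
                 [7900, 8300, 8050]], dtype="uint16")

# Same idea as data["swir22"].where(valid_data): keep valid pixels, set the rest to NaN
valid = band != nodata
masked = np.where(valid, band.astype("float64"), np.nan)

# Robust colour limits: 2nd and 98th percentiles of the valid pixels,
# so that a few extreme values do not dominate the colour scale
vmin, vmax = np.nanpercentile(masked, [2, 98])
print(f"valid pixels: {int(valid.sum())}, colour limits: {vmin:.0f} to {vmax:.0f}")
```

The single very bright pixel pulls the upper limit up far less than a plain maximum would, which is why `robust=True` usually gives more readable plots.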

### Scaling factor (Landsat)
Landsat data requires a scale factor and offset to be applied to convert the stored values to reflectance or temperature values. Once converted, Landsat surface reflectance values range from approximately 0 to 1 and surface temperature values are in kelvin.

The scale factor and offset differ between Landsat "Collections", and between the Surface Reflectance and Surface Temperature products. The code below scales the surface reflectance values for the Landsat "Collection 2" data used in this notebook.

See https://www.usgs.gov/faqs/how-do-i-use-a-scale-factor-landsat-level-2-science-products for more information

Once plotted, the data below will show a scale of between 0 and 1 (depending on the data) rather than the large numbers in the images above.

In [None]:
# Retrieve the surface reflectance bands

# hub.csiro.easi-eo.solutions
measurement = 'swir22'
scale_factor = 0.0000275
add_offset = -0.2

sr_measurements = measurements.loc[measurements['units'] == 'reflectance']
sr_bands = sr_measurements.loc[product].index.values

# Apply the scaling factor to a copy, so that `data` keeps its raw values
# and re-running this cell does not scale the bands a second time
normalised_sr = data.copy()
normalised_sr.update(data[sr_bands] * scale_factor + add_offset)
normalised_sr[measurement].where(valid_data).plot(col='time', robust=True, col_wrap=4)
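
The scale-and-offset arithmetic can be checked by hand on a few values. The sketch below uses the same `scale_factor` and `add_offset` as the cell above, applied to made-up raw values; treating `0` as nodata is an assumption for illustration.

```python
import numpy as np

# Collection 2 surface reflectance scale and offset, as used in the cell above
scale_factor = 0.0000275
add_offset = -0.2

# Hypothetical raw values; 0 is assumed to be the nodata value
raw = np.array([0, 7273, 10000, 20000, 43636], dtype="uint16")

# Convert to float before scaling, and keep nodata as NaN so that it does not
# turn into a misleading reflectance of -0.2
scaled = raw.astype("float64") * scale_factor + add_offset
reflectance = np.where(raw == 0, np.nan, scaled)
print(reflectance)
```

Note that scaling a nodata value without masking it first would produce `-0.2`, which is why the plots keep using `valid_data` after the conversion.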

### Apply a cloud mask to the data

We can use the Landsat pixel quality band to create a cloud mask.

> For more information on cloud masking, see [04 - Masking Data](./04%20-%20Masking%20Data.ipynb).

Then we can plot the `"swir22"` data again with clouds masked out, this time for a single timestep.

In [None]:
# ODC masking, xarray operations

from datacube.utils import masking
# display(masking.describe_variable_flags(data))

good_pixel_flags = {
    'nodata': False,
    'cloud': 'not_high_confidence',
    'cloud_shadow': 'not_high_confidence',
    'water': 'land_or_cloud'
}
cloud_mask = masking.make_mask(normalised_sr['qa_pixel'], **good_pixel_flags)
data_masked = normalised_sr[measurement].where(cloud_mask)
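
Under the hood, a pixel quality band packs several yes/no flags into the bits of one integer, and `make_mask` decodes them using the flag definitions stored with the product. The sketch below shows the bit-testing idea with NumPy only. The bit positions are assumptions based on the Landsat Collection 2 `QA_PIXEL` layout, and `bit_is_set` is a hypothetical helper, not part of the ODC API. It also tests single flag bits rather than the two-bit confidence fields used in the cell above.

```python
import numpy as np

# Assumed bit positions for the Landsat Collection 2 QA_PIXEL band;
# check the product definition before relying on them
FILL_BIT = 0
CLOUD_BIT = 3
CLOUD_SHADOW_BIT = 4


def bit_is_set(qa, bit):
    """Return a boolean array that is True where the given bit is set."""
    return ((qa >> bit) & 1) == 1


# Four made-up pixels: fill, clear, cloud, cloud shadow
qa = np.array([0b0000001, 0b1000000, 0b0001000, 0b0010000], dtype="uint16")

bad = bit_is_set(qa, FILL_BIT) | bit_is_set(qa, CLOUD_BIT) | bit_is_set(qa, CLOUD_SHADOW_BIT)
good_pixels = ~bad
print(good_pixels)
```

In practice, prefer `masking.make_mask` with named flags, since it reads the bit definitions from the product metadata instead of hard-coding them.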

In [None]:
data_masked.isel(time=1).plot(robust=True)

### Perform a computation on the data
After scaling and masking, we apply calculations or algorithms to the data.

As a simple example, we calculate the Normalized Difference Vegetation Index (NDVI) from the Surface Reflectance data we have loaded.

In [None]:
# Calculate the NDVI
band_diff = normalised_sr.nir08 - normalised_sr.red
band_sum = normalised_sr.nir08 + normalised_sr.red
ndvi = band_diff / band_sum

# Plot the masked NDVI
ndvi.where(cloud_mask).isel(time=1).plot(robust=True,cmap='RdYlGn')
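
As a quick check of the formula, NDVI = (NIR - Red) / (NIR + Red) can be computed on a few made-up reflectance values. This is a minimal NumPy sketch, not the notebook's data; the explicit guard against a zero denominator is an addition for illustration.

```python
import numpy as np

# Hypothetical scaled reflectance values for four pixels
nir = np.array([0.45, 0.30, 0.05, 0.0])
red = np.array([0.05, 0.10, 0.04, 0.0])

band_sum = nir + red

# Guard against division by zero: leave NaN where both bands are zero
ndvi_values = np.full_like(band_sum, np.nan)
np.divide(nir - red, band_sum, out=ndvi_values, where=band_sum != 0)
print(ndvi_values)
```

Dense green vegetation reflects strongly in the near infrared and weakly in red, so it gives values closer to 1, while bare soil and water give values near or below 0.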

### Save the results to file
After processing the data, we can save the output to a file that can be imported into other applications for further analysis or publication.

The file will be saved to your home directory and appear on the File Browser panel to the left. You may need to select the `folder` icon to go to the top level (`~/` or `/home/jovyan/`).

To download a file, right-click it in the File Browser and select `Download`.

In [None]:
# Xarray can save the data to a netCDF file

ndvi.time.attrs.pop('units', None)  # Required until ODC 1.8.1 installed
ndvi.load()                         # Load the data into memory from Dask so that it can be written to file
ndvi.to_netcdf("~/landsat8_sr_ndvi.nc")
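
To see where a path starting with `~` ends up, it can be expanded with the standard library. This is a small sketch that only inspects the path and does not write anything; the file name is the one used in the cell above.

```python
from pathlib import Path

# "~" is shorthand for the home directory (for example /home/jovyan on the Hub)
target = Path("~/landsat8_sr_ndvi.nc").expanduser()
print(target)

# Confirm that the parent directory exists before writing to it
print(target.parent.is_dir())
```

Expanding the path explicitly is a simple way to confirm the output location, and is useful when passing the path to tools that do not expand `~` themselves.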

In [None]:
# Or export to geotiff using rioxarray.

import rioxarray
ndvi.isel(time=0).rio.to_raster("/home/jovyan/landsat8_sr_ndvi.tif") # Note that a single time slice has been selected

## Resources

The EASI Hub environment is built from a number of components. The links below may help you make better use of this environment.


#### Open Data Cube

Open Data Cube (ODC) is the open source software that EASI Hub uses to enable users to manage and analyse vast amounts of Earth Observation data. Visit the [Open Data Cube](https://www.opendatacube.org) website for tutorials and videos. See also:
- [ODC documentation](https://datacube-core.readthedocs.io/en/latest)
- [ODC github](https://github.com/opendatacube)


#### JupyterLab

Visit the [JupyterLab documentation website](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html) for an overview of the basics of working within the JupyterLab environment.


#### Git

Notebooks and associated code are managed with the Git version control system. You have already seen how to "clone" the Git repository containing these notebooks into your EASI Hub home drive.
- [JupyterLab Git plugin tutorial](https://annefou.github.io/jupyter_publish/02-git/index.html)
- [Geoscience Australia Git tutorial](https://github.com/GeoscienceAustralia/dea-notebooks/wiki/Guide-to-using-DEA-Notebooks-with-git)


#### Python

The code in this and other EASI Hub notebooks is written in [Python](https://www.python.org/) using the Open Data Cube python library.

If you are new to Python there are many online tutorials to help you learn, including https://www.learnpython.org

See below for some of the Python libraries used within EASI Hub notebooks:

- [datacube](https://datacube-core.readthedocs.io/en/latest/user/intro.html) - Python API for loading datasets from the Open Data Cube database.
- [xarray](http://xarray.pydata.org/en/stable/why-xarray.html) - Allows processing of gridded datasets loaded from `datacube`.
- [dask](https://docs.dask.org/en/latest/why.html) - Working with `xarray`, `dask` enables the distributed processing of very large datasets.





