# Introduction to Digital Earth Australia

**Notebook currently compatible with the `NCI` and `DEA Sandbox` environments only**

## Background
Digital Earth Australia (DEA) is a digital platform that catalogues large amounts of Earth Observation data covering continental Australia.
It is implemented using the open source software collection of the Open Data Cube (ODC) which has an ever growing list of users, implementations and contributors.

The ODC and DEA platforms are designed to:
* Catalogue large amounts of Earth Observation data
* Provide a Python based API for high performance querying and data access
* Give scientists and other users an easy way to perform exploratory data analysis
* Allow scalable continent scale processing of the stored data
* Track the provenance of all the contained data to allow for quality control and updates

The DEA program catalogues data from a range of satellite sensors and has adopted processes and terminology that users should be aware of to enable efficient querying and use of the datasets stored within.
This notebook introduces these important concepts and forms the basis of understanding for the remainder of the notebooks in this Beginners Guide.
Resources to further explore these concepts are recommended at the end of the notebook.

## Prerequisites
Users of this notebook should have a basic understanding of the use and format of the Jupyter Notebook.

To review these basics, see [Introduction_to_Jupyter](Introduction_to_Jupyter.ipynb)

## Description
This introduction to the DEA briefly introduces the ODC and reviews the dominant types of data catalogued in the DEA platform, as well as important terminology for referring to measurements within product datasets.

Topics include:
* a brief introduction to the ODC
* a review of the satellite sensors whose data contributes to the DEA
* an introduction to Surface Reflectance measurements: NBAR, NBART and OA 
* important terminology:
  * band naming conventions
  * the coordinate reference scheme

## Open Data Cube
The ODC provides an integrated gridded data analysis environment for decades of analysis ready earth observation satellite and related data from multiple satellite and other acquisition systems.
It is a collection of software based around the [datacube-core](https://github.com/opendatacube/datacube-core) open source Python library that enables:
* Large-scale workflows on HPC
* Exploratory Data Analysis
* Cloud-based Services
* Standalone Applications

There are a number of existing implementations of the ODC, including DEA.

More information can be found in the [Open Data Cube Manual](https://datacube-core.readthedocs.io/en/latest/index.html)

## Digital Earth Australia
### Satellite datasets 
Digital Earth Australia catalogues data from a range of satellite sensors. 
The earliest datasets of optical satellite imagery in DEA date from 1986.
Overall, DEA includes data from:
* Landsat 5 TM, operational between March 1984 and January 2013
* Landsat 7 ETM+, operational since April 1999
* Landsat 8 OLI, operational since February 2013
* Sentinel 2A MSI, operational since June 2015
* Sentinel 2B MSI, operational since March 2017
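
The operational periods listed above determine which sensors can contribute data for a given date. The following is a minimal sketch of that lookup, using only the months stated in the list. The day-of-month values, the `SENSORS` dictionary and the `sensors_operating` function are assumptions made for illustration and are not part of the DEA or ODC API.

```python
from datetime import date

# (start, end) of operation at month precision, taken from the list above.
# Day-of-month values are placeholders; None means the sensor was still
# operating when this notebook was written.
SENSORS = {
    "Landsat 5 TM": (date(1984, 3, 1), date(2013, 1, 31)),
    "Landsat 7 ETM+": (date(1999, 4, 1), None),
    "Landsat 8 OLI": (date(2013, 2, 1), None),
    "Sentinel 2A MSI": (date(2015, 6, 1), None),
    "Sentinel 2B MSI": (date(2017, 3, 1), None),
}


def sensors_operating(on):
    """Return the names of the sensors that were operating on the given date."""
    return [
        name
        for name, (start, end) in SENSORS.items()
        if start <= on and (end is None or on <= end)
    ]


print(sensors_operating(date(2000, 1, 1)))
print(sensors_operating(date(2016, 1, 1)))
```

Note that a sensor being operational does not guarantee that DEA holds data for that date; the earliest optical imagery in DEA dates from 1986.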

The Landsat missions are jointly operated by the United States Geological Survey (USGS) and National Aeronautics and Space Administration (NASA).
The Sentinel missions are operated by the European Space Agency (ESA).

The datasets generated by each of these sensors (satellites) are subtly different.

![Image](https://prd-wret.s3-us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/styles/full_width/public/thumbnails/image/dmidS2LS7Comparison.png)
> Figure 1 shows the recent Landsat and Sentinel sensors and compares the way they sample the electromagnetic spectrum (Wavelength axis).
The figure is overlaid on the percentage of radiation at each wavelength that is transmitted through the atmosphere.
The bands detected by each satellite are shown in the numbered boxes, and the width of each box is proportional to the range of wavelengths that the band detects.
The y-axis has no bearing on the comparison of the satellite sensors.
Note that Landsat 5 TM contained 7 bands which measured the same regions as bands 1 to 7 on LS7 ETM+.

It is important to note that the numbering of the bands relative to the detected wavelengths is inconsistent between sensors (Figure 1).
The width of the detected bands is also inconsistent between sensors.
For this reason, it is important to remember that the DEA data comes from multispectral sensors and that when you search.........

For example, in the green region of the electromagnetic spectrum (around 560 nm), LS5 TM and LS7 ETM+ detect a wide green region called band 2.
LS8 OLI detects a slightly narrower region and calls it band 3.
Sentinel 2 MSI (A and B) detects a narrow green region and also calls it band 3.

Similarly, the number of bands detected by each sensor differs; the naming conventions are discussed in more detail below.

The spatial resolution also varies between the Landsat and Sentinel programs.
Landsat pixels represent approximately 30 m × 30 m of the land surface, while the finest Sentinel 2 pixels represent 10 m × 10 m.
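
Pixel size is usually quoted as the side length of a square ground cell, so the ground area covered by one pixel is the square of that length. The short check below works through the arithmetic, assuming nominal side lengths of 30 m for Landsat and 10 m for the finest Sentinel 2 bands. The names `PIXEL_SIDE_M` and `pixel_area_m2` are illustrative only.

```python
# Nominal pixel side lengths in metres (assumed values for illustration).
PIXEL_SIDE_M = {"Landsat": 30, "Sentinel 2 (10 m bands)": 10}


def pixel_area_m2(side_m):
    """Ground area in square metres of a square pixel with the given side."""
    return side_m ** 2


areas = {name: pixel_area_m2(side) for name, side in PIXEL_SIDE_M.items()}
for name, area in areas.items():
    print(f"{name}: {area} square metres per pixel")

# Number of 10 m Sentinel 2 pixels needed to cover one Landsat pixel
ratio = areas["Landsat"] / areas["Sentinel 2 (10 m bands)"]
print(ratio)
```

Under these assumptions a single Landsat pixel covers the same ground area as nine of the finest Sentinel 2 pixels.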

### Testing cell - edit figure/s and insert into 'DEA satellite datasets' section with references
https://directory.eoportal.org/web/eoportal/satellite-missions/l/landsat-9 See figure 3/4 down page. Landsat 5 7 8 band comparison

https://www.usgs.gov/media/images/comparison-landsat-7-and-8-bands-sentinel-2 See figure comparing Sentinel 2 bands with Landsat 7 and 8

### Surface Reflectance
**Surface Reflectance (SR)** is a suite of **Earth Observation (EO)** products from Geoscience Australia (GA).
The SR product suite provides standardised optical surface reflectance datasets using robust physical models to correct
for variations in image radiance values due to atmospheric properties, and sun and sensor geometry.
The resulting stack of surface reflectance grids is consistent over space and time, which is instrumental in identifying
and quantifying environmental change. SR is based on radiance data from the Landsat-5 TM, Landsat-7 ETM+ and Landsat-8 OLI/TIRS sensors.

#### Surface Reflectance Correction Models

Image radiance values recorded by passive EO sensors are a composite of:

* surface reflectance;
* atmospheric condition;
* interaction between surface land cover, solar radiation and sensor view angle;
* land surface orientation relative to the imaging sensor.

### NBAR
NBAR stands for **Nadir-corrected BRDF Adjusted Reflectance**, where BRDF stands for **Bidirectional Reflectance Distribution Function**.
The approach involves atmospheric correction to compute surface-leaving radiance, and bi-directional reflectance modelling to remove the effects of
topography and angular variation in reflectance.

#### Features

* The standardised SR data products deliver calibrated optical surface reflectance data across land and coastal fringes.
  SR is a medium resolution (~25 m) grid based on the Landsat TM/ETM+/OLI archive and presents surface reflectance data in 25 m × 25 m grid cells.

* Radiance measurements from EO sensors do not directly quantify the surface reflectance of the Earth. Such measurements are modified by variations in atmospheric
  properties, sun position, sensor view angle, surface slope and surface aspect.
  To obtain consistent and comparable measures of Earth surface reflectance from EO, these variations need to be reduced or removed from the radiance measurements (Li et al., 2010).
  This is especially important when comparing imagery acquired in different seasons and geographic regions.

* The SR product is created using a physics-based, coupled BRDF and atmospheric correction model that can be applied to both flat and inclined surfaces (Li et al., 2012).
  The resulting surface reflectance values are comparable both within individual images and between images acquired at different times and/or with different sensors.

### NBART
**Surface reflectance NBAR-T** includes the terrain illumination reflectance correction. It has the same features as SR-NBAR, along with the additional features listed below.

#### Features

* The SR product is created using a physics-based coupled BRDF and atmospheric correction model that can be applied to both flat and inclined surfaces (Li et al., 2012).
  The resulting surface reflectance values are comparable both within individual images and between images acquired at different times and/or with different sensors.

* Terrain affects optical satellite images through both irradiance and bidirectional reflectance distribution function (BRDF) effects.
* Slopes facing the sun receive enhanced solar irradiance and appear brighter compared to those facing away from the sun.
* For anisotropic surfaces, the radiance received at the satellite sensor from a sloping surface is also affected by surface
  BRDF which varies with combinations of surface landcover types, sun, and satellite geometry (sun and sensor view, and their relative
  azimuth angle) as well as topographic geometry (primarily slope and aspect angles).
  Consequently, to obtain comparable surface reflectance from satellite images covering mountainous areas,
  it is necessary to process the images to reduce or remove the topographic effect so that the images can be used for different purposes on the same spectral base.
* A Digital Surface Model (DSM) with a resolution appropriate to that of the satellite image is needed for the best results. The 1 second SRTM DSM is
  used for NBART processing.
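
The irradiance effect described above, where sun-facing slopes appear brighter, can be illustrated with the standard illumination geometry formula for the cosine of the solar incidence angle on a tilted surface. This is a minimal sketch of that textbook formula only; it is not the NBART correction model, and the function name and example angles are assumptions chosen for illustration. Aspect is taken here as the compass direction the slope faces.

```python
import math


def cos_incidence(solar_zenith_deg, solar_azimuth_deg, slope_deg, aspect_deg):
    """Cosine of the angle between the sun direction and the surface normal."""
    zenith = math.radians(solar_zenith_deg)
    slope = math.radians(slope_deg)
    relative_azimuth = math.radians(solar_azimuth_deg - aspect_deg)
    return (
        math.cos(zenith) * math.cos(slope)
        + math.sin(zenith) * math.sin(slope) * math.cos(relative_azimuth)
    )


# Sun 45 degrees from vertical, shining from the north-east (azimuth 45)
flat = cos_incidence(45, 45, 0, 0)            # horizontal surface
sun_facing = cos_incidence(45, 45, 30, 45)    # 30 degree slope facing the sun
sun_averted = cos_incidence(45, 45, 30, 225)  # 30 degree slope facing away

print(round(flat, 3), round(sun_facing, 3), round(sun_averted, 3))
```

A larger cosine means more direct illumination, so in this example the sun-facing slope receives more irradiance than flat ground, and the slope facing away receives less.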

### Observation Attributes (formerly known as pixel quality TBC)
The PQ25 product is designed to facilitate interpretation and processing of Surface Reflectance NBAR/NBART and derivative products.

#### Features

PQ25 is an assessment of each image pixel to determine if it is an unobscured, unsaturated observation
of the Earth surface and also whether the pixel is represented in each spectral band. The PQ product allows
users to produce masks which can be used to exclude pixels which do not meet their quality criteria from analysis.
The capacity to automatically exclude such pixels is essential for emerging multi-temporal analysis techniques that
make use of every quality assured pixel within a time series of observations. Users can choose to process only land pixels,
or only sea pixels, depending on their analytical requirements, leading to enhanced computational efficiency.

PQ provides an assessment of the quality of observations at a pixel level and includes information about whether a pixel is affected by:

* Spectral Contiguity (lack of signal in any band)
* Saturation in any band
* Presence of cloud
* Presence of cloud shadow
* Land or sea

As Landsat imagery becomes more readily available, there has been a rapid increase in the number of analyses undertaken
by researchers around the globe. Most researchers use some form of quality masking schema in order to remove undesirable
pixels from analysis, whether that be cloud, cloud shadow, observations over the ocean, or saturated pixels. In the past,
researchers would reject partly cloud-affected scenes in favour of cloud-free scenes. However, Landsat time series analysis
using all cloud-free pixels has become a valuable technique and has increased the demand for automation of cloud, cloud
shadow and saturation detection. Emergency response applications such as flood mapping typically have to contend with
individual cloud-affected scenes and therefore rely on effective cloud and cloud shadow removal techniques.

The PQ25 product combines established algorithms that detect clouds, including the Automated Cloud Cover Assessment
(ACCA) (Irish et al. 2006) and Function of mask (Fmask) (Zhu and Woodcock 2012). ACCA is already widely used within the
remote sensing community; it is fast and relatively accurate. Fmask, on the other hand, is newer but is rapidly becoming
more established, and can provide a more accurate cloud mask than ACCA in certain cloud environments.

The different sensor designs of Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI all have
different sensor saturation characteristics. The PQ25 layer enables users to exclude
observations from a given band where the pixels are saturated (exceed the dynamic range
of the sensor). The per-band saturation information in PQ allows users to exclude pixels
where their specific band of interest is saturated.

The PQ25 layer uses two industry standard cloud/cloud shadow detection algorithms to
flag pixels that potentially contain cloud and allows the user to generate masks based on
either algorithm or both algorithms.
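
Quality layers of this kind are commonly stored as integers in which each bit records one test, and a mask is built by checking that the required bits are set. The sketch below shows the idea for masks based on either cloud algorithm or both. The bit positions, flag names and `is_good` function are hypothetical and chosen for illustration; they do not reproduce the actual PQ25 layout.

```python
# Hypothetical bit positions, chosen for illustration only.
CONTIGUOUS = 1 << 0   # set when every band has a valid signal
UNSATURATED = 1 << 1  # set when no band is saturated
LAND = 1 << 2         # set for land, clear for sea
CLEAR_ACCA = 1 << 3   # set when ACCA detects no cloud
CLEAR_FMASK = 1 << 4  # set when Fmask detects no cloud


def is_good(pq_value, cloud_test="both"):
    """True if the pixel passes the contiguity, saturation and cloud tests.

    cloud_test selects which algorithm must report clear sky:
    "acca", "fmask" or "both".
    """
    required = CONTIGUOUS | UNSATURATED
    if cloud_test in ("acca", "both"):
        required |= CLEAR_ACCA
    if cloud_test in ("fmask", "both"):
        required |= CLEAR_FMASK
    return (pq_value & required) == required


pixels = [
    CONTIGUOUS | UNSATURATED | LAND | CLEAR_ACCA | CLEAR_FMASK,  # clear pixel
    CONTIGUOUS | UNSATURATED | LAND | CLEAR_FMASK,  # ACCA flags cloud
    CONTIGUOUS | LAND | CLEAR_ACCA | CLEAR_FMASK,   # saturated in some band
]

mask_both = [is_good(p, "both") for p in pixels]
mask_fmask = [is_good(p, "fmask") for p in pixels]
print(mask_both)
print(mask_fmask)
```

Requiring both algorithms to agree gives the most conservative mask, while relying on a single algorithm retains more pixels at the risk of keeping some cloud.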

## Data format
### Band naming conventions
Bands are the wavelength ranges of the electromagnetic spectrum (EMS) that are detected by each satellite sensor.
Conventionally, the band number increases sequentially with the detected wavelength range for each sensor.
This means that as the number of bands has increased on more recent satellites, the detected regions of the EMS no longer correspond by band number between sensors.
To overcome this when comparing DEA datasets, the sensor bands are referred to by the EMS region that they detect.

The satellite band designations are re-named in the DEA as follows:

|Description|Measurement name (NBAR)|Measurement name (NBART)|LS5|LS7|LS8|Sen2|
|----|----|----|----|----|----|----|
|Coastal aerosol|nbar_coastal_aerosol|nbart_coastal_aerosol|||1|1|
|Blue|nbar_blue|nbart_blue|1|1|2|2|
|Green|nbar_green|nbart_green|2|2|3|3|
|Red|nbar_red|nbart_red|3|3|4|4|
|Nir (Near infra-red)|nbar_nir|nbart_nir|4|4|5|8, 8a|
|Swir1 (Short wave infra-red 1)|nbar_swir1|nbart_swir1|5|5|6|11|
|Swir2 (Short wave infra-red 2)|nbar_swir2|nbart_swir2|7|7|7|12|
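
The table can also be expressed as a small lookup, which makes it easy to see how one region name maps to different band numbers on different sensors. This is a sketch only: the dictionary covers LS7, LS8 and Sentinel 2 from the table, Landsat 5 is omitted for brevity, and `BAND_NUMBERS`, `measurement_name` and `sensor_band` are illustrative names rather than part of the DEA API. Measurement names in a real product should be checked against that product's own listing.

```python
# Sensor band numbers by EMS region, copied from the table above.
# None means the sensor has no band in that region.
BAND_NUMBERS = {
    "coastal_aerosol": {"LS7": None, "LS8": "1", "Sen2": "1"},
    "blue": {"LS7": "1", "LS8": "2", "Sen2": "2"},
    "green": {"LS7": "2", "LS8": "3", "Sen2": "3"},
    "red": {"LS7": "3", "LS8": "4", "Sen2": "4"},
    "nir": {"LS7": "4", "LS8": "5", "Sen2": "8, 8a"},
    "swir1": {"LS7": "5", "LS8": "6", "Sen2": "11"},
    "swir2": {"LS7": "7", "LS8": "7", "Sen2": "12"},
}


def measurement_name(region, correction="nbart"):
    """Build a measurement name such as 'nbart_green' from an EMS region."""
    if region not in BAND_NUMBERS:
        raise KeyError(f"Unknown region: {region}")
    if correction not in ("nbar", "nbart"):
        raise ValueError(f"Unknown correction: {correction}")
    return f"{correction}_{region}"


def sensor_band(region, sensor):
    """Return the sensor's own band number for the given EMS region."""
    return BAND_NUMBERS[region][sensor]


print(measurement_name("green"))
print(sensor_band("green", "LS7"), sensor_band("green", "LS8"))
```

Referring to `green` rather than a band number means the same name selects band 2 on Landsat 7 and band 3 on Landsat 8.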

### Geolocating data
* Briefly introduce and discuss how to locate data in the DEA. 
* Mention how scenes are identified in the Landsat and Sentinel programs
* Discuss how to geolocate your query (queries have not yet been introduced - save for a later notebook.) Here, introduce CRS/EPSG and/or any changes that will occur in collection 3. Are there any differences geolocating Landsat vs Sentinel data?)

## Recommended next steps
For more detailed information on the concepts introduced in this notebook, please see the [DEA User Guide](https://docs.dea.ga.gov.au/index.html#) and [Open Data Cube Manual](https://datacube-core.readthedocs.io/en/latest/).
For more information on the development of the DEA platform, please see [Dhu et al. 2017](https://doi.org/10.1080/20964471.2017.1402490).

To continue with the beginners guide, the following notebooks are designed to be worked through in the following order:
- [Introduction to Products and Measurements](link to notebook)
- [Introduction to Querying](link to notebook)
- [Introduction to Plotting](link to notebook)
- [Run a basic analysis](link to notebook)
- [Other training materials](link to notebook or folder)

Once you have worked through the beginners guide, you can join advanced users by exploring:
- [DEA datasets](https://github.com/GeoscienceAustralia/dea-notebooks/tree/develop/DEA_datasets)
- [Frequently used code](https://github.com/GeoscienceAustralia/dea-notebooks/tree/develop/Frequently_used_code)
- [Real world examples](https://github.com/GeoscienceAustralia/dea-notebooks/tree/develop/Real_world_examples)

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [Github](https://github.com/GeoscienceAustralia/dea-notebooks).

**Last modified:** September 2019

**Compatible `datacube` version:** 

    import datacube
    print(datacube.__version__)

Output: `1.7+43.gc873f3ea`


## Tags
Browse all available tags on the DEA User Guide's [Tags Index](https://docs.dea.ga.gov.au/genindex.html)