# Waterbody Area Mapping and Monitoring tool

**Please note that this repository and associated documentation are still being developed**

Write little abstract thing here...

## Contents

* [Background](#background)
* [Aims](#aims)
* [Intended Methods](#intendedMethods)
* [Sensitivity Analysis](#sensitivityAnalysis)
* [Waterbody Generation](#waterbodyGeneration)
* [Quality Assessment and Checking](#qualityAssessmentAndChecking)
* [Known Limitations](#knownLimitations)
* [Conclusion](#conclusion)
* [Code Specifications](#codeSpecifications)
* [Licensing and Warranty](#licensingAndWarranty)

<a id='background'></a>
## Background

[Digital Earth Australia (DEA)](https://www.ga.gov.au/dea) is a world-class digital infrastructure that prepares and uses satellite data to detect physical changes across Australia in unprecedented detail. DEA provides government, industry and individuals with the high-quality, analysis-ready data and tools required for policy and investment decision-making. The provision of analysis-ready data removes the requirements for users to correct and manage the petabytes of satellite data available over Australia, providing an API, which allows users to query the available data and extract only the information required. 

On average, the Australian Government invests around half a billion dollars a year in monitoring, protecting and enhancing Australia's land, coasts and oceans. DEA provides near real-time satellite information which can be used by government to better target these investments.

Water is among the most precious natural resources and is essential for the survival of life on Earth. Within Australia, the scarcity of water is both an economic and social issue, particularly across the Murray Darling Basin (MDB), where rights to water are heavily contested. Water is required not only for consumption but also for industries and environmental ecosystems to function and flourish. 

With the demand for water increasing, and the variability in rainfall and water storage levels, there is a need to better understand our water availability to ensure we are managing our water resources effectively and efficiently.  

[DEA's Water Observations from Space (WOfS) dataset](https://www.sciencedirect.com/science/article/pii/S0034425715301929) provides a water-classified image of Australia approximately every 16 days. By providing an interpreted product, WOfS makes water information accessible and readily available, removing the need for users to perform this analysis themselves. 

The Waterbody Area Mapping and Monitoring (WAMM) tool, developed by Geoscience Australia's DEA program in collaboration with the NSW Department of Industry Water (NSW DOI Water) and the Murray Darling Basin Authority (MDBA), is enabling the Australian government to monitor and manage water across the Murray Darling Basin (MDB).

This automated tool has mapped over 95,500 water bodies across NSW and the MDB. Each of these water bodies will be monitored on an ongoing basis as new satellite imagery is acquired, providing a vital line of evidence as to the use (and potential misuse) of water across the Basin. This tool will assist in compliance activities within the MDB, as well as provide information on the amount and location of water across Australia at any given point in time.

The WAMM service further packages up this information, mapping the locations of persistent waterbodies, and collating all available water observations for each mapped waterbody to produce a rich time history of water availability and use across the MDB. 

<a id='aims'></a>
## Aims

DEA, in collaboration with NSW DOI Water, aimed to develop a tool to map and monitor water bodies in the MDB, in order to:

* Develop a large-scale, operational monitoring tool to support the water compliance activities of NSW DOI and the MDBA.
* Monitor and focus efforts of compliance on areas that show signs of change.
* Provide support for environmental monitoring and conservation projects, including environmental watering activities.

Any tool developed to respond to these needs had to: 

* Process and simplify the full spatial and temporal coverage of satellite data over the Basin into an interpreted product that identifies water;
* Be able to see when waterbodies were filling and emptying;
* Be able to combine the information above with water licence information; and
* Provide a simple user interface.

The final WAMM interface has been made publicly available, ensuring that routine, robust and repeatable information about NSW waterbodies remains current and openly available.

<a id='intendedMethods'></a>
## Intended Methods

The Waterbody Area Mapping and Monitoring (WAMM) service provides insights into both the location and dynamics of water bodies across Australia. The [Water Observations from Space (WOfS) product](https://www.sciencedirect.com/science/article/pii/S0034425715301929) is used to map the locations of water across Australia. The analysis was done on a per pixel basis prior to combining adjoining pixels into polygon objects (i.e. drawing a boundary around them and considering them as a single object rather than a series of individual pixels).  

### WOfS temporal statistics

The temporal summary statistics were generated using the [datacube-stats](https://github.com/opendatacube/datacube-stats) package, which is a tool for generating large-scale temporal statistics on data within DEA. The parameters used in this calculation are provided [here](WOFSDamDetectionAllTimeNSWMDB.yaml).

The WOfS temporal statistics return three results within each NetCDF file:
- count_wet: Number of times a pixel was validly observed as wet.
- count_clear: Number of times a pixel was validly observed (wet + dry)
- frequency: Frequency of a wet observation (wet / clear)
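
As a minimal sketch of how the three statistics relate, the snippet below derives `frequency` from `count_wet` and `count_clear` for a handful of pixels. It is illustrative only: the function name `wet_frequency` and the sample counts are hypothetical, and the actual statistics are produced by datacube-stats rather than by this code.

```python
from typing import List, Optional


def wet_frequency(count_wet: List[int], count_clear: List[int]) -> List[Optional[float]]:
    """Return wet / clear for each pixel, or None where a pixel was never clearly observed."""
    frequency = []
    for wet, clear in zip(count_wet, count_clear):
        # A pixel with no clear observations has no defined frequency.
        frequency.append(wet / clear if clear > 0 else None)
    return frequency


# Hypothetical counts for four pixels (not real WOfS values).
count_wet = [0, 50, 120, 0]
count_clear = [400, 500, 480, 0]

frequency = wet_frequency(count_wet, count_clear)
print(frequency)
```

Returning `None` for never-observed pixels is a choice made for this example; it keeps "no data" distinct from "never wet".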

The NetCDF files each contain one Australian Albers tile of data, allowing us to split up the analysis on a tile-by-tile basis, prior to combining the data across the whole study region.

<a id='sensitivityAnalysis'></a>
## Sensitivity Analysis

A [sensitivity analysis](WAMMSensitivityAnalysis.ipynb) was conducted to determine the most suitable analysis date range and wetness thresholds for identifying persistent waterbodies. A compromise was sought between over- and under-identification of waterbodies, in order to classify waterbodies correctly while limiting false positives and false negatives. 

**Pixels were included in this analysis if, during the period from 1987 to 2018, they were classified as 'wet' by the WOfS product at least 10% of the time. A secondary threshold of 5% was used to supplement the spatial footprints of polygons identified using the 10% threshold. 
The rationale for these parameters is explored in detail in the [sensitivity analysis](WAMMSensitivityAnalysis.ipynb), and is summarised below:**

### Analysis date range

All available observations were included in this analysis (1987 to 2018), with the exception of the most recent months, which were excluded so that the analysis ends at the end of 2018. During the analysis period, three Landsat satellites were active: Landsat 5 (1987 - 2011), Landsat 7 (2000 - present), and Landsat 8 (2013 - present). Each satellite has a revisit period of ~16 days. 

Over the 31-year analysis period, across the three satellite platforms, each pixel across Australia was observed clearly and classified approximately XXXX times (average; min = XXXX, max = XXXX, standard deviation = XXX). Differences in the number of times each pixel was observed were caused by clouds (which meant the observation was not used),

<a id='wetness'></a>
### Wetness threshold

The second important parameter used in the generation of the waterbody polygons is the wetness threshold. This threshold determines which pixels are included in the polygon generation, and which are excluded. It needs to be sensitive enough to capture the locations of persistent waterbodies, but not so sensitive as to pick up false positives like flood irrigation, flood events or soggy areas in the landscape. 

The [sensitivity analysis](WAMMSensitivityAnalysis.ipynb) explored the influence of varying wetness thresholds on the overall results. It was determined that a maximum threshold of 10% was required (pixels were classified as 'wet' using the WOfS product at least 10% of the time between 1987 and 2018). Thresholds greater than 10% did not show enough sensitivity to more ephemeral waterbodies, which were a key target of this analysis. 

A threshold of 5% proved to be too sensitive and resulted in a large increase in the total number of identified polygons, due largely to a very high false positive rate. It did, however, better characterise the spatial footprint of waterbody targets: where the 10% threshold did a good job of locating waterbodies, the 5% threshold did a better job of characterising their spatial coverage. 

In the example below, showing Lake George near the NSW/ACT border, the 10% threshold (shown in blue) captures a smaller spatial footprint of Lake George than the 5% threshold (shown in red). The 5% threshold does a better job of capturing the full spatial footprint of the lake, i.e. its footprint is closer to how we would manually define the outline of Lake George. Using the 10% threshold to locate waterbodies and the 5% threshold to delineate them therefore offers a compromise between better spatial characterisation of footprints and the false positives introduced by the lower 5% threshold. 

![Hybrid Threshold effect on Lake George, NSW](HybridThreshold.JPG "In this example showing Lake George near the NSW/ACT border, the 10% threshold shown in blue, captures a smaller spatial footprint of Lake George than the 5% threshold in red. Here we can see that the 5% threshold does a better job of capturing the full spatial footprint of Lake George")

The area below, in south central NSW, is one where flood irrigation is common. The yellow polygons are those generated from the 5% threshold. The clear presence of 'waterbody' polygons in flood-irrigated fields is a good example of the high sensitivity of the 5% threshold when applied blindly. The hybrid approach (shown in red) retains only those 5% threshold polygons that intersect a polygon identified using the 10% threshold, thereby allowing the 10% threshold to identify the location of waterbodies and the 5% threshold to characterise their spatial footprint. 

![Threshold sensitivity comparison, south central NSW](ThresholdCompare.JPG "This example in south central NSW is an area where flood irrigation is common. The yellow polygons are those generated from the 5% threshold. The clear presence of 'waterbody' polygons in flood irrigated fields is a good example of the high sensitivity of the 5% threshold, when applied blindly")
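
The hybrid selection described above can be sketched at the pixel level: regions that pass the 5% threshold are kept only if they contain at least one pixel that also passes the 10% threshold. This is a simplified illustration rather than the notebook's implementation, which works on polygons; the function name `hybrid_mask`, the 4-connected neighbourhood, and the sample grid are all assumptions made for the example.

```python
from collections import deque
from typing import List

Grid = List[List[float]]


def hybrid_mask(frequency: Grid, locate: float = 0.10, extent: float = 0.05) -> List[List[bool]]:
    """Keep connected regions with frequency >= extent that contain a pixel >= locate."""
    rows, cols = len(frequency), len(frequency[0])
    keep = [[False] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]

    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or frequency[r0][c0] < extent:
                continue
            # Flood-fill one 4-connected region of pixels passing the lower threshold.
            region, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and not seen[nr][nc] and frequency[nr][nc] >= extent:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Retain the region only if the higher threshold located a waterbody in it.
            if any(frequency[r][c] >= locate for r, c in region):
                for r, c in region:
                    keep[r][c] = True
    return keep


# Hypothetical wet frequencies: a waterbody on the left, a flood-irrigated field on the right.
frequency = [
    [0.06, 0.30, 0.00, 0.07],
    [0.07, 0.12, 0.00, 0.06],
    [0.00, 0.00, 0.00, 0.00],
]
mask = hybrid_mask(frequency)
```

In this toy grid the left-hand region is retained in full, including its 5%-only fringe, while the right-hand region is dropped because no pixel in it reaches the 10% threshold.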

<a id='waterbodyGeneration'></a>
## Waterbody Generation

[The code that generates the waterbody polygon dataset](GenerateWaterBodyPolygons.ipynb) uses the following workflow:
* Generate a list of netCDF files within a specified folder location
* Open each netCDF file and:
    * Keep only pixels observed at least x times
    * Keep only pixels identified as wet at least x% of the time
        * Here the code can take in two wetness thresholds, to produce two initial temporary polygon files. 
    * Convert the raster data into polygons
* Append the polygon set to a temporary shapefile
* Remove artificial polygon borders created at tile boundaries by merging polygons that intersect across Albers Tile boundaries
* Filter the combined polygon dataset (note that this step happens after the merging of Albers tile boundary polygons to ensure that artifacts are not created by part of a polygon being filtered out, while the remainder of the polygon that sits on a separate tile is treated differently).
    * Filter the polygons based on size
    * Remove polygons that intersect with Australia's coastline and estuaries
    * Remove erroneous 'water' polygons within high-rise CBD areas
    * Combine the two generated wetness thresholds (optional)
    * Optional filtering for proximity to major rivers (as identified by the Geofabric dataset)
* Save out the final polygon set to a shapefile
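
The per-tile pixel selection in the workflow above can be sketched as follows. This is an illustration only, not the notebook code: `select_pixels`, the sample counts, and the parameter values passed to it (a minimum of 100 valid observations, and thresholds of 0.10 and 0.05) are hypothetical stand-ins for the 'x' values in the list.

```python
from typing import List


def select_pixels(count_wet: List[List[int]], count_clear: List[List[int]],
                  min_valid_observations: int, wet_threshold: float) -> List[List[bool]]:
    """Flag pixels observed often enough and wet at least wet_threshold of the time."""
    selected = []
    for wet_row, clear_row in zip(count_wet, count_clear):
        row = []
        for wet, clear in zip(wet_row, clear_row):
            observed_enough = clear >= min_valid_observations
            # Compare counts directly (wet >= threshold * clear) to avoid dividing by zero.
            wet_enough = clear > 0 and wet >= wet_threshold * clear
            row.append(observed_enough and wet_enough)
        selected.append(row)
    return selected


# Hypothetical 2 x 3 tile of counts (not real WOfS values).
count_wet = [[60, 5, 30], [0, 45, 2]]
count_clear = [[500, 50, 400], [450, 300, 10]]

# Two wetness thresholds give the two temporary masks described above.
mask_10 = select_pixels(count_wet, count_clear, min_valid_observations=100, wet_threshold=0.10)
mask_05 = select_pixels(count_wet, count_clear, min_valid_observations=100, wet_threshold=0.05)
```

Each mask would then be converted to polygons and appended to its own temporary file before the tile-boundary merge and filtering steps.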

### Required inputs
* NetCDF files with WOfS outputs that will be used to define the persistent water body polygons
    * Variable name: `TileFolder`
    * This folder can be either a custom extraction of datacube-stats (as was done here), or you can choose to use the WOfS summary tiles for all of Australia (see [here for further information](GenerateWaterBodyPolygons.ipynb#Tiles)).
* A coastline polygon to filter out polygons generated from ocean pixels.
    * Variable name: `LandSeaMaskFile`
    * Here we use the [GEODATA COAST 100K 2004 dataset](https://data.gov.au/dataset/geodata-coast-100k-2004)
* Estuary data layer to supplement the coastline polygon, which does not identify estuaries as 'ocean'
    * Variable name: `EstuariesFile`
    * Here we generated this layer from the [OzCoasts Geomorphic habitat datasets](http://www.ozcoasts.gov.au/search_data/datasets.jsp). Each of the state datasets was downloaded, and only `channel` and `Central Basin` features were retained. These correspond to the locations of open water (including large river channels) within estuary systems (see [here for further information](GenerateWaterBodyPolygons.ipynb#Estuaries)).
* Urban high rise polygon dataset
    * Variable name: `UrbanMaskFile`
    * WOfS has a known limitation, where deep shadows thrown by tall CBD buildings are misclassified as water. This means that our algorithm is defining 'water bodies' around these misclassified shadows in capital cities. [See here](GenerateWaterBodyPolygons.ipynb#Urban) for a discussion of how an urban mask is produced. 
    
### Optional inputs
* River line dataset for filtering out polygons comprised of river segments.
    * Variable name: `MajorRiversDataset`
    * The option to filter out major rivers is provided, and so this dataset is optional if `FilterOutRivers = False`.
    * Here we use the [Bureau of Meteorology's Geofabric v 3.0.5 Beta (Surface Hydrology Network)](ftp://ftp.bom.gov.au/anon/home/geofabric/), filtered to only keep features tagged as `major rivers`. 
    * There are some identified issues with this data layer that make filtering with it inconsistent (see [the discussion here](GenerateWaterBodyPolygons.ipynb#rivers)).
    * River filtering is therefore turned off (`FilterOutRivers = False`) during the production of the water bodies shapefile. 
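
To show how the required and optional inputs fit together, the sketch below collects them in a dictionary and checks that the river dataset is requested only when it is needed. The variable names come from the lists above; the file paths, the `check_inputs` helper, and the dictionary layout are hypothetical and are not taken from the notebook.

```python
from typing import Dict, List, Union

Inputs = Dict[str, Union[str, bool, None]]

REQUIRED = ["TileFolder", "LandSeaMaskFile", "EstuariesFile", "UrbanMaskFile"]


def check_inputs(inputs: Inputs) -> List[str]:
    """Return the names of inputs that are missing for the chosen options."""
    missing = [name for name in REQUIRED if not inputs.get(name)]
    # The river dataset is only needed when river filtering is switched on.
    if inputs.get("FilterOutRivers") and not inputs.get("MajorRiversDataset"):
        missing.append("MajorRiversDataset")
    return missing


# Placeholder paths for illustration only.
inputs: Inputs = {
    "TileFolder": "/path/to/wofs_stats_tiles/",
    "LandSeaMaskFile": "/path/to/coastline.shp",
    "EstuariesFile": "/path/to/estuaries.shp",
    "UrbanMaskFile": "/path/to/urban_mask.shp",
    "FilterOutRivers": False,
    "MajorRiversDataset": None,
}

missing = check_inputs(inputs)
missing_with_rivers = check_inputs({**inputs, "FilterOutRivers": True})
```

With `FilterOutRivers = False` nothing is reported as missing; switching it on without supplying `MajorRiversDataset` flags that one input.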

<a id='qualityAssessmentAndChecking'></a>
## Quality Assessment and Checking

<a id='knownLimitations'></a>
## Known Limitations

<a id='conclusion'></a>
## Conclusion

<a id='codeSpecifications'></a>
## Code Specifications

This product documentation has been written in the form of a series of [Jupyter Notebooks](https://jupyter.org/), to facilitate the integration of code and discussion. 

The code has been run, and the resulting product generated, using Python 3 on the [Virtual Desktop Infrastructure (VDI)](http://nci.org.au/services/vdi/) and the [National Computational Infrastructure](https://nci.org.au/). 

The DEA environment modules are loaded into the VDI session prior to running the code. We use `/dea-env/20190329` and `/dea/20190329` to generate this product. The accompanying [requirements file](WaterbodyAreaMappingandMonitoringRequirements.txt) provides a complete list of Python libraries loaded into the DEA environments. These environments have been used to run all the associated notebooks presented throughout this documentation. 

<a id='licensingAndWarranty'></a>
## Licensing and Warranty

This product is provided under a [Creative Commons Attribution 4.0 International Licence](https://creativecommons.org/licenses/by/4.0/legalcode).

Geoscience Australia has tried to make the information in this product as accurate as possible.
However, it does not guarantee that the information is totally accurate or complete. Therefore, you
should not solely rely on this information when making a commercial decision.