# DEA Coastlines raster generation <img align="right" src="https://github.com/GeoscienceAustralia/dea-notebooks/raw/develop/Supplementary_data/dea_logo.jpg">

This notebook generates the raster data used by DEA Coastlines. The workflow is to:

* Load a stack of all available Landsat 5, 7 and 8 satellite imagery for a location using [ODC Virtual Products](https://docs.dea.ga.gov.au/notebooks/Frequently_used_code/Virtual_products.html)
* Convert each satellite image into a remote sensing water index, the Modified Normalised Difference Water Index (MNDWI)
* For each satellite image, model ocean tides onto a 2 x 2 km grid based on the exact time of image acquisition
* Interpolate tide heights across the spatial extent of the image stack
* Mask out high and low tide pixels by removing all observations acquired outside the central 50 percent of the observed tidal range, centered over mean sea level
* Combine tidally-masked data into annual median composites from 1988 to the present representing the coastline at approximately mean sea level
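
The water index step in the list above can be sketched as follows. This is a minimal illustration that assumes the standard MNDWI definition, (green - SWIR1) / (green + SWIR1). In the notebook the index is produced by the `ls_nbart_mndwi` virtual product, so the function name `mndwi` and the reflectance values below are hypothetical.

```python
def mndwi(green, swir1):
    """Return MNDWI for one pixel, or None where the index is undefined."""
    total = green + swir1
    if total == 0:
        return None  # avoid division by zero on empty pixels
    return (green - swir1) / total


# Hypothetical surface reflectance values, for illustration only
water = mndwi(green=0.06, swir1=0.01)  # water absorbs strongly in SWIR
land = mndwi(green=0.08, swir1=0.25)   # dry land is bright in SWIR

print(f'water: {water:.2f}, land: {land:.2f}')
```

Positive values indicate water and negative values indicate land, which is what lets the later steps locate the land-water boundary.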

This is an interactive version of the code intended for prototyping; to run this analysis at scale, use the [`deacoastlines_generation.py`](deacoastlines_generation.py) Python script.

**Compatibility:**
```
module use /g/data/v10/public/modules/modulefiles
module load dea/20200713
pip install --user ruptures
pip install --user git+https://github.com/mattijn/topojson/
pip install --user --upgrade --extra-index-url="https://packages.dea.ga.gov.au" odc-algo
pip install --upgrade dask==2021.1.1
```
---

## Getting started


### Load packages

First we import the required Python packages, connect to the datacube database, and start a local Dask cluster. The start time is also recorded so that the total run time can be reported at the end of the notebook.

In [None]:
%matplotlib inline
%load_ext line_profiler
%load_ext autoreload
%autoreload 2

import deacoastlines_generation as deacl_gen

import os
import sys
import datacube
import geopandas as gpd
import multiprocessing
import xarray as xr
from functools import partial
import odc.algo
import numpy as np
from datacube.utils.geometry import Geometry

dc = datacube.Datacube(app='DEA_CoastLines')

from datacube.utils.dask import start_local_dask
client = start_local_dask(mem_safety_margin='3gb')
display(client)

import datetime
start_time = datetime.datetime.now()

### Load supplementary data

In [2]:
study_area = 6205
raster_version = 'v1.1.0'
vector_version = 'v1.1.1'

# Tide points are used to model tides across the extent of the satellite data
points_gdf = gpd.read_file('input_data/tide_points_coastal.geojson')

# Albers grid cells used to process the analysis
gridcell_gdf = (gpd.read_file('input_data/50km_albers_grid_clipped.geojson')
                .to_crs(epsg=4326)
                .set_index('id')
                .loc[[study_area]])

## Loading data
### Create query


In [32]:
# Build the datacube query from the study area grid cell, buffered by
# 0.05 degrees (the grid cell was reprojected to EPSG:4326 above)
geopoly = Geometry(gridcell_gdf.iloc[0].geometry, crs=gridcell_gdf.crs)
query = {'geopolygon': geopoly.buffer(0.05),
         'time': ('1987', '2020'),
         'cloud_cover': [0, 90],
         'dask_chunks': {'time': 1, 'x': 2000, 'y': 2000}}


### Load virtual product

In [34]:
# Load virtual product    
ds = deacl_gen.load_mndwi(dc, 
                          query, 
                          yaml_path='deacoastlines_virtual_products_v1.0.0.yaml',
                          product_name='ls_nbart_mndwi')
ds

Output: summary of the Dask arrays in the loaded dataset.

| Type | Array size | Chunk size | Array shape | Chunk shape | Tasks | Chunks |
|---|---|---|---|---|---|---|
| uint8 | 4.58 GB | 2.69 MB | (1697, 1347, 2002) | (1, 1347, 2000) | 12655 | 3394 |
| float32 | 18.31 GB | 10.78 MB | (1697, 1347, 2002) | (1, 1347, 2000) | 134844 | 3394 |


### Clean cloud mask

In [5]:
# Rechunk if the trailing chunk along x or y would be 10 pixels or fewer
# (this check also fires when a dimension divides evenly into 2000)
if ((len(ds.x) % 2000) <= 10) or ((len(ds.y) % 2000) <= 10):
    ds = ds.chunk({'x': 3200, 'y': 3200})

# Extract boolean mask
mask = odc.algo.enum_to_bool(ds.fmask, 
                             categories=['nodata', 'cloud', 'shadow', 'snow'])

# Close mask to remove small holes in cloud, open mask to 
# remove narrow false positive cloud, then dilate
mask = odc.algo.binary_closing(mask, 2)
mask_cleaned = odc.algo.mask_cleanup(mask, r=(10, 5))

# Add new mask as nodata pixels
ds = odc.algo.erase_bad(ds, mask_cleaned, nodata=np.nan)
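
The rechunking test at the top of this cell is easier to follow with concrete numbers. The sketch below is a plain-Python illustration, not part of the workflow: `trailing_chunk` is a hypothetical helper, and the dimension sizes assume that the array shape reported when loading the data is ordered (time, y, x).

```python
def trailing_chunk(length, chunk_size):
    """Size of the last chunk when `length` is split into `chunk_size` blocks."""
    remainder = length % chunk_size
    # A remainder of zero means the final chunk is a full-sized one
    return remainder if remainder else min(length, chunk_size)


# Assumed dimension sizes: y = 1347, x = 2002
for name, length in {'y': 1347, 'x': 2002}.items():
    print(name, trailing_chunk(length, chunk_size=2000))
```

With 2000-pixel chunks, the x dimension ends in a sliver only 2 pixels wide, which is the case the rechunk to 3200 pixels avoids.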


## Tidal modelling
### Model tides at point locations

In [None]:
tidepoints_gdf = deacl_gen.model_tides(ds, points_gdf)
tidepoints_gdf.plot()

### Interpolate tides into each satellite timestep

In [None]:
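# Leave one core free, but always use at least one worker
n_workers = max(multiprocessing.cpu_count() - 1, 1)
print(f'Parallelising {n_workers} processes')

# Interpolate tides for each timestep in parallel; the context manager
# shuts the pool down once all results have been returned
with multiprocessing.Pool(n_workers) as pool:
    out_list = pool.map(partial(deacl_gen.interpolate_tide,
                                tidepoints_gdf=tidepoints_gdf,
                                factor=50),
                        iterable=[(group.x.values,
                                   group.y.values,
                                   group.time.values)
                                  for (i, group) in ds.groupby('time')])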

# Combine to match the original dataset
ds['tide_m'] = xr.concat(out_list, dim=ds['time'])

In [None]:
import matplotlib.pyplot as plt

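# Plot the interpolated tide surface for one timestep, with the modelled
# tide points for the same date overlaid for comparison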
ds_i = ds['tide_m'].isel(time=18).compute()
ds_i.plot.imshow(robust=True, 
                 cmap='viridis', 
                 size=12, 
                 vmin=ds_i.min().item(), 
                 vmax=ds_i.max().item())
tidepoints_gdf.loc[str(ds_i.time.values)[0:10]].plot(ax=plt.gca(), 
                                                     column='tide_m', 
                                                     cmap='viridis', 
                                                     markersize=100,
                                                     edgecolor='black',
                                                     vmin=ds_i.min().item(),
                                                     vmax=ds_i.max().item())


In [None]:
# Determine tide cutoffs: keep observations within the central 50 percent
# of the observed tidal range (25 percent either side of 0 m)
tide_cutoff_buff = (
    (ds['tide_m'].max(dim='time') - ds['tide_m'].min(dim='time')) * 0.25)
tide_cutoff_min = 0.0 - tide_cutoff_buff
tide_cutoff_max = 0.0 + tide_cutoff_buff
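
As a worked example of the cutoff arithmetic, the sketch below applies the same calculation to a short list of tide heights for a single pixel. The helper name `tide_cutoffs`, the sample heights, and the use of inclusive bounds when filtering are assumptions made for illustration; the notebook itself computes the cutoffs per pixel with xarray.

```python
def tide_cutoffs(tides, fraction=0.25):
    """Return (min, max) tide heights to keep, centered on 0 m."""
    buffer = (max(tides) - min(tides)) * fraction
    return 0.0 - buffer, 0.0 + buffer


# Hypothetical modelled tide heights in metres
tides = [-1.2, -0.5, -0.1, 0.3, 0.9, 1.6]
cutoff_min, cutoff_max = tide_cutoffs(tides)

# Keep only observations that fall inside the cutoffs (inclusive here)
kept = [t for t in tides if cutoff_min <= t <= cutoff_max]
print(cutoff_min, cutoff_max, kept)
```

Here the observed range is 2.8 m, so the buffer is about 0.7 m and only the three mid-tide observations are retained.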

## Generate yearly composites

In [None]:
# If output folder doesn't exist, create it
output_dir = f'output_data/{study_area}_{raster_version}'
os.makedirs(output_dir, exist_ok=True)

# Iterate through each year and export annual and 3-year gapfill composites
deacl_gen.export_annual_gapfill(ds, 
                                output_dir, 
                                tide_cutoff_min, 
                                tide_cutoff_max)
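
The idea behind an annual median composite can be sketched for a single pixel as follows. This is not the implementation of `deacl_gen.export_annual_gapfill`, whose internals are not shown in this notebook, and it does not cover the 3-year gapfill step. The function `annual_median`, the sample records, and the inclusive tide bounds are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def annual_median(observations, tide_min, tide_max):
    """Median index value per year, using only mid-tide observations."""
    by_year = defaultdict(list)
    for year, tide_m, mndwi in observations:
        # Discard observations taken outside the tide cutoffs
        if tide_min <= tide_m <= tide_max:
            by_year[year].append(mndwi)
    return {year: median(values) for year, values in sorted(by_year.items())}


# Hypothetical (year, tide height in m, MNDWI) records for a single pixel
observations = [
    (2019, -0.2, 0.10), (2019, 0.4, 0.30), (2019, 1.5, 0.80),
    (2020, 0.1, -0.20), (2020, -0.6, -0.10), (2020, -1.1, -0.70),
]
composites = annual_median(observations, tide_min=-0.7, tide_max=0.7)
print(composites)
```

Taking the median rather than the mean makes each yearly value less sensitive to the occasional unmasked cloud or noisy observation.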

In [None]:
# Use total_seconds() so that runs longer than a day are reported correctly
print(f'{(datetime.datetime.now() - start_time).total_seconds() / 60:.1f} minutes')

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** For assistance with any of the Python code or Jupyter Notebooks in this repository, please post a [GitHub issue](https://github.com/GeoscienceAustralia/DEACoastLines/issues/new). For questions or more information about this product, sign up to the [Open Data Cube Slack](https://join.slack.com/t/opendatacube/shared_invite/zt-d6hu7l35-CGDhSxiSmTwacKNuXWFUkg) and post in the [`#dea-coastlines`](https://app.slack.com/client/T0L4V0TFT/C018X6J9HLY/details/) channel.

**Last modified:** July 2021