# Vietnam Coastlines Combined


* Load a stack of all available Landsat 5, 7, 8 and 9 satellite imagery for a location
* Convert each satellite image into a remote sensing water index (MNDWI)
* For each satellite image, model ocean tides into a grid based on the exact time of image acquisition
* Interpolate tide heights into the spatial extent of the image stack using the [FES2014 global tide model](https://github.com/GeoscienceAustralia/dea-coastlines/wiki/Setting-up-tidal-models-for-DEA-Coastlines)
* Mask out high and low tide pixels by removing all observations acquired outside of 50 percent of the observed tidal range centered over mean sea level
* Combine tidally-masked data into annual median composites representing the typical position of the coastline at approximately mean sea level each year
* Apply morphological extraction algorithms to mask annual median composite rasters to a valid coastal region
* Extract waterline vectors using subpixel waterline extraction ([Bishop-Taylor et al. 2019b](https://doi.org/10.3390/rs11242984))
* Compute rates of coastal change at every 30 m along the coastline using linear regression
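
As a minimal sketch of the water index step: MNDWI contrasts green and shortwave-infrared (SWIR1) reflectance as (green - SWIR1) / (green + SWIR1). Water absorbs strongly in the SWIR, so water pixels tend to be positive and land negative, which makes 0 (the `index_threshold` used below) a natural water/land boundary. The helper `compute_mndwi` is hypothetical and assumes reflectance already scaled to 0-1; it is not the `coastlines` implementation.

```python
import numpy as np


def compute_mndwi(green, swir1):
    """Modified Normalised Difference Water Index (hypothetical helper).

    Inputs are surface reflectance arrays scaled to 0-1. Pixels where
    green + swir1 == 0 are returned as NaN.
    """
    green = np.asarray(green, dtype="float64")
    swir1 = np.asarray(swir1, dtype="float64")
    denom = green + swir1
    # Suppress divide-by-zero warnings; those pixels are set to NaN below
    with np.errstate(divide="ignore", invalid="ignore"):
        mndwi = (green - swir1) / denom
    return np.where(denom == 0, np.nan, mndwi)


# A water-like pixel (low SWIR1), a land-like pixel (high SWIR1), and a no-data pixel
green = np.array([0.08, 0.10, 0.0])
swir1 = np.array([0.02, 0.25, 0.0])
mndwi = compute_mndwi(green, swir1)
print(mndwi)  # water > 0, land < 0, no-data is NaN
```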

This is an interactive version of the code intended for prototyping; to run this analysis at scale, use the [command line tools](DEACoastlines_generation_CLI.ipynb).

---

## Getting started

Set the working directory to the top level of the repository to ensure links and relative paths work correctly:

In [None]:
cd ..

### Load packages

First we import the required Python packages, then we start a local Dask client and configure access to the Landsat data on S3.

In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
import os

os.environ["USE_PYGEOS"] = "0"

# Load DEA Coastlines and DEA tools code
import coastlines.raster
import coastlines.utils
import coastlines.vector

from coastlines.utils import get_study_site_geometry
from coastlines.combined import (
    load_and_mask_data_with_stac,
    export_results,
    filter_by_tides,
    generate_yearly_composites,
    mask_pixels_by_tide,
)
from coastlines.vector import contours_preprocess

# Load other libraries
import shutil
from pathlib import Path
import geopandas as gpd
import xarray as xr  # used to reload the cached Zarr output below
from datacube.utils.dask import start_local_dask
from dea_tools.coastal import pixel_tides
from dea_tools.spatial import subpixel_contours
from odc.stac import configure_s3_access

# Keep the messaging clean. `message` is a regular expression matched against
# the start of the warning text, so it must not include the warning category.
import warnings
warnings.filterwarnings("once", category=RuntimeWarning, message="invalid value encountered in divide")
warnings.filterwarnings("once", message="Dataset has no geotransform, gcps, or rpcs")
warnings.filterwarnings("always", category=UserWarning, message="Geometry is in a geographic CRS")

In [None]:
# Restart the kernel before re-running this!

# Create local dask client for parallelisation
dask_client = start_local_dask(
    n_workers=8, threads_per_worker=4, mem_safety_margin="2GB"
)

# Configure S3 access including request payer
_ = configure_s3_access(requester_pays=True, cloud_defaults=True)

print(dask_client.dashboard_link.replace("/user", "https://hub.asia.easi-eo.solutions/user"))

## Setup


### Set analysis parameters

In [None]:
# Study area selection
# study_area = "13,45" # North
study_area = "9,19"    # South west
# study_area = "18,32" # Central

# Known issues
# Islands with not enough data (apparently around 200 scenes each) fail with errors like:
# ERROR Study area 34,20: Failed to run process with error
# "None of [Index([2021], dtype='int64', name='year')] are in the [index]"
# study_area = "29,15"
# study_area = "33,17"
# study_area = "34,19"
# study_area = "35,19"
# study_area = "27,20"
# study_area = "27,21"
# study_area = "31,21"
# study_area = "29,22"
# study_area = "33,21"  # this one has lots

# Other errors:
# ERROR Study area 34,21: Failed to run process with error Can't load empty sequence
# ERROR Study area 14,22: Failed to run process with error numpy.nanmin raises on a.shape[axis]==0; So Bottleneck too.

# Config
version = "testing"
start_year = 2002
end_year = 2022
baseline_year = 2021
water_index = "mndwi"
index_threshold = 0.0

config_path = "configs/vietnam_coastlines_config_development.yaml"

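# Tide data and config (Path.home() expands to the user's home directory,
# unlike Path("~"), which is kept as a literal "~")
home = Path.home()
tide_data_location = f"{home}/tide_models"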
tide_centre = 0.0

# Output config
output_dir = Path(f"data/interim/vector/{version}/{study_area}_{version}")
output_dir.mkdir(exist_ok=True, parents=True)
output_cache_zarr = output_dir / f"{study_area.replace(',', '_')}_{version}_combined_ds.zarr"

# Load analysis params from config file
config = coastlines.utils.load_config(config_path=config_path)

# Load the geometry from the grid used for the location
geometry = get_study_site_geometry(config["Input files"]["grid_path"], study_area)

# BBOX and other query parameters
bbox = list(geometry.buffer(0.05).bounds.values[0])

# Query used to search the USGS STAC API for scenes to load; the date range
# is padded by one year on each side of the analysis period
query = {
    "bbox": bbox,
    "datetime": (str(start_year - 1), str(end_year + 1)),
}


In [None]:
# View the AoI
geometry.explore(
    tiles="https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}",
    attr="Esri",
    name="Esri Satellite",
)

## Loading data

### Search a STAC API and load data
This uses the spatial and temporal query defined above to search for Landsat satellite data, then loads and masks the matching scenes using Dask.


In [None]:
# Load the data using dask
ds, items = load_and_mask_data_with_stac(config, query)

print(f"Found {len(items)} items")

ds

## Tidal modelling


### Interpolate tides into each satellite timestep
For each satellite timestep, model tide heights into a low-resolution 5 x 5 km grid (matching the resolution of the FES2014 tidal model), then reproject the modelled tides into the spatial extent of our satellite image. The result is added as a new variable in our satellite dataset, allowing each satellite pixel to be analysed and filtered/masked based on the tide height at the exact moment of satellite image acquisition.
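
The resampling idea can be sketched without a tide model: given tide heights already modelled on a coarse 5 km grid, bilinear interpolation spreads them onto 30 m pixel centres. This sketch uses plain NumPy and synthetic tide values; `regrid_bilinear` is a hypothetical helper, and in practice functions such as `pixel_tides` from `dea_tools.coastal` (imported above) handle both the modelling and the reprojection.

```python
import numpy as np


def regrid_bilinear(coarse, coarse_y, coarse_x, fine_y, fine_x):
    """Bilinearly resample a (time, y, x) array onto new y/x coordinates.

    Hypothetical helper. Bilinear interpolation on a rectilinear grid is
    separable, so 1D linear interpolation is applied along x, then along y.
    Coordinates must be increasing; np.interp clamps points outside the
    coarse extent to the edge values.
    """
    n_time = coarse.shape[0]
    # Step 1: interpolate along x for every timestep and coarse row
    along_x = np.empty((n_time, len(coarse_y), len(fine_x)))
    for t in range(n_time):
        for i in range(len(coarse_y)):
            along_x[t, i] = np.interp(fine_x, coarse_x, coarse[t, i])
    # Step 2: interpolate along y for every timestep and fine column
    fine = np.empty((n_time, len(fine_y), len(fine_x)))
    for t in range(n_time):
        for j in range(len(fine_x)):
            fine[t, :, j] = np.interp(fine_y, coarse_y, along_x[t, :, j])
    return fine


# Synthetic tides on a 5 km grid (projected metres): a per-timestep offset
# plus a gentle planar gradient across the area
coarse_x = np.arange(0.0, 20_001.0, 5_000.0)
coarse_y = np.arange(0.0, 20_001.0, 5_000.0)
offsets = np.array([-0.5, 0.0, 0.8])  # metres, one per satellite timestep
yy, xx = np.meshgrid(coarse_y, coarse_x, indexing="ij")
coarse = offsets[:, None, None] + 1e-5 * xx - 2e-5 * yy

# 30 m pixel centres covering the same extent
fine_x = np.arange(15.0, 20_000.0, 30.0)
fine_y = np.arange(15.0, 20_000.0, 30.0)
fine = regrid_bilinear(coarse, coarse_y, coarse_x, fine_y, fine_x)
print(fine.shape)  # (3, 667, 667)
```

Because bilinear interpolation reproduces a planar surface exactly, comparing the output against the plane is a simple sanity check. Real raster y coordinates usually decrease (north-up), so they would need to be flipped before using `np.interp`.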

In [None]:
filtered = filter_by_tides(ds, tide_data_location, tide_centre)

print(f"Reduced from {len(ds.time)} to {len(filtered.time)} timesteps")

## Generate yearly composites in memory
Generate tidally-masked MNDWI median composites for each year, and three-yearly composites used to gapfill areas with poor data coverage. Optionally, the tide-filtered data can first be loaded into memory.
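
A minimal NumPy sketch of annual median compositing with a three-year gapfill, assuming a `(time, y, x)` stack with NaN for masked pixels. `median_composite` is hypothetical, and `generate_yearly_composites` may differ in detail (for example, in how the gapfill window is chosen).

```python
import warnings

import numpy as np


def median_composite(stack, years, year, half_window=0):
    """Per-pixel median of all observations within year +/- half_window.

    Hypothetical helper. stack is (time, y, x) with NaN for masked pixels;
    years is a (time,) array of acquisition years. Pixels with no valid
    observations in the window stay NaN.
    """
    in_window = np.abs(years - year) <= half_window
    with warnings.catch_warnings():
        # np.nanmedian warns on all-NaN pixels; NaN is the desired result there
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(stack[in_window], axis=0)


years = np.array([2019, 2019, 2020, 2020, 2021])
# Two pixels: the second has no valid 2020 observations
stack = np.array([
    [[0.2, np.nan]],
    [[0.4, np.nan]],
    [[-0.1, np.nan]],
    [[0.1, np.nan]],
    [[0.3, 0.5]],
])

annual_2020 = median_composite(stack, years, 2020)
three_year_2020 = median_composite(stack, years, 2020, half_window=1)
# Gapfill: use the three-year composite only where the annual one is empty
gapfilled_2020 = np.where(np.isnan(annual_2020), three_year_2020, annual_2020)
print(gapfilled_2020)
```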

In [None]:
# Optionally load the per-scene (filtered) dataset into memory here, or
# instead compute the combined dataset further down. Loading here uses more
# memory but is faster; computing later builds a big, complex Dask graph
# but saves a fair bit of memory.
filtered = filtered.compute()
filtered

## Tidal masking

Here we mask out extreme-tide pixels on a per-pixel basis. This is done on data already loaded into memory, which keeps memory use and the Dask graph manageable.
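
One way to read the masking rule from the overview ("50 percent of the observed tidal range centered over mean sea level") is: per pixel, keep an observation only if its tide height lies within a quarter of that pixel's tide range either side of `tide_centre`. The sketch below implements that reading with NumPy; `mask_extreme_tides` is hypothetical and may not match `mask_pixels_by_tide` exactly.

```python
import numpy as np


def mask_extreme_tides(index, tides, tide_centre=0.0, range_fraction=0.5):
    """Mask observations outside a central fraction of each pixel's tide range.

    Hypothetical helper. index and tides are (time, y, x) arrays. An
    observation is kept when |tide - tide_centre| <= range_fraction / 2 * range,
    where range = max - min of that pixel's tide heights over time.
    """
    tide_range = np.nanmax(tides, axis=0) - np.nanmin(tides, axis=0)
    half_width = 0.5 * range_fraction * tide_range
    keep = np.abs(tides - tide_centre) <= half_width
    return np.where(keep, index, np.nan)


# One pixel observed at five tide heights (metres) spanning a 4 m range
tides = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]).reshape(5, 1, 1)
mndwi = np.array([0.5, 0.3, 0.1, -0.1, -0.3]).reshape(5, 1, 1)
masked = mask_extreme_tides(mndwi, tides, tide_centre=0.0)
print(masked.ravel())  # the -2 m and +2 m observations become NaN
```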

In [None]:
# Tidal masking
pixel_tide_masked = mask_pixels_by_tide(filtered, tide_data_location, tide_centre)
pixel_tide_masked

In [None]:
# Plot the percentage of valid observations retained after per-pixel tide
# masking, for each year (count() already ignores NaN values)
year_filter = filtered.groupby("time.year").count(dim=["time", "x", "y"])
year_masked = pixel_tide_masked.groupby("time.year").count(dim=["time", "x", "y"])
pct_retained = (year_masked / year_filter) * 100
pct_retained.mndwi.plot.line(x="year")

In [None]:
# Create a yearly dataset, loaded into memory. This takes a long time!
combined_ds = generate_yearly_composites(pixel_tide_masked, start_year, end_year)

# Alternatively, compute the combined dataset here instead of loading
# `filtered` above (comment out that step first). This is slower, but uses
# less memory.
# combined_ds = combined_ds.compute()
combined_ds

In [None]:
combined_ds.year

In [None]:
# Uncomment the below to save data

# if output_cache_zarr.exists():
#     print(f"Folder {output_cache_zarr} already exists. Deleting...")
#     shutil.rmtree(output_cache_zarr)
# combined_ds.to_zarr(output_cache_zarr)

In [None]:
# Uncomment the below to load data

# combined_ds = xr.open_zarr(output_cache_zarr).load()
# combined_ds

## Load vector data

In [None]:
# Coastal mask modifications
modifications_gdf = gpd.read_file(
    config["Input files"]["modifications_path"], bbox=bbox
).to_crs(str(combined_ds.odc.crs))

# Mask dataset to focus on coastal zone only
(
    masked_ds,
    certainty_masks,
    all_time_20,
    all_time_80,
    river_mask,
    ocean_da,
    thresholded_ds,
    temporal_mask,
    annual_mask,
    coastal_mask,
    ocean_mask,
) = contours_preprocess(
    combined_ds=combined_ds,
    water_index=water_index,
    index_threshold=index_threshold,
    buffer_pixels=50,
    mask_with_esa_wc=True,
    mask_modifications=modifications_gdf,
    debug=True
)

# Plot a single timestep
masked_ds.isel(year=0).plot(size=8)

In [None]:
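# Debug outputs available from contours_preprocess (debug=True):
# masked_ds, certainty_masks, all_time_20, all_time_80, river_mask,
# ocean_da, thresholded_ds, temporal_mask, annual_mask, coastal_mask,
# ocean_mask. Plot any of these to inspect intermediate masking steps.
all_time_80.plot.imshow(size=10)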

In [None]:
# Extract shorelines
contours_gdf = subpixel_contours(
    da=masked_ds,
    z_values=index_threshold,
    min_vertices=10,
    dim="year",
).set_index("year")

# Plot shorelines
contours_gdf.explore(
    tiles="https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}",
    attr="Esri",
    name="Esri Satellite",
)
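
Sub-pixel extraction locates the `index_threshold` crossing between pixel centres by linear interpolation, instead of snapping the shoreline to the nearest pixel. The one-dimensional sketch below shows the idea along a single land-to-water transect; `subpixel_crossings` is a hypothetical helper, whereas `subpixel_contours` works on the full 2D grid.

```python
import numpy as np


def subpixel_crossings(values, positions, threshold=0.0):
    """Return sub-pixel positions where a 1D profile crosses `threshold`.

    Between consecutive samples whose values straddle the threshold, the
    crossing is located by linear interpolation. Samples exactly equal to
    the threshold are not treated as crossings in this simple sketch.
    """
    values = np.asarray(values, dtype="float64")
    positions = np.asarray(positions, dtype="float64")
    shifted = values - threshold
    # Indices i where the sign changes between sample i and i + 1
    idx = np.where(np.sign(shifted[:-1]) * np.sign(shifted[1:]) < 0)[0]
    v0, v1 = shifted[idx], shifted[idx + 1]
    p0, p1 = positions[idx], positions[idx + 1]
    # Fraction of the way from p0 to p1 where the profile reaches the threshold
    frac = v0 / (v0 - v1)
    return p0 + frac * (p1 - p0)


# Land-to-water transect with pixel centres every 30 m
positions = np.array([0.0, 30.0, 60.0, 90.0])
mndwi_profile = np.array([-0.4, -0.1, 0.2, 0.5])
crossings = subpixel_crossings(mndwi_profile, positions, threshold=0.0)
print(crossings)  # one crossing, a third of the way from 30 m to 60 m
```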

## Compute statistics

### Create statistics points on the baseline shoreline

In [None]:
# Extract statistics modelling points along baseline shoreline
try:
    points_gdf = coastlines.vector.points_on_line(contours_gdf, baseline_year, distance=30)
except KeyError:
    print(f"Failed to make points: baseline year {baseline_year} may be missing from the extracted shorelines")
    points_gdf = None
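
Placing statistics points every 30 m (`distance=30`) along the baseline shoreline amounts to sampling the line at fixed distances along its length. A NumPy sketch, assuming vertex coordinates in a projected CRS in metres; `points_along_line` is hypothetical and not the `coastlines.vector.points_on_line` implementation.

```python
import numpy as np


def points_along_line(coords, spacing=30.0):
    """Sample points every `spacing` metres along a polyline (hypothetical helper).

    coords: (n, 2) array of projected x/y vertices in metres, without
    repeated consecutive vertices. Returns an (m, 2) array of points
    starting at the first vertex.
    """
    coords = np.asarray(coords, dtype="float64")
    seg_lengths = np.hypot(*np.diff(coords, axis=0).T)
    cumdist = np.concatenate([[0.0], np.cumsum(seg_lengths)])
    # Small tolerance so a point falling exactly on the line end is included
    targets = np.arange(0.0, cumdist[-1] + 1e-9, spacing)
    # Interpolate x and y independently as functions of distance along the line
    x = np.interp(targets, cumdist, coords[:, 0])
    y = np.interp(targets, cumdist, coords[:, 1])
    return np.column_stack([x, y])


# An L-shaped 150 m baseline: 90 m east, then 60 m north
baseline = [(0.0, 0.0), (90.0, 0.0), (90.0, 60.0)]
pts = points_along_line(baseline, spacing=30.0)
print(pts)
```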

### Measure annual coastline movements

In [None]:
if points_gdf is not None and len(points_gdf) > 0:
    
    # Calculate annual movements for every shoreline
    # compared to the baseline year
    points_gdf = coastlines.vector.annual_movements(
        points_gdf,
        contours_gdf,
        combined_ds,
        str(baseline_year),
        water_index,
        max_valid_dist=1200,
    )
    
    # Reindex to add any missing annual columns to the dataset
    points_gdf = points_gdf.reindex(
        columns=[
            "geometry",
            *[f"dist_{i}" for i in range(start_year, end_year + 1)],
            "angle_mean",
            "angle_std",
        ]
    )
else:
    print("No statistics points available; skipping annual movement calculations.")
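
At its core, measuring annual movement means finding the distance from each baseline point to each year's shoreline and discarding implausibly distant matches, as `max_valid_dist=1200` suggests. The sketch below computes an unsigned point-to-polyline distance in NumPy; `distance_to_polyline` is hypothetical, and the sign convention (seaward or landward) used by `annual_movements` is not reproduced.

```python
import numpy as np


def distance_to_polyline(point, coords, max_valid_dist=1200.0):
    """Shortest distance from a point to a polyline, or NaN beyond a cutoff.

    Hypothetical helper. Unsigned: the direction of movement is not
    determined here. Coordinates are projected x/y in metres.
    """
    p = np.asarray(point, dtype="float64")
    coords = np.asarray(coords, dtype="float64")
    a, b = coords[:-1], coords[1:]
    ab = b - a
    # Projection parameter t of p onto each segment, clamped to [0, 1]
    seg_len_sq = np.einsum("ij,ij->i", ab, ab)
    t = np.einsum("ij,ij->i", p - a, ab) / np.where(seg_len_sq == 0, 1.0, seg_len_sq)
    t = np.clip(t, 0.0, 1.0)
    closest = a + t[:, None] * ab
    dist = np.min(np.hypot(*(closest - p).T))
    return dist if dist <= max_valid_dist else np.nan


baseline_point = (50.0, 0.0)
shoreline_year_a = np.array([[0.0, 25.0], [100.0, 25.0]])
shoreline_year_b = np.array([[0.0, 2000.0], [100.0, 2000.0]])
d_a = distance_to_polyline(baseline_point, shoreline_year_a)
d_b = distance_to_polyline(baseline_point, shoreline_year_b)
print(d_a, d_b)  # 25 m, and NaN because 2000 m exceeds the cutoff
```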

In [None]:
geometry.explore()

In [None]:
points_gdf.explore()

### Calculate regressions
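
The rate of change at each point is the slope of a linear regression of annual shoreline distance against year. A minimal NumPy sketch using ordinary least squares and ignoring years without a valid shoreline; `rate_of_change` is hypothetical, and `calculate_regressions` may use a different estimator or also report significance and uncertainty.

```python
import numpy as np


def rate_of_change(years, distances):
    """Least-squares linear trend of shoreline distance over time (m/year).

    Hypothetical helper. NaN distances (years with no valid shoreline) are
    excluded; at least two valid observations are needed for a slope.
    """
    years = np.asarray(years, dtype="float64")
    distances = np.asarray(distances, dtype="float64")
    valid = ~np.isnan(distances)
    if valid.sum() < 2:
        return np.nan
    slope, _intercept = np.polyfit(years[valid], distances[valid], deg=1)
    return slope


years = np.arange(2002, 2007)
distances = np.array([0.0, -3.0, np.nan, -9.0, -12.0])  # metres relative to baseline
rate = rate_of_change(years, distances)
print(rate)  # approximately -3 m/year
```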

In [None]:
if points_gdf is not None and len(points_gdf) > 0:
    # Apply regression function to each row in dataset
    points_gdf = coastlines.vector.calculate_regressions(points_gdf)

    # Add count and span of valid obs, Shoreline Change Envelope (SCE),
    # Net Shoreline Movement (NSM) and Max/Min years
    stats_list = ["valid_obs", "valid_span", "sce", "nsm", "max_year", "min_year"]
    points_gdf[stats_list] = points_gdf.apply(
        lambda x: coastlines.vector.all_time_stats(x, initial_year=start_year), axis=1
    )
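
The SCE and NSM statistics can be sketched as follows, assuming the usual shoreline-change definitions: SCE is the spread (maximum minus minimum) of valid annual distances, and NSM is the distance in the most recent valid year minus the earliest one. `envelope_stats` is a hypothetical helper; `max_year` and `min_year` are taken as the years with the largest and smallest distances, and `valid_span` is omitted because its exact definition is not shown here.

```python
import numpy as np


def envelope_stats(years, distances):
    """SCE, NSM and related all-time statistics (hypothetical helper).

    SCE: max - min of valid annual distances.
    NSM: distance in the latest valid year minus the earliest valid year.
    max_year / min_year: years with the largest / smallest distance.
    """
    years = np.asarray(years)
    distances = np.asarray(distances, dtype="float64")
    valid = ~np.isnan(distances)
    d, y = distances[valid], years[valid]
    if d.size == 0:
        return {"valid_obs": 0, "sce": np.nan, "nsm": np.nan,
                "max_year": None, "min_year": None}
    order = np.argsort(y)
    return {
        "valid_obs": int(d.size),
        "sce": float(d.max() - d.min()),
        "nsm": float(d[order[-1]] - d[order[0]]),
        "max_year": int(y[np.argmax(d)]),
        "min_year": int(y[np.argmin(d)]),
    }


stats = envelope_stats(np.arange(2002, 2007), [5.0, -2.0, np.nan, 8.0, 1.0])
print(stats)
```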

## Export files

In [None]:
export_results(points_gdf, contours_gdf, version, output_dir, study_area)

### Close Dask client

In [None]:
dask_client.close()

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** For assistance with any of the Python code or Jupyter Notebooks in this repository, please post a [Github issue](https://github.com/GeoscienceAustralia/dea-coastlines/issues/new).

**Last modified:** November 2022

**To cite:**

> Bishop-Taylor, R., Nanson, R., Sagar, S., Lymburner, L. (2021). Mapping Australia's dynamic coastline at mean sea level using three decades of Landsat imagery. Remote Sensing of Environment, 267, 112734. Available: https://doi.org/10.1016/j.rse.2021.112734
>
> Nanson, R., Bishop-Taylor, R., Sagar, S., Lymburner, L. (2022). Geomorphic insights into Australia's coastal change using a national dataset derived from the multi-decadal Landsat archive. Estuarine, Coastal and Shelf Science, 265, 107712. Available: https://doi.org/10.1016/j.ecss.2021.107712
>
> Bishop-Taylor, R., Sagar, S., Lymburner, L., Alam, I., Sixsmith, J. (2019). Sub-pixel waterline extraction: characterising accuracy and sensitivity to indices and spectra. Remote Sensing, 11(24), 2984. Available: https://doi.org/10.3390/rs11242984