# Vietnam Coastlines Combined


* Load a stack of all available Landsat 5, 7, 8 and 9 satellite imagery for a location
* Convert each satellite image into a remote sensing water index (MNDWI)
* For each satellite image, model ocean tides into a grid based on the exact time of image acquisition
* Interpolate tide heights into the spatial extent of the image stack using the [FES2014 global tide model](https://github.com/GeoscienceAustralia/dea-coastlines/wiki/Setting-up-tidal-models-for-DEA-Coastlines)
* Mask out high and low tide pixels by removing all observations acquired outside of 50 percent of the observed tidal range centered over mean sea level
* Combine tidally-masked data into annual median composites representing the most representative position of the coastline at approximately mean sea level each year
* Apply morphological extraction algorithms to mask annual median composite rasters to a valid coastal region
* Extract waterline vectors using subpixel waterline extraction ([Bishop-Taylor et al. 2019b](https://doi.org/10.3390/rs11242984))
* Compute rates of coastal change at points spaced every 30 m along the baseline shoreline using linear regression
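As a rough illustration of the water index step: MNDWI contrasts green and shortwave-infrared (SWIR1) reflectance. Water reflects relatively more green than SWIR1 light, so it tends to take positive values. The sketch below assumes reflectance-scaled input arrays; the pixel values are hypothetical, and this is not the coastlines package's own implementation.

```python
import numpy as np


def mndwi(green: np.ndarray, swir1: np.ndarray) -> np.ndarray:
    """Modified Normalised Difference Water Index: (green - swir1) / (green + swir1).

    Returns NaN where the denominator is zero.
    """
    green = np.asarray(green, dtype="float32")
    swir1 = np.asarray(swir1, dtype="float32")
    denom = green + swir1
    # Suppress divide-by-zero warnings; those pixels are set to NaN below
    with np.errstate(divide="ignore", invalid="ignore"):
        index = (green - swir1) / denom
    return np.where(denom == 0, np.nan, index)


# Hypothetical reflectance for three pixels: open water, bright sand, vegetation
green = np.array([0.08, 0.15, 0.07])
swir1 = np.array([0.02, 0.25, 0.20])
print(mndwi(green, swir1))
```

With the `index_threshold = 0.0` used in this notebook, the first pixel would be classed as water and the other two as land.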

This is an interactive version of the code intended for prototyping; to run this analysis at scale, use the [command line tools](DEACoastlines_generation_CLI.ipynb).

---

## Getting started

Set the working directory to the top level of the repository to ensure links work correctly:

In [None]:
%cd ..

### Load packages

First we import the required Python packages, then start a local Dask client for parallel processing and configure S3 access (including requester-pays buckets) for loading satellite data.

In [None]:
# pip install -r requirements.in --quiet

In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
import os

os.environ['USE_PYGEOS'] = '0'

from collections import Counter

# Load DEA Coastlines and DEA tools code
import coastlines.raster
import coastlines.utils
import coastlines.vector

from coastlines.utils import get_study_site_geometry
from coastlines.combined import (
    opinionated_stac_search,
    load_and_mask_data_with_stac,
    generate_yearly_combined_ds
)

# Load other libraries
import folium
import geohash as gh
import geopandas as gpd
import pandas as pd
import xarray as xr
from datacube.utils.dask import start_local_dask
from datacube.utils.geometry import Geometry
from dea_tools.coastal import pixel_tides
from dea_tools.spatial import subpixel_contours
from odc.algo import mask_cleanup, erase_bad, to_f32
from odc.stac import configure_s3_access, load
from pystac_client import Client

In [None]:
# Create local dask client for parallelisation
dask_client = start_local_dask(n_workers=12, mem_safety_margin="3GB")

# Configure S3 access including request payer
_ = configure_s3_access(requester_pays=True, cloud_defaults=True)

print(dask_client.dashboard_link)

## Setup


### Set analysis parameters

In [None]:
# Study area selection
# study_area = "13,45" # South west
study_area = "9,19" # North
# study_area = "18,32" # Central 

# Raster config
raster_version = 'testing'
start_year = 2012
end_year = 2022
baseline_year = 2021

# Vector config
vector_version = raster_version
water_index = "mndwi"
index_threshold = 0.0

config_path = 'configs/vietnam_coastlines_config_development.yaml'

# Output config
output_dir = f"data/interim/vector/{vector_version}/{study_area}_{vector_version}"
os.makedirs(output_dir, exist_ok=True)

# Load analysis params from config file
config = coastlines.utils.load_config(
    config_path=config_path)

log = coastlines.utils.get_logger()

# Load the geometry from the grid used for the location
geometry, bbox = get_study_site_geometry(config["Input files"]["grid_path"], study_area)

## Loading data

### Create spatiotemporal query using a STAC API as the backend
This establishes the spatial and temporal extent used to search for Landsat satellite data.


In [None]:
# Use the USGS STAC API to identify scenes to load
query = {
    "bbox": bbox,
    "datetime": (str(start_year - 1), str(end_year + 1)),
}

stac_query = {
    "query": {
        "landsat:collection_category": {"in": ["T1"]},
        "eo:cloud_cover": {"lte": 95.0}
    }
}
stac_query.update(query)

items = opinionated_stac_search(config, stac_query)

print(f"Found {len(items)} items matching the search criteria")

In [None]:
# Load the data using dask
ds = load_and_mask_data_with_stac(items, query, log)
ds

## Tidal modelling


### Interpolate tides into each satellite timestep
For each satellite timestep, model tide heights into a low-resolution 5 x 5 km grid (matching the resolution of the FES2014 tidal model), then reproject the modelled tides into the spatial extent of our satellite imagery. Add the result as a new variable in our satellite dataset so that each satellite pixel can be analysed and filtered/masked based on the tide height at the exact moment of image acquisition.

In [None]:
ds["tide_m"], tides_lowres = pixel_tides(ds, resample=True, directory="/home/jovyan/tide_models/")
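The resampling idea behind `resample=True` can be illustrated with bilinear interpolation from a coarse grid onto fine pixel centres. This is a minimal NumPy sketch that assumes bilinear resampling on regular grids; `pixel_tides` may use a different resampling method and also handles CRS reprojection, which is not shown. The function name and grid values are hypothetical.

```python
import numpy as np


def bilinear_regrid(lowres, lr_y, lr_x, hr_y, hr_x):
    """Bilinearly interpolate a 2-D (y, x) array onto new y/x coordinates.

    Bilinear interpolation on a regular grid is separable: interpolate along x
    for every low-resolution row, then along y for every output column.
    Points outside the low-resolution extent take the nearest edge value,
    because np.interp clamps rather than extrapolates.
    """
    lowres = np.asarray(lowres, dtype=float)
    lr_y, lr_x = np.asarray(lr_y, dtype=float), np.asarray(lr_x, dtype=float)
    # np.interp needs increasing coordinates; north-up rasters usually have decreasing y
    if lr_y[0] > lr_y[-1]:
        lr_y, lowres = lr_y[::-1], lowres[::-1, :]
    if lr_x[0] > lr_x[-1]:
        lr_x, lowres = lr_x[::-1], lowres[:, ::-1]
    along_x = np.stack([np.interp(hr_x, lr_x, row) for row in lowres])
    return np.stack([np.interp(hr_y, lr_y, col) for col in along_x.T], axis=1)


# Hypothetical 2 x 2 tide grid (metres) with cell centres 5 km apart
tides_5km = np.array([[0.0, 1.0],
                      [2.0, 3.0]])
lr_coords = np.array([0.0, 5000.0])
hr_coords = np.arange(0.0, 5000.0 + 1, 30.0)  # 30 m Landsat-like pixel centres
tides_30m = bilinear_regrid(tides_5km, lr_coords, lr_coords, hr_coords, hr_coords)
print(tides_30m.shape)
```

In practice this would be repeated for every timestep of the low-resolution tide array.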

Plot example interpolated tide surface for a single timestep:

In [None]:
import matplotlib.pyplot as plt

# Plot
timestep = 15
ds_i = ds["tide_m"].isel(time=timestep)
ds_lowres_i = tides_lowres.isel(time=timestep)

fig, axes = plt.subplots(1, 2, figsize=(15, 8))
ds_lowres_i.plot.imshow(
    ax=axes[0],
    robust=True,
    cmap="viridis",
    vmin=ds_i.min().item(),
    vmax=ds_i.max().item(),
)
ds_i.plot.imshow(
    ax=axes[1],
    robust=True,
    cmap="viridis",
    vmin=ds_i.min().item(),
    vmax=ds_i.max().item(),
)
for ax in axes:
    gpd.GeoDataFrame(index=[0], crs=ds.odc.crs, geometry=[geometry.to_crs(ds.odc.crs)]).plot(
        ax=ax, facecolor="none", edgecolor="black"
    )
axes[0].set_title("Low resolution (5 x 5 km) modelled tides")
axes[1].set_title("Modelled tides reprojected into input satellite grid");

### Calculate per-pixel tide cutoffs
Based on the entire time-series of tide heights, compute the max and min satellite-observed tide height for each pixel, then calculate tide cutoffs used to restrict our data to satellite observations centred over mid-tide (0 m Above Mean Sea Level).

In [None]:
# Determine tide cutoff
tide_cutoff_min, tide_cutoff_max = coastlines.raster.tide_cutoffs(ds, tides_lowres, tide_centre=0.0)
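A minimal sketch of the cutoff logic, assuming (from the overview at the top of this notebook) that the retained window spans 50 percent of each pixel's observed tidal range, centred on 0 m. The function name is hypothetical, and `coastlines.raster.tide_cutoffs` may differ in detail, for example by computing cutoffs on the low-resolution grid before reprojecting them.

```python
import numpy as np


def tide_cutoffs_sketch(tides, tide_centre=0.0, range_fraction=0.5):
    """Per-pixel tide cutoffs from a (time, y, x) array of modelled tide heights.

    The window covers `range_fraction` of each pixel's observed tide range and is
    centred on `tide_centre` (0 m = mean sea level).
    """
    tide_min = np.nanmin(tides, axis=0)
    tide_max = np.nanmax(tides, axis=0)
    half_window = (tide_max - tide_min) * range_fraction / 2
    return tide_centre - half_window, tide_centre + half_window


# Hypothetical tide heights (m) for five acquisitions over a single pixel
tides = np.array([-1.2, -0.4, 0.1, 0.6, 1.0]).reshape(5, 1, 1)
cutoff_min, cutoff_max = tide_cutoffs_sketch(tides)
# Keep only observations acquired inside the tide window
keep = (tides >= cutoff_min) & (tides <= cutoff_max)
print(cutoff_min.item(), cutoff_max.item(), keep.ravel())
```

Here the observed range is 2.2 m, so observations between -0.55 m and 0.55 m are kept.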

## Generate yearly composites in memory
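Generate tidally-masked MNDWI median composites for each year, plus three-yearly composites used to gap-fill areas with poor data coverage. The results are held in memory; the next cells optionally write them to, and reload them from, a Zarr store.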

In [None]:
# Create a yearly dataset, loaded into memory. This takes a long time!
combined_ds = generate_yearly_combined_ds(ds, tide_cutoff_min, tide_cutoff_max, start_year, end_year)
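Conceptually, each annual composite is a per-pixel median of the tide-masked water index observations acquired that year. The sketch below illustrates this in NumPy with hypothetical data; it is an assumption about the general approach rather than the implementation of `generate_yearly_combined_ds`, and it omits the three-yearly gap-fill composites.

```python
import warnings

import numpy as np


def yearly_median_composites(index, tides, years, cutoff_min, cutoff_max):
    """Median of tide-masked water index observations for each year.

    index, tides: (time, y, x) arrays; years: (time,) acquisition years.
    cutoff_min/cutoff_max: scalars or (y, x) arrays of per-pixel tide cutoffs.
    """
    in_window = (tides >= cutoff_min) & (tides <= cutoff_max)
    masked = np.where(in_window, index, np.nan)
    composites = {}
    for year in np.unique(years):
        with warnings.catch_warnings():
            # Pixels with no valid observations in a year give NaN plus a RuntimeWarning
            warnings.simplefilter("ignore", category=RuntimeWarning)
            composites[int(year)] = np.nanmedian(masked[years == year], axis=0)
    return composites


# Hypothetical MNDWI and tide values for four acquisitions over one pixel
mndwi_obs = np.array([0.2, 0.4, -0.1, 0.3]).reshape(4, 1, 1)
tide_obs = np.array([0.1, 0.9, -0.2, 0.0]).reshape(4, 1, 1)
obs_years = np.array([2021, 2021, 2022, 2022])
composites = yearly_median_composites(mndwi_obs, tide_obs, obs_years, -0.5, 0.5)
print({year: arr.item() for year, arr in composites.items()})
```

The 2021 observation acquired at 0.9 m falls outside the tide window and is excluded, so the 2021 composite is 0.2 rather than 0.3.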

In [None]:
# Optionally write the composites to disk as Zarr, so the vector steps below can be
# rerun without repeating the expensive computation.
from pathlib import Path
import shutil

out_data = f"data/interim/raster/{raster_version}/{study_area.replace(',', '_')}_{raster_version}_combined_ds.zarr"
out_path = Path(out_data)

if out_path.exists():
    print(f"Folder {out_path} already exists. Deleting...")
    shutil.rmtree(out_path)

combined_ds.to_zarr(out_data)

In [None]:
combined_ds = xr.open_zarr(f"data/interim/raster/{raster_version}/{study_area.replace(',', '_')}_{raster_version}_combined_ds.zarr").load()
combined_ds

## Load vector data

In [None]:
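# Coastal mask modifications
modifications_gdf = gpd.read_file(
    config["Input files"]["modifications_path"], bbox=bbox
).to_crs(str(combined_ds.odc.crs))

# Study area extent, used to clip the statistics and shorelines before export
gridcell_gdf = gpd.GeoDataFrame(
    index=[0], crs=combined_ds.odc.crs, geometry=[geometry.to_crs(combined_ds.odc.crs)]
)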

# Mask dataset to focus on coastal zone only
masked_ds, certainty_masks = coastlines.vector.contours_preprocess(
    combined_ds=combined_ds,
    water_index=water_index,
    index_threshold=index_threshold,
    mask_with_esa_wc=True,
    buffer_pixels=33,
    mask_modifications=modifications_gdf, 
)

# Plot timestep
masked_ds.isel(year=0).plot()
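`contours_preprocess` is not reproduced here. One simple way to restrict a classified raster to a coastal band is to keep pixels that lie within a buffer distance of both water and land, computed with binary dilation. The sketch below assumes this dilation approach purely for illustration, and its names are hypothetical; the package's function also uses ESA WorldCover and the mask modifications passed in above, which are not shown.

```python
import numpy as np


def dilate(mask, iterations):
    """Binary dilation by a 4-connected (cross-shaped) element, repeated `iterations` times."""
    out = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        p = np.pad(out, 1, constant_values=False)
        out = p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    return out


def coastal_zone_mask(index, threshold=0.0, buffer_pixels=33):
    """True for pixels within `buffer_pixels` 4-connected steps of both water and land."""
    valid = ~np.isnan(index)
    water = valid & (index > threshold)
    land = valid & (index <= threshold)
    return dilate(water, buffer_pixels) & dilate(land, buffer_pixels)


# Hypothetical single-row MNDWI profile running from sea (left) to land (right)
profile = np.array([[0.6, 0.5, 0.4, 0.2, -0.1, -0.3, -0.4, -0.5]])
print(coastal_zone_mask(profile, buffer_pixels=2))
```

Only the four pixels closest to the water/land transition are retained.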

In [None]:
# Extract shorelines
contours_gdf = subpixel_contours(
    da=masked_ds,
    z_values=index_threshold,
    min_vertices=10,
    dim="year",
).set_index("year")

# Plot shorelines
contours_gdf.plot()
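As we understand it, `subpixel_contours` builds on marching-squares contouring from scikit-image. Its subpixel accuracy comes from linearly interpolating the threshold crossing between neighbouring pixel values rather than snapping to pixel edges. This 1-D sketch isolates that interpolation step along a single transect; the function name and MNDWI values are hypothetical, and the profile is assumed to contain no NaNs.

```python
import numpy as np


def subpixel_crossings(values, coords, threshold=0.0):
    """Positions where a 1-D profile crosses `threshold`, by linear interpolation."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    above = values > threshold
    # A crossing lies between samples i and i + 1 wherever the side changes
    idx = np.nonzero(above[:-1] != above[1:])[0]
    v0, v1 = values[idx], values[idx + 1]
    c0, c1 = coords[idx], coords[idx + 1]
    frac = (threshold - v0) / (v1 - v0)
    return c0 + frac * (c1 - c0)


# Hypothetical MNDWI values at 30 m pixel centres along a transect
mndwi_profile = [-0.4, -0.1, 0.2, 0.5]
pixel_centres = [0.0, 30.0, 60.0, 90.0]
print(subpixel_crossings(mndwi_profile, pixel_centres))
```

The crossing falls one third of the way between the pixels centred at 30 m and 60 m, that is at 40 m.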

In [None]:
# Preview the data on a web map
bb = masked_ds.odc.geobox.boundingbox.to_crs(4326)
d_x = bb.right - bb.left
d_y = bb.top - bb.bottom

location = (bb.bottom + d_y / 2, bb.left + d_x / 2) 

m = folium.Map(
    location=location,
    tiles="https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}",
    attr="Esri",
    name="Esri Satellite",
    zoom_start=10,
)

# Add every annual shoreline in red; "color" sets the line colour
# (in Leaflet path styles, "stroke" is a boolean toggle, not a colour)
folium.GeoJson(
    contours_gdf.to_crs("epsg:4326").reset_index()[["year", "geometry"]].to_json(),
    style_function=lambda feature: {"color": "red", "weight": 2},
).add_to(m)

m

## Compute statistics

### Create stats points on baseline shoreline

In [None]:
# Extract statistics modelling points along baseline shoreline
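try:
    # Shoreline years are assumed to be stored as strings, matching str(baseline_year) below
    points_gdf = coastlines.vector.points_on_line(contours_gdf, str(baseline_year), distance=30)
except KeyError:
    print(f"Baseline year {baseline_year} not found in the extracted shorelines; no points created")
    points_gdf = None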
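The internals of `coastlines.vector.points_on_line` are not shown in this notebook. The general idea of placing points at a fixed spacing along the baseline shoreline can be sketched by interpolating along cumulative segment length; shapely's `LineString.interpolate(distance)` does the same for a single distance. The function name and coordinates below are hypothetical, and the sketch assumes projected coordinates in metres with no repeated vertices.

```python
import numpy as np


def points_along_polyline(vertices, spacing=30.0):
    """Return (k, 2) points every `spacing` metres along a polyline, from its first vertex."""
    v = np.asarray(vertices, dtype=float)
    seg_len = np.hypot(*np.diff(v, axis=0).T)
    # Cumulative distance along the line at each vertex
    chainage = np.concatenate([[0.0], np.cumsum(seg_len)])
    targets = np.arange(0.0, chainage[-1] + 1e-9, spacing)
    x = np.interp(targets, chainage, v[:, 0])
    y = np.interp(targets, chainage, v[:, 1])
    return np.column_stack([x, y])


# Hypothetical L-shaped baseline shoreline, 105 m long
baseline = [(0.0, 0.0), (60.0, 0.0), (60.0, 45.0)]
print(points_along_polyline(baseline, spacing=30.0))
```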

### Measure annual coastline movements

In [None]:
if points_gdf is not None and len(points_gdf) > 0:
    
    # Calculate annual movements for every shoreline
    # compared to the baseline year
    points_gdf = coastlines.vector.annual_movements(
        points_gdf,
        contours_gdf,
        combined_ds,
        str(baseline_year),
        water_index,
        max_valid_dist=1200,
    )
    
    # Reindex to add any missing annual columns to the dataset
    points_gdf = points_gdf.reindex(
        columns=[
            "geometry",
            *[f"dist_{i}" for i in range(start_year, end_year + 1)],
            "angle_mean",
            "angle_std",
        ]
    )
else:
    print("No statistics points available; skipping the annual movement calculation.")
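Measuring annual movement amounts to finding, for each baseline point, the distance to each year's shoreline. The sketch below computes unsigned point-to-polyline distances and discards distances beyond a `max_valid_dist` threshold, mirroring the parameter used above. It is a simplification: `annual_movements` also receives `combined_ds` and `water_index`, presumably to decide whether movement is landward or seaward, and that signing step is not reproduced. Apart from `max_valid_dist` and the `dist_{year}` naming pattern, all names here are hypothetical.

```python
import numpy as np


def point_to_polyline_distance(point, vertices):
    """Shortest Euclidean distance from a point to a polyline of (n, 2) vertices."""
    p = np.asarray(point, dtype=float)
    v = np.asarray(vertices, dtype=float)
    a, b = v[:-1], v[1:]
    ab = b - a
    # Project the point onto each segment, clamped to the segment's end points
    t = np.clip(np.einsum("ij,ij->i", p - a, ab) / np.einsum("ij,ij->i", ab, ab), 0.0, 1.0)
    nearest = a + t[:, None] * ab
    return float(np.min(np.hypot(nearest[:, 0] - p[0], nearest[:, 1] - p[1])))


def annual_distances(point, shorelines, max_valid_dist=1200):
    """Distance from one baseline point to each year's shoreline; NaN beyond max_valid_dist."""
    out = {}
    for year, vertices in shorelines.items():
        d = point_to_polyline_distance(point, vertices)
        out[f"dist_{year}"] = d if d <= max_valid_dist else np.nan
    return out


# Hypothetical straight shorelines (metres) for three years
shorelines = {
    2020: [(-100.0, 40.0), (100.0, 40.0)],
    2021: [(-100.0, 0.0), (100.0, 0.0)],
    2022: [(-100.0, -1500.0), (100.0, -1500.0)],
}
print(annual_distances((0.0, 0.0), shorelines))
```

The 2022 shoreline is 1500 m away, beyond the 1200 m limit, so it is treated as invalid (NaN).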

### Calculate regressions

In [None]:
if points_gdf is not None and len(points_gdf) > 0:

    # Apply regression function to each row in dataset
    points_gdf = coastlines.vector.calculate_regressions(points_gdf)

    # Add count and span of valid obs, Shoreline Change Envelope (SCE),
    # Net Shoreline Movement (NSM) and Max/Min years
    stats_list = ["valid_obs", "valid_span", "sce", "nsm", "max_year", "min_year"]
    points_gdf[stats_list] = points_gdf.apply(
        lambda x: coastlines.vector.all_time_stats(x, initial_year=start_year), axis=1
    )
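The per-point statistics can be sketched for a single point: an ordinary least-squares rate of change (metres per year) over valid years, the Shoreline Change Envelope (SCE, the spread between the furthest and closest shoreline positions) and Net Shoreline Movement (NSM, the difference between the most recent and oldest valid positions). This is an illustrative NumPy version with hypothetical data and a hypothetical `rate` key; `calculate_regressions` and `all_time_stats` may compute additional or differently defined fields.

```python
import numpy as np


def shoreline_stats(years, dists):
    """Regression rate, valid observation count, SCE and NSM for one point.

    Assumes `years` is sorted ascending and at least two distances are valid.
    """
    years = np.asarray(years, dtype=float)
    dists = np.asarray(dists, dtype=float)
    valid = ~np.isnan(dists)
    yrs, d = years[valid], dists[valid]
    slope, _intercept = np.polyfit(yrs, d, deg=1)
    return {
        "rate": slope,  # hypothetical key: metres per year
        "valid_obs": int(valid.sum()),
        "sce": d.max() - d.min(),
        "nsm": d[-1] - d[0],
    }


# Hypothetical annual distances (m) from the baseline, with one missing year
stats = shoreline_stats([2012, 2013, 2014, 2015, 2016], [0.0, 3.0, np.nan, 9.0, 12.0])
print(stats)
```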

## Export files

### Statistics

In [None]:
if points_gdf is not None and len(points_gdf) > 0:
    # Clip stats to study area extent
    points_gdf_clipped = points_gdf.clip(gridcell_gdf)

    # Set output path
    stats_path = (
        f"{output_dir}/ratesofchange_{study_area}_"
        f"{vector_version}_{water_index}_{index_threshold:.2f}"
    )

    # Write as parquet
    points_gdf_clipped.to_parquet(f"{stats_path}.parquet")

else:
    print("Not exporting because there are no points.")

### Shorelines

In [None]:
if len(contours_gdf.index) > 0:
    # Add tide datum details (this supports future addition of extra tide datums)
    contours_gdf["tide_datum"] = "0.0 m AMSL"

    # Set output path
    contour_path = (
        f"{output_dir}/annualshorelines_{study_area}_{vector_version}_"
        f"{water_index}_{index_threshold:.2f}"
    )

    # Clip annual shoreline contours to study area extent
    contours_gdf_clipped = contours_gdf.clip(gridcell_gdf)

    # Export to parquet
    contours_gdf_clipped.to_parquet(f"{contour_path}.parquet")


### Close Dask client

In [None]:
dask_client.close()

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** For assistance with any of the Python code or Jupyter Notebooks in this repository, please post a [Github issue](https://github.com/GeoscienceAustralia/dea-coastlines/issues/new).

**Last modified:** November 2022

**To cite:**

> Bishop-Taylor, R., Nanson, R., Sagar, S., Lymburner, L. (2021). Mapping Australia's dynamic coastline at mean sea level using three decades of Landsat imagery. Remote Sensing of Environment, 267, 112734. Available: https://doi.org/10.1016/j.rse.2021.112734
>
> Nanson, R., Bishop-Taylor, R., Sagar, S., Lymburner, L. (2022). Geomorphic insights into Australia's coastal change using a national dataset derived from the multi-decadal Landsat archive. Estuarine, Coastal and Shelf Science, 265, 107712. Available: https://doi.org/10.1016/j.ecss.2021.107712
>
> Bishop-Taylor, R., Sagar, S., Lymburner, L., Alam, I., Sixsmith, J. (2019). Sub-pixel waterline extraction: characterising accuracy and sensitivity to indices and spectra. Remote Sensing, 11 (24):2984. Available: https://doi.org/10.3390/rs11242984