# Extract Shoreline Monitor data (Parquet & CSV), combining series & change-rate

## ShorelineMonitor: Satellite-Derived Shoreline-Series

The
[ShorelineMonitor-Shorelines](https://radiantearth.github.io/stac-browser/#/external/coclico.blob.core.windows.net/stac/v1/shorelinemonitor-shorelines/collection.json)
dataset provides Satellite-Derived Shorelines (SDS) extracted from annually composited
Landsat satellite imagery spanning the years 1984-2024. These shorelines are mapped onto
the [Global Coastal Transect System
(GCTS)](https://github.com/TUDelft-CITG/coastpy/blob/main/tutorials/global_coastal_transect_system.ipynb).
Together they compose a new dataset that consists of time series per transect.
The ShorelineMonitor-Series consists of more than 350 million observations, each with 54
attributes, across almost 7.5 million transects. The dataset and attributes are described in this [STAC
collection](https://radiantearth.github.io/stac-browser/#/external/coclico.blob.core.windows.net/stac/v1/shorelinemonitor-series/collection.json).
Please have a look at the metadata in one of the items. The dataset is available upon reasonable request. Please contact the data provider for more information or collaboration opportunities.

## ShorelineMonitor: Change Rate

The ShorelineMonitor dataset provides [Satellite-Derived Shorelines (SDS)](https://radiantearth.github.io/stac-browser/#/external/coclico.blob.core.windows.net/stac/v1/shorelinemonitor-shorelines/collection.json) extracted from annually
composited Landsat satellite imagery spanning the years 1984-2024. These shorelines offer a global
view of coastal change and shoreline dynamics, serving as a foundation for coastal
analytics, modeling and management. The shorelines have been mapped onto the [Global Coastal Transect System](https://radiantearth.github.io/stac-browser/#/external/coclico.blob.core.windows.net/stac/v1/gcts/collection.json) to form a [new dataset](https://radiantearth.github.io/stac-browser/#/external/coclico.blob.core.windows.net/stac/v1/shorelinemonitor-series/collection.json) of more than 7.5 million time-series. 

This notebook shows how to explore multi-decadal trends in shoreline change that are extracted from the full dataset of time series. The dataset is available upon reasonable request. Please contact the data provider for more information or collaboration opportunities.

In [None]:
import os

import geojson
import geopandas as gpd
import hvplot.pandas  # noqa: F401 (imported for its side effect: registers the .hvplot accessor)
import pandas as pd
import pystac
from dotenv import load_dotenv
from ipyleaflet import GeoData, Map, basemaps
from shapely.geometry import Polygon

import coastpy
from coastpy.stac.utils import read_snapshot
from coastpy.utils.config import fetch_sas_token

load_dotenv()

# Configure cloud storage access (the SAS token is read from the .env file)
sas_token = os.getenv("AZURE_STORAGE_SAS_TOKEN")
storage_options = {"account_name": "coclico", "sas_token": sas_token}

coclico_catalog = pystac.Catalog.from_file(
    "https://coclico.blob.core.windows.net/stac/v1/catalog.json"
)
collection_series = coclico_catalog.get_child("shorelinemonitor-series")
collection_change = coclico_catalog.get_child("gctr")

## Set paths

In [None]:
aoi_fol = r"p:\1000545-054-globalbeaches\04_Shoreline_Monitor_data_requests\Data_requests\Bruno_Castelle"
out_fol = r"p:\1000545-054-globalbeaches\04_Shoreline_Monitor_data_requests\Data_requests\Bruno_Castelle"

## Visualize data chunks

In [None]:
snapshot_series = read_snapshot(collection_series, storage_options=storage_options)
#snapshot_series.head()
snapshot_series.explore()

In [None]:
snapshot_change = read_snapshot(collection_change, storage_options=storage_options)
#snapshot_change.head()
snapshot_change.explore()

## Original data delivery

We want the new extraction to approximately match the original delivery, so that what clients receive stays consistent.

In [None]:
# read csv
check = pd.read_csv(os.path.join(out_fol, r"ShorelineMonitor_1984_2021_v1.5_set1_filtered_extended_sedtypeV2.csv"))

In [None]:
print(check.keys())
print(check.shape)
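
One simple way to judge how closely a new extraction matches the original delivery is to compare the column names of both tables. The helper below is a minimal sketch of that idea; `compare_columns` and most of the column names in the example are hypothetical and only serve as illustration.

```python
def compare_columns(reference_cols, new_cols):
    """Report which column names are shared and which appear on one side only."""
    reference, new = set(reference_cols), set(new_cols)
    return {
        "shared": sorted(reference & new),
        "only_in_reference": sorted(reference - new),
        "only_in_new": sorted(new - reference),
    }


# Hypothetical column names, for illustration only.
old_delivery = ["transect_id", "lon", "lat", "changerate"]
new_delivery = ["transect_id", "lon", "lat", "sds:change_rate"]

report = compare_columns(old_delivery, new_delivery)
for key, names in report.items():
    print(f"{key}: {names}")
```

In this notebook the same call could be made as `compare_columns(check.columns, df_change.columns)` once `df_change` has been fetched further down.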

## Get the AOI

In [None]:
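# filter based on a pre-drawn AOI (drawn in QGIS and exported as .shp and .geojson)

# load the AOI GeoJSON
gj = None
for fname in os.listdir(os.path.join(aoi_fol, 'AOI')):
    if 'AOI_FR.geojson' in fname:
        print(fname)
        with open(os.path.join(aoi_fol, 'AOI', fname)) as f:
            gj = geojson.load(f)
if gj is None:
    raise FileNotFoundError("No AOI_FR.geojson found in the AOI folder")

# convert the features to shapely polygons (exterior ring only)
polygons = []
props = []
for feature in gj['features']:
    geom = feature['geometry']
    if geom['type'] == 'MultiPolygon':
        # MultiPolygon coordinates nest one level deeper; use the first part
        polygons.append(Polygon(geom['coordinates'][0][0]))
    else:
        polygons.append(Polygon(geom['coordinates'][0]))
    props.append(feature['properties']['id'])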

AOI_gdf = gpd.GeoDataFrame(data={'area': props}, geometry=polygons, crs="EPSG:4326")
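
GeoJSON nests the coordinates of a MultiPolygon one level deeper than those of a Polygon, which is why the cell above needs two ways to reach the exterior ring. The sketch below shows that difference on two small in-memory geometries using plain dictionaries; the helper name `exterior_ring` is hypothetical.

```python
def exterior_ring(geometry):
    """Return the exterior ring of a Polygon, or of the first part of a MultiPolygon."""
    coords = geometry["coordinates"]
    if geometry["type"] == "MultiPolygon":
        return coords[0][0]  # parts -> rings -> positions
    if geometry["type"] == "Polygon":
        return coords[0]  # rings -> positions
    raise ValueError(f"Unsupported geometry type: {geometry['type']}")


# A unit square, closed by repeating the first position.
square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]

polygon = {"type": "Polygon", "coordinates": [square]}
multipolygon = {"type": "MultiPolygon", "coordinates": [[square]]}

rings = [exterior_ring(geom) for geom in (polygon, multipolygon)]
print(rings[0] == rings[1])
```

Note that, like the cell above, this keeps only the exterior ring of the first part, so holes and additional parts of a MultiPolygon are ignored.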

In [None]:
# show it on a map
m = Map(basemap=basemaps.Esri.WorldImagery, scroll_wheel_zoom=True)
m.center = (0,0)
m.zoom = 2
m.layout.height = "900px"
m.add(GeoData(geo_dataframe=AOI_gdf))
m

In [None]:
# instead of loading an AOI from file, the map can also be used to draw one
from coastpy.geo.utils import get_region_of_interest_from_map

# roi = get_region_of_interest_from_map(m, default_extent=(4.796, 53.108, 5.229, 53.272))
# west, south, east, north = roi.geometry.item().bounds

## Fetch data from the databases

In [None]:
db_change = coastpy.io.STACQueryEngine(
    stac_collection=collection_change,
    storage_backend="azure",
    # columns = ["geometry", "transect_id", "sds:change_rate"] ... # when you don't need all data
)

In [None]:
db_series = coastpy.io.STACQueryEngine(
    stac_collection=collection_series,
    storage_backend="azure",
    # columns = ["geometry", "transect_id", "sds:change_rate"] ... # when you don't need all data
)

In [None]:
sas_token = fetch_sas_token(sas_token)
df_change = db_change.get_data_within_aoi(AOI_gdf, sas_token=sas_token)
print(f"Shape: {df_change.shape}")
df_change.head()

In [None]:
sas_token = fetch_sas_token(sas_token)
df_series = db_series.get_data_within_aoi(AOI_gdf, sas_token=sas_token)
print(f"Shape: {df_series.shape}")
df_series.head()

In [None]:
# make quick checkplot for the df_change
m = df_change.explore(color='blue', name='transects')
AOI_gdf.explore(m=m, color='red', alpha=0.5, name='AOI')

In [None]:
# make dashboard for the df_series

from coastpy.viz.dashboard import ShorelineSeriesApp

ShorelineSeriesApp(df_series).show()

## Data alterations for intuitive clean delivery

In [None]:
# remove columns with prob labels
df_change_mod = df_change.loc[:, ~df_change.columns.str.contains('prob')]

# print some indicators
#df_change_mod.keys()
df_change_mod.shape
#df_change_mod.head(3)

# test outcome for one transect
# test_change = df_change_mod[df_change_mod.transect_id == "cl30793s01tr02731365"]
# with pd.option_context('display.max_columns', None):
#     display(test_change)

In [None]:
# remove columns from series that are also present in change,
# keeping transect_id and geometry (selected by name rather than by position)
rem_cols = df_series.columns.intersection(df_change.columns).difference(["transect_id", "geometry"])
df_series_mod = df_series.drop(columns=rem_cols)

# drop more columns related to the transect
df_series_mod = df_series_mod.drop(columns=["transect_lon", "transect_lat", "transect_quadkey",'tr_stdev', 'tr_range', 'tr_qa_pct', 'tr_is_qa'])

# select the primary observations only
df_series_mod = df_series_mod[df_series_mod.obs_is_primary == True] 

# drop the columns that became redundant now that only primary observations remain
df_series_mod = df_series_mod.drop(columns=["obs_group", "obs_is_qa", "obs_is_primary", "subseries_id", 'obs_primary_count'])

# print some indications
#df_series_mod.keys()
df_series_mod.shape
# df_series_mod.head(3)

# test outcome for one transect
# test_series = df_series_mod[df_series_mod.transect_id == "cl30793s01tr02731365"]
# print(test_series.shape)
# with pd.option_context('display.max_columns', None):
#     display(test_series)
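
The pruning logic in this section can be tried out on toy tables before running it on the full extraction. The sketch below drops the columns that the series table shares with the change table, protects the join key and geometry by name so the result does not depend on column order, and then keeps only primary observations. The frames are plain pandas DataFrames, and `shared_attr` and `shoreline_position` are hypothetical column names.

```python
import pandas as pd

# Toy stand-ins for df_change and df_series (values are made up).
change = pd.DataFrame(
    {
        "transect_id": ["a", "b"],
        "geometry": ["g1", "g2"],
        "shared_attr": [1.0, 2.0],
        "sds:change_rate": [-0.5, 0.2],
    }
)
series = pd.DataFrame(
    {
        "transect_id": ["a", "a", "b"],
        "geometry": ["g1", "g1", "g2"],
        "shared_attr": [1.0, 1.0, 2.0],
        "obs_is_primary": [True, False, True],
        "shoreline_position": [10.0, 11.0, 20.0],
    }
)

# Columns present in both tables, minus the ones that must be kept.
keep = ["transect_id", "geometry"]
to_drop = series.columns.intersection(change.columns).difference(keep)
series_mod = series.drop(columns=to_drop)

# Keep primary observations only, then drop the now-constant flag column.
series_mod = series_mod[series_mod["obs_is_primary"]].drop(columns=["obs_is_primary"])

print(series_mod)
```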

## Write geodataframes to Parquet & CSV files

In [None]:
# to parquet
df_change_mod.to_parquet(os.path.join(out_fol, r"SM_FR-transect-rates_1984_2023_set1.parquet"))

# to CSV
df_change_mod.to_csv(os.path.join(out_fol, r"SM_FR-transect-rates_1984_2023_set1.csv"), index=False)

In [None]:
# to parquet
df_series_mod.to_parquet(os.path.join(out_fol, r"SM_FR-time-series_1984_2023_set1.parquet"))

# to CSV
df_series_mod.to_csv(os.path.join(out_fol, r"SM_FR-time-series_1984_2023_set1.csv"), index=False)

In [None]:
# OPTIONAL TODO: merge into one file, with the temporal data of each transect stored as nested lists?
# TODO: design set1 to 5 for different uses, leaving out certain columns?
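
One possible shape for the optional merge mentioned above is to collapse the time series into one row per transect, with the temporal values stored as lists, and then join that onto the change-rate table. The sketch below does this on toy pandas DataFrames; `year` and `shoreline_position` are hypothetical column names, and this is only one of several ways the combined file could be designed.

```python
import pandas as pd

# Toy stand-ins for the change-rate and time-series tables (values are made up).
rates = pd.DataFrame({"transect_id": ["a", "b"], "sds:change_rate": [-0.5, 0.2]})
series = pd.DataFrame(
    {
        "transect_id": ["a", "a", "b"],
        "year": [1985, 1984, 1984],
        "shoreline_position": [9.5, 10.0, 20.0],
    }
)

# Sort first so the lists are in chronological order, then collect per transect.
nested = (
    series.sort_values(["transect_id", "year"])
    .groupby("transect_id", as_index=False)
    .agg(years=("year", list), positions=("shoreline_position", list))
)

# One row per transect: the rate plus its nested time series.
combined = rates.merge(nested, on="transect_id", how="left")
print(combined)
```

List-valued columns are generally better suited to Parquet than to CSV, where they end up as plain strings, so a nested layout like this would probably be delivered as Parquet only.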