# SAD Slopes and Variation

In this notebook we try to characterise waterbodies by their surface area duration (SAD) curves, specifically the slope and variation of those curves.

In [2]:
import h5py
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import matplotlib.cm
import scipy.signal
import matplotlib.animation
import sklearn.decomposition
import sklearn.linear_model
import sklearn.svm

%matplotlib widget

## Load data

In [3]:
history_file = h5py.File('interpolated_waterbodies_by_division_and_basin.h5', 'r')

In [8]:
waterbodies = gpd.read_file('waterbodies_joined_drainage_basins.shp')

We'll just focus on one location for now.

In [9]:
waterbodies = waterbodies[(waterbodies.BNAME == 'CONDAMINE-CULGOA RIVERS') & (waterbodies.Division_ == 'Murray-Darling Basin')]

In [10]:
data = history_file['Murray-Darling Basin']['CONDAMINE-CULGOA RIVERS']['pc_wet'][()]

In [11]:
uids = [int(a) for a in history_file['Murray-Darling Basin']['CONDAMINE-CULGOA RIVERS']['uid']]

In [12]:
dates = pd.to_datetime([a.decode('ascii') for a in history_file['dates']])

In [None]:
qld = gpd.read_file('../wetlands/QSC_Extracted_Data_20200820_104013993000-3580/Wetland_areas.shp')

## SAD curves

The SAD curves are a proxy for how slowly water drains. Our working hypothesis is that groundwater-fed waterbodies have a shallow slope with low variation, while regulated waterbodies have a steep slope with high variation. We have not verified this, but at the very least the two kinds of waterbody should differ in slope and variation. Let's compute the SAD curves and their variation using a sliding window.
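
As a rough illustration of the idea, here is a minimal sketch on two made-up series (the values and the names `steady`, `flashy` and `toy_sad` are hypothetical, not taken from the dataset). Sorting each series in descending order gives its duration curve, and the mean absolute gradient gives a crude measure of slope.

```python
import numpy as np


def toy_sad(series):
    """Sort a series in descending order to get a duration curve."""
    return np.sort(series)[::-1]


# Hypothetical waterbody that stays at a similar extent: shallow curve.
steady = np.array([0.80, 0.82, 0.79, 0.81, 0.80, 0.78])
# Hypothetical waterbody that fills and empties quickly: steep curve.
flashy = np.array([0.95, 0.10, 0.60, 0.05, 0.90, 0.20])

steady_sad = toy_sad(steady)
flashy_sad = toy_sad(flashy)

# Mean absolute slope along the duration axis.
steady_slope = np.abs(np.gradient(steady_sad)).mean()
flashy_slope = np.abs(np.gradient(flashy_sad)).mean()

print(steady_sad, steady_slope)
print(flashy_sad, flashy_slope)
```

In this toy case the rapidly changing series has the steeper curve, which is the kind of difference we hope to see between real waterbodies.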

In [13]:
def calculate_vector_stat(
    vec: "data dim",
    stat: "data dim -> target dim",
    window_size=365,
    step=10,
    target_dim=365,
    progress=None,
    window="hann",
):
    """Calculates a vector statistic over a rolling window.
    
    Parameters
    ----------
    vec : d-dimensional np.ndarray
        Vector to calculate over, e.g. a time series.
    stat : R^d -> R^t function
        Statistic function.
    window_size : int
        Sliding window size (default 365).
    step : int
        Step size (default 10).
    target_dim : int
        Dimensionality of the output of `stat` (default 365).
    progress : iterator -> iterator
        Optional progress decorator, e.g. tqdm.notebook.tqdm. Default None.
    window : str
        What kind of window function to use. Default 'hann', but you might
        also want to use 'boxcar'. Any scipy window
        function is allowed (see documentation for scipy.signal.get_window
        for more information).
        
    Returns
    -------
    (d / step)-dimensional np.ndarray
        y values (the time axis)
    t-dimensional np.ndarray
        x values (the statistic axis)
    (d / step) x t-dimensional np.ndarray
        The vector statistic array.
    """
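    # Initialise output array. Only rows that correspond to a full window are
    # filled below; any trailing rows keep their initial value of zero.
    spectrogram_values = np.zeros((vec.shape[0] // step, target_dim))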

    # Apply the progress decorator, if specified.
    r = range(0, vec.shape[0] - window_size, step)
    if progress:
        r = progress(r)

    # Convert the window str argument into a window function.
    window = scipy.signal.get_window(window, window_size)

    # Iterate over the sliding window and compute the statistic.
    for base in r:
        win = vec[base : base + window_size] * window
        sad = stat(win)
        spectrogram_values[base // step, :] = sad

    return (
        np.linspace(0, vec.shape[0], vec.shape[0] // step, endpoint=False),
        np.arange(target_dim),
        spectrogram_values,
    )

In [14]:
def calculate_sad(vec):
    """Calculates the surface area duration curve for a given vector of heights.
    
    Parameters
    ----------
    vec : d-dimensional np.ndarray
        Vector of heights over time.
    
    Returns
    -------
    d-dimensional np.ndarray
        Surface area duration curve vector over the same time scale.
    """
    return np.sort(vec)[::-1]

def calculate_stsad(vec, window_size=365, step=10, progress=None, window="hann"):
    """Calculates the short-time surface area duration curve for a given vector of heights.
    
    Parameters
    ----------
    vec : d-dimensional np.ndarray
        Vector of heights over time.
    window_size : int
        Sliding window size (default 365).
    step : int
        Step size (default 10).
    progress : iterator -> iterator
        Optional progress decorator, e.g. tqdm.notebook.tqdm. Default None.
    window : str
        What kind of window function to use. Default 'hann', but you might
        also want to use 'boxcar'. Any scipy window
        function is allowed (see documentation for scipy.signal.get_window
        for more information).
    
    Returns
    -------
    (d / step)-dimensional np.ndarray
        y values (the time axis)
    t-dimensional np.ndarray
        x values (the statistic axis)
    (d / step) x t-dimensional np.ndarray
        The short-time surface area duration curve array.
    """
    return calculate_vector_stat(
        vec,
        calculate_sad,
        window_size=window_size,
        step=step,
        target_dim=window_size,
        progress=progress,
        window=window,
    )
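
One thing worth keeping in mind when choosing between 'hann' and 'boxcar' is that the window is multiplied into the values before they are sorted. The sketch below uses a made-up constant series (the 50% value and the window size of 8 are arbitrary assumptions) to show that a Hann window produces a sloped curve even when the input never changes, while a boxcar window leaves it flat.

```python
import numpy as np
import scipy.signal

window_size = 8
# Hypothetical constant series: every observation is 50% wet.
series = np.full(window_size, 50.0)

boxcar = scipy.signal.get_window("boxcar", window_size)
hann = scipy.signal.get_window("hann", window_size)

# Apply each window, then sort in descending order as calculate_sad does.
sad_boxcar = np.sort(series * boxcar)[::-1]
sad_hann = np.sort(series * hann)[::-1]

print(sad_boxcar)
print(sad_hann)
```

If the slope itself is the quantity of interest, comparing results under both windows may help separate real drainage behaviour from the taper introduced by the window.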

In [24]:
sads = []
for pc_wet in tqdm(data):
    sads.append(calculate_stsad(
        pc_wet[-365 * 5:], window_size=365 * 2, window='hann', step=30,
    )[-1])

Now compute the slopes and standard deviations.

In [25]:
stdevs = [np.std(sad, axis=0) for sad in sads]

In [26]:
slopes = [np.gradient(np.mean(sad, axis=0), axis=0) for sad in sads]

In [27]:
means = [np.mean(sad, axis=0) for sad in sads]
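
A caveat on these statistics: `calculate_vector_stat` allocates `vec.shape[0] // step` rows but only fills one row per full window. With the parameters used above (five years of data, a two-year window, a step of 30), a quick check suggests that 37 of the 60 rows are filled, so the remaining all-zero rows also enter the means and standard deviations. The sketch below reproduces that count and shows one possible way to restrict the statistics to the filled rows. The array `sad` here is a random stand-in, not real data.

```python
import numpy as np

n, window_size, step = 365 * 5, 365 * 2, 30

# Rows allocated versus rows actually filled by the sliding-window loop.
n_rows = n // step
n_filled = len(range(0, n - window_size, step))
print(n_rows, n_filled)

# Stand-in for one entry of `sads`: sorted random values in the filled rows,
# zeros in the rows the loop never reaches.
rng = np.random.default_rng(0)
sad = np.zeros((n_rows, window_size))
sad[:n_filled] = np.sort(
    rng.uniform(0, 100, (n_filled, window_size)), axis=1
)[:, ::-1]

# Statistics over all rows (as in the cells above) and over filled rows only.
std_all = np.std(sad, axis=0)
std_filled = np.std(sad[:n_filled], axis=0)
mean_filled = np.mean(sad[:n_filled], axis=0)
slope_filled = np.gradient(mean_filled)
```

Whether the zero rows matter in practice depends on how the features are used, but they do inflate the standard deviation wherever the curve is far from zero.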

In [28]:
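plt.figure()
k = 8  # Index of the waterbody to plot.
# Integrate the slope to approximately recover the mean SAD curve, up to a constant offset.
mean = np.cumsum(slopes[k])
# Shade one standard deviation either side of the curve.
plt.fill_between(np.arange(len(mean)), mean - stdevs[k], mean + stdevs[k], alpha=0.1)
plt.plot(np.arange(len(mean)), mean)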

Let's look at one day on the duration axis (index 100), highlighting farm dam areas.

In [33]:
k = 100
features = np.stack([
    [s[k] for s in slopes],
    [s[k] for s in stdevs],
]).T

In [34]:
plt.figure()
plt.scatter(features[:, 0], features[:, 1], s=1, c=waterbodies.FEATURETYP == 'Farm Dam Area', cmap='cool')

This is pretty good at separating farm storages! What objects are tangled up with farm storages?

In [35]:
clf = sklearn.linear_model.LogisticRegression().fit(features, waterbodies.FEATURETYP == 'Farm Dam Area')

In [36]:
plt.figure()
probs = clf.predict_proba(features)[:, 1]
plt.scatter(features[:, 0], features[:, 1], s=1, c=probs, cmap='cool')

In [37]:
waterbodies.buffer(200).plot(color=matplotlib.cm.cool(probs / probs.max()))

Most of these look artificial (one is a mine lake). We also pick up a river, which is weird.
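
To make this inspection less manual, one option is to tabulate the feature types of objects that receive a high farm dam probability but are not labelled as farm dams. The sketch below uses a small made-up table in place of `waterbodies` and `probs`. The feature type names other than 'Farm Dam Area', the probabilities, and the 0.5 threshold are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `waterbodies.FEATURETYP` and `probs`.
toy = pd.DataFrame({
    "FEATURETYP": [
        "Farm Dam Area", "Lake", "Farm Dam Area", "Watercourse Area", "Lake",
    ],
    "prob": [0.9, 0.7, 0.8, 0.6, 0.1],
})

threshold = 0.5  # Assumed cut-off for "looks like a farm dam".

# Objects the classifier scores highly that are not labelled as farm dams.
confused = toy[(toy.prob > threshold) & (toy.FEATURETYP != "Farm Dam Area")]
counts = confused.FEATURETYP.value_counts()
print(counts)
```

On the real data the same filter could be applied by adding `probs` as a column of `waterbodies`, assuming the rows of `data` and `waterbodies` are in the same order.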