# Impulse characterisation

After some discussion with Leo, I think what we particularly care about is the dynamics of the waterbodies: each one is a dynamical system. There are two things we can try in order to decipher and aggregate those dynamics. The first is to hunt for impulses, which describe how the outflow of a waterbody behaves. The second is to look at the autocorrelation function, which describes how a waterbody's level at one time relates to its level some time lag later. This notebook examines both.

## Setup

### Load modules

In [30]:
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import scipy.optimize as opt
import scipy.ndimage
import pandas as pd
import sklearn.decomposition
import sklearn.manifold
import sklearn.cluster
import scipy.signal
from tqdm.notebook import tqdm

%matplotlib widget

### Load the data

This was generated in WaterbodyClustering.ipynb.

In [2]:
history = np.load('history_murray_full_norivers.npy')
times = np.load('time_axis_murray_full_norivers.npy').astype('datetime64[D]')
waterbodies = gpd.read_file('waterbodies_murray_norivers.geojson')

## Peaks into impulses

If we take all the peaks in a time series and stack them (with some buffer before and after each peak), can we characterise the impulses that way? Let's try it on a single time series first, then on every waterbody.

In [40]:
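def find_peaks(ts, buffer_before=7, buffer_after=28, sigma=2):
    """Return fixed-length windows of `ts` around each prominent peak."""
    # Smooth before peak detection so that noise is not picked up as peaks.
    smooth = scipy.ndimage.gaussian_filter1d(ts, sigma=sigma)
    peaks, _ = scipy.signal.find_peaks(smooth, prominence=0.1)
    peak_samples = []
    for p in peaks:
        start, end = p - buffer_before, p + buffer_after
        # Skip peaks whose window would run off either end of the series.
        if start < 0 or end > len(ts):
            continue
        # Windows are cut from the unsmoothed series.
        peak_samples.append(ts[start:end])
    if peak_samples:
        return np.array(peak_samples)
    # No usable peaks: return an empty array with the right width.
    return np.zeros((0, buffer_before + buffer_after))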
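
To sanity-check the stacking idea, here is a minimal sketch on synthetic data where the impulse shape is known in advance. Everything in it is an assumption for illustration: the helper `stack_windows`, the exponential impulse shape, the noise level and the peak positions are all made up, and the peak positions are given directly rather than detected. If stacking works, the mean of the stacked windows should recover the impulse.

```python
import numpy as np


def stack_windows(ts, centres, before=7, after=28):
    """Cut a window around each centre index, skipping windows that overrun."""
    windows = [ts[c - before:c + after] for c in centres
               if c - before >= 0 and c + after <= len(ts)]
    if windows:
        return np.array(windows)
    return np.zeros((0, before + after))


rng = np.random.default_rng(0)

# Hypothetical impulse: instant rise at offset 0, exponential decay afterwards.
offsets = np.arange(-7, 28)
impulse = np.where(offsets >= 0, np.exp(-offsets / 10.0), 0.0)

# Synthetic series: low-level noise with the impulse added at known positions.
ts = rng.normal(scale=0.01, size=400)
centres = [3, 50, 150, 250, 390]  # the first and last are too close to the edges
for c in centres[1:-1]:
    ts[c - 7:c + 28] += impulse

stacked = stack_windows(ts, centres)
mean_shape = stacked.mean(axis=0)
print(stacked.shape)
print(np.abs(mean_shape - impulse).max())
```

Only the three interior peaks contribute a window, and the averaged window stays close to the impulse that was injected.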

In [41]:
plt.figure()
peaks = find_peaks(history[4], sigma=1)
peak_mean = peaks.mean(axis=0)
plt.plot(peaks.T, c='k', alpha=0.1);
plt.plot(peak_mean, c='k');


In [42]:
peak_repr = [find_peaks(h) for h in tqdm(history)]


In [43]:
# Mean peak shape per waterbody. Waterbodies with no detected peaks
# give an all-NaN row (NumPy warns about the mean of an empty slice).
peak_repr_ = np.array([pr.mean(axis=0) for pr in peak_repr])

In [45]:
peak_repr_.shape

(9081, 35)

In [61]:
peak_repr_ = np.nan_to_num(peak_repr_)

In [70]:
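# Min-max scale each mean peak to [0, 1].
peak_repr_normalised = (peak_repr_ - peak_repr_.min(axis=1, keepdims=True))
peak_repr_normalised /= peak_repr_normalised.max(axis=1, keepdims=True)
# Flat rows (max == min) divide 0 by 0 and become NaN; set them to zero.
peak_repr_normalised = np.nan_to_num(peak_repr_normalised)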
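
The normalisation above relies on `np.nan_to_num` to clean up rows whose range is zero. As an alternative sketch, the division can be guarded so that no NaNs (and no warnings) are produced in the first place. The helper name `minmax_rows` and the toy input are assumptions for illustration, not part of the original analysis.

```python
import numpy as np


def minmax_rows(x):
    """Scale each row to [0, 1]; constant rows become all zeros."""
    shifted = x - x.min(axis=1, keepdims=True)
    span = shifted.max(axis=1, keepdims=True)
    out = np.zeros_like(shifted, dtype=float)
    # Only divide where the row actually has a non-zero range.
    np.divide(shifted, span, out=out, where=span > 0)
    return out


rows = np.array([[1.0, 3.0, 2.0],
                 [5.0, 5.0, 5.0]])  # the second row is flat
scaled = minmax_rows(rows)
print(scaled)
```

The flat row comes out as zeros, which matches what `np.nan_to_num` would give after the unguarded division.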

  


In [74]:
plt.figure()
plt.plot(np.arange(-7, 28), peak_repr_normalised[:1000].T, c='k', alpha=0.01);
plt.xlabel('Time offset (days)')
plt.ylabel('Normalised water level')


We can visualise this feature space with t-SNE, treating each of the 35 time offsets as an independent feature and ignoring their ordering along the time axis.
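
To make "ignoring the structure of the time axis" concrete: with a Euclidean metric, the pairwise distances between rows do not change if the columns (time offsets) are shuffled, so any method that only looks at those distances cannot tell which offset came first. The sketch below checks this on random data; the helper `pairwise_sq_dists` and the random feature matrix are assumptions for illustration.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distance between every pair of rows."""
    diff = x[:, None, :] - x[None, :, :]
    return (diff ** 2).sum(axis=-1)


rng = np.random.default_rng(0)
features = rng.random((20, 35))  # stand-in for 20 normalised mean peaks

# Shuffle the time offsets in the same way for every row.
perm = rng.permutation(features.shape[1])
d_original = pairwise_sq_dists(features)
d_shuffled = pairwise_sq_dists(features[:, perm])

same = np.allclose(d_original, d_shuffled)
print(same)
```

This is why it can be worth looking at representations such as the gradient or the autocorrelation, which encode some of the ordering in the features themselves.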

In [76]:
tsne = sklearn.manifold.TSNE(verbose=1, perplexity=50)

tsne_f = tsne.fit_transform(peak_repr_normalised)

In [80]:
plt.figure()
plt.scatter(*tsne_f.T, s=1, c=peak_repr_normalised[:, 15])

In [81]:
gradient = np.gradient(peak_repr_normalised, axis=1)

In [83]:
plt.figure()
plt.plot(np.arange(-7, 28), gradient[:1000].T, c='k', alpha=0.01);
plt.xlabel('Time offset (days)')
plt.ylabel('$\\nabla$ Normalised water level')

In [84]:
tsne = sklearn.manifold.TSNE(verbose=1, perplexity=50)
tsne_f = tsne.fit_transform(gradient)

[t-SNE] Computing 151 nearest neighbors...
[t-SNE] Indexed 9081 samples in 0.062s...
[t-SNE] Computed neighbors for 9081 samples in 4.929s...
[t-SNE] Computed conditional probabilities for sample 1000 / 9081
[t-SNE] Computed conditional probabilities for sample 2000 / 9081
[t-SNE] Computed conditional probabilities for sample 3000 / 9081
[t-SNE] Computed conditional probabilities for sample 4000 / 9081
[t-SNE] Computed conditional probabilities for sample 5000 / 9081
[t-SNE] Computed conditional probabilities for sample 6000 / 9081
[t-SNE] Computed conditional probabilities for sample 7000 / 9081
[t-SNE] Computed conditional probabilities for sample 8000 / 9081
[t-SNE] Computed conditional probabilities for sample 9000 / 9081
[t-SNE] Computed conditional probabilities for sample 9081 / 9081
[t-SNE] Mean sigma: 0.015376
[t-SNE] KL divergence after 250 iterations with early exaggeration: 78.307686
[t-SNE] KL divergence after 1000 iterations: 2.013697


In [85]:
plt.figure()
plt.scatter(*tsne_f.T, s=1, c=peak_repr_normalised[:, 0])

In [88]:
import statsmodels.tsa.stattools

In [149]:
acfs = [statsmodels.tsa.stattools.acf(x) for x in tqdm(gradient)]


In [150]:
acfs = np.array(acfs)

In [151]:
plt.figure()
plt.plot(np.arange(len(acfs[0])), acfs[:1000].T, c='k', alpha=0.01);
plt.xlabel('Time lag (days)')
plt.ylabel('Autocorrelation of $\\nabla$ normalised water level')

In [152]:
tsne = sklearn.manifold.TSNE(verbose=1, perplexity=50)
tsne_f = tsne.fit_transform(np.nan_to_num(acfs))

[t-SNE] Computing 151 nearest neighbors...
[t-SNE] Indexed 9081 samples in 0.192s...
[t-SNE] Computed neighbors for 9081 samples in 3.453s...
[t-SNE] Computed conditional probabilities for sample 1000 / 9081
[t-SNE] Computed conditional probabilities for sample 2000 / 9081
[t-SNE] Computed conditional probabilities for sample 3000 / 9081
[t-SNE] Computed conditional probabilities for sample 4000 / 9081
[t-SNE] Computed conditional probabilities for sample 5000 / 9081
[t-SNE] Computed conditional probabilities for sample 6000 / 9081
[t-SNE] Computed conditional probabilities for sample 7000 / 9081
[t-SNE] Computed conditional probabilities for sample 8000 / 9081
[t-SNE] Computed conditional probabilities for sample 9000 / 9081
[t-SNE] Computed conditional probabilities for sample 9081 / 9081
[t-SNE] Mean sigma: 0.055955
[t-SNE] KL divergence after 250 iterations with early exaggeration: 76.720688
[t-SNE] KL divergence after 1000 iterations: 1.517016


In [153]:
plt.figure()
plt.scatter(*tsne_f.T, s=1, c=peak_repr_normalised[:, 0])

We can definitely run DBSCAN on this data, since it’s actually dense for once. Does that work?

In [154]:
dbs = sklearn.cluster.DBSCAN(eps=0.05)

In [155]:
clusters = dbs.fit_predict(np.nan_to_num(acfs))

In [156]:
max(clusters)

54
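
A maximum label of 54 means 55 clusters, since labels start at 0, but it says nothing about how many points DBSCAN rejected: scikit-learn marks noise points with the label -1. A minimal sketch of a fuller summary follows; the short `labels` array is a made-up stand-in for the output of `fit_predict`, not the real result.

```python
import numpy as np

# Hypothetical label array standing in for the output of fit_predict.
labels = np.array([0, 0, 1, -1, 2, 2, 2, -1, 1, 0])

is_noise = labels == -1
n_clusters = len(np.unique(labels[~is_noise]))
noise_fraction = is_noise.mean()
# Number of points in each cluster, indexed by cluster label.
cluster_sizes = np.bincount(labels[~is_noise])

print(n_clusters, noise_fraction, cluster_sizes)
```

If most points end up as noise, or one cluster swallows nearly everything, that is a hint that `eps` needs tuning before reading much into the clusters.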

In [157]:
plt.figure()
plt.scatter(*tsne_f.T, s=1, c=clusters, cmap='tab10')
plt.colorbar()

In [158]:
kmc = sklearn.cluster.KMeans(n_clusters=20)

In [159]:
clusters = kmc.fit_predict(np.nan_to_num(acfs))

In [160]:
plt.figure()
plt.scatter(*tsne_f.T, s=1, c=clusters, cmap='tab20')
plt.colorbar()

In [162]:
plt.figure()
plt.plot(np.arange(len(acfs[0])), acfs[clusters == 0].T, c='k', alpha=0.01);
plt.plot(np.arange(len(acfs[0])), acfs[clusters == 1].T, c='b', alpha=0.01);
plt.plot(np.arange(len(acfs[0])), acfs[clusters == 2].T, c='r', alpha=0.01);
plt.xlabel('Time lag (days)')
plt.ylabel('Autocorrelation of $\\nabla$ normalised water level')

In [164]:
plt.figure()
plt.plot(np.arange(-7, 28), peak_repr_normalised[clusters == 0].T, c='k', alpha=0.01);
plt.plot(np.arange(-7, 28), peak_repr_normalised[clusters == 1].T, c='b', alpha=0.01);
plt.plot(np.arange(-7, 28), peak_repr_normalised[clusters == 2].T, c='r', alpha=0.01);
plt.xlabel('Time offset (days)')
plt.ylabel('Normalised water level')

In [146]:
kmc = sklearn.cluster.KMeans(n_clusters=4)

In [147]:
clusters = kmc.fit_predict(gradient)

In [148]:
plt.figure()
plt.plot(np.arange(-7, 28), gradient[clusters == 0].T, c='k', alpha=0.01);
plt.plot(np.arange(-7, 28), gradient[clusters == 1].T, c='b', alpha=0.01);
plt.plot(np.arange(-7, 28), gradient[clusters == 2].T, c='r', alpha=0.01);
plt.xlabel('Time offset (days)')
plt.ylabel('$\\nabla$ normalised water level')

## Autocorrelation functions

Maybe the autocorrelation function of the entire signal, rather than of the averaged peak, will be interesting.
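
For intuition about what such a curve can encode, consider a toy model (an assumption for illustration, not a claim about the real waterbodies): a level that keeps a fixed fraction `phi` of its value each day and receives random inflow. This is an AR(1) process, and its autocorrelation at lag k is approximately `phi ** k`, so the decay rate of the curve reflects how quickly the system forgets. The helper `sample_acf` is a plain sample estimate written here to keep the sketch self-contained.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(nlags + 1)])


rng = np.random.default_rng(0)
phi = 0.9   # hypothetical fraction of the level retained each day
n = 20000
inflow = rng.normal(size=n)

# Simulate the toy level series.
level = np.zeros(n)
for t in range(1, n):
    level[t] = phi * level[t - 1] + inflow[t]

acf = sample_acf(level, nlags=10)
expected = phi ** np.arange(11)
print(np.abs(acf - expected).max())
```

A waterbody that drains quickly would correspond to a small `phi` and a steeply falling curve; one that holds water would give a slowly falling curve.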

In [178]:
acfs = [statsmodels.tsa.stattools.acf(x, fft=True, nlags=90) for x in tqdm(history)]


In [179]:
acfs = np.array(acfs)

In [182]:
plt.figure()
plt.plot(np.arange(len(acfs[0])), acfs[:10000].T, c='k', alpha=0.01)
plt.xlabel('Time lag (days)')
plt.ylabel('Correlation')

Some seem to be concave, some seem to be convex.
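
One way to turn the concave/convex impression into a number is the mean second difference of each curve: negative values suggest a concave curve, positive values a convex one. This is only a sketch of one possible measure; the helper `acf_curvature` and the two example curves are assumptions for illustration.

```python
import numpy as np


def acf_curvature(acf_curve):
    """Mean second difference: negative suggests concave, positive convex."""
    return np.diff(acf_curve, n=2).mean()


lags = np.arange(91)
convex_curve = np.exp(-lags / 15.0)        # fast early drop, long tail
concave_curve = 1.0 - (lags / 90.0) ** 2   # slow early drop, faster later

print(acf_curvature(convex_curve), acf_curvature(concave_curve))
```

Applied row by row to the array of autocorrelation curves, this would give one value per waterbody that could be used to colour an embedding or split the curves into two groups.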

In [189]:
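# Smooth each series along the time axis, then take first differences.
smoothed = scipy.ndimage.gaussian_filter1d(history, axis=1, sigma=2)
history_diff = np.diff(smoothed, axis=1)
acfs_diff = [statsmodels.tsa.stattools.acf(x, fft=True, nlags=90) for x in tqdm(history_diff)]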


In [190]:
acfs_diff = np.array(acfs_diff)

In [191]:
plt.figure()
plt.plot(np.arange(len(acfs_diff[0])), acfs_diff[:100].T, c='k', alpha=0.1)
plt.xlabel('Time lag (days)')
plt.ylabel('Correlation')

In [193]:
tsne = sklearn.manifold.TSNE()

In [195]:
tsne_f = tsne.fit_transform(np.nan_to_num(acfs_diff))

In [196]:
plt.figure()
plt.scatter(*tsne_f.T, s=1, c=peak_repr_normalised[:, 0])