# Seasonal representation

The average water history is very spiky, and the spikes appear to follow the seasons. This is probably seasonal rainfall feeding directly into the water bodies. A larger maximum surface area also means a larger catchment area, so the percentage values should rise with rainfall and fall with evaporation by similar amounts regardless of the size of the water body. Can we decompose water histories into seasonal peaks?
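One quick way to check that the spikes really are annual is a periodogram of the mean history. The sketch below uses a hypothetical synthetic daily series in place of `history.mean(axis=0)` and assumes evenly spaced daily samples; an irregularly sampled history would need resampling first.

```python
import numpy as np

# Hypothetical stand-in for a mean water history: one value per day for ten years,
# an annual cycle plus noise.
rng = np.random.default_rng(0)
n_days = 3650
t = np.arange(n_days)
signal = 0.5 + 0.2 * np.sin(2 * np.pi * t / 365.25) + 0.05 * rng.standard_normal(n_days)

# Periodogram: power at each frequency (cycles per day) after removing the mean.
power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
freqs = np.fft.rfftfreq(n_days, d=1.0)

# Skip the zero-frequency bin and report the period of the strongest component.
dominant = np.argmax(power[1:]) + 1
dominant_period_days = 1.0 / freqs[dominant]
print(f"Dominant period: {dominant_period_days:.0f} days")
```

With ten years of data the frequency resolution is one cycle per 3650 days, so the annual peak lands on the nearest bin rather than exactly at 365.25 days.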

## Setup

### Load modules

In [38]:
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import scipy.optimize as opt
import scipy.ndimage
import pandas as pd
import sklearn.decomposition
import sklearn.manifold
import sklearn.cluster

%matplotlib widget

### Load the data

These arrays were generated in `WaterbodyClustering.ipynb`.

In [2]:
history = np.load('history_murray_full_norivers.npy')
times = np.load('time_axis_murray_full_norivers.npy').astype('datetime64[D]')
waterbodies = gpd.read_file('waterbodies_murray_norivers.geojson')

Let's plot the mean over all water bodies again.

In [3]:
plt.figure()
plt.plot(times, history.mean(axis=0))

Overlay each calendar year, which runs summer to summer in the Southern Hemisphere:
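The overlay code aligns years by sample index within each year, which matches calendar day only if there is exactly one sample per day. A minimal sketch of aligning by calendar day instead with pandas, assuming a daily series; `mean_history` and its dates are hypothetical synthetic data:

```python
import numpy as np
import pandas as pd

# Hypothetical daily mean history for 2001-2003, standing in for history.mean(axis=0).
times = pd.date_range("2001-01-01", "2003-12-31", freq="D")
day = times.dayofyear.to_numpy()
rng = np.random.default_rng(1)
mean_history = 0.5 + 0.2 * np.sin(2 * np.pi * day / 365.25) + 0.01 * rng.standard_normal(len(times))

# Index the series by date, then average across years for each calendar day.
series = pd.Series(mean_history, index=times)
climatology = series.groupby(series.index.dayofyear).mean()
print(len(climatology))  # one value per day of year
```

In leap years `dayofyear` shifts by one after 28 February and adds a day 366, so this alignment is off by a day for the rest of those years; that is usually negligible for seasonal peaks.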

In [4]:
years = times.astype('datetime64[Y]')

In [5]:
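plt.figure()
yearly_histories = []
for year in sorted(np.unique(years)):
    time_mask = years == year
    # Mean over all water bodies at each time step in this year.
    mean = history[:, time_mask].mean(axis=0)
    plt.plot(mean, c='grey')
    yearly_histories.append(mean)
# Average the complete years only (the first and last may be partial),
# truncated to 365 samples so that all years have equal length.
yearly_mean = np.mean([a[:365] for a in yearly_histories[1:-1]], axis=0)
plt.plot(yearly_mean, c='k')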

## Yearly peaks

On what date did each water body peak in each year? Taking one peak date per year gives each water body an `n_years`-dimensional representation.
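The peak-finding step below amounts to "smooth along time, then take the argmax per row". A minimal sketch on hypothetical synthetic histories (three water bodies over one year); unlike the notebook, it marks dry rows with NaN rather than a 1900-01-01 placeholder date:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Hypothetical one-year daily histories for three water bodies (rows):
# one peaks around day 100, one around day 250, and one is dry all year.
days = np.arange(1, 366)
year_history = np.stack([
    np.exp(-0.5 * ((days - 100) / 20) ** 2),
    np.exp(-0.5 * ((days - 250) / 20) ** 2),
    np.zeros_like(days, dtype=float),
])

# Smooth along time to suppress short spikes (sigma=3, as in the notebook).
blurred = gaussian_filter1d(year_history, sigma=3, axis=1)

# Day of year of each row's maximum; rows that never hold water become NaN.
peak_day = days[np.argmax(blurred, axis=1)].astype(float)
peak_day[(blurred == 0).all(axis=1)] = np.nan
print(peak_day)  # [100. 250.  nan]
```

Using NaN avoids a sentinel that later has to be masked out separately, at the cost of needing a float array.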

In [6]:
peak_dates = []
for year in sorted(np.unique(years)):
    time_mask = years == year
    year_history = history[:, time_mask]
    # Blur along the time axis to remove high-frequency components.
    blurred = scipy.ndimage.gaussian_filter1d(year_history, axis=1, sigma=3)
    # Find each water body's peak date this year; water bodies that are
    # dry all year get the placeholder date 1900-01-01.
    dry_all_year = (blurred == 0).all(axis=1)
    peaks = np.where(dry_all_year, np.datetime64('1900-01-01'), times[time_mask][np.argmax(blurred, axis=1)])
    peak_dates.append(peaks)

In [7]:
peak_dates = np.array(peak_dates)

In [8]:
dry_mask = peak_dates.astype('datetime64[Y]') == np.datetime64('1900')

In [9]:
df = pd.DataFrame(peak_dates)
day_of_year = df.apply(lambda s: s.dt.dayofyear, axis=1).values

In [10]:
plt.figure()
plt.imshow(np.where(~dry_mask[:, :100], day_of_year[:, :100], np.nan))

In [11]:
day_of_year

array([[230, 230, 230, ..., 230,   1,   1],
       [271,   1, 269, ...,   1,   1,   1],
       [227, 238, 230, ..., 366,   1,   1],
       ...,
       [  1, 227,   1, ..., 163, 222,  44],
       [209, 198,   1, ..., 201,  88,  97],
       [  1,   1,   1, ..., 197, 180, 197]])

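A peak on day 1 is either the 1900-01-01 placeholder for a dry year or a maximum on the first day of the year (for example, a water body that was already receding, or that stayed at the same level all year), which is not a real seasonal peak. We treat both as missing and impute them with pandas, filling each missing value with the mean peak day of that year across all water bodies.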
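The masking and row-wise imputation can be checked on a tiny hypothetical table with the same layout as `day_of_year` (rows are years, columns are water bodies):

```python
import numpy as np
import pandas as pd

# Toy peak-day table: rows are years, columns are water bodies.
day_of_year = np.array([
    [230, 1, 240],
    [1, 200, 210],
])
dry_mask = np.array([
    [False, True, False],
    [False, False, False],
])

df = pd.DataFrame(day_of_year, dtype=float)
# Dry years and day-1 peaks both count as missing.
df[(df == 1) | dry_mask] = np.nan
# Fill each year's gaps with that year's mean peak day across water bodies.
df = df.apply(lambda s: s.fillna(s.mean()), axis=1)
print(df.values)
```

Row 0 gets its gap filled with (230 + 240) / 2 = 235 and row 1 with (200 + 210) / 2 = 205.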

In [12]:
df = pd.DataFrame(day_of_year)

In [13]:
df[dry_mask | (df == 1)] = np.nan

In [14]:
df = df.apply(lambda s: s.fillna(s.mean()), axis=1)

In [15]:
day_of_year = df.values

In [16]:
plt.figure()
plt.imshow(day_of_year[:, :100])

In [23]:
xs = waterbodies.geometry.centroid.x
ys = waterbodies.geometry.centroid.y

In [27]:
day_of_year[-1]

array([145.04881082, 145.04881082, 145.04881082, ..., 197.        ,
       180.        , 197.        ])

Let's do PCA on this representation!
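The manual standardisation below is the same per-column centring and scaling that scikit-learn's `StandardScaler` performs, and `explained_variance_ratio_` gives a quick check of how much structure the first components capture. A sketch on hypothetical random peak days (30 years by 500 water bodies):

```python
import numpy as np
import sklearn.decomposition
import sklearn.preprocessing

# Hypothetical stand-in for day_of_year: 30 years x 500 water bodies of peak days.
rng = np.random.default_rng(2)
day_of_year = rng.uniform(1, 366, size=(30, 500))

# Manual standardisation: one row per water body, one column per year.
normalised = day_of_year.T - day_of_year.T.mean(axis=0)
normalised /= normalised.std(axis=0)

# StandardScaler centres and scales each column using the population std (ddof=0).
scaled = sklearn.preprocessing.StandardScaler().fit_transform(day_of_year.T)
assert np.allclose(normalised, scaled)

pca = sklearn.decomposition.PCA(n_components=2)
pca_f = pca.fit_transform(scaled)
# Fraction of total variance captured by each component; values near
# 1 / n_features suggest no dominant low-dimensional structure.
print(pca.explained_variance_ratio_)
```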

In [17]:
# Rows are water bodies, columns are years; standardise each year to zero mean and unit variance.
normalised = day_of_year.T - day_of_year.T.mean(axis=0)
normalised /= normalised.std(axis=0)

In [18]:
pca = sklearn.decomposition.PCA(n_components=2)

In [19]:
pca_f = pca.fit_transform(normalised)

In [20]:
plt.figure()
plt.scatter(*pca_f.T, s=1)

That looks essentially meaningless. Hooray? Let's try t-SNE, which can run directly on this low-dimensional representation.
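t-SNE is stochastic, so the embedding changes between runs unless `random_state` is fixed. A minimal sketch on hypothetical features (two groups of 150 water bodies with 30 standardised years each):

```python
import numpy as np
import sklearn.manifold

# Hypothetical stand-in for `normalised`: two groups with different mean peak timing.
rng = np.random.default_rng(3)
group_a = rng.normal(-1.0, 1.0, size=(150, 30))
group_b = rng.normal(1.0, 1.0, size=(150, 30))
features = np.vstack([group_a, group_b])

# Fixing random_state makes the embedding reproducible between runs;
# perplexity should stay well below the number of samples.
tsne = sklearn.manifold.TSNE(n_components=2, perplexity=50, random_state=0)
embedding = tsne.fit_transform(features)
print(embedding.shape)  # (300, 2)
```

Distances between clumps in a t-SNE plot are not very meaningful, so clusters are better computed on the original features and only displayed on the embedding.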

In [34]:
tsne = sklearn.manifold.TSNE(verbose=1, perplexity=50)

In [35]:
tsne_f = tsne.fit_transform(normalised)

[t-SNE] Computing 151 nearest neighbors...
[t-SNE] Indexed 9081 samples in 0.059s...
[t-SNE] Computed neighbors for 9081 samples in 7.312s...
[t-SNE] Computed conditional probabilities for sample 1000 / 9081
[t-SNE] Computed conditional probabilities for sample 2000 / 9081
[t-SNE] Computed conditional probabilities for sample 3000 / 9081
[t-SNE] Computed conditional probabilities for sample 4000 / 9081
[t-SNE] Computed conditional probabilities for sample 5000 / 9081
[t-SNE] Computed conditional probabilities for sample 6000 / 9081
[t-SNE] Computed conditional probabilities for sample 7000 / 9081
[t-SNE] Computed conditional probabilities for sample 8000 / 9081
[t-SNE] Computed conditional probabilities for sample 9000 / 9081
[t-SNE] Computed conditional probabilities for sample 9081 / 9081
[t-SNE] Mean sigma: 1.570661
[t-SNE] KL divergence after 250 iterations with early exaggeration: 86.702423
[t-SNE] KL divergence after 1000 iterations: 3.109263


In [36]:
plt.figure()
plt.scatter(*tsne_f.T, s=1)

This looks OK! Clumpy in a fun way.
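`KMeans()` with no arguments uses scikit-learn's default of 8 clusters. One common way to choose the number of clusters is the silhouette score; a sketch on hypothetical, well-separated synthetic data standing in for `normalised`:

```python
import numpy as np
import sklearn.cluster
import sklearn.datasets
import sklearn.metrics

# Hypothetical features with three well-separated groups.
features, _ = sklearn.datasets.make_blobs(
    n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]], cluster_std=0.5, random_state=0
)

# Score a range of cluster counts; a higher silhouette means tighter,
# better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = sklearn.cluster.KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    scores[k] = sklearn.metrics.silhouette_score(features, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

On real peak-day features the silhouette curve is likely to be much flatter, so it is a guide rather than a definitive answer.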

In [39]:
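# Eight clusters is scikit-learn's default, written out explicitly.
kmc = sklearn.cluster.KMeans(n_clusters=8)

In [40]:
# Cluster on the standardised peak-day features; t-SNE is only used for display.
clusters = kmc.fit_predict(normalised)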

In [44]:
plt.figure()
plt.scatter(*tsne_f.T, s=1, c=clusters, cmap='tab10')
plt.colorbar()
