# Fourier decomposition

Each waterbody's water history looks like the sum of a stationary component and a nonstationary component. Can a Fourier decomposition pull out information we can use?

## Setup

### Load modules

In [52]:
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import scipy.optimize as opt
import matplotlib.colors

%matplotlib widget

### Load the data

These arrays were generated in WaterbodyClustering.ipynb. `history` has one row per waterbody and one column per time step in `times`.

In [9]:
history = np.load('history_murray_full_norivers.npy')
times = np.load('time_axis_murray_full_norivers.npy').astype('datetime64[D]')
waterbodies = gpd.read_file('waterbodies_murray_norivers.geojson')

Let's see the mean again.

In [10]:
plt.figure()
plt.plot(times, history.mean(axis=0))

## Do the transform

Fourier time!
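`history` is 2-D (waterbodies × time), so when centring its spectrum the shift should be restricted to the time axis: `np.fft.fftshift` and `np.fft.ifftshift` shift every axis unless `axes` is given, which would also roll the waterbody rows out of line with `waterbodies`. A small self-contained check on a made-up toy array:

```python
import numpy as np

# Toy stand-in for `history`: 5 "waterbodies" x 8 time steps (made-up values).
rng = np.random.default_rng(0)
toy = rng.random((5, 8))

spec = np.fft.fft(toy, axis=1)                # FFT along time only

shifted_all = np.fft.fftshift(spec)           # default axes=None: rows get rolled too
shifted_time = np.fft.fftshift(spec, axes=1)  # only the frequency axis is rolled

# Row 0 of the all-axes version is no longer waterbody 0's spectrum...
print(np.allclose(shifted_all[0], shifted_time[0]))                     # False
# ...because it equals the time-only version with its rows rolled as well.
print(np.allclose(shifted_all, np.fft.fftshift(shifted_time, axes=0)))  # True

# Undoing the shift and inverting along time only recovers the original rows.
back = np.fft.ifft(np.fft.ifftshift(shifted_time, axes=1), axis=1).real
print(np.allclose(back, toy))                                           # True
```

The same `axes=1` / `axis=1` pairing applies to the inverse transform after filtering.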

In [14]:
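# FFT of every waterbody's history along the time axis (axis 1), then centre zero
# frequency. Pass axes=1: fftshift shifts all axes by default, which would also
# roll the waterbody rows out of line with `waterbodies`.
fft = np.fft.fftshift(np.fft.fft(history, axis=1), axes=1)

In [29]:
# Centred spectrum of the mean history (1-D, so the default shift is fine)
mean_fft = np.fft.fftshift(np.fft.fft(history.mean(axis=0)))

In [30]:
plt.figure()
plt.plot(mean_fft.real, label='real')
plt.plot(mean_fft.imag, label='imaginary')
plt.plot(abs(mean_fft), c='k', label='magnitude')
plt.legend()

In [62]:
# Spectra of the first 100 waterbodies, overplotted faintly
plt.figure()
plt.plot(fft[:100].real.T, alpha=0.01, c='blue')
plt.plot(fft[:100].imag.T, alpha=0.01, c='orange')
plt.plot(abs(fft[:100]).T, c='k', alpha=0.01);

Most of the spectrum is a single peak at zero frequency (the mean level), and most of the remainder is at low frequencies. That makes a low-pass filter, which zeroes the high-frequency coefficients at both ends of the centred spectrum, easy to apply.

In [31]:
# Low-pass the mean: zero the 6000 highest-frequency coefficients at each end
mean_fft_ = mean_fft.copy()
mean_fft_[:6000] = 0
mean_fft_[-6000:] = 0

In [35]:
# Low-pass every waterbody: zero the 5000 highest-frequency coefficients at each end
fft_ = fft.copy()
fft_[:, :5000] = 0
fft_[:, -5000:] = 0

In [36]:
# Undo the centring along the time axis only, then invert along time
ifft = np.fft.ifft(np.fft.ifftshift(fft_, axes=1), axis=1)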

In [63]:
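# Invert the low-passed mean spectrum. The input is real, so keep the real part
# (any imaginary part is numerical residue).
mean_ifft = np.fft.ifft(np.fft.ifftshift(mean_fft_))
plt.figure()
plt.plot(times, mean_ifft.real, label='low-passed mean')
plt.plot(times, history.mean(axis=0), label='mean history')
plt.legend()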

In [42]:
plt.figure()
k = 150
# Low-passed vs. raw history for one waterbody (real part of the inverse FFT)
plt.plot(times, ifft[k].real)
plt.plot(times, history[k])

A low-pass filter should work really well on this data. A very narrow Fourier representation, keeping only the lowest-frequency coefficients, should also be a good way to reduce the dimensionality without losing much; the worst case is a mostly-empty waterbody. The obvious next step is to try another dimensionality reduction on just the low-frequency amplitudes...
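A minimal sketch of the "narrow Fourier representation" idea, on a synthetic series rather than the real data (the level, 128-step period, drift, noise and the `n_keep` values are all made-up illustration choices, and `fourier_compress` / `fourier_reconstruct` are hypothetical helpers). Because the series is real, `np.fft.rfft` stores only the non-negative frequencies, so keeping its first `n_keep` coefficients is a low-pass filter of the same kind as zeroing both ends of a centred spectrum:

```python
import numpy as np

# Synthetic stand-in for one waterbody's history (made-up values):
# a constant level + a cycle with a 128-step period + a slow drift + noise.
rng = np.random.default_rng(42)
n_steps = 1024
t = np.arange(n_steps)
series = (0.5
          + 0.3 * np.sin(2 * np.pi * t / 128)   # stationary, periodic part
          + 0.2 * t / n_steps                   # nonstationary drift
          + 0.05 * rng.standard_normal(n_steps))


def fourier_compress(x, n_keep):
    """Keep the n_keep lowest-frequency rfft coefficients along the last axis."""
    return np.fft.rfft(x, axis=-1)[..., :n_keep]


def fourier_reconstruct(coeffs, n_steps):
    """Invert a truncated representation; irfft zero-pads the missing high frequencies."""
    return np.fft.irfft(coeffs, n=n_steps, axis=-1)


errors = {}
for n_keep in (4, 16, 64):
    approx = fourier_reconstruct(fourier_compress(series, n_keep), n_steps)
    errors[n_keep] = np.sqrt(np.mean((approx - series) ** 2))
    print(f"{n_keep:3d} of {n_steps // 2 + 1} coefficients -> RMS error {errors[n_keep]:.3f}")
```

The 128-step cycle sits in frequency bin 1024 / 128 = 8, so 4 coefficients miss it entirely while 16 capture it. Both helpers work along the last axis, so they would apply row-wise to a (waterbodies × time) array such as `history`, with `n_keep` becoming the per-waterbody dimensionality. Keeping the complex coefficients retains phase as well as magnitude.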

In [46]:
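import sklearn.decomposition
# PCA on the magnitudes of the central (low-frequency) band of each spectrum,
# i.e. the coefficients the low-pass filter keeps
pca = sklearn.decomposition.PCA(n_components=50)
fft_vals = abs(fft[:, 5000:-5000])
pca_f = pca.fit_transform(fft_vals)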

In [65]:
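# Alternative: PCA on the phase of every coefficient (full spectrum).
# This overwrites the magnitude-based pca_f from In [46].
phase = np.angle(fft)
pca_f = pca.fit_transform(phase)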

In [66]:
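import sklearn.manifold
# Embed the 50 phase PCA components in 2-D.
# scikit-learn 1.5 deprecated n_iter in favour of max_iter; use max_iter on newer versions.
tsne = sklearn.manifold.TSNE(verbose=True, perplexity=50, n_iter=1000)
tsne_f = tsne.fit_transform(pca_f)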

[t-SNE] Computing 151 nearest neighbors...
[t-SNE] Indexed 9081 samples in 0.054s...
[t-SNE] Computed neighbors for 9081 samples in 6.568s...
[t-SNE] Computed conditional probabilities for sample 1000 / 9081
[t-SNE] Computed conditional probabilities for sample 2000 / 9081
[t-SNE] Computed conditional probabilities for sample 3000 / 9081
[t-SNE] Computed conditional probabilities for sample 4000 / 9081
[t-SNE] Computed conditional probabilities for sample 5000 / 9081
[t-SNE] Computed conditional probabilities for sample 6000 / 9081
[t-SNE] Computed conditional probabilities for sample 7000 / 9081
[t-SNE] Computed conditional probabilities for sample 8000 / 9081
[t-SNE] Computed conditional probabilities for sample 9000 / 9081
[t-SNE] Computed conditional probabilities for sample 9081 / 9081
[t-SNE] Mean sigma: 10.005137
[t-SNE] KL divergence after 250 iterations with early exaggeration: 80.253937
[t-SNE] KL divergence after 1000 iterations: 1.535125


In [67]:
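# Map each river-region code to its name
regions = waterbodies.RivRegNum.astype(int)
names = dict(zip(regions, waterbodies.RivRegName))
plt.figure(figsize=(8, 8))
# One colour bin per integer region code: boundaries min, min+1, ..., max+1,
# so the highest region code gets its own bin and label
bounds = np.arange(min(names), max(names) + 2)
norm = matplotlib.colors.BoundaryNorm(bounds, len(bounds) - 1)
plt.scatter(tsne_f[:, 0], tsne_f[:, 1], s=(waterbodies.area / 0.5e3) ** 0.5,
            edgecolor='None', c=regions, cmap='tab20', norm=norm)
cb = plt.colorbar()
# Label the centre of each bin with its region name
cb.set_ticks(bounds[:-1] + 0.5)
cb.set_ticklabels([names.get(i, '') for i in bounds[:-1]])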

t-SNE on the magnitude PCA (In [46]) pulls out nothing of interest, but t-SNE on the phase PCA roughly groups the waterbodies by river region. Interesting!
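One possible reading of this result, offered as a hypothesis that this notebook does not test: the FFT magnitude of a series is unchanged by a circular time shift, so magnitudes describe how much variation there is at each frequency but not when it happens, while the phases carry the timing. Waterbodies on the same river might share the timing of their filling and drying. The toy check below (a made-up pulse signal, shift `lag` chosen arbitrarily) shows the shift theorem at work:

```python
import numpy as np

# Made-up signal with a single pulse, and a copy circularly shifted by `lag` steps.
n = 64
signal = np.zeros(n)
signal[10:20] = 1.0
lag = 7
shifted = np.roll(signal, lag)   # shifted[m] == signal[m - lag]

spec = np.fft.fft(signal)
spec_shifted = np.fft.fft(shifted)

# Magnitudes are identical: |FFT| cannot tell the two series apart.
print(np.allclose(np.abs(spec), np.abs(spec_shifted)))  # True

# The spectra differ only by the phase factor exp(-2*pi*i*k*lag/n).
k = np.arange(n)
expected = spec * np.exp(-2j * np.pi * k * lag / n)
print(np.allclose(spec_shifted, expected))              # True
```

Real histories are not exact circular shifts of one another, so this only illustrates why phase can separate series that magnitude cannot; it does not show that timing is what drives the grouping here.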