# Manifolder

In this notebook we'll show how to use manifolder to perform manifold reconstruction on two datasets:

1. some simple test data,
2. solar wind (aka, "space weather") data.

## Example 1: Test data

The file `data/simple_data.csv` contains 8 channels of timeseries test data.

This dataset was built to be a test case that any method of timeseries clustering should be able to handle.

It has clearly defined modes, visible to the naked eye.

For simplicity, only the first column contains the test signal.

This column contains a few different, repeating signal types.

All channels have low-level noise added, for numerical stability.

At present, `Manifolder` cannot determine the number of dimensions of the
underlying manifold in an unsupervised manner. Like KMeans, the first
parameter to the constructor is the user's "guess", which will determine
the structures `Manifolder` looks for.

For example, calling `Manifolder(dim=3)` will attempt to project the signal
onto an underlying 3-dimensional manifold.

In [None]:
# useful set of python imports

%load_ext autoreload
%autoreload 2

import numpy as np

np.set_printoptions(suppress=True, precision=4)

import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'

import seaborn as sns
sns.set()

import pandas as pd

import time

def separator(char='-', n=42):
    print(char*n)

### Import manifolder

In [None]:
import manifolder as mr

### Run manifolder

In [None]:
start_time = time.time()

# load the data
data_location = 'data/simple_data.csv'
df = pd.read_csv(data_location, header=None)

test_snippets = True

if test_snippets:
    # test running "snippets" of the time series by passing in
    # chunks of data as a list of matrices
    print('testing data as a list of matrices')
    z = [df.values[0:7000, :]]
    z.append(df.values[7001:14000, :])
    z.append(df.values[14001:21000, :])
    print('loaded', data_location + ', number of snippets:', len(z))

else:
    # this would be the standard way of running the code,
    # if you have one continuous series of data
    print('testing data as a single matrix')
    z = df.values
    print('loaded', data_location + ', shape:', z.shape)

# create manifolder object
H = 80
step_size = 10
manifolder = mr.Manifolder(H=H, step_size=step_size, nbins=10, ncov=20)

# add the data, and fit.
# this is the equivalent of calling
#
#        manifolder._load_data(z)
#        manifolder._histograms_overlap()
#        manifolder._covariances()
#        manifolder._embedding()
#        manifolder._clustering()
#
manifolder.fit_transform(z)

# manifolder._clustering(kmns=False)  # display

elapsed_time = time.time() - start_time
print(f'Program Executed in {elapsed_time:.2f} seconds')

In [None]:
start_time = time.time()

manifolder._clustering()  # display

elapsed_time = time.time() - start_time
print(f'k-means clustering executed in {elapsed_time:.2f} seconds')

### Build a map from cluster index to observed lengths.

The `IDX` attribute of the `Manifolder` instance holds the cluster indices (typically about 7 distinct clusters).

We'll turn this into a dictionary where each key is a cluster index (0 to 6) and each value is the list of lengths observed for that cluster.
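To make the shape of that dictionary concrete, here is a minimal sketch of counting run lengths in a label sequence. The helper `run_lengths_by_cluster` is hypothetical and is not part of manifolder; it assumes a "length" means the number of consecutive samples that share a cluster index, which may differ from what `mr.count_cluster_lengths` actually computes.

```python
from collections import defaultdict
from itertools import groupby


def run_lengths_by_cluster(labels):
    """Map each cluster label to the lengths of its consecutive runs.

    Hypothetical helper for illustration only (not manifolder's code).
    """
    lengths = defaultdict(list)
    # groupby yields one group per run of identical consecutive labels
    for label, run in groupby(labels):
        lengths[int(label)].append(sum(1 for _ in run))
    return dict(lengths)


labels = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2, 1]
print(run_lengths_by_cluster(labels))
# {0: [3, 1], 1: [2, 1], 2: [4]}
```

Cluster 0 appears in two separate runs (of 3 samples and of 1 sample), so its list has two entries.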

In [None]:
IDX = manifolder.IDX
cluster_lens = mr.count_cluster_lengths(IDX)

mr.show_cluster_lengths(cluster_lens)

In [None]:
start_time = time.time()

manifolder._clustering(kmns=True)  # display

elapsed_time = time.time() - start_time
print(f'k-medoids clustering (Euclidean distances) executed in {elapsed_time:.2f} seconds')

In [None]:
# clustering data for k-medoids with Euclidean distances...

IDX = manifolder.IDX
cluster_lens = mr.count_cluster_lengths(IDX)

# cluster_lens is a dictionary, where each key is the cluster number (0:6),
# and each value is a list of cluster lengths

mr.show_cluster_lengths(cluster_lens)

### Graph Transition (Markov) Matrix

The system can be thought of as being in one particular "state" (cluster value) at any given time. This state $S$ can be thought of as a column vector with $C$ dimensions, similar to states in quantum mechanics, where the column vector plays the role of the wavefunction.

Time evolution is given by the transition matrix $M$, which is a Markov matrix. In a Markov matrix, each column sums to one. As such, each vector in the standard orthonormal basis is sent to a "distribution" over the states the system may be in on the next time step. Symbolically, we have:

$$
S_{n+1} = M @ S_n 
$$

where $@$ denotes matrix multiplication.
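The update rule can be sketched on a toy label sequence. The function `column_stochastic_transitions` below is a hypothetical illustration, not manifolder's `make_transition_matrix`; it assumes labels are integers in `0..n_states-1` and that columns index the current state while rows index the next state, matching the column-sum convention described above.

```python
import numpy as np


def column_stochastic_transitions(labels, n_states):
    """Count label-to-label transitions and normalize each column to sum to 1.

    Hypothetical helper for illustration only (not manifolder's code).
    """
    counts = np.zeros((n_states, n_states))
    for src, dst in zip(labels[:-1], labels[1:]):
        counts[dst, src] += 1  # column = current state, row = next state
    col_sums = counts.sum(axis=0, keepdims=True)
    # columns with no observed outgoing transition stay all-zero
    return np.divide(counts, col_sums, out=np.zeros_like(counts), where=col_sums > 0)


labels = [0, 0, 0, 1, 1, 0, 0, 2, 2, 0]
M = column_stochastic_transitions(labels, n_states=3)

S = np.array([1.0, 0.0, 0.0])  # system known to be in state 0
S_next = M @ S                 # distribution over states at the next step
print(M)
print(S_next)  # [0.6 0.2 0.2]
```

Starting from the basis vector for state 0, one step simply picks out column 0 of $M$: the system stays in state 0 with probability 0.6 and moves to state 1 or 2 with probability 0.2 each.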

If the "Planck time", or smallest resolvable time increment, is significantly smaller than the characteristic time of the physical system, then on most time steps most clusters will "transition to" themselves. As such, the diagonal values of the matrix will typically be very close to 1. For visualization, therefore, we can remove the diagonal elements of the matrix to expose the more interesting structure: the transitions from one state to a different one.
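As a rough sketch of that visualization step, the snippet below zeroes the diagonal of a small, made-up Markov matrix and rescales each column so it sums to one again. The function `off_diagonal_markov` and the example values are hypothetical; it is an assumption that `mr.make_matrix_markov` normalizes columns in the same way.

```python
import numpy as np


def off_diagonal_markov(M):
    """Zero the diagonal of a copy of M, then rescale each column to sum to 1.

    Hypothetical helper for illustration only (not manifolder's code).
    """
    off = np.array(M, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(off, 0.0)
    col_sums = off.sum(axis=0, keepdims=True)
    # columns that become all-zero are left as zeros rather than divided by 0
    return np.divide(off, col_sums, out=np.zeros_like(off), where=col_sums > 0)


# made-up matrix with a dominant diagonal; every column sums to 1
M_toy = np.array([[0.98, 0.03, 0.00],
                  [0.01, 0.95, 0.10],
                  [0.01, 0.02, 0.90]])
M_off = off_diagonal_markov(M_toy)
print(M_off)
```

With the self-transitions removed, column 1 shows that when state 1 is left, it goes to state 0 about 60% of the time and to state 2 about 40% of the time, a split that is hard to see next to a diagonal value of 0.95.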

In [None]:
# in this case, the index goes from 0 to 6.
# can also have outlier groups in kmeans, need to check for this

print(IDX.shape)
print(np.min(IDX))
print(np.max(IDX))

IDX_max = np.max(IDX)

In [None]:
M = mr.make_transition_matrix(IDX)
print('transition matrix:\n', M)

In [None]:
# reorder transition matrix, from most to least common cluster
# diagonal elements monotonically decreasing

IDX_ordered = mr.reorder_cluster(IDX, M)

M = mr.make_transition_matrix(IDX_ordered)

separator()
print('Transition matrix, ordered:\n', M)

mr.image_M(M)

In [None]:
# remove diagonal, and make markov, for display

print('transition matrix, diagonal elements removed, normalized (Markov)')

np.fill_diagonal(M, 0)  # happens inplace
M = mr.make_matrix_markov(M)

print(M)
mr.image_M(M, 1)

## Example 2: Solar wind data

In [None]:
start_time = time.time()

# load the data
data_location = 'data/solar_wind_data.csv'
df = pd.read_csv(data_location, header=None)
z = df.values
print(f'loaded {data_location}, shape: {z.shape}')

# create manifolder object
manifolder = mr.Manifolder()

# add the data, and fit (this runs all the functions)
manifolder.fit_transform(z)

manifolder._clustering()  # display

elapsed_time = time.time() - start_time
print(f'Program Executed in {elapsed_time:.2f} seconds')

In [None]:
# clustering data ...

IDX = manifolder.IDX
cluster_lens = mr.count_cluster_lengths(IDX)
mr.show_cluster_lengths(cluster_lens, sharey=False)

In [None]:
M = mr.make_transition_matrix(IDX)
print('transition matrix:\n', M)

In [None]:
# reorder transition matrix, from most to least common cluster
# diagonal elements monotonically decreasing

IDX_ordered = mr.reorder_cluster(IDX, M)

M = mr.make_transition_matrix(IDX_ordered)
print('transition matrix, ordered:\n', M)

mr.image_M(M)

In [None]:
# remove diagonal, and make markov, for display

print('Transition matrix, diagonal elements removed, normalized (Markov)')

np.fill_diagonal(M, 0)  # happens inplace
M = mr.make_matrix_markov(M)

print(M)
mr.image_M(M, 1)