<a href="https://colab.research.google.com/github/anoukB/tutorial_from_tracking_to_posture_dynamics_temp/blob/main/Part_2_State_space_reconstruction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Maximally Predictive Posture Sequences

## Imports and installation
This first section makes sure you have all the necessary functions to run this tutorial smoothly.

In [None]:
#Install necessary environment for display of videos
!pip install -U kora
from kora.drive import upload_public
from IPython.display import HTML


In [None]:
#Install necessary packages specific to this notebook
!pip install umap-learn
!pip install umap-learn[plot]
!pip install msmtools

In [None]:
#Clone the repository with all files, images and videos. This will result in a folder called cloned-repo
!git clone -l -s https://github.com/anoukB/tutorial_from_tracking_to_posture_dynamics cloned-repo
%cd cloned-repo
!ls  #List of all elements in the repo

In [None]:
# import warnings filter
from warnings import simplefilter
# ignore all future warnings
simplefilter(action='ignore', category=FutureWarning)

*** STOP HERE :)***

You need to set the directory for all the files that will be used in the tutorial. If they have been cloned from GitHub into '/content/cloned-repo/', copy-paste the example. Otherwise, write down the appropriate directory.


In [None]:
#Example:
#directory = '/content/cloned-repo/'

directory =

In [None]:
#Imports for this tutorial
import h5py
import numpy as np
import numpy.ma as ma
import matplotlib.pyplot as plt
import sys
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.colors as colors
import os
import scipy
from scipy.integrate import odeint
from scipy.ndimage import gaussian_filter
import umap.plot
from mpl_toolkits import mplot3d
import math
from statsmodels.graphics.tsaplots import plot_acf  #autocorrelation

#Load functions from other files
import sys
path_to_module = directory
sys.path.append(path_to_module)
import clustering_methods as cl
import operator_calculations as op_calc
import delay_embedding_1D as embed
import matplotlib.animation as anim

In [None]:
#Make a function to display pictures easily
def display_picture(url):
  im = plt.imread(url)

  fig, axs = plt.subplots(ncols=1, nrows=1)
  axs.imshow(im, cmap='gray')
  axs.set_axis_off()

In [None]:
# Load the file with the Principal Components TS made in previous tutorial
dir_file_storage = directory  # Choose the directory where you put the file
filename = 'file_principal_components_time_series_larva.csv'  #Write down your filename
PC_ts = np.loadtxt(dir_file_storage + filename  , delimiter = ",")

## Introduction

For the second notebook of this series, we will move from posture space to posture dynamics. How can we map recordings of the animal to interpretable numbers that retain useful information about the dynamics?

Our main goal will be to reconstruct the state space of the postures extracted from Part 1 of the tutorial.

But what is a state space, exactly?

The state space, or behavior space in our case, is the set of all possible states a dynamical system can occupy. A simple system such as a pendulum has a simple state space, as this video from Prof. Ghrist Math shows.

In [None]:
#Video of the phase-space of a pendulum
url = upload_public(directory + 'vid_phase_space_pendulum_Ghrist_Math.mp4')
HTML(f"""<video src={url} width=500 controls/>""")


A pendulum has a circular state space, with two dimensions: momentum and angular position. At every point in time, the pendulum is located at one point in this circle, and a point in the circle represents all the necessary information to completely describe what the pendulum is doing at this point in time (its location in space and its speed).
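To make the pendulum picture concrete, here is a minimal sketch that integrates a frictionless pendulum and stores its state at every time step. The function name `simulate_pendulum`, the parameter values, and the choice of a semi-implicit Euler scheme are assumptions made for illustration; angular velocity is used in place of momentum, to which it is proportional.

```python
import numpy as np

def simulate_pendulum(theta0=0.5, omega0=0.0, g_over_l=1.0, dt=0.01, n_steps=5000):
    """Integrate theta'' = -(g/l) sin(theta) with a semi-implicit Euler scheme."""
    theta = np.empty(n_steps)
    omega = np.empty(n_steps)
    theta[0], omega[0] = theta0, omega0
    for i in range(1, n_steps):
        # Update the angular velocity first, then the angle with the new velocity
        omega[i] = omega[i - 1] - g_over_l * np.sin(theta[i - 1]) * dt
        theta[i] = theta[i - 1] + omega[i] * dt
    return theta, omega

theta, omega = simulate_pendulum()

# Each row is one state of the system: (angular position, angular velocity)
states = np.column_stack([theta, omega])

# Energy per unit mass and squared length; it stays nearly constant on a closed orbit
energy = 0.5 * omega**2 - np.cos(theta)
```

Plotting `states[:, 0]` against `states[:, 1]` should trace the closed loop described above, since every point on the loop fixes both where the pendulum is and how fast it moves.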

More complex systems, like our bending larva, will have more complex state spaces, with more dimensions (also called degrees of freedom). The following picture shows an example of a 2D non-linear embedding of the state space of C. elegans (from Antonio Costa, manuscript in progress).

In [None]:
display_picture(directory + 'img_umap_celegans_costa.png')



Different behaviors occupy different parts of the state space and have different shapes. This picture shows that forward crawling has a circular shape, much like the one of our pendulum, which can teach us something about the behavior. We can also see that dorsal turns occupy a smaller subspace than forward crawling. In that sense, a visual representation of the state space can be helpful to analyze behavior qualitatively. It is also possible to extract dynamical quantities from state-space such as Lyapunov exponents or stable periodic orbits, like in Ahamed $\textit{et al}$ [1], which I encourage you to read if you are interested.



## Building a maximally predictive state space

We can now get some intuition about how to reconstruct the state space of our bending larva from the recordings of its posture.

In the pendulum case, measuring the angle and velocity is sufficient to recover the complete state space. However, when we study a dynamical system more complicated than a pendulum, we usually cannot know the relevant degrees of freedom to measure. We typically have measurements that are only partial observations of the real dynamics of the system. Indeed, measurements are a (possibly noisy, most likely nonlinear) observation function of the state space that maps onto the real number line, so they only provide indirect information about its properties. However, this map is one-to-one, so we know that every time point can be mapped to a single point in state space. Let us watch this video from Sugihara et al., Detecting Causality in Complex Ecosystems, Science, 2012, to illustrate what we mean. In this case, the butterfly-like shape is the state space of the Lorenz system, but we can imagine it to be any state space of any system.


In [None]:
#Video to illustrate the concept of state space
url = upload_public(directory + 'vid_intro_state_space_Sugihara.mp4')
HTML(f"""<video src={url} width=500 controls/>""")



This video illustrates that from a single measurement (the time series  $X(t)$ for example), we only measure a subset of the variables that define the full state of the system. The unobserved states ($Y(t)$ and $Z(t)$) create a history dependence on $X(t)$. Fortunately, a useful theorem by Takens and others states that given a generic measurement on a dynamical system, it is possible to reconstruct the state space by concatenating enough time delays of the measurement data.

Let's come back to our larva. If we knew how many dimensions are needed to describe the dynamics of the larva's bending entirely, we could measure them all. For now, we only have a 4-dimensional time series of the principal components of our larva's bending behavior. With the Takens theorem, we can look for a map of the bending state space that conserves its properties. In other words, stacking delayed versions of our measurements will result in a correct embedding of the state space. A second video by Sugihara et al. illustrates this idea.



In [None]:
#Video to illustrate the Takens theorem
url = upload_public(directory + 'vid_Takens_thm_Sugihara.mp4')
HTML(f"""<video src={url} width=500 controls/>""")

The Takens theorem guarantees that if we use delayed versions of our measurements as our new axes, we will create a smooth and one-to-one map of the original state space and that it will conserve its topological properties. In the video, the delay is called $\tau$, but we will call it $K$ in the tutorial. This technique is called time delay embedding.
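The idea in the two videos can be tried numerically. The sketch below integrates the Lorenz system, pretends that only $X(t)$ was measured, and builds a three-dimensional reconstructed space from delayed copies of it. The helper names, the fixed-step Runge-Kutta integrator, the step size and the delay `tau = 10` are all illustrative assumptions, not values taken from the videos or from the larva data.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations with the classic parameter values."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate_lorenz(n_steps=5000, dt=0.01, state0=(1.0, 1.0, 1.0)):
    """Fixed-step fourth-order Runge-Kutta integration of the Lorenz system."""
    traj = np.empty((n_steps, 3))
    traj[0] = state0
    for i in range(1, n_steps):
        s = traj[i - 1]
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        traj[i] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

traj = integrate_lorenz()
x = traj[:, 0]  # pretend this is the only quantity we are able to measure

tau = 10  # delay in time steps (illustrative value)
# Rows are (x(t), x(t - tau), x(t - 2 tau)): delayed copies of x act as new axes
reconstructed = np.column_stack([x[2 * tau:], x[tau:-tau], x[:-2 * tau]])
```

A 3D scatter of `reconstructed` should show a two-lobed shape resembling the original attractor in `traj`, even though $Y(t)$ and $Z(t)$ were never used.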


In mathematical terms, time delay embedding is the augmentation of a time series $x(t)$ into a higher dimension through the construction of a delay vector:

$$\vec{x}(t) = (x(t), x(t-1), \ldots, x(t-K+1))$$

 The determining factors to build the space are the delay $K$ and the embedding dimension $m$.
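As a small worked example, assume unit-step delays and a window of $K = 3$ samples applied to a scalar series with $T = 5$ points, $x(1), \ldots, x(5)$. The delay vectors are then

$$
\vec{x}(3) = (x(3), x(2), x(1)), \quad \vec{x}(4) = (x(4), x(3), x(2)), \quad \vec{x}(5) = (x(5), x(4), x(3))
$$

so the five samples yield $T - K + 1 = 3$ delay vectors, each of dimension $K = 3$. The first $K - 1$ time points have no complete history and therefore produce no delay vector.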

In computational terms, we have a measurement matrix from the last tutorial, which looks like this:



In [None]:
display_picture(directory + "img_measurement_matrix.png")

   
Where $d$ is the dimension of the measurement (4 in our case, each measurement being a principal component of the bending data) and $T$ is the length of our time series. We are looking to make a matrix that looks like this:

In [None]:
display_picture(directory + "img_trajectory_matrix.png")

We call this matrix $Y_K$ the trajectory matrix (or delay-embedding matrix). It represents trajectories in the embedded space (or posture sequences), and its size is $(T - K + 1) \times (K \cdot d)$: one row per window of $K$ consecutive frames and one column per delayed measurement.

The schematics of the matrix are from [1].
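The construction of the trajectory matrix can be sketched in a few lines of NumPy. The function `build_trajectory_matrix` below is a hypothetical stand-in written for illustration; the internals and argument convention of the tutorial's own `embed.trajectory_matrix` are not shown here and may differ (for instance in the ordering of the delays within a row).

```python
import numpy as np

def build_trajectory_matrix(X, K):
    """Stack K consecutive frames of a (T, d) series into rows of length K*d."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1D series as a (T, 1) measurement matrix
    T = X.shape[0]
    n_rows = T - K + 1
    if n_rows < 1:
        raise ValueError("K must not exceed the length of the time series")
    # Block i holds the frames shifted by i steps; row t is (X[t], ..., X[t+K-1])
    return np.hstack([X[i:i + n_rows] for i in range(K)])

# Toy measurement matrix with T = 6 frames and d = 2 measurements per frame
X_demo = np.arange(12, dtype=float).reshape(6, 2)
Y_demo = build_trajectory_matrix(X_demo, K=3)
```

With $T = 6$, $d = 2$ and $K = 3$, `Y_demo` has $T - K + 1 = 4$ rows and $K \cdot d = 6$ columns, and each row is one short sequence of consecutive frames.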

## Method

The method is divided into three parts.
First, we will build the trajectory matrix or posture sequence matrix. For this, we will need to find an appropriate delay $K$. Second, we will calculate an embedding dimension $m$, smaller than the size of the initial matrix, to better visualize and interpret the space. Finally, we will make some quick analyses to visualize the space.

### Choosing an embedding parameter K

The first step in delay embedding is the choice of a sensible delay for our dataset. We can think of $K$ as the length of a window we are moving through our data. In our trajectory matrix, we will put all sequences of length $K$ against each other, making a $K\times d$ dimensional space.

There are many valid criteria to choose $K$. Ours will be to maximize predictive information, which is why we talk of a maximally predictive posture sequence. The metric we will use is the entropy rate $\delta_{h_N} (K)$, which is the amount of information that has to be kept in $K-1$ time delays for an accurate forecast of the next step, a metric often used in dynamical systems [2]. We can think about entropy as the amount of "surprise" contained in a sequence. A sequence with low entropy has low surprise and good predictive power. The opposite is true of high entropy.

$\textit{Note}$: Since entropy is an information-theoretic quantity based on discrete sequences, we will need to discretize our state space into $N$ partitions, using k-means clustering (see this tutorial for more explanations: https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/ ).

We define the entropy rate as

$$\delta_{h_N} (K) = h_N(K - 1) - h_N (K) $$

where $h_N(K)$ is the entropy of the system for $N$ partitions and $K$ delays.

For a finite-sized Markovian system (we will explain more in Part 3), there exists a $K^*$ such that $\delta_{h_N} (K^*) = 0$, which means that the entropy rate does not change as we increase delays. In other words, as we increase the window length ($K$), we do not add information that could increase predictability. If the entropy rate is 0 as we increase $K$, then we removed all history dependence and we have complete information on the system. We are looking for this $K^*$ if it exists. If it does not, we are looking for the $K^*$ at which the entropy rate becomes stable. See [3] for more theoretical details on entropy and its use in this context.
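To give a feel for what an entropy estimate on a discretized sequence looks like, here is a minimal sketch of one common estimator: the entropy rate of a first-order Markov chain fitted to a sequence of cluster labels. This is an assumption made for illustration; the tutorial's `op_calc.get_entropy` is not shown here and may be implemented differently. The function name `markov_entropy_rate` is hypothetical, and the state weights are taken as empirical visit frequencies rather than an exact stationary distribution.

```python
import numpy as np

def markov_entropy_rate(labels):
    """Entropy rate (nats/symbol) of a first-order Markov chain fitted to `labels`."""
    labels = np.asarray(labels).ravel()
    _, idx = np.unique(labels, return_inverse=True)  # relabel states as 0..n-1
    idx = idx.ravel()
    n = idx.max() + 1
    # Count transitions from state i (row) to state j (column)
    counts = np.zeros((n, n))
    np.add.at(counts, (idx[:-1], idx[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Row-normalize to transition probabilities; rows with no exits stay at zero
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Weight each state by how often it is visited as a source state
    pi = row_sums[:, 0] / row_sums.sum()
    logP = np.log(P, out=np.zeros_like(P), where=P > 0)
    return float(-np.sum(pi[:, None] * P * logP))

# A perfectly periodic sequence holds no surprise
h_periodic = markov_entropy_rate(np.tile([0, 1, 2], 100))

# Independent fair coin flips are maximally surprising for two symbols
coin = np.random.default_rng(0).integers(0, 2, size=20000)
h_coin = markov_entropy_rate(coin)
```

The periodic sequence gives an entropy rate of zero, while the coin flips give a value close to $\ln 2$ nats per symbol, which matches the intuition of low versus high surprise.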

### Finding a number of partitions

As we said, the space must first be topologically discretized with k-means clustering. For each $K$, we find the optimal number of partitions $N^*$. We choose $N^*$ as the biggest possible number of clusters before the entropy starts decreasing.
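For readers who have not met k-means before, the sketch below implements the plain Lloyd iteration in NumPy: assign every point to its nearest center, then move each center to the mean of its points. It is only meant to show what a partition label is; the tutorial's `cl.kmeans_knn_partition` is a separate function whose internals are not reproduced here. The name `kmeans_labels` and the toy two-blob data are assumptions for illustration.

```python
import numpy as np

def kmeans_labels(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one integer label per point and the centers."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from randomly chosen data points as centers
    centers = points[rng.choice(len(points), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = points[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data: two well-separated clouds of points in 2D
rng_demo = np.random.default_rng(1)
demo_points = np.vstack([rng_demo.normal(0.0, 0.1, size=(50, 2)),
                         rng_demo.normal(5.0, 0.1, size=(50, 2))])
demo_labels, demo_centers = kmeans_labels(demo_points, n_clusters=2)
```

The sequence of labels over time is the discrete symbol sequence on which the entropy is computed. Because the starting centers are random, repeated runs can give slightly different partitions, which is why several samples are drawn for each number of partitions.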

In [None]:
n_seed_range=np.arange(1,2000,300) #number of partitions to examine. This range can be modified to your liking.
n_samples = 3 #number of random partitions made for each number of seeds, in order to have an error measure.

In [None]:
#Warning: this will take time !!
range_Ks =  np.arange(1,50,5,dtype=int) #range of delays to study, in frames.
h_K=[]

for K in range_Ks: # for each delay K
    print('K =',K)
    traj_matrix = embed.trajectory_matrix(PC_ts,K=K-1) # Build a trajectory matrix with K_i
    h_seeds = []
    for n_partitions in n_seed_range: # For each number of partitions in n_seed_range
        h_samples=[]
        for idx in range(n_samples): #Repeat n_samples times since partitioning is a random process
            labels = cl.kmeans_knn_partition(traj_matrix,n_partitions) #do the partitioning with n_partitions
            h = op_calc.get_entropy(labels)  # calculate the entropy associated to that clustering
            h_samples.append(h)  #store that number
        h_seeds.append(h_samples)
        print('N =',n_partitions,"Entropy = ", h_samples)
    h_K.append(h_seeds)

In [None]:
#Build a figure of how entropy evolves with the number of partitions for each K

colors_K = plt.cm.viridis(np.linspace(0,1,len(range_Ks)))
max_h_K=np.zeros(len(range_Ks))
cil_h_K=np.zeros(len(range_Ks))
ciu_h_K=np.zeros(len(range_Ks))
max_idx_array = np.zeros(len(range_Ks), dtype = int)
plt.figure(figsize=(10,6))

for k in range(len(range_Ks)):
    mean = np.mean(h_K[k],axis=1)
    cil = np.percentile(h_K[k],2.5,axis=1)
    ciu = np.percentile(h_K[k],97.5,axis=1)
    plt.errorbar(n_seed_range,mean,c=colors_K[int(k)],yerr = [mean-cil,ciu-mean],capsize=4,marker='o',ms=5)
    max_idx = np.argmax(mean)
    max_idx_array[k] = int(max_idx)
    max_h_K[k]=mean[max_idx]
    cil_h_K[k]=cil[max_idx]
    ciu_h_K[k]=ciu[max_idx]

plt.scatter(n_seed_range,mean,c=mean,vmin=min(range_Ks), vmax=max(range_Ks), s = 0)
plt.colorbar(shrink = .7, aspect = 6,label='$K$ (frames)')
plt.xlabel('$N$ (partitions)')
plt.ylabel('$h$ (nats/symbol)')
#plt.xscale('log')
plt.show()


A lot lies in this graph. The x-axis is the number of partitions tested, the y-axis is entropy, and the colors represent each K. We should mention that:

1.  At a fixed number of partitions $N$, entropy will decrease as delays increase. Indeed, a larger delay window will contain more information, which increases predictive power.
2. For a specific $K$, entropy increases with partitions, reaches a maximum, and then drops.
    - If we split the state space into very few states (low number of clusters), predicting which state the system will occupy in the next step becomes trivial, so entropy will be low.
    - As we increase the number of clusters, the prediction becomes more challenging, so entropy grows until it reaches a maximum. From there, additional partitions will have an overfitting effect and result in an artificial drop in the entropy. This drop is due to the finite size of the time series. A time series with enough data points would show a plateau instead.

Whether you have a plateau or a drop, we take the $N^*$ associated with maximal entropy. In our case, we will take the maximum of each curve to determine the optimal number of partitions for each K.

$\textit{Note:}$ The behavior of $h$ with $N$ indicates that we are in the presence of deterministic chaos rather than a stochastic process. Indeed, the noise characteristic of a stochastic process has information at all scales. The entropy would then increase with $N$ indefinitely. Deterministic chaos has a fractal nature, so there exists a scale at which the entropy stops changing. As we said, the decrease of $h$ here is due to finite-size effects.

In [None]:
#Print the information about your optimal N for each K
print("Optimal numbers of partitions for selected delays")
print("Delays: ", range_Ks)
print("Number of partitions", n_seed_range[max_idx_array])

### Finding $K^*$  

Now that we know how many partitions are needed for every value of the delay $K$, we can plot the entropy associated with the trajectory matrices built from each of those delays.

In [None]:
n_partitions = n_seed_range[max_idx_array]  #Sample the correct number of partitions for each K

h_K=[]

for K in range(len(range_Ks)):
    traj_matrix = embed.trajectory_matrix(PC_ts,K=range_Ks[K]-1) # construct the delay matrix with K
    h_samples=[]
    for idx in range(n_samples):
        labels=cl.kmeans_knn_partition(traj_matrix,n_partitions[K]) #Make the partitioning
        h = op_calc.get_entropy(labels)  #Calculate its entropy
        h_samples.append(h)  #Store
    h_K.append(h_samples)
    print("K :", range_Ks[K], "Entropy : ", h_samples)

In [None]:
# Make a figure of the averaged entropy associated with each K.

mean = np.mean(h_K,axis=1) # average of each K
var = np.std(h_K, axis = 1)  #Standard deviation across samples

plt.errorbar(range_Ks,mean, yerr = var,marker = "o")
plt.xlabel('$K$')
plt.ylabel('$h$')
plt.show()

We note that, as expected, the entropy decreases with increasing $K$, then seems to stabilize. Remember that we are looking for the $K^*$ at which the entropy stabilizes. We calculate the discrete derivative of the entropy with respect to $K$, $h_N(K) - h_N(K-1)$, and choose the $K$ at which it reaches $0$.

In [None]:
# Take the discrete derivative of the mean entropy with respect to K. We don't explicitly
# divide by the step in K as it is the same for all Ks
delh = np.empty(range_Ks.shape[0])
delh[:] = np.nan
delh[1:] = mean[1:]-mean[:-1]

# To get an error bar, we take the standard deviation across the samples for each K
h_diff = np.zeros((range_Ks.shape[0], n_samples))
h_K = np.array(h_K)
h_diff[1:,:] = h_K[1:,:] - h_K[:-1,:]
var_diff = np.std(h_diff, axis = 1)  #Calculate the standard deviation

plt.errorbar(range_Ks,delh,yerr = var_diff)
plt.xlabel('$K$')
plt.ylabel('$h_K-h_{K-1}$')
plt.hlines(0.,range_Ks[0],range_Ks[-1],colors='k',linestyles='--')
plt.show()

The entropy stops varying around $K^* = 25$ frames. It becomes our official value for the optimal delay to maximize predictability in our dataset. The time conversion for $K^*$ is about 4 seconds for a frame rate of 5 seconds.

This means there are some relevant dynamics occurring in the 4-second range in our dataset. It is a good idea to take a step back and wonder: what kind of dynamics occur in this time frame?
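Reading $K^*$ off the plot by eye works, but the choice can also be made reproducible. The sketch below picks the first delay at which the change in mean entropy falls below a tolerance. The helper `first_stable_index`, the tolerance value, and the toy entropy curve are all assumptions for illustration; the toy numbers are synthetic and are not results from the larva data. In practice the tolerance could be tied to the error bars of the entropy differences.

```python
import numpy as np

def first_stable_index(mean_h, tol):
    """Index of the first entry whose change from the previous entry is below `tol`."""
    diffs = np.abs(np.diff(mean_h))
    stable = np.flatnonzero(diffs < tol)
    if stable.size == 0:
        return None  # the curve never settles within the tested range
    return int(stable[0]) + 1

# Synthetic, smoothly saturating entropy curve (illustration only)
toy_Ks = np.arange(1, 50, 5)
toy_h = 0.5 + np.exp(-toy_Ks / 8.0)

idx = first_stable_index(toy_h, tol=0.02)
toy_K_star = None if idx is None else int(toy_Ks[idx])
```

A `None` result signals that the range of delays should be extended before settling on a value.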

### Dimensionality reduction ($m$)

We now have a trajectory matrix of size $(T - K + 1) \times (K \cdot d)$. A space of this size is hard to visualize, and not all dimensions are equally informative. We use a dimensionality reduction technique called singular value decomposition (SVD) to reduce the space to its most meaningful components. The idea behind it is the following.

If we have a matrix

$$
X = \begin{pmatrix}\vec{x}_1^T & \vec{x}_2 ^T & ... & \vec{x}_m ^T  \\ \end{pmatrix}
$$

we can prove it can be decomposed into three matrices

$$
U\Sigma V^T = \begin{pmatrix}\vec{u}_1^T & \vec{u}_2^T & \cdots & \vec{u}_m^T \end{pmatrix}\begin{pmatrix}\sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_m \end{pmatrix}\begin{pmatrix}\vec{v}_1^T & \vec{v}_2^T & \cdots & \vec{v}_m^T \end{pmatrix}^T
$$

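where the columns of $U$ are the left singular vectors of $X$ (the eigenvectors of $XX^T$), the columns of $V$ are the right singular vectors (with $VV^T = I$), and $\Sigma$ is a diagonal matrix of singular values. I suggest reading [4] for more detailed information.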


In our context, $X$ is the trajectory matrix, with each column being a delayed time series of length $T - K + 1$. Think of each column as a dimension axis for our state space. The $U$ matrix contains the $K \times d$ singular vectors, each of length $T - K + 1$. These modes form another basis for the space described by $X$. $\Sigma$ is a diagonal matrix holding the singular value of each mode, whose square measures the variance captured by that mode (for mean-centered data). The columns of the last matrix, $V^T$, represent the proportion of each singular mode necessary to reconstruct the data. The rows indicate how the proportions of a singular mode evolve across all the dimensions.
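The properties of the decomposition can be checked on a small random matrix. The sketch below uses `np.linalg.svd`, which follows the same return convention as `scipy.linalg.svd`: the third output is already $V^T$. The toy matrix `X_toy` and the choice `M = 2` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "trajectory matrix": 200 rows, 6 columns with very different scales
X_toy = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])

# Economy-size SVD: U is (200, 6), S holds 6 singular values, Vt is V transposed
U, S, Vt = np.linalg.svd(X_toy, full_matrices=False)

# 1) The three factors reproduce the original matrix
reconstruction = (U * S) @ Vt

# 2) The weighted modes U*S are the data expressed in the basis given by V
projected = X_toy @ Vt.T

# 3) Keeping the first M weighted modes gives a reduced description of each row
M = 2
reduced = (U * S)[:, :M]

# 4) Without the weights, every column of U has unit norm, whatever its importance
unit_norms = np.linalg.norm(U, axis=0)
```

Point 2 is why the singular vectors are multiplied by their singular values before clustering or plotting: `U * S` is a rotation of the data that preserves distances between rows, whereas `U` alone rescales every mode to the same size and hides which ones matter.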



We will do an SVD on our data. As with $K^*$, we will find the smallest number of singular modes $m^*$ needed to minimize entropy while keeping the entropy rate stable.

In [None]:
K_opt = 25  # to enter manually from the entropy rate plot.
traj_matrix = embed.trajectory_matrix(PC_ts,K=K_opt-1)  #Make a trajectory matrix with optimal K.
u, s, v = scipy.linalg.svd(traj_matrix, full_matrices=False)  # Singular value decomposition; v holds V transposed.

Like with $K$, we will find the maximally predictive number of partitions for each number of dimensions $m$. To do so, we calculate the trajectory matrix with the optimal $K$, then iteratively take an increasing number of dimensions from the singular value decomposition, and calculate how the predictivity is modulated. We will take the smallest number of dimensions for which the entropy is stable.

In [None]:
n_seed_range=np.arange(100,2000,200) #number of partitions to examine
n_seed_range = np.insert(n_seed_range,0,10)
n_samples = 3 #number of random partitions made for each number of seeds

In [None]:
range_Ms =  np.arange(1,8,1,dtype=int) #range of dimensions to incorporate
h_M=[]

for M in range_Ms:
    print('M =',M)
    traj_matrix = np.dot(u, np.diag(s))[:,:M]  #We weight each singular vector by its singular value and keep M vectors
    h_seeds=[]
    for n_seeds in n_seed_range:
        h_samples=[]
        for idx in range(n_samples):
            labels=cl.kmeans_knn_partition(traj_matrix,n_seeds)  #We make the partitioning
            h = op_calc.get_entropy(labels)  #Calculate the entropy
            h_samples.append(h)
        h_seeds.append(h_samples)
        print('N =',n_seeds,"Entropy = ", h_samples)
    h_M.append(h_seeds)

In [None]:
colors_M = plt.cm.viridis(np.linspace(0,1,len(range_Ms)))
max_h_M=np.zeros(len(range_Ms))
cil_h_M=np.zeros(len(range_Ms))
ciu_h_M=np.zeros(len(range_Ms))
max_idx_array = np.zeros(len(range_Ms), dtype = int)
plt.figure(figsize=(10,6))

for m in range(len(range_Ms)):
    mean = np.mean(h_M[m],axis=1)
    cil = np.percentile(h_M[m],2.5,axis=1)
    ciu = np.percentile(h_M[m],97.5,axis=1)
    plt.errorbar(n_seed_range,mean,c=colors_M[int(m)],yerr = [mean-cil,ciu-mean],capsize=4,marker='o',ms=5)
    max_idx = np.argmax(mean)
    max_idx_array[m] = int(max_idx)
    max_h_M[m]=mean[max_idx]
    cil_h_M[m]=cil[max_idx]
    ciu_h_M[m]=ciu[max_idx]
plt.scatter(n_seed_range,mean,c=mean,vmin=min(range_Ms), vmax=max(range_Ms), s = 0)
plt.colorbar(shrink = .7, aspect = 6,label='$M$')
plt.xlabel('$N$ (partitions)')
plt.ylabel('$h$ (nats/symbol)')
plt.show()

The curves for the number of dimensions look a lot like the ones for $K$. It makes sense that an increasing number of dimensions should increase the amount of information.

In [None]:
print("Optimal numbers of partitions for selected dimensions")
print("Dimensions: ", range_Ms)
print("Number of partitions", n_seed_range[max_idx_array])

In [None]:
n_seeds = n_seed_range[max_idx_array] # We take the optimal partitioning for each M

h_M=[]

for M in range(len(range_Ms)):
    traj_matrix = np.dot(u, np.diag(s))[:,:range_Ms[M]]  #Construct the traj matrix with M dimensions
    h_samples=[]
    for idx in range(n_samples):
        labels=cl.kmeans_knn_partition(traj_matrix,n_seeds[M])  #Make the partitioning
        h = op_calc.get_entropy(labels)  #Calculate the entropy
        h_samples.append(h)
    h_M.append(h_samples)
    print("M: ", range_Ms[M],"Entropy: ", h_samples)

In [None]:
mean = np.mean(h_M,axis=1)
var = np.std(h_M, axis = 1)
plt.errorbar(range_Ms,mean, yerr = var, marker = "o")
plt.xlabel('$M$')
plt.ylabel('$h$')
plt.show()

In [None]:
delh = np.empty(range_Ms.shape[0])
delh[:] = np.nan
delh[1:] = mean[1:]-mean[:-1]
h_diff = np.zeros((range_Ms.shape[0], n_samples))
h_M = np.array(h_M)
h_diff[1:,:] = h_M[1:,:] - h_M[:-1,:]
var_diff = np.std(h_diff, axis = 1)

plt.errorbar(range_Ms,delh,yerr = var_diff, marker = "o")
plt.xlabel('$M$')
plt.ylabel('$h_m-h_{m-1}$')
plt.hlines(0.,range_Ms[0],range_Ms[-1],colors='k',linestyles='--')
plt.show()

We found our optimal number of dimensions: $m^* = 5$, after which the entropy rate becomes stable.

## Some analyses of the state space

We have now reconstructed the state space with the optimal parameters, $K^*$ frames and $m^*$ dimensions. Here, we give some examples of analyses to get more familiar with the space.


In [None]:
#Build the space with optimal parameters
K_opt = 25
M_opt = 5
traj_matrix = embed.trajectory_matrix(PC_ts,K=K_opt - 1)
u, s, v = scipy.linalg.svd(traj_matrix, full_matrices=False)  # v holds V transposed

#We must not forget to weight u with s, otherwise the singular vectors have proportions that do not reflect the
# dataset
u_var = np.dot(u, np.diag(s))[:,:M_opt]

Let us start by looking at how the singular values are distributed in general.

In [None]:
plt.plot(s, "-o")
#plt.semilogy(s, "-o")
plt.xlabel("Singular mode")
plt.ylabel("Singular value")
plt.show()

It is interesting to see that the first five singular modes are dominant, which confirms our choice of 5 dimensions to maximize predictability. We also note that the first mode is vastly dominant over others.
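The dominance of the leading modes can be quantified rather than judged from the plot alone. A minimal sketch, assuming the data are mean-centered so that squared singular values are proportional to variance; the helper `variance_fraction` and the toy singular values are illustrative and are not the values obtained for the larva.

```python
import numpy as np

def variance_fraction(singular_values):
    """Fraction of total variance carried by each mode (squared singular values)."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    return s2 / s2.sum()

# Toy singular values for illustration only
toy_s = np.array([10.0, 4.0, 2.0, 1.0, 0.5])
frac = variance_fraction(toy_s)
cumulative = np.cumsum(frac)

# Smallest number of modes whose cumulative fraction reaches 90 %
n_modes_90 = int(np.searchsorted(cumulative, 0.90) + 1)
```

Applied to the array `s` returned by the SVD, the same two lines give the cumulative variance curve. Note that this variance criterion is a complementary check and is not the entropy-based criterion used above to select $m^*$.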


We might want to have an overall view of our state space. For that, we will use a 2D non-linear embedding method called UMAP. We will use a color bar to illustrate time.

In [None]:
reducer = umap.UMAP(n_neighbors=200,min_dist=0.1)
embedding = reducer.fit_transform(u_var)
plt.scatter(embedding[:,0], embedding[:, 1],c= np.linspace(0,1,len(u_var)), s = 0.8)
plt.colorbar()
plt.xticks([])
plt.yticks([])
plt.show()

This shape is very interesting and structured. A deeper analysis might help us figure out what each of these sections corresponds to in the animal behavior. We will get deeper into it in the next tutorial.

We can check if the dimensionality reduction is sensible by looking at how the space changes if we plot the full trajectory matrix. Both plots should look alike, as they describe the same space. Let's check.

In [None]:
reducer = umap.UMAP(n_neighbors=100,min_dist=0.1)
embedding = reducer.fit_transform(traj_matrix)
plt.scatter(embedding[:,0], embedding[:, 1],c= np.linspace(0,1,len(traj_matrix)), s = 0.8)
plt.colorbar()
plt.xticks([])
plt.yticks([])
plt.show()

As expected, they have very similar structures. To confirm the importance of weighting the vectors of our $U$ matrix by their respective singular values, let's look at the UMAP of the unweighted $U$.

In [None]:
reducer = umap.UMAP(n_neighbors=200,min_dist=0.1)
embedding = reducer.fit_transform(u)
plt.scatter(embedding[:,0], embedding[:, 1],c= np.linspace(0,1,len(u)), s = 0.8)
plt.colorbar()
plt.xticks([])
plt.yticks([])
plt.show()

You see that in this case, we lose all structure. Finally, let's look at the space without embedding.



In [None]:
reducer = umap.UMAP(n_neighbors=200,min_dist=0.1)
embedding = reducer.fit_transform(PC_ts)
plt.scatter(embedding[:,0], embedding[:, 1],c= np.linspace(0,1,len(PC_ts)), s = 0.8)
plt.colorbar()
plt.xticks([])
plt.yticks([])
plt.show()

There is some structure, but less than in the embedded space. By choosing $K^*$ and $m^*$ to minimize the entropy, we built a maximally predictive sequence of postures to recover this additional information.

We might want to make sense of the structures in this embedded space. For further analysis, it is good practice to study the SVD modes. To interpret them, you could relate them to meaningful parameters, look at their time series, etc. This might or might not be useful, depending on your data. For the larva, the first SVD modes are not easily interpretable, although we could show that the first SVD mode has a very high direct correlation with the anterior angle of the larva. If we applied a similar process to the compression of the segments (peristalsis of the larva), we would get more interpretable SVD modes (all account for the wave going through the larva's body).


As for your own data, we leave it to your creativity and specific needs!


Note that with your own data, you might need to tweak the parameters of the UMAP embedding, which are n_neighbors and min_dist.

References

[1] Ahamed, T. et al., Capturing the continuous complexity of behaviour in Caenorhabditis elegans. Nat. Phys., 2019.

[2] Grassberger, P., Toward a quantitative theory of self-generated complexity, International Journal of Theoretical Physics, 1986.

[3] Costa, A. C. et al., Maximally predictive states: From partial observations to long timescales, Chaos, 2023.

[4] J. Shlens, A tutorial on principal component analysis, arXiv, 2014.