<a href="https://colab.research.google.com/github/bermanlabemory/motionmapperpy/blob/master/demo/motionmapperpy_fly_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The GitHub repositories for motionmapper are:

1. [MATLAB] **motionmapper** : https://github.com/gordonberman/MotionMapper

2. [PYTHON] **motionmapperpy** : https://github.com/bermanlabemory/motionmapperpy/


# 1.&nbsp; Downloading and installing motionmapperpy

First, we'll need to get motionmapperpy (sometimes we'll call it **mmpy** for brevity) from [GitHub](https://github.com/bermanlabemory/motionmapperpy).

Below we clone this GitHub repository, which downloads a copy of it onto this Colab runtime.

In [None]:
!git clone https://github.com/bermanlabemory/motionmapperpy

You'll notice a **folder** named *motionmapperpy* created in the working directory in the left pane. This is the repository we just cloned, and it contains the mmpy package and some toy datasets in the data folder. Note that our current working path is **/content/** in case you ever get lost. We still need to install motionmapperpy as a Python package, which we can do by running these commands:



```
# Change to the motionmapperpy directory we just cloned into this colab instance
%cd motionmapperpy

# Install motionmapperpy as a python package to the current python environment
!python setup.py install

# Come out of the mmpy directory.
%cd ..
```

\
\
\
*Quick note* : Colab instances come with many Python packages preinstalled. You can run ```%pip list``` to see which packages are already present.

In [None]:
## Install mmpy in this cell

# Change to the motionmapperpy directory we just cloned into this colab instance
%cd motionmapperpy

# Install motionmapperpy as a python package to the current python environment
!python setup.py install

# Come out of the mmpy directory.
%cd ..

!pip3 install imageio==2.4.1

# !pip install --upgrade imageio-ffmpeg

\
\
\
Great! We should have `motionmapperpy` installed on this Colab runtime now! We'll **need to restart the session** so that this notebook can recognize motionmapperpy and its dependencies as Python packages (meaning we can do ```import motionmapperpy```). We can restart the session by going to **Runtime->Restart Session** in the **top menu bar**. It is equivalent to restarting the IPython kernel when working with Jupyter notebooks.

Note that restarting the session does not delete files and folders we have created in this Colab instance, or remove any Python packages we've installed here. But **Disconnect and delete runtime** does all of those things, so be careful! Also note that Google will reclaim our Colab instance if it sits idle for too long, in which case the runtime will be deleted.


Once you have restarted the session, you are good to move on to the next section!

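After restarting, one quick way to confirm that the package is visible is to ask Python's import system whether it can locate it. This is only a sketch; `is_importable` is a hypothetical helper written for this note, not part of motionmapperpy.

```python
import importlib.util

def is_importable(package_name):
    """Return True if Python's import system can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None

# Should report True once the install has succeeded and the session was restarted.
print("motionmapperpy importable:", is_importable("motionmapperpy"))
```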
--------------------------------------------------------------------------
\
\
\
\
\
<font size="10">**Restart session before proceeding below!**</font>
\
\
\
\
\
Great! Let's now import all packages we'll use in this notebook, including motionmapperpy.

# 2.&nbsp; Toy datasets

There are three small datasets within the motionmapperpy repository we cloned from GitHub. They are located in the **`motionmapperpy/data`** path.

1. **Fly video dataset** : **fly_movie.avi** is a video of a fly and **fly_movie_projections.mat** contains PCA projections obtained after segmentation and alignment of the frames from this movie.

2. **Mouse dataset** : **mouse_movie.avi** contains video of a mouse moving inside an arena. **mouse_movie_projections.mat** contains PCA projections obtained after segmentation and alignment of the frames from this movie.

3. **Leap tracked fly dataset** : This dataset has two movies **fly_leap_test.mp4** and **fly_leap_test_2.mp4** with 2 corresponding h5 files containing 32 points tracked using [LEAP](https://dataspace.princeton.edu/handle/88435/dsp01pz50gz79z).

Try downloading the movies to check out what they look like.





--------------------


Alright, let's get started! We'll focus on the LEAP tracked fly dataset as our toy example in this notebook. We'll import some packages to kick things off below.

In [None]:
# Python standard library packages to do file/folder manipulations,
# pickle is a package to store python variables
import glob, os, pickle, sys

# time grabs current clock time and copy to safely make copies of large
# variables in memory.
import time, copy

# datetime package is used to get and manipulate date and time data
from datetime import datetime

# this package helps load and save MATLAB v7.3 (HDF5-based) .mat files
import hdf5storage

# numpy works with arrays, pandas works with labeled tables built on top of them
import numpy as np
import pandas as pd

# matplotlib is used to plot and animate to make movies
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# moviepy helps open the video files in Python
from moviepy.editor import VideoClip, VideoFileClip
from moviepy.video.io.bindings import mplfig_to_npimage

# Scikit-learn is a go-to library in Python for all things machine learning
from sklearn.decomposition import PCA

# tqdm helps create progress bars in for loops
from tqdm import tqdm

# Scipy is a go-to scientific computing library. We'll use it for median filtering.
from scipy.ndimage import median_filter

# Configuring matplotlib to show animations in a colab notebook as javascript
# objects for easier viewing.
from matplotlib import rc
rc('animation', html='jshtml')

Now we can load the files associated with this dataset. We'll read the two video files using `moviepy`, and the two .h5 tracking datasets using pandas.

In [None]:
datasetnames = ['fly_leap_test', 'fly_leap_test_2']
clips = [VideoFileClip('/content/motionmapperpy/data/fly/%s.mp4'%d) for d in datasetnames]

h5s_pandas = [pd.read_hdf('motionmapperpy/data/fly/%s_positions.h5'%d) for d in datasetnames]

In [None]:
h5s_pandas[0]

Let's first explore some properties of the loaded movie clips and tracking data.

In [None]:
for i, m in enumerate(clips):
  print(f'Clip {i+1} is {m.duration} seconds long at {m.fps} fps. '
        f'The frames are {m.w} px wide and {m.h} px high.')
print()
for i, h5 in enumerate(h5s_pandas):
  print(f'.h5 file {i+1} has shape {h5.shape}.')

print('\n\nLeap tracked 32 points on the fly. Why do you think we have 96 dimensions?')

We can also use the handy pandas method ```.head()``` to look at the first few rows of the tracking dataset.

In [None]:
h5s_pandas[0].head()

It seems like the .h5 files have 32 bodyparts tracked, and there are 3 columns for each of the tracked bodyparts. They contain the `x` and `y` positions, as well as the *likelihood* value of the tracked position of each part.

We'll ditch pandas now, and convert these datasets to numpy arrays since they're more intuitive to work with. Let's also remove all the likelihood columns in these numpy arrays (so every third column).

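To make the column layout concrete, here is a toy sketch with made-up numbers. It assumes the columns are ordered `x, y, likelihood` for each body part, as described above; the table sizes and variable names are hypothetical.

```python
import numpy as np

# Hypothetical table: 4 frames, 2 body parts, columns ordered x, y, likelihood per part.
n_frames, n_parts = 4, 2
table = np.arange(n_frames * n_parts * 3, dtype=float).reshape(n_frames, n_parts * 3)

# Keep every column except each 3rd one (the likelihood columns).
keep = np.mod(np.arange(1, table.shape[1] + 1), 3) != 0
xy = table[:, keep]

# Group the remaining columns as (frame, body part, coordinate).
xy_by_part = xy.reshape(n_frames, n_parts, 2)

print(keep)
print(xy.shape, xy_by_part.shape)
```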
In [None]:
# We're removing every 3rd column since that contains the likelihood values
inds_to_keep = np.mod(np.arange(1, h5s_pandas[0].shape[1]+1), 3) != 0
print(inds_to_keep)

h5s = [h5.values[:,inds_to_keep] for h5 in h5s_pandas]

Let's check what the arrays look like.

In [None]:
print([h5.shape for h5 in h5s])

To remove erratic tracking errors, we can median filter our data. We'll do that below and also reshape the data so that it is easier to work with.

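As a small illustration of why a median filter helps, the sketch below runs `scipy.ndimage.median_filter` on a made-up track with a single-frame glitch. The data are hypothetical; the window `size=(5, 1)` spans 5 frames in time and 1 column, so each coordinate is filtered on its own.

```python
import numpy as np
from scipy.ndimage import median_filter

# Hypothetical toy track: 9 frames, 2 coordinates, constant position.
track = np.tile(np.array([[10.0, 20.0]]), (9, 1))
track[4, 0] = 500.0  # one-frame tracking glitch in the first coordinate

# Window of 5 frames along time (axis 0) and 1 along columns (axis 1).
filtered = median_filter(track, size=(5, 1))

print(track[:, 0])
print(filtered[:, 0])
```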
In [None]:
plt.figure(figsize=(14,5))
plt.plot(h5s[0][:1000,:10])

In [None]:
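# Median filter each coordinate along time (window of 5 frames) to suppress tracking glitches.
h5s = [median_filter(x, size=(5,1)) for x in h5s]

# Reshape to (frames, bodyparts, 2) so that x and y are easier to index.
h5s = [i.reshape((-1, i.shape[1]//2, 2)) for i in h5s]
print(f'New shapes : {[i.shape for i in h5s]}')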

In [None]:
plt.figure(figsize=(14,5))
plt.plot(h5s[0][:1000,0])

Great! Now we are ready to plot and see what our dataset looks like. Below we'll use `matplotlib` to overlay tracking data on the video files. We'll read video frames using the moviepy `VideoFileClip` objects stored in `clips`. Running this cell may take up to **1 minute**.



In [None]:
# This will take about 1-2 minutes to run.

fig, ax = plt.subplots(figsize=(10,5))
h5ind = 0
tstart = 7000

connections = [np.arange(6,10), np.arange(10,14), np.arange(14,18), np.arange(18,22), np.arange(22,26), np.arange(26,30),
              [2,0,1],[0,3,4,5], [31,3,30]]

try:
  tqdm._instances.clear()
except Exception as e:
  pass

def animate(t):
  t = int(t*clips[h5ind].fps)+tstart
  ax.clear()
  for conn in connections:
      ax.plot(h5s[h5ind][t, conn, 0], h5s[h5ind][t, conn, 1], 'k-')
  for i in range(h5s[h5ind].shape[1]):
      ax.scatter(h5s[h5ind][t, i,0], h5s[h5ind][t, i,1], marker=f'${i}$', s=200, color='k')
  ax.imshow(clips[h5ind].get_frame((t)/clips[h5ind].fps), cmap='Greys', origin='lower')
  ax.set_aspect('equal')
  ax.axis('off')

  return mplfig_to_npimage(fig)

anim = VideoClip(animate, duration=20)
plt.close()
anim.ipython_display(fps=20, loop=True, autoplay=True)


## 2.1&nbsp; Data specific pre-processing

[![](http://www.accutrend.com/wp-content/uploads/2018/07/GiGo-570x315.jpg)](http://www.accutrend.com/it-still-comes-down-to-garbage-in-garbage-out/)

When working with any model, algorithm or pipeline, we have to be mindful of the data we are feeding into them. This section covers some (of many) tricks to process the input data before feeding into motionmapperpy.

Working with the tracked bodypart positions can be challenging with flies, since there are a lot of correlations between the tracked points. We can visualize these in the plot below.

In [None]:
fig, ax = plt.subplots(figsize=(16,8))
_ = ax.plot(h5s[0][:1000,:].reshape((1000,-1)))
ax.set_xlabel("Frame #", fontsize=14)
ax.set_ylabel("X and Y positions (px)", fontsize=14)
# ax.set_ylim([-0.5, 1.5])

Instead, we can compute various angles using these 32 positions. Below, we'll compute 22 new angles in each frame using these positions, such as the angle between wingtips `(31, 3, 30)` in the image above.

Working with angles is tricky since they fall within either [-$\pi$, $\pi$] or [0, $2\pi$], and we can have discontinuities at the boundaries. As a workaround, we can choose, for each angle, which of these two ranges to compute it over. For example, the wingtips cross over very frequently, so we'll compute that angle over the [-$\pi$, $\pi$] interval. We'll indicate this choice with 1 (or 0 for the [0, $2\pi$] interval) as the last list element in `angleinds` defined below.

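The effect of the interval choice can be seen on a made-up angle that oscillates around zero. This is a minimal sketch with hypothetical values in degrees: wrapped to [0, 360) the series jumps by almost a full turn, while wrapped to [-180, 180) it stays smooth.

```python
import numpy as np

# Hypothetical angle (in degrees) that oscillates around zero.
true_angle = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])

# Same angle expressed in the two conventions.
wrapped_0_360 = true_angle % 360
wrapped_pm180 = (true_angle + 180) % 360 - 180

# Largest frame-to-frame jump under each convention.
jump_0_360 = np.abs(np.diff(wrapped_0_360)).max()
jump_pm180 = np.abs(np.diff(wrapped_pm180)).max()

print(jump_0_360, jump_pm180)
```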
In [None]:
#convert to angles space

# [1,0,2,0] means the angle from 1 to 0 to 2 calculated over the [0, 2pi] interval
# (last element 0); a last element of 1 selects the [-pi, pi] interval instead.
angleinds = [[1,0,2,0], [0,3,4,0], [3,4,5,0], [31,3,30,1], [6,7,8,0], [7,8,9,0], [10, 11, 12,0],
          [11, 12, 13, 0], [14,15,16, 0] ,[15,16,17,0], [26,27,28,0] , [27,28,29,0], [22,23,24,0],
          [23,24,25,0], [18,19,20,0], [19,20,21,0], [3,4,18,0], [3,4,22,0], [3,4,26,0], [3,4,6,0],
          [3,4,10,0], [3,4,14,0]]

# Empty array to hold computed angles.
angleh5s = [np.zeros((h5.shape[0], len(angleinds))) for h5 in h5s]

# Function to compute angle between two vectors.
def angle_between(v1, v2, small_angle=1):
    """
    Calculate angle between two vectors.

    Args:
    v1, v2 : Pair of vectors of shape (N, 2).
    small_angle : True if calculating over [-pi, pi],
                  False if calculating over [0, 2pi].

    Returns
    N angles in degrees.
    """
    ang1 = np.arctan2(v1[:,1], v1[:,0])
    ang2 = np.arctan2(v2[:,1], v2[:,0])
    if small_angle:
        out = np.rad2deg((ang1 - ang2) % (2 * np.pi))
        out[out>180] = -1*(360-out[out>180])
        return out
    else:
        return np.rad2deg((ang1 - ang2) % (2 * np.pi))


for hi,h5 in enumerate(h5s):
    for ai, aind in enumerate(angleinds):
        v1 = h5[:,aind[0]]-h5[:,aind[1]]
        v2 = h5[:,aind[2]]-h5[:,aind[1]]
        angleh5s[hi][:,ai] = angle_between(v1, v2, small_angle=aind[3])


We can look at the angles time series below. We're plotting only 5 angles here just for the sake of visualization.

In [None]:
fig, ax = plt.subplots(figsize=(16,8))
labels = ['Angle between wingtips', 'Neck angle', 'Right midleg femur-tibia angle',
          'Left foreleg coxa-femur angle', 'Mesothoracic angle']
for i,l in zip([3, 1, 13, 4, 2], labels):
  _ = ax.plot(angleh5s[0][1000:2000,i]* np.pi/(180), label=l)
ax.set_xlabel("Frame #", fontsize=14)
ax.set_ylabel("Angles (Rad)", fontsize=14)
ax.legend()

It seems like some of these angles vary a lot and some very little. We can get rid of this asymmetry by doing a min-max scaling, where we constrain each angle to scale between 0 and 1.

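Here is a toy version of min-max scaling across several recordings, using made-up numbers. The key design choice, mirrored from the description above, is that the minimum and maximum are taken over all recordings together, so that the same angle is scaled identically in every file.

```python
import numpy as np

# Two hypothetical recordings with 2 features each.
recs = [np.array([[0.0, 100.0], [10.0, 300.0]]),
        np.array([[5.0, 200.0], [20.0, 500.0]])]

# Per-feature minimum and maximum over all recordings together.
stacked = np.concatenate(recs, axis=0)
lo = stacked.min(axis=0)
hi = stacked.max(axis=0)

# Scale every recording with the same global range.
normed = [(r - lo) / (hi - lo) for r in recs]

print(normed[0])
print(normed[1])
```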
In [None]:
# let's normalize these angles so they fall between 0 and 1

angleh5s_min = []
for ah5 in angleh5s:
    x = copy.deepcopy(ah5)
    x = x - np.min([np.min(x, 0) for x in angleh5s], 0)[None, :]
    angleh5s_min.append(x)

angleh5s_normed = []
for ah5min in angleh5s_min:
    x = copy.deepcopy(ah5min)
    x = x/np.max([np.max(x, 0) for x in angleh5s_min], 0)[None, :]
    angleh5s_normed.append(x)

fig, ax = plt.subplots(figsize=(16,8))
labels = ['Angle between wingtips', 'Neck angle', 'Right midleg femur-tibia angle',
          'Left foreleg coxa-femur angle', 'Mesothoracic angle']
for i,l in zip([3, 1, 13, 4, 2], labels):
  _ = ax.plot(angleh5s_normed[0][:1000,i], label=l)
ax.set_xlabel("Frame #", fontsize=14)
ax.set_ylabel("Normalized angles", fontsize=14)
ax.legend()
ax.set_ylim([0, 1.])

Now we have a normalized 22-dimensional feature space, and we can use our favourite tool to reduce its dimensionality further and ease the computational cost. Here, we'll use PCA.

We're using the PCA implementation from [`sklearn.decomposition.PCA`](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) here.

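Picking the number of components from a variance threshold boils down to a cumulative sum. The sketch below uses made-up explained-variance ratios (not values from this dataset) and highlights one detail: the position of the first entry above the threshold is a zero-based index, so the number of components needed is that index plus one.

```python
import numpy as np

# Hypothetical explained-variance ratios, sorted from largest to smallest.
ratios = np.array([0.6, 0.25, 0.12, 0.03])
threshold = 0.95

cumulative = np.cumsum(ratios)

# Zero-based index of the first component at which the threshold is exceeded.
first_index = int(np.argmax(cumulative > threshold))

# Number of leading components needed to exceed the threshold.
n_components = first_index + 1

print(first_index, n_components)
```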
In [None]:
angleh5s_normed[1].shape

In [None]:
# Concatenate the two files for downstream analyses
x = np.concatenate(angleh5s_normed, axis=0)
# x.shape -> 40000, 22
variance_threshold = 0.95

#We are using sklearn.decomposition.PCA here
p = PCA()
y = p.fit_transform(x)

fig, ax = plt.subplots(figsize=(10,4))
ax.plot(np.arange(1, x.shape[1]+1), np.cumsum(p.explained_variance_ratio_), color='firebrick')

#This calculates the number of PCA components required to surpass the variance threshold.
#argwhere gives a zero-based index, so we add 1 to turn it into a count.
comps_above_thresh = np.argwhere(np.cumsum(p.explained_variance_ratio_)>variance_threshold)[0][0] + 1

ax.axvline(x=comps_above_thresh, color='royalblue', linestyle='--', alpha=0.5)
ax.text(x = comps_above_thresh + 1, y = 0.5, s=f'{comps_above_thresh}th component')
ax.set_xlabel('PCA Components')
ax.set_ylabel('Cumulative Explained Variance')
ax.grid()

print(f'We\'ll pick the first {comps_above_thresh} components that explain {variance_threshold*100}% of the variance.')

We can look at the feature space composition of the leading principal components below.

In [None]:
y.shape

In [None]:
fig, ax = plt.subplots(figsize=(20,4))

for i in range(comps_above_thresh):
  for pi in range(x.shape[1]):
    ax.bar((i-0.4+pi*(0.8/x.shape[1])), p.components_[i, pi], width=(0.7/x.shape[1]), label=f'Angle{pi}')
  ax.set_prop_cycle(None)

ax.set_xlabel('Leading PCA Components ')
ax.set_ylabel('Proportion along PC direction')
# ax.legend()
ax.grid()
plt.show()

This gives us some idea of how these maximally varying components are oriented in our high dimensional feature space.


Now we can reduce our feature space dimensions further! Note that here we use variable `y` which was obtained after the PCA transformation.

In [None]:
#picking PCA components above threshold
y = y[:,:comps_above_thresh]
print(y.shape)

#Let's also split y to the size of original h5 files.
projs_list = np.split(y, np.cumsum([h5.shape[0] for h5 in h5s])[:-1])

print([p.shape for p in projs_list])

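The split relies on `np.split` with cumulative lengths as cut points. A toy sketch with hypothetical lengths shows the pattern: dropping the last cumulative sum avoids creating an empty trailing piece.

```python
import numpy as np

# Hypothetical lengths of two recordings that were stacked along axis 0.
lengths = [3, 2]
stacked = np.arange(10).reshape(5, 2)

# Cut points are the cumulative lengths, excluding the final total.
cut_points = np.cumsum(lengths)[:-1]
parts = np.split(stacked, cut_points)

print([part.shape for part in parts])
```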
Now we have **projs_list**, which contains two (relatively) low-dimensional timeseries.

# 3.&nbsp; Creating an mmpy project directory

Now that we have two low dimensional time series which **may** not set Google servers on fire, we will create our project directory for running the `motionmapperpy` pipeline on the data we have. Having a project directory is awesome, as it helps us stay organized when working with big datasets and multiple files. It allows datasets to be easily referenced and loaded without exhausting memory, and we can store pipeline outputs in well-defined and easy to read files.


Lets start by importing `motionmapperpy` and creating a project directory.

In [None]:
import motionmapperpy as mmpy
%matplotlib inline

projectPath = '/content/Fly_Leap_mmpy'

# This creates a project directory structure which will be used to store all motionmapperpy pipeline
# related data in one place.

mmpy.createProjectDirectory(projectPath)

Now let's store the two low-d time series in **projs_list** in the *`Projections`* folder of the project directory.



In [None]:
for i, projs in enumerate(projs_list):
    hdf5storage.savemat(f'{projectPath}/Projections/{datasetnames[i]}_pcaModes.mat', {'projections': projs})

We'll now go through the `mmpy` parameters. There are quite a few of them, which can feel overwhelming, but each one is easy to understand!

Parameters are crucial to `mmpy`, as they lay out the choices we need to make when running this pipeline. Each parameter is explained in the comments of the cells below, so please read through them as you run them.


In [None]:
"""2. Setup run parameters for MotionMapper."""

#% Load the default parameters.
parameters = mmpy.setRunParameters()

In [None]:
parameters

In [None]:

# %%%%%%% PARAMETERS TO CHANGE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# These need to be revised every time you are working with a new dataset. #

parameters.projectPath = projectPath #% Full path to the project directory.


parameters.method = 'UMAP' #% We can choose between 'TSNE' or 'UMAP'

parameters.minF = 1        #% Minimum frequency for Morlet Wavelet Transform

parameters.maxF = 50       #% Maximum frequency for Morlet Wavelet Transform,
                           #% usually equal to the Nyquist frequency for your
                           #% measurements.

parameters.samplingFreq = 100    #% Sampling frequency (or FPS) of data.

parameters.numPeriods = 25       #% No. of dyadically spaced frequencies to
                                 #% calculate between minF and maxF.

parameters.pcaModes = comps_above_thresh #% Number of low-d features.

parameters.numProcessors = -1     #% No. of processors to use when parallel
                                 #% processing for wavelet calculation (if not using GPU)
                                 #% and for re-embedding. -1 to use all cores
                                 #% available.

parameters.useGPU = -1           #% GPU to use for wavelet calculation,
                                 #% set to -1 if GPU not present.

parameters.training_numPoints = 3000    #% Number of points in mini-trainings.


# %%%%% NO NEED TO CHANGE THESE UNLESS MEMORY ERRORS OCCUR %%%%%%%%%%

parameters.trainingSetSize = 5000  #% Total number of training set points to find.
                                 #% Increase or decrease based on
                                 #% available RAM. For reference, 36k is a
                                 #% good number with 64GB RAM.

parameters.embedding_batchSize = 30000  #% Lower this if you get a memory error when
                                        #% re-embedding points on a learned map.

# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The cell above covers the parameters that usually matter when using `mmpy`. There are also parameters specific to the tSNE and UMAP implementations, shown below, which usually don't need to be changed.

In [None]:
# %%%%%%% tSNE parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

#% can be 'barnes_hut' or 'exact'. We'll use barnes_hut for this tutorial for speed.
parameters.tSNE_method = 'barnes_hut'

# %2^H (H is the transition entropy)
parameters.perplexity = 32

# %number of neighbors to use when re-embedding
parameters.maxNeighbors = 200

# %local neighborhood definition in training set creation
parameters.kdNeighbors = 5

# %t-SNE training set perplexity
parameters.training_perplexity = 20


# %%%%%%%% UMAP Parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

# These are set on `parameters`; bare variables would not be seen by the pipeline.

# Size of local neighborhood for UMAP.
parameters.n_neighbors = 15

# Negative sample rate while training.
parameters.train_negative_sample_rate = 5

# Negative sample rate while embedding new data.
parameters.embed_negative_sample_rate = 1

# Minimum distance between neighbors.
parameters.min_dist = 0.1

## 3.1&nbsp; Visualizing wavelet amplitudes

You don't need to run this section to use motionmapperpy, but we'll go through it to visualize spectrograms for one of the low-dimensional time series.

We'll use the `mmpy.findWavelets` function to obtain the wavelets, and plot the resulting spectrogram for each feature/projection.

In [None]:
wlets, freqs = mmpy.findWavelets(projs_list[0], projs_list[0].shape[1], parameters.omega0, parameters.numPeriods, parameters.samplingFreq, parameters.maxF, parameters.minF, parameters.numProcessors, parameters.useGPU)

%matplotlib inline
fig, axes = plt.subplots(y.shape[1], 1, figsize=(20,18))

for i, ax in enumerate(axes.flatten()):
  ax.imshow(wlets[:300,25*i:25*(i+1)].T, cmap='PuRd', origin='lower')
  ax.set_yticks([0, 5, 10, 15, 20, 24])
  ax.set_yticklabels([f'{freqs[j]:0.1f}' for j in [0, 5, 10, 15, 20, 24]])
  if i == 3:
    ax.set_ylabel("Frequencies (Hz)", fontsize=14)
  ax.set_title(f'Projection #{i+1}')
ax.set_xlabel('Frames', fontsize=14)
plt.tight_layout()

As we can see, our low-d time series is soon dwarfed by the 25-dimensional wavelet amplitudes obtained for each low-d feature! This is why it's wise to spend some time reducing the dimensionality of our original data as much as we can.

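To get a feel for the numbers, the sketch below builds a frequency grid and counts the resulting wavelet features. It is an illustration only: it assumes that "dyadically spaced" means evenly spaced in log2(frequency), uses `np.geomspace` rather than mmpy's own code, and the number of modes (`n_modes = 6`) is a hypothetical value.

```python
import numpy as np

sampling_freq = 100           # frames per second, as set in the parameters
nyquist = sampling_freq / 2   # highest frequency resolvable at this frame rate
min_f, max_f, num_periods = 1, 50, 25

# Assumption: dyadic spacing is taken as geometric (log-uniform) spacing.
freqs = np.geomspace(min_f, max_f, num_periods)
ratios = freqs[1:] / freqs[:-1]

# Each low-d mode contributes one amplitude per frequency.
n_modes = 6  # hypothetical number of retained PCA modes
n_wavelet_features = n_modes * num_periods

print(nyquist, freqs[0], freqs[-1], n_wavelet_features)
```

With these made-up settings, 6 modes turn into 150 wavelet features per frame, which is why trimming the number of modes pays off.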

# 5.&nbsp; Creating a training set and embedding it using tSNE/UMAP

Even though we are working with toy datasets, we have two extremely high-dimensional time series that we're using to create a smaller and more interpretable representation. tSNE and UMAP both need to compute all-to-all distances in high-dimensional space to find neighboring points and embed them close together in the low-dimensional space we're building. This computation scales quadratically with the number of data points and can quickly exhaust memory (RAM).

To navigate this challenge, we do a subsampling procedure to create a training set, and use tSNE or UMAP to create training embeddings. All of this is done in the cell below.

**Time taken** : TSNE 86 sec | UMAP 44 sec

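A back-of-the-envelope estimate shows why subsampling matters. The sketch assumes a dense all-to-all distance matrix stored as 8-byte floats; this is an illustration of the scaling, not a description of how mmpy allocates memory, and `pairwise_matrix_gigabytes` is a hypothetical helper.

```python
def pairwise_matrix_gigabytes(n_points, bytes_per_value=8):
    """Size of a dense n_points x n_points matrix in gigabytes (1 GB = 1e9 bytes)."""
    return n_points ** 2 * bytes_per_value / 1e9

# All 40000 frames versus a 5000-point training set.
full_gb = pairwise_matrix_gigabytes(40_000)
subsampled_gb = pairwise_matrix_gigabytes(5_000)

print(full_gb, subsampled_gb, full_gb / subsampled_gb)
```

Under these assumptions, an 8-fold reduction in points gives a 64-fold reduction in matrix size.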

In [None]:
t1 = time.time()

mmpy.subsampled_tsne_from_projections(parameters, parameters.projectPath)

print(f'Done in {time.time()-t1} seconds.')


Note that the `training set` and `training embedding` are both saved in the `project_directory/TSNE` or `project_directory/UMAP` directory, depending on which method you're using. We'll load the training embedding below and plot it. You can play around with the sigma value here to change the coarseness of the density map.

In [None]:
trainy = hdf5storage.loadmat(f'{parameters.projectPath}/{parameters.method}/training_embedding.mat')['trainingEmbedding']
m = np.abs(trainy).max()

sigma=2.0
_, xx, density = mmpy.findPointDensity(trainy, sigma, 511, [-m-20, m+20])

fig, axes = plt.subplots(1, 2, figsize=(12,6))
axes[0].scatter(trainy[:,0], trainy[:,1], marker='.', c=np.arange(trainy.shape[0]), s=1)
axes[0].set_xlim([-m-20, m+20])
axes[0].set_ylim([-m-20, m+20])

axes[1].imshow(density, cmap=mmpy.gencmap(), extent=(xx[0], xx[-1], xx[0], xx[-1]), origin='lower')

On the left, we see a scatter plot and on the right, we see a Gaussian kernel convolved density estimation of these points. Does it surprise you? What does changing the sigma value do?  

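To build intuition for sigma, here is a stand-in sketch that bins random 2-d points and smooths the counts with `scipy.ndimage.gaussian_filter`. This is not how `mmpy.findPointDensity` is implemented; the points are synthetic and sigma here is measured in bins, which is an assumption made for the illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic 2-d points standing in for an embedding.
rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 2))

# Bin the points on a regular grid.
counts, xedges, yedges = np.histogram2d(
    points[:, 0], points[:, 1], bins=101, range=[[-5, 5], [-5, 5]])

# Smooth the same counts with a narrow and a wide Gaussian kernel.
sharp = gaussian_filter(counts, sigma=1.0)
smooth = gaussian_filter(counts, sigma=4.0)

print(sharp.max(), smooth.max())
```

A larger sigma spreads each point over more bins, so peaks become lower and broader and nearby peaks merge.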

# 6.&nbsp; Finding embeddings for all data

Now we can find embeddings for our entire dataset! We'll use the `mmpy.findEmbeddings` function, which requires the training set and the 2-d embeddings we found in the last step, along with the high-d 'projections' time series for each dataset. We'll save the obtained embeddings for each dataset neatly in the Projections folder so that we can reference them later.

**Running time** : TSNE 19 mins | UMAP 3 mins

In [None]:
#tsne takes 19 mins
tall = time.time()

import h5py
tfolder = parameters.projectPath+'/%s/'%parameters.method

# Loading training data
with h5py.File(tfolder + 'training_data.mat', 'r') as hfile:
    trainingSetData = hfile['trainingSetData'][:].T

# Loading training embedding
with h5py.File(tfolder+ 'training_embedding.mat', 'r') as hfile:
    trainingEmbedding= hfile['trainingEmbedding'][:].T

if parameters.method == 'TSNE':
    zValstr = 'zVals'
else:
    zValstr = 'uVals'

projectionFiles = glob.glob(parameters.projectPath+'/Projections/*pcaModes.mat')
for i in range(len(projectionFiles)):
    print('Finding Embeddings')
    t1 = time.time()
    print(f'{i+1}/{len(projectionFiles)} : {projectionFiles[i]}')

    # Skip if embeddings already found.
    if os.path.exists(projectionFiles[i][:-4] +f'_{zValstr}.mat'):
        print('Already done. Skipping.\n')
        continue

    # load projections for a dataset
    projections = hdf5storage.loadmat(projectionFiles[i])['projections']

    # Find Embeddings
    zValues, outputStatistics = mmpy.findEmbeddings(projections,trainingSetData,trainingEmbedding,parameters)

    # Save embeddings
    hdf5storage.write(data = {'zValues':zValues}, path = '/', truncate_existing = True,
                    filename = projectionFiles[i][:-4]+f'_{zValstr}.mat', store_python_metadata = False,
                      matlab_compatible = True)

    # Save output statistics
    with open(projectionFiles[i][:-4] + f'_{zValstr}_outputStatistics.pkl', 'wb') as hfile:
        pickle.dump(outputStatistics, hfile)

    del zValues,projections,outputStatistics

print(f'All Embeddings Saved in {time.time()-tall} seconds!')


We can visualize the obtained embeddings by calling the cell below.

In [None]:
# load all the embeddings and stack them into one array
ally = np.concatenate([hdf5storage.loadmat(i)['zValues']
                       for i in sorted(glob.glob(parameters.projectPath+f'/Projections/*_{zValstr}.mat'))], axis=0)

m = np.abs(ally).max()

sigma=2.0
_, xx, density = mmpy.findPointDensity(ally, sigma, 511, [-m-20, m+20])


fig, axes = plt.subplots(1, 2, figsize=(12,6))
axes[0].scatter(ally[:,0], ally[:,1], marker='.', c=np.arange(ally.shape[0]), s=1)
axes[0].set_xlim([-m-20, m+20])
axes[0].set_ylim([-m-20, m+20])

axes[1].imshow(density, cmap=mmpy.gencmap(), extent=(xx[0], xx[-1], xx[0], xx[-1]), origin='lower')

# 7.&nbsp; Watershed transform on the density map.

There is another handy function in `motionmapperpy` called `findWatershedRegions`. This will do an iterative watershed transform on the behavioral density map until the given `minimum_regions` are found in the density map.

It saves the watershed-transformed output of the embedding in the `project_directory/TSNE/zVals_wShed_groups.mat` or `project_directory/UMAP/zVals_wShed_groups.mat` file, depending on the method.


In [None]:
startsigma = 4.2 if parameters.method == 'TSNE' else 1.0
mmpy.findWatershedRegions(parameters, minimum_regions=10, startsigma=startsigma, pThreshold=[0.33, 0.67],
                     saveplot=True, endident = '*_pcaModes.mat')

from IPython.display import Image
Image(glob.glob(f'{parameters.projectPath}/{parameters.method}/zWshed*.png')[0])

# 8.&nbsp; Ethograms and videos

We can now create ethograms using the watershed region time series created in the last step.

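An ethogram is essentially a one-hot encoding of the region label over time. The toy sketch below uses made-up labels and assumes that label 0 means "no region". Unlike the cell that follows, it omits the empty row for label 0, so row `r` corresponds to region `r + 1`.

```python
import numpy as np

# Hypothetical region label per frame; 0 is assumed to mean "no region".
labels = np.array([1, 1, 2, 0, 2, 3])
n_regions = int(labels.max())

# Row r is 1.0 wherever the frame belongs to region r + 1.
region_ids = np.arange(1, n_regions + 1)
ethogram = (labels[None, :] == region_ids[:, None]).astype(float)

print(ethogram)
```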
In [None]:
wshedfile = hdf5storage.loadmat(f'{parameters.projectPath}/{parameters.method}/zVals_wShed_groups.mat')

wregs = wshedfile['watershedRegions'].flatten()
ethogram = np.zeros((wregs.max()+1, len(wregs)))

for wreg in range(1, wregs.max()+1):
  ethogram[wreg, np.where(wregs==wreg)[0]] = 1.0


ethogram = np.split(ethogram.T, np.cumsum(wshedfile['zValLens'][0].flatten())[:-1])

fig, axes = plt.subplots(2, 1, figsize=(20,10))

for e, name, ax in zip(ethogram, wshedfile['zValNames'][0], axes.flatten()):
  print(e.shape)
  ax.imshow(e.T, aspect='auto', cmap=mmpy.gencmap())
  ax.set_title(name[0][0])
  ax.set_yticks([i for i in range(1, wregs.max()+1, 4)])
  ax.set_yticklabels([f'Region {j}' for j in range(1, wregs.max()+1, 4)])

  xticklocs = [6000*i for i in range(3)]
  ax.set_xticks(xticklocs)
  ax.set_xticklabels([j/(6000) for j in xticklocs])

ax.set_xlabel('Time (min)')

## 8.1 Visualize behavioral map

Run the code below to see the behavioral map in action.

This may take **2 minutes** to run.

In [None]:
wshedfile = hdf5storage.loadmat(f'{parameters.projectPath}/{parameters.method}/zVals_wShed_groups.mat')

try:
  tqdm._instances.clear()
except:
  pass

fig, axes = plt.subplots(1, 2, figsize=(10,5))
zValues = wshedfile['zValues']
m = np.abs(zValues).max()


sigma=1.0
_, xx, density = mmpy.findPointDensity(zValues, sigma, 511, [-m-10, m+10])
axes[0].imshow(density, cmap=mmpy.gencmap(), extent=(xx[0], xx[-1], xx[0], xx[-1]), origin='lower')
axes[0].axis('off')
axes[0].set_title('Method : %s'%parameters.method)
sc = axes[0].scatter([],[],marker='o', color='k', s=500)

h5ind = 0
tstart = 0
connections = [np.arange(6,10), np.arange(10,14), np.arange(14,18), np.arange(18,22), np.arange(22,26), np.arange(26,30),
              [2,0,1],[0,3,4,5], [31,3,30]]

def animate(t):
  t = int(t*clips[h5ind].fps)+tstart
  axes[1].clear()
  im = axes[1].imshow(clips[h5ind].get_frame(t/clips[h5ind].fps), cmap='Greys', origin='lower')
  for conn in connections:
      axes[1].plot(h5s[h5ind][t, conn, 0], h5s[h5ind][t, conn, 1], 'k-')
  axes[1].axis('off')
  sc.set_offsets(zValues[20000*h5ind+t])
  return mplfig_to_npimage(fig) #im, ax


anim = VideoClip(animate, duration=2) # will throw memory error for more than 100.
plt.close()
anim.ipython_display(fps=15, loop=True, autoplay=True, maxduration=120)


At this point, you know everything you need to know to create a behavioral map for a set of datasets. But this is where we can actually start doing some science!

Open the project directory in the left pane. We should see some files in the *Fly_Leap_mmpy/TSNE* or *Fly_Leap_mmpy/UMAP* folder. The 2-dimensional embeddings of all the files we've used for this project can be found in the **zVals_wShed_groups.mat** file, which can be opened using hdf5storage in Python (as we did in the previous cell) or natively in MATLAB.


In [None]:
plt.figure()
plt.imshow(
wshedfile['density'])

In [None]:
wshedfile['zValNames']

In [None]:
wshedfile['zValLens']

In [None]:
wshedfile['watershedRegions'].shape

In [None]:

wshedfile['zValues'].shape

In [None]:
list(wshedfile.keys())

## 8.2 Create region videos



Now we'll try to create region videos: we'll pick contiguous time points that belong to one watershed region, and see what the animal is doing at those times by creating a movie.

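Finding contiguous stretches of the same region is a run-length segmentation. Here is a minimal sketch on made-up labels; it shows only the core idea and leaves out the length limits and per-video bookkeeping that the function below handles.

```python
import numpy as np

# Hypothetical region label per frame.
labels = np.array([1, 1, 1, 2, 2, 1, 3, 3, 3, 3])

# A new segment starts wherever the label changes.
breaks = np.where(np.diff(labels) != 0)[0] + 1
starts = np.concatenate([[0], breaks])
stops = np.concatenate([breaks, [labels.size]])

# (region, first frame, one past the last frame) for every segment.
segments = [(int(labels[s]), int(s), int(e)) for s, e in zip(starts, stops)]

print(segments)
```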
In [None]:
def makeGroupsAndSegments(watershedRegions, zValLens, min_length=10, max_length=100):

  inds = np.zeros_like(watershedRegions)
  start = 0
  for l in zValLens:
      inds[start:start + l] = np.arange(l)
      start += l
  vinds = np.digitize(np.arange(watershedRegions.shape[0]), bins=np.concatenate([[0], np.cumsum(zValLens)]))

  splitinds = np.where(np.diff(watershedRegions, axis=0) != 0)[0] + 1
  inds = [i for i in np.split(inds, splitinds) if len(i) > min_length and len(i) < max_length]
  wregs = [i[0] for i in np.split(watershedRegions, splitinds) if len(i) > min_length and len(i) < max_length]

  vinds = [i for i in np.split(vinds, splitinds) if len(i) > min_length and len(i) < max_length]
  groups = [np.empty((0, 3), dtype=int)] * watershedRegions.max()

  for wreg, tind, vind in zip(wregs, inds, vinds):
      if np.all(vind == vind[0]):
          groups[wreg - 1] = np.concatenate(
              [groups[wreg - 1], np.array([vind[0], tind[0] + 1, tind[-1] + 1])[None, :]])
  groups = np.array([[g] for g in groups], dtype=object)
  return groups


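def makeregionvideo(region, parameters, wshedfile):
  # Writes a tiled video of up to subs*subs example clips for one watershed region (1-indexed).
  # Uses the global `clips`, indexed by video, which must already be defined.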

  animfps=50.0
  subs=2
  submaxframes = 500


  groups = makeGroupsAndSegments(wshedfile['watershedRegions'][0], wshedfile['zValLens'][0])
  nregs = len(groups)

  region = region-1

  outputdir = f'{parameters.projectPath}/{parameters.method}/region_vidoes_{nregs}/'
  os.makedirs(outputdir, exist_ok=True)
  groups = groups-1
  print(f'[Region {region+1}] Starting')

  if os.path.isfile(f'{outputdir}regions_{region+1:03d}.mp4'):
      print(f'[Region {region+1}] Already present. ')
      return

  tqdm._instances.clear()

  if not groups[region][0].shape[0] or groups[region][0].shape[0] == 1:
      print(f'[Region {region+1}] No frames in groups.')
      return

  nframes = np.atleast_1d(np.diff(groups[region][0][:, 1:], axis=1).squeeze())
  if np.sum(nframes < submaxframes) == 0:
      print(f'[Region {region+1}] All frame sequences are longer than {submaxframes} frames.')
      return

  nplots = min(subs * subs, np.sum(nframes < submaxframes))
  longinds = np.where(nframes < submaxframes)[0]
  selectedclips = longinds[np.argsort(nframes[longinds])[::-1]][:nplots]

  vidindslist = groups[region][0][selectedclips, 0]
  framestoplot = np.array([np.arange(groups[region][0][i, 1], groups[region][0][i, 2]) for i in selectedclips], dtype=object)
  maxsize = max([i.shape[0] for i in framestoplot])

  print(f'[Region {region+1}] Making region video...')


  subx = max(2, int(np.ceil(np.sqrt(nplots))))
  fig, axes = plt.subplots(subx, subx, figsize=(12, 12))
  fig.subplots_adjust(0, 0, 1.0, 1.0, 0.0, 0.0)

  def make_frame(t):
      j_ = int(t * animfps)
      for i in range(subx * subx):

          ax = axes[i // subx, i % subx]
          ax.clear()
          ax.axis('off')
          if i >= nplots:
              continue
          j = j_ % len(framestoplot[i])
          clip = clips[vidindslist[i]]
          ax.imshow(clip.get_frame(framestoplot[i][j]/clip.fps),
                    cmap='Greys_r', origin='lower')
      return mplfig_to_npimage(fig)

  try:
      tqdm._instances.clear()
  except Exception:
      # Clearing stale progress bars is best-effort only.
      pass

  t1 = time.time()
  animation = VideoClip(make_frame, duration=maxsize / animfps)

  # Zero-pad the region number to three digits, e.g. regions_010.mp4.
  animation.write_videofile(f'{outputdir}regions_{region+1:03d}.mp4', fps=animfps, audio=False,
                            threads=1)

  print(f'[Region {region+1}] {time.time()-t1} seconds, Saved at {outputdir}regions_{region+1:03d}.mp4')

In [None]:
makeregionvideo(10, parameters, wshedfile)

In [None]:
# This creates region videos for all the regions. It can take a while to run, so be careful!
wmax = wshedfile['watershedRegions'].max()
print(wmax)
for i in range(1, wmax+1):
    makeregionvideo(i, parameters, wshedfile)

In [None]:
# Set region below to see your video.
region = 10

print(f'Region {region}')
from IPython.display import HTML
from base64 import b64encode
outputdir = f"{parameters.projectPath}/{parameters.method}/region_vidoes_{wshedfile['watershedRegions'].max()}/"
mp4 = open(f'{outputdir}regions_{region:03d}.mp4', 'rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f"""
<video width=400 controls loop autoplay>
      <source src="{data_url}" type="video/mp4">
</video>
""")



You can also zip your project folder by calling ```!zip -r Fly_Leap_mmpy.zip Fly_Leap_mmpy``` and download the archive to your local computer to play around with it.

In [None]:
!zip -r Fly_Leap_mmpy.zip Fly_Leap_mmpy
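
If the `zip` command is not available in your environment, the same archive can be built from Python with the standard library. The sketch below runs on a throwaway folder (`demo_project` is a placeholder name, not part of this project); in the notebook, the existing Fly_Leap_mmpy directory would take its place.

```python
import os
import shutil
import tempfile

# Build a throwaway folder so the sketch runs anywhere.
workdir = tempfile.mkdtemp()
project = os.path.join(workdir, 'demo_project')
os.makedirs(project)
with open(os.path.join(project, 'note.txt'), 'w') as f:
    f.write('hello')

# Creates <workdir>/demo_project.zip containing the demo_project folder.
archive = shutil.make_archive(
    os.path.join(workdir, 'demo_project'), 'zip',
    root_dir=workdir, base_dir='demo_project')
print(archive)
```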

# Transition Matrix

In [None]:
wregs = wshedfile['watershedRegions'][0]
wregs = np.split(wregs, np.cumsum(wshedfile['zValLens'][0])[:-1])
transitions = [mmpy.demoutils.getTransitions(w[w>0]) for w in wregs]


In [None]:
fig, axes = plt.subplots(1, 4, figsize=(16,5))

statevals = np.arange(1,wshedfile['watershedRegions'].max()+2)
for ax, d in zip(axes.flatten(), [1, 10, 100, 1000]):
    F = mmpy.demoutils.makeTransitionMatrix(np.concatenate(transitions), d)
    ax.imshow(F, cmap='PuRd', extent=(statevals[0], statevals[-1], statevals[0], statevals[-1]))
    ax.set_xlabel('Initial State')
    ax.set_ylabel('Final State')
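
For intuition about what each panel shows, a lag-`d` transition matrix can be estimated by counting how often state `i` is followed by state `j` exactly `d` steps later. The function below is a generic sketch with a hypothetical name (`lagged_transition_matrix`); it is not necessarily how `mmpy.demoutils.makeTransitionMatrix` is implemented, and the normalization and row/column orientation used there may differ.

```python
import numpy as np

def lagged_transition_matrix(states, lag, n_states):
    """Row-normalized counts of states[t] -> states[t + lag]; states are 1-indexed."""
    counts = np.zeros((n_states, n_states))
    src, dst = states[:-lag] - 1, states[lag:] - 1
    np.add.at(counts, (src, dst), 1)  # accumulate repeated (src, dst) pairs
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows for unvisited states as zeros instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy state sequence, made up for illustration.
states = np.array([1, 2, 1, 2, 3, 1, 2])
T = lagged_transition_matrix(states, lag=1, n_states=3)
print(T)
```

In this sketch each row is the empirical distribution over the state reached `lag` steps after starting in that row's state.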

In [None]:
_ = mmpy.demoutils.plotLaggedEigenvalues(transitions)