# Welcome to MoSeq2-Notebook

### Run all of the MoSeq2 tools in a self-contained notebook.

### First time here? Read the [README](https://github.com/dattalab/moseq2-app/blob/jupyter/README.md)!!

***
<center><h1>MoSeq2 Introduction</h1></center>

***

<img src="https://drive.google.com/uc?export=view&id=1Cps4eniKXpoKwSjFGC4R7S1JSeG4bVJg">


MoSeq2 is a software toolkit for unsupervised characterization of animal behavior. MoSeq takes depth recordings of single behaving animals as input and outputs a rich labeling of postural dynamics in terms of reused motifs, or 'syllables'.

This notebook begins with compressed depth recordings (see 'Data Acquisition Overview' below) and transforms the data through the following steps:
- **Extraction**: The animal is segmented from the background and its position and heading direction are aligned across frames.
- **Dimensionality reduction**: Raw video is de-noised and transformed to low-dimensional pose trajectories using principal component analysis (PCA).
- **Model training**: Pose trajectories are modeled using an autoregressive hidden Markov model (AR-HMM), producing a sequence of syllable labels.
- **Analysis**: Model output is reported through visualization and statistical analysis.

### Resources
Below is a list of publications and links to the GitHub wikis of the individual tools for your convenience.
- Publications
    - [Mapping Sub-Second Structure in Mouse Behavior](http://datta.hms.harvard.edu/wp-content/uploads/2018/01/pub_23.pdf)
    - [The Striatum Organizes 3D Behavior via Moment-to-Moment Action Selection](http://datta.hms.harvard.edu/wp-content/uploads/2019/06/Markowitz.final_.pdf)
    - [Q&A: Understanding the composition of behavior](http://datta.hms.harvard.edu/wp-content/uploads/2019/06/Datta-QA.pdf)
- Wikis
    - [Extract](https://github.com/dattalab/moseq2-extract/wiki)
    - [PCA](https://github.com/dattalab/moseq2-pca/wiki)
    - [Model](https://github.com/dattalab/moseq2-model/wiki)
    - [Viz](https://github.com/dattalab/moseq2-viz/wiki)

### Data Acquisition Overview

MoSeq2 takes animal depth recordings as input. We have developed a [data acquisition pipeline](https://github.com/dattalab/moseq2-docs/wiki/Setup:-acquisition-software) for the Xbox Kinect depth camera. We suggest following our [data acquisition tutorial](https://github.com/dattalab/moseq2-docs/wiki/Acquisition) when recording.

MoSeq2 also accepts depth recordings from the `Azure Kinect` and `Intel RealSense` depth cameras. These cameras have their own means of acquiring data built into their respective development kits. __Note that if you are using either of these acquisition methods, you may have to adjust the extraction parameters accordingly.__

__It is recommended that MoSeq2 users collect over 10 hours of depth videos (nframes >= ~1 million frames) for the best analysis results.__
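
As a rough check of the recommendation above, the frame count follows directly from recording time and frame rate. The sketch below assumes a 30 frames-per-second camera (an assumption; substitute your camera's actual rate), and `frames_recorded` is a hypothetical helper, not part of MoSeq2.

```python
def frames_recorded(hours, fps=30):
    """Return the number of frames captured in `hours` of recording.

    fps=30 is an assumed frame rate; change it to match your camera.
    """
    return int(hours * 3600 * fps)


total = frames_recorded(10)
print(f'{total:,} frames in 10 hours at 30 fps')
```

At the assumed rate, 10 hours of recording comes to slightly more than one million frames.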

***

***
<center><h1>Notebook Setup</h1></center>

***

<img src="https://drive.google.com/uc?export=view&id=1Rs-LYyYIHueyE3x60dKk_fTtNMkCzqSb">

### Check if the correct anaconda environment is being used

This jupyter notebook must be launched from an activated conda environment.

If the anaconda environment is called moseq2, the output of the following cell should look similar to: ```/Users/username/anaconda3/envs/moseq2/bin/python```

In [None]:
%%bash
which python # [Shift + ENTER] -> run cell
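
The same check can be done from Python, which is handy when the `%%bash` magic is unavailable. This is a minimal sketch: `conda_env_name` is a hypothetical helper, and it relies on conda exporting the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import sys


def conda_env_name(environ=os.environ):
    """Return the active conda environment name, or None if none is set.

    conda exports CONDA_DEFAULT_ENV when an environment is activated.
    """
    return environ.get('CONDA_DEFAULT_ENV')


print('Interpreter:', sys.executable)
print('Conda environment:', conda_env_name())
```

If the interpreter path does not point inside your moseq2 environment, the notebook was probably launched from a different environment.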

### Data file organization

After performing data acquisition, store all of your session folders under a parent directory (shown below) to access them in this notebook. 

The 3 currently accepted depth data extensions are `.dat`, `.avi`, and `.mkv`. Note that `.avi` files must NOT be in RGB format.

```
.
└── Data_Directory/
    ├── session_1/ ** - the folder containing all of a single session's data
    ├   ├── depth.dat        # depth data - the recording itself
    ├   ├── depth_ts.txt     # timestamps - csv/txt file of the frame timestamps
    ├   └── metadata.json    # metadata - json file that contains the rodent's info (group, subjectName, etc.)
    ...
    ├── session_2/ **
    ├   ├── depth.dat
    ├   ├── depth_ts.txt
    └── └── metadata.json

```
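
Before going further, it can help to confirm that every session folder matches the layout above. The following is a minimal sketch, not part of MoSeq2: `incomplete_sessions` is a hypothetical helper, and the depth file names (`depth.dat`, `depth.avi`, `depth.mkv`) are assumed from the example tree and the accepted extensions.

```python
from pathlib import Path

DEPTH_NAMES = ('depth.dat', 'depth.avi', 'depth.mkv')  # assumed file names
REQUIRED = ('depth_ts.txt', 'metadata.json')


def incomplete_sessions(data_dir):
    """Map each session folder name to the list of files it is missing."""
    problems = {}
    for session in sorted(Path(data_dir).iterdir()):
        if not session.is_dir():
            continue
        missing = [name for name in REQUIRED if not (session / name).exists()]
        if not any((session / name).exists() for name in DEPTH_NAMES):
            missing.append('depth.(dat|avi|mkv)')
        if missing:
            problems[session.name] = missing
    return problems
```

Calling `incomplete_sessions('./')` returns an empty dictionary when every folder is complete. Folders that are not sessions are reported too, so the check is most useful before the pipeline creates its own output folders.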

### Notebook Progress File

This notebook will save your progress after running any of the MoSeq operation cells in a `progress.yaml`.

If your notebook kernel shuts down for any reason, the progress file will still hold all of your variables (which are paths) from each analysis session, so you can restore them later without repeating the computations.

__To restore previously computed variables, run any of the cells with the (Convenience Cell) label.__

## Find Your Recording Directories and Create The Progress File

In the cell below, set the `data_path` variable to the parent directory containing your recorded session subdirectories.

If you copied the notebook to the base directory containing all of your session folders, you can set `data_path = './'`.

Otherwise, set `data_path` to the __absolute path__ of your parent directory. To obtain the absolute path, run `pwd` in a bash terminal from the desired directory.

In [None]:
from moseq2_extract.gui import get_found_sessions, restore_progress_vars, check_progress
from glob import glob
import ruamel.yaml as yaml
import os, sys

data_path = './' # User-defined absolute path

base_dir, found_sessions = get_found_sessions(data_path, exts=['dat', 'mkv', 'avi']) # file extensions to look for
progress_filepath = os.path.join(base_dir, 'progress.yaml')

print('Number of found sessions to analyze:', found_sessions)
print('Your base directory is:', base_dir)


config_filepath, index_filepath, train_data_dir, pca_dirname, \
            scores_filename, model_path, scores_file, \
            crowd_dir, plot_path = check_progress(base_dir, progress_filepath)
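
As an alternative to running `pwd`, the absolute path can be computed in Python with the standard library. This is a small sketch; `to_absolute` is a hypothetical helper name.

```python
import os


def to_absolute(path):
    """Expand '~' and resolve a relative path against the current directory."""
    return os.path.abspath(os.path.expanduser(path))


print(to_absolute('./'))
```

The printed value can be pasted into `data_path` so that the notebook keeps working if it is later launched from a different directory.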

### Generate Configuration Files

The `config.yaml` will be used to hold all configurable parameters for all steps in the MoSeq pipeline. 

In [None]:
import os
from moseq2_extract.gui import generate_config_command, update_progress

config_filepath = os.path.join(base_dir, 'config.yaml')

print(f'generating file in path: {config_filepath}')
update_progress(progress_filepath, 'config_file', config_filepath)
generate_config_command(config_filepath)

A configuration file has been created in the base directory (depicted below).

```
.
└── Data_Directory/
    ├── config.yaml **
    ├── session_1/ 
    ├   ├── depth.dat        
    ├   ├── depth_ts.txt     
    ├   └── metadata.json    
    ...
    ├── session_2/ 
    ├   ├── depth.dat
    ├   ├── depth_ts.txt
    └── └── metadata.json
```

### Download a Flip File

MoSeq2 currently uses a deep-learning flip classifier to guarantee that the mouse is always oriented facing east (post-extraction). The flip classifier currently __best suits mice that are similar to adult male C57 mice recorded with Kinect v2 cameras__. (You may skip this step if your mice are very different.)

In [None]:
from moseq2_extract.gui import download_flip_command
# selection=0 - large mice with fibers (default)
# selection=1 - adult male C57s
# selection=2 - mice with Inscopix cables
download_flip_command(base_dir, config_filepath, selection=1)

***
<center><h1>Raw Data Extraction</h1></center>

***

The MoSeq2-Extract module will be used to segment the mouse from the background, and create metadata files for later dimensionality reduction and modeling. The resulting metadata files used for analysis are the `.h5` and `.yaml` files, while the `.mp4` videos are primarily used for data quality assurance after it has been extracted.

Prior to extracting a full dataset, it is recommended to follow the data quality testing steps to ensure the extraction goes smoothly with no defects.

The purple nodes in the diagram below indicate visualization steps that will serve to further ensure data quality, and indicate any surface level differences that appear across subjects or groups.

<img src="https://drive.google.com/uc?export=view&id=1IsizL2VItwjUrYWkE8sa-An09ukrPVQK">

## Pre-Extraction Data Quality Testing

Before performing a full extraction on your recordings, ensure your Regions of Interest (ROIs) are properly found, and your Sample Extraction is satisfactory. The ROIs computed below are: a complete background, the arena floor, and the first frame.

## ROI Test

This test ensures that your whole background area is properly captured without any artifacts that may interfere with the mouse video extraction. 

When you run the cell, it will prompt you to select a session to test. Input the corresponding index number of the session you would like to test.

If your ROIs are not computed correctly, refer to the ROI pathologies below.

In [None]:
import ruamel.yaml as yaml
from moseq2_extract.gui import find_roi_command

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)

# General ROI parameter you may need to configure; ensure to keep a wide range >100mm.
config_data['bg_roi_depth_range'] = (650, 750) # the min/max depth heights to search for floor of arena (in mm)

# Alternative Camera/Open Field ROI parameters
config_data['bg_roi_shape'] = 'ellipse' # ellipse for round arenas, 'rect' for rectangular arenas
config_data['bg_roi_dilate'] = (10, 10) # square kernel to increase included floor area
config_data['dilate_iterations'] = 1 # if bucket is cone shaped (\_/) increase iterations to increase included arena
config_data['bg_roi_weights'] = (1, .1, 1) # DO NOT CONFIGURE UNLESS USING AZURE OR REALSENSE CAMERAS

with open(config_filepath, 'w') as f:
    yaml.safe_dump(config_data, f)
rois, labels = find_roi_command(base_dir, config_filepath)

Once complete, you can expect the following directory structure:

```
.
└── Data_Directory/
    ├── config.yaml 
    ├── session_1/ 
    ├   ├── proc/ **
    ├   ├   ├── bground.tiff **
    ├   ├   ├── first_frame.tiff **
    ├   ├   └── roi.tiff ** 
    ├   ├── depth.dat        
    ├   ├── depth_ts.txt     
    ├   └── metadata.json    
    ...
    └── session_2/ 
```
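
To build intuition for `bg_roi_depth_range`, the toy example below marks the pixels whose depth falls inside a given range. It is a simplified illustration only: `floor_mask` is a hypothetical helper, the depth values are made up, and MoSeq2's actual ROI detection is more involved than a plain threshold.

```python
def floor_mask(depth_image, depth_range):
    """Mark pixels whose depth (mm) lies inside depth_range (inclusive)."""
    low, high = depth_range
    return [[low <= value <= high for value in row] for row in depth_image]


# Toy 3x4 depth image (mm from camera): walls ~600, floor ~700, a hole at 0.
depth_image = [
    [600, 700, 705, 600],
    [600, 698,   0, 600],
    [600, 702, 699, 600],
]
mask = floor_mask(depth_image, (650, 750))
for row in mask:
    print(''.join('#' if inside else '.' for inside in row))
```

Pixels printed as `#` fall inside the range. A range that is too narrow, or centered at the wrong depth, leaves part of the floor unmarked.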

### Display your calculated ROI images below:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
for i in range(len(rois)):
    plt.figure()
    print(labels[i])
    if len(rois[i].shape) < 3:
        plt.imshow(rois[i], vmin=None, vmax=None) #bg_roi_depth_range = (vmin, vmax)
    else:
        plt.imshow(rois[i][0], vmin=None, vmax=None)
    plt.colorbar()

### Possible ROI Pathologies

<table style="width: 100%;">
  <tbody>
    <tr style="width: 100%;">
      <th></th>
      <th style="text-align:center;">Good ROI Examples</th>
      <th style="text-align:center;">Bad ROI Example</th>
      <th style="text-align:center;">Bad Arena Detection Example</th>
    </tr>  
    <tr>
      <th style="text-align:center;">Pathology Description</th>
      <td></td>
      <td style="text-align:center;">Resulting background ROI contains holes, or walls are not crisp/well defined.</td>
      <td style="text-align:center;">Arena is not defined at all in roi.tiff.</td>
    </tr>
    <tr>
      <th style="text-align:center;">Reference Examples</th>
      <th><ul>
          <li style="text-align:center;">Circular Arena<br><img src="https://drive.google.com/uc?export=view&id=1v8GAgWJu-Gcvf9OhkoHX6G2SXmH5a4D_" width=350 height=350></li><br><br>
          <li style="text-align:center;">Alternatively Shaped Arena<br><img src="https://drive.google.com/uc?export=view&id=1w21Di6TsRg-Hgbd2PCwIU_kyrvGuajar" width=350 height=350></li><br><br>
          <li style="text-align:center;">Rectangular Arena Captured with RealSense<br><img src="https://drive.google.com/uc?export=view&id=1Emx-Vlsxp7kM1QVIZ01Wi7bgHHAmx-TT" width=350 height=350></li></ul>
        </th>
      <td>
         <img src="https://drive.google.com/uc?export=view&id=1gI_WZoQWpayoESj3FDwSpZ3zN5SI8rRt" width=350 height=350>
        </td>
      <td>
          <img src="https://drive.google.com/uc?export=view&id=1oGVi9rIWwkCxth0x8S2VNOatMCqMbKRm" width=350 height=350>
        </td>
    </tr>
    <tr>
      <th style="text-align:center;">Image Analysis Solutions</th>
      <td style="text-align:center;"></td> <!-- G -->
      <td>
        <ul>
            <li style="text-align:left;">The ROI does not need to have a perfectly sharp circumference, but the sharper it is, the crisper the extraction will come out.</li>  
            <li style="text-align:left;">First, ensure that you are using a wide enough depth-range. Experiment with different min/max values to capture more area.</li>
            <li style="text-align:left;">If only a subsection of the floor is being found, and the depth range is correct, then try increasing the dilation iterations to include more of the background region.</li>
        </ul>
      </td>
      <td>
          <ul>
              <li style="text-align:left;">If the walls are not showing up, or the ROI has an irregular shape, use the vmin and vmax parameters in the plotting cell above to find bg_roi_depth_range values that cover your arena floor.</li>
              <li style="text-align:left;">If you are using an Azure camera, set config_data['bg_roi_weights'] = (10, 0.1, 1), and increase dilate_iterations until you are satisfied with the result.</li>
              <li style="text-align:left;">If you are using a RealSense camera, note that its depth values span a range roughly 4x wider than real-life mm measurements. Therefore, multiply the real distance by 4 and use that as the reference point for your new bg_roi_depth_range parameter. Moreover, set config_data['bg_roi_weights'] = (10, 1, 4) to adjust the depth scaling.</li>
          </ul>
    </td>
    </tr>
  </tbody>
</table>
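
The RealSense advice in the table amounts to a simple rescaling of the depth range. The sketch below shows the arithmetic; `scaled_depth_range` is a hypothetical helper, and the factor of 4 is the approximate value quoted in the table, so treat the result as a starting point rather than an exact setting.

```python
def scaled_depth_range(real_range_mm, scale=4):
    """Scale a (min, max) range in real-world mm into camera depth units.

    scale=4 follows the approximate RealSense factor quoted above.
    """
    low, high = real_range_mm
    return (low * scale, high * scale)


print(scaled_depth_range((650, 750)))  # (2600, 3000)
```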

## Sample Test Extraction 

Next, test whether the mouse is segmented from the arena and oriented correctly during the extraction process. The extracted version of the mouse will appear in the top left-hand corner of the generated video.

This cell will also prompt for a selected session. Typically, this session should be the same one as the previously tested ROI.

If the mouse is consistently facing rightward, then the extraction test is successful and you can proceed to extract your dataset with the set parameters. Otherwise, refer to the extraction pathologies below.

In [None]:
import ruamel.yaml as yaml
from moseq2_extract.gui import sample_extract_command

nframes = 600 # number of frames to extract from raw to preview

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)

# Extraction parameters you may need to configure
config_data['min_height'] = 10 # Min mouse height from floor (mm), depth values < min_height are set to 0
config_data['max_height'] = 100 # Max mouse height from floor (mm), depth values > max_height are set to 0
config_data['spatial_filter_size'] = [3] # Space prefilter kernel (median filter, value must be odd)
config_data['temporal_filter_size'] = [0] # Time prefilter kernel (median filter, value must be odd)

with open(config_filepath, 'w') as f:
    yaml.safe_dump(config_data, f)

sample_ext_dir = sample_extract_command(base_dir, config_filepath, nframes)

After an extraction, you can expect the following directory structure:

```
.
├── config.yaml
├── session_1/
├   ├── sample_proc/ **
├   ├   ├── bground.tiff **
├   ├   ├── first_frame.tiff **
├   ├   ├── results_00.mp4 **
├   ├   ├── results_00.h5 **
├   ├   ├── results_00.yaml **
├   ├   └── roi.tiff ** 
├   ├── depth.dat
├   ├── depth_ts.txt
├   └── metadata.json
└── session_2/
```
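
The effect of `min_height` and `max_height` can be pictured with a toy frame. This sketch follows the parameter comments in the cell above (values outside the range are set to 0); `clip_heights` is a hypothetical helper, not MoSeq2's implementation, and the numbers are made up.

```python
def clip_heights(frame, min_height=10, max_height=100):
    """Zero out heights (mm above the floor) outside [min_height, max_height]."""
    return [
        [value if min_height <= value <= max_height else 0 for value in row]
        for row in frame
    ]


# Toy frame of heights above the arena floor, in mm.
frame = [
    [2, 15, 40],
    [8, 60, 250],
]
print(clip_heights(frame))  # [[0, 15, 40], [0, 60, 0]]
```

Low values such as floor reflections and high values such as walls are removed, which is why a `min_height` that is set too high can also remove parts of the mouse.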

You can view your sample extraction below:

In [None]:
from IPython.display import display, Video
import os

vid = Video(os.path.join(sample_ext_dir, 'results_00.mp4'), embed=True)

display(vid)

### Possible Extraction Pathologies

<table style="width: 100%;">
  <tbody>
    <tr>
      <th></th>
      <th style="text-align:center;">Good Extraction Example</th>
      <th style="text-align:center;">Bad; Grainy Extraction Example</th>
      <th style="text-align:center;">Bad; Incorrect Mouse Orientation and Jitter Example</th>
    </tr>  
    <tr>
      <th style="text-align:center;">Pathology Description</th>
      <td style="text-align:center;"></td>
      <td style="text-align:left;">The resulting video is too grainy; wall boundaries are too noisy, extracted mouse boundaries are not well defined, etc.</td>
      <td style="text-align:left;">Extracted mouse orientation and centroid are not consistently correct. (The mouse faces the wrong direction and is jittery.)</td>
    </tr>
    <tr>
      <th style="text-align:center;">Reference Examples</th>
      <td><img src="https://drive.google.com/uc?export=view&id=1qg_twPau5g0hWpvnGUQzl_7_XdSSsL1c" width=350 height=350></td>
      <td>
          <img src="https://drive.google.com/uc?export=view&id=1gjwJpDqcsORwypXU-2UjigxcbrGl2RZ-" width=350 height=350>
        </td>
      <td>
          <img src="https://drive.google.com/uc?export=view&id=1VNmUSFN7-_JnUXmQF65wxwA19CUS3Wem" width=350 height=350>
        </td>
    </tr>
    <tr>
      <th style="text-align:center;">Filtering Solutions</th>
      <td style="text-align:center;"></td> <!-- G -->
      <td>
        <ul>
          <li style="text-align:left;">Increase the spatial kernel size by increments of 1 until the mouse is crisp. (Beware of oversmoothing.)
          </li>
          <li style="text-align:left;">Decreasing any applied temporal filtering is also recommended. However, note that there is a balance of temporal and spatial filtering to achieve a good extraction.</li>
        </ul>
      </td>
      <td>
          <ul>
              <li style="text-align:left;">Environmental noise can contribute to poor centroid analysis and flip detection. Ensure the arena is clear of static noise.</li>
              <li style="text-align:left;">Increasing temporal and spatial filtering kernel sizes helps keep the fitted ellipse consistent throughout the image processing, increasing orientation-flip precision.</li>
          </ul>
    </td>
    </tr>
    <tr> 
      <th style="text-align:center;">General Solutions</th>
      <td style="text-align:center;"></td> <!-- G -->
      <td style="text-align:left;">
          <ul>
              <li style="text-align:left;">Ensure that your inputted minimum and maximum mouse heights are similar to your recording conditions.</li>
              <li style="text-align:left;">If the walls are too defined, or the mouse heights are very extreme (lots of red), increase the max_height parameter.</li>
              <li style="text-align:left;">If there are small reflective areas on the floor of the arena, try increasing the min_height to filter those out. Be aware that you may lose the mouse in the extraction if the min_height is too high.</li>
          </ul>
      </td>
      <td style="text-align:left;">
          <ul>
              <li style="text-align:left;">Ensure your flip classifier path is found in your config.yaml file in order for it to be applied.</li>
              <li style="text-align:left;">If the mouse is too dark/blurry, try reducing the range between the min and max heights to filter for the depths of interest. This makes the mouse brighter, with a more defined ellipse shape.</li>
          </ul>
      </td> <!-- R -->
    </tr>
  </tbody>
</table>
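
The spatial and temporal prefilters discussed in the table are median filters, which is why their kernel sizes must be odd: the window needs a single center sample. The one-dimensional sketch below illustrates the idea with the standard library. `median_filter_1d` is a hypothetical helper, and the edge handling (repeating the end values) is an assumption made for this example, not a description of MoSeq2's filtering.

```python
from statistics import median


def median_filter_1d(values, kernel_size=3):
    """Median-filter a sequence, repeating the edge values as padding."""
    if kernel_size % 2 == 0:
        raise ValueError('kernel_size must be odd')
    half = kernel_size // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + kernel_size]) for i in range(len(values))]


# A single-sample spike (100) is removed by a size-3 kernel.
print(median_filter_1d([1, 100, 2, 3, 2], kernel_size=3))  # [1, 2, 3, 2, 2]
```

Larger kernels remove wider spikes but also smooth away real detail, which is the oversmoothing trade-off mentioned in the table.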

***

__Worst Case Scenario: The extraction can be severely degraded if the depth recording itself contains noise stemming from your recording environment.__

<center>Depth Recording Artifacts Example</center> |
- |
<img src="https://drive.google.com/uc?export=view&id=1xhB_wWLYYHfXnRslLWW_ptQPnFsBDdnk" width=350 height=350> |

Below is a list of possible environmental factors that may lead to a poor/unusable extraction:
- The bucket walls were not sanded down enough, or were not painted the correct non-reflective color, resulting in artifacts in the depth video.
- The room where the recording took place was too bright, letting too much light into the bucket and adding large amounts of static noise.

If this is the case with your recording, one option is to crop the video to a frame range that contains a good extraction and use that range in your analysis. Otherwise, collect more data in a less noisy environment.

***

### (Convenience Cell) Restore Progress Variables

In [None]:
import os
from moseq2_extract.gui import restore_progress_vars

base_dir = './' # User-defined absolute path

progress_filepath = os.path.join(base_dir, 'progress.yaml')

config_filepath, index_filepath, train_data_dir, pca_dirname, \
scores_filename, model_path, scores_file, \
crowd_dir, plot_path = restore_progress_vars(progress_filepath)

## Extract Session(s)

The cell will prompt you to choose whether to extract individual sessions or all of them. Enter your selection, then wait for the extraction to complete before previewing the results.

In [None]:
from moseq2_extract.gui import extract_found_sessions

# depth files to recursively search for that have been partially extracted or not yet extracted 
ext = '.dat'

extract_found_sessions(base_dir, config_filepath, ext, extract_all=True, skip_extracted=True)

This is what your directory structure should look like once the process is complete:

```
.
├── config.yaml
├── session_1/
├   ...
├   └── proc/ **
├   ├   ├── roi.tiff
├   ├   ...
├   ├   ├── results_00.yaml+h5 ** (represents .h5 and .yaml files)
├   └   └── results_00.mp4 ** (extracted video)
└── session_2/
├   ...
├   └── proc/ **
├   ├   ├── roi.tiff
├   ├   ...
├   ├   ├── results_00.yaml+h5 **
└   └   └── results_00.mp4 **
        
```

### Aggregate your results into one folder and generate an index file.

The following cell will search through your base directory for the `proc/` folders in each session, and copy them all into a single directory.

Then it will generate the `moseq2-index.yaml` file by searching for all the metadata found in the `results_00.h5`/`results_00.yaml` files, and consolidate all that information in one file, assigning each session to a `default` group.

The `aggregate_results/` folder contains all the data you need to run the rest of the pipeline. Both the PCA and the model will only train on data included in that folder.

The `moseq2-index.yaml` file contains all the sessions and metadata included in `aggregate_results/`. It will also be heavily used in the visualization steps to plot different mouse and/or group statistics.

__Important Note: The index file contains UUIDs for each session, which are newly generated during the extraction step. These UUIDs are referenced throughout the pipeline, so if you re-extract a session, re-aggregate your data so that all the UUIDs are up to date BEFORE the PCA step.__ Not updating the index file could cause `KeyError`s when referencing the extracted data with the model results.

In [None]:
from moseq2_extract.gui import aggregate_extract_results_command, update_progress
import os

recording_format = '{start_time}_{session_name}_{subject_name}' # filename formats for the copied extracted data files

# directory NAME to save all metadata+extracted videos to with above respective name format
aggregate_results_dirname = 'aggregate_results/'

train_data_dir = os.path.join(base_dir, aggregate_results_dirname)
update_progress(progress_filepath, 'train_data_dir', train_data_dir)
index_filepath = aggregate_extract_results_command(base_dir, recording_format, aggregate_results_dirname)
update_progress(progress_filepath, 'index_file', index_filepath)
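
The `recording_format` string is a template whose placeholders are replaced with per-session metadata. The sketch below shows how such a template expands with Python's `str.format`; that MoSeq2 fills it in exactly this way is an assumption, and the metadata values are hypothetical.

```python
recording_format = '{start_time}_{session_name}_{subject_name}'

# Hypothetical metadata for one session; real values come from your recordings.
metadata = {
    'start_time': '2020-01-15T10-30-00',
    'session_name': 'session_1',
    'subject_name': 'mouse_01',
}

basename = recording_format.format(**metadata)
print(basename)  # 2020-01-15T10-30-00_session_1_mouse_01
```

Including the start time in the template helps keep file names unique when the same subject is recorded more than once.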

The aggregate results folder will be saved in your base directory,
resulting in the following directory (sample) structure where the base directory contains the notebook:

```
.
├── aggregate_results/ **
├   ├── session_1_results_00.h5+yaml ** # session 1 metadata 
├   ├── session_1_results_00.mp4 ** # session 1 extracted video
├   ├── session_2_results_00.h5+yaml ** # session 2 metadata
├   └── session_2_results_00.mp4 ** # session 2 extracted video
├── config.yaml
├── moseq2-index.yaml ** # index file
├── session_1/
└── session_2/
```

__Notice your index file has also been generated in your base directory.__
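
Because the index file ties each session to a UUID, a quick comparison can reveal entries that no longer match any extraction. The sketch below works on plain lists of UUID strings; `stale_uuids` is a hypothetical helper, the UUIDs are made up, and how you collect them from your index file and result files is left open.

```python
def stale_uuids(index_uuids, extracted_uuids):
    """Return UUIDs listed in the index that no extraction currently provides."""
    return sorted(set(index_uuids) - set(extracted_uuids))


# Hypothetical UUID strings: session 'b2' was re-extracted and became 'c3'.
index_uuids = ['a1', 'b2']
extracted_uuids = ['a1', 'c3']
print(stale_uuids(index_uuids, extracted_uuids))  # ['b2']
```

A non-empty result suggests the data should be re-aggregated before continuing.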

View your extracted videos below:

In [None]:
import os
from glob import glob
from IPython.display import display, Video
from moseq2_extract.gui import view_extraction

extractions = glob(os.path.join(base_dir, aggregate_results_dirname, '*.mp4'))
selected_vids = view_extraction(extractions)
vids = [Video(vid, embed=True) for vid in selected_vids]
for vid, ext in zip(vids, selected_vids):
    print(os.path.basename(ext))
    display(vid)

## Compute Scalar Summary and Tracking Plots

Use the following command to compute scalar summary information about your groups, such as average velocity and height.
This command will also generate a tracking summary plot depicting the path traveled by the mouse in your recordings.

This graph is meant to give you an idea of whether your extractions were consistent throughout the sessions. If you have a large standard deviation in mouse length/width when your mice are all the same size in reality, then there may have been an error in the extraction or acquisition. You can use the `scalar_df` to find which extractions specifically went wrong.

Once you have identified the corrupted extraction, you can either:
 1. Diagnose the session by individually testing different ROI/extraction parameters in the test cells, and then re-extract that session completely.
 2. Exclude the session altogether by removing its entry from the index file __and__ removing the corresponding generated h5, yaml, and mp4 files from the `aggregate_results/` directory.

In [None]:
import os
from moseq2_viz.gui import plot_scalar_summary_command
from IPython.display import display, Image
from glob import glob

plot_path = os.path.join(base_dir, 'plots/')
output_file = os.path.join(plot_path, 'scalars') # prefix name of the saved scalar position and summary graphs
scalar_df = plot_scalar_summary_command(index_filepath, output_file)

# Graph the output
images = glob(os.path.join(plot_path, 'scalars_*.png'))
ims = [Image(im) for im in images]
for im in ims:
    display(im)

If some sessions look like outliers, you can check the per-session values by running the following cell:

In [None]:
scalar_df.groupby('SessionName').mean(numeric_only=True) # average only the numeric scalar columns
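
One simple way to turn per-session averages into a list of suspect sessions is to compare each value with the median across sessions. The sketch below uses only the standard library and a plain dictionary. `outlier_sessions` is a hypothetical helper, the 20% tolerance is an arbitrary assumption, and the lengths are made-up numbers.

```python
from statistics import median


def outlier_sessions(session_means, tolerance=0.2):
    """Return sessions whose value differs from the median by more than
    `tolerance` (as a fraction of the median)."""
    center = median(session_means.values())
    return sorted(
        name for name, value in session_means.items()
        if abs(value - center) > tolerance * abs(center)
    )


# Hypothetical per-session mean mouse lengths.
mean_length = {'session_1': 80, 'session_2': 82, 'session_3': 79, 'session_4': 120}
print(outlier_sessions(mean_length))  # ['session_4']
```

The median is used instead of the mean so that a single bad session does not shift the reference value.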

## Plot Position Heatmaps For Each Session
Each heatmap will be titled with the session's subject name and group.

In [None]:
from moseq2_viz.gui import plot_verbose_position_heatmaps
output_file = os.path.join(plot_path, 'session_heatmaps') 
plot_verbose_position_heatmaps(index_filepath, output_file)

# Graph the output
images = glob(os.path.join(plot_path, 'session_heatmaps*.png'))
ims = [Image(im) for im in images]
for im in ims:
    display(im)

## Plot Group Mean Position Summary

If you have already assigned groups in your index file, you can plot the average position summary for all mice in each group. You can assign groups at the beginning of the ARHMM Modeling section below.

These plots will give you a good idea of the general hyperactivity and amount of area exploration in each of your experimental groups.

In [None]:
from moseq2_viz.gui import plot_mean_group_position_heatmaps_command
output_file = os.path.join(plot_path, 'group_heatmaps') 
plot_mean_group_position_heatmaps_command(index_filepath, output_file)

# Graph the output
images = glob(os.path.join(plot_path, 'group_heatmaps*.png'))
ims = [Image(im) for im in images]
for im in ims:
    display(im)

***
<center><h1>Principal Component Analysis (PCA)</h1></center>

***

Once the data has been extracted, compute its Principal Components (PCs) to reduce the dimensionality of the data going into the modeling step.
<img src="https://drive.google.com/uc?export=view&id=1I1WcfEwzpfwIxNYStX7swLAIvjQEVApy">

### (Convenience Cell) Restore Progress Variables

In [None]:
import os
from moseq2_extract.gui import restore_progress_vars

base_dir = './' # User-defined absolute path

progress_filepath = os.path.join(base_dir, 'progress.yaml')

config_filepath, index_filepath, train_data_dir, pca_dirname, \
scores_filename, model_path, scores_file, \
crowd_dir, plot_path = restore_progress_vars(progress_filepath)

## Training PCA

Train a PCA model on your extracted data to obtain the Principal Components that explain the largest possible variance in your dataset. If the resulting Principal Components look smooth with well-defined regions, and your scree plot shows an explained variance of 90% or above in fewer than 10 PCs, then the PCA model is properly trained. Otherwise, consult the pathologies below to solve any issues.

If you see a `distributed.worker - memory WARNING`, setting `config_data['nworkers'] = 1` can help when workers have too little memory allocated to them while running locally.

In [None]:
from moseq2_pca.gui import train_pca_command
from moseq2_extract.gui import update_progress
import ruamel.yaml as yaml
import os 

pca_filename = 'pca' # Name of your PCA model h5 file to be saved
pca_dirname = '_pca/' # Directory to save your computed PCA results

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)

# PCA parameters you may need to configure
config_data['gaussfilter_space'] = (1.5, 1) # Spatial filter for data (Gaussian)
config_data['medfilter_space'] = [0] # Median spatial filter
config_data['medfilter_time'] = [0] # Median temporal filter
config_data['missing_data'] = False # Set True for dataset with missing/dropped frames to reconstruct respective PCs.
config_data['missing_data_iters'] = 10 # Number of times to iterate over missing data during PCA
config_data['recon_pcs'] = 10 # Number of PCs to use for missing data reconstruction

with open(config_filepath, 'w') as f:
    yaml.safe_dump(config_data, f)

update_progress(progress_filepath, 'pca_dirname', pca_dirname)

# will train on data in aggregate_results/
train_pca_command(train_data_dir, config_filepath, pca_dirname, pca_filename)

Once complete, a new directory titled `_pca` will be created containing all your PCA data.
```
.
├── _pca/ **
├   ├── pca.h5 ** # pca model compressed file
├   ├── pca.yaml  ** # pca model YAML metadata file
├   ├── pca_components.png **
├   └── pca_scree.png **
├── aggregate_results/
├── config.yaml
├── moseq2-index.yaml
├── session_1/
└── session_2/

```

View your computed PCs and scree plot in the next cell.

In [None]:
import os
from IPython.display import display, Image
images = [os.path.join(base_dir, pca_dirname, 'pca_components.png'), 
          os.path.join(base_dir, pca_dirname, 'pca_scree.png')]
for im in images:
    display(Image(im))
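
A scree plot shows how much variance each component explains. The sketch below computes the quantity the plot is read for: how many leading PCs are needed to reach 90% of the total variance. `pcs_needed` is a hypothetical helper, and the variances are made-up numbers, not output from MoSeq2.

```python
from itertools import accumulate


def pcs_needed(variances, threshold=0.9):
    """Return how many leading PCs reach `threshold` of the total variance,
    or None if the threshold is never reached."""
    total = sum(variances)
    for count, cumulative in enumerate(accumulate(variances), start=1):
        if cumulative / total >= threshold:
            return count
    return None


# Hypothetical per-component variances, sorted from largest to smallest.
variances = [50, 25, 10, 6, 4, 3, 2]
print(pcs_needed(variances))  # 4
```

In this toy case the first four components explain 91% of the variance, which would count as a well-behaved scree plot under the criterion above.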

### Possible PCA Pathologies

<table style="width: 100%;">
  <tbody>
    <tr>
      <th></th>
      <th>Good PCA Output Examples</th>
      <th style="text-align:center;">Bad Scree Plot Example</th>
      <th style="text-align:center;">Bad Principal Components Example</th>
    </tr>  
    <tr>
      <th style="text-align:center;">Pathology Description</th>
      <th style="text-align:center;"></th>
      <td style="text-align:center;">Cannot achieve an explained variance of over 90% with fewer than 15 Principal Components (PCs).</td>
      <td style="text-align:center;">Graphed PCs look overprocessed, or are not representative of realistic mouse body regions.</td>
    </tr>
    <tr>
      <th style="text-align:center;">Reference Examples</th>
      <th style="text-align:center;">
        <ul>
            <li>Components<br>
                <img src="https://drive.google.com/uc?export=view&id=1dX5Gpd3PKL4vfVviLeP0CqBrz9PW37Au" width=350 height=350></li><br><br>
            <li>Scree Plot<br>
                <img src="https://drive.google.com/uc?export=view&id=12uqsBYuWCjpUQ6QrAjo35MnwYDzHqnge" width=350 height=350>
            <br>"90.65% in 7 PCs"</li>
        </ul>
      </th>
      <td><img src="https://drive.google.com/uc?export=view&id=14OwThgsf2GXnrl3-9TXEMvF3PDxmRsHE" width=350 height=350></td>
      <td><img src="https://drive.google.com/uc?export=view&id=1d35zKWiT7bkWbNNAon_JdSjKyVgcHHzi" width=350 height=350></td>
    </tr>
    <tr>
      <th style="text-align:center;">Image Analysis Solutions</th>
      <th style="text-align:center;"></th>
      <td>
        <ul>
          <li style="text-align:left;">Check if the crop size is too large, if so, decrease it and re-extract your data.</li>
          <li style="text-align:left;">Try (incrementally) adjusting the spatial and temporal filtering kernel sizes in the PCA step. Generally, increasing temporal smoothing increases explained variance; however, overfiltering will hinder ARHMM reliability.</li>
        </ul>
      </td>
      <td>
          <ul>
              <li style="text-align:left;">Ensure that an appropriate amount of spatial and temporal filtering is applied.</li>
              <li style="text-align:left;">If there are missing frames, apply an appropriate amount of temporal filtering, and make sure a suitable number of PCs is being reconstructed (recon_pcs is set to the appropriate number of PCs).</li>
          </ul>
    </td>
    </tr>
    <tr>
      <th style="text-align:center;">General Solutions</th>
      <th style="text-align:center;"></th>
      <td style="text-align:center;">Increase the size of your dataset. A dataset that is too small may also contribute to overprocessed PCs.</td> <!-- G -->
      <td style="text-align:center;">Acquire and extract more data, then retrain the PCA.</td>
    </tr>
  </tbody>
</table>
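
The 90% criterion in the table above can be checked numerically. The following is a minimal sketch, not part of MoSeq2: it assumes you already have the per-PC explained variance ratios as a plain list, and the helper name `n_pcs_for_variance` and the example values are hypothetical.

```python
from itertools import accumulate

def n_pcs_for_variance(explained_variance_ratio, target=0.90):
    """Return the smallest number of PCs whose cumulative explained
    variance reaches `target`, or None if the target is never reached."""
    cumulative_sums = accumulate(explained_variance_ratio)
    for n_pcs, cumulative in enumerate(cumulative_sums, start=1):
        if cumulative >= target:
            return n_pcs
    return None

# Hypothetical per-PC explained variance ratios (made-up values, not real data)
ratios = [0.40, 0.20, 0.15, 0.10, 0.06, 0.04, 0.03, 0.02]

n_pcs = n_pcs_for_variance(ratios)
print(n_pcs)  # 5

if n_pcs is None or n_pcs >= 15:
    print('Scree plot may be pathological; see the solutions in the table.')
```

A result of `None`, or a count of 15 or more, corresponds to the "Bad Scree Plot" column.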

## Computing Principal Component Scores

Project your extracted data onto the principal components of your trained PCA model to compute your PC scores.

In [None]:
from moseq2_pca.gui import apply_pca_command
from moseq2_extract.gui import update_progress

scores_filename = 'pca_scores' # name of the scores file to compute and save

update_progress(progress_filepath, 'scores_filename', scores_filename)
apply_pca_command(train_data_dir, index_filepath, config_filepath, pca_dirname, scores_filename)

Once complete, a `pca_scores.h5` file will be saved in your PCA directory (example shown below).
```
.
├── _pca/
├   ├── pca.h5
├   ├── pca.yaml
├   ├── pca_scores.h5  ** # scores file
├   ├── pca_components.png
├   └── pca_scree.png
├── aggregate_results/
├── config.yaml
├── moseq2-index.yaml
├── session_1/
└── session_2/

```
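
Before moving on, it can help to confirm that the files listed in the tree above were actually written. This is a small sketch using only the standard library; the helper name `missing_pca_outputs` is hypothetical, and the default file names are taken from the example tree, so adjust them if you changed `pca_filename` or `scores_filename`.

```python
import tempfile
from pathlib import Path

def missing_pca_outputs(base_dir, pca_dirname='_pca',
                        expected=('pca.h5', 'pca.yaml', 'pca_scores.h5')):
    """Return the expected PCA output files that are not present on disk."""
    pca_dir = Path(base_dir) / pca_dirname
    return [name for name in expected if not (pca_dir / name).is_file()]

# Demo on a throwaway directory that only contains pca.h5
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / '_pca').mkdir()
    (Path(demo_dir) / '_pca' / 'pca.h5').touch()
    demo_missing = missing_pca_outputs(demo_dir)

print(demo_missing)  # ['pca.yaml', 'pca_scores.h5']
```

In the notebook you would call `missing_pca_outputs(base_dir, pca_dirname)` and expect an empty list.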

## (Optional) Computing Model-Free Syllable Changepoints

This optional step helps determine model-free syllable lengths, which are general approximations of the durations of the respective behavioral syllables. Model-free changepoints can be useful for choosing `kappa`, the prior on syllable duration, in the ARHMM modeling step.

A good changepoint graph should show a smooth left-skewed histogram distribution of changepoint durations, with a CPE curve accurately fit to the histogram. If that is not the case, consult the pathologies below.

__Note: the parameters below have been preconfigured to best process C57 mouse data, and have not been tested for other species. Configure them at your own risk.__

In [None]:
from moseq2_pca.gui import compute_changepoints_command
import ruamel.yaml as yaml

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)  # the with-block closes the file automatically

changepoints_filename = 'changepoints' # name of the changepoints images to generate

# Changepoint computation parameters you may want to configure
config_data['threshold'] = 0.5 # Peak threshold to use for changepoints
config_data['dims'] = 300 # Number of random projections to compare the computed principal components with

with open(config_filepath, 'w') as f:
    yaml.safe_dump(config_data, f)  # the with-block closes the file automatically

compute_changepoints_command(base_dir, config_filepath, pca_dirname, changepoints_filename)

The changepoints plot will be generated and saved in the pca directory (example below).

```
.
├── _pca/ 
├   ├── pca.h5
├   ├── pca_scores.h5
├   ...
├   └── changepoints_dist.png **
├── aggregate_results/ 
├── config.yaml
├── moseq2-index.yaml
├── session_1/
└── session_2/
```

View your changepoints distance plot:

In [None]:
import os
from IPython.display import display, Image

display(Image(os.path.join(base_dir, pca_dirname, changepoints_filename+'_dist.png')))

### Possible Model-Free Changepoints Pathologies

<table style="width: 100%;">
  <tbody>
    <tr>
      <th></th>
      <th style="text-align:center;">Good Changepoint Analysis Example</th>
      <th style="text-align:center;">Poor Changepoints Analysis Example</th>
    </tr>  
    <tr>
      <th style="text-align:center;">Pathology Description</th>
      <td style="text-align:center;"></td>
      <td style="text-align:center;">Model-free syllable changepoint distances distribution is incorrectly skewed/too sparse.</td>
    </tr>
    <tr>
      <th style="text-align:center;">Reference Example</th>
      <td><img src="https://drive.google.com/uc?export=view&id=1sMkSB34bGbOimumN6Gg1-zV2Hk98v2Zy" width=350 height=350></td>
      <td><img src="https://drive.google.com/uc?export=view&id=1S-ALkPmb8sBZGkKmJ7Q3-RdxAbfS0PWV" width=350 height=350></td>
    </tr>
    <tr>
      <th style="text-align:center;">General Solutions</th>
      <td style="text-align:center;"></td>
      <td>
          <ul>
              <li style="text-align:left;">Try retraining the PCA with adjusted spatial and temporal filtering kernel sizes.</li>
              <li style="text-align:left;">Ensure your extracted data is correct. If the extracted version of the mouse is too noisy, then the PC trajectories cannot be accurately applied to the data.</li>
              <li style="text-align:left;">Get more data and try again.</li>
          </ul>
      </td>
    </tr>
  </tbody>
</table>
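
To make the link between changepoints and syllable durations concrete, the sketch below converts a list of changepoint frame indices into block durations in seconds. It is an illustration only and not how `compute_changepoints_command` works internally: the helper name `block_durations`, the 30 frames-per-second rate, and the example indices are all assumptions.

```python
from statistics import median

def block_durations(changepoint_frames, fps=30):
    """Convert sorted changepoint frame indices into block durations (seconds).

    `fps` is an assumed acquisition frame rate; use your camera's actual rate.
    """
    pairs = zip(changepoint_frames, changepoint_frames[1:])
    return [(later - earlier) / fps for earlier, later in pairs]

# Hypothetical changepoint frame indices (made-up values, not real data)
changepoints = [0, 9, 21, 30, 45, 54]

durations = block_durations(changepoints)
print(durations)          # [0.3, 0.4, 0.3, 0.5, 0.3]
print(median(durations))  # 0.3
```

The median block duration gives a rough sense of the timescale that the syllable duration prior should be compatible with.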

***
<center><h1>ARHMM Modeling</h1></center>

***

In order to train your ARHMM (Auto-Regressive Hidden Markov Model), you will use your computed PC scores as your input data, and specify whether you are modeling a single experimental group for observational research, or modeling multiple different groups (e.g. control vs. experimental groups) for comparative analysis.

<img src="https://drive.google.com/uc?export=view&id=1V2n5Jg61Pr86m0groyTX_qJ40bSm5LG7">

## (Optional) Specify Groups

### What are groups?

MoSeq uses groups in the `moseq2-index.yaml` file to indicate whether your collected sessions represent a single experimental group, or several different groups that you would like to compare while modeling and visualizing.

The index file requires that all your sessions have a metadata.json file in order to successfully assign each recorded subject or session to a group.

There are 3 ways to specify your groups:
1. Specify group by SessionName
2. Specify group by SubjectName
3. Manually edit index file

Once a cell is run, it will display your current indexing structure.

### Check Your Index File
#### View Indexed Sessions
Use this cell to view your sessions' information regarding their SessionNames, SubjectNames, and Groups.

In [None]:
from moseq2_viz.gui import get_groups_command

index_filepath = os.path.join(base_dir, 'moseq2-index.yaml')

get_groups_command(index_filepath)

#### 1 - Specify Group by Session Name

In [None]:
from moseq2_viz.gui import add_group_by_session

value = 'drug_group' # value of the corresponding key
group = 'group2' # designated group name
exact = True # Must be exact key-value match
lowercase = False # change to lowercase
negative = False # if True, invert the selection (select entries that do not match the value)

add_group_by_session(index_filepath, value, group, exact, lowercase, negative)

#### 2 - Specify Group by Subject Name

In [None]:
from moseq2_viz.gui import add_group_by_subject

value = 'saline_mouse' # value of the corresponding key
group = 'group1' # designated group name
exact = True # Must be exact key-value match
lowercase = False # change to lowercase
negative = False # if True, invert the selection (select entries that do not match the value)

add_group_by_subject(index_filepath, value, group, exact, lowercase, negative)

#### 3 - Manually Edit Index File

Simply open the `moseq2-index.yaml` file in your designated directory and edit the `group` key-value pair of each session to your chosen group name.
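
The matching logic behind the `exact`, `lowercase`, and `negative` flags can be sketched in plain Python. This is not the moseq2-viz implementation: the function `assign_group` is hypothetical, and the index layout used here (a `files` list whose entries hold a `group` string and a `metadata` dictionary with `SessionName` and `SubjectName`) is an assumption about `moseq2-index.yaml`, so check it against your own file.

```python
def assign_group(index_data, key, value, group,
                 exact=True, lowercase=False, negative=False):
    """Set `group` on every entry whose metadata[key] matches `value`.

    Returns the number of entries that were updated.
    """
    updated = 0
    for entry in index_data['files']:
        field = str(entry['metadata'].get(key, ''))
        target = value
        if lowercase:
            field, target = field.lower(), target.lower()
        # exact: full-string equality; otherwise: substring match
        matched = (field == target) if exact else (target in field)
        if matched != negative:  # `negative` inverts the selection
            entry['group'] = group
            updated += 1
    return updated

# Hypothetical, already-parsed index contents (made-up sessions)
index_data = {'files': [
    {'group': 'default',
     'metadata': {'SessionName': 'drug_group', 'SubjectName': 'mouse_a'}},
    {'group': 'default',
     'metadata': {'SessionName': 'control', 'SubjectName': 'saline_mouse'}},
]}

n_updated = assign_group(index_data, 'SubjectName', 'saline_mouse', 'group1')
print(n_updated)                                        # 1
print([entry['group'] for entry in index_data['files']])  # ['default', 'group1']
```

When editing the file by hand, the same change amounts to replacing the `group` value of each matching entry.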

***

### (Convenience Cell) Restore Progress Variables

In [None]:
from moseq2_extract.gui import restore_progress_vars
import os

base_dir = './' # path to your project directory (replace with your absolute path)
progress_filepath = os.path.join(base_dir, 'progress.yaml')

config_filepath, index_filepath, train_data_dir, pca_dirname, \
scores_filename, model_path, scores_file, \
crowd_dir, plot_path = restore_progress_vars(progress_filepath)

## Train ARHMM

In [None]:
from moseq2_model.gui import learn_model_command
from moseq2_extract.gui import update_progress
import ruamel.yaml as yaml
import os

pca_dir = os.path.join(base_dir, pca_dirname)
scores_file = os.path.join(pca_dir, scores_filename+'.h5') # path to input PC scores file to model
model_path = os.path.join(base_dir, 'model.p') # path to save trained model

# Advanced modeling parameters
hold_out = False # boolean to hold out data subset during the training process
hold_out_seed = 42 # integer to standardize the held out folds during training
nfolds = 2 # (if hold_out==True): number of folds to hold out during training; 1 fold per session
npcs = 10  # number of PCs being used

num_iter = 100 # number of iterations to train model
max_states = 100 # maximum number of states the ARHMM can end up with
kappa = None # syllable length probability distribution prior; if None, kappa=nframes

# use robust-ARHMM with t-distribution -> yields fewer states/syllables if True,
# used to constrain accepted behavioral variability
robust = True 

separate_trans = True # separate group transition graphs; set to True if ngroups > 1

# model saving frequency (in iterations); will create a checkpoints/ directory containing checkpointed models
checkpoint_freq = -1
use_checkpoint = False # resume training from latest saved checkpoint

select_groups = False # select specific groups to model; if False, will model all data as is in moseq2-index.yaml

update_progress(progress_filepath, 'scores_path', scores_file)
update_progress(progress_filepath, 'model_path', model_path)
learn_model_command(scores_file, model_path, config_filepath, index_filepath, hold_out, nfolds,
                    num_iter, max_states, npcs, kappa, separate_trans, robust, 
                    checkpoint_freq, use_checkpoint=use_checkpoint, select_groups=select_groups)

Once training is complete, your model will be saved in your base directory (shown below). 
```
.
├── _pca/ 
├── aggregate_results/ 
├── config.yaml
├── model.p **
├── moseq2-index.yaml
├── session_1/
└── session_2/
```
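
The `hold_out`, `nfolds`, and `hold_out_seed` parameters in the training cell describe a seeded split of sessions into folds. The sketch below shows one generic way such a split can be made reproducible. It is an illustration under stated assumptions, not the moseq2-model splitting logic, and the function name `split_into_folds` and the session identifiers are hypothetical.

```python
import random

def split_into_folds(session_ids, nfolds, seed=42):
    """Shuffle session ids with a fixed seed and deal them into `nfolds` folds."""
    rng = random.Random(seed)   # local generator, so the global RNG is untouched
    shuffled = list(session_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::nfolds] for i in range(nfolds)]

# Hypothetical session identifiers
session_ids = ['session_1', 'session_2', 'session_3', 'session_4']

folds = split_into_folds(session_ids, nfolds=2, seed=42)
for fold_number, held_out in enumerate(folds):
    training = [s for s in session_ids if s not in held_out]
    print(fold_number, 'held out:', held_out, 'train on:', training)
```

Because the seed is fixed, re-running the split yields the same folds, which keeps held-out likelihoods comparable between runs.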

Now use the moseq2-viz module to produce crowd videos and a number of statistical analysis plots.

***
<center><h1>Visualize Analysis Results</h1></center>

***

Now that you have a trained ARHMM, you can use it to generate informative graphs and videos about the behavioral syllables found, their usage frequencies, and their transition probabilities.

The diagram below shows the 5 operations that the MoSeq2-Viz module currently offers. They can be computed in any order at this point in the notebook.

<img src="https://drive.google.com/uc?export=view&id=1GQ7G26YCNaOex7Q-5nruSg_-NuHUmTMH">

## Cross-Model Result Comparisons

MoSeq now supports visualizing results from multiple trained models, provided that all of those models were trained using the same PC scores.

Simply set the variable `model_path` to the path of a directory containing all the models you would like to compare.
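
To see which models such a directory would contribute, you can list its contents first. This is a minimal sketch with a hypothetical helper, `find_models`; it assumes the models were saved with a `.p` extension, as with `model.p` in this notebook.

```python
import tempfile
from pathlib import Path

def find_models(model_path, pattern='*.p'):
    """Return model file paths: all matches if `model_path` is a directory,
    otherwise the single path itself."""
    path = Path(model_path)
    if path.is_dir():
        return sorted(str(p) for p in path.glob(pattern))
    return [str(path)]

# Demo on a throwaway directory containing two models and one unrelated file
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('model_a.p', 'model_b.p', 'notes.txt'):
        (Path(demo_dir) / name).touch()
    found = [Path(p).name for p in find_models(demo_dir)]

print(found)  # ['model_a.p', 'model_b.p']
```

Calling `find_models(model_path)` in the notebook shows which files a directory-valued `model_path` contains.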

### (Convenience Cell) Restore Notebook Variables

In [None]:
from moseq2_extract.gui import restore_progress_vars
import os

base_dir = './' # User-defined absolute path
progress_filepath = os.path.join(base_dir, 'progress.yaml')

config_filepath, index_filepath, train_data_dir, pca_dirname, \
scores_filename, model_path, scores_file, \
crowd_dir, plot_path = restore_progress_vars(progress_filepath)

## Make Crowd Videos

This tool creates videos containing many overlaid clips of mice performing the same specified syllable; a red dot appears on each mouse's body while it is performing the syllable. The videos are sorted from the most to the least frequently expressed syllable.

Consult the respective pathologies below for result quality comparisons.

In [None]:
import os
from moseq2_viz.gui import make_crowd_movies_command
from moseq2_extract.gui import update_progress

crowd_dir = os.path.join(base_dir, 'crowd_movies/') # output directory to save all movies in
max_syllables, max_examples = 40, 40 # maximum number of syllables, and examples of each syllable in a video respectively

## This command will run a subprocess, therefore you can view the progress in the CLI terminal running the notebook
update_progress(progress_filepath, 'crowd_dir', crowd_dir)
make_crowd_movies_command(index_filepath, model_path, crowd_dir, max_syllables, max_examples)

Once completed, your videos will be saved in the `crowd_dir` path you provided, along with a metadata file describing the model information, as shown below:

```
.
├── _pca/ 
├── aggregate_results/ 
├── config.yaml
├── crowd_movies/ **
├   ├── info.yaml **
├   ├── syllable_sorted_44 (usage).mp4 **
├   ...
├   └── syllable_sorted_12 (usage).mp4 **
├── model.p 
├── moseq2-index.yaml
├── session_1/
└── session_2/
```
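
When listing these files, note that a plain lexicographic sort places `syllable_sorted_12` before `syllable_sorted_2`. The sketch below sorts by the syllable number instead. The helper `syllable_number` is hypothetical and assumes the first integer in the file name is the syllable number, as in the example tree above.

```python
import os
import re

def syllable_number(video_path):
    """Extract the first integer from the file name; unnumbered files sort last."""
    match = re.search(r'\d+', os.path.basename(video_path))
    return int(match.group()) if match else float('inf')

# File names following the pattern shown in the directory tree
names = ['syllable_sorted_12 (usage).mp4',
         'syllable_sorted_2 (usage).mp4',
         'syllable_sorted_44 (usage).mp4']

ordered = sorted(names, key=syllable_number)
print(ordered)
# ['syllable_sorted_2 (usage).mp4', 'syllable_sorted_12 (usage).mp4', 'syllable_sorted_44 (usage).mp4']
```

Passing `key=syllable_number` to `sorted` when globbing the `.mp4` files keeps the videos in numeric order.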

View your generated crowd movies below:

In [None]:
from IPython.display import display, Video
from glob import glob

videos = sorted(glob(os.path.join(crowd_dir, '*.mp4')))
vids = [Video(vid, embed=True) for vid in videos]
for vid, vp in zip(vids, videos):
    print(os.path.basename(vp))  # file name without its directory
    display(vid)

### Possible Crowd Movie Pathologies

<table style="width: 100%;">
  <tbody>
    <tr>
      <th></th>
      <th style="text-align:center;">Good Crowd Movie Examples</th>
      <th style="text-align:center;">Bad; Overfitted ARHMM Crowd Movie Examples</th>
      <th style="text-align:center;">Bad; Underfitted ARHMM Crowd Movie Examples</th>
    </tr>  
    <tr>
      <th style="text-align:center;">Pathology Description</th>
      <td style="text-align:center;"></td>
      <td style="text-align:center;">Generated crowd movies look all too similar.</td>
      <td style="text-align:center;">Generated crowd movies do not contain mice unanimously exhibiting the same syllable, or have dramatically varying time-scales.</td>
    </tr>
    <tr>
      <th style="text-align:center;">Reference Examples</th>
      <td>
          <img src="https://drive.google.com/uc?export=view&id=1wkK2exvUKYWT_T-S_PkHUvb4z8Tgd6hm" width=350 height=350><br><br>
          <img src="https://drive.google.com/uc?export=view&id=1VNbK_ImeaZJ1t9GwmgyPcL8bk42B7dkJ" width=350 height=350><br><br>
          <img src="https://drive.google.com/uc?export=view&id=1fYZGPgd0belwkq8hyMaQF_jmHZE_e8L5" width=350 height=350>
      </td>
      <td>
          <img src="https://drive.google.com/uc?export=view&id=1qyS5YrRnfOWgB8o3fZok7Mc8u_9qNXUs" width=350 height=350><br><br>
           <img src="https://drive.google.com/uc?export=view&id=1E2-nPKqQ1M9rWEaZpsYEyfkwnvyv8Nfv" width=350 height=350><br><br>
           <img src="https://drive.google.com/uc?export=view&id=18hZnacz1lrZAdV9zuEErUEjPuvTt7fsz" width=350 height=350>
      </td>
      <td>
          <img src="https://drive.google.com/uc?export=view&id=1OIl-crIai6DyQwh3-dKFEKXkZs6BXCda" width=350 height=350><br><br>
          <img src="https://drive.google.com/uc?export=view&id=1Bnpwip-2_qWZEKsZ7aCYzF3JhrGOfqbe" width=350 height=350><br><br>
              <img src="https://drive.google.com/uc?export=view&id=14luLIi_lgN1chiQxj8OIXBG3RCV1BC4X" width=350 height=350>
      </td>
    </tr>
    <tr>
      <th style="text-align:center;">ARHMM Training Solutions</th>
      <td style="text-align:center;"></td> <!-- G -->
      <td>
        <ul>
          <li style="text-align:left;">Ensure your PCs cover over 90% of the data's explained variance, and that they are all included in the ARHMM training data.</li>
          <li style="text-align:left;">Try increasing the ARHMM's max_states to increase the problem complexity, which increases variance in the labels.</li>
          <li style="text-align:left;">Try increasing the number of PCs being used in the ARHMM training.</li>
        </ul>
      </td>
      <td>
          <ul>
              <li style="text-align:left;">Increase the number of model training iterations in order to ensure that the syllable likelihoods are converging.</li>
              <li style="text-align:left;">Try decreasing the number of max_states, allowing the model to focus on fewer syllables, increasing their respective likelihoods.</li>
              <li style="text-align:left;">Try decreasing the amount of additive noise (if added), or add spatial filtering to your PC or frame data.</li>
          </ul>
    </td>
    </tr>
    <tr> 
      <th style="text-align:center;">General Solutions</th>
      <td style="text-align:center;"></td> <!-- G -->
      <td style="text-align:center;">
          <ul>
              <li style="text-align:left;">If the mouse looks very smooth, then retrain your PCA and ARHMM with less spatial filtering.</li>
              <li style="text-align:left;">Acquire more data and try again.</li>
          </ul>
      </td>
      <td style="text-align:left;">
          <ul>
              <li style="text-align:left;">If the time scales are varying vastly, or the mouse movements are blurry, then retrain your PCA and ARHMM with less temporal filtering.</li>
              <li style="text-align:left;">Acquire more data and try again.</li>
          </ul>
      </td> <!-- R -->
    </tr>
  </tbody>
</table>

## Compute Usage Plots

Use this command to compute the usages of the model-detected syllables, sorted in descending order of usage.

__For plotting multiple groups: the model must have been trained on >1 group with `separate_trans=True`, and the group names must be included in the `group` tuple.__
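
As a rough illustration of what a usage value is, the sketch below counts how often each syllable starts in a frame-by-frame label sequence and normalizes the counts. It is not the moseq2-viz implementation: the function name `syllable_usages` is hypothetical, and counting each run of identical labels once (rather than counting frames) is an assumption.

```python
from collections import Counter
from itertools import groupby

def syllable_usages(labels):
    """Return (syllable, usage fraction) pairs, most used syllable first.

    A syllable is counted once per run of consecutive identical labels.
    """
    onsets = [label for label, _run in groupby(labels)]
    counts = Counter(onsets)
    total = sum(counts.values())
    return [(syllable, n / total) for syllable, n in counts.most_common()]

# Hypothetical frame-by-frame syllable labels (made-up values)
labels = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 0, 1]

usages = syllable_usages(labels)
for syllable, fraction in usages:
    print(syllable, round(fraction, 3))
```

Here syllable 0 starts three times out of six onsets, so it accounts for half of the usage.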

In [None]:
from moseq2_viz.gui import plot_usages_command
from moseq2_extract.gui import update_progress

plot_path = os.path.join(base_dir, 'plots/')
output_file = os.path.join(plot_path, 'usages')

max_syllable = 40 # max syllables to plot
group = None # None to plot all groups, or list()/tuple() of group names to specifically plot

ordering = None # None for default ordering (most to least used syllables), set = "m" to plot mutation sorting

# If using mutation sorting (ordering='m'), set these parameters to the group names you would like to compare directly
ctrl_group = None
exp_group = None

update_progress(progress_filepath, 'plot_path', plot_path)
fig = plot_usages_command(model_path, index_filepath, output_file, max_syllable=max_syllable, group=group,
                   ordering=ordering, ctrl_group=ctrl_group, exp_group=exp_group, colors=None, fmt='o-')

## Compute Syllable Speeds

Computes the mean centroid speed of the rodent across all frames in which each syllable label was found.
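
The computation described above can be sketched as follows. This is a simplified illustration, not the moseq2-viz code: the function name `mean_syllable_speeds`, the millimeter units, and the 30 frames-per-second rate are assumptions, and the speed of a frame is taken as the centroid displacement from the previous frame.

```python
from collections import defaultdict
from math import hypot

def mean_syllable_speeds(labels, centroids, fps=30):
    """Average centroid speed per syllable.

    labels[i] is the syllable at frame i and centroids[i] its (x, y) position.
    Speed at frame i is the distance moved since frame i - 1, times `fps`.
    """
    speeds = defaultdict(list)
    for i in range(1, len(labels)):
        (x0, y0), (x1, y1) = centroids[i - 1], centroids[i]
        speeds[labels[i]].append(hypot(x1 - x0, y1 - y0) * fps)
    return {syllable: sum(v) / len(v) for syllable, v in speeds.items()}

# Hypothetical labels and centroid positions in mm (made-up values)
labels = [0, 0, 1, 1]
centroids = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]

mean_speeds = mean_syllable_speeds(labels, centroids)
print(mean_speeds)  # {0: 150.0, 1: 75.0}
```

Syllable 1 averages a stationary frame and a moving frame, which is why its mean speed is half that of syllable 0.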

In [None]:
from moseq2_viz.gui import plot_mean_syllable_speeds_command
from moseq2_extract.gui import update_progress

output_file = os.path.join(plot_path, 'speeds')

max_syllable = 40 # max syllables to plot
group = None # None to plot all groups, or list()/tuple() of group names to specifically plot

# None for default ordering (most to least used syllables)
# set ordering = "m" to plot mutation sorting; or = 'speeds' to sort syllables from fastest to slowest
ordering = None 

# If using mutation sorting (ordering='m'), set these parameters to the group names you would like to compare directly
ctrl_group = None
exp_group = None

update_progress(progress_filepath, 'plot_path', plot_path)
fig = plot_mean_syllable_speeds_command(model_path, index_filepath, output_file, max_syllable=max_syllable, group=group,
                   ordering=ordering, ctrl_group=ctrl_group, exp_group=exp_group, colors=None, fmt='o-')

## Compute Syllable Transition Graph

Use the following command to generate a syllable transition graph. The graph is composed of nodes labeled by syllable and edges depicting probable transitions, with edge thickness depicting the weight of each transition.

For multiple groups, there will be a transition graph for each group, as well as a difference-graph with different colors to identify the groups.
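
The edge weights of such a graph come from counting how often one syllable follows another. The sketch below estimates transition probabilities from a label sequence. It is a generic illustration rather than the moseq2-viz implementation: the function name `transition_probabilities` is hypothetical, and normalizing by the outgoing count of each source syllable is an assumption about the weighting.

```python
from collections import Counter
from itertools import groupby

def transition_probabilities(labels):
    """Return {(src, dst): P(dst | src)} for syllable-to-syllable transitions.

    Runs of identical labels are collapsed first, so self-transitions
    are ignored.
    """
    sequence = [label for label, _run in groupby(labels)]
    pair_counts = Counter(zip(sequence, sequence[1:]))
    outgoing = Counter()
    for (src, _dst), n in pair_counts.items():
        outgoing[src] += n
    return {pair: n / outgoing[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical frame-by-frame syllable labels (made-up values)
labels = [0, 0, 1, 1, 0, 2, 0, 1]

probabilities = transition_probabilities(labels)
for (src, dst), p in sorted(probabilities.items()):
    print(f'{src} -> {dst}: {p:.2f}')
```

In a graph drawn from these values, the edge from syllable 0 to syllable 1 would be thicker than the edge from 0 to 2.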

In [None]:
from moseq2_viz.gui import plot_transition_graph_command
import os

max_syllables = 40 # Maximum number of nodes in the transition graph
groups = () # Group(s) to graph, default graph if empty tuple
output_filename = os.path.join(plot_path, 'transition') # name of the png file to be saved

# If graph does not automatically appear, run cell again.
plot_transition_graph_command(index_filepath, model_path, config_filepath, max_syllables, groups, output_filename)

### Possible Syllable Transition Graph Pathologies

<table style="width:100%;">
  <tbody>
    <tr>
      <th></th>
      <th style="text-align:center;">Good Transition Graph Examples</th>
      <th style="text-align:center;">Bad (Underfitted) Transition Graph Example</th>
      <th style="text-align:center;">Bad (Overfitted) Transition Graph Example</th>
    </tr>  
    <tr>
      <th style="text-align:center;">Pathology Description</th>
      <td style="text-align:center;"></td>
      <td style="text-align:left;">Too few syllables are being generated in the syllable transition graph.</td>
      <td style="text-align:left;">Too many syllables are being generated in the syllable transition graph, all with skewed weights.</td>
    </tr>
    <tr>
      <th style="text-align:center;">Reference</th>
      <th align="center">
          <ul>
              <li style="text-align:center;">Standard (default) Transition Plot</li><br>
              <img src="https://drive.google.com/uc?export=view&id=1QGEePHLEXzGIBsM2pyvvqPrSbLlSX-ES" width=350 height=350>
              <br><br>
              <li style="text-align:center;">Robust+Separate_Trans ARHMM-based Transition Plot</li><br>
              <img src="https://drive.google.com/uc?export=view&id=1j-ub8CfbHY5MKksL-PiwLhBLz-q2MiTQ" width=350 height=350>
              <br><li style="text-align:left;">Red edges = upregulated transitions; Blue edges = downregulated transitions.</li><br>
          </ul>
      </th>
      <td align="center"><img src="https://drive.google.com/uc?export=view&id=1sUtfppezEgRPeF_idcjusXf7Is1I8ncV" width=350 height=350></td>
      <td align="center"><img src="https://drive.google.com/uc?export=view&id=1Ah4XhZAaPo22d6dAXin8v_HlY1hJdqDY" width=350 height=350></td>
    </tr>
    <tr> 
      <th style="text-align:center;">General Solutions</th>
      <td style="text-align:center;"></td> <!-- G -->
      <td>
          <ul>
              <li style="text-align:left;">Use a Robust AR-HMM to normalize the syllable probability distributions to a t-distribution; this penalizes syllables with low probabilities.</li>
              <li style="text-align:left;">Ensure an appropriate amount of temporal filtering is applied to your extracted videos. If too much smoothing is applied then the transitions will become less apparent, skewing model predictions.</li>
              <li style="text-align:left;">Try increasing the max_states variable to add more complexity to the model training problem.</li>
          </ul>
        </td>
      <td>
          <ul>
              <li style="text-align:left;">Use a Robust ARHMM to normalize the syllable transition edge-weights. It also removes unrealistic syllables.</li>
              <li style="text-align:left;">Increase the number of training iterations and decrease the max_states value, allowing the model to focus on fewer syllables.</li>
              <li style="text-align:left;">Increase nfolds and hold out data during training (1 fold per recording in the dataset). This guards against overfitting by providing validation log-likelihoods to compare against during training.</li>
              <ul><li style="text-align:left;">It is recommended to hold out 1 fold per recording such that the validation set is properly representative of your training data.</li></ul>
          </ul>
        </td> <!-- R -->
    </tr>
  </tbody>
</table>

***
<center><h1>Notebook End</h1></center>

***