# Welcome to MoSeq2-Notebook

### Run all of the MoSeq2 tools in a containerized notebook.

***
<center><h1>MoSeq2 Introduction</h1></center>
<img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/Data_Pipeline.png?token=ACRN4H6FD7TUR7AI5K3GGAC5T6EDC">
***

MoSeq2 is a software toolkit for unsupervised characterization of animal behavior. MoSeq takes depth recordings of single behaving animals as input and outputs a rich labeling of postural dynamics in terms of reused motifs, or 'syllables'. This notebook begins with compressed depth recordings (see 'Data Acquisition Overview' below) and transforms this data through the following steps:

- **Extraction**: The animal is segmented from the background, and its position and heading direction are aligned across frames
- **Compute PCA**: Raw video is de-noised and transformed to low-dimensional pose trajectories using principal component analysis (PCA)
- **Train ARHMM**: Pose trajectories are modeled using an autoregressive hidden Markov model (ARHMM), producing a sequence of syllable labels
- **Analysis**: Model output is reported through visualization and statistical analysis


### Resources
Below is a list of publications and links to the GitHub wikis for each MoSeq2 tool.
- Publications
    - [Mapping Sub-Second Structure in Mouse Behavior](http://datta.hms.harvard.edu/wp-content/uploads/2018/01/pub_23.pdf)
    - [Composing Probabilistic Graphical Models and Variational Autoencoders](http://datta.hms.harvard.edu/wp-content/uploads/2018/01/pub_24.pdf)
    - [Q&A: Understanding the composition of behavior](http://datta.hms.harvard.edu/wp-content/uploads/2019/06/Datta-QA.pdf)
- Wikis
    - [Extract](https://github.com/dattalab/moseq2-extract/wiki)
    - [PCA](https://github.com/dattalab/moseq2-pca/wiki)
    - [Model](https://github.com/dattalab/moseq2-model/wiki)
    - [Viz](https://github.com/dattalab/moseq2-viz/wiki)
    - [Batch](https://github.com/dattalab/moseq2-batch/wiki)

## Data Acquisition Overview

MoSeq2 takes animal depth recordings as input. We have developed a [data acquisition pipeline](https://github.com/dattalab/moseq2-docs/wiki/Setup:-acquisition-software) for the Xbox Kinect depth camera, and we suggest following our [data acquisition tutorial](https://github.com/dattalab/moseq2-docs/wiki/Acquisition) when making recordings.

### Data file organization

MoSeq2 requires input data to follow the directory structure below: a single master directory contains one sub-directory per recording session, and each sub-directory holds depth data, metadata, and (optionally) timestamp data:

```
.
├── session_1/
│   ├── depth.dat        # depth data
│   ├── depth_ts.txt     # timestamps (optional)
│   └── metadata.json    # metadata
└── session_2/
    ├── depth.dat
    ├── depth_ts.txt
    └── metadata.json
```
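
Before running the notebook, you may want to confirm that every session folder contains the required files. The sketch below is a hypothetical helper, not part of MoSeq2; it assumes the layout above, treats `depth.dat` and `metadata.json` as required and `depth_ts.txt` as optional, and ignores any sub-directory that contains none of these files.

```python
from pathlib import Path

REQUIRED = ("depth.dat", "metadata.json")  # depth_ts.txt is optional
RECORDING_FILES = set(REQUIRED) | {"depth_ts.txt"}

def check_sessions(base_dir="."):
    """Map each session folder to the list of required files it is missing."""
    report = {}
    for session in sorted(p for p in Path(base_dir).iterdir() if p.is_dir()):
        names = {f.name for f in session.iterdir()}
        if not names & RECORDING_FILES:
            continue  # not a recording session (e.g. an output folder)
        report[session.name] = [name for name in REQUIRED if name not in names]
    return report

# An empty list means the session has everything it needs.
print(check_sessions("."))
```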

***
<center><h1>Software Setup</h1></center>
***

Install the requirements for MoSeq2 using the [setup notebook](http://localhost:8888/notebooks/MoSeq2_Step_0.ipynb) if you have not done so already. Then copy this notebook into the master directory that contains your session folders before proceeding with the setup steps below.
<img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/Setup_Pipeline.png?token=ACRN4HZAR3Z7OMAYVFRZH725T6EGA">

### Ensure your session folders are found:

In [None]:
from moseq2_extract.gui import get_found_sessions

found_sessions = get_found_sessions()

print('number of found sessions to analyze:', found_sessions)

### Ensure you are running the python version located in your corresponding conda environment.

For example, if your Anaconda environment is called `moseq2`, the output should look like `/Users/username/anaconda3/envs/moseq2/bin/python`.

In [None]:
%%bash
which python
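
If you prefer to check from Python rather than a `%%bash` cell, `sys.executable` gives the same interpreter path. The helper below is a hypothetical convenience that assumes the standard conda layout `<prefix>/envs/<name>/...`; it returns `None` for the base environment or a non-conda interpreter.

```python
import sys
from pathlib import PurePath

def conda_env_name(executable=sys.executable):
    """Return the env name if the interpreter lives under .../envs/<name>/, else None."""
    parts = PurePath(executable).parts
    if "envs" in parts:
        i = parts.index("envs")
        if i + 1 < len(parts):
            return parts[i + 1]
    return None  # base environment or a non-conda interpreter

print(sys.executable, "->", conda_env_name())
```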

### Generate Configuration Files

In [None]:
import os
from moseq2_extract.gui import generate_config_command
from moseq2_viz.gui import generate_index_command

base_dir = './' # "./" == directory where this notebook is located
config_filepath = base_dir+'config.yaml'

generate_config_command(config_filepath)

A configuration file has been created in the same directory as your notebook and session directories. The directory should now contain the following (new items are marked with `**`):

```
.
├── MoSeq2-Notebook.ipynb
├── config.yaml **
├── session_1/
└── session_2/   
```

### Download a Flip File

To keep your extraction consistent regardless of the mouse's orientation, we recommend using a flip classifier, which helps keep the mouse correctly oriented throughout the extraction.

Three pre-trained flip classifiers are available:
* [0] - Large mice with fibers.
* [1] - Adult male c57s.
* [2] - Mice with Inscopix cables.

Enter the index corresponding to your preferred pretrained classifier below and run the cell.

In [None]:
from moseq2_extract.gui import download_flip_command

selected_index = 1 # flip file index
download_flip_command(base_dir, config_filepath, selected_index)
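
Conceptually, flip correction rotates a cropped mouse frame by 180 degrees whenever the classifier judges the mouse to be facing backwards. The sketch below illustrates that idea on a tiny frame stored as nested lists; `maybe_flip` and its 0.5 threshold are illustrative assumptions, not moseq2-extract's code.

```python
def rotate_180(frame):
    """Rotate a 2-D frame (a list of rows) by 180 degrees."""
    return [list(reversed(row)) for row in reversed(frame)]

def maybe_flip(frame, p_flipped, threshold=0.5):
    """Hypothetical helper: flip the frame if the classifier thinks the mouse faces backwards."""
    return rotate_180(frame) if p_flipped > threshold else frame

frame = [[1, 2, 3],
         [4, 5, 6]]
print(maybe_flip(frame, p_flipped=0.9))  # [[6, 5, 4], [3, 2, 1]]
```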

### (Optional Advanced Setup)

Configure the following options to match your hardware environment. This is particularly relevant when running the pipeline on a compute cluster.

#### Configurable Parameter Descriptions
- __CLUSTER_TYPE__: Indicates whether you are running on your local computer or using a Slurm scheduler.
    Options: {'local', 'slurm'}
- __CORES__: Number of cores to split the processes among.
- __MEMORY__: Amount of memory to allocate to your operations.
- __PROCESSES__: Number of processes to spawn that each will have a set of worker threads. (Choose 1 unless you are using a scheduler)
- __NWORKERS__: Number of worker threads that are executing your operations. (Choose 1 unless you are using a scheduler)
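
The comments on `cores` and `memory` in the next cell give two rules of thumb: use one fewer core than you have, and leave about 2 GB of RAM free. A hypothetical helper applying them is sketched below; the standard library has no portable way to read total RAM, so you must supply `total_ram_gb` yourself.

```python
import os

def recommended_resources(total_ram_gb, total_cores=None):
    """Apply the rules of thumb: cores = n - 1 and memory = (total RAM - 2) GB."""
    total_cores = total_cores or os.cpu_count() or 1
    return {
        "cores": max(1, total_cores - 1),
        "memory": f"{max(1, total_ram_gb - 2)}GB",
    }

# e.g. on a machine with 16 GB of RAM
print(recommended_resources(total_ram_gb=16))
```

You could then write these values into `config_data` instead of hard-coding them.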

In [None]:
import ruamel.yaml as yaml

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)

config_data['cluster_type'] = 'local'
config_data['cores'] = 4 # recommended n-1; where n = total number of cores/cpus available
config_data['memory'] =  "15GB" # recommended n-2GB; where n = total GB of RAM
config_data['nworkers'] =  1 # recommended 1 per core (for local cluster)
config_data['processes'] =  1 # recommended 1 per core (for local cluster)


with open(config_filepath, 'w') as f:
    yaml.dump(config_data, f, Dumper=yaml.RoundTripDumper)

***
<center><h1>Raw Data Extraction</h1></center>

***

You will use the MoSeq2-Extract module to convert your raw data files into viewable formats such as mp4 videos, along with YAML and HDF5 files containing the extracted data and metadata. The HDF5 files are then used to train your PCA model, while the mp4 files are primarily used to confirm that each session was extracted correctly, with no defects or unwanted artifacts.

In the extraction step, begin by testing your detected ROIs with the default parameters. If all goes well, continue to the test extraction step.

These first two steps are meant to catch possible extraction errors before you extract your full dataset.

<img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/Extraction_Pipeline.png?token=ACRN4HYVSFGOOCOO4UCXWAS5T6EHM">

Once testing is done, you can then proceed to extract all the session files found by your notebook.

## Pre-Extraction Data Quality Testing

Before performing a full extraction on your recordings, use the following steps to make sure your Regions of Interest (ROIs) are found correctly. This also gives you a preview of what to expect from a complete extraction of your data.

## ROI Test

This test ensures that your whole background area is properly captured without any artifacts that may interfere with the mouse video extraction.

### Configurable Parameter Descriptions
- __BG_ROI_DILATE__: Size of the mask dilation used to capture the complete environment (including its walls). Use equal values for square or circular arenas and different values for rectangular ones.
- __BG_ROI_DEPTH_RANGE__: Range of distances (in mm) from the camera within which the algorithm searches for the arena floor. (Take real-life measurements of your setup to make sure this range is correct.)
- __BG_ROI_GRADIENT_FILTER__: Boolean value for whether to exclude the walls of your environment from the ROI using gradient filtering.
- __USE_PLANE_BGROUND__: Boolean value for whether to use a geometric plane fit during ROI analysis. (This is typically used when some regions of the environment are not captured fully.)

### Possible ROI Pathologies
- __BG_ROI_DILATE__: If incorrect, you may end up with an incorrect representation for your environment.
- __BG_ROI_DEPTH_RANGE__: If incorrect, estimates of the mouse's height during extraction will not be reliable.
- __USE_PLANE_BGROUND__: Used for when regions of the bucket/environment floor may be too shiny, causing them to be omitted from the background region representation.
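
To build intuition for `bg_roi_depth_range` and `bg_roi_dilate`, the sketch below marks the pixels of a background depth image that fall inside the floor depth range, then grows that mask with a rectangular dilation. This is a simplified, pure-Python illustration that assumes the dilation values are a kernel size in pixels; moseq2-extract's actual ROI detection involves additional steps and may differ.

```python
def floor_mask(depth, depth_range):
    """Mark pixels whose depth (mm from the camera) lies inside depth_range as candidate floor."""
    lo, hi = depth_range
    return [[lo <= d <= hi for d in row] for row in depth]

def dilate(mask, size):
    """Grow a boolean mask with a rectangular kernel of shape size = (rows, cols)."""
    rh, rw = size[0] // 2, size[1] // 2
    h, w = len(mask), len(mask[0])
    return [[any(mask[y][x]
                 for y in range(max(0, r - rh), min(h, r + rh + 1))
                 for x in range(max(0, c - rw), min(w, c + rw + 1)))
             for c in range(w)]
            for r in range(h)]

depth = [[800, 700, 700],
         [800, 700, 700],
         [800, 800, 800]]  # mm from the camera; the floor sits at ~700 mm
floor = floor_mask(depth, (650, 750))
print(dilate(floor, (3, 3)))  # every pixel is now inside the ROI
```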

The following cell will extract the first frame, ROI, and background ROI for your reference before continuing into the extraction process.

In [None]:
import ruamel.yaml as yaml
from moseq2_extract.gui import find_roi_command

sample_testdir_in = base_dir+'session_1/' # session directory to perform ROI testing
sample_roi_testfile = sample_testdir_in+'depth.dat' # depth file to perform ROI testing on
sample_testdir_out = sample_testdir_in+'sample_proc/' # directory to save roi extraction results

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)

# Relevant ROI parameters you may need to configure
config_data['bg_roi_dilate'] = (10, 10) # Size of the mask dilation (to include environment walls)
config_data['bg_roi_depth_range'] = (650, 750) # Range to search for floor of arena (in mm)
config_data['bg_roi_gradient_filter'] = False # Exclude walls with gradient filtering
config_data['use_plane_bground'] = False # Use plane fit for background

with open(config_filepath, 'w') as f:
    yaml.dump(config_data, f, Dumper=yaml.RoundTripDumper)

find_roi_command(sample_roi_testfile, sample_testdir_out, config_filepath)

Once complete, you can expect the following directory structure:

```
.
├── config.yaml
├── MoSeq2-Notebook.ipynb
├── session_1/
│   ├── sample_proc/ **
│   │   ├── bground.png **
│   │   ├── first_frame.png **
│   │   └── roi.png **
│   ├── depth.dat
│   ├── depth_ts.txt
│   └── metadata.json
└── session_2/
```

Display your calculated ROI images below:

In [None]:
from IPython.display import display, Image
for infile in os.listdir(sample_testdir_out):
    if infile.endswith('.png'):
        print(infile[:-4])
        display(Image(sample_testdir_out+infile))

## Sample Test Extraction 
Run the following cell to test your raw data extraction parameters before extracting all of your data to ensure the best data quality going into the PCA step.

### Configurable Parameter Descriptions
- __MIN_HEIGHT__: The minimum height (in mm above the floor) the mouse can reach in your recordings.
- __MAX_HEIGHT__: The maximum height (in mm above the floor) the mouse can reach in your recordings.
- __SPATIAL_FILTER_SIZE__: Size of the median filter kernel applied to the raw video as it is extracted, to produce clean mp4 files. The larger the kernel, the more the video is smoothed and the more fine detail is lost.
- __USE_TRACKING_MODEL__: Boolean value for whether to use Expectation Maximization (EM) tracking for mice with Inscopix cables; this helps improve the extracted representation of the mouse's pose.
- __CABLE_FILTER_ITERS__: Number of filtering iterations used to remove the mouse's cable during frame cleaning. (Only applies if `use_tracking_model` is true.)

### Possible Extraction Pathologies
- __MIN_HEIGHT__: Must be set correctly so that the minimum depth value is estimated properly when constructing the video.
- __MAX_HEIGHT__: Must be set correctly so that the maximum depth value is estimated properly when constructing the video.
- __SPATIAL_FILTER_SIZE__: Do not set it too high, or the video will lose clarity. (Must be an odd number.)
- __USE_TRACKING_MODEL__: Not recommended for mice without cables obstructing the view.
- __CABLE_FILTER_ITERS__: We recommend starting with 5 filter iterations and adjusting based on how much of the cable is removed from the video.
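
The sketch below illustrates how `min_height` and `max_height` act on a frame, assuming height is computed by subtracting each depth pixel from the background (floor) depth and that out-of-range pixels are zeroed; the real pipeline may clip rather than zero. It also shows why `spatial_filter_size` must be odd: a median kernel needs a centre pixel. Both helpers are hypothetical.

```python
def height_above_floor(bground, frame, min_height=10, max_height=100):
    """Height (mm) of each pixel above the floor; out-of-range pixels are zeroed (assumption)."""
    return [[h if min_height <= h <= max_height else 0
             for h in (b - f for b, f in zip(brow, frow))]
            for brow, frow in zip(bground, frame)]

def check_filter_size(size):
    """A median kernel needs a centre pixel, so its size must be odd."""
    if size % 2 == 0:
        raise ValueError(f"spatial_filter_size must be odd, got {size}")
    return size

# Floor is 700 mm from the camera: a pixel at 650 mm is 50 mm tall (kept),
# 695 mm is 5 mm (below min_height) and 550 mm is 150 mm (above max_height).
print(height_above_floor([[700, 700, 700]], [[650, 695, 550]]))  # [[50, 0, 0]]
```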

In [None]:
from moseq2_extract.gui import sample_extract_command

sample_testfile = sample_testdir_in + 'depth.dat'
extract_testdir_out = 'test_proc/' # directory to save sample extraction
nframes = 200 # number of frames to extract from raw to preview

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)

# Extraction parameters you may need to configure
config_data['min_height'] = 10 # Min mouse height from floor (mm)
config_data['max_height'] = 100 # Max mouse height from floor (mm)

# Use an expectation-maximization style model to aid mouse tracking. Useful for data with cables
config_data['use_tracking_model'] = False 
config_data['cable_filter_iters'] = 0 # Number of cable filter iterations

with open(config_filepath, 'w') as f:
    yaml.dump(config_data, f, Dumper=yaml.RoundTripDumper)

sample_extract_command(sample_testfile, extract_testdir_out, config_filepath, nframes)

After an extraction, you can expect the following directory structure:

```
.
├── config.yaml
├── MoSeq2-Notebook.ipynb
├── session_1/
│   ├── test_proc/ **
│   │   ├── bground.tiff **
│   │   ├── first_frame.tiff **
│   │   ├── results_00.mp4 **
│   │   ├── results_00.h5 **
│   │   ├── results_00.yaml **
│   │   └── roi.tiff **
│   ├── depth.dat
│   ├── depth_ts.txt
│   └── metadata.json
└── session_2/
```

You can view your sample extraction below:

In [None]:
from IPython.display import display, Video, Image

display(Video(sample_testdir_in+extract_testdir_out+'results_00.mp4'))

If you are happy with your sample extraction, continue to extracting your full dataset. Otherwise, consider adjusting some of your ROI or extraction parameters.

## Extract Session(s)

Run the following cells to generate a shell script that extracts every `depth.dat` file found, and then to execute that script.

In [None]:
from moseq2_batch.gui import extract_batch_command
from pathlib import Path
import os

filename = 'depth.dat' # depth files to recursively search for that have been partially extracted or not yet extracted 

## advanced settings
skip_checks = False # if True, skip checking whether each session directory has already been extracted
partition = 'short' # slurm job partition specification // skip if running on local cluster


commands = extract_batch_command(base_dir, Path(config_filepath), filename, partition, skip_checks)

with open('batch_extract.sh', 'w') as f:
    f.write('#!/bin/bash\n')
    for cmd in commands:
        cmd = cmd.strip(';')
        f.write('%s\n' % cmd)
        print(cmd)

os.system('chmod a+x batch_extract.sh')

In [None]:
%%bash
./batch_extract.sh

The script simply runs an extract command for each of your sessions; it is most useful when the commands are submitted to a Slurm scheduler.
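
If you would rather not shell out to `chmod`, the standard library can write the script and set its execute bits directly. This sketch assumes `commands` is a list of command strings, as returned by `extract_batch_command` in the cell above; `write_batch_script` is a hypothetical helper name.

```python
import stat
from pathlib import Path

def write_batch_script(commands, path="batch_extract.sh"):
    """Write one command per line to a bash script and make it executable."""
    path = Path(path)
    lines = ["#!/bin/bash"] + [cmd.strip().strip(";") for cmd in commands]
    path.write_text("\n".join(lines) + "\n")
    # Equivalent to `chmod a+x`: add execute permission for user, group, and others.
    path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

Usage: `write_batch_script(commands)` replaces both the `open(...)` loop and the `os.system('chmod a+x batch_extract.sh')` call.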

This is what your directory structure should look like once the process is complete:

```
.
├── MoSeq2-Notebook.ipynb
├── config.yaml
├── session_1/
│   ├── ...
│   └── proc/ **
│       ├── roi.tiff
│       ├── ...
│       └── results.h5 **
└── session_2/
    ├── ...
    └── proc/ **
        ├── roi.tiff
        ├── ...
        └── results.h5 **
```

Once extraction is done, aggregate your extraction results. This consolidates each session's unique metadata and timestamp information in one folder and generates your index file.

In [None]:
from moseq2_batch.gui import aggregate_extract_results_command

recording_format = '{start_time}_{session_name}_{subject_name}' # filename formats for the extracted data
aggregate_results_dir = 'aggregate_results/' # directory where all metadata and extracted videos are saved, named using recording_format

aggregate_extract_results_command(base_dir, recording_format, aggregate_results_dir)

This produces a directory structure like the following sample:

```
.
├── aggregate_results/ **
│   ├── session_1_results.h5 **
│   ├── session_1_results.yaml **
│   ├── session_1_results.mp4 **
│   ├── session_2_results.h5 **
│   ├── session_2_results.yaml **
│   └── session_2_results.mp4 **
├── config.yaml
├── moseq2-index.yaml **
├── MoSeq2-Notebook.ipynb
├── session_1/
└── session_2/
```

__Notice your index file has also been generated in your base directory.__
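
The `recording_format` string uses `{field}` placeholders in the style of Python's `str.format`. Assuming each placeholder is filled from the matching field of a session's metadata, file names are built as in the sketch below; `session_basename` is a hypothetical helper and the metadata values are made up for illustration.

```python
def session_basename(recording_format, metadata):
    """Fill '{field}' placeholders from a session's metadata dict (hypothetical helper)."""
    return recording_format.format(**metadata)

example_format = '{start_time}_{session_name}_{subject_name}'
example_metadata = {  # made-up values; extra keys are ignored by str.format
    'start_time': '2020-06-01T10-00-00',
    'session_name': 'wednesday',
    'subject_name': 'mouse1',
    'notes': 'unused',
}
print(session_basename(example_format, example_metadata))  # 2020-06-01T10-00-00_wednesday_mouse1
```

A missing field raises `KeyError`, which is a quick way to spot a session whose metadata lacks a field used in the format string.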

View all of your extracted videos below:

In [None]:
from IPython.display import display, Video

for infile in os.listdir(aggregate_results_dir):
    if infile.endswith('.mp4'):
        print(infile[:-4])
        display(Video(aggregate_results_dir+infile))

***
<center><h1>Principal Component Analysis (PCA)</h1></center>

***

Once the data has been extracted, you can run principal component analysis (PCA) on the extracted data (specifically the HDF5 files) to compute the principal components of the mouse's pose. The ARHMM later uses these to model the mouse's behavior.

The pipeline below depicts the flow of operations to prepare your data for the ARHMM Modeling step.

<img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/PCA_Pipeline.png?token=ACRN4H7IOJUK6RD7AB3NATK5T6EJA">

## Training PCA

The following cell finds your extracted data, trains a PCA model on it, and saves the model to your chosen directory for later use.

__A good example of what you should expect from your PCA Components and Scree plot are shown below:__

<center>Components</center> | <center>Scree Plot</center>
- | - 
<img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/Components_Ex.png?token=ACRN4H7DJOXB3CIJL5DBHOK5T6EKA" width=400 height=400> | <img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/Scree_Ex.png?token=ACRN4H62HBD5WPYABA7AQC25T6ELI" width=400 height=400>

### Configurable Parameter Descriptions
- __GAUSSFILTER_SPACE__: Kernel size of the Gaussian filter applied to your processed mouse video before PCA. This helps produce crisper, more informative principal components.
- __MEDFILTER_SPACE__: Same as above, but uses a median filter instead. (Typically use one or the other.)
- __MISSING_DATA__: Set this to True if your videos have missing or dropped frames.
- __MISSING_DATA_ITERS__: Number of iterations used to fill in missing data during PCA.
- __RECON_PCS__: Number of principal components used to reconstruct missing data.

### Possible PCA Pathologies
- __GAUSSFILTER_SPACE__ & __MEDFILTER_SPACE__: Adjust these when the principal components lack crisp boundaries, or are too similar to one another to be considered reliable.

In [None]:
from moseq2_pca.gui import train_pca_command

pca_filename = 'pca' # Name of your PCA model h5 file to be saved
pca_dirname = '_pca/' # Directory to save your computed PCA results

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)

# PCA parameters you may need to configure
config_data['gaussfilter_space'] = (1.5, 1) # Spatial filter for data (Gaussian)
config_data['medfilter_space'] = [0] # Median spatial filter
config_data['recon_pcs'] = 10 # Number of PCs to use for missing data reconstruction
config_data['missing_data'] = True # Use missing data PCA
config_data['missing_data_iters'] = 10 # Number of times to iterate over missing data during PCA


with open(config_filepath, 'w') as f:
    yaml.dump(config_data, f, Dumper=yaml.RoundTripDumper)

train_pca_command(base_dir, config_filepath, pca_dirname, pca_filename)

Once complete, you can expect your relative directory structure to look something like this:
```
.
├── _pca/ **
│   ├── pca.h5 **
│   ├── pca.yaml **
│   ├── pca_components.png **
│   └── pca_scree.png **
├── aggregate_results/
├── config.yaml
├── moseq2-index.yaml
├── MoSeq2-Notebook.ipynb
├── session_1/
└── session_2/

```

You can now view your computed components and scree plot in the next cell.

In [None]:
from IPython.display import display, Image
images = [pca_dirname+'pca_components.png',pca_dirname+'pca_scree.png']
for im in images:
    display(Image(im))
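
A scree plot shows the fraction of total variance explained by each component. A common heuristic, which is our assumption rather than a MoSeq2 rule, is to keep the smallest number of leading components that together explain a target fraction (here 90%) of the variance; this can inform `npcs` in the ARHMM step. The sketch assumes you already have the per-component variances (eigenvalues) as a list and does not assume any particular key inside `pca.h5`.

```python
def explained_variance(variances):
    """Fraction of the total variance captured by each component (what a scree plot shows)."""
    total = sum(variances)
    return [v / total for v in variances]

def n_components_for(variances, target=0.9):
    """Smallest number of leading components whose cumulative explained variance >= target."""
    cumulative = 0.0
    for k, frac in enumerate(explained_variance(variances), start=1):
        cumulative += frac
        if cumulative >= target:
            return k
    return len(variances)

print(n_components_for([8, 4, 2, 2], target=0.8))  # 3
```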

## Computing Principal Component Scores

Once your PCA model has been trained, apply it to your extracted data to compute principal component (PC) scores for each session. To compute your PC scores, run the following command:

In [None]:
from moseq2_pca.gui import apply_pca_command

scores_filename = 'pca_scores' # name of the scores file to compute and save

apply_pca_command(base_dir, config_filepath, pca_dirname, scores_filename)

Once complete, a `pca_scores.h5` file will be saved in your PCA directory (example shown below):
```
.
├── _pca/
│   ├── pca.h5
│   ├── pca.yaml
│   ├── pca_scores.h5 **
│   ├── pca_components.png
│   └── pca_scree.png
├── aggregate_results/
├── config.yaml
├── moseq2-index.yaml
├── MoSeq2-Notebook.ipynb
├── session_1/
└── session_2/

```
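
Conceptually, a frame's PC scores are its coordinates in the space spanned by the principal components: subtract the mean frame, then take the dot product with each component. The pure-Python sketch below illustrates this on a two-pixel "frame"; `pc_scores` is a hypothetical helper, while moseq2-pca works on full image arrays and also handles filtering and missing data.

```python
def pc_scores(frame, mean, components):
    """Project a flattened frame onto each principal component.

    score_k = sum_i (frame_i - mean_i) * component_k[i]
    """
    centered = [x - m for x, m in zip(frame, mean)]
    return [sum(c * w for c, w in zip(centered, comp)) for comp in components]

print(pc_scores([3, 4], mean=[1, 1], components=[[1, 0], [0, 1]]))  # [2, 3]
```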

## (Optional) Computing Model-Free Syllable Changepoints

This optional step estimates syllable durations without a model; these are rough approximations of how long individual behavioral syllables last. Computing model-free changepoints can help you choose the syllable-duration prior, `kappa`, in the ARHMM modeling step.

__A good example of a Changepoints Distance plot is shown below__
<img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/CP_Ex.png?token=ACRN4HYHGNQVJMBDZQFQ4BK5T6EM4" width=400 height=400>


The cell below measures the distribution of block durations between detected changepoints, using your PCA model or computed scores.

__Warning: These parameters have been set to accommodate C57 mice and mice of similar size. We therefore do not recommend changing the changepoint parameters; if you decide to do so, it is at your own risk.__

### Configurable Parameter Descriptions
- __THRESHOLD__: Peak threshold used to detect changepoints, i.e. the transition points from one syllable to the next.
- __DIMS__: Number of random projections of the principal components used to estimate the distribution of block durations.
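
The `threshold` parameter acts like a peak detector on a changepoint score computed over time. The sketch below shows only that last step, under simplifying assumptions: it takes an already-computed score (judging by `DIMS`, moseq2-pca derives its score from random projections, which is not reproduced here), keeps local maxima above `threshold`, and converts the gaps between them into durations assuming 30 frames per second.

```python
def find_changepoints(score, threshold=0.5):
    """Indices of local maxima in a changepoint score that exceed the threshold."""
    return [t for t in range(1, len(score) - 1)
            if score[t] > threshold
            and score[t] >= score[t - 1]
            and score[t] > score[t + 1]]

def block_durations(changepoints, fps=30.0):
    """Durations (in seconds) of the blocks between consecutive changepoints."""
    return [(b - a) / fps for a, b in zip(changepoints, changepoints[1:])]

score = [0.0, 0.9, 0.1, 0.2, 0.8, 0.3, 0.6, 0.7, 0.1]
cps = find_changepoints(score, threshold=0.5)
print(cps, block_durations(cps))  # [1, 4, 7] [0.1, 0.1]
```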

In [None]:
from moseq2_pca.gui import compute_changepoints_command
import ruamel.yaml as yaml
changepoints_filename = 'changepoints' # name of the changepoints images to generate

with open(config_filepath, 'r') as f:
    config_data = yaml.safe_load(f)

# Changepoint computation parameters you may want to configure
config_data['threshold'] = 0.5 # Peak threshold to use for changepoints
config_data['dims'] = 300 # Number of random projections to use

with open(config_filepath, 'w') as f:
    yaml.dump(config_data, f, Dumper=yaml.RoundTripDumper)

compute_changepoints_command(base_dir, config_filepath, pca_dirname, changepoints_filename)

The changepoints plot will be generated and saved in the pca directory (example below).

```
.
├── _pca/ 
│   ├── pca.h5
│   ├── pca_scores.h5
│   ├── ...
│   └── changepoints_dist.png **
├── aggregate_results/ 
├── config.yaml
├── moseq2-index.yaml
├── MoSeq2-Notebook.ipynb
├── session_1/
└── session_2/
```

View your changepoints distance plot:

In [None]:
from IPython.display import display, Image

display(Image(pca_dirname+changepoints_filename+'_dist.png'))

***
<center><h1>ARHMM Modeling</h1></center>

***

To train your ARHMM (autoregressive hidden Markov model), you will use your computed PC scores as input data and specify whether you are modeling a single experimental group for observational research, or multiple groups (e.g. control vs. experimental) for comparative analysis.

The pipeline below shows the flow of operations in order to train your ARHMM.

<img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/Model_Pipeline.png?token=ACRN4H3AFJJ3MZIMRL3LA7C5T6EOK">

## (Optional) Specify Groups

### What are groups?

MoSeq uses groups in the `moseq2-index.yaml` file to indicate whether your sessions represent a single experimental group or several groups that you would like to compare while modeling and visualizing.

By default, all session recordings share the same group name: `'default'`. If your sessions do not need to be split into different groups for later comparison, you can skip this step.

Otherwise, there are three ways to specify your groups:
1. Specify group by SessionName
2. Specify group by SubjectName
3. Manually edit index file

### View Indexed Sessions
Use this cell to view your sessions' information regarding their SessionNames, SubjectNames, and Groups.

In [None]:
from moseq2_viz.gui import get_groups_command

index_filepath = base_dir+'moseq2-index.yaml'

get_groups_command(index_filepath)

### 1 - Specify Group by Session Name

In [None]:
from moseq2_viz.gui import add_group_by_session

value = 'wednesday' # value of the corresponding key
group = 'group1' # designated group name
exact = False # Must be exact key-value match
lowercase = False # change to lowercase
negative = False # select opposite selection than key-value pair given

add_group_by_session(index_filepath, value, group, exact, lowercase, negative)

### 2 - Specify Group by Subject Name

In [None]:
from moseq2_viz.gui import add_group_by_subject

value = 'mouse1' # value of the corresponding key
group = 'group1' # designated group name
exact = False # Must be exact key-value match
lowercase = False # change to lowercase
negative = False # select opposite selection than key-value pair given

add_group_by_subject(index_filepath, value, group, exact, lowercase, negative)

### 3 - Manually Edit Index File

Navigate to your `moseq2-index.yaml` file from your Jupyter notebook homepage and edit the group names to your desired values.
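
The `exact`, `lowercase`, and `negative` flags used in options 1 and 2 can be understood roughly as in the sketch below. These semantics are our assumption based on the comments in those cells (substring match unless `exact`, optional lowercasing, `negative` inverts the selection); `matches` and `assign_groups` are hypothetical helpers, not moseq2-viz functions.

```python
def matches(name, value, exact=False, lowercase=False, negative=False):
    """Assumed semantics of the exact / lowercase / negative flags."""
    if lowercase:
        name, value = name.lower(), value.lower()
    hit = (name == value) if exact else (value in name)
    return not hit if negative else hit

def assign_groups(names, value, group, **flags):
    """Return {name: group} for every SessionName/SubjectName that matches."""
    return {n: group for n in names if matches(n, value, **flags)}

print(assign_groups(["mouse1_a", "mouse2_b", "Mouse1_c"], "mouse1", "group1", lowercase=True))
# {'mouse1_a': 'group1', 'Mouse1_c': 'group1'}
```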

## Train ARHMM

### Configurable Parameter Descriptions
- __HOLD_OUT__: Boolean for whether to hold out data during training.
- __HOLD_OUT_SEED__: Integer seed used to reproduce the same held-out set across repeated runs.
- __NFOLDS__: Number of folds to hold out during training. (If used, nfolds <= nsessions.)
- __NPCS__: Number of principal components to use, taken in order as shown in the PC Components plot.
- __NUM_ITER__: Number of training iterations over your dataset; we recommend at least 100 to start.
- __MAX_STATES__: Maximum number of states the ARHMM can end up with at the end of training.
- __SEPARATE_TRANS__: Boolean for whether to learn separate transition matrices for different groups. (Must be True if the number of unique groups > 1.)
- __KAPPA__: Prior hyperparameter controlling the expected syllable duration; larger values favor longer syllables. Setting kappa to the total number of frames in your dataset is a good starting point for finding appropriate syllable durations.
- __CHECKPOINT_FREQ__: How often, in iterations, to save model checkpoints. (If -1, no checkpoints are saved.)
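
When `hold_out` is enabled, the data are split into `nfolds` folds, which is why `nfolds` cannot exceed the number of sessions. The sketch below illustrates a reproducible, seed-driven split of sessions in the spirit of `hold_out_seed`; `make_folds` is a hypothetical helper, and moseq2-model's actual fold assignment may differ.

```python
import random

def make_folds(sessions, nfolds, seed=0):
    """Shuffle sessions reproducibly and deal them into nfolds folds (illustrative only)."""
    if not 1 <= nfolds <= len(sessions):
        raise ValueError("nfolds must be between 1 and the number of sessions")
    shuffled = list(sessions)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split
    return [shuffled[i::nfolds] for i in range(nfolds)]

print(make_folds(["s1", "s2", "s3", "s4", "s5"], nfolds=2, seed=0))
```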

### Possible ARHMM Pathologies
- __KAPPA__: If kappa is too low, syllables will be too short; if it is too high, they will be too long.
- __NPCS__: If too few or too many PCs are selected, the ARHMM's predictions become unreliable.
- __NUM_ITER__: Controls how long the model trains; with too few iterations the model may not fit your dataset well.
- __MAX_STATES__: Sets an upper bound on the complexity of the behavior the model can represent. With too few states the model may not capture the actual behavior; with too many, it may overfit the dataset.
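
Why does kappa control syllable length? In a sticky HMM prior, which is where kappa comes from, kappa adds extra prior weight to each state's self-transition, so a larger kappa means a higher self-transition probability `p`. With a constant `p`, the number of frames a state persists is geometrically distributed with mean `1 / (1 - p)`. The sketch below converts `p` into an expected duration, assuming 30 frames per second; it does not compute `p` from kappa, since that also depends on the other priors.

```python
def expected_duration(p_self, fps=30.0):
    """Mean state duration (seconds) for self-transition probability p_self.

    Durations are geometric, so the mean is 1 / (1 - p_self) frames.
    """
    if not 0 <= p_self < 1:
        raise ValueError("p_self must be in [0, 1)")
    return 1.0 / (1.0 - p_self) / fps

print(expected_duration(0.9))  # about 0.33 s (10 frames at 30 fps)
```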

In [None]:
from moseq2_model.gui import learn_model_command
import os
scores_file = pca_dirname+scores_filename+'.h5' # path to input PC scores file to model
model_path = base_dir+'model.p' # path to save trained model
index_filepath = base_dir+'moseq2-index.yaml' # path to your auto-generated (possibly modified) index file

# Advanced modeling parameters
hold_out = False # boolean to hold out data during the training process
hold_out_seed = -1 # integer to standardize the held out folds during training
nfolds = 5 # number of folds to hold out during training (if hold_out==True)
npcs = 10  # number of PCs being used

num_iter = 50 # number of iterations to train model (we recommend at least 100)
max_states = 100 # number of maximum states the ARHMM can end up with
kappa = 100000 # syllable length prior
robust = False # use robust-ARHMM with t-distribution

separate_trans = False # separate group transition graphs; set to True if ngroups > 1

checkpoint_freq = -1 # model saving frequency (in iterations); -1 disables checkpointing

# Leave these two at their default values
gamma = 1e3 # Weight value on syllables with higher number of usages
alpha = 5.7 # Transition probability rate

learn_model_command(scores_file, model_path, config_filepath, index_filepath, hold_out, nfolds,
                    num_iter, max_states, npcs, kappa, gamma, alpha, 
                    separate_trans, robust, checkpoint_freq)

Once training is complete, your model will be saved in your base directory (shown below) and you are ready to use the moseq2-viz module to produce crowd videos and a number of statistical analysis plots.
```
.
├── _pca/ 
├── aggregate_results/ 
├── config.yaml
├── model.p **
├── moseq2-index.yaml
├── MoSeq2-Notebook.ipynb
├── session_1/
└── session_2/
```

***
<center><h1>Visualize Analysis Results</h1></center>

***

Now that you have a trained ARHMM, you can use it to generate informative graphs and videos about the behavioral syllables found, their usage frequency, and their transition probabilities.

The diagram below shows the four operations that the MoSeq2-Viz module currently supports. They can be run in any order at this point in the notebook.

<img src="https://raw.githubusercontent.com/dattalab/moseq2-app/jupyter/media/Viz_Pipeline.png?token=ACRN4H7NCTTVXJ6ULHZLAVK5T6EP2">

## Make Crowd Videos

This tool creates videos of many overlaid clips of mice performing the same syllable, each clip aligned to the moment a red dot appears on the mouse's body. The videos are sorted from the most to the least frequently expressed syllable.
To create the crowd videos, run the following command:

In [None]:
from moseq2_viz.gui import make_crowd_movies_command

crowd_dir = base_dir+'crowd_movies/' # output directory to save all movies in

max_syllables, max_examples = 10, 10 # maximum number of syllables, and examples of each syllable in a video respectively

make_crowd_movies_command(index_filepath, model_path, config_filepath, crowd_dir, max_syllables, max_examples)

Once completed, you can find your crowd movies along with a metadata YAML file in your corresponding crowd directory. The metadata `info.yaml` file will contain model information pertaining to how these crowd videos were produced.
```
.
├── _pca/ 
├── aggregate_results/ 
├── config.yaml
├── crowd_movies/ **
│   ├── info.yaml **
│   ├── syllable_sorted_44 (usage).mp4 **
│   ├── ...
│   └── syllable_sorted_12 (usage).mp4 **
├── model.p 
├── moseq2-index.yaml
├── MoSeq2-Notebook.ipynb
├── session_1/
└── session_2/
```
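
Conceptually, a crowd movie is built by ranking syllables by usage, keeping the top `max_syllables`, and taking up to `max_examples` instances of each. The sketch below shows only that selection step, on a hypothetical list of `(syllable, session, start_frame)` tuples; rendering the overlaid video is not shown, and moseq2-viz's own sampling (e.g. random versus first-found instances) may differ.

```python
from collections import Counter

def pick_crowd_examples(instances, max_syllables=10, max_examples=10):
    """Choose the most-used syllables and up to max_examples instances of each."""
    usage = Counter(syl for syl, _, _ in instances)
    top = [syl for syl, _ in usage.most_common(max_syllables)]
    return {syl: [inst for inst in instances if inst[0] == syl][:max_examples]
            for syl in top}

instances = [(3, "a", 0), (3, "a", 50), (5, "b", 10), (3, "b", 20), (5, "a", 90), (7, "a", 5)]
print(pick_crowd_examples(instances, max_syllables=2, max_examples=2))
```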

View your generated crowd movies below:

In [None]:
from IPython.display import display, Video

for infile in os.listdir(crowd_dir):
    if infile.endswith('.mp4'):
        print(infile[:-4])
        display(Video(crowd_dir+infile))

## Compute Usage Plots

Use this command to plot the usage of each model-detected syllable, sorted in descending order of usage.

In [None]:
from moseq2_viz.gui import plot_usages_command

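sort = True # sort syllables by usage, in descending order
count = 'usage' # how to quantify syllable usage
max_syllable = max_syllables - 1 # highest syllable index to plot (uses max_syllables from the crowd movie cell)
group = '' # group to plot; default if empty str
output_file = 'usages' # name of the png file to be saved

plot_usages_command(index_filepath, model_path, sort, count, max_syllable, group, output_file)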

View Usage Plot:

In [None]:
from IPython.display import display, Image
display(Image('usages.png'))
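
As a sketch of what the `count` parameter may control, assuming the model output is a per-frame list of syllable labels: counting `'usage'` tallies each contiguous run of a label once (one syllable instance), whereas `'frames'` tallies every frame. Both are normalized and sorted in descending order, mirroring `sort = True`. This is an illustration, not moseq2-viz's implementation.

```python
from collections import Counter
from itertools import groupby

def syllable_usage(labels, count="usage"):
    """Per-syllable fraction of instances ('usage') or frames ('frames'), sorted descending."""
    if count == "usage":
        items = [syl for syl, _ in groupby(labels)]  # one entry per contiguous run
    elif count == "frames":
        items = list(labels)
    else:
        raise ValueError("count must be 'usage' or 'frames'")
    totals = Counter(items)
    n = sum(totals.values())
    return sorted(((s, c / n) for s, c in totals.items()), key=lambda kv: kv[1], reverse=True)

print(syllable_usage([1, 1, 1, 2, 2, 1, 3]))  # [(1, 0.5), (2, 0.25), (3, 0.25)]
```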

## Compute Scalar Summary and Tracking Plots

Use the following command to compute scalar summary information about your modeled groups, such as average velocity and height.
This command also generates a tracking summary plot showing the path traveled by the mouse in your recordings.

In [None]:
from moseq2_viz.gui import plot_scalar_summary_command

output_file = 'scalars' # prefix name of the saved scalar position and summary graphs

plot_scalar_summary_command(index_filepath, output_file)

View plots:

In [None]:
from IPython.display import display, Image
display(Image('scalars_summary.png'))
display(Image('scalars_position.png'))
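
Summary scalars such as average velocity are derived from the mouse's centroid position over time. The sketch below computes frame-to-frame speed and total path length from centroid coordinates; it assumes evenly spaced frames at 30 frames per second and is not moseq2-viz's implementation.

```python
import math

def centroid_speeds(xs, ys, fps=30.0):
    """Speed between consecutive frames, in position units per second."""
    return [math.hypot(x1 - x0, y1 - y0) * fps
            for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])]

def path_length(xs, ys):
    """Total distance traveled by the centroid."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]))

print(centroid_speeds([0, 3, 3], [0, 4, 4], fps=1))  # [5.0, 0.0]
```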

## Compute Syllable Transition Graph

Use the following command to generate a syllable transition graph. Its nodes are labeled by syllable and its edges represent probable transitions, with edge thickness reflecting the transition's weight.

For multiple groups, there will be a transition graph for each group, as well as a unified graph with different colors to identify the groups.

In [None]:
from moseq2_viz.gui import plot_transition_graph_command

max_syllable = 40 # Maximum number of nodes in the transition graph
group = '' # Group to graph, default if empty str
output_filename = 'transition' # name of the png file to be saved

plot_transition_graph_command(index_filepath, model_path, config_filepath, max_syllable, group, output_filename)

Plot your syllable transition graph:

In [None]:
from IPython.display import display, Image
display(Image('transition.png'))
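
The transition graph summarizes how often one syllable follows another. A minimal sketch, assuming the model output is a per-frame list of syllable labels: collapse consecutive repeats into syllable instances, count which syllable follows which, and normalize each row into probabilities. Self-transitions disappear because repeats are collapsed; moseq2-viz's own normalization options and edge thresholds may differ.

```python
from collections import Counter, defaultdict
from itertools import groupby

def transition_probs(labels):
    """Row-normalized syllable transition probabilities, ignoring self-transitions."""
    runs = [s for s, _ in groupby(labels)]  # collapse repeated frames into one instance
    counts = defaultdict(Counter)
    for a, b in zip(runs, runs[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}

print(transition_probs([1, 1, 2, 2, 1, 3, 1, 2]))
```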

***
<center><h1>Notebook End</h1></center>

***