# Welcome to MoSeq2-Notebook

### Run all of the MoSeq tools in a containerized notebook.

# Setup

To begin, run the following Jupyter notebook to ensure MoSeq2 is installed and running smoothly on your machine.
[SETUP NOTEBOOK](http://localhost:8889/notebooks/MoSeq2_Step_0.ipynb)

Next, copy this notebook file to your recorded session directory. You will create a new copy of this notebook for each analysis session.

You can use the cells below to confirm that this notebook can find your sessions and that the correct Python is being used from your designated conda environment (where your moseq2 tools are installed).

Here is a sample directory structure with your MoSeq2-Notebook:

```
.
├── MoSeq2-Notebook.ipynb
└── sample_session/
    ├── depth.dat
    ├── depth_ts.txt
    └── metadata.json
```

For multiple sessions:

```
.
├── MoSeq2-Notebook.ipynb
├── session_1/
│   ├── depth.dat
│   ├── depth_ts.txt
│   └── metadata.json
└── session_2/
    ├── depth.dat
    ├── depth_ts.txt
    └── metadata.json
```

In [None]:
import os
os.getcwd()
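
The session check described above can also be scripted. The sketch below is an illustration and not part of MoSeq2: `find_sessions` and `REQUIRED_FILES` are hypothetical names, and it assumes each session is a direct subdirectory of the notebook directory holding the three files shown in the sample structure.

```python
from pathlib import Path

# File names taken from the sample directory structure above.
REQUIRED_FILES = ("depth.dat", "depth_ts.txt", "metadata.json")

def find_sessions(base_dir="."):
    """Return {session directory name: [missing file names]} for every
    direct subdirectory of base_dir that contains a depth.dat file."""
    report = {}
    for depth_file in sorted(Path(base_dir).glob("*/depth.dat")):
        session_dir = depth_file.parent
        missing = [name for name in REQUIRED_FILES
                   if not (session_dir / name).exists()]
        report[session_dir.name] = missing
    return report

# Print one line per session found next to this notebook.
for name, missing in find_sessions(".").items():
    print(name, "OK" if not missing else f"missing: {missing}")
```

An empty output means no subdirectory with a `depth.dat` file was found, which usually indicates the notebook is not in the recorded session directory.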

Ensure you are running the Python version located in your corresponding conda environment.

For example, if your anaconda environment is called moseq2, then your output should look like: ```/Users/username/anaconda3/envs/moseq2/bin/python```

In [None]:
%%bash
which python
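
The same check can be made from Python itself. `sys.executable` reports the interpreter running the notebook kernel, which may differ from what `which python` finds in a bash subshell. In this minimal sketch, `python_in_env` is a hypothetical helper and the environment name `moseq2` is only the example used above.

```python
import sys
from pathlib import Path

def python_in_env(env_name, executable=sys.executable):
    """Return True if the interpreter path contains the adjacent
    components 'envs' and env_name, as in .../envs/moseq2/bin/python."""
    parts = Path(executable).parts
    return any(a == "envs" and b == env_name
               for a, b in zip(parts, parts[1:]))

print(sys.executable)            # interpreter used by this kernel
print(python_in_env("moseq2"))   # True only inside an env named moseq2
```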

## Generate Configuration Files

If the files and env are correct, run the cell below to generate a configuration file that will aid in modifying advanced analysis/extraction parameters throughout the MoSeq2 pipeline.

### Extraction/Analysis Configuration File

In [None]:
import os
from moseq2_extract.gui import generate_config_command

base_dir = './' # "./" == directory where this notebook is located
config_filepath = base_dir+'config.yaml'

generate_config_command(config_filepath)

## Download a Flip file
To keep your extraction smooth and invariant to the mouse's orientation, we recommend using a flip classifier to help keep the mouse consistently oriented throughout the extraction.

The flip file indices are as follows:
* [0] - Large mice with fibers.
* [1] - Adult male c57s.
* [2] - Mice with Inscopix cables.

Enter your desired index in the variable assignment below and run the cell.

In [None]:
from moseq2_extract.gui import download_flip_command

selected_index = 1 # flip file index
download_flip_command(base_dir, config_filepath, selected_index)

Once that is done, the flip file will be used automatically in your subsequent extractions.

# Raw Data Extraction Step


## Pre-Extraction Data Quality Testing

Before performing a full extraction on your recordings, complete the steps below to ensure your Regions of Interest (ROIs) are properly found. This will clarify what to expect after a complete extraction of your data.

### ROI Test
The following cell will extract the first frame, ROI, and background ROI for your reference before continuing into the extraction process. This ensures proper data quality going into the analysis steps.

In [None]:
from moseq2_extract.gui import find_roi_command

sample_testdir_in = base_dir+'session_1/' # session directory to perform ROI testing
sample_testfile = sample_testdir_in+'depth.dat' # depth file to perform ROI testing on
sample_testdir_out = sample_testdir_in+'sample_proc/' # directory to save roi extraction results

find_roi_command(sample_testfile, sample_testdir_out, config_filepath)

Run the following cell to display your calculated ROI images.

In [None]:
#Display extracted ROI
from IPython.display import display, Image
for infile in os.listdir(sample_testdir_out):
    if infile[-3:] == 'png':
        print(infile[:-4])
        display(Image(sample_testdir_out+infile))
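
Several cells in this notebook list an output directory and filter by file extension. A slightly more defensive variant is sketched below; `list_outputs` is a hypothetical helper, not a MoSeq2 function. It returns files in a stable sorted order and yields an empty list when the directory does not exist yet, instead of raising an error.

```python
from pathlib import Path

def list_outputs(directory, suffix):
    """Return the sorted files in `directory` whose extension equals
    `suffix` (compared case-insensitively, e.g. '.png' or '.mp4')."""
    folder = Path(directory)
    if not folder.is_dir():
        return []  # nothing produced yet, or the path is wrong
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() == suffix.lower())

# Example: print the names of the ROI images saved by the ROI test.
for path in list_outputs("session_1/sample_proc", ".png"):
    print(path.stem)
```

Each returned path can be passed to `Image(str(path))` or `Video(str(path))` in place of the string concatenation used in the display cells.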

### Sample Test Extraction 
Before extracting all of your data, run the following cell to test your raw data extraction parameters on a small sample. This helps ensure the best data quality going into the PCA step.

In [None]:
from moseq2_extract.gui import sample_extract_command

extract_testdir_out = sample_testdir_in+'test_proc/' # directory to save sample extraction
nframes = 100 # number of frames to extract from raw to preview

sample_extract_command(sample_testfile, extract_testdir_out, config_filepath, nframes)

In [None]:
from IPython.display import display, Video, Image

display(Video(extract_testdir_out+'results_00.mp4'))

## Single Session Extraction
If you only have one session you would like to extract, run the following cell. Otherwise, run the Batch Extraction cell.

In [None]:
from moseq2_extract.gui import extract_command

session_path = 'session_1/' # session folder to extract
input_filepath = session_path+'depth.dat' # specify depth filename to extract
output_dir = 'proc/' # will output to session_path/proc/

extract_command(input_filepath, output_dir, config_filepath)

In [None]:
from IPython.display import Video, Image

Video(session_path+output_dir+'results_00.mp4')

## Multi-Session (Batch) Extraction
Extract multiple recording sessions (directories).

In [None]:
from moseq2_batch.gui import extract_batch_command
from pathlib import Path
import os

filename = 'depth.dat' # depth files to recursively search for that have been partially extracted or not yet extracted 
cluster_type = 'local' # specify whether running on 'local' computer or 'slurm' cluster

## advanced settings
temp_storage = Path('tmp/') # path to temporarily store values
partition = 'short' # slurm job partition specification
prefix = '' # command to run before executing batch extraction
skip_checks = False

commands = extract_batch_command(base_dir, Path(config_filepath), filename, cluster_type, temp_storage,
                  partition, prefix, skip_checks)

with open('batch_extract.sh', 'w') as f:
    f.write('#!/bin/bash\n')
    for cmd in commands:
        cmd = cmd.strip(';')
        f.write('%s\n' % cmd)
        print(cmd)

os.system('chmod a+x batch_extract.sh')
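
The script-writing step above can also be done without shelling out to `chmod`. This is a sketch under stated assumptions: `write_batch_script` is a hypothetical helper, and the `echo` commands are placeholders standing in for the strings returned by `extract_batch_command`.

```python
import os
import stat
import tempfile

def write_batch_script(commands, script_path):
    """Write one command per line to a bash script and mark it executable."""
    with open(script_path, "w") as f:
        f.write("#!/bin/bash\n")
        for cmd in commands:
            # drop surrounding whitespace and trailing semicolons
            f.write(cmd.strip().strip(";") + "\n")
    mode = os.stat(script_path).st_mode
    # add the execute bit for user, group and others (same as chmod a+x)
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return script_path

# Placeholder commands; a temporary directory keeps the demo side-effect free.
demo_commands = ["echo extracting session_1;", "echo extracting session_2;"]
with tempfile.TemporaryDirectory() as tmp:
    script = write_batch_script(demo_commands, os.path.join(tmp, "batch_extract.sh"))
    with open(script) as f:
        print(f.read())
```

Using `os.chmod` avoids depending on a shell and raises a Python exception if the permission change fails.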

Run the following bash command to execute your batch extraction:

In [None]:
%%bash
./batch_extract.sh

Once that is done, aggregate all of your extraction results to consolidate all of your metadata and timestamp data in one folder.

In [None]:
from moseq2_batch.gui import aggregate_extract_results_command

recording_format = '{start_time}_{session_name}_{subject_name}' # filename formats for the extracted data
aggregate_results_dir = 'aggregate_results/' # directory to save all metadata+extracted videos to with above respective name format
mouse_threshold = 0

aggregate_extract_results_command(base_dir, recording_format, aggregate_results_dir, mouse_threshold)
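
The `recording_format` string uses Python's brace-style placeholders. The sketch below shows how such a template expands with `str.format`. The metadata values are hypothetical, and how MoSeq2 derives these fields from each session's `metadata.json` is not shown here.

```python
recording_format = "{start_time}_{session_name}_{subject_name}"

# Hypothetical metadata values for one session, for illustration only.
example_metadata = {
    "start_time": "2020-01-01T12-00-00",
    "session_name": "session_1",
    "subject_name": "mouse01",
}

def format_recording_name(fmt, metadata):
    """Fill the placeholders in `fmt`; raises KeyError if a field is missing."""
    return fmt.format(**metadata)

print(format_recording_name(recording_format, example_metadata))
# 2020-01-01T12-00-00_session_1_mouse01
```

Choosing fields that are unique per recording keeps aggregated files from overwriting one another.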

View your extracted videos by running the following cell:

In [None]:
from IPython.display import display, Video

for infile in os.listdir(aggregate_results_dir):
    if infile[-3:] == 'mp4':
        print(infile[:-4])
        display(Video(aggregate_results_dir+infile))

# Principal Component Analysis (PCA) Step


Once all your data is extracted and saved in your desired proc/ directory, you are now able to perform the PCA step.

## Training

To train your PCA on your extracted data results, run the following command. It will recursively search within your current directory structure for your extracted results_xx.h5 files. (If extraction results were aggregated, then all the loaded files will be from your `aggregate_results/` folder.)

In [None]:
from moseq2_pca.gui import train_pca_command

pca_filename = 'pca' # Name of your PCA model h5 file to be saved
pca_dirname = '_pca/' # Directory to save your computed PCA results

train_pca_command(base_dir, config_filepath, pca_dirname, pca_filename)

Once the training is finished, run the following cell to view the computed components and PCA Scree plot to determine the number of PCs to use in ARHMM modeling.

In [None]:
from IPython.display import display, Image
images = [pca_dirname+'pca_components.png', pca_dirname+'pca_scree.png'] # pca_dirname is set in the training cell above
for im in images:
    display(Image(im))
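
One common way to read a scree plot is to keep the smallest number of components whose cumulative explained variance passes a chosen threshold. The sketch below illustrates that rule only. The 90% threshold and the ratio values are assumptions, and `n_components_for_variance` is a hypothetical helper rather than part of moseq2-pca.

```python
from itertools import accumulate

def n_components_for_variance(explained_variance_ratio, threshold=0.9):
    """Return the smallest number of leading PCs whose cumulative
    explained variance reaches `threshold`."""
    for n, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= threshold:
            return n
    # threshold never reached: fall back to all components
    return len(explained_variance_ratio)

# Hypothetical per-component explained-variance ratios read off a scree plot.
ratios = [0.50, 0.30, 0.15, 0.05]
print(n_components_for_variance(ratios))  # 3
```

The resulting number is a candidate for the `npcs` value used in the modeling step.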

## Computing Principal Component Scores
Once your PCA model has been trained, you can apply it to your extracted data using the computed principal components. To compute your PC scores, run the following command:

In [None]:
from moseq2_pca.gui import apply_pca_command

scores_filename = 'pca_scores' # name of the scores file to compute and save

apply_pca_command(base_dir, config_filepath, pca_dirname, scores_filename)

## (Optional) Computing Model-free Syllable Changepoints
This is an optional step that is used to help determine the kappa parameter to use in the modeling step.

To measure block duration distances between detected syllables using your PCA model or computed scores, you can run the following command:

In [None]:
from moseq2_pca.gui import compute_changepoints_command

changepoints_filename = 'changepoints' # name of the changepoints images to generate

compute_changepoints_command(base_dir, config_filepath, pca_dirname, changepoints_filename)

In [None]:
from IPython.display import display, Image

display(Image(pca_dirname+changepoints_filename+'_dist.png'))

# Train ARHMM (Compute Locally)


Once you have computed your PCA scores, you can use them as the input to train your Auto-Regressive Hidden Markov Model (ARHMM).
If you have multiple groups (for example, a control and an experimental group) whose transitions you would like to model separately within the same model, set `separate_trans = True` in the training cell (the notebook counterpart of the ```--separate-trans``` flag).

## Single Group Training

In [None]:
from moseq2_model.gui import learn_model_command
import os

scores_file = pca_dirname+scores_filename+'.h5' # path to input scores file to model
model_path = './model.p' # path to save trained model
index_file = "" # index file path (not necessary for single group training)
hold_out = False # boolean to hold out data during the training process
hold_out_seed = -1 # integer to standardize the held out folds during training
nfolds = 5 # number of folds to hold out during training (if hold_out==True)
num_iter = 10 # number of iterations to train model
max_states = 50 # number of maximum states the ARHMM can end up with
npcs = 10  # number of PCs being used
kappa = 100000 # total number of frames
gamma = 1e3
alpha = 5.7
separate_trans = False # separate group transition graphs
robust = False # use robust-ARHMM with t-distribution
checkpoint_freq = -1 # model saving frequency (in iterations)


learn_model_command(scores_file, model_path, config_filepath, index_file, hold_out, nfolds,
                    num_iter, max_states, npcs, kappa, gamma, alpha, 
                    separate_trans, robust, checkpoint_freq)
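
The `hold_out`, `hold_out_seed` and `nfolds` settings describe holding out folds of data in a reproducible way. The sketch below illustrates the general idea of a seeded fold split only. It is not moseq2-model's implementation, and `make_folds` and the session ids are hypothetical.

```python
import random

def make_folds(session_ids, nfolds=5, seed=0):
    """Shuffle session ids with a fixed seed and deal them into `nfolds` groups."""
    ids = sorted(session_ids)          # sort first so input order does not matter
    random.Random(seed).shuffle(ids)   # seeded shuffle: same folds on every run
    return [ids[i::nfolds] for i in range(nfolds)]

# Hypothetical session identifiers.
sessions = [f"session_{i}" for i in range(1, 11)]
for k, fold in enumerate(make_folds(sessions, nfolds=5, seed=42)):
    print(k, fold)
```

Fixing the seed means that repeated training runs hold out the same sessions, which keeps held-out results comparable across runs.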

## Multiple Group Training
To model multiple groups separately in your model (e.g. control vs. experimental groups), you must generate an index file that points to all your relevant paths, and enable the separate transition graphs option.

Begin by generating your index file:

### Generate your index file
This file points to all of your extracted data and metadata, and is used for analysis visualization and for assigning sessions to groups.

In [None]:
from moseq2_viz.gui import generate_index_command

index_filepath = 'moseq2-index.yaml' # index file containing all the path/metadata info about the groups + Subjects
filter_tup = ()
all_uuids = False

generate_index_command(base_dir, scores_file, index_filepath, filter_tup, all_uuids)

Next, assign your subjects to groups; you can then view which group each subject is associated with.

### Add your subjects to groups in your index file

To add your subjects to specific groups, simply indicate the correct keys, values and groups that correspond with a session parameter in your index file, and run the cell.

This can be done to add many subjects to one group, or run multiple times to manually add different subjects to certain groups.

Note: the groups can also be manually configured in your moseq2-index.yaml file.

In [None]:
from moseq2_viz.gui import add_group_command

key = 'SubjectName' # Name of index key (metadata variable id)
value = 'dat01' # value of the corresponding key
group = 'group1' # designated group name
exact = False # Must be exact string match
lowercase = False # change to lowercase
negative = False # select opposite selection than key-value pair given

add_group_command(index_filepath, key, value, group, exact, lowercase, negative)
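
The `exact`, `lowercase` and `negative` options can be read as modifiers on a key-value test. The sketch below is one plausible reading of the comments in the cell above, written as plain Python. It is an assumption and not the moseq2-viz implementation, which may match values differently. `matches` is a hypothetical name.

```python
def matches(metadata_value, value, exact=False, lowercase=False, negative=False):
    """Illustrative key-value test: substring match unless `exact`,
    optionally lowercased first, optionally inverted."""
    candidate, target = str(metadata_value), str(value)
    if lowercase:
        candidate, target = candidate.lower(), target.lower()
    hit = (candidate == target) if exact else (target in candidate)
    return (not hit) if negative else hit

print(matches("dat01_day2", "dat01"))               # True: substring match
print(matches("dat01_day2", "dat01", exact=True))   # False: not identical
print(matches("DAT01", "dat01", lowercase=True))    # True: case ignored
print(matches("dat02", "dat01", negative=True))     # True: selects non-matches
```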

To view your current groups and their labeled subjects, run the following cell.

In [None]:
# Implement view groups function
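
Until a view-groups function is implemented, the groups can be summarized by hand once the index file is loaded into a dictionary. The sketch below assumes the index has a top-level `files` list whose entries carry a `group` string and a `metadata` mapping with a `SubjectName` key. That layout is an assumption and should be checked against your own `moseq2-index.yaml`. `subjects_by_group` and the example data are hypothetical.

```python
from collections import defaultdict

def subjects_by_group(index_data):
    """Map each group name to the sorted subject names in a loaded index dict."""
    groups = defaultdict(set)
    for entry in index_data.get("files", []):
        subject = entry.get("metadata", {}).get("SubjectName", "unknown")
        groups[entry.get("group", "default")].add(subject)
    return {group: sorted(names) for group, names in groups.items()}

# Hypothetical index contents. In practice the dictionary would come from
# parsing the YAML index file (for example with PyYAML's yaml.safe_load).
example_index = {
    "files": [
        {"group": "group1", "metadata": {"SubjectName": "dat01"}},
        {"group": "group1", "metadata": {"SubjectName": "dat02"}},
        {"group": "default", "metadata": {"SubjectName": "dat03"}},
    ]
}

for group, subjects in subjects_by_group(example_index).items():
    print(group, subjects)
```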


### Train Multiple Groups
Now you can train your model on multiple groups using your augmented index file.

In [None]:
from moseq2_model.gui import learn_model_command

scores_file = pca_dirname+scores_filename+'.h5' # path to input scores file to model
model_path = base_dir+'model.p' # path to save trained model
index_file = index_filepath # index file generated above, containing your group assignments
hold_out = False # boolean to hold out data during the training process
hold_out_seed = -1 # integer to standardize the held out folds during training
nfolds = 5 # number of folds to hold out during training (if hold_out==True)
num_iter = 10 # number of iterations to train model
max_states = 50 # number of maximum states the ARHMM can end up with
npcs = 10  # number of PCs being used
kappa = 100000 # total number of frames
gamma = 1e3
alpha = 5.7
separate_trans = True # separate group transition graphs
robust = False # use robust-ARHMM with t-distribution
checkpoint_freq = -1 # model saving frequency (in iterations)

learn_model_command(scores_file, model_path, config_filepath, index_file, hold_out, nfolds,
                    num_iter, max_states, npcs, kappa, gamma, alpha, 
                    separate_trans, robust, checkpoint_freq)

# Visualize Results


Now that you have a trained ARHMM, you can use the moseq2-viz module to produce crowd videos and a number of statistical analysis plots.

## Setup
Ensure that you have a `moseq2-index.yaml` file generated using this command:

In [None]:
from moseq2_viz.gui import generate_index_command

index_filepath = base_dir+'moseq2-index.yaml' # index file containing all the path/metadata info about the groups + Subjects
filter_tup = ()
all_uuids = False # include all uuids in the index file?

generate_index_command(base_dir, scores_file, index_filepath, filter_tup, all_uuids)

## Make Crowd Videos
This tool allows you to create videos containing many overlaid clips of the mouse performing the same specified syllable at the moment a red dot appears on its body. The videos are sorted from the most to the least frequently expressed syllable.
To create the crowd videos, run the following command:

In [None]:
from moseq2_viz.gui import make_crowd_movies_command

crowd_dir = base_dir+'crowd_movies/' # output directory to save all movies in

max_syllables, max_examples = 10, 10 # maximum number of syllables, and examples of each syllable in a video respectively

make_crowd_movies_command(index_filepath, model_path, config_filepath, crowd_dir, max_syllables, max_examples)

Run the following cell to view your generated crowd movies.

In [None]:
from IPython.display import display, Video

for infile in os.listdir(crowd_dir):
    if infile[-3:] == 'mp4':
        print(infile[:-4])
        display(Video(crowd_dir+infile))

## Compute Usage Plots
Use this command to compute the usages of the model-detected syllables, sorted in descending order of usage.

In [None]:
from moseq2_viz.gui import plot_usages_command

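sort = True # sort syllables by usage
count = 'usage' # quantity used to measure syllable usage
max_syllable = 10 # maximum number of syllables to plot
group = '' # group to plot, default if empty str
output_file = 'usages' # prefix name of the saved usage plot (usages.png)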

plot_usages_command(index_filepath, model_path, sort, count, max_syllable, group, output_file)

In [None]:
from IPython.display import display, Image
display(Image('usages.png'))
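
To make "usage sorted in descending order" concrete, the sketch below counts how often each syllable is entered in a frame-by-frame label sequence and ranks the syllables by that fraction. The label sequence is hypothetical, and counting onsets rather than frames is an assumption about what "usage" means here. `syllable_usages` is not a moseq2-viz function.

```python
from collections import Counter
from itertools import groupby

def syllable_usages(labels):
    """Count how often each syllable is entered (consecutive repeats count
    once) and return (syllable, fraction) pairs sorted by descending usage."""
    onsets = [label for label, _ in groupby(labels)]  # collapse repeated frames
    counts = Counter(onsets)
    total = sum(counts.values())
    return [(syllable, n / total) for syllable, n in counts.most_common()]

# Hypothetical frame-by-frame syllable labels.
labels = [0, 0, 0, 1, 1, 0, 2, 2, 0, 1]
for syllable, fraction in syllable_usages(labels):
    print(syllable, round(fraction, 3))
```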

## Compute Scalar Summary and Tracking Plots
Use the following command to compute some scalar summary information about your modeled groups, such as average velocity, height, etc.
This command will also generate a tracking summary plot depicting the path traveled by the mouse in your recordings.

In [None]:
from moseq2_viz.gui import plot_scalar_summary_command

output_file = 'scalars' # prefix name of the saved scalar position and summary graphs

plot_scalar_summary_command(index_filepath, output_file)

In [None]:
from IPython.display import display, Image
display(Image('scalars_summary.png'))
display(Image('scalars_position.png'))

## Compute Syllable Transition Graph
Use the following command to generate a syllable transition graph. The graph will be comprised of nodes labelled by syllable, and edges depicting a probable transition, with edge thickness depicting the weight of the transition edge.

For multiple groups, there will be a transition graph for each group, as well as a unified graph with different colors to identify the groups.

In [None]:
from moseq2_viz.gui import plot_transition_graph_command

max_syllable = 40 # Maximum number of nodes in the transition graph
group = '' # Group to graph, default if empty str
output_filename = 'transition' # name of the png file to be saved

plot_transition_graph_command(index_filepath, model_path, config_filepath, max_syllable, group, output_filename)

In [None]:
from IPython.display import display, Image
display(Image('transition.png'))
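
The edge weights in a transition graph come from how often one syllable follows another. The sketch below estimates outgoing transition probabilities from a hypothetical label sequence. It illustrates the concept only: moseq2-viz may normalize transitions differently, and `transition_probabilities` is a hypothetical name.

```python
from collections import Counter
from itertools import groupby

def transition_probabilities(labels):
    """Return {(src, dst): P(dst | src)} for syllable-to-syllable
    transitions, ignoring repeated frames of the same syllable."""
    sequence = [label for label, _ in groupby(labels)]  # collapse repeats
    pair_counts = Counter(zip(sequence, sequence[1:]))
    outgoing = Counter()
    for (src, _), n in pair_counts.items():
        outgoing[src] += n
    return {(src, dst): n / outgoing[src]
            for (src, dst), n in pair_counts.items()}

# Hypothetical frame-by-frame syllable labels.
labels = [0, 0, 1, 1, 0, 2, 2, 0, 1]
for (src, dst), p in sorted(transition_probabilities(labels).items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

In a drawn graph, each `(src, dst)` pair would be an edge, and its probability would set the edge thickness.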