# Welcome to DLC2MoSeq

Model and analyze your DLC-tracked keypoint data using MoSeq's modeling and analysis pipeline.

__Note: This notebook is still in development and has not yet been fully tested on many use cases.__

For now, the DLC2MoSeq functions will remain separate from the MoSeq codebase until they have been debugged properly.

# Prepare DeepLabCut Projects

Assuming that you have already acquired some data, run each project through the DeepLabCut pipeline so that each project has a trained model.

Once your model is trained, use the `analyze_videos()` DLC function to create an h5 file containing the keypoint coordinates for each frame, and the `create_labeled_video()` DLC function to create a labeled video for your reference.

# Get Sessions and Create Index File 

The index file is used to designate and separate the experimental group datasets during ARHMM modeling.

After performing data acquisition, store all of your session folders under a parent directory (shown below) to access them in this notebook. 

```
.
└── Base_Directory/
    ├── session_1/ **
    │   ├── dlc-models/
    │   ├── evaluation-results/
    │   ├── labeled-data/
    │   ├── training-datasets/
    │   ├── videoName_modelName_sessionName_iterNum.h5 # this is the dataset to model
    │   └── videos/
    ...
    └── session_2/ **
        ├── dlc-models/
        ├── evaluation-results/
        ├── labeled-data/
        ├── training-datasets/
        ├── videoName_modelName_sessionName_iterNum.h5 # this is the dataset to model
        └── videos/

```

In [None]:
import os
from dlc_utils.util import generate_index
base_dir = '/'

index_filepath = os.path.join(base_dir, 'moseq2dlc-index.yaml')

generate_index(base_dir, index_filepath)
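
Before generating the index, it can help to confirm that every session folder actually contains an `.h5` file. The following is a minimal sketch of such a check, assuming the directory layout shown above. `find_session_h5_files` is a hypothetical helper written for this example and is not part of `dlc_utils`; it does not reproduce what `generate_index` does internally.

```python
import os
import tempfile
from glob import glob


def find_session_h5_files(base_dir):
    """Map each session folder name to the .h5 files found directly inside it.

    Hypothetical helper for illustration only; not part of dlc_utils.
    """
    sessions = {}
    for entry in sorted(os.listdir(base_dir)):
        session_dir = os.path.join(base_dir, entry)
        if not os.path.isdir(session_dir):
            continue
        h5_files = sorted(glob(os.path.join(session_dir, '*.h5')))
        if h5_files:
            sessions[entry] = h5_files
    return sessions


# Build a throwaway directory tree that mimics the expected layout.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('session_1', 'session_2'):
        os.makedirs(os.path.join(demo_dir, name, 'videos'))
        h5_path = os.path.join(demo_dir, name, 'video_model_' + name + '_1000.h5')
        open(h5_path, 'w').close()  # empty placeholder file
    found = find_session_h5_files(demo_dir)
    found_names = {
        name: [os.path.basename(p) for p in paths]
        for name, paths in found.items()
    }

print(found_names)
```

To run the check on real data, call `find_session_h5_files(base_dir)` and look for session folders that are missing from the returned dictionary.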

# Specify Groups
### What are groups?

MoSeq uses groups in the `moseq2-index.yaml` file to indicate whether your collected sessions represent a single experimental group or several different groups that you would like to compare during modeling and visualization.

The index file requires that all your sessions have a metadata.json file in order to successfully assign each recorded subject or session to a group.

Each cell below displays your current indexing structure once it is run.

# View your Current Group Configuration

In [None]:
from dlc_utils.util import get_groups_command, set_group

get_groups_command(index_filepath)

# Set Groups to Model

Assign a session to a group by passing the session's index together with the desired group name.

In [None]:
# set_group(session_index: int, group_name: str, path_to_index: str)
set_group(1, 'group1', index_filepath)
get_groups_command(index_filepath)
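
Conceptually, setting a group amounts to looking up one session entry by its index and overwriting its group name. The sketch below illustrates this on an in-memory dictionary. The `files`/`path`/`group` layout, the 0-based indexing, and the `assign_group` helper are all assumptions made for this example; the real index file and `set_group` may behave differently.

```python
# Hypothetical in-memory stand-in for the index; the real YAML layout may differ.
index = {
    'files': [
        {'path': 'session_1/recording.h5', 'group': 'default'},
        {'path': 'session_2/recording.h5', 'group': 'default'},
    ]
}


def assign_group(index, session_index, group_name):
    """Return a copy of the index with one session's group replaced.

    Assumes 0-based session indices; raises IndexError for unknown sessions.
    """
    if not 0 <= session_index < len(index['files']):
        raise IndexError(f'no session at index {session_index}')
    files = [dict(entry) for entry in index['files']]  # copy each entry
    files[session_index]['group'] = group_name
    return {**index, 'files': files}


updated = assign_group(index, 1, 'group1')
for i, entry in enumerate(updated['files']):
    print(i, entry['path'], entry['group'])
```

Returning a copy instead of editing in place keeps the original index available for comparison, which is convenient when experimenting with different groupings.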

# Load H5 Files To Analyze

Load your DLC tracking results (the `.h5` file that accompanies the labeled video) from each session's base directory. This cell returns dataframes that contain the coordinates of each labeled body part for the whole video. This is the data you will use to train the ARHMM.

In [None]:
from dlc_utils.preprocess import load_dlc_modeling_data, pack_data

# Load dataset and coordinate key list
data_coords, coords = load_dlc_modeling_data(index_filepath)
train_data = pack_data(index_filepath, data_coords, coords)
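
DLC stores an `x`, `y` and `likelihood` value for every body part in every frame. A model usually needs these as one flat feature row per frame. The sketch below shows one plausible way to pack them in plain Python. The `pack_frames` helper, the likelihood threshold, and the forward fill of low-confidence points are illustrative assumptions; the notebook does not state how `pack_data` treats them.

```python
# Hypothetical tracking output: body part -> list of (x, y, likelihood) per frame.
tracked = {
    'nose': [(10.0, 5.0, 0.99), (11.0, 5.5, 0.98), (12.0, 6.0, 0.20)],
    'tail': [(2.0, 5.0, 0.97), (3.0, 5.2, 0.96), (4.0, 5.4, 0.95)],
}


def pack_frames(tracked, bodyparts, min_likelihood=0.5):
    """Build one [x1, y1, x2, y2, ...] row per frame.

    A coordinate whose likelihood is below `min_likelihood` is replaced by the
    last confident coordinate of the same body part (forward fill). If the very
    first frame is low-confidence, it is kept because nothing precedes it.
    """
    n_frames = len(tracked[bodyparts[0]])
    rows = [[] for _ in range(n_frames)]
    for part in bodyparts:
        last_xy = None
        for frame, (x, y, likelihood) in enumerate(tracked[part]):
            if likelihood >= min_likelihood or last_xy is None:
                last_xy = (x, y)
            rows[frame].extend(last_xy)
    return rows


packed = pack_frames(tracked, ['nose', 'tail'])
for row in packed:
    print(row)
```

In the last frame the nose has a likelihood of 0.20, so its coordinates are carried over from the previous frame while the tail coordinates are kept as tracked.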

# Get Crop Coordinates to Center and Orient Mouse Facing East

Compute a cropped version of your video containing only the mouse, segmented from the background. This is used only to create grid movies of the mouse syllables after modeling. The cropped video is NOT included in the analysis.

In [None]:
from dlc_utils.preprocess import get_crop_rotated

# body part label names (declared in each DLC session config file, under skeleton)
front_pt = 'nose' 
rear_pt = 'tail'
cropped_videos = get_crop_rotated(index_filepath, data_coords, front_pt, rear_pt)
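
The geometry behind "facing east" can be sketched with two keypoints: take the direction from the rear point to the front point, then rotate every keypoint by the negative of that angle so the direction lies along the positive x axis. The example below is a minimal illustration of that idea on a single frame. `align_to_east` and the `left_ear` keypoint are hypothetical, and this is not a description of how `get_crop_rotated` is implemented.

```python
import math


def align_to_east(points, front, rear):
    """Centre keypoints on the front/rear midpoint and rotate so that the
    rear -> front direction points along +x.

    points: dict mapping keypoint name -> (x, y).
    """
    fx, fy = points[front]
    rx, ry = points[rear]
    cx, cy = (fx + rx) / 2, (fy + ry) / 2      # centre of rotation
    heading = math.atan2(fy - ry, fx - rx)     # current body direction
    cos_h, sin_h = math.cos(-heading), math.sin(-heading)
    aligned = {}
    for name, (x, y) in points.items():
        dx, dy = x - cx, y - cy
        aligned[name] = (dx * cos_h - dy * sin_h, dx * sin_h + dy * cos_h)
    return aligned


# Hypothetical single-frame pose.
pose = {'nose': (3.0, 7.0), 'tail': (3.0, 1.0), 'left_ear': (2.0, 6.0)}
aligned = align_to_east(pose, 'nose', 'tail')
for name, (x, y) in aligned.items():
    print(f'{name}: ({x:.2f}, {y:.2f})')
```

After alignment the nose and tail share the same y coordinate, with the nose at the larger x value. Applying the same rotation to the video frame around the same centre would give an oriented crop.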

# Compute Model-Free Changepoints

This optional step helps estimate model-free syllable lengths, which are general approximations of the duration of the respective body language syllables. Computing model-free changepoints can be useful for choosing the prior on syllable duration, denoted `kappa`, in the ARHMM modeling step.

A good changepoint graph should show a smooth bell curve of changepoint durations whose peak sits toward the left (short durations), followed by a longer tail toward long durations.

In [None]:
from dlc_utils.analysis import compute_changepoints
from dlc_utils.viz import plot_changepoints

cps = compute_changepoints(train_data)
fig, ax = plot_changepoints(cps)
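
To make the idea of a model-free changepoint concrete, here is a deliberately simplified sketch: measure how much the keypoints move between consecutive frames, mark local peaks above a threshold as changepoints, and convert the gaps between them into durations. The helper names, the threshold, and the frame rate are assumptions for this example, and the approach is a stand-in rather than the algorithm used by `compute_changepoints`.

```python
def frame_change(frames):
    """Euclidean distance between consecutive frames (each a list of floats)."""
    return [
        sum((b - a) ** 2 for a, b in zip(prev, cur)) ** 0.5
        for prev, cur in zip(frames, frames[1:])
    ]


def find_changepoints(frames, threshold):
    """Frame indices where the change from the previous frame is a local peak
    above `threshold`."""
    change = frame_change(frames)
    found = []
    for i in range(len(change)):
        left = change[i - 1] if i > 0 else 0.0
        right = change[i + 1] if i + 1 < len(change) else 0.0
        if change[i] > threshold and change[i] >= left and change[i] > right:
            found.append(i + 1)  # change[i] is the jump into frame i + 1
    return found


def block_durations(changepoints, fps):
    """Seconds elapsed between consecutive changepoints."""
    return [(b - a) / fps for a, b in zip(changepoints, changepoints[1:])]


# Synthetic one-keypoint trajectory: still, jump, still, jump, still.
frames = [[0.0, 0.0]] * 5 + [[4.0, 0.0]] * 10 + [[4.0, 6.0]] * 5
demo_cps = find_changepoints(frames, threshold=1.0)
demo_durations = block_durations(demo_cps, fps=30)
print(demo_cps)        # [5, 15]
print(demo_durations)
```

A histogram of such durations over a real recording is the kind of distribution the changepoint plot summarizes.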

# Model Labeled Keypoints Using ARHMM

## Train or Load Previously Trained Model

In [None]:
import joblib
from dlc_utils.analysis import model_train_pbb, parse_modeling_results
from dlc_utils.viz import plot_training_lls

model_path = os.path.join(base_dir, 'arhmm.p.gz')
if os.path.exists(model_path):
    # reuse a previously trained model
    model = joblib.load(model_path)
else:
    model = model_train_pbb(train_data,
                            index_filepath,
                            model_type="arhmm",
                            num_procs=1,
                            test_size=0, # hold out any data?
                            iters=100, # usually we do 300-400 iterations for final model fits
                            kappa=None, # stickiness
                            separate_trans=True, # model **multiple** groups separately?
                            empirical_bayes=False)
    joblib.dump(model, model_path, compress=3)

results = parse_modeling_results(index_filepath, model)
    
lls, _ = plot_training_lls(results)
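
The `kappa` (stickiness) argument controls how strongly the model prefers to stay in the same syllable from one frame to the next. In a sticky HDP-HMM, the prior mean self-transition probability of state `j` is `(alpha * beta_j + kappa) / (alpha + kappa)`, and a state with self-transition probability `p` lasts `1 / (1 - p)` frames on average. The sketch below tabulates this relationship. Whether `model_train_pbb` uses exactly this parameterization is an assumption, and `alpha`, `n_states`, and `fps` are hypothetical values chosen only to show the trend.

```python
def prior_self_transition(alpha, beta_j, kappa):
    """Prior mean probability of staying in state j under a sticky HDP-HMM."""
    return (alpha * beta_j + kappa) / (alpha + kappa)


def expected_duration_seconds(p_self, fps):
    """Mean dwell time of a state with self-transition probability p_self."""
    return 1.0 / (1.0 - p_self) / fps


# Hypothetical values chosen only to show the trend.
alpha, n_states, fps = 5.0, 10, 30
beta_j = 1.0 / n_states  # uniform global state weights

kappa_to_duration = {}
for kappa in (0.0, 10.0, 100.0, 1000.0):
    p_self = prior_self_transition(alpha, beta_j, kappa)
    kappa_to_duration[kappa] = expected_duration_seconds(p_self, fps)
    print(f'kappa={kappa:>6}: p_self={p_self:.3f}, '
          f'mean duration={kappa_to_duration[kappa]:.3f} s')
```

Larger `kappa` values give longer expected syllables, which is why the model-free changepoint durations are a useful reference when choosing it.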

# Plot Model vs. Model-free Changepoints

In [None]:
from dlc_utils.viz import plot_model_cp_diff
diff, _ = plot_model_cp_diff(results, cps)
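
Model changepoints are simply the frames at which the inferred syllable label changes, so they can be compared directly with the model-free ones. The sketch below derives them from a label sequence and scores the agreement as the mean distance from each model changepoint to its nearest model-free changepoint. The helper names, the toy sequences, and this particular score are assumptions for illustration; `plot_model_cp_diff` may compare the two differently.

```python
def label_changepoints(labels):
    """Frame indices at which the syllable label differs from the previous frame."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def durations_from_changepoints(changepoints, n_frames, fps):
    """Duration in seconds of each block delimited by the changepoints."""
    edges = [0] + list(changepoints) + [n_frames]
    return [(b - a) / fps for a, b in zip(edges, edges[1:])]


def mean_nearest_offset(cps_a, cps_b):
    """Mean distance (in frames) from each point in cps_a to its nearest in cps_b."""
    return sum(min(abs(a - b) for b in cps_b) for a in cps_a) / len(cps_a)


demo_labels = [3, 3, 3, 7, 7, 1, 1, 1, 1, 3]   # toy per-frame syllable labels
model_cps = label_changepoints(demo_labels)
model_free_cps = [3, 6, 9]                      # hypothetical model-free result

print(model_cps)                                             # [3, 5, 9]
print(durations_from_changepoints(model_cps, len(demo_labels), fps=10))
print(mean_nearest_offset(model_cps, model_free_cps))
```

A small offset suggests that the model segments behavior at roughly the same moments as the model-free estimate.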

# Plot Grid Movies

This function creates grid videos of the mouse performing the same syllable at different timestamps and durations. Select the video index from your `cropped_videos` array to use that video's labels.

In [None]:
from dlc_utils.viz import make_grid_movies

crowd_dir = os.path.join(base_dir, 'crowd_movies/')
# select session index
video_index = 0
labels = results['labels'][video_index]
make_grid_movies(labels, cropped_videos[video_index], output_dir=crowd_dir)
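
A grid movie needs to know where each instance of a syllable starts and ends. Given a per-frame label sequence, those instances are the runs of identical labels. The sketch below extracts them with `itertools.groupby`. `syllable_runs` and `occurrences` are hypothetical helpers for this example and are not claimed to match the internals of `make_grid_movies`.

```python
from itertools import groupby


def syllable_runs(labels):
    """Return (syllable, start, end) for every run of identical labels.

    `end` is exclusive, so labels[start:end] is the run itself.
    """
    runs = []
    start = 0
    for syllable, group in groupby(labels):
        length = sum(1 for _ in group)
        runs.append((syllable, start, start + length))
        start += length
    return runs


def occurrences(labels, syllable, max_examples=None):
    """(start, end) frame ranges in which `syllable` is expressed."""
    found = [(s, e) for syl, s, e in syllable_runs(labels) if syl == syllable]
    return found if max_examples is None else found[:max_examples]


demo_labels = [2, 2, 5, 5, 5, 2, 2, 2, 9, 5]
print(occurrences(demo_labels, 5))                   # [(2, 5), (9, 10)]
print(occurrences(demo_labels, 2, max_examples=1))   # [(0, 2)]
```

Each returned range could then be used to slice the cropped video into one tile of the grid.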

View your generated crowd movies below:

In [None]:
from IPython.display import display, Video
from glob import glob

videos = sorted(glob(os.path.join(crowd_dir, '*.mp4')))
for video_path in videos:
    # print the file name, then embed the video in the notebook
    print(os.path.basename(video_path))
    display(Video(video_path, embed=True))

# Plot Transition Graph

Use the following command to generate a syllable transition graph. The graph is composed of nodes labeled by syllable and edges representing probable transitions, with edge thickness indicating the transition weight.

For multiple groups, there will be a transition graph for each group, as well as a difference-graph with different colors to identify the groups.

In [None]:
from moseq2_viz.model.util import get_transition_matrix, get_syllable_statistics
from moseq2_viz.viz import graph_transition_matrix

labels = results['labels']
groups = [] # one group name per session, in the same order as `labels` (leave empty for a single group)
trans_mats = []
max_syllables, nexamples = 30, 30

if len(groups) > 0:
    # build one transition matrix per unique group
    for group in sorted(set(groups)):
        use_labels = [lbl for lbl, g in zip(labels, groups) if g == group]
        trans_mats.append(get_transition_matrix(use_labels, normalize=True, combine=True, max_syllable=max_syllables))
else:
    trans_mats = [get_transition_matrix(labels, normalize=True, combine=True, max_syllable=max_syllables)]


# named `fig` rather than `plt` to avoid shadowing the usual matplotlib alias
fig, _, _ = graph_transition_matrix(trans_mats, edge_threshold=.0025, anchor=0, usage_threshold=1,
                                    edge_width_scale=.2, edge_color='k')
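
The edge weights in the graph come from a transition matrix: a table counting how often syllable `i` is immediately followed by a different syllable `j`. The sketch below builds such a table from a toy label sequence, normalizes it so that all entries sum to one, and keeps only the edges above a threshold. The helper names and the choice of normalization are assumptions for this example and may differ from what `get_transition_matrix` does.

```python
def transition_counts(labels, n_syllables):
    """Count transitions between different consecutive syllables.

    Labels outside 0..n_syllables-1 (for example negative padding values)
    are ignored.
    """
    counts = [[0] * n_syllables for _ in range(n_syllables)]
    for prev, cur in zip(labels, labels[1:]):
        if prev != cur and 0 <= prev < n_syllables and 0 <= cur < n_syllables:
            counts[prev][cur] += 1
    return counts


def normalize_total(counts):
    """Divide every entry by the total number of counted transitions."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * len(row) for row in counts]
    return [[c / total for c in row] for row in counts]


def edges_above(matrix, edge_threshold):
    """(source, target, weight) for every entry larger than the threshold."""
    return [
        (i, j, w)
        for i, row in enumerate(matrix)
        for j, w in enumerate(row)
        if w > edge_threshold
    ]


demo_labels = [0, 0, 1, 1, 2, 0, 1, 2, 2, 0]
demo_counts = transition_counts(demo_labels, n_syllables=3)
demo_matrix = normalize_total(demo_counts)
print(demo_counts)                        # [[0, 2, 0], [0, 0, 2], [2, 0, 0]]
print(edges_above(demo_matrix, 0.0025))
```

Raising the threshold removes rare transitions and gives a sparser, easier-to-read graph.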

# Plot Syllable Usages

In [None]:
from dlc_utils.viz import plot_usages
plot_usages(index_filepath, labels, max_syllables)
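
A syllable's usage is the fraction of all syllable instances that belong to it. The sketch below computes usages from a toy label sequence by counting each run of identical labels once. Counting by onsets rather than by frames, and the `syllable_usages` helper itself, are assumptions for this example; `plot_usages` may compute the values differently.

```python
from collections import Counter
from itertools import groupby


def syllable_usages(labels, max_syllable=None):
    """Fraction of syllable onsets belonging to each syllable.

    Each run of identical labels counts once. If `max_syllable` is given,
    only syllables in 0..max_syllable-1 are kept before normalizing.
    """
    onsets = [syllable for syllable, _ in groupby(labels)]
    counts = Counter(onsets)
    if max_syllable is not None:
        counts = {s: c for s, c in counts.items() if 0 <= s < max_syllable}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: c / total for s, c in sorted(counts.items())}


demo_labels = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2, 0]
demo_usages = syllable_usages(demo_labels)
for syllable, usage in demo_usages.items():
    print(f'syllable {syllable}: {usage:.2f}')
```

Here syllable 0 starts three of the five runs, so its usage is 0.6 even though it does not occupy 60% of the frames.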