# This notebook describes how to use Keypoint-MoSeq

## Project Setup

### Creating a new project directory with a config.yml file

In [2]:
import keypoint_moseq as kpms

project_dir = 'name_of_project'
config = lambda: kpms.load_config(project_dir)

### Loading a SLEAP file with predictions for a single video

The file should be a .slp or .h5 file.

In [None]:
sleap_file = 'name_of_file'
kpms.setup_project(project_dir, sleap_file = sleap_file)

At this point, the config file should be edited in a text editor such as VS Code, or with the update_config function as shown below. Only one body part needs to be listed in each of anterior_bodyparts and posterior_bodyparts for rotational alignment. Datta suggests excluding the tail from the analysis.

In [None]:
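kpms.update_config(
    project_dir,
    video_dir='path/to/videos/',
    anterior_bodyparts=['nose'],
    # NOTE: 'spine' does not appear in use_bodyparts below; the anterior and
    # posterior body parts are expected to be among use_bodyparts (e.g. 'spine1')
    posterior_bodyparts=['spine'],
    use_bodyparts=['nose', 'RHB', 'LHB', 'CHB', 'spine1', 'spine2', 'BT'])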
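For intuition on why one anterior and one posterior body part are enough for rotational alignment, here is a minimal NumPy sketch. It is not the library's implementation: the function name `egocentric_align` and the three-keypoint frame are hypothetical, and it assumes alignment means centering on the centroid and rotating so the posterior-to-anterior axis points along +x.

```python
import numpy as np

def egocentric_align(frame, anterior_idx, posterior_idx):
    """Center one frame of 2D keypoints and rotate it so the
    posterior->anterior axis points along +x.

    frame: array of shape (n_keypoints, 2).
    """
    centered = frame - frame.mean(axis=0)  # move the centroid to the origin
    heading_vec = centered[anterior_idx] - centered[posterior_idx]
    theta = np.arctan2(heading_vec[1], heading_vec[0])
    # Rotating by -theta undoes the animal's heading.
    c, s = np.cos(-theta), np.sin(-theta)
    rot = np.array([[c, -s], [s, c]])
    return centered @ rot.T

# Hypothetical frame: nose, mid-body, tail base, with the animal facing +y.
frame = np.array([[2.0, 5.0], [2.0, 3.0], [2.0, 1.0]])
aligned = egocentric_align(frame, anterior_idx=0, posterior_idx=2)
```

After alignment the nose lies on the positive x-axis and the tail base on the negative x-axis, regardless of the animal's original heading.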

### Loading keypoint detections from SLEAP

In [None]:
# use forward slashes: in a normal Python string, '\t' would be read as a tab
keypoint_data_path = 'path/to/data'
coordinates, confidence, bodyparts = kpms.load_keypoints(keypoint_data_path, 'sleap')

data, metadata = kpms.format_data(coordinates, confidence, **config())

## Calibration

This step allows us to understand the relationship between keypoint errors and keypoint confidence scores. The resulting regression coefficients (slope and intercept) are used during modeling to set the noise on a per-frame, per-keypoint basis. In addition, the confidence_threshold parameter can be passed to define outlier keypoints for PCA and model initialization.
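To illustrate what "regression coefficients (slope and intercept)" means here, the sketch below fits a straight line to error versus confidence in log-log space. The functional form is an assumption made for illustration, the data are synthetic, and `predicted_error` is a hypothetical helper, not part of keypoint_moseq.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data (made up): confidence scores and the log10 of
# the annotation error in pixels, generated from a known line plus noise.
confidence = rng.uniform(0.05, 1.0, size=200)
true_slope, true_intercept = -1.0, 0.5
log_error = (true_slope * np.log10(confidence) + true_intercept
             + rng.normal(0.0, 0.05, size=200))

# Ordinary least squares in log-log space recovers slope and intercept.
slope, intercept = np.polyfit(np.log10(confidence), log_error, deg=1)

def predicted_error(conf):
    """Expected keypoint error (pixels) for a given confidence score."""
    return 10 ** (slope * np.log10(conf) + intercept)
```

With a negative slope, low-confidence keypoints are assigned a larger expected error, which is how a per-frame, per-keypoint noise level can be derived from confidence alone.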

In [None]:
kpms.noise_calibration(project_dir, coordinates, confidence, **config())

After this code is run, a widget should appear with video frames on the left. If the widget does not appear, there is an issue with the Jupyter Notebook extensions and the code should be opened in JupyterLab instead. Annotate each frame with the correct location of the labeled body part. Left-click to specify the correct location (an X should appear). Use the arrow buttons to annotate additional frames. Make sure to save using the save button in the widget.

## Fit a PCA model to aligned and centered keypoint coordinates

In [None]:
pca = kpms.fit_pca(**data, **config())
kpms.save_pca(pca, project_dir)

kpms.print_dims_to_explain_variance(pca, 0.9)
kpms.plot_scree(pca, project_dir=project_dir)
kpms.plot_pcs(pca, project_dir=project_dir, **config())

If a PCA model has already been fit to the keypoint data, it can be loaded with the following code:

In [None]:
pca = kpms.load_pca(project_dir)

At this point, the latent_dim parameter in the config file should be updated to reflect the PCA model. A good heuristic suggested by Datta is to use the number of dimensions needed to explain 90% of the variance, or 10 dimensions, whichever is lower.

In [None]:
# replace NUMBER with the number of components chosen using the heuristic above
kpms.update_config(project_dir, latent_dim=NUMBER)
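The heuristic for choosing the number of latent dimensions (enough components to explain 90% of the variance, capped at 10) can be sketched in plain NumPy. The helper `choose_latent_dim` and the variance ratios are hypothetical; with a scikit-learn style PCA object, the input would be its `explained_variance_ratio_` attribute, assuming the fitted PCA model exposes it.

```python
import numpy as np

def choose_latent_dim(explained_variance_ratio, target=0.9, max_dim=10):
    """Smallest number of components whose cumulative explained variance
    reaches `target`, capped at `max_dim` and at the number of components."""
    cumulative = np.cumsum(explained_variance_ratio)
    # Index of the first cumulative value >= target, converted to a count.
    n_for_target = int(np.searchsorted(cumulative, target)) + 1
    return min(n_for_target, len(cumulative), max_dim)

# Made-up variance ratios for illustration.
ratios = np.array([0.5, 0.2, 0.12, 0.09, 0.05, 0.04])
latent_dim = choose_latent_dim(ratios)
```

Here the first four components explain 91% of the variance, so the sketch selects 4. A flatter spectrum that needs more than 10 components would be capped at 10.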

## Model Fitting

### Setting kappa

Most users will need to adjust the kappa hyperparameter to achieve the desired distribution of syllable durations. Higher values of kappa lead to longer syllables. You will need to pick two kappas: one for AR-HMM fitting and another for the full model. Kappa should be iteratively updated, refitting the model each time, until the target syllable time-scale is attained. Model fitting can be stopped at any time by interrupting the kernel, then restarted with a new kappa value. The full model will generally require a lower value of kappa to yield the same target syllable durations.

Kappa can be adjusted using kpms.update_hypparams. Start with a small value such as 1e4.
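To judge whether a given kappa reaches the target time-scale, it helps to summarize syllable durations from a sequence of syllable labels. The sketch below is a generic run-length computation in NumPy; the label sequence, the frame rate `fps`, and the helper `syllable_durations` are assumptions for illustration and not part of keypoint_moseq.

```python
import numpy as np

def syllable_durations(labels):
    """Lengths (in frames) of each run of identical consecutive labels."""
    labels = np.asarray(labels)
    # Positions where the label differs from the previous frame.
    change_points = np.flatnonzero(np.diff(labels) != 0) + 1
    boundaries = np.concatenate(([0], change_points, [len(labels)]))
    return np.diff(boundaries)

# Made-up syllable sequence for one recording.
labels = [3, 3, 3, 1, 1, 0, 0, 0, 0, 3, 3]
durations = syllable_durations(labels)

fps = 30  # assumed video frame rate
median_seconds = np.median(durations) / fps
```

If the median duration is shorter than the target, kappa would be raised before refitting; if it is longer, kappa would be lowered.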

## Initialization

In [None]:
model = kpms.init_model(data, pca = pca, **config())

model = kpms.update_hypparams(model, kappa=NUMBER)

## Fitting an AR-HMM

In [None]:
num_ar_iters = 50

model, model_name = kpms.fit_model(
    model, data, metadata, project_dir,
    ar_only=True, num_iters=num_ar_iters)

## Fitting the full model

This code fits a full keypoint-moseq model, using the results of the AR-HMM fitting in the previous step for initialization. You may need to try a few values of kappa at this step.

In [None]:
model, data, metadata, current_iter = kpms.load_checkpoint(
    project_dir, model_name, iteration = num_ar_iters)

model = kpms.update_hypparams(model, kappa=NUMBER)

model = kpms.fit_model(
    model, data, metadata, project_dir, model_name, ar_only=False,
    start_iter=current_iter, num_iters=current_iter+500)[0]

## Sort syllables by frequency

Syllable 0 is the most frequent, syllable 1 the second most frequent, and so on.

This code only reorders the syllables in the checkpoint file, so if it is run after extracting model results or generating visualizations, those steps will need to be rerun.



In [None]:
kpms.reindex_syllables_in_checkpoint(project_dir, model_name);
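The relabeling idea itself (most frequent syllable becomes 0, the next becomes 1, and so on) can be sketched on a plain label array. This toy example only illustrates the mapping; `reindex_by_frequency` is a hypothetical helper, and the library call above operates on the saved checkpoint rather than on a list like this.

```python
import numpy as np

def reindex_by_frequency(labels):
    """Relabel syllables so that 0 is the most frequent, 1 the next, etc.

    Returns the relabeled array and the old->new mapping.
    """
    labels = np.asarray(labels)
    unique, counts = np.unique(labels, return_counts=True)
    # Sort original labels by descending count (stable for ties).
    order = unique[np.argsort(-counts, kind="stable")]
    mapping = {int(old): new for new, old in enumerate(order)}
    relabeled = np.array([mapping[int(x)] for x in labels])
    return relabeled, mapping

# Made-up syllable sequence.
labels = [7, 2, 2, 7, 7, 5]
relabeled, mapping = reindex_by_frequency(labels)
```

Because every downstream result refers to syllables by index, any output produced before the relabeling would use the old indices, which is why those steps need to be rerun.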

## Extract model results

Parse model results and save them to {project_dir}/{model_name}/results.h5

These results can be loaded using kpms.load_results

In [None]:
model, data, metadata, current_iter = kpms.load_checkpoint(project_dir, model_name)

results = kpms.extract_results(model, metadata, project_dir, model_name)

## Save results to csv

In [None]:
kpms.save_results_as_csv(results, project_dir, model_name)

## Apply model to new data

This is useful if you performed a new experiment and would like to maintain an existing set of syllables. The results of the new experiment will be added to the existing results.h5 file.

In [None]:
# load the most recent model checkpoint and pca object
model = kpms.load_checkpoint(project_dir, model_name)[0]
pca = kpms.load_pca(project_dir)

# load new data (e.g. from deeplabcut)
new_data = 'path/to/new/data/' # can be a file, a directory, or a list of files
coordinates, confidences, bodyparts = kpms.load_keypoints(new_data, 'deeplabcut')
data, metadata = kpms.format_data(coordinates, confidences, **config())

# apply saved model to new data
results = kpms.apply_model(model, pca, data, metadata, project_dir, model_name, **config())

# optionally rerun `save_results_as_csv` to export the new results
# kpms.save_results_as_csv(results, project_dir, model_name)

## Visualization


### Trajectory plots

In [None]:
results = kpms.load_results(project_dir, model_name)
kpms.generate_trajectory_plots(coordinates, results, project_dir, model_name, **config())

### Grid movies

In [None]:
kpms.generate_grid_movies(results, project_dir, model_name, coordinates=coordinates, **config())

### Syllable Dendrogram

Plot a dendrogram representing the distance between each syllable's median trajectory.


In [None]:
kpms.plot_similarity_dendrogram(coordinates, results, project_dir, model_name, **config())
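As a rough illustration of "distance between each syllable's median trajectory", the sketch below flattens each trajectory into a vector and computes pairwise cosine distances. The choice of cosine distance, the array layout, and the helper `cosine_distance_matrix` are assumptions for illustration; a dendrogram would then be built by hierarchical clustering on such a matrix.

```python
import numpy as np

def cosine_distance_matrix(trajectories):
    """Pairwise cosine distances between flattened trajectories.

    trajectories: array of shape (n_syllables, n_frames, n_keypoints, 2).
    """
    flat = trajectories.reshape(len(trajectories), -1)
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# Made-up median trajectories for three syllables, two frames, one keypoint.
trajectories = np.zeros((3, 2, 1, 2))
trajectories[0, :, 0, 0] = 1.0   # movement along x
trajectories[1, :, 0, 0] = 2.0   # same direction, larger amplitude
trajectories[2, :, 0, 1] = 1.0   # movement along y

distances = cosine_distance_matrix(trajectories)
```

In this toy case syllables 0 and 1 are at distance 0 (same shape, different scale), while syllable 2 is at distance 1 from both, so a dendrogram would merge 0 and 1 first.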