# Analysis

This notebook contains routines for analyzing the output of keypoint-MoSeq, such as
- Annotating each recording with a group label
- Inspection and labeling of syllables
- Comparing syllable statistics across groups



```{note}
For the widgets below to work in a Jupyter notebook, you must launch Jupyter from a terminal in which the `keypoint_moseq` environment is active. You can double-check by running `! which python` in the notebook. The output should be something like `/Users/username/miniconda3/envs/keypoint_moseq/bin/python`.
```
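The same check can be done from Python with `sys.executable`. The helper below is a minimal sketch, not part of keypoint-MoSeq. It assumes a conda-style layout like the example path above (named environments under `.../envs/<env_name>/`). The function name `running_in_env` is hypothetical.

```python
import os
import sys

def running_in_env(executable: str, env_name: str = "keypoint_moseq") -> bool:
    """Return True if `executable` sits inside a conda env folder named `env_name`."""
    parts = os.path.normpath(executable).split(os.sep)
    # Named conda environments live under <conda_root>/envs/<env_name>/
    return any(a == "envs" and b == env_name for a, b in zip(parts, parts[1:]))

print(sys.executable, running_in_env(sys.executable))
```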

## Setup

We assume you have already run keypoint-MoSeq and that the outputs are organized as shown below.
```
.
└── <project_dir>/               ** current working directory
    ├── <model_dir>/             ** model directory
        ├── crowd_movies/        ** [Optional] crowd movies folder
        ├── grid_movies/         ** [Optional] grid movies folder
        ├── trajectory_plots/    ** [Optional] trajectory plots folder
        └── results.h5           ** model results
```
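Before running the cells below, it can help to confirm that the model directory matches this layout. The function below is a hypothetical sketch using only `pathlib`. It treats `results.h5` as required and the three movie/plot folders as optional, following the tree above.

```python
from pathlib import Path

def check_model_dir(project_dir, model_dirname):
    """Report which expected outputs are missing from <project_dir>/<model_dir>."""
    model_dir = Path(project_dir) / model_dirname
    missing_required = [] if (model_dir / "results.h5").is_file() else ["results.h5"]
    optional = ["crowd_movies", "grid_movies", "trajectory_plots"]
    missing_optional = [name for name in optional if not (model_dir / name).is_dir()]
    return missing_required, missing_optional

required, optional = check_model_dir("demo_project", "model_name")
print("missing (required):", required)
print("missing (optional):", optional)
```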

In [None]:
import keypoint_moseq as kpms

project_dir='demo_project' # the full path to the project directory
model_dirname='model_name' # name of the model to analyze

## Assign Groups

The following cell invokes an interactive spreadsheet widget that can be used to annotate each recording with a group label. These labels are important later on for performing group-wise comparisons. The annotations are saved to `[project_dir]/index.yaml`.

- To assign a group label, select one or more rows, enter the group name and click `Set Group Name`
- Click the column headers to sort rows alphabetically, and click the filter icon in each column header to filter rows by name. 
- At any point, use `Update Index File` to save current group assignments.
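Filtering rows by name and then setting a group amounts to a substring-to-group mapping. The sketch below shows that logic in plain Python for deciding labels before using the widget. It does not read or write `index.yaml`. The recording names, the `assign_groups` helper, and the fallback label `"default"` are all hypothetical.

```python
def assign_groups(names, pattern_to_group, default="default"):
    """Map each recording name to the group of the first pattern it contains."""
    assignments = {}
    for name in names:
        # Patterns are checked in insertion order; unmatched names get `default`
        assignments[name] = next(
            (group for pattern, group in pattern_to_group.items() if pattern in name),
            default,
        )
    return assignments

recordings = ["mouseA_ctrl_day1", "mouseB_ctrl_day1", "mouseC_drug_day1"]  # hypothetical names
print(assign_groups(recordings, {"ctrl": "a", "drug": "b"}))
```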

In [None]:
index_file=kpms.interactive_group_setting(project_dir, model_dirname)

## Generate dataframes

First, generate a pandas dataframe called `moseq_df` that contains syllable labels and kinematic information for each frame across all recording sessions.

In [None]:
moseq_df = kpms.compute_moseq_df(project_dir, model_dirname, smooth_heading=True) 

print('Generated moseq_df with shape', moseq_df.shape)
moseq_df.head()

In [None]:
import os
save_dir = os.path.join(project_dir, model_dirname) # directory to save the moseq_df dataframe
moseq_df.to_csv(os.path.join(save_dir, 'moseq_df.csv'), index=False)
print('Saved `moseq_df` dataframe to', save_dir)

Next, generate a dataframe called `stats_df` that contains summary statistics for each syllable in each recording session, such as its usage frequency and the distribution of its kinematic parameters.

In [None]:
stats_df = kpms.compute_stats_df(
    project_dir,
    model_dirname,
    moseq_df, 
    min_frequency=0.005, # threshold frequency for including a syllable in the dataframe
    groupby=['group', 'name'], # column(s) to group the dataframe by
    fps=30)                     # frame rate of the video from which keypoints were inferred

print('Generated stats_df with shape', stats_df.shape)
stats_df
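To make `min_frequency` and `fps` concrete, here is a minimal sketch of per-syllable statistics for a single recording, computed from per-frame labels. It assumes that frequency counts syllable *instances* (runs of consecutive identical frames) rather than frames; keypoint-MoSeq's exact definition may differ. The function `syllable_stats` is hypothetical.

```python
from collections import Counter
from itertools import groupby

def syllable_stats(labels, fps=30, min_frequency=0.005):
    """Frequency and mean duration (seconds) of each syllable in one recording.

    `labels` is a per-frame sequence of syllable IDs.
    """
    # Collapse runs of identical frames: each run is one syllable instance
    runs = [(syl, sum(1 for _ in frames)) for syl, frames in groupby(labels)]
    counts = Counter(syl for syl, _ in runs)
    stats = {}
    for syl, n in counts.items():
        frequency = n / len(runs)  # fraction of all syllable instances
        if frequency < min_frequency:
            continue  # drop rare syllables, mirroring the role of `min_frequency`
        lengths = [length for s, length in runs if s == syl]
        stats[syl] = {
            "frequency": frequency,
            "duration_s": sum(lengths) / len(lengths) / fps,  # frames -> seconds
        }
    return stats

print(syllable_stats([0, 0, 0, 1, 1, 0, 0, 0, 2], fps=30))
```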

In [None]:
import os
save_dir = os.path.join(project_dir, model_dirname)
stats_df.to_csv(os.path.join(save_dir, 'stats_df.csv'), index=False)
print('Saved `stats_df` dataframe to', save_dir)

## Fingerprint plot

Fingerprint plots show the distribution of syllable frequencies and kinematic parameters in each recording. The plot below is saved to `[project_dir]/[model_dirname]/analysis_figures`.

In [None]:
kpms.plot_fingerprint(project_dir, model_dirname, moseq_df, stats_df,
                      n_bins=50,                  # number of bins, i.e. the resolution of each distribution
                      range_type='robust',        # "robust" excludes the top and bottom 1% of values; "full" uses the full range
                      color_bar=False,            # whether to plot a colorbar
                      figsize=(10, 6),            # figure size
                      preprocessor_type='minmax') # data preprocessor for the fingerprint ("minmax", "standard", or "none")
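The parameters above can be illustrated with a small sketch of how one fingerprint row might be built: choose histogram bounds from pooled data (1st to 99th percentile for `"robust"`, min to max for `"full"`), bin each recording into `n_bins` bins, then rescale. This is an assumption-laden illustration, not keypoint-MoSeq's implementation; in particular, which axis the `"minmax"`/`"standard"` preprocessor is applied along is assumed here (each recording's histogram). The helper names and the synthetic `speeds` data are hypothetical.

```python
import statistics

def value_range(values, range_type="robust"):
    """Histogram bounds: 1st-99th percentile ("robust") or min-max ("full")."""
    if range_type == "robust":
        cuts = statistics.quantiles(values, n=100)  # 99 percentile cut points
        return cuts[0], cuts[-1]
    return min(values), max(values)

def histogram(values, lo, hi, n_bins=50):
    """Fraction of in-range values falling in each of `n_bins` equal-width bins."""
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:  # values outside the range are ignored
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def minmax(row):
    """Rescale to [0, 1]."""
    lo, hi = min(row), max(row)
    span = (hi - lo) or 1.0
    return [(x - lo) / span for x in row]

def standardize(row):
    """Zero mean, unit (population) standard deviation."""
    mu, sd = statistics.fmean(row), statistics.pstdev(row)
    return [(x - mu) / (sd or 1.0) for x in row]

# Synthetic speeds for two recordings; bounds come from the pooled data
speeds = {"rec1": [float(i % 10) for i in range(200)], "rec2": [float(i % 7) for i in range(200)]}
lo, hi = value_range([v for vals in speeds.values() for v in vals], "robust")
fingerprint = {name: minmax(histogram(vals, lo, hi, n_bins=5)) for name, vals in speeds.items()}
print(fingerprint)
```

Pooling the bounds across recordings keeps the bins aligned, so rows of the fingerprint are directly comparable.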


## Label syllables

Assign a name and short description to each syllable for downstream interpretation.

### Display trajectory plots

In [None]:
kpms.show_trajectory_gif(project_dir, model_dirname)

### Syllable labeling widget

In [None]:
kpms.label_syllables(project_dir, model_dirname, moseq_df, movie_type='grid') # `movie_type` can be "grid" or "crowd"

## Compare between groups

Test for statistically significant differences between groups. The code below takes two groups (an experimental group and a control group) and a syllable property (e.g. frequency or duration), and tests each syllable for whether the property differs between groups. The results are summarized in a plot that is saved to `[project_dir]/[model_dirname]/analysis_figures`.

When `order` is set to "diff", syllables are sorted by the difference in the chosen statistic between `ctrl_group` and `exp_group`. `groups` specifies which groups are included in the plot; it can be all groups or a subset.
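The two orderings can be sketched in plain Python. This is a minimal illustration under stated assumptions: records are hypothetical dicts with `group`, `syllable`, and a statistic column; `"stat"` sorts by the overall mean of the chosen statistic (for `stat='frequency'` this is overall frequency); and `"diff"` sorts by the *signed* difference `exp - ctrl`, largest first. Whether keypoint-MoSeq uses a signed or absolute difference is not stated here.

```python
from collections import defaultdict
from statistics import mean

def order_syllables(records, stat, ctrl_group, exp_group, order="stat"):
    """Return syllable IDs sorted for plotting.

    `records` holds one dict per (group, recording, syllable) with a `stat` value.
    """
    values = defaultdict(lambda: defaultdict(list))  # syllable -> group -> values
    for r in records:
        values[r["syllable"]][r["group"]].append(r[stat])

    def overall(syl):
        return mean(v for vals in values[syl].values() for v in vals)

    def diff(syl):
        # exp minus ctrl; raises StatisticsError if either group lacks the syllable
        return mean(values[syl][exp_group]) - mean(values[syl][ctrl_group])

    key = {"stat": overall, "diff": diff}[order]
    return sorted(values, key=key, reverse=True)

records = [  # hypothetical per-recording frequencies
    {"group": "a", "syllable": 0, "frequency": 0.5},
    {"group": "b", "syllable": 0, "frequency": 0.4},
    {"group": "a", "syllable": 1, "frequency": 0.1},
    {"group": "b", "syllable": 1, "frequency": 0.3},
    {"group": "a", "syllable": 2, "frequency": 0.15},
    {"group": "b", "syllable": 2, "frequency": 0.15},
]
print(order_syllables(records, "frequency", "a", "b", order="diff"))
```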

In [None]:
kpms.plot_syll_stats_with_sem(
    stats_df, project_dir, model_dirname,
    plot_sig=True,     # whether to mark statistical significance with a star
    thresh=0.05,       # significance threshold
    stat='frequency',  # statistic to be plotted (e.g. 'frequency', 'duration' or 'velocity_px_s_mean')
    order='stat',      # order syllables by overall frequency ("stat") or by degree of difference ("diff")
    ctrl_group='a',    # name of the control group for statistical testing
    exp_group='b',     # name of the experimental group for statistical testing
    figsize=(10, 5),   # figure size
    groups=stats_df['group'].unique(), # groups to be plotted
)
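To illustrate what "testing each syllable for whether the property differs between groups" involves, here is an exact two-sided permutation test on the difference of group means for one syllable. This is a swapped-in technique for illustration only; the test keypoint-MoSeq actually runs may differ, and this sketch applies no correction for testing many syllables at once. The function name and data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_pvalue(ctrl, exp):
    """Exact two-sided permutation test for a difference in group means."""
    pooled = list(ctrl) + list(exp)
    observed = abs(mean(exp) - mean(ctrl))
    n_extreme = n_total = 0
    # Enumerate every way of relabeling len(exp) of the pooled values as "exp"
    for chosen in combinations(range(len(pooled)), len(exp)):
        chosen = set(chosen)
        e = [pooled[i] for i in chosen]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_total += 1
        if abs(mean(e) - mean(c)) >= observed - 1e-12:  # tolerance for float error
            n_extreme += 1
    return n_extreme / n_total

# Hypothetical per-recording frequencies of one syllable
p = permutation_pvalue(ctrl=[0.10, 0.12, 0.11, 0.09], exp=[0.20, 0.22, 0.19, 0.21])
print(p, p < 0.05)
```

Exact enumeration grows combinatorially with group size; for larger groups, a fixed number of random relabelings is the usual substitute.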

### Transition matrices
Plot heatmaps showing the transition frequencies between syllables.

In [None]:
normalize='bigram' # normalization method ("bigram", "rows" or "columns")
trans_mats, usages, groups, syll_include=kpms.generate_transition_matrices(
    project_dir, model_dirname, 
    normalize=normalize,
    min_frequency=0.005)    # minimum syllable frequency to include

kpms.visualize_transition_bigram(project_dir, model_dirname, 
                                 groups, trans_mats, syll_include, 
                                 normalize=normalize)
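The three `normalize` options can be made concrete with a small sketch that builds a transition matrix from one per-frame label sequence. It assumes syllable labels are integers `0..n-1` and that repeated frames are merged before counting, so self-transitions are not counted; both are assumptions, not confirmed details of `generate_transition_matrices`. The function name is hypothetical.

```python
from itertools import groupby

def transition_matrix(labels, n_syllables, normalize="bigram"):
    """Syllable-to-syllable transition matrix from a per-frame label sequence."""
    # Merge repeated frames so each entry counts a switch between syllables
    seq = [syl for syl, _ in groupby(labels)]
    counts = [[0.0] * n_syllables for _ in range(n_syllables)]
    for src, dst in zip(seq, seq[1:]):
        counts[src][dst] += 1
    if normalize == "bigram":  # all entries sum to 1
        total = sum(map(sum, counts)) or 1.0
        return [[c / total for c in row] for row in counts]
    if normalize == "rows":  # P(destination | source): each row sums to 1
        return [[c / (sum(row) or 1.0) for c in row] for row in counts]
    if normalize == "columns":  # P(source | destination): each column sums to 1
        col_sums = [sum(row[j] for row in counts) or 1.0 for j in range(n_syllables)]
        return [[row[j] / col_sums[j] for j in range(n_syllables)] for row in counts]
    raise ValueError(f"unknown normalization: {normalize!r}")

labels = [0, 0, 1, 1, 2, 0, 1]  # hypothetical per-frame syllables
print(transition_matrix(labels, 3, normalize="rows"))
```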

### Syllable Transition Graph
Render transition rates as directed graphs, where each node represents one syllable, and the directional edges represent transitions between syllables. The code below first generates a transition graph for each group, and then generates a difference graph for each pair of groups.

In [None]:
kpms.plot_transition_graph_group(project_dir, model_dirname, 
                                 groups, trans_mats, usages, syll_include, 
                                 layout='circular') # transition graph layout ("circular" or "spring")

In [None]:
kpms.plot_transition_graph_difference(project_dir, model_dirname, 
                                      groups, trans_mats, usages, syll_include, 
                                      layout='circular') # transition graph layout ("circular" or "spring")
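The graph construction can be sketched without any plotting: take an edge for every off-diagonal entry whose magnitude exceeds a threshold, and form one difference matrix per unordered pair of groups. This assumes `trans_mats` is a sequence of square matrices aligned with `groups`, as returned above; the sign convention (second group minus first), the threshold, and all helper names are hypothetical choices for illustration, and no layout (`"circular"` or `"spring"`) is computed here.

```python
from itertools import combinations

def difference_matrix(mat_a, mat_b):
    """Elementwise mat_b - mat_a: positive where a transition is more common in b."""
    return [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(mat_a, mat_b)]

def pairwise_differences(groups, trans_mats):
    """One difference matrix per unordered pair of groups."""
    return {
        (g1, g2): difference_matrix(m1, m2)
        for (g1, m1), (g2, m2) in combinations(zip(groups, trans_mats), 2)
    }

def graph_edges(matrix, threshold=0.0):
    """Directed edges (source, target, weight) whose |weight| exceeds `threshold`."""
    return [
        (i, j, w)
        for i, row in enumerate(matrix)
        for j, w in enumerate(row)
        if i != j and abs(w) > threshold  # skip self-loops
    ]

m_a = [[0.0, 0.5], [0.5, 0.0]]  # hypothetical bigram matrices for two groups
m_b = [[0.0, 0.2], [0.8, 0.0]]
diffs = pairwise_differences(["a", "b"], [m_a, m_b])
print(graph_edges(diffs[("a", "b")], threshold=0.1))
```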