<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Project-setup" data-toc-modified-id="Project-setup-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Project setup</a></span><ul class="toc-item"><li><span><a href="#Files-and-Directory-Structure" data-toc-modified-id="Files-and-Directory-Structure-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Files and Directory Structure</a></span></li><li><span><a href="#Load-Progress" data-toc-modified-id="Load-Progress-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Load Progress</a></span></li><li><span><a href="#Getting-Best-Model-Fit" data-toc-modified-id="Getting-Best-Model-Fit-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Getting Best Model Fit</a></span></li><li><span><a href="#[OPTIONAL]-Specify-Paths-to-Specific-Model" data-toc-modified-id="[OPTIONAL]-Specify-Paths-to-Specific-Model-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>[OPTIONAL] Specify Paths to Specific Model</a></span></li></ul></li><li><span><a href="#Compute-Syllable-Statistics" data-toc-modified-id="Compute-Syllable-Statistics-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Compute Syllable Statistics</a></span><ul class="toc-item"><li><span><a href="#Compute-scalar_df" data-toc-modified-id="Compute-scalar_df-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Compute <code>scalar_df</code></a></span></li><li><span><a href="#Export-scalar_df" data-toc-modified-id="Export-scalar_df-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Export <code>scalar_df</code></a></span></li><li><span><a href="#Compute-mean_df" data-toc-modified-id="Compute-mean_df-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Compute <code>mean_df</code></a></span></li><li><span><a href="#Export-mean_df" data-toc-modified-id="Export-mean_df-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Export <code>mean_df</code></a></span></li></ul></li><li><span><a href="#Interactive-Syllable-Labelling-Tool" 
data-toc-modified-id="Interactive-Syllable-Labelling-Tool-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Interactive Syllable Labelling Tool</a></span></li><li><span><a href="#Interactive-Syllable-Statistics-Graphing" data-toc-modified-id="Interactive-Syllable-Statistics-Graphing-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Interactive Syllable Statistics Graphing</a></span></li><li><span><a href="#Compute-Syllable-Transition-Matrices" data-toc-modified-id="Compute-Syllable-Transition-Matrices-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Compute Syllable Transition Matrices</a></span><ul class="toc-item"><li><span><a href="#Export-Transition-Matrices-and-usages" data-toc-modified-id="Export-Transition-Matrices-and-usages-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Export Transition Matrices and usages</a></span></li><li><span><a href="#Interactive-Syllable-Transition-Graph-Tool" data-toc-modified-id="Interactive-Syllable-Transition-Graph-Tool-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Interactive Syllable Transition Graph Tool</a></span></li></ul></li><li><span><a href="#Notebook-End" data-toc-modified-id="Notebook-End-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Notebook End</a></span></li><li><span><a href="#User-Survey" data-toc-modified-id="User-Survey-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>User Survey</a></span></li></ul></div>

# Project setup

## Files and Directory Structure
To run this notebook, you need the following files in your data directory:
- `progress.yaml` (contains all the required MoSeq paths)
- `model.p` (the trained ARHMM used to compute statistics)
- `moseq2-index.yaml` (contains paths to the extracted sessions used to generate syllable crowd movies)
- `config.yaml` (configuration file containing the parameters set throughout the MoSeq pipeline)
- `_pca/` (PCA-related data generated from the PCA section)
- `aggregate_results/` (aggregated session data)

At this stage, the base directory should contain the necessary files above, as shown below:
```
.
└── Data_Directory/
    ├── progress.yaml
    ├── config.yaml
    ├── moseq2-index.yaml
    ├── model_session_path/
    │   └── model.p
    ...
    ├── _pca/
    └── aggregate_results/

```

**Note: this notebook uses `progress.yaml` to keep track of all the necessary paths.** Please ensure you run the [Load Progress cell](#Load-Progress) before running any analysis modules.
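
Before loading progress, it can help to confirm that the layout above is actually in place. The following is a minimal sketch and not part of MoSeq: `find_missing` is a hypothetical helper, and its default names are copied from the file list above. The model file is searched for recursively because it lives in a session subdirectory.

```python
from pathlib import Path

# Names copied from the list above; adjust them if your project uses different ones.
REQUIRED = ['progress.yaml', 'config.yaml', 'moseq2-index.yaml', '_pca', 'aggregate_results']

def find_missing(data_dir, required=REQUIRED, model_name='model.p'):
    """Return the required entries that are absent from data_dir (hypothetical helper)."""
    data_dir = Path(data_dir)
    missing = [name for name in required if not (data_dir / name).exists()]
    # the model lives in a session subdirectory, so search recursively
    if not any(data_dir.rglob(model_name)):
        missing.append(model_name)
    return missing
```

Calling `find_missing('.')` from the data directory returns an empty list when every expected entry is present.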

## Load Progress
- **Run this cell** to load the `progress.yaml` file and verify progress for the result analysis. This cell checks if the `progress_paths` dictionary contains the necessary paths to use the features in the rest of the notebook and displays the content in `progress_paths`.

In [None]:
from os.path import join, dirname, abspath
from moseq2_app.gui.progress import update_progress, restore_progress_vars

progress_filepath = './progress.yaml'
progress_paths = restore_progress_vars(progress_file=progress_filepath)

# necessary paths to check for the analysis pipeline
must_have_paths = ['base_dir', 'config_file', 'index_file', 'train_data_dir', 'pca_dirname', 
                   'scores_filename', 'scores_path', 'changepoints_path', 'model_path']
# keywords that should be in the paths
keywords = [abspath(dirname(progress_filepath)), 'config.yaml', 'moseq2-index.yaml', 'aggregate_results', '_pca/',
           'pca_scores','pca_scores.h5', 'changepoints', 'model.p']
# zip the necessary paths and keywords for checking
must_have_paths = dict(zip(must_have_paths, keywords))

for key, value in must_have_paths.items():
    # a missing key returns None, so fall back to an empty string before the substring check
    if value not in str(progress_paths.get(key) or ''):
        print('Please check and correct the path in', key)
progress_paths
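
The check in this cell boils down to a substring test on each stored path. Here is a self-contained sketch of that logic, using a made-up `example_paths` dictionary in place of the real `progress_paths`. The helper name `find_suspect_paths` and all path values are hypothetical.

```python
def find_suspect_paths(paths, expected):
    """Return the keys whose path is missing or lacks the expected keyword."""
    suspect = []
    for key, keyword in expected.items():
        value = paths.get(key)
        # a missing key or a None value is suspect, as is a path without the keyword
        if not value or keyword not in str(value):
            suspect.append(key)
    return suspect

# made-up values for illustration only
example_paths = {
    'config_file': '/data/project/config.yaml',
    'index_file': '/data/project/index.yaml',
    'model_path': None,
}
expected = {
    'config_file': 'config.yaml',
    'index_file': 'moseq2-index.yaml',
    'model_path': 'model.p',
}

print(find_suspect_paths(example_paths, expected))
```

Any key reported this way points at an entry in `progress.yaml` that is worth correcting before moving on.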

## Getting Best Model Fit
Use this feature to determine whether the trained model has captured median syllable durations that match the principal components' changepoints.

This feature can also return the best model from a list of models found in the `progress_paths['model_session_path']`. Once completed, the function will update the progress file with the returned model.

**Instructions:**
- **Run the following cell** to get the best model fit.

In [None]:
from os.path import join
from moseq2_viz.gui import get_best_fit_model
from moseq2_app.gui.progress import update_progress, restore_progress_vars

progress_paths = restore_progress_vars(progress_filepath)

output_file = join(progress_paths['plot_path'], 'model_vs_pc_changepoints')

best_model_fit = get_best_fit_model(progress_paths, plot_all=True)
progress_paths = update_progress(progress_filepath, 'model_path', best_model_fit['best model - duration'])
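
To make the comparison concrete, the sketch below measures syllable durations as run lengths of identical frame labels and picks the model whose median duration is closest to the median changepoint duration. It is an illustration under stated assumptions, not the code inside `get_best_fit_model`: the helper names, the toy label sequences, and the toy changepoint durations are all hypothetical.

```python
from itertools import groupby
from statistics import median

def syllable_durations(labels):
    """Length, in frames, of each run of identical consecutive labels."""
    return [sum(1 for _ in run) for _, run in groupby(labels)]

def closest_model(model_labels, changepoint_durations):
    """Pick the model whose median syllable duration is nearest the median changepoint duration."""
    target = median(changepoint_durations)
    errors = {name: abs(median(syllable_durations(labels)) - target)
              for name, labels in model_labels.items()}
    return min(errors, key=errors.get), errors

# toy label sequences (one integer per frame) and toy changepoint durations
model_labels = {
    'model_a': [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    'model_b': [0, 1, 1, 0, 2, 2, 1, 0, 0, 2],
}
best, errors = closest_model(model_labels, changepoint_durations=[3, 4, 3, 2, 4])
print(best, errors)
```

A model that switches syllables much faster or slower than the changepoints suggests ends up with a larger error and is not selected.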

## [OPTIONAL] Specify Paths to Specific Model
If you want to use a specific model and/or model session path that differs from the paths stored in `progress_paths`, edit the variables in the cell below to specify the new path(s).

**Instructions:**
- **Run the following cell** to specify the path(s) to a specific model and/or specific model session path if needed. 

In [None]:
model_path = progress_paths['model_path'] # replace this with a desired updated path
model_session_path = progress_paths['model_session_path'] # replace this with a desired updated path

update_progress(progress_filepath, 'model_session_path', model_session_path)
progress_paths = update_progress(progress_filepath, 'model_path', model_path)

# Compute Syllable Statistics

The following cells produce 2 dataframes: `scalar_df` and `mean_df`.
 - `scalar_df` is a vertically stacked dataframe of the scalar values measured during the extraction step, aligned with the model labels and timestamps. Its shape is `(sum_of_session_frames, 31)`. To view all the measured scalars, run `print(scalar_df.columns)`.
   - This dataframe can be used to plot the scalar feature values for any session over time.
 - `mean_df` is a dataframe of the average syllable-scalar values for all the features included in `scalar_df` grouped by the resorted syllable labels, model groups, and uuids.
   - This dataframe will be used to plot mean syllable statistics and perform hypothesis testing.
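
The relationship between the two dataframes can be sketched with a plain pandas group-by. This is only an illustration of the idea, not what `compute_behavioral_statistics` does internally. The toy frame and its column names (`group`, `uuid`, `syllable`, `velocity`) are assumptions made for this example.

```python
import pandas as pd

# toy frame-level data; the column names are assumptions for illustration
toy_scalar_df = pd.DataFrame({
    'group': ['ctrl'] * 4 + ['drug'] * 4,
    'uuid': ['a'] * 4 + ['b'] * 4,
    'syllable': [0, 0, 1, 1, 0, 1, 1, 1],
    'velocity': [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0],
})

# one row per (group, uuid, syllable) holding the mean of each scalar column
toy_mean_df = (toy_scalar_df
               .groupby(['group', 'uuid', 'syllable'], as_index=False)
               .mean(numeric_only=True))
print(toy_mean_df)
```

Each row of the result summarizes all frames in which one session expressed one syllable, which is the granularity needed for per-syllable statistics and hypothesis tests.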

## Compute `scalar_df`
**Instructions:**
- **Run the following cell** to compute `scalar_df`.

In [None]:
from moseq2_viz.util import parse_index
from moseq2_viz.scalars.util import scalars_to_dataframe

_, sorted_index = parse_index(progress_paths['index_file'])
# compute session scalar data
scalar_df = scalars_to_dataframe(sorted_index, model_path=progress_paths['model_path'])

print('The shape of scalar_df', scalar_df.shape)
scalar_df.head()

## Export `scalar_df`
**Instructions:**
- **Set `export` variable to `True`** if you want to export `scalar_df` for further analysis outside of this notebook.
- **Specify the place** you want to save the dataframe in `save_path`. If empty, the file will be saved to `base_dir`.
- **Run the following cell** to save `scalar_df` as a CSV file.

In [None]:
# Save `scalar_df` as a csv file
# set export = True if you want to export scalar_df for further analysis outside of this notebook

from os.path import exists, join

export = False
# Specify the place you want to save the dataframe in `save_path`. If empty, the file will be saved to `base_dir`
save_path = ""
base_dir = progress_paths['base_dir']

if export:
    # Ensure the path exists
    if len(save_path) == 0:
        save_path = base_dir
        print("Dataframe will be saved to", save_path)
    else:
        try:
            assert exists(save_path)
            print("Dataframe will be saved to", save_path)
        except AssertionError:
            save_path = base_dir
            print('This is not a valid path. Dataframe csv will be saved to base_dir')
    scalar_df.to_csv(join(save_path,'scalar_df.csv'), index=False)
    print('Dataframe is saved')
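
The fallback logic in this cell is repeated in the later export cells. If you prefer, it can be factored into one small function. The sketch below is an optional refactoring, not part of MoSeq, and `resolve_save_dir` is a hypothetical name.

```python
from os.path import isdir

def resolve_save_dir(save_path, fallback):
    """Return save_path if it is an existing directory, otherwise the fallback directory."""
    if save_path and isdir(save_path):
        return save_path
    if save_path:
        # a path was given but it does not point to a directory
        print(f'{save_path!r} is not a valid directory; using {fallback!r} instead')
    return fallback
```

With this helper, the export reduces to `scalar_df.to_csv(join(resolve_save_dir(save_path, base_dir), 'scalar_df.csv'), index=False)`.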

## Compute `mean_df`

**Instructions:**
- **Run the following cell** to compute `mean_df`.

In [None]:
from moseq2_viz.model.util import compute_behavioral_statistics
# compute syllable usage and scalar statistics
mean_df = compute_behavioral_statistics(scalar_df, count='usage', groupby=['group', 'uuid'], usage_normalization=True)
print('The shape of mean_df', mean_df.shape)
mean_df.head()
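
The sketch below illustrates what the `count` and `usage_normalization` arguments are taken to mean here. It assumes that `count='usage'` counts each syllable emission once, that the alternative counts every frame, and that normalization divides the counts by their total so they sum to 1. The function `syllable_usage` is a hypothetical stand-in, not the library implementation.

```python
from collections import Counter
from itertools import groupby

def syllable_usage(labels, count='usage', normalize=True):
    """Count syllables per emission ('usage') or per frame, optionally as fractions."""
    if count == 'usage':
        # count each run of identical labels once (one syllable emission)
        counts = Counter(label for label, _ in groupby(labels))
    else:
        # count every frame
        counts = Counter(labels)
    if not normalize:
        return dict(counts)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

labels = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2]
print(syllable_usage(labels, count='usage'))
print(syllable_usage(labels, count='frames'))
```

The two counting modes can rank syllables differently: a long syllable that is emitted rarely takes up many frames but has a low emission count.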

## Export `mean_df`

**Instructions:**
- **Set `export` variable to `True`** if you want to export `mean_df` for further analysis outside of this notebook.
- **Specify the place** you want to save the dataframe in `save_path`. If empty, the file will be saved to `base_dir`.
- **Run the following cell** to save `mean_df` as a CSV file.

In [None]:
# Save `mean_df` as a csv file
# set export = True if you want to export mean_df for further analysis outside of this notebook
from os.path import exists, join
export = False
# Specify the place you want to save the dataframe in `save_path`. If empty, the file will be saved to `base_dir`
save_path = ""
base_dir = progress_paths['base_dir']

if export:
    # Ensure the path exists
    if len(save_path) == 0:
        save_path = base_dir
        print("Dataframe will be saved to", save_path)
    else:
        try:
            assert exists(save_path)
            print("Dataframe will be saved to", save_path)
        except AssertionError:
            save_path = base_dir
            print('This is not a valid path. Dataframe csv will be saved to base_dir')
    mean_df.to_csv(join(save_path,'mean_df.csv'), index=False)
    print('Dataframe is saved')

# Interactive Syllable Labelling Tool
Use this interactive tool to assign behavioral labels and short descriptions to syllables by observing the crowd movies and the Syllable Info table.

**Instructions:**
- **Run the following cell** to launch the Interactive Syllable Labelling Tool.
- **Select a syllable** from the `Syllable` dropdown menu to view the associated crowd movie and syllable info.
- **Adjust the crowd movie playback speed using the `Playback Speed` slider** to better observe the behavior associated with short/fast syllables.
- **Input the syllable behavioral label and short description** in the text fields.
- **Click `Save Setting`** to save the syllable label and description for later analysis.
- **Use `Next` and `Previous`** to navigate between syllables. The current label and description are saved automatically when you use these buttons.

In [None]:
from os.path import join
from moseq2_app.main import label_syllables
from moseq2_app.gui.progress import update_progress

# Path to generate crowd movies in
crowd_dir = join(progress_paths['model_session_path'], 'crowd_movies/')

# Path to file containing Syllable label information
syll_infopath = join(progress_paths['model_session_path'], 'syll_info.yaml')

# convenience file containing reused syllable statistics data
syll_info_df_path = join(progress_paths['model_session_path'], 'syll_df.parquet')

# Select number of syllables based on an explained variance percentage
explained_variance = 99

# To instead label a fixed number of syllables, set max_syllables <= nstates
max_syllables = None

update_progress(progress_filepath, 'crowd_dir', crowd_dir)
update_progress(progress_filepath, 'syll_info', syll_infopath)
progress_paths = update_progress(progress_filepath, 'df_info_path', syll_info_df_path)

label_syllables(progress_paths, max_syllables=max_syllables, n_explained=explained_variance)
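
The `explained_variance` setting selects how many of the most-used syllables to label. A rough sketch of that selection is shown below, assuming that "explained variance" means the percentage of frames covered by the most frequently used syllables. The function `n_syllables_for_coverage` is hypothetical, and the library may differ in details such as counting emissions rather than frames or how the threshold is compared.

```python
from collections import Counter
from itertools import accumulate

def n_syllables_for_coverage(labels, n_explained=99):
    """Smallest number of most-used syllables whose frames cover n_explained percent of all frames."""
    counts = sorted(Counter(labels).values(), reverse=True)
    total = sum(counts)
    for n, covered in enumerate(accumulate(counts), start=1):
        # integer comparison avoids floating-point rounding at the threshold
        if 100 * covered >= n_explained * total:
            return n
    return len(counts)

# toy labels: four syllables covering 50%, 30%, 15% and 5% of frames
toy_labels = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5
print(n_syllables_for_coverage(toy_labels, n_explained=90))
```

Setting `max_syllables` to an integer bypasses this kind of selection and labels a fixed number of syllables instead.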

# Interactive Syllable Statistics Graphing

Use this interactive tool to plot different syllable statistics and their differences in the modeled groups. The dendrogram displayed below the statistics plot represents the hierarchically sorted pairwise distances between the given model's autoregressive matrices representing the syllables.

**Instructions:**
- **Run the following cell** to launch the Interactive Syllable Statistics Tool.
- **Select the parameter(s) from the dropdown menus** to control the graph.
- To plot multiple sessions and subjects, **select multiple sessions from `SessionName` or `SubjectName` while holding down the [Ctrl]/[Command]/[Shift] key**.
- **Hover over the data points** to display syllable info.

In [None]:
from moseq2_app.main import interactive_syllable_stats

max_syllables = None

# If loading parquet files is taking too long, set load_parquet=False
interactive_syllable_stats(progress_paths, max_syllable=max_syllables, load_parquet=True)

# Compute Syllable Transition Matrices

Note that this code block loads the model from `model_path` and does __not__ use the `mean_df` variable.

**Instructions:**
- **Run the following cell** to compute syllable transition matrices within each group.

In [None]:
from moseq2_viz.model.util import parse_model_results, relabel_by_usage
from moseq2_viz.model.trans_graph import get_trans_graph_groups, get_group_trans_mats
from moseq2_viz.model.util import compute_syllable_explained_variance

# load your model
model_path = progress_paths['model_path']
model_data = parse_model_results(model_path)
model_data['labels'] = relabel_by_usage(model_data['labels'], count='usage')[0]
max_syllable = compute_syllable_explained_variance(model_data, n_explained=99)

# select a transition matrix normalization method
normalize = 'bigram' # other options: 'columns', 'rows'

# Get modeled session uuids to compute group-mean transition graph for each group
label_group, uuids = get_trans_graph_groups(model_data)
group = list(set(label_group))
# compute transition matrices and usages for each group
print('Group(s):', group)
trans_mats, usages = get_group_trans_mats(model_data['labels'], label_group, group, max_syllable, normalize=normalize)

import matplotlib.pyplot as plt

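# squeeze=False keeps `ax` two-dimensional, so indexing also works when there is only one group
fig, ax = plt.subplots(1, len(group), figsize=(12, 8), sharex=False, sharey=True, squeeze=False)
ax = ax[0]

for i, g in enumerate(group):
    h = ax[i].imshow(trans_mats[i][:max_syllable,:max_syllable], cmap='magma')
    plt.colorbar(h, ax=ax[i], fraction=0.046, pad=0.04)
    ax[i].set_xlabel('Syllable j')
    ax[i].set_ylabel('Syllable i')
    # label the plot with the normalization actually used
    ax[i].set_title(f'{g}: Transition Probabilities ({normalize})')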
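
The three `normalize` options differ only in what the transition counts are divided by. The sketch below shows one common reading: `'bigram'` divides by the total number of transitions so the whole matrix sums to 1, `'rows'` makes each row sum to 1 (outgoing probabilities), and `'columns'` makes each column sum to 1 (incoming probabilities). It is a simplified stand-in written for this explanation, not the `moseq2_viz` implementation, and `transition_matrix` is a hypothetical name.

```python
import numpy as np

def transition_matrix(labels, n_syllables, normalize='bigram'):
    """Count syllable-to-syllable transitions and normalize them."""
    labels = np.asarray(labels)
    # keep only syllable onsets so that self-transitions are not counted
    onsets = labels[np.insert(np.diff(labels) != 0, 0, True)]
    counts = np.zeros((n_syllables, n_syllables))
    # add 1 for every observed (from, to) pair, including repeated pairs
    np.add.at(counts, (onsets[:-1], onsets[1:]), 1)
    if normalize == 'bigram':
        denom = counts.sum(keepdims=True)
    elif normalize == 'rows':
        denom = counts.sum(axis=1, keepdims=True)
    elif normalize == 'columns':
        denom = counts.sum(axis=0, keepdims=True)
    else:
        raise ValueError(f'unknown normalization: {normalize}')
    # leave zeros where a syllable never occurs instead of dividing by zero
    return np.divide(counts, denom, out=np.zeros_like(counts), where=denom != 0)

toy_labels = [0, 0, 1, 1, 2, 0, 1]
print(transition_matrix(toy_labels, n_syllables=3, normalize='rows'))
```

Row normalization is the natural choice when asking "given syllable i, what comes next?", while bigram normalization keeps information about how often each pair occurs overall.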

## Export Transition Matrices and usages

**Instructions:**
- **Set `export` variable to `True`** if you want to export the transition matrices and syllable usages for further analysis outside of this notebook.
- **Specify the place** you want to save the dataframe in `save_path`. If empty, the file will be saved to `base_dir`.
- **Run the following cell** to save the group transition matrices and syllable usages as CSV files.

In [None]:
import pandas as pd
from os.path import exists, join

# set export = True if you want to export the transition matrices and syllable usages
export = False
base_dir = progress_paths['base_dir']
selected_group = "" #specify group name to be exported here
save_path = ""



if export:
    # Construct data frames
    if selected_group not in group:
        raise ValueError(f'selected_group must be one of {group}')
    group_index = group.index(selected_group)
    group_trans = pd.DataFrame(trans_mats[group_index])
    # usages[group_index] maps each syllable to its usage
    group_usages = pd.DataFrame(list(usages[group_index].items()), columns=['Syllable', 'Usage'])

    # Ensure the path exists
    if len(save_path) == 0:
        save_path = base_dir
        print("Dataframe will be saved to", save_path)
    else:
        try:
            assert exists(save_path)
            print("Dataframe will be saved to", save_path)
        except AssertionError:
            save_path = base_dir
            print('This is not a valid path. Dataframe csv will be saved to base_dir')
    
    group_trans.to_csv(join(save_path,selected_group+'_trans.csv'), index=False)
    group_usages.to_csv(join(save_path,selected_group+'_usage.csv'), index=False)
    print('Dataframe is saved')

## Interactive Syllable Transition Graph Tool
Use this tool to explore the behavioral transitions of your modeled groups, such as bigrams/trigrams or different usage/transition probability given certain thresholds.

**Instructions:**
- **Run the following cell** to launch the Interactive Syllable Transition Graphing Tool.
- **Select the parameter(s)** from the dropdown menus to control the graph.
- **Hover over the edges and nodes** to display the edge colors and syllable info.

**Note: Nodes outside the threshold will be hidden.**

Run this cell to display the entire view in the cell output.

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
from moseq2_app.main import interactive_transition_graph

max_syllables = None

interactive_transition_graph(progress_paths, max_syllables=max_syllables, plot_vertically=True, load_parquet=True)
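
The thresholding behavior described for this tool can be pictured with plain dictionaries. The sketch below is a conceptual illustration only: `threshold_graph`, its parameter names, and the toy values are hypothetical, and the interactive tool exposes its own controls for these thresholds.

```python
def threshold_graph(trans_probs, usages, edge_threshold=0.01, usage_threshold=0.0):
    """Keep nodes whose usage exceeds usage_threshold and edges between kept nodes above edge_threshold."""
    nodes = {s for s, u in usages.items() if u > usage_threshold}
    edges = {(i, j): p for (i, j), p in trans_probs.items()
             if p > edge_threshold and i in nodes and j in nodes}
    return nodes, edges

# toy usages and transition probabilities keyed by (from, to)
toy_usages = {0: 0.6, 1: 0.35, 2: 0.05}
toy_trans = {(0, 1): 0.5, (1, 0): 0.3, (0, 2): 0.15, (2, 0): 0.05}

nodes, edges = threshold_graph(toy_trans, toy_usages, edge_threshold=0.1, usage_threshold=0.1)
print(sorted(nodes), edges)
```

Raising either threshold prunes rare syllables and weak transitions, which makes the dominant behavioral sequences easier to see.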

***

# Notebook End 

# User Survey

Please take some time to tell us your thoughts about this notebook:
**[user feedback survey](https://forms.gle/FbtEN8E382y8jF3p6)**