
# MoSeq2 Interactive Results Exploration

### File Requirements and Organization
To run this notebook, you need the following files in your data directory:
- `progress.yaml` (contains all the required MoSeq paths)
- `model.p` (trained ARHMM to compute statistics from)
- `moseq2-index.yaml` (contains the paths to the extracted sessions used to generate syllable crowd movies)
- `config.yaml` (configuration file that contains configured parameters throughout the MoSeq pipeline)
- `_pca/` (PCA-related data generated from the PCA section)
- `aggregate_results/` (aggregated session data)

### Recommended File Structure
```
.
└── Data_Directory/
    ├── progress.yaml
    ├── config.yaml
    ├── moseq2-index.yaml
    ├── model_session_path/
    │   └── model.p
    ...
    ├── _pca/
    └── aggregate_results/

```

**Note: this notebook uses `progress.yaml` to keep track of all the necessary paths.** Please ensure you run the [Load Progress cell](#Load-Progress) before running any analysis modules.

## Load Progress
Run the following cell to load the `progress.yaml` file. The cell checks that every required path is present and contains its expected keyword, and prints the name of any path that needs correcting.
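
The keyword check described above can be sketched as a small standalone helper. This is a minimal illustration only: `find_suspect_paths` and the example values are hypothetical and are not part of `moseq2_app`.

```python
def find_suspect_paths(paths, expected_keywords):
    """Return the keys whose path is missing or lacks the expected keyword."""
    suspect = []
    for key, keyword in expected_keywords.items():
        path = paths.get(key)
        # a key that was never set (None) is treated the same as a wrong path
        if path is None or keyword not in path:
            suspect.append(key)
    return suspect


# hypothetical example values, not read from a real progress.yaml
example_paths = {
    'config_file': '/data/config.yaml',
    'index_file': '/data/index.yml',   # wrong file name
    'model_path': None,                # never set
}
example_keywords = {
    'config_file': 'config.yaml',
    'index_file': 'moseq2-index.yaml',
    'model_path': 'model.p',
}
print(find_suspect_paths(example_paths, example_keywords))
```

Collecting the suspect keys in a list, rather than printing inside the loop, makes it easy to stop early when anything is missing.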

In [None]:
from os.path import join, dirname
from moseq2_app.gui.progress import update_progress, restore_progress_vars

progress_filepath = './progress.yaml'
progress_paths = restore_progress_vars(progress_file=progress_filepath)

# necessary paths to check for the analysis pipeline
must_have_paths = ['base_dir', 'config_file', 'index_file', 'train_data_dir', 'pca_dirname', 
                   'scores_filename', 'scores_path', 'changepoints_path', 'model_path']
# keywords that should be in the paths
keywords = [dirname(progress_filepath), 'config.yaml', 'moseq2-index.yaml', 'aggregate_results', '_pca/',
           'pca_scores','pca_scores.h5', 'changepoints', 'model.p']
# zip the necessary paths and keywords for checking
must_have_paths = dict(zip(must_have_paths, keywords))

for key, value in must_have_paths.items():
    path = progress_paths.get(key)
    # a missing entry (None) or a path without the expected keyword needs attention
    if path is None or value not in path:
        print('Please check and correct the path in', key)
progress_paths

## Get Best Model Fit

Use this feature to determine whether the trained model has captured median syllable durations that match the principal components' changepoints.

This feature can also return the best model from a list of models found in the `progress_paths['model_session_path']`. Once completed, the function will update the progress file with the returned model.

Run this cell to find the best model fit.
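
To make the duration comparison concrete, the sketch below run-length encodes frame-wise syllable labels into durations and picks the model whose median duration is closest to the median changepoint duration. It is an illustration under stated assumptions, not the implementation of `get_best_fit_model`; the function names and toy data are hypothetical.

```python
from itertools import groupby
from statistics import median


def syllable_durations(labels):
    """Run-length encode a frame-wise label sequence into durations (in frames)."""
    return [sum(1 for _ in run) for _, run in groupby(labels)]


def closest_model(model_labels, changepoint_durations):
    """Return the model whose median syllable duration is closest
    to the median changepoint duration."""
    target = median(changepoint_durations)
    return min(
        model_labels,
        key=lambda name: abs(median(syllable_durations(model_labels[name])) - target),
    )


# hypothetical toy data: frame-wise labels for two models, block durations in frames
toy_models = {
    'model_a': [0, 0, 0, 1, 1, 1, 1, 2, 2, 2],   # durations 3, 4, 3
    'model_b': [0, 1, 1, 0, 2, 2, 1, 0, 0, 1],   # durations 1, 2, 1, 2, 1, 2, 1
}
toy_changepoints = [3, 4, 3, 5]
print(closest_model(toy_models, toy_changepoints))
```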

In [None]:
from os.path import join
from moseq2_viz.gui import get_best_fit_model
from moseq2_app.gui.progress import update_progress, restore_progress_vars

progress_paths = restore_progress_vars(progress_filepath)

output_file = join(progress_paths['plot_path'], 'model_vs_pc_changepoints')

best_model_fit = get_best_fit_model(progress_paths, plot_all=True)
progress_paths = update_progress(progress_filepath, 'model_path', best_model_fit['best model - duration'])

## (Optional) Set Paths to Specific Model
To use a different model from the one in `progress_paths['model_path']`, or a different model session path from the one in `progress_paths['model_session_path']`, run the following cell to set both paths.

In [None]:
model_path = progress_paths['model_path'] # replace this with a desired updated path
model_session_path = progress_paths['model_session_path'] # replace this with a desired updated path

update_progress(progress_filepath, 'model_session_path', model_session_path)
progress_paths = update_progress(progress_filepath, 'model_path', model_path)

## Label Syllables

Use this interactive tool to assign behavioral labels and short descriptions to syllables by observing the crowd movies and the Syllable Info table.

Instructions
- Run the following cell to launch the interactive Syllable Labelling Tool.
- Select a syllable from the `Syllable` dropdown menu to view the associated crowd movie and syllable info.
- Use the `Playback Speed` slider to adjust the crowd movie playback speed to better observe the behavior associated with short/fast syllables.
- Enter the syllable label in the `Syllable Name` field and the desired description in `Short Description`.
- Click `Save Setting` to save the syllable label and description for later analysis.
- Use `Next` and `Previous` to navigate between syllables. The label and description are saved automatically when you use these buttons.

In [None]:
from os.path import join
from moseq2_app.main import label_syllables
from moseq2_app.gui.progress import update_progress

# Path to generate crowd movies in
crowd_dir = join(progress_paths['model_session_path'], 'crowd_movies/')

# Path to file containing Syllable label information
syll_infopath = join(progress_paths['model_session_path'], 'syll_info.yaml')

# convenience file containing reused syllable statistics data
syll_info_df_path = join(progress_paths['model_session_path'], 'syll_df.parquet')

# Select number of syllables based on an explained variance percentage
explained_variance = 99

# To instead label a fixed number of syllables, set max_syllables <= nstates
max_syllables = None

update_progress(progress_filepath, 'crowd_dir', crowd_dir)
update_progress(progress_filepath, 'syll_info', syll_infopath)
progress_paths = update_progress(progress_filepath, 'df_info_path', syll_info_df_path)

label_syllables(progress_paths, max_syllables=max_syllables, n_explained=explained_variance)

## Interactive Syllable Statistics Graphing

Use this interactive tool to plot different syllable statistics and their differences in the modeled groups. The dendrogram displayed below the statistics plot represents the hierarchically sorted pairwise distances between the given model's autoregressive matrices representing the syllables.

Usage
- Run the following cell to launch the Interactive Syllable Statistics Tool.
- Select the parameters from the dropdown menus to control the graph.
    - If you select `Difference` from the `Sorting` dropdown menu, the syllables are sorted by the difference in value between two groups, and additional menus appear for testing whether the differences between groups are statistically significant.
    - If you select `group` from `Grouping`, the mean of all the sessions within each group is plotted.
    - If you select `SessionName` or `SubjectName`, you can select multiple sessions/subjects in the `Sessions` menu by holding down the CTRL/COMMAND key. You can click on the legend items to selectively hide the corresponding data points.
    - If you have labeled the syllables, you can specify the syllables you want to plot in the `Syllable to Display` field, such as "run" or "walk". The text input is not case-sensitive.
- Select a threshold criterion from the "Threshold By" dropdown menu. Use the Thresholding Slider to include syllables with statistics within a specific value range.
- Hover over the circle data points to display a pop-up window with additional syllable metadata.
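
As a rough illustration of the `Difference` sorting option, the sketch below orders syllables by the signed difference of a statistic between two groups. Whether the tool sorts by signed or absolute difference is not stated here, so treat the sign convention, the function name, and the toy values as assumptions.

```python
def sort_by_group_difference(stat_by_group, group_a, group_b):
    """Sort syllables by (group_a - group_b), largest positive difference first."""
    a, b = stat_by_group[group_a], stat_by_group[group_b]
    # only syllables present in both groups can be compared
    shared = sorted(set(a) & set(b))
    return sorted(shared, key=lambda syll: a[syll] - b[syll], reverse=True)


# hypothetical mean usages per syllable for two groups
toy_usage = {
    'control': {0: 0.20, 1: 0.30, 2: 0.50},
    'treated': {0: 0.45, 1: 0.35, 2: 0.20},
}
print(sort_by_group_difference(toy_usage, 'control', 'treated'))
```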

In [None]:
from moseq2_app.main import interactive_syllable_stats

max_syllables = None

# If loading parquet files is taking too long, set load_parquet=False
interactive_syllable_stats(progress_paths, max_syllable=max_syllables, load_parquet=True)

## Compute Syllable Transition Matrices

Run the following cell to compute syllable transition matrices within each group. Note this code block is loading the model from the `model_path`, and is __not__ using the `mean_df` variable.
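
The three normalization options used in the cell below (`'bigram'`, `'rows'`, `'columns'`) can be illustrated with a plain-Python sketch. It assumes that `'bigram'` divides by the total transition count, `'rows'` by each row sum, and `'columns'` by each column sum. It is not the `moseq2_viz` implementation, and `transition_matrix` is a hypothetical helper.

```python
def transition_matrix(labels, n_syllables, normalize='bigram'):
    """Count syllable-to-syllable transitions and normalize the counts."""
    counts = [[0.0] * n_syllables for _ in range(n_syllables)]
    for src, dst in zip(labels[:-1], labels[1:]):
        # skip repeated frames of the same syllable and out-of-range labels
        if src != dst and 0 <= src < n_syllables and 0 <= dst < n_syllables:
            counts[src][dst] += 1

    if normalize == 'bigram':
        # every entry is a fraction of all observed transitions
        total = sum(map(sum, counts)) or 1.0
        return [[c / total for c in row] for row in counts]
    if normalize == 'rows':
        # each row sums to 1: probability of the next syllable given the current one
        return [[c / (sum(row) or 1.0) for c in row] for row in counts]
    if normalize == 'columns':
        # each column sums to 1: probability of the previous syllable given the next one
        col_sums = [sum(col) or 1.0 for col in zip(*counts)]
        return [[c / s for c, s in zip(row, col_sums)] for row in counts]
    raise ValueError(f'unknown normalization: {normalize}')


toy_labels = [0, 0, 1, 1, 2, 0, 1]   # transitions: 0->1, 1->2, 2->0, 0->1
bigram = transition_matrix(toy_labels, 3, normalize='bigram')
rows = transition_matrix(toy_labels, 3, normalize='rows')
print(bigram[0][1], rows[0][1])
```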

In [None]:
from moseq2_viz.model.util import parse_model_results, relabel_by_usage
from moseq2_viz.model.trans_graph import get_trans_graph_groups, get_group_trans_mats
from moseq2_viz.model.util import compute_syllable_explained_variance

# load your model
model_path = progress_paths['model_path']
model_data = parse_model_results(model_path)
model_data['labels'] = relabel_by_usage(model_data['labels'], count='usage')[0]
max_syllable = compute_syllable_explained_variance(model_data, n_explained=99)

# select a transition matrix normalization method
normalize = 'bigram' # other options: 'columns', 'rows'

# Get modeled session uuids to compute group-mean transition graph for each group
label_group, uuids = get_trans_graph_groups(model_data)
group = list(set(label_group))
# compute transition matrices and usages for each group
print('Group(s):', group)
trans_mats, usages = get_group_trans_mats(model_data['labels'], label_group, group, max_syllable, normalize=normalize)

import matplotlib.pyplot as plt

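# squeeze=False keeps ax two-dimensional, so indexing also works with a single group
fig, ax = plt.subplots(1, len(group), figsize=(12, 8), sharex=False, sharey=True, squeeze=False)

for i, g in enumerate(group):
    h = ax[0, i].imshow(trans_mats[i][:max_syllable,:max_syllable], cmap='magma')
    plt.colorbar(h, ax=ax[0, i], fraction=0.046, pad=0.04)
    ax[0, i].set_xlabel('Syllable j')
    ax[0, i].set_ylabel('Syllable i')
    # label the plot with the normalization method selected above
    ax[0, i].set_title(f'{g}: {normalize.capitalize()} Transition Probabilities')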

## Export Transition Matrices and usages (Optional)

In [None]:
import pandas as pd
from os.path import exists, join

# set export = True if you want to export the transition matrices and syllable usages
export = True
base_dir = progress_paths['base_dir']
selected_group = "" #specify group name to be exported here
save_path = ""

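# fail with a readable message if the group name is empty or misspelled
if selected_group not in group:
    raise ValueError(f'selected_group must be one of {group}')
group_index = group.index(selected_group)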

group_trans = pd.DataFrame(trans_mats[group_index])
# usages[group_index] maps syllable -> usage; convert it to a two-column dataframe
group_usages = pd.DataFrame(list(usages[group_index].items()), columns=['Syllable', 'Usage'])

if export:
    # fall back to base_dir if save_path is empty or does not exist
    if save_path and not exists(save_path):
        save_path = base_dir
        print('This is not a valid path. Dataframe csv will be saved to base_dir')
    else:
        save_path = save_path or base_dir
        print("Dataframe will be saved to", save_path)

    group_trans.to_csv(join(save_path, selected_group + '_trans.csv'), index=False)
    group_usages.to_csv(join(save_path, selected_group + '_usage.csv'), index=False)
    print('Dataframe is saved')

## Interactive Syllable Transition Graph

Use this tool to explore the behavioral transition space of your modeled groups. Find sequences of behavior, e.g. bigrams/trigrams, at different usage/transition probability ranges, and gain a better understanding of the differences across your modeling groups.
 
Select the parameter(s) from the dropdown menus and run the cell again to plot the updated plots.
Hover over the edges and nodes to display the edge colors and syllable info.


Usage:
- Run the following cell to launch the Interactive Syllable Transition Graphing Tool.
- Use the `Graph Layout` dropdown menu to specify the graph layout and `Node Coloring` to specify the scalar value the node colors are based on.
- Use the `Threshold Edge Weights` slider to select a range for syllable transition probabilities to display in the graphs.
- Use the `Threshold Nodes by Usage` slider to select a range for syllable usages to display in the graphs.
- Use the `Threshold Nodes by <lorem ipsum>` slider to select a range for `<lorem ipsum>` to display in the graphs.

**Note: Nodes outside the threshold will be hidden.**
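
The effect of the two threshold sliders can be sketched as a filter over a transition matrix and a usage table. This is only an illustration of the idea: `threshold_graph` and the toy values are hypothetical, and the sketch assumes that an edge is dropped whenever either of its nodes is hidden.

```python
def threshold_graph(trans_mat, usages, edge_range, usage_range):
    """Keep nodes whose usage lies in usage_range, and edges whose weight
    lies in edge_range and that connect two kept nodes."""
    lo_u, hi_u = usage_range
    lo_e, hi_e = edge_range
    nodes = {s for s, u in usages.items() if lo_u <= u <= hi_u}
    edges = {
        (i, j): w
        for i, row in enumerate(trans_mat)
        for j, w in enumerate(row)
        if i in nodes and j in nodes and lo_e <= w <= hi_e
    }
    return nodes, edges


# hypothetical 3-syllable transition matrix and usages
toy_trans = [
    [0.00, 0.30, 0.05],
    [0.20, 0.00, 0.10],
    [0.25, 0.10, 0.00],
]
toy_usages = {0: 0.5, 1: 0.4, 2: 0.1}
nodes, edges = threshold_graph(toy_trans, toy_usages,
                               edge_range=(0.15, 1.0), usage_range=(0.2, 1.0))
print(sorted(nodes), edges)
```

In this toy example the 2 -> 0 transition has a weight inside the edge range, but it is still dropped because syllable 2 falls below the usage threshold.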

Run this cell to disable output scrolling so that the entire view is displayed in the cell output.

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
from moseq2_app.main import interactive_transition_graph

max_syllables = None

interactive_transition_graph(progress_paths, max_syllables=max_syllables, plot_vertically=True, load_parquet=True)

## Compute Syllable Statistics

The following cells produce two dataframes: `scalar_df` and `mean_df`.
 - `scalar_df` is a vertically stacked dataframe of scalar values measured during the extraction step, aligned with the model labels and timestamps. Its shape is `(sum_of_session_frames, 31)`. To view all the measured scalars, run `print(scalar_df.columns)`.
   - This dataframe can be used to plot the scalar feature values for any session over time.
 - `mean_df` is a dataframe of the average syllable-scalar values for all the features included in `scalar_df` grouped by the resorted syllable labels, model groups and uuids.
   - This dataframe will be used to plot mean syllable statistics and perform hypothesis testing.
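
The relationship between the two dataframes can be sketched without pandas: frame-level rows are grouped by group, uuid, and syllable, and each scalar is averaged within every bucket. The column names (`syllable`, `velocity`) and the helper `mean_by_syllable` are hypothetical and only stand in for what `compute_behavioral_statistics` produces.

```python
from collections import defaultdict
from statistics import mean


def mean_by_syllable(rows, scalar, groupby=('group', 'uuid')):
    """Average one scalar column for every (groupby..., syllable) combination."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in groupby) + (row['syllable'],)
        buckets[key].append(row[scalar])
    return {key: mean(values) for key, values in buckets.items()}


# hypothetical frame-level rows, one per video frame
toy_rows = [
    {'group': 'control', 'uuid': 'a1', 'syllable': 0, 'velocity': 1.0},
    {'group': 'control', 'uuid': 'a1', 'syllable': 0, 'velocity': 3.0},
    {'group': 'control', 'uuid': 'a1', 'syllable': 1, 'velocity': 5.0},
    {'group': 'treated', 'uuid': 'b2', 'syllable': 0, 'velocity': 2.0},
]
print(mean_by_syllable(toy_rows, 'velocity'))
```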

### Compute `scalar_df`

In [None]:
from moseq2_viz.util import parse_index
from moseq2_viz.scalars.util import scalars_to_dataframe

_, sorted_index = parse_index(progress_paths['index_file'])
# compute session scalar data
scalar_df = scalars_to_dataframe(sorted_index, model_path=progress_paths['model_path'])

print('The shape of scalar_df', scalar_df.shape)
scalar_df.head()

### Export `scalar_df`

In [None]:
# Save `scalar_df` as a csv file
# set export = True if you want to export scalar_df for further analysis outside of this notebook

from os.path import exists, join

export = True
# Specify where to save the dataframe in `save_path`. If empty, the file will be saved to `base_dir`
save_path = ""
base_dir = progress_paths['base_dir']

if export:
    # fall back to base_dir if save_path is empty or does not exist
    if save_path and not exists(save_path):
        save_path = base_dir
        print('This is not a valid path. Dataframe csv will be saved to base_dir')
    else:
        save_path = save_path or base_dir
        print("Dataframe will be saved to", save_path)
    scalar_df.to_csv(join(save_path, 'scalar_df.csv'), index=False)
    print('Dataframe is saved')

### Compute `mean_df`

In [None]:
from moseq2_viz.model.util import compute_behavioral_statistics
# compute syllable usage and scalar statistics
mean_df = compute_behavioral_statistics(scalar_df, count='usage', groupby=['group', 'uuid'], usage_normalization=True)
print('The shape of mean_df', mean_df.shape)
mean_df.head()

### Export `mean_df`

In [None]:
# Save `mean_df` as a csv file
# set export = True if you want to export mean_df for further analysis outside of this notebook
from os.path import exists, join
export = True
# Specify where to save the dataframe in `save_path`. If empty, the file will be saved to `base_dir`
save_path = ""
base_dir = progress_paths['base_dir']

if export:
    # fall back to base_dir if save_path is empty or does not exist
    if save_path and not exists(save_path):
        save_path = base_dir
        print('This is not a valid path. Dataframe csv will be saved to base_dir')
    else:
        save_path = save_path or base_dir
        print("Dataframe will be saved to", save_path)
    mean_df.to_csv(join(save_path, 'mean_df.csv'), index=False)
    print('Dataframe is saved')

***

# Notebook End 

# User Survey

Please take some time to tell us your thoughts about this notebook:
**[user feedback survey](https://forms.gle/FbtEN8E382y8jF3p6)**