# Table of Contents
<div class="toc"><ul class="toc-item"><li><span><a href="#Project-setup" data-toc-modified-id="Project-setup-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Project setup</a></span><ul class="toc-item"><li><span><a href="#Files-and-Directory-Structure" data-toc-modified-id="Files-and-Directory-Structure-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Files and Directory Structure</a></span></li><li><span><a href="#Load-Progress" data-toc-modified-id="Load-Progress-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Load Progress</a></span></li><li><span><a href="#Setup-Directory-Structure-for-Analyzing-Model(s)" data-toc-modified-id="Setup-Directory-Structure-for-Analyzing-Model(s)-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Setup Directory Structure for Analyzing Model(s)</a></span></li><li><span><a href="#Get-Best-Model-Fit" data-toc-modified-id="Get-Best-Model-Fit-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Get Best Model Fit</a></span></li><li><span><a href="#[OPTIONAL]-Specify-Paths-to-Specific-Model" data-toc-modified-id="[OPTIONAL]-Specify-Paths-to-Specific-Model-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>[OPTIONAL] Specify Paths to Specific Model</a></span></li></ul></li><li><span><a href="#Compute-Syllable-Statistics" data-toc-modified-id="Compute-Syllable-Statistics-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Compute Syllable Statistics</a></span><ul class="toc-item"><li><span><a href="#Compute-scalar_df" data-toc-modified-id="Compute-scalar_df-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Compute <code>scalar_df</code></a></span></li><li><span><a href="#Export-scalar_df" data-toc-modified-id="Export-scalar_df-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Export <code>scalar_df</code></a></span></li><li><span><a href="#Compute-mean_df" data-toc-modified-id="Compute-mean_df-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Compute <code>mean_df</code></a></span></li><li><span><a 
href="#Export-mean_df" data-toc-modified-id="Export-mean_df-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Export <code>mean_df</code></a></span></li><li><span><a href="#Generate-Behavioral-Summary-(Fingerprints)" data-toc-modified-id="Generate-Behavioral-Summary-(Fingerprints)-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Generate Behavioral Summary (Fingerprints)</a></span></li><li><span><a href="#Linear-Classifier-using-MoSeq-Syllables" data-toc-modified-id="Linear-Classifier-using-MoSeq-Syllables-2.6"><span class="toc-item-num">2.6&nbsp;&nbsp;</span>Linear Classifier using MoSeq Syllables</a></span></li><li><span><a href="#Linear-Classifier-using-MoSeq-Scalar-values" data-toc-modified-id="Linear-Classifier-using-MoSeq-Scalar-values-2.7"><span class="toc-item-num">2.7&nbsp;&nbsp;</span>Linear Classifier using MoSeq Scalar values</a></span></li></ul></li><li><span><a href="#Interactive-Syllable-Labelling-Tool" data-toc-modified-id="Interactive-Syllable-Labelling-Tool-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Interactive Syllable Labelling Tool</a></span></li><li><span><a href="#Interactive-Syllable-Statistics-Graphing" data-toc-modified-id="Interactive-Syllable-Statistics-Graphing-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Interactive Syllable Statistics Graphing</a></span></li><li><span><a href="#Compute-Syllable-Transition-Matrices" data-toc-modified-id="Compute-Syllable-Transition-Matrices-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Compute Syllable Transition Matrices</a></span><ul class="toc-item"><li><span><a href="#Export-Transition-Matrices-and-usages" data-toc-modified-id="Export-Transition-Matrices-and-usages-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Export Transition Matrices and usages</a></span></li><li><span><a href="#Interactive-Syllable-Transition-Graph-Tool" data-toc-modified-id="Interactive-Syllable-Transition-Graph-Tool-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Interactive Syllable 
Transition Graph Tool</a></span></li></ul></li><li><span><a href="#Notebook-End" data-toc-modified-id="Notebook-End-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Notebook End</a></span></li><li><span><a href="#User-Survey" data-toc-modified-id="User-Survey-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>User Survey</a></span></li></ul></div>

***
In this notebook, the **Markdown** above each cell describes the purpose of the cell(s), cell output, and the instructions for running the cell(s) and/or interacting with the widget. The **inline code comments** in the code block provide contextual information about the function, code structure, and parameters.
***

# Project setup

## Files and Directory Structure
To run this notebook, you need the following files and folders in your data directory:
- `progress.yaml` - this file stores all the required MoSeq paths used throughout the notebooks
- `model.p` - at least one trained AR-HMM file. It is usually saved in this format within a model folder that lives in your data directory. In the example below, it is saved in `base_model_path`.
- `moseq2-index.yaml` - contains paths to extracted sessions and is used to generate syllable crowd movies
- `config.yaml` - configuration file that stores parameters used throughout the MoSeq pipeline
- `_pca/` - folder that contains data generated from the PCA section of the extraction notebook or CLI
- `aggregate_results/` - folder that contains aggregated and extracted session data

At this stage, the base directory should contain these files and folders. An example is shown below:
```
.
└── <base_dir>/
    ├── progress.yaml
    ├── config.yaml
    ├── moseq2-index.yaml
    ├── base_model_path/
    │   └── model.p
    ...
    ├── _pca/
    └── aggregate_results/

```
For more information about how MoSeq organizes data, check out our [wiki](https://github.com/dattalab/moseq2-app/wiki/Directory-Structures-and-yaml-Files-in-MoSeq-Pipeline).
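
Before running the cells below, it can help to confirm that the expected files are in place. The following is a minimal sketch using only the Python standard library. The `check_base_dir` helper is hypothetical and not part of MoSeq, and the list of names simply mirrors the layout shown above (the model folder is left out because its name varies).

```python
from pathlib import Path

# Names taken from the directory listing above; the model folder is left out
# because its name varies between projects.
REQUIRED = ['progress.yaml', 'config.yaml', 'moseq2-index.yaml', '_pca', 'aggregate_results']

def check_base_dir(base_dir, required=REQUIRED):
    """Hypothetical helper: return the required entries missing from base_dir."""
    base = Path(base_dir)
    return [name for name in required if not (base / name).exists()]

missing = check_base_dir('.')
if missing:
    print('Missing from base directory:', ', '.join(missing))
else:
    print('All expected files and folders were found.')
```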

**Note: this notebook uses the `progress.yaml` file to keep track of all the necessary paths.** Please ensure you run the [Load Progress cell](#Load-Progress) before running any analysis modules.

## Load Progress

**[IMPORTANT] ALWAYS run this cell when you open this notebook.** If you don't, the analysis functions won't work.

MoSeq uses a [`progress.yaml`](https://github.com/dattalab/moseq2-app/wiki/Directory-Structures-and-yaml-Files-in-MoSeq-Pipeline#progressyaml-file) file to keep track of the progress. 

The following cell generates a `progress.yaml` file, or loads your progress from the last saved checkpoint.  This cell checks if the `progress_paths` dictionary contains the necessary paths to use the features in the rest of the notebook and displays the content in `progress_paths`.

- **Specify the path to the `progress.yaml` file**. No change needed if the notebooks are in the base directory.
- **Run the following cell**.

**Note:** If the Extraction, PCA and/or modeling steps are done using the Command Line Interface, set `from_CLI=True`.

In [None]:
from os.path import join, dirname, abspath
from moseq2_app.gui.progress import update_progress, restore_progress_vars, progress_path_sanity_check

progress_filepath = './progress.yaml' # Add the path to your progress.yaml here.
from_CLI = False  # set to True if you are coming from the CLI. Keep False if you used the extraction notebook 
progress_paths = restore_progress_vars(progress_file=progress_filepath, init=from_CLI, overwrite=from_CLI)

progress_path_sanity_check(progress_paths, progress_filepath)

progress_paths

## Setup Directory Structure for Analyzing Model(s)
The analysis and visualization results from the model are stored in the same folder as the model. However, training multiple models will save them all to the base model folder. To isolate the analysis and visualization results for a specific model, this step reorganizes the models so that they have their own folder.

The following step detects all the models and creates a model-specific folder for each model and copies the model to the respective folder. Refer to the [wiki](https://github.com/dattalab/moseq2-app/wiki/Directory-Structures-and-yaml-Files-in-MoSeq-Pipeline#after-training-the-arhmm-model) for more details about how this step reorganizes models.
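
To make the reorganization concrete, here is a rough sketch of the idea using only the standard library. It is an illustration under stated assumptions, not the implementation of `setup_model_folders`: the `organize_models` helper is hypothetical, and it assumes each model is a `.p` file sitting directly in the base model folder and that each model's folder is named after the file.

```python
import shutil
from pathlib import Path

def organize_models(base_model_path):
    """Hypothetical helper: copy every top-level *.p file into a folder named after it."""
    base = Path(base_model_path)
    model_folders = {}
    for model_file in sorted(base.glob('*.p')):
        target_dir = base / model_file.stem      # e.g. base/model1/
        target_dir.mkdir(exist_ok=True)
        shutil.copy2(model_file, target_dir / model_file.name)
        model_folders[model_file.name] = str(target_dir)
    return model_folders
```

Calling `organize_models` on a folder that holds `model1.p` and `model2.p` would leave copies at `model1/model1.p` and `model2/model2.p`, so each model's analysis output has its own home.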

- **Run the following cell**.

In [None]:
from moseq2_app.util import setup_model_folders
model_dict = setup_model_folders(progress_paths)

## Get Best Model Fit
This feature finds the trained model with syllable durations that best match changepoint durations computed from the principal components. Learn more about how we use changepoints in our [first MoSeq publication](http://datta.hms.harvard.edu/wp-content/uploads/2018/01/pub_23.pdf).

This feature supports two objective functions to find best matches:
- `duration` finds the model where **the median syllable duration** best matches the median changepoint duration. We generally stick with this objective.
- `jsd` finds the model where **the syllable duration distribution** best matches the changepoint duration distribution.

This feature returns the best model from the list of models found in the `progress_paths['base_model_path']` folder. The cell also modifies `model_session_path` in the `progress.yaml` file such that the rest of the analysis and visualization notebook uses this model.

You can find more information in the [wiki](https://github.com/dattalab/moseq2-app/wiki/MoSeq2-Extract-Modeling-Notebook-Instructions#finding-best-model-fit).
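
As a rough illustration of how the two objectives can disagree, the sketch below scores two hypothetical models against changepoint durations using only the standard library. The duration values, the 0.1 s bin width, the model names, and the helper functions are all made up for illustration. This is not the code that `get_best_fit_model` runs.

```python
import math
from statistics import median

def histogram(values, bin_width=0.1, n_bins=20):
    """Normalized histogram over [0, n_bins * bin_width); larger values fall in the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence, log base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max((kl(p, m) + kl(q, m)) / 2, 0.0))

# Made-up durations in seconds
changepoints = [0.25, 0.35, 0.35, 0.45, 0.55, 0.65]
models = {
    'model_a.p': [0.25, 0.35, 0.45, 0.45, 0.55, 0.75],
    'model_b.p': [0.05, 0.05, 0.35, 0.45, 1.25, 1.55],
}

# 'duration' compares medians only; 'jsd' compares the whole distributions
by_duration = min(models, key=lambda k: abs(median(models[k]) - median(changepoints)))
by_jsd = min(models, key=lambda k: js_distance(histogram(models[k]), histogram(changepoints)))
print('duration objective picks:', by_duration)
print('jsd objective picks:', by_jsd)
```

In this toy case the two objectives choose different models: one model matches the median exactly but has a very different spread, while the other has a similar overall shape.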

**Instructions:**
- **Run the following cell**.

In [None]:
from os.path import join
from moseq2_viz.gui import get_best_fit_model
from moseq2_app.util import update_model_paths
from moseq2_app.gui.progress import update_progress, restore_progress_vars

progress_paths = restore_progress_vars(progress_filepath)

output_file = join(progress_paths['plot_path'], 'model_vs_pc_changepoints')
objective = 'duration' # objective can be either duration or jsd

best_model_fit = get_best_fit_model(progress_paths, output_file, plot_all=True, objective=objective)

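# look up the result for the chosen objective instead of always using 'duration'
# (assumes the returned dictionary has a 'best model - <objective>' key for each objective)
best_model = best_model_fit[f'best model - {objective}'].split('/')[-1]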

# Update the progress file with best model
progress_paths = update_model_paths(best_model, model_dict, progress_filepath)

## [OPTIONAL] Specify Paths to Specific Model
NOTE: only run this section if you want to skip the section above (finding the best model) and manually set a model path to analyze.
If you have a specific model that is different from the best model found in the cell above, change the `desired_model` to specify the file name of the model.

**Instructions:**
- **Specify the file name of the model in the `desired_model` field.**
- **Run the following cell**.

In [None]:
from moseq2_app.util import update_model_paths
desired_model = best_model # replace this with the file name of the desired model, e.g. "model1.p"

# Update progress file
progress_paths = update_model_paths(desired_model, model_dict, progress_filepath)

# Compute Syllable Statistics
MoSeq extracts scalar values (e.g. 2D velocity, width, height, etc.) from the depth video while the models identify behavioral motifs (syllables).
The steps below combine these data streams into a `DataFrame`. Learn more about `DataFrames` on [pandas's website](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html).

Not only are these `DataFrames` useful for the analysis and visualization cells below, but you can also save and export them to run custom analyses.
You can find more information about how we use `DataFrames` in the wiki [here](https://github.com/dattalab/moseq2-app/wiki/MoSeq2-Analysis-Visualization-Notebook-Instructions#compute-syllable-statistics).

## Compute `scalar_df`
The following cell generates a `DataFrame` of scalar values computed during the extraction step aligned to the model labels. All sessions are concatenated to form this `DataFrame`. Each row is one "frame" which corresponds to one sample from the original depth video. Each column is a different entry. To view the entries of this `DataFrame`, run `print(scalar_df.columns)`. This `DataFrame` can be used to plot the scalar feature values for any session over time.

**Note:** the syllable columns contain `-5` for the first 3 frames of each session's recordings. This is because we use the first 3 frames to initialize the AR-HMM, and thus cannot supply a syllable label to them. We generally remove these frames from analysis.
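
A minimal sketch of that filtering step with pandas is shown below. The toy frame and the `labels (usage sort)` column name are assumptions for illustration. Check `scalar_df.columns` for the syllable column names in your own data.

```python
import pandas as pd

# Toy stand-in for scalar_df; the column names here are assumptions
toy_df = pd.DataFrame({
    'uuid': ['s1'] * 6,
    'labels (usage sort)': [-5, -5, -5, 2, 2, 0],
    'velocity_2d_mm': [0.1, 0.3, 0.2, 1.5, 1.7, 0.4],
})

# Keep only frames that received a syllable label from the model
labeled_df = toy_df[toy_df['labels (usage sort)'] != -5].reset_index(drop=True)
print(len(toy_df), 'frames before;', len(labeled_df), 'frames after')
```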

**Instructions:**
- **Run the following cell**.

In [None]:
from moseq2_viz.util import parse_index
from moseq2_viz.scalars.util import scalars_to_dataframe

_, sorted_index = parse_index(progress_paths['index_file'])
# compute session scalar data
scalar_df = scalars_to_dataframe(sorted_index, model_path=progress_paths['model_path'])

print('scalar_df size: ', scalar_df.shape[0], 'rows;', scalar_df.shape[1], 'columns')

# this line prints out the first 5 rows of the dataframe in table format. It is used to
# get a sense of what is contained in the DataFrame
scalar_df.head()

## Export `scalar_df`

You can export the `scalar_df` to a csv file (or other alternative file type) for further analysis using the following cell. [See here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) for a list of alternative file formats to save `scalar_df`.

**Note:** it can take quite a while to export the `scalar_df`, especially if it is saved as a `.csv` file. Check out [pandas's user guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) to see if other formats that save more quickly can work for you.
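
For example, pickle is one of the formats covered in that guide, and it avoids converting every value to text. The sketch below round-trips a toy frame through `DataFrame.to_pickle` and `pandas.read_pickle` in a temporary folder. The toy data is made up, and whether pickle is actually faster for your data is something to measure rather than assume.

```python
import tempfile
from os.path import join

import pandas as pd

toy_df = pd.DataFrame({'uuid': ['s1', 's1', 's2'], 'velocity_2d_mm': [0.5, 1.2, 0.8]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = join(tmp_dir, 'scalar_df.pkl')
    toy_df.to_pickle(out_file)            # binary format, keeps dtypes
    restored = pd.read_pickle(out_file)   # load it back for later analysis

print(restored.equals(toy_df))
```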

**Instructions:**
- **Specify the folder** you want to save the dataframe in `save_path`. By default, the file will be saved to `base_dir`.
- **Run the following cell** to save `scalar_df` as a CSV file.

In [None]:
# Save `scalar_df` as a csv file
from os.path import exists, join

# Specify the place you want to save the dataframe in `save_path`
save_path = progress_paths['base_dir']  # save the dataframe in the base data directory
# alternatively, you can save the dataframe in the model directory. Uncomment the following line to do so:
# save_path = progress_paths['model_session_path']  # 'model_path' points to the model file, not a folder

# exports the dataframe
filename = 'scalar_df.csv'
# here we use .to_csv to export the dataframe, but you can change it to try other ways of saving your data
scalar_df.to_csv(join(save_path, filename), index=False)

print('DataFrame is saved:', join(save_path, filename))

## Compute `mean_df`
`mean_df` is a `DataFrame` of the average scalar values associated with each syllable. By default it is computed using the features included in `scalar_df` for each session independently. This dataframe will be used to plot syllable statistics and perform hypothesis testing.
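
To give a sense of what this table contains, the sketch below builds a simplified version with plain pandas on toy data. It is not how `compute_behavioral_statistics` is implemented. It assumes that usage counts syllable onsets (a frame whose label differs from the previous frame in the same session), normalizes usage within each session, and averages a single scalar per syllable. The column names and values are assumptions.

```python
import pandas as pd

toy_df = pd.DataFrame({
    'group': ['ctrl'] * 6 + ['drug'] * 6,
    'uuid': ['s1'] * 6 + ['s2'] * 6,
    'syllable': [0, 0, 1, 1, 0, 2, 1, 1, 1, 0, 0, 1],
    'velocity_2d_mm': [1.0, 2.0, 4.0, 6.0, 3.0, 9.0, 5.0, 5.0, 8.0, 1.0, 3.0, 6.0],
})

# An onset is the first frame of each run of identical labels within a session
prev = toy_df.groupby('uuid')['syllable'].shift()
toy_df['onset'] = toy_df['syllable'] != prev

keys = ['group', 'uuid', 'syllable']
toy_mean_df = toy_df.groupby(keys).agg(
    usage=('onset', 'sum'),
    velocity_2d_mm=('velocity_2d_mm', 'mean'),
).reset_index()

# Normalize usage so that it sums to 1 within each session
toy_mean_df['usage'] = toy_mean_df['usage'] / toy_mean_df.groupby('uuid')['usage'].transform('sum')
print(toy_mean_df)
```

Each row of the result describes one syllable in one session, which is the shape that per-group comparisons and hypothesis tests typically need.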

**Instructions:**
- **Run the following cell**.

In [None]:
from moseq2_viz.model.util import compute_behavioral_statistics
# compute syllable usage and scalar statistics
mean_df = compute_behavioral_statistics(scalar_df, count='usage', groupby=['group', 'uuid'], usage_normalization=True)
print('mean_df size: ', mean_df.shape[0], 'rows;', mean_df.shape[1], 'columns')
mean_df.head()  # display the first 5 rows in this DataFrame

## Export `mean_df`

You can export the `mean_df` to a csv file for further analysis using the following cell.

**Instructions:**
- **Specify the place** you want to save the dataframe in `save_path`. By default, the file will be saved to `base_dir`.
- **Run the following cell** to save `mean_df` as a CSV file.

In [None]:
# Save `mean_df` as a csv file
from os.path import exists, join

# Specify the place you want to save the dataframe in `save_path`
save_path = progress_paths['base_dir']  # save the dataframe in the base data directory
# alternatively, you can save the dataframe in the model directory. Uncomment the following line to do so:
# save_path = progress_paths['model_session_path']  # 'model_path' points to the model file, not a folder

# exports the dataframe
filename = 'mean_df.csv'
mean_df.to_csv(join(save_path, filename), index=False)

print('DataFrame is saved:', join(save_path, filename))

# Interactive Syllable Labelling Tool

This tool computes and displays crowd movies for each syllable.
Crowd movies are created by superimposing examples of mice expressing a specific syllable labeled by MoSeq.
When a colored dot appears over a mouse, that means the mouse is expressing the specific crowd movie syllable.
Crowd movies can be used to provide meaningful syllable descriptions and better understand the behavioral repertoire of your experimental cohorts.

This tool allows you to assign labels and short descriptions to each syllable by observing the crowd movies.
You can find more information in the [wiki](https://github.com/dattalab/moseq2-app/wiki/MoSeq2-Analysis-Visualization-Notebook-Instructions#interactive-syllable-labelling).

Run the following cell to start the interactive widget to assign behavioral labels and short descriptions to syllables by observing the crowd movies and the Syllable Info table. 

**Instructions:**
- **Run the following cell.**
- **Select a syllable** from the `Syllable` dropdown menu to view the associated crowd movie and syllable info.
- **Adjust the crowd movie playback speed using the `Playback Speed` slider** to better observe the behavior associated with short/fast syllables.
- **Input the syllable behavioral label and short description** in the text fields.
- Click `Save Setting` to save the syllable label and description for later analysis.
- Use `Next` and `Previous` to navigate between syllables. The syllable label and description are saved automatically when you use these buttons.

In [None]:
from os.path import join
from moseq2_app.main import label_syllables
from moseq2_app.gui.progress import update_progress

# Path to generate crowd movies in
crowd_dir = join(progress_paths['model_session_path'], 'crowd_movies/')

# Path to file containing Syllable label information
syll_infopath = join(progress_paths['model_session_path'], 'syll_info.yaml')

# convenience file containing reused syllable statistics data
syll_info_df_path = join(progress_paths['model_session_path'], 'syll_df.parquet')

# Select number of syllables based on an explained variance percentage
explained_variance = 99

# To instead label a fixed number of syllables, set max_syllables <= nstates
max_syllables = None

update_progress(progress_filepath, 'crowd_dir', crowd_dir)
update_progress(progress_filepath, 'syll_info', syll_infopath)
progress_paths = update_progress(progress_filepath, 'df_info_path', syll_info_df_path)

label_syllables(progress_paths, max_syllables=max_syllables, n_explained=explained_variance)
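
The `explained_variance` setting above can be read as a cumulative-usage cutoff. The sketch below shows that idea on made-up usage counts, assuming syllables are sorted from most to least used and that the cutoff is the smallest number of syllables whose combined usage reaches the requested percentage. The helper name is hypothetical and the exact rule MoSeq applies may differ.

```python
def n_syllables_for_coverage(usage_counts, percent=99):
    """Hypothetical helper: smallest number of top syllables covering `percent` of usage."""
    counts = sorted(usage_counts, reverse=True)
    total = sum(counts)
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running * 100 >= percent * total:   # avoids floating-point division
            return n
    return len(counts)

toy_usages = [500, 300, 150, 40, 6, 3, 1]       # made-up emission counts
print(n_syllables_for_coverage(toy_usages, percent=99))
```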

# Interactive Syllable Statistics Graphing
Syllable statistics provide information about behavioral patterns. The similarity dendrogram shows the hierarchically sorted pairwise distances between the autoregressive matrices that represent each syllable in the given model. You can find more information in the [wiki](https://github.com/dattalab/moseq2-app/wiki/MoSeq2-Analysis-Visualization-Notebook-Instructions#interactive-syllable-statistics-graphing).


Run the following cell to start the interactive widget to plot different syllable statistics and their differences in the modeled groups and the similarity dendrogram.

**NOTE**: this widget can take a few seconds to render after changing parameters within the widget, so hang tight.

**Instructions:**
- **Run the following cell.**
- **Select the parameter(s) from the dropdown menus** to control the graph.
- To plot multiple sessions and subjects, **select multiple sessions from `SessionName` or `SubjectName` while holding down the [Ctrl]/[Command]/[Shift] key**.
- **Hover over the data points** to display syllable info.

In [None]:
from moseq2_app.main import interactive_syllable_stats

max_syllables = None

# If loading parquet files is taking too long, set load_parquet=False
interactive_syllable_stats(progress_paths, max_syllable=max_syllables, load_parquet=True)

# Compute Syllable Transition Matrices
Syllable transitions provide information about the pattern in the behavioral transition space of your modeled group(s). 

The following cell computes syllable transition matrices within each group and visualizes the transition frequency.

**Note**: This code block is loading the model from the `model_path` and is __not__ using the `mean_df` variable.
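
The three normalization options used in the cell below can be illustrated with NumPy on a toy label sequence. This is a sketch under assumptions, not the `get_group_trans_mats` implementation: it counts transitions between consecutive distinct syllables, then divides by the grand total (`bigram`), by each row's sum (`rows`), or by each column's sum (`columns`). The helper names are hypothetical.

```python
import numpy as np

def transition_counts(labels, n_syllables):
    """Count transitions i -> j between consecutive, distinct syllable labels."""
    labels = np.asarray(labels)
    # collapse runs of repeated labels so only syllable changes are counted
    changes = labels[np.insert(np.diff(labels) != 0, 0, True)]
    counts = np.zeros((n_syllables, n_syllables))
    np.add.at(counts, (changes[:-1], changes[1:]), 1)
    return counts

def normalize_matrix(counts, method='bigram'):
    if method == 'bigram':      # all entries sum to 1
        denom = counts.sum()
    elif method == 'rows':      # each row sums to 1: P(next = j | current = i)
        denom = counts.sum(axis=1, keepdims=True)
    elif method == 'columns':   # each column sums to 1: P(previous = i | next = j)
        denom = counts.sum(axis=0, keepdims=True)
    else:
        raise ValueError(f'unknown method: {method}')
    # leave all-zero rows/columns at zero instead of dividing by zero
    return np.divide(counts, denom, out=np.zeros_like(counts), where=denom != 0)

toy_labels = [0, 0, 1, 1, 2, 0, 1, 2, 2, 0]
counts = transition_counts(toy_labels, n_syllables=3)
for method in ('bigram', 'rows', 'columns'):
    print(method)
    print(normalize_matrix(counts, method).round(2))
```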

**Instructions:**
- **Run the following cell.**

In [None]:
from moseq2_viz.model.util import parse_model_results, relabel_by_usage
from moseq2_viz.model.trans_graph import get_trans_graph_groups, get_group_trans_mats
from moseq2_viz.model.util import compute_syllable_explained_variance

# load your model
model_path = progress_paths['model_path']
model_data = parse_model_results(model_path)
model_data['labels'] = relabel_by_usage(model_data['labels'], count='usage')[0]
max_syllable = compute_syllable_explained_variance(model_data, n_explained=99)

# select a transition matrix normalization method
normalize = 'bigram' # other options: 'columns', 'rows'

# Get modeled session uuids to compute group-mean transition graph for each group
label_group, uuids = get_trans_graph_groups(model_data)
group = list(set(label_group))
# compute transition matrices and usages for each group
print('Group(s):', group)
trans_mats, usages = get_group_trans_mats(model_data['labels'], label_group, group, max_syllable, normalize=normalize)

import matplotlib.pyplot as plt

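# squeeze=False keeps `ax` two-dimensional, so indexing also works when there is only one group
fig, ax = plt.subplots(1, len(group), figsize=(12, 8), sharex=False, sharey=True, squeeze=False)

for i, g in enumerate(group):
    h = ax[0, i].imshow(trans_mats[i][:max_syllable,:max_syllable], cmap='magma')
    plt.colorbar(h, ax=ax[0, i], fraction=0.046, pad=0.04)
    ax[0, i].set_xlabel('Syllable j')
    ax[0, i].set_ylabel('Syllable i')
    # report the normalization that was actually used instead of hard-coding 'Bigram'
    ax[0, i].set_title(f'{g}: Transition Probabilities ({normalize} normalization)')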

## Export Transition Matrices and usages

You can export the transition matrices and usages to csv files for further analysis using the following cell.

**Instructions:**
- **Set `export` variable to `True`** if you want to export the transition matrices and syllable usages for further analysis outside of this notebook.
- **Specify the place** you want to save the dataframe in `save_path`. If empty, the file will be saved to `base_dir`.
- **Run the following cell** to save the group transition matrices and syllable usages as CSV files.

In [None]:
import pandas as pd
from os.path import exists, join

# set export = True if you want to export the transition matrices and syllable usages
export = False
base_dir = progress_paths['base_dir']
selected_group = "" #specify group name to be exported here
save_path = ""



if export:
    # Fail early with a clear message if the group name is missing or misspelled
    if selected_group not in group:
        raise ValueError(f'selected_group must be one of {group}, got "{selected_group}"')

    # Construct the dataframes for the selected group
    group_index = group.index(selected_group)
    group_trans = pd.DataFrame(trans_mats[group_index])
    # `columns=` replaces set_axis(..., inplace=True), which was removed in pandas 2.0
    group_usages = pd.DataFrame(list(usages[group_index].items()), columns=['Syllable', 'Usage'])

    # Fall back to base_dir if save_path is empty or does not exist
    if len(save_path) == 0:
        save_path = base_dir
    elif not exists(save_path):
        print('This is not a valid path. Dataframe csv will be saved to base_dir')
        save_path = base_dir
    print('Dataframes will be saved to', save_path)

    group_trans.to_csv(join(save_path, selected_group + '_trans.csv'), index=False)
    group_usages.to_csv(join(save_path, selected_group + '_usage.csv'), index=False)
    print('Dataframes are saved')

## Interactive Syllable Transition Graph Tool
The transition matrices can be visualized as bidirected graph(s). From these graphs you can find more granular information about the behavioral patterns within your model group(s), such as bigrams/trigrams or different usage/transition probabilities given certain thresholds. You can find more information in the [wiki](https://github.com/dattalab/moseq2-app/wiki/MoSeq2-Analysis-Visualization-Notebook-Instructions#syllable-transition-analysis).

The following cells start the interactive widget for visualizing transition graphs.

**Instructions:**
- **Run the following 2 cells**
- **Select the parameter(s)** from the dropdown menus to control the graph.
- **Hover over the edges and nodes** to display the edge colors and syllable info.

**Note: Nodes outside the threshold will be hidden.**

Run this cell to display the entire view in the cell output

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
from moseq2_app.main import interactive_transition_graph

max_syllables = None

interactive_transition_graph(progress_paths, max_syllables=max_syllables, plot_vertically=True, load_parquet=True)

## Generate Behavioral Summary (Fingerprints)
The MoSeq Fingerprint summarizes behavior by showing distributions of MoSeq-extracted kinematic scalar values (e.g. position, velocity, height, and length) and MoSeq syllables. You can find more information in the wiki [here](https://github.com/dattalab/moseq2-app/wiki/MoSeq2-Analysis-Visualization-Notebook-Instructions#generate-behavioral-summary-fingerprints).

The following cell generates the summary dataframe and plots the behavioral summary.

**Instructions:**
- **Set `n_bins` variable** to an integer to specify the number of bins for the MoSeq scalar values. Set `n_bins` variable to `None` if you want the number of bins to match the number of syllables.
- **Set `range_type` variable** to 'robust' to include data ranging from the 1st percentile to the 99th percentile. **Set `range_type` variable** to 'full' to include all the data.
- **Assign an `sklearn.preprocessing` object to `preprocessor` variable** if you want to scale the values by session. If `preprocessor` variable is set to `None`, the figure will show the percentage usage.
- **Run the following cell**

In [None]:
from moseq2_viz.model.fingerprint_classifier import create_fingerprint_dataframe, plotting_fingerprint
from sklearn.preprocessing import MinMaxScaler, StandardScaler

n_bins = 100 
range_type='robust'
summary, range_dict = create_fingerprint_dataframe(scalar_df, mean_df, n_bins=n_bins, range_type=range_type)

preprocessor = MinMaxScaler()
plotting_fingerprint(summary, range_dict, preprocessor=preprocessor)
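
To illustrate what the `range_type` and `n_bins` settings above control, the sketch below bins a made-up scalar with NumPy, using either the full data range or the 1st to 99th percentile range. It is an illustration of the idea under stated assumptions, not the `create_fingerprint_dataframe` implementation, and `binned_distribution` is a hypothetical helper.

```python
import numpy as np

def binned_distribution(values, n_bins=100, range_type='robust'):
    """Hypothetical helper: fraction of samples per bin over a robust or full range."""
    values = np.asarray(values, dtype=float)
    if range_type == 'robust':
        lo, hi = np.percentile(values, [1, 99])   # ignore extreme outliers
    elif range_type == 'full':
        lo, hi = values.min(), values.max()
    else:
        raise ValueError(f'unknown range_type: {range_type}')
    counts, edges = np.histogram(values, bins=n_bins, range=(lo, hi))
    return counts / len(values), edges

rng = np.random.default_rng(0)
toy_velocity = np.append(rng.normal(5, 1, size=1000), [250.0])  # one made-up tracking glitch

for range_type in ('robust', 'full'):
    fractions, edges = binned_distribution(toy_velocity, n_bins=10, range_type=range_type)
    print(range_type, 'range:', round(float(edges[0]), 2), 'to', round(float(edges[-1]), 2))
```

With the full range, a single outlier stretches the bins so that nearly all samples land in the first one. The robust range keeps the bins focused on where the data actually lies.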

## Linear Classifier using MoSeq Syllables
MoSeq syllables and MoSeq kinematic scalar values can be used to train linear classifiers to predict experimental groups. The linear classifier uses data from the summary dataframe in the fingerprint plot. You can find more information in the wiki [here](https://github.com/dattalab/moseq2-app/wiki/MoSeq2-Analysis-Visualization-Notebook-Instructions#linear-classifiers).
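
The sketch below shows the general pattern with scikit-learn on synthetic data: a logistic regression evaluated with leave-one-out cross-validation, compared against the same procedure on shuffled labels as a chance-level control. The data, group names, and feature count are made up, and this is not the `classifier_fingerprint` implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic "sessions": 10 per group, 5 features each (e.g. syllable usages)
X = np.vstack([rng.normal(0.0, 0.5, size=(10, 5)),
               rng.normal(3.0, 0.5, size=(10, 5))])
y = np.array(['ctrl'] * 10 + ['drug'] * 10)

clf = LogisticRegression(max_iter=1000)
y_pred = cross_val_predict(clf, X, y, cv=LeaveOneOut())

# Control: repeat with shuffled labels to estimate chance-level performance
y_shuff = rng.permutation(y)
y_shuff_pred = cross_val_predict(clf, X, y_shuff, cv=LeaveOneOut())

print('accuracy (true labels):    ', np.mean(y_pred == y))
print('accuracy (shuffled labels):', np.mean(y_shuff_pred == y_shuff))
```

Comparing against the shuffled-label result matters because, with few sessions per group, a classifier can look accurate by chance.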

The following cell generates the summary dataframe and trains a linear classifier using MoSeq syllables.


**Instructions:**
- **Run the following cell**. 

In [None]:
from moseq2_viz.model.fingerprint_classifier import create_fingerprint_dataframe, classifier_fingerprint, plot_cm
import matplotlib.pyplot as plt

n_bins = None
range_type = 'robust'
summary, _ = create_fingerprint_dataframe(scalar_df, mean_df, n_bins=n_bins, range_type=range_type)

features=['MoSeq'] # list of feature(s) to include in the classifier
preprocessor=None # sklearn.preprocessing object as a preprocessor for the data
classes=['group'] # target for the classifier
param_search=True # if True, the function will run GridSearchCV to find the regularization parameter for the linear classifier
C_list=None # list of regularization parameter to search through
model_type='lr' # type of linear classifier. 'lr' for logistic regression, 'svc' for linearSVC
cv='loo' # type of cross validation. 'loo' for leave one out, 'skf' for stratifiedKFold
n_splits=5 # number of split for stratifiedKFold

out = classifier_fingerprint(summary, features=features, preprocessor=preprocessor, classes=classes, param_search=param_search, C_list=C_list, model_type=model_type, cv=cv, n_splits=n_splits)
plot_cm(out['y_true'], out['y_pred'], out['shuff_y_true'], out['shuff_y_pred'])

plt.plot(out['coefs'].mean(0), label='True coefficients')
plt.plot(out['shuff_coefs'].mean(0), label='Shuffle coefficients')
plt.legend()

## Linear Classifier using MoSeq Scalar values
Similar to the cell above, the following cell trains a linear classifier, this time using MoSeq scalar values. It reuses the `summary` dataframe generated in the cell above, so run that cell first. You can find more information in the wiki [here](https://github.com/dattalab/moseq2-app/wiki/MoSeq2-Analysis-Visualization-Notebook-Instructions#linear-classifiers).


**Instructions:**
- **Run the following cell.**

In [None]:
from moseq2_viz.model.fingerprint_classifier import create_fingerprint_dataframe, classifier_fingerprint, plot_cm
import matplotlib.pyplot as plt

features=['dist_to_center_px', 'height_ave_mm', 'length_mm', 'velocity_2d_mm'] # list of feature(s) to include in the classifier
preprocessor=None # sklearn.preprocessing object as a preprocessor for the data
classes=['group'] # target for the classifier
param_search=True # if True, the function will run GridSearchCV to find the regularization parameter for the linear classifier
C_list=None # list of regularization parameter to search through
model_type='lr' # type of linear classifier. 'lr' for logistic regression, 'svc' for linearSVC
cv='loo' # type of cross validation. 'loo' for leave one out, 'skf' for stratifiedKFold
n_splits=5 # number of split for stratifiedKFold

out = classifier_fingerprint(summary, features=features, preprocessor=preprocessor, classes=classes, param_search=param_search, C_list=C_list, model_type=model_type, cv=cv, n_splits=n_splits)
plot_cm(out['y_true'], out['y_pred'], out['shuff_y_true'], out['shuff_y_pred'])

plt.plot(out['coefs'].mean(0), label='True coefficients')
plt.plot(out['shuff_coefs'].mean(0), label='Shuffle coefficients')
plt.legend()

***

# Notebook End 

# User Survey

Please take some time to tell us your thoughts about this notebook:
**[user feedback survey](https://forms.gle/FbtEN8E382y8jF3p6)**