***
<center><h1>Handsfree MoSeq2 App</h1></center>

***

<img src="https://drive.google.com/uc?export=view&id=1kHdmkBx_XlueTJocREDx4YeHGrjfYKJv">

This notebook assumes you're familiar with "Main-MoSeq2-Notebook" and are looking for your code to run a bit more autonomously. Here, we remove most of the interactivity of the Main-MoSeq2-Notebook, so that you can run extraction, dimensionality reduction, and modeling sequentially without any user intervention. This is great if you're processing a large cohort or are analyzing a new cohort with parameters similar to a previous one. 

This notebook **assumes** that you've already optimized your moseq parameters in Main-MoSeq2-Notebook. As you work through this notebook, you'll have clear opportunities to **copy your parameters into their appropriate places.** 

Once you've done this, the next section allows you to simply use **"Run All Below"**, at which point you can step away from the computer and go do some experiments, make dinner, run your errands, or (as we often do) sleep.

When the MoSeq computations are complete, you're welcome to **create a new copy of "Main-MoSeq2-Notebook" in the same directory as this** and restore progress variables inside that notebook. You can then create any of the intermediate plots that you'd like to see (heatmaps, pca variance, etc.). You can also just move from this handsfree notebook to the interactive results notebook to start visualizing your results there. 

***

##### Notebook Shortcuts
- **[Notebook Setup](#Notebook-Setup)**: Prepare all the necessary config and progress files
- **[Set Parameters for All MoSeq Steps](#Set-Parameters-for-All-MoSeq-Steps)**: Set the parameters for everything
- **[Run All MoSeq Steps](#Run-All-MoSeq-Steps)**: Run the code for all the important MoSeq Steps

***

***
<center><h1>Notebook Setup</h1></center>

***

<img src="https://drive.google.com/uc?export=view&id=1Zkd0tATi8r2ENHvN8OczIrEf4K8PFmhM">

### Check if the dependencies are found

Run the following cell to check if `moseq2-app` is installed in your current conda kernel. The latest working version number is `0.2.1`.
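
If you would rather compare version numbers programmatically than by eye, the following is a minimal sketch. The helper names `parse_version` and `is_at_least` are hypothetical (they are not part of `moseq2_app`), and the sketch assumes plain dotted numeric versions such as `0.2.1`.

```python
def parse_version(version):
    """Turn a dotted version string such as '0.2.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def is_at_least(installed, required):
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


# Hypothetical usage: in the notebook you could pass moseq2_app.__version__
# as the first argument instead of a hard-coded string.
print(is_at_least('0.2.1', '0.2.1'))
print(is_at_least('0.1.9', '0.2.1'))
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where `'0.10.0'` would sort before `'0.2.1'`.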

In [None]:
import moseq2_app
print(moseq2_app.__version__)

## Create/Restore The Progress File and Check Current Progress

- Use this cell to find and load your current notebook analysis progress. You will want to use your current directory, AKA `./`, in order for all of the media to be displayed properly.
- __Ensure the directory this notebook is launched from contains all the experimental session folders.__

The cell will print progress bars for each pipeline step in the notebook. 
- The extraction progress bar indicates the total number of extracted sessions detected in the provided `base_dir` path. Additionally, the names of the sessions that are yet to be extracted will be printed for your convenience. __Note: the progress does not reflect the contents of the aggregate_results/ folder.__
- The remainder of the progress bars are derived from reading the paths in the `progress_paths` dict, filling up the bar if the included paths are found.
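
To make the idea of path-based progress concrete, here is a rough sketch of how a step could be counted as done when its recorded path exists. This is not the `check_progress` implementation; `summarize_progress` and `render_bar` are hypothetical helpers, and the example dictionary only borrows the `config_file` and `index_file` keys used elsewhere in this notebook.

```python
from os.path import exists


def summarize_progress(progress_paths, steps):
    """Return {step: bool}, True when the step has a recorded path that exists."""
    summary = {}
    for step in steps:
        path = progress_paths.get(step)
        summary[step] = bool(path) and exists(path)
    return summary


def render_bar(done, total, width=20):
    """Draw a simple text progress bar such as '[#####-----] 1/2'."""
    filled = int(width * done / total) if total else 0
    return '[' + '#' * filled + '-' * (width - filled) + f'] {done}/{total}'


# Hypothetical example: an empty string stands for a step that has not run yet.
example_paths = {'config_file': 'config.yaml', 'index_file': ''}
status = summarize_progress(example_paths, ['config_file', 'index_file'])
print(render_bar(sum(status.values()), len(status)))
```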

In [None]:
from os.path import join
from moseq2_app.gui.progress import check_progress, restore_progress_vars

# Add the path to your data folder here.
# We recommend that you run this notebook in the same folder as your data. In that case, you don't have to change base_dir
base_dir = './'
progress_filepath = join(base_dir, 'progress.yaml')

progress_paths = restore_progress_vars(progress_filepath, init=True, overwrite=False)
check_progress(progress_filepath)

### Generate Configuration Files

The `config.yaml` will be used to hold all configurable parameters for all steps in the MoSeq pipeline. The parameters used will be added to this file as you progress through the notebook. You can then use it to run an identical pipeline in future analyses, or directly configure parameters from there when debugging cells.

In [None]:
from os.path import join
from moseq2_app.gui.progress import update_progress
from moseq2_extract.gui import generate_config_command

config_filepath = join(progress_paths['base_dir'], 'config.yaml')

print(f'generating file in path: {config_filepath}')
generate_config_command(config_filepath)
progress_paths = update_progress(progress_filepath, 'config_file', config_filepath)

### Download a Flip File

MoSeq2 currently uses a deep-learning flip classifier to ensure that the mouse is always oriented facing east (post-extraction). The flip classifier currently __best suits mice that are similar to adult male C57 mice recorded with Kinect v2 cameras__.

If your dataset does not work with these flip classifiers, consider training your own. Click [this link](https://github.com/dattalab/moseq2-app/tree/jupyter/) to view the flip-classifier training notebooks. Once you have it trained, simply add the path to the `config.yaml` file.

In [None]:
from moseq2_extract.gui import download_flip_command
# selection=0 - large mice with fibers (default)
# selection=1 - adult male C57s
# selection=2 - mice with Inscopix cables
download_flip_command(progress_paths['base_dir'], config_filepath, selection=1)

***
<center><h1>Set Parameters for All MoSeq Steps</h1></center>

***

## Extraction Parameters
Set these based on your experience in previous interactive sessions.

In [None]:
from os.path import join
import ruamel.yaml as yaml
from moseq2_app.gui.progress import update_progress

session_config_path = join(progress_paths['base_dir'], 'session_config.yaml')
progress_paths = update_progress(progress_filepath, 'session_config', session_config_path)

with open(progress_paths['config_file'], 'r') as f:
    config_data = yaml.safe_load(f)

config_data['camera_type'] = 'kinect' # 'kinect', 'azure' or 'realsense'
config_data['crop_size'] = (80, 80)

# if using azure or realsense, increase the noise_tolerance
config_data['noise_tolerance'] = 30

# include the file extensions for the depth files you would like to search for and extract.
extensions = ['.avi'] # and/or .dat, .mkv

with open(progress_paths['config_file'], 'w') as f:
    yaml.safe_dump(config_data, f)

## Group Assignment Parameters
This is an oversimplified version of the group assignment module in the interactive version. 

This assumes that your file naming has some structure to it that will allow us to identify the necessary groups. For example, if you're comparing mice administered saline and amphetamine, some consistent tag should distinguish these mice in the data folder names, like "sal" and "amp." Luckily, this is flexible enough to handle many groups, so it is not just limited to two!
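
As an illustration of this kind of tag-based matching, the sketch below assigns a group to a session name by substring lookup. It is only a simplified stand-in, not the `add_group` implementation: `assign_group` and its `default` argument are hypothetical, and the `exact`, `lowercase`, and `negative` flags are modeled loosely on the filtering parameters set in this section.

```python
def assign_group(name, values, groups, exact=False, lowercase=False,
                 negative=False, default='default'):
    """Return the first group whose value matches `name`, else `default`.

    exact:     require the name to equal the value instead of containing it
    lowercase: lowercase the name before comparing (values should be lowercase)
    negative:  invert each match, selecting names that do NOT match the value
    """
    candidate = name.lower() if lowercase else name
    for value, group in zip(values, groups):
        hit = (candidate == value) if exact else (value in candidate)
        if negative:
            hit = not hit
        if hit:
            return group
    return default


values = ['saline_', 'amphetamine_']
groups = ['Saline', 'Amphetamine']
for session_name in ['saline_mouse1', 'amphetamine_mouse2', 'pilot_mouse3']:
    print(session_name, '->', assign_group(session_name, values, groups))
```

Because the first matching value wins, more specific tags should be listed before more general ones when tags overlap.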

In [None]:
# value-group lookup parameters
by = 'SessionName' # or SubjectName
value = ['saline_', 'amphetamine_'] # value of the corresponding key; can be string or list
group = ['Saline', 'Amphetamine'] # designated group name; can be string or corresponding list

# filtering parameters
exact = False # Must be exact value-group match(es)
lowercase = False # look for values after applying lowercase to them
negative = False # select opposite selection than value-group pair(s) given

## PCA Parameters
As before, set these based on experience.

Make sure to add the necessary Dask parameters if you're using Dask.

In [None]:
from os.path import join
import ruamel.yaml as yaml
from moseq2_pca.gui import train_pca_command
from moseq2_app.gui.progress import update_progress

with open(progress_paths['config_file'], 'r') as f:
    config_data = yaml.safe_load(f)

# PCA parameters you may need to configure
config_data['overwrite_pca_train'] = True # THIS ALLOWS THIS TO RUN WITHOUT INTERACTION 
config_data['overwrite_pca_apply'] = True # THIS ALLOWS THIS TO RUN WITHOUT INTERACTION 
config_data['gaussfilter_space'] = (1.5, 1) # Spatial filter for data (Gaussian)
config_data['medfilter_space'] = [0] # Median spatial filter
config_data['medfilter_time'] = [0] # Median temporal filter

# If dataset includes head-attached cables, set missing_data=True
config_data['missing_data'] = False # Set True for dataset with missing/dropped frames to reconstruct respective PCs.
config_data['missing_data_iters'] = 10 # Number of times to iterate over missing data during PCA
config_data['recon_pcs'] = 10 # Number of PCs to use for missing data reconstruction

# Dask Configuration
config_data['dask_port'] = '8787' # port to access Dask Dashboard

# Changepoint computation parameters you may want to configure
config_data['threshold'] = 0.5 # Peak threshold to use for changepoints
config_data['dims'] = 300 # Number of random projections to compare the computed principal components with

with open(progress_paths['config_file'], 'w') as f:
    yaml.safe_dump(config_data, f)

## ARHMM Modeling Parameters

Set these based on experience and prior tests.
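
One of the options in the cell below is `kappa = 'scan'`, with `min_kappa`, `max_kappa`, and `scan_scale` bounding the scan. The sketch below shows what linearly or logarithmically spaced kappa values between two bounds look like. It assumes one kappa value per model (`n_models`) and is not taken from moseq2-model; `kappa_scan_values` is a hypothetical helper, and the bounds in the usage lines are arbitrary.

```python
def kappa_scan_values(min_kappa, max_kappa, n_models, scale='log'):
    """Return `n_models` kappa values spaced between the two bounds."""
    if n_models < 1:
        raise ValueError('n_models must be at least 1')
    if n_models == 1:
        return [float(min_kappa)]
    if scale == 'log':
        # Equal ratios between neighbours: min * (max/min) ** (i / (n - 1))
        ratio = max_kappa / min_kappa
        return [min_kappa * ratio ** (i / (n_models - 1)) for i in range(n_models)]
    if scale == 'linear':
        # Equal differences between neighbours
        step = (max_kappa - min_kappa) / (n_models - 1)
        return [min_kappa + i * step for i in range(n_models)]
    raise ValueError("scale must be 'log' or 'linear'")


# Arbitrary example bounds, chosen only to show the difference in spacing.
print(kappa_scan_values(1e3, 1e7, 5, scale='log'))
print(kappa_scan_values(1e3, 1e7, 5, scale='linear'))
```

A log scale spreads the candidates evenly across orders of magnitude, which is usually more informative when the plausible range spans several decades.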

In [None]:
select_groups = False # select specific groups to model; if False, will model all data as is in moseq2-index.yaml

# model saving frequency (in iterations); will create a checkpoints/ directory containing checkpointed models
checkpoint_freq = -1
use_checkpoint = False # resume training from latest saved checkpoint

# Advanced modeling parameters
hold_out = False # boolean to hold out data subset during the training process
nfolds = 2 # (if hold_out==True): number of folds to hold out during training; 1 fold per session

npcs = 10  # number of PCs being used
max_states = 100 # number of maximum states the ARHMM can end up with

# use robust-ARHMM with t-distribution -> yields fewer states/syllables if True;
# used to constrain accepted behavioral variability
robust = True 

# separate group transition graphs; set to True if ngroups > 1
separate_trans = True 

num_iter = 100 # number of iterations to train model

# syllable length probability distribution prior; (None, int or 'scan'); if None, kappa=nframes
kappa = None 

# if kappa == 'scan', optionally set bounds to scan kappa values between, in either a linear or log-scale.
scan_scale = 'log' # or linear
min_kappa = None
max_kappa = None

# total number of models to spool
n_models = 5

# Select platform to run models on
cluster_type = 'local' # currently supported cluster_types = 'local' or 'slurm'
run_cmd = False # if True, runs the commands via os.system(...)

***
<center><h1>Run All MoSeq Steps</h1></center>

***

This section is designed so that you can now just use **"Run All Below"**. The rest of the notebook will then run all the important MoSeq steps and save their outputs in the appropriate places.

If you wish to visualize any of the steps, you can **open up a new copy of "Main-MoSeq2-Notebook"**, restore the progress variables, and then run whichever visualization steps you like.

### (Convenience Cell) Restore Progress Variables

In [None]:
from moseq2_app.gui.progress import restore_progress_vars

progress_filepath = './progress.yaml'

progress_paths = restore_progress_vars(progress_filepath)

## Run Extraction and Validation

- Keep `extract_all=True` to prevent interactivity
- If `skip_extracted=True`, the command will only search for (and list) sessions that have not been previously extracted.

__Note: If sessions are not listed when running the cell, ensure your selected extension matches that of your depth files.__
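
To illustrate what "skipping extracted sessions" and "matching extensions" mean in practice, here is a rough sketch of such a search. It is not the `extract_found_sessions` implementation. `find_sessions` is a hypothetical helper, and the sketch assumes that a finished extraction leaves a `proc/results_00.yaml` file next to the depth file; check your own session folders before relying on that layout.

```python
import tempfile
from pathlib import Path


def find_sessions(base_dir, extensions, skip_extracted=True):
    """List depth files under base_dir whose suffix is in `extensions`."""
    sessions = []
    for path in sorted(Path(base_dir).rglob('*')):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # Ignore anything already inside an (assumed) proc/ output folder.
        if 'proc' in path.relative_to(base_dir).parts:
            continue
        # Assumption: this file marks a session as already extracted.
        already_done = (path.parent / 'proc' / 'results_00.yaml').exists()
        if skip_extracted and already_done:
            continue
        sessions.append(path)
    return sessions


# Small self-contained demo on a temporary folder layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['session_a', 'session_b']:
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / 'depth.avi').touch()
    (Path(tmp) / 'session_b' / 'proc').mkdir()
    (Path(tmp) / 'session_b' / 'proc' / 'results_00.yaml').touch()
    for found in find_sessions(tmp, ['.avi']):
        print(found.relative_to(tmp))
```

If a search like this returns nothing, the most common cause is the one named in the note above: the extension list does not include the suffix of your depth files.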

In [None]:
from moseq2_extract.gui import extract_found_sessions
from moseq2_app.main import validate_extractions

extract_found_sessions(progress_paths['base_dir'], progress_paths['config_file'], extensions, extract_all=True, skip_extracted=True)
validate_extractions(progress_paths['base_dir'])

### Aggregate your results into one folder and generate an index file.
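
The cell below names the copied files with a format string such as `'{start_time}_{session_name}_{subject_name}'`. As a small sketch of how such a template expands, the hypothetical helper `aggregate_name` fills it from a session metadata dictionary. The `StartTime` key and the example values are assumptions made for illustration.

```python
def aggregate_name(metadata,
                   recording_format='{start_time}_{session_name}_{subject_name}'):
    """Fill the naming template from a session metadata dictionary."""
    fields = {
        'start_time': metadata['StartTime'],      # assumed metadata key
        'session_name': metadata['SessionName'],
        'subject_name': metadata['SubjectName'],
    }
    return recording_format.format(**fields)


# Made-up metadata, only to show the resulting file name.
example = {
    'StartTime': '2021-01-01T10-00-00',
    'SessionName': 'saline_01',
    'SubjectName': 'mouse1',
}
print(aggregate_name(example))
```

Keeping the session and subject names in the file name is what later lets the group assignment step match on them.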

In [None]:
from os.path import join
from moseq2_app.gui.progress import update_progress
from moseq2_extract.gui import aggregate_extract_results_command

recording_format = '{start_time}_{session_name}_{subject_name}' # filename formats for the copied extracted data files

# directory NAME to save all metadata+extracted videos to with above respective name format
aggregate_results_dirname = 'aggregate_results/'

train_data_dir = join(progress_paths['base_dir'], aggregate_results_dirname)
update_progress(progress_filepath, 'train_data_dir', train_data_dir)

# the subpath indicates to only aggregate extracted session paths with that subpath, only change if aggregating data from a different location
index_filepath = aggregate_extract_results_command(progress_paths['base_dir'], recording_format, aggregate_results_dirname)
progress_paths = update_progress(progress_filepath, 'index_file', index_filepath)

### Run Group Setting
This will set the groups based on key terms in the folder names.

In [None]:
from moseq2_viz.gui import add_group

add_group(progress_paths['index_file'], by=by, value=value, group=group, exact=exact, lowercase=lowercase, negative=negative)

## Run PCA Steps

In [None]:
from os.path import join
import ruamel.yaml as yaml
from moseq2_app.gui.progress import update_progress
from moseq2_pca.gui import train_pca_command, apply_pca_command

pca_filename = 'pca' # Name of your PCA model h5 file to be saved
pca_dirname = '_pca/' # Directory to save your computed PCA results
progress_paths = update_progress(progress_filepath, 'pca_dirname', join(progress_paths['base_dir'], pca_dirname))

# Train the PCA
train_pca_command(progress_paths, pca_dirname, pca_filename)

scores_filename = 'pca_scores' # name of the scores file to compute and save
scores_file = join(progress_paths['pca_dirname'], scores_filename+'.h5') # path to input PC scores file to model
progress_paths = update_progress(progress_filepath, 'scores_path', scores_file)

# Apply the PCA
apply_pca_command(progress_paths, scores_filename)

### Run Changepoint Analysis

In [None]:
import ruamel.yaml as yaml
from moseq2_app.gui.progress import update_progress
from moseq2_pca.gui import compute_changepoints_command

changepoints_filename = 'changepoints' # name of the changepoints images to generate
progress_paths = update_progress(progress_filepath, 'changepoints_path', changepoints_filename)

compute_changepoints_command(progress_paths['train_data_dir'], progress_paths, changepoints_filename)

## Run ARHMM Modeling 

In [None]:
from os.path import join
import ruamel.yaml as yaml
from moseq2_model.gui import learn_model_command
from moseq2_app.gui.progress import update_progress

modeling_session_path = 'model-data/'
model_name = 'model.p'
session_path = join(progress_paths['base_dir'], modeling_session_path)
model_path = join(session_path, model_name) # path to save trained model
progress_paths = update_progress(progress_filepath, 'model_path', model_path)
progress_paths = update_progress(progress_filepath, 'model_session_path', session_path)

learn_model_command(progress_paths, hold_out=hold_out, nfolds=nfolds, num_iter=num_iter, max_states=max_states,
                    npcs=npcs, kappa=kappa, separate_trans=separate_trans, robust=robust,
                    checkpoint_freq=checkpoint_freq, use_checkpoint=use_checkpoint, select_groups=select_groups,
                    cluster_type=cluster_type, min_kappa=min_kappa, scan_scale=scan_scale,
                    max_kappa=max_kappa, n_models=n_models, run_cmd=run_cmd, output_dir=modeling_session_path)