# DataJoint U24 - Workflow DeepLabCut

## Interactively run the workflow

The workflow requires a DeepLabCut project with labeled data.
- If you haven't configured the data, refer to [00-DataDownload](./00-DataDownload_Optional.ipynb) and [01-Configure](./01-Configure.ipynb).
- For an overview of the schema structure, refer to [02-WorkflowStructure](02-WorkflowStructure_Optional.ipynb).
- If you'd like a more automatic approach, refer to [03-Automate](03-Automate_optional.ipynb).

Let's change the directory to the package root directory to load the local config, `dj_local_conf.json`.

In [1]:
import os
from pathlib import Path
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd()) == 'notebooks':
    os.chdir('..')
assert os.path.basename(os.getcwd()) == 'workflow-deeplabcut', (
    "Please move to the workflow directory")
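If the notebook might be launched from somewhere other than `notebooks`, a more general approach is to walk up the directory tree until the config file is found. The sketch below is only an illustration: `find_config_dir` is a hypothetical helper, not part of DataJoint or this workflow, and the demo uses a temporary directory instead of a real project.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_config_dir(start: Path,
                    config_name: str = "dj_local_conf.json") -> Optional[Path]:
    """Return the nearest directory at or above `start` containing `config_name`.

    Hypothetical helper for illustration; returns None if nothing is found.
    """
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / config_name).is_file():
            return candidate
    return None


# Demo on a throwaway layout: <tmp>/dj_local_conf.json and <tmp>/notebooks/
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "notebooks").mkdir()
    (root / "dj_local_conf.json").write_text("{}")
    found = find_config_dir(root / "notebooks")
    print(found == root)
```

In a real session, one could pass the returned directory to `os.chdir` before importing the pipeline.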

`pipeline.py` activates the DataJoint `elements` and declares other required tables.

In [2]:
import datajoint as dj
from workflow_deeplabcut.pipeline import lab, subject, session, dlc

Connecting cbroz@tutorial-db.datajoint.io:3306


## Inserting entries into upstream tables

In general, you can manually insert entries into each table by directly providing values for each column as a dictionary. Be sure to follow the type specified in the table definition.

In [5]:
subject.Subject.heading

subject              : varchar(8)                   # 
---
sex                  : enum('M','F','U')            # 
subject_birth_date   : date                         # 
subject_description="" : varchar(1024)                # 

In [None]:
subject.Subject.insert1(dict(subject='subject6', 
                             sex='M', 
                             subject_birth_date='2020-01-01', 
                             subject_description='manuel'))
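To make "follow the type specified in the table definition" concrete, here is a minimal sketch that checks a candidate row against the `Subject` heading printed above before inserting it. `check_subject_row` is a hypothetical helper written for this notebook, not a DataJoint function; the database enforces its own constraints regardless.

```python
from datetime import date

ALLOWED_SEX = {"M", "F", "U"}  # from the enum in the heading above


def check_subject_row(row: dict) -> list:
    """Return a list of problems found in a candidate Subject row."""
    problems = []
    subject = row.get("subject")
    if not isinstance(subject, str) or len(subject) > 8:
        problems.append("subject must be a string of at most 8 characters")
    if row.get("sex") not in ALLOWED_SEX:
        problems.append("sex must be one of 'M', 'F', 'U'")
    try:
        # accepts ISO dates such as '2020-01-01'
        date.fromisoformat(str(row.get("subject_birth_date")))
    except ValueError:
        problems.append("subject_birth_date must be an ISO date (YYYY-MM-DD)")
    if len(row.get("subject_description", "")) > 1024:
        problems.append("subject_description must be at most 1024 characters")
    return problems


good = dict(subject="subject6", sex="M", subject_birth_date="2020-01-01")
bad = dict(subject="subject_too_long", sex="X", subject_birth_date="01/01/2020")
print(check_subject_row(good))
print(check_subject_row(bad))
```

An empty list means the row matches the column types shown in the heading.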

In [7]:
session.Session.describe();

-> Subject
session_datetime     : datetime                     



In [8]:
session.Session.heading

# 
subject              : varchar(8)                   # 
session_datetime     : datetime                     # 

In [9]:
session_keys = [dict(subject='subject6', session_datetime='2021-06-02 14:04:22'),
                dict(subject='subject6', session_datetime='2021-06-03 14:04:22')]
session.Session.insert(session_keys)
session.Session()

subject,session_datetime
subject3,2021-04-30 12:22:15.032000


## Inserting recordings

In [6]:
dlc.VideoRecording.heading

# 
subject              : varchar(8)                   # 
session_datetime     : datetime                     # 
camera_id            : int                          # 
recording_id         : int                          # 
---
recording_start_time : datetime                     # 

The `VideoRecording` table stores each unique recording across sessions, including both model training
videos and videos for later analysis.

In [8]:
dlc.VideoRecording.File.heading

subject              : varchar(8)                   # 
session_datetime     : datetime                     # 
camera_id            : int                          # 
recording_id         : int                          # 
file_path            : varchar(255)                 # filepath of video, relative to root data directory

The related part table, `VideoRecording.File`, allows multiple files for a given recording.

In [9]:
recordings = [{'recording_id': '1',
               'subject': 'subject6',
               'session_datetime': '2021-06-02 14:04:22',
               'recording_start_time': '2021-06-02 14:07:00',
               'file_path': 'openfield-Pranav-2018-10-30/videos/m3v1mp4.mp4',
               'camera_id': '1',
               'paramset_idx': '0'},
              {'recording_id': '2',
               'subject': 'subject6',
               'session_datetime': '2021-06-03 14:04:22',
               'recording_start_time': '2021-06-04 14:07:00',
               'file_path': 'openfield-Pranav-2018-10-30/videos/m3v1mp4-copy.mp4',
               'camera_id': '1',
               'paramset_idx': '0'}]
# insert into the master table first, then the part table;
# ignore_extra_fields skips keys that a given table does not define
dlc.VideoRecording.insert(recordings, ignore_extra_fields=True)
dlc.VideoRecording.File.insert(recordings, ignore_extra_fields=True)
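The two headings above list different columns, so a combined record like the ones in this cell carries fields that each table does not define. DataJoint's `insert` accepts `ignore_extra_fields=True` for that case; the plain-Python sketch below shows the same idea by hand, splitting combined records into one master row per recording and one part row per file. The field tuples are copied from the headings; `project`, `split_rows`, and the example file paths are hypothetical.

```python
MASTER_FIELDS = ("subject", "session_datetime", "camera_id",
                 "recording_id", "recording_start_time")
FILE_FIELDS = ("subject", "session_datetime", "camera_id",
               "recording_id", "file_path")
PRIMARY_KEY = ("subject", "session_datetime", "camera_id", "recording_id")


def project(row: dict, fields: tuple) -> dict:
    """Keep only the named fields of a row."""
    return {name: row[name] for name in fields}


def split_rows(rows: list) -> tuple:
    """Return (master_rows, file_rows) from combined records."""
    masters = {}
    files = []
    for row in rows:
        key = tuple(row[name] for name in PRIMARY_KEY)
        # one master row per recording, however many files it has
        masters.setdefault(key, project(row, MASTER_FIELDS))
        files.append(project(row, FILE_FIELDS))
    return list(masters.values()), files


# Two files belonging to the same (hypothetical) recording
base = {"subject": "subject6", "session_datetime": "2021-06-02 14:04:22",
        "camera_id": 1, "recording_id": 1,
        "recording_start_time": "2021-06-02 14:07:00", "paramset_idx": 0}
rows = [dict(base, file_path="example/videos/part1.mp4"),
        dict(base, file_path="example/videos/part2.mp4")]

master_rows, file_rows = split_rows(rows)
print(len(master_rows), len(file_rows))
```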

## Training a DLC Network

First, we'll add a `ModelTrainingParamSet`. This is a lookup table that we can reference when training a model.

In [5]:
dlc.ModelTrainingParamSet.heading

# Parameters to specify a DLC model training instance
paramset_idx         : smallint                     # 
---
paramset_desc        : varchar(128)                 # 
param_set_hash       : uuid                         # hash identifying this parameterset
params               : longblob                     # dictionary of all applicable parameters
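The `param_set_hash` column lets the table recognize when two parameter sets have identical contents, even if they were entered under different indices. As a rough illustration of how such a hash can be derived from a dictionary, consider the sketch below. `params_to_uuid` is a hypothetical function written for this notebook; the element's actual hashing may differ in detail.

```python
import hashlib
import uuid


def params_to_uuid(params: dict) -> uuid.UUID:
    """Hash a flat dict with string keys into a UUID, independent of key order."""
    hashed = hashlib.md5()
    for key, value in sorted(params.items()):
        hashed.update(str(key).encode())
        hashed.update(str(value).encode())
    return uuid.UUID(hex=hashed.hexdigest())


a = params_to_uuid({"shuffle": 1, "trainingsetindex": 0, "maxiters": 5})
b = params_to_uuid({"maxiters": 5, "trainingsetindex": 0, "shuffle": 1})
c = params_to_uuid({"shuffle": 1, "trainingsetindex": 0, "maxiters": 10})
print(a == b, a == c)
```

Sorting the items first is what makes the result insensitive to the order in which keys were added.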

The `params` longblob should be a dictionary that includes all items to be included in model training via the `train_network` function. At minimum, this is the contents of the project's config file, as well as `shuffle` and `trainingsetindex`, which are not included in the config.

In [None]:
from deeplabcut import train_network
help(train_network) # for more information on optional parameters

Below, we give the parameter set an index and a description, and load the config contents. We can then overwrite any defaults, including `maxiters`, to restrict our training iterations to 5.

In [12]:
import yaml
from element_interface.utils import find_full_path
from workflow_deeplabcut.paths import get_dlc_root_data_dir

paramset_idx = 1; paramset_desc='OpenField'
config_path = find_full_path(get_dlc_root_data_dir(), 
                             'openfield-Pranav-2018-10-30/config.yaml')
with open(config_path, 'rb') as y:
    config_params = yaml.safe_load(y)
training_params = {'shuffle': '1',
                   'trainingsetindex': '0',
                   'maxiters': '5',
                   'scorer_legacy': 'False'}
config_params.update(training_params)
dlc.ModelTrainingParamSet.insert_new_params(paramset_idx=paramset_idx,
                                            paramset_desc=paramset_desc,
                                            params=config_params)

DataJointError: The specified paramset_idx 1 already exists, please pick a different one.
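The "load the config, then overwrite defaults" step in the cell above relies on how `dict.update` resolves overlapping keys. The sketch below uses small stand-in dictionaries rather than a real config file; the keys and values are placeholders, and the `maxiters` default in particular is hypothetical.

```python
# Stand-in for the loaded config contents (placeholder values)
config_params = {"Task": "openfield", "date": "Oct30", "maxiters": 1000}

# Training-specific overrides
training_params = {"shuffle": 1, "trainingsetindex": 0, "maxiters": 5}

# Copy first so the loaded config itself stays untouched
merged = dict(config_params)
merged.update(training_params)  # overlapping keys take the override value

print(merged["maxiters"], config_params["maxiters"])
```

Keys only present in the config are kept, keys only present in the overrides are added, and overlapping keys take the override.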

Then we add a training task to the `TrainingTask` table. The `ModelTraining` table can automatically train and populate all tasks outlined in `TrainingTask`.

In [7]:
dlc.TrainingTask.heading

# Specification for a DLC model training instance
subject              : varchar(8)                   # 
session_datetime     : datetime                     # 
camera_id            : int                          # 
recording_id         : int                          # 
paramset_idx         : smallint                     # 
training_id          : int                          # 
---
model_prefix=""      : varchar(32)                  # 
project_path=""      : varchar(255)                 # DLC's project_path in config relative to root

In [8]:
key=(dlc.VideoRecording&'recording_id=1').fetch1('KEY')
key.update({'paramset_idx':1,'training_id':1,
            'project_path':'openfield-Pranav-2018-10-30/'})
dlc.TrainingTask.insert1(key, skip_duplicates=True)
dlc.TrainingTask()

subject,session_datetime,camera_id,recording_id,paramset_idx,training_id,model_prefix,project_path  DLC's project_path in config relative to root
subject6,2021-06-02 14:04:22,1,1,1,1,,openfield-Pranav-2018-10-30/


In [None]:
dlc.TrainingTask.populate()

In [17]:
dlc.ModelTraining.populate()

In [18]:
dlc.ModelTraining()

subject,session_datetime,camera_id,recording_id,paramset_idx,training_id,"latest_snapshot  latest exact snapshot index (i.e., never -1)",config_template  stored full config file
subject6,2021-06-02 14:04:22,1,1,1,1,5,=BLOB=


To resume training from a previous instance, one would need to 
[edit the relevant config file](https://github.com/DeepLabCut/DeepLabCut/issues/70) and
adjust the `maxiters` value in the paramset (if present) to a higher threshold (e.g., 10 for 5 more iterations).
Empirical work from the Mathis team suggests 200k iterations for any true use-case.

## Tracking Joints/Body Parts

The DLC schema uses a lookup table for managing Body Parts tracked across models.

In [9]:
dlc.BodyPart.heading

body_part            : varchar(32)                  # 
---
body_part_description="" : varchar(1000)                # 

This table is equipped with a helper function to insert all body parts from a given config, and can accept a list of descriptions in the same order. To see the order, you can do a dry run of the function and check the confirmation message.

In [15]:
bp_desc=['Left Ear', 'Right Ear', 'Snout Position', 'Base of Tail']
dlc.BodyPart.insert_from_config(config_path,bp_desc)

Existing body parts: ['leftear' 'rightear' 'snout' 'tailbase']
New body parts: []


Insert 0 new body part(s)? [yes, no]:  no
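The pairing of body parts with descriptions "in the same order" can be sketched in plain Python. This is not the helper's actual implementation; `plan_body_part_inserts` is a hypothetical function that only mirrors the behavior visible in the confirmation message above (existing parts are skipped, new parts get the description at the same position).

```python
def plan_body_part_inserts(config_parts, descriptions, existing):
    """Return rows for body parts that are not yet in the table."""
    if len(config_parts) != len(descriptions):
        raise ValueError("need exactly one description per body part")
    return [dict(body_part=part, body_part_description=desc)
            for part, desc in zip(config_parts, descriptions)
            if part not in existing]


parts = ["leftear", "rightear", "snout", "tailbase"]
descs = ["Left Ear", "Right Ear", "Snout Position", "Base of Tail"]

# Everything already present: nothing to insert, as in the output above
print(plan_body_part_inserts(parts, descs, existing=set(parts)))
# Only 'leftear' present: the remaining three would be inserted
print(plan_body_part_inserts(parts, descs, existing={"leftear"}))
```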


Alternatively, include this description list when declaring a model.

## Declaring a Model

If training appears successful, the result can be inserted into the `Model` table for automatic evaluation.

In [13]:
dlc.Model.insert_new_model(model_name='OpenField-5',dlc_config=config_path,
                           shuffle=1,trainingsetindex=0,
                           model_description='Open field model trained 5 iterations',
                           body_part_descriptions = bp_desc,paramset_idx=1)

NameError: name 'bp_desc' is not defined

In [24]:
dlc.Model()

model_name  user-friendly model name,task  task in the config yaml,date  date in the config yaml,iteration  iteration/version of this model,"snapshotindex  which snapshot for prediction (if -1, latest)",shuffle  which shuffle of the training dataset,trainingsetindex  which training set fraction to generate model,scorer  scorer/network name - DLC's GetScorerName(),config_template  dictionary of the config for analyze_videos(),project_path  DLC's project_path in config relative to root,dlc_version  keeps the deeplabcut version,model_prefix,model_description,paramset_idx
OpenField-1010,openfield,Oct30,0,-1,1,0,DLCresnet50openfieldOct30shuffle1,=BLOB=,openfield-Pranav-2018-10-30,2.2.0.6,,Open field model trained 1010 iterations,1


In [27]:
dlc.BodyPart()

body_part,body_part_description
leftear,Left Ear
rightear,Right Ear
snout,Snout Position
tailbase,Base of Tail


## Model Evaluation

Next, all inserted models can be evaluated with a similar `populate` method, which will
insert the relevant output from DLC's `evaluate_network` function.

In [16]:
dlc.ModelEvaluation.heading

model_name           : varchar(64)                  # user-friendly model name
---
train_iterations     : int                          # Training iterations
train_error          : float                        # Train error (px)
test_error           : float                        # Test error (px)
p_cutoff             : float                        # p-cutoff used
train_error_p        : float                        # Train error with p-cutoff
test_error_p=null    : float                        # Test error with p-cutoff

In [None]:
dlc.ModelEvaluation.populate()
dlc.ModelEvaluation()
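To give a feel for the difference between the plain error columns and their `_p` counterparts, here is a minimal sketch of a mean pixel error with and without a likelihood cutoff. It assumes the error is the mean Euclidean distance between labeled and predicted points, and that the cutoff drops predictions whose likelihood falls below `p_cutoff`. The function and the numbers are made up for illustration and are not DLC's implementation.

```python
import math


def mean_pixel_error(labeled, predicted, likelihoods, p_cutoff=None):
    """Mean Euclidean distance (px), optionally keeping only confident points."""
    distances = [math.dist(truth, guess)
                 for truth, guess, p in zip(labeled, predicted, likelihoods)
                 if p_cutoff is None or p >= p_cutoff]
    return sum(distances) / len(distances) if distances else float("nan")


labeled = [(0, 0), (10, 10), (20, 20)]
predicted = [(3, 4), (10, 10), (20, 30)]
likelihoods = [0.9, 0.95, 0.2]

print(mean_pixel_error(labeled, predicted, likelihoods))        # all points
print(mean_pixel_error(labeled, predicted, likelihoods, 0.6))   # confident only
```

With these toy numbers, the low-confidence third point has the largest distance, so excluding it lowers the mean error.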

## Pose Estimation

To put this model to use, we'll conduct pose estimation on the video generated in the [DataDownload notebook](./00-DataDownload_Optional.ipynb). Here, we can also specify parameters accepted by the `analyze_videos` function as a dictionary.

In [None]:
key=(dlc.VideoRecording&'recording_id=2').fetch1('KEY');
key.update({'model_name': 'OpenField-5', 'task_mode': 'trigger'})
dlc.PoseEstimationTask.insert_estimation_task(key,params={'save_as_csv':True},
                                              skip_duplicates=True)

In [None]:
dlc.PoseEstimation.populate()

By default, DataJoint will store the results of pose estimation in a subdirectory
>  processed_dir / videos / device_<#>_recording_<#>_model_<name>

Here, processed_dir is pulled from `get_dlc_processed_dir`, and the device/recording information 
comes from the `VideoRecording` table. The model name is taken from the primary key of the
`Model` table, with spaces replaced by hyphens.
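As a sketch of the naming pattern described above, the snippet below builds such a path with `pathlib`. `default_output_dir` is a hypothetical stand-in, not the element's own function, and it assumes the device number corresponds to `camera_id`; the processed directory and model name are placeholders.

```python
from pathlib import Path


def default_output_dir(processed_dir, camera_id, recording_id, model_name):
    """Build processed_dir / videos / device_<#>_recording_<#>_model_<name>."""
    model = model_name.replace(" ", "-")  # spaces become hyphens
    folder = f"device_{camera_id}_recording_{recording_id}_model_{model}"
    return Path(processed_dir) / "videos" / folder


print(default_output_dir("/data/processed", 1, 2, "Open Field 5"))
```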
    
We can get this estimation directly as a pandas dataframe.

In [None]:
dlc.PoseEstimation.get_trajectory(key)

<!-- Next Steps -->