# DataJoint U24 - Workflow DeepLabCut

## Interactively run the workflow

The workflow requires a DeepLabCut project with labeled data.
- If you haven't configured the data, refer to [00-DataDownload](./00-DataDownload_Optional.ipynb) and [01-Configure](./01-Configure.ipynb).
- For an overview of the schema structure, refer to [02-WorkflowStructure](02-WorkflowStructure_Optional.ipynb).
- If you'd like a more automatic approach, refer to [03-Automate](03-Automate_optional.ipynb).

Let's change the directory to the package root directory to load the local config, `dj_local_conf.json`.

In [2]:
import os
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd())=='notebooks': os.chdir('..')
assert os.path.basename(os.getcwd())=='workflow-deeplabcut', ("Please move to the "
                                                              + "workflow directory")
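
The cell above assumes the notebook was launched from the `notebooks` folder. As a minimal sketch of a more general approach, the hypothetical helper below (`find_config_dir` is not part of the workflow) walks upward from a starting directory until it finds a folder containing `dj_local_conf.json`.

```python
from pathlib import Path
from typing import Optional


def find_config_dir(start: Path, filename: str = "dj_local_conf.json") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `filename`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / filename).is_file():
            return candidate
    return None  # no config file found at or above `start`


if __name__ == "__main__":
    config_dir = find_config_dir(Path.cwd())
    print(config_dir if config_dir else "dj_local_conf.json not found")
```

If a directory is returned, passing it to `os.chdir` would have the same effect as the cell above.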

`pipeline.py` activates the DataJoint `elements` and declares other required tables.

In [9]:
import datajoint as dj
from workflow_deeplabcut.pipeline import lab, subject, session, train, model

# Directing our pipeline to the appropriate config location
from element_interface.utils import find_full_path
from workflow_deeplabcut.paths import get_dlc_root_data_dir
config_path = find_full_path(get_dlc_root_data_dir(), 
                             'openfield-Pranav-2018-10-30/config.yaml')

### Inserting entries into upstream tables

In general, you can manually insert entries into each table by directly providing values for each column as a dictionary. Be sure to follow the type specified in the table definition.

In [3]:
subject.Subject.heading

subject              : varchar(8)                   # 
---
sex                  : enum('M','F','U')            # 
subject_birth_date   : date                         # 
subject_description="" : varchar(1024)                # 

In [4]:
subject.Subject.insert1(dict(subject='subject6', 
                             sex='F', 
                             subject_birth_date='2020-01-01', 
                             subject_description='hneih_E105'))
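
Because each value must match the type in the table definition, it can help to check a row in plain Python before inserting it. The sketch below is illustrative only: `check_subject_row` is a hypothetical helper, and its limits are copied by hand from the `subject.Subject` heading shown above rather than read from the database.

```python
from datetime import date

# Constraints copied from the `subject.Subject` heading shown above.
MAX_SUBJECT_LEN = 8
ALLOWED_SEX = {"M", "F", "U"}
MAX_DESCRIPTION_LEN = 1024


def check_subject_row(row: dict) -> list:
    """Return a list of problems; an empty list means the row looks valid."""
    problems = []
    subject = row.get("subject", "")
    if not 0 < len(subject) <= MAX_SUBJECT_LEN:
        problems.append(f"subject must be 1-{MAX_SUBJECT_LEN} characters")
    if row.get("sex") not in ALLOWED_SEX:
        problems.append(f"sex must be one of {sorted(ALLOWED_SEX)}")
    try:
        # Expect an ISO date string such as '2020-01-01'.
        date.fromisoformat(str(row.get("subject_birth_date")))
    except ValueError:
        problems.append("subject_birth_date must look like YYYY-MM-DD")
    if len(row.get("subject_description", "")) > MAX_DESCRIPTION_LEN:
        problems.append("subject_description is too long")
    return problems


row = dict(subject="subject6", sex="F",
           subject_birth_date="2020-01-01",
           subject_description="hneih_E105")
print(check_subject_row(row))
```

An empty list means the row passed every check in this sketch; the database still enforces its own constraints on insert.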

In [14]:
subject.Subject & "subject='subject6'"

subject,sex,subject_birth_date,subject_description
subject6,M,2020-01-03,hneih_E105


In [6]:
session.Session.describe();

-> subject.Subject
session_datetime     : datetime(3)                  



In [7]:
session.Session.heading

# 
subject              : varchar(32)                  # 
session_datetime     : datetime(3)                  # 

In [5]:
session_keys = [dict(subject='subject6', session_datetime='2021-06-02 14:04:22'),
                dict(subject='subject6', session_datetime='2021-06-03 14:43:10')]
session.Session.insert(session_keys)

In [6]:
session.Session() & "session_datetime > '2021-06-01 12:00:00'" & "subject='subject6'"

subject,session_datetime
subject6,2021-06-02 14:04:22
subject6,2021-06-03 14:43:10


### Inserting recordings

The `VideoSet` table handles all files generated in the video labeling process, including the `h5`, `csv`, and `png` files under the `labeled-data` directory. While these aren't required for launching DLC training, it may be helpful to retain records. DLC will instead refer to the `mat` file located under the `training-datasets` directory.

In [None]:
train.VideoSet.insert1({'video_set_id': 1})
labeled_dir = 'openfield-Pranav-2018-10-30/labeled-data/m4s1/'
training_files = ['CollectedData_Pranav.h5',
                  'CollectedData_Pranav.csv',
                  'img0000.png']
for file in training_files:
    train.VideoSet.File.insert1({'video_set_id': 1,
                                 'file_path': (labeled_dir + file)})
train.VideoSet.File.insert1({'video_set_id':1, 'file_path': 
                            'openfield-Pranav-2018-10-30/videos/m3v1mp4.mp4'})

In [6]:
train.VideoSet.File()

video_set_id,file_path
1,openfield-Pranav-2018-10-30/labeled-data/m4s1/CollectedData_Pranav.csv
1,openfield-Pranav-2018-10-30/labeled-data/m4s1/CollectedData_Pranav.h5
1,openfield-Pranav-2018-10-30/labeled-data/m4s1/img0000.png
1,openfield-Pranav-2018-10-30/videos/m3v1mp4.mp4
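
Listing each file by hand becomes tedious for larger projects. As a sketch under stated assumptions, the hypothetical helper below (`collect_video_set_rows`, not part of the workflow) scans a labeled-data folder for `h5`, `csv`, and `png` files and builds one row per file, with paths relative to a root directory you supply.

```python
from pathlib import Path

# File types produced by the labeling process, as described above.
LABELED_EXTENSIONS = {".h5", ".csv", ".png"}


def collect_video_set_rows(root_dir: Path, labeled_dir: str, video_set_id: int) -> list:
    """Build one row per labeling file, with paths relative to `root_dir`."""
    folder = root_dir / labeled_dir
    rows = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in LABELED_EXTENSIONS:
            rows.append({"video_set_id": video_set_id,
                         "file_path": path.relative_to(root_dir).as_posix()})
    return rows
```

The resulting list of dictionaries has the same shape as the rows inserted above, so it could be passed to `train.VideoSet.File.insert` in one call.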


### Training a DLC Network

First, we'll add a parameter set to the `TrainingParamSet` table. This is a lookup table that we can reference when training a model.

In [10]:
train.TrainingParamSet.heading

paramset_idx         : smallint                     # 
---
paramset_desc        : varchar(128)                 # 
param_set_hash       : uuid                         # hash identifying this parameterset
params               : longblob                     # dictionary of all applicable parameters

The `params` longblob should be a dictionary that includes all items to be passed to model training via the `train_network` function. At minimum, this is the contents of the project's config file, as well as `shuffle` and `trainingsetindex`, which are not included in the config.

In [None]:
from deeplabcut import train_network
help(train_network) # for more information on optional parameters

Below, we set the parameter set's index and description, then load the config contents. We can then overwrite any defaults, for example setting `maxiters` to 5 to restrict training to 5 iterations.

In [7]:
import yaml

paramset_idx = 1
paramset_desc = 'OpenField'

with open(config_path, 'rb') as y:
    config_params = yaml.safe_load(y)
# overrides applied on top of the project config
training_params = {'shuffle': '1',
                   'trainingsetindex': '0',
                   'maxiters': '5',
                   'scorer_legacy': 'False',
                   'multianimalproject': 'False'}
config_params.update(training_params)
train.TrainingParamSet.insert_new_params(paramset_idx=paramset_idx,
                                         paramset_desc=paramset_desc,
                                         params=config_params)
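
The `param_set_hash` column lets the table recognize when two parameter sets have identical contents. The following is a minimal sketch of how such a hash could be derived from a flat dictionary; `params_to_uuid` is a hypothetical name, and the Element's own hashing may differ in detail.

```python
import hashlib
import uuid


def params_to_uuid(params: dict) -> uuid.UUID:
    """Hash a flat dictionary into a UUID that does not depend on key order."""
    hashed = hashlib.md5()
    # Sorting by key makes the result independent of insertion order.
    for key, value in sorted(params.items()):
        hashed.update(str(key).encode())
        hashed.update(str(value).encode())
    return uuid.UUID(hex=hashed.hexdigest())


first = params_to_uuid({"shuffle": "1", "maxiters": "5"})
second = params_to_uuid({"maxiters": "5", "shuffle": "1"})
print(first == second)  # prints True: same contents, different order
```

Changing any value, such as `maxiters`, yields a different UUID, which is what allows duplicate parameter sets to be detected.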

Then we add a training task to the `TrainingTask` table. The `ModelTraining` table can automatically train and populate all tasks outlined in `TrainingTask`.

In [19]:
train.TrainingTask.heading

video_set_id         : int                          # 
paramset_idx         : smallint                     # 
training_id          : int                          # 
---
model_prefix=""      : varchar(32)                  # 
project_path=""      : varchar(255)                 # DLC's project_path in config relative to root

In [13]:
key={'video_set_id': 1, 'paramset_idx':1,'training_id':1,
     'project_path':'openfield-Pranav-2018-10-30/'}
train.TrainingTask.insert1(key, skip_duplicates=True)
train.TrainingTask()

video_set_id,paramset_idx,training_id,model_prefix,project_path  DLC's project_path in config relative to root
1,1,1,,openfield-Pranav-2018-10-30/


In [8]:
train.ModelTraining.populate()

In [8]:
train.ModelTraining()

video_set_id,paramset_idx,training_id,"latest_snapshot  latest exact snapshot index (i.e., never -1)",config_template  stored full config file
1,1,1,5,=BLOB=


To resume training from a previous instance, one would need to 
[edit the relevant config file](https://github.com/DeepLabCut/DeepLabCut/issues/70) and
raise `maxiters` in the paramset (if present) to a higher threshold (e.g., 10 for 5 more iterations).
Empirical work from the Mathis team suggests 200k iterations for any true use case.

### Tracking Joints/Body Parts

The `model` schema uses a lookup table for managing Body Parts tracked across models.

In [24]:
model.BodyPart.heading

# 
body_part            : varchar(32)                  # 
---
body_part_description="" : varchar(1000)                # 

This table is equipped with two helper functions. First, we can identify all the new body parts from a given config file.

In [16]:
model.BodyPart.extract_new_body_parts(config_path)

Existing body parts: ['leftear' 'rightear' 'snout' 'tailbase']
New body parts: []


array([], dtype='<U8')

Now, we can make a list of descriptions in the same order and insert them into the table.

In [9]:
bp_desc=['Left Ear', 'Right Ear', 'Snout Position', 'Base of Tail']
model.BodyPart.insert_from_config(config_path,bp_desc)

Existing body parts: []
New body parts: ['leftear' 'rightear' 'snout' 'tailbase']
New descriptions: ['Left Ear', 'Right Ear', 'Snout Position', 'Base of Tail']


Insert 4 new body part(s)? [yes, no]:  yes
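
The logic behind these two helpers can be sketched in plain Python: compare the body parts named in the config with those already in the table, then pair each new part with its description. The function names below are hypothetical and only illustrate the idea; they are not the table's actual methods.

```python
def find_new_body_parts(config_parts, existing_parts):
    """Return config body parts that are not yet in the table, keeping config order."""
    existing = set(existing_parts)
    return [part for part in config_parts if part not in existing]


def pair_with_descriptions(new_parts, descriptions=None):
    """Build one row per new body part; descriptions default to empty strings."""
    if descriptions is None:
        descriptions = [""] * len(new_parts)
    if len(descriptions) != len(new_parts):
        raise ValueError("descriptions must match the new body parts one-to-one")
    return [{"body_part": part, "body_part_description": desc}
            for part, desc in zip(new_parts, descriptions)]


config_parts = ["leftear", "rightear", "snout", "tailbase"]
new_parts = find_new_body_parts(config_parts, existing_parts=[])
rows = pair_with_descriptions(
    new_parts, ["Left Ear", "Right Ear", "Snout Position", "Base of Tail"])
print(rows[0])
```

The length check is why the descriptions must be listed in the same order as the new body parts.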


If we skip this step, body parts (without descriptions) will be added when we insert a model. We can [update](https://docs.datajoint.org/python/v0.13/manipulation/3-Cautious-Update.html) empty descriptions at any time.

### Declaring a Model

If training appears successful, the result can be inserted into the `Model` table for automatic evaluation.

In [None]:
model.Model.insert_new_model(model_name='OpenField-5',dlc_config=config_path,
                             shuffle=1,trainingsetindex=0,
                             model_description='Open field model trained 5 iterations',
                             paramset_idx=1)

In [11]:
model.Model()

model_name  user-friendly model name,task  task in the config yaml,date  date in the config yaml,iteration  iteration/version of this model,"snapshotindex  which snapshot for prediction (if -1, latest)",shuffle  which shuffle of the training dataset,trainingsetindex  which training set fraction to generate model,scorer  scorer/network name - DLC's GetScorerName(),config_template  dictionary of the config for analyze_videos(),project_path  DLC's project_path in config relative to root,model_prefix,model_description,paramset_idx
OpenField-5,openfield,Oct30,0,-1,1,0,DLCresnet50openfieldOct30shuffle1,=BLOB=,openfield-Pranav-2018-10-30,,Open field model trained 5 iterations,1


### Model Evaluation

Next, all inserted models can be evaluated with a similar `populate` method, which will
insert the relevant output from DLC's `evaluate_network` function.

In [47]:
model.ModelEvaluation.heading

model_name           : varchar(64)                  # user-friendly model name
---
train_iterations     : int                          # Training iterations
train_error=null     : float                        # Train error (px)
test_error=null      : float                        # Test error (px)
p_cutoff=null        : float                        # p-cutoff used
train_error_p=null   : float                        # Train error with p-cutoff
test_error_p=null    : float                        # Test error with p-cutoff

If your project was initialized in a version of DeepLabCut other than the one you're currently using, model evaluation may report key errors. Specifically, your `config.yaml` may not specify `multianimalproject: false`.

In [13]:
model.ModelEvaluation.populate()

Running  DLC_resnet50_openfieldOct30shuffle1_5  with # of training iterations: 5




Running evaluation ...


116it [01:17,  1.50it/s]


Analysis is done and the results are stored (see evaluation-results) for snapshot:  snapshot-5
Results for 5  training iterations: 95 1 train error: 245.06 pixels. Test error: 247.52  pixels.
With pcutoff of 0.4  train error: 239.24 pixels. Test error: 238.07 pixels
Thereby, the errors are given by the average distances between the labels by DLC and the scorer.
The network is evaluated and the results are stored in the subdirectory 'evaluation_results'.
Please check the results, then choose the best model (snapshot) for prediction. You can update the config.yaml file with the appropriate index for the 'snapshotindex'.
Use the function 'analyze_video' to make predictions on new videos.
Otherwise, consider adding more labeled-data and retraining the network (see DeepLabCut workflow Fig 2, Nath 2019)


In [14]:
model.ModelEvaluation()

model_name  user-friendly model name,train_iterations  Training iterations,train_error  Train error (px),test_error  Test error (px),p_cutoff  p-cutoff used,train_error_p  Train error with p-cutoff,test_error_p  Test error with p-cutoff
OpenField-5,5,245.06,247.52,0.4,239.24,238.07
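
As the DLC output above notes, the reported errors are average distances between the human labels and the network's predictions. The sketch below illustrates that idea, including a likelihood cutoff. It is an assumption about how such a metric can be computed, with made-up toy coordinates, and is not DLC's `evaluate_network` implementation.

```python
import math


def mean_pixel_error(labels, predictions, likelihoods=None, p_cutoff=None):
    """Average Euclidean distance (px) between labeled and predicted points.

    If both `likelihoods` and `p_cutoff` are given, predictions whose
    likelihood falls below the cutoff are left out of the average.
    """
    distances = []
    for i, ((lx, ly), (px, py)) in enumerate(zip(labels, predictions)):
        if p_cutoff is not None and likelihoods is not None \
                and likelihoods[i] < p_cutoff:
            continue
        distances.append(math.hypot(px - lx, py - ly))
    return sum(distances) / len(distances) if distances else float("nan")


# Toy values for illustration only, not taken from the evaluation above.
labels = [(0.0, 0.0), (0.0, 0.0)]
predictions = [(3.0, 4.0), (6.0, 8.0)]
likelihoods = [0.9, 0.2]
print(mean_pixel_error(labels, predictions))                     # prints 7.5
print(mean_pixel_error(labels, predictions, likelihoods, 0.4))   # prints 5.0
```

With only 5 training iterations, errors in the hundreds of pixels are expected on a 640 x 480 video; the cutoff removes low-confidence predictions but cannot compensate for an undertrained network.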


### Pose Estimation

To put this model to use, we'll conduct pose estimation on the video generated in the [DataDownload notebook](./00-DataDownload_Optional.ipynb). First, we need to update the `VideoRecording` table with the recording from a session.

In [12]:
key = {'subject': 'subject6',
       'session_datetime': '2021-06-02 14:04:22',
       'recording_id': '1', 'camera_id': 1}
model.VideoRecording.insert1(key)
# do not include an initial `/` in relative file paths
key.update({'file_id': 1, 
            'file_path': 'openfield-Pranav-2018-10-30/videos/m3v1mp4-copy.mp4'})
model.VideoRecording.File.insert1(key, ignore_extra_fields=True)

In [13]:
model.VideoRecording.File()

subject,session_datetime,recording_id,file_id,"file_path  filepath of video, relative to root data directory"
subject6,2021-06-02 14:04:22,1,1,openfield-Pranav-2018-10-30/videos/m3v1mp4-copy.mp4


To automatically get recording information about this file, we can call `populate` on the `RecordingInfo` table, which runs the table's `make` function for each recording.

In [14]:
model.RecordingInfo.populate()
model.RecordingInfo()

subject,session_datetime,recording_id,px_height  height in pixels,px_width  width in pixels,nframes  number of frames,fps  (Hz) frames per second,recording_datetime  Datetime for the start of the recording,recording_duration  video duration in seconds
subject6,2021-06-02 14:04:22,1,480,640,63,30,,2.1
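
The duration in this row is consistent with the frame count divided by the frame rate. The small sketch below shows that relationship; `recording_duration` is a hypothetical helper, and how the table itself derives the value is an assumption here.

```python
def recording_duration(nframes: int, fps: float) -> float:
    """Duration in seconds for a video with `nframes` frames at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return nframes / fps


# Values from the RecordingInfo row above: 63 frames at 30 Hz.
print(round(recording_duration(63, 30), 2))  # prints 2.1
```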


Next, we need to specify whether the `PoseEstimation` table should load results from an existing file or trigger the estimation command. Here, we can also specify parameters accepted by the `analyze_videos` function as a dictionary.

In [4]:
key = (model.VideoRecording & {'recording_id': '1'}).fetch1('KEY')
key.update({'model_name': 'OpenField-5', 'task_mode': 'trigger'})
key

{'subject': 'subject6',
 'session_datetime': datetime.datetime(2021, 6, 2, 14, 4, 22),
 'camera_id': 1,
 'recording_id': 1,
 'model_name': 'OpenField-5',
 'task_mode': 'trigger'}

In [5]:
model.PoseEstimationTask.insert_estimation_task(key,params={'save_as_csv':True})

In [None]:
model.PoseEstimation.populate()

By default, DataJoint will store the results of pose estimation in a subdirectory
>  processed_dir / videos / device_<#>_recording_<#>_model_<name>

Here, `processed_dir` is pulled from `get_dlc_processed_dir`, and the device/recording information 
comes from the `VideoRecording` table. The model name is taken from the primary key of the
`Model` table, with spaces replaced by hyphens.
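
The naming pattern can be sketched with `pathlib`. This is an illustration of the convention described here, not the Element's own code: `sketch_output_dir` is a hypothetical function, the example processed directory is made up, and treating `camera_id` as the device number is an assumption.

```python
from pathlib import Path


def sketch_output_dir(processed_dir, camera_id, recording_id, model_name) -> Path:
    """Compose processed_dir / videos / device_<#>_recording_<#>_model_<name>."""
    safe_model = model_name.replace(" ", "-")  # spaces become hyphens
    folder = f"device_{camera_id}_recording_{recording_id}_model_{safe_model}"
    return Path(processed_dir) / "videos" / folder


out = sketch_output_dir("/data/processed", camera_id=1, recording_id=1,
                        model_name="OpenField 5")
print(out.name)  # prints device_1_recording_1_model_OpenField-5
```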
    
We can get this estimation directly as a pandas dataframe.

In [9]:
model.PoseEstimation.get_trajectory(key)

scorer,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5
bodyparts,leftear,leftear,leftear,leftear,rightear,rightear,rightear,rightear,snout,snout,snout,snout,tailbase,tailbase,tailbase,tailbase
coords,x,y,z,likelihood,x,y,z,likelihood,x,y,z,likelihood,x,y,z,likelihood
0,0.790677,7.965729,0.0,0.397091,115.835762,164.004028,0.0,0.518405,58.818291,4.837649,0.0,0.514612,4.134376,463.009460,0.0,0.717231
1,2.807120,10.973466,0.0,0.435590,10.124892,470.653931,0.0,0.514644,15.192053,472.954376,0.0,0.509128,4.339864,462.988220,0.0,0.711722
2,9.415764,16.290619,0.0,0.400282,10.313096,470.749420,0.0,0.513927,15.203813,473.046204,0.0,0.509683,4.241215,463.060944,0.0,0.709923
3,8.467562,15.072682,0.0,0.407272,10.299086,470.716309,0.0,0.515085,14.914599,472.946564,0.0,0.507931,4.296385,463.385590,0.0,0.704007
4,1.952696,10.845516,0.0,0.388948,10.309416,470.719910,0.0,0.511848,14.834159,472.920166,0.0,0.504538,4.267960,463.363556,0.0,0.702786
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,5.497818,12.181496,0.0,0.503961,10.725180,470.430847,0.0,0.505526,15.931270,474.692963,0.0,0.507564,9.060750,481.278442,0.0,0.704268
59,4.192788,10.005349,0.0,0.455334,10.476208,470.846588,0.0,0.499014,3.508626,26.821339,0.0,0.537064,3.786860,462.760376,0.0,0.689251
60,2.216149,10.115728,0.0,0.420141,10.644203,471.036102,0.0,0.487316,3.166887,26.835373,0.0,0.548109,8.188313,481.524902,0.0,0.707340
61,5.196610,10.838953,0.0,0.484508,178.007233,72.935913,0.0,0.576688,4.478888,26.513628,0.0,0.531905,4.350879,462.553345,0.0,0.703052
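
Each body part carries a `likelihood` column, which is commonly used to discard low-confidence frames before further analysis. The sketch below works on plain tuples copied from the first five `tailbase` rows above instead of the DataFrame itself; `confident_frames` and the 0.71 threshold are hypothetical choices for illustration.

```python
# (frame, x, y, likelihood) for `tailbase`, copied from the first five rows above.
tailbase = [
    (0, 4.134376, 463.009460, 0.717231),
    (1, 4.339864, 462.988220, 0.711722),
    (2, 4.241215, 463.060944, 0.709923),
    (3, 4.296385, 463.385590, 0.704007),
    (4, 4.267960, 463.363556, 0.702786),
]


def confident_frames(rows, min_likelihood):
    """Return the frame indices whose likelihood meets the threshold."""
    return [frame for frame, _x, _y, likelihood in rows
            if likelihood >= min_likelihood]


print(confident_frames(tailbase, 0.71))  # prints [0, 1]
```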

