# DataJoint U24 - Workflow DeepLabCut

## Interactively run the workflow

The workflow requires a DeepLabCut project with labeled data.
- If you haven't configured the data, refer to [00-DataDownload](./00-DataDownload_Optional.ipynb) and [01-Configure](./01-Configure.ipynb).
- For an overview of the schema structure, refer to [02-WorkflowStructure](02-WorkflowStructure_Optional.ipynb).
- If you'd like a more automatic approach, refer to [03-Automate](03-Automate_optional.ipynb).

Let's change the directory to the package root directory to load the local config, `dj_local_conf.json`.

In [1]:
import os
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd())=='notebooks': os.chdir('..')
assert os.path.basename(os.getcwd())=='workflow-deeplabcut', ("Please move to the "
                                                              + "workflow directory")
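The cell above assumes the notebook is launched either from `notebooks` or from the package root. As a minimal sketch of a more general approach, the helper below walks up from a starting directory until it finds the config file. The name `find_config_dir` is hypothetical and not part of the workflow package.

```python
from pathlib import Path
from typing import Optional


def find_config_dir(start: Path,
                    config_name: str = "dj_local_conf.json") -> Optional[Path]:
    """Return the closest directory at or above `start` that holds `config_name`.

    Returns None when no such directory exists.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for candidate in (start, *start.parents):
        if (candidate / config_name).is_file():
            return candidate
    return None
```

If the result for `Path.cwd()` is not `None`, it can be passed to `os.chdir` before importing the pipeline.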

`pipeline.py` activates the DataJoint `elements` and declares other required tables.

In [None]:
import datajoint as dj
from workflow_deeplabcut.pipeline import lab, subject, session, dlc

## Inserting entries into upstream tables

In general, you can manually insert entries into each table by directly providing values for each column as a dictionary. Be sure to follow the type specified in the table definition.

In [3]:
subject.Subject.heading

subject              : varchar(32)                  # 
---
sex                  : enum('M','F','U')            # 
subject_birth_date   : date                         # 
subject_description="" : varchar(1024)                # 

In [4]:
subject.Subject.insert1(dict(subject='subject6', 
                             sex='M', 
                             subject_birth_date='2020-01-01', 
                             subject_description='manuel'))

In [5]:
subject.Subject()

subject,sex,subject_birth_date,subject_description
subject6,M,2020-01-01,manuel
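Because each value must match the type in the table definition, it can help to check a row before inserting it. The following is a minimal plain-Python sketch based only on the `Subject` heading shown above. `check_subject_row` is a hypothetical helper, and DataJoint performs its own validation on insert regardless.

```python
from datetime import date

# Allowed values, taken from enum('M','F','U') in the heading above.
SEX_CHOICES = {"M", "F", "U"}


def check_subject_row(row: dict) -> list:
    """Return a list of problems found in `row`; an empty list means none found."""
    problems = []
    subject = row.get("subject")
    if not isinstance(subject, str) or not 0 < len(subject) <= 32:
        problems.append("subject must be a non-empty string of at most 32 characters")
    if row.get("sex") not in SEX_CHOICES:
        problems.append("sex must be one of 'M', 'F', 'U'")
    try:
        date.fromisoformat(str(row.get("subject_birth_date")))
    except ValueError:
        problems.append("subject_birth_date must be an ISO date such as 2020-01-01")
    # subject_description is optional and defaults to an empty string.
    if len(str(row.get("subject_description", ""))) > 1024:
        problems.append("subject_description must be at most 1024 characters")
    return problems
```

An empty list suggests the row is safe to pass to `insert1`.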


In [6]:
session.Session.describe();

-> subject.Subject
session_datetime     : datetime(3)                  



In [7]:
session.Session.heading

# 
subject              : varchar(32)                  # 
session_datetime     : datetime(3)                  # 

In [8]:
session_keys = [dict(subject='subject6', session_datetime='2021-06-02 14:04:22'),
                dict(subject='subject6', session_datetime='2021-06-03 14:04:22')]
session.Session.insert(session_keys)
session.Session()

subject,session_datetime
subject6,2021-06-02 14:04:22
subject6,2021-06-03 14:04:22
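With many sessions, the keys can be generated rather than typed. The sketch below builds keys in the same string format as the cell above. The helper name `make_session_keys` and the one-session-per-day spacing are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def make_session_keys(subject: str, first_session: datetime, n_sessions: int) -> list:
    """Build one session key per day, starting at `first_session`."""
    return [
        {
            "subject": subject,
            "session_datetime": (first_session + timedelta(days=day)).strftime(
                "%Y-%m-%d %H:%M:%S"
            ),
        }
        for day in range(n_sessions)
    ]


# Reproduces the two keys inserted above.
session_keys = make_session_keys("subject6", datetime(2021, 6, 2, 14, 4, 22), 2)
```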


## Inserting recordings

In [9]:
dlc.VideoRecording.heading

# 
subject              : varchar(32)                  # 
session_datetime     : datetime(3)                  # 
camera_id            : int                          # 
recording_id         : int                          # 
---
recording_start_time : datetime                     # 

The `VideoRecording` table retains unique recordings and specifies all videos across sessions, including both model training
videos and videos for later analysis.

In [10]:
recordings = [{'recording_id': '1',
               'subject': 'subject6',
               'session_datetime': '2021-06-02 14:04:22',
               'recording_start_time': '2021-06-02 14:07:00',
               'camera_id': '1'},
              {'recording_id': '2',
               'subject': 'subject6',
               'session_datetime': '2021-06-03 14:04:22',
               'recording_start_time': '2021-06-04 14:07:00',
               'camera_id': '1'}]
dlc.VideoRecording.insert(recordings)

The related part table allows for multiple files for a given recording session.

In [11]:
dlc.VideoRecording.File.heading

subject              : varchar(32)                  # 
session_datetime     : datetime(3)                  # 
camera_id            : int                          # 
recording_id         : int                          # 
file_path            : varchar(255)                 # filepath of video, relative to root data directory

In [12]:
recordings[0].update({'file_path': 'openfield-Pranav-2018-10-30/videos/m3v1mp4.mp4'})
recordings[1].update({'file_path': 'openfield-Pranav-2018-10-30/videos/m3v1mp4-copy.mp4'})
dlc.VideoRecording.File.insert(recordings, ignore_extra_fields=True)

In [13]:
dlc.VideoRecording.File()

subject,session_datetime,camera_id,recording_id,"file_path  filepath of video, relative to root data directory"
subject6,2021-06-02 14:04:22,1,1,openfield-Pranav-2018-10-30/videos/m3v1mp4.mp4
subject6,2021-06-03 14:04:22,1,2,openfield-Pranav-2018-10-30/videos/m3v1mp4-copy.mp4
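Passing `ignore_extra_fields=True` lets the same dictionaries serve both tables: `recording_start_time` belongs to `VideoRecording` but not to the `File` part table, so it is dropped on insert. The effect is roughly a projection of each row onto the target table's attributes. Here is a plain-Python sketch of that idea, not DataJoint's implementation; `keep_fields` is a hypothetical helper.

```python
# Attributes of the File part table, copied from the heading above.
FILE_FIELDS = ("subject", "session_datetime", "camera_id", "recording_id", "file_path")


def keep_fields(row: dict, fields: tuple) -> dict:
    """Return a copy of `row` limited to `fields`, raising if any field is missing."""
    missing = [name for name in fields if name not in row]
    if missing:
        raise KeyError(f"row is missing required fields: {missing}")
    return {name: row[name] for name in fields}


recording = {
    "recording_id": "1",
    "subject": "subject6",
    "session_datetime": "2021-06-02 14:04:22",
    "recording_start_time": "2021-06-02 14:07:00",
    "camera_id": "1",
    "file_path": "openfield-Pranav-2018-10-30/videos/m3v1mp4.mp4",
}
file_row = keep_fields(recording, FILE_FIELDS)  # recording_start_time is dropped
```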


The `TrainingVideo` table handles all files generated in the video labeling process, including the `h5`, `csv`, and `png` files under the `labeled-data` directory. While these aren't required for launching DLC training, it may be helpful to retain records. DLC will instead refer to the `mat` file located under the `training-datasets` directory.

In [14]:
dlc.TrainingVideo.insert1({'video_set_id': 1})
csv_path = 'openfield-Pranav-2018-10-30/labeled-data/m4s1/CollectedData_Pranav.csv'
dlc.TrainingVideo.File.insert1({'video_set_id': 1,
                                'file_path': csv_path})

In [15]:
video_key = (dlc.VideoRecording&'recording_id=2').fetch1('KEY')
video_key.update({'video_set_id': 1})
dlc.TrainingVideo.VideoRecording.insert1(video_key)

## Training a DLC Network

First, we'll add a `ModelTrainingParamSet`. This is a lookup table that we can reference when training a model.

In [16]:
dlc.ModelTrainingParamSet.heading

# Parameters to specify a DLC model training instance
paramset_idx         : smallint                     # 
---
paramset_desc        : varchar(128)                 # 
param_set_hash       : uuid                         # hash identifying this parameterset
params               : longblob                     # dictionary of all applicable parameters

The `params` longblob should be a dictionary that includes all items to be included in model training via the `train_network` function. At minimum, this is the contents of the project's config file, as well as `shuffle` and `trainingsetindex`, which are not included in the config.

In [None]:
from deeplabcut import train_network
help(train_network) # for more information on optional parameters

Below, we give the parameter set an index and a description, and load the config contents. We can then overwrite any defaults, such as `maxiters`, to restrict training to 5 iterations.

In [18]:
import yaml
from element_interface.utils import find_full_path
from workflow_deeplabcut.paths import get_dlc_root_data_dir

paramset_idx = 1; paramset_desc='OpenField'
config_path = find_full_path(get_dlc_root_data_dir(), 
                             'openfield-Pranav-2018-10-30/config.yaml')
with open(config_path, 'rb') as y:
    config_params = yaml.safe_load(y)
training_params = {'shuffle': '1',
                   'trainingsetindex': '0',
                   'maxiters': '5',
                   'scorer_legacy': 'False'}
config_params.update(training_params)
dlc.ModelTrainingParamSet.insert_new_params(paramset_idx=paramset_idx,
                                            paramset_desc=paramset_desc,
                                            params=config_params)
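The cell above does two things: it overlays the training overrides on the config contents, and the table stores a `param_set_hash` that identifies the resulting set. The sketch below illustrates both steps in plain Python. The hashing scheme is only one way such a hash could be computed and may differ from the element's own helper; `merge_params` and `params_to_uuid` are hypothetical names, and the example config holds just two keys for brevity.

```python
import hashlib
import uuid


def merge_params(config: dict, overrides: dict) -> dict:
    """Return a new dict in which `overrides` take precedence over `config`."""
    merged = dict(config)  # copy, so the loaded config is left untouched
    merged.update(overrides)
    return merged


def params_to_uuid(params: dict) -> uuid.UUID:
    """Hash a flat parameter dict into a UUID, independent of key order."""
    hashed = hashlib.md5()
    for name, value in sorted(params.items()):
        hashed.update(str(name).encode())
        hashed.update(str(value).encode())
    return uuid.UUID(hex=hashed.hexdigest())


config = {"Task": "openfield", "batch_size": 4}
overrides = {"shuffle": "1", "trainingsetindex": "0", "maxiters": "5"}
merged = merge_params(config, overrides)
paramset_hash = params_to_uuid(merged)
```

Two dictionaries with the same keys and values yield the same UUID, which is what makes a hash of this kind useful for spotting a duplicate parameter set.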

Then we add a training task to the `TrainingTask` table. The `ModelTraining` table can automatically train and populate all tasks outlined in `TrainingTask`.

In [19]:
dlc.TrainingTask.heading

# Specification for a DLC model training instance
video_set_id         : int                          # 
paramset_idx         : smallint                     # 
training_id          : int                          # 
---
model_prefix=""      : varchar(32)                  # 
project_path=""      : varchar(255)                 # DLC's project_path in config relative to root

In [20]:
key={'video_set_id': 1, 'paramset_idx':1,'training_id':1,
     'project_path':'openfield-Pranav-2018-10-30/'}
dlc.TrainingTask.insert1(key, skip_duplicates=True)
dlc.TrainingTask()

video_set_id,paramset_idx,training_id,model_prefix,project_path  DLC's project_path in config relative to root
1,1,1,,openfield-Pranav-2018-10-30/


In [21]:
pip install numpy==1.20

Note: you may need to restart the kernel to use updated packages.


In [None]:
dlc.ModelTraining.populate()

In [23]:
dlc.ModelTraining()

video_set_id,paramset_idx,training_id,"latest_snapshot  latest exact snapshot index (i.e., never -1)",config_template  stored full config file
1,1,1,5,=BLOB=


To resume training from a previous instance, one would need to
[edit the relevant config file](https://github.com/DeepLabCut/DeepLabCut/issues/70) and
adjust the `maxiters` parameter in the paramset (if present) to a higher threshold (e.g., 10 for 5 more iterations).
Empirical work from the Mathis team suggests 200k iterations for any true use-case.
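Since `maxiters` is a total rather than an increment, the new value is the old one plus the additional iterations wanted. A minimal sketch of that arithmetic follows; `extend_maxiters` is a hypothetical helper, and it returns an integer even though this notebook stores the value as a string.

```python
def extend_maxiters(params: dict, extra_iterations: int) -> dict:
    """Return a copy of `params` whose `maxiters` is raised by `extra_iterations`."""
    if extra_iterations <= 0:
        raise ValueError("extra_iterations must be positive")
    updated = dict(params)
    # int() accepts both the string form used in this notebook and plain integers.
    updated["maxiters"] = int(params.get("maxiters", 0)) + extra_iterations
    return updated


resumed = extend_maxiters({"maxiters": "5"}, 5)  # 5 trained + 5 more
```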

## Tracking Joints/Body Parts

The DLC schema uses a lookup table for managing Body Parts tracked across models.

In [24]:
dlc.BodyPart.heading

# 
body_part            : varchar(32)                  # 
---
body_part_description="" : varchar(1000)                # 

This table is equipped with two helper functions. First, we can identify all the new body parts from a given config file.

In [25]:
dlc.BodyPart.extract_new_body_parts(config_path)

Existing body parts: []
New body parts: ['leftear' 'rightear' 'snout' 'tailbase']


array(['leftear', 'rightear', 'snout', 'tailbase'], dtype='<U8')

Now, we can make a list of descriptions in the same order and insert them into the table.

In [26]:
bp_desc=['Left Ear', 'Right Ear', 'Snout Position', 'Base of Tail']
dlc.BodyPart.insert_from_config(config_path,bp_desc)

Existing body parts: []
New body parts: ['leftear' 'rightear' 'snout' 'tailbase']
New descriptions: ['Left Ear', 'Right Ear', 'Snout Position', 'Base of Tail']


Insert 4 new body part(s)? [yes, no]:  yes
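The descriptions are matched to body parts purely by position, so a list that is too short or in the wrong order would mislabel entries. The sketch below shows the pairing with a length check. `pair_descriptions` is a hypothetical helper, not a method of `dlc.BodyPart`.

```python
def pair_descriptions(body_parts, descriptions) -> list:
    """Pair each body part with the description at the same position."""
    body_parts = list(body_parts)
    descriptions = list(descriptions)
    if len(body_parts) != len(descriptions):
        raise ValueError(
            f"{len(body_parts)} body parts but {len(descriptions)} descriptions"
        )
    return [
        {"body_part": part, "body_part_description": desc}
        for part, desc in zip(body_parts, descriptions)
    ]


rows = pair_descriptions(
    ["leftear", "rightear", "snout", "tailbase"],
    ["Left Ear", "Right Ear", "Snout Position", "Base of Tail"],
)
```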


If we skip this step, body parts (without descriptions) will be added when we insert a model. We can [update](https://docs.datajoint.org/python/v0.13/manipulation/3-Cautious-Update.html) empty descriptions at any time.

## Declaring a Model

If training appears successful, the result can be inserted into the `Model` table for automatic evaluation.

In [27]:
dlc.Model.insert_new_model(model_name='OpenField-5',dlc_config=config_path,
                           shuffle=1,trainingsetindex=0,
                           model_description='Open field model trained 5 iterations',
                           paramset_idx=1)

--- DLC Model specification to be inserted ---
	model_name: OpenField-5
	model_description: Open field model trained 5 iterations
	scorer: DLCresnet50openfieldOct30shuffle1
	task: openfield
	date: Oct30
	iteration: 0
	snapshotindex: -1
	shuffle: 1
	trainingsetindex: 0
	project_path: openfield-Pranav-2018-10-30
	paramset_idx: 1
	-- Template for config.yaml --
		Task: openfield
		TrainingFraction: [0.95]
		batch_size: 4
		cropping: False
		date: Oct30
		iteration: 0
		project_path: /Volumes/GoogleDrive/My Drive/Dev/DeepLabCut/examples/JUPYTER/openfield-Pranav-2018-10-30
		snapshotindex: -1
		x1: 0
		x2: 640
		y1: 277
		y2: 624


Proceed with new DLC model insert? [yes, no]:  yes


Existing body parts: ['leftear' 'rightear' 'snout' 'tailbase']
New body parts: []


In [28]:
dlc.Model()

model_name  user-friendly model name,task  task in the config yaml,date  date in the config yaml,iteration  iteration/version of this model,"snapshotindex  which snapshot for prediction (if -1, latest)",shuffle  which shuffle of the training dataset,trainingsetindex  which training set fraction to generate model,scorer  scorer/network name - DLC's GetScorerName(),config_template  dictionary of the config for analyze_videos(),project_path  DLC's project_path in config relative to root,model_prefix,model_description,paramset_idx
OpenField-5,openfield,Oct30,0,-1,1,0,DLCresnet50openfieldOct30shuffle1,=BLOB=,openfield-Pranav-2018-10-30,,Open field model trained 5 iterations,1


In [29]:
dlc.BodyPart()

body_part,body_part_description
leftear,Left Ear
rightear,Right Ear
snout,Snout Position
tailbase,Base of Tail


## Model Evaluation

Next, all inserted models can be evaluated with a similar `populate` method, which will
insert the relevant output from DLC's `evaluate_network` function.

In [30]:
dlc.ModelEvaluation.heading

model_name           : varchar(64)                  # user-friendly model name
---
train_iterations     : int                          # Training iterations
train_error=null     : float                        # Train error (px)
test_error=null      : float                        # Test error (px)
p_cutoff=null        : float                        # p-cutoff used
train_error_p=null   : float                        # Train error with p-cutoff
test_error_p=null    : float                        # Test error with p-cutoff

In [None]:
dlc.ModelEvaluation.populate()

In [32]:
dlc.ModelEvaluation()

model_name  user-friendly model name,train_iterations  Training iterations,train_error  Train error (px),test_error  Test error (px),p_cutoff  p-cutoff used,train_error_p  Train error with p-cutoff,test_error_p  Test error with p-cutoff
OpenField-5,5,148.49,156.75,0.4,82.55,76.76


## Pose Estimation

To put this model to use, we'll conduct pose estimation on the video generated in the [DataDownload notebook](./00-DataDownload_Optional.ipynb). Here, we can also specify parameters accepted by the `analyze_videos` function as a dictionary.

In [3]:
key=(dlc.VideoRecording&'recording_id=2').fetch1('KEY');
key.update({'model_name': 'OpenField-5', 'task_mode': 'trigger'})
dlc.PoseEstimationTask.insert_estimation_task(key,params={'save_as_csv':True},
                                              skip_duplicates=True)

In [None]:
dlc.PoseEstimation.populate()

By default, DataJoint will store the results of pose estimation in a subdirectory
>  processed_dir / videos / device_<#>_recording_<#>_model_<name>

Here, `processed_dir` is pulled from `get_dlc_processed_dir`, and the device/recording information
comes from the `VideoRecording` table. The model name is taken from the primary key of the
`Model` table, with spaces replaced by hyphens.
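The naming pattern can be reproduced with `pathlib`, which is handy for locating results by hand. This is a sketch of the pattern as described here, not the element's own code. `pose_output_dir` and the example `/data/processed` root are hypothetical, and `camera_id` is assumed to fill the `device_<#>` slot.

```python
from pathlib import Path


def pose_output_dir(processed_dir, camera_id, recording_id, model_name: str) -> Path:
    """Build processed_dir / videos / device_<#>_recording_<#>_model_<name>."""
    safe_name = model_name.replace(" ", "-")  # spaces in the model name become hyphens
    folder = f"device_{camera_id}_recording_{recording_id}_model_{safe_name}"
    return Path(processed_dir) / "videos" / folder


output_dir = pose_output_dir("/data/processed", 1, 2, "OpenField-5")
```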
    
We can get this estimation directly as a pandas dataframe.

In [5]:
dlc.PoseEstimation.get_trajectory(key)

scorer,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5,OpenField-5
bodyparts,leftear,leftear,leftear,leftear,rightear,rightear,rightear,rightear,snout,snout,snout,snout,tailbase,tailbase,tailbase,tailbase
coords,x,y,z,likelihood,x,y,z,likelihood,x,y,z,likelihood,x,y,z,likelihood
0,-2.422083,4.344821,0.0,0.550124,103.509773,154.843369,0.0,0.494453,26.769926,27.644077,0.0,0.345101,12.271347,25.387495,0.0,0.420643
1,-3.597348,4.784353,0.0,0.570660,129.002899,158.958939,0.0,0.497367,113.209633,111.148224,0.0,0.396401,11.662391,25.403496,0.0,0.409297
2,-1.888346,4.047595,0.0,0.521887,26.252184,5.579991,0.0,0.431996,111.761734,114.333969,0.0,0.431438,12.388601,25.376640,0.0,0.381368
3,-2.663505,4.979667,0.0,0.553423,26.800587,6.133034,0.0,0.429278,634.744995,28.070696,0.0,0.353685,11.839536,24.747765,0.0,0.389143
4,-3.101933,4.946546,0.0,0.552119,117.008659,145.359375,0.0,0.427354,125.948250,110.696831,0.0,0.403272,11.647130,24.026539,0.0,0.382323
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,-2.179861,4.917321,0.0,0.543360,43.786873,4.242162,0.0,0.440749,70.179886,11.257265,0.0,0.385803,30.412106,22.074944,0.0,0.387526
59,-3.125555,5.428480,0.0,0.522461,43.495945,4.991209,0.0,0.433459,180.951401,125.325356,0.0,0.387515,30.751884,22.198009,0.0,0.371095
60,-2.475067,5.363192,0.0,0.550597,43.691952,4.568588,0.0,0.418626,28.472328,29.518694,0.0,0.372502,31.054819,22.189482,0.0,0.383042
61,-2.877043,5.124061,0.0,0.558322,43.844006,4.631758,0.0,0.438815,85.561989,12.051997,0.0,0.374683,30.825670,22.180286,0.0,0.397028
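Each body part carries a `likelihood` column, which is commonly used to discard low-confidence frames before further analysis. The plain-Python sketch below applies a threshold to the first five `snout` rows shown above. `confident_frames` is a hypothetical helper, and the 0.4 threshold simply mirrors the `p_cutoff` reported in the evaluation table.

```python
def confident_frames(track, min_likelihood: float) -> list:
    """Return the indices of frames whose likelihood meets the threshold.

    `track` is a sequence of (x, y, likelihood) tuples, one per frame.
    """
    return [
        frame
        for frame, (_x, _y, likelihood) in enumerate(track)
        if likelihood >= min_likelihood
    ]


# (x, y, likelihood) for the snout in frames 0-4 of the output above.
snout = [
    (26.769926, 27.644077, 0.345101),
    (113.209633, 111.148224, 0.396401),
    (111.761734, 114.333969, 0.431438),
    (634.744995, 28.070696, 0.353685),
    (125.948250, 110.696831, 0.403272),
]
kept = confident_frames(snout, 0.4)
```

With a model trained for only 5 iterations, most likelihoods sit near or below this threshold, so few frames survive the filter.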

