# DataJoint U24 - Workflow Behavior

First, please install both `element-deeplabcut` and `workflow-deeplabcut` locally. We 
recommend launching a new conda environment and using `pip install -e ./<dir>`. For more
information, see our [install instructions](https://github.com/kabilar/datajoint-elements/blob/main/install.md). 

Next, let's change directory to the main workflow directory.

In [1]:
import os; from pathlib import Path
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd())=='notebooks': os.chdir('..')
assert os.path.basename(os.getcwd())=='workflow-deeplabcut', ("Please move to the "
                                                              + "workflow directory")

Second, download the example data we'll be using. We will use the
[example openfield data](https://github.com/DeepLabCut/DeepLabCut/tree/master/examples/openfield-Pranav-2018-10-30) 
from the DeepLabCut GitHub repository. If you have already cloned this repository, you 
may have this data on your machine already. [This link](https://downgit.github.io/#/home?url=https://github.com/DeepLabCut/DeepLabCut/tree/master/examples/openfield-Pranav-2018-10-30) via [DownGit](https://downgit.github.io/) will start the single-directory download 
automatically. After downloading, please add the path of the folder that contains this
directory to the `custom` field of your DataJoint config file as `dlc_root_data_dir`. 
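
As a rough sketch of what that config entry might look like, the snippet below writes a `custom` section containing `dlc_root_data_dir` to a JSON file and reads it back. The path `/path/to/data` is a placeholder, and the file is written to a temporary directory so that no real config is touched. A list is used because the workflow treats the root directory as a list.

```python
import json
import tempfile
from pathlib import Path

# Placeholder root; replace with the folder that contains openfield-Pranav-2018-10-30.
example_config = {"custom": {"dlc_root_data_dir": ["/path/to/data"]}}

# Written to a temporary directory here so the sketch does not touch a real config.
config_path = Path(tempfile.mkdtemp()) / "dj_local_conf.json"
config_path.write_text(json.dumps(example_config, indent=4))

# Read it back the same way any JSON config would be parsed.
loaded = json.loads(config_path.read_text())
print(loaded["custom"]["dlc_root_data_dir"])
```

In practice the same `custom` entry would sit alongside the database settings already present in your `dj_local_conf.json`.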

In [2]:
import datajoint as dj; dj.config.load('dj_local_conf.json')
from element_interface.utils import find_full_path
data_dir = find_full_path(dj.config['custom']['dlc_root_data_dir'],
                          'openfield-Pranav-2018-10-30')
assert data_dir.exists(), "Please check that you have the folder openfield-Pranav-2018-10-30"

As part of the DeepLabCut demo setup process, you would run the following additional
commands, as outlined in their 
[demo notebook](https://github.com/DeepLabCut/DeepLabCut/blob/master/examples/JUPYTER/Demo_labeledexample_Openfield.ipynb).
These steps establish the project path within the demo config file.

In [3]:
from deeplabcut.create_project.demo_data import load_demo_data as dlc_load_demo
dlc_load_demo(data_dir / 'config.yaml')

Loaded, now creating training data...
The training dataset is successfully created. Use the function 'train_network' to start training. Happy training!


Later, we'll use the first few seconds of the training video as a 'separate session' to model
the pose estimation feature of this pipeline. `ffmpeg` is a dependency of DeepLabCut
that can splice the training video for demonstration purposes. The command below saves
the first 2 seconds of the training video as a copy.

In [4]:
vid_path = str(data_dir).replace(" ", "\\ ") + '/videos/m3v1mp4'
# -ss 0 -t 2 keeps only the first 2 seconds; -n never overwrites an existing copy
cmd = (f'ffmpeg -n -hide_banner -loglevel error -ss 0 -t 2 -i {vid_path}.mp4 '
       + f'-vcodec copy -acodec copy {vid_path}-copy.mp4')
os.system(cmd)

File '/Volumes/GoogleDrive/My Drive/Dev/DeepLabCut/examples/JUPYTER/openfield-Pranav-2018-10-30/videos/m3v1mp4-copy.mp4' already exists. Exiting.


256
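
The value printed above is the raw status returned by `os.system`. On Unix-like systems, 256 encodes exit code 1, which is consistent with `ffmpeg -n` declining to overwrite the existing copy. As a hedged alternative, the command can be built as an argument list for `subprocess.run`, which avoids escaping spaces by hand. `build_clip_command` and the example path are hypothetical and used only for illustration.

```python
import subprocess  # only needed if the command is actually run (see the comment below)
from pathlib import Path


def build_clip_command(video: Path, seconds: float = 2) -> list:
    """Return an ffmpeg argument list that copies the first `seconds` of `video`."""
    clip = video.with_name(f"{video.stem}-copy{video.suffix}")
    return [
        "ffmpeg", "-n",              # -n: never overwrite an existing output file
        "-hide_banner", "-loglevel", "error",
        "-i", str(video),
        "-t", str(seconds),          # -t: limit the output duration
        "-c", "copy",                # copy streams without re-encoding
        str(clip),
    ]


# Hypothetical location; spaces need no escaping because no shell is involved.
cmd = build_clip_command(Path("/data/My Drive/videos/m3v1mp4.mp4"))
print(cmd)
# To actually run it (requires ffmpeg on PATH):
# result = subprocess.run(cmd)  # result.returncode is the plain exit code
```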

Now, we can activate the `dlc` schema and import some data from files stored in this
directory under `user_data/<file>.csv`. Subject and session data imports like these are 
common across DataJoint workflows. They include fields like `subject_birth_date` and 
`session_datetime`.
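
To give a feel for this kind of import, here is a small sketch that parses CSV text into one dict per row. The column names mirror the `Subject` table shown later in this notebook. The workflow's own ingestion functions and file contents may differ, so treat this as an illustration only.

```python
import csv
import io

# Two rows in the shape of the subject table; a real file would live under user_data/.
subjects_csv = io.StringIO(
    "subject,sex,subject_birth_date,subject_description\n"
    "subject1,M,2020-12-30,test animal\n"
    "subject2,F,2020-11-30,test animal\n"
)

# DictReader maps each row onto the header names, giving one dict per row.
rows = list(csv.DictReader(subjects_csv))
print(rows[0])
# A list like this could then be passed to a table's insert(), for example
# subject.Subject.insert(rows, skip_duplicates=True)
```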

The recordings file specifies all videos across sessions, including both model training
videos and videos for later analysis. The config parameter CSV is used in the 
`ModelTrainingParamSet` table, which features a longblob field for any parameters 
required in model training. Both `shuffle` and `trainingsetindex` are required in this 
field, but many others can be added and later passed to DLC's `train_network` function.
In this case, we set `maxiters` so that training runs only a handful of iterations for
our example model.
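
The required-keys rule can be sketched in a few lines. `REQUIRED_KEYS` and `missing_required` are hypothetical names used for illustration and are not part of the element's API. The values are examples rather than recommended settings.

```python
REQUIRED_KEYS = {"shuffle", "trainingsetindex"}


def missing_required(params: dict) -> set:
    """Return the required keys that are absent from `params`."""
    return REQUIRED_KEYS - set(params)


# Illustrative values only; maxiters is kept small so the demo trains quickly.
params = {"shuffle": 1, "trainingsetindex": 0, "maxiters": 5}
print(missing_required(params))
print(missing_required({"maxiters": 5}))
```

A check like this could run before inserting a parameter set, so that an incomplete dictionary is caught before training starts.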

In [1]:
from workflow_deeplabcut.pipeline import lab, subject, session, dlc
from workflow_deeplabcut.ingest import ingest_subjects, ingest_sessions, ingest_dlc_items
ingest_subjects(); ingest_sessions(); ingest_dlc_items()

Let's look at the tables this populated.

In [6]:
subject.Subject()

subject,sex,subject_birth_date,subject_description
subject1,M,2020-12-30,test animal
subject2,F,2020-11-30,test animal
subject3,F,2020-12-30,test animal
subject4,M,2021-02-12,test animal
subject5,F,2020-01-01,rich
subject6,M,2020-01-01,manuel
subject7,U,2020-08-30,test animal
subject8,F,2020-09-30,test animal
subjectX,F,2020-01-01,manuel
subjectY,M,2020-01-01,manuel


In [7]:
session.Session * session.SessionDirectory * session.SessionNote

subject,session_datetime,session_dir  Path to the data directory for a session,session_note
subject5,2020-04-15 11:16:38,Reaching-Mackenzie-2018-08-30/,"Successful data collection, no notes"
subject6,2021-06-02 14:04:22,openfield-Pranav-2018-10-30/,Model Training Session
subject6,2021-06-03 14:04:22,openfield-Pranav-2018-10-30/,Test Session


Note that the video recording file paths are specified relative to the root directory
defined within the workflow. This allows multiple users to operate on the same 
file structures across different machines. Because the root directory is passed as a 
list, there can be multiple root directories on a given machine.

In [8]:
from workflow_deeplabcut.paths import get_dlc_root_data_dir
for d in get_dlc_root_data_dir():
    print(f'Root: {d}')
dlc.VideoRecording.File()

Root: /Volumes/GoogleDrive/My Drive/Dev/DeepLabCut/examples/JUPYTER/
Root: /Users/cb/Documents/U24_SampleData/


subject,session_datetime,camera_id,recording_id,"file_path  filepath of video, relative to root data directory"
subject5,2020-04-15 11:16:38,1,3,Reaching-Mackenzie-2018-08-30/videos/reachingvideo1.avi
subject6,2021-06-02 14:04:22,1,1,openfield-Pranav-2018-10-30/videos/m3v1mp4.mp4
subject6,2021-06-03 14:04:22,1,2,openfield-Pranav-2018-10-30/videos/m3v1mp4-copy.mp4
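
The idea of resolving a relative path against several candidate roots can be sketched as follows. `resolve_relative_path` is a hypothetical helper written for this illustration. It shows the general approach and is not the implementation behind `find_full_path`. Temporary directories stand in for real data roots.

```python
import tempfile
from pathlib import Path


def resolve_relative_path(roots, relative_path):
    """Return the first root / relative_path that exists, trying roots in order."""
    for root in roots:
        candidate = Path(root) / relative_path
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"{relative_path} not found under any of {list(roots)}")


# Two throwaway roots standing in for two machines' data directories.
root_a = Path(tempfile.mkdtemp())
root_b = Path(tempfile.mkdtemp())
video = root_b / "openfield-Pranav-2018-10-30" / "videos" / "m3v1mp4.mp4"
video.parent.mkdir(parents=True)
video.touch()

found = resolve_relative_path(
    [root_a, root_b], "openfield-Pranav-2018-10-30/videos/m3v1mp4.mp4"
)
print(found == video)
```

Because only the relative part is stored in the database, each user can point the roots at wherever the data lives on their own machine.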


In [9]:
dlc.ModelTrainingParamSet()

paramset_idx,paramset_desc,param_set_hash  hash identifying this parameterset,params  dictionary of all applicable parameters
1,OpenField,acf342ee-75e0-6782-b5ef-f0d7d359aa17,=BLOB=
2,Reaching,8ea3dc9b-e9eb-2709-97ee-9abe32068830,=BLOB=
3,ExtraExample,ee0be706-5703-acbb-0b8b-6c8e56d8ac68,=BLOB=
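
The `param_set_hash` column lets identical parameter sets be recognized regardless of the order in which their keys were entered. One way such a hash could be computed is sketched below. `params_to_uuid` is a hypothetical helper written for illustration, and the element may derive its hash differently.

```python
import hashlib
import uuid


def params_to_uuid(params: dict) -> uuid.UUID:
    """Hash sorted key/value pairs with MD5 and present the digest as a UUID."""
    hashed = hashlib.md5()
    for key, value in sorted(params.items()):
        hashed.update(str(key).encode())
        hashed.update(str(value).encode())
    return uuid.UUID(hex=hashed.hexdigest())


a = params_to_uuid({"shuffle": 1, "trainingsetindex": 0, "maxiters": 5})
b = params_to_uuid({"maxiters": 5, "trainingsetindex": 0, "shuffle": 1})
c = params_to_uuid({"shuffle": 1, "trainingsetindex": 0, "maxiters": 10})
print(a == b, a == c)
```

Sorting the items first makes the result independent of key order, while any change to a value yields a different identifier.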


We can take a closer look at the parameters specified with the `fetch` command. When the
parameter set is loaded with `ingest_dlc_items`, it captures the full `config.yaml`.

In [10]:
import pprint
pprint.pprint((dlc.ModelTrainingParamSet & 'paramset_idx=1'
               ).fetch(as_dict=True))

[{'param_set_hash': UUID('acf342ee-75e0-6782-b5ef-f0d7d359aa17'),
  'params': {'Task': 'openfield',
             'TrainingFraction': [0.95],
             'alphavalue': 0.7,
             'batch_size': 4,
             'bodyparts': ['snout', 'leftear', 'rightear', 'tailbase'],
             'colormap': 'jet',
             'corner2move2': [50, 50],
             'cropping': False,
             'date': 'Oct30',
             'default_augmenter': 'imgaug',
             'default_net_type': 'resnet_50',
             'dotsize': 8,
             'filter_type': '',
             'identity': None,
             'iteration': 0,
             'maxiters': '5',
             'move2corner': True,
             'multianimalproject': None,
             'numframes2pick': 20,
             'pcutoff': 0.4,
             'scorer': 'Pranav',
             'scorer_legacy': 'False',
             'shuffle': '1',
             'skeleton': [],
             'skeleton_color': 'black',
             'snapshotindex': -1,
          

For model training, we'll work with the following session and parameters. First, we 
insert a training task into the queue.

In [15]:
key=(dlc.VideoRecording&'recording_id=1').fetch1('KEY')
key.update({'paramset_idx':1,'training_id':1,
            'project_path':'openfield-Pranav-2018-10-30/'})
dlc.TrainingTask.insert1(key, skip_duplicates=True)

In the next step, all new entries in this table will undergo training.

In [16]:
dlc.TrainingTask()

subject,session_datetime,camera_id,recording_id,paramset_idx,training_id,model_prefix,project_path  DLC's project_path in config relative to root
subject6,2021-06-02 14:04:22,1,1,1,1,,openfield-Pranav-2018-10-30/


In [17]:
dlc.ModelTraining.populate()

In [18]:
dlc.ModelTraining()

subject,session_datetime,camera_id,recording_id,paramset_idx,training_id,"latest_snapshot  latest exact snapshot index (i.e., never -1)",config_template  stored full config file
subject6,2021-06-02 14:04:22,1,1,1,1,5,=BLOB=


To resume training from a previous instance, one would need to 
[edit the relevant config file](https://github.com/DeepLabCut/DeepLabCut/issues/70) and
adjust the `maxiters` parameter to a higher threshold (e.g., 10 for 5 more iterations).
Empirical work from the Mathis team suggests 200k iterations for any true use case.

Next, we can optionally ingest all body parts from a given config with one command, including
a list of body part descriptions.

In [26]:
dlc_config_path = 'openfield-Pranav-2018-10-30/config.yaml'
bp_desc=['Left Ear', 'Right Ear', 'Snout Position', 'Base of Tail']
dlc.BodyPart.insert_from_config(dlc_config_path,bp_desc)

Existing body parts: []
New body parts: ['leftear' 'rightear' 'snout' 'tailbase']


Insert 4 new body part(s)? [yes, no]:  yes


Alternatively, the above step will be included when inserting a model into the `Model` 
table.

In [None]:
dlc.Model.insert_new_model(model_name='OpenField-1010',dlc_config=dlc_config_path,
                           shuffle=1,trainingsetindex=0,
                           model_description='Open field model trained 1010 iterations',
                           body_part_descriptions = bp_desc,paramset_idx=1)

In [24]:
dlc.Model()

model_name  user-friendly model name,task  task in the config yaml,date  date in the config yaml,iteration  iteration/version of this model,"snapshotindex  which snapshot for prediction (if -1, latest)",shuffle  which shuffle of the training dataset,trainingsetindex  which training set fraction to generate model,scorer  scorer/network name - DLC's GetScorerName(),config_template  dictionary of the config for analyze_videos(),project_path  DLC's project_path in config relative to root,dlc_version  keeps the deeplabcut version,model_prefix,model_description,paramset_idx
OpenField-1010,openfield,Oct30,0,-1,1,0,DLCresnet50openfieldOct30shuffle1,=BLOB=,openfield-Pranav-2018-10-30,2.2.0.6,,Open field model trained 1010 iterations,1


In [27]:
dlc.BodyPart()

body_part,body_part_description
leftear,Left Ear
rightear,Right Ear
snout,Snout Position
tailbase,Base of Tail


Next, all inserted models can be evaluated with a similar `populate` method, which will
insert the relevant output from DLC's `evaluate_network` function.

In [None]:
dlc.ModelEvaluation.populate()
dlc.ModelEvaluation()

To put this model to use, we'll conduct pose estimation on the video we made earlier.
Here, we can also specify parameters accepted by the `analyze_videos` function as a 
dictionary.

In [None]:
key = (dlc.VideoRecording & 'recording_id=2').fetch1('KEY')
key.update({'model_name': 'OpenField-1010', 'task_mode': 'trigger'})
dlc.PoseEstimationTask.insert_estimation_task(key,params={'save_as_csv':True},
                                              skip_duplicates=True)

In [None]:
dlc.PoseEstimation.populate()

By default, DataJoint will store the results of pose estimation in a subdirectory
>  processed_dir / videos / device_<#>_recording_<#>_model_<name>

Here, `processed_dir` is pulled from `get_dlc_processed_dir`, and the device and recording
information comes from the `VideoRecording` table. The model name is taken from the
primary key of the `Model` table, with spaces replaced by hyphens.
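
The naming pattern can be sketched as a small function. `sketch_output_dir` and the `/tmp/processed` path are hypothetical. Using the camera ID as the device number is an assumption made for this illustration.

```python
from pathlib import Path


def sketch_output_dir(processed_dir, device_id, recording_id, model_name):
    """Build processed_dir / videos / device_<#>_recording_<#>_model_<name>."""
    safe_name = model_name.replace(" ", "-")  # spaces become hyphens
    folder = f"device_{device_id}_recording_{recording_id}_model_{safe_name}"
    return Path(processed_dir) / "videos" / folder


# Hypothetical processed_dir and model name; ids follow recording 2 from camera 1.
out = sketch_output_dir("/tmp/processed", 1, 2, "Open Field 1010")
print(out.name)
```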
    
We can get this estimation directly as a pandas dataframe.

In [None]:
dlc.PoseEstimation.get_trajectory(key)

# From scratch didactic guide - needs work

This notebook will describe the steps to explore the lab and animal management tables 
created by the elements. Prior to using this notebook, please refer to the README for the installation instructions.

Importing the module `workflow_deeplabcut.pipeline` is sufficient to create tables 
inside the elements. This workflow comes prepackaged with example data and ingestion 
functions to populate lab, subject, and session tables.

## Workflow architecture

In [None]:
from element_lab import lab
from element_animal import subject
from element_session import session

In [None]:
lab.Lab()

In [None]:
dj.Diagram(lab)

In [None]:
subject.Subject()

In [None]:
dj.Diagram(subject)

In [None]:
session.Session()

In [None]:
dj.Diagram(session)

## Explore each table

In [None]:
# check table definition with describe()
subject.Subject.describe()

## Insert data into Manual and Lookup tables

Tables in this workflow are either manual tables or lookup tables. To insert into these tables, DataJoint provides the methods `insert1()` and `insert()`.

In [None]:
subject.Subject.insert1(
    dict(subject='subject1', sex='M', subject_birth_date='2020-12-30', 
         subject_description='test animal'), skip_duplicates=True)
subject.Subject.insert1(
    ('subject2', 'F', '2020-11-30', 'test animal'), skip_duplicates=True)

`skip_duplicates=True` will prevent an error if you already have data for the primary keys in a given entry.
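
The effect of `skip_duplicates` can be illustrated with a toy in-memory analogy. This is not DataJoint code: a plain dict stands in for the table, and `toy_insert1` and `DuplicateKeyError` are hypothetical names invented for this sketch.

```python
class DuplicateKeyError(Exception):
    """Raised by the toy table when a primary key is already present."""


def toy_insert1(table: dict, row: dict, skip_duplicates: bool = False) -> None:
    """Insert `row` keyed by its 'subject' value, mimicking a primary-key check."""
    key = row["subject"]
    if key in table:
        if skip_duplicates:
            return  # silently keep the existing row
        raise DuplicateKeyError(f"duplicate primary key: {key}")
    table[key] = row


table = {}
toy_insert1(table, {"subject": "subject1", "sex": "M"})
# Same key again: ignored, and the stored row is left unchanged.
toy_insert1(table, {"subject": "subject1", "sex": "F"}, skip_duplicates=True)
print(table["subject1"]["sex"])
```

Note that a skipped row is not merged into the existing one, so `skip_duplicates=True` is not a way to update stored values.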

In [None]:
subject.Subject()

In [None]:
# `insert()` takes a list of dicts or tuples
subject.Subject.insert(
    [dict(subject='subject3', sex='F', subject_birth_date='2020-12-30', 
            subject_description='test animal'),
     dict(subject='subject4', sex='M', subject_birth_date='2021-02-12', 
          subject_description='test animal')
    ],
    skip_duplicates=True)
subject.Subject.insert(
    [
        ('subject7', 'U', '2020-08-30', 'test animal'),
        ('subject8', 'F', '2020-09-30', 'test animal')
    ],
    skip_duplicates=True)

In [None]:
subject.Subject()

For more documentation on insert, please refer to the [DataJoint Docs](https://docs.datajoint.io/python/manipulation/1-Insert.html) and the [DataJoint playground](https://playground.datajoint.io/).

## Insert into Manual and Lookup tables with Graphical User Interface