In [1]:
import deeplabcut as dlc
import numpy as np

## Set the variables below to match your setup

In [2]:
USER = 'dan'
PROJECT = 'multiview'
VIDEOS_TO_LABEL = [
    # put the full paths to the videos here
    '/home/dmurphy/multiview-dlc/data/videos/201612201054-Starbuck_Treadmill-Trial03-Speed20-1.avi',
    '/home/dmurphy/multiview-dlc/data/videos/201612201054-Starbuck_Treadmill-Trial03-Speed20-2.avi'
]
VIDEO_NAMES = [
    # put the names of the videos here
    '201612201054-Starbuck_Treadmill-Trial03-Speed20-1',
    '201612201054-Starbuck_Treadmill-Trial03-Speed20-2'
]
PROJECTION_MATRICES = [
    # put your projection matrices here, one per viewpoint
    # they should each be 3 x 4 numpy arrays
    np.array([[ 6.90941206e+01, -4.17703871e+02,  9.93451694e+00,  5.53906543e+02],
              [ 1.29719680e+02, -6.05473646e+01, -3.44204026e+02,  4.48483287e+02],
              [ 9.06237685e-01, -4.20760250e-01,  4.11590870e-02,  2.71965981e+00]]),
    np.array([[ 3.45215721e+02, -2.53024791e+02, -7.60161863e+00,  4.45226712e+02],
              [ 1.02410289e+02,  2.40476680e+01, -3.61350694e+02,  4.09231091e+02],
              [ 9.41345833e-01,  3.28273836e-01, -7.81300900e-02,  2.00987342e+00]])
]
# this is where the labeled videos will be placed in the end
ANALYSIS_OUTPUT_DIR = '/home/dmurphy/multiview-dlc/data2/'
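
#### Before creating the project, it can be worth sanity-checking these settings. The sketch below is not part of DeepLabCut: `check_setup` is a hypothetical helper, it assumes each video name is the video's file name without the extension, and it runs on small placeholder values rather than the real ones above.

```python
import os
import numpy as np

def check_setup(video_paths, video_names, projection_matrices):
    """Hypothetical helper: raise ValueError if the settings are inconsistent."""
    if not (len(video_paths) == len(video_names) == len(projection_matrices)):
        raise ValueError('need one name and one projection matrix per video')
    for path, name, P in zip(video_paths, video_names, projection_matrices):
        # assumption: the video name is the file name without its extension
        stem = os.path.splitext(os.path.basename(path))[0]
        if stem != name:
            raise ValueError('name %r does not match file %r' % (name, path))
        if np.shape(P) != (3, 4):
            raise ValueError('projection matrix for %r is not 3 x 4' % name)
    return True

# placeholder values for illustration only
example_paths = ['/data/videos/trial-1.avi', '/data/videos/trial-2.avi']
example_names = ['trial-1', 'trial-2']
example_matrices = [np.eye(3, 4), np.eye(3, 4)]
print(check_setup(example_paths, example_names, example_matrices))
```

#### With the variables from the cell above, the call would be check_setup(VIDEOS_TO_LABEL, VIDEO_NAMES, PROJECTION_MATRICES).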

In [3]:
# create a new deeplabcut project
cfg = dlc.create_new_project(PROJECT, USER, VIDEOS_TO_LABEL)

## Modify your config file
#### cfg holds the path to the project's configuration file, 'config.yaml', which contains the parameters for the project.
#### Automatic cropping of the images is not supported for a multiview project. You can still crop the images yourself beforehand, as long as you know how to compensate for the cropping when projecting to 3 dimensions.
#### A batch size of more than 1 is also not supported.
#### For this step, just open the config file and update the bodyparts list to match the parts you are tracking.
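
#### One way to compensate for a manual crop, as a minimal sketch: if a view is cropped to a window whose top-left corner sits at pixel (x0, y0), every pixel coordinate shifts by (-x0, -y0), so the projection matrix for the cropped view is the original matrix left-multiplied by that translation. `project` and `crop_projection_matrix` are hypothetical helpers (not DeepLabCut functions), and the camera matrix and offsets are made-up values.

```python
import numpy as np

def project(P, point_3d):
    """Project a 3D point to pixel coordinates with a 3 x 4 projection matrix."""
    homogeneous = P @ np.append(point_3d, 1.0)
    return homogeneous[:2] / homogeneous[2]

def crop_projection_matrix(P, x0, y0):
    """Return the projection matrix for an image cropped at top-left (x0, y0)."""
    shift = np.array([[1.0, 0.0, -x0],
                      [0.0, 1.0, -y0],
                      [0.0, 0.0, 1.0]])
    return shift @ P

# made-up camera for illustration: focal length 100 px, principal point (50, 40)
P = np.array([[100.0, 0.0, 50.0, 0.0],
              [0.0, 100.0, 40.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
point = np.array([0.2, -0.1, 2.0])
full = project(P, point)                                      # pixel in the full frame
cropped = project(crop_projection_matrix(P, 30, 10), point)   # pixel in the cropped frame
print(full, cropped)
```

#### The two printed pixels should differ by exactly the crop offset, which is a quick way to confirm that an adjusted matrix is consistent with the cropped images.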

## Label your data (via DeepLabCut's tool or another means)
#### Inside the project directory, there should now be a directory called 'labeled-data', which contains a subdirectory for each video.
#### You need to put your labeled data into these directories in the same format that DeepLabCut uses:
- each extracted frame should be placed in its video's directory and named 'img_\<frame #\>.png'
- each directory should contain a pandas .hdf file, named 'CollectedData_\<USER\>.h5', holding the coordinates of all labels for that video
#### The labels should be synced: there should be the same number of labels for each video, and the label for frame n in one video should correspond to frame n in every other video
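
#### A rough way to check the syncing requirement is to compare the frame numbers of the extracted images across the video directories. This is only a sketch: `frame_numbers` and `labels_are_synced` are hypothetical helpers, the file lists are invented, and the check looks at image file names only, not at the contents of the .h5 files.

```python
import re

# matches the 'img_<frame #>.png' naming described above
FRAME_PATTERN = re.compile(r'img_(\d+)\.png$')

def frame_numbers(file_names):
    """Return the sorted frame numbers found in a list of image file names."""
    numbers = []
    for name in file_names:
        match = FRAME_PATTERN.search(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)

def labels_are_synced(frames_by_video):
    """True if every video directory holds exactly the same frame numbers."""
    all_numbers = [frame_numbers(names) for names in frames_by_video.values()]
    return all(numbers == all_numbers[0] for numbers in all_numbers)

# invented directory listings, one entry per video
example = {
    'view-1': ['img_10.png', 'img_250.png', 'CollectedData_dan.h5'],
    'view-2': ['img_250.png', 'img_10.png', 'CollectedData_dan.h5'],
}
print(labels_are_synced(example))
```

#### In practice each list could come from os.listdir on the corresponding subdirectory of 'labeled-data'.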

In [None]:
# create training dataset
dlc.create_multiview_training_dataset(cfg, VIDEO_NAMES)

#### If you now descend through the subdirectories of the dlc-models folder, you'll eventually reach a folder with two subdirectories, called 'train' and 'test'. Each contains a file called 'pose_cfg.yaml', which holds further parameters for training the model.
#### Some parameters of interest here:
- **optimizer**: by default (if this parameter is not specified in the file) this is a simple gradient descent optimizer, but you can switch it to 'adam' (I do so for the optional training step that I implemented).
- **multistep**: the schedule for the learning rate. The format is a list of 2-element lists, where the first element is the learning rate and the second element is the step up to which that learning rate will be used. This is ignored if you are using the adam optimizer.
- **adam_lr**: the maximum step size for the adam optimizer (only used if you're using the adam optimizer).
- **display_iters**: the interval, in training steps, at which the training progress is printed
- **save_iters**: the interval, in training steps, at which a snapshot of your model is saved
- **global_scale, scale_jitter_lo, scale_jitter_up**: before feeding each image through the network, deeplabcut rescales it by global_scale, then applies another scaling by a factor randomly chosen between scale_jitter_lo and scale_jitter_up. This data augmentation can help with generalization, but it is ignored in the optional training step that I implemented because it could interfere with the 3D projection.
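
#### To make the multistep format and the scale jitter concrete, here is a minimal sketch of how they could be interpreted. It is not DeepLabCut's code: `learning_rate_at` and `sample_scale` are hypothetical helpers, the schedule is invented, and treating each boundary step as inclusive (and keeping the last rate afterwards) is an assumption.

```python
import random

def learning_rate_at(step, multistep):
    """Return the learning rate for a training step under a multistep schedule."""
    for rate, last_step in multistep:
        if step <= last_step:  # assumption: the boundary step is inclusive
            return rate
    return multistep[-1][0]    # assumption: keep the final rate past the schedule

def sample_scale(global_scale, jitter_lo, jitter_up, rng=random):
    """Return global_scale times a factor drawn uniformly from [jitter_lo, jitter_up]."""
    return global_scale * rng.uniform(jitter_lo, jitter_up)

# invented schedule: 0.01 up to step 1000, then 0.001 up to step 5000
schedule = [[0.01, 1000], [0.001, 5000]]
print(learning_rate_at(500, schedule))
print(learning_rate_at(3000, schedule))
print(sample_scale(0.8, 0.5, 1.25))
```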

#### While the network is training, a subdirectory called 'log' will also appear in these directories. It lets you track the progress of your model with tensorboard: cd into the 'log' directory, run 'tensorboard --logdir=.', and open the url it prints in a browser.

In [None]:
# train the multiview network
# a few warnings are likely, but they should be harmless
# interrupt the cell when you think the network has converged (tensorboard's graphs may be useful for this)
dlc.train_multiview_network_step_1(cfg, PROJECTION_MATRICES)

In [None]:
# evaluate the network's performance
# the fourth argument should be either 1 or 2, to specify whether we are evaluating a network trained via
# train_multiview_network_step_1 or train_multiview_network_step_2
# snapshot_index is the training step that is appended to the name of the snapshot file
# we want to evaluate

# here I am assuming that we trained for 10000 steps (though realistically this is probably too few)
dlc.evaluate_multiview_network(cfg, VIDEO_NAMES, PROJECTION_MATRICES, 1, snapshot_index=10000)
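
#### To find out which values of snapshot_index are available, you can list the snapshot files in the model's 'train' directory. The sketch below assumes the snapshots follow the usual TensorFlow checkpoint naming, 'snapshot-<step>.<suffix>' (for example 'snapshot-10000.index'); `list_snapshot_indices` is a hypothetical helper, and the demo runs on an invented temporary directory rather than a real project.

```python
import os
import re
import tempfile

def list_snapshot_indices(train_dir):
    """Return the sorted training steps of the snapshots found in train_dir."""
    indices = set()
    for file_name in os.listdir(train_dir):
        match = re.match(r'snapshot-(\d+)\.', file_name)
        if match:
            indices.add(int(match.group(1)))
    return sorted(indices)

# demo on a throwaway directory with invented file names
with tempfile.TemporaryDirectory() as demo_dir:
    for file_name in ['snapshot-5000.index', 'snapshot-5000.meta',
                      'snapshot-10000.index', 'checkpoint']:
        open(os.path.join(demo_dir, file_name), 'w').close()
    demo_indices = list_snapshot_indices(demo_dir)
print(demo_indices)
```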

#### OPTIONAL: If the worst predictions are concerning, you could try adding more labeled frames to your dataset. Alternatively, you could fine-tune an augmented version of the network via train_multiview_network_step_2, which can also help improve the worst predictions.
#### NOTE: the augmented network reweights how much emphasis is placed on what we see in each view. Since at least 2 views are needed to get 3D predictions, a network with only 2 views has no view it can afford to down-weight, so this step shouldn't help in that case.
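
#### To illustrate why at least 2 views are needed: each view contributes two linear equations in the three unknown coordinates, so a single view cannot pin the point down. The sketch below uses standard linear (DLT) triangulation, which is a generic textbook method and not necessarily how the multiview network computes its 3D predictions; `triangulate` is a hypothetical helper and the two cameras are made up.

```python
import numpy as np

def triangulate(projection_matrices, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(projection_matrices, pixels):
        # each view gives two linear constraints on the homogeneous 3D point
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.array(rows))
    homogeneous = vt[-1]  # right singular vector of the smallest singular value
    return homogeneous[:3] / homogeneous[3]

# two made-up cameras that differ by a shift along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.5, 0.2, 4.0])

# project the known point into both views, then recover it
pixels = []
for P in (P1, P2):
    h = P @ np.append(point, 1.0)
    pixels.append(h[:2] / h[2])
recovered = triangulate([P1, P2], pixels)
print(recovered)
```

#### With more than 2 views the same function simply stacks more equations, which is what leaves room for weighting some views more than others.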

In [None]:
# usually trains pretty fast, maybe 5000 steps
dlc.train_multiview_network_step_2(cfg, PROJECTION_MATRICES, 10000)

In [None]:
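# evaluate the fine-tuned network: the fourth argument is 2 because it was trained via
# train_multiview_network_step_2, and snapshot_index=5000 assumes it was trained for 5000 steps
dlc.evaluate_multiview_network(cfg, VIDEO_NAMES, PROJECTION_MATRICES, 2, snapshot_index=5000)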

In [4]:
# analyze the videos using snapshot 10000; the labeled videos are placed in ANALYSIS_OUTPUT_DIR
dlc.analyze_videos_multiview(cfg, VIDEOS_TO_LABEL, PROJECTION_MATRICES, 1, ANALYSIS_OUTPUT_DIR, snapshot_index=10000)

10000
10000 1
Using snapshot-10000 for model /home/dmurphy/multiview-dlc/dlc-projects/multiview-dan-2019-06-03/dlc-models/iteration-0/multiviewJun3-trainset95shuffle1
INFO:tensorflow:Restoring parameters from /home/dmurphy/multiview-dlc/dlc-projects/multiview-dan-2019-06-03/dlc-models/iteration-0/multiviewJun3-trainset95shuffle1/train/snapshot-10000


INFO:tensorflow:Restoring parameters from /home/dmurphy/multiview-dlc/dlc-projects/multiview-dan-2019-06-03/dlc-models/iteration-0/multiviewJun3-trainset95shuffle1/train/snapshot-10000
  0%|          | 0/9762 [00:00<?, ?it/s]

Duration of video [s]:  97.62 , recorded with  100.0 fps!
Overall # of frames:  9762  found with (before cropping) frame dimensions:  2048 1088
Extracting pose


  1%|          | 103/9762 [00:57<1:26:14,  1.87it/s]

KeyboardInterrupt: 