## Supervised behavior analysis

This notebook takes DeepLabCut (DLC) tracks and behavior labels annotated in BORIS, and trains an ML model to predict behavior from the tracking data.

In [1]:
#Imports
import os
from glob import glob 

#Don't use the GPU for this code
os.environ["CUDA_VISIBLE_DEVICES"] = ''

We use the 'behaveml' package to manage the behavior and tracking data, which it stores in a VideosetDataFrame object. The package also handles some postprocessing and computes features useful for ML.

In [3]:
from behaveml import VideosetDataFrame, clone_metadata
from behaveml.io import get_sample_data_paths
from behaveml import mars_feature_maker, cnn_probability_feature_maker, interpolate_lowconf_points

In [4]:
tracking_files, boris_files = get_sample_data_paths()

To create the VideosetDataFrame object, we provide a list of DLC tracking files and a list of BORIS behavior label files, ordered so that each .boris file matches the DLC track for the same video.

In [6]:
frame_length = None              # (float) length of entire horizontal shot; not passed to clone_metadata below
fps = 30                         # (int) frames per second

#Metadata is a dictionary that attaches the parameters passed below (here, fps) to each video/behavior annotation pair
metadata = clone_metadata(tracking_files, 
                          label_files = boris_files, 
                          fps = fps)

dataset = VideosetDataFrame(metadata)
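
Because the two lists are matched by position, it can be worth checking the pairing before building the metadata. The following is a minimal sketch, not part of behaveml: the helper `pair_files` and the file names are hypothetical, and it assumes each tracking file and its annotation file share a leading session identifier, so that sorting both lists by basename aligns them.

```python
import os

def pair_files(tracking_files, label_files):
    """Sort both lists by basename and pair them positionally.

    Assumption: matching files share a leading session identifier,
    so the two sorted lists line up one-to-one.
    """
    if len(tracking_files) != len(label_files):
        raise ValueError("Need exactly one label file per tracking file")
    tracks = sorted(tracking_files, key=os.path.basename)
    labels = sorted(label_files, key=os.path.basename)
    return list(zip(tracks, labels))

# Hypothetical file names, for illustration only
example_tracks = ["dlc/session_b_DLC.csv", "dlc/session_a_DLC.csv"]
example_labels = ["boris/session_a.boris", "boris/session_b.boris"]

pairs = pair_files(example_tracks, example_labels)
for track, label in pairs:
    print(os.path.basename(track), "<->", os.path.basename(label))
```

Printing the pairs and inspecting them by eye is a cheap safeguard, since a mismatch would silently attach labels to the wrong video.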

In [7]:
#Filter out low-confidence DLC tracks and interpolate those points instead
print("Interpolating low-confidence tracking points")
interpolate_lowconf_points(dataset)

#Now create features on this dataset
print("Calculating MARS features")
dataset.add_features(mars_feature_maker, 
                     featureset_name = 'MARS', 
                     add_to_features = True)

#Note: by default this keras code will try to use CUDA. 
print("Calculating 1D CNN pretrained network features")
dataset.add_features(cnn_probability_feature_maker, 
                     featureset_name = '1dcnn', 
                     add_to_features = True)

Interpolating low-confidence tracking points
processing /home/blansdel/projects/behaveml/behaveml/data/dlc/e3v813a-20210610T120637-121213DLC_dlcrnetms5_pilot_studySep24shuffle1_100000_el_filtered.csv
processing /home/blansdel/projects/behaveml/behaveml/data/dlc/e3v813a-20210610T121558-122141DLC_dlcrnetms5_pilot_studySep24shuffle1_100000_el_filtered.csv
processing /home/blansdel/projects/behaveml/behaveml/data/dlc/e3v813a-20210610T122332-122642DLC_dlcrnetms5_pilot_studySep24shuffle1_100000_el_filtered.csv
processing /home/blansdel/projects/behaveml/behaveml/data/dlc/e3v813a-20210610T122758-123309DLC_dlcrnetms5_pilot_studySep24shuffle1_100000_el_filtered.csv
processing /home/blansdel/projects/behaveml/behaveml/data/dlc/e3v813a-20210610T123521-124106DLC_dlcrnetms5_pilot_studySep24shuffle1_100000_el_filtered.csv
Calculating MARS features
['adult_x_nose', 'adult_x_leftear', 'adult_x_rightear', 'adult_x_neck', 'adult_x_lefthip', 'adult_x_righthip', 'adult_x_tail', 'adult_y_nose', 'adult_y_le

2022-03-08 12:04:31.222081: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-03-08 12:04:31.223911: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-03-08 12:04:31.293855: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-03-08 12:04:31.293899: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: splpdnb02.stjude.sjcrh.local
2022-03-08 12:04:31.293907: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: splpdnb02.stjude.sjcrh.local
2022-03-08 12:04:31.294397: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 495.44.0
2022-03-08 12:04:31.294427: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 495.44.0
2022-03-08 12:04:31.2

Building baseline 1D CNN model with parameters:
dropout_rate: 0.5, learning_rate: 0.0001, layer_channels: (128, 64, 64), conv_size: 5


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:08<00:00,  1.37s/it]


Building baseline 1D CNN model with parameters:
dropout_rate: 0.5, learning_rate: 0.0001, layer_channels: (128, 64, 64), conv_size: 5


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:08<00:00,  1.42s/it]


Building baseline 1D CNN model with parameters:
dropout_rate: 0.5, learning_rate: 0.0001, layer_channels: (128, 64, 64), conv_size: 5


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:08<00:00,  1.44s/it]


Building baseline 1D CNN model with parameters:
dropout_rate: 0.5, learning_rate: 0.0001, layer_channels: (128, 64, 64), conv_size: 5


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:09<00:00,  1.63s/it]


['1dcnn__prob_attack',
 '1dcnn__prob_investigation',
 '1dcnn__prob_mount',
 '1dcnn__prob_other']
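
To make the interpolation step above more concrete, here is a minimal NumPy sketch of the general idea: frames whose tracker confidence falls below a cutoff are discarded and refilled by linear interpolation between the neighboring reliable frames. This is an illustration only, not behaveml's implementation; the function name `interpolate_lowconf` and the threshold of 0.9 are assumptions.

```python
import numpy as np

def interpolate_lowconf(coords, likelihood, threshold=0.9):
    """Replace low-confidence samples by linear interpolation over time.

    coords: one coordinate (e.g. x of one body part) per frame.
    likelihood: tracker confidence per frame.
    threshold: hypothetical cutoff, not necessarily what behaveml uses.
    """
    coords = np.asarray(coords, dtype=float)
    good = np.asarray(likelihood) >= threshold
    if not good.any():
        return coords.copy()  # nothing reliable to interpolate from
    frames = np.arange(len(coords))
    # np.interp holds the nearest reliable value constant beyond the ends
    return np.interp(frames, frames[good], coords[good])

# Toy track: frame 2 is an obvious low-confidence outlier
x = [10.0, 11.0, 50.0, 13.0, 14.0]
p = [0.99, 0.98, 0.20, 0.97, 0.99]
x_fixed = interpolate_lowconf(x, p)
print(x_fixed)  # frame 2 becomes 12.0, halfway between its neighbors
```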

## Supervised learning

As an example, we fit an XGBoost classifier and evaluate it with leave-one-video-out cross-validation.
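
The cross-validation in this section uses `GroupKFold` with one split per video, so each fold holds out every frame of a single video. A small sketch on toy data (the arrays and the three "videos" are made up for illustration) shows that setting `n_splits` to the number of groups gives this leave-one-group-out behavior:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 6 frames from 3 hypothetical videos, 2 frames each
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])

n_groups = len(np.unique(groups))
toy_splitter = GroupKFold(n_splits=n_groups)

held_out = []
for train_idx, test_idx in toy_splitter.split(X, y, groups=groups):
    test_groups = set(groups[test_idx].tolist())
    train_groups = set(groups[train_idx].tolist())
    # Each fold tests on exactly one video that is absent from training
    assert len(test_groups) == 1 and test_groups.isdisjoint(train_groups)
    held_out.append(test_groups.pop())

print(sorted(held_out))
```

Splitting by video rather than by frame matters because neighboring frames are highly correlated; a frame-level split would let near-duplicates of the test frames into the training set.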

In [8]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_predict, GroupKFold
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.pipeline import Pipeline

splitter = GroupKFold(n_splits = dataset.n_videos)
model = XGBClassifier()

print("Fitting ML model with (group) LOO CV")
predictions = cross_val_predict(model, 
                                dataset.features, 
                                dataset.labels, 
                                groups = dataset.group, 
                                cv = splitter,
                                verbose = 1,
                                n_jobs = 5)

#Store the predictions alongside the data for later use
dataset.data['prediction'] = predictions
acc = accuracy_score(dataset.labels, predictions)
f1 = f1_score(dataset.labels, predictions)
pr = precision_score(dataset.labels, predictions)
re = recall_score(dataset.labels, predictions)
print("Acc", acc, "F1", f1, 'precision', pr, 'recall', re)

Fitting ML model with (group) LOO CV


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:  3.7min remaining:  5.5min


Acc 0.9378347067695468 F1 0.7299145299145299 precision 0.7713769570453634 recall 0.6926820475847152


[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:  5.0min finished
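
As a quick consistency check on the reported metrics: for binary labels (the default setting of `f1_score`, as used above), F1 is the harmonic mean of precision and recall. The sketch below copies the values from the printed output and recomputes F1 from them.

```python
import math

# Values copied from the printed output above
precision = 0.7713769570453634
recall = 0.6926820475847152
reported_f1 = 0.7299145299145299

# F1 is the harmonic mean of precision and recall
f1_check = 2 * precision * recall / (precision + recall)
print(f1_check)

# Agrees with the reported F1 up to floating-point rounding
assert math.isclose(f1_check, reported_f1, abs_tol=1e-9)
```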
