# Lab Meeting 10/2/20: 

## Can we use deep learning to detect grooming bouts in videos?   
___   
        

# INTRODUCTION: 
  
## Convolutional Neural Networks (CNNs) have had great success in learning to categorize images 
## Can we use them to categorize higher-order behaviors? 


<img src="lab_meeting/CNN_example.jpeg" width="1000">


# One cool idea is to transform data into images for training with CNNs: 
### Here's an example of spectrograms that were fed into a CNN for categorization
<img src="lab_meeting/sound_examples.png" width="1000">

___

# I tried to do something similar with behavior using some example videos from Robyn
## I started making some tools along the way that could be helpful for the lab

>### OUTLINE:
- ### **STEP 1: Clean the data**
- ### **STEP 2: Create a training set**
- ### **STEP 3: Train a neural network (cloud computing with a Google Colab notebook)**
___

# Video clips: grooming vs. locomotion

In [None]:
import moviepy.editor as mpy
fps = 70
frame_st = 46100
frame_ed = 46400
clip = mpy.VideoFileClip("RS07082020b_08182020_frameLabeled.mp4")
clip = clip.subclip(frame_st/fps,frame_ed/fps).resize(height=360)
clip.ipython_display()

In [None]:
fps = 70
frame_st = 500
frame_ed = 800
clip = mpy.VideoFileClip("RS07082020b_08182020_frameLabeled.mp4")
clip = clip.subclip(frame_st/fps,frame_ed/fps).resize(height=360)
clip.ipython_display()

___
# The idea is to create images from behaviors
## Example: grooming vs. locomotion
*2 s time-lapse traces for each body part*

<div align="center"> 

### Grooming
<img src="lab_meeting/groom.png" width="500">


### Locomotion
<img src="lab_meeting/locomotion.png" width="500">

</div>




___

> # STEP 1: CLEAN THE DATA
- ## Data from DeepLabCut often contains many labeling errors. 
- ### *Whether using B-SOID or an alternative, it is critical to feed in the highest-quality data possible*  
    - *Note: Do everything you can as early as possible in the pipeline to fix labeling errors* (**ideally at acquisition**)

### An example clip with labeling errors:


In [None]:
fps = 70
frame_st = 1200
frame_ed = 1500
clip = mpy.VideoFileClip("RS07082020b_08182020_frameLabeled.mp4")
clip = clip.subclip(frame_st/fps,frame_ed/fps).resize(height=360)
clip.ipython_display()

___
# A tool for cleaning DeepLabCut data:
### Create a DataCleaner Object and import DLC data:

In [None]:
from DataCleaner import *
D = DataCleaner('RS07082020b_08182020.csv')

**This creates pandas DataFrames of the x,y coordinates and confidence ratings from your DeepLabCut results**

In [None]:
D.x.head()

**I also included handy access to frame-by-frame displacements:**

*(Number of pixels that each label moves per frame)*

In [None]:
D.disp.head()
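
For anyone curious what a frame-by-frame displacement looks like in code, here is a minimal sketch. It assumes displacement means the Euclidean distance between a label's position in consecutive frames; `frame_displacement` and the toy coordinates are made up for illustration and are not the actual `DataCleaner` internals.

```python
import numpy as np
import pandas as pd

def frame_displacement(x: pd.DataFrame, y: pd.DataFrame) -> pd.DataFrame:
    """Distance in pixels that each label moves between consecutive frames.

    Hypothetical helper: x and y hold one column per body part and one row per frame.
    """
    dx = x.diff()  # change in x from the previous frame (NaN for the first frame)
    dy = y.diff()  # change in y from the previous frame
    return np.sqrt(dx ** 2 + dy ** 2)

# Toy data: two body parts over four frames (made-up coordinates)
x = pd.DataFrame({"nose": [0.0, 3.0, 3.0, 6.0], "pawFL": [10.0, 10.0, 10.0, 10.0]})
y = pd.DataFrame({"nose": [0.0, 4.0, 4.0, 8.0], "pawFL": [5.0, 6.0, 6.0, 4.0]})

disp = frame_displacement(x, y)
print(disp)
```

The first row is NaN because the first frame has no previous frame to compare against.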

**We can take advantage of functionality built into pandas**

For example, the describe() method gives a quick overview of the stats (the front of the mouse moves faster than the back, which makes sense)

In [None]:
# Displacement statistics: 
D.disp.describe()

### Plot of mislabeled frames from video example:

In [None]:
# Show x and y plots for the clip where the jump exists
import matplotlib.pyplot as plt
fig,axes = plt.subplots(1,2,figsize=(10,5))
axes[0].plot(D.x['pawFL'][frame_st:frame_ed],'r.-')
axes[0].set_title('PawFL: X-coord')
axes[0].set_xlabel('Frame Number')
axes[0].set_ylabel('Pixel Number')
axes[1].plot(D.y['pawFL'][frame_st:frame_ed],'r.-')
axes[1].set_title('PawFL: Y-coord')
axes[1].set_xlabel('Frame Number')
plt.tight_layout()

## Exploring the dataset:
### *Displacement plot for the entire session:*

In [None]:
# Show displacement heat map:
import seaborn as sns
yticks = [x for x in range(0,D.disp.shape[0],20000)]
plt.figure(figsize=(7, 5))
ax = sns.heatmap(D.disp,vmin=0,vmax=50,xticklabels = D.body_parts,yticklabels = yticks)
ax.set_yticks(yticks)
ax.set_title('Displacement')
ax.set_ylabel('Frame Number')
plt.show()

### Low confidence frames (based on DeepLabCut results):

In [None]:
# Histogram of likelihoods for one body part (column index 2, labeled pawFL in the title):
fig,axes = plt.subplots(1,1,figsize=(8,4))
sns.histplot(D.conf.iloc[:,2],bins=20,color='red',ax=axes)  # distplot is deprecated in seaborn >= 0.11
axes.set_ylabel('')
axes.set_title('Histogram of Confidence Rating for pawFL')
axes.set_ylim(0,50000)
axes.set(yticklabels=[])

___
## Built-in methods to clean the data:
### Remove low confidence frames

In [None]:
D.remove_low_likelihood(.1) # Remove any label with confidence < 10%
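
A rough sketch of what confidence-based removal could look like with plain pandas. It assumes `x`, `y`, and `conf` are DataFrames with identical shape and column names; `mask_low_likelihood` is a hypothetical stand-in, not the real `remove_low_likelihood` implementation.

```python
import numpy as np
import pandas as pd

def mask_low_likelihood(x, y, conf, threshold=0.1):
    """Set coordinates to NaN wherever the confidence is below threshold.

    Hypothetical helper: x, y, conf share the same index and columns.
    """
    keep = conf >= threshold
    # DataFrame.where keeps values where the condition is True and writes NaN elsewhere
    return x.where(keep), y.where(keep)

# Toy data (made-up values): three frames, two body parts
x = pd.DataFrame({"pawFL": [1.0, 2.0, 3.0], "nose": [4.0, 5.0, 6.0]})
y = pd.DataFrame({"pawFL": [7.0, 8.0, 9.0], "nose": [1.0, 1.0, 1.0]})
conf = pd.DataFrame({"pawFL": [0.90, 0.05, 0.80], "nose": [0.99, 0.99, 0.02]})

x_clean, y_clean = mask_low_likelihood(x, y, conf, threshold=0.1)
print(x_clean)
```

Marking bad labels as NaN (rather than dropping rows) keeps the frame index intact, so the gaps can be interpolated later.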

In [None]:
# Display removed frames (color bar hidden)
yticks = [x for x in range(0,D.disp.shape[0],20000)]
plt.figure(figsize=(7, 5))
ax = sns.heatmap(D.x.isnull(),xticklabels = D.body_parts,yticklabels = yticks, cbar=False)
ax.set_yticks(yticks)
ax.set_title('Display Removed Frames')
ax.set_ylabel('Frame Number')
plt.show()

### Remove frames where large jumps are detected:

In [None]:
D.remove_jumps(40) # Remove any jumps greater than 40 pixels
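
One simple way to think about jump removal, sketched under the assumption that a "jump" is any frame-to-frame displacement above a pixel threshold. `mask_jumps` is a hypothetical helper and may differ from what `remove_jumps` actually does.

```python
import numpy as np
import pandas as pd

def mask_jumps(x, y, max_jump=40.0):
    """Set a label to NaN in any frame where it moved more than max_jump pixels.

    Hypothetical helper: x and y hold one column per body part.
    """
    disp = np.sqrt(x.diff() ** 2 + y.diff() ** 2)
    jump = disp > max_jump  # NaN displacements compare as False
    return x.mask(jump), y.mask(jump)

# Toy data (made-up values): pawFL jumps away for a single frame, then returns
x = pd.DataFrame({"pawFL": [100.0, 101.0, 300.0, 102.0, 103.0]})
y = pd.DataFrame({"pawFL": [50.0, 50.0, 50.0, 50.0, 50.0]})

x_clean, y_clean = mask_jumps(x, y, max_jump=40.0)
print(x_clean)
```

Note that with this simple rule the frame where the label jumps back is also masked, because the return trip is a large displacement too. That is harmless for short glitches once the gap is interpolated.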

In [None]:
# Compare results:
D_old = DataCleaner('RS07082020b_08182020.csv')
yticks = [x for x in range(0,D.disp.shape[0],20000)]
fig,axes = plt.subplots(1,2,figsize=(10,5))
sns.heatmap(D_old.disp,vmin=0,vmax=50,xticklabels=D.body_parts,yticklabels=yticks, ax=axes[0])
sns.heatmap(D.disp,vmin=0,vmax=50,xticklabels=D.body_parts,yticklabels=yticks, ax=axes[1])
axes[0].set_yticks(yticks)
axes[0].set_title('Original Version')
axes[0].set_ylabel('Frame Number')
axes[1].set_yticks(yticks)
axes[1].set_title('Cleaned Version')
plt.tight_layout()

### Interpolate missing values:

In [None]:
D.interpolate()
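
For reference, pandas can fill gaps like these directly. This sketch assumes linear interpolation between the nearest valid frames, which may or may not match what `D.interpolate()` does internally; the toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy trace (made-up values) with two removed frames in the middle
x = pd.DataFrame({"pawFL": [100.0, 101.0, np.nan, np.nan, 104.0]})

# Linear interpolation along the frame axis fills each gap from its valid neighbors
x_filled = x.interpolate(method="linear")
print(x_filled)
```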

### Display cleaned example data:

In [None]:
fig,axes = plt.subplots(1,2,figsize=(10,5))
axes[0].plot(D_old.x['pawFL'][frame_st:frame_ed],'r.',markersize=5)
axes[0].plot(D.x['pawFL'][frame_st:frame_ed],'b')
axes[0].set_title('PawFL: X-coord')
axes[0].set_ylabel('Pixel Number')
axes[0].set_xlabel('Frame Number')
axes[1].plot(D_old.y['pawFL'][frame_st:frame_ed],'r.',markersize=5)
axes[1].plot(D.y['pawFL'][frame_st:frame_ed],'b')
axes[1].set_title('PawFL: Y-coord')
axes[1].set_xlabel('Frame Number')
axes[1].legend(['Original','Corrected'],bbox_to_anchor=(1.05, 1))
plt.tight_layout()


### Write to a new .csv file in the original DLC format:


In [None]:
D.write_csv('RS07082020b_08182020_cleaned.csv')
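
DeepLabCut CSV files have three header rows (scorer, bodyparts, coords). Below is a minimal sketch of rebuilding that layout from separate x, y, and confidence tables. `to_dlc_frame` and the scorer name are hypothetical, and the sketch writes to an in-memory buffer rather than a real file.

```python
import io
import pandas as pd

def to_dlc_frame(x, y, conf, scorer="hypothetical_scorer"):
    """Combine x, y, conf tables into one table with DLC-style 3-level columns."""
    pieces = {}
    for part in x.columns:
        pieces[(scorer, part, "x")] = x[part]
        pieces[(scorer, part, "y")] = y[part]
        pieces[(scorer, part, "likelihood")] = conf[part]
    out = pd.DataFrame(pieces)  # tuple keys become a column MultiIndex
    out.columns.names = ["scorer", "bodyparts", "coords"]
    return out

# Toy data (made-up values): two frames, two body parts
x = pd.DataFrame({"nose": [1.0, 2.0], "pawFL": [3.0, 4.0]})
y = pd.DataFrame({"nose": [5.0, 6.0], "pawFL": [7.0, 8.0]})
conf = pd.DataFrame({"nose": [0.9, 0.8], "pawFL": [0.7, 0.6]})

out = to_dlc_frame(x, y, conf)
buf = io.StringIO()
out.to_csv(buf)  # one header row per column level
lines = buf.getvalue().splitlines()
print("\n".join(lines[:3]))

# Reading it back uses the same three header rows
buf.seek(0)
back = pd.read_csv(buf, header=[0, 1, 2], index_col=0)
```

Keeping the original layout means the cleaned file can be passed to other tools that expect raw DLC output.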

### Another useful tool: removing frames where labels get swapped:

In [None]:
frame_st = 1200
frame_ed = 1500
from DataCleaner import *
import matplotlib.pyplot as plt
D = DataCleaner('RS07082020b_08182020.csv')
D.remove_low_likelihood(.1) # Remove any label with confidence < 10%
D.remove_body_swaps() # Remove frames that jump close to other body parts
D.interpolate()

# plot results:
D_old = DataCleaner('RS07082020b_08182020.csv')
fig,axes = plt.subplots(1,2,figsize=(10,5))
axes[0].plot(D_old.x['pawFL'][frame_st:frame_ed],'r.',markersize=5)
axes[0].plot(D.x['pawFL'][frame_st:frame_ed],'b')
axes[0].set_title('PawFL: X-coord')
axes[0].set_ylabel('Pixel Number')
axes[0].set_xlabel('Frame Number')
axes[1].plot(D_old.y['pawFL'][frame_st:frame_ed],'r.',markersize=5)
axes[1].plot(D.y['pawFL'][frame_st:frame_ed],'b')
axes[1].set_title('PawFL: Y-coord')
axes[1].set_xlabel('Frame Number')
axes[1].legend(['Original','Corrected'],bbox_to_anchor=(1.05, 1))
plt.tight_layout()
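
Here is one possible heuristic for catching label swaps, as a sketch only. It assumes a swap looks like this: a label makes a large move and lands closer to where another body part was in the previous frame than to its own previous position. `mask_body_swaps` and `min_jump` are hypothetical and may not match how `remove_body_swaps` works.

```python
import numpy as np
import pandas as pd

def mask_body_swaps(x, y, min_jump=20.0):
    """Set a label to NaN when it appears to jump onto another body part."""
    x_clean, y_clean = x.copy(), y.copy()
    x_prev, y_prev = x.shift(1), y.shift(1)  # positions in the previous frame
    for part in x.columns:
        # Distance from this label to its own previous position
        d_self = np.hypot(x[part] - x_prev[part], y[part] - y_prev[part])
        for other in x.columns:
            if other == part:
                continue
            # Distance from this label to the other part's previous position
            d_other = np.hypot(x[part] - x_prev[other], y[part] - y_prev[other])
            swapped = (d_self > min_jump) & (d_other < d_self)
            x_clean.loc[swapped, part] = np.nan
            y_clean.loc[swapped, part] = np.nan
    return x_clean, y_clean

# Toy data (made-up values): pawFL lands next to pawFR for one frame
x = pd.DataFrame({"pawFL": [10.0, 11.0, 49.0, 12.0], "pawFR": [50.0, 50.0, 50.0, 50.0]})
y = pd.DataFrame({"pawFL": [10.0, 10.0, 10.0, 10.0], "pawFR": [10.0, 10.0, 10.0, 10.0]})

x_clean, y_clean = mask_body_swaps(x, y, min_jump=20.0)
print(x_clean)
```

Unlike a plain jump threshold, this rule leaves the return frame alone, since the label is then moving away from the other body part.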



## Cleaning methods are still a work in progress:
- ### This is just a first-pass attempt
    - It still struggles when large chunks of contiguous frames are mislabeled
- ### There are probably unique problems related to each experiment
    - Occlusion (position of camera)
    - quality of video (lighting, focus, exposure)
    - difficult to label body parts

- ### Please try out some of these tools! That is the only way they will improve
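
On the contiguous-frames problem mentioned above, one option worth trying is to interpolate only short gaps and leave long ones empty, so they can be excluded or reviewed by hand. This is a sketch of that idea; `long_gap_mask` and `max_gap` are hypothetical and are not part of `DataCleaner`.

```python
import numpy as np
import pandas as pd

def long_gap_mask(series, max_gap):
    """True for frames inside a run of more than max_gap consecutive NaNs."""
    missing = series.isna()
    # A new run starts whenever the missing/present state changes
    run_id = (missing != missing.shift()).cumsum()
    # Length of each missing run (present runs sum to zero)
    run_len = missing.astype(int).groupby(run_id).transform("sum")
    return missing & (run_len > max_gap)

# Toy trace (made-up values): one single-frame gap and one four-frame gap
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, 8.0])

too_long = long_gap_mask(s, max_gap=2)
# Interpolate everything, then blank out the gaps that were too long to trust
filled = s.interpolate(method="linear").mask(too_long)
print(filled)
```

The mask is applied after interpolating because `interpolate(limit=...)` would partially fill the start of a long gap rather than skipping it.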

___
# Adding a label for grooming bouts:

In [None]:
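# Include label column
import numpy as np
import pandas as pd
df_labels = pd.read_csv('grooming_labels.csv')
labels = np.zeros(D.disp.shape[0])  # one label per frame
for bout in range(df_labels.shape[0]):
    # 1e6 is far above vmax=50, so grooming bouts saturate the heatmap color scale
    labels[df_labels['Start'][bout]:df_labels['End'][bout]] = 1e6
D.disp['grooming'] = labels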


# Make plot:
yticks = [x for x in range(0,D.disp.shape[0],20000)]
fig,axes = plt.subplots(1,1,figsize=(8,4))
sns.heatmap(D.disp,vmin=0,vmax=50,xticklabels=D.disp.columns,yticklabels=yticks)
axes.set_yticks(yticks)
axes.set_title('Displacement with Grooming Bouts')
axes.set_ylabel('Frame Number')



___
> # STEP 2: Create a set of training images

<img src="lab_meeting/test.png" width="300">


- ### Lots of decisions need to be made here for optimization: 
    - Length of each video segment (depends on timescale of behaviors we care about) (2s used here, 1s sliding window)
    - How to include velocity information (color coded here)
    - How large of a training set do we need? 
        - Do we need data augmentation? What is the best way to do that? 
    - Need careful testing of parameter space to fine tune the model
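
To make the windowing decision concrete, here is a minimal sketch of cutting a session into 2 s segments. It assumes "1s sliding window" means the window advances in 1 s steps, uses the 70 fps from the clips above, and labels a segment as grooming when at least half of its frames are grooming. `sliding_windows`, `window_labels`, and `min_fraction` are hypothetical names and choices.

```python
import numpy as np

def sliding_windows(n_frames, fps=70, window_s=2.0, step_s=1.0):
    """Return (start, end) frame indices for each full-length window."""
    win = int(round(window_s * fps))
    step = int(round(step_s * fps))
    return [(s, s + win) for s in range(0, n_frames - win + 1, step)]

def window_labels(frame_labels, windows, min_fraction=0.5):
    """Label a window 1 when at least min_fraction of its frames are labeled 1."""
    return np.array(
        [frame_labels[s:e].mean() >= min_fraction for s, e in windows], dtype=int
    )

# Toy session (made-up): 5 s of video with grooming from frame 100 to 299
n_frames = 350
frame_labels = np.zeros(n_frames)
frame_labels[100:300] = 1

windows = sliding_windows(n_frames)
labels = window_labels(frame_labels, windows)
print(windows)
print(labels)
```

Overlapping windows give more training images per session, but neighboring images share half their frames, so the train/test split should probably be done by session or by bout rather than by image.
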
___


> # STEP 3: Train a Neural Network (Cloud computing with Google Colab)