## Stress Determinator

### Bi-directional Long Short-Term Memory

#### Implemented in PyTorch

This tutorial notebook shows how to use the stress determinator. It was prepared by **DMaS** and the **Douglas Research Center**.

**Data used:**

All data come from Dr. Wong's mouse neuron experiments at the Douglas Research Center. Two experimental scenarios were used in our model training experiments:
* Bullying mouse in the enclosure


<img src="https://github.com/Adriandliu/Neural-Decoding-Project/blob/master/img/one_free.PNG?raw=true" width="200"/>

* Bullying and defeated mice are both free to move in the cage

<img src="https://github.com/Adriandliu/Neural-Decoding-Project/blob/master/img/two_free.PNG?raw=true" width="150"/>


Data are mainly backed up and stored in Drive H:/Donghan's Project Data Backup/.

**Important:**

In order to make this experiment reproducible, please make sure the following data are available for input:
* Neuron activity data, primarily extracted from [CNMF-E](https://github.com/zhoupc/CNMF_E), available in CNMF-E folder
* Mouse behavioral data, primarily extracted and labeled from [DeepLabCut](https://github.com/AlexEMG/DeepLabCut), available in DeepLabCut folder
* Timestamp file, generated automatically by the camera application and used to align the behavioral camera with the neuron camera, available in folders indexed by the date and time of the mouse experiment (./Raw data).

#### Package imports

In [None]:
import cv2
import torch
import numpy as np
import pandas as pd
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.utils.data import Dataset, TensorDataset
from sklearn.model_selection import train_test_split
from preprocessor import distOneF,distTwoF,fourPointTransform,locCoordConvert,ptsCoordConvert,align

## Data read and preparation

### Overall procedure:

1. **Align** the frames of the behavioral data and the neuron data using the timestamp file
2. **Determine** the four corner coordinate points of the cage
3. **Transform** the defeated mouse's behavioral data from pixels to centimeters
4. **Transform** the bullying mouse's behavioral data
5. Calculate the real **distance** between the two mice
6. **Classify** the distance into two groups, using a threshold of 10 cm or 15 cm depending on the scenario:
    * interaction with the bullying mouse: distance below the threshold
    * no interaction: distance at or above the threshold
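
The last two steps of the procedure (distance and classification) can be sketched with NumPy alone. This is a minimal illustration, not the `distOneF`/`distTwoF` implementation from `preprocessor`: the function name `label_interaction`, the example coordinates, and the argument layout are assumptions made for this sketch.

```python
import numpy as np

def label_interaction(defeated_cm, bully_cm, threshold_cm=15.0):
    """Return (distances, labels); label 1 means the mice are closer than threshold_cm."""
    defeated_cm = np.asarray(defeated_cm, dtype=float)   # shape (n_frames, 2)
    bully_cm = np.asarray(bully_cm, dtype=float)         # shape (2,) or (n_frames, 2)
    dist = np.linalg.norm(defeated_cm - bully_cm, axis=-1)
    labels = (dist < threshold_cm).astype(int)
    return dist, labels

# Hypothetical positions (in cm) of the defeated mouse in three frames
defeated = [(10.0, 10.0), (30.0, 40.0), (20.0, 22.0)]
bully = (20.0, 20.0)   # a fixed reference point, as in the one free-moving scenario

dist, labels = label_interaction(defeated, bully, threshold_cm=10.0)
print(labels)   # only the third frame is within 10 cm
```

Passing an array of per-frame positions for `bully_cm` covers the case where both mice move, because the subtraction is then done frame by frame.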


### Some term explanations:

* **gap_time**: The raw video may begin with frames recorded before the mouse is in the cage. Excluding these frames keeps the data faithful to the experiment. There are two ways to determine the gap time:
    * The CNMF-E folder name may contain the gap information. For example, 1056_SI_A_Substack (240-9603) means that the mouse shows up in the cage after 240 frames
    * Check the videos manually, one by one
* **timestamp**: The file separator is '\t'. Skip the first row (header) and name the columns ["camNum","frameNum","sysClock","buffer"]
* **video coordinates**: In the raw video, the upper-left corner is (0,0), the X-axis is horizontal, and the Y-axis is vertical. To find the four corner points, take a screenshot of one frame of the raw video and open it in an image application (for example Photoshop or an online RGB checker) to read off the coordinates. Automating this step is possible but would take time. Here are reference corner points for the experiment videos:
    * Two free-moving mice scenario: np.array([(40,60),(213,62), (205,405),(42,405)], np.float32)
    * One free-moving mouse scenario: np.array([(85,100),(85,450), (425,440), (420,105)], np.float32)
    
    In addition, the actual size of the cage (in cm) depends on the scenario chosen for the experiment:
       * Two free-moving mice: 22 * 44
       * One free-moving mouse: 44 * 44

* **bullying mouse position**: 
    * In the one free-moving scenario, the bullying mouse is confined to the enclosure, so its position is treated as fixed. In this case, the central point of the enclosure is used as the bullying mouse position
    * In the two free-moving scenario, the bullying mouse position is read from the DeepLabCut behavioral data, so the distance is calculated differently
    
* **one-hot encoding**: Convert the numeric interaction labels (0/1) to one-hot format

* **neuron_A/B**: 
    * neuron_A: no bullying mouse
    * neuron_B: bullying mouse present
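
To make the **video coordinates** entry more concrete, here is a hedged sketch of how four corner points can define a pixel-to-centimeter mapping through a perspective (homography) transform. It is not the `fourPointTransform`/`locCoordConvert` code from `preprocessor`, whose internals are not shown here. The helper names, the corner order (upper-left, upper-right, lower-right, lower-left), and the choice of 22 cm along the X-axis are assumptions made for this illustration.

```python
import numpy as np

def homography_from_corners(src_pts, dst_pts):
    """Solve the 3x3 perspective matrix that maps four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)   # last entry fixed to 1

def pixels_to_cm(points_px, H):
    """Apply the perspective matrix H to an array of (x, y) pixel coordinates."""
    pts = np.asarray(points_px, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]      # divide by the projective coordinate

# Reference corners for the two free-moving mice scenario (pixels)
corners_px = [(40, 60), (213, 62), (205, 405), (42, 405)]
width_cm, height_cm = 22, 44                 # assumed: 22 cm along X, 44 cm along Y
corners_cm = [(0, 0), (width_cm, 0), (width_cm, height_cm), (0, height_cm)]

H = homography_from_corners(corners_px, corners_cm)
converted = pixels_to_cm(corners_px, H)      # the corners land on the cage corners in cm
```

Any tracked position from the video can then be passed to `pixels_to_cm` to obtain its location inside the cage. Note that the reference points for the one free-moving scenario are listed in a different corner order, so the target corners would have to be reordered to match.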


In [None]:
gap_time_A = 240
gap_time_B = 150

dlc_A = pd.read_csv("//DMAS-WS2017-006/E/A RSync FungWongLabs/DLC_Data/1053 SI_A, Mar 22, 9 14 20/videos/\
1056 SI_A, Mar 22, 12 45 13DeepCut_resnet50_1053 SI_A, Mar 22, 9 14 20Jul31shuffle1_600000.h.csv", skiprows = 2).iloc[gap_time_A:,]
dlc_B = pd.read_csv("//DMAS-WS2017-006/E/A RSync FungWongLabs/DLC_Data/1053 SI_A, Mar 22, 9 14 20/videos/\
1056 SI_B, Mar 22, 12 52 59DeepCut_resnet50_1053 SI_A, Mar 22, 9 14 20Jul31shuffle1_600000.h.csv", skiprows = 2).iloc[gap_time_B:,]


neuron_A = pd.read_csv("//Dmas-ws2017-006/e/A RSync FungWongLabs/CNMF-E/1056/SI/1056_SI_A_Substack (240-9603)_source_extraction/frames_1_9364/LOGS_15-Sep_13_52_07/1056SI_A_240-9603.csv", header = None).T
neuron_B = pd.read_csv("//Dmas-ws2017-006/e/A RSync FungWongLabs/CNMF-E/1056/SI/1056_SI_B_source_extraction/frames_1_27256/LOGS_19-Apr_00_38_59/1056SI_B.csv", header = None).T.iloc[gap_time_B:,]
timestamp_A = pd.read_csv("//DMAS-WS2017-006/H/Donghan's Project Data Backup/Raw Data/Witnessing/female/Round 8/3_22_2019/H12_M45_S13/timestamp.dat", \
sep='\t', header = None, skiprows=1, names = ["camNum","frameNum","sysClock","buffer"])
timestamp_B = pd.read_csv("//DMAS-WS2017-006/H/Donghan's Project Data Backup/Raw Data/Witnessing/female/Round 8/3_22_2019/H12_M52_S59/timestamp.dat", \
sep='\t', header = None, skiprows=1, names = ["camNum","frameNum","sysClock","buffer"])
timestamp_A = timestamp_A[timestamp_A["frameNum"]>=gap_time_A]
timestamp_B = timestamp_B[timestamp_B["frameNum"]>=gap_time_B]

# f =  open("1056SIA_test_0.0001_0.3_lstm_10_pytorch.txt",'w+')


# Choose the scenario that matches the recording; only one branch should run.
# "two_free" keeps the previous effective behaviour, where the second block overwrote the first.
scenario = "two_free"    # "one_free" or "two_free"

# align() returns the aligned neuron data (msCam) and the aligned DeepLabCut data (behavCam)
msCam, behavCam = align(neuron_A, dlc_A, timestamp_A, gap_time_A)

if scenario == "one_free":
    # ONE FREE-MOVING MOUSE
    pts = np.array([(85,100),(85,450), (425,440), (420,105)], np.float32)   # four corner points
    newLoc = locCoordConvert(behavCam,pts,44,44)                            # convert to new location data with new dimension
    referPt = ptsCoordConvert(pts, [400,270], 44, 44)[0]                    # convert bullying mouse location with new dimension
    dist = distOneF(newLoc, referPt)                                        # calculate distance between bullying and defeated mouse
    labeled = [1 if i < 10 else 0 for i in dist]                            # if dist < 10, label 1 (has interaction), else 0 (no interaction)
else:
    # TWO FREE-MOVING MICE
    pts = np.array([(40,60),(213,62), (205,405),(42,405)], np.float32)      # four corner points
    newLoc = locCoordConvert(behavCam,pts,22,44)                            # convert to new location data with new dimension
    dist = distTwoF(newLoc, "head")                                         # calculate distance between bullying and defeated mouse
    labeled = [1 if i < 15 else 0 for i in dist]                            # if dist < 15, label 1 (has interaction), else 0 (no interaction)




data = pd.concat([msCam, pd.DataFrame(labeled)], axis=1).dropna(axis = 0)
data.columns = list(range(1,len(msCam.columns)+2))                      # avoid duplicate column name
data = data.rename(columns={len(msCam.columns)+1:"interaction"})

# One hot encoding
one_hot = pd.get_dummies(data['interaction'])
one_hot.columns = ["interaction.a", "interaction.b"]
data = data.drop("interaction", axis = 1).join(one_hot)

frac = 0.3
x_train, x_test, y_train, y_test = \
        train_test_split(data[list(range(1,len(data.columns)-1))], data[["interaction.a", "interaction.b"]], test_size=frac, random_state=0)


x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=0)
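# Drop column 1 from every split so that train, validation and test keep the same feature count
x_train, x_val, x_test = (df.drop(1, axis = 1) for df in (x_train, x_val, x_test))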
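
The one-hot step in the cell above can be checked on a toy label column. The sketch below uses made-up labels (not experiment data) to show which column ends up meaning what: `pd.get_dummies` orders its columns by label value, so after renaming, `interaction.a` marks label 0 (no interaction) and `interaction.b` marks label 1 (interaction).

```python
import pandas as pd

# Hypothetical interaction labels for five frames
labels = pd.Series([0, 1, 1, 0, 1], name="interaction")

one_hot = pd.get_dummies(labels, dtype=int)             # columns are ordered 0, 1
one_hot.columns = ["interaction.a", "interaction.b"]    # a: label 0, b: label 1

# argmax over the two columns recovers the original 0/1 label
recovered = one_hot.to_numpy().argmax(axis=1)
print(one_hot)
print(recovered)
```

If a recording contains only one of the two classes, `get_dummies` returns a single column and the renaming line fails, so it is worth checking the label counts before encoding.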

### Data preparation for DL model

Initialize the model hyperparameters and convert the data to torch tensors

In [None]:
sequence_length = 1
input_size = len(x_train.columns)
hidden_size = len(x_train.columns)
num_layers = 1
num_classes = 2
batch_size = 1
num_epochs = 1
dropout = 0.3
learning_rate = 0.003
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


x_train_tensor = torch.from_numpy(np.array(x_train)).float().to(device)
y_train_tensor = torch.from_numpy(np.array(y_train)).long().to(device)
train = TensorDataset(x_train_tensor, y_train_tensor)
train_loader = torch.utils.data.DataLoader(dataset=train,batch_size=batch_size, shuffle=True)
total_step = len(train_loader)

x_test_tensor = torch.from_numpy(np.array(x_test)).float()
y_test_tensor = torch.from_numpy(np.array(y_test)).long()       # same dtype as the training labels
test = TensorDataset(x_test_tensor, y_test_tensor)
test_loader = torch.utils.data.DataLoader(dataset=test,batch_size=batch_size, shuffle=False)   # no shuffling needed for evaluation

def prepare_sequence(seq):
    # as_tensor avoids the copy warning that torch.tensor() raises on an existing tensor
    return torch.as_tensor(seq, dtype=torch.float)

### Bi-directional RNN  -  LSTM/GRU

In [None]:
def RNN(model, input_size, hidden_size, num_layers, dropout = 0.3, batch_first=True, bidirectional=True):
    # PyTorch applies dropout only between stacked layers, so disable it for a single layer
    dropout = dropout if num_layers > 1 else 0.0
    rnn_cls = nn.GRU if model == 'GRU' else nn.LSTM
    return rnn_cls(input_size, hidden_size, num_layers, dropout = dropout,
                   batch_first=batch_first, bidirectional=bidirectional)

In [None]:
class BiRNN(nn.Module):
    def __init__(self, model, input_size, hidden_size, num_layers, num_classes, dropout):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.model = model
        self.lstm = RNN(self.model, input_size, hidden_size, num_layers, dropout = dropout, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size*2, num_classes)  # 2 for bidirection
        
    def forward(self, x):
        # Set initial states on the same device as the input
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size, device=x.device) # 2 for bidirection

        # Forward propagate; out: tensor of shape (batch_size, seq_length, hidden_size*2)
        if self.model == 'GRU':
            out, _ = self.lstm(x, h0)            # GRU takes only a hidden state
        else:
            c0 = torch.zeros_like(h0)            # LSTM also needs a cell state
            out, _ = self.lstm(x, (h0, c0))

        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
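
The tensor shapes that the class above relies on can be checked in isolation. The following is a small sketch with arbitrary sizes (the batch size, feature count and hidden size are placeholders, not the notebook's values). It shows why the linear layer takes `hidden_size*2` inputs and why only the last time step is decoded.

```python
import torch
import torch.nn as nn

# Placeholder sizes chosen only for this illustration
batch_size, seq_len, n_features, hidden = 4, 1, 6, 8

lstm = nn.LSTM(n_features, hidden, num_layers=1, batch_first=True, bidirectional=True)
fc = nn.Linear(hidden * 2, 2)            # forward and backward outputs are concatenated

x = torch.randn(batch_size, seq_len, n_features)
out, (h_n, c_n) = lstm(x)                # initial states default to zeros when omitted
logits = fc(out[:, -1, :])               # decode the last time step only

print(out.shape)      # (batch, seq_len, 2 * hidden)
print(h_n.shape)      # (num_layers * 2, batch, hidden)
print(logits.shape)   # (batch, num_classes)
```

With `sequence_length = 1`, as in this notebook, each frame is treated as a one-step sequence, so the forward and backward directions each see a single time step.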


### Define Loss and Optimizer functions

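* **Cross-Entropy**: Loss for classification. `nn.CrossEntropyLoss` is used here with class indices as targets, so the one-hot labels are converted to indices in the training loop
* **Adam**: Adaptive learning-rate optimizer that is a common default choice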
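
As a quick illustration of the loss, the sketch below feeds made-up scores and one-hot labels (not experiment data) to `nn.CrossEntropyLoss`, converting the one-hot labels to class indices with `argmax` first. The manual computation beside it shows that the loss is the mean negative log-softmax score of the correct class.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Hypothetical raw scores (logits) for two samples and two classes
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.5]])
one_hot = torch.tensor([[1, 0],
                        [0, 1]])

targets = one_hot.argmax(dim=1)          # one-hot rows -> class indices
loss = criterion(logits, targets)

# The same value computed by hand from log-softmax
manual = -torch.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
print(loss.item(), manual.item())
```

`torch.max(labels, 1)[1]` performs the same one-hot to index conversion as `argmax(dim=1)`.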

In [None]:
model = BiRNN('LSTM', input_size, hidden_size, num_layers, num_classes, dropout).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

### Network training

In [None]:
# Train the network
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (inputs, labels) in enumerate(train_loader):
        inputs = prepare_sequence(inputs).reshape(-1, sequence_length, input_size).to(device)
        labels = prepare_sequence(labels).long().to(device)

        # Forward pass (the model returns only the class scores)
        outputs = model(inputs)
        loss = criterion(outputs, torch.max(labels, 1)[1])   # one-hot labels -> class indices

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 1000 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
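
The notebook builds a `test_loader` but stops after training. One way the held-out accuracy could be measured is sketched below, assuming the model returns class scores and the loader yields one-hot labels, as in the cells above. The `evaluate` helper and the `TinyModel` stand-in are hypothetical and exist only to make the example self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, sequence_length, input_size, device):
    """Return the fraction of samples whose predicted class matches the one-hot label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():                       # no gradients needed for evaluation
        for features, labels in loader:
            features = features.reshape(-1, sequence_length, input_size).to(device)
            targets = labels.to(device).argmax(dim=1)
            predictions = model(features).argmax(dim=1)
            correct += (predictions == targets).sum().item()
            total += targets.size(0)
    return correct / total

class TinyModel(nn.Module):
    """Hypothetical stand-in for the trained BiRNN, used only for this demo."""
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.fc = nn.Linear(input_size, num_classes)

    def forward(self, x):
        return self.fc(x[:, -1, :])             # score the last time step

torch.manual_seed(0)
x = torch.randn(10, 5)                          # random features, not experiment data
y = torch.eye(2)[torch.randint(0, 2, (10,))]    # random one-hot labels
loader = DataLoader(TensorDataset(x, y), batch_size=4)

accuracy = evaluate(TinyModel(5, 2), loader, sequence_length=1,
                    input_size=5, device=torch.device("cpu"))
print(f"accuracy: {accuracy:.2f}")
```

In the notebook itself, the call would take the trained `model`, `test_loader`, `sequence_length`, `input_size` and `device` defined earlier.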