# Supplementary 3: Deep Learning Model Architectures

This Jupyter notebook contains the architectures of all deep learning models fitted in the project. The complete scripts used to train the models can be found in the following git repository: GIT LINK

In [2]:
library(keras)

## 3.1 CNN on single images

I first used single frames obtained from Deep Meerkat to fit a simple CNN and, for transfer learning, a model built on VGG19. The model architectures are shown below.


### Simple CNN

In [7]:
CNN <- keras_model_sequential()

CNN %>% layer_conv_2d(filters=20, kernel_size=c(4,4), activation = 'relu',
                      input_shape=c(128,72,3), data_format="channels_last") %>%
  layer_max_pooling_2d(pool_size = c(3,3)) %>%
  layer_flatten() %>%
  layer_dense(units=64, activation = 'relu')%>%
  layer_dense(units=32, activation='relu')%>%
  layer_dropout(rate=0.2) %>%
  layer_dense(units=1, activation='sigmoid')%>%
  compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=c('accuracy')
  )

summary(CNN)

Model: "sequential_4"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
conv2d_1 (Conv2D)                   (None, 125, 69, 20)             980         
________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)      (None, 41, 23, 20)              0           
________________________________________________________________________________
flatten_2 (Flatten)                 (None, 18860)                   0           
________________________________________________________________________________
dense_5 (Dense)                     (None, 64)                      1207104     
________________________________________________________________________________
dense_6 (Dense)                     (None, 32)                      2080        
________________________________________________________________________________
dropou
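
As a rough check on the summary above, the layer shapes and parameter counts can be reproduced with plain arithmetic. The sketch below is not part of the original training scripts; `valid_out` is a hypothetical helper name, and it assumes unpadded ("valid") convolution and pooling, which is the keras default for these layers.

```r
# Output size of an unpadded ("valid") convolution or pooling along one dimension.
valid_out <- function(n, k, stride = 1) (n - k) %/% stride + 1

# Conv2D: 4x4 kernel, stride 1, 20 filters, on a 128 x 72 x 3 input.
conv_hw <- valid_out(c(128, 72), 4)            # 125 69
conv_params <- (4 * 4 * 3 + 1) * 20            # weights plus one bias per filter: 980

# Max pooling: 3x3 window, stride equal to the pool size.
pool_hw <- valid_out(conv_hw, 3, stride = 3)   # 41 23

# Flatten, then the two hidden dense layers.
flat_units <- prod(pool_hw) * 20               # 18860
dense1_params <- (flat_units + 1) * 64         # 1207104
dense2_params <- (64 + 1) * 32                 # 2080

print(c(conv = conv_params, dense1 = dense1_params, dense2 = dense2_params))
```

Almost all of the parameters sit in the first dense layer, because the single pooling step still leaves 18860 flattened features.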

### Transfer Learning using VGG19

In [8]:
VGG19 <- application_vgg19(include_top = FALSE, weights="imagenet", input_shape = c(128,72,3))
VGGModel <- keras_model_sequential()%>%
    VGG19 %>%
    layer_dropout(rate=0.5)%>%
    layer_flatten()%>%
    layer_dense(units=32, activation='relu')%>%
    layer_dropout(rate=0.2)%>%
    layer_dense(units=1, activation='sigmoid') %>%
    compile(
      optimizer=optimizer_adam(lr=1e-05),  # recent keras releases name this argument learning_rate
      loss='binary_crossentropy',
      metrics=c('accuracy')
    )
summary(VGGModel)

Model: "sequential_5"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
vgg19 (Functional)                  (None, 4, 2, 512)               20024384    
________________________________________________________________________________
dropout_4 (Dropout)                 (None, 4, 2, 512)               0           
________________________________________________________________________________
flatten_3 (Flatten)                 (None, 4096)                    0           
________________________________________________________________________________
dense_8 (Dense)                     (None, 32)                      131104      
________________________________________________________________________________
dropout_5 (Dropout)                 (None, 32)                      0           
________________________________________________________________________________
dense_
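
The shapes in this summary can also be checked by hand. The sketch below is an illustration rather than code from the project. It relies on the standard VGG19 layout: 16 convolutional layers with 3x3 kernels and "same" padding, arranged in five blocks that each end in a 2x2 max pool.

```r
# Each of the five VGG19 blocks halves height and width (rounding down).
feature_hw <- c(128, 72)
for (block in 1:5) feature_hw <- feature_hw %/% 2   # ends at 4 2

# Filters per convolutional layer: blocks of 2, 2, 4, 4 and 4 layers.
out_ch <- rep(c(64, 128, 256, 512, 512), times = c(2, 2, 4, 4, 4))
in_ch  <- c(3, head(out_ch, -1))                    # each layer reads the previous output
vgg_params <- sum((3 * 3 * in_ch + 1) * out_ch)     # 20024384

# Head added on top of the convolutional base.
flat_units <- prod(feature_hw) * 512                # 4096
dense_params <- (flat_units + 1) * 32               # 131104

print(c(vgg19 = vgg_params, flatten = flat_units, dense = dense_params))
```

Because 72 is not divisible by 32, the width is rounded down at the third pooling step (9 to 4), so a few pixel columns at the edge do not contribute to the final 4 x 2 feature map.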

## 3.2 RNN on stacked images

After training models on single images, I stacked 4 images at a time and evaluated the frames together, using either a convolutional LSTM layer (Conv-LSTM2D) or a 3D convolutional layer (Conv 3D). The same approach was used for both sex and behaviour classification.
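
To make the stacking step concrete, here is a minimal base-R sketch. It is not the code used in the project: `stack_frames` is a hypothetical helper, the grouping into non-overlapping runs of consecutive frames is an assumption, and the demo uses small random arrays in place of real frames.

```r
# Hypothetical helper: group consecutive frames into non-overlapping stacks.
# frames has dimensions (n_frames, height, width, channels).
stack_frames <- function(frames, stack_size = 4) {
  d <- dim(frames)
  n_stacks <- d[1] %/% stack_size              # leftover frames are dropped
  out <- array(0, dim = c(n_stacks, stack_size, d[2], d[3], d[4]))
  for (i in seq_len(n_stacks)) {
    idx <- ((i - 1) * stack_size + 1):(i * stack_size)
    out[i, , , , ] <- frames[idx, , , , drop = FALSE]
  }
  out
}

# Demo on 10 small random "frames" (8 x 6 pixels, 3 channels).
set.seed(1)
frames <- array(runif(10 * 8 * 6 * 3), dim = c(10, 8, 6, 3))
stacks <- stack_frames(frames)
print(dim(stacks))                             # 2 4 8 6 3

# A channels_first layer expects (stack, frame, channel, height, width).
stacks_cf <- aperm(stacks, c(1, 2, 5, 3, 4))
print(dim(stacks_cf))                          # 2 4 3 8 6
```

With real data the resulting arrays would have per-sample shapes such as `c(4,128,72,3)` for a `channels_last` layer, or the permuted equivalent for a `channels_first` layer.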

### Conv-LSTM2D

In [10]:
RNN <- keras_model_sequential()

RNN %>% layer_conv_lstm_2d(filters=30, kernel_size = c(3,3), 
                           input_shape = c(4,3,120,72), 
                           data_format = "channels_first", return_sequences = FALSE,
                           activation="relu") %>%
  layer_dropout(rate=0.25)%>%
  layer_flatten() %>%
  layer_dense(units=64, activation = 'relu')%>%
  layer_dropout(rate=0.2)%>%
  layer_dense(units=32, activation='relu')%>%
  layer_dropout(rate=0.2) %>%
  layer_dense(units=1, activation='sigmoid')%>%
  compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=c('accuracy')
  )
summary(RNN)

Model: "sequential_7"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
conv_lst_m2d (ConvLSTM2D)           (None, 30, 118, 70)             35760       
________________________________________________________________________________
dropout_6 (Dropout)                 (None, 30, 118, 70)             0           
________________________________________________________________________________
flatten_4 (Flatten)                 (None, 247800)                  0           
________________________________________________________________________________
dense_10 (Dense)                    (None, 64)                      15859264    
________________________________________________________________________________
dropout_7 (Dropout)                 (None, 64)                      0           
________________________________________________________________________________
dense_
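
The counts in this summary follow from the standard ConvLSTM2D formula, which can be sketched in base R. This is a hand calculation added for illustration, not project code; it assumes the layer uses bias terms and "valid" padding, both keras defaults.

```r
filters     <- 30
kernel      <- c(3, 3)
in_channels <- 3

# Four gates (input, forget, cell, output). Each gate convolves over the input
# channels and the hidden-state channels, and has one bias per filter.
convlstm_params <- 4 * filters * (prod(kernel) * (in_channels + filters) + 1)  # 35760

# With return_sequences = FALSE only the final state is returned, so the
# 4-frame time axis disappears from the output shape.
out_hw <- c(120, 72) - kernel + 1              # 118 70

flat_units   <- filters * prod(out_hw)         # 247800
dense_params <- (flat_units + 1) * 64          # 15859264

print(c(convlstm = convlstm_params, flatten = flat_units, dense = dense_params))
```

There is no pooling between the recurrent layer and the flatten step, which is why the first dense layer alone holds close to 16 million parameters.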

### Conv 3D

In [12]:
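# The object keeps the name RNN from the previous cell, although this model
# is a 3D CNN and contains no recurrent layer.
RNN <- keras_model_sequential()

RNN %>% layer_conv_3d(filters=50, kernel_size = c(3,3,3), 
                      input_shape = c(4,128,72,3), 
                      data_format = "channels_last",
                      activation="relu") %>%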
  layer_max_pooling_3d(pool_size = c(2,2,2)) %>%
  layer_flatten()%>%
  layer_dropout(rate=0.25)%>%
  layer_dense(units=64, activation = 'relu')%>%
  layer_dropout(rate=0.2)%>%
  layer_dense(units=32, activation='relu')%>%
  layer_dropout(rate=0.2) %>%
  layer_dense(units=1, activation='sigmoid')%>%
  compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=c('accuracy')
  )
summary(RNN)

Model: "sequential_9"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
conv3d_1 (Conv3D)                   (None, 2, 126, 70, 50)          4100        
________________________________________________________________________________
max_pooling3d_1 (MaxPooling3D)      (None, 1, 63, 35, 50)           0           
________________________________________________________________________________
flatten_6 (Flatten)                 (None, 110250)                  0           
________________________________________________________________________________
dropout_12 (Dropout)                (None, 110250)                  0           
________________________________________________________________________________
dense_16 (Dense)                    (None, 64)                      7056064     
________________________________________________________________________________
dropou
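
The same kind of hand check works for the 3D convolution. The sketch below is illustrative only and assumes "valid" padding and a pooling stride equal to the pool size (the keras defaults).

```r
input_dhw <- c(4, 128, 72)      # frames, height, width
kernel    <- c(3, 3, 3)

# Conv3D: the kernel slides over time as well as space.
conv_dhw    <- input_dhw - kernel + 1            # 2 126 70
conv_params <- (prod(kernel) * 3 + 1) * 50       # 4100

# 2x2x2 max pooling halves every axis.
pool_dhw <- conv_dhw %/% 2                       # 1 63 35

flat_units   <- prod(pool_dhw) * 50              # 110250
dense_params <- (flat_units + 1) * 64            # 7056064

print(c(conv3d = conv_params, flatten = flat_units, dense = dense_params))
```

With only 4 frames per stack, the time axis shrinks to 2 after the convolution and to 1 after pooling, so all temporal information has been merged before the dense layers.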

## 3.3 LRCN Framework

Next, a long-term recurrent convolutional network (LRCN) framework was used: events were converted into 7-second clips, and time-distributed CNNs and RNNs were fitted to evaluate them. The same approach was used for both sex and behaviour classification.

In [33]:
LRCNModel <- keras_model_sequential()%>%
  time_distributed(VGG19, input_shape = c(36,128,72,3)) %>%
  time_distributed(layer_flatten(), input_shape = c(36,4,2,512))%>%
  layer_dropout(rate=0.2)%>%
  layer_lstm(units=64,input_shape = c(36,4096))%>%
  layer_dropout(rate=0.2)%>%
  layer_dense(units=64, activation='relu')%>%
  layer_dropout(rate=0.2)%>%
  layer_dense(units=32, activation='relu')%>%
  layer_dropout(rate=0.2)%>%
  layer_dense(units=1, activation='sigmoid')%>%
  compile(
    optimizer=optimizer_adam(lr=1e-05),  # recent keras releases name this argument learning_rate
    loss='binary_crossentropy',
    metrics=c('accuracy')
  )

summary(LRCNModel)

Model: "sequential_17"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
time_distributed_9 (TimeDistributed (None, 36, 4, 2, 512)           20024384    
________________________________________________________________________________
time_distributed_10 (TimeDistribute (None, 36, 4096)                0           
________________________________________________________________________________
dropout_21 (Dropout)                (None, 36, 4096)                0           
________________________________________________________________________________
lstm_1 (LSTM)                       (None, 64)                      1065216     
________________________________________________________________________________
dropout_22 (Dropout)                (None, 64)                      0           
________________________________________________________________________________
dense
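
To see where the LSTM count in this summary comes from, the standard LSTM parameter formula can be evaluated directly. This is a hand calculation for illustration, not project code; `lstm_params` is a hypothetical helper name.

```r
# Each of the 36 frames passes through the VGG19 base and is flattened,
# giving one feature vector per frame.
frame_features <- 4 * 2 * 512                    # 4096

# An LSTM has four gates. Each gate has weights for the input vector,
# weights for the previous hidden state, and one bias per unit.
lstm_params <- function(input_dim, units) 4 * units * (input_dim + units + 1)

n_lstm <- lstm_params(frame_features, 64)        # 1065216
print(n_lstm)
```

The count does not depend on the number of frames, because the same LSTM weights are applied at every time step.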