# Entry to Fall SCML Competition

In [1]:
# Load the competition data and normalize the movie
import numpy as np
import h5py

datafile = h5py.File('SCNeuronModelCompetition.mat', 'r')  # open read-only
movie = np.array(datafile.get('trainingmovie_mini'))  # training movie, read into memory
movie = (movie - np.mean(movie)) / np.std(movie)  # normalize to zero mean, unit variance
frhist = np.array(datafile.get('FRhist_tr'))  # firing rate histograms

### Model

The following simple model is the best I was able to find for this dataset. It achieves a loss of `.3423`, compared with `.3632` for the template model. With one LSTM layer I could sometimes get a slightly lower loss (by less than `.0004`), but that didn't seem worth it given that an LSTM has roughly four times as many parameters per unit. The second hidden layer decreased the loss by only about `.0005`.
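The "four times as many parameters" figure can be checked with a little arithmetic. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias) and an input size of 12288 features per frame, which is inferred from the first layer's parameter count in the model summary further down (798,785 = (12288 + 1) × 65). The helper names are made up for illustration.

```python
# Rough parameter counts for a Dense layer vs. an LSTM layer of the same width.
# input_dim = 12288 is an assumption inferred from the model summary.

def dense_params(input_dim, units):
    # One weight per input per unit, plus one bias per unit.
    return input_dim * units + units

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim * units + units * units + units)

input_dim, units = 12288, 65
dense = dense_params(input_dim, units)
lstm = lstm_params(input_dim, units)
print(dense, lstm, lstm / dense)
```

The ratio comes out slightly above 4 because the LSTM also carries recurrent weights that the Dense layer does not have.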

In [2]:
import keras
from keras.layers import TimeDistributed, Dense

model = keras.models.Sequential()

# Two hidden layers of 65 sigmoid units, applied independently to each time step
model.add(TimeDistributed(Dense(65, activation='sigmoid'), input_shape=movie.shape[1:]))
model.add(TimeDistributed(Dense(65, activation='sigmoid')))
# One output per firing-rate trace in frhist; softplus keeps predictions positive
model.add(TimeDistributed(Dense(frhist.shape[2], activation='softplus')))

model.summary()
model.compile(optimizer=keras.optimizers.Adam(lr=0.001, decay=1e-7), loss='poisson')

# Stop once the validation loss has not improved for 10 epochs
early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
history = model.fit(movie, frhist, epochs=200, batch_size=32, validation_split=0.2,
                    shuffle=True, callbacks=[early_stopping])

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_1 (TimeDist (None, 150, 65)           798785    
_________________________________________________________________
time_distributed_2 (TimeDist (None, 150, 65)           4290      
_________________________________________________________________
time_distributed_3 (TimeDist (None, 150, 54)           3564      
=================================================================
Total params: 806,639
Trainable params: 806,639
Non-trainable params: 0
_________________________________________________________________
Train on 230 samples, validate on 58 samples
Epoch 1/200
...
Epoch 28/200
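
For reference, the `poisson` loss passed to `compile` above can be sketched in plain NumPy. This is a minimal re-implementation for illustration, assuming the usual definition (mean of `y_pred - y_true * log(y_pred + eps)`) with a small `eps` of `1e-7`; the function name and the toy values are hypothetical.

```python
import numpy as np

def poisson_loss(y_true, y_pred, eps=1e-7):
    """Poisson negative log-likelihood, up to a term that depends only on y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # eps guards against log(0) when a predicted rate is exactly zero.
    return np.mean(y_pred - y_true * np.log(y_pred + eps))

# Toy firing rates (made-up values, not from the competition data).
y_true = np.array([[0.0, 1.0, 2.0]])
perfect = poisson_loss(y_true, y_true)        # prediction equals target
off = poisson_loss(y_true, y_true + 0.5)      # prediction biased upward
print(perfect, off)
```

Note that this loss does not reach zero even for a perfect prediction, so values such as `.3423` are only meaningful relative to other models trained on the same data.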


### Thoughts

There are a few possible explanations for why more complicated architectures do not reduce the loss. There might simply not be enough data to make recurrent layers like LSTM useful, or there could truly be little time dependency in the dataset. Conv/pool layers might discard information that matters for particular neurons, since pooling throws away fine spatial detail. Given how shallow my best ANN model ended up being, other methods such as SVMs might be better suited to the problem.
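The time-dependency point is worth making concrete: a stack of `TimeDistributed(Dense(...))` layers applies the same weights to every frame, so the prediction for one frame cannot depend on its neighbors. The NumPy sketch below illustrates this with a single sigmoid layer and toy sizes (all shapes and names here are hypothetical, chosen only for the demonstration). Shuffling the frames in time simply shuffles the outputs in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, much smaller than the real movie.
n_frames, n_pixels, n_units = 6, 8, 3
frames = rng.normal(size=(n_frames, n_pixels))
W = rng.normal(size=(n_pixels, n_units))
b = rng.normal(size=n_units)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def per_frame_dense(x):
    # The same W and b are applied to every frame independently,
    # which is what TimeDistributed(Dense) does along the time axis.
    return sigmoid(x @ W + b)

order = rng.permutation(n_frames)
out_then_shuffle = per_frame_dense(frames)[order]
shuffle_then_out = per_frame_dense(frames[order])
print(np.allclose(out_then_shuffle, shuffle_then_out))
```

If such a model is competitive with an LSTM, that is at least consistent with the idea that most of the predictable signal is in the current frame.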

Thanks, Dr. Ito, for an interesting competition!