# RNN with sensor data - Keras

The data consists of 15 subjects, each monitored while performing the same 7 activities, with accelerometer readings on the x, y and z axes.

The training set uses the first 14 subjects, and the validation set is the 15th subject.

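The accelerometer features are first rescaled to the range [0, 1] with a min-max scaler, and the activity labels are then one-hot encoded. The label column contains 8 distinct values, so each label becomes a vector of length 8 (0 1 0 0 0 0 0 0 represents the second class).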
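
A NumPy-only sketch of the two preprocessing steps, using made-up toy readings rather than the real data. The min-max formula is the one scikit-learn's MinMaxScaler uses by default, (x - min) / (max - min) per column; the helper names are hypothetical. The point to notice is that the validation rows are transformed with the minimum and range learned from the training rows, so validation values can land slightly outside [0, 1] (the 1.5 below).

```python
import numpy as np

def minmax_fit(X):
    # Learn the per-column minimum and range from the training data only.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return lo, span

def minmax_transform(X, lo, span):
    # (x - min) / (max - min), using the statistics learned at fit time.
    return (X - lo) / span

def one_hot(labels, n_classes):
    # Assumes labels are integers 0 .. n_classes-1; row i of the identity
    # matrix is the one-hot vector for class i.
    return np.eye(n_classes)[labels.astype(int)]

# Toy accelerometer readings (x, y, z), not taken from the real dataset.
X_train = np.array([[1500., 2300., 2000.],
                    [1900., 2500., 2100.],
                    [1700., 2400., 2050.]])
X_val = np.array([[1600., 2600., 2000.]])

lo, span = minmax_fit(X_train)
X_train_s = minmax_transform(X_train, lo, span)
X_val_s = minmax_transform(X_val, lo, span)  # reuse the training min and range

print(one_hot(np.array([1]), 8))  # [[0. 1. 0. 0. 0. 0. 0. 0.]]
```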

`timesteps` is the number of rows in each time slice.

In [135]:
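import os
import numpy as np
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense, Dropout

# Load each subject's CSV, dropping its last row and first column.
# os.listdir order is arbitrary, so sort numerically (assumes files named 1.csv ... 15.csv).
files = sorted(os.listdir('data'), key=lambda name: int(os.path.splitext(name)[0]))
subj_list = [np.loadtxt(os.path.join('data', name), delimiter=',', dtype=float)[:-1, 1:]
             for name in files]

data = np.vstack(subj_list[:-1])          # subjects 1-14: training
validation_data = subj_list[-1]           # subject 15: validation
X = data[:, :-1]                          # x, y, z acceleration
Y = data[:, -1].reshape(-1, 1)            # activity label
X_val = validation_data[:, :-1]
Y_val = validation_data[:, -1].reshape(-1, 1)

# Fit the scaler and encoder on the training data only, then reuse them on the
# validation data so both sets share the same scaling and the same class columns.
minmax = MinMaxScaler()
X = minmax.fit_transform(X)
X_val = minmax.transform(X_val)

timesteps = 40

onehot = OneHotEncoder(sparse_output=False)  # this argument was named `sparse` before scikit-learn 1.2
Y = onehot.fit_transform(Y)
Y_val = onehot.transform(Y_val)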

For the training inputs (X_t), the data had to be arranged as [slices, timesteps, features]. The targets (Y_t) needed the matching arrangement [slices, timesteps, classes].

Each slice contains 40 timesteps, and consecutive slices overlap with a stride of one row: if one slice covers rows 10:50, the next covers rows 11:51.
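
The loops in the next cell build these slices one at a time. A vectorized alternative, sketched here on toy data, uses NumPy's `sliding_window_view` (available from NumPy 1.20); `make_slices_vectorized` is a hypothetical helper name. It drops the final window so the slice count, rows - timesteps, matches the loop version.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_slices_vectorized(arr, timesteps):
    # Windows over axis 0 with stride 1; shape is (n_rows - timesteps + 1, n_cols, timesteps).
    win = sliding_window_view(arr, timesteps, axis=0)
    # Reorder to [slices, timesteps, features] and drop the last window so the
    # count is n_rows - timesteps, the same as the loop-based version.
    return win.transpose(0, 2, 1)[:-1]

X = np.arange(12, dtype=float).reshape(6, 2)   # toy data: 6 rows, 2 features
slices = make_slices_vectorized(X, timesteps=3)
print(slices.shape)  # (3, 3, 2)
```

The result is a read-only view into the original array, so it uses almost no extra memory until it is copied. One caveat applies to either version: because the 14 training subjects are stacked with `np.vstack` before slicing, the 39 slices around each subject boundary mix readings from two subjects; slicing each subject separately and then concatenating would avoid that.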

In [181]:
%%time
def make_slices(arr, timesteps):
    # Overlapping windows with a stride of one row: slice i holds rows i .. i+timesteps-1.
    # Produces arr.shape[0] - timesteps slices, as in the original loops.
    n_slices = arr.shape[0] - timesteps
    out = np.empty((n_slices, timesteps, arr.shape[1]))
    for i in range(n_slices):
        out[i] = arr[i:i + timesteps]
    return out

X_t = make_slices(X, timesteps)          # [slices, timesteps, 3 features]
X_val_t = make_slices(X_val, timesteps)
Y_t = make_slices(Y, timesteps)          # [slices, timesteps, 8 classes]
Y_val_t = make_slices(Y_val, timesteps)

Wall time: 8.72 s


The shape of the windowed validation labels, Y_val_t, as [slices, timesteps, classes]:

In [182]:
Y_val_t.shape

(166700L, 40L, 8L)

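The TimeDistributed wrapper applies the same Dense layer independently at every timestep of the recurrent layer's output sequence (which requires `return_sequences=True` on the SimpleRNN), so the model outputs one class distribution per timestep.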
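
To make this concrete, here is a NumPy sketch of what `TimeDistributed(Dense(8, activation='softmax'))` computes on the SimpleRNN's output sequence. Random weights stand in for trained ones, and the function names are illustrative. The same weight matrix `W` and bias `b` are applied at every timestep, which the explicit loop at the end confirms.

```python
import numpy as np

def softmax(z):
    # Subtract the max along the class axis for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def time_distributed_dense(h, W, b):
    # h: [slices, timesteps, units]; W: [units, classes]; b: [classes].
    # matmul broadcasts over the first two axes, so every timestep shares W and b.
    return softmax(h @ W + b)

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 40, 10))   # stand-in for SimpleRNN(10) output with return_sequences=True
W = rng.normal(size=(10, 8))
b = np.zeros(8)

out = time_distributed_dense(h, W, b)
# The same computation written as an explicit loop over timesteps.
loop = np.stack([softmax(h[:, t, :] @ W + b) for t in range(h.shape[1])], axis=1)
print(out.shape)  # (2, 40, 8)
```

Because `W` is shared across timesteps, the layer adds only 10 × 8 + 8 = 88 parameters regardless of sequence length.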

In [150]:
from keras.layers import TimeDistributed

The DEBUG lines below are Theano compiling the CUDA code for the GPU; they showed up in red in the notebook.
Even on the GPU, I believe training took several hours (I forgot to time this cell).

In [166]:
model = Sequential()
model.add(SimpleRNN(10, input_shape=(timesteps, 3), activation='sigmoid', return_sequences=True))
model.add(TimeDistributed(Dense(8, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# batch_size=1 means one weight update per slice, which makes each epoch very slow.
hist = model.fit(X_t, Y_t, batch_size=1, epochs=1, verbose=0)

DEBUG: nvcc STDOUT mod.cu
   Creating library C:/Users/dmley/AppData/Local/Theano/compiledir_Windows-10-10.0.10586-Intel64_Family_6_Model_42_Stepping_7_GenuineIntel-2.7.11-64/tmpnu8a06/98f730ac3a1ea7df4872c3ca17df8dc1.lib and object C:/Users/dmley/AppData/Local/Theano/compiledir_Windows-10-10.0.10586-Intel64_Family_6_Model_42_Stepping_7_GenuineIntel-2.7.11-64/tmpnu8a06/98f730ac3a1ea7df4872c3ca17df8dc1.exp

DEBUG: nvcc STDOUT mod.cu
   Creating library C:/Users/dmley/AppData/Local/Theano/compiledir_Windows-10-10.0.10586-Intel64_Family_6_Model_42_Stepping_7_GenuineIntel-2.7.11-64/tmpvcvzf_/7f77e66149e522cdd27a8c3c0ae96021.lib and object C:/Users/dmley/AppData/Local/Theano/compiledir_Windows-10-10.0.10586-Intel64_Family_6_Model_42_Stepping_7_GenuineIntel-2.7.11-64/tmpvcvzf_/7f77e66149e522cdd27a8c3c0ae96021.exp

DEBUG: nvcc STDOUT mod.cu
   Creating library C:/Users/dmley/AppData/Local/Theano/compiledir_Windows-10-10.0.10586-Intel64_Family_6_Model_42_Stepping_7_GenuineIntel-2.7.11-64/tmpvo
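
For reference, a NumPy sketch of the recurrence that a SimpleRNN layer with `activation='sigmoid'` and `return_sequences=True` computes: h_t = sigmoid(x_t W + h_{t-1} U + b), starting from a zero state. The weights are random placeholders, not the trained model's, and the helper names are illustrative. The loop over timesteps is inherently sequential, and with `batch_size=1` the model also makes one weight update per slice; together these plausibly account for much of the long training time.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simple_rnn(x, W, U, b):
    # x: [slices, timesteps, features]; W: [features, units]; U: [units, units].
    n, T, _ = x.shape
    h = np.zeros((n, U.shape[0]))        # initial hidden state h_0 = 0
    outputs = []
    for t in range(T):
        # Each step mixes the current input with the previous hidden state.
        h = sigmoid(x[:, t, :] @ W + h @ U + b)
        outputs.append(h)
    # return_sequences=True: keep the hidden state from every timestep.
    return np.stack(outputs, axis=1)

rng = np.random.default_rng(1)
x = rng.random((4, 40, 3))               # toy data: 4 slices of 40 timesteps x 3 axes
W = rng.normal(scale=0.5, size=(3, 10))
U = rng.normal(scale=0.5, size=(10, 10))
b = np.zeros(10)
hs = simple_rnn(x, W, U, b)
print(hs.shape)  # (4, 40, 10)
```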

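The validation data is checked manually here rather than passed to model.fit, to avoid slowing down training any further.

Because the output layer is a softmax, each row of the prediction is a probability distribution over the 8 classes. In the rows shown, the highest probability (about 0.35 to 0.37) is always at index 1, which matches the true label, but index 7 follows closely at about 0.30, so the predictions for this slice are correct but not confident.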
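
Rather than inspecting one slice by eye, the same check can be run over many slices by comparing the argmax of the softmax output with the argmax of the one-hot labels. A sketch on toy probabilities; `per_timestep_accuracy` is a hypothetical helper.

```python
import numpy as np

def per_timestep_accuracy(probs, onehot_true):
    # probs, onehot_true: [slices, timesteps, classes].
    pred = probs.argmax(axis=-1)        # predicted class at each timestep
    true = onehot_true.argmax(axis=-1)  # position of the 1 in each one-hot row
    return (pred == true).mean()

# Toy example: 1 slice, 3 timesteps, 8 classes; the true class is index 1 throughout.
true = np.zeros((1, 3, 8))
true[..., 1] = 1
probs = np.full((1, 3, 8), 0.01)
probs[0, :, 1] = 0.37
probs[0, :, 7] = 0.30   # index 1 wins at t=0 and t=1
probs[0, 2, 1] = 0.20
probs[0, 2, 7] = 0.47   # index 7 wins at t=2
print(per_timestep_accuracy(probs, true))  # 2 of 3 timesteps correct
```

On the real arrays it would be called as `per_timestep_accuracy(model.predict(X_val_t, batch_size=1024), Y_val_t)`, where the batch size is an arbitrary choice. Alternatively, passing `metrics=['accuracy']` to `model.compile` has Keras report accuracy itself.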

In [187]:
print(model.predict(X_val_t[20000:20001], verbose=1))

Y_val_t[20000,:,:]

[[[  3.37347119e-05   3.50438535e-01   4.80841193e-03   1.36099264e-01
     1.95501328e-01   3.68134188e-03   7.74597656e-03   3.01691502e-01]
  [  6.40502330e-05   3.66281301e-01   4.90757311e-03   1.22001871e-01
     1.91140920e-01   3.50599992e-03   8.14373791e-03   3.03954542e-01]
  [  6.61160957e-05   3.67044419e-01   4.91179852e-03   1.21319458e-01
     1.90932199e-01   3.49661848e-03   8.16170033e-03   3.04067731e-01]
  [  6.55482145e-05   3.66835415e-01   4.91079967e-03   1.21502250e-01
     1.90986618e-01   3.49932327e-03   8.15705303e-03   3.04043025e-01]
  [  6.11062860e-05   3.65100324e-01   4.90011740e-03   1.22996934e-01
     1.91480592e-01   3.51909711e-03   8.11549835e-03   3.03826332e-01]
  [  6.43778330e-05   3.66361350e-01   4.90864553e-03   1.21894538e-01
     1.91136032e-01   3.50462622e-03   8.14651139e-03   3.03983867e-01]
  [  6.98216900e-05   3.68378937e-01   4.91943723e-03   1.20195195e-01
     1.90553382e-01   3.48159694e-03   8.19537789e-03   3.04206222e-01]

array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0

In [178]:
print(hist.history)

{'loss': [1.6677824325436796]}
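
To put this loss of about 1.67 in context: categorical cross-entropy averages -log(p_true) over every slice and timestep (and Keras reports it averaged over the training batches of the single epoch). A model that always predicts uniform probabilities over the 8 classes would score ln 8 ≈ 2.08, while for the validation slice inspected above, where the true class gets about 0.37, the per-timestep loss is roughly 1.0. A NumPy sketch; the clipping epsilon of 1e-7 is an assumption mirroring Keras's default backend epsilon.

```python
import numpy as np

def categorical_crossentropy(probs, onehot_true, eps=1e-7):
    # Mean over slices and timesteps of -sum_k y_k * log(p_k).
    p = np.clip(probs, eps, 1.0)
    return -(onehot_true * np.log(p)).sum(axis=-1).mean()

true = np.zeros((1, 40, 8))
true[..., 1] = 1
uniform = np.full((1, 40, 8), 1 / 8)
print(round(categorical_crossentropy(uniform, true), 4))  # 2.0794 (= ln 8)
```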


For future reference: the training data needs to be cut, perhaps by 95%, before I can experiment with dropout, more layers, and passing the validation set to Keras during training.
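
One way to cut the training data by about 95% is to keep a random 5% of the slices, sketched below with random stand-in arrays (`subsample_slices` and its parameters are hypothetical). Indexing X_t and Y_t with the same indices keeps each input slice paired with its labels.

```python
import numpy as np

def subsample_slices(X_t, Y_t, keep_fraction=0.05, seed=0):
    # Draw slice indices without replacement and keep them in time order.
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(len(X_t) * keep_fraction))
    idx = np.sort(rng.choice(len(X_t), size=n_keep, replace=False))
    # The same indices are used for inputs and labels, so pairs stay aligned.
    return X_t[idx], Y_t[idx]

X_t = np.random.rand(1000, 40, 3)   # stand-ins for the real windowed arrays
Y_t = np.random.rand(1000, 40, 8)
X_small, Y_small = subsample_slices(X_t, Y_t)
print(X_small.shape, Y_small.shape)  # (50, 40, 3) (50, 40, 8)
```

Because neighboring slices overlap in 39 of their 40 rows, keeping every 20th slice (`X_t[::20]`, `Y_t[::20]`) would give a similar 5% reduction with even coverage over time. The reduced set could then be trained with `validation_data=(X_val_t, Y_val_t)` passed to `model.fit`.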