# Code for data loading

In [0]:
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1Ki6Dc6bS9YjueO7_YR_CH4b_eMHQuCyf' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1Ki6Dc6bS9YjueO7_YR_CH4b_eMHQuCyf" -O x_raw_small.npz && rm -rf /tmp/cookies.txt
!wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=1w0SSWeZoMP1r21Xznm2mx9NorTEisnPq' -O small_y_label.npz
!wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=17NujAC2E68NbuqxBcUaGklEE5z0M_Ukp' -O small_pid.npz

# **Classifying Time Series Motion data using Deep Learning**

In this exercise we are going to apply some Deep Learning techniques to the Capture24 wearable data set. Unlike classical Machine Learning techniques in which we engineer features with which to train our model, we are going to feed the raw sensor data in and let the network learn its own features.

First of all, we need to declare the various Numpy and Keras libraries that we'll be using in this exercise.

In [0]:
import os
import numpy as np          
import matplotlib.pyplot as plt  
import random
import io
from datetime import datetime
from sklearn import preprocessing
import keras
from keras.layers import LSTM, Dense, Input, Dropout, Convolution2D, Flatten, Activation, BatchNormalization, Convolution1D, MaxPool2D, MaxPool1D, ConvLSTM2D
from keras.models import Model, Sequential

# configure notebook to display plots
%matplotlib inline

print('Imports done')

Next, we need to define a function that will allow us to look at the results in a confusion matrix format.

In [0]:
def show_confusion_matrix(predictions, truth):
  c_matrix = np.zeros((5,5), dtype=int)
  for i, pred in enumerate(predictions):
    row = np.argmax(pred)
    col = np.argmax(truth[i])
    if i==0:
      print(row, col)
    c_matrix[row, col] +=1
  print('Predictions - rows, Truth - columns')
  print('sleep, sedentary, tasks-light, walking, moderate')
  print(c_matrix)

print('show_confusion_matrix() created.')
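
As a quick sanity check of the row/column convention, the same counting can be tried on a tiny hand-made example. This is only a minimal sketch and not part of the original exercise: `confusion_counts` is a hypothetical helper that mirrors `show_confusion_matrix()` but returns the matrix instead of printing it, and the toy arrays are made up.

```python
import numpy as np

def confusion_counts(predictions, truth, n_classes=5):
    """Count (predicted, true) pairs; rows are predictions, columns are truth."""
    rows = np.argmax(predictions, axis=1)   # predicted class per sample
    cols = np.argmax(truth, axis=1)         # true class per sample
    c_matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(c_matrix, (rows, cols), 1)    # accumulate, handling repeated pairs
    return c_matrix

# Three toy samples: one-hot truth and softmax-like predictions
toy_truth = np.eye(5)[[0, 1, 1]]
toy_pred = np.array([
    [0.9, 0.1, 0.0, 0.0, 0.0],   # predicts class 0 (truth is class 0)
    [0.2, 0.7, 0.1, 0.0, 0.0],   # predicts class 1 (truth is class 1)
    [0.6, 0.3, 0.1, 0.0, 0.0],   # predicts class 0 (truth is class 1)
])
toy_matrix = confusion_counts(toy_pred, toy_truth)
print(toy_matrix)
```

Correct predictions land on the diagonal; the misclassified third sample appears in row 0 (predicted 'sleep'), column 1 (truly 'sedentary').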


Let's load some data from the UK biobank wearable activity dataset. For this exercise we are using a subset of the data - to keep the training times shorter. 

In [0]:
# load the raw data from a numpy exported file
x_raw = np.load('x_raw_small.npz',allow_pickle=True)
x_raw = x_raw['arr_0']
# load the activity labels that relate to this data
y = np.load('small_y_label.npz',allow_pickle=True)
y = y['arr_0']

# load the participant IDs (PIDs) that relate to this data
pid = np.load('small_pid.npz',allow_pickle=True)
pid = pid['arr_0']

#Check the data looks okay
print('y shape:', y.shape)
print('x_raw shape:', x_raw.shape)

So, you can see that we have 11016 labelled time steps.
Each time step is 30 seconds long, with data collected for the x, y and z axes 100 times per second (3000 samples per axis), giving the (11016, 3, 3000) shape.

Let's take a look at some of the data for a random 30 second time step (run the cell again for a different result).



In [0]:
# Provide a list of activity names that relate to each of the numeric activity labels
class_names = ['sleep', 'sedentary', 'tasks-light', 'walking', 'moderate']
ix = random.randint(0,100)
fig, axs = plt.subplots(5, sharex=True, sharey=True, figsize=(5,5))
for i in range(len(class_names)):
    axs[i].plot(x_raw[y == i][ix].T)
    axs[i].set_title(class_names[i])
fig.tight_layout()
fig.show()

So, we can see that, visually, sleep is quite easy to recognise, whereas the distinction between some of the other classes is a little more subtle.

Next, let's look at the distribution of the activities within our data. Can you add one or more plots that show the same information for each of the five unique PIDs in the dataset?

In [0]:
# Plot a bar chart of each activity from all entries 
fig, ax = plt.subplots()
x = np.arange(5)
plt.bar(x, sum(keras.utils.to_categorical(y)))
plt.xticks(x, class_names)
plt.show()

# TODO Add further plot(s) that show the breakdown of activities for each participant
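
One possible starting point for the per-participant breakdown is to count the windows of each class for each PID. The sketch below is not the intended solution, just one way to do it: `demo_y` and `demo_pid` are small synthetic stand-ins for `y` and `pid`, and `activity_counts_by_participant` is a hypothetical helper name.

```python
import numpy as np

# Synthetic stand-ins for the real `y` and `pid` arrays (assumption: one
# integer class label and one participant ID per 30-second window)
demo_y = np.array([0, 0, 1, 3, 0, 1, 1, 4, 2, 0])
demo_pid = np.array([1, 1, 1, 1, 2, 2, 2, 2, 2, 2])

def activity_counts_by_participant(labels, participants, n_classes=5):
    """Return {participant_id: array of per-class window counts}."""
    counts = {}
    for p in np.unique(participants):
        mask = participants == p                      # windows of this participant
        counts[p] = np.bincount(labels[mask].astype(int), minlength=n_classes)
    return counts

demo_counts = activity_counts_by_participant(demo_y, demo_pid)
for p, c in demo_counts.items():
    print(p, c)
```

Each count array has one entry per class, so it can be passed to `plt.bar` in the same way as the overall totals in the cell above.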



Okay, so we can see that we have a lot more 'sleep' and 'sedentary' data than we do for the other categories. This is something that you need to consider when training a model (try searching for 'class imbalance' if you are not sure).

Before we start training, we need to split our data into training and validation sets. In this case we are using 3 participants for training and the other 2 for validation. The data is in participant order so we can split it at the transition between person 3 and person 4, which is up to record number 7711 of our 11016. 

In [0]:
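# Optional - shuffle the data prior to splitting.
# Note: shuffling mixes windows from all participants, so the split below is
# then no longer by participant. Skip this step to keep a participant-wise split.
rng = np.random.default_rng()
indexes = rng.permutation(len(y))
y = y[indexes]
x_raw = x_raw[indexes]
pid = pid[indexes]  # keep the participant IDs aligned with the shuffled data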

# Split the y labels data between train and validation
testy = y[-3305:]
trainy = y[:7711]

# Do the same for the sensor data
testx = x_raw[-3305:]
trainx = x_raw[:7711]

print(trainy.shape, testy.shape, trainx.shape, testx.shape)
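
If you want the split to stay participant-wise regardless of the order of the records, a boolean mask built from the participant IDs is one option. This is a minimal sketch on synthetic data, assuming `pid` holds one participant ID per window; the `demo_*` names are hypothetical and holding out the last two participants is just an illustrative choice.

```python
import numpy as np

# Synthetic stand-ins: 8 windows from 5 participants, tiny (3, 4) sensor windows
demo_pid = np.array([1, 1, 2, 3, 3, 4, 5, 5])
demo_x = np.arange(len(demo_pid) * 3 * 4).reshape(len(demo_pid), 3, 4)
demo_y = np.array([0, 1, 0, 3, 1, 0, 2, 1])

# Hold out the last two participants for validation
participants = np.unique(demo_pid)
val_participants = participants[-2:]
val_mask = np.isin(demo_pid, val_participants)

demo_trainx, demo_trainy = demo_x[~val_mask], demo_y[~val_mask]
demo_testx, demo_testy = demo_x[val_mask], demo_y[val_mask]
print(demo_trainx.shape, demo_testx.shape)
```

Because the mask is computed from the IDs themselves, no window from a validation participant can end up in the training set, even if the arrays were shuffled beforehand.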

The next thing that we need to do is to get our labels into a format suitable for Deep Learning training. At the moment the y labels are values ranging from 0 to 4, representing the five activity classes. Instead, they need to be in the One-Hot-Encoded format, e.g. [1,0,0,0,0] = 0 and [0,0,0,1,0] = 3. This format matches the output of the network, which makes a prediction for each of the classes, and enables the DL framework (Tensorflow/Keras) to calculate the loss for us (Hint - Keras has a function for this :)).

In [0]:
# TODO - one hot encode the class values
trainy = ....
testy = ....

print(trainy.shape, testy.shape)
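
To see what the encoding looks like, here is a NumPy-only sketch of one-hot encoding. It is an illustration rather than the intended answer (the bar chart cell earlier already uses `keras.utils.to_categorical`, which does the same job); `one_hot` and `demo_labels` are hypothetical names.

```python
import numpy as np

def one_hot(labels, n_classes=5):
    """Map integer labels of shape (N,) to one-hot rows of shape (N, n_classes)."""
    labels = np.asarray(labels).astype(int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(n_classes, dtype=np.float32)[labels]

demo_labels = np.array([0, 3, 4])
demo_encoded = one_hot(demo_labels)
print(demo_encoded)
```

Taking `np.argmax` along the last axis recovers the original integer labels, which is exactly what `show_confusion_matrix()` relies on.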

Let's take a look at one of the thirty second time steps, from a random interval. You can see the three accelerometer plots over the 30 second period, along with the activity label (run again for a different time step).

In [0]:
ix = random.randint(0,1000)
plt.plot(trainx[ix,0,:], label='x')
plt.plot(trainx[ix,1,:], label='y')
plt.plot(trainx[ix,2,:], label='z')
plt.legend()
plt.show()
# y is in the same order as x_raw, so y[ix] is the label for trainx[ix]
class_names[int(y[ix])]

Now that we have the data in a format that we can ingest, it's time to try perhaps the most simple Deep Learning approach: A Multi-Layer Perceptron.
This type of neural network is very flexible but is not ideal for all tasks. Compared to other approaches, especially for image-type inputs, it is quite memory hungry because every neuron is connected to every neuron in the subsequent layer (i.e. it is Fully-Connected). However, its simplicity and flexibility make it a great starting point.

First, let's create a function that will train any model using the test and training data that we pass as parameters and display the results.

In [0]:
def train_model(model, inputs, test_inputs, epochs=100, class_weights=None):
  # train the model
  history = model.fit(inputs, trainy, epochs=epochs, batch_size=256, validation_data=(test_inputs, testy), verbose=2, shuffle=False, class_weight=class_weights)

  # validate the model on unseen data
  loss, accuracy = model.evaluate(test_inputs, testy, batch_size=256, verbose=0)
  print('Validation loss: %.3f, accuracy: %.3f' % (loss, accuracy))

  # plot history
  plt.plot(history.history['acc'], label='train')
  plt.plot(history.history['val_acc'], label='test')
  plt.legend()
  plt.show()

  show_confusion_matrix(model.predict(test_inputs), testy)

print('train_model() created.')

For our model, we're using Keras because it only requires a few lines of code to create a network. This will use Tensorflow as the DL platform backend.

Note that we need to reshape the input data to a 1 dimensional array for this type of network.
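
The flattening itself can be tried out on a small synthetic array first. This is a sketch only; `demo_x` is a made-up stand-in with the same (windows, axes, samples) layout as the notebook data.

```python
import numpy as np

# Synthetic batch with the same layout as the notebook data: (windows, axes, samples)
demo_x = np.arange(2 * 3 * 3000, dtype=np.float32).reshape(2, 3, 3000)

# Collapse the axis and sample dimensions into one vector per window
demo_flat = demo_x.reshape((demo_x.shape[0], -1))
print(demo_flat.shape)  # (2, 9000)

# The first 3000 values of each vector are the x axis, then y, then z
print(np.array_equal(demo_flat[0, :3000], demo_x[0, 0]))
```

Using `-1` lets NumPy work out the flattened length (3 x 3000 = 9000), which matches the `input_shape=(9000,)` of the first Dense layer.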

In [0]:
# create a network
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(9000,)))
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

# TODO reshape the data to match the network input dimensions
inputx = trainx.reshape(...
test_inputx = testx.reshape(...

# Now run the training
train_model(model, inputx, test_inputx)

After 100 epochs, you should end up with a validation accuracy of somewhere around 50% which is certainly better than chance, but there is a lot of room for improvement. Also, if you run it again note that you may get quite different results.

Firstly, how do the plots for training accuracy and validation accuracy differ?
Try running the training for 200 epochs by modifying the code below and see whether the validation and training accuracies stabilise. 
Is there any noticeable difference between the training and validation accuracies? Why might this be?

When we train with small datasets there is always the risk that the network will overfit the data. It learns to associate each input with its label rather than learning the essence of the task. This means that it tends to perform significantly worse on unseen data. In other words, it does not *generalise* well. To combat this, we can try including a dropout layer. 

A dropout layer randomly switches certain neurons off during training, which makes the network less reliant on specific features and therefore less likely to overfit. Note that the dropout rate is the probability that a neuron will be switched off, so a value closer to 1 means that more neurons are switched off and a value closer to zero means that more are left on. Try adjusting this setting to see what happens.
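
To make the effect of the rate concrete, here is a NumPy sketch of the idea behind dropout. It illustrates the general technique (often called inverted dropout, where the surviving activations are rescaled) under simple assumptions and is not Keras's actual implementation; `dropout` and `demo_acts` are hypothetical names.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`, rescaling the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_mask = rng.random(activations.shape) >= rate   # True = neuron stays on
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
demo_acts = np.ones(10_000)

low = dropout(demo_acts, rate=0.1, rng=rng)
high = dropout(demo_acts, rate=0.9, rng=rng)
print('fraction dropped at rate 0.1:', np.mean(low == 0))
print('fraction dropped at rate 0.9:', np.mean(high == 0))
```

The fraction of zeroed activations tracks the rate, so a higher rate means a more heavily regularised (and harder to train) network.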

Finally, what happens if you add more Dense layers? Experiment with different layers and dropout settings and see what sort of accuracy you can achieve. You should be able to get around 70%.

In [0]:
# create a network
model = Sequential()
model.add(Dense(1024, activation='relu', input_shape=(9000,)))
# TODO Add a dropout layer...
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

# reshape the data to match the network input dimensions
inputx = trainx.reshape((trainx.shape[0],9000))
test_inputx = testx.reshape((testx.shape[0],9000))

# Now run the training
train_model(model, inputx, test_inputx, epochs=100)

One thing that you may have noticed is that the network has a tendency not to predict the 'tasks-light' activity - or in fact 'walking' and 'moderate'. This is very likely because our training data does not contain as many examples for these categories and so the network has learned that a prediction in these categories is much less likely. This is a form of bias that it is usually a good idea to avoid. To redress this situation we have two options:
We can supplement the dataset with more examples from the less frequent classes, or
we can weight the samples according to their frequency, so that the less frequent examples are given more significance.
The latter is what we will do in this case. Happily, Keras has a training parameter that we can use named, unsurprisingly, 'class_weight=...', which takes a dictionary in the form: {0 : 4.0, 1 : 3.0, 2 : 20.0, 3 : 12.5, 4 : 1.0}
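
One common heuristic is to weight each class inversely to its frequency, using n_samples / (n_classes * class_count). The sketch below shows that calculation on made-up labels; it is one possible weighting scheme, not necessarily the one intended for the exercise, and `balanced_class_weights` is a hypothetical helper name.

```python
import numpy as np

def balanced_class_weights(labels, n_classes=5):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = np.bincount(np.asarray(labels).astype(int), minlength=n_classes)
    counts = np.maximum(counts, 1)  # avoid division by zero for absent classes
    weights = len(labels) / (n_classes * counts)
    return {cls: float(w) for cls, w in enumerate(weights)}

# Made-up, imbalanced label distribution: 50, 30, 5, 10 and 5 windows per class
demo_labels = np.array([0] * 50 + [1] * 30 + [2] * 5 + [3] * 10 + [4] * 5)
demo_weights = balanced_class_weights(demo_labels)
print(demo_weights)
```

Note that this expects integer labels rather than one-hot vectors, so with the notebook's data you would pass something like `np.argmax(trainy, axis=1)`.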

In [0]:
# create function to set the weights according to the distribution of the classes
def set_weights():
  # TODO - create a weighting function that redresses the class imbalance
  # It needs to return a dictionary e.g. {0: 0.1, 1: 0.25, 3: 1.0 ...}
  weights = ...
  

  return weights

# create a network
model = Sequential()
model.add(Dense(1024, activation='relu', input_shape=(9000,)))
model.add(Dropout(0.6))
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

# reshape the data to match the network input dimensions
inputx = trainx.reshape((trainx.shape[0],9000))
test_inputx = testx.reshape((testx.shape[0],9000))

# Now retrain
train_model(model, inputx, test_inputx, epochs=100, class_weights=set_weights())

You should observe that the model is now more likely to pick from the rarer classes than before. Feel free to spend more time tweaking your model if you like.

# Recurrent Neural Networks
So, the MLP got us so far, but there are better techniques for this type of data. LSTMs (Long Short Term Memory) are a type of recurrent neural network (RNN) that are able to learn important associations between data points separated by longer time periods than other RNNs. Let's see how they perform on this task.

In [0]:
# design network
model = keras.Sequential()
model.add(LSTM(500, input_shape=(trainx.shape[1], trainx.shape[2])))
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

# TODO you will need to reshape the inputs for this model
# reshape the data to match the network input dimensions
inputx = trainx.reshape(....
test_inputx = testx.reshape(....

# Train the model
train_model(model, inputx, test_inputx, epochs=100, class_weights=set_weights())

So we have a slight improvement over the MLP approach, although nothing ground-breaking. This is most likely because, although this is time series data, there are few longer-term dependencies in the data. Also, LSTMs are good at predicting the next values in a sequence, whereas this is actually a classification task.

Given that we can easily ingest each time step of 9000 data-points, we can actually formulate this task as an image recognition problem. To do this we can make the time axis a spatial axis and then make the three channels into another axis. Alternatively, we can make it a 1D image, with the sensor values as pixel intensities and the three axes as input channels. The latter is the most intuitive approach. In either case, we can then use convolutional kernels rather than fully connected layers to learn a set of features. We can still use a dense layer with softmax outputs for the classification.
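
When treating the three axes as channels, the data has to go from a (windows, axes, samples) layout to (windows, samples, axes). It is worth checking how this is done, because a reshape and a transpose give the same shape but different contents. The sketch below uses a synthetic array in which each axis holds a constant value so the difference is easy to see; all names are hypothetical.

```python
import numpy as np

# Two synthetic windows laid out as (windows, axes, samples); each axis gets a
# constant value (x=0, y=1, z=2) so it is easy to see where the values end up
one_window = np.stack([np.full(3000, axis_id, dtype=np.float32) for axis_id in range(3)])
demo_x = np.stack([one_window, one_window])       # shape (2, 3, 3000)

channels_last = np.transpose(demo_x, (0, 2, 1))   # shape (2, 3000, 3)
reshaped = demo_x.reshape((2, 3000, 3))           # same shape, different contents

print(channels_last[0, :2])  # every time step holds one x, y and z value
print(reshaped[0, :2])       # consecutive x samples fill the "channels"
```

With the transpose, each time step carries one value from each of the three axes; with the reshape, three consecutive samples from the same axis are grouped together instead.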

In [0]:
# design network
model = Sequential()
model.add(Convolution1D(32, kernel_size=(5), activation='relu', input_shape=(3000,3) ))
# TODO Add some layers...
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

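# rearrange the data from (N, 3, 3000) to (N, 3000, 3); a plain reshape would
# group consecutive samples from one axis rather than move the axes to the end
inputx = np.transpose(trainx, (0, 2, 1))
test_inputx = np.transpose(testx, (0, 2, 1))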

# Train the model
train_model(model, inputx, test_inputx, epochs=100, class_weights=set_weights())

So, you should be seeing accuracy well into the 80% range using this convolutional approach. Try experimenting with the number of layers, the filter size and adding pooling layers, and see how accurate you can get it.

# ConvLSTM

To complete our journey through the approaches for this sort of data, we are finally going to try a Convolutional LSTM, which is very much as it sounds - we use a convolutional kernel and then feed that into the LSTM. It's very simple to set up with Keras:

In [0]:
# design network
model = Sequential()
model.add(ConvLSTM2D(filters=32, kernel_size=(1,9), activation='relu', input_shape=(30,1,100,3) ))
# TODO Add some layers
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

# TODO reshape the data
inputx = trainx.reshape(...
test_inputx = testx.reshape(...

# Train the model
train_model(model, inputx, test_inputx, epochs=150, class_weights=set_weights())
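
One possible way to produce an input matching `input_shape=(30,1,100,3)` is to split each 30-second window into thirty 1-second steps of 100 samples, with the three axes as channels. That interpretation of the shape is an assumption, and `to_convlstm_input` is a hypothetical helper shown on synthetic data.

```python
import numpy as np

# Synthetic batch laid out as (windows, axes, samples)
demo_x = np.random.default_rng(0).normal(size=(2, 3, 3000)).astype(np.float32)

def to_convlstm_input(x, n_steps=30):
    """Split each window into `n_steps` sub-sequences: (N, steps, 1, samples, axes)."""
    n, n_axes, n_samples = x.shape
    step_len = n_samples // n_steps                # 100 samples = 1 second
    channels_last = np.transpose(x, (0, 2, 1))     # (N, 3000, 3)
    return channels_last.reshape((n, n_steps, 1, step_len, n_axes))

demo_seq = to_convlstm_input(demo_x)
print(demo_seq.shape)  # (2, 30, 1, 100, 3)
```

Moving the axes to the end before reshaping keeps each sample's x, y and z values together, so element `[n, t, 0, s, c]` is sample `t*100 + s` of axis `c` in window `n`.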

Hopefully this has given you some ideas on how to tackle time series data using Deep Learning. Keras has taken care of many of the implementation details for us, but you will find that other popular DL frameworks operate in quite similar ways.

You are encouraged to experiment with these techniques for the rest of the session. Can you achieve better results? We know that engineered features can achieve 85-90% accuracy, so there is certainly scope for further improvement!

Thanks for your attention!
