# Time series classification

Time series classification (TSC) operates on time series data, a series of values that is ordered by time. Data samples are labelled as belonging to a particular class. The TSC system is trained using this data to classify unlabelled samples. There is a wide range of TSC applications. Smartwatch data is used to classify human activities (walking, running, ascending stairs, etc.). Animal behaviour (hunting, sleeping) is monitored using accelerometers on tagged, wild animals for environmental studies. Sensors on industrial machines are used to classify time series samples as either normal or preceding a failure, informing machine maintenance schedules.

This exercise uses the SonyAIBORobotSurface1 dataset from the UEA & UCR Time Series Classification Repository (Dau et al., 2018). The data was collected by Vail and Veloso (2004) at Carnegie Mellon University, from an accelerometer on a Sony AIBO robot. Their aim was to detect the surface that the robot was walking on in order to optimise its gait for that surface. The robots competed in the RoboCup League, a football game played on a carpeted field.

![The Sony AIBO Robot is a robot dog. It is pictured with a ball.](https://i1.wp.com/www.techdigest.tv/wp-content/uploads/2015/06/aibo-560.jpg "Sony AIBO Robot")

## References
Dau, H. A., Bagnall, A., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A. and Keogh, E. (2018) ‘The UCR Time Series Archive’, [Online]. Available at http://arxiv.org/abs/1810.07758 (Accessed 4 May 2019).

Vail, D. and Veloso, M. (2004) ‘Learning from accelerometer data on a legged robot’, *IFAC Proceedings*, vol. 37, no. 8, pp. 822–827 [Online]. Available at https://www.cs.cmu.edu/~mmv/papers/04iav-doug.pdf (Accessed 4 May 2019).



 


# Load Python packages
Import the Python packages that we will need.

In [None]:
from pathlib import Path
import time

import numpy as np
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold
import tensorflow as tf
import tensorflow.keras as keras
import matplotlib.pyplot as plt
import seaborn as sns

from tensorflow.keras.layers import Input, Dense, Activation, Dropout
from tensorflow.keras.models import Model

# General settings
sns.set_style('whitegrid')

# User settings

In [None]:
load_from_web = True

# Load the data
The robot data provided is x-axis accelerometer data sampled at 125 Hz (125 times per second). A positive value corresponds to acceleration in the forward direction. Each data sample has 70 data points (0.56 s) and is labelled as either cement or carpet. The original data had a positive mean, because the robot leans forwards slightly, and lay approximately in the range [0, 0.4] gravities. The dataset provided has been normalised.

The machine learning approach that Vail and Veloso took was to take a one second window and extract statistical features from all three accelerometer axes. Six features were calculated – variance in acceleration and correlation between the accelerations. A decision tree was used for learning. The paper reports on three classes – walking on cement, carpet in their laboratory and carpet on the football field. The overall classification accuracy was 84.9%.
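As a rough illustration of that feature-based approach, the sketch below computes six statistics from a window of three-axis accelerometer data: the variance of each axis and the correlation between each pair of axes. The split into three variances and three correlations, the function name `window_features`, and the random stand-in data are assumptions made for illustration. Consult the paper for the exact feature definitions.

```python
import numpy as np

def window_features(window):
    """Return six features for a (n_points, 3) window of x, y, z accelerations.

    Assumed layout: three per-axis variances followed by three pairwise
    correlations (x-y, x-z, y-z).
    """
    variances = window.var(axis=0)               # one variance per axis
    corr = np.corrcoef(window, rowvar=False)     # 3 x 3 correlation matrix
    pairwise = corr[np.triu_indices(3, k=1)]     # upper triangle: xy, xz, yz
    return np.concatenate([variances, pairwise])

# Stand-in data: one second at 125 Hz, three axes (random, not robot data)
rng = np.random.RandomState(0)
window = rng.normal(size=(125, 3))
features = window_features(window)
print('Number of features:', features.shape[0])
```

A feature vector like this could then be passed to a classical classifier such as a decision tree, whereas the networks in this exercise learn directly from the 70 raw data points.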

The dataset has been split into two balanced datasets: one for model development and one for the final test that evaluates the finished model.

In [None]:
if load_from_web:
    url = 'https://raw.githubusercontent.com/Withington/deepscent/master/data/SonyAIBORobotSurface1_IoC/SonyAIBORobotSurface1_IoC_DEV.txt'
    robot_df = pd.read_csv(url, sep='\t', header=None)
    print('Loaded from', url)
    robot_data = robot_df.values
else:
    data_dir = '../../data'
    data_name = 'SonyAIBORobotSurface1_IoC'
    data_filename = Path(data_dir) / data_name / (data_name + '_DEV.txt')
    robot_data = np.loadtxt(data_filename)
    print('Loaded from', data_filename)
print('The shape of robot_data is', robot_data.shape)
print('robot_data:', robot_data)

Extract the labels, y, and the data samples, x. For convenience we will use class labels 0 and 1 instead of 1 and 2.

class 0 : cement

class 1 : carpet

In [None]:
y_dev = robot_data[:,0]
x_dev = robot_data[:,1:]
print('The shape of x_dev is', x_dev.shape)
print('The shape of y_dev is', y_dev.shape)

# Change from classes 1 and 2 to classes 0 and 1
y_dev = (y_dev - y_dev.min())/(y_dev.max()-y_dev.min())
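
The rescaling above is a min-max mapping. A small sketch with made-up labels shows its effect; the array `labels` is hypothetical and not taken from the dataset.

```python
import numpy as np

labels = np.array([1., 2., 2., 1., 2.])   # hypothetical labels using classes 1 and 2
# Min-max scaling maps the smallest label to 0 and the largest to 1
remapped = (labels - labels.min()) / (labels.max() - labels.min())
print(remapped)
```

This only produces integer class labels when there are exactly two classes. With three or more classes the intermediate labels would become fractions.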

## Plot the data

In [None]:
sample_number = 3 ### CHANGE PARAMETER HERE ###
plt.plot(x_dev[sample_number], label='category'+str(y_dev[sample_number]))
plt.legend(loc='upper right', frameon=False)

In [None]:
sample_a = 0 ### CHANGE PARAMETER HERE ###
sample_b = 3 ### CHANGE PARAMETER HERE ###
plt.plot(x_dev[sample_a], label='category'+str(y_dev[sample_a]))
plt.plot(x_dev[sample_b], label='category'+str(y_dev[sample_b]))
plt.legend(loc='upper right', frameon=False)

In [None]:
plt.plot(x_dev[3], label='category'+str(y_dev[3]))
plt.plot(x_dev[7], label='category'+str(y_dev[7]))
plt.plot(x_dev[8], label='category'+str(y_dev[8]))
plt.plot(x_dev[10], label='category'+str(y_dev[10]))
plt.plot(x_dev[11], label='category'+str(y_dev[11]))
plt.legend(loc='upper right', frameon=False)
plt.ylim([-3.5, 3.5])
plt.title('Walking on cement')

In [None]:
plt.plot(x_dev[0], label='category'+str(y_dev[0]))
plt.plot(x_dev[1], label='category'+str(y_dev[1]))
plt.plot(x_dev[2], label='category'+str(y_dev[2]))
plt.plot(x_dev[4], label='category'+str(y_dev[4]))
plt.plot(x_dev[5], label='category'+str(y_dev[5]))
plt.legend(loc='upper right', frameon=False)
plt.ylim([-3.5, 3.5])
plt.title('Walking on carpet')

# Split the development dataset into training and test datasets

In [None]:
print('Number of samples of class 0', (y_dev == 0).sum())
print('Number of samples of class 1', (y_dev == 1).sum())
y_dev_df = pd.DataFrame(y_dev)
y_dev_df[0].value_counts().plot(kind='bar')

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x_dev, y_dev, test_size=100, random_state=21, stratify=y_dev)
print('The shape of train_data is', x_train.shape)
print('The shape of test_data is', x_test.shape)
print('Training data:')
print('Number of samples of class 0', (y_train == 0).sum())
print('Number of samples of class 1', (y_train == 1).sum())
print('Test data:')
print('Number of samples of class 0', (y_test == 0).sum())
print('Number of samples of class 1', (y_test == 1).sum())
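
To see what the `stratify` argument does, here is a small sketch on made-up, deliberately imbalanced data. The arrays `x_toy` and `y_toy` are hypothetical. The class counts printed for each part should show the same 30:70 ratio as the full toy dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 30 samples of class 0 and 70 of class 1
y_toy = np.array([0] * 30 + [1] * 70)
x_toy = np.arange(100).reshape(-1, 1)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=20, random_state=21, stratify=y_toy)

# With stratification the class ratio is preserved in both parts
print('Train class counts:', np.bincount(y_tr))
print('Test class counts:', np.bincount(y_te))
```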

# Pre-process the data

In [None]:
x_train_mean = x_train.mean()
x_train_std = x_train.std()
x_train = (x_train - x_train_mean)/(x_train_std) 
x_test = (x_test - x_train_mean)/(x_train_std)

print('x_train_mean', x_train_mean)
print('x_train_std', x_train_std)
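
The same idea on a tiny made-up example: the mean and standard deviation are computed from the training array only and then reused for the test array, so that no information from the test data enters the pre-processing. The arrays `train` and `test` are hypothetical.

```python
import numpy as np

train = np.array([[1.0, 2.0, 3.0],
                  [3.0, 4.0, 5.0]])   # hypothetical training samples
test = np.array([[4.0, 5.0, 6.0]])     # hypothetical test sample

# A single scalar mean and standard deviation over all training values
mean = train.mean()
std = train.std()

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std      # reuse the training statistics

print('Train mean and std after scaling:', train_scaled.mean(), train_scaled.std())
print('Test mean after scaling:', test_scaled.mean())
```

The scaled training data has zero mean and unit standard deviation by construction. The scaled test data generally does not, and that is expected.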

In [None]:
sample_a = 0 ### CHANGE PARAMETER HERE ###
sample_b = 4 ### CHANGE PARAMETER HERE ###
plt.plot(x_train[sample_a], label='category'+str(y_train[sample_a]))
plt.plot(x_train[sample_b], label='category'+str(y_train[sample_b]))
plt.legend(loc='upper right', frameon=False)

# MLP 1
Create a multilayer perceptron (MLP). This first MLP is small.

In [None]:
input_shape = x_train.shape[1:]

In [None]:
n0 = 16
x = Input(shape=(input_shape), name='MLP1InputLayer')
y = Dense(n0, activation='relu', name='Dense010')(x)
# Output layer
out = Dense(1, activation='sigmoid', name='OutputLayer')(y)

# Build model
model_mlp1 = Model(x, out)
model_mlp1.summary()  # summary() prints the table itself and returns None

## Understanding the number of parameters
TODO - exercise around the calculation that arrives at the number of parameters in each layer.

In [None]:
print(70*16+16)
print(16+1)
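
The two numbers printed above follow from the rule that a dense layer has one weight for every input-unit connection plus one bias per unit. A minimal sketch of that rule, using a helper `dense_params` that is introduced here only for illustration:

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a fully connected layer."""
    weights = n_inputs * n_units   # one weight for each input-unit connection
    biases = n_units               # one bias per unit
    return weights + biases

hidden = dense_params(70, 16)   # 70 time points feeding 16 hidden units
output = dense_params(16, 1)    # 16 hidden units feeding 1 output unit
print('Hidden layer:', hidden)
print('Output layer:', output)
print('Total:', hidden + output)
```

The total can be compared with the trainable parameter count reported in the model summary.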

## Select an optimizer and compile the model

In [None]:
optimizer = keras.optimizers.Adam()
model_mlp1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
# TODO - can we access TensorBoard on colab? If so, add tensorboard callback
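
Binary cross-entropy penalises confident wrong predictions heavily. The sketch below computes it by hand with NumPy for a few made-up predictions. The function `binary_crossentropy` here is a local helper written for illustration, not the Keras implementation, and the clipping constant `eps` is an assumption used to avoid taking log(0).

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return losses.mean()

y_true = np.array([1.0, 0.0, 1.0])
good = np.array([0.9, 0.1, 0.8])   # confident and correct
bad = np.array([0.1, 0.9, 0.2])    # confident and wrong

loss_good = binary_crossentropy(y_true, good)
loss_bad = binary_crossentropy(y_true, bad)
print('Loss for good predictions:', loss_good)
print('Loss for bad predictions:', loss_bad)
```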

## Train MLP 1

In [None]:
batch_size = 5
epochs = 50

In [None]:
start = time.time()
hist = model_mlp1.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test), verbose=1)
end = time.time()
log = pd.DataFrame(hist.history) 
print('Training complete in', round(end-start), 'seconds')

In [None]:
print('The first five rows in log are')
log.head()

In [None]:
log[['loss', 'val_loss']].plot()
# TODO add axes labels, etc

In [None]:
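acc_key = 'accuracy' if 'accuracy' in log else 'acc'  # key name depends on the TensorFlow version
log[[acc_key, 'val_' + acc_key]].plot()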

# Make predictions using MLP 1
Classify the test data using MLP 1.

In [None]:
result = model_mlp1.evaluate(x_test, y_test, batch_size=batch_size)
print('Validation accuracy is', result[1])

In [None]:
y_probability = model_mlp1.predict_on_batch(x_test)
y_predicted_class = np.round(y_probability).flatten()
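
`np.round` turns probabilities into class labels by rounding to the nearest integer, which amounts to a threshold at 0.5. One detail worth knowing is that NumPy rounds exact halves to the nearest even number, so a probability of exactly 0.5 becomes class 0. An explicit comparison makes the threshold visible and easy to change. The probabilities below are made up.

```python
import numpy as np

probs = np.array([[0.03], [0.5], [0.51], [0.97]])   # made-up model outputs

rounded = np.round(probs).flatten()
thresholded = (probs >= 0.5).astype(float).flatten()   # explicit threshold

print('np.round     :', rounded)
print('probs >= 0.5 :', thresholded)
```

The two approaches differ only for a probability of exactly 0.5, which is rare in practice.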

In [None]:
sample = 3
print('The probability that sample', sample, 'belongs to class 1 is', y_probability[sample][0])
print('The model classifies sample', sample, 'as class', y_predicted_class[sample])
print('The true class of sample', sample, 'is class', y_test[sample])

In [None]:
sample_a = 0 ### CHANGE PARAMETER HERE ###
sample_b = 3 ### CHANGE PARAMETER HERE ###
plt.plot(x_test[sample_a], label='True:'+str(y_test[sample_a])+' Pred:'+str(y_predicted_class[sample_a]))
plt.plot(x_test[sample_b], label='True:'+str(y_test[sample_b])+' Pred:'+str(y_predicted_class[sample_b]))
plt.legend(loc='upper right', frameon=False)

# MLP 2
This time we will create a function that builds our model.

In [None]:
def build_model():
    x = Input(shape=(input_shape), name='MLP2InputLayer')
    ### CHANGE PARAMETERS HERE ###
    y = Dropout(0.1,name='Drop010')(x)
    y = Dense(16, activation='relu', name='Dense010')(y) 
    y = Dropout(0.2,name='Drop020')(y)
    y = Dense(16, activation='relu', name='Dense020')(y)
    y = Dropout(0.2,name='Drop030')(y)
    y = Dense(16, activation='relu', name='Dense030')(y)
    y = Dropout(0.3,name='Drop040')(y)
    ### END OF CHANGE PARAMETERS ###
    out = Dense(1, activation='sigmoid', name='OutputLayer')(y)

    # Build model and compile the model
    model = Model(x, out)
    optimizer = keras.optimizers.Adam()
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
    
model = build_model()

In [None]:
batch_size = 5
epochs = 50
start = time.time()
hist = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test), verbose=1)
end = time.time()
log = pd.DataFrame(hist.history) 
print('Training complete in', round(end-start), 'seconds')

In [None]:
log[['loss', 'val_loss']].plot()

In [None]:
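acc_key = 'accuracy' if 'accuracy' in log else 'acc'  # key name depends on the TensorFlow version
log[[acc_key, 'val_' + acc_key]].plot()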
# Store the evaluation of this model; otherwise the MLP 1 result would be printed
result = model.evaluate(x_test, y_test, batch_size=batch_size)
print('Validation accuracy is', result[1])

# k-fold cross validation

In [None]:
### CHANGE PARAMETERS HERE ###
k = 3 
m = 5 
batch_size = 10
epochs = 30
### END OF CHANGE PARAMETERS ###

kfold = RepeatedStratifiedKFold(n_splits=k, n_repeats=m, random_state=76)
count = 0
val_acc = list()
start = time.time()
for train, test in kfold.split(x_dev, y_dev):
    x_train, y_train, x_test, y_test = x_dev[train], y_dev[train], x_dev[test], y_dev[test]
    # Normalise the data
    x_train_mean = x_train.mean()
    x_train_std = x_train.std()
    x_train = (x_train - x_train_mean)/(x_train_std) 
    x_test = (x_test - x_train_mean)/(x_train_std)
    # Build and train a model
    model = build_model()
    fold_start = time.time()
    hist = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test), verbose=1)
    fold_end = time.time()
    log = pd.DataFrame(hist.history) 
    print('Training of iteration', count, 'complete in', round(fold_end-fold_start), 'seconds')
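    # The history key is 'val_acc' or 'val_accuracy' depending on the TensorFlow version
    val_key = 'val_accuracy' if 'val_accuracy' in log else 'val_acc'
    val_acc.append(log.iloc[-1][val_key])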
    count = count + 1

end = time.time()
val_acc = pd.DataFrame(val_acc, columns=['val_acc'])

In [None]:
print(val_acc)
print('{} repeats of {}-fold cross validation completed in {} seconds'.format(m, k, round(end-start)))
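
A minimal sketch of how `RepeatedStratifiedKFold` partitions data, using a made-up balanced dataset of 12 samples (the names `x_toy` and `y_toy` are hypothetical). It yields `n_splits * n_repeats` train/test splits, and each test fold keeps the class balance of the full dataset.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Hypothetical balanced data: 12 samples, 6 per class
y_toy = np.array([0, 1] * 6)
x_toy = np.arange(12).reshape(-1, 1)

rskf = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=76)
splits = list(rskf.split(x_toy, y_toy))

print('Number of train/test splits:', len(splits))   # n_splits * n_repeats
for train_idx, test_idx in splits:
    # Each test fold keeps the class balance of the full dataset
    print('Test indices', test_idx, 'class counts', np.bincount(y_toy[test_idx]))
```

Within one repeat every sample appears in exactly one test fold. Each repeat shuffles the data differently, which is why repeating gives additional accuracy estimates.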

## Plot the k-fold cross validation results

In [None]:
ax = sns.boxplot(data=val_acc)
ax = sns.swarmplot(data=val_acc, color='black')
print('Validation accuracy mean and sample standard deviation', val_acc['val_acc'].mean(), val_acc['val_acc'].std())
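
The pandas `std` method returns the sample standard deviation (it divides by n - 1), whereas `np.std` returns the population standard deviation (it divides by n) unless told otherwise. The short sketch below shows the difference on made-up accuracy values; the numbers are not results from this exercise.

```python
import numpy as np
import pandas as pd

scores = pd.Series([0.90, 0.92, 0.88, 0.94])   # made-up validation accuracies

sample_std = scores.std()                # pandas default: ddof=1 (sample)
population_std = np.std(scores.values)   # NumPy default: ddof=0 (population)

print('Mean', scores.mean())
print('Sample standard deviation', sample_std)
print('Population standard deviation', population_std)
```

With only a handful of folds the two values differ noticeably, so it helps to state which one is being reported.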

# GPU
Using a GPU can speed up calculations. However, the time taken to transfer the data to the GPU can outweigh the gain.

You are more likely to see a speed-up if the batch size is large. As you increase the batch size, check that validation accuracy does not deteriorate.

To use a GPU in Colab, select Edit - Notebook settings and then set Hardware accelerator to GPU.

In [None]:
# Check to see if you are using a GPU.
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print('GPU device not found')
else:
    print('Found a GPU at: {}'.format(device_name))