# Time series - binary classification
The following example shows a binary classification task on time series data. Several approaches are possible; we will try two of them:

- **Feedforward neural network**: fixed-window training and prediction
- **Recurrent neural network**: fixed-window training and continuous prediction

The task is to classify (detect) a jump performed by a trampoline jumper during her training. We want to distinguish one specific jump (a twist: a backflip with one spin) from all other jumps.

First, we need some imports. We won't use `pandas` for this example, just `numpy`:

In [None]:
import numpy as np
import os
import matplotlib.pyplot as plt

%matplotlib inline

## Dataset preparation
The dataset consists of several hundred time series. It is split into two folders: the first contains *positive samples* (twists), and the other contains *negative samples* (other jumps).

Each jump is captured in a CSV file that contains 13 values per time step.

In [None]:
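def read_csv_files(path):
    # Only the top-level directory is read; subdirectories are ignored.
    dirpath, dirnames, filenames = next(os.walk(path))
    # Each CSV file becomes one array of shape (timesteps, 13).
    return [
        np.genfromtxt(os.path.join(dirpath, file), delimiter=',') for file in filenames
        if os.path.splitext(file)[1] == '.csv'
    ]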

In [None]:
DATA_FOLDER = '../data/trampoliny/'

positive_samples = read_csv_files(DATA_FOLDER + '42')
negative_samples = read_csv_files(DATA_FOLDER + 'ostatni')

There are several hundred samples in the dataset. Their lengths vary from ~20 to ~120 time steps.

In [None]:
print("Positive samples: %d" % len(positive_samples))
print("Negative samples: %d" % len(negative_samples))

#TODO: add correct collections of lengths of samples to see histograms of lengths

plt.hist(...)
plt.hist(...)
plt.show()
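
One possible way to collect the lengths is sketched below. It is only an illustration: `demo_positive` and `demo_negative` are made-up stand-ins for the loaded samples, and `np.histogram` is used so the bin counts can be inspected without drawing anything. Passing the same length lists to `plt.hist` should give the corresponding plots.

```python
import numpy as np

# Hypothetical stand-ins for the loaded jumps: each sample has shape (timesteps, 13).
rng = np.random.default_rng(0)
demo_positive = [rng.normal(size=(n, 13)) for n in (20, 45, 45, 120)]
demo_negative = [rng.normal(size=(n, 13)) for n in (30, 60, 90)]

# One length per sample: the number of rows (time steps) in each array.
positive_lengths = [len(sample) for sample in demo_positive]
negative_lengths = [len(sample) for sample in demo_negative]

# Bin the lengths into 5 equal-width bins between 20 and 120 time steps.
counts, bin_edges = np.histogram(positive_lengths, bins=5, range=(20, 120))
print(positive_lengths, negative_lengths)
print(counts)
```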

### Columns selection
Data were captured using a motion sensor attached to the jumper's leg during her training. The motion sensor can give us several quantities:

- linear acceleration (3D vector)
- angular acceleration (3D vector)
- direction of gravity (3D vector)
- orientation (4D quaternion)

The quantities are captured at different frequencies, and the whole dataset was resampled to the same frequency using nearest-neighbour resampling (see the "steps" in the plotted curves).

In [None]:
plt.figure(figsize=(15,6))
plt.plot(positive_samples[0])
plt.show()

The orientation values seem to vary a lot, so let's ignore them for training. (We could keep them, but the model would probably ignore them anyway.)

In [None]:
plt.figure(figsize=(15,6))
plt.plot(positive_samples[0][:,9:13])
plt.show()

plt.figure(figsize=(15,6))
plt.plot(positive_samples[0][:,0:9])
plt.show()

### Normalization & padding
The range of the values returned by the sensor is given by its digital nature (usually between -32k and 32k), so the values need to be normalized (column-wise).

In [None]:
def normalize(*datasets):            
    all_samples = np.vstack([np.vstack(samples) for samples in datasets])    
    #TODO: add `min_vals` and `max_vals` arrays here, both should have shape (13,)
    
    return [
        #TODO construct a collection where XXX is a normalized sample:  [XXX for sample in samples], normalize to <-1, 1>
        for samples in datasets
    ], min_vals, max_vals

In [None]:
# `normalize` returns (datasets, min_vals, max_vals), so unpack in that order.
(norm_positive_samples, norm_negative_samples), min_vals, max_vals = normalize(positive_samples, negative_samples)
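
A minimal sketch of column-wise min-max scaling to the range <-1, 1> follows. It is one possible way to fill in the function above, not necessarily the intended solution. The name `normalize_sketch`, the guard for constant columns, and the two-column demo data are assumptions made for illustration; the real data has 13 columns.

```python
import numpy as np

def normalize_sketch(*datasets):
    # Stack every time step of every sample so min/max are computed per column.
    all_samples = np.vstack([np.vstack(samples) for samples in datasets])
    min_vals = all_samples.min(axis=0)
    max_vals = all_samples.max(axis=0)
    # Assumption: avoid division by zero for columns that never change.
    span = np.where(max_vals > min_vals, max_vals - min_vals, 1.0)
    normalized = [
        [2.0 * (sample - min_vals) / span - 1.0 for sample in samples]
        for samples in datasets
    ]
    return normalized, min_vals, max_vals

# Tiny made-up datasets with 2 columns instead of 13.
demo_a = [np.array([[0.0, 10.0], [5.0, 20.0]])]
demo_b = [np.array([[10.0, 30.0]])]
(norm_a, norm_b), demo_min, demo_max = normalize_sketch(demo_a, demo_b)
print(norm_a[0], norm_b[0])
```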

Also, for tensor-based training, the data need to be padded to a fixed length. Let's take the maximum length and pad all shorter sequences with leading zeros.

In [None]:
from keras.preprocessing.sequence import pad_sequences

def pad(*datasets):
    max_length = #TODO: get a maximal length of a sample here
    return [pad_sequences(samples, maxlen=max_length, dtype=datasets[0][0].dtype) for samples in datasets]
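
To make the effect of leading padding concrete, here is a NumPy-only sketch of what we expect `pad_sequences` to do with its default `padding='pre'`. The helper `pad_leading` and the demo arrays are hypothetical, and the sketch assumes no sample is longer than `max_length`.

```python
import numpy as np

def pad_leading(samples, max_length):
    # Result has shape (n_samples, max_length, n_features), filled with zeros.
    n_features = samples[0].shape[1]
    padded = np.zeros((len(samples), max_length, n_features), dtype=samples[0].dtype)
    for i, sample in enumerate(samples):
        # Right-align each sample so the zeros end up at the beginning.
        padded[i, max_length - len(sample):] = sample
    return padded

demo_samples = [np.ones((2, 3)), 2 * np.ones((4, 3))]
# The common length is the length of the longest sample.
demo_max_length = max(len(sample) for sample in demo_samples)
demo_padded = pad_leading(demo_samples, demo_max_length)
print(demo_padded.shape)
```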

Now, let's see what our samples look like after normalization and padding:

In [None]:
norm_positive_samples, norm_negative_samples = pad(norm_positive_samples, norm_negative_samples)

In [None]:
plt.figure(figsize=(15,6))
plt.plot(norm_positive_samples[0][:,0:9])
plt.show()

## Model training
Now we are ready to train the model. Let's start by constructing the training set.

Notice how we create the target variable by filling the correct number of *zeroes* and *ones* into the `training_Y` array.

In [None]:
training_X = np.vstack((norm_positive_samples[:,:,0:9], norm_negative_samples[:,:,0:9]))
training_Y = #TODO: generate correct labels for our input series
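
Because the positive samples are stacked first and the negative samples second, the labels have to follow the same order. A small sketch with made-up sizes (`n_pos`, `n_neg`, `demo_X`, and `demo_Y` are illustrative names only):

```python
import numpy as np

n_pos, n_neg = 3, 5
# Stand-ins for the padded positive and negative samples: (samples, timesteps, 9).
demo_X = np.vstack((np.zeros((n_pos, 4, 9)), np.zeros((n_neg, 4, 9))))
# Ones for the positive block, zeros for the negative block, in the same order.
demo_Y = np.concatenate((np.ones(n_pos), np.zeros(n_neg)))
print(demo_X.shape, demo_Y)
```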

We want to use `validation_split` and we need to shuffle the training set randomly. It needs to be performed "pair-wise".

In [None]:
import random

#TODO: shuffle the training set !element wise!

training_X =
training_Y = 
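
Shuffling matters here because Keras takes the `validation_split` fraction from the end of the arrays before any shuffling, and our arrays are ordered positives first. One way to shuffle "pair-wise" is to draw a single random permutation of indices and apply it to both arrays. The sketch below uses NumPy rather than the `random` module imported above, and all names in it are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Six fake samples; sample i starts with the value i * 6, and the first three are positive.
demo_X = np.arange(6 * 2 * 3, dtype=float).reshape(6, 2, 3)
demo_Y = np.array([1, 1, 1, 0, 0, 0])

# One permutation, applied to both arrays, keeps every (series, label) pair together.
order = rng.permutation(len(demo_X))
shuffled_X = demo_X[order]
shuffled_Y = demo_Y[order]
print(order, shuffled_Y)
```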

### Feed-forward model
Let's start with a simple **feed-forward network** with no recurrent connections.

In this case, we need to reshape our input data, because we feed the whole window to the network in one step (127x9 values).

In [None]:
from keras import Model
from keras.layers import LSTM, Input, Dense

inputs = Input(shape=(training_X.shape[1] * training_X.shape[2],))
x = 
#TODO add three dense layers with tanh activation (256,128,64)

outputs = Dense(1, activation='sigmoid')(x)

ffn_model = Model(inputs, outputs)
ffn_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
ffn_model.summary()

In [None]:
#TODO: why do we need this? replace ... with the correct window size
ffn_training_X = training_X.reshape(training_X.shape[0], ... )
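
As a hint for the reshape: a dense layer expects one flat feature vector per sample, so each (timesteps, features) window is unrolled into a single row. A sketch on a small made-up array (the 5x9 window below is an assumption, not the real window size):

```python
import numpy as np

# Four fake samples, 5 time steps, 9 features each.
demo_X = np.arange(4 * 5 * 9).reshape(4, 5, 9)

# Flatten each window: time step 0's features first, then time step 1's, and so on.
demo_flat = demo_X.reshape(demo_X.shape[0], demo_X.shape[1] * demo_X.shape[2])
print(demo_flat.shape)
```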

In [None]:
#TODO run model.fit on training data with validation split 0.1 and 40 epochs

We finished with ~90% accuracy on the validation set.

We can do better with recurrent nets: both better accuracy and a much smaller model.

### LSTM recurrent model

Let's take the well-known LSTM units and build a smaller two-layer network out of them.

Notice the shape of the input. LSTMs are recurrent networks, and they expect sequences of inputs:

In [None]:
inputs = Input(shape=training_X.shape[1:])

#TODO: create and compile model of two layers of LSTMs with one Dense output at the end of the sequence. Fit function must work with it.

In [None]:
model.fit(training_X, training_Y, epochs=30, validation_split=0.1)
model.save_weights('model_trampoline_9i.hdf')

We finished with >90% accuracy on the validation set.

### Prediction phase

Now, let's build the prediction model. We will use the same architecture (LSTMs), but this time we are aiming for continuous prediction: the model returns an output value for each time step.

In [None]:
from keras import Model
from keras.layers import LSTM, Input, Dense, Dropout

inputs = Input(shape=(None, training_X.shape[2]))
#TODO: repeat the model layers but return an output at every time step
outputs = Dense(1, activation='sigmoid')(x)

cont_model = Model(inputs, outputs)
cont_model.summary()
cont_model.load_weights("model_trampoline_9i.hdf")

Let's see how the network works throughout the whole sequence:

In [None]:
def show_prediction(test_case):
    print(test_case[1])

    c_prediction = #TODO: call model predict on test_case
    plt.figure(figsize=(15,6))
    plt.plot(test_case[0], 'silver')
    plt.plot(c_prediction[0], 'red' if test_case[1] == 0 else 'green')
    plt.show()

In [None]:
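# Pair each series with its label; harmless if `training_set` already exists from the shuffling step.
training_set = list(zip(training_X, training_Y))

positive_test_sample = next(sample for sample in reversed(training_set) if sample[1] == 1)
negative_test_sample = next(sample for sample in reversed(training_set) if sample[1] == 0)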

show_prediction(positive_test_sample)
show_prediction(negative_test_sample)

The LSTM prediction model is not restricted to a fixed sequence length and can predict for sequences of arbitrary length:

In [None]:
#TODO: make LSTM model predict from truncated test sample
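
Why does a recurrent model accept any sequence length? Its weights are applied step by step, so their shapes depend only on the number of features, never on the number of time steps. The sketch below shows this with a plain tanh recurrent cell in NumPy. This is a deliberate simplification standing in for the LSTM, and the function name, weights, and sizes are all made up for illustration.

```python
import numpy as np

def simple_rnn_outputs(sequence, w_in, w_rec, w_out):
    """Return one sigmoid output per time step for a (timesteps, features) sequence."""
    state = np.zeros(w_rec.shape[0])
    outputs = []
    for x_t in sequence:
        # The same weights are reused at every step, whatever the sequence length.
        state = np.tanh(x_t @ w_in + state @ w_rec)
        outputs.append(1.0 / (1.0 + np.exp(-(state @ w_out))))
    return np.array(outputs)

rng = np.random.default_rng(0)
w_in = rng.normal(size=(9, 4))   # 9 input features, 4 hidden units
w_rec = rng.normal(size=(4, 4))
w_out = rng.normal(size=4)

full_sequence = rng.normal(size=(30, 9))
full_outputs = simple_rnn_outputs(full_sequence, w_in, w_rec, w_out)
# A truncated sequence works with the very same weights.
short_outputs = simple_rnn_outputs(full_sequence[:12], w_in, w_rec, w_out)
print(full_outputs.shape, short_outputs.shape)
```

The outputs for the truncated sequence match the first 12 outputs of the full one, because each output depends only on the time steps seen so far.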

## Analysis

Let's analyze the whole dataset and finish our experiment with standard performance measures.

First, let's see how our model performs on various sequence lengths:

In [None]:
plt.figure(figsize=(25,6))
c_prediction = cont_model.predict(norm_positive_samples[:100,:,0:9])
plt.plot(np.squeeze(c_prediction).T, '#00800020')
c_prediction = cont_model.predict(norm_negative_samples[:100,:,0:9])
plt.plot(np.squeeze(c_prediction).T, '#80000020')
plt.show()

plt.figure(figsize=(25,6))
c_prediction = cont_model.predict(norm_positive_samples[:100,60:,0:9])
plt.plot(np.squeeze(c_prediction).T, '#00800020')
c_prediction = cont_model.predict(norm_negative_samples[:100,60:,0:9])
plt.plot(np.squeeze(c_prediction).T, '#80000020')
plt.show()

plt.figure(figsize=(25,6))
c_prediction = cont_model.predict(norm_positive_samples[:100,100:,0:9])
plt.plot(np.squeeze(c_prediction).T, '#00800020')
c_prediction = cont_model.predict(norm_negative_samples[:100,100:,0:9])
plt.plot(np.squeeze(c_prediction).T, '#80000020')
plt.show()

plt.figure(figsize=(25,6))
c_prediction = cont_model.predict(norm_positive_samples[:100,117:,0:9])
plt.plot(np.squeeze(c_prediction).T, '#00800020')
c_prediction = cont_model.predict(norm_negative_samples[:100,117:,0:9])
plt.plot(np.squeeze(c_prediction).T, '#80000020')
plt.show()

Finally, let's see what a good decision threshold for applying our model would be:

In [None]:
from sklearn.metrics import roc_curve, precision_recall_curve

threshold = 0.2

fpr, tpr, thresholds = roc_curve(...) #TODO: Search the documentation and add correct params here

fig = plt.figure(figsize=(10, 8))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr)

t_index = min(enumerate(thresholds), key=lambda x: abs(x[1] - threshold))[0]
s = plt.scatter(fpr[t_index], tpr[t_index])
s.axes.annotate(thresholds[t_index], (fpr[t_index] + 0.01, tpr[t_index] - 0.02))

    
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.show()

precision, recall, thresholds = precision_recall_curve(...) #TODO: Search the documentation and add correct params here

fig = plt.figure(figsize=(10, 8))
plt.plot(recall, precision)

t_index = min(enumerate(thresholds), key=lambda x: abs(x[1] - threshold))[0]
s = plt.scatter(recall[t_index], precision[t_index])
s.axes.annotate(thresholds[t_index], (recall[t_index] + 0.01, precision[t_index] + 0.002))

plt.xlabel('Recall')    
plt.ylabel('Precision')
plt.show()

In [None]:
from sklearn.metrics import confusion_matrix

confusion_matrix(true_Y, prediction)
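
To see what the threshold does to these measures, here is a NumPy-only sketch that turns scores into hard predictions and counts the four outcomes by hand. The labels and scores are invented for illustration, and the matrix is laid out like the one `confusion_matrix` returns for labels 0 and 1 (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Invented labels and model scores, for illustration only.
demo_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
demo_scores = np.array([0.9, 0.6, 0.1, 0.3, 0.05, 0.7, 0.15, 0.25])
demo_threshold = 0.2

# A score at or above the threshold counts as a predicted twist.
demo_pred = (demo_scores >= demo_threshold).astype(int)

tp = int(np.sum((demo_pred == 1) & (demo_true == 1)))
fp = int(np.sum((demo_pred == 1) & (demo_true == 0)))
fn = int(np.sum((demo_pred == 0) & (demo_true == 1)))
tn = int(np.sum((demo_pred == 0) & (demo_true == 0)))

demo_matrix = np.array([[tn, fp], [fn, tp]])
precision = tp / (tp + fp)
recall = tp / (tp + fn)               # also the true positive rate
false_positive_rate = fp / (fp + tn)
print(demo_matrix, precision, recall, false_positive_rate)
```

Lowering the threshold can only move samples from the "predicted negative" column to the "predicted positive" one, which is why recall and the false positive rate rise together along the ROC curve.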