## Multi-label time series classification with LSTM 

#### Implementation of a model for multi-label time series classification, as discussed in the following paper: <a href="https://arxiv.org/abs/1511.03677">Learning to Diagnose with LSTM Recurrent Neural Networks</a>.

#### Importing required libraries

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf  # written for the TensorFlow 1.x API (tf.contrib, placeholders, sessions)
from scipy import stats
from tensorflow.contrib import rnn
from sklearn.metrics import roc_auc_score

#### Helper methods for reading and creating sequences of data for RNN/LSTM

You may need to modify these methods to fit your data.

In [2]:
filename = "data_rnn_multilabel.csv"
feature_column_name = "Feature"
class_column_name = "Class"
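The file `data_rnn_multilabel.csv` is not included here. Judging only from how the helpers below use it (an assumption, not a documented format), `read_data` expects a CSV with a header row, a numeric `Feature` column and a `Class` column. The sketch below builds a small hypothetical frame in that shape (random values, classes 0 to 2) and reads it back with the same `pd.read_csv` call, which can be handy for exercising the notebook without the original data. `demo_data` and the other names are illustrative only.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data_rnn_multilabel.csv: one Feature value and one
# integer Class label (0, 1 or 2) per time step, with a header row
rng = np.random.default_rng(0)
n_rows = 30
frame = pd.DataFrame({
    "Feature": rng.random(n_rows),
    "Class": rng.integers(0, 3, size=n_rows),
})

# Round-trip through an in-memory CSV instead of writing a file to disk
buf = io.StringIO()
frame.to_csv(buf, index=False)
buf.seek(0)
demo_data = pd.read_csv(buf, header=0)  # same call that read_data() makes
print(demo_data.columns.tolist(), demo_data.shape)  # ['Feature', 'Class'] (30, 2)
```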

In [6]:
def read_data(file_path):
    data = pd.read_csv(file_path, header = 0)
    data = data.iloc[0:10009, :]  # keep only the first 10,009 rows
    return data

def windows(data, window_size):
    start = 0
    while start < len(data):
        yield start, start + window_size
        start += (window_size // 2)
        
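def extract_segments(data, window_size = 30):
    segments = np.empty((0,window_size))
    labels = np.empty((0))
    for (start,end) in windows(data,window_size):
        # Keep only complete windows; a trailing partial window is dropped
        if len(data[start:end]) == window_size:
            signal = data.iloc[start:end][feature_column_name]
            segments = np.vstack([segments, signal])

            # Label is the MODAL (most common) class value in the window.
            # It is a single class value here; one-hot encoding happens later with pd.get_dummies.
            # Series.mode() handles numeric and string labels and, like scipy's stats.mode,
            # resolves ties to the smallest value.
            window_labels = data.iloc[start:end][class_column_name]
            labels = np.append(labels, window_labels.mode().iloc[0])
    return segments, labels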
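`windows` advances by half a window, so consecutive segments overlap by 50%, and `extract_segments` keeps only windows that fit entirely inside the data. For `n` rows, window size `w` and step `w // 2`, that gives `(n - w) // (w // 2) + 1` segments; with 10,009 rows and `win_size = 10` this is 2,000, which matches the `segments shape` printed further down. The sketch below checks that count and shows an equivalent vectorised construction with NumPy's `sliding_window_view` (NumPy 1.20 or later). `demo_windows` and `count_full_windows` are hypothetical helpers that only mirror the notebook's logic.

```python
import numpy as np

def demo_windows(n_rows, window_size):
    # Same stepping rule as windows(): advance by half a window each time
    start = 0
    while start < n_rows:
        yield start, start + window_size
        start += window_size // 2

def count_full_windows(n_rows, window_size):
    # Number of windows that fit entirely inside n_rows (partial ones are dropped)
    if n_rows < window_size:
        return 0
    return (n_rows - window_size) // (window_size // 2) + 1

demo_rows, demo_win = 10009, 10
kept = [(s, e) for s, e in demo_windows(demo_rows, demo_win) if e <= demo_rows]
print(len(kept), count_full_windows(demo_rows, demo_win))  # 2000 2000

# Vectorised equivalent: all length-10 windows, then every (10 // 2)-th one
demo_series = np.arange(demo_rows, dtype=float)
all_windows = np.lib.stride_tricks.sliding_window_view(demo_series, demo_win)
demo_segments = all_windows[:: demo_win // 2]
print(demo_segments.shape)  # (2000, 10)
```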

In [7]:
win_size = 10
'''
The MIMIC-III dataset could be used to train and test the model,
but note that it is not the dataset used by the authors of the paper.
For the dataset description and format, see Section 3: Data Description in the paper.
'''
data = read_data(filename) #Pandas DF
segments,labels = extract_segments(data, win_size)
labels = np.asarray(pd.get_dummies(labels), dtype = np.int8)
reshaped_segments = segments.reshape([len(segments),win_size,1])
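Two details of this labelling step are easy to miss. Each window receives a single label (its modal class, with ties going to the smallest value), so after `pd.get_dummies` every row contains exactly one 1; the sigmoid outputs and binary cross-entropy used later would also accept rows with several 1s, but this preprocessing never produces them. In addition, `pd.get_dummies` only creates columns for classes that actually occur in `labels`, so the number of label columns depends on the data. The toy example below (hypothetical class names `a`, `b`, `c`) shows both: the last window is a 5/5 tie between `b` and `c` that resolves to `b`, and `c` never wins a window, so it gets no column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame using the notebook's column names
toy = pd.DataFrame({
    "Feature": np.linspace(0.0, 1.0, 20),
    "Class": ["a"] * 8 + ["b"] * 7 + ["c"] * 5,
})
toy_win = 10
toy_labels = []
# Complete windows only, stepping by half a window (starts 0, 5, 10)
for start in range(0, len(toy) - toy_win + 1, toy_win // 2):
    window_classes = toy.iloc[start:start + toy_win]["Class"]
    toy_labels.append(window_classes.mode().iloc[0])  # modal class, ties -> smallest

toy_onehot = np.asarray(pd.get_dummies(toy_labels), dtype=np.int8)
print(toy_labels)           # ['a', 'b', 'b']
print(toy_onehot.tolist())  # [[1, 0], [0, 1], [0, 1]]
```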

In [9]:
print('segments shape:',segments.shape,'\nFirst elem as example:\n',segments[0],'\n')
print('reshaped_segments shape:',reshaped_segments.shape,'\nFirst elem as example:\n',reshaped_segments[0],'\n')
print('labels shape:',labels.shape,'\nFirst elem as example:\n',labels[0],'\n') # each label is the modal class of its window


segments shape: (2000, 10) 
First elem as example:
 [ 0.76385228  0.62054265  0.85255228  0.60326969  0.57533214  0.20306302
  0.50424147  0.59451474  0.71279341  0.07645766] 

reshaped_segments shape: (2000, 10, 1) 
First elem as example:
 [[ 0.76385228]
 [ 0.62054265]
 [ 0.85255228]
 [ 0.60326969]
 [ 0.57533214]
 [ 0.20306302]
 [ 0.50424147]
 [ 0.59451474]
 [ 0.71279341]
 [ 0.07645766]] 

labels shape: (2000, 3) 
First elem as example:
 [0 1 0] 



In [10]:
train_test_split = np.random.rand(len(reshaped_segments)) < 0.80
train_x = reshaped_segments[train_test_split]
train_y = labels[train_test_split]
test_x = reshaped_segments[~train_test_split]
test_y = labels[~train_test_split]
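Because consecutive windows share half of their samples, a random split like the one above can put overlapping windows in both the training and the test set, which may make test scores optimistic. One possible alternative, sketched below on the assumption that the segments are in time order (as `extract_segments` produces them), is a chronological split that also drops the single window straddling the boundary. `chronological_split` is a hypothetical helper; the demo arrays store time indices so that any shared samples would show up in `np.intersect1d`.

```python
import numpy as np

def chronological_split(segs, labs, train_fraction=0.8):
    # Windows are in time order and overlap only their immediate neighbours
    # (50% overlap), so skipping one window at the cut removes shared samples
    cut = int(train_fraction * len(segs))
    return (segs[:cut], labs[:cut]), (segs[cut + 1:], labs[cut + 1:])

# Hypothetical arrays shaped like reshaped_segments and labels above;
# each value is the time index of the sample it came from
time_idx = np.lib.stride_tricks.sliding_window_view(np.arange(10009), 10)[::5]
demo_x = time_idx[..., np.newaxis]            # (2000, 10, 1)
demo_y = np.zeros((2000, 3), dtype=np.int8)

(tr_x, tr_y), (te_x, te_y) = chronological_split(demo_x, demo_y)
shared = np.intersect1d(tr_x, te_x)
print(tr_x.shape, te_x.shape, shared.size)  # (1600, 10, 1) (399, 10, 1) 0
```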

In [13]:
idx = np.random.randint(0,len(train_x))
print(train_x[idx])
print(train_y[idx])

[[ 0.60793542]
 [ 0.00931542]
 [ 0.71178549]
 [ 0.99879496]
 [ 0.57842481]
 [ 0.6999442 ]
 [ 0.62030079]
 [ 0.2829745 ]
 [ 0.99434038]
 [ 0.90881374]]
[0 0 1]


#### Hyperparameters Configuration

In [14]:
tf.reset_default_graph()

learning_rate = 0.001
training_epochs = 100
batch_size = 10
total_batches = (train_x.shape[0]//batch_size)

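n_input = 1                  # one feature per time step
n_steps = win_size           # time steps per window (10)
n_hidden = 64                # LSTM units per layer
n_classes = labels.shape[1]  # one-hot label columns (3 for this data)

alpha = 0.5                  # weight of the all-steps loss vs. the last-step loss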

In [15]:
print('Total batches: %d' % total_batches)

Total batches: 162
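`total_batches` is `train_x.shape[0] // batch_size`, so 162 batches means the random 80% split produced between 1,620 and 1,629 training windows. With the offset rule used in the training loop, `(b * batch_size) % (n_train - batch_size)`, the batches visit the same rows in the same order every epoch. The sketch below checks this for a hypothetical `n_train = 1625`: the batches cover rows 0 to 1619, so the last 5 rows are never used, and if `n_train` is an exact multiple of `batch_size` the final offset wraps to 0 and repeats the first batch. Shuffling the row order each epoch (for example with `np.random.permutation`) is a common alternative that the notebook does not use.

```python
def batch_offsets(n_train, batch_size):
    # Same offset rule as the training loop in this notebook
    total_batches = n_train // batch_size
    return [(b * batch_size) % (n_train - batch_size) for b in range(total_batches)]

offs = batch_offsets(1625, 10)
rows_used = sorted({r for o in offs for r in range(o, o + 10)})
print(len(offs), offs[-1], rows_used[-1])  # 162 1610 1619

# Exact multiple of the batch size: the last offset wraps back to 0
print(batch_offsets(1620, 10)[-1])  # 0
```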


#### Input/Output placeholders for Tensorflow graph

In [16]:
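x = tf.placeholder("float", [None, n_steps, n_input])  # [batch, n_steps, n_input]
y = tf.placeholder("float", [None, n_classes])         # one label row per sample
y_steps = tf.placeholder("float", [None, n_classes])   # one label row per sample and time step (batch * n_steps rows)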

#### Helper methods 

Dropout and/or other changes to the model architecture can be added in the `LSTM` function.

In [17]:
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.0, shape = shape)
    return tf.Variable(initial)

def LSTM(x, weight, bias):
    # Two stacked LSTM layers. Each layer needs its own cell object: [cell] * 2 reuses
    # one cell for layers with different input sizes, which later TF 1.x releases reject.
    cells = [tf.contrib.rnn.LSTMCell(n_hidden, state_is_tuple = True) for _ in range(2)]
    multi_layer_cell = tf.contrib.rnn.MultiRNNCell(cells)
    # output: [batch, n_steps, n_hidden] (batch-major, the dynamic_rnn default)
    output, state = tf.nn.dynamic_rnn(multi_layer_cell, x, dtype = tf.float32)
    # Apply the same dense layer at every time step; row b * n_steps + t is sample b at step t
    output_flattened = tf.reshape(output, [-1, n_hidden])
    output_logits = tf.add(tf.matmul(output_flattened, weight), bias)
    output_all = tf.nn.sigmoid(output_logits)  # [batch * n_steps, n_classes]
    output_reshaped = tf.reshape(output_all, [-1, n_steps, n_classes])
    # Prediction at the last time step: [batch, n_classes]
    output_last = output_reshaped[:, -1, :]
    return output_last, output_all
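The reshapes in `LSTM` fix the row order of `y_all`: `tf.nn.dynamic_rnn` returns a batch-major `[batch, n_steps, n_hidden]` tensor by default, so after flattening, row `b * n_steps + t` belongs to sample `b` at time step `t`, and the per-step targets fed to `y_steps` must follow the same order. The NumPy sketch below (small hypothetical sizes, no TensorFlow needed) shows that `np.repeat(batch_y, n_steps, axis=0)` produces that order while `np.tile(batch_y, (n_steps, 1))` cycles through the batch instead, and that selecting the last step by transposing and indexing matches plain slicing with `[:, -1, :]`.

```python
import numpy as np

demo_batch, demo_steps, demo_hidden = 3, 4, 2
# Stand-in for a batch-major RNN output of shape [batch, n_steps, n_hidden]
demo_output = np.arange(demo_batch * demo_steps * demo_hidden).reshape(
    demo_batch, demo_steps, demo_hidden)
flat = demo_output.reshape(-1, demo_hidden)
# Row b * n_steps + t of the flattened array is sample b at time step t
assert np.array_equal(flat[1 * demo_steps + 2], demo_output[1, 2])

# Last time step: transpose + index equals simple slicing
last_a = np.transpose(demo_output, (1, 0, 2))[demo_steps - 1]
last_b = demo_output[:, -1, :]
assert np.array_equal(last_a, last_b)

batch_y_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
repeated = np.repeat(batch_y_demo, demo_steps, axis=0)  # aligned with `flat`
tiled = np.tile(batch_y_demo, (demo_steps, 1))          # cycles through the batch
print(repeated[:demo_steps].tolist())  # [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(tiled[:demo_steps].tolist())     # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```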

#### Loss function: Binary cross entropy and target replication 

The loss function used in the paper combines two terms: (1) the average loss of the predictions made at every time step and (2) the loss of the prediction made at the last time step. The weight alpha in the combined loss is a hyperparameter. See the <a href="https://arxiv.org/abs/1511.03677">paper</a> for more information on target replication and the loss function.
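Written out for a single sequence, the objective implemented in the loss cell below is as follows, with $T$ time steps (`n_steps`), $K$ classes (`n_classes`), per-step predictions $\hat{y}^{(t)}$ (rows of `y_all`), the last-step prediction $\hat{y}^{(T)}$ (`y_last`) and the window's label $y$ replicated at every step (`y_steps`). The `tf.reduce_mean` calls additionally average over the samples in the batch.

```latex
\ell\big(y, \hat{y}\big) = -\frac{1}{K}\sum_{k=1}^{K}\Big[\, y_k \log \hat{y}_k + (1 - y_k)\log\big(1 - \hat{y}_k\big) \Big]

\mathcal{L} = \alpha \cdot \frac{1}{T}\sum_{t=1}^{T} \ell\big(y, \hat{y}^{(t)}\big) + (1 - \alpha)\cdot \ell\big(y, \hat{y}^{(T)}\big)
```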

In [18]:
weight = weight_variable([n_hidden,n_classes])
bias = bias_variable([n_classes])
y_last, y_all = LSTM(x,weight,bias)

In [19]:
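# Clip probabilities away from 0 and 1 so tf.log never receives 0 (sigmoid can saturate)
eps = 1e-7
y_all_clipped = tf.clip_by_value(y_all, eps, 1 - eps)
y_last_clipped = tf.clip_by_value(y_last, eps, 1 - eps)
# Binary cross-entropy averaged over all samples, time steps and classes (target replication)
all_steps_cost = -tf.reduce_mean((y_steps * tf.log(y_all_clipped)) + (1 - y_steps) * tf.log(1 - y_all_clipped))
# Binary cross-entropy of the last-step prediction
last_step_cost = -tf.reduce_mean((y * tf.log(y_last_clipped)) + ((1 - y) * tf.log(1 - y_last_clipped)))
loss_function = (alpha * all_steps_cost) + ((1 - alpha) * last_step_cost)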

optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(loss_function)

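To check the TensorFlow loss outside a session, the sketch below recomputes it in NumPy. It assumes the shapes used here (`y_all` with one row per sample and time step in sample-major order, `y_last` with one row per sample) and clips probabilities with a hypothetical `eps = 1e-7` before taking logs. `bce` and `target_replication_loss` are illustrative helpers, not part of the notebook. As a sanity check, if every predicted probability is 0.5, both terms equal ln 2, so the combined loss is about 0.693 for any `alpha`.

```python
import numpy as np

def bce(y_true, y_prob, eps=1e-7):
    # Binary cross-entropy averaged over every entry, like tf.reduce_mean in the loss cell
    p = np.clip(y_prob, eps, 1 - eps)  # keep log() away from 0
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def target_replication_loss(y, y_all, y_last, n_steps, alpha=0.5):
    # Replicate each label once per time step, in sample-major row order
    y_steps = np.repeat(y, n_steps, axis=0)
    return alpha * bce(y_steps, y_all) + (1 - alpha) * bce(y, y_last)

y_demo = np.array([[0, 1, 0], [0, 0, 1]], dtype=float)
steps = 10
uniform_all = np.full((len(y_demo) * steps, 3), 0.5)
uniform_last = np.full((len(y_demo), 3), 0.5)
uniform_loss = target_replication_loss(y_demo, uniform_all, uniform_last, steps)
print(uniform_loss)  # approximately 0.6931 (ln 2)
```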


#### Training and testing the model
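Each epoch, the loop reports `roc_auc_score(test_y, pred_y)`. With a 2-D 0/1 `test_y`, scikit-learn treats the task as multi-label, computes one ROC AUC per column and, with the default `average="macro"`, returns their unweighted mean; every column of `test_y` must contain both 0s and 1s, otherwise the call raises an error. The small hand-made example below (hypothetical values) checks the macro average against the per-column scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical indicator labels and scores for 4 samples and 2 classes
y_true = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
y_score = np.array([[0.9, 0.2], [0.3, 0.8], [0.6, 0.75], [0.4, 0.7]])

macro = float(roc_auc_score(y_true, y_score))  # default average="macro"
per_class = [float(roc_auc_score(y_true[:, k], y_score[:, k]))
             for k in range(y_true.shape[1])]
print(per_class, macro)  # [1.0, 0.75] 0.875
```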

In [20]:
with tf.Session() as session:
    tf.global_variables_initializer().run()
    for epoch in range(training_epochs):
        for b in range(total_batches):    
            offset = (b * batch_size) % (train_y.shape[0] - batch_size)
            batch_x = train_x[offset:(offset + batch_size), :]
            batch_y = train_y[offset:(offset + batch_size), :]
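            # One copy of each sample's label per time step, in the same sample-major row
            # order as y_all (row b * n_steps + t); np.tile would cycle through the batch instead
            batch_y_steps = np.repeat(batch_y, n_steps, axis = 0)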
            _, c = session.run([optimizer, loss_function],feed_dict={x: batch_x, y : batch_y, y_steps: batch_y_steps})   
        pred_y = session.run(y_last,feed_dict={x:test_x})
        print("ROC AUC Score: ",roc_auc_score(test_y,pred_y))

ROC AUC Score:  0.711319803191
ROC AUC Score:  0.746158461839
ROC AUC Score:  0.756713097237
ROC AUC Score:  0.775873891406
ROC AUC Score:  0.787036263956
ROC AUC Score:  0.791886437325
ROC AUC Score:  0.794910158415
ROC AUC Score:  0.797292557798
ROC AUC Score:  0.79871226322
ROC AUC Score:  0.800318222841
ROC AUC Score:  0.802036358787
ROC AUC Score:  0.804460244981
ROC AUC Score:  0.806972058262
ROC AUC Score:  0.809247132546
ROC AUC Score:  0.811631696184
ROC AUC Score:  0.81357341637
ROC AUC Score:  0.816073977403
ROC AUC Score:  0.818698083555
ROC AUC Score:  0.820917569853
ROC AUC Score:  0.82266976121
ROC AUC Score:  0.82475174408
ROC AUC Score:  0.826626303345
ROC AUC Score:  0.828265715926
ROC AUC Score:  0.831055773995
ROC AUC Score:  0.834382890453
ROC AUC Score:  0.838287789669
ROC AUC Score:  0.842151329332
ROC AUC Score:  0.843671208363
ROC AUC Score:  0.846298930295
ROC AUC Score:  0.849817587297
ROC AUC Score:  0.852941392214
ROC AUC Score:  0.856690395347
ROC AUC Scor