<a href="https://colab.research.google.com/github/DanielhCarranza/Curso-Deep-Learning/blob/master/Time_Series_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#  Time Series Classification

Classifying time series data? Is that really possible? What could potentially be the use of doing that? These are just some of the questions you must have had when you read the title of this article. And it’s only fair – I had the exact same thoughts when I first came across this concept!

The time series data most of us are exposed to deals primarily with generating forecasts. Whether that’s predicting the demand or sales of a product, the count of passengers in an airline or the closing price of a particular stock, we are used to leveraging tried and tested time series techniques for forecasting requirements.
![alt text](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2019/01/time-series-.jpg)


But as the amount of data being generated increases exponentially, so does the opportunity to experiment with new ideas and algorithms. Working with complex time series datasets is still a niche field, and it’s always helpful to expand your repertoire to include new ideas.

And that is what I aim to do in this article by introducing you to the novel concept of time series classification. We will first understand what this topic means and its applications in the industry. But we won’t stop at the theory: we’ll get our hands dirty by working on a time series dataset and performing binary time series classification. Learning by doing will help you understand the concept in a practical manner as well.

If you have not worked on a time series problem before, I highly recommend first starting with some basic forecasting.


References:
 * [Time Series Classification](https://www.analyticsvidhya.com/blog/2019/01/introduction-time-series-classification/)
 * [Time Series Datasets](http://www.timeseriesclassification.com/dataset.php)

In [None]:
! wget https://archive.ics.uci.edu/ml/machine-learning-databases/00348/MovementAAL.zip
! unzip MovementAAL.zip

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

from keras.preprocessing import sequence
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

from keras.optimizers import Adam
from keras.models import load_model
from keras.callbacks import ModelCheckpoint

## Setting up the Problem Statement
We will be working on the ‘Indoor User Movement Prediction’ problem. In this challenge, multiple motion sensors are placed in different rooms and the goal is to identify whether an individual has moved across rooms, based on the received signal strength (RSS) data captured from these sensors.

There are four motion sensors (A1, A2, A3, A4) placed across two rooms. Have a look at the image below, which illustrates where the sensors are positioned in each room. This two-room setup was replicated in three different pairs of rooms (group1, group2, group3).

![alt text](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/12/MovementAAL.jpg)

A person can move along any of the six pre-defined paths shown in the above image. A person who walks on path 2, 3, 4 or 6 stays within the same room. A person who follows path 1 or path 5, on the other hand, has moved between the rooms.

The sensor reading can be used to identify the position of a person at a given point in time. As the person moves in the room or across rooms, the reading in the sensor changes. This change can be used to identify the path of the person.

[Dataset](https://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data)
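The path description above boils down to a binary label per path. Here is a minimal sketch of that mapping. The helper name `path_to_label` and the +1/-1 coding are assumptions made for illustration, chosen so that they line up with the `(target+1)/2` rescaling used later in this notebook.

```python
# Paths that cross between the two rooms, as described in the text above.
ROOM_CHANGE_PATHS = {1, 5}

def path_to_label(path_id):
    """Return +1 if the path crosses rooms, -1 if it stays within a room.

    The +1/-1 coding is an assumption made for this illustration.
    """
    if path_id not in range(1, 7):
        raise ValueError(f"unknown path id: {path_id}")
    return 1 if path_id in ROOM_CHANGE_PATHS else -1

# Label for each of the six pre-defined paths
labels = {p: path_to_label(p) for p in range(1, 7)}
print(labels)
```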

In [None]:
dataset_group=pd.read_csv('groups/MovementAAL_DatasetGroup.csv', header=0)
dataset_group.head()
groups=dataset_group.values[:,1]

In [None]:
paths= pd.read_csv('groups/MovementAAL_Paths.csv')
paths.head()

|   | #sequence_ID | path_ID |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 1 |
| 2 | 3 | 1 |
| 3 | 4 | 1 |
| 4 | 5 | 1 |


In [None]:
df1=pd.read_csv('dataset/MovementAAL_RSS_1.csv')
print(df1.shape)
df1.head()

(27, 4)


|   | #RSS_anchor1 | RSS_anchor2 | RSS_anchor3 | RSS_anchor4 |
|---|---|---|---|---|
| 0 | -0.90476 | -0.48 | 0.28571 | 0.3 |
| 1 | -0.57143 | -0.32 | 0.14286 | 0.3 |
| 2 | -0.38095 | -0.28 | -0.14286 | 0.35 |
| 3 | -0.28571 | -0.2 | -0.47619 | 0.35 |
| 4 | -0.14286 | -0.2 | 0.14286 | -0.2 |


In [None]:
files=os.listdir('dataset')
files[:5]

['MovementAAL_RSS_141.csv',
 'MovementAAL_RSS_139.csv',
 'MovementAAL_RSS_102.csv',
 'MovementAAL_RSS_66.csv',
 'MovementAAL_RSS_252.csv']
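Notice that `os.listdir` returns the files in arbitrary order, so the listing cannot be matched row by row against the targets. One way around this, sketched below, is to sort the names by the number embedded in them. The function name `sorted_sequence_files` is hypothetical, and the file name pattern is inferred from the listing above.

```python
import re

# File name pattern inferred from the directory listing above
PATTERN = re.compile(r"MovementAAL_RSS_(\d+)\.csv")

def sorted_sequence_files(names):
    """Keep only the RSS sequence files and order them by sequence number."""
    indexed = []
    for name in names:
        match = PATTERN.fullmatch(name)
        if match:  # skips other files such as the target file
            indexed.append((int(match.group(1)), name))
    return [name for _, name in sorted(indexed)]

listed = ['MovementAAL_RSS_141.csv', 'MovementAAL_RSS_139.csv',
          'MovementAAL_RSS_102.csv', 'MovementAAL_RSS_66.csv',
          'MovementAAL_RSS_252.csv', 'MovementAAL_target.csv']
ordered = sorted_sequence_files(listed)
print(ordered)
```

Building each path from its index, as the loading loop in this notebook does, achieves the same ordering without parsing names.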

In [None]:
targets=pd.read_csv('dataset/MovementAAL_target.csv')
targets=targets.values[:,1]


In [None]:
path = 'dataset/MovementAAL_RSS_'
sequences = list()
for i in range(1,315):
    file_path = path + str(i) + '.csv'
    print(file_path)
    df = pd.read_csv(file_path, header=0)
    values = df.values
    sequences.append(values)

In [None]:
len(sequences)

314

In [None]:
len_sequences = []
for one_seq in sequences:
    len_sequences.append(len(one_seq))
pd.Series(len_sequences).describe()

count    314.000000
mean      42.028662
std       16.185303
min       19.000000
25%       26.000000
50%       41.000000
75%       56.000000
max      129.000000
dtype: float64

In [None]:
# Pad every sequence to the maximum length (129) by repeating its last row
to_pad = 129
new_seq = []
for one_seq in sequences:
    n = to_pad - len(one_seq)
    # Repeat the final reading n times -> shape (n, 4)
    to_concat = np.tile(one_seq[-1], (n, 1))
    new_seq.append(np.concatenate([one_seq, to_concat]))
final_seq = np.stack(new_seq)

# Truncate every sequence to its first 60 time steps
seq_len = 60
final_seq = sequence.pad_sequences(final_seq, maxlen=seq_len, padding='post', dtype='float', truncating='post')

Using TensorFlow backend.
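To see what the padding and truncation above do to a single sequence, here is a small NumPy-only sketch. The helper `pad_or_truncate` is hypothetical. It combines both steps into one, which I believe gives the same result as padding to 129 and then cutting to 60, since only the first 60 rows survive either way.

```python
import numpy as np

def pad_or_truncate(seq, target_len):
    """Return seq with exactly target_len rows.

    Longer sequences are cut at the end; shorter ones are extended by
    repeating their last row.
    """
    seq = np.asarray(seq, dtype=float)
    if len(seq) >= target_len:
        return seq[:target_len]
    n_missing = target_len - len(seq)
    filler = np.tile(seq[-1], (n_missing, 1))  # shape (n_missing, n_sensors)
    return np.concatenate([seq, filler])

# Toy sequences with 4 sensor columns (values are made up)
short_seq = np.array([[0.1, 0.2, 0.3, 0.4],
                      [0.5, 0.6, 0.7, 0.8]])
long_seq = np.arange(40, dtype=float).reshape(10, 4)

padded = pad_or_truncate(short_seq, 5)
cut = pad_or_truncate(long_seq, 5)
print(padded)
print(cut.shape)
```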


In [None]:
train = [final_seq[i] for i in range(len(groups)) if (groups[i]==2)]
validation = [final_seq[i] for i in range(len(groups)) if groups[i]==1]
test = [final_seq[i] for i in range(len(groups)) if groups[i]==3]


In [None]:

train_target = [targets[i] for i in range(len(groups)) if (groups[i]==2)]
validation_target = [targets[i] for i in range(len(groups)) if groups[i]==1]
test_target = [targets[i] for i in range(len(groups)) if groups[i]==3]

In [None]:
train = np.array(train)
validation = np.array(validation)
test = np.array(test)

train_target = np.array(train_target)
# Rescale the labels from {-1, +1} to {0, 1} to match the sigmoid output
train_target = (train_target+1)/2

validation_target = np.array(validation_target)
validation_target = (validation_target+1)/2

test_target = np.array(test_target)
test_target = (test_target+1)/2
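The group-based split and the label rescaling above can also be written with boolean masks. This is only a sketch on toy arrays: `split_by_group` is a hypothetical helper and all values are made up to keep the example self-contained.

```python
import numpy as np

# Toy stand-ins for final_seq, groups and targets (values are made up)
toy_seq = np.arange(6 * 3 * 4, dtype=float).reshape(6, 3, 4)  # (samples, time steps, sensors)
toy_groups = np.array([1, 2, 3, 2, 1, 3])
toy_targets = np.array([1, -1, -1, 1, 1, -1])

def split_by_group(x, y, group_ids, group):
    """Select the samples of one group and map labels from {-1, +1} to {0, 1}."""
    mask = group_ids == group
    return x[mask], (y[mask] + 1) / 2

train_x, train_y = split_by_group(toy_seq, toy_targets, toy_groups, 2)
print(train_x.shape, train_y)
```

Splitting by room pair rather than at random means the model is evaluated on rooms it has never seen during training.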

In [None]:
train.shape

(106, 60, 4)

In [None]:
model = Sequential()
model.add(LSTM(256, input_shape=(seq_len, 4)))
model.add(Dense(1, activation='sigmoid'))



In [None]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 256)               267264    
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 257       
Total params: 267,521
Trainable params: 267,521
Non-trainable params: 0
_________________________________________________________________
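The parameter counts in the summary can be checked by hand. The sketch below uses the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel and a bias. The function names are hypothetical.

```python
def lstm_param_count(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (input_dim * units + units * units + units)

def dense_param_count(input_dim, units):
    """One weight per input-output pair plus one bias per output unit."""
    return input_dim * units + units

lstm_params = lstm_param_count(input_dim=4, units=256)
dense_params = dense_param_count(input_dim=256, units=1)
print(lstm_params, dense_params, lstm_params + dense_params)
# 267264 257 267521
```

Almost all of the weights sit in the recurrent kernel (256 x 256 per gate), which is a lot of capacity for a training set of about a hundred sequences.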


In [None]:
adam = Adam(lr=0.001)
chk = ModelCheckpoint('best_model.pkl', monitor='val_acc', save_best_only=True, mode='max', verbose=1)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
model.fit(train, train_target, epochs=200, batch_size=128, callbacks=[chk], validation_data=(validation,validation_target))

Train on 106 samples, validate on 104 samples
Epoch 1/200

Epoch 00001: val_acc improved from -inf to 0.61538, saving model to best_model.pkl
Epoch 2/200

Epoch 00002: val_acc improved from 0.61538 to 0.62500, saving model to best_model.pkl
Epoch 3/200

Epoch 00003: val_acc did not improve from 0.62500
Epoch 4/200

Epoch 00004: val_acc did not improve from 0.62500
Epoch 5/200

Epoch 00005: val_acc did not improve from 0.62500
Epoch 6/200

Epoch 00006: val_acc did not improve from 0.62500
Epoch 7/200

Epoch 00007: val_acc did not improve from 0.62500
Epoch 8/200

Epoch 00008: val_acc did not improve from 0.62500
Epoch 9/200

Epoch 00009: val_acc did not improve from 0.62500
Epoch 10/200

Epoch 00010: val_acc did not improve from 0.62500
Epoch 11/200

Epoch 00011: val_acc did not improve from 0.62500
Epoch 12/200

Epoch 00012: val_acc did not improve from 0.62500
Epoch 13/200

Epoch 00013: val_acc did not improve from 0.62500
Epoch 14/200



<keras.callbacks.History at 0x7f317df70f98>
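One detail worth noticing in the training call above: the batch size (128) is larger than the training set (106 samples), so every epoch is a single gradient update. A quick sketch of the arithmetic, with a hypothetical helper name:

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over the data."""
    return math.ceil(n_samples / batch_size)

steps = updates_per_epoch(n_samples=106, batch_size=128)
total_updates = steps * 200  # 200 epochs, as in the fit call above
print(steps, total_updates)
# 1 200

# For comparison, a smaller batch size (an arbitrary choice here)
print(updates_per_epoch(n_samples=106, batch_size=16))
# 7
```

A smaller batch size would give the optimizer more updates per epoch. Whether that improves the validation accuracy here is something to try out, not something this sketch shows.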

In [None]:
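# Load the best checkpoint and check accuracy on the test data
model = load_model('best_model.pkl')

from sklearn.metrics import accuracy_score
# Threshold the sigmoid outputs at 0.5 to get class labels
# (predict_classes is not available in recent Keras releases)
test_preds = (model.predict(test) > 0.5).astype('int32').ravel()
accuracy_score(test_target, test_preds)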

0.6730769230769231
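Turning the sigmoid output into a class label amounts to thresholding at 0.5, and accuracy is the fraction of labels that match. Here is a minimal NumPy sketch of both steps. The helper name and the toy probabilities are made up for illustration.

```python
import numpy as np

def accuracy_from_probs(y_true, probs, threshold=0.5):
    """Threshold predicted probabilities and compare them with the labels."""
    preds = (np.asarray(probs).ravel() > threshold).astype(int)
    return float(np.mean(preds == np.asarray(y_true).ravel()))

toy_true = np.array([0.0, 1.0, 1.0, 0.0])
toy_probs = np.array([[0.2], [0.7], [0.4], [0.9]])  # made-up model outputs
toy_acc = accuracy_from_probs(toy_true, toy_probs)
print(toy_acc)
# 0.5
```

If the test group holds the remaining 104 of the 314 sequences, the score above corresponds to 70 correct predictions, since 70 / 104 is approximately 0.673.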

In [None]:
from keras.datasets import imdb
from keras.preprocessing import sequence
max_features = 10000
max_len = 500
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

Using TensorFlow backend.


Loading data...
Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
25000 train sequences
25000 test sequences


In [None]:
import pandas as pd 
import numpy as np
pd.DataFrame(x_train[0]).T

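|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14 | 22 | 16 | 43 | 530 | 973 | 1622 | 1385 | 65 | ... | 4472 | 113 | 103 | 32 | 15 | 16 | 5345 | 19 | 178 | 32 |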


In [None]:
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

Pad sequences (samples x time)
x_train shape: (25000, 500)
x_test shape: (25000, 500)
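Called without extra arguments, `pad_sequences` pads and truncates at the front of each sequence (`padding='pre'`, `truncating='pre'`), unlike the `'post'` settings used for the sensor data earlier. The NumPy sketch below mimics that default behaviour on toy lists. The function `pad_pre` is a hypothetical stand-in, not the Keras implementation.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Left-pad short sequences with `value`; keep the last `maxlen` items of long ones."""
    out = np.full((len(seqs), maxlen), value, dtype=int)
    for row, seq in enumerate(seqs):
        kept = list(seq)[-maxlen:]        # truncate from the front
        if kept:
            out[row, -len(kept):] = kept  # right-align the kept items
    return out

toy_reviews = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]  # made-up word indices
padded_reviews = pad_pre(toy_reviews, maxlen=4)
print(padded_reviews)
# [[0 1 2 3]
#  [6 7 8 9]]
```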


In [None]:
sequence.pad_sequences?

In [None]:
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop



model= Sequential()
model.add(layers.Embedding(max_features, 128, input_length=max_len)) # input [batch, time steps] -> Embedding -> [batch, time steps, features]
model.add(layers.Conv1D(32,7,activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32,7, activation='relu'))
model.add(layers.GlobalAveragePooling1D())
model.add(layers.Dense(1, activation='sigmoid'))  # sigmoid so the output is a probability, as binary_crossentropy expects

In [None]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 500, 128)          1280000   
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 494, 32)           28704     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 98, 32)            0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 92, 32)            7200      
_________________________________________________________________
global_average_pooling1d_1 ( (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
Total params: 1,315,937
Trainable params: 1,315,937
Non-trainable params: 0
_________________________________________________________________
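The shapes and parameter counts in this summary follow from a few simple formulas. The sketch below assumes the Keras defaults of 'valid' padding and stride 1 for `Conv1D`. The helper names are hypothetical.

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return length - kernel_size + 1

def conv1d_params(in_channels, filters, kernel_size):
    """One kernel of size kernel_size x in_channels per filter, plus a bias."""
    return kernel_size * in_channels * filters + filters

len_conv1 = conv1d_out_len(500, 7)        # 494
len_pool = len_conv1 // 5                 # 98 after MaxPooling1D(5)
len_conv2 = conv1d_out_len(len_pool, 7)   # 92

embedding_params = 10000 * 128            # max_features x embedding size
conv1_params = conv1d_params(128, 32, 7)  # 28704
conv2_params = conv1d_params(32, 32, 7)   # 7200
dense_params = 32 * 1 + 1                 # 33
total_params = embedding_params + conv1_params + conv2_params + dense_params
print(len_conv1, len_pool, len_conv2, total_params)
# 494 98 92 1315937
```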


In [None]:
model.compile(optimizer=RMSprop(lr=1e-4), loss='binary_crossentropy', metrics=['acc'])
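By default, the `binary_crossentropy` loss in Keras treats the model output as a probability between 0 and 1, which is what a sigmoid on the last layer provides. Here is a small NumPy sketch of the two pieces. The function names and toy values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(probs) + (1 - y_true) * np.log(1 - probs))))

raw_scores = np.array([2.0, -1.0, 0.0])  # made-up outputs of a final Dense(1) layer
toy_labels = np.array([1.0, 0.0, 1.0])
toy_probs = sigmoid(raw_scores)
toy_loss = binary_crossentropy(toy_labels, toy_probs)
print(toy_probs, toy_loss)
```

Feeding raw, unbounded scores straight into this loss would take logarithms of values outside (0, 1), so the loss would no longer measure what we want.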

In [None]:
model.fit(x_train,y_train, batch_size=128, epochs=10, validation_split=0.2)

Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fcf537e1048>