
# **Project 1: Classification of Time Series Data**

Name: Haneen Alsuradi
NetID: hha243

In this project, I use a one-dimensional convolutional network (1D CNN) to classify time series data. The data are taken from the Kaggle competition [Surface Type Classification], available at: https://www.kaggle.com/c/career-con-2019/overview

## **Step 1: Data Preparation**

The first step is to import, view and prepare the data. We import the data saved in X_train.csv and y_train.csv and split them into training and test sets as advised: the last 20 series (20 series of 128 measurements each, plus their 20 labels) are held out for testing. We then view the data using the pandas head() method.

In [2]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import numpy as np
import matplotlib.pyplot as plt

# read Kaggle datasets
X_train = pd.read_csv('C:\\data/X_train.csv')
y_train = pd.read_csv('C:\\data/y_train.csv')
# split X_train
samples = 20
time_series = 128
start_x = X_train.shape[0] - samples*time_series
X_train_new, X_test_new = X_train.iloc[:start_x], X_train.iloc[start_x:]
# split y_train
start_y = y_train.shape[0] - samples
y_train_new, y_test_new = y_train.iloc[:start_y], y_train.iloc[start_y:]
X_train_new.head(5)

| | row_id | series_id | measurement_number | orientation_X | orientation_Y | orientation_Z | orientation_W | angular_velocity_X | angular_velocity_Y | angular_velocity_Z | linear_acceleration_X | linear_acceleration_Y | linear_acceleration_Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0_0 | 0 | 0 | -0.75853 | -0.63435 | -0.10488 | -0.10597 | 0.10765 | 0.017561 | 0.000767 | -0.74857 | 2.103 | -9.7532 |
| 1 | 0_1 | 0 | 1 | -0.75853 | -0.63434 | -0.1049 | -0.106 | 0.067851 | 0.029939 | 0.003385 | 0.33995 | 1.5064 | -9.4128 |
| 2 | 0_2 | 0 | 2 | -0.75853 | -0.63435 | -0.10492 | -0.10597 | 0.007275 | 0.028934 | -0.005978 | -0.26429 | 1.5922 | -8.7267 |
| 3 | 0_3 | 0 | 3 | -0.75852 | -0.63436 | -0.10495 | -0.10597 | -0.013053 | 0.019448 | -0.008974 | 0.42684 | 1.0993 | -10.096 |
| 4 | 0_4 | 0 | 4 | -0.75852 | -0.63435 | -0.10495 | -0.10596 | 0.005135 | 0.007652 | 0.005245 | -0.50969 | 1.4689 | -10.441 |
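The split relies on every series occupying exactly 128 consecutive rows, so cutting `samples*time_series` rows from the end removes whole series. The shapes printed later bear this out: 485,120 + 2,560 = 487,680 rows, i.e. 3,810 series of 128 rows, matching the 3,790 + 20 label rows. A minimal numpy check on toy data (5 hypothetical series instead of 3,810) that no series straddles the boundary:

```python
import numpy as np

time_series = 128  # measurements per series, as in the notebook
samples = 2        # hypothetical: hold out the last 2 of 5 toy series
n_series = 5

# series_id laid out like X_train.csv: 128 consecutive rows per series
series_id = np.repeat(np.arange(n_series), time_series)

start_x = series_id.shape[0] - samples * time_series
train_ids, test_ids = series_id[:start_x], series_id[start_x:]

# no series should appear on both sides of the split
overlap = set(train_ids.tolist()) & set(test_ids.tolist())
print(sorted(set(test_ids.tolist())), overlap)  # [3, 4] set()
```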


The first three columns (row_id, series_id and measurement_number) are identifiers rather than sensor features, so we drop them from both X_train_new and X_test_new.

In [3]:
X_train_new=X_train_new.drop(['row_id', 'series_id','measurement_number'], axis=1)
X_test_new=X_test_new.drop(['row_id', 'series_id','measurement_number'], axis=1)


Next, we look at the labels of the training data stored in y_train_new. Only the surface type, in the last column, is needed.

In [4]:
y_train_new.head(5)


| | series_id | group_id | surface |
|---|---|---|---|
| 0 | 0 | 13 | fine_concrete |
| 1 | 1 | 31 | concrete |
| 2 | 2 | 20 | concrete |
| 3 | 3 | 31 | concrete |
| 4 | 4 | 22 | soft_tiles |


We drop the series_id and group_id columns from y_train_new and y_test_new, as they will not be needed.

In [5]:
y_train_new=y_train_new.drop(['series_id', 'group_id'], axis=1)
y_test_new=y_test_new.drop(['series_id', 'group_id'], axis=1)

Next, we convert the DataFrames to NumPy arrays using the .values attribute and print the shapes of the training and test data.

In [6]:
X_train_new=X_train_new.values
X_test_new=X_test_new.values
y_train_new=y_train_new.values
y_test_new=y_test_new.values
print('The size of X_train_new:', X_train_new.shape )
print('The size of X_test_new:', X_test_new.shape )

print('The size of y_train_new:', y_train_new.shape )
print('The size of y_test_new:', y_test_new.shape )

The size of X_train_new: (485120, 10)
The size of X_test_new: (2560, 10)
The size of y_train_new: (3790, 1)
The size of y_test_new: (20, 1)


Next, we convert the surface names (concrete, tiled, soft_tiles, etc.) in y_train_new and y_test_new to integer labels 0, 1, 2, ... using scikit-learn's LabelEncoder, and then one-hot encode them with Keras's to_categorical function. There are 9 surface types, so y_train_new and y_test_new each end up with 9 columns. One-hot encoding avoids imposing an artificial order on the classes: with a single integer output, the model could predict values halfway between two categories, which have no meaning. It is also the target format expected by the softmax output layer and the categorical_crossentropy loss used in the model.

In [7]:
from keras.utils import to_categorical

labelencoder_y = LabelEncoder()

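# fit the encoder on train and test labels together so both share the same mapping
y=np.concatenate([y_train_new,y_test_new])
# ravel() turns the (n, 1) column into the 1-D array LabelEncoder expects
y=labelencoder_y.fit_transform(y.ravel())
y=to_categorical(y)

# the last `samples` (20) rows belong to the held-out test series
y_train_new = y[:-samples]
y_test_new = y[-samples:]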
print('The shape of y_train_new:', y_train_new.shape)
print('The shape of y_test_new:', y_test_new.shape)


The shape of y_train_new: (3790, 9)
The shape of y_test_new: (20, 9)
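For readers without Keras at hand, the two encoding steps can be sketched in plain numpy. `np.unique` sorts the class names alphabetically, as `LabelEncoder` does, and indexing an identity matrix with the integer codes produces the same one-hot rows that `to_categorical` returns for labels 0..n-1. The label list below is a made-up example using surface names from the data.

```python
import numpy as np

# hypothetical labels drawn from the surface names in y_train.csv
labels = np.array(["concrete", "tiled", "soft_tiles", "concrete", "fine_concrete"])

# integer-encode: classes come back sorted, codes index into them
classes, codes = np.unique(labels, return_inverse=True)

# one-hot encode: row i of the identity matrix marks class i
one_hot = np.eye(len(classes))[codes]

print(classes)        # ['concrete' 'fine_concrete' 'soft_tiles' 'tiled']
print(codes)          # [0 3 2 0 1]
print(one_hot.shape)  # (5, 4)
```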


We expect the absolute orientation of the robot to carry little information about the surface type. The rate of change of roll, pitch and yaw is likely more informative, because the way the robot sways and turns depends on the surface it is moving over. We first compute roll, pitch and yaw from the orientation quaternion (columns X, Y, Z, W) using the standard quaternion-to-Euler formulas, add them to X_train_new and X_test_new (separately), and delete the four orientation columns. The rates of change of roll, pitch and yaw are computed in a later step.
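For reference, the standard conversion from a unit quaternion $q = (w, x, y, z)$ to roll $\phi$, pitch $\theta$ and yaw $\psi$ (Z-Y-X, i.e. yaw-pitch-roll, convention) is given below. Mapping $x, y, z, w$ to the orientation_X, orientation_Y, orientation_Z and orientation_W columns is an assumption based on the column names. Because rounding can push the argument of $\arcsin$ slightly outside $[-1, 1]$, it is usually clipped to that interval in code.

$$
\begin{aligned}
\phi &= \operatorname{atan2}\big(2(wx + yz),\; 1 - 2(x^2 + y^2)\big)\\
\theta &= \arcsin\big(2(wy - zx)\big)\\
\psi &= \operatorname{atan2}\big(2(wz + xy),\; 1 - 2(y^2 + z^2)\big)
\end{aligned}
$$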

In [10]:
# Columns 0-3 are orientation_X, orientation_Y, orientation_Z and orientation_W
def quaternion_to_euler(q):
    x, y, z, w = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    roll = np.arctan2(2*(w*x + y*z), 1 - 2*(x*x + y*y))
    # clip: rounding in the CSV values can push the argument just past +/-1, which gives NaN
    pitch = np.arcsin(np.clip(2*(w*y - z*x), -1, 1))
    yaw = np.arctan2(2*(w*z + x*y), 1 - 2*(y*y + z*z))
    return np.column_stack((roll, pitch, yaw))

#FOR TRAIN DATASET: replace the four orientation columns with roll, pitch and yaw
X_train_new = np.concatenate((quaternion_to_euler(X_train_new[:, :4]), X_train_new[:, 4:]), axis=1)

#FOR TEST DATA SET
X_test_new = np.concatenate((quaternion_to_euler(X_test_new[:, :4]), X_test_new[:, 4:]), axis=1)




## **Step 2: Building the model**

The data must be reshaped into a 3D array to match the input shape of a 1D CNN: the first dimension indexes the samples (series), the second the time steps, and the third the features.

In [12]:
nfeatures=X_train_new.shape[1]
ntimestamp=128
nsamples=3790
# rows are ordered series by series (128 consecutive rows each),
# so a single reshape yields (samples, timestamps, features)
X_3D_train=X_train_new.reshape(nsamples,ntimestamp,nfeatures)
print('The shape of X_train: ', X_3D_train.shape)

nfeatures=X_test_new.shape[1]
ntimestamp=128
nsamples=20
X_3D_test=X_test_new.reshape(nsamples,ntimestamp,nfeatures)
print('The shape of X_test: ', X_3D_test.shape)


The shape of X_train:  (3790, 128, 8)
The shape of X_test:  (20, 128, 8)
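Because the rows of each series are consecutive, stacking the feature columns one by one with `np.dstack` and a single `reshape` produce the same (samples, timestamps, features) array. A toy check with hypothetical sizes:

```python
import numpy as np

# toy sizes (hypothetical): 3 series, 4 time steps, 2 features
n_series, n_steps, n_feat = 3, 4, 2
flat = np.arange(n_series * n_steps * n_feat).reshape(n_series * n_steps, n_feat)

# column-by-column stacking with np.dstack
stacked = flat[:, 0].reshape(n_series, n_steps)
for f in range(1, n_feat):
    stacked = np.dstack((stacked, flat[:, f].reshape(n_series, n_steps)))

# a single reshape into (series, time, feature)
reshaped = flat.reshape(n_series, n_steps, n_feat)
print(reshaped.shape, np.array_equal(stacked, reshaped))  # (3, 4, 2) True
```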


As mentioned earlier, the rates of change of roll, pitch and yaw should capture how the surface affects the robot's motion. Roll, pitch and yaw are stored as the first three features; we replace each with its first difference along the time axis (the change from one time step to the next, with the first step set to 0).

In [9]:
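# replace features 0, 1, 2 (roll, pitch, yaw) with their step-to-step change
for i in range(3):
    rate = X_3D_train[:,:,i]
    rate_c = np.copy(rate)
    rate_c[:,1:] = rate_c[:,:-1]   # series shifted right by one time step
    rate = rate - rate_c           # x[t] - x[t-1], and 0 at t = 0
    X_3D_train[:,:,i] = rate
    
for i in range(3):
    rate = X_3D_test[:,:,i]
    rate_c = np.copy(rate)
    rate_c[:,1:] = rate_c[:,:-1]
    rate = rate - rate_c
    X_3D_test[:,:,i] = rate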
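The shift-and-subtract above is a first difference along the time axis with the first step set to zero. The same result can be written with `np.diff` and its `prepend` argument; a toy check on two made-up angle series:

```python
import numpy as np

# two hypothetical angle series, shaped (series, time)
angle = np.array([[0.10, 0.15, 0.12, 0.20],
                  [1.00, 0.90, 0.95, 0.95]])

# shift-and-subtract, as in the notebook
shifted = np.copy(angle)
shifted[:, 1:] = shifted[:, :-1]
rate_shift = angle - shifted

# np.diff with the first column prepended gives 0 at t = 0
rate_diff = np.diff(angle, axis=1, prepend=angle[:, :1])
print(np.allclose(rate_shift, rate_diff))  # True
```

One caveat not handled here: angles returned by arctan2 wrap at ±π, so a small turn across that boundary appears as a jump of about 2π. Applying `np.unwrap` along the time axis before differencing is one way to avoid this.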

Another potentially useful feature is the frequency content of each signal. We expect each surface to make the robot vibrate or oscillate at characteristic frequencies, so we compute the magnitude of the FFT of every time series feature along the time axis, separately for X_3D_train and X_3D_test, and append these magnitudes as additional features.

In [10]:
# magnitude of the FFT of every feature along the time axis (axis=1)
X_3Dfft = np.abs(np.fft.fft(X_3D_train,axis=1))
X_3D_train=np.dstack((X_3D_train,X_3Dfft))
print('The size of X_3D: ', X_3D_train.shape)

X_3Dfft = np.abs(np.fft.fft(X_3D_test,axis=1))
X_3D_test=np.dstack((X_3D_test,X_3Dfft))
print('The size of X_3D: ', X_3D_test.shape)

The size of X_3D:  (3790, 128, 18)
The size of X_3D:  (20, 128, 18)
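To illustrate how the FFT magnitude can reveal a vibration frequency, the sketch below builds a synthetic 128-sample signal with 10 cycles per window plus a little noise (all values made up); the largest magnitude falls in bin 10. It also shows that for real-valued inputs the second half of the magnitude spectrum mirrors the first, so of the 128 FFT bins appended per feature, only the first 65 are distinct. Dropping the mirrored half is an option the notebook does not take.

```python
import numpy as np

n = 128                       # samples per series, as in the notebook
t = np.arange(n)
k = 10                        # hypothetical vibration: 10 cycles per 128-sample window
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * k * t / n) + 0.1 * rng.standard_normal(n)

spectrum = np.abs(np.fft.fft(signal))

# for a real-valued signal |X[j]| == |X[n - j]|, so bins 0..n//2 hold all the information
peak_bin = int(np.argmax(spectrum[: n // 2 + 1]))
print(peak_bin)  # 10
```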


We print the dimensions of X_3D_train: the number of features, time steps and samples.

In [11]:
nfeatures=X_3D_train.shape[2]
ntimestamp=X_3D_train.shape[1]
nsamples=X_3D_train.shape[0]

print('Number of features in Xtrain: ', nfeatures)
print('Number of timestamps in Xtrain: ', ntimestamp)
print('Number of samples in Xtrain: ', nsamples)


Number of features in Xtrain:  18
Number of timestamps in Xtrain:  128
Number of samples in Xtrain:  3790


We standardize X_3D_train and X_3D_test separately: for each feature, we compute the mean and standard deviation over all samples and time steps, subtract the mean and divide by the standard deviation. This puts all features on a comparable scale, which generally helps gradient-based training of the CNN.

In [12]:
for k in range(nfeatures):
    X_train_m = np.mean(X_3D_train[:,:,k])
    X_train_sd = np.std(X_3D_train[:,:,k])
    X_3D_train[:,:,k] = (X_3D_train[:,:,k]-X_train_m)/X_train_sd

for k in range(nfeatures):
    X_test_m = np.mean(X_3D_test[:,:,k])
    X_test_sd = np.std(X_3D_test[:,:,k])
    X_3D_test[:,:,k] = (X_3D_test[:,:,k]-X_test_m)/X_test_sd
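The cell above standardizes the test set with its own statistics, computed from only 20 series. A common alternative, sketched here as an assumption rather than the notebook's method, is to compute the per-feature mean and standard deviation on the training set and reuse them for the test set, so both sets go through the identical transformation. Random toy arrays stand in for `X_3D_train` and `X_3D_test`, and the loop over features is replaced by broadcasting.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical stand-ins shaped (samples, timestamps, features)
train = rng.normal(loc=5.0, scale=2.0, size=(50, 128, 3))
test = rng.normal(loc=5.0, scale=2.0, size=(4, 128, 3))

# per-feature statistics over all samples and time steps of the TRAINING set
mean = train.mean(axis=(0, 1))   # shape (features,)
std = train.std(axis=(0, 1))

# broadcasting applies the same shift and scale to every sample and time step
train_z = (train - mean) / std
test_z = (test - mean) / std
```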

Now, we create the 1D CNN model. I tuned the number of layers, the number of filters in each layer, the batch size and the dropout ratio to maximize validation accuracy, adding layers one at a time and watching how training and validation accuracy changed during training; 15% of the training data was held out for validation.

Removing the 256-filter layer gave very poor accuracy, while adding more layers led to overfitting (high accuracy on the training data but low accuracy on the validation data). Lower dropout ratios also led to overfitting, hence the high dropout ratio of 0.5. Overall, I aimed for the simplest model that reaches the best accuracy I could obtain with the features created above.

In [13]:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout, Conv1D, MaxPooling1D

for k in range(1):  # a single training run
  verbose, epochs, batch_size = 1, 600, 512
  model = Sequential()
  # three Conv1D layers ('valid' padding), each followed by 50% dropout
  model.add(Conv1D(filters=64, kernel_size=8, activation='relu', input_shape=(ntimestamp,nfeatures)))
  model.add(Dropout(0.5))
  model.add(Conv1D(filters=128, kernel_size=8, activation='relu'))
  model.add(Dropout(0.5))
  model.add(Conv1D(filters=256, kernel_size=8, activation='relu'))
  model.add(Dropout(0.5))
  # pool_size=1 leaves the sequence length unchanged, so this layer has no effect
  model.add(MaxPooling1D(pool_size=1))
  model.add(Flatten())
  model.add(Dense(128, activation='relu'))
  model.add(Dropout(0.5))
  model.add(Dense(y_train_new.shape[1], activation='softmax'))  # one output per surface type
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
  # fit network
  model.fit(X_3D_train, y_train_new, epochs=epochs, batch_size=batch_size, verbose=verbose)
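To see where the model's capacity sits, the layer shapes and parameter count can be worked out by hand. The sketch below assumes Keras defaults for Conv1D (`padding='valid'`, `strides=1`) and the 18 features and 9 classes printed earlier. It shows that the Dense(128) layer after Flatten holds about 3.5 million of the roughly 3.8 million weights, so most of the capacity lies in the fully connected layer rather than in the convolutions; `model.summary()` should report matching numbers.

```python
def conv_len(length, kernel_size):
    """Output length of a 'valid', stride-1 1D convolution."""
    return length - kernel_size + 1

ntimestamp, nfeatures, n_classes = 128, 18, 9   # values printed earlier in the notebook
filters = [64, 128, 256]
kernel_size = 8

length, channels, params = ntimestamp, nfeatures, 0
for f in filters:
    params += kernel_size * channels * f + f     # kernel weights + biases
    length, channels = conv_len(length, kernel_size), f

flat = length * channels                         # MaxPooling1D(pool_size=1) leaves this unchanged
params += flat * 128 + 128                       # Dense(128)
params += 128 * n_classes + n_classes            # Dense(9) softmax output
print(length, flat, params)  # 107 27392 3844809
```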



We compute the test accuracy by comparing the predicted classes with the true classes in y_test_new:



In [14]:
# evaluate model: the predicted class is the index of the largest softmax output
yhat = model.predict(X_3D_test)
yhat_final = np.argmax(yhat, axis=1)
y_testt = np.argmax(y_test_new, axis=1)

# equivalently: _, accuracy = model.evaluate(X_3D_test, y_test_new, verbose=0)
accuracy = np.mean(yhat_final == y_testt)
print('The accuracy is:', accuracy)

The accuracy is: 0.8
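The accuracy is the fraction of test series whose highest-probability class matches the one-hot label. With only 20 test series, each misclassification moves the accuracy by 0.05, so 0.8 corresponds to 16 of 20 series classified correctly. A toy example with made-up probabilities:

```python
import numpy as np

# hypothetical softmax outputs for 4 test series and 3 classes
yhat = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4],
                 [0.5, 0.4, 0.1]])
# one-hot ground truth, as produced by to_categorical
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

pred_class = np.argmax(yhat, axis=1)    # [0 1 2 0]
true_class = np.argmax(y_true, axis=1)  # [0 1 1 1]
accuracy = np.mean(pred_class == true_class)
print(accuracy)  # 0.5
```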


In [15]:
from keras.models import load_model
model.save('my_model.h5')

In [16]:
from keras.models import load_model
model.save('C:\\data/my_model2.h5')

In [18]:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(model)

tflite_model = converter.convert()


INFO:tensorflow:Assets written to: d:\temp\tmpwcrz5bg4\assets


In [19]:
with open('model.tflite', 'wb') as f:
  f.write(tflite_model)

In [21]:
interpreter = tf.lite.Interpreter(model_path='model.tflite')  # .tflite files are read by the TFLite interpreter, not keras load_model
interpreter.allocate_tensors()