
WISDM - Biometric time series data classification

This repository contains several models for classifying the reduced WISDM dataset.
Neural networks are used for feature extraction and classification.
The earlier networks were implemented in Python using the PyTorch library; the latest ones were implemented in TensorFlow. All files and folders with "_tf" or "_TF" in the name belong to the TensorFlow implementation.
This repository is based on a Kaggle competition. The website for this competition can be found here.

Data

The task is the classification of biometric time series data. The dataset is the "WISDM Smartphone and Smartwatch Activity and Biometrics Dataset"; WISDM stands for Wireless Sensor Data Mining. The original dataset was created by the Department of Computer and Information Science at Fordham University in New York. The researchers collected data from the accelerometer and gyroscope sensors of a smartphone and a smartwatch while 51 subjects performed 18 diverse activities of daily living. Each activity was performed for 3 minutes, so each subject contributed 54 minutes of data.
A detailed description of the dataset is also included in this repository. However, if you would like to view the original data, you can find the complete dataset here.

As already mentioned, a reduced dataset is used, which contains the following six activities:
A - walking
B - jogging
C - climbing stairs
D - sitting
E - standing
M - kicking soccer ball
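
The raw sensor files of the original dataset store one reading per line in the form `subject-id,activity-code,timestamp,x,y,z;` (see the dataset description PDF). As a minimal sketch, a single record could be parsed like this; the function and field names are illustrative assumptions, not code from this repository:

```python
# Minimal sketch: parsing one line of the raw WISDM sensor files.
# Assumed raw format (per the dataset description):
#   subject-id,activity-code,timestamp,x,y,z;
# Names below are illustrative, not taken from this repository's code.

ACTIVITY_LABELS = {
    "A": "walking",
    "B": "jogging",
    "C": "climbing stairs",
    "D": "sitting",
    "E": "standing",
    "M": "kicking soccer ball",
}

def parse_line(line: str) -> dict:
    """Parse one raw WISDM record into a dictionary."""
    subject, code, timestamp, x, y, z = line.strip().rstrip(";").split(",")
    return {
        "subject": int(subject),
        "activity": ACTIVITY_LABELS.get(code, code),
        "timestamp": int(timestamp),
        "xyz": (float(x), float(y), float(z)),
    }

record = parse_line("1600,A,252207666810782,-0.36476135,8.793503,1.0550842;")
```

The actual preprocessing used here lives in `datasets.py` (PyTorch) and `dataset_tf.py` (TensorFlow).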

Models

In addition to eleven different neural networks, the repository also includes training procedures and data pre-processing scripts.

Models (neural networks):

  • PyTorch
    • Linear / Multilayer Perceptron (MLP) model
    • Convolutional Neural Network (CNN) 1D model
    • Gated Recurrent Unit (GRU) model, a Recurrent Neural Network (RNN)
    • CNN 2D model
    • Long Short-Term Memory (LSTM) model
  • TensorFlow
    • MLP model
    • CNN 2D model
    • GRU model
    • LSTM model
    • Big GRU model
    • Convolutional LSTM model
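
The exact architectures and hyperparameters are defined in `models.py` and `models_tf.py`. As a rough sketch of what a recurrent classifier for windowed sensor data can look like in PyTorch, here is a minimal GRU model; the hidden size, window length, and feature count are assumptions, not the values used in this repository:

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    """Minimal GRU classifier for windowed sensor data.

    Hidden size, layer count, and input sizes are illustrative
    assumptions, not the values used in this repository.
    """

    def __init__(self, n_features: int = 6, hidden: int = 64, n_classes: int = 6):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_length, n_features)
        _, h_n = self.gru(x)       # h_n: (1, batch, hidden)
        return self.head(h_n[-1])  # logits: (batch, n_classes)

model = GRUClassifier()
logits = model(torch.randn(8, 100, 6))  # batch of 8 windows of length 100
```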

Overview of the folder structure and files

| Files | Description |
| --- | --- |
| `Datasets/` | contains the data and the submissions |
| `Models/` | contains the trained models |
| `Plots/` | contains all plots from training and testing |
| `.gitignore` | lists files and folders that are not tracked via git |
| `dataset_tf.py` | provides the dataset and prepares the data for TensorFlow |
| `datasets.py` | provides the dataset and prepares the data for PyTorch |
| `helpers.py` | provides auxiliary classes and functions for neural networks |
| `Job.sh` | provides a script to carry out the training on a computer cluster |
| `models_tf.py` | provides the models for TensorFlow |
| `models.py` | provides the models for PyTorch |
| `train_tf.py` | provides functions for training and testing for TensorFlow |
| `train.py` | provides functions for training and testing for PyTorch |
| `WISDM-dataset-description.pdf` | further description of the dataset |

Achieved results

The scores were calculated by Kaggle. The metric is classification accuracy (ACC).

| Model | Public leaderboard score | Training time (hh:mm:ss) | Model parameters |
| --- | --- | --- | --- |
| MLP_NET_V1 | 0.45856 | 00:05:22 | 902 |
| CNN_NET_V1 | 0.51933 | 00:21:17 | 141,766 |
| GRU_NET | 0.00000 | (PyTorch GRU does not work) | 0 |
| CNN_NET_V2 | 0.85635 | 00:01:28 | 134,134 |
| LSTM_NET | 0.83425 | 00:16:16 | 529,926 |
| MLP_NET_TF | 0.90055 | 00:08:20 | 112,262 |
| CNN_NET_TF | 0.87845 | 00:06:18 | 1,641,030 |
| GRU_NET_TF | 0.89502 | 00:18:55 | 4,175,238 |
| LSTM_NET_TF | 0.88950 | 00:19:04 | 4,470,150 |
| GRU_NET_BIG_TF | 0.95027 | 00:22:47 | 10,621,830 |
| CONV_LSTM_NET_TF | 0.93370 | 00:35:53 | 14,721,926 |

The two models GRU_NET_BIG_TF and CONV_LSTM_NET_TF were trained on an extended dataset. For this purpose, three new features were added through feature engineering: the Fast Fourier Transform (FFT) of the individual signals.
In addition, these two models were trained on data created with a sliding window of size 200. All other models were trained with size 100.
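
The two preprocessing ideas mentioned above, sliding-window segmentation and FFT features, can be sketched in NumPy as follows; the step size and channel count are assumptions for illustration, and the repository's actual preprocessing is in `datasets.py` and `dataset_tf.py`:

```python
import numpy as np

def sliding_windows(signal: np.ndarray, size: int, step: int) -> np.ndarray:
    """Cut a (time, channels) signal into windows of the given size."""
    starts = range(0, len(signal) - size + 1, step)
    return np.stack([signal[s:s + size] for s in starts])

def add_fft_features(windows: np.ndarray) -> np.ndarray:
    """Append the FFT magnitude of each channel as extra channels."""
    fft_mag = np.abs(np.fft.fft(windows, axis=1))
    return np.concatenate([windows, fft_mag], axis=-1)

raw = np.random.randn(1000, 3)                    # e.g. accelerometer x, y, z
wins = sliding_windows(raw, size=200, step=200)   # (5, 200, 3)
feats = add_fft_features(wins)                    # (5, 200, 6)
```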

The best model is therefore the GRU_NET_BIG_TF with an accuracy of 95.027%.