# Load & preprocess MyoGym data

In [1]:
from scipy.io import loadmat
import pandas as pd
import numpy as np

This notebook contains functions to load and preprocess the MyoGym dataset.

The MyoGym dataset was first introduced in [1]. It was collected using a Myo armband worn on the forearm, which combines 8 electromyogram (EMG) sensors with a 9-axis IMU (a 3-axis gyroscope, a 3-axis accelerometer and a 3-axis magnetometer). In our work, we discard the EMG and magnetometer data and use only the gyroscope and accelerometer data. This gives 6 data streams, which together describe the linear and rotational motion of the arm along the x, y and z axes (see the diagram of the Myo armband below).

There are 2 label columns. The first indexes the activity and the second indexes the trainer. Labels are provided per timestamp, so each workout carries a sequence of activity labels.
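Because the labels are given per timestamp, it can be handy to collapse them into contiguous runs when inspecting a workout. The following is a minimal sketch, not part of the original notebook: `label_segments` is a hypothetical helper, and the label values in `toy_labels` are made up for illustration rather than taken from the dataset.

```python
from itertools import groupby


def label_segments(labels):
    """Collapse per-timestamp labels into (label, start_index, length) runs.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive samples with this label
        segments.append((label, start, length))
        start += length
    return segments


# Toy per-timestamp label sequence (illustrative values only)
toy_labels = [99, 99, 99, 1, 1, 99, 2, 2, 2, 2]
segments = label_segments(toy_labels)
print(segments)
```

On the real data, the same helper could be applied to the activity labels of a single trainer's stream, for example after filtering the labels by trainer.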

## Load data

In [13]:
# Load the MyoGym data. 
datamat = loadmat("data/MyoGym.mat")

raw_data = pd.DataFrame(datamat["raw_data"])
label_data = pd.DataFrame(datamat["raw_data_labels"])

# Extract the accelerometer and gyroscope timestamps and features
raw_data.rename(columns={9: "time_acc", 
                         10: "acc_x",
                         11: "acc_y",
                         12: "acc_z", 
                         13: "time_gyr", 
                         14: "gyr_x",
                         15: "gyr_y",
                         16: "gyr_z"
                        }, inplace=True)

raw_data = raw_data[["time_acc", "acc_x", "acc_y", "acc_z", "time_gyr", "gyr_x", "gyr_y", "gyr_z"]]

# Rename the 2 columns in the labels
# The 1st label column is the activity and the 2nd label column is the trainer performing the exercise
label_data.rename(columns={0:"activity", 
                           1: "trainer"
                          }, inplace=True)

# Concatenate the raw_data and data labels
data = pd.concat([raw_data, label_data],  axis=1)

## Synthesise *time* column & remove duplicates

The buffering process introduces duplicate readings (with identical timestamps), which we remove. We first sort the data by trainer, then by timestamp.

In [16]:
# Sort data and remove duplicates
data = data.sort_values(by=['trainer', 'time_acc'], ascending=True)
data = data.drop_duplicates()
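To make the behaviour of this step concrete, here is a small sketch on made-up data (the `toy` frame and its values are assumptions for illustration only). It shows that `drop_duplicates()` with no arguments removes only rows that are identical in every column, so two readings that share a timestamp but differ in value are both kept.

```python
import pandas as pd

# Toy frame (illustrative values only): one exact duplicate row for trainer 1,
# and two rows for trainer 2 that share a timestamp but differ in acc_x.
toy = pd.DataFrame({
    "trainer":  [2, 1, 1, 1, 2],
    "time_acc": [5.0, 3.0, 1.0, 3.0, 5.0],
    "acc_x":    [0.4, 0.2, 0.1, 0.2, 0.5],
})

deduped = (
    toy.sort_values(by=["trainer", "time_acc"], ascending=True)
       .drop_duplicates()          # drops rows identical in all columns
       .reset_index(drop=True)
)
print(deduped)
```

If the intent were instead to keep a single row per timestamp within each stream, `drop_duplicates(subset=["trainer", "time_acc"])` would be one way to do that; whether that is appropriate for this dataset is not something the source states.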

The data are provided as continuous streams, each identified by its value in the *trainer* column. The *time_acc* and *time_gyr* columns record the stream arrival times of the accelerometer and gyroscope data respectively. The exact mechanism is unclear, but the sensor data are buffered and streamed in packets, so the arrival times are not always evenly spaced. Both sensors record at 50 Hz, so we create a synthetic time column (the sample index within each stream divided by the sampling rate) for later use.

In [17]:
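# Define a synthetic timestamp and drop the sensor arrival times
fq = 50  # sampling frequency in Hz

# Sample index within each trainer's stream, converted to seconds
data["time"] = data.groupby("trainer").cumcount() / fq

data.drop(columns=["time_acc", "time_gyr"], inplace=True)
# Optionally index by (trainer, time); the output displayed below appears to use this index
#data.set_index(["trainer", "time"], inplace=True)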
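The effect of the synthetic timestamp can be checked on a toy frame. This is a sketch on made-up values (the `toy` frame is an assumption for illustration): the counter restarts at 0 for each trainer and advances in steps of 1/50 s.

```python
import pandas as pd

FQ = 50  # sampling rate in Hz, as stated above

# Toy frame (illustrative values only): three samples for trainer 1, two for trainer 2
toy = pd.DataFrame({
    "trainer": [1, 1, 1, 2, 2],
    "acc_x":   [0.1, 0.2, 0.3, 0.4, 0.5],
})

# cumcount() numbers the rows within each group starting from 0;
# dividing by the sampling rate converts the sample index to seconds.
toy["time"] = toy.groupby("trainer").cumcount() / FQ
print(toy)
```

Note that this construction assumes no samples are missing within a stream; any dropped packet would shift all later synthetic times in that stream.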

In [18]:
data

| trainer | time | acc_x | acc_y | acc_z | gyr_x | gyr_y | gyr_z | activity |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.00 | -0.791504 | 0.595703 | 0.054199 | 38.5625 | -15.9375 | 12.6250 | 99 |
| 1 | 0.02 | -0.781250 | 0.603516 | 0.031250 | 44.8125 | -19.5625 | 19.2500 | 99 |
| 1 | 0.04 | -0.764648 | 0.622559 | 0.020020 | 48.5625 | -21.3125 | 23.6875 | 99 |
| 1 | 0.06 | -0.751465 | 0.649414 | 0.011719 | 50.6875 | -22.3750 | 24.2500 | 99 |
| 1 | 0.08 | -0.765137 | 0.670410 | -0.008301 | 52.0625 | -23.7500 | 21.0625 | 99 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10 | 2087.48 | -0.645508 | 0.797363 | -0.219238 | 10.1250 | 8.8125 | -1.2500 | 99 |
| 10 | 2087.50 | -0.628906 | 0.681641 | -0.226563 | -10.9375 | 3.0625 | 1.7500 | 99 |
| 10 | 2087.52 | -0.678223 | 0.708008 | -0.219238 | -10.3750 | -0.7500 | 2.0000 | 99 |
| 10 | 2087.54 | -0.611328 | 0.641602 | -0.176270 | -17.2500 | -4.7500 | 5.6875 | 99 |


## References

[1] Koskimäki, Heli, Pekka Siirtola and Juha Röning. “MyoGym: introducing an open gym data set for activity recognition collected using myo armband.” Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers (2017).