The following example uses a dataset of aerobic actions recorded from subjects "using the Inertial Measurement Unit (IMU) on an Apple iPhone 4 smartphone. The IMU includes a 3D accelerometer, gyroscope, and magnetometer. Each sample was taken at 60Hz, and manually trimmed to 500 samples (8.33s) to eliminate starting and stopping movements. iPhone is always clipped to the belt on the right hand side."

Each file contains 500 rows, each row with the following information:
Acc_x,Acc_y,Acc_z,Gyr_x,Gyr_y,Gyr_z,Mag_x,Mag_y,Mag_z

Each sensor has 3 channels.
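
As a rough illustration of this layout, the sketch below parses a small made-up CSV snippet with the same nine columns and slices it into one array per sensor. The values and the `split_sensors` helper are hypothetical; they are not part of the dataset or of the code used later in this notebook.

```python
import io
import numpy as np

def split_sensors(data):
    """Hypothetical helper: split an (n_samples, 9) array into one (n_samples, 3) array per sensor."""
    return {
        "accelerometer": data[:, 0:3],  # Acc_x, Acc_y, Acc_z
        "gyroscope": data[:, 3:6],      # Gyr_x, Gyr_y, Gyr_z
        "magnetometer": data[:, 6:9],   # Mag_x, Mag_y, Mag_z
    }

# Two made-up rows standing in for a real 500-row file.
fake_file = io.StringIO(
    "0.1,0.2,0.3,1.0,1.1,1.2,30.0,31.0,32.0\n"
    "0.4,0.5,0.6,1.3,1.4,1.5,33.0,34.0,35.0\n"
)
data = np.genfromtxt(fake_file, delimiter=",")
sensors = split_sensors(data)

print(data.shape)                  # (2, 9)
print(sensors["gyroscope"].shape)  # (2, 3)
```

With a real file, `data` would have shape `(500, 9)` and each sensor slice `(500, 3)`.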

The dataset and relevant information about the accompanying publication (Corey McCall, Kishore Reddy and Mubarak Shah, "Macro-Class Selection for Hierarchical K-NN Classification of Inertial Sensor Data", Second International Conference on Pervasive and Embedded Computing and Communication Systems, PECCS 2012, February 24-26, 2012, Rome, Italy) are available at: http://crcv.ucf.edu/data/UCF-iPhone.php

In [None]:
from itertools import chain
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from os import listdir
import time
from model import mosm_model

The following functions transform the dataset into a format that is easier to use for multi-output regression. Note that the original files contain only the measured values (the y-axis), ordered by increasing time; they do not include a time column.

In [None]:
#Since the activity class is encoded in the filename, we use it to create numerical labels.
LABEL_TO_NUMBER = {
    'bike': 0,
    'climbing': 1,
    'descending': 2,
    'gymbike': 3,
    'jumping': 4,
    'running': 5,
    'standing': 6,
    'treadmill': 7,
    'walking': 8,
}

def conversion(label):
    #Unrecognized labels map to None.
    return LABEL_TO_NUMBER.get(label)
    
#For each one of the folders we get all the filenames within them.
def get_file_names(path):
    folder = path + 'S0%d'
    filenames = []
    for x in range(1,10):
        current_folder = folder % x
        onlyfiles = [current_folder + "/" + f for f in listdir(current_folder)]
        filenames.append(sorted(onlyfiles))
    return filenames

#Given full_path (the directory where the dataset is stored on disk), we collect the filenames from all
#dataset folders and build the lists (Y, label_numbers, label_names).
def make_dataset(full_path):
    full_file_names = get_file_names(full_path)
    Y = []
    label_names = []
    label_numbers = []
    for folder in full_file_names:
        for path in folder:
            sample_data = np.genfromtxt(path, delimiter=',')
            Y.append(sample_data)
            #The label is the filename without its extension and without its last character.
            label = path.split('/')[-1].split('.')[0][:-1]
            label_names.append(label)
            label_numbers.append(conversion(label))

    return Y, label_numbers, label_names
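
The label parsing inside `make_dataset` can be checked in isolation. The sketch below applies the same string operations to a hypothetical path. The filename `bike1.csv` is an assumption about the naming scheme (activity name followed by a single digit), inferred from the code above rather than confirmed against the dataset, and `label_from_path` is an illustrative helper, not part of the notebook.

```python
def label_from_path(path):
    """Hypothetical helper mirroring the label parsing in make_dataset."""
    filename = path.split('/')[-1]  # keep only the last path component
    stem = filename.split('.')[0]   # drop the extension
    return stem[:-1]                # drop the trailing character

# Assumed example path; the real filenames may differ.
example_path = './data/HAR/Smartphone_Dataset/S01/bike1.csv'
print(label_from_path(example_path))
```

Because only one trailing character is removed, a name ending in two digits would keep one digit attached, and `conversion` would then return `None` for it.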

Now that we can access the y-axis information for each channel we have to create an x-axis counterpart to feed the model.

In [None]:
full_path = './data/HAR/Smartphone_Dataset/'
measurements, label_number, label_names = make_dataset(full_path)

#Since we're fitting curves rather than performing classification, we won't be using
#the class labels. We have to fabricate an X component for our y (the measurements).
#The measurements correspond to 9 channels (3 per sensor: accelerometer, gyroscope, magnetometer)
#sampled at 60 Hz. There are 500 measurements per channel, so the total time spanned is approx.
#8.33 s.
X_list_bike = []
y_list_bike = []
X_list_climb = []
y_list_climb = []
#Note that measurements is a list containing all y-values for all channels for all experiments. So, by 
#accessing measurements[0] we are acquiring all the y-values for all 9 channels of experiment 0 which
#happens to be a bicycle ride.
measurements_for_one_bicycle_ride = measurements[0]
#Experiment 5 is an instance of climbing.
measurements_for_one_instance_of_climbing = measurements[5]

#The following loop allows us to pick any number of channels, without modifying channel order.
number_of_channels = 9
for index in range(number_of_channels):
    #Time stamps in seconds: sample k is taken at k/60 s.
    X_list_bike.append(np.arange(500)/60.0)
    X_list_climb.append(np.arange(500)/60.0)
    #We also remove the mean from the y-values to better approximate a mean=0 GP.
    y_list_bike.append(measurements_for_one_bicycle_ride[:,index]-measurements_for_one_bicycle_ride[:,index].mean())
    y_list_climb.append(measurements_for_one_instance_of_climbing[:,index]-measurements_for_one_instance_of_climbing[:,index].mean())
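
As a quick sanity check on the preprocessing above, the following sketch rebuilds the time axis and the mean removal on synthetic data. The random array `fake_measurements` is a made-up stand-in for one recording, and the variable names are illustrative only.

```python
import numpy as np

sampling_rate_hz = 60
n_samples = 500
n_channels = 9

# Time stamps in seconds: sample k is taken at k / 60 s.
t = np.arange(n_samples) / sampling_rate_hz

# Synthetic stand-in for one recording: 500 rows, 9 channels, each with a nonzero offset.
rng = np.random.default_rng(0)
fake_measurements = rng.normal(loc=5.0, scale=1.0, size=(n_samples, n_channels))

# Subtract each channel's mean (column-wise), as the loop above does one channel at a time.
centered = fake_measurements - fake_measurements.mean(axis=0)

print(t[0], t[-1])                              # first and last time stamp
print(np.allclose(centered.mean(axis=0), 0.0))  # every channel is now zero-mean
```

The last time stamp is 499/60 s, slightly below the quoted 8.33 s, because the first sample is placed at t = 0.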