# Preprocessing the MotionSense Dataset and Generating Views

This notebook generates two views of the MotionSense dataset: a raw view and a preprocessed (normalized) view. The preprocessing steps are:
- Convert accelerometer readings from G to m/s² (multiply by 9.81)
- Remove gravity (using a Butterworth filter)
- Resample to 20Hz
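
The three steps above can be sketched for a single accelerometer axis as follows. This is a minimal illustration using SciPy, not the code inside `MotionSenseDatasetGenerator`. The helper name `preprocess_axis`, the filter order, the zero-phase filtering, the FFT-based resampling, and the reading of the cutoff as 0.3 Hz on a 50 Hz input are all assumptions.

```python
import numpy as np
from scipy import signal

G_TO_MS2 = 9.81  # conversion factor stated in this notebook


def preprocess_axis(acc_g, fs_in=50.0, fs_out=20.0, cutoff_hz=0.3, order=3):
    """Hypothetical helper: convert units, remove gravity, resample one axis.

    The order, cutoff unit (Hz) and filtering style are assumptions made
    for illustration only.
    """
    # 1. Convert from G to m/s².
    acc_ms2 = np.asarray(acc_g, dtype=float) * G_TO_MS2

    # 2. High-pass Butterworth filter: gravity is (nearly) constant, so it
    #    sits at very low frequencies and is attenuated by the filter.
    sos = signal.butter(order, cutoff_hz, btype="highpass", fs=fs_in, output="sos")
    no_gravity = signal.sosfiltfilt(sos, acc_ms2)

    # 3. Resample from fs_in to fs_out (here 50 Hz to 20 Hz).
    n_out = int(round(len(no_gravity) * fs_out / fs_in))
    return signal.resample(no_gravity, n_out)


# Synthetic 10 s signal at 50 Hz: 1 G of gravity plus a 2 Hz movement.
t = np.arange(0, 10, 1 / 50)
acc = 1.0 + 0.1 * np.sin(2 * np.pi * 2 * t)
out = preprocess_axis(acc)
print(len(out))  # 500 input samples become 200 output samples
```

After filtering, the mean of `out` is close to zero instead of close to 9.81 m/s², which is the effect the gravity-removal step is meant to have.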

In [1]:
from pathlib import Path
from typing import List
import hashlib
import pandas as pd

from librep.datasets.har.motionsense import (
    RawMotionSense,
    RawMotionSenseIterator,
    MotionSenseDatasetGenerator
)
from librep.utils.dataset import PandasDatasetsIO

%matplotlib inline

## Raw balanced MotionSense

In [2]:
dataset_dir = Path("../data/raw/MotionSense/A_DeviceMotion_data")
output_dir = Path("../data/processed/MotionSense/")
motionsense_dataset = RawMotionSense(dataset_dir, download=False)
motionsense_dataset

MotionSense Dataset at: '../data/raw/MotionSense/A_DeviceMotion_data'

In [3]:
act_names = [motionsense_dataset.activity_names[i] for i in motionsense_dataset.activities]
act_names

['dws', 'ups', 'sit', 'std', 'wlk', 'jog']

In [4]:
iterator = RawMotionSenseIterator(motionsense_dataset, users_to_select=None, activities_to_select=None)
iterator

MotionSense Iterator: users=24, activities=6

In [5]:
# Raw view: gravity addition, filtering, resampling and unit conversion are all disabled.
motionsense_raw = MotionSenseDatasetGenerator(iterator, time_window=150, window_overlap=0, add_gravity=False,
                                             add_filter=False, resampler=False, change_acc_measure=False)

motionsense_raw

Dataset generator: time_window=150, overlap=0

In [6]:
train_raw, validation_raw, test_raw = motionsense_raw.create_datasets(
    train_size=0.7,
    validation_size=0.1,
    test_size=0.2,
    ensure_distinct_users_per_dataset=True,
    balance_samples=True,
    seed=0
)

Generating full df over MotionSense View: 360it [00:22, 15.90it/s]
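
The two guarantees requested above (`ensure_distinct_users_per_dataset=True` and `balance_samples=True`) can be checked on the resulting DataFrames. The sketch below is a hypothetical helper, `check_split`, that is not part of `librep`. It assumes the `user` and `activity code` columns shown in the view, and it interprets "balanced" as every activity having the same number of samples within each subset.

```python
import pandas as pd


def check_split(train, validation, test, user_col="user", label_col="activity code"):
    """Hypothetical sanity check for a user-disjoint, class-balanced split."""
    subsets = {"train": train, "validation": validation, "test": test}

    # No user may appear in more than one subset.
    users = {name: set(df[user_col]) for name, df in subsets.items()}
    names = list(subsets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = users[first] & users[second]
            if shared:
                raise ValueError(f"{first} and {second} share users: {sorted(shared)}")

    # Within each subset, every activity must have the same number of samples.
    for name, df in subsets.items():
        counts = df[label_col].value_counts()
        if counts.nunique() > 1:
            raise ValueError(f"{name} is not balanced: {counts.to_dict()}")
    return True


# Toy data standing in for the real subsets.
train = pd.DataFrame({"user": [1, 1, 2, 2], "activity code": [0, 1, 0, 1]})
validation = pd.DataFrame({"user": [3, 3], "activity code": [0, 1]})
test = pd.DataFrame({"user": [4, 4], "activity code": [0, 1]})
print(check_split(train, validation, test))
```

In the notebook, the same function could be called as `check_split(train_raw, validation_raw, test_raw)`.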


In [7]:
print(hashlib.sha1(pd.util.hash_pandas_object(train_raw).values).hexdigest())
print(hashlib.sha1(pd.util.hash_pandas_object(validation_raw).values).hexdigest())
print(hashlib.sha1(pd.util.hash_pandas_object(test_raw).values).hexdigest())

bc25931a783ad4ddc7d5dfaafb6ef7bcc7caf547
a2995751f63a981b933a67386208d73cd0ae606a
3b3d2dc852908c1f499a2422e605e23a9344cfb2
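
The digests above act as fingerprints: if a later run with the same seed prints different values, the generated subsets have changed. A minimal sketch of the same idea, wrapped in a hypothetical helper named `dataframe_fingerprint`, could look like this:

```python
import hashlib

import pandas as pd


def dataframe_fingerprint(df):
    """Hypothetical helper: SHA-1 digest over the per-row hashes of a DataFrame."""
    # hash_pandas_object returns one uint64 hash per row (index included).
    row_hashes = pd.util.hash_pandas_object(df, index=True).values
    return hashlib.sha1(row_hashes.tobytes()).hexdigest()


a = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
b = a.copy()
c = a.copy()
c.loc[0, "x"] = 1.5  # change a single value

print(dataframe_fingerprint(a) == dataframe_fingerprint(b))
print(dataframe_fingerprint(a) == dataframe_fingerprint(c))
```

Identical content gives identical digests, while a change to any single value gives a different one.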


In [8]:
train_raw

Unnamed: 0,attitude.roll-0,attitude.roll-1,attitude.roll-2,attitude.roll-3,attitude.roll-4,attitude.roll-5,attitude.roll-6,attitude.roll-7,attitude.roll-8,attitude.roll-9,...,userAcceleration.z-145,userAcceleration.z-146,userAcceleration.z-147,userAcceleration.z-148,userAcceleration.z-149,activity code,length,trial_code,index,user
0,1.311800,1.309805,1.294033,1.259262,1.214031,1.174594,1.150417,1.126066,1.071678,0.984775,...,0.198949,-0.241833,-0.228292,-0.409867,-0.227758,0,150,11,300,16
1,0.979769,0.853751,0.724747,0.620533,0.563019,0.546236,0.540058,0.531511,0.509747,0.520985,...,0.061945,0.108357,0.042498,-0.119922,-0.535207,0,150,1,750,7
2,2.457231,2.508876,2.562549,2.610262,2.646260,2.662423,2.663410,2.662757,2.656153,2.639064,...,0.389712,-0.012963,-0.117823,-0.242463,-0.520011,0,150,1,750,11
3,-0.816211,-0.847936,-0.773849,-0.642674,-0.511272,-0.443049,-0.422701,-0.404203,-0.357625,-0.292546,...,1.096083,0.919155,0.980044,0.167161,0.291327,0,150,1,450,12
4,0.093224,0.153045,0.230516,0.329710,0.430513,0.511403,0.596036,0.689030,0.762821,0.789000,...,0.559331,0.268818,0.286077,0.244404,0.149644,0,150,1,150,22
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3313,-2.111703,-2.102741,-2.190346,-2.449247,-2.566130,-2.529341,-2.443290,-2.422711,-2.459332,-2.526483,...,0.087716,-0.110435,-0.828552,-1.046646,-0.055499,5,150,9,1500,1
3314,-1.278602,-1.296929,-1.318944,-1.332741,-1.340593,-1.345565,-1.348499,-1.349254,-1.349257,-1.351391,...,-0.700390,-0.252751,-0.190417,-0.040037,-0.082757,5,150,9,2850,19
3315,0.836112,0.812902,0.810092,0.824654,0.832589,0.845721,0.838886,0.835931,0.894171,0.984809,...,0.050288,0.044914,-1.069165,0.136681,0.499976,5,150,16,450,16
3316,-2.773770,-2.700234,-2.523756,-2.319851,-2.587409,-2.997274,2.076503,0.682116,0.555318,0.601072,...,-0.617489,-0.714354,-0.769634,-0.851130,-1.088768,5,150,9,0,11


In [9]:
output_path = output_dir / "balanced"

description = """# Raw Balanced MotionSense

This view contains train, validation and test subsets in the following proportions:
- Train: 70% of samples
- Validation: 10% of samples
- Test: 20% of samples

After the split, each subset was balanced with respect to the activity code column, that is, each subset has the same number of samples per activity.

**NOTE**: Each subset contains samples from distinct users, that is, the samples of a given user belong exclusively to one of the three subsets.
"""
pandas_io = PandasDatasetsIO(output_path)
pandas_io

PandasDatasetIO at '../data/processed/MotionSense/balanced'

In [10]:
pandas_io.save(
    train=train_raw, 
    validation=validation_raw, 
    test=test_raw, 
    description=description
)

## Normalized balanced MotionSense

In [11]:
# Normalized view: add gravity, apply the filter, resample to 20 Hz (fs=20) and convert G to m/s².
motionsense_normalized = MotionSenseDatasetGenerator(iterator, time_window=60, window_overlap=0, add_gravity=True,
                                                     add_filter=True, resampler=True, fs=20,
                                                     change_acc_measure=True)

motionsense_normalized

Dataset generator: time_window=60, overlap=0
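
The change from `time_window=150` to `time_window=60` keeps the window duration the same. The arithmetic below assumes `time_window` counts samples, which matches the 150 per-axis columns and `length` of 150 in the raw view, and assumes the raw signal is sampled at 50 Hz. The helper name `window_seconds` is hypothetical.

```python
def window_seconds(n_samples, fs_hz):
    """Duration in seconds of a window of n_samples at sampling rate fs_hz."""
    return n_samples / fs_hz


raw_seconds = window_seconds(150, 50)         # raw view: 150 samples at 50 Hz
normalized_seconds = window_seconds(60, 20)   # normalized view: 60 samples at 20 Hz
print(raw_seconds, normalized_seconds)  # 3.0 3.0
```

Under these assumptions, both views describe each sample with a 3-second window.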

In [12]:
train_normalized, validation_normalized, test_normalized = motionsense_normalized.create_datasets(
    train_size=0.7,
    validation_size=0.1,
    test_size=0.2,
    ensure_distinct_users_per_dataset=True,
    balance_samples=True,
    seed=0
)

Generating full df over MotionSense View: 360it [00:19, 18.32it/s]


In [13]:
print(hashlib.sha1(pd.util.hash_pandas_object(train_normalized).values).hexdigest())
print(hashlib.sha1(pd.util.hash_pandas_object(validation_normalized).values).hexdigest())
print(hashlib.sha1(pd.util.hash_pandas_object(test_normalized).values).hexdigest())

96d787a1d103561831c63a54b98a4f474a014152
2a82bce8c747679ec3e873cc4676e3eab6ba4fc1
e44fd1dca210774114feaf75a00751f1af71dc0b


In [14]:
output_path = output_dir / "balanced_normalized"

description = """# Normalized Balanced MotionSense

This view contains train, validation and test subsets in the following proportions:
- Train: 70% of samples
- Validation: 10% of samples
- Test: 20% of samples

After the split, each subset was balanced with respect to the activity code column, that is, each subset has the same number of samples per activity. In this dataset, gravity is added to the user acceleration, the accelerometer values are converted from G to m/s², a high-pass Butterworth filter with a 0.3 cutoff is applied to remove the gravity component, and the signal is then resampled from 50Hz to 20Hz.

**NOTE**: Each subset contains samples from distinct users, that is, the samples of a given user belong exclusively to one of the three subsets.
"""
pandas_io = PandasDatasetsIO(output_path)
pandas_io

PandasDatasetIO at '../data/processed/MotionSense/balanced_normalized'

In [15]:
pandas_io.save(
    train=train_normalized, 
    validation=validation_normalized, 
    test=test_normalized, 
    description=description
)