# Scratch work for TensorFlow skills test (classification)

- Author: Chris Hodapp
- Date: 2017-11-15
- For SharpestMinds/Yazabi

## Links:

- [human-activity-recognition-using-smartphones (GitHub)](https://github.com/pdelboca/human-activity-recognition-using-smartphones)
- [Data source (60 MB ZIP)](https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip)
- [Human Activity Recognition Using Smartphones (UCI)](http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones)
- [MLconf 2015 Seattle: When do I use zero-mean/unit variance normalization vs unit L1/L2 normalization?](https://www.quora.com/MLconf-2015-Seattle-When-do-I-use-zero-mean-unit-variance-normalization-vs-unit-L1-L2-normalization)

## Boilerplate

In [1]:
import pandas as pd

## Data Loading

In [2]:
# I really don't feel like copying these into the document, so
# instead load them from the included file:
feature_list = pd.read_csv("UCI HAR Dataset/features.txt",
    index_col=0, header=None, sep=" ")
# ...and sanitize them so Pandas can use them for column names:
san_dict = {ord("("): "", ord(")"): "", ord("-"): "_", ord(","): "_"}
features = [f.translate(san_dict) for f in feature_list.iloc[:, 0]]
features

['tBodyAcc_mean_X',
 'tBodyAcc_mean_Y',
 'tBodyAcc_mean_Z',
 'tBodyAcc_std_X',
 'tBodyAcc_std_Y',
 'tBodyAcc_std_Z',
 'tBodyAcc_mad_X',
 'tBodyAcc_mad_Y',
 'tBodyAcc_mad_Z',
 'tBodyAcc_max_X',
 'tBodyAcc_max_Y',
 'tBodyAcc_max_Z',
 'tBodyAcc_min_X',
 'tBodyAcc_min_Y',
 'tBodyAcc_min_Z',
 'tBodyAcc_sma',
 'tBodyAcc_energy_X',
 'tBodyAcc_energy_Y',
 'tBodyAcc_energy_Z',
 'tBodyAcc_iqr_X',
 'tBodyAcc_iqr_Y',
 'tBodyAcc_iqr_Z',
 'tBodyAcc_entropy_X',
 'tBodyAcc_entropy_Y',
 'tBodyAcc_entropy_Z',
 'tBodyAcc_arCoeff_X_1',
 'tBodyAcc_arCoeff_X_2',
 'tBodyAcc_arCoeff_X_3',
 'tBodyAcc_arCoeff_X_4',
 'tBodyAcc_arCoeff_Y_1',
 'tBodyAcc_arCoeff_Y_2',
 'tBodyAcc_arCoeff_Y_3',
 'tBodyAcc_arCoeff_Y_4',
 'tBodyAcc_arCoeff_Z_1',
 'tBodyAcc_arCoeff_Z_2',
 'tBodyAcc_arCoeff_Z_3',
 'tBodyAcc_arCoeff_Z_4',
 'tBodyAcc_correlation_X_Y',
 'tBodyAcc_correlation_X_Z',
 'tBodyAcc_correlation_Y_Z',
 'tGravityAcc_mean_X',
 'tGravityAcc_mean_Y',
 'tGravityAcc_mean_Z',
 'tGravityAcc_std_X',
 'tGravityAcc_std_Y',
 't
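
The sanitizing step above can be tried on its own without the dataset. This is a minimal sketch: the first two names in `raw` are raw forms consistent with the output listed above, while `dup-name()` and the `dedupe` helper are made up for illustration and are not part of the original notebook. The duplicate check is there because recent pandas versions refuse repeated entries in `names=`, so it may be worth confirming the sanitized list is unique before passing it on.

```python
from collections import Counter

# Same translation table as the notebook cell: drop parentheses,
# turn "-" and "," into underscores.
SAN_DICT = {ord("("): "", ord(")"): "", ord("-"): "_", ord(","): "_"}


def sanitize(name):
    """Turn a raw feature name into something usable as a column name."""
    return name.translate(SAN_DICT)


def dedupe(names):
    """Hypothetical helper: suffix repeated names with _2, _3, ...

    This does not guard against a suffixed name colliding with a name
    that already exists in the list.
    """
    seen = Counter()
    out = []
    for n in names:
        seen[n] += 1
        out.append(n if seen[n] == 1 else "{}_{}".format(n, seen[n]))
    return out


# Illustrative input only; "dup-name()" is invented to show the duplicate case.
raw = ["tBodyAcc-mean()-X", "tBodyAcc-arCoeff()-X,1", "dup-name()", "dup-name()"]
clean = [sanitize(n) for n in raw]
unique = dedupe(clean)
print(clean)
print(unique)
```

If `len(set(features)) != len(features)` turns out to be true for the real file, something like `dedupe(features)` could be applied before the list is used as column names.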

In [3]:
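def read_features(fname):
    # Whitespace-delimited file, one column per entry in `features`.
    # sep=r"\s+" does the same as the deprecated delim_whitespace=True.
    return pd.read_csv(fname, sep=r"\s+", header=None,
                       names=features, index_col=None)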

train_X = read_features("UCI HAR Dataset/train/X_train.txt")
test_X = read_features("UCI HAR Dataset/test/X_test.txt")

In [4]:
def read_labels(fname):
    df = pd.read_csv(fname, header=None)
    # We just want the series, not the dataframe:
    return df.iloc[:, 0]

train_y = read_labels("UCI HAR Dataset/train/y_train.txt")
test_y = read_labels("UCI HAR Dataset/test/y_test.txt")
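
A quick way to check the label reader without the dataset on disk is to feed it an in-memory file. This is only a sketch: the four sample labels are made up, and the shift to zero-based class indices is my assumption about what a TensorFlow classifier would want later (the dataset numbers its six activities 1 through 6), not something the notebook does at this point.

```python
import io

import pandas as pd


def read_labels(fname):
    df = pd.read_csv(fname, header=None)
    # We just want the series, not the dataframe:
    return df.iloc[:, 0]


# Stand-in for y_train.txt: one integer label per line (made-up values).
sample = io.StringIO("5\n5\n1\n6\n")
y = read_labels(sample)

# Assumption: shift labels from 1..6 to 0..5 for use as class indices.
y0 = y - 1

print(y.tolist())
print(y0.tolist())
```

With the real data loaded, a check such as `len(train_X) == len(train_y)` would confirm that features and labels line up row for row.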

In [5]:
def read_accel(s, train_or_test):
    # s is the axis ("x", "y" or "z"); train_or_test is "train" or "test".
    fname = "UCI HAR Dataset/{0}/Inertial Signals/total_acc_{1}_{0}.txt".format(
        train_or_test, s)
    df = pd.read_csv(fname, sep=r"\s+", header=None, index_col=None)
    # Each row should hold one 128-sample window:
    assert df.shape[1] == 128
    return df

train_X_accX = read_accel("x", "train")
train_X_accY = read_accel("y", "train")
train_X_accZ = read_accel("z", "train")
test_X_accX = read_accel("x", "test")
test_X_accY = read_accel("y", "test")
test_X_accZ = read_accel("z", "test")
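
The three per-axis frames each have shape `(windows, 128)`. One possible next step, sketched here as an assumption rather than what the notebook actually goes on to do, is to stack them into a single `(windows, 128, 3)` array so each window carries its X, Y and Z readings together. `stack_axes` is a hypothetical helper, and random numbers stand in for the real frames.

```python
import numpy as np
import pandas as pd


def stack_axes(acc_x, acc_y, acc_z):
    """Hypothetical helper: combine per-axis frames into (windows, samples, 3)."""
    return np.stack(
        [acc_x.to_numpy(), acc_y.to_numpy(), acc_z.to_numpy()], axis=-1)


# Random stand-ins shaped like the real frames: 4 windows of 128 samples.
rng = np.random.default_rng(0)
fake = [pd.DataFrame(rng.normal(size=(4, 128))) for _ in range(3)]

stacked = stack_axes(*fake)
print(stacked.shape)
```

With the real data this would be called as `stack_axes(train_X_accX, train_X_accY, train_X_accZ)`, and likewise for the test frames.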