This script illustrates the use of Neural Networks for classification tasks and compares their efficacy with that of tree-based methods such as Random Forest and Boosting.

The data set consists of sensor signals recorded by mobile phones, which are used to classify 6 types of human activity:

1. **Walking**,
2. **Walking Upstairs**,
3. **Walking Downstairs**,
4. **Sitting**,
5. **Standing**, and
6. **Laying**

# Technical Dependencies

Note that this script requires the following to be installed:

- **`theano`**
- **`bokeh`**: you also need to start the `bokeh` visualization server by opening a new command-line terminal and running:
    - **`bokeh-server --backend=memory`**

# Import Modules

In [1]:
# enable In-Line MatPlotLib
%matplotlib inline

In [2]:
# Imports from Python v3
from __future__ import division, print_function

# Generic imports
from multiprocessing import cpu_count
from os import system
from pandas import get_dummies
from random import seed
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Keras imports
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import Adam, Adadelta, RMSprop, SGD

# Imports from other modules in same folder
from ParseData import parse_human_activity_recog_data

# Imports from Helpy package
system('pip install --upgrade git+https://github.com/ChicagoBoothML/Helpy --no-dependencies')
from ChicagoBoothML_Helpy.KerasTrainingMonitor import NeuralNetworkTrainingMonitor
from ChicagoBoothML_Helpy.Print import printflush

# Seed the randomizer
RANDOM_SEED = 99
seed(RANDOM_SEED)

Using gpu device 0: GeForce GT 750M


# Import UCI Human Activity Recognition Data Set

In [4]:
data = parse_human_activity_recog_data()

data.keys()

Parsing Data Set "UCI Human Activity Recognition Using Smartphones"...
   Parsing Unique Input Features' (X's) Names... done!
   Parsing Train & Test Input Feature Data Sets... done!
   Removing Input Feature Data Rows with Missing (NaN) Values... done!
   Parsing Train & Test Labels (y)... done!


['X_test', 'X_train', 'y_train', 'y_test']

In [5]:
X_train = data['X_train']
y_train = data['y_train']
y_train_binary = get_dummies(y_train)

X_test = data['X_test']
y_test = data['y_test']
y_test_binary = get_dummies(y_test)

# Get some basic counts on the data sets
nb_X_features = len(X_train.columns)
nb_train_samples = len(X_train)
nb_test_samples = len(X_test)
printflush('No. of Input Features X = %s' % '{:,}'.format(nb_X_features))
printflush('No. of Train Samples = %s' % '{:,}'.format(nb_train_samples))
printflush('No. of Test Samples = %s' % '{:,}'.format(nb_test_samples))

No. of Input Features X = 477
No. of Train Samples = 7,352
No. of Test Samples = 2,947
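
`get_dummies` turns the single label column into one indicator column per class, which is the target layout the softmax output layer is trained against. A minimal pure-Python sketch of that transformation follows; the label strings are made up for illustration, since the actual label values returned by `parse_human_activity_recog_data` are not shown here, and `one_hot` is a hypothetical helper rather than a pandas function.

```python
def one_hot(labels):
    """Return (classes, rows): the sorted unique classes and one 0/1 row per label."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1   # exactly one indicator is switched on per row
        rows.append(row)
    return classes, rows


# Hypothetical labels, for illustration only
toy_labels = ['Sitting', 'Walking', 'Laying', 'Walking']
classes, rows = one_hot(toy_labels)
print(classes)
for row in rows:
    print(row)
```

Each row sums to 1, so it can be read as a probability distribution that puts all its mass on the true class.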


Just to ensure we don't have a missing data problem:

In [6]:
for x_var_name in X_train:
    printflush(
        '%s: %s type, %i missing'
        % (x_var_name, X_train[x_var_name].dtype, X_train[x_var_name].isnull().sum()))

angle(X,gravityMean): float64 type, 0 missing
angle(Y,gravityMean): float64 type, 0 missing
angle(Z,gravityMean): float64 type, 0 missing
angle(tBodyAccJerkMean),gravityMean): float64 type, 0 missing
angle(tBodyAccMean,gravity): float64 type, 0 missing
angle(tBodyGyroJerkMean,gravityMean): float64 type, 0 missing
angle(tBodyGyroMean,gravityMean): float64 type, 0 missing
fBodyAcc-bandsEnergy()-1,16: float64 type, 0 missing
fBodyAcc-bandsEnergy()-1,24: float64 type, 0 missing
fBodyAcc-bandsEnergy()-1,8: float64 type, 0 missing
fBodyAcc-bandsEnergy()-17,24: float64 type, 0 missing
fBodyAcc-bandsEnergy()-17,32: float64 type, 0 missing
fBodyAcc-bandsEnergy()-25,32: float64 type, 0 missing
fBodyAcc-bandsEnergy()-25,48: float64 type, 0 missing
fBodyAcc-bandsEnergy()-33,40: float64 type, 0 missing
fBodyAcc-bandsEnergy()-33,48: float64 type, 0 missing
fBodyAcc-bandsEnergy()-41,48: float64 type, 0 missing
fBodyAcc-bandsEnergy()-49,56: float64 type, 0 missing
fBodyAcc-bandsEnergy()-49,64: float64

# Neural Network Model

In [7]:
# Global constants
NB_HUMAN_ACTIVITES = 6
NB_EXAMPLE_FEATURES = 10


# **********************************************************************************************************************
# *** USER-ADJUSTABLE GLOBAL CONSTANTS *********************************************************************************
# **********************************************************************************************************************
NB_NEURAL_NETWORK_HIDDEN_UNITS = 300   # default=100

SGD_OPTIMIZER_LEARNING_RATE = .01   # default=.01
SGD_OPTIMIZER_LEARNING_RATE_DECAY_RATE = .0   # default=0, meaning no learning rate decay
SGD_OPTIMIZER_MOMENTUM_RATE = .9   # default=.9; 0 means no momentum
SGD_OPTIMIZER_NESTEROV_MOMENTUM_YESNO = True   # default=True; using Nesterov momentum usually speeds up learning

NB_TRAIN_EPOCHS = 100   # default=100
TRAIN_MINI_BATCH_SIZE = 300   # default=300
VALIDATION_DATA_PROPORTION = .2   # default=.2; this is proportion of Training Data held out for validation
# **********************************************************************************************************************
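
The four `SGD_OPTIMIZER_*` constants control the step size, its decay over time, and the momentum term. The sketch below shows how such an update typically works on a toy one-dimensional objective. The update rule follows the commonly described form of time-based decay and Nesterov momentum; it is an assumption for illustration, not the library's exact code, and `sgd_minimize` and `toy_grad` are hypothetical names.

```python
def sgd_minimize(grad, x0, lr=0.01, decay=0.0, momentum=0.9, nesterov=True, n_steps=500):
    """Minimize a 1-D function, given its gradient, with momentum SGD."""
    x, velocity = x0, 0.0
    for step in range(n_steps):
        # time-based decay: the step size shrinks as 1 / (1 + decay * step)
        step_lr = lr / (1.0 + decay * step)
        g = grad(x)
        velocity = momentum * velocity - step_lr * g
        if nesterov:
            # Nesterov variant: apply the momentum once more on top of the gradient step
            x = x + momentum * velocity - step_lr * g
        else:
            # classical momentum: move by the velocity only
            x = x + velocity
    return x


def toy_grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2, whose minimum is at x = 3."""
    return 2.0 * (x - 3.0)


x_nesterov = sgd_minimize(toy_grad, x0=0.0, nesterov=True)
x_plain = sgd_minimize(toy_grad, x0=0.0, nesterov=False)
print(x_nesterov, x_plain)
```

Setting `momentum=0` reduces both variants to plain gradient descent, and a positive `decay` makes later steps progressively smaller.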

In [8]:
printflush('\nCreating Feed-Forward Neural Network (FFNN)... ', end='')

ffnn = Sequential()
ffnn.add(
    Dense(input_dim=nb_X_features,
          output_dim=NB_NEURAL_NETWORK_HIDDEN_UNITS,
          init='uniform'))
ffnn.add(
    Activation('tanh'))
ffnn.add(
    Dense(input_dim=NB_NEURAL_NETWORK_HIDDEN_UNITS,
          output_dim=NB_HUMAN_ACTIVITES,
          init='uniform'))
ffnn.add(Activation('softmax'))

printflush('done!\n')


# Set FFNN's Loss Function & Optimizer
printflush('\nCompiling FFNN with Objective Loss Function & Optimization Method... ', end='')
stochastic_gradient_descent_optimizer = SGD(
    lr=SGD_OPTIMIZER_LEARNING_RATE,
    decay=SGD_OPTIMIZER_LEARNING_RATE_DECAY_RATE,
    momentum=SGD_OPTIMIZER_MOMENTUM_RATE,
    nesterov=SGD_OPTIMIZER_NESTEROV_MOMENTUM_YESNO)
ffnn.compile(
    loss='categorical_crossentropy',
    optimizer=stochastic_gradient_descent_optimizer)
printflush('done!\n')


# Initiate FFNN Training Monitor to keep track of training progress
ffnn_training_history = NeuralNetworkTrainingMonitor(
    plot_title='Neural Network Learning Curves: Human Activity Recognition')


# Train FFNN
ffnn.fit(
    X=X_train.values,
    y=y_train_binary.values,
    nb_epoch=NB_TRAIN_EPOCHS,
    batch_size=TRAIN_MINI_BATCH_SIZE,
    show_accuracy=True,
    validation_split=VALIDATION_DATA_PROPORTION,
    verbose=0,   # no need to log output to the terminal because we already have the live plot
    callbacks=[ffnn_training_history],
    shuffle=True)


# Get the best trained FFNN
ffnn = ffnn_training_history.best_model


Creating Feed-Forward Neural Network (FFNN)... done!


Compiling FFNN with Objective Loss Function & Optimization Method... done!


Connecting to Bokeh Server for live Learning Curves plotting...

Using saved session configuration for http://localhost:5006/
To override, pass 'load_from_config=False' to Session

Connecting to Bokeh Server for live Learning Curves plotting... done!


FFNN Training Progress
______________________

FFNN Training Finished! (1,999 Batches in total)

Best trained FFNN (with lowest Validation Loss) is from epoch #96
Training Accuracy (approx) = 99.1%, Validation Accuracy = 96.1%
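
The network built in cell `In [8]` is a single hidden layer with `tanh` activation followed by a `softmax` output layer. Below is a minimal pure-Python sketch of one forward pass through such a network. The layer sizes are toy values (the notebook's are 477 inputs, 300 hidden units and 6 outputs), and the ±0.05 initialization range, the zero biases and all function names are assumptions made for illustration.

```python
import math
import random


def dense(inputs, weights, biases):
    """Affine layer: weights[i][j] connects input i to output j."""
    return [sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
            for j in range(len(biases))]


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_uniform(n_in, n_out, rng, scale=0.05):
    """Small uniform random weights; the scale is an assumed value."""
    return [[rng.uniform(-scale, scale) for _ in range(n_out)] for _ in range(n_in)]


rng = random.Random(99)
n_inputs, n_hidden, n_classes = 4, 5, 3   # toy sizes, not the notebook's
w1, b1 = init_uniform(n_inputs, n_hidden, rng), [0.0] * n_hidden
w2, b2 = init_uniform(n_hidden, n_classes, rng), [0.0] * n_classes

x = [0.2, -0.1, 0.4, 0.0]   # one made-up input row
hidden = [math.tanh(h) for h in dense(x, w1, b1)]
probabilities = softmax(dense(hidden, w2, b2))
print(probabilities)
```

The output is one probability per class; the predicted activity is the class with the largest probability.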



In [9]:
# Evaluate trained FFNN on Test Data
printflush('\nEvaluating Trained FFNN on Test Data...')
test_evaluation = ffnn.evaluate(
    X=X_test.values,
    y=y_test_binary.values,
    show_accuracy=True,
    verbose=0)
printflush('Test Set Loss = %s' % '{:.3g}'.format(test_evaluation[0]))
printflush('Test Set Accuracy = %s%%' % '{:.2f}'.format(100. * test_evaluation[1]))


Evaluating Trained FFNN on Test Data...
Test Set Loss = 0.156
Test Set Accuracy = 94.54%
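
The two reported numbers are the categorical cross-entropy loss and the classification accuracy. Below is a minimal sketch of how these two quantities are usually defined. The probabilities are made up for illustration and are not model output, and the function names are hypothetical.

```python
import math


def categorical_crossentropy(y_true_one_hot, y_prob, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    losses = []
    for truth, probs in zip(y_true_one_hot, y_prob):
        p_true = sum(t * p for t, p in zip(truth, probs))
        losses.append(-math.log(max(p_true, eps)))   # eps guards against log(0)
    return sum(losses) / len(losses)


def accuracy(y_true_one_hot, y_prob):
    """Share of rows whose most probable class is the true class."""
    hits = 0
    for truth, probs in zip(y_true_one_hot, y_prob):
        if probs.index(max(probs)) == truth.index(1):
            hits += 1
    return hits / len(y_prob)


# Hypothetical one-hot labels and predicted probabilities for 4 samples, 3 classes
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_prob = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.4, 0.3], [0.1, 0.6, 0.3]]
loss = categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print('loss = %.3f, accuracy = %.2f' % (loss, acc))
```

The loss penalizes low confidence in the true class even when the prediction is correct, which is why loss and accuracy can move somewhat independently during training.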


# Tree-Based Models

Let's now try out a Random Forest and a Boosted model:

In [10]:
B = 600

rf_model = RandomForestClassifier(
    n_estimators=B,
    criterion='entropy',
    max_depth=None,   # expand until all leaves are pure or contain < MIN_SAMPLES_SPLIT samples
    min_samples_split=100,
    min_samples_leaf=50,
    min_weight_fraction_leaf=0.0,
    max_features=None,   # number of features to consider when looking for the best split; None: max_features=n_features
    max_leaf_nodes=None,   # None: unlimited number of leaf nodes
    bootstrap=True,
    oob_score=True,   # estimate Out-of-Bag accuracy
    n_jobs=cpu_count() - 2,   # parallelize over all CPU cores but 2
    class_weight=None,    # our classes are skewed, but not too skewed
    random_state=RANDOM_SEED,
    verbose=0,
    warm_start=False)

rf_model.fit(X=X_train, y=y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features=None, max_leaf_nodes=None,
            min_samples_leaf=50, min_samples_split=100,
            min_weight_fraction_leaf=0.0, n_estimators=600, n_jobs=6,
            oob_score=True, random_state=99, verbose=0, warm_start=False)
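
With `criterion='entropy'`, each candidate split is scored by how much it reduces the entropy of the class labels in the node. Below is a minimal sketch of that score on made-up labels. The function names are illustrative and not part of scikit-learn.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


# Hypothetical node with two equally frequent classes
parent = ['Sitting'] * 4 + ['Standing'] * 4
mixed_half = ['Sitting', 'Sitting', 'Standing', 'Standing']

gain_perfect = information_gain(parent, ['Sitting'] * 4, ['Standing'] * 4)
gain_useless = information_gain(parent, mixed_half, mixed_half)
print(gain_perfect, gain_useless)
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves both children as mixed as the parent gains nothing.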

In [11]:
# B = 1200

# boost_model = GradientBoostingClassifier(
#     n_estimators=B,
#     loss='deviance',   # a.k.a Cross Entropy in Classification
#     learning_rate=.01,   # shrinkage parameter
#     subsample=1.,
#     min_samples_split=200,
#     min_samples_leaf=100,
#     min_weight_fraction_leaf=0.0,
#     max_depth=10,   # maximum tree depth / number of levels of interaction
#     init=None,
#     random_state=RANDOM_SEED,
#     max_features=None,   # number of features to consider when looking for the best split; None: max_features=n_features
#     verbose=0,
#     max_leaf_nodes=None,   # None: unlimited number of leaf nodes
#     warm_start=False)

# boost_model.fit(X=X_train, y=y_train)

We'll now evaluate the out-of-sample (OOS) performance of these models on the Test set (the Boosting cells are commented out, so only the Random Forest is scored here):

In [12]:
rf_pred = rf_model.predict(X=X_test)

rf_oos_accuracy = (rf_pred == y_test).sum() / nb_test_samples

rf_oos_accuracy

0.88564642008822536
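
Overall accuracy hides which activities are confused with each other. Below is a minimal sketch of a per-class breakdown on made-up labels. In the notebook the inputs would be `y_test` and `rf_pred`, and the helper names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_recall(y_true, y_pred):
    """For each true class, the share of its samples that were predicted correctly."""
    counts = confusion_counts(y_true, y_pred)
    totals = Counter(y_true)
    return {label: counts[(label, label)] / totals[label] for label in sorted(totals)}


# Hypothetical true and predicted labels
toy_true = ['Sitting', 'Sitting', 'Standing', 'Standing', 'Walking', 'Walking']
toy_pred = ['Sitting', 'Standing', 'Standing', 'Standing', 'Walking', 'Walking']

recall = per_class_recall(toy_true, toy_pred)
overall_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
print(recall)
print(overall_accuracy)
```

In this toy case the only error is a `Sitting` sample predicted as `Standing`, which the per-class view exposes and the single accuracy figure does not.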

In [13]:
# boost_pred = boost_model.predict(X=X_test)

# boost_oos_accuracy = (boost_pred == y_test).sum() / nb_test_samples

# boost_oos_accuracy

We see that with hundreds of X features, it becomes rather hard and time-consuming to train good tree-based models: here the Random Forest reaches about 88.6% Test accuracy, against 94.5% for the Neural Network. Trees would have to go very deep to capture complex variable interactions.

In such cases, Neural Networks can often (but certainly not always) be an efficient way to learn the interactions and estimate their influence on the predicted outcomes at the same time.