# Supervised DNN pipeline with TensorFlow

This notebook trains a supervised model with TensorFlow.

Specifically, we
* use `tensorflow.keras` to set up and train the model
* train a single-hidden-layer cross-sectional model for multi-class classification

Finally, we evaluate the model on the test data set defined by the authors.

Note: some improvements could be made with respect to the training process.

In [1]:
%cd ..

/project


In [2]:
from src.data import *

In [3]:
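# np and pd are used further down; import them explicitly
# rather than relying on the star import above
import numpy as np
import pandas as pd
from tensorflow import keras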

## Prepare the data

Load the feature data and then prepare the train / test objects.

Note that, unlike `sklearn`, `keras` with a categorical cross-entropy loss expects the integer activity labels to be formatted as one-hot-encoded vectors.

In [8]:
activities = load_activity_names(); activities
features_df = load_feature_data() \
    .sort_values(['subject_id', 'time_exp']) \
    .reset_index(drop=True)
features_df.shape

(7352, 564)

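Only the feature columns are fed into the model, so we drop the subject, time, and activity-label columns, which leaves 561 of the 564 columns. The activity ids are shifted by one so that they start at 0 before being one-hot encoded.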

In [11]:
X_train = features_df.drop(['subject_id', 'time_exp', 'activity_id'], axis=1)
y_train = keras.utils.to_categorical(features_df.activity_id - 1)
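
As a rough illustration of what the one-hot encoding step produces, here is a minimal pure-Python sketch. The `one_hot` helper and the sample activity ids are hypothetical and only mimic the idea behind `keras.utils.to_categorical`; they are not the library's implementation.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

# Hypothetical activity ids in the 1-6 range; shift to 0-5 before encoding,
# mirroring the `activity_id - 1` step in the cell above.
activity_ids = [1, 3, 6]
encoded = one_hot([a - 1 for a in activity_ids], num_classes=6)

for activity_id, row in zip(activity_ids, encoded):
    print(activity_id, row)
```

Each row has exactly one non-zero entry, which is why the labels must be zero-based before encoding.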

In [16]:
features_test_df = load_feature_data('test') \
    .sort_values(['subject_id', 'time_exp']) \
    .reset_index(drop=True)
X_test = features_test_df.drop(['subject_id', 'time_exp', 'activity_id'], axis=1)
y_test = keras.utils.to_categorical(features_test_df.activity_id - 1)
features_test_df.shape

(2947, 564)

## Model training

### Prepare the model

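We create a DNN with a single hidden layer of 100 nodes. Using the `Sequential` API, we create the model object and `.add` layers to it. No activation is passed to the hidden layer, so it defaults to a linear activation (visible in the configuration output further down); passing, for example, `activation='relu'` would make the hidden layer nonlinear. The last layer must match the model task, a 6-category prediction, so it has 6 units with a softmax activation.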

In [13]:
model = keras.models.Sequential()
model.add(keras.layers.Dense(100))
model.add(keras.layers.Dense(6, activation='softmax'))
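
To make the role of the 6-unit softmax layer concrete, the following is a minimal sketch of the softmax function in plain Python. The raw scores are hypothetical values chosen for illustration, not outputs of the model above.

```python
import math

def softmax(scores):
    """Map raw scores to probabilities that sum to 1."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the 6 output nodes for one sample.
scores = [2.0, 0.5, -1.0, 0.0, 1.0, -0.5]
probs = softmax(scores)

# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs], predicted_class)
```

The output has one probability per activity category, which is why the final layer needs exactly 6 units.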

We need to specify a loss function, an optimizer, and metrics for training.
* `categorical_crossentropy` is a standard choice for multi-class classification with one-hot-encoded labels
* The `Adam` optimizer generally performs well
* `accuracy` is a common metric; a prediction only counts if the most probable class matches the true label exactly

In [14]:
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
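
The loss and metric chosen above can be sketched in plain Python as follows. This is a simplified illustration under the assumption of one-hot labels and already-normalized probabilities; the function names, the `eps` guard, and the sample values are hypothetical and do not reproduce the Keras internals.

```python
import math

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(t * log(p)) for one-hot rows t."""
    losses = []
    for truth, probs in zip(y_true, y_prob):
        # max(p, eps) guards against log(0)
        losses.append(-sum(t * math.log(max(p, eps)) for t, p in zip(truth, probs)))
    return sum(losses) / len(losses)

def accuracy(y_true, y_prob):
    """Fraction of rows whose most probable class matches the true class."""
    hits = 0
    for truth, probs in zip(y_true, y_prob):
        hits += truth.index(max(truth)) == probs.index(max(probs))
    return hits / len(y_true)

# Hypothetical 3-class example: the third sample is misclassified.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_prob = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.6, 0.3, 0.1]]

loss = categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(loss, acc)
```

Note that the loss penalizes the confident wrong prediction heavily, while accuracy only registers it as one miss.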

### Train

In this example, we fit the model with a single validation data set, which here is the test set itself. Using a single hold-out set is common in deep learning problems, where the datasets are often large enough that cross-validation is unnecessary.

Some notes:
* We set the batch size to 128 since it's on par with the way the data is recorded
* We train for only 3 epochs, which could be increased
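
To get a feel for what these settings mean, the sketch below works out how many weight updates they imply for the 7,352 training rows loaded earlier. It assumes the final partial batch is kept rather than dropped, which is my understanding of the default when fitting on in-memory arrays.

```python
import math

n_train = 7352     # rows in the training feature table
batch_size = 128
epochs = 3

# Number of batches (gradient updates) per pass over the data.
steps_per_epoch = math.ceil(n_train / batch_size)

# Rows left over for the final, smaller batch of each epoch.
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size

total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch_size, total_updates)
```

With so few updates in total, increasing the number of epochs is a cheap way to give the optimizer more opportunities to converge.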

In [17]:
model.fit(X_train, y_train,
          batch_size=128,
          epochs=3,
          validation_data=(X_test, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f669b6b5b20>

We can summarize the model to check our understanding. Note that there is a large number of parameters to fit. Each element of the 561-length input vector is multiplied by 100 weights, one per hidden node, which gives 561 × 100 = 56,100 weights. Adding the 100-element bias vector brings the hidden layer to 56,200 parameters. Similarly, each of the 100 hidden nodes is multiplied by 6 weights, one per output node, and each output node has a bias parameter, which adds up to 100 × 6 + 6 = 606.

In [18]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 100)               56200     
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 606       
Total params: 56,806
Trainable params: 56,806
Non-trainable params: 0
_________________________________________________________________
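
The parameter counts in the summary can be reproduced by hand. The sketch below uses a hypothetical helper, `dense_params`, that counts the weights and biases of a fully connected layer.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_params(561, 100)  # input features -> hidden layer
output_params = dense_params(100, 6)    # hidden layer -> 6 categories
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```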


The full specification of the model can be obtained as a Python dictionary (`model.to_json()` returns the architecture as a JSON string instead).

In [19]:
model.get_config()

{'name': 'sequential',
 'layers': [{'class_name': 'InputLayer',
   'config': {'batch_input_shape': (None, 561),
    'dtype': 'float64',
    'sparse': False,
    'ragged': False,
    'name': 'dense_input'}},
  {'class_name': 'Dense',
   'config': {'name': 'dense',
    'trainable': True,
    'dtype': 'float32',
    'units': 100,
    'activation': 'linear',
    'use_bias': True,
    'kernel_initializer': {'class_name': 'GlorotUniform',
     'config': {'seed': None}},
    'bias_initializer': {'class_name': 'Zeros', 'config': {}},
    'kernel_regularizer': None,
    'bias_regularizer': None,
    'activity_regularizer': None,
    'kernel_constraint': None,
    'bias_constraint': None}},
  {'class_name': 'Dense',
   'config': {'name': 'dense_1',
    'trainable': True,
    'dtype': 'float32',
    'units': 6,
    'activation': 'softmax',
    'use_bias': True,
    'kernel_initializer': {'class_name': 'GlorotUniform',
     'config': {'seed': None}},
    'bias_initializer': {'class_name': 'Zeros',

The `keras` fitting process already prints the accuracy, but for illustration we can compute it as we did in the other notebooks. We only need to translate the one-hot-encoded vectors back to integer labels.

In [37]:
y_train_hat = model.predict(X_train)
# most probable class index (0-5) per row, shifted back to activity ids 1-6
y_train_hat = y_train_hat.argmax(axis=1) + 1
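
The decoding step can be illustrated without the model. The sketch below uses a hypothetical `decode` helper and made-up probability rows to show how each prediction vector maps back to an activity id in the 1-6 range.

```python
def decode(prob_rows):
    """Return the 1-based index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in prob_rows]

# Hypothetical softmax outputs for three samples.
prob_rows = [
    [0.80, 0.10, 0.05, 0.02, 0.02, 0.01],
    [0.05, 0.05, 0.10, 0.20, 0.55, 0.05],
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.95],
]

labels = decode(prob_rows)
print(labels)
```

The `+ 1` undoes the shift applied before one-hot encoding, so the decoded values are directly comparable with `activity_id`.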

In [38]:
from sklearn.metrics import accuracy_score, classification_report

In [39]:
accuracy_score(features_df.activity_id.values, y_train_hat)

0.9578346028291621

### Evaluate on the test data

We evaluate the model on the test data that was defined by the authors.

In [40]:
y_test_hat = model.predict(X_test)
# most probable class index (0-5) per row, shifted back to activity ids 1-6
y_test_hat = y_test_hat.argmax(axis=1) + 1

In [41]:
accuracy_score(features_test_df.activity_id.values, y_test_hat)

0.9338310145911096

In [42]:
# sklearn expects (y_true, y_pred); the reverse order swaps precision and recall
print(classification_report(features_test_df.activity_id.values, y_test_hat))

              precision    recall  f1-score   support

           1       0.91      0.99      0.95       496
           2       0.95      0.89      0.92       471
           3       0.95      0.92      0.93       420
           4       0.92      0.86      0.89       491
           5       0.88      0.93      0.91       532
           6       1.00      0.99      1.00       537

    accuracy                           0.93      2947
   macro avg       0.94      0.93      0.93      2947
weighted avg       0.93      0.93      0.93      2947



When we cross-tabulate the actual labels against the classified ones, the resulting matrix is nearly diagonal.

Here we want to check whether the errors made are acceptable. For example, the errors for walking downstairs are all classifications as either walking or walking upstairs. It may be important to continue tuning parameters so that this activity is never (or less commonly) misclassified as walking upstairs.

In [45]:
pd.crosstab(features_test_df.activity_id.values,
            y_test_hat,
            rownames=['True'],
            colnames=['Classified'])

Classified    1    2    3    4    5    6
True
1           491    0    5    0    0    0
2            34  420   17    0    0    0
3            13   19  388    0    0    0
4             0    3    0  424   63    1
5             0    0    0   36  496    0
6             0    0    0    0    4  533
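
Per-class precision and recall can be read directly off this cross-tabulation. The following sketch hard-codes the counts from the table (rows are true labels, columns are classified labels) and derives the metrics in plain Python; the variable names are illustrative only.

```python
# Counts copied from the cross-tabulation: rows = true, columns = classified.
confusion = [
    [491,   0,   5,   0,   0,   0],
    [ 34, 420,  17,   0,   0,   0],
    [ 13,  19, 388,   0,   0,   0],
    [  0,   3,   0, 424,  63,   1],
    [  0,   0,   0,  36, 496,   0],
    [  0,   0,   0,   0,   4, 533],
]
n = len(confusion)

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(n))
accuracy = correct / total

# Row totals count true samples per class; column totals count predictions.
row_totals = [sum(row) for row in confusion]
col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

recall = [confusion[i][i] / row_totals[i] for i in range(n)]
precision = [confusion[j][j] / col_totals[j] for j in range(n)]

for label in range(n):
    print(label + 1, round(precision[label], 2), round(recall[label], 2))
print("accuracy", round(accuracy, 4))
```

Recall divides the diagonal by the row total (all true samples of a class), while precision divides it by the column total (all samples classified as that class).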
