# 1D Convolutional Neural Networks

### About the Data

Data for this exercise is from the [Human Activity Recognition Using Smartphones Data Set](https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones) which is furnished by the University of California Irvine Machine Learning Repository.
<br><br>
In order to populate the dataset, subjects wore a smartphone that recorded their linear acceleration (using an accelerometer) and angular velocity (using a gyroscope). After the data was collected, segments of each recording were labeled with the activity the subject was performing during that period, which was possible by consulting video recordings. The possible labels are WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING. <br><br>

The goal of our classifier is to predict the activity label of each segment from its accelerometer and gyroscope readings.
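The label files store each activity as an integer code rather than a name. A small lookup table makes model outputs readable. This sketch assumes the 1-6 ordering from the dataset's activity_labels.txt (the same order as the list above); ACTIVITY_LABELS and label_names are illustrative names, not part of the provided code.

```python
import numpy as np

# Integer codes as used in y_train.txt / y_test.txt.
# Assumed ordering (from the dataset's activity_labels.txt); verify against your copy.
ACTIVITY_LABELS = {
    1: "WALKING",
    2: "WALKING_UPSTAIRS",
    3: "WALKING_DOWNSTAIRS",
    4: "SITTING",
    5: "STANDING",
    6: "LAYING",
}

def label_names(labels):
    """Map an array of integer labels (1-6) to activity names (hypothetical helper)."""
    return [ACTIVITY_LABELS[int(code)] for code in np.asarray(labels).ravel()]

print(label_names(np.array([1, 6, 4])))  # ['WALKING', 'LAYING', 'SITTING']
```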


### Read in data (using functions provided below)

Before reading in the training and test data, we need to define two provided functions, read_data_train() and read_data_test(). They load the data already separated into two groups: a training set and a test set.

(Note: because we want to train, validate, and test our model, we would rather have three groups. We will carve a validation set out of the training data below.)

In [18]:
# Imports

import numpy as np
import os
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
from tensorflow import keras

In [4]:
def read_data_test():
  """ Read the test data.

  Returns X with shape (n_samples, n_steps, n_channels), the integer
  labels (1-6), and the list of channel names in the order used for X.
  """

  # Fixed params
  n_steps = 128
  base_url = 'https://raw.githubusercontent.com/BeaverWorksMedlytics2020/Data_Public/master/NotebookExampleData/Week2/week2_conv1d/'

  labels = pd.read_csv(base_url + 'y_test.txt', header = None)

  list_of_channels = ['body_acc_x', 'body_acc_y', 'body_acc_z', 'body_gyro_x',
  'body_gyro_y', 'body_gyro_z', 'total_acc_x', 'total_acc_y', 'total_acc_z']

  X = np.zeros((len(labels), n_steps, len(list_of_channels)))

  # Each channel is stored in its own whitespace-separated file with one row per
  # sample and one column per time step; copy it into the slice X[:, :, i].
  # (sep=r'\s+' replaces the deprecated delim_whitespace=True.)
  for i, channel in enumerate(list_of_channels):
    channel_path = base_url + 'IntertialSignals/' + channel + '_test.txt'
    dat_ = pd.read_csv(channel_path, sep = r'\s+', header = None)
    X[:,:,i] = dat_.to_numpy()

  # Return
  return X, labels[0].values, list_of_channels

In [5]:
def read_data_train():
  """ Read the training data.

  Returns X with shape (n_samples, n_steps, n_channels), the integer
  labels (1-6), and the list of channel names in the order used for X.
  """

  # Fixed params
  n_steps = 128
  base_url = 'https://raw.githubusercontent.com/BeaverWorksMedlytics2020/Data_Public/master/NotebookExampleData/Week2/week2_conv1d/'

  labels = pd.read_csv(base_url + 'y_train.txt', header = None)

  list_of_channels = ['body_acc_x', 'body_acc_y', 'body_acc_z', 'body_gyro_x',
  'body_gyro_y', 'body_gyro_z', 'total_acc_x', 'total_acc_y', 'total_acc_z']

  X = np.zeros((len(labels), n_steps, len(list_of_channels)))

  # Each channel is stored in its own whitespace-separated file with one row per
  # sample and one column per time step; copy it into the slice X[:, :, i].
  # (sep=r'\s+' replaces the deprecated delim_whitespace=True.)
  for i, channel in enumerate(list_of_channels):
    channel_path = base_url + 'IntertialSignals/' + channel + '_train.txt'
    dat_ = pd.read_csv(channel_path, sep = r'\s+', header = None)
    X[:,:,i] = dat_.to_numpy()

  # Return
  return X, labels[0].values, list_of_channels

In [8]:
X_train, labels_train, list_ch_train = read_data_train() # train
X_test, labels_test, list_ch_test = read_data_test()

In [9]:
#First dimension is number of samples
#Second dimension is time step
#Third dimension refers to which sensor provides the data

print(X_train.shape)

(7352, 128, 9)



#### Explaining the shape of the data 
There are 7,352 samples (windows), each labeled with one of the six activities. <br>
Each sample contains 128 time steps (the dataset uses 2.56 s windows sampled at 50 Hz). <br>
At each time step there are 9 values: the x, y, and z components of body acceleration, angular velocity (gyroscope), and total acceleration. <br>
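The loader functions build X one channel at a time: each channel file is a table with one row per window and one column per time step, and the nine tables are stacked along a third axis. A minimal sketch of that stacking, using random values as synthetic stand-ins for the real files, is below; np.stack along the last axis gives the same layout as filling X[:, :, i] in a loop.

```python
import numpy as np

n_windows, n_steps = 4, 128  # small stand-in for the real 7352 windows
channel_names = ['body_acc_x', 'body_acc_y', 'body_acc_z', 'body_gyro_x',
                 'body_gyro_y', 'body_gyro_z', 'total_acc_x', 'total_acc_y', 'total_acc_z']

# Synthetic stand-ins for the nine per-channel files, each (n_windows, n_steps)
rng = np.random.default_rng(0)
channel_tables = [rng.normal(size=(n_windows, n_steps)) for _ in channel_names]

# Stack along a new last axis -> (n_windows, n_steps, n_channels)
X = np.stack(channel_tables, axis=-1)
print(X.shape)  # (4, 128, 9)

# X[i, t, c] is the reading of channel c at time step t in window i
assert X[2, 10, 5] == channel_tables[5][2, 10]
```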

### Use train_test_split to split provided "training" set into training and validation data

In [10]:
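# your code here
# Hold out 25% (the default test_size) for validation; stratify keeps the
# class proportions the same in both parts.
X_tr, X_vld, lab_tr, lab_vld = train_test_split(X_train, labels_train, 
                                                stratify = labels_train, random_state = 123)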
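The stratify argument makes train_test_split keep the class proportions the same in the training and validation parts. The sketch below checks this on synthetic labels (100 windows per class, an assumption chosen so the numbers divide evenly) with the default test_size of 0.25.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 100 windows for each of the 6 classes (codes 1-6)
labels = np.repeat(np.arange(1, 7), 100)
X = np.arange(len(labels)).reshape(-1, 1)  # dummy features, one per window

X_tr, X_vld, lab_tr, lab_vld = train_test_split(
    X, labels, stratify=labels, random_state=123)  # default test_size=0.25

print(len(lab_tr), len(lab_vld))   # 450 150
print(np.bincount(lab_vld)[1:])    # [25 25 25 25 25 25]
```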

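One-hot encoding represents each label as a row vector with one column per class: the column for the sample's class is 1 and every other column is 0. For example, <br>
 [0,0,0,0,0,1] <br>
 is the label LAYING, which the label files store as the integer 6 (the 1 sits at column index 5 because columns are counted from 0).
 <br><br>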

### Load the provided helper functions
<br> one_hot() converts integer labels into one-hot encoding, a common step in machine learning classification problems, and get_batches() splits arrays into fixed-size batches. Use one_hot() to convert all of the label sets.

In [23]:
def one_hot(labels, n_class = 6):
	""" 
		Replace integer entries in labels with their onehot equivalents

		parameters:

			 labels  -- a 1D np.array of integer labels

			 n_class -- the total number of classes
	"""
	
	# Make an identity matrix (ones on the diagonal, zeros everywhere else)
	expansion = np.eye(n_class)
 
	# let the ith entry of y be the (label-1)th column of the identity matrix
	y = expansion[labels-1, :]

	return y
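As a quick check of one_hot(), the snippet below repeats its logic (copied so the block runs on its own) and shows the inverse step needed later to turn network outputs back into 1-6 label codes: take the argmax of each row and add 1.

```python
import numpy as np

def one_hot(labels, n_class=6):
    # Same logic as the provided helper: row (label - 1) of an identity matrix
    return np.eye(n_class)[labels - 1, :]

labels = np.array([1, 6, 3])
y = one_hot(labels)
print(y[1])          # [0. 0. 0. 0. 0. 1.]  <- label 6, LAYING

# Inverse mapping: the column holding the 1 is (label - 1), so add 1 back
recovered = np.argmax(y, axis=1) + 1
print(recovered)     # [1 6 3]
```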

In [12]:
def get_batches(X, y, batch_size = 100):
	""" Return a generator for batches """
	n_batches = len(X) // batch_size
	X, y = X[:n_batches*batch_size], y[:n_batches*batch_size]

	# Loop over batches and yield
	for b in range(0, len(X), batch_size):
		yield X[b:b+batch_size], y[b:b+batch_size]
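get_batches() silently drops any samples left over after the last full batch. The usage sketch below (with a copy of the helper so it runs standalone, and synthetic zero arrays) shows that 250 windows with batch_size=100 yield only two batches. The Keras solution further down passes batch_size to model.fit instead, which handles batching itself.

```python
import numpy as np

def get_batches(X, y, batch_size=100):
    """Copy of the provided helper: yields full batches, dropping the remainder."""
    n_batches = len(X) // batch_size
    X, y = X[:n_batches * batch_size], y[:n_batches * batch_size]
    for b in range(0, len(X), batch_size):
        yield X[b:b + batch_size], y[b:b + batch_size]

X = np.zeros((250, 128, 9))   # 250 synthetic windows
y = np.zeros((250, 6))        # matching one-hot labels

batches = list(get_batches(X, y, batch_size=100))
print(len(batches))           # 2  (the last 50 windows are dropped)
print(batches[0][0].shape)    # (100, 128, 9)
```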

In [24]:
# your code here
y_tr = one_hot(lab_tr)
y_vld = one_hot(lab_vld)
y_test = one_hot(labels_test)

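### Define your Keras model <br>
You will want to use a convolutional layer (keras.layers.Conv1D), a pooling layer, and a Dense output layer with one unit per class.

See https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py for an example of creating a Keras model.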
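To make the layer types concrete, here is a plain-NumPy sketch of what a stride-1 Conv1D layer with padding="same", followed by ReLU and global max pooling, computes for a single window. It assumes TensorFlow's "same" convention (for an even kernel size the extra zero goes at the end of the sequence) and Keras' kernel layout (kernel_size, in_channels, filters). Like Keras, it computes a cross-correlation, with no kernel flip. The weights are random, not trained, and the function names are illustrative.

```python
import numpy as np

def conv1d_same(x, kernel, bias):
    """Stride-1 'same' 1D convolution (cross-correlation, no kernel flip).

    x: (n_steps, in_channels); kernel: (k, in_channels, filters); bias: (filters,)
    """
    k = kernel.shape[0]
    pad_left = (k - 1) // 2            # assumed TF 'same' rule: extra zero on the right
    pad_right = (k - 1) - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))   # zero-pad the time axis only
    out = np.empty((x.shape[0], kernel.shape[2]))
    for t in range(x.shape[0]):
        window = xp[t:t + k]                           # (k, in_channels)
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1])) + bias
    return out

def relu(z):
    return np.maximum(z, 0.0)

def global_max_pool_1d(h):
    """Max over the time axis: (n_steps, filters) -> (filters,)."""
    return h.max(axis=0)

# Shapes matching the model in this notebook: 128 steps, 9 channels, 18 filters, k=2
rng = np.random.default_rng(0)
x = rng.normal(size=(128, 9))
kernel = rng.normal(size=(2, 9, 18))
bias = np.zeros(18)

h = relu(conv1d_same(x, kernel, bias))
print(h.shape)                        # (128, 18)
print(global_max_pool_1d(h).shape)    # (18,)

# Tiny hand-checkable case: one channel, one filter with weights [1, 1]
tiny = conv1d_same(np.array([[1.0], [2.0], [3.0]]), np.ones((2, 1, 1)), np.zeros(1))
print(tiny.ravel())                   # [3. 5. 3.]
```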

In [25]:
# fill in with your code below
model = keras.Sequential()
model.add(...)  # add a convolutional layer
model.add(...)  # add a pooling layer
model.add(...)  # add the layer that outputs the class predictions
model.summary()  # view model

In [26]:
# teacher solutions
model = keras.Sequential()
model.add(keras.layers.Conv1D(filters=18, kernel_size=2, strides=1, padding="same",  activation = tf.nn.relu, input_shape=(128, 9)))
model.add(keras.layers.GlobalMaxPooling1D())
model.add(keras.layers.Dense(6, activation=tf.nn.sigmoid))
model.summary()   

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_1 (Conv1D)            (None, 128, 18)           342       
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 18)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 114       
Total params: 456
Trainable params: 456
Non-trainable params: 0
_________________________________________________________________
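The Param # column can be checked by hand. A Conv1D layer has one weight per (kernel position, input channel, filter) plus one bias per filter; a Dense layer has one weight per (input, output) pair plus one bias per output; global max pooling has no parameters. The helper names below are illustrative.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # one weight per (tap, input channel, filter) plus one bias per filter
    return kernel_size * in_channels * filters + filters

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out

conv = conv1d_params(kernel_size=2, in_channels=9, filters=18)   # 2*9*18 + 18
dense = dense_params(n_in=18, n_out=6)                           # 18*6 + 6
print(conv, dense, conv + dense)   # 342 114 456
```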


### Compile your model

In [27]:
lr = 0.0008  # choose a learning rate; this is a good parameter to tune
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
              loss=keras.losses.categorical_crossentropy,
              metrics=['accuracy'])
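The teacher solution ends in a Dense layer with a sigmoid activation, while categorical cross-entropy is defined over a probability distribution across the six mutually exclusive classes. The conventional pairing is a softmax output, whose values always sum to 1; six independent sigmoids need not. The NumPy sketch below (not Keras code; function names are illustrative) compares the two on an all-zero logit vector and computes the cross-entropy for the LAYING label.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def categorical_crossentropy(y_true, p):
    # mean over samples of -sum_k y_k * log(p_k)
    return float(np.mean(-np.sum(y_true * np.log(p), axis=-1)))

logits = np.zeros((1, 6))    # an untrained network with no preference
y_true = np.eye(6)[[5]]      # one-hot for label 6 (LAYING)

print(round(float(softmax(logits).sum()), 6))   # 1.0  -> a valid distribution
print(sigmoid(logits).sum())                    # 3.0  -> six independent 0.5 values
print(categorical_crossentropy(y_true, softmax(logits)))  # ≈ 1.7918 (= log 6)
```

To try the softmax variant in Keras, the last layer would become keras.layers.Dense(6, activation="softmax"). The test loss and accuracy reported at the end of this notebook come from the sigmoid version, so your numbers may differ.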

### Fit your model

In [None]:
history = model.fit(...,  # your training data
                    ...,  # your training labels
                    epochs=...,  # how many epochs?
                    batch_size=...,  # batch size
                    validation_data=(..., ...),  # (your validation data, validation labels)
                    verbose=1
                    )

In [28]:
# teacher solution 
history = model.fit(X_tr,
                    y_tr,
                    epochs=100,
                    batch_size=600,
                    validation_data=(X_vld, y_vld),
                    verbose=1)
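Together, epochs and batch_size determine how many weight updates training performs. Assuming train_test_split's default test_size=0.25 on the 7,352 provided training windows, the arithmetic for the settings above works out as follows.

```python
import math

n_provided_train = 7352                        # windows in the provided training file
n_vld = math.ceil(0.25 * n_provided_train)     # default test_size=0.25 (assumed)
n_tr = n_provided_train - n_vld

batch_size, epochs = 600, 100
steps_per_epoch = math.ceil(n_tr / batch_size)  # the final, partial batch still counts

print(n_tr, n_vld)                # 5514 1838
print(steps_per_epoch)            # 10
print(steps_per_epoch * epochs)   # 1000 weight updates in total
```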


### Code below prints the test loss and accuracy <br>
### After you print your accuracy and loss, experiment with the hyperparameters (number of filters, kernel size, learning rate, number of epochs, batch size) to try to improve the model's performance.

In [29]:
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.45721644163131714
Test accuracy: 0.8394978046417236
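Accuracy alone does not show which activities get confused with each other. Below is a minimal NumPy sketch of a confusion matrix built from one-hot labels and predicted probabilities. In practice probs would be the output of model.predict(X_test) and y_true would be y_test; the small arrays here are synthetic, and the function name is illustrative.

```python
import numpy as np

def confusion_matrix_from_onehot(y_true, probs, n_class=6):
    """Rows: true class, columns: predicted class (0-based indices)."""
    true_idx = np.argmax(y_true, axis=1)
    pred_idx = np.argmax(probs, axis=1)
    cm = np.zeros((n_class, n_class), dtype=int)
    np.add.at(cm, (true_idx, pred_idx), 1)   # count each (true, predicted) pair
    return cm

# Synthetic stand-ins for y_test and model.predict(X_test)
y_true = np.eye(6)[[0, 5, 5, 3]]
probs = np.array([
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],   # predicts class 1, correct
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.95],   # predicts class 6, correct
    [0.01, 0.01, 0.01, 0.01, 0.90, 0.06],   # predicts class 5, wrong (true 6)
    [0.05, 0.05, 0.05, 0.70, 0.10, 0.05],   # predicts class 4, correct
])

cm = confusion_matrix_from_onehot(y_true, probs)
accuracy = np.trace(cm) / cm.sum()
print(accuracy)                        # 0.75
print(np.argmax(probs, axis=1) + 1)    # [1 6 5 4]  (back to the 1-6 label codes)
```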
