**Professor:** Enrique Garcia Ceja
**email:** enrique.gc@tec.mx

### Team members that contributed to this activity (name and id):

- Claudio Gonzalez  
- Jorge Guijarro
- Frado Palacios  
- Gabriel Melendez


# Exercise: Detecting hand gestures from muscle electrical activity with an Ensemble of Neural Networks.

In this exercise you will train your first deep neural network to detect hand gestures from muscle activity. In fact, you will build two deep neural networks and ensemble them.

The data was collected with a MYO armband electromyography (EMG) sensor (see image below). The data was made available by Kirill Yashuk and can be downloaded [here](https://www.kaggle.com/kyr7plus/emg-4). The armband has 8 sensors that measure electrical activity at a sampling rate of 200 Hz.
The dataset contains 4 different gestures: <font color=blue>0-rock, 1-scissors, 2-paper, 3-OK.</font>
The data contains 65 columns. The first 64 columns are electrical measurements: 8 consecutive readings for each of the 8 sensors. The last column is the class label from 0 to 3. The objective is to use the first 64 variables to predict the class.

<table><tr><td><img src="https://github.com/enriquegit/ap-img/blob/main/img/myoband.jpg?raw=true" width="200"></td><td><img src="https://github.com/enriquegit/ap-img/blob/main/img/dnn.png?raw=true" width="200"></td></tr></table>

## Instructions

In this exercise you will train *two* neural networks and combine their results to produce the final predictions. At training time, both networks are trained on the same train data, but each network should have a different architecture. There is little point in building two identical networks because they would produce nearly the same predictions. At test time, a given instance is fed into both networks. Each network outputs the predicted probabilities for each class (by specifying softmax as the activation of the last layer). One way to combine the predictions of both networks is to multiply the output probabilities class by class and predict the class with the highest product. When more than two models compose the ensemble, majority voting can also be used.
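The two combination rules described above can be illustrated with a small NumPy sketch. The probability values and the third model used for voting are made up for illustration; they are not outputs of the networks in this exercise.

```python
import numpy as np

# Hypothetical softmax outputs of two networks for 3 instances and 4 classes.
probs_a = np.array([[0.50, 0.30, 0.10, 0.10],
                    [0.40, 0.35, 0.15, 0.10],
                    [0.10, 0.20, 0.60, 0.10]])
probs_b = np.array([[0.45, 0.35, 0.10, 0.10],
                    [0.10, 0.70, 0.10, 0.10],
                    [0.05, 0.15, 0.70, 0.10]])

# Product rule: multiply the probabilities class by class, then take the argmax.
combined = probs_a * probs_b
ensemble_pred = combined.argmax(axis=1)

print(probs_a.argmax(axis=1))  # [0 0 2]
print(probs_b.argmax(axis=1))  # [0 1 2]
print(ensemble_pred)           # [0 1 2]

# Majority voting needs hard class predictions from three or more models.
votes = np.array([[0, 0, 2],   # model A
                  [0, 1, 2],   # model B
                  [1, 1, 2]])  # hypothetical third model C
majority = np.array([np.bincount(col, minlength=4).argmax() for col in votes.T])
print(majority)                # [0 1 2]
```

For the second instance the two networks disagree, and the product rule follows the more confident one (0.70 for class 1 against 0.40 for class 0).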

1. First, you need to randomly split the dataset into several subsets to avoid overfitting. Specifically, you will need **4 subsets**:

<table><tr><td><img src="https://github.com/enriquegit/ap-img/blob/main/img/splits.png?raw=true" width="250"></td></tr></table>

  - **train set (60%):** This one is used to train the two models.
  - **val1 set (10%):** This one is used to fine tune the parameters of your neural networks.
  - **val2 set (15%):** This one is used to validate the performance when combining the networks.
  - **test set (15%):** This one is used at the end only once to test the generalization performance of your model once you are happy with the performance on the *val2 set*.

2. Build and train a first neural network using the *train set*. Use the *val1 set* to estimate its performance and fine tune its parameters.

3. Build and train a second neural network using the *train set*. Use the *val1 set* to estimate its performance while fine tuning its parameters. This network needs to have a different architecture from the previous one.

4. Once you are happy with both networks, it is now time to combine them. Generate predictions on the *val2 set* with your two networks. The predictions should be the probabilities for each class (not the final class). Combine the probabilities of each network by multiplying them. Obtain the final predicted classes by selecting the class with the highest probability.

5. Evaluate the performance of the combined models and of each of the individual models on the same *val2 set*. The performance of the combined models should be better than the performance of the best individual model. If this is not the case, iterate from step 2.

6. Once you are happy with your results, evaluate the performance of the combined models and each individual one on the *test set* just once.
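The 60/10/15/15 split from step 1 can be sketched by shuffling the row indices once and slicing them. This is only an illustration on synthetic data: the names `X` and `y` and the dataset size are hypothetical, and plain NumPy slicing is used here in place of `train_test_split`, which can produce the same proportions when applied successively.

```python
import numpy as np

rng = np.random.default_rng(123)    # seed chosen arbitrarily for this sketch
n = 1000                            # hypothetical dataset size
X = rng.normal(size=(n, 64))        # stand-in for the 64 EMG features
y = rng.integers(0, 4, size=n)      # stand-in for the 4 gesture classes

idx = rng.permutation(n)            # shuffle the row indices once

# Cumulative boundaries for 60% / 10% / 15%; the remainder is the test set.
b1 = n * 60 // 100
b2 = b1 + n * 10 // 100
b3 = b2 + n * 15 // 100
train_idx, val1_idx, val2_idx, test_idx = np.split(idx, [b1, b2, b3])

X_train, y_train = X[train_idx], y[train_idx]
X_val1, y_val1 = X[val1_idx], y[val1_idx]
X_val2, y_val2 = X[val2_idx], y[val2_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(train_idx), len(val1_idx), len(val2_idx), len(test_idx))
```

Because every index appears in exactly one subset, no instance is shared between the train, validation and test sets.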

### Tips:

- You can pass your validation data to the `fit` function with the `validation_data` argument, e.g., `model.fit(..., validation_data = (val1_features, val1_labels))`.
- You can plot accuracy and loss curves to analyze your models' behavior. See the *demo_sports* notebook for an example.

*NOTE:* In this exercise both models are trained with the same train data. *Bagging* [1], an ensemble learning method, instead generates new train sets with bootstrapping. To construct each new set, the method samples $N$ data points from the original train set, where $N$ is the number of elements in the original train set. The sampling is done *with replacement*. Thus, the new set will contain duplicates and some elements of the original set will not be present. The purpose of this is to obtain different models. If models of the same type were trained on the same data, they would be very similar and there would be little benefit in training several of them. It is left as an exercise for the reader to implement bootstrapping instead of using the same train set.

[1] Breiman, L. (1996). Bagging predictors. Machine learning, 24(2), 123-140.
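As a starting point for that exercise, a bootstrap sample can be drawn by sampling row indices with replacement. The sketch below is a minimal illustration with a hypothetical train set size; it is not part of the required solution.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for the sketch
n = 1000                         # hypothetical number of train instances
train_idx = np.arange(n)

# Draw N indices with replacement from the N original indices.
boot_idx = rng.choice(train_idx, size=n, replace=True)

# Some instances appear more than once, others are left out entirely.
n_unique = np.unique(boot_idx).size
left_out = np.setdiff1d(train_idx, boot_idx)

print(n_unique / n)    # fraction of distinct instances in the bootstrap sample
print(left_out.size)   # instances that were never drawn
```

On average a bootstrap sample contains about 63% of the distinct original instances (the limit is $1 - 1/e$), so each model in a bagged ensemble sees a noticeably different train set. The resulting `boot_idx` array could then be used to index the features and labels, for example `train_normalized[boot_idx]`.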

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Model
from sklearn.metrics import accuracy_score, recall_score
from sklearn import preprocessing
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

In [2]:
# Read the data
dataset = pd.concat(
    map(pd.read_csv, ['0.csv', '1.csv', '2.csv', '3.csv']), ignore_index=True)

seed = 123 #set seed for reproducibility
np.random.seed(seed)
dataset = shuffle(dataset) #shuffle rows

# Since there are not too many missing values we will just drop rows that contain missing values.
dataset = dataset.dropna()

# Print the dataset size after removing rows with missing values.
print(dataset.shape)

(11630, 65)


In [9]:
# Print first rows of data.
dataset.head()


Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V56,V57,V58,V59,V60,V61,V62,V63,V64,label
550,-12,-4,-18.0,-59,-2,28,5,-22,11.0,4,...,31,19.0,1.0,1,-1,-5,9.0,-42,-32,rock
7169,8,13,8.0,1,-22,-5,-2,10,2.0,-9,...,0,1.0,2.0,3,-4,-7,13.0,-1,0,paper
6458,19,8,-1.0,3,6,-18,-1,-11,-4.0,7,...,-2,-15.0,-7.0,5,5,16,26.0,10,-8,paper
9330,-3,-4,1.0,0,1,47,10,1,4.0,-1,...,15,-10.0,-14.0,-9,-8,-7,-13.0,-8,-16,ok
3100,-3,-8,1.0,0,-39,-33,0,14,17.0,2,...,12,-11.0,1.0,4,0,-8,-16.0,1,4,scissors


In [11]:
# Convert features and class to numpy arrays.
features = dataset.drop('label', axis=1)

labels = dataset[['label']]

features = features.values

labels = labels.values

Neural networks need the labels to be one-hot encoded. Currently, our labels are stored as strings. First we need to convert them into integers.

In [12]:
# Convert labels to integers and store the result in labels_int
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

labels_int = le.fit_transform(labels.ravel())

# Display first 5 labels as strings.
print(labels[0:5])

# Display first 5 labels as integers.
print(labels_int[0:5]) # display first labels

[['rock']
 ['paper']
 ['paper']
 ['ok']
 ['scissors']]
[2 1 1 0 3]


In [6]:
# One hot encode the labels and store the result in a variable called 'labels'
# You can use tf.keras.utils.to_categorical() function. Its first argument is an array of ints (e.g., labels_int)
# The second argument is the number of classes.

labels = tf.keras.utils.to_categorical(labels_int, 4)

In [7]:
# Print first five one-hot encoded labels

labels[0:5,]

array([[0., 0., 1., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [1., 0., 0., 0.],
       [0., 0., 0., 1.]])
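To make the encoding concrete, the same transformation can be reproduced with NumPy alone. This sketch uses the five integer labels printed earlier and is only meant to show what one-hot encoding does; `tf.keras.utils.to_categorical` remains the function used in this notebook.

```python
import numpy as np

labels_int_demo = np.array([2, 1, 1, 0, 3])  # the five integer labels shown earlier
num_classes = 4

# Row i of the identity matrix is the one-hot vector of class i.
one_hot = np.eye(num_classes)[labels_int_demo]
print(one_hot)

# argmax along each row recovers the integer label.
recovered = one_hot.argmax(axis=1)
print(recovered)  # [2 1 1 0 3]
```

The same `argmax` step is what later turns predicted probabilities and one-hot ground truth back into integer class labels.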

## Step 1: Split the data.

As explained in step 1 of the instructions section, the *train set* (60%) should consist of approx. 6978 instances. The *val1 set* (10%) should have approx. 1163 instances. The *val2 set* (15%) and the *test set* (15%) should have approx. 1744 instances each.

In [13]:
# Currently, the features are stored in the 'features' variable and the labels in 'labels'.
# First split into 2 subsets: the train set = 60% (6978 instances) and a temporary set that contains the rest of the data (40%).
# We will use the train_test_split() function.
# The train_size argument specifies the number of instances to be included in the train set.

train_features, tmp_features, train_labels, tmp_labels = train_test_split(features, labels,
                                                                            train_size = 6978, random_state=1234)

# Now train_features and train_labels contain the train set.
# tmp_features and tmp_labels contain the rest of the data.

# Let's print the shape of the train set. It should now contain 6978 instances.
print(train_features.shape)
print(train_labels.shape)

(6978, 64)
(6978, 1)


In [None]:
# Now, tmp_features has 40% of the total original data.
# We need to split tmp_features and tmp_labels into the remaining test, val1 and val2 sets.
# We can start by splitting tmp_features into two subsets.
# The first subset corresponds to the 10% of the total (1163 instances) used as the val1 data.
# The second subset is the remaining 30% of the total which will be split later into 15% for test and 15% for val2.

# Use the train_test_split() function to split tmp_features and tmp_labels.
# Store 10% of the TOTAL dataset into val1_features, val1_labels and the remaining into tmp2_features and tmp2_labels.

#### YOUR CODE HERE ####


# Print size of val1 set. It should have 1163 instances.
print(val1_labels.shape)

In [None]:
# Now, tmp2_features, tmp2_labels contains 30% of the TOTAL dataset.
# Split tmp2_features, tmp2_labels into two equally sized datasets.
# test_features, test_labels and val2_features, val2_labels.

#### YOUR CODE HERE ####



# Print test and val2 sizes. Their sizes should be approx. 1744.
print(test_labels.shape)
print(val2_labels.shape)

Normalize the features between 0 and 1. Remember that normalization parameters are learned just from the training data.
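The idea of learning the normalization parameters only from the training data can be sketched by hand with NumPy. The small arrays below are made up for illustration; the computation mirrors what `MinMaxScaler` does with its default range of 0 to 1.

```python
import numpy as np

# Hypothetical train and test features (2 columns).
train = np.array([[0.0, 10.0],
                  [5.0, 20.0],
                  [10.0, 30.0]])
test = np.array([[2.5, 35.0],
                 [-5.0, 15.0]])

# Parameters are computed from the train set only.
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

# The same parameters are applied to every subset.
train_norm = (train - col_min) / col_range
test_norm = (test - col_min) / col_range

print(train_norm)  # values lie in [0, 1]
print(test_norm)   # values may fall slightly outside [0, 1]
```

Here the test set contains values outside the range seen during training, so its normalized values are 1.25 and -0.5. This is expected and is not an error: computing the minimum and maximum from the test set instead would leak information from the test data into the model.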

In [None]:
# Normalize features between 0 and 1.
# Remember that normalization parameters are learned just from the training data.

# Learn parameters from train set.
normalizer = preprocessing.MinMaxScaler().fit(train_features)

# Use the learned normalizer to normalize train_features and store the result in train_normalized.
train_normalized = normalizer.transform(train_features)

# Use the learned normalizer to normalize test_features and store the result in test_normalized.
#### YOUR CODE HERE ####


# Use the learned normalizer to normalize val1_features and store the result in val1_normalized.
#### YOUR CODE HERE ####


# Use the learned normalizer to normalize val2_features set and store the result in val2_normalized.
#### YOUR CODE HERE ####



## Step 2: Build and evaluate model 1.

Now it is time to build model 1 (the first neural network). Use the keras Sequential API to build a deep network (at least 2 hidden layers). Store the model in a variable `model1`.

In [None]:
# Define your neural network's architecture (layers, neurons, etc.).
# The first Dense layer should include the input_shape, i.e., the number of input variables.
# The following layers do not need an input_shape argument.
# Since this is a classification problem, the last layer should have softmax as activation function.
# keras.Sequential() expects a list of layers, e.g., keras.layers.Dense() which is a fully connected layer.
# The number of neurons of a dense layer can be specified with the 'units' argument.
# The activation function is specified with the 'activation' argument.

#### COMPLETE THE CODE ####
model1 = keras.Sequential([

])


# Print the model summary.
print(model1.summary())

### Define optimizer and compile the model

In [None]:
# We can use for example, Stochastic Gradient Descent as the optimizer.

# Set a learning rate for the optimizer.
lr =  #### COMPLETE THE CODE ####

# Instantiate the optimizer with the specified learning rate.
optimizer = tf.keras.optimizers.SGD(lr)

# Compile the model.
# Since this is a classification problem we need "categorical_crossentropy" as the loss function.
model1.compile(optimizer = optimizer, loss = "categorical_crossentropy", metrics = ['accuracy'])

In [None]:
# Train the model with the normalized train set using the fit() function.
# Use the normalized val1 set as validation data. You can do so with the 'validation_data' argument of the fit() function
# that accepts a tuple as argument: model1.fit(..., validation_data = (val1_normalized, val1_labels))
# Remember to set the number of epochs and the batch_size.

# NOTE: After fitting a model its state (the learned weights) is kept. If you change hyperparameters (learning rate, epochs, etc.)
# you should reinstantiate the model, for example by re-running: model1 = keras.Sequential([...]),
# then compile the model again and fit.

#### COMPLETE THE CODE ####
history = model1.fit(

)

In [None]:
# Plot the accuracy and loss curves.
# Based on the validation accuracy and curves, fine tune your model,
# for example by changing the number of epochs, learning rate, network architecture, etc.

import matplotlib.pyplot as plt
%matplotlib inline
# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

**Evaluate the performance on *val1 set*.**

In [None]:
# The evaluate() function can be used to evaluate the model on a specified dataset.
# It returns the loss and the metrics specified in compile(), in this case 'accuracy'.
# The following code prints the final loss and accuracy, which should match the validation values printed for the last epoch.
model1.evaluate(val1_normalized, val1_labels) #[loss, accuracy]

## Step 3: Build and evaluate model 2.

Build another network. This one should have a different architecture from `model1`.
Store the model in a variable `model2`.

In [None]:
#### YOUR CODE HERE ####
model2 = keras.Sequential([

])

# Print model summary.
print(model2.summary())

In [None]:
# Instantiate an optimizer and specify the learning rate.

#### YOUR CODE HERE ####



# Compile the model with the instantiated optimizer.
#### YOUR CODE HERE ####




In [None]:
# Train the model with the normalized train set. Remember to set the number of epochs,
# the validation_data and batch_size.

#### COMPLETE THE CODE ####
history2 = model2.fit(

)

In [None]:
# Plot the accuracy and loss curves.
# Based on the curves, fine tune your model, for example by changing the number of epochs, learning rate, network architecture, etc.

import matplotlib.pyplot as plt
%matplotlib inline
# summarize history for accuracy
plt.plot(history2.history['accuracy'])
plt.plot(history2.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history2.history['loss'])
plt.plot(history2.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

**Evaluate the performance on *val1 set*.**

In [None]:
# Evaluate model2 using the evaluate() function.

#### YOUR CODE HERE ####



## Steps 4 and 5: Combine models and evaluate the performance on *val2 set*

In [None]:
# To combine model1 and model2 we can obtain the actual predictions from each model using the predict() function.
# Its first argument is the features as a NumPy array.
# This function returns a vector of probabilities for each row.
# Each probability represents the likelihood of the corresponding class.

# Make predictions on the val2 set using model 1.
predictions1 = model1.predict(val2_normalized)

# Make predictions on the val2 set using model 2.
predictions2 = model2.predict(val2_normalized)

# Combine the predictions by multiplying the probabilities.
predictionsCombined = predictions1 * predictions2

# Print the combined predictions of the first 5 instances.
predictionsCombined[0:5,]

In [None]:
# Get the column index with max probability to get the predictions in integer format.
predictions_int = np.argmax(predictionsCombined, axis=1)

# Since the ground truth labels are also one-hot encoded we need to
# get the index of the maximum value to obtain the true labels in integer format.
true_values_int = np.argmax(val2_labels, axis=1)

# Convert back to strings
predictions_str = le.inverse_transform(predictions_int)

true_values_str = le.inverse_transform(true_values_int)

# Accuracy
print(accuracy_score(true_values_str, predictions_str))

# Recall
print(recall_score(true_values_str, predictions_str, average='macro'))

### Evaluate performance of model 1 with *val2 set*

In [None]:
# Compute the accuracy and recall just for model1.

#### YOUR CODE HERE ####



### Evaluate performance of model 2 with *val2 set*

In [None]:
# Compute the accuracy and recall just for model2.

#### YOUR CODE HERE ####



Was the performance of the combined models better than that of each individual model?
If yes, proceed to evaluate your models with the *test set*. If not, repeat steps 2 to 5.
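When comparing models it helps to know what the two reported metrics measure. The sketch below computes accuracy and macro-averaged recall by hand on made-up integer labels; it is meant as an illustration of the definitions, with `accuracy_score` and `recall_score(..., average='macro')` remaining the functions used in this notebook.

```python
import numpy as np

# Hypothetical ground truth and predictions for 8 instances and 4 classes.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 3])
y_pred = np.array([0, 0, 0, 0, 1, 0, 2, 0])

# Accuracy: fraction of instances predicted correctly.
accuracy = np.mean(y_true == y_pred)

# Recall of a class: fraction of its instances that were predicted correctly.
# Macro recall: unweighted mean of the per-class recalls.
recalls = []
for c in np.unique(y_true):
    mask = y_true == c
    recalls.append(np.mean(y_pred[mask] == c))
macro_recall = np.mean(recalls)

print(accuracy)      # 0.75
print(macro_recall)  # 0.625
```

In this example the accuracy is dominated by the frequent class 0, while macro recall gives every class the same weight and therefore penalizes the completely missed class 3.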

## Step 6: Evaluate on test set

### Combined models

In [None]:
# Evaluate the accuracy and recall of the combined models on the test set.

#### YOUR CODE HERE ####



### Model 1

In [None]:
# Evaluate the accuracy and recall of model1 on the test set.

#### YOUR CODE HERE ####



### Model 2

In [None]:
# Evaluate the accuracy and recall of model2 on the test set.

#### YOUR CODE HERE ####



**This is the end of the exercise!**