## Binary Classification

An example of binary classification with a multilayer perceptron (MLP), using the UCI sonar dataset.

## 1 - Packages

Run the cell below to import the packages we need.

In [1]:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
import numpy as np
from pandas import read_csv
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt
import os

%matplotlib inline
plt.rcParams['figure.figsize'] = [9, 6]

Using TensorFlow backend.


## 2 - Loading the dataset

The UCI sonar dataset is well studied: previous studies report accuracies in the range of 84% to 88%, which we use as a benchmark for our model.

The output labels are strings, either "M" or "R". We need to encode these as integers (0 and 1) first. Note that this is label encoding, not one-hot encoding: each class maps to a single integer.
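
As a rough illustration of the encoding step, the sketch below reproduces the mapping in plain Python. The toy `labels` list and the helper name `class_to_int` are made up for this example; `LabelEncoder` assigns integers to classes in sorted order, which is why "R" becomes 1.

```python
# Illustrative re-implementation of label encoding for string classes.
# `labels` is a toy list, not the real sonar data.
labels = ["R", "R", "M", "R", "M"]

# Classes are ordered alphabetically, so "M" -> 0 and "R" -> 1.
classes = sorted(set(labels))
class_to_int = {name: index for index, name in enumerate(classes)}

encoded = [class_to_int[name] for name in labels]
print(classes)
print(encoded)
```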

In [5]:
seed = 7
np.random.seed(seed)

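# `__file__` is not defined in a notebook, so the quoted string is resolved
# against the current working directory, and `absdir` ends up being that directory.
absdir = os.path.dirname(os.path.realpath('__file__'))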
datapath = "../data/sonar.csv"

dataframe = read_csv(os.path.join(absdir, datapath), header=None)
dataset = dataframe.values

X = dataset[:, 0:60].astype(float)
print(X[0].shape)
y = dataset[:, 60]
print(y[0])

encoder = LabelEncoder()
encoder.fit(y)
encoded_y = encoder.transform(y)
print(encoded_y[0])

(60,)
R
1


## 3 - Building the model

We can start by building a simple NN with the following structure:

```
Input (60) -> Hidden (60 units) -> Output (1 unit)
```

We use the `ReLU` activation function for the hidden units and the `sigmoid` activation for the output unit, since this is a binary classification problem.

We minimize the `binary_crossentropy` loss with the Adam optimizer. We also track accuracy alongside the loss.
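
To make the forward pass and the loss concrete, here is a minimal NumPy sketch of what a 60 -> 60 -> 1 network computes for one batch. The random weights, the toy batch of four samples, and the `eps` clipping constant are assumptions for illustration only; Keras handles initialization and numerical stability internally.

```python
import numpy as np

def relu(z):
    # Zeroes out negative pre-activations.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1), read as P(class == 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0); `eps` is an assumed value for this sketch.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 60))                 # toy batch: 4 samples, 60 features
W1 = rng.normal(scale=0.1, size=(60, 60))    # hidden layer weights
b1 = np.zeros(60)
W2 = rng.normal(scale=0.1, size=(60, 1))     # output layer weights
b2 = np.zeros(1)

hidden = relu(x @ W1 + b1)
probs = sigmoid(hidden @ W2 + b2).ravel()    # one probability per sample

y_true = np.array([1.0, 0.0, 1.0, 0.0])
loss = binary_crossentropy(y_true, probs)
print(probs.shape, loss)
```

An untrained network like this one outputs probabilities near 0.5, so the loss starts close to ln(2); training pushes it toward 0.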

In [12]:
def base_model():
    model = Sequential()
    model.add(Dense(60, input_dim=60, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["acc"])
    return model

## 4 - Automatic KFold validation

We evaluate the model with stratified 10-fold cross-validation. Because the dataset is small, a single train/test split would give a noisy estimate; averaging over 10 folds gives a more reliable picture of the model's performance.
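
The idea behind stratified splitting can be sketched in a few lines. This is a simplified illustration, not scikit-learn's actual algorithm: the function name `stratified_folds` and the toy labels are hypothetical, and the round-robin dealing only balances perfectly when class sizes divide evenly by the number of folds.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits, seed):
    """Deal the indices of each class round-robin into n_splits folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # shuffle within the class before dealing
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds

# Toy labels: 12 samples of class 0 and 8 of class 1.
labels = [0] * 12 + [1] * 8
folds = stratified_folds(labels, n_splits=4, seed=7)

for test_indices in folds:
    held_out = set(test_indices)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    positives = sum(labels[i] for i in test_indices)
    # Each held-out fold keeps the 12:8 class ratio of the full set.
    print(len(train_indices), len(test_indices), positives)
```

Every sample is held out exactly once, and each fold keeps roughly the same class balance as the full dataset.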

In [16]:
estimator = KerasClassifier(build_fn=base_model, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, encoded_y, cv=kfold)
print(results)
print("Mean Accuracy: {:.2f} Std Dev: {:.2f}".format(results.mean()*100, results.std()*100))

[0.81818183 0.85714287 0.80952382 0.80952382 0.76190477 0.85714286
 0.85714287 0.85       0.70000001 0.95      ]
Mean Accuracy: 82.71 Std Dev: 6.28


## 5 - Data Preparation

The input attributes are on different scales. We can try to improve the model's performance through standardization, which rescales each attribute to have a mean of 0 and a standard deviation of 1.
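
A minimal NumPy sketch of standardization, assuming synthetic data with three features on very different scales (the arrays here are invented for the example). The statistics are computed on the training portion only and then reused for the held-out portion. Placing `StandardScaler` in a `Pipeline` gives the same behavior during cross-validation, since the scaler is refit on each training fold.

```python
import numpy as np

rng = np.random.default_rng(7)
scales = np.array([1.0, 10.0, 100.0])

# Synthetic features whose spread and offset differ by orders of magnitude.
X_train = rng.normal(size=(50, 3)) * scales + scales
X_test = rng.normal(size=(10, 3)) * scales + scales

# Fit: per-feature mean and (population) standard deviation from training data.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# Transform: apply the training statistics to both splits.
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std

print(X_train_std.mean(axis=0).round(6))
print(X_train_std.std(axis=0).round(6))
```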

Standardizing the data also helps speed up training. When the features share a common scale, the contours of the loss surface are closer to circular, so gradient descent can take larger steps with a larger learning rate. Without standardization, the contours are stretched into elongated ellipsoids in the higher-dimensional space, and we need a lower learning rate to reach convergence without overshooting along the steep directions.
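
The effect on the learning rate can be seen on a toy quadratic loss. This sketch assumes a loss of the form f(w) = 0.5 * sum(c_i * w_i^2), where the curvatures `c_i` stand in for feature scales; the function name and the specific numbers are illustrative choices. Plain gradient descent on this loss is only stable when the learning rate is below 2 / max(c_i).

```python
import numpy as np

def steps_to_converge(curvatures, learning_rate, tolerance=1e-6, max_steps=100_000):
    """Run gradient descent on f(w) = 0.5 * sum(c_i * w_i**2) from w = (1, ..., 1)."""
    w = np.ones_like(curvatures)
    for step in range(1, max_steps + 1):
        gradient = curvatures * w
        w = w - learning_rate * gradient
        if np.linalg.norm(w) < tolerance:
            return step
    return max_steps

# Elongated bowl: the steep direction caps the learning rate at 2 / 100 = 0.02.
elongated = np.array([1.0, 100.0])
steps_elongated = steps_to_converge(elongated, learning_rate=0.019)

# Round bowl: equal curvatures allow a much larger stable learning rate.
round_bowl = np.array([1.0, 1.0])
steps_round = steps_to_converge(round_bowl, learning_rate=0.9)

print(steps_elongated, steps_round)
```

The elongated bowl needs far more steps because the learning rate that is safe for the steep direction is tiny for the shallow one.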

In [17]:
estimators = []
estimators.append(("standardize", StandardScaler()))
estimators.append(("mlp", KerasClassifier(build_fn=base_model, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print(results)
print("Mean Accuracy: {:.2f} Std Dev: {:.2f}".format(results.mean()*100, results.std()*100))

[0.81818183 1.         0.76190478 0.90476191 0.85714287 0.85714287
 0.90476191 0.8        0.75000001 0.90000001]
Mean Accuracy: 85.54 Std Dev: 7.22


## 6 - Network architecture

We can experiment with different network architectures to see whether they improve model performance.

First, we try a smaller network with the following topology:
```
Input (60) -> Hidden (30 units) -> Output (1 unit) 
```

In [11]:
def smaller_network():
    model = Sequential()
    model.add(Dense(30, input_dim=60, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["acc"])
    return model

estimators = []
estimators.append(("standardize", StandardScaler()))
estimators.append(("mlp", KerasClassifier(build_fn=smaller_network, epochs=100, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print(results)
print("Smaller network: Mean accuracy: {:.2f} Std Dev: {:.2f}".format(results.mean()*100, results.std()*100))

[0.86363637 0.95238096 0.80952382 0.80952382 0.85714287 0.76190477
 0.90476191 0.75       0.75000001 0.85000001]
Smaller network: Mean accuracy: 83.09 Std Dev: 6.41


Next, we evaluate a larger network with the following topology:

```
Input (60) -> Hidden (60) -> Hidden (30) -> Output (1 unit)
```
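
For a sense of how much capacity each topology has, the sketch below counts the trainable parameters (weights plus biases) of fully connected layers. The helper `dense_parameter_count` is a name made up for this example; the layer sizes are the ones used in this notebook.

```python
def dense_parameter_count(layer_sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each consecutive layer pair."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

topologies = {
    "baseline": [60, 60, 1],
    "smaller": [60, 30, 1],
    "larger": [60, 60, 30, 1],
}

counts = {name: dense_parameter_count(sizes) for name, sizes in topologies.items()}
print(counts)
```

With only a couple of hundred samples in the dataset, all three networks have many more parameters than training examples, which is one reason to rely on cross-validation rather than training accuracy.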



In [12]:
def larger_network():
    model = Sequential()
    model.add(Dense(60, input_dim=60, activation="relu"))
    model.add(Dense(30, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["acc"])
    return model

estimators = []
estimators.append(("standardize", StandardScaler()))
estimators.append(("mlp", KerasClassifier(build_fn=larger_network, epochs=100, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print(results)
print("Larger network: Mean accuracy: {:.2f} Std Dev: {:.2f}".format(results.mean()*100, results.std()*100))

[0.81818183 0.95238096 0.71428572 0.90476191 0.85714287 0.80952382
 0.95238096 0.8        0.80000001 0.90000001]
Larger network: Mean accuracy: 85.09 Std Dev: 7.25
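
To compare the four runs side by side, the sketch below recomputes the summary statistics from the per-fold accuracies printed above. The dictionary keys are labels chosen for this example. Treat the comparison as indicative only: the fold-to-fold spread is larger than the gaps between the means, so these runs alone may not be enough to rank the configurations.

```python
from statistics import mean, pstdev

# Per-fold accuracies copied from the outputs printed in this notebook.
fold_scores = {
    "baseline": [0.81818183, 0.85714287, 0.80952382, 0.80952382, 0.76190477,
                 0.85714286, 0.85714287, 0.85, 0.70000001, 0.95],
    "standardized": [0.81818183, 1.0, 0.76190478, 0.90476191, 0.85714287,
                     0.85714287, 0.90476191, 0.8, 0.75000001, 0.90000001],
    "smaller": [0.86363637, 0.95238096, 0.80952382, 0.80952382, 0.85714287,
                0.76190477, 0.90476191, 0.75, 0.75000001, 0.85000001],
    "larger": [0.81818183, 0.95238096, 0.71428572, 0.90476191, 0.85714287,
               0.80952382, 0.95238096, 0.8, 0.80000001, 0.90000001],
}

# Mean and population standard deviation, both as percentages.
summary = {
    name: (round(mean(scores) * 100, 2), round(pstdev(scores) * 100, 2))
    for name, scores in fold_scores.items()
}

for name, (avg, spread) in summary.items():
    print("{:<13} mean {:.2f}  std {:.2f}".format(name, avg, spread))
```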
