<a href="https://colab.research.google.com/github/Deep-Learning-Challenge/challenge-notebooks/blob/master/1.Multilayer%20Perceptrons/2.Guided%20Projects/2.Binary%20Classification%20Of%20Sonar%20Returns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>

# Binary Classification Of Sonar Returns

In this tutorial, you will discover how to use the Keras library effectively in a machine learning project by working through a binary classification problem step by step. After completing it, you will know:

* How to load training data and make it available to Keras.
* How to design and train a neural network for tabular data.
* How to evaluate the performance of a neural network model in Keras on unseen data.
* How to perform data preparation to improve model skill when using neural networks.
* How to tune the topology and configuration of neural networks in Keras.

Let's get started.

## Sonar Object Classification Dataset

The dataset we will use in this tutorial is the Sonar dataset. This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

It is a well-understood dataset. All of the variables are continuous and generally in the range of 0 to 1. The output variable is a string, M for mine or R for rock, which will need to be converted to the integers 0 and 1. The dataset contains 208 observations.

A benefit of using this dataset is that it is a standard benchmark problem. This means that we have some idea of the expected skill of a good model. A neural network should achieve around 84% accuracy with cross-validation with an upper bound on accuracy for custom models at approximately 88%. You can learn more about this dataset on the [UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)).

## Runtime Setup

In [1]:
import sys

dataset_name = "sonar.csv"
if 'google.colab' in sys.modules:
    DATASET = f"https://github.com/Deep-Learning-Challenge/challenge-notebooks/raw/master/datasets/{dataset_name}"
else:
    DATASET = f"../../datasets/{dataset_name}"

DATASET

'../../datasets/sonar.csv'

## Baseline Neural Network Model Performance

Let's create a baseline model and result for this problem. We will start by importing all of the classes and functions we will need.

In [2]:
import tensorflow as tf

import logging
tf.get_logger().setLevel(logging.ERROR)

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras import utils

from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.pipeline import Pipeline

import numpy
from pandas import read_csv



Next, we can initialize the random number generator to ensure that we always get the same results when executing this code. This will help if we are debugging.

In [3]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
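
As a quick check of what seeding buys us, the sketch below reseeds NumPy's global generator and confirms that the same sequence is drawn again. Note that this seeds NumPy only. TensorFlow keeps its own random state (seeded separately with `tf.random.set_seed`), so runs of the Keras model may still vary slightly. The helper name `draw_sample` is ours, for illustration.

```python
import numpy

def draw_sample(seed, n=5):
    """Seed NumPy's global generator, then draw n uniform values."""
    numpy.random.seed(seed)
    return numpy.random.rand(n)

first = draw_sample(7)
second = draw_sample(7)
other = draw_sample(8)

# Same seed, same sequence; a different seed gives a different sequence.
print(numpy.array_equal(first, second))
print(numpy.array_equal(first, other))
```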

We can then load the dataset using Pandas and split the columns into 60 input variables (X) and one output variable (Y). We use Pandas to load the data because it easily handles strings (the output variable), whereas attempting to load the data directly using NumPy would be more difficult.

In [4]:
# load dataset
dataframe = read_csv(DATASET, header=None)
dataset = dataframe.values

# split into input and output variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

The output variable contains string values. We must convert them into integer values 0 and 1. We can do this using the `LabelEncoder` class from scikit-learn. This class will model the encoding required using the entire dataset via the `fit()` function, then apply the encoding to create a new output variable using the `transform()` function.

In [5]:
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
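
To see which label ends up as which integer, here is a small plain-Python sketch that mimics what `LabelEncoder` does: it sorts the distinct class labels and numbers them from 0. The function name `encode_labels` and the toy label list are illustrative assumptions, not part of the dataset code.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[label] for label in labels]

toy_labels = ["R", "M", "M", "R", "M"]  # made-up sample, not real rows
classes, encoded = encode_labels(toy_labels)

print(classes)  # ['M', 'R']
print(encoded)  # [1, 0, 0, 1, 0]
```

Under this ordering, mines (`M`) become 0 and rocks (`R`) become 1. On the fitted scikit-learn encoder, the same ordering can be read from `encoder.classes_`.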

We are now ready to create our neural network model using Keras. We are going to use scikit-learn to evaluate the model using stratified k-fold cross-validation. This is a resampling technique that will provide an estimate of the performance of the model. To use Keras models with scikit-learn, we must use the `KerasClassifier` wrapper. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to `fit()` such as the number of epochs and the batch size. Let's start by defining the function that creates our baseline model. Our model will have a single, fully connected hidden layer with the same number of neurons as input variables. This is a good default starting point when creating neural networks on a new problem.

The weights are initialized using small Gaussian random numbers. The rectifier (ReLU) activation function is used in the hidden layer. The output layer contains a single neuron to make predictions. It uses the sigmoid activation function to produce a probability output in the range of 0 to 1 that can easily be converted to crisp class values. Finally, we use the logarithmic loss function (`binary_crossentropy`) during training, the preferred loss function for binary classification problems. The model also uses the efficient Adam optimization algorithm for gradient descent, and accuracy metrics will be collected when the model is trained.

In [6]:
# baseline model
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    
    # Compile model
    model.compile(loss='binary_crossentropy', 
                  optimizer='adam', 
                  metrics=['accuracy'])
    return model
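
To make the output layer and the loss more concrete, the following is a minimal plain-Python sketch of the sigmoid function, the binary cross-entropy for a single example, and the conversion of a probability into a crisp class. The helper names and the 0.5 threshold are our own assumptions for illustration; Keras computes the same quantities internally on whole batches.

```python
import math

def sigmoid(z):
    """Squash a real-valued activation into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Log loss for one example; eps keeps log() away from 0."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def to_class(p, threshold=0.5):
    """Convert a probability into a crisp 0/1 class label."""
    return int(p > threshold)

p = sigmoid(2.0)
print(round(p, 4))                          # 0.8808
print(to_class(p))                          # 1
print(round(binary_crossentropy(1, p), 4))  # 0.1269
print(round(binary_crossentropy(0, p), 4))  # 2.1269
```

A confident prediction that matches the true label is penalized lightly, while the same prediction against the opposite label is penalized heavily, which is what drives the weights during training.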

Now it is time to evaluate this model using stratified cross-validation in the scikit-learn framework. We pass the number of training epochs to the `KerasClassifier`, again using reasonable default values. Verbose output is also turned off, given that the model will be created ten times for the 10-fold cross-validation being performed.

In [7]:
# evaluate baseline model on the raw (unstandardized) dataset
estimator = KerasClassifier(build_fn=create_baseline, 
                            epochs=100, 
                            batch_size=5, 
                            verbose=0)

kfold = StratifiedKFold(n_splits=10, 
                        shuffle=True, 
                        random_state=seed)

results = cross_val_score(estimator, X, encoded_Y, cv=kfold, n_jobs=-1)


Running this code produces the following output showing the mean and standard deviation of the model's estimated accuracy on unseen data.

In [8]:
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 79.21% (11.43%)


This is a reasonable score for so little work, although it falls short of the roughly 84% we expect from a good neural network on this dataset.
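
The two numbers printed by the baseline run are the mean and standard deviation of the ten per-fold accuracies. The sketch below illustrates both ideas in plain Python: a simplified way of building stratified folds (each class is dealt round-robin into the folds so that class proportions are preserved) and the summary of per-fold scores. It is not scikit-learn's exact algorithm, it does no shuffling, and the labels and scores are made up.

```python
from collections import Counter
from statistics import mean, pstdev

def stratified_folds(labels, n_splits):
    """Deal the indices of each class round-robin into n_splits folds."""
    folds = [[] for _ in range(n_splits)]
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    position = 0
    for indices in by_class.values():
        for i in indices:
            folds[position % n_splits].append(i)
            position += 1
    return folds

# Toy labels: 12 of class 0 and 8 of class 1 (made-up, not the Sonar data).
labels = [0] * 12 + [1] * 8
folds = stratified_folds(labels, n_splits=4)
for fold in folds:
    print(sorted(Counter(labels[i] for i in fold).items()))

# Hypothetical per-fold accuracies, summarized the same way as `results`.
scores = [0.80, 0.75, 0.85, 0.80]
print("%.2f%% (%.2f%%)" % (mean(scores) * 100, pstdev(scores) * 100))
```

Every toy fold ends up with three examples of class 0 and two of class 1, matching the 60/40 split of the full label list.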

## Improve Performance With Data Preparation

It is good practice to prepare your data before modeling. Neural network models especially benefit from consistent input values, both in scale and distribution. An effective data preparation scheme for tabular data when building neural network models is standardization. This is where the data is rescaled such that the mean value for each attribute is 0, and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions while normalizing the central tendencies for each attribute.

We can use scikit-learn to standardize our Sonar dataset using the `StandardScaler` class. Rather than standardizing the entire dataset at once, it is good practice to fit the standardization on the training data within each pass of a cross-validation run and then use the fitted instance to prepare the unseen test fold. This makes standardization a step in model preparation within the cross-validation process, and it prevents information about the unseen data, such as a more precise estimate of each attribute's distribution, from leaking into the evaluation through the data preparation scheme.
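
The fit-on-train, apply-to-test idea can be sketched for a single feature in plain Python. The helper names and the values are made up for illustration. Like `StandardScaler`, the sketch uses the population standard deviation.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and (population) standard deviation of one feature."""
    return mean(column), pstdev(column)

def apply_scaler(column, mu, sigma):
    """Rescale values using statistics learned elsewhere."""
    return [(x - mu) / sigma for x in column]

# Made-up values for a single feature, split into train and test folds.
train = [0.2, 0.4, 0.6, 0.8]
test = [0.5, 1.0]

mu, sigma = fit_scaler(train)                # statistics come from train only
train_scaled = apply_scaler(train, mu, sigma)
test_scaled = apply_scaler(test, mu, sigma)  # test reuses the train statistics

print(round(mean(train_scaled), 6), round(pstdev(train_scaled), 6))
print([round(x, 4) for x in test_scaled])
```

The scaled training fold has mean 0 and standard deviation 1 by construction, while the test fold generally does not, because its values were never used to compute `mu` and `sigma`.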

We can achieve this in scikit-learn using a `Pipeline` class. The pipeline is a wrapper that executes one or more models within a pass of the cross-validation procedure. Here, we can define a pipeline with the `StandardScaler` followed by our neural network model.

In [10]:
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, 
                                          epochs=100,
                                          batch_size=5, 
                                          verbose=0)))
pipeline = Pipeline(estimators)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold, n_jobs=-1)


Running this example provides the results below. We see a clear lift in mean accuracy, from 79.21% to 84.60%, along with a drop in standard deviation from 11.43% to 6.79%.

In [11]:
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Standardized: 84.60% (6.79%)


## Tuning Layers and Neurons in The Model

There are many things to tune on a neural network, such as the weight initialization, activation functions, and optimization procedure. One aspect that may have an outsized effect is the network's structure, called the network topology. This section looks at two experiments on the network structure: making it smaller and making it larger. These are good experiments to perform when tuning a neural network on your problem.

### Evaluate a Smaller Network

We suspect that there is a great deal of redundancy in the input variables for this problem. The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. We can force a type of feature extraction by the network by restricting the representational space in the first hidden layer.

This experiment takes our baseline model with 60 neurons in the hidden layer and reduces it by half to 30. This will pressure the network during training to pick out the most important structure in the model's input data. We will also standardize the data as in the previous experiment, to take advantage of the lift in performance it provided.

In [12]:
# smaller model
def create_smaller():
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=60, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [13]:
# evaluate smaller model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_smaller, 
                                          epochs=100,
                                          batch_size=5, 
                                          verbose=0)))
pipeline = Pipeline(estimators)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold, n_jobs=-1)

Running this example provides the following result. We can see a slight boost in the mean estimated accuracy, from 84.60% to 86.02%, while the standard deviation (the spread of the model's accuracy scores across folds) is essentially unchanged at 7.17% versus 6.79%. This is a good result because we are doing slightly better with a hidden layer half the size, which should also make the network cheaper to train.

In [14]:
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Smaller: 86.02% (7.17%)


### Evaluate a Larger Network

A neural network topology with more layers offers more opportunity for the network to extract key features and recombine them in useful nonlinear ways. We can easily evaluate whether adding more layers to the network improves the performance by making another small tweak to the function used to create our model. Here, we add one new layer (one line) to the network that introduces another hidden layer with 30 neurons after the first hidden layer. Our network now has the topology:

`60 inputs -> [60 -> 30] -> 1 output`

The idea here is that the network can model all input variables before being bottlenecked and forced to halve the representational capacity, much like we did in the experiment above with the smaller network. Instead of squeezing the representation of the inputs themselves, we have an additional hidden layer to aid in the process.

In [24]:
# larger model
def create_larger():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
    model.add(Dense(30, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
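
To compare the size of the three topologies, the sketch below counts the trainable parameters (weights plus biases) of a stack of fully connected layers. The function name `dense_param_count` is our own, and the calculation assumes every layer has a bias term, as `Dense` does by default.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    layer_sizes lists the number of inputs followed by the width of each layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += (n_in + 1) * n_out  # n_in weights plus one bias per neuron
    return total

print(dense_param_count([60, 60, 1]))      # baseline: 3721
print(dense_param_count([60, 30, 1]))      # smaller:  1861
print(dense_param_count([60, 60, 30, 1]))  # larger:   5521
```

The smaller network has roughly half the parameters of the baseline, and the larger network about one and a half times as many. In Keras, `model.summary()` reports the same totals for a built model.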

In [25]:
# evaluate larger model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_larger, 
                                          epochs=100,
                                          batch_size=5, 
                                          verbose=0)))
pipeline = Pipeline(estimators)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold, n_jobs=-1)


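Running this example produces the results below. We get a small lift over the standardized baseline (85.02% versus 84.60%), although the larger network does not beat the smaller one (86.02%), and all of these differences are well within one standard deviation.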

In [26]:
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Larger: 85.02% (7.90%)


## Improving the Model

With further tuning of aspects such as the optimization algorithm and the number of training epochs, further improvements are likely possible. What is the best score that you can achieve on this dataset?
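
One way to organize such tuning is to enumerate the combinations you want to try before running anything. The sketch below does this in plain Python. The candidate values are hypothetical and were not evaluated in this tutorial.

```python
from itertools import product

# Hypothetical candidate values; none of these were evaluated in this tutorial.
search_space = {
    "optimizer": ["adam", "sgd", "rmsprop"],
    "epochs": [50, 100, 200],
    "batch_size": [5, 10],
}

def candidate_configs(space):
    """Yield one dict per combination of the candidate values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

configs = list(candidate_configs(search_space))
print(len(configs))  # 18
print(configs[0])    # {'optimizer': 'adam', 'epochs': 50, 'batch_size': 5}
```

Each configuration could then be passed to a model-building function and scored with the same `cross_val_score` setup used earlier. scikit-learn's `GridSearchCV` automates this kind of loop.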

## Summary

In this lesson, you discovered how to work through a binary classification problem step by step with Keras. Specifically, you learned:

* How to load and prepare data for use in Keras.
* How to create a baseline neural network model.
* How to evaluate a Keras model using scikit-learn and stratified k-fold cross-validation.
* How data preparation schemes can lift the performance of your models.
* How experiments adjusting the network topology can raise model performance.