In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Neural Networks

- MLP: a Multilayer Perceptron is a supervised learning algorithm built from interconnected nodes (neurons) arranged in layers. Weighted sums are computed layer by layer until they reach the output layer.

- The distinction from linear models is the activation function applied after each weighted sum, which introduces non-linearity into the model. That is what makes it so powerful.
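To make the role of the activation function concrete, here is a minimal NumPy sketch of a forward pass through one hidden layer. The layer sizes, the random weights, and the choice of ReLU are assumptions made for illustration only; this is not how scikit-learn implements `MLPClassifier` internally. The second half checks that, without an activation, two stacked layers collapse into a single linear map.

```python
import numpy as np

def relu(z):
    # Element-wise non-linearity: negative values become 0
    return np.maximum(0.0, z)

def mlp_forward(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum followed by the activation
    hidden = relu(x @ W1 + b1)
    # Output layer: another weighted sum on top of the hidden units
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))      # 4 samples, 3 features (illustrative sizes)
W1 = rng.normal(size=(3, 5))     # 3 inputs -> 5 hidden units
b1 = rng.normal(size=5)
W2 = rng.normal(size=(5, 1))     # 5 hidden units -> 1 output
b2 = rng.normal(size=1)

out = mlp_forward(x, W1, b1, W2, b2)

# Without the activation, the two layers are equivalent to one linear model
stacked_linear = (x @ W1 + b1) @ W2 + b2
collapsed = x @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(stacked_linear, collapsed))
```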

The advantages of Multi-layer Perceptron are:

- Capability to learn non-linear models.

- Capability to learn models in real-time (on-line learning) using partial_fit.
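A rough sketch of on-line learning with `partial_fit`: the data is fed in mini-batches, as if it arrived over time. The batch size of 64 and the up-front scaling of the whole dataset are simplifications assumed here for brevity. The full list of classes has to be passed on the first call, because a single batch may not contain every label.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Scaling everything at once is a shortcut for this sketch only
X = StandardScaler().fit_transform(X)
classes = np.unique(y)

online_mlp = MLPClassifier(random_state=0)
batch_size = 64  # illustrative choice

for start in range(0, len(X), batch_size):
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]
    # Each call performs one update pass on the batch it receives
    online_mlp.partial_fit(X_batch, y_batch, classes=classes)

online_score = online_mlp.score(X, y)
print(f'Accuracy after one pass over the batches: {online_score}')
```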

The disadvantages of Multi-layer Perceptron (MLP) include:

- MLPs with hidden layers have a non-convex loss function with more than one local minimum. Different random weight initializations can therefore lead to different validation accuracy.

- MLP requires tuning a number of hyperparameters such as the number of hidden neurons, layers, and iterations.

- MLP is sensitive to feature scaling.

Source: https://scikit-learn.org/stable/modules/neural_networks_supervised.html#multi-layer-perceptron
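The sensitivity to weight initialization can be checked with a small experiment: fit the same model with several values of `random_state` and compare the test accuracy. The seeds below and the use of a `StandardScaler` pipeline are arbitrary choices for this sketch, and the scores will depend on the scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=66)

seed_scores = {}
for seed in (0, 1, 2):  # arbitrary seeds, chosen only for illustration
    # Only the weight initialization (and batch shuffling) changes per seed
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(random_state=seed, max_iter=1000),
    )
    model.fit(X_tr, y_tr)
    seed_scores[seed] = model.score(X_te, y_te)

for seed, score in seed_scores.items():
    print(f'random_state={seed}: test accuracy {score}')
```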

In [2]:
# Breast cancer dataset example:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

cancer_data = load_breast_cancer()

X_train, X_test, y_train, y_test = train_test_split(
    cancer_data.data, cancer_data.target,
    random_state=66
    )

In [6]:
mlp = MLPClassifier(random_state=42)
mlp.fit(X_train, y_train)

print(f'Accuracy on training set: {mlp.score(X_train, y_train)}')
print(f'Accuracy on test set: {mlp.score(X_test, y_test)}')

Accuracy on training set: 0.9366197183098591
Accuracy on test set: 0.9230769230769231


Because the MLP is sensitive to feature scale, let's standardize the features and fit again:

In [7]:
# compute the mean value per feature on the training set
mean_on_train = X_train.mean(axis=0)
# compute the standard deviation of each feature on the training set
std_on_train = X_train.std(axis=0)
# subtract the mean, and scale by inverse standard deviation
# afterward, mean=0 and std=1
X_train_scaled = (X_train - mean_on_train) / std_on_train
# use THE SAME transformation (using training mean and std) on the test set
X_test_scaled = (X_test - mean_on_train) / std_on_train

mlp.fit(X_train_scaled, y_train)

print(f'Accuracy on training set: {mlp.score(X_train_scaled, y_train)}')
print(f'Accuracy on test set: {mlp.score(X_test_scaled, y_test)}')

Accuracy on training set: 0.9953051643192489
Accuracy on test set: 0.972027972027972
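The manual standardization above can also be written with scikit-learn's `StandardScaler`, which uses the same per-feature mean and (population) standard deviation. The sketch below first checks that the two give matching values on the training set, then wraps the scaler and the classifier in a pipeline so the training statistics are reused on the test set automatically. The variable names are hypothetical, and `max_iter=1000` is an assumption made to give the optimizer room to converge.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer_data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer_data.data, cancer_data.target, random_state=66
)

# StandardScaler learns mean and std from the training data only
scaler = StandardScaler().fit(X_train)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
same_scaling = np.allclose(scaler.transform(X_train), manual)
print(f'Matches manual scaling: {same_scaling}')

# The pipeline applies the training-set scaling before every fit/score call
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(random_state=42, max_iter=1000),
)
scaled_mlp.fit(X_train, y_train)
pipeline_test_accuracy = scaled_mlp.score(X_test, y_test)
print(f'Accuracy on test set: {pipeline_test_accuracy}')
```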




In [8]:
# raise max_iter from its default of 200 to give the optimizer more iterations
mlp = MLPClassifier(random_state=42, max_iter=1000)
mlp.fit(X_train_scaled, y_train)

print(f'Accuracy on training set: {mlp.score(X_train_scaled, y_train)}')
print(f'Accuracy on test set: {mlp.score(X_test_scaled, y_test)}')

Accuracy on training set: 1.0
Accuracy on test set: 0.972027972027972


Training accuracy is now 1.0 while test accuracy stays at about 0.972, which suggests some overfitting even though the model performs well. Let's decrease model complexity by regularizing more strongly.

In [18]:
# alpha is the L2 regularization strength (default 0.0001); larger values shrink the weights more
mlp = MLPClassifier(random_state=42, max_iter=1000, alpha=0.8)
mlp.fit(X_train_scaled, y_train)

print(f'Accuracy on training set: {mlp.score(X_train_scaled, y_train)}')
print(f'Accuracy on test set: {mlp.score(X_test_scaled, y_test)}')

Accuracy on training set: 0.9906103286384976
Accuracy on test set: 0.993006993006993
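Choosing `alpha` by looking at test accuracy risks tuning to the test set. One possible alternative, sketched below, is to compare a few values with cross-validation on the training data only. The candidate values and `cv=3` are arbitrary assumptions for this example, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer_data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer_data.data, cancer_data.target, random_state=66
)

alphas = [0.0001, 0.1, 0.8]  # illustrative candidates
cv_scores = {}
for alpha in alphas:
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(random_state=42, max_iter=1000, alpha=alpha),
    )
    # Mean validation accuracy over 3 folds of the training set
    cv_scores[alpha] = cross_val_score(model, X_train, y_train, cv=3).mean()

for alpha, score in cv_scores.items():
    print(f'alpha={alpha}: mean CV accuracy {score}')
```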
