# Neural Network Classification Problem Fundamentals

Classification problems "classify" an input by assigning it to a class as the output.

**Binary Classification**
* One example of a classification problem is binary classification: deciding whether or not an input belongs to a single class. For instance, is an email spam or not spam?

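**Multi-class Classification**
* Each input is assigned to exactly one of more than two classes. For instance, is a photo of a cat, a dog, or a bird?

**Multi-label Classification**
* Each input can be assigned several labels at once. For instance, an article can be tagged with both "technology" and "business".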
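
As a rough illustration of how the labels differ across the three problem types, the sketch below builds small label arrays with NumPy. The values and the choice of three classes are made up for the example and are not taken from any dataset in this notebook.

```python
import numpy as np

# Binary: one value per sample, 0 or 1 (e.g. not spam / spam).
y_binary = np.array([0, 1, 1, 0])

# Multi-class: exactly one of several classes per sample.
# Shown here as integer class ids and as the equivalent one-hot rows.
y_multiclass = np.array([2, 0, 1, 2])
y_multiclass_one_hot = np.eye(3, dtype=int)[y_multiclass]

# Multi-label: each sample may carry any number of labels at once,
# so each row holds an independent 0/1 flag per label.
y_multilabel = np.array([[1, 0, 1],
                         [0, 0, 0],
                         [1, 1, 1],
                         [0, 1, 0]])

print(y_binary.shape, y_multiclass_one_hot.shape, y_multilabel.shape)
```

Every one-hot row sums to 1, while a multi-label row can sum to anything from 0 to the number of labels.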


## Topics
1. Architecture of a neural network classification model.
2. Input shapes and output shapes of a classification model (features and labels).
3. Creating custom data to view and fit.
4. Steps in modeling.
5. Different classification evaluation methods.
6. Saving and loading models.

In [None]:
import itertools
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_circles
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
import tensorflow as tf

## Helper Functions

In [None]:
def plot_decision_boundary(model, X, y):
    """ Plots the decision boundary created by the model predicting X.
    """
    # Grab the x and y limits of graph for the X values (with margin of 0.1)
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    
    # Creating prediction data
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    
    # Create X value (we're going to make predictions on these)
    x_in = np.c_[xx.ravel(), yy.ravel()]  # flatten both grids and pair them up as (x, y) rows
    
    # Make predictions
    y_pred = model.predict(x_in)
    
    # Check for multi-class
    if len(y_pred[0]) > 1:
        # Multiclass classification
        y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
    else:
        # Binary classification
        y_pred = np.round(y_pred).reshape(xx.shape)
   
    # Plot the decision boundary
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
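
The grid construction inside `plot_decision_boundary` can be easier to follow in isolation. The sketch below repeats the same `np.meshgrid` and `np.c_` calls on a tiny 3 x 2 grid (the grid size and the 0 to 1 range are arbitrary choices for the example) so the shapes can be inspected directly.

```python
import numpy as np

# A tiny 3 x 2 grid instead of the 100 x 100 grid used in the helper.
xx, yy = np.meshgrid(np.linspace(0.0, 1.0, 3),   # 3 x-positions
                     np.linspace(0.0, 1.0, 2))   # 2 y-positions

# Flatten both grids and pair them up column-wise: one (x, y) row per grid point.
x_in = np.c_[xx.ravel(), yy.ravel()]

print(xx.shape)    # grid shape: (number of y-positions, number of x-positions)
print(x_in.shape)  # one row per grid point, two features per row
print(x_in)
```

Because `x_in` has one row per grid point, the model's predictions can be reshaped back to `xx.shape` for `plt.contourf`.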

In [None]:
def plot_confusion_matrix(y_true, y_pred):
    """ Creates the confusion matrix, and plots it """
    figsize = (10, 10)

    # Create the confusion matrix
    cm = confusion_matrix(y_true, tf.round(y_pred))

    cm_norm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]  # Normalize our confusion matrix
    n_classes = cm.shape[0]

    # Prettifying it
    fig, ax = plt.subplots(figsize=figsize)
    # Create a matrix plot
    cax = ax.matshow(cm, cmap=plt.cm.Blues)
    fig.colorbar(cax)

    # Create classes
    classes = False
    if classes:
        labels = classes
    else:
        labels = np.arange(cm.shape[0])

    # Label the axes
    ax.set(title='Confusion Matrix',
          xlabel='Predicted Label',
          ylabel='True Label',
          xticks=np.arange(n_classes),
          yticks=np.arange(n_classes),
          xticklabels=labels,
          yticklabels=labels)

    # Make Labels bigger
    ax.yaxis.label.set_size(20)
    ax.xaxis.label.set_size(20)
    ax.title.set_size(20)

    # Set the threshold
    threshold = (cm.max() + cm.min()) / 2

    # Plot the text on each cell
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f'{cm[i, j]} ({cm_norm[i, j] * 100:.1f}%)',
                 horizontalalignment='center',
                 color='white' if cm[i, j] > threshold else 'black',
                 size=15)
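
The row normalization used in `plot_confusion_matrix` can be checked on its own. The sketch below applies the same expression to a hypothetical 2 x 2 confusion matrix (the counts are invented for illustration) and converts the result to percentages.

```python
import numpy as np

# Hypothetical 2-class confusion matrix: rows are true labels, columns are predictions.
cm = np.array([[50, 10],
               [ 5, 35]])

# Divide each row by its total so every row sums to 1.
cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
cm_percent = cm_norm * 100

print(cm_percent.round(1))
```

Normalizing by row means each cell reads as "of all samples with this true label, what share received this prediction".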

## Example Data to Fit & View

In [None]:
# Make 1000 examples
n_samples = 1000

# Make a dataset with 2 inputs (x and y position of a dot on a graph)
# and 1 output (which circle the dot lies on)
X, y = make_circles(n_samples, noise=0.03, random_state=42)

In [None]:
# Check out the features and the first 10 labels
X, y[:10]  # This is a binary classification problem

### Visualizing the Data

In [None]:
circles = pd.DataFrame({'X0': X[:, 0], 'X1': X[:, 1], 'label': y})
circles

In [None]:
# Plotting the data
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)

In [None]:
# Inspecting the input and output shapes
X.shape, y.shape

In [None]:
# Number of samples
len(X), len(y)

In [None]:
# Look at an example
X[0], y[0]

## Modeling, Compiling, and Fitting the Data

In [None]:
X.dtype, y.dtype

In [None]:
# 1. Create Model
model_1 = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,), name='InputLayer'),
    tf.keras.layers.Dense(100, name='HiddenLayer-1'),
    tf.keras.layers.Dense(1, name='OutputLayer')
])

# 2. Compile Model
model_1.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.legacy.SGD(),
                metrics=['accuracy'])

# 3. Fit Model
model_1.fit(X, y, epochs=100, verbose=0)

In [None]:
# 4. Evaluating Model
model_1.evaluate(X, y)

### Improving Model

The model above reaches an accuracy of about 50%, which is no better than guessing on a balanced binary problem!

**NOTE**: the model above is trained and evaluated on the full dataset; it doesn't split the data into training and test sets.
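
To see why roughly 50% accuracy counts as guessing here, the sketch below scores a "model" that ignores its input and always predicts class 1. The labels are a hypothetical balanced set of 500 zeros and 500 ones, standing in for the circles data.

```python
import numpy as np

# Hypothetical balanced labels: 500 zeros and 500 ones.
y_true = np.array([0] * 500 + [1] * 500)

# A "model" that ignores its input and always predicts class 1.
y_constant = np.ones_like(y_true)

accuracy = (y_constant == y_true).mean()
print(accuracy)
```

Any model worth keeping on balanced binary data needs to beat this baseline clearly.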

#### Recreating Model to Visualize the Predictions & Optimize the Model

In [None]:
# Improving our model
# 1. Create Model
model_2 = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,), name='Input'),
    tf.keras.layers.Dense(100, name='Hidden-1'),
    tf.keras.layers.Dense(10, name='Hidden-2'),
    tf.keras.layers.Dense(1, name='Output')
])

# 2. Compiling Model
model_2.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.legacy.Adam(),
                metrics=['accuracy'])

# 3. Fitting Model
model_2.fit(X, y, epochs=100, verbose=0)

In [None]:
# 4. Evaluate Model
model_2.evaluate(X, y)

In [None]:
# Viewing the decision boundary of model_2
plot_decision_boundary(model_2, X, y)

#### Analyzing plot

Well, there's your problem. Our model is drawing a straight-line decision boundary through data that is arranged in circles. This suggests that we forgot to introduce non-linearity into our model!

**Non-Linearity**
Non-linearity in neural networks is introduced through the activation functions of the hidden layers. Without activation functions, every layer is a linear transformation, and a stack of linear transformations is still linear, so the model can only produce straight-line decision boundaries.
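
A small NumPy sketch can make this concrete. It stacks two dense layers without activations and shows that the result equals a single linear layer. The weights are random and the layer sizes are arbitrary; none of them come from the models in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers with no activation (shapes chosen arbitrarily).
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

x = rng.normal(size=(5, 2))           # 5 samples, 2 features

stacked = (x @ W1 + b1) @ W2 + b2     # layer 1 followed by layer 2

# The same mapping written as a single linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
single = x @ W + b

print(np.allclose(stacked, single))
```

However many such layers are stacked, the combined weights `W` and bias `b` still describe one straight-line boundary, which is why adding more activation-free layers does not help on the circles data.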

### Improving Model by Introducing Non-Linearity (Activation Functions)

In [None]:
# 1. Create the model (using a classification activation function (sigmoid))
model_3 = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(2,), name='Input'),
    tf.keras.layers.Dense(5, activation=tf.keras.activations.relu, name='Hidden-1'),  # ReLU introduces non-linearity
    tf.keras.layers.Dense(5, activation=tf.keras.activations.relu, name='Hidden-2'),  # ReLU introduces non-linearity
    tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid, name='Output')  # Sigmoid outputs a value between 0 and 1, so it suits binary classification
])

# 2. Compiling the model
model_3.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.legacy.Adam(),
                metrics=['accuracy'])

# 3. Fitting the model
history_3 = model_3.fit(X, y, epochs=100, verbose=0)

In [None]:
# 4. Evaluating the model
model_3.evaluate(X, y)

In [None]:
# Viewing the decision boundary of model_3
plot_decision_boundary(model_3, X, y)
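
The two activations used in `model_3` can be written in a few lines of NumPy. This is only a sketch of the math, not how Keras implements them, and the input values are arbitrary.

```python
import numpy as np

def relu(z):
    """Zero out negative values, keep positive values unchanged."""
    return np.maximum(0, z)

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])

probabilities = sigmoid(z)
labels = (probabilities > 0.5).astype(int)   # threshold at 0.5 to get a class

print(relu(z))
print(probabilities.round(3))
print(labels)
```

The sigmoid output is a probability-like score rather than a hard 0 or 1, which is why predictions are rounded before they are compared with the labels.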

### Evaluating & Improving Our Model

The first step is to do this properly and set up training and test datasets.

In [None]:
# Check how many samples we have before splitting with scikit-learn
len(X)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
X_train[:5], y_train[:5], len(X_train), len(X_test)
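
For intuition, an 80/20 split amounts to shuffling the row indices and cutting them once. The helper below, `simple_train_test_split`, is a hypothetical hand-rolled stand-in written for this example; it is not scikit-learn's implementation, and the demo arrays only mimic the shapes of the circles data.

```python
import numpy as np

def simple_train_test_split(X, y, train_size=0.8, seed=42):
    """Shuffle the row indices once, then cut them at the train_size mark."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    cut = int(len(X) * train_size)
    train_idx, test_idx = indices[:cut], indices[cut:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Hypothetical stand-in data with the same shapes as the circles dataset.
X_demo = np.arange(2000, dtype=float).reshape(1000, 2)
y_demo = np.arange(1000) % 2

X_tr, X_te, y_tr, y_te = simple_train_test_split(X_demo, y_demo)
print(len(X_tr), len(X_te))
```

Fixing the seed makes the split repeatable, so results from different runs can be compared on the same test set.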

In [None]:
# 1. Create Model
model_4 = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(2,), name='Input'),
    tf.keras.layers.Dense(5, activation=tf.keras.activations.relu, name='Hidden-1'),
    tf.keras.layers.Dense(5, activation=tf.keras.activations.relu, name='Hidden-2'),
    tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid, name='Output')
])

# 2. Compile Model
model_4.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.01),
                metrics=['accuracy'])

# 3. Fit Model
history_4 = model_4.fit(X_train, y_train, epochs=100)

In [None]:
# Evaluate on the test set (returns the loss and the accuracy, not predictions)
loss_4, accuracy_4 = model_4.evaluate(X_test, y_test)

In [None]:
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title('Train')
plot_decision_boundary(model_4, X_train, y_train)

plt.subplot(1, 2, 2)
plt.title('Test')
plot_decision_boundary(model_4, X_test, y_test)

### Visualizing the Training of the Model

In [None]:
# What is the history data during the fit?
history_4.history

In [None]:
# Convert the training history to a DataFrame
history_4_df = pd.DataFrame(history_4.history)

In [None]:
# Plotting the data
history_4_df.plot()
plt.title('Model 4 Training Curves (Loss and Accuracy)')

## Finding Best Learning Rate
#### Using Loss Curves (See Above) to Determine Best Learning Rate

To find the ideal learning rate (the learning rate where the loss decreases the most during training), we're going to use the following:
1. A learning rate callback: an extra piece of functionality you can add to your model *while* it's training.
2. Another model (we could use the same one as above, but we're practicing building models here).
3. A modified loss curves plot.
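
The schedule passed to the callback in this section is `1e-4 * 10**(epoch/20)`. The sketch below evaluates that same expression in plain Python to show which learning rates the 100 epochs sweep through. The function name `scheduled_learning_rate` is introduced here only for illustration.

```python
def scheduled_learning_rate(epoch):
    """Start at 1e-4 and multiply the learning rate by 10 every 20 epochs."""
    return 1e-4 * 10 ** (epoch / 20)

for epoch in (0, 20, 40, 60, 80, 99):
    print(f"epoch {epoch:3d}: learning rate = {scheduled_learning_rate(epoch):.5f}")
```

Sweeping across several orders of magnitude in one run is what makes it possible to plot loss against learning rate afterwards.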

In [None]:
# Creating a new model to find best learning rate using callback

# 1. Create Model
model_5 = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(5, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(5, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)
])

# 2. Compile Model
model_5.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.legacy.Adam(),
                metrics=['accuracy'])

# 2.1. Creating learning rate callback
learning_rate_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10**(epoch/20))

# 3. Fit the model with the learning rate scheduler
history_5 = model_5.fit(X_train, y_train, epochs=100, callbacks=[learning_rate_scheduler])

In [None]:
history_5_df = pd.DataFrame(history_5.history)
history_5_df.plot()
plt.title('Model 5 (Learning Rate Scheduler)')

In [None]:
# Plot learning rate versus the loss
lrs = 1e-4 * (10 ** (tf.range(100) / 20))
plt.figure(figsize=(10,7))
plt.semilogx(lrs, history_5.history['loss'])
plt.xlabel('Learning Rate')
plt.ylabel('Loss')
plt.title('Learning Rate vs Loss')

#### Finding Ideal Learning Rate from Above Graph
The ideal learning rate usually lies between the point slightly before the loss curve "flattens out" and the lowest point on the curve.

For the above example, the ideal learning rate is somewhere between 0.01 and 0.1.
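
One possible way to read such a range off programmatically is sketched below. It is an assumption-laden heuristic, not the notebook's method: it brackets the range between the point where the loss falls fastest and the lowest point of the curve. The loss values are synthetic, generated by a made-up formula, and are not output from `model_5`.

```python
import numpy as np

# Learning rates visited by the schedule 1e-4 * 10**(epoch / 20) over 100 epochs.
log_lrs = np.arange(100) / 20 - 4
lrs = 10.0 ** log_lrs

# SYNTHETIC loss curve (not real training output): flat, then a sharp drop,
# then rising again once the learning rate gets too large.
drop = 1 / (1 + np.exp(-8 * (log_lrs + 2.0)))
blow_up = 0.5 * np.maximum(0, log_lrs + 1) ** 2
losses = 0.7 - 0.6 * drop + blow_up

lowest_lr = lrs[np.argmin(losses)]                           # lowest point on the curve
steepest_lr = lrs[np.argmin(np.gradient(losses, log_lrs))]   # where the loss falls fastest

print(f"candidate range: {steepest_lr:.4f} to {lowest_lr:.4f}")
```

With a real `history.history['loss']` in place of `losses`, the curve is noisier, so it is worth checking the result against the plot rather than trusting the numbers alone.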

### Try a New Model with a Learning Rate of 0.03 to See if It Improves on model_4

In [None]:
# 1. Create Model
model_6 = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(5, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(5, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)
])

# 2. Compile Model with the Ideal Learning Rate found in Plot Above (0.03)
model_6.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.03),
                metrics=['accuracy'])

# 3. Fit Model
model_6.fit(X_train, y_train, epochs=20)

In [None]:
y_pred_6 = model_6.predict(X_test)

#### Findings

model_4 doesn't reach 99% accuracy until ~13 epochs, whereas model_6 reaches 99% accuracy at ~9 epochs.

In [None]:
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title('Train')
plot_decision_boundary(model_6, X_train, y_train)

plt.subplot(1, 2, 2)
plt.title('Test')
plot_decision_boundary(model_6, X_test, y_test)

## More Classification Evaluation Methods

Alongside visualizing our model results as much as possible, there are a handful of other classification evaluation methods & metrics to be familiar with:
* Accuracy - the most common metric; the fraction of predictions that are correct
* Precision - higher precision means fewer false positives
* Recall - higher recall means fewer false negatives
* F1-score - the harmonic mean of precision and recall
* Confusion Matrix
* scikit-learn Classification Report
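
The first four metrics in the list can all be computed from the four counts of a binary confusion matrix. The sketch below does so in plain Python. The function `classification_metrics` and the counts passed to it are hypothetical, and the function has no guard against division by zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of everything predicted positive, how much was right
    recall = tp / (tp + fn)      # of everything actually positive, how much was found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, chosen only for illustration.
accuracy, precision, recall, f1 = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision and recall pull in different directions, which is why the F1-score is often reported alongside them as a single summary.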

In [None]:
# Check the accuracy of our model
loss, accuracy = model_6.evaluate(X_test, y_test)
loss, accuracy

In [None]:
# Confusion Matrix of our Model
# Note: the model outputs probabilities between 0 and 1, so round them to get binary class predictions.
confusion_matrix(y_test, tf.round(y_pred_6))

In [None]:
# Pretty confusion matrix
plot_confusion_matrix(y_test, y_pred_6)
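
To make the rounding and counting steps explicit, the sketch below builds a binary confusion matrix by hand with NumPy. The helper `binary_confusion_matrix` is a hypothetical stand-in for illustration, not scikit-learn's `confusion_matrix`, and the labels and probabilities are invented.

```python
import numpy as np

def binary_confusion_matrix(y_true, y_prob, threshold=0.5):
    """Count outcomes in a 2 x 2 grid: rows are true labels, columns are predictions."""
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    for true_label, pred_label in zip(y_true, y_pred):
        cm[true_label, pred_label] += 1
    return cm

# Hypothetical labels and predicted probabilities with shape (n, 1),
# like the output of a single sigmoid unit.
y_true_demo = np.array([0, 0, 1, 1, 1, 0])
y_prob_demo = np.array([[0.1], [0.7], [0.9], [0.4], [0.8], [0.2]])

print(binary_confusion_matrix(y_true_demo, y_prob_demo))
```

The diagonal holds the correct predictions; the off-diagonal cells are the false positives (top right) and false negatives (bottom left).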