# Tuning Neural Networks in Keras

<a href="https://colab.research.google.com/github/coding-dojo-data-science/week-11-lecture-2-tuning-deep-learning-models/blob/main/SOLUTIONS%20Code-along%20Tuning%20Neural%20Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We will use the version of Keras that comes in the TensorFlow package, as it has the most up-to-date tools.

Keras works as a wrapper for deep learning models, which can then be used as classification or regression estimators in sklearn.

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from seaborn import heatmap

from sklearn.metrics import recall_score, precision_score, accuracy_score, f1_score, \
classification_report, ConfusionMatrixDisplay
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# new libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Workaround for an issue with setting up tensorflow on Apple M1/M2 machines:
# tensorflow-macos does not seem to work well with the GPU, so hide it and use the CPU only
tf.config.set_visible_devices([], 'GPU')

# Set Random seed for consistency
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

## Useful Functions

In [None]:


def eval_classification(true, pred, name, labels=None):
    """shows classification_report and confusion matrix
    for classification model predictions.  Returns a dataframe of metrics"""

    print(name, '\n')
    print(classification_report(true, pred, target_names=labels))
    ConfusionMatrixDisplay.from_predictions(true, pred, display_labels=labels, )

    plt.show()

    scores = pd.DataFrame()
    scores['Model Name'] = [name]
    scores['Precision'] = [precision_score(true, pred)]
    scores['Recall'] = [recall_score(true, pred)]
    scores['F1 Score'] = [f1_score(true, pred)]
    scores['Accuracy'] = [accuracy_score(true, pred)]
    scores.set_index('Model Name', inplace=True)

    return scores

def eval_nn_classification(class_model, X_train, y_train, X_test, y_test, model_name='', labels = ['No Diabetes', 'Diabetes']):
    """Wrapper for eval_classification, makes it work for neural networks
    Prints classification report and confusion matrix.  Returns dataframe of scores."""
    # Get predictions
    train_pred_proba = class_model.predict(X_train)
    test_pred_proba = class_model.predict(X_test)


    # np.rint() rounds each predicted probability to the nearest whole number (0. or 1.);
    # the result keeps its float dtype
    train_preds = np.rint(train_pred_proba)
    test_preds = np.rint(test_pred_proba)

    ## Evaluate the model
    train_scores = eval_classification(y_train, train_preds,
                                    name=f'{model_name}_train',
                                    labels=labels)
    test_scores = eval_classification(y_test, test_preds,
                                    name=f'{model_name}_test',
                                    labels=labels)
    scores = pd.concat([train_scores, test_scores])
    return scores

def plot_history(history):
    """Takes a keras model learning history and plots each metric
    Returns None"""

    metrics = history.history.keys()

    for metric in metrics:
        # Plot each training metric together with its validation counterpart, if any
        if 'val' not in metric:
            plt.plot(history.history[metric], label=metric)
            if f'val_{metric}' in metrics:
                plt.plot(history.history[f'val_{metric}'], label=f'val_{metric}')
            plt.legend()
            plt.title(metric)
            plt.show()
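
`eval_nn_classification` above turns predicted probabilities into class labels with `np.rint()`. The following is a minimal sketch, on made-up probabilities, of how that compares with an explicit threshold. The `threshold` variable and the sample values are assumptions for illustration, not part of the lecture code.

```python
import numpy as np

# Hypothetical sigmoid outputs shaped (n_samples, 1), like the output of model.predict()
proba = np.array([[0.2], [0.5], [0.7]])

# np.rint rounds to the nearest whole number but keeps the float dtype.
# Exact halves are rounded to the nearest even value, so 0.5 becomes 0.0.
rounded = np.rint(proba)

# An explicit threshold makes the decision rule visible and returns a flat integer array
threshold = 0.5  # assumed cut-off; moving it trades precision against recall
labels = (proba >= threshold).astype(int).ravel()

print(rounded.ravel().tolist())  # [0.0, 0.0, 1.0]
print(labels.tolist())           # [0, 1, 1]
```

The two approaches only disagree for a probability of exactly 0.5, but the threshold version is easier to adjust if recall matters more than precision.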

# Data

We will be using the diabetes dataset from the previous lecture.

**NOTE**

This dataset is very small for deep learning.  Deep learning models usually work best with large datasets of at least 10,000 samples, and they tend to do better still on datasets far larger than that.  For demonstration, though, we will use this smaller dataset.

In [None]:
## Load data
diabetes = pd.read_csv('https://raw.githubusercontent.com/ninja-josh/image-storage/main/diabetes.csv')
diabetes.head()

# Classification

Let's start by modeling the classification dataset.

In [None]:
## Overall look at data
diabetes.info()

In [None]:
## Check for duplicates
diabetes.duplicated().any()

In [None]:
## Look for outliers
diabetes.describe()

In [None]:
## Check class balance
diabetes['Outcome'].value_counts()

In [None]:
## Data Cleaning
no_glucose = diabetes['Glucose'] == 0
no_blood = diabetes['BloodPressure'] == 0
no_skin = diabetes['SkinThickness'] == 0
no_insulin = diabetes['Insulin'] == 0
no_bmi = diabetes['BMI'] == 0

# class_df_clean excludes rows that have a value of 0 in any of the above columns
class_df_clean = diabetes[~(no_glucose |
                                     no_blood |
                                     no_skin |
                                     no_insulin |
                                     no_bmi)]
class_df_clean.describe()

In [None]:
# Define X and y from the cleaned data and complete the train test split
X = class_df_clean.drop(columns = 'Outcome')
y = class_df_clean['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X,y, random_state = 42)

## Scaling

Always scale your data for deep learning.  Otherwise you can get a problem called 'Exploding Weights'.  Some weights will be updated much faster than others because their inputs are on larger scales.  This tends to hurt learning, as the weights for features on smaller scales do not update as fast and those features don't get to contribute as much to the decision making process.  By scaling we put all features on the same footing.

In [None]:
# Scale the data
scaler = MinMaxScaler()

scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
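
As a rough illustration of what the scaling cell above does, here is a minimal sketch of min-max scaling by hand with NumPy. The toy arrays are made-up values, and the variable names are assumptions chosen for this example only.

```python
import numpy as np

# Toy feature matrices (made-up values): column 0 is on a small scale, column 1 on a large one
X_train_toy = np.array([[1.0, 100.0],
                        [2.0, 300.0],
                        [3.0, 500.0]])
X_test_toy = np.array([[4.0, 200.0]])

# "fit": learn the per-column minimum and maximum from the training data only
col_min = X_train_toy.min(axis=0)
col_max = X_train_toy.max(axis=0)

# "transform": apply the same training statistics to both splits
train_scaled = (X_train_toy - col_min) / (col_max - col_min)
test_scaled = (X_test_toy - col_min) / (col_max - col_min)

print(train_scaled.tolist())  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(test_scaled.tolist())   # [[1.5, 0.25]]
```

Both training columns end up in the range 0 to 1. The test row can fall outside that range (1.5 here) because the minimum and maximum come from the training data alone, which is what keeps information about the test set from leaking into the model.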

## First Simple Model

We always want to start simple, as deep learning models can get very complex fast and more complex models take more time to train and are more prone to overfitting.  A well performing simple model is better than a well performing complex model.

## Input layer
The first layer we will define is not technically the input layer.  We will define the first hidden layer with a special argument that tells Keras how to create an input layer:

`input_dim=`

Input layers can also be defined manually using `tensorflow.keras.layers.InputLayer`.
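
A minimal sketch of declaring the input explicitly, assuming 8 input features (in the notebook this would be `X_train.shape[1]`). It uses `tf.keras.Input`, a closely related way to specify the input shape up front. The layer sizes and the name `toy_model` are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

n_features = 8  # assumed feature count for illustration

toy_model = Sequential([
    tf.keras.Input(shape=(n_features,)),  # explicit input specification
    Dense(3, activation='relu'),          # hidden layer: 8*3 weights + 3 biases = 27 parameters
    Dense(1, activation='sigmoid'),       # output layer: 3 weights + 1 bias = 4 parameters
])
toy_model.summary()
```

Declaring the input up front means the model's weights are created immediately, so `summary()` can report parameter counts before any data has been passed through.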

## Activation function

For the hidden layers we will use 10, 5, and 2 nodes, each with a ReLU activation.  ReLUs tend to perform well for hidden nodes.

## Output Layer

For our output layer (last layer) we just use one node because we only want the output of the model to be one number.  We will use a sigmoid activation function.  This squashes the value computed from the weights and bias in the node into a number between 0 and 1.  The output will be a float that can be read as the predicted probability of the positive class.  This will make our model a binary classification model.
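
The model built in this notebook ends in a single node with a sigmoid activation. As a minimal sketch of what that activation does, the snippet below applies the logistic function to a few made-up raw node values. The helper name `sigmoid` and the inputs are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up raw values (weights times inputs plus bias) for three samples
z = np.array([-4.0, 0.0, 4.0])
p = sigmoid(z)

print(np.round(p, 3).tolist())  # [0.018, 0.5, 0.982]
```

Large negative values map close to 0, large positive values map close to 1, and a raw value of 0 sits exactly on the 0.5 boundary.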




In [None]:
# Check the shape of input

X_train.shape[1]

In [None]:
# Set Random Seeds
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

# Build your model
class_model = Sequential()
class_model.add(Dense(10, activation= 'relu', input_dim=X_train.shape[1]))
class_model.add(Dense(5, activation= 'relu'))
class_model.add(Dense(2, activation= 'relu'))

# One output node with 'sigmoid' activation
class_model.add(Dense(1, activation='sigmoid'))
class_model.summary()


## Compiling

Compiling the model puts all the pieces together to make it ready to train.  We need to specify:

* **Optimizer:** An Adam optimizer is a favorite and often performs well, so it's a good place to start.
  - Other optimizers: Gradient Descent, Stochastic Gradient Descent, Adagrad, RMSProp
* **Loss Function:** 'bce' or binary cross-entropy.  This is the number our model will try to reduce in each epoch.  Since this is a binary classification model we want our model to minimize the binary cross-entropy.
* **Metrics:** accuracy, precision, and recall.  We can provide a list of any appropriate metrics we want the model to keep track of at each epoch.
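
The compile cell below passes `loss = 'bce'`, which is binary cross-entropy. As a rough sketch of what that number measures, here is a hand-rolled NumPy version. The function name, the `eps` clipping constant, and the sample labels and probabilities are all assumptions made for this example, not Keras internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps is an assumed small constant that avoids log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(per_sample.mean())

y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # confident and correct
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # confident and incorrect

loss_right = binary_crossentropy(y_true, mostly_right)
loss_wrong = binary_crossentropy(y_true, mostly_wrong)
print(round(loss_right, 3), round(loss_wrong, 3))  # 0.164 1.956
```

Confident wrong predictions are penalized far more heavily than confident right ones are rewarded, which is why the loss can move even when accuracy stays flat.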




In [None]:
from tensorflow.keras.metrics import Precision, Recall

# Compile your model with loss='bce' and metrics=['accuracy', Precision(), Recall()]
precision = Precision(name='precision')
recall = Recall(name='recall')

class_model.compile(optimizer = 'adam', loss = 'bce',
                    metrics = ['accuracy', precision, recall])


# Training

Let's try training our model for 300 epochs.  Sometimes that is enough, and it will give us an idea of whether our model is learning anything.

In [None]:
# fit your model
history = class_model.fit(X_train, y_train,
                        validation_data = (X_test, y_test),
                        epochs = 300, verbose =1)


In [None]:
# Visualize Learning History

plot_history(history)

## Evaluation


In [None]:
# Get predictions
base_scores = eval_nn_classification(class_model, X_train, y_train, X_test, y_test, model_name='base_model')
base_scores

# <center> Attack Bias or Variance? </center>

How should we tune this classification model?
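
One rough way to frame the question is to compare training and validation scores. The helper below is only a sketch of that heuristic. The function name `diagnose_fit`, the `target` and `max_gap` thresholds, and the example scores are arbitrary assumptions for illustration, not results from this notebook.

```python
def diagnose_fit(train_score, val_score, target=0.80, max_gap=0.05):
    """Rough heuristic; `target` and `max_gap` are arbitrary illustrative thresholds."""
    if train_score < target:
        # The model cannot even fit the training data well: underfitting
        return "high bias"
    if train_score - val_score > max_gap:
        # The model fits training data much better than unseen data: overfitting
        return "high variance"
    return "balanced"

# Made-up scores for illustration only
print(diagnose_fit(0.70, 0.69))  # high bias
print(diagnose_fit(0.95, 0.78))  # high variance
print(diagnose_fit(0.84, 0.82))  # balanced
```

High bias usually calls for more capacity or longer training, while high variance usually calls for regularization such as dropout or early stopping.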

# Improving ANN - 2nd Model

In [None]:
# Instantiate your sequential model
class_model2 = Sequential()

# Build Model


# Add output layer with 1 node



# Check summary of network
class_model2.summary()
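
If the base model looks underfit (high bias), one option is to give the network more capacity. The following is a sketch of one possible wider model, not the lecture's solution. The layer sizes, the assumed 8 input features, and the name `sketch_model2` are all illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

n_features = 8  # assumed; use X_train.shape[1] in the notebook

# A wider network than the base model, as one way to attack bias
sketch_model2 = Sequential([
    keras.Input(shape=(n_features,)),
    Dense(32, activation='relu'),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid'),   # one output node for binary classification
])
sketch_model2.compile(optimizer='adam', loss='binary_crossentropy',
                      metrics=['accuracy'])
sketch_model2.summary()
```

More nodes let the model represent more complex decision boundaries, at the cost of a higher risk of overfitting on a dataset this small.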

## Compiling

Compiling the model puts all the pieces together to make it ready to train.  We need to specify:

* **Optimizer:** An Adam optimizer is a favorite and often performs well, so it's a good place to start.
  - Other optimizers: Gradient Descent, Stochastic Gradient Descent, Adagrad, RMSProp
* **Loss Function:** 'bce' or binary cross-entropy.  This is the number our model will try to reduce in each epoch.  Since this is a binary classification model we want our model to minimize the binary cross-entropy.
* **Metrics:** accuracy, precision, and recall.  We can provide a list of any appropriate metrics we want the model to keep track of at each epoch.




In [None]:
# Compile your model.
precision = Precision()
recall = Recall()




# Training

Let's try training our model for 100 epochs.  Sometimes that is enough, and it will give us an idea whether our model is learning anything.

In [None]:
# Fit your model





In [None]:
# Plot Learning History



## Evaluation


In [None]:
## Evaluate model

scores_2 = eval_nn_classification(class_model2, X_train, y_train, X_test, y_test, model_name='model_2')
scores = pd.concat([base_scores, scores_2])
scores

# <center> Attack Bias or Variance? </center>

How should we tune this classification model?

# Improving ANN - 3rd Model

In [None]:
# Instantiate your sequential model
class_model3 = Sequential()

# Build Model



# Add output layer with 1 node



# Check summary of network
class_model3.summary()
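
If the previous model looks overfit (high variance), dropout is one common remedy, and `Dropout` is already imported at the top of this notebook. The following is a sketch of how dropout layers might be placed, not the lecture's solution. The layer sizes, the 0.2 dropout rate, the assumed 8 input features, and the name `sketch_model3` are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

n_features = 8  # assumed; use X_train.shape[1] in the notebook

sketch_model3 = Sequential([
    keras.Input(shape=(n_features,)),
    Dense(32, activation='relu'),
    Dropout(0.2),                     # randomly zeroes 20% of activations during training
    Dense(16, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid'),   # one output node for binary classification
])
sketch_model3.summary()
```

Dropout layers add no trainable parameters and are only active during training, so predictions at evaluation time use the full network.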



## Compiling

Compiling the model puts all the pieces together to make it ready to train.  We need to specify:

* **Optimizer:** An Adam optimizer is a favorite and often performs well, so it's a good place to start.
  - Other optimizers: Gradient Descent, Stochastic Gradient Descent, Adagrad, RMSProp
* **Loss Function:** 'bce' or binary cross-entropy.  This is the number our model will try to reduce in each epoch.  Since this is a binary classification model we want our model to minimize the binary cross-entropy.
* **Metrics:** accuracy, precision, and recall.  We can provide a list of any appropriate metrics we want the model to keep track of at each epoch.




In [None]:
# Compile your model.
precision = Precision()
recall = Recall()


# Training

Let's try training our model for 100 epochs.  Sometimes that is enough, and it will give us an idea whether our model is learning anything.

In [None]:
# Fit your model


In [None]:
# Apply the custom function to see how your model is doing


## Evaluation


In [None]:
## Evaluate model

scores_3 = eval_nn_classification(class_model3, X_train, y_train, X_test, y_test, model_name='model_3')
scores = pd.concat([scores, scores_3])
scores

# <center> Attack Bias or Variance? </center>

How should we tune this classification model?

# Improving ANN - 4th Model

In [None]:
# Instantiate your sequential model
class_model4 = Sequential()

## Build Model

# Add output layer with 1 node



# Check summary of network
class_model4.summary()



## Compiling

Compiling the model puts all the pieces together to make it ready to train.  We need to specify:

* **Optimizer:** An Adam optimizer is a favorite and often performs well, so it's a good place to start.
  - Other optimizers: Gradient Descent, Stochastic Gradient Descent, Adagrad, RMSProp
* **Loss Function:** 'bce' or binary cross-entropy.  This is the number our model will try to reduce in each epoch.  Since this is a binary classification model we want our model to minimize the binary cross-entropy.
* **Metrics:** accuracy, precision, and recall.  We can provide a list of any appropriate metrics we want the model to keep track of at each epoch.




In [None]:
# Compile your model.
precision = Precision(name='precision')
recall = Recall(name='recall')

class_model4.compile(optimizer= 'adam', loss = 'bce', metrics= ['accuracy', precision, recall])

# Training

Let's try training our model for 100 epochs and add EarlyStopping.

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

In [None]:
# Fit your model

#Early Stopping?
early_stop = EarlyStopping(patience=3)
# Fit Model
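
# The sketch below shows one way an EarlyStopping callback can be passed to fit().
# It is an illustration on random stand-in data, not the lecture's solution:
# the data, the small model, and every `sketch_` name are assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

# Random stand-in data (for illustration only), with 8 assumed features
rng = np.random.default_rng(42)
X_tr = rng.random((200, 8)).astype('float32')
y_tr = rng.integers(0, 2, 200).astype('float32')
X_val = rng.random((50, 8)).astype('float32')
y_val = rng.integers(0, 2, 50).astype('float32')

sketch_model = Sequential([
    tf.keras.Input(shape=(8,)),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid'),
])
sketch_model.compile(optimizer='adam', loss='binary_crossentropy',
                     metrics=['accuracy'])

# Stop once val_loss has failed to improve for 3 epochs in a row,
# then roll the weights back to the best epoch seen
sketch_stop = EarlyStopping(monitor='val_loss', patience=3,
                            restore_best_weights=True)

sketch_history = sketch_model.fit(X_tr, y_tr,
                                  validation_data=(X_val, y_val),
                                  epochs=100, callbacks=[sketch_stop],
                                  verbose=0)

# Number of epochs actually run; at most the 100 requested
print(len(sketch_history.history['loss']))
```

# `epochs` acts as an upper bound here: training ends earlier if the callback fires.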


In [None]:
# Apply the custom function to see how your model is doing
plot_history(history)

## Evaluation


In [None]:
## Evaluate model

scores_4 = eval_nn_classification(class_model4, X_train, y_train, X_test, y_test, model_name='model_4')
scores = pd.concat([scores, scores_4])
scores

## Choose a model

Which model performed best?