# Introduction to Neural Networks in Keras

<a href="https://colab.research.google.com/github/coding-dojo-data-science/week-11-lecture-2-tuning-deep-learning-models/blob/main/Code-along%20Tuning%20Neural%20Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We will use the version of Keras that comes with the TensorFlow package, as it has the most up-to-date tools.

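Keras is a high-level API for building and training deep learning models. With a wrapper such as the third-party SciKeras package, Keras models can also be used as classification or regression estimators in scikit-learn.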

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from seaborn import heatmap

from sklearn.metrics import mean_absolute_error, r2_score, mean_squared_error, \
precision_score, recall_score, accuracy_score, f1_score, ConfusionMatrixDisplay, \
classification_report
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# new libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Set random seeds for consistent outcomes
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

### Plot History

Since we will be plotting histories for all of our models, let's create a function to do it quickly.

In [1]:
def plot_history(history):
    """Takes a keras model learning history and plots each metric"""

    metrics = history.history.keys()

    for metric in metrics:
        # Plot each training metric, paired with its validation metric if present
        if not metric.startswith('val_'):
            plt.plot(history.history[metric], label=metric)
            if f'val_{metric}' in metrics:
                plt.plot(history.history[f'val_{metric}'], label=f'val_{metric}')
            plt.legend()
            plt.title(metric)
            plt.show()

def eval_regression(true, pred, name='Model'):
    """Evaluates true and predicted values from a regression model.  
    Outputs a dataframe of metrics"""
    scores = pd.DataFrame()
    scores['Model Name'] = [name]
    scores['RMSE'] = [np.sqrt(mean_squared_error(true, pred))]
    scores['MAE'] = [mean_absolute_error(true, pred)]
    scores['R2'] = [r2_score(true, pred)]
    scores.set_index('Model Name', inplace=True)

    return scores

# Data

We will be working with two different datasets in this project: one is a regression dataset and the other is a classification dataset. This way you can practice both using deep learning.

**NOTE**

These datasets are very small for deep learning. Deep learning models usually work best with large datasets of at least 10,000 samples, and they tend to do even better with more. For demonstration purposes, however, we will use these smaller datasets.

## Regression
This is a dataset of housing prices in Boston from 1978.  Each row is a house and the dataset includes several features regarding each house.  Our target today will be the price of the home.



In [None]:
regression_df = pd.read_csv('https://raw.githubusercontent.com/ninja-josh/image-storage/main/Boston_Housing_from_Sklearn.csv')

# Regression

Let's start by modeling the regression dataset.

In [None]:
regression_df.head()

In [None]:
regression_df.info()

In [None]:
regression_df.duplicated().any()

In [None]:
regression_df.describe()

In [None]:
# Define X and y and complete the train test split
X = regression_df.drop(columns = 'PRICE')
y = regression_df['PRICE']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42)

## Scaling

Always scale your data for deep learning. When features are on very different scales, the gradients for their weights are on very different scales too, so some weights are updated much faster than others and training can become unstable (a problem sometimes called 'exploding weights'). Features on smaller scales update more slowly and don't get to contribute as much to the decision-making process. Scaling puts all features on the same footing.
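A toy illustration of that effect, using made-up data, NumPy only, and a hand-written gradient rather than Keras: for a linear model trained with mean squared error, the gradient for each weight is proportional to that feature's values, so a feature on a 0-1000 scale gets a gradient roughly 1000 times larger than a feature on a 0-1 scale. After min-max scaling (what `MinMaxScaler` does), the two gradients are of similar size.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two made-up features: one on a 0-1 scale, one on a 0-1000 scale
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
# A target that both features contribute to by comparable amounts
y = 2.0 * X[:, 0] + 0.003 * X[:, 1]


def mse_gradient(X, y, w):
    """Gradient of the mean squared error of a linear model y_hat = X @ w."""
    residual = X @ w - y
    return 2 * X.T @ residual / len(y)


w = np.zeros(X.shape[1])  # start from all-zero weights

g_unscaled = mse_gradient(X, y, w)

# Min-max scaling: put every feature in the range [0, 1]
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
g_scaled = mse_gradient(X_scaled, y, w)

print('Gradient, unscaled:', g_unscaled)
print('Gradient, scaled:  ', g_scaled)
```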

In [None]:
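# Scale the data (fit the scaler on the training set only, then apply it to both sets)
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)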

## First Simple Model

We always want to start simple: deep learning models can get very complex fast, and more complex models take more time to train and are more prone to overfitting. A well-performing simple model is better than an equally well-performing complex model.

## Input layer
The first layer we will define is not technically the input layer. We will define the first hidden layer with a special argument that tells Keras how to create an input layer:

`input_dim=`

Input layers can also be defined manually using `tensorflow.keras.layers.InputLayer`.
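A sketch of two equivalent ways to give a `Sequential` model its input shape, assuming 13 input features (a hypothetical value; in the notebook you would use `X_train.shape[1]`). The second option declares the input explicitly with `keras.Input`, which a `Sequential` model accepts as its first entry. Recent Keras versions may print a warning recommending that explicit form over `input_dim`.

```python
from tensorflow import keras

n_features = 13  # hypothetical; in the notebook this would be X_train.shape[1]

# Option 1: give the first hidden layer input_dim, and Keras builds the input layer
model_a = keras.Sequential()
model_a.add(keras.layers.Dense(3, input_dim=n_features, activation='relu'))
model_a.add(keras.layers.Dense(1, activation='linear'))

# Option 2: declare the input explicitly with keras.Input
model_b = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(3, activation='relu'),
    keras.layers.Dense(1, activation='linear'),
])

# Both have 13*3 + 3 = 42 hidden-layer parameters and 3*1 + 1 = 4 output parameters
print(model_a.count_params(), model_b.count_params())
```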

## Activation function

For the single hidden layer we will try just 3 nodes with a ReLU (rectified linear unit) activation, which outputs max(0, x). ReLUs tend to perform well in hidden layers.

## Output Layer

For our output layer (the last layer) we use just one node, because we only want the model to output one number. We will use a linear activation function, which simply outputs the node's weighted sum plus bias with no change. The output will be a continuous number, a float. This makes our model a regression model.
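To make these layer descriptions concrete, here is a NumPy sketch of what a hidden `Dense` layer with ReLU followed by a one-node linear output layer computes for a single sample. The weights are made up (a trained Keras model learns its own), and the network has 2 inputs rather than the dataset's feature count to keep the arithmetic readable: the hidden layer computes `[3, 0.5, -0.5]` before ReLU and `[3, 0.5, 0]` after, so the output is `3*1 + 0.5*2 + 0*3 + 0.1 = 4.1`.

```python
import numpy as np


def relu(z):
    """ReLU activation: keep positive values, replace negatives with 0."""
    return np.maximum(0, z)


def forward(x, W1, b1, W2, b2):
    """Forward pass of a network with one hidden layer and one linear output."""
    hidden = relu(x @ W1 + b1)  # hidden layer: weighted sum + bias, then ReLU
    return hidden @ W2 + b2     # output layer: weighted sum + bias, left unchanged


# Made-up weights for 2 input features, 3 hidden nodes, and 1 output node
W1 = np.array([[1.0, -1.0, 0.5],
               [2.0, 1.0, -1.0]])
b1 = np.array([0.0, 0.5, 0.0])
W2 = np.array([[1.0], [2.0], [3.0]])
b2 = np.array([0.1])

x = np.array([[1.0, 1.0]])  # one sample with 2 features
print(forward(x, W1, b1, W2, b2))
```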




# Note:
### The first layer you define will NOT be the input layer!  Keras will create an input layer on its own, implicitly.

In [None]:
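# Set Random Seeds
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

# Instantiate your sequential model
reg_model = Sequential()

# Add first hidden layer with 3 neurons THIS IS NOT THE INPUT LAYER!
# Tell Keras how to construct the input layer shape using input_dim
reg_model.add(Dense(3, input_dim=X_train.shape[1], activation='relu'))

# Add output layer with 1 node
reg_model.add(Dense(1, activation='linear'))

# Check summary of network 
reg_model.summary()

# Compile your model.
reg_model.compile(optimizer='adam', loss='mse', metrics=['mae'])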


## Compiling

Compiling the model puts all the pieces together to make it ready to train.  

For this step, we need to specify a few other hyperparameters:

* **Optimizer:** Adam is a popular choice that often performs well, so it's a good place to start.
  - Other optimizers: Gradient Descent, Stochastic Gradient Descent, Adagrad, RMSprop
* **Loss Function:** 'mse', or mean squared error. This is the number our model will try to reduce in each epoch. Since this is a regression model, we want it to minimize the mean squared error. A loss function ALWAYS needs to measure the total error, something the model can REDUCE. R² won't work, because for R² higher is better. We don't want the model to reduce R²!
* **Metrics:** 'mae', or mean absolute error. We can provide a list of any appropriate metrics we want the model to keep track of at each epoch.
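Passing a string such as `'adam'` uses the optimizer's default settings (for Adam, a learning rate of 0.001). A sketch of passing optimizer objects instead, which lets you set hyperparameters such as the learning rate and swap in the alternatives listed above; Keras's `SGD` class covers gradient descent, with the batch size deciding how "stochastic" it is. `build_regressor` is a hypothetical helper, and the architecture and learning rates are illustrative choices.

```python
from tensorflow import keras


def build_regressor(optimizer, n_features=13):
    """Hypothetical helper: build and compile the small regression network.

    n_features=13 is an assumed default; in the notebook use X_train.shape[1].
    """
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(3, activation='relu'),
        keras.layers.Dense(1, activation='linear'),
    ])
    # loss is what training minimizes; metrics are only tracked and reported
    model.compile(optimizer=optimizer, loss='mse', metrics=['mae'])
    return model


# One optimizer instance per model; the learning rates are illustrative
optimizers = {
    'adam': keras.optimizers.Adam(learning_rate=0.001),
    'sgd': keras.optimizers.SGD(learning_rate=0.01),
    'adagrad': keras.optimizers.Adagrad(learning_rate=0.01),
    'rmsprop': keras.optimizers.RMSprop(learning_rate=0.001),
}
models = {name: build_regressor(opt) for name, opt in optimizers.items()}
```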




# Training (AKA fitting)

Let's try training our model for 100 epochs. Sometimes that is enough, and it will give us an idea of whether our model is learning anything.
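`fit()` returns a Keras `History` object, and its `.history` dict (one list per metric, one value per epoch, with a `val_` prefix for validation metrics) is what `plot_history()` loops over. A small sketch on made-up data with a short 5-epoch run; the `demo_` names are hypothetical and chosen so they don't overwrite the notebook's variables.

```python
import numpy as np
from tensorflow import keras

# Small synthetic regression problem (made-up data, for illustration only)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(200, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 3.0, 0.5])

demo_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3, activation='relu'),
    keras.layers.Dense(1, activation='linear'),
])
demo_model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# validation_split holds out the last 20% of the training rows for validation
demo_history = demo_model.fit(X_demo, y_demo, epochs=5, batch_size=32,
                              validation_split=0.2, verbose=0)

# One list per metric, one value per epoch
for key, values in demo_history.history.items():
    print(key, len(values))
```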

In [None]:
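# Fit your model, holding out 20% of the training data for validation
history = reg_model.fit(X_train, y_train,
                        validation_split=0.2,
                        epochs=100,
                        verbose=0)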

In [None]:
# Apply the custom function plot_history() to see how your model is doing
plot_history(history)

## Evaluation


In [None]:
# Make predictions and evaluate your model
train_preds = reg_model.predict(X_train)
test_preds = reg_model.predict(X_test)

train_scores = eval_regression(y_train, train_preds, name='base_reg_train')
test_scores = eval_regression(y_test, test_preds, name='base_reg_test')

reg_scores = pd.concat([train_scores, test_scores])
reg_scores

# <center> Temperature Check: </center>
## On a scale of 0 - 5, how confident do you feel in coding neural networks?

0. What is a neural net?
1. I know what a neural net is, but I don't know how to even start coding one.
2. I kinda get how the code flows, but need help from someone else to create my own.
3. I understand the general idea, and could code a neural net in Keras if I had an example in front of me.
4. I feel confident in coding a neural network with some reference materials.
5. Move over, Josh.  I can finish this code-along.


# 🦾 Your Turn: Classification Models in Keras

Classification models are similar, except that we need to adjust:
* The final activation of the output layer, and
* The loss function and metrics in the compile step.

We will also need to do some processing of the predictions after training to make them integers instead of floats.

### Remember: 
MAE, MSE, RMSE, and R2 are regression metrics,

accuracy, recall, precision, and F1-Score are classification metrics.

## Classification Dataset
The classification dataset describes diabetes rates among Pima Indians. Each row is a person, and this dataset includes features regarding health-related measurements. The target is binary and represents whether or not a person will be diagnosed with diabetes. This is another old dataset, first presented in 1988.



In [None]:
classification_df = pd.read_csv('https://raw.githubusercontent.com/ninja-josh/image-storage/main/diabetes.csv')
classification_df.head()

In [None]:
classification_df.info()

In [None]:
classification_df.duplicated().any()

In [None]:
classification_df.describe()

We see minimums of 0 for Glucose, BloodPressure, SkinThickness, Insulin, and BMI. Those values are impossible for living humans, so let's drop those rows.

In [None]:
no_glucose = classification_df['Glucose'] == 0
no_blood = classification_df['BloodPressure'] == 0
no_skin = classification_df['SkinThickness'] == 0
no_insulin = classification_df['Insulin'] == 0
no_bmi = classification_df['BMI'] == 0

# class_df_clean keeps only the rows with no 0 values in any of the columns above
class_df_clean = classification_df[~(no_glucose |
                                     no_blood |
                                     no_skin |
                                     no_insulin |
                                     no_bmi)]
class_df_clean.describe()

We lost a lot of data, going from 768 samples to 392 samples.  In the future we might impute this data using means, medians, or other imputation strategies.  For this exercise we won't focus on that.
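One possible imputation approach, sketched on a made-up four-row frame: treat the impossible zeros as missing, then fill each one with its column's median using scikit-learn's `SimpleImputer`. On the real data you would apply the zero replacement only to the five columns checked above (the `Outcome` target, for example, is legitimately 0), and fit the imputer on the training split only to avoid leaking test-set information.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up rows where impossible zeros stand in for missing measurements
df = pd.DataFrame({'Glucose': [148, 85, 0, 89],
                   'BMI': [33.6, 0.0, 23.3, 28.1]})

# Step 1: mark the impossible zeros as missing
df_nan = df.replace(0, np.nan)

# Step 2: fill each missing value with its column's median
imputer = SimpleImputer(strategy='median')
df_imputed = pd.DataFrame(imputer.fit_transform(df_nan), columns=df.columns)
print(df_imputed)
```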

In [None]:
# Define X and y and train test split
X = class_df_clean.drop(columns = 'Outcome')
y = class_df_clean['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42, stratify = y)

In [None]:
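# Scale the data (fit the scaler on the training set only, then apply it to both sets)
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)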


## Build the Classification Model

We need to do a few things differently here because this is a binary classification problem:

1. The activation of our final layer needs to be 'sigmoid'.  


(If this were multiclass classification, we would set the final activation as 'softmax' and the number of output nodes would be the number of classes in our y_train.)
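A NumPy sketch of the two output activations: sigmoid turns one score into a probability of class 1, while softmax turns one score per class into probabilities that sum to 1. The input scores are made up; Keras's `'sigmoid'` and `'softmax'` activations apply these same formulas.

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into (0, 1): used for a single binary output node."""
    return 1 / (1 + np.exp(-z))


def softmax(z):
    """Turn a vector of class scores into probabilities that sum to 1."""
    exp_z = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exp_z / exp_z.sum()


print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # one probability of class 1 per score
print(softmax(np.array([2.0, 1.0, 0.1])))   # one probability per class for one sample
```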

In [None]:
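# Set Random Seeds
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

# Build your model (a single hidden layer, mirroring the regression model)
model = Sequential()
model.add(Dense(3, input_dim=X_train.shape[1], activation='relu'))

# One output node with 'sigmoid' activation
model.add(Dense(1, activation='sigmoid'))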


## More Changes for Classification:

1.  We need to change our loss to 'binary_crossentropy', or 'bce'. If this were multiclass, we would use 'categorical_crossentropy' (with one-hot encoded targets) or 'sparse_categorical_crossentropy' (with integer class labels).

2. Our metrics should be classification metrics. We will track accuracy and import Keras's Precision and Recall metrics.
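Binary cross-entropy, BCE = -(1/n) Σ [y log(p) + (1 - y) log(1 - p)], penalizes confident wrong probabilities far more than confident right ones. A NumPy sketch of that formula on made-up labels and probabilities, checked against scikit-learn's `log_loss`, which computes the same quantity; Keras's `'binary_crossentropy'` loss uses the same formula, averaged over each batch.

```python
import numpy as np
from sklearn.metrics import log_loss


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))


y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]
confident_wrong = [0.1, 0.9, 0.2, 0.8]
print(binary_crossentropy(y_true, confident_right))  # small loss
print(binary_crossentropy(y_true, confident_wrong))  # large loss
print(log_loss(y_true, confident_right))             # scikit-learn's equivalent
```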

In [None]:
from tensorflow.keras.metrics import Precision, Recall

# Compile your model with loss='bce', set metrics = ['acc', Precision(), Recall()]
model.compile(optimizer='adam', loss='bce', metrics=['acc', Precision(), Recall()])


In [None]:
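# fit your model, holding out 20% of the training data for validation
history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=100, verbose=0)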


In [None]:
# See how your model is doing
plot_history(history)


## Evaluation

Keras models always output floats, not integers. In this case the final sigmoid activation function will return a number between 0 and 1. If the number is closer to 1, the model predicts the sample is more likely to be class 1. If it is closer to 0, the sample is predicted to be more likely to be class 0.

This is similar to the output of .predict_proba() with Scikit-Learn models.

### Converting Floats to Ints

In order to use Scikit-Learn metrics functions, the float outputs of the model need to be converted to ints.  We don't want to just use `int(pred)` or `pred.astype(int)` because that will just drop the decimal and all our predictions would be 0s.  

Instead we want to **round** the predictions to the nearest integer. To round all of the numbers in an array we can use the NumPy function, `np.rint()` which is short for 'round to integer'.  
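A small comparison on made-up probabilities in the `(n, 1)` shape that `model.predict()` returns. Note that `np.rint()` rounds exact halves to the nearest even number, so `0.5` becomes `0.0`; an explicit threshold such as `probs >= 0.5` is an alternative that also lets you choose a cutoff other than 0.5.

```python
import numpy as np

# Made-up float probabilities, shaped like the output of model.predict()
probs = np.array([[0.2], [0.7], [0.5], [0.51], [0.99]])

print(probs.astype(int).ravel())           # truncates: every value below 1 becomes 0
print(np.rint(probs).ravel())              # nearest integer, half to even, float dtype
print((probs >= 0.5).astype(int).ravel())  # explicit 0.5 threshold, integer dtype
```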

In [None]:
model.predict(X_train)[:5]

In [None]:
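# Get predictions
train_preds = model.predict(X_train)
test_preds = model.predict(X_test)

# round predictions to integers instead of floats using np.rint()
train_preds = np.rint(train_preds)
test_preds = np.rint(test_preds)

# the following code should show whole number predictions, 1.0 or 0.0
print(test_preds[:5])
print(train_preds[:5])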

In [None]:
# Define labels for the confusion matrix
labels = ['No Diabetes', 'Diabetes']

train_scores = eval_classification(y_train, train_preds, 
                                   name='base_class_model_train',
                                  labels=labels)
test_scores = eval_classification(y_test, test_preds, 
                                   name='base_class_model_test',
                                  labels=labels)
class_scores = pd.concat([train_scores, test_scores])
class_scores