<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/master/Class_05_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

**Module 5: Regularization and Dropout**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Integrative Biology](https://sciences.utsa.edu/integrative-biology/), [UTSA](https://www.utsa.edu/)

### Module 5 Material

* Part 5.1: Introduction to Regularization: Ridge and Lasso
* Part 5.2: Using K-Fold Cross Validation with Keras
* **Part 5.3: Using L1 and L2 Regularization with Keras to Decrease Overfitting**
* Part 5.4: Drop Out for Keras to Decrease Overfitting
* Part 5.5: Benchmarking Keras Deep Learning Regularization Techniques



### Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow.
  Running the following code will map your GDrive to ```/content/drive```.

In [None]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
    %tensorflow_version 2.x
except:
    print("Note: not using Google CoLab")
    COLAB = False

### Lesson Setup

Run the next code cell to load necessary packages

In [None]:
# You MUST run this code cell first
import os
import shutil
import time

import numpy as np
import pandas as pd

# Report the current working directory and disk usage
path = '/'
memory = shutil.disk_usage(path)
dirpath = os.getcwd()
print("Your current working directory is : " + dirpath)
print("Disk", memory)

## Datasets for this lesson

For this lesson we will be using the [Obesity Prediction Dataset](https://www.kaggle.com/datasets/mrsimple07/obesity-prediction) for the Examples and the [Body Performance Dataset](https://www.kaggle.com/datasets/kukuroo3/body-performance-data) for the **Exercises**.  

### Obesity Prediction Dataset

The **_Obesity Prediction dataset_** provides comprehensive information on individuals' demographic characteristics, physical attributes, and lifestyle habits, aiming to facilitate the analysis and prediction of obesity prevalence. 

The 7 categories of obesity/demographics measurements are:

* **Age:** The age of the individual, expressed in years (Mean=49.9 yrs +/- 18.1)
* **Gender:** The gender of the individual coded `Male` and `Female`
* **Height:** The height of the individual measured in centimeters (Mean=170 cm +/- 10.3)
* **Weight:** The weight of the individual measured in kilograms (Mean=71.2 kg +/- 15.5)
* **BMI:** Body mass index, a calculated metric derived from the individual's weight and height (Mean=24.9 +/- 6.19)
* **PhysicalActivityLevel:** This variable quantifies the individual's level of physical activity (Mean=2.53 +/- 1.12) 
* **ObesityCategory:** Categorization of individuals based on their BMI into different obesity categories
 
The output from `opDF.info()` is:
~~~text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Age                    1000 non-null   int64  
 1   Gender                 1000 non-null   object 
 2   Height                 1000 non-null   float64
 3   Weight                 1000 non-null   float64
 4   BMI                    1000 non-null   float64
 5   PhysicalActivityLevel  1000 non-null   int64  
 6   ObesityCategory        1000 non-null   object 
dtypes: float64(3), int64(2), object(2)
memory usage: 54.8+ KB
~~~
As you can see, two columns, `Gender` and `ObesityCategory`, are non-numeric (`object`) and will need to be converted into numeric values. Since all columns have the same `Non-Null Count` (_n_=1000) there is no missing data. 
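
Rather than reading the `Dtype` column by eye, the non-numeric columns can also be listed programmatically. The following is a minimal sketch that uses a small made-up DataFrame (`toy`, a hypothetical stand-in for `opDF`) with the same kinds of columns:

```python
import pandas as pd

# Toy stand-in for opDF; the values are made up for illustration
toy = pd.DataFrame({
    "Age": [56, 69, 46],
    "Gender": ["Male", "Male", "Female"],
    "BMI": [19.7, 34.0, 25.5],
    "ObesityCategory": ["Normal weight", "Obese", "Overweight"],
})

# Columns that are not numeric must be encoded before training
non_numeric = list(toy.select_dtypes(exclude="number").columns)
print(non_numeric)
```

Any column that appears in this list needs to be mapped or One-Hot encoded before it can be passed to a neural network.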

The output from `opDF['ObesityCategory'].value_counts()` is as follows:
~~~text
ObesityCategory
Normal weight    371
Overweight       295
Obese            191
Underweight      143
Name: count, dtype: int64
~~~
As you can see, the column `ObesityCategory` has four categorical values.

The output from `opDF['PhysicalActivityLevel'].value_counts()` is as follows:
~~~text
PhysicalActivityLevel
4    259
3    255
2    247
1    239
Name: count, dtype: int64
~~~

There are four different activity levels, ranging from 1 to 4.

### Body Performance dataset

[Body Performance](https://www.kaggle.com/datasets/kukuroo3/body-performance-data)

For the **Exercises** in this lesson, we will be using the [Body Performance dataset](https://www.kaggle.com/datasets/kukuroo3/body-performance-data) provided by the [Seoul Olympic Games Korea Sports Promotion Foundation](https://www.bigdata-culture.kr/bigdata/user/data_market/detail.do?id=ace0aea7-5eee-48b9-b616-637365d665c1). 

This dataset has 12 categories of body performance for a relatively large number of men and women (_n_=13,393). To speed up the training of neural networks in the Exercises, we will only use a fraction of the total number. 

The 12 categories of fitness measurements are:
* **age:** 20 ~64
* **gender:** M,F
* **height_cm:** (If you want to convert to feet, divide by 30.48)
* **weight_kg:**
* **body fat_%:**
* **diastolic:** diastolic blood pressure (min)
* **systolic:** systolic blood pressure (min)
* **gripForce:**
* **sit and bend forward_cm:**
* **sit-ups counts:**
* **broad jump_cm:**
* **class:** A,B,C,D ( A: best) / stratified

The output for the command `df.info()` is as follows:
~~~text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13393 entries, 0 to 13392
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   age                      13393 non-null  float64
 1   gender                   13393 non-null  object 
 2   height_cm                13393 non-null  float64
 3   weight_kg                13393 non-null  float64
 4   body fat_%               13393 non-null  float64
 5   diastolic                13393 non-null  float64
 6   systolic                 13393 non-null  float64
 7   gripForce                13393 non-null  float64
 8   sit and bend forward_cm  13393 non-null  float64
 9   sit-ups counts           13393 non-null  float64
 10  broad jump_cm            13393 non-null  float64
 11  class                    13393 non-null  object 
dtypes: float64(10), object(2)
~~~

As you can see, all but two columns, `gender` and `class`, are numeric. Since all columns have the same `Non-Null Count` there is no missing data. 

### Create functions for this lesson

The code in the cell below creates 2 useful functions for this lesson, `elaspedTime(start, end)` and `rename_col_by_index(dataframe, index_mapping)`. 

In [None]:
# Create functions

# Simple function to print out elapsed time
def elaspedTime(start,end):
    # Print out time
    seconds = int((end-start))
    seconds = seconds % (24 * 3600)
    hour = seconds // 3600
    seconds %= 3600
    minutes = seconds // 60
    seconds %= 60
    print("Elapsed time = %d:%02d:%02d" % (hour, minutes, seconds))
    print()

# Simple function to change column name in a dataframe
def rename_col_by_index(dataframe, index_mapping):
    dataframe.columns = [index_mapping.get(i, col) for i, col in enumerate(dataframe.columns)]
    return dataframe
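
The reason for renaming columns by position rather than by name becomes clearer with a small example. The sketch below repeats the `rename_col_by_index` definition so that it runs on its own; the two DataFrames (`pred` and `actual`) are made-up stand-ins for the prediction and One-Hot target tables built later in this lesson. Concatenating them produces two columns that are both labeled `0`, so a rename by label would be ambiguous.

```python
import pandas as pd

def rename_col_by_index(dataframe, index_mapping):
    # Replace the column at each listed position; keep all other names
    dataframe.columns = [index_mapping.get(i, col)
                         for i, col in enumerate(dataframe.columns)]
    return dataframe

# Made-up stand-ins: predicted class index and One-Hot encoded targets
pred = pd.DataFrame([2, 0])
actual = pd.DataFrame([[0, 0, 1, 0], [1, 0, 0, 0]])

combined = pd.concat([pred, actual], axis=1)
labels_before = list(combined.columns)   # the label 0 appears twice

renamed = rename_col_by_index(combined, {0: 'Predicted', 1: 'Actual: 0'})
labels_after = list(renamed.columns)
print(labels_before, labels_after)
```

Note that the function modifies the DataFrame in place and also returns it, so `renamed` and `combined` refer to the same object.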

# Part 5.3: L1 and L2 Regularization to Decrease Overfitting

L1 and L2 regularization are two common regularization techniques that can reduce the effects of overfitting [[Cite:ng2004feature]](http://cseweb.ucsd.edu/~elkan/254spring05/Hammon.pdf). These algorithms can either work with an objective function or as a part of the backpropagation algorithm. In both cases, the regularization algorithm is attached to the training algorithm by adding an objective. 

## L1 Regularization
In the context of a neural network with L1 regularization, the objective function typically consists of two main components: the original loss function and the L1 regularization term. The objective function serves as a measure that the optimization algorithm aims to minimize during the training process.

The objective function with L1 regularization can be represented as:

`Objective function = Loss function + λ * L1 regularization term`

where:

* **Loss function:** The original loss function used to evaluate the performance of the neural network on the training data, such as the cross-entropy loss or mean squared error.
* **λ (lambda):** The regularization parameter that controls the strength of the L1 regularization penalty.
* **L1 regularization term:** The sum of absolute values of the weights in the neural network.

Adding the L1 regularization term to the objective function penalizes the absolute value of every weight, which tends to drive unimportant weights toward exactly zero and so encourages sparsity. This helps prevent overfitting and can lead to a simpler and more interpretable model. The regularization parameter λ controls the trade-off between minimizing the loss function and reducing the magnitude of the weights.

During the training process, the neural network adjusts its weights by minimizing the composite objective function, striking a balance between fitting the training data well (minimizing the loss function) and reducing model complexity (L1 regularization).

These algorithms work by adding a weight penalty to the neural network training. This penalty encourages the neural network to keep the weights to small values. Both L1 and L2 calculate this penalty differently. You can add this penalty calculation to the calculated gradients for gradient-descent-based algorithms, such as backpropagation. The penalty is negatively combined with the objective score for objective-function-based training, such as simulated annealing.
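
The objective function described above can be sketched in a few lines of NumPy. This is only an illustration of the arithmetic, not how Keras computes it internally; the helper `l1_objective`, the weight values, and the loss value are all hypothetical.

```python
import numpy as np

def l1_objective(loss, weights, lam):
    """Return loss + lam * (sum of absolute weight values); biases are excluded."""
    l1_term = sum(np.abs(w).sum() for w in weights)
    return loss + lam * l1_term

# Made-up weight matrices for two small layers
weights = [np.array([[0.5, -1.0], [0.0, 2.0]]),
           np.array([[-0.5], [1.0]])]

base_loss = 0.30   # made-up value of the original loss function
total = l1_objective(base_loss, weights, lam=0.01)
print(round(total, 4))   # 0.35
```

Here the absolute weights sum to 5.0, so with λ = 0.01 the penalty adds 0.05 to the original loss of 0.30.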


## L1 vs L2 Regularization
Both L1 and L2 work similarly in that they penalize the size of the weights, but in significantly different ways. L2 will force the weights into a pattern similar to a Gaussian distribution while L1 will force the weights into a pattern similar to a Laplace distribution, as demonstrated in the following figure.

![L1 vs L2](https://biologicslab.co/BIO1173/images/class_9_l1_l2.png "L1 vs L2")

As you can see, the L1 algorithm is more tolerant of weights further from 0, whereas the L2 algorithm is less tolerant. We will highlight other important differences between L1 and L2 in the following sections. Note also that both L1 and L2 count their penalties based only on weights; they do not count penalties on bias values. Keras allows [l1/l2 to be directly added to your network](http://tensorlayer.readthedocs.io/en/stable/modules/cost.html).
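
One way to see why L2 is less tolerant of large weights is to compare the two penalties on a few sample weights. The sketch below ignores the regularization parameter λ and uses made-up weight values:

```python
import numpy as np

w = np.array([0.1, 0.5, 1.0, 3.0])   # made-up weight values

l1_penalty = np.abs(w)   # grows linearly with the weight
l2_penalty = w ** 2      # grows with the square of the weight

for wi, p1, p2 in zip(w, l1_penalty, l2_penalty):
    print(f"w={wi:.1f}: L1={p1:.2f}  L2={p2:.2f}")
```

For weights smaller than 1 the L2 penalty is the smaller of the two, while for weights larger than 1 it is the larger, so L2 pushes hardest against the largest weights.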

## Example 1: L1 Regularization of a classification neural network

In Example 1, L1 regularization will be demonstrated using the Obesity Prediction dataset and a classification neural network that will predict the Obesity Category of individuals. To make coding easier to follow, Example 1 has been broken down into 3 steps labeled "A", "B" and "C". 

### Example 1A: Create feature vector

The code in the cell below reads the Obesity Prediction dataset, `obesity_prediction.csv`, from the course HTTPS server and creates a new DataFrame called `opDF`. The column `Gender` is converted to integers, with the string `Male` mapped to `0` and `Female` mapped to `1`. The columns `Age`, `Height`, `Weight` and `BMI` are standardized to their z-scores. Since `ObesityCategory` is the y-value for this neural network, it is dropped when generating the list with the names of the columns to be used for the x-values (`opX_columns`). 
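
As a reminder of what standardizing to z-scores does, the sketch below computes them by hand with NumPy on three made-up heights. With its default settings, `scipy.stats.zscore` uses the same population standard deviation (`ddof=0`), so it should give the same values.

```python
import numpy as np

heights = np.array([160.0, 170.0, 180.0])   # made-up heights in cm

# z-score: subtract the mean, then divide by the (population) standard deviation
z = (heights - heights.mean()) / heights.std()
print(z)
```

After standardizing, the column has a mean of 0 and a standard deviation of 1, which puts features measured in different units on a comparable scale.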

Since we will be building a neural network for **_classification_** we need to One-Hot encode the column `ObesityCategory` using the following code chunk:
~~~text
# One-Hot encode the column containing the y-values
dummies = pd.get_dummies(opDF['ObesityCategory']) # Classification
ObCategories = dummies.columns
opY = dummies.values
opY = np.asarray(opY).astype('float32')
~~~

Finally, the cell prints out the categorical values (names) that were One-Hot encoded using the "starred" print statement:
~~~text
# Print y categorical names
print(*ObCategories)
~~~
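
To see what One-Hot encoding produces, and how `np.argmax` reverses it, consider the following minimal sketch on four made-up labels. The variable names (`labels`, `categories`, `decoded`) are hypothetical and are not used elsewhere in the lesson.

```python
import numpy as np
import pandas as pd

# Four made-up category labels
labels = pd.Series(['Obese', 'Normal weight', 'Underweight', 'Obese'])

dummies = pd.get_dummies(labels)          # one column per category
categories = list(dummies.columns)        # columns are in sorted order
y = dummies.values.astype('float32')      # one row per sample, a single 1.0 per row

# np.argmax recovers the column index of the 1.0 in each row
decoded = [categories[i] for i in np.argmax(y, axis=1)]
print(categories)
print(decoded)
```

Because the dummy columns are sorted alphabetically, the column order can differ from the order in which the categories first appear in the data.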

In [None]:
# Example 1A; Create feature vector

from scipy.stats import zscore

# Read the data set
opDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/obesity_prediction.csv",
    na_values=['NA','?'])

# Map Gender
mapping =  {'Male': 0,
            'Female': 1}
opDF['Gender'] = opDF['Gender'].map(mapping)

# Standardize ranges
opDF['Age'] = zscore(opDF['Age'])
opDF['Height'] = zscore(opDF['Height'])
opDF['Weight'] = zscore(opDF['Weight'])
opDF['BMI'] = zscore(opDF['BMI'])

# Map ObesityCategory
#mapping =  {'Underweight': 0,
#            'Normal weight': 1,
#            'Overweight': 2,
#            'Obese': 3}
#opDF['ObesityCategory'] = opDF['ObesityCategory'].map(mapping)

# Generate list of columns for x
opX_columns = opDF.columns.drop('ObesityCategory')  # 

# Generate x-values as numpy array
opX = opDF[opX_columns].values
opX = np.asarray(opX).astype('float32')

# One-Hot encode the column containing the y-values
dummies = pd.get_dummies(opDF['ObesityCategory']) # Classification
ObCategories = dummies.columns
opY = dummies.values
opY = np.asarray(opY).astype('float32')

# Print y categorical names
print(*ObCategories)

If your code is correct, you should see the following output:
~~~text
Normal weight Obese Overweight Underweight
~~~

### Example 1B: L1 Regularization to Decrease Overfitting 

We now create a Keras network with L1 regularization. 

The specific Keras code chunk that adds L1 regularization is the following:
~~~text
activity_regularizer=regularizers.l1(1e-4)
~~~
It should be noted that the L1 regularizer is added to each hidden layer, but **not** to the output layer:
~~~text
    # Hidden 1
    model.add(Dense(50, input_dim=opX.shape[1],
                    activation='relu',
                    activity_regularizer=regularizers.l1(1e-4)))
    # Hidden 2
    model.add(Dense(25, activation='relu',
                    activity_regularizer=regularizers.l1(1e-4)))
    # Output
    model.add(Dense(opY.shape[1], activation='softmax'))
~~~
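
Note that the snippet above uses the `activity_regularizer` argument, which penalizes the _output_ of a layer, whereas the `kernel_regularizer` argument penalizes the layer's _weights_. The NumPy sketch below illustrates the difference for a single ReLU layer. All values are randomly generated, and Keras may additionally average the activity penalty over the batch, which is left out here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # made-up batch: 4 samples, 3 features
W = rng.normal(size=(3, 5))    # made-up weights (kernel) of a 5-neuron layer
b = np.zeros(5)                # biases
lam = 1e-4                     # regularization strength

def relu_layer(inputs):
    return np.maximum(0.0, inputs @ W + b)

# L1 penalty on the weights: independent of the input data
kernel_penalty = lam * np.abs(W).sum()

# L1 penalty on the layer output: changes whenever the inputs change
activity_penalty = lam * np.abs(relu_layer(x)).sum()
activity_penalty_2x = lam * np.abs(relu_layer(2 * x)).sum()

print(kernel_penalty, activity_penalty, activity_penalty_2x)
```

Doubling the inputs doubles the activity penalty in this sketch (the biases are zero) but leaves the kernel penalty unchanged.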

In [None]:
# Example 1B: L1 Regularization to Decrease Overfitting

import pandas as pd
import os
import numpy as np
from sklearn import metrics
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras import regularizers

# Set variables
EPOCHS=100 # number of epochs for each loop
numK=5     # Set number of K-folds

# Record the start time in T_start
T_start = time.time()

# Cross-validate using KFold
kf = KFold(numK, shuffle=True, random_state=42)
    
# Initialize arrays
oos_y = []
oos_pred = []

# Initialize fold 
fold = 0

# Start loop here ---------------------------------#

for train, test in kf.split(opX):
    fold+=1   # increment fold
    print(f"Starting Fold #{fold}...")

    # Split data for this fold
    x_train = opX[train]
    y_train = opY[train]
    x_test = opX[test]
    y_test = opY[test]
    
    #kernel_regularizer=regularizers.l2(0.01),

    # Create new model for this fold
    model = Sequential()
    # Hidden 1
    model.add(Dense(50, input_dim=opX.shape[1], 
            activation='relu',
            activity_regularizer=regularizers.l1(1e-4))) 
    # Hidden 2
    model.add(Dense(25, activation='relu', 
                    activity_regularizer=regularizers.l1(1e-4))) 
    # Output
    model.add(Dense(opY.shape[1],activation='softmax'))
    # Compile model for this fold
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    # Train model for this fold
    model.fit(x_train,y_train,validation_data=(x_test,y_test),
              verbose=0,epochs=EPOCHS)
    
    # Use model to make predictions  
    pred = model.predict(x_test)

    # Add actual y-values for the data used this fold
    oos_y.append(y_test)
    # raw probabilities to chosen class (highest probability)
    pred = np.argmax(pred,axis=1) 
    oos_pred.append(pred)        

    # Measure this fold's accuracy
    y_compare = np.argmax(y_test,axis=1) # For accuracy calculation
    score = metrics.accuracy_score(y_compare, pred)
    print(f"Fold score (accuracy): {score}")

# End loop ----------------------------------------------------#

# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y)
oos_pred = np.concatenate(oos_pred)
oos_y_compare = np.argmax(oos_y,axis=1) # For accuracy calculation

score = metrics.accuracy_score(oos_y_compare, oos_pred)
print(f"Final score (accuracy): {score}")    
    
# Write the cross-validated prediction
oos_y = pd.DataFrame(oos_y)
oos_pred = pd.DataFrame(oos_pred)
oosDF = pd.concat( [oos_pred, oos_y],axis=1 )

# Uncomment the next line to print file
#oosDF.to_csv(filename_write,index=False)

# Record the end time in T_end
T_end = time.time()

# Print out elapsed time
elaspedTime(T_start,T_end)


If your code is correct, you should see something similar to the following output:
~~~text
Starting Fold #1...
7/7 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.955
Starting Fold #2...
7/7 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.985
Starting Fold #3...
7/7 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.965
Starting Fold #4...
7/7 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.975
Starting Fold #5...
7/7 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.97
Final score (accuracy): 0.97
Elapsed time = 0:01:10
~~~
The `Final score (accuracy): 0.97` is very high.
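
The final score is the fraction of correct predictions over all of the folds combined. The following sketch reproduces that calculation with NumPy only, using made-up targets and predictions for two small folds:

```python
import numpy as np

# Made-up One-Hot targets and predicted class indices for two folds
fold_y = [np.array([[1, 0, 0], [0, 1, 0]]),
          np.array([[0, 0, 1], [0, 1, 0]])]
fold_pred = [np.array([0, 1]), np.array([2, 0])]

# Stack the folds, then compare true and predicted class indices
oos_y = np.concatenate(fold_y)
oos_pred = np.concatenate(fold_pred)
accuracy = np.mean(np.argmax(oos_y, axis=1) == oos_pred)
print(accuracy)   # 0.75
```

Three of the four predictions match the true class, giving an accuracy of 0.75. Because every sample lands in the test set of exactly one fold, the combined score covers the whole dataset.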

### Example 1C: Print out actual and predicted y-values

The code in the cell below prints out the predicted and the actual Obesity Categories for the "out-of-sample" individuals. As mentioned previously, "out-of-sample" refers to data that was _not_ used in the process of developing the neural network model. It is only used to evaluate the accuracy and performance of the model on new, unseen data to assess its generalizability and potential for predicting future outcomes. 

In [None]:
# Example 1C: Print out actual and predicted y-values 

# Rename columns
new_column_mapping = {0: 'Predicted Ob Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)

# Set display options
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

# Display DataFrame
display(oosDF)

If your code is correct you should see something similar to the following table:

![__](https://biologicslab.co/BIO1173/images/class_05_3_Exm1C.png)

By inspection of the output above, you can see the model's predictions of the Obesity Category are very good for the out-of-sample individuals as would be expected with a `Final score (accuracy): 0.97`. In the output shown above, there were no errors in the model's predictions.

## **Exercise 1: L1 Regularization of a classification neural network**

For **Exercise 1**, you are to use the Body Performance dataset and build a classification neural network that will predict the fitness `class` of individuals. To make coding easier, **Exercise 1** has been broken down into 3 steps labeled "A", "B" and "C". 

### **Exercise 1A: Create feature vector**

In the cell below, read the Body Performance dataset, `bodyPerformance.csv`, from the course HTTPS server and create a new DataFrame called `bpBigDF`. Since this is a fairly large dataset, you can speed up training time by only using a part of it. Use this code chunk to create a DataFrame with only 20% of the samples:
~~~text
bpDF=bpBigDF.sample(frac=0.20)
~~~

You need to map the column `gender` with the categorical values `M` and `F` to integers.  

You should also standardize some, but not all of the other numeric values using this code chunk:
~~~text
# Standardize ranges
bpDF['age'] = zscore(bpDF['age'])
bpDF['height_cm'] = zscore(bpDF['height_cm'])
bpDF['weight_kg'] = zscore(bpDF['weight_kg'])
bpDF['diastolic'] = zscore(bpDF['diastolic'])
bpDF['systolic'] = zscore(bpDF['systolic'])
bpDF['gripForce'] = zscore(bpDF['gripForce'])
~~~

Since you are building a classification neural network to predict `class`, you will need to drop that column when creating your list of X columns:
~~~text
# Generate list of columns for x
bpX_columns = bpDF.columns.drop('class')  # class is y-value 
~~~
Using this list, you can generate the x values for your model using the following code chunk:
~~~text
# Generate x-values as numpy array
bpX = bpDF[bpX_columns].values
bpX = np.asarray(bpX).astype('float32')
~~~

Since this is a classification neural network, you will also need to One-Hot encode the column `class` using the following code chunk:
~~~text
# One-Hot encode the column containing the y-values
dummies = pd.get_dummies(bpDF['class']) # Classification
BpCategories = dummies.columns
bpY = dummies.values
bpY = np.asarray(bpY).astype('float32')
~~~

Finally, print out the categorical values (names) in `BpCategories` that were One-Hot encoded, using the "starred" print statement:
~~~text
# Print y categorical names
print(*BpCategories)
~~~

In [None]:
# Insert your code for Exercise 1A here



If your code is correct, you should see the following output:
~~~text
A B C D
~~~

### **Exercise 1B: L1 Regularization to Decrease Overfitting** 

In the cell below, create a Keras network with L1 regularization to predict the fitness `class` in the Body Performance Dataset using the feature vector that you prepared in **Exercise 1A**. The code in Example 1B should act as your template. 

In [None]:
# Insert your code for Exercise 1B here



If your code is correct, you should see something similar to the following output:
~~~text
Starting Fold #1...
17/17 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.6567164179104478
Starting Fold #2...
17/17 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.6007462686567164
Starting Fold #3...
17/17 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.6156716417910447
Starting Fold #4...
17/17 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.6455223880597015
Starting Fold #5...
17/17 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.6448598130841121
Final score (accuracy): 0.6326987681970885
Elapsed time = 0:02:44
~~~
The `Final score (accuracy): 0.6326987681970885` is not especially accurate.

### **Exercise 1C: Print out actual and predicted y-values**

In the cell below, print out the predicted and the actual Fitness `class` for the out-of-sample individuals. Label your columns using the following code chunk:
~~~text
# Rename columns
new_column_mapping = {0: 'Predicted Fitness Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)
~~~

In [None]:
# Insert your code for Exercise 1C here



If your code is correct you should see something similar to the following table:

![__](https://biologicslab.co/BIO1173/images/class_05_3_Exe1C.png)

By inspection of the output above, you can see that the model's predictions of the fitness `class` are not perfect for the out-of-sample individuals. This should not come as a big surprise given a `Final score (accuracy): 0.6327`. 

In the output shown above, there were some errors in the model's predictions. The model correctly predicted the `class` level for 6 of the individuals, but made incorrect predictions for 2 subjects, `index 2675` and `index 2677`. 

## **Exercise 2: L2 Regularization of a classification neural network**

L2 regularization might generate better results than L1 regularization under the following conditions:

* **Feature Correlation:** When features are highly correlated, L2 regularization tends to perform better than L1 regularization. L2 regularization encourages small coefficients but does not force them to zero, allowing correlated features to share the regularization penalty more evenly.
* **Data Noise:** In the presence of noisy data, L2 regularization can be more effective at smoothing out the noise due to its tendency to distribute the penalty more uniformly across all weights. L2 regularization helps prevent individual noisy data points from disproportionately influencing the model.
* **Model Stability:** L2 regularization often leads to more stable and well-conditioned models compared to L1 regularization. L2 regularization prevents the weights from growing excessively large, which can improve the numerical stability of the optimization process and enhance generalization performance.
* **Small Dataset:** When working with a small dataset, L2 regularization can help prevent overfitting by providing smoother and more continuous solutions. Because the L2 penalty is quadratic, it penalizes small weights more gently than L1 regularization and large weights more heavily, so weights are shrunk smoothly rather than eliminated.
* **Uniform Impact on All Weights:** If the goal is to ensure that all weights have some level of regularization, rather than inducing sparsity, L2 regularization is preferred. L2 regularization treats all weights equally, promoting a more balanced impact on the model parameters.

In summary, L2 regularization often outperforms L1 regularization in scenarios where feature correlation, data noise, model stability, small dataset size, or a uniform impact on all weights are important considerations for achieving better results and improved generalization performance.
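
The difference between inducing sparsity (L1) and shrinking all weights (L2) can be illustrated with a one-dimensional toy problem, which is an assumption made here for clarity and not the network training itself. If an unregularized weight would settle at the value `a`, minimizing `0.5*(w - a)**2` plus the penalty has a closed-form solution for each regularizer. The helper names `l1_shrink` and `l2_shrink` are hypothetical.

```python
import numpy as np

def l1_shrink(a, lam):
    # Minimizer of 0.5*(w - a)**2 + lam*|w|: weights within lam of zero become 0
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def l2_shrink(a, lam):
    # Minimizer of 0.5*(w - a)**2 + lam*w**2: every weight is scaled down
    return a / (1.0 + 2.0 * lam)

a = np.array([-2.0, -0.3, 0.1, 1.5])   # made-up unregularized weights
lam = 0.5

w_l1 = l1_shrink(a, lam)
w_l2 = l2_shrink(a, lam)
print(w_l1)
print(w_l2)
```

With these values, L1 sets the two small weights to zero and keeps the two large ones (reduced by λ), while L2 halves all four weights and leaves none of them at zero.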

For **Exercise 2**, you are to again use the Body Performance dataset and build a classification neural network that will predict the fitness `class` of individuals. As before, **Exercise 2** has been broken down into 3 steps labeled "A", "B" and "C". 

### **Exercise 2A: Create feature vector**

In the cell below, create a feature vector for the Body Performance dataset. You should use **exactly** the same code that you wrote for **Exercise 1A**. As above, only use 20% of the dataset for your neural network model. 

In [None]:
# Insert your code for Exercise 2A here



If your code is correct, you should see the following output:
~~~text
A B C D
~~~

### **Exercise 2B: L2 Regularization to Decrease Overfitting** 

In the cell below, create a Keras network with L2 regularization. The code for **Exercise 2B** should be _identical_ to the code you wrote for **Exercise 1B** except for the regularizer argument on each hidden layer. To enable L2 regularization, you will need to make the following code change. 

Change the line:
~~~text
activity_regularizer=regularizers.l1(1e-4)
~~~
to read:
~~~text
kernel_regularizer=regularizers.l2(0.01)
~~~
It is somewhat hard to spot the difference. Can you see what is different?

As above, the L2 regularizer must be added to each hidden layer, but not to the output layer:
~~~text
    # Hidden 1
    model.add(Dense(50, input_dim=bpX.shape[1], 
            activation='relu',
            kernel_regularizer=regularizers.l2(0.01))) 
    # Hidden 2
    model.add(Dense(25, activation='relu', 
                    kernel_regularizer=regularizers.l2(0.01))) 
    # Output
    model.add(Dense(bpY.shape[1],activation='softmax'))
~~~

In [None]:
# Insert your code for Exercise 2B here



If your code is correct, you should see something similar to the following output:
~~~text
Starting Fold #1...
17/17 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.5746268656716418
Starting Fold #2...
17/17 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.5783582089552238
Starting Fold #3...
17/17 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.6026119402985075
Starting Fold #4...
17/17 [==============================] - 1s 2ms/step
Fold score (accuracy): 0.5522388059701493
Starting Fold #5...
17/17 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.5887850467289719
Final score (accuracy): 0.5793206420306084
Elapsed time = 0:02:47

~~~
The `Final score (accuracy): 0.5793206420306084` is lower than the `0.6326987681970885` score obtained with L1 regularization in **Exercise 1B**, so L2 regularization did not improve accuracy in this run. 

### **Exercise 2C: Print out actual and predicted y-values**

In the cell below, print out the predicted and the actual Fitness `class` for the "out-of-sample" individuals. Label your columns using the following code chunk:
~~~text
# Rename columns
new_column_mapping = {0: 'Predicted Fitness Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)
~~~

In [None]:
# Insert your code for Exercise 2C here

# Rename columns
new_column_mapping = {0: 'Predicted Fitness Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)

# Set display options
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

# Display DataFrame
display(oosDF)

If your code is correct you should see something similar to the following table:

![__](https://biologicslab.co/BIO1173/images/class_05_3_Exe1C.png)

By inspection of the output above, you can see that the model's predictions of the fitness `class` are not perfect for the out-of-sample individuals. This should not come as a big surprise given a `Final score (accuracy): 0.5793`. 

In the output shown above, there were some errors in the model's predictions. The model correctly predicted the `class` level for 6 of the individuals, but made incorrect predictions for 2 subjects, `index 2675` and `index 2677`. 

## **Lesson Turn-in**

When you have completed all of the code cells and run them in sequential order (the last code cell should be number 12, not counting the optional File Clean-up below), use **File --> Print.. --> Save to PDF** to generate a PDF of your JupyterLab notebook. Save your PDF as `Class_05_3.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.