<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173_Fall2025/blob/main/F25_Class_02_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

##### **Module 2: Neural Networks with Tensorflow and Keras**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 2 Material

* Part 2.1: Introduction to Neural Networks with Tensorflow and Keras
* Part 2.2: Encoding Feature Vectors
* **Part 2.3: Early Stopping and Dropout to Prevent Overfitting**
* Part 2.4: Saving and Loading a Keras Neural Network

## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to /content/drive and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [1]:
# You must run this cell first
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: Using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("**WARNING**: Your GMAIL address was **not** printed in the output below.")
    print("**WARNING**: You will NOT receive credit for this lesson.")
    COLAB = False

Mounted at /content/drive
Note: Using Google CoLab
david.senseman@gmail.com


Make sure your GMAIL address is included as the last line in the output above.

### Define functions

The cell below creates the function(s) needed for this lesson.

In [2]:
# Simple function to print out elasped time
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

# Simple function to change column name in a dataframe
def rename_col_by_index(dataframe, index_mapping):
    dataframe.columns = [index_mapping.get(i, col) for i, col in enumerate(dataframe.columns)]
    return dataframe

# **Early Stopping in Keras to Prevent Overfitting**

It can be difficult to determine how many epochs to cycle through to train a neural network. **_Overfitting_** will occur if you train the neural network for too many epochs, and the neural network will not perform well on new data, despite attaining a good accuracy on the training set. Overfitting occurs when a neural network is trained to the point that it begins to memorize rather than generalize, as demonstrated in Figure 3.OVER.

**Figure 3.OVER: Training vs. Validation Error for Overfitting**
![Training vs. Validation Error for Overfitting](https://biologicslab.co/BIO1173/images/class_3_training_val.png "Training vs. Validation Error for Overfitting")

It is important to segment the original dataset into several datasets:

* **Training Set**
* **Validation Set**
* **Holdout Set**

You can construct these sets in several different ways. The following programs demonstrate some of these.

The first method is a training and validation set. We use the training data to train the neural network until the validation set no longer improves. This attempts to stop at a near-optimal training point. This method will only give accurate "out of sample" predictions for the validation set; this is usually 20% of the data. The predictions for the training data will be overly optimistic, as these were the data that we used to train the neural network. Figure 3.VAL demonstrates how we divide the dataset.

**Figure 3.VAL: Training with a Validation Set**
![Training with a Validation Set](https://biologicslab.co/BIO1173/images/class_1_train_val.png "Training with a Validation Set")

## **Early Stopping**

We will now see an example of classification training with early stopping. We will train a neural network until the error no longer improves on the validation set.

### Example 1: Early Stopping with Classification

The code in the cell below builds and trains a **_classification_** neural network called `irisModel`. The model is trained/fitted to the Iris flower dataset (`iris.csv`) downloaded from the course HTTPS server and stored in the DataFrame `irisDF`.

The independent variables, or X-values, are the values in the columns `sepal_length`, `sepal_width`, `petal_length` and `petal_width`. The values are store in a Numpy array called `irisX`.

The dependent variable (Y-value) is the column `species` which contains the names of the three Iris species in the dataset, _Iris setosa_, _Iris_ versicolor_ and _Iris virginica_.

Since the species names are entered in the `species` column as strings, it is necessary to use One-Hot Encoding to convert these strings into the values `0` and `1` using the command `pd.get_dummies()`. The variable holding the dependent values, `irisY`, is created from the `dummies.values` as shown below.

#### **Early Stopping**

In order to implement _Early Stopping_, it is first necessary to split the dataset into 4 separate groups: X train, X test, Y train and Y test using the function `train_test_split()`. The argument `test_size=0.25` tells the function that 75% of the data should be put into the two train sets (i.e. `irisX_train` and `irisY_train`) and the remaining 25% should be put into the two validation sets `irisX_test` and `irisY_test`.

Since the separation of data into training and test sets is a random process, the argument `random_state=42` is used for teaching/demonstration purposes to insure that the `split` occurs at the same places when the code is re-run. In normal use, you wouldn't set the random seed.

The model, `irisModel`, is a densely connected sequential neural network with two hidden layers. The 1st layer has 50 neurons, the 2nd hidden layer 25. The activation function for both hidden layers is `relu`. Since this function of this model is classification, the `softmax` activation function is used in the output layer. The model is compiled with the 'categorical_crossentropy` loss function and the `adam` optimizer.

The code for implementing early stopping variable `irisMoniter` is shown below:

~~~text
# Build monitor for early stopping
irisMonitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5,
        verbose=1, mode='auto', restore_best_weights=True)
~~~
The meaning/function of the different arguments will be discussed below.

Finally, the model is fitted to the Iris data with the number of epochs set to **1000!** Don't worry, you won't have to wait forever for the training to complete--thanks to early stopping.

In [3]:
# Example 1: Early stopping with classification

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
import numpy as np
from sklearn.model_selection import train_test_split

# Read dataset and create DataFrame-------------------------------------
irisDF = pd.read_csv("https://biologicslab.co/BIO1173/data/iris.csv",
    na_values=['NA', '?'])

# Create feature vector-------------------------------------------------
# Generate X-values
irisX = irisDF[['sepal_length', 'sepal_width',
                'petal_length', 'petal_width']].values
irisX = np.asarray(irisX).astype('float32')

# Generate Y-values
dummies = pd.get_dummies(irisDF['species']) # Classification
SpeciesNames = dummies.columns  # List with species names
irisY = dummies.values # Y-values
irisY = np.asarray(irisY).astype('float32')

# Split into validation and training sets------------------------------
irisX_train, irisX_test, irisY_train, irisY_test = train_test_split(
    irisX, irisY, test_size=0.25, random_state=42)

# Build neural network-------------------------------------------------
irisModel = Sequential()
irisModel.add(Input(shape=(irisX.shape[1],)))  # Hidden 1
irisModel.add(Dense(25, activation='relu'))
irisModel.add(Dense(irisY.shape[1],activation='softmax'))
irisModel.compile(loss='categorical_crossentropy', optimizer='adam')

# Build monitor for early stopping-------------------------------------
irisMonitor = EarlyStopping(monitor='val_loss',
              min_delta=1e-3, patience=5, verbose=1,
              mode='auto', restore_best_weights=True)

# Train model----------------------------------------------------------
irisModel.fit(irisX_train,irisY_train,validation_data=(irisX_test,irisY_test),
        callbacks=[irisMonitor],verbose=2,epochs=1000)

Epoch 1/1000
4/4 - 3s - 834ms/step - loss: 4.2131 - val_loss: 4.3378
Epoch 2/1000
4/4 - 0s - 24ms/step - loss: 3.9467 - val_loss: 4.0535
Epoch 3/1000
4/4 - 0s - 21ms/step - loss: 3.6868 - val_loss: 3.7858
Epoch 4/1000
4/4 - 0s - 21ms/step - loss: 3.4288 - val_loss: 3.5220
Epoch 5/1000
4/4 - 0s - 22ms/step - loss: 3.1907 - val_loss: 3.2643
Epoch 6/1000
4/4 - 0s - 22ms/step - loss: 2.9552 - val_loss: 3.0160
Epoch 7/1000
4/4 - 0s - 22ms/step - loss: 2.7374 - val_loss: 2.7830
Epoch 8/1000
4/4 - 0s - 22ms/step - loss: 2.5301 - val_loss: 2.5675
Epoch 9/1000
4/4 - 0s - 22ms/step - loss: 2.3387 - val_loss: 2.3756
Epoch 10/1000
4/4 - 0s - 22ms/step - loss: 2.1772 - val_loss: 2.2020
Epoch 11/1000
4/4 - 0s - 22ms/step - loss: 2.0306 - val_loss: 2.0503
Epoch 12/1000
4/4 - 0s - 22ms/step - loss: 1.9009 - val_loss: 1.9167
Epoch 13/1000
4/4 - 0s - 22ms/step - loss: 1.7819 - val_loss: 1.8002
Epoch 14/1000
4/4 - 0s - 22ms/step - loss: 1.6724 - val_loss: 1.6978
Epoch 15/1000
4/4 - 0s - 23ms/step - loss:

<keras.src.callbacks.history.History at 0x7d72605e03d0>

If your code is correct, you should see something similar to the following output:

~~~text
Epoch 1/1000
4/4 - 2s - 612ms/step - loss: 1.9500 - val_loss: 1.7919
Epoch 2/1000
4/4 - 0s - 13ms/step - loss: 1.8188 - val_loss: 1.6893
Epoch 3/1000
4/4 - 0s - 13ms/step - loss: 1.7055 - val_loss: 1.5989
Epoch 4/1000
4/4 - 0s - 12ms/step - loss: 1.6103 - val_loss: 1.5197
Epoch 5/1000
4/4 - 0s - 12ms/step - loss: 1.5213 - val_loss: 1.4477

....................................

Epoch 277/1000
4/4 - 0s - 13ms/step - loss: 0.1340 - val_loss: 0.1158
Epoch 278/1000
4/4 - 0s - 12ms/step - loss: 0.1331 - val_loss: 0.1156
Epoch 279/1000
4/4 - 0s - 12ms/step - loss: 0.1328 - val_loss: 0.1157
Epoch 280/1000
4/4 - 0s - 12ms/step - loss: 0.1323 - val_loss: 0.1171
Epoch 281/1000
4/4 - 0s - 12ms/step - loss: 0.1323 - val_loss: 0.1168
Epoch 282/1000
4/4 - 0s - 12ms/step - loss: 0.1311 - val_loss: 0.1156
Epoch 282: early stopping
Restoring model weights from the end of the best epoch: 277.
<keras.src.callbacks.history.History at 0x7d674e76da80>

~~~

Even though the number of epochs was set to 1000, the training/fitting should have stopped much earlier. For example, on the machine this assigment is being created, the training stopped after only 282 epochs with epoch 277 having the best predictions.

### **Arguments that Control the EarlyStopping Object**

There are a number of parameters (arguments) that are specified to the **EarlyStopping** object.

* **min_delta** This value should be kept small. It simply means the minimum change in error to be registered as an improvement.  Setting it even smaller will not likely have a great deal of impact.
* **patience** How long should the training wait for the validation error to improve?  
* **verbose** How much progress information do you want?
* **mode** In general, always set this to "auto".  This allows you to specify if the error should be minimized or maximized.  Consider accuracy, where higher numbers are desired vs log-loss/RMSE where lower numbers are desired.
* **restore_best_weights** This should always be set to true.  This restores the weights to the values they were at when the validation set is the highest.  Unless you are manually tracking the weights yourself (we do not use this technique in this course), you should have Keras perform this step for you.

As you can see from above, the entire number of requested epochs were not used.  The neural network training stopped once the validation set no longer improved.

### **Exercise 1: Early Stopping with Classification: Heart Disease data**

In the cell below, write the code to read the Heart Failure dataset ("heart_disease.csv") from the course HTTPS server and store the data in a DataFrame called `hdDF`.

You can use this code chunk to read the datafile and create your DataFrame:
~~~text
# Read dataset and create DataFrame
hdDF = pd.read_csv(
    "http://biologicslab.co/BIO1173/data/heart_disease.csv",
    na_values=['NA', '?'])
~~~

For you independent variables (X-values)  **only** use the columns Age, RestingBP, Cholesterol, MaxHR and Oldpeak. You should name the Numpy array holding your X-values `hdX`.

Use the column `HeartDisease` as your dependent variable (Y-values). You will need to One-Hot Encode this column, and use the `dummies.values` as your Y-values, `hfY` as shown in the code chunk below:
~~~text
# Generate Y-values
dummies = pd.get_dummies(hdDF['HeartDisease']) # Classification
DiseaseNames = dummies.columns
hdY = dummies.values  # Y-values
hdY = np.asarray(hdY).astype('float32')
~~~

Use the `train_test_split(hdX, hdY, test_size=0.25, random_state=42)` function to create `hdX_train`, `hdX_test`, `hdY_train` and `hdY_test` datasets.

Build a Sequential neural network called `hdModel` with 2 hidden layers with 50 neurons in the first layer and 25 neurons in the second layer. Use `relu` activation for these two hidden layers. The output layer should use `softmax` activation. Don't forget that the number of neurons in your Output layers needs to be defined by this variable: `hdY.shape[1]`.

Compile your model using `categorical_crossentropy` as the loss function with `adam` as the optimizer.

After your model has been compiled, create an object called `hdMonitor` to provide `EarlyStopping()` with the same arguments are shown in Example 1.

Finally, train your model for 1000 epochs but use your `hdMonitor` to enable early stopping.

In [None]:
# Insert your code for Exercise 1 here

If your code is correct, the training of your `hdModel` neural network should have stopped early, before reaching 40 epochs.
~~~text

Epoch 27/1000
22/22 - 0s - 3ms/step - loss: 0.5966 - val_loss: 0.5157
Epoch 28/1000
22/22 - 0s - 3ms/step - loss: 0.5934 - val_loss: 0.4955
Epoch 29/1000
22/22 - 0s - 3ms/step - loss: 0.5720 - val_loss: 0.5079
Epoch 30/1000
22/22 - 0s - 3ms/step - loss: 0.5388 - val_loss: 0.6093
Epoch 31/1000
22/22 - 0s - 3ms/step - loss: 0.6133 - val_loss: 0.5236
Epoch 32/1000
22/22 - 0s - 3ms/step - loss: 0.5204 - val_loss: 0.6646
Epoch 33/1000
22/22 - 0s - 3ms/step - loss: 0.6642 - val_loss: 0.5518
Epoch 33: early stopping
Restoring model weights from the end of the best epoch: 28.
<keras.src.callbacks.history.History at 0x7d6698297460>
~~~

In the example shown above, the training stopped after 33 epochs. The minimum `val_loss` occurred after epoch 28, and started to increase due to overfitting.  

### Example 2: Compute Accuracy Score

Let's see what effect early stopping might have on the accuracy of the `irisModel`?

The code below illustrates how to compute the accuracy score for the model `irisModel` created in Example 1 using the Keras `model.predict()` function and the `accuracy_score()` function from the `scikit-learn` metrics package. To keep variable names separate between Examples and Exercises, the prefix `iris` has been added to different variables that are generated.

In [4]:
# Example 2: Compute accuracy score

from sklearn.metrics import accuracy_score

irisPred = irisModel.predict(irisX_test)
irisPredict_classes = np.argmax(irisPred,axis=1)
irisExpected_classes = np.argmax(irisY_test,axis=1)
irisCorrect = accuracy_score(irisExpected_classes,irisPredict_classes)
print(f"Accuracy: {irisCorrect}")

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 194ms/step
Accuracy: 0.9736842105263158


If your code is correct you should see something similar to the following output:
~~~text
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 189ms/step
Accuracy: 1.0
~~~
WOW! Perfect accuracy!

### **Exercise 2: Compute Accuracy Score**

In the cell below, compute the accuracy score for your `hdModel` and print out the results. Add the prefix `hd` to your variables to keep them from interferring with the variables generated in Example 2 above.

In [None]:
# Insert your code for Exercise 2 here

If your code is correct you should see something similar to the following output:
~~~text
8/8 [==============================] - 0s 3ms/step
Accuracy: 0.7347826086956522
~~~

According to the output above, your `apModel` is only about 75% accurate when it comes to predicting apple quality. Apparently, it's a harder to predict an apple's `Quality` with a classification neural network than to identify the `species` name of an Iris flower.

# **Drop Out for Keras to Decrease Overfitting**

Hinton, Srivastava, Krizhevsky, Sutskever, & Salakhutdinov (2012) introduced the **_dropout regularization_** algorithm. [[Cite:srivastava2014dropout]](http://www.jmlr.org/papers/volume15/nandan14a/nandan14a.pdf)

Although dropout works differently than L1 and L2, it accomplishes the same goal—the prevention of overfitting. However, the algorithm does the task by actually _removing_ neurons and connections—at least temporarily. Unlike L1 and L2, no weight penalty is added. Dropout does not directly seek to train small weights.

Dropout works by causing hidden neurons of the neural network to be unavailable during part of the training. Dropping part of the neural network causes the remaining portion to be trained to still achieve a good score even without the dropped neurons. This technique decreases co-adaptation between neurons, which results in less overfitting.

Most neural network frameworks implement dropout as a separate layer. Dropout layers function like a regular, densely connected neural network layer. The only difference is that the dropout layers will periodically drop some of their neurons during training. You can use dropout layers on regular feedforward neural networks.

A _program_ can implement a dropout layer as a dense layer that can eliminate some of its neurons. Contrary to popular belief about the dropout layer, such a program does not permanently remove these discarded neurons. In other words, a dropout layer does **_not_** lose any of its neurons during the training process, and it will still have the same number of neurons after training. In this way, the program only _temporarily masks_ the neurons rather than dropping them.

Figure 5.DROPOUT shows how a dropout layer might be situated with other layers.

**Figure 5.DROPOUT: Dropout Regularization**
![Dropout Regularization](https://biologicslab.co/BIO1173/images/class_9_dropout.png "Dropout Regularization")

The discarded neurons and their connections are shown as dashed lines. The input layer has two input neurons as well as a bias neuron. The second layer is a dense layer with three neurons and a bias neuron. The third layer is a dropout layer with six regular neurons even though the program has dropped 50% of them.

While the program drops these neurons, it neither calculates nor trains them. However, the final neural network will use _all_ of these neurons for the output. As previously mentioned, the program only temporarily discards the neurons.

The program chooses different sets of neurons from the dropout layer during subsequent training iterations. Although we chose a probability of 50% for dropout, the computer will not necessarily drop three neurons. It is as if we flipped a coin for each of the dropout candidate neurons to choose if that neuron was dropped out. You must know that the program should never drop the bias neuron. Only the regular neurons on a dropout layer are candidates.

The implementation of the training algorithm influences the process of discarding neurons. The dropout set frequently changes once per training iteration or batch. The program can also provide intervals where all neurons are present. Some neural network frameworks give additional hyper-parameters to allow you to specify exactly the rate of this interval.

## Why does Dropout work?

Why dropout is capable of decreasing overfitting is a common question. The answer is that dropout can reduce the chance of **_codependency_** developing between two neurons. Two neurons that develop codependency will not be able to operate effectively when one is dropped out. As a result, the neural network can no longer rely on the presence of every neuron, and it trains accordingly. This characteristic decreases its ability to memorize the information presented, thereby forcing generalization.

Dropout also decreases overfitting by **_forcing a bootstrapping process_** upon the neural network. Bootstrapping is a prevalent ensemble technique. Ensembling is a technique of machine learning that combines multiple models to produce a better result than those achieved by individual models. The ensemble is a term that originates from the musical ensembles in which the final music product that the audience hears is the combination of many instruments.  

**_Bootstrapping_** is one of the most simple ensemble techniques. The bootstrapping programmer simply trains several neural networks to perform precisely the same task. However, each neural network will perform differently because of some training techniques and the random numbers used in the neural network weight initialization. The difference in weights causes the performance variance. The output from this ensemble of neural networks becomes the average output of the members taken together. This process decreases overfitting through the consensus of differently trained neural networks.  

Dropout works somewhat like bootstrapping. You might think of each neural network that results from a different set of neurons being dropped out as an individual member in an ensemble. As training progresses, the program creates more neural networks in this way. However, dropout does not require the same amount of processing as bootstrapping. The new neural networks created are temporary; they exist only for a training iteration. The final result is also a single neural network rather than an ensemble of neural networks to be averaged together.

This short YouTube video shows how dropout works: [Dropout tutorial](https://youtu.be/NhZVe50QwPM?si=Zr-6qrPdE9YXTj3Q)

### Example 3: Dropout for Keras

For Example 3 we will create a neural network with 3 hidden layers and 2 dropout layers to demonstrate how to implement dropout layers in classification neural network. The dataset for Example 1 is the [Body Performance Dataset](https://www.kaggle.com/datasets/kukuroo3/body-performance-data) that we have used in previous lessons.

This dataset has the following 12 categories:

* **age:** 20 ~64
* **gender:** M,F
* **height_cm:** (If you want to convert to feet, divide by 30.48)
* **weight_kg:**
* **body fat_%:**
* **diastolic:** diastolic blood pressure (min)
* **systolic:** systolic blood pressure (min)
* **gripForce:**
* **sit and bend forward_cm:**
* **sit-ups counts:**
* **broad jump_cm:**
* **class:** A,B,C,D ( A: best) / stratified

To help follow the code examples, Example 1 has been divided into 3 sections, A`,`B` and `C`.

### Example 3A: Create feature vector

The code in the cell below reads the Body Performance dataset, `bodyPerformance.csv` from the course HTTPS server and creates a new DataFrame called `bpBigDF`. To speedup training, only 30% of `bpBigDF` we used to create the DataFrame for this example, `bpDF`.

The column `gender` is mapped (`M`=`0`,`F`=`1`). While only the columns `age`, `height_cm`, `weight_kg`, `diastolic`, `systolic` and `gripForce` are standardized, all of the columns, except `class` are used for creating the x-value variable. Since we will are building a classification neural network, column `classes` is One-Hot encoded to gererate the y-values.

As always, both the x-values and the y-values must be converted to type `float32` to avoid errors during training.

In [5]:
# Example 3A: Create feature vector

from scipy.stats import zscore

# Read the data set
bpBigDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/bodyPerformance.csv",
    na_values=['NA','?'])

# Only use 30% for neural network
bpDF=bpBigDF.sample(frac=0.30, random_state=2)

# Map Gender
mapping =  {'M': 0,
            'F': 1}
bpDF['gender'] = bpDF['gender'].map(mapping)

# Standardize ranges
bpDF['age'] = zscore(bpDF['age'])
bpDF['height_cm'] = zscore(bpDF['height_cm'])
bpDF['weight_kg'] = zscore(bpDF['weight_kg'])
bpDF['diastolic'] = zscore(bpDF['diastolic'])
bpDF['systolic'] = zscore(bpDF['systolic'])
bpDF['gripForce'] = zscore(bpDF['gripForce'])

# Generate list of columns for x
bpX_columns = bpDF.columns.drop('class')  # `class` is y-value

# Generate x-values as numpy array
bpX = bpDF[bpX_columns].values

# Convert x-values to float 32
bpX = np.asarray(bpX).astype('float32')

# One-Hot encode column containing y-values
dummies = pd.get_dummies(bpDF['class']) # Classification
BpCategories = dummies.columns
bpY = dummies.values

# Convert y-values to float 32
bpY = np.asarray(bpY).astype('float32')

# Print y categorical names
print(*BpCategories)

A B C D


If your code is correct you should see the 4 different fitness levels in your output:
~~~text
A B C D
~~~

### Example 3B: Keras with dropout for Classification

The code in the cell below creates a sequential neural network with 3 hidden layers of densely connected neurons. A dropout layer is added after the first and second hidden layers, but _not_ after the last hidden (3rd) layer. A dropout layer is not usually added after the last hidden layer.

Since this neural network is designed for classification, the output layer has 4 neurons -- one neuron for each fitness class. The number of output neurons is set by the argument `bpY.shape[1]` as demonstrated in the following code chunk:
~~~text
model.add(Dense(bpY.shape[1],activation='softmax')) # Output layer
~~~
The code is setup to take advantage of **_K_-fold Cross Validation**. The number of _K_-folds to be employed is set by the variable `numK=5`. During each _K_ turn of the `for loop`, a new neural network is created and trained for the number of epochs specified in the variable `EPOCHS=100`.  

In [7]:
print(x_train)

[[-6.3892961e-01  0.0000000e+00  1.3777179e+00 ...  1.6400000e+01
   4.5000000e+01  2.1400000e+02]
 [-1.0046929e+00  1.0000000e+00 -2.7060202e-01 ...  2.5200001e+01
   4.0000000e+01  1.9000000e+02]
 [-7.1208227e-01  1.0000000e+00 -1.8953745e+00 ...  2.3200001e+01
   1.6000000e+01  1.3700000e+02]
 ...
 [ 1.4824973e+00  0.0000000e+00 -4.6901457e-02 ...  5.8000002e+00
   3.5000000e+01  1.9700000e+02]
 [-6.3892961e-01  0.0000000e+00  5.1823682e-01 ...  2.0600000e+01
   5.0000000e+01  2.2900000e+02]
 [-2.7316636e-01  0.0000000e+00  1.0598276e+00 ...  1.3100000e+01
   4.7000000e+01  2.1300000e+02]]


In [8]:
# Example 3B: Keras with dropout for Classification L1

import pandas as pd
import os
import numpy as np
import time
from sklearn import metrics
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout,Input
from tensorflow.keras import regularizers

# Set variables
EPOCHS=100 # number of epochs for each loop
numK=5     # number of K-folds

# Consistent tensor shapes
input_shape = bpX.shape[1]
output_shape = bpY.shape[1] # classification

# Record start time
start_time = time.time()

# Cross-validate
kf = KFold(numK, shuffle=True, random_state=42)

# Initialize arrays and counter
oos_y = []
oos_pred = []
fold = 0

# Start loop ------------------------------------------#

for train, test in kf.split(bpX):
    fold+=1
    print(f"Starting Fold #{fold}...")

    # Split data for this fold
    x_train = bpX[train]
    y_train = bpY[train]
    x_test = bpX[test]
    y_test = bpY[test]

    # To use L2 regularization, substitute
    # kernel_regularizer=regularizers.l2(0.01)

    # Create new model for this K fold
    model = Sequential()
    model.add(Input(shape=(input_shape,)))  # Input
    model.add(Dense(50, activation='relu')) # Hidden 1
    model.add(Dropout(0.5))  # Add dropout after Hidden 1
    model.add(Dense(25, activation='relu', \
                activity_regularizer=regularizers.l1(1e-4))) # Hidden 2
    model.add(Dropout(0.5))  # Add dropout after Hidden 2
    model.add(Dense(10, activation='relu', \
                activity_regularizer=regularizers.l1(1e-4))) # Hidden 3
    # Usually don't add dropout after last hidden layer
    model.add(Dense(output_shape,activation='softmax')) # Output layer

    # Compile model for classification
    model.compile(loss='categorical_crossentropy', optimizer='adam')

    # Fit model
    model.fit(x_train,y_train,validation_data=(x_test,y_test),\
              verbose=0,epochs=EPOCHS)

    # Use model to make predictions
    pred = model.predict(x_test)
    # Add actual y-values for the data used this fold
    oos_y.append(y_test)
    # raw probabilities to chosen class (highest probability)
    pred = np.argmax(pred,axis=1)
    oos_pred.append(pred)

    # Measure this fold's accuracy
    y_compare = np.argmax(y_test,axis=1) # For accuracy calculation
    score = metrics.accuracy_score(y_compare, pred)
    print(f"Fold score (accuracy): {score}")

# End loop ---------------------------------------------#

# Final Processing

# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y)
oos_pred = np.concatenate(oos_pred)
oos_y_compare = np.argmax(oos_y,axis=1) # For accuracy calculation
score = metrics.accuracy_score(oos_y_compare, oos_pred)
print(f"Final score (accuracy): {score}")

# Write the cross-validated prediction
oos_y = pd.DataFrame(oos_y)
oos_pred = pd.DataFrame(oos_pred)
oosDF = pd.concat( [oos_pred, oos_y],axis=1 )

# Print elapsed time
elapsed_time = time.time() - start_time
print("Elapsed time: {}".format(hms_string(elapsed_time)))

Starting Fold #1...
[1m26/26[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step
Fold score (accuracy): 0.5696517412935324
Starting Fold #2...
[1m26/26[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step
Fold score (accuracy): 0.5435323383084577
Starting Fold #3...
[1m26/26[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step
Fold score (accuracy): 0.5572139303482587
Starting Fold #4...
[1m26/26[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 15ms/step
Fold score (accuracy): 0.5853051058530511
Starting Fold #5...
[1m26/26[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step
Fold score (accuracy): 0.572851805728518
Final score (accuracy): 0.5657043305126929
Elapsed time: 0:02:55.71


This code will run pretty slowly unless you have GPU. On my Windows machine with an NVIDIA graphics card, it took about 4 minutes to complete 5 folds with 100 epochs per fold.

If your code is correct, you should see something similar to the following output:
~~~text
Starting Fold #1...
26/26 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.5883084577114428
Starting Fold #2...
26/26 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.5808457711442786
Starting Fold #3...
26/26 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.5833333333333334
Starting Fold #4...
26/26 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.5815691158156912
Starting Fold #5...
26/26 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.6239103362391034
Final score (accuracy): 0.5915878546540567
Elapsed time: 0:04:20.22
~~~


### Example 3C: Print out actual and predicted y-values

The code in the cell below prints out the predicted and the actual Body Performance classes for the out-of-sample (oos) individuals. The function `new_column_mapping()` created at the start of this lesson is used make the column labels more informative.

In [9]:
# Example 3C: Print out actual and predicted y-values

# Rename columns
new_column_mapping = {0: 'Predicted Fitness Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)

# Set display options
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

# Display DataFrame
display(oosDF)

Unnamed: 0,Predicted Fitness Class,Actual: 0,1,2,3
0,1,0.0,0.0,1.0,0.0
1,0,1.0,0.0,0.0,0.0
2,2,0.0,0.0,0.0,1.0
3,0,0.0,1.0,0.0,0.0
...,...,...,...,...,...
4014,0,0.0,0.0,1.0,0.0
4015,2,0.0,0.0,0.0,1.0
4016,3,0.0,0.0,0.0,1.0
4017,2,0.0,0.0,1.0,0.0


If your code is correct, you should see something similar to the following table:

![___](https://biologicslab.co/BIO1173/images/class_05_4_Exm1C.png)

For the 8 predictions shown in the above table, 4 were correct and 4 were incorrect. That error rate is what you might expect with a final accuracy score of around 60%.

## **Exercise 3: Dropout for Keras**

For **Exercise 3** you are to create a classification neural network with 3 hidden layers and 2 dropout layers. The dataset you should use is the [Obesity Prediction Dataset](https://www.kaggle.com/datasets/mrsimple07/obesity-prediction) that we have seen in previous lessons.

The 7 categories of obesity/demographics measurements are:

* **Age:** The age of the individual, expressed in years (Mean=49.9 yrs +/- 18.1)
* **Gender:** The gender of the individual coded Male and Female
* **Height:** The height of the individual measured in centimeters (Mean=170 cm +/- 10.3)
* **Weight:** The weight of the individual measured in kilograms (Mean=71.2 kg +/- 15.5)
* **BMI:** Body mass index, a calculated metric derived from the individual's weight and height (Mean=24.9 +/- 6.19)
* **PhysicalActivityLevel:** This variable quantifies the individual's level of physical activity (Mean=2.53 +/- 1.12)
* **ObesityCategory:** Categorization of individuals based on their BMI into different obesity categories. The Obesity Categories are: `Underweight`, `Normal weight`, `Overweight`, and `Obese`.

You are to construct a neural network that can predict the correct `ObesityCategory`.

To help you in your coding, **Exercise 1** has been divided into 3 sections, A`, `B` and `C`.

### **Exercise 3A: Create feature vector**

In the cell below, write the code to read the Obesity Prediction dataset `obesity_prediction.csv` and create a new DataFrame called `opDF`. Since this dataset isn't too large, use the **entire** dataset to create `opDF` (i.e. don't sample it).

You will need to map the column `Gender` to convert the strings `Male` and `Female` to integers. Also standardize the columns `Age`, `Height`, `Weight` and `BMI` to their Zscores.

Since the column, `ObesityCategory`, will be the y-value for training your neural network, make sure to drop it when generating your list of column names for generatung x-values (`opX_columns`). You can use the following code chunk to create your x-values:
~~~text
# Generate x-values as numpy array
opX = opDF[opX_columns].values
~~~
Since you are building a neural network for classification, you will need to One-Hot encode the `ObesityCategory` column. Use the `dummies.values`, created as part of your One-Hot encoding, as your y-value, `opY`. You should also save the `dummies.columns` in a variable called `ObCategories`.

Don't forget to convert all your x-values and y-values to `float32` or you will get an error message when you try to train your neural network.

Finally, print out the categorical values (names) that were One-Hot encoded using the "starred" print statement:
~~~text
# Print y categorical names
print(*ObCategories)
~~~

In [10]:
# Insert your code for Exercise 3A here

from scipy.stats import zscore

# Read the data set
opDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/obesity_prediction.csv",
    na_values=['NA','?'])

# Map Gender
mapping =  {'Male': 0,
            'Female': 1}
opDF['Gender'] = opDF['Gender'].map(mapping)

# Standardize ranges
opDF['Age'] = zscore(opDF['Age'])
opDF['Height'] = zscore(opDF['Height'])
opDF['Weight'] = zscore(opDF['Weight'])
opDF['BMI'] = zscore(opDF['BMI'])

# Generate list of columns for x
opX_columns = opDF.columns.drop('ObesityCategory')  #

# Generate x-values as numpy array
opX = opDF[opX_columns].values
opX = np.asarray(opX).astype('float32')

# One-Hot encode the column containing the y-values
dummies = pd.get_dummies(opDF['ObesityCategory']) # Classification
ObCategories = dummies.columns
opY = dummies.values
opY = np.asarray(opY).astype('float32')

# Print y categorical names
print(*ObCategories)

Normal weight Obese Overweight Underweight


If your code is correct, you should see the names of the 5 obesity categories:
~~~text
Normal weight Obese Overweight Underweight
~~~

Exercise 3B: Keras with dropout for Classification

In the cell below, create a sequential neural network with 3 hidden layers of densely connected neurons. Add a dropout layer after the first and second hidden layers, but not after the 3rd hidden layer.

Since this neural network is designed for classification, make sure your output layer has 5 neurons -- one neuron for each Obesity Category. You can set the correct number of output neurons using the following code chunk:

model.add(Dense(opY.shape[1],activation='softmax')) # Output layer

Setup your code to take advantage of K-fold Cross Validation. Set the number of K-folds to 5 and the number of epochs to 100 for each time through the loop.

In [11]:
# Insert your code for Exercise 3B here

import pandas as pd
import os
import numpy as np
import time
from sklearn import metrics
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Input
from tensorflow.keras import regularizers

# Set variables
EPOCHS=100 # number of epochs for each loop
numK=5     # Set number of K-folds

# Consistent tensor shapes
input_shape = opX.shape[1]
output_shape = opY.shape[1] # classification

# Record start time
start_time = time.time()

# Cross-validate
kf = KFold(numK, shuffle=True, random_state=42)

# Initialize arrays and counter
oos_y = []
oos_pred = []
fold = 0

# Start loop ------------------------------------------#

for train, test in kf.split(opX):
    fold+=1
    print(f"Starting Fold #{fold}...")

    # Split data for this fold
    x_train = opX[train]
    y_train = opY[train]
    x_test = opX[test]
    y_test = opY[test]

    # Create new model for this fold
    model = Sequential()
    model.add(Input(shape=(input_shape,)))  # Input
    model.add(Dense(50, activation='relu')) # Hidden 1
    model.add(Dropout(0.5))  # Add dropout after Hidden 1
    model.add(Dense(25, activation='relu', \
                activity_regularizer=regularizers.l1(1e-4))) # Hidden 2
    model.add(Dropout(0.5))  # Add dropout after Hidden 2
    model.add(Dense(10, activation='relu', \
                activity_regularizer=regularizers.l1(1e-4))) # Hidden 3
    # Usually don't add dropout after last hidden layer
    model.add(Dense(output_shape,activation='softmax')) # Output

    # Compile model for classification
    model.compile(loss='categorical_crossentropy', optimizer='adam')

    # Fit model
    model.fit(x_train,y_train,validation_data=(x_test,y_test),\
              verbose=0,epochs=EPOCHS)

    # Use model to make predictions
    pred = model.predict(x_test)
     # Add actual y-values for the data used this fold
    oos_y.append(y_test)
    # raw probabilities to chosen class (highest probability)
    pred = np.argmax(pred,axis=1)
    oos_pred.append(pred)

    # Measure this fold's accuracy
    y_compare = np.argmax(y_test,axis=1) # For accuracy calculation
    score = metrics.accuracy_score(y_compare, pred)
    print(f"Fold score (accuracy): {score}")

# End loop ---------------------------------------------#

# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y)
oos_pred = np.concatenate(oos_pred)
oos_y_compare = np.argmax(oos_y,axis=1) # For accuracy calculation

score = metrics.accuracy_score(oos_y_compare, oos_pred)
print(f"Final score (accuracy): {score}")

# Write the cross-validated prediction
oos_y = pd.DataFrame(oos_y)
oos_pred = pd.DataFrame(oos_pred)
oosDF = pd.concat( [oos_pred, oos_y],axis=1 )

# Print elapsed time
elapsed_time = time.time() - start_time
print("Elapsed time: {}".format(hms_string(elapsed_time)))


Starting Fold #1...
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 67ms/step
Fold score (accuracy): 0.975
Starting Fold #2...




[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 56ms/step
Fold score (accuracy): 0.955
Starting Fold #3...




[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 55ms/step
Fold score (accuracy): 0.955
Starting Fold #4...
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 53ms/step
Fold score (accuracy): 0.965
Starting Fold #5...
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 54ms/step
Fold score (accuracy): 0.965
Final score (accuracy): 0.963
Elapsed time: 0:01:27.25


Training of your neural network should run faster that the one in Example 1.

If your code is correct, you should see the following output:
~~~text
Starting Fold #1...
7/7 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.95
Starting Fold #2...
7/7 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.97
Starting Fold #3...
7/7 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.935
Starting Fold #4...
7/7 [==============================] - 0s 2ms/step
Fold score (accuracy): 0.965
Starting Fold #5...
7/7 [==============================] - 0s 1ms/step
Fold score (accuracy): 0.91
Final score (accuracy): 0.946
Elapsed time: 0:01:30.00
~~~

The final accuracy score for your neural network is about 95%, which is very good.

### **Exercise 3C: Print out actual and predicted y-values**

In the cell below, print out the predicted and the actual Obesity Catagories for the out-of-sample (oos) individuals.

Use the following code chunk to change the column names to make them easier to interpret.
~~~text
# Rename columns
new_column_mapping = {0: 'Predicted Obesity Category', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)
~~~


In [12]:
# Example 3C: Print out actual and predicted y-values

# Rename columns
new_column_mapping = {0: 'Predicted Obesity Category', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)

# Set display options
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

# Display DataFrame
display(oosDF)

Unnamed: 0,Predicted Obesity Category,Actual: 0,1,2,3
0,2,0.0,0.0,1.0,0.0
1,0,1.0,0.0,0.0,0.0
2,3,0.0,0.0,0.0,1.0
3,2,0.0,0.0,1.0,0.0
...,...,...,...,...,...
996,0,1.0,0.0,0.0,0.0
997,1,0.0,1.0,0.0,0.0
998,2,0.0,0.0,1.0,0.0
999,1,0.0,1.0,0.0,0.0


If your code is correct, you should see something similar to the following table:

![___](https://biologicslab.co/BIO1173/images/class_05_4_Exe1C.png)

All 8 predictions shown in the above table are correct. Almost perfect predictions is what you might expect with a final accuracy score of around 95%.

## **Lesson Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your Colab notebook. Save your PDF as `Class_02_3.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

## **Lizard Tail**


![___](https://upload.wikimedia.org/wikipedia/commons/thumb/9/98/Apple_II_typical_configuration_1977.png/2560px-Apple_II_typical_configuration_1977.png)

## **Apple II (original)**

The **Apple II** (stylized as apple ][) is a personal computer released by Apple Inc. in June 1977. It was one of the first successful mass-produced microcomputer products and is widely regarded as one of the most important personal computers of all time due to its role in popularizing home computing and influencing later software development.

The Apple II was designed primarily by Steve Wozniak. The system is based around the 8-bit MOS Technology 6502 microprocessor. Jerry Manock designed the foam-molded plastic case, Rod Holt developed the switching power supply, while Steve Jobs was not involved in the design of the computer. It was introduced by Jobs and Wozniak at the 1977 West Coast Computer Faire, and marks Apple's first launch of a computer aimed at a consumer market—branded toward American households rather than businessmen or computer hobbyists.


The three computers that Byte magazine referred to as the "1977 Trinity" of home computing: Commodore PET 2001, Apple II, and TRS-80 Model I
Byte magazine referred to the Apple II, Commodore PET 2001, and TRS-80 as the "1977 Trinity". As the Apple II had the defining feature of being able to display color graphics, the Apple logo was redesigned to have a spectrum of colors.

The Apple II was the first in a series of computers collectively referred to by the Apple II name. It was followed by the Apple II+, Apple IIe, Apple IIc, Apple IIc Plus, and the 16-bit Apple IIGS—all of which remained compatible. Production of the last available model, the Apple IIe, ceased in November 1993.

**History**

By 1976, Steve Jobs had convinced product designer Jerry Manock (who had formerly worked at Hewlett Packard designing calculators) to create the "shell" for the Apple II—a smooth case inspired by kitchen appliances that concealed the internal mechanics. The earliest Apple II computers were assembled in Silicon Valley and later in Texas; printed circuit boards were manufactured in Ireland and Singapore. The first computers went on sale on June 10, 1977 with an MOS Technology 6502 microprocessor running at 1.023 MHz (2⁄7 of the NTSC color subcarrier), two game paddles (bundled until 1980, when they were found to violate FCC regulations), 4 KiB of RAM, an audio cassette interface for loading programs and storing data, and the Integer BASIC programming language built into ROMs. The video controller displayed 24 lines by 40 columns of monochrome, uppercase-only text on the screen (the original character set matches ASCII characters 20h to 5Fh), with NTSC composite video output suitable for display on a video monitor or on a regular TV set (by way of a separate RF modulator).

The original retail price of the computer with 4 KiB of RAM was US \$1,298 (equivalent to \$6,530 in 2023) and with the maximum 48 KiB of RAM, it was US \$2,638 (equivalent to \$13,260 in 2023) To reflect the computer's color graphics capability, the Apple logo on the casing has rainbow stripes, which remained a part of Apple's corporate logo until early 1998. Perhaps most significantly, the Apple II was a catalyst for personal computers across many industries; it opened the doors to software marketed at consumers.

Certain aspects of the system's design were influenced by Atari, Inc.'s arcade video game Breakout (1976), which was designed by Wozniak, who said: "A lot of features of the Apple II went in because I had designed Breakout for Atari. I had designed it in hardware. I wanted to write it in software now". This included his design of color graphics circuitry, the addition of game paddle support and sound, and graphics commands in Integer BASIC, with which he wrote Brick Out, a software clone of his own hardware game. Wozniak said in 1984: "Basically, all the game features were put in just so I could show off the game I was familiar with—Breakout—at the Homebrew Computer Club. It was the most satisfying day of my life I demonstrated Breakout—totally written in BASIC. It seemed like a huge step to me. After designing hardware arcade games, I knew that being able to program them in BASIC was going to change the world."