<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/main/Class_03_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

**Module 3: Introduction to TensorFlow**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 3 Material


* Part 3.1: Deep Learning and Neural Network Introduction
* Part 3.2: Using Keras to Build Regression Models
* Part 3.3: Using Keras to Build Classification Models
* Part 3.4: Saving and Loading a Keras Neural Network
* **Part 3.5: Early Stopping in Keras to Prevent Overfitting**


## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to /content/drive and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [None]:
# You must run this cell first
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: Using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("**WARNING**: Your GMAIL address was **not** printed in the output below.")
    print("**WARNING**: You will NOT receive credit for this lesson.")
    COLAB = False

Make sure your GMAIL address is included as the last line in the output above.

# **Early Stopping in Keras to Prevent Overfitting**

It can be difficult to determine how many epochs to cycle through to train a neural network. **_Overfitting_** will occur if you train the neural network for too many epochs, and the neural network will not perform well on new data, despite attaining a good accuracy on the training set. Overfitting occurs when a neural network is trained to the point that it begins to memorize rather than generalize, as demonstrated in Figure 3.OVER.

**Training vs. Validation Error for Overfitting**
![Training vs. Validation Error for Overfitting](https://biologicslab.co/BIO1173/images/class_3_training_val.png "Training vs. Validation Error for Overfitting")

It is important to segment the original dataset into several datasets:

* **Training Set**
* **Validation Set**
* **Holdout Set**

You can construct these sets in several different ways. The following programs demonstrate some of these.

The first method is a training and validation set. We use the training data to train the neural network until the validation set no longer improves. This attempts to stop at a near-optimal training point. This method will only give accurate "out of sample" predictions for the validation set; this is usually 20% of the data. The predictions for the training data will be overly optimistic, as these were the data that we used to train the neural network. Figure 3.VAL demonstrates how we divide the dataset.

**Training with a Validation Set**
![Training with a Validation Set](https://biologicslab.co/BIO1173/images/class_1_train_val.png "Training with a Validation Set")

## **Early Stopping**

We will now see an example of classification training with early stopping. We will train a neural network until the error no longer improves on the validation set.

### Example 1: Early Stopping with Classification

The code in the cell below builds and trains a **_classification_** neural network called `irisModel`. The model is trained/fitted to the Iris flower dataset (`iris.csv`) downloaded from the course HTTPS server and stored in the DataFrame `irisDF`.

The independent variables, or X-values, are the values in the columns `sepal_length`, `sepal_width`, `petal_length` and `petal_width`. The values are stored in a Numpy array called `irisX`.

The dependent variable (Y-value) is the column `species`, which contains the names of the three Iris species in the dataset: _Iris setosa_, _Iris versicolor_ and _Iris virginica_.

Since the species names are entered in the `species` column as strings, it is necessary to use One-Hot Encoding to convert these strings into the values `0` and `1` using the command `pd.get_dummies()`. The variable holding the dependent values, `irisY`, is created from the `dummies.values` as shown below.
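To make the idea of One-Hot Encoding concrete, here is a minimal plain-Python sketch. It only illustrates the concept behind `pd.get_dummies()` and is not how pandas implements it; the helper `one_hot` and the four-item `species` list are hypothetical and used for illustration only.

```python
# Plain-Python illustration of one-hot encoding (not the pandas implementation).

def one_hot(labels):
    """Return (sorted class names, one row of 0.0/1.0 values per label)."""
    classes = sorted(set(labels))
    rows = [[1.0 if label == cls else 0.0 for cls in classes]
            for label in labels]
    return classes, rows

# Hypothetical mini-sample of species names
species = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]

classes, encoded = one_hot(species)
print(classes)
for name, row in zip(species, encoded):
    print(name, row)
```

Each row contains exactly one `1.0`, in the column that corresponds to that flower's species. This is why the output layer of a classification network needs one neuron per class.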

#### **Early Stopping**

In order to implement _Early Stopping_, it is first necessary to split the dataset into 4 separate groups: X train, X test, Y train and Y test using the function `train_test_split()`. The argument `test_size=0.25` tells the function that 75% of the data should be put into the two train sets (i.e. `irisX_train` and `irisY_train`) and the remaining 25% should be put into the two validation sets `irisX_test` and `irisY_test`.

Since the separation of data into training and test sets is a random process, the argument `random_state=42` is used for teaching/demonstration purposes to ensure that the split occurs at the same places when the code is re-run. In normal use, you wouldn't set the random seed.
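The effect of a fixed random seed can be sketched in plain Python. The helper `split_rows` below is hypothetical and only mimics the idea of a shuffled 75%/25% split; scikit-learn's `train_test_split()` uses its own shuffling, so the rows it selects will differ from the ones chosen here.

```python
# Illustrative sketch of a seeded train/validation split (not scikit-learn's code).
import math
import random

def split_rows(rows, test_size=0.25, seed=None):
    """Shuffle a copy of rows and split it into (train, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)          # same seed -> same order
    n_test = math.ceil(len(shuffled) * test_size)  # size of the validation set
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(150))  # 150 row indices, the size of the classic Iris dataset

train_a, val_a = split_rows(rows, test_size=0.25, seed=42)
train_b, val_b = split_rows(rows, test_size=0.25, seed=42)

print(len(train_a), len(val_a))  # 112 38
print(val_a == val_b)            # True: the same seed reproduces the same split
```

Calling `split_rows()` with `seed=None` would give a different split on every run, which is the normal behavior outside of a teaching setting.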

The model, `irisModel`, is a densely connected sequential neural network. As written in the code below, it has an input layer followed by a single hidden layer with 25 neurons that uses the `relu` activation function. Since the function of this model is classification, the `softmax` activation function is used in the output layer. The model is compiled with the `categorical_crossentropy` loss function and the `adam` optimizer.

The code that creates the early stopping monitor, `irisMonitor`, is shown below:

~~~text
# Build monitor for early stopping
irisMonitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5,
        verbose=1, mode='auto', restore_best_weights=True)
~~~
The meaning/function of the different arguments will be discussed below.

Finally, the model is fitted to the Iris data with the number of epochs set to **1000!** Don't worry, you won't have to wait forever for the training to complete--thanks to early stopping.

In [None]:
# Example 1: Early stopping with classification

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split

# Read dataset and create DataFrame-------------------------------------
irisDF = pd.read_csv("https://biologicslab.co/BIO1173/data/iris.csv",
    na_values=['NA', '?'])

# Create feature vector-------------------------------------------------
# Generate X-values
irisX = irisDF[['sepal_length', 'sepal_width',
                'petal_length', 'petal_width']].values
irisX = np.asarray(irisX).astype('float32')

# Generate Y-values
dummies = pd.get_dummies(irisDF['species']) # Classification
SpeciesNames = dummies.columns  # List with species names
irisY = dummies.values # Y-values
irisY = np.asarray(irisY).astype('float32')

# Split into validation and training sets------------------------------
irisX_train, irisX_test, irisY_train, irisY_test = train_test_split(
    irisX, irisY, test_size=0.25, random_state=42)

# Build neural network-------------------------------------------------
irisModel = Sequential()
irisModel.add(Input(shape=(irisX.shape[1],)))  # Input layer
irisModel.add(Dense(25, activation='relu'))  # Hidden 1
irisModel.add(Dense(irisY.shape[1],activation='softmax'))
irisModel.compile(loss='categorical_crossentropy', optimizer='adam')

# Build monitor for early stopping-------------------------------------
irisMonitor = EarlyStopping(monitor='val_loss',
              min_delta=1e-3, patience=5, verbose=1,
              mode='auto', restore_best_weights=True)

# Train model----------------------------------------------------------
irisModel.fit(irisX_train,irisY_train,validation_data=(irisX_test,irisY_test),
        callbacks=[irisMonitor],verbose=2,epochs=1000)


If your code is correct, you should see something similar to the following output:

~~~text
Epoch 1/1000
4/4 - 2s - 612ms/step - loss: 1.9500 - val_loss: 1.7919
Epoch 2/1000
4/4 - 0s - 13ms/step - loss: 1.8188 - val_loss: 1.6893
Epoch 3/1000
4/4 - 0s - 13ms/step - loss: 1.7055 - val_loss: 1.5989
Epoch 4/1000
4/4 - 0s - 12ms/step - loss: 1.6103 - val_loss: 1.5197
Epoch 5/1000
4/4 - 0s - 12ms/step - loss: 1.5213 - val_loss: 1.4477

....................................

Epoch 277/1000
4/4 - 0s - 13ms/step - loss: 0.1340 - val_loss: 0.1158
Epoch 278/1000
4/4 - 0s - 12ms/step - loss: 0.1331 - val_loss: 0.1156
Epoch 279/1000
4/4 - 0s - 12ms/step - loss: 0.1328 - val_loss: 0.1157
Epoch 280/1000
4/4 - 0s - 12ms/step - loss: 0.1323 - val_loss: 0.1171
Epoch 281/1000
4/4 - 0s - 12ms/step - loss: 0.1323 - val_loss: 0.1168
Epoch 282/1000
4/4 - 0s - 12ms/step - loss: 0.1311 - val_loss: 0.1156
Epoch 282: early stopping
Restoring model weights from the end of the best epoch: 277.
<keras.src.callbacks.history.History at 0x7d674e76da80>

~~~

Even though the number of epochs was set to 1000, the training/fitting should have stopped much earlier. For example, on the machine where this assignment was created, the training stopped after only 282 epochs, and the weights from epoch 277 (the best epoch) were restored.

### **Arguments that Control the EarlyStopping Object**

There are a number of parameters (arguments) that are specified to the **EarlyStopping** object.

* **min_delta** This value should be kept small. It is the minimum change in error that counts as an improvement. Setting it even smaller will not likely have a great deal of impact.
* **patience** The number of epochs that training should wait for the validation error to improve before stopping.
* **verbose** How much progress information do you want?
* **mode** This specifies whether the monitored value should be minimized or maximized. Consider accuracy, where higher numbers are desired, vs. log-loss/RMSE, where lower numbers are desired. In general, always set this to "auto", which lets Keras infer the direction from the name of the monitored quantity.
* **restore_best_weights** This should always be set to `True`. It restores the weights to the values they had at the epoch with the best monitored value (here, the lowest validation loss). Unless you are manually tracking the weights yourself (we do not use this technique in this course), you should have Keras perform this step for you.

As you can see from above, not all of the requested epochs were used. The neural network training stopped once the validation set no longer improved.
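The interaction between `min_delta` and `patience` can be sketched in plain Python. This is a simplified illustration of the stopping rule, not the Keras source code; the function `simulate_early_stopping` and the list of validation losses are hypothetical.

```python
# Simplified sketch of the early stopping rule (not the Keras implementation).

def simulate_early_stopping(val_losses, min_delta=1e-3, patience=5):
    """Return (stopped_epoch, best_epoch), both 1-based like the Keras log."""
    best = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last real improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement of more than min_delta: remember it, reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # patience exhausted: stop here
    return len(val_losses), best_epoch    # ran through every epoch

# Hypothetical validation losses, one value per epoch
val_losses = [1.0, 0.8, 0.7, 0.6995, 0.6999, 0.71, 0.70, 0.705, 0.72]

stopped, best = simulate_early_stopping(val_losses, min_delta=1e-3, patience=5)
print(f"Stopped at epoch {stopped}, best epoch was {best}")
```

Note that epoch 4 has a slightly lower loss (0.6995) than epoch 3 (0.7), but the difference is smaller than `min_delta`, so it does not count as an improvement. Five epochs without a real improvement then use up the `patience`, training stops at epoch 8, and epoch 3 is the one whose weights would be restored.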

### **Exercise 1: Early Stopping with Classification: Heart Disease data**

In the cell below, write the code to read the Heart Disease dataset ("heart_disease.csv") from the course server and store the data in a DataFrame called `hdDF`.

You can use this code chunk to read the datafile and create your DataFrame:
~~~text
# Read dataset and create DataFrame
hdDF = pd.read_csv(
    "http://biologicslab.co/BIO1173/data/heart_disease.csv",
    na_values=['NA', '?'])
~~~

For your independent variables (X-values), use **only** the columns Age, RestingBP, Cholesterol, MaxHR and Oldpeak. You should name the Numpy array holding your X-values `hdX`.

Use the column `HeartDisease` as your dependent variable (Y-values). You will need to One-Hot Encode this column and use the `dummies.values` as your Y-values, `hdY`, as shown in the code chunk below:
~~~text
# Generate Y-values
dummies = pd.get_dummies(hdDF['HeartDisease']) # Classification
DiseaseNames = dummies.columns
hdY = dummies.values  # Y-values
hdY = np.asarray(hdY).astype('float32')
~~~

Use the `train_test_split(hdX, hdY, test_size=0.25, random_state=42)` function to create `hdX_train`, `hdX_test`, `hdY_train` and `hdY_test` datasets.

Build a Sequential neural network called `hdModel` with 2 hidden layers: 50 neurons in the first layer and 25 neurons in the second layer. Use `relu` activation for these two hidden layers. The output layer should use `softmax` activation. Don't forget that the number of neurons in your output layer needs to be defined by this variable: `hdY.shape[1]`.

Compile your model using `categorical_crossentropy` as the loss function with `adam` as the optimizer.

After your model has been compiled, create an object called `hdMonitor` by calling `EarlyStopping()` with the same arguments as shown in Example 1.

Finally, train your model for 1000 epochs but use your `hdMonitor` to enable early stopping.

In [None]:
# Insert your code for Exercise 1 here



If your code is correct, the training of your `hdModel` neural network should have stopped early, before reaching 40 epochs.
~~~text

Epoch 27/1000
22/22 - 0s - 3ms/step - loss: 0.5966 - val_loss: 0.5157
Epoch 28/1000
22/22 - 0s - 3ms/step - loss: 0.5934 - val_loss: 0.4955
Epoch 29/1000
22/22 - 0s - 3ms/step - loss: 0.5720 - val_loss: 0.5079
Epoch 30/1000
22/22 - 0s - 3ms/step - loss: 0.5388 - val_loss: 0.6093
Epoch 31/1000
22/22 - 0s - 3ms/step - loss: 0.6133 - val_loss: 0.5236
Epoch 32/1000
22/22 - 0s - 3ms/step - loss: 0.5204 - val_loss: 0.6646
Epoch 33/1000
22/22 - 0s - 3ms/step - loss: 0.6642 - val_loss: 0.5518
Epoch 33: early stopping
Restoring model weights from the end of the best epoch: 28.
<keras.src.callbacks.history.History at 0x7d6698297460>

~~~

In the example shown above, the training stopped after 33 epochs. The minimum `val_loss` occurred at epoch 28, after which it started to increase due to overfitting.

### Example 2: Compute Accuracy Score

Let's see what effect early stopping might have on the accuracy of the `irisModel`.

The code below illustrates how to compute the accuracy score for the model `irisModel` created in Example 1 using the Keras `model.predict()` function and the `accuracy_score()` function from the `scikit-learn` metrics package. To keep variable names separate between Examples and Exercises, the prefix `iris` has been added to different variables that are generated.

In [None]:
# Example 2: Compute accuracy score

from sklearn.metrics import accuracy_score

irisPred = irisModel.predict(irisX_test)
irisPredict_classes = np.argmax(irisPred,axis=1)
irisExpected_classes = np.argmax(irisY_test,axis=1)
irisCorrect = accuracy_score(irisExpected_classes,irisPredict_classes)
print(f"Accuracy: {irisCorrect}")

If your code is correct you should see something similar to the following output:
~~~text
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 189ms/step
Accuracy: 1.0
~~~
WOW! Perfect accuracy!
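The two `np.argmax()` calls in Example 2 turn each row of class probabilities (or each one-hot row) into the index of its largest value, and `accuracy_score()` then reports the fraction of rows where the two indices agree. The plain-Python sketch below illustrates that logic; the helpers `argmax` and `accuracy` and the sample values are hypothetical and are not the NumPy or scikit-learn implementations.

```python
# Plain-Python illustration of argmax-based accuracy (hypothetical values).

def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(expected_rows, predicted_rows):
    """Fraction of rows whose predicted class matches the expected class."""
    expected = [argmax(row) for row in expected_rows]
    predicted = [argmax(row) for row in predicted_rows]
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)

# One-hot encoded true classes for four hypothetical flowers
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]

# Softmax-style probabilities a model might output for the same flowers
y_prob = [[0.90, 0.05, 0.05],
          [0.20, 0.70, 0.10],
          [0.10, 0.30, 0.60],
          [0.10, 0.40, 0.50]]  # predicts class 2, but the true class is 1

print(f"Accuracy: {accuracy(y_true, y_prob)}")  # Accuracy: 0.75
```

Three of the four hypothetical predictions match, so the accuracy is 0.75.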

### **Exercise 2: Compute Accuracy Score**

In the cell below, compute the accuracy score for your `hdModel` and print out the results. Add the prefix `hd` to your variables to keep them from interfering with the variables generated in Example 2 above.

In [None]:
# Insert your code for Exercise 2 here



If your code is correct you should see something similar to the following output:
~~~text
8/8 [==============================] - 0s 3ms/step
Accuracy: 0.7347826086956522
~~~

According to the output above, your `hdModel` is only about 73% accurate when it comes to predicting heart disease. Apparently, it is harder to predict `HeartDisease` with a classification neural network than to identify the `species` name of an Iris flower.

## Early Stopping with Regression

The following code demonstrates how we can apply early stopping to a regression problem.  The technique is similar to the early stopping for classification code that we just saw.

### Example 3: Early Stopping with Regression

The regression neural network `petalModel`, constructed in the cell below, is designed to predict the `petal_length`, the Y-value, based on the flower's `sepal_length`, `sepal_width`, `petal_width` and `species`.

We begin by reading the Iris Flower dataset to create a DataFrame called `petalDF`.

The next step is to prepare the feature vector. The first step is to take care of columns with non-numeric values. Since the column `species` contains the categorical values `Iris-setosa`, `Iris-versicolor` and `Iris-virginica`, we will map these strings to the integers `1`, `2`, and `3`, respectively. This mapping must be done _before_ the column `species` can be included as part of the X-values.

Here is the code chunk that generates the X-values and stores them in a Numpy array called `petalX`:
~~~text
# Generate X-values
petalX = petalDF[['sepal_length', 'sepal_width',
                'petal_width', 'species']].values
petalX = np.asarray(petalX).astype('float32')
~~~

And since we are building a **_regression_** neural network to predict `petal_length`, the numerical values in the column `petal_length` are used _directly_ to generate the Y-values as shown in this code chunk:
~~~text
# Generate Y-values
petalY = petalDF['petal_length'].values # Y-values
petalY = np.asarray(petalY).astype('float32')
~~~
In other words, you do **not** One-Hot Encode the column with the Y-values.

You should also note that when building a _regression_ neural network, there is only one neuron in the output layer and no activation function, as shown here:

~~~text
petalModel.add(Dense(1)) # Output
~~~

Finally, we need to change the loss function to mean squared error (MSE) when the model is compiled:
~~~text
petalModel.compile(loss='mean_squared_error', optimizer='adam')
~~~


In [None]:
# Example 3: Early stopping with regression

import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
import numpy as np
from sklearn.model_selection import train_test_split

# Read dataset and create DataFrame-------------------------------------
petalDF = pd.read_csv("https://biologicslab.co/BIO1173/data/iris.csv",
    na_values=['NA', '?'])

# Create feature vector-------------------------------------------------

# Define the mapping dictionary
mapping = {'Iris-setosa': 1,
           'Iris-versicolor': 2,
           'Iris-virginica': 3}

# Map the species strings to integers
petalDF['species'] = petalDF['species'].map(mapping)

# Generate X-values
petalX = petalDF[['sepal_length', 'sepal_width',
                'petal_width', 'species']].values
petalX = np.asarray(petalX).astype('float32')

# Generate Y-values
petalY = petalDF['petal_length'].values # Y-values
petalY = np.asarray(petalY).astype('float32')

# Split into validation and training sets------------------------------
petalX_train, petalX_test, petalY_train, petalY_test = train_test_split(
    petalX, petalY, test_size=0.25, random_state=42)

# Build neural network-------------------------------------------------
petalModel = Sequential()
petalModel.add(Input(shape=(petalX.shape[1],)))  # Input layer
petalModel.add(Dense(25, activation='relu')) # Hidden 1
petalModel.add(Dense(1)) # Output
petalModel.compile(loss='mean_squared_error', optimizer='adam')

# Build monitor for early stopping-------------------------------------
petalMonitor = EarlyStopping(monitor='val_loss',
               min_delta=1e-3, patience=5,
               verbose=1, mode='auto', restore_best_weights=True)

# Train model----------------------------------------------------------
petalModel.fit(petalX_train,petalY_train,validation_data=(petalX_test,petalY_test),
              callbacks=[petalMonitor],verbose=2,epochs=1000)

If your code is correct, you should see something similar to the following output:
~~~text
Epoch 1/1000
4/4 - 1s - 324ms/step - loss: 3.3680 - val_loss: 2.8964
Epoch 2/1000
4/4 - 0s - 13ms/step - loss: 2.6492 - val_loss: 2.4229
Epoch 3/1000
4/4 - 0s - 13ms/step - loss: 2.1816 - val_loss: 2.1311
Epoch 4/1000
4/4 - 0s - 12ms/step - loss: 1.8088 - val_loss: 1.9884
Epoch 5/1000
4/4 - 0s - 13ms/step - loss: 1.6537 - val_loss: 1.9301

.....................................

Epoch 171/1000
4/4 - 0s - 13ms/step - loss: 0.1106 - val_loss: 0.1013
Epoch 172/1000
4/4 - 0s - 12ms/step - loss: 0.1104 - val_loss: 0.1010
Epoch 173/1000
4/4 - 0s - 12ms/step - loss: 0.1103 - val_loss: 0.1007
Epoch 174/1000
4/4 - 0s - 12ms/step - loss: 0.1100 - val_loss: 0.1006
Epoch 175/1000
4/4 - 0s - 12ms/step - loss: 0.1100 - val_loss: 0.1007
Epoch 175: early stopping
Restoring model weights from the end of the best epoch: 170.
<keras.src.callbacks.history.History at 0x7d667844dbd0>
~~~

In this example, training stopped after only 175 epochs. Keras restored the weights from epoch 170, which it recorded as the best epoch.

### **Exercise 3: Early Stopping with Regression**

In the cell below, use the Apple Quality dataset to construct a regression neural network that can predict the 'Acidity' of an apple. This is the same dataset that you used in Class_03_4.

Start by reading the dataset and creating a DataFrame called `acidDF` using this code chunk:

~~~text
acidDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/apple_quality.csv",
    na_values=['NA', '?'])

~~~

The goal of your neural network model, `acidModel`, will be to predict the acidity of apples in the Apple Quality dataset using the values in the following columns: 'Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Ripeness', and 'Quality'.

For pre-processing, you will need to map the strings `good` and `bad` in the column 'Quality' to the integers `1` and `0` respectively.

Since all of the other columns are numeric, there is no need to pre-process any of these columns. Moreover, the numerical values all have a similar magnitude, so you don't need to standardize any column to its z-scores.

You can use this code chunk to generate your X-values after you have mapped the strings in the column 'Quality':

~~~text
# Generate X-values
acidX = acidDF[['Size', 'Weight', 'Sweetness', 'Crunchiness',
                'Juiciness', 'Ripeness', 'Quality']].values
~~~

Since this is a regression neural network, **_DON'T_** one-hot encode the y-value. Instead, use this code chunk:

~~~text
# Generate Y-values
acidY = acidDF['Acidity'].values # Y-values
acidY = np.asarray(acidY).astype('float32')
~~~

You can re-use the code in Example 3 to split the data into Training/Validation sets.

You can also re-use the code in Example 3 to construct your neural network `acidModel`, after making the appropriate changes in the variable names.

Create an Early Stopping monitor using this code chunk:

~~~text
acidMonitor = EarlyStopping(monitor='val_loss',
               min_delta=1e-3, patience=5,
               verbose=1, mode='auto', restore_best_weights=True)
~~~

Finally, train (fit) your model `acidModel` on your X-values (acidX) and your Y-values (acidY) for 1000 epochs.


In [None]:
# Insert your code for Exercise 3 here



If your code is correct, you should see something similar to the following output:
~~~text
Epoch 1/1000
94/94 - 2s - 17ms/step - loss: 4.4234 - val_loss: 3.9446
Epoch 2/1000
94/94 - 0s - 2ms/step - loss: 3.6717 - val_loss: 3.6419
Epoch 3/1000
94/94 - 0s - 2ms/step - loss: 3.3946 - val_loss: 3.4282
Epoch 4/1000
94/94 - 0s - 2ms/step - loss: 3.1914 - val_loss: 3.2761
Epoch 5/1000
94/94 - 0s - 2ms/step - loss: 3.0424 - val_loss: 3.1461

..................................

Epoch 88/1000
94/94 - 0s - 2ms/step - loss: 1.8761 - val_loss: 2.1031
Epoch 89/1000
94/94 - 0s - 2ms/step - loss: 1.8716 - val_loss: 2.0911
Epoch 90/1000
94/94 - 0s - 2ms/step - loss: 1.8649 - val_loss: 2.0919
Epoch 91/1000
94/94 - 0s - 2ms/step - loss: 1.8652 - val_loss: 2.0982
Epoch 92/1000
94/94 - 0s - 2ms/step - loss: 1.8620 - val_loss: 2.0971
Epoch 92: early stopping
Restoring model weights from the end of the best epoch: 87.
<keras.src.callbacks.history.History at 0x7d6658782fb0>
~~~

In this example, the best epoch was 87.

### Example 4: Compute the RMSE

When working with neural networks that perform a regression analysis, it is customary to use the Root Mean Square Error (RMSE) as a measurement of predictive accuracy. The code in the cell below shows how to compute the RMSE for the `petalModel` neural network and then print out the result.

In [None]:
import numpy as np
from sklearn.metrics import mean_squared_error

# Compute RMSE
petalPred = petalModel.predict(petalX_test)
petalScore = np.sqrt(mean_squared_error(petalPred, petalY_test))

# Print out the results
print(f"Final score (RMSE): {petalScore}")


If your code is correct you should see something similar to the following output:
~~~text
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
Final score (RMSE): 5.776879787445068
~~~
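The relationship between the mean squared error (MSE) and the RMSE can be illustrated with a short plain-Python sketch. Because `petalModel` is compiled with `mean_squared_error`, the `val_loss` values in its training log are MSE values, and taking the square root converts them to an RMSE in the same units as `petal_length`. The helpers `mse` and `rmse` and the sample numbers below are hypothetical and used only for illustration.

```python
# Plain-Python illustration of MSE and RMSE (hypothetical values).
import math

def mse(expected, predicted):
    """Mean of the squared differences between expected and predicted values."""
    return sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected)

def rmse(expected, predicted):
    """Square root of the MSE; same units as the values being predicted."""
    return math.sqrt(mse(expected, predicted))

# Hypothetical actual and predicted petal lengths (cm)
actual = [1.4, 4.5, 6.0, 5.1]
predicted = [1.6, 4.1, 5.8, 5.5]

print(f"MSE:  {mse(actual, predicted):.4f}")   # MSE:  0.1000
print(f"RMSE: {rmse(actual, predicted):.4f}")  # RMSE: 0.3162
```

An RMSE of about 0.32 would mean that a typical prediction in this hypothetical sample is off by roughly a third of a centimeter.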

### **Exercise 4: Compute the RMSE**

In the cell below, write the code to compute RMSE for your neural network model `acidModel` and then print out the results.

In [None]:
# Insert your code for Exercise 4 here



If your code is correct you should see something similar to the following output:
~~~text
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
Final score (RMSE): 1.4691604375839233

~~~

### Example 5: _Ad Hoc_ Prediction

The code in the cell below uses the `petalModel` to predict the petal length of the first flower in the Iris Flower dataset (`Flower_0`).

The X-values for `Flower_0` are simply copied from the Numpy array `petalX` using the following code chunk:
~~~text
# Load X-values
Flower_0 = petalX[[0]]
~~~

The code then uses the `petalModel` to predict what it thinks the `petal_length` should be, based on the flower's `sepal_length`, `sepal_width`, `petal_width` and `species`. This prediction is stored in `petalPred`.

Finally, the code prints out the results.

In [None]:
# Example 5: Ad hoc prediction

# Load X-values
Flower_0 = petalX[[0]]

# Model predicts Y-value
petalPred = petalModel.predict(Flower_0)

# Print out the results
print(f"Flower_0 has X-values: {Flower_0}")
print(f"Model predicts its petal length is: {petalPred}")
print(f"Its actual petal length is: {petalDF['petal_length'].values[0]}")

If your code is correct you should see something similar to the following output:
~~~text
1/1 [==============================] - 0s 22ms/step
Flower_0 has X-values: [[5.1 3.5 0.2 1. ]]
Model predicts its petal length is: [[3.6087673]]
Its actual petal length is: 1.4
~~~

In this particular example, the predicted `petal_length` (3.6087673 cm) is clearly different from the actual `petal_length` (1.4 cm). However, the magnitude of this error is not unexpected.

In Example 4, the RMSE for `petalModel` was calculated to be 1.8 cm. If you add this RMSE value (1.8 cm) to the actual petal length (1.4 cm), the result is 3.2 cm, which is pretty close to the predicted petal length of 3.6 cm.

### **Exercise 5: _Ad Hoc_ Prediction**

In the cell below, use your `acidModel` to predict the acidity of `Apple_0`.   

In [None]:
# Insert your code for Exercise 5 here



If your code is correct you should see something similar to the following output:
~~~text
1/1 [==============================] - 0s 21ms/step
Apple_0 has X-values: [[-3.9700484 -2.5123365 -1.0120087  1.8449004  5.3463297  0.3298398
   1.       ]]
Model predicts its acidity is: [[-1.3125017]]
Its actual acidity is: -0.491590483
~~~

### Lesson Summary

The primary objective of this lesson was to demonstrate how to implement **_Early Stopping_** in the training of neural networks.

Clearly, the ability to stop training when the loss function on the validation set reaches a minimum can save a significant amount of time. However, perhaps more importantly, Early Stopping can prevent a neural network from **_overtraining_**.

Overtraining occurs when the neural network can predict training examples with very high accuracy but cannot generalize to new data. In other words, the neural network starts learning **_specific details_** about the training data. While this improves the model's loss on the particular training set, the model will actually perform worse when presented with new data that it hasn't seen before.


## **Lesson Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your Colab notebook. Save your PDF as `Class_03_5.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

## **Poly-A Tail**

## **Grace Hopper**

![__](https://upload.wikimedia.org/wikipedia/commons/9/98/Commodore_Grace_M._Hopper%2C_USN_%28covered%29_head_and_shoulders_crop.jpg)


**Grace Brewster Hopper (née Murray; December 9, 1906 – January 1, 1992)** was an American computer scientist, mathematician, and United States Navy rear admiral. She was a pioneer of computer programming. Hopper was the first to devise the theory of machine-independent programming languages, and used this theory to develop the FLOW-MATIC programming language and COBOL, an early high-level programming language still in use today. She was also one of the first programmers on the Harvard Mark I computer. She is credited with writing the first computer manual, "A Manual of Operation for the Automatic Sequence Controlled Calculator."

Before joining the Navy, Hopper earned a Ph.D. in both mathematics and mathematical physics from Yale University and was a professor of mathematics at Vassar College. She left her position at Vassar to join the United States Navy Reserve during World War II. Hopper began her computing career in 1944 as a member of the Harvard Mark I team, led by Howard H. Aiken. In 1949, she joined the Eckert–Mauchly Computer Corporation and was part of the team that developed the UNIVAC I computer. At Eckert–Mauchly she managed the development of one of the first COBOL compilers.

She believed that programming should be simplified with an English-based computer programming language. Her compiler converted English terms into machine code understood by computers. By 1952, Hopper had finished her program linker (originally called a compiler), which was written for the A-0 System. In 1954, Eckert–Mauchly chose Hopper to lead their department for automatic programming, and she led the release of some of the first compiled languages like FLOW-MATIC. In 1959, she participated in the CODASYL consortium, helping to create a machine-independent programming language called COBOL, which was based on English words. Hopper promoted the use of the language throughout the 60s.

The U.S. Navy Arleigh Burke-class guided-missile destroyer USS Hopper was named for her, as was the Cray XE6 "Hopper" supercomputer at NERSC, and the Nvidia GPU architecture "Hopper". During her lifetime, Hopper was awarded 40 honorary degrees from universities across the world. A college at Yale University was renamed in her honor. In 1991, she received the National Medal of Technology. On November 22, 2016, she was posthumously awarded the Presidential Medal of Freedom by President Barack Obama. In 2024, the Institute of Electrical and Electronics Engineers (IEEE) dedicated a marker in honor of Grace Hopper at the University of Pennsylvania for her role in inventing the A-0 compiler during her time as a Lecturer in the School of Engineering, citing her inspirational impact on young engineers.

**Early life and education**

Grace Brewster Murray was born in New York City. She was the eldest of three children. Her parents, Walter Fletcher Murray and Mary Campbell Van Horne, were of Scottish and Dutch descent, and attended West End Collegiate Church. Her great-grandfather, Alexander Wilson Russell, an admiral in the US Navy, fought in the Battle of Mobile Bay during the Civil War.

Grace was very curious as a child; this was a lifelong trait. At the age of seven, she decided to determine how an alarm clock worked and dismantled seven alarm clocks before her mother realized what she was doing (she was then limited to one clock). Later in life, she was known for keeping a clock that ran backward. She explained, "Humans are allergic to change. They love to say, 'We've always done it this way.' I try to fight that. That's why I have a clock on my wall that runs counterclockwise." For her preparatory school education, she attended the Hartridge School in Plainfield, New Jersey. Grace was initially rejected for early admission to Vassar College at age 16 (because her test scores in Latin were too low), but she was admitted the next year. She graduated Phi Beta Kappa from Vassar in 1928 with a bachelor's degree in mathematics and physics and earned her master's degree at Yale University in 1930.

## **Career**

**World War II**

Hopper tried to be commissioned in the Navy early in World War II, but she was turned down. At age 34, she was too old to enlist and her weight-to-height ratio was too low. She was also denied on the basis that her job as a mathematician and mathematics professor at Vassar College was valuable to the war effort. During the war in 1943, Hopper obtained a leave of absence from Vassar and was sworn into the United States Navy Reserve; she was one of many women who volunteered to serve in the WAVES.

She had to get an exemption to be commissioned; she was 15 pounds (6.8 kg) below the Navy minimum weight of 120 pounds (54 kg). She reported in December and trained at the Naval Reserve Midshipmen's School at Smith College in Northampton, Massachusetts. Hopper graduated first in her class in 1944, and was assigned to the Bureau of Ships Computation Project at Harvard University as a lieutenant, junior grade. She served on the Mark I computer programming staff headed by Howard H. Aiken.

Hopper and Aiken co-authored three papers on the Mark I, also known as the Automatic Sequence Controlled Calculator. Hopper's request to transfer to the regular Navy at the end of the war was declined due to her advanced age of 38. She continued to serve in the Navy Reserve. Hopper remained at the Harvard Computation Lab until 1949, turning down a full professorship at Vassar in favor of working as a research fellow under a Navy contract at Harvard.

**UNIVAC**

In 1949, Hopper became an employee of the Eckert–Mauchly Computer Corporation as a senior mathematician and joined the team developing the UNIVAC I. Hopper also served as UNIVAC director of Automatic Programming Development for Remington Rand. The UNIVAC was the first known large-scale electronic computer to be on the market in 1951, and was more competitive at processing information than the Mark I.

When Hopper recommended the development of a new programming language that would use entirely English words, she "was told very quickly that she couldn't do this because computers didn't understand English." Still, she persisted. "It's much easier for most people to write an English statement than it is to use symbols", she explained. "So I decided data processors ought to be able to write their programs in English, and the computers would translate them into machine code."

Her idea was not accepted for three years. In the meantime, she published her first paper on the subject, compilers, in 1952. In the early 1950s, the company was taken over by the Remington Rand corporation, and it was while she was working for them that her original compiler work was done. The program was known as the A compiler and its first version was A-0.

In 1952, she had an operational link-loader, which at the time was referred to as a compiler. She later said that "Nobody believed that", and that she "had a running compiler and nobody would touch it. They told me computers could only do arithmetic."

In 1954, Hopper was named the company's first director of automatic programming. Beginning in 1954, Hopper's work was influenced by the Laning and Zierler system, which was the first compiler to accept algebraic notation as input. Her department released some of the first compiler-based programming languages, including MATH-MATIC and FLOW-MATIC.

Hopper said that her compiler, A-0, "translated mathematical notation into machine code. Manipulating symbols was fine for mathematicians but it was no good for data processors who were not symbol manipulators. Very few people are really symbol manipulators. If they are, they become professional mathematicians, not data processors. It's much easier for most people to write an English statement than it is to use symbols. So I decided data processors ought to be able to write their programs in English, and the computers would translate them into machine code. That was the beginning of COBOL, a computer language for data processors. I could say 'Subtract income tax from pay' instead of trying to write that in octal code or using all kinds of symbols. COBOL is the major language used today in data processing."

**COBOL**

In the spring of 1959, computer experts from industry and government were brought together in a two-day conference known as the Conference on Data Systems Languages (CODASYL). Hopper served as a technical consultant to the committee, and many of her former employees served on the short-term committee that defined the new language COBOL (an acronym for COmmon Business-Oriented Language). The new language extended Hopper's FLOW-MATIC language with some ideas from the IBM equivalent, COMTRAN. Hopper's belief that programs should be written in a language that was close to English (rather than in machine code or in languages close to machine code, such as assembly languages) was captured in the new business language, and COBOL went on to be the most ubiquitous business language to date. Among the members of the committee that worked on COBOL was Mount Holyoke College alumna Jean E. Sammet.

From 1967 to 1977, Hopper served as the director of the Navy Programming Languages Group in the Navy's Office of Information Systems Planning and was promoted to the rank of captain in 1973. She developed validation software for COBOL and its compiler as part of a COBOL standardization program for the entire Navy.