<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/master/Class_04_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

##### **Module 4: Training for Tabular Data**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Integrative Biology](https://sciences.utsa.edu/integrative-biology/), [UTSA](https://www.utsa.edu/)

### Module 4 Material

* **Part 4.1: Encoding a Feature Vector for Keras Deep Learning**
* Part 4.2: Keras Multiclass Classification for Deep Neural Networks with ROC and AUC
* Part 4.3: Keras Regression for Deep Neural Networks with RMSE
* Part 4.4: Backpropagation, Nesterov Momentum, and ADAM Neural Network Training
* Part 4.5: Neural Network RMSE and Log Loss Error Calculation from Scratch


### Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow.
  Running the following code will map your GDrive to ```/content/drive```.

In [None]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
    %tensorflow_version 2.x
except:
    print("Note: not using Google CoLab")
    COLAB = False

### Lesson Setup

Run the next code cell to load the necessary packages.

In [None]:
# You MUST run this code cell first
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

import numpy as np
import pandas as pd


import os
import shutil
path = '/'
memory = shutil.disk_usage(path)
dirpath = os.getcwd()
print("Your current working directory is : " + dirpath)
print("Disk", memory)
print("Tensorflow version =", (tf.__version__))
print("Available GPU acceleration =", tf.test.gpu_device_name())


## Datasets for Class_04_1

For Class_04_1 we will be using the Wisconsin Breast Cancer dataset for the Examples and the Heart Disease dataset for the **Exercises**.

### **Breast Cancer Wisconsin (Diagnostic) Data Set** 

[Breast Cancer Wisconsin (Diagnostic) Data Set](https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data)


![___](https://biologicslab.co/BIO1173/images/breast_cancer.png)


The average risk of developing breast cancer in the United States is 13%, or 1 in 8. Approximately 42,000 women in the US die from breast cancer each year. As with most cancers, early detection and treatment are singularly important in preventing mortality.

The Breast Cancer Wisconsin (BCW) dataset contains detailed microscopic measurements of cell nuclei obtained by fine needle aspirates (FNAs) from breast tumors found in 569 women. Some of these tumors were later determined to be **_malignant_** (cancerous), while other tumors were found to be **_benign_** (non-cancerous). Being able to differentiate cancerous from non-cancerous tumors is of obvious importance.  

Fine needle aspiration (FNA), also called a fine needle aspiration biopsy, is a minimally invasive procedure that uses a thin needle and syringe to extract a sample of cells, tissue, or fluid from an abnormal area or lump in the body. The sample is then examined under a microscope to confirm a diagnosis or guide treatment.

![___](https://biologicslab.co/BIO1173/images/fna_tech.png)


The features computed from digitized images of breast mass cell nuclei obtained by FNA in the Breast Cancer Wisconsin dataset are as follows:

**Attribute Information:**

* **ID number**
* **Diagnosis:** (M = malignant, B = benign)

Ten real-valued features are computed for each cell nucleus:

*  **radius:** (mean of distances from center to points on the perimeter)
* **texture:** (standard deviation of gray-scale values)
* **perimeter:**
* **area:**
* **smoothness:** (local variation in radius lengths)
* **compactness:** (perimeter<sup>2</sup> / area - 1.0)
* **concavity:** (severity of concave portions of the contour)
* **concave points:** (number of concave portions of the contour)
* **symmetry:**
* **fractal dimension:** ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, and field 23 is Worst Radius.
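The arithmetic behind the 30 features can be sketched in a few lines of Python. This is only an illustration: it assumes the column-name pattern `<statistic>_<feature>` (for example `mean_radius`) that appears in the `bcwDF.info()` output later in this lesson, and the variable names `features`, `stats`, and `feature_names` are hypothetical.

```python
# Ten base measurements made on each cell nucleus
features = ["radius", "texture", "perimeter", "area", "smoothness",
            "compactness", "concavity", "concave_points", "symmetry",
            "fractal_dimension"]

# Three summary statistics computed for every measurement
stats = ["mean", "se", "worst"]

# 3 statistics x 10 measurements = 30 feature names
feature_names = [f"{s}_{f}" for s in stats for f in features]

print(len(feature_names))

# Fields 1 and 2 are the ID and the diagnosis, so field 3 is list index 0,
# field 13 is list index 10, and field 23 is list index 20
print(feature_names[0], feature_names[10], feature_names[20])
```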





### **Heart Disease Dataset**

[Heart Disease Dataset](https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction)


![___](https://biologicslab.co/BIO1173/images/HD.jpg)

**Description**

Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Four out of 5 CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age. Heart failure is a common event caused by CVDs, and this dataset contains 11 features that can be used to predict possible heart disease.

People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease) need early detection and management wherein a machine learning model can be of great help.

* **Age:** age of the patient [years]
* **Sex:** sex of the patient [M: Male, F: Female]
* **ChestPainType:** chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]
* **RestingBP:** resting blood pressure [mm Hg]
* **Cholesterol:** serum cholesterol [mg/dl]
* **FastingBS:** fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
* **RestingECG:** resting electrocardiogram results [Normal: Normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria]
* **MaxHR:** maximum heart rate achieved [Numeric value between 60 and 202]
* **ExerciseAngina:** exercise-induced angina [Y: Yes, N: No]
* **Oldpeak:** oldpeak = ST [Numeric value measured in depression]
* **ST_Slope:** the slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]
* **HeartDisease:** output class [1: heart disease, 0: Normal]

# Part 4.1: Encoding a Feature Vector for Keras Deep Learning

In Module 4 we will take a closer look at some of the topics introduced previously in Module 3 (e.g., One-Hot Encoding). 

Neural networks can accept many types of data. We will continue our focus on tabular data, where there are well-defined rows and columns. This kind of data is what you would typically see in a Microsoft Excel spreadsheet.

Neural networks require numeric input. This numeric form is called a **_feature vector_**. Each input neuron receives one feature (or column) from this vector. Each row of training data typically becomes one vector.

In this lesson, we will revisit how to encode tabular data stored in a Pandas DataFrame into a feature vector that can be used by two types of neural networks: (1) classification and (2) regression.



### Example 1: Read dataset and store values in a DataFrame

Data is the essence of neural networks and deep learning. Neural networks are of little use until they have been trained on **_large_** datasets. Only by making repeated adjustments in the weights of their neural connections, during many rounds of training (epochs) on a particular dataset, can a neural network **_learn_** to make accurate predictions.  

Not surprisingly, building and training neural networks begins with a dataset. The code in the cell below reads the Breast Cancer Wisconsin (BCW) dataset file, `wcbreast.csv`, from the course HTTPS server using the Pandas `pd.read_csv()` function:
~~~text
# Read file and create DataFrame
bcwDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/wcbreast.csv",
    index_col=0,
    na_values=['NA','?'])
~~~
As the file is read, the function creates a new DataFrame called `bcwDF`. This name was chosen to remind us that the DataFrame contains the **b**reast **c**ancer **w**isconsin dataset.

As a general rule, it is always a good idea to display at least part of your new DataFrame to make sure it was read correctly. Since large DataFrames can have many columns and many, many rows (too many to display in Jupyterlab), it is helpful to specify the maximum number of rows and columns to display. This is accomplished in the cell below using this code chunk:

~~~text
# Set display options
pd.set_option('display.max_columns', 4)
pd.set_option('display.max_rows', 4)
~~~
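If you would rather not change the display settings for the rest of the notebook, Pandas also provides `pd.option_context()`, which applies options only inside a `with` block. The sketch below is an optional alternative, not the approach used in this lesson; `demoDF` is a made-up DataFrame used purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up DataFrame with 10 rows and 8 columns, for illustration only
demoDF = pd.DataFrame(np.arange(80).reshape(10, 8),
                      columns=[f"col{i}" for i in range(8)])

# Remember the setting in effect before the block
default_rows = pd.get_option('display.max_rows')

# Options set here apply only inside the `with` block
with pd.option_context('display.max_rows', 4, 'display.max_columns', 4):
    rows_inside = pd.get_option('display.max_rows')
    print(demoDF)   # the printed table is truncated

# Outside the block the previous setting is restored
rows_after = pd.get_option('display.max_rows')
print(rows_inside, rows_after == default_rows)
```

Using `pd.set_option()`, as in the lesson, keeps the setting until you change it again, which is convenient when every cell should use the same display limits.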



In [None]:
# Example 1A: Read data and create dataframe

# Read file and create DataFrame
bcwDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/wcbreast.csv",
    index_col=0,
    na_values=['NA','?'])

# Set display options
pd.set_option('display.max_columns', 4)
pd.set_option('display.max_rows', 4)

# Display DataFrame
display(bcwDF)

If your code is correct you should see the following table:

![__](https://biologicslab.co/BIO1173/images/class_04_1_Exm1B.png)


There are several observations that you should make from this table. First, looking at the very bottom you see:
~~~text
569 rows x 32 columns
~~~
This means that our DataFrame `bcwDF` has clinical information for `569` subjects (i.e. 1 row/subject) and that there are `32` clinical features (1 feature/column) recorded for each subject. 

By inspection, we can see that at least one column, `diagnosis`, has non-numerical values (the strings "M" and "B"), but there could be more, since we can only see a small fraction of the 32 columns.
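The row and column counts can also be read directly from the DataFrame's `shape` attribute rather than from the bottom of the display. A minimal sketch with a made-up DataFrame (`toyDF` and its values are hypothetical, not the lesson data):

```python
import pandas as pd

# Made-up DataFrame, for illustration only
toyDF = pd.DataFrame({
    "diagnosis": ["M", "B", "B"],
    "mean_radius": [17.99, 13.54, 12.45],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = toyDF.shape
print(f"{n_rows} rows x {n_cols} columns")
```

For the lesson data, `bcwDF.shape` should agree with the `569 rows x 32 columns` line shown above.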

### Example 1B: Display data types

In order to create a feature vector, we need to know which columns in our DataFrame are non-numeric, i.e., which columns contain string values. We can easily print out the different data types in a DataFrame using the Pandas method `df.info()`.

However, in order to see **_all_** of the different data types, we need to change the number of rows to display. While we could simply set this option to `32` (i.e., the number of columns in the DataFrame), here we use a slightly more elegant method, `len(bcwDF.columns)`, which automatically computes the number of columns for us.

In [None]:
# Example 1B: Find data types

# Set max rows to the number of columns
pd.set_option('display.max_rows', len(bcwDF.columns))

# Print data types
bcwDF.info()

If your code is correct, you should see the following output:

~~~text
<class 'pandas.core.frame.DataFrame'>
Index: 569 entries, 0 to 568
Data columns (total 32 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   id                       569 non-null    int64  
 1   diagnosis                569 non-null    object 
 2   mean_radius              569 non-null    float64
 3   mean_texture             569 non-null    float64
 4   mean_perimeter           569 non-null    float64
 5   mean_area                569 non-null    float64
 6   mean_smoothness          569 non-null    float64
 7   mean_compactness         569 non-null    float64
 8   mean_concavity           569 non-null    float64
 9   mean_concave_points      569 non-null    float64
 10  mean_symmetry            569 non-null    float64
 11  mean_fractal_dimension   569 non-null    float64
 12  se_radius                569 non-null    float64
 13  se_texture               569 non-null    float64
 14  se_perimeter             569 non-null    float64
 15  se_area                  569 non-null    float64
 16  se_smoothness            569 non-null    float64
 17  se_compactness           569 non-null    float64
 18  se_concavity             569 non-null    float64
 19  se_concave_points        569 non-null    float64
 20  se_symmetry              569 non-null    float64
 21  se_fractal_dimension     569 non-null    float64
 22  worst_radius             569 non-null    float64
 23  worst_texture            569 non-null    float64
 24  worst_perimeter          569 non-null    float64
 25  worst_area               569 non-null    float64
 26  worst_smoothness         569 non-null    float64
 27  worst_compactness        569 non-null    float64
 28  worst_concavity          569 non-null    float64
 29  worst_concave_points     569 non-null    float64
 30  worst_symmetry           569 non-null    float64
 31  worst_fractal_dimension  569 non-null    float64
dtypes: float64(30), int64(1), object(1)
memory usage: 146.7+ KB
~~~

You should make the following observations from this output:
* The target column is the column that you seek to predict. By convention, the target column, containing the Y-values, will usually be the rightmost column in a display, or the last column in a list. In this particular example, the target column, `diagnosis`, is the second column in the list.
* There is a column called `id` which identifies each subject. We should exclude this column from our analysis because it contains no information useful for making a prediction.
* From the data types output, we can see that with the exception of the column `diagnosis`, all of the fields are **_numeric_**. Non-numeric values are classified as `object`, while numeric values are classified as being either `int64` or `float64`. Numeric columns might not require further processing before they are used to generate our X-values.
* Categorical values (strings) are only found in the target column (`diagnosis`) which we will take care of later when we generate our Y-values. 
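Reading through the `df.info()` output works well for a few dozen columns. As an optional alternative, the non-numeric columns can also be listed programmatically with the Pandas method `select_dtypes()`. The sketch below uses a made-up DataFrame (`toyDF` and its values are hypothetical) rather than the lesson data.

```python
import pandas as pd

# Made-up DataFrame, for illustration only
toyDF = pd.DataFrame({
    "id": [101, 102, 103],
    "diagnosis": ["M", "B", "B"],
    "mean_radius": [17.99, 13.54, 12.45],
})

# Keep only the columns whose data type is NOT numeric
non_numeric = toyDF.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```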


### **Exercise 1A: Read dataset and store values in a DataFrame**

In the cell below, use the Pandas function `pd.read_csv()` to read the Heart Disease data file `heart_disease.csv` located on the course HTTPS server. Save the data to a new DataFrame called `hdDF`.

Set the display for 6 rows and 6 columns and then print out a display of `hdDF`. 
 

In [None]:
# Insert your code for Exercise 1A here



If your code is correct, you should see the following table.

![Heart Failure DataFrame](https://biologicslab.co/BIO1173/images/class_04_1_Exe1A.png)

Your `hdDF` DataFrame has information on `918` subjects (number of rows = 918) and `12` clinical values for each subject (number of columns = 12). There is clearly more than one column with non-numeric values, but you won't know exactly how many until you run **Exercise 1B**.

### **Exercise 1B: Display data types**

In the cell below, write the code to print out the different data types in your DataFrame `hdDF` using the Pandas method `df.info()`. Use `len(hdDF.columns)` to set the number of rows to display, before you print out the data types.     

In [None]:
# Insert your code for Exercise 1B here



If your code is correct, you should see the following output:
~~~text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 918 entries, 0 to 917
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Age             918 non-null    int64  
 1   Sex             918 non-null    object 
 2   ChestPainType   918 non-null    object 
 3   RestingBP       918 non-null    int64  
 4   Cholesterol     918 non-null    int64  
 5   FastingBS       918 non-null    int64  
 6   RestingECG      918 non-null    object 
 7   MaxHR           918 non-null    int64  
 8   ExerciseAngina  918 non-null    object 
 9   Oldpeak         918 non-null    float64
 10  ST_Slope        918 non-null    object 
 11  HeartDisease    918 non-null    int64  
dtypes: float64(1), int64(6), object(5)
~~~

You should make the following observations from the above output:
* The target column is the column that you usually want to predict with a classification neural network. In this instance, the last column in this list, `HeartDisease`, will be your target column (Y-values). 
* The column `FastingBS` doesn't appear to contain information that would be especially useful for predicting heart disease, so you will need to drop it. 
* Some fields are numeric (data type `int64` or `float64`) and might not require further processing.
* There are categorical values (data type `object`) in 5 columns including: `Sex`, `ChestPainType`, `RestingECG`, `ExerciseAngina` and `ST_Slope`. The categorical values (strings) in these columns will need to be taken care of, before you can use them in generating your X-values.

### Example 2: Drop unnecessary columns

The `id` column in the `bcwDF` DataFrame does not contain information useful for predicting breast cancer, so it will be excluded from the feature vector containing the X-values, using the Pandas method `df.drop()` as shown by the next code chunk:
~~~text
bcwDF.drop('id', axis=1, inplace=True)
~~~
The argument `axis=1` tells Pandas to drop a column (rather than a row), while the argument `inplace=True` modifies `bcwDF` itself instead of returning a modified copy, so the change is **_permanent_**.

**NOTE:** After you run a code cell where you drop a column, you will get an error if you try to re-run the same cell, since there is no longer any column to drop. To run the cell again, you will need to re-read the datafile and re-create the DataFrame.
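If re-running a cell is likely, one optional way to avoid the error is the `errors='ignore'` argument of `df.drop()`, which skips labels that are not present. This is a sketch on a made-up DataFrame (`toyDF` is hypothetical); the lesson's own code does not use this argument.

```python
import pandas as pd

# Made-up DataFrame, for illustration only
toyDF = pd.DataFrame({
    "id": [1, 2],
    "diagnosis": ["M", "B"],
    "mean_radius": [17.99, 13.54],
})

# First call removes the 'id' column
toyDF.drop('id', axis=1, inplace=True, errors='ignore')

# Second call finds no 'id' column and does nothing instead of raising an error
toyDF.drop('id', axis=1, inplace=True, errors='ignore')

print(list(toyDF.columns))
```

The trade-off is that `errors='ignore'` also hides a misspelled column name, so the default behavior (raising an error) is the safer choice while you are still learning.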

In [None]:
# Example 2: Drop unnecessary columns

# Drop specific column
bcwDF.drop('id', axis=1, inplace=True)

# Set the max rows and max columns
pd.set_option('display.max_columns', 4)
pd.set_option('display.max_rows', 4)

# Display the updated DataFrame
display(bcwDF)

If your code is correct you should see the following table:

![__](https://biologicslab.co/BIO1173/images/class_04_1_Exm2.png)

You should note that the column `id` has been removed. Instead of the original `32` columns, there are now only `31` columns.

### **Exercise 2: Drop unnecessary columns**

Since the column `FastingBS` in the Heart Disease dataset doesn't contain information that will be especially useful for predicting heart disease, this column should not be included in the analysis. In the cell below, write the code to drop this column. Display 6 rows and 6 columns of your updated DataFrame.

In [None]:
# Insert your code for Exercise 2 here



If your code is correct you should see the following table:

![_ _](https://biologicslab.co/BIO1173/images/class_04_2_exe2.png)

Since the column `FastingBS` wasn't displayed previously (**Exercise 1A**), you can't tell from the table whether it was dropped. However, the number of columns is now `11` instead of the original `12`, so you can assume your code was successful.

### Example 3: One-Hot Encode Categorical Variables

For Example 3, we are going to use your Heart Disease DataFrame, `hdDF`, for the example.  

Your `hdDF` DataFrame has 5 columns that have categorical variables (strings): 
1. `Sex`
2. `ChestPainType`
3. `RestingECG`
4. `ExerciseAngina`
5. `ST_Slope`.

You will need to One-Hot Encode each of these `5` columns separately. 

To help you get started, Example 3 illustrates how to One-Hot encode the first column, `Sex`.

The following line of code uses the Pandas function, `pd.get_dummies()` to One-Hot encode the string values in the `Sex` column and turn them into the integer values `0` and `1`. 

You should notice that the code below uses `try:` and `except:` blocks. Normally, you would **not** use `try` and `except` when creating a feature vector. They have been added here simply as a teaching aid.

The `try` block lets you test a block of code for errors while the `except` block lets you handle the error. The `try` block demonstrates the 3 steps you need to One-Hot Encode the column `Sex`. The `except` block is there in case you try to re-run this cell after you have already dropped the column `Sex`. Instead of giving an error message and stopping, the `except` block simply warns you that the column has already been dropped.
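Before running the example on the full dataset, it may help to see the same 3 steps on a tiny made-up DataFrame, without the `try` and `except` blocks. In this sketch `toyDF` and its values are hypothetical; only the `pd.get_dummies()`, `pd.concat()`, and `drop()` calls mirror the lesson code.

```python
import pandas as pd

# Made-up DataFrame with one numeric and one categorical column
toyDF = pd.DataFrame({"Age": [40, 49, 37], "Sex": ["M", "F", "M"]})

# Step 1 - one new 0/1 column per category, named <prefix>_<category>
dummies = pd.get_dummies(toyDF['Sex'], prefix="Sex", dtype=int)

# Step 2 - append the new columns to the DataFrame
toyDF = pd.concat([toyDF, dummies], axis=1)

# Step 3 - drop the original string column
toyDF = toyDF.drop('Sex', axis=1)

print(toyDF)
```

Each row has a `1` in exactly one of the new columns (`Sex_F` or `Sex_M`) and a `0` in the other, which is where the name "One-Hot" comes from.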


In [None]:
# Example 3: One-Hot encode categorical variables


# Adding try and except blocks as teaching aid-------------------------
try: 
    # Step 1 - Get dummy values
    dummies = pd.get_dummies(hdDF['Sex'],prefix="Sex", dtype=int)
    
    # Step 2 - Add dummies to DataFrame
    hdDF = pd.concat([hdDF,dummies],axis=1)

    # Step 3- Drop column replaced by dummies
    hdDF.drop('Sex', axis=1, inplace=True)
    print("Column 'Sex' has been dropped")
    
except KeyError:  # raised when the 'Sex' column no longer exists
    print("ERROR: Column 'Sex' may have been already been dropped")

# Set the max rows and max columns
pd.set_option('display.max_columns', 6)
pd.set_option('display.max_rows', 6)

# Display the updated DataFrame
display(hdDF)

If your code is correct, you should see the following table.

![__](https://biologicslab.co/BIO1173/images/class_04_1_Exm3.png)


If you re-run this cell, you will see the following error message at the top:
~~~text
ERROR: Column 'Sex' may have been already been dropped
~~~

### **Exercise 3: One-Hot encode categorical variables**

Your `hdDF` DataFrame still has 4 columns with categorical variables: 
1. `ChestPainType`
2. `RestingECG`
3. `ExerciseAngina`
4. `ST_Slope`.

For **Exercise 3**, you are to One-Hot Encode these remaining `4` columns using the example shown in Example 3. 

### **Exercise 3A: One-Hot encode the column `ChestPainType`**

In the cell below, One-Hot encode the column `ChestPainType` in the Heart Disease DataFrame `hdDF`. After you add the dummies to the DataFrame, drop the column `ChestPainType`.

Set the display for 6 columns and 6 rows and print out a display of your updated DataFrame. 

In [None]:
# Insert your code for Exercise 3A here



If your code is correct, you should see the following table.

![__](https://biologicslab.co/BIO1173/images/class_04_1_Exe3A.png)


If you re-run this cell, you will see the following error message at the top:
~~~text
ERROR: Column 'ChestPainType' may have been already been dropped
~~~

### **Exercise 3B: One-Hot encode the column `RestingECG`**

In the cell below, One-Hot encode the column `RestingECG` in the Heart Disease DataFrame, `hdDF`. Add the dummies to the DataFrame and then drop the column `RestingECG`. Set the display for 6 columns and 6 rows and print out a display of your updated DataFrame. 

In [None]:
# Insert your code for Exercise 3B here



### **Exercise 3C: One-Hot encode the column `ExerciseAngina`**

In the cell below, One-Hot encode the column `ExerciseAngina` in the Heart Disease DataFrame, `hdDF`. Add the dummies to the DataFrame and then drop the column `ExerciseAngina`. Set the display for 6 columns and 6 rows and print out a display of your updated DataFrame. 

In [None]:
# Insert your code for Exercise 3C here



### **Exercise 3D: One-Hot encode the column `ST_Slope`**

In the cell below, One-Hot encode the column `ST_Slope` in the Heart Disease DataFrame, `hdDF`. Add the dummies to the DataFrame and then drop the column `ST_Slope`. Set the display for 6 columns and 6 rows and print out a display of your updated DataFrame. 

In [None]:
# Insert your code for Exercise 3D here



If your code is correct, you should see the following table.

![__](https://biologicslab.co/BIO1173/images/class_04_1_Exe3D.png)


You should notice that the number of columns in your DataFrame, `hdDF`, has increased from the `12` original columns to a total of **20** columns! This is a clear example of **_column inflation_**, an unavoidable consequence of using One-Hot Encoding.
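The figure of 20 can be checked with simple arithmetic: each categorical column is removed and replaced by one new column per category. The sketch below takes the category counts from the dataset description at the top of this lesson; the variable names are hypothetical.

```python
# Number of distinct categories in each categorical column,
# taken from the dataset description at the top of this lesson
n_categories = {
    "Sex": 2,             # M, F
    "ChestPainType": 4,   # TA, ATA, NAP, ASY
    "RestingECG": 3,      # Normal, ST, LVH
    "ExerciseAngina": 2,  # Y, N
    "ST_Slope": 3,        # Up, Flat, Down
}

# 12 original columns minus the dropped FastingBS column
columns_before = 11

# Remove the 5 categorical columns, then add one column per category
columns_after = columns_before - len(n_categories) + sum(n_categories.values())
print(columns_after)
```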

### Sample Code: Print out column names

If your coding has been correct so far, your DataFrame, `hdDF`, should have 20 columns. This is an inconveniently large number of columns to display on your computer screen. In this situation, you can take advantage of the DataFrame attribute `df.columns` to print out a complete list of all the column names in the DataFrame, using this line of code:
~~~text
print(list(hdDF.columns))
~~~
Run the next cell to check whether your coding is correct so far, before going on to the next part of the lesson.

In [None]:
# Run this cell to print the column names in hdDF

print(list(hdDF.columns))

If your coding has been correct so far, you should see the following list of `20` column names:
~~~text
['Age', 'RestingBP', 'Cholesterol', 'MaxHR', 'Oldpeak', 'HeartDisease', 'Sex_F', 'Sex_M', 'ChestPainType_ASY', 'ChestPainType_ATA', 'ChestPainType_NAP', 'ChestPainType_TA', 'RestingECG_LVH', 'RestingECG_Normal', 'RestingECG_ST', 'ExerciseAngina_N', 'ExerciseAngina_Y', 'ST_Slope_Down', 'ST_Slope_Flat', 'ST_Slope_Up']
~~~

If the output from `print(list(hdDF.columns))` doesn't match the output above, figure out which columns have not been successfully One-Hot encoded and/or dropped, and make the necessary code fixes.

When you are making significant changes to a DataFrame, it's always a good idea to go back to the cell where you originally created the DataFrame (**Exercise 1A** in this lesson), re-run that cell to make a fresh copy of `hdDF`, and then re-run your code cells.
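Comparing two long lists of names by eye is error-prone. One way to find the differences is to compare them as Python sets, as in the sketch below. The short `expected` and `actual` lists here are hypothetical stand-ins; in practice you would use the full list shown above and `list(hdDF.columns)`.

```python
# Shortened, hypothetical lists for illustration only
expected = ['Age', 'RestingBP', 'HeartDisease', 'Sex_F', 'Sex_M']

# Pretend result in which the 'Sex' column was never encoded
actual = ['Age', 'RestingBP', 'HeartDisease', 'Sex']

# Names that should be present but are not
missing = sorted(set(expected) - set(actual))

# Names that are present but should not be
unexpected = sorted(set(actual) - set(expected))

print("Missing:", missing)
print("Unexpected:", unexpected)
```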

## Generate X and Y for a Classification Neural Network

Now that unnecessary columns have been dropped, and all of the string data has been One-Hot encoded (except for the target column), we are ready to use the data stored in the updated DataFrame as input for a neural network.

There are two basic ways to use tabular data as input into a neural network. The neural network can perform **_classification_** or **_regression_**. There are a small number of very important differences in how you generate X and Y values for these different functions.

We will begin creating a feature vector for a classification neural network.

### Example 4: Create Feature Vector for _Classification_ Neural Network

The goal of a classification neural network is to accurately categorize input data (X-values) into predefined classes or categories (Y-values). The network learns to identify patterns and features within the input data that are associated with each class, allowing it to make predictions about the class of new, unseen data. During training (fitting), the neural network tries to minimize the classification error as a way to improve the overall accuracy of the network's predictions.

When using tabular data, i.e. data stored in a DataFrame, the first step is to figure out which columns in the DataFrame will be used to generate the X-values and which column will be used for the Y-values. For the Breast Cancer Wisconsin dataset, our goal will be to classify tumor cells as cancerous or non-cancerous, so the information in the column `diagnosis` will be our Y-values and the remaining columns will be our X-values.

In a classification neural network, the output layer will have exactly the same number of neurons as the number of predefined classes or categories. Since the Breast Cancer Wisconsin dataset has only two predefined classes/categories, (1) `M` for malignant and (2) `B` for benign, the output layer will have exactly two neurons. One neuron will represent `Malignant` and the other `Benign`.

For each tumor presented to the network, both the `Malignant` output neuron and the `Benign` output neuron will have a "voltage" (numerical value). The "voltage" will be close to `1` in the `Malignant` output neuron if the neural network predicted the breast tumor was cancerous. On the other hand, the `Benign` output neuron will have a much smaller value, closer to `0`. Conversely, if the neural network predicted the tumor was non-cancerous, the `Benign` output neuron will have a value close to `1` while the `Malignant` output neuron will have a much lower voltage, closer to `0`. In other words, the output neuron with the highest value represents the neural network's classification prediction.
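Picking "the output neuron with the highest value" is what the Numpy function `np.argmax()` does. The sketch below uses made-up output values, not results from a trained network, and assumes the class order `B`, `M` (the alphabetical column order that `pd.get_dummies()` produces for the `diagnosis` column); the variable names are hypothetical.

```python
import numpy as np

# Assumed class order: column 0 = Benign, column 1 = Malignant
class_names = np.array(["B", "M"])

# Made-up output "voltages" for three tumors (one row per tumor)
outputs = np.array([[0.08, 0.92],
                    [0.85, 0.15],
                    [0.40, 0.60]])

# Index of the largest value in each row
winners = np.argmax(outputs, axis=1)

# Translate each index back into a class label
predicted = class_names[winners]
print(predicted)
```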

In Module 3, we created our X-values by specifying the name of each column in the DataFrame that we wanted to include. This approach works fine when there are a relatively small number of columns, but becomes tedious with DataFrames with a large number of columns.

The code in the cell below demonstrates an alternative approach better suited for large DataFrames. We start by creating a variable called `bcwX_columns` (a list-like Pandas `Index`). This variable holds the name of every column in the DataFrame except for any names we explicitly drop. Here is the code chunk:

~~~text
# Create list of column names for X-values
bcwX_columns = bcwDF.columns.drop('diagnosis')
~~~
As you can see, we dropped the column `diagnosis` from our list. The column `diagnosis` contains our Y-values, and we don't want them mixed in with our X-values! 

We next use the DataFrame attribute `df.values` to convert the numerical values in the columns specified by `bcwX_columns` into a Numpy array called `bcwX` using this code chunk:
~~~text
# Generate X feature vector
bcwX = bcwDF[bcwX_columns].values
~~~
Keep in mind that the real work of this code chunk is done by `.values` at the end of the code line. It is this little bit of code that converts the numerical values in the DataFrame into a Numpy array. Feature vectors for neural networks are Numpy arrays, **not** DataFrames.

The next line of code, `bcwX = np.asarray(bcwX).astype('float32')`, makes sure that all of our X-values are given the same `float32` data type. If you fail to do this, Keras/Tensorflow will sometimes generate an error.
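The effect of these two steps can be seen on a tiny made-up DataFrame that mixes integer and floating-point columns. In this sketch `toyDF` and `X` are hypothetical names; the `.values` and `astype('float32')` calls are the same ones used in the lesson.

```python
import numpy as np
import pandas as pd

# Made-up DataFrame with one integer column and one float column
toyDF = pd.DataFrame({"Age": [40, 49], "Oldpeak": [0.0, 1.0]})

# .values converts the selected columns into a Numpy array
X = toyDF[["Age", "Oldpeak"]].values
dtype_before = X.dtype   # mixed int/float columns are stored as float64

# Convert every value to float32
X = np.asarray(X).astype('float32')

print(dtype_before, "->", X.dtype)
print(X.shape)
```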

Now that our X-values have been generated, we can now generate our Y-values. How you generate Y-values is different for classification neural networks than for regression neural networks. Here is the rule: 

#### **RULE: For Classification Neural Networks, Y-values must always be _One-Hot Encoded_**

This rule is true even if the column holding the Y-values is numeric. In addition to converting strings to integers, One-Hot Encoding also provides the correct **_format_** for the Y-values.
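To see what "correct format" means when the labels are already numbers, consider the sketch below. It uses a short made-up Series of `0`/`1` labels (`labels` and `Y` are hypothetical names, not the lesson data) and turns it into an array with one column per class.

```python
import numpy as np
import pandas as pd

# Made-up numeric class labels (0 or 1)
labels = pd.Series([0, 1, 1, 0])

# One column per class, even though the labels are already numbers
Y = pd.get_dummies(labels, dtype=int).values
Y = np.asarray(Y).astype('float32')

print(Y)
print(Y.shape)
```

The single column of labels becomes a two-column array, one column for each output neuron of the classification network.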

Here is the code chunk for generating the Y-values using One-Hot Encoding: 

~~~text
# Generate Y feature vector
dummies = pd.get_dummies(bcwDF['diagnosis'], dtype=int) 
bcwY = dummies.values
~~~
The first code line creates a new DataFrame called `dummies` with one column of `0`/`1` values for each category in the `diagnosis` column. The second code line uses the `.values` attribute to convert the numerical values in `dummies` into a Numpy array called `bcwY`.

As above, we use the line of code `bcwY = np.asarray(bcwY).astype('float32')` to make sure all of the Y-values are type `float32`.

To verify our X and Y feature vectors are in the correct format, the first 4 values in `bcwX` and `bcwY` are printed out.

In [None]:
# Example 4: Create feature vector

# Create list of column names for X-values
bcwX_columns = bcwDF.columns.drop('diagnosis')

# Generate X feature vector
bcwX = bcwDF[bcwX_columns].values
bcwX = np.asarray(bcwX).astype('float32')

# Generate Y feature vector
dummies = pd.get_dummies(bcwDF['diagnosis'], dtype=int) 
bcwY = dummies.values
bcwY = np.asarray(bcwY).astype('float32')

# Print out X and Y
np.set_printoptions(suppress=True,precision=4)
print("The first 4 X-values are:")
print(bcwX[0:4])
print("\nTheir corresponding Y-values are:")
print(bcwY[0:4])

If your code is correct you should see the following output:
~~~text
The first 4 X-values are:
[[  17.99     10.38    122.8    1001.        0.1184    0.2776    0.3001
     0.1471    0.2419    0.0787    1.095     0.9053    8.589   153.4
     0.0064    0.049     0.0537    0.0159    0.03      0.0062   25.38
    17.33    184.6    2019.        0.1622    0.6656    0.7119    0.2654
     0.4601    0.1189]
 [  20.57     17.77    132.9    1326.        0.0847    0.0786    0.0869
     0.0702    0.1812    0.0567    0.5435    0.7339    3.398    74.08
     0.0052    0.0131    0.0186    0.0134    0.0139    0.0035   24.99
    23.41    158.8    1956.        0.1238    0.1866    0.2416    0.186
     0.275     0.089 ]
 [  19.69     21.25    130.     1203.        0.1096    0.1599    0.1974
     0.1279    0.2069    0.06      0.7456    0.7869    4.585    94.03
     0.0061    0.0401    0.0383    0.0206    0.0225    0.0046   23.57
    25.53    152.5    1709.        0.1444    0.4245    0.4504    0.243
     0.3613    0.0876]
 [  11.42     20.38     77.58    386.1       0.1425    0.2839    0.2414
     0.1052    0.2597    0.0974    0.4956    1.156     3.445    27.23
     0.0091    0.0746    0.0566    0.0187    0.0596    0.0092   14.91
    26.5      98.87    567.7       0.2098    0.8663    0.6869    0.2575
     0.6638    0.173 ]]

Their corresponding Y-values are:
[[0. 1.]
 [0. 1.]
 [0. 1.]
 [0. 1.]]
~~~

As you can see, our two feature vectors, `bcwX` and `bcwY`, are Numpy arrays.
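
If you want more than a visual check, the sketch below tests the properties that matter for a classification network. The helper name `check_classification_vectors` and the toy arrays are hypothetical and are not part of the lesson code.

```python
import numpy as np

def check_classification_vectors(X, Y):
    """Return True if X and Y look ready for a Keras classification network."""
    same_rows = X.shape[0] == Y.shape[0]                 # one Y row per X row
    both_float32 = X.dtype == np.float32 and Y.dtype == np.float32
    only_0_or_1 = np.all((Y == 0) | (Y == 1))            # One-Hot entries
    one_class_per_row = np.all(Y.sum(axis=1) == 1)       # exactly one 1 per row
    return bool(same_rows and both_float32 and only_0_or_1 and one_class_per_row)

# Hypothetical toy data: 3 samples, 2 features, 2 classes
toyX = np.array([[17.99, 10.38], [20.57, 17.77], [11.42, 20.38]], dtype='float32')
toyY = np.array([[0, 1], [0, 1], [1, 0]], dtype='float32')
print(check_classification_vectors(toyX, toyY))
```

The same function could be called on `bcwX` and `bcwY` after running Example 4.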

### **Exercise 4: Create Feature Vectors for _Classification_ Neural Network**

In the cell below, create X and Y feature vectors for a classification neural network. Use the column `HeartDisease` for your Y-values, and all of the other columns for your X-values. 

Even though the values in the column `HeartDisease` are already numerical (not strings), you will still need to One-Hot encode this column to create the correct format. 

Call your X feature vector `hdX` and your Y feature vector `hdY`. After generating these variables, print out their values.

In [None]:
# Insert your code for Exercise 4 here 



If your code is correct you should see the following output:
~~~text
The first 4 X-values are:
[[ 40.  140.  289.  172.    0.    0.    1.    0.    1.    0.    0.    0.
    1.    0.    1.    0.    0.    0.    1. ]
 [ 49.  160.  180.  156.    1.    1.    0.    0.    0.    1.    0.    0.
    1.    0.    1.    0.    0.    1.    0. ]
 [ 37.  130.  283.   98.    0.    0.    1.    0.    1.    0.    0.    0.
    0.    1.    1.    0.    0.    0.    1. ]
 [ 48.  138.  214.  108.    1.5   1.    0.    1.    0.    0.    0.    0.
    1.    0.    0.    1.    0.    1.    0. ]]

The corresponding Y-values are:
[[1 0]
 [0 1]
 [1 0]
 [0 1]]
~~~

You might notice that there are a lot of `1` and `0` in your X feature vector. This is because you One-Hot Encoded several non-numeric columns in **Exercise 3**.

### Example 5:  Construct, Compile and Train Classification Neural Network

When building a classification neural network, there are two important points to remember:

* Classification neural networks have an output neuron count equal to the number of classes.
* Classification neural networks should use the **softmax** activation function in the output layer and **categorical_crossentropy** as the loss function when you compile your neural network.
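
The two terms in the second point can be illustrated with a minimal NumPy sketch. This is not Keras's own implementation; the function names and the `raw` output values are made up for illustration.

```python
import numpy as np

def softmax(z):
    """Convert raw output-layer values into probabilities that sum to 1."""
    e = np.exp(z - np.max(z, axis=1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob):
    """Mean of -sum(y_true * log(y_prob)) over all samples."""
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Hypothetical raw outputs for 2 samples and 2 classes
raw = np.array([[0.2, 2.0], [1.5, -0.5]])
# One-Hot Encoded true classes: both samples belong to the second class
y_true = np.array([[0.0, 1.0], [0.0, 1.0]])

probs = softmax(raw)
print(probs)
print(categorical_crossentropy(y_true, probs))
```

The first sample gives most of its probability to the correct class and contributes a small loss, while the second sample favors the wrong class and contributes a much larger loss. This is why the Y-values need the One-Hot format: the loss compares one probability per class against one `0` or `1` per class.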

When building a new neural network, you must begin by telling Keras the name we want for our new network and what kind of neural network we want to build. In this example, we are building a simple network with the name `bcwModel` in which the layers are stacked one after another in a line, so we begin construction with the line:

~~~text
bcwModel = Sequential()
~~~

Next we start adding "hidden" layers to our network in sequential order. There are no "hard-and-fast" rules when it comes to how many layers a network should have and how many neurons should be put in each layer. There are trade-offs when it comes to the number of layers and neurons that will be discussed later in this course.

For now, we will construct a simple classification neural network with just 2 hidden layers and one output layer. We have to tell the first hidden layer how many inputs (X values) it will be receiving. We do this with the `input_dim` argument as shown here:

~~~text
input_dim=bcwX.shape[1]
~~~

For this example, we will put 50 neurons in the first layer, and half that many (25) in the next (2nd) hidden layer. The `Dense` layer type tells Keras to connect every neuron in that layer to every input coming from the previous layer. We use the `relu` activation function since it is a widely used default for hidden layer neurons in this kind of simple neural network.
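
Because every neuron in a `Dense` layer is connected to every input, the number of trainable values in each layer can be worked out by hand. The sketch below does this for the layer sizes described here; the helper name `dense_param_count` is hypothetical, and the input count of 30 and class count of 2 are read off the `bcwX` and `bcwY` rows printed in Example 4.

```python
def dense_param_count(n_inputs, n_neurons):
    """Weights (one per input per neuron) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

# Layer sizes for this example: 30 inputs, 50 and 25 hidden neurons, 2 classes
n_inputs, hidden1, hidden2, n_classes = 30, 50, 25, 2

counts = [
    dense_param_count(n_inputs, hidden1),   # Hidden 1
    dense_param_count(hidden1, hidden2),    # Hidden 2
    dense_param_count(hidden2, n_classes),  # Output
]
print(counts, sum(counts))
```

After the model has been constructed, calling `bcwModel.summary()` should report per-layer parameter counts that you can compare against these numbers.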

As mentioned at the beginning of Example 5, we need to make sure there is one neuron in the output layer for **_each_** category that we want to classify. This is accomplished by the argument `bcwY.shape[1]` in this line of code:

~~~text
bcwModel.add(Dense(bcwY.shape[1],activation='softmax')) # Output
~~~

The other point to remember is to use the loss function `categorical_crossentropy` and the `adam` optimizer when compiling a classification neural network. 

Once the neural network has been successfully compiled (compiling sets the loss function and optimizer that will be used for training), it can be "fitted" (trained) to the X and Y feature vectors. A complete pass through all of the training data is called an `epoch`. During each epoch, the model evaluates its "inaccuracy" (loss) on small batches of the data and makes small modifications to the weight/strength of each connection between neurons after each batch. At the end of each epoch, Keras prints out the loss (if `verbose=2`), and the next epoch begins with the updated weights.

By minimizing the loss function, the model should gradually find a good weight for each neural connection.
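
The idea of repeated epochs that gradually reduce the loss can be sketched without Keras. The example below uses plain mini-batch gradient descent on a toy linear model, which is a simpler stand-in for the `adam` optimizer and a neural network. All names and data in it are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: 64 samples, 3 features, exact linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3)).astype('float32')
true_w = np.array([1.5, -2.0, 0.5], dtype='float32')
y = X @ true_w

w = np.zeros(3, dtype='float32')   # starting weights
lr = 0.1                           # learning rate (step size)
batch_size = 16
losses = []

for epoch in range(20):
    # One epoch = one pass over all rows, one batch at a time
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        err = xb @ w - yb
        grad = 2 * xb.T @ err / len(xb)   # gradient of the batch MSE
        w -= lr * grad                    # small weight update
    # Record the loss over the whole dataset at the end of the epoch
    losses.append(float(np.mean((X @ w - y) ** 2)))

print(losses[0], losses[-1])
```

The weights are nudged after every batch, and the loss recorded at the end of each epoch shrinks as the weights approach `true_w`.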

In this example, the number of epochs is set to 100.  


In [None]:
# Example 5: Construct, compile and train model

# Construct model------------------------------------------------------------
bcwModel = Sequential()
bcwModel.add(Dense(50, input_dim=bcwX.shape[1], 
                  activation='relu')) # Hidden 1
bcwModel.add(Dense(25, activation='relu')) # Hidden 2
bcwModel.add(Dense(bcwY.shape[1],activation='softmax')) # Output

# Compile model--------------------------------------------------------------
bcwModel.compile(loss='categorical_crossentropy', optimizer='adam')

# Train model----------------------------------------------------------------
bcwModel.fit(bcwX,bcwY,verbose=2,epochs=100)


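If your code is correct you should see something similar to the following output. Your exact numbers will differ because the starting weights are random. Notice that the loss value started at 12.6481 and decreased to a value of 0.2246 after 100 epochs, although it did not fall at every step (for example, it rose between epochs 98 and 99).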

~~~text
Epoch 1/100
18/18 - 1s - loss: 12.6481 - 564ms/epoch - 31ms/step
Epoch 2/100
18/18 - 0s - loss: 3.8977 - 76ms/epoch - 4ms/step
Epoch 3/100
18/18 - 0s - loss: 0.4218 - 76ms/epoch - 4ms/step
Epoch 4/100
18/18 - 0s - loss: 0.3493 - 93ms/epoch - 5ms/step
Epoch 5/100
18/18 - 0s - loss: 0.3490 - 84ms/epoch - 5ms/step

............................

Epoch 95/100
18/18 - 0s - loss: 0.2079 - 59ms/epoch - 3ms/step
Epoch 96/100
18/18 - 0s - loss: 0.1955 - 56ms/epoch - 3ms/step
Epoch 97/100
18/18 - 0s - loss: 0.1360 - 58ms/epoch - 3ms/step
Epoch 98/100
18/18 - 0s - loss: 0.1302 - 61ms/epoch - 3ms/step
Epoch 99/100
18/18 - 0s - loss: 0.2276 - 59ms/epoch - 3ms/step
Epoch 100/100
18/18 - 0s - loss: 0.2246 - 61ms/epoch - 3ms/step

<keras.callbacks.History at 0x201cc164dc0>
~~~
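
The `18/18` at the start of each output line is the number of batches processed per epoch. Keras `fit()` uses a batch size of 32 unless you pass a different `batch_size`, so the count can be estimated as sketched below. The helper name `steps_per_epoch` is hypothetical, and the row count of 569 is an assumption about the size of `bcwDF` that you can confirm with `len(bcwDF)`.

```python
import math

def steps_per_epoch(n_rows, batch_size=32):
    """Number of batches needed to pass over every row once."""
    return math.ceil(n_rows / batch_size)

# 569 rows is an assumption about bcwDF; check it with len(bcwDF)
print(steps_per_epoch(569))
```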



### **Exercise 5: Construct, Compile and Train Classification Neural Network**

In the cell below, construct, compile and train a classification neural network called `hdModel`. Use Example 5 as a template. 

Make sure to change the input dimensions of your first Hidden layer to `hdX.shape[1]` and the number of neurons in your output layer to `hdY.shape[1]`. If you get an error, it's most likely that you forgot to change these values to correctly match your X and Y feature vectors.


In [None]:
# Insert your code for Exercise 5 here




If your code is correct you should see something similar to the output shown below. In this example, the loss function decreased from 2.9779 to 0.3442 after training 100 cycles (epochs).
~~~text
Epoch 1/100
29/29 - 0s - loss: 2.9779 - 491ms/epoch - 17ms/step
Epoch 2/100
29/29 - 0s - loss: 1.2561 - 145ms/epoch - 5ms/step
Epoch 3/100
29/29 - 0s - loss: 0.9402 - 136ms/epoch - 5ms/step
Epoch 4/100
29/29 - 0s - loss: 0.8456 - 115ms/epoch - 4ms/step
Epoch 5/100
29/29 - 0s - loss: 0.6659 - 106ms/epoch - 4ms/step

......................................

Epoch 95/100
29/29 - 0s - loss: 0.3640 - 102ms/epoch - 4ms/step
Epoch 96/100
29/29 - 0s - loss: 0.3801 - 102ms/epoch - 4ms/step
Epoch 97/100
29/29 - 0s - loss: 0.3506 - 99ms/epoch - 3ms/step
Epoch 98/100
29/29 - 0s - loss: 0.4067 - 98ms/epoch - 3ms/step
Epoch 99/100
29/29 - 0s - loss: 0.4458 - 99ms/epoch - 3ms/step
Epoch 100/100
29/29 - 0s - loss: 0.3442 - 122ms/epoch - 4ms/step

<keras.callbacks.History at 0x201cf6e5ca0>

~~~

## Generate Feature Vectors for a Regression Neural Network

As mentioned above, the procedure for generating X and Y feature vectors for a regression neural network is somewhat different from the procedure used above. Even though these differences are not large, they are important. If your X and Y feature vectors are not generated in the correct format, your neural network will not train correctly.

### Example 6: Generate Feature Vectors for Regression Neural Network


For regression, we want to predict a variable that has a **_range of values_**. For Example 6, we will generate X and Y feature vectors for a regression neural network designed to predict the `mean_area` of tumor cell nuclei in the Breast Cancer Wisconsin dataset. 

We begin by creating a new DataFrame from the same CSV file, but this time we will call it `areaDF` to remind us that this DataFrame is being used to predict the average (mean) area.

After creating our DataFrame, our next step is to **_pre-process_** the data. There are a variety of data manipulation steps that come under the heading `pre-processing`. Some of these manipulations are optional while other steps are required. One required data manipulation is to make sure all of the categorical values (i.e. strings) have been converted to numerical values.

For our regression neural network, we definitely want to include the column `diagnosis` as part of our X feature vector. It is certainly reasonable to assume that `mean_area` might be different in tumors that are `Malignant` than in tumors that are `Benign`. However, since the column `diagnosis` contains string values (i.e. `M` and `B`), we will first need to convert these strings to numerical values.

We saw earlier that we could use One-Hot Encoding to make this conversion. In this example, we will use an alternative method called `mapping`. Mapping has the advantage over One-Hot Encoding that it doesn't increase the number of columns. It's relatively easy to employ, provided the number of different categorical values is small, as it is here.

Here is the code chunk that maps the letter `M` to `1` and the letter `B` to `0`:
~~~text
mapping = {'M': 1, 'B': 0}
areaDF['diagnosis'] = areaDF['diagnosis'].map(mapping)
~~~
From a purely mathematical perspective, you could assign these two string values to any two different integers, since the network only needs to be able to tell them apart. However, the usual convention is to assign `1` to the "disease" condition, and `0` to the "non-diseased" condition.
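
One detail of `map()` is worth seeing on toy data: any value that is missing from the mapping dictionary is turned into `NaN`. The sketch below uses a made-up DataFrame `toyDF` with a deliberately unexpected value `'X'`; the column name `diagnosis_mapped` is also hypothetical.

```python
import pandas as pd

# Hypothetical toy column; 'X' is an unexpected value added on purpose
toyDF = pd.DataFrame({'diagnosis': ['M', 'B', 'B', 'X']})

mapping = {'M': 1, 'B': 0}
toyDF['diagnosis_mapped'] = toyDF['diagnosis'].map(mapping)
print(toyDF)

# Values missing from the mapping become NaN, so count them before training
n_missing = toyDF['diagnosis_mapped'].isna().sum()
print(n_missing)
```

A count of zero missing values after mapping is a quick sign that every category was covered by the dictionary.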

Next, we need to create the Y feature vector. A very important difference between regression and classification networks is how the Y feature vector is created. With regression, we simply assign the numerical values in the correct DataFrame column to our Numpy array as shown in this code chunk:
~~~text
areaY = areaDF['mean_area'].values
~~~
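
The difference between the two kinds of Y feature vector shows up clearly in their shapes. The following sketch builds both from a made-up three-row DataFrame; `toyDF`, `classY` and `regY` are hypothetical names used only here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with one label column and one numeric column
toyDF = pd.DataFrame({'diagnosis': ['M', 'B', 'M'],
                      'mean_area': [1001.0, 386.1, 1326.0]})

# Classification: One-Hot Encode the label column -> 2-D array, one column per class
classY = pd.get_dummies(toyDF['diagnosis'], dtype=int).values.astype('float32')

# Regression: take the numeric column directly -> 1-D array, one number per row
regY = toyDF['mean_area'].values.astype('float32')

print(classY.shape)
print(regY.shape)
```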

Finally, we print out the first 4 values in the X and Y feature vectors so we can make sure they have the correct format.


In [None]:
# Example 6: Generate X and Y for regression

# Read file and create DataFrame-------------------------------------
areaDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/wcbreast.csv",
    index_col=0,
    na_values=['NA','?'])

# Pre-process variables----------------------------------------------
# Map categorical values in diagnosis to integers
mapping = {'B': 0, 'M': 1}
areaDF['diagnosis'] = areaDF['diagnosis'].map(mapping)

# Create X feature vector--------------------------------------------
areaX_columns = areaDF.columns.drop('mean_area')
areaX = areaDF[areaX_columns].values
areaX = np.asarray(areaX).astype('float32')

# Create Y feature vector--------------------------------------------
areaY = areaDF['mean_area'].values
areaY = np.asarray(areaY).astype('float32')

# Print out X and Y
np.set_printoptions(suppress=True,precision=4)
print("The first 4 X-values are:")
print(areaX[0:4])
print("\nThe corresponding Y-values are:")
print(areaY[0:4])


If your code is correct you should see the following output:
~~~text
The first 4 X-values are:
[[  842302.            1.           17.99         10.38        122.8
         0.1184        0.2776        0.3001        0.1471        0.2419
         0.0787        1.095         0.9053        8.589       153.4
         0.0064        0.049         0.0537        0.0159        0.03
         0.0062       25.38         17.33        184.6        2019.
         0.1622        0.6656        0.7119        0.2654        0.4601
         0.1189]
 [  842517.            1.           20.57         17.77        132.9
         0.0847        0.0786        0.0869        0.0702        0.1812
         0.0567        0.5435        0.7339        3.398        74.08
         0.0052        0.0131        0.0186        0.0134        0.0139
         0.0035       24.99         23.41        158.8        1956.
         0.1238        0.1866        0.2416        0.186         0.275
         0.089 ]
 [84300904.            1.           19.69         21.25        130.
         0.1096        0.1599        0.1974        0.1279        0.2069
         0.06          0.7456        0.7869        4.585        94.03
         0.0061        0.0401        0.0383        0.0206        0.0225
         0.0046       23.57         25.53        152.5        1709.
         0.1444        0.4245        0.4504        0.243         0.3613
         0.0876]
 [84348304.            1.           11.42         20.38         77.58
         0.1425        0.2839        0.2414        0.1052        0.2597
         0.0974        0.4956        1.156         3.445        27.23
         0.0091        0.0746        0.0566        0.0187        0.0596
         0.0092       14.91         26.5          98.87        567.7
         0.2098        0.8663        0.6869        0.2575        0.6638
         0.173 ]]

The corresponding Y-values are:
[1001.  1326.  1203.   386.1]
~~~

The last 4 values, `1001`, `1326`, `1203` and `386.1`, are the `mean_area` values for the first 4 subjects in our feature vectors. Our goal is to build and train a regression neural network that can accurately predict these `mean_area` numbers, given the values in the X feature vector.

### **Exercise 6: Generate Feature Vectors for Regression Neural Network**

In the cell below, generate X and Y for a regression neural network that can predict the maximum heart rate, `MaxHR`, for the Heart Disease dataset. Start by reading the `heart_disease.csv` dataset and making a new DataFrame called `HRateDF`.

Next, pre-process the 5 columns with categorical (string) values using One-Hot Encoding:
1. `Sex`
2. `ChestPainType`
3. `RestingECG`
4. `ExerciseAngina`
5. `ST_Slope`

Here is a code chunk example for One-Hot Encoding the first column `Sex`:
~~~text
# Sex
dummies = pd.get_dummies(HRateDF['Sex'],prefix="Sex", dtype=int)
HRateDF = pd.concat([HRateDF,dummies],axis=1)
HRateDF.drop('Sex', axis=1, inplace=True)
~~~

Once you have completed the One-Hot Encoding, you should be ready to generate your X feature vector. Since the goal of your regression neural network is to predict maximum heart rate `MaxHR`, make sure to drop that column name when you create your list of X column names for `HRateX`.

When you have generated X and Y, print out the values in `HRateX` and `HRateY`.

In [None]:
# Insert your code for Exercise 6 here
 


If your code is correct you should see the following output:

~~~text
The first 4 X-values are:
[[ 40.  140.  289.    0.    0.    0.    0.    1.    0.    1.    0.    0.
    0.    1.    0.    1.    0.    0.    0.    1. ]
 [ 49.  160.  180.    0.    1.    1.    1.    0.    0.    0.    1.    0.
    0.    1.    0.    1.    0.    0.    1.    0. ]
 [ 37.  130.  283.    0.    0.    0.    0.    1.    0.    1.    0.    0.
    0.    0.    1.    1.    0.    0.    0.    1. ]
 [ 48.  138.  214.    0.    1.5   1.    1.    0.    1.    0.    0.    0.
    0.    1.    0.    0.    1.    0.    1.    0. ]]

The corresponding Y-values are:
[172. 156.  98. 108.]
~~~

The last 4 numbers are the maximum heart rates of the first 4 subjects. When you build your regression neural network, you will try to predict these values. 

### Example 7: Construct, compile and train a regression neural network

The X and Y feature vectors are now ready for training a regression neural network. When building a neural network for regression, there are three points to remember. 

* **First:** a regression neural network only has a single neuron in its output layer. The output value (the "voltage") of this single neuron represents the network's numerical prediction for one row of X-values. So in this example, the numerical value in the output neuron would be the predicted `mean_area` value of the tumor cell nuclei.

The loss value is computed from the difference between the `areaModel` model's predicted `mean_area` values and the actual `mean_area` values stored in the Y feature vector, `areaY`. The better the neural network becomes at predicting mean area, the smaller the loss value becomes.

* **Second:** a regression neural network does **not** use an activation function in the output layer. 

* **Third:** when you compile a regression neural network, the loss function should be `mean_squared_error`. 
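
To see what `mean_squared_error` measures, here is a minimal NumPy sketch. The true values are the four `mean_area` numbers printed in Example 6; the predictions in `y_pred` and the function name are made up for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([1001.0, 1326.0, 1203.0, 386.1])  # mean_area values from Example 6
y_pred = np.array([950.0, 1400.0, 1203.0, 400.0])   # hypothetical predictions

mse = mean_squared_error(y_true, y_pred)
print(mse)
```

Because each error is squared before averaging, a few large misses raise the loss much more than many small ones, and the result is in squared units of the Y-values.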


In [None]:
# Example 7: Construct, compile and train

# Construct model------------------------------------------
areaModel = Sequential()
areaModel.add(Dense(25, input_dim=areaX.shape[1], 
                  activation='relu')) # Hidden 1
areaModel.add(Dense(10, activation='relu')) # Hidden 2
areaModel.add(Dense(1)) # Output: 1 neuron, no activation

# Compile model---------------------------------------------
areaModel.compile(loss='mean_squared_error', optimizer='adam')

# Train model
areaModel.fit(areaX,areaY,verbose=2,epochs=100)


If your code is correct you should see something similar to the following output:

~~~text
Epoch 1/100
18/18 - 1s - loss: 1518572929024000.0000 - 530ms/epoch - 29ms/step
Epoch 2/100
18/18 - 0s - loss: 781124628381696.0000 - 107ms/epoch - 6ms/step
Epoch 3/100
18/18 - 0s - loss: 406562879307776.0000 - 113ms/epoch - 6ms/step
Epoch 4/100
18/18 - 0s - loss: 154065895948288.0000 - 97ms/epoch - 5ms/step
Epoch 5/100
18/18 - 0s - loss: 60408803098624.0000 - 73ms/epoch - 4ms/step

.................................

Epoch 95/100
18/18 - 0s - loss: 272357.2188 - 77ms/epoch - 4ms/step
Epoch 96/100
18/18 - 0s - loss: 272300.7188 - 80ms/epoch - 4ms/step
Epoch 97/100
18/18 - 0s - loss: 272286.0938 - 71ms/epoch - 4ms/step
Epoch 98/100
18/18 - 0s - loss: 272171.0938 - 65ms/epoch - 4ms/step
Epoch 99/100
18/18 - 0s - loss: 272537.3125 - 64ms/epoch - 4ms/step
Epoch 100/100
18/18 - 0s - loss: 272240.4688 - 63ms/epoch - 4ms/step

<keras.callbacks.History at 0x1fded47ef70>
~~~
In this particular example, the loss function (mean squared error, MSE) reported a loss of 1518572929024000.0000 after the 1st epoch. After 100 epochs, the loss had decreased to 272240.4688. While these might look to be very "bad" values, they may not be as bad as they look, because MSE is measured in squared units of `mean_area`. We will look more closely at how to interpret RMSE values, the square root of MSE, in an upcoming lesson.
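
Since the model was compiled with `mean_squared_error`, taking the square root of a reported loss puts it back into the same units as `mean_area`. The short sketch below does this for the two loss values quoted from the output above; the variable names are hypothetical.

```python
import math

mse_first = 1518572929024000.0   # loss after epoch 1, from the output above
mse_last = 272240.4688           # loss after epoch 100, from the output above

rmse_first = math.sqrt(mse_first)
rmse_last = math.sqrt(mse_last)

print(f"RMSE after epoch 1:   {rmse_first:,.1f}")
print(f"RMSE after epoch 100: {rmse_last:,.1f}")
```

The final value works out to a little under 522, which can be compared directly with `mean_area` values such as `1001` or `386.1`.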

### **Exercise 7: Construct, Compile, Train Regression Neural Network**

In the cell below, create a regression neural network for the Heart Disease dataset that can predict the maximum heart rate (`MaxHR`). Use Example 7 as a template. Call your neural network `HRateModel`. Train (fit) your model for 100 epochs.

Don't forget to set the input dimension of your 1st hidden layer correctly, or you will get an error.

In [None]:
# Insert your code for Exercise 7 here




If your code is correct you should see something similar to the following output:

~~~text
Epoch 1/100
29/29 - 0s - loss: 18498.4375 - 475ms/epoch - 16ms/step
Epoch 2/100
29/29 - 0s - loss: 6507.8062 - 89ms/epoch - 3ms/step
Epoch 3/100
29/29 - 0s - loss: 1651.9683 - 85ms/epoch - 3ms/step
Epoch 4/100
29/29 - 0s - loss: 1025.9800 - 82ms/epoch - 3ms/step
Epoch 5/100
29/29 - 0s - loss: 1000.3663 - 82ms/epoch - 3ms/step

.............

Epoch 95/100
29/29 - 0s - loss: 687.0899 - 106ms/epoch - 4ms/step
Epoch 96/100
29/29 - 0s - loss: 687.4363 - 119ms/epoch - 4ms/step
Epoch 97/100
29/29 - 0s - loss: 684.6712 - 117ms/epoch - 4ms/step
Epoch 98/100
29/29 - 0s - loss: 683.4319 - 115ms/epoch - 4ms/step
Epoch 99/100
29/29 - 0s - loss: 681.8236 - 107ms/epoch - 4ms/step
Epoch 100/100
29/29 - 0s - loss: 681.7521 - 122ms/epoch - 4ms/step

<keras.callbacks.History at 0x1fdebc655b0>
~~~

In the example above, the MSE loss went from 18498.4375 after the first epoch down to 681.7521 after the 100th epoch. Clearly the model improved its ability to predict maximum heart rate `MaxHR` with training.

## **Lesson Turn-in**

When you have completed all of the code cells, run all of them in sequential order (the last code cell should be number 22, not counting the Appendix below). Then use the **File --> Print.. --> Save to PDF** to generate a PDF of your JupyterLab notebook. Save your PDF as `Class_04_1.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

## Appendix

The code in the cells below uses the Pandas method `describe()` to print out a statistical summary of the column `mean_area` in the Wisconsin Breast Cancer dataset and the column `MaxHR` in the Heart Disease dataset.

In [None]:
areaDF['mean_area'].describe()

In [None]:
HRateDF['MaxHR'].describe()