<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/master/Class_03_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# BIO 1173: Intro Computational Biology

**Module 3: Introduction to TensorFlow**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Integrative Biology](https://sciences.utsa.edu/integrative-biology/), [UTSA](https://www.utsa.edu/)


### Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction
* Part 3.2: Introduction to Tensorflow and Keras
* **Part 3.3: Saving and Loading a Keras Neural Network**
* Part 3.4: Early Stopping in Keras to Prevent Overfitting
* Part 3.5: Extracting Weights and Manual Calculation


# Google CoLab Instructions

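The following code checks whether this notebook is running in Google CoLab. If it is, the code mounts your Google Drive at `/content/drive` and requests TensorFlow 2.x (current versions of CoLab ship only TensorFlow 2.x, so the `%tensorflow_version` magic command has no effect there).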

In [None]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
    %tensorflow_version 2.x
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

### Lesson Setup

Uncomment and run the next code cell **_only_** if you receive an error message about not having the package `h5py`.

In [None]:
# Load the package h5py ONLY if you get an error message after you run the next cell
#!pip install h5py==3.7.0

In [None]:
# You MUST run this code cell first
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from sklearn import metrics
from sklearn.metrics import accuracy_score

import numpy as np
import pandas as pd
import h5py
import requests

import os
import shutil
path = '/'
memory = shutil.disk_usage(path)
dirpath = os.getcwd()
print("Your current working directory is : " + dirpath)
print("Disk", memory)
print("Numpy version =", (np.__version__)) 
print("Tensorflow version =", (tf.__version__))
print("Available GPU acceleration =", tf.test.gpu_device_name())
print("Current version of h5py =", (h5py.__version__))

# Part 3.3: Saving and Loading a Keras Neural Network

Complex neural networks can take a long time to fit (train). It is therefore helpful to save a trained network so that you can reload it later; a reloaded network does not need to be retrained. Keras can save a network in several formats, and this lesson uses two of them:

* **JSON** - Stores the neural network structure (no weights) in the [JSON file format](https://en.wikipedia.org/wiki/JSON).
* **HDF5** - Stores the complete neural network (with weights) in the [HDF5 file format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format). Do not confuse HDF5 with [HDFS](https://en.wikipedia.org/wiki/Apache_Hadoop).  They are different.  We do **not** use HDFS in this class, only HDF5.

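Usually, you will want to save in HDF5, because it stores both the architecture and the trained weights. (Newer versions of Keras recommend their native `.keras` format, but HDF5 files are still supported and are used throughout this lesson.)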

### Example 1: Build and train a neural network

The code in the cell below reads the Apple Quality dataset from the course HTTP server and creates a DataFrame called `apDF`. 

**Data Pre-processing:** There isn't much to do to get this data file ready except to map the two string classes in the `Quality` column, "bad" and "good", to `0` and `1`, respectively. For this assignment we won't bother to split the data into training and validation sets, nor will we shuffle the data.

Our goal is to build a neural network called `apModel` that can classify apples based on their Size, Weight, Sweetness, Crunchiness, Juiciness, Acidity and Ripeness. These are the _independent variables_ ("features"), which we will assign to the variable `apX`.

The _dependent variable_ or "response variable" is apple `Quality` which will be assigned to the variable `apY`. For classification we will need to One-Hot Encode the `Quality` column to get the correct format. 
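To make the two pre-processing steps concrete, here is a minimal sketch on a hypothetical four-row `Quality` column (toy values, not the real dataset). It uses the same `map()` and `pd.get_dummies()` calls as Example 1; after encoding, column `0` marks "bad" apples and column `1` marks "good" apples, and every row contains exactly one `1`.

```python
import pandas as pd

# Toy stand-in for the Quality column (hypothetical values, not the real dataset)
quality = pd.Series(['good', 'bad', 'good', 'bad'])

# Step 1: map the string labels to integers, as in Example 1
quality_int = quality.map({'bad': 0, 'good': 1})

# Step 2: one-hot encode; one column per class, one 1 per row
dummies = pd.get_dummies(quality_int, dtype=int)

print(dummies.columns.tolist())  # [0, 1]
print(dummies.values)
```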

Because we want `apModel` to act as a _classifier_ (instead of a "regressor"), we will use the `softmax` activation function in the output layer. We will also compile the model using `categorical_crossentropy` as the loss function instead of `mean_squared_error`.
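The sketch below shows, with plain NumPy, what `softmax` and categorical cross-entropy compute. The raw output values ("logits") are hypothetical numbers for three apples, not outputs of `apModel`. Softmax turns each row into probabilities that sum to 1, `np.argmax` picks the predicted class, and cross-entropy is small when the probability assigned to the true class is high.

```python
import numpy as np

def softmax(z):
    # Subtract each row's max for numerical stability; the result is unchanged
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

# Hypothetical raw outputs of a 2-neuron output layer for 3 apples
logits = np.array([[2.0, 0.5], [-1.0, 1.0], [0.0, 0.0]])
probs = softmax(logits)

print(probs.round(3))
print(probs.sum(axis=1))          # each row sums to 1
print(np.argmax(probs, axis=1))   # [0 1 0]  (0 = bad, 1 = good)

# One-hot true labels and per-sample categorical cross-entropy: -sum(y * log(p))
y_true = np.array([[1, 0], [0, 1], [0, 1]])
ce = -np.sum(y_true * np.log(probs), axis=1)
print(ce.round(3))
```

The third apple gets probabilities of 0.5 for each class, so its cross-entropy is log(2), the largest of the three losses.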

In [None]:
# Example 1: Build and train a neural network

# Read in data and create DataFrame
apDF = pd.read_csv(
    "https://corgi.genomelab.utsa.edu/BIO1173/Datasets/apple_quality.csv", 
    na_values=['NA', '?'])

# Pre-process data
mapping = {'bad': 0, 'good': 1}
apDF['Quality'] = apDF['Quality'].map(mapping)

# Generate independent variables x
apX = apDF[['Size', 'Weight', 'Sweetness', 'Crunchiness',
       'Juiciness', 'Acidity', 'Ripeness']].values

# One-Hot Encode dependent variables y
dummies = pd.get_dummies(apDF['Quality'], dtype=int) # Classification
apY = dummies.values

# Build neural network
apModel = Sequential()
apModel.add(Dense(50, input_dim=apX.shape[1], activation='relu')) # Hidden 1
apModel.add(Dense(25, activation='relu')) # Hidden 2
apModel.add(Dense(apY.shape[1],activation='softmax')) # Output
apModel.compile(loss='categorical_crossentropy', optimizer='adam')

# Fit model to the data
apModel.fit(apX,apY,verbose=2,epochs=100)

### **Exercise 1: Build and train a neural network** 

In the cell below build and train a new neural network called `pimaModel` on the dataset `pima.csv` that is located on the course HTTP server. Since there are **no** string values in this dataset, no pre-processing of the data is needed. 

The goal of your neural network model `pimaModel` is to classify the subjects in the Pima dataset (Native American women of the Pima Indian tribe) using their Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, and Age as independent variables or "features". Assign these independent variables to `pimaX`.

The dependent variable, or "response variable", should be the `Outcome` column. A value of `1` in the `Outcome` column indicates that the subject has been clinically diagnosed with Type II diabetes, while a value of `0` means she has not been diagnosed with the disease. You will need to One-Hot Encode the `Outcome` column to create "dummies" and then assign `dummies.values` to the dependent variable `pimaY`, as shown in Example 1.

You should use the same neural network architecture as shown in Example 1 for your `pimaModel`. Train/fit your model on `pimaX` and `pimaY` for 100 epochs.  

In [None]:
# Insert your code for Exercise 1 here 



### Example 2: Determine the model's RMSE and Accuracy

The overall objective of this lesson is to convince you that you can save a _trained_ neural network to a file and later recreate the neural network from that file without changing the model's accuracy.

Why is this important? 

As you already know, it can take significant time and processing power to train even the relatively small neural networks we have created so far in this course. Neural networks that are used commercially (think "Siri", "Alexa" or ChatGPT) are many times larger and require enormous resources and weeks (or months) to train. If you had to retrain a neural network every time you wanted to use it, it wouldn't be very practical, and there would be little interest in "AI". However, once a neural network has been trained, you can save it to a file and then re-use it over and over again without any loss in its ability to solve problems (i.e., its accuracy).

The code in the cell below calculates how well `apModel` predicts an apple's quality from its Size, Weight, Sweetness, Crunchiness, Juiciness, Acidity and Ripeness. Two measures of predictive ability are computed: the Root Mean Square Error (RMSE) and Accuracy. The code stores the RMSE value in the variable `apScore` and the Accuracy value in the variable `apCorrect`, and then prints out these values.
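To see what `apScore` and `apCorrect` measure, here is a small worked example on hypothetical predictions for four samples (made-up numbers, not `apModel` output). RMSE is the square root of the mean squared difference between the predicted probabilities and the one-hot labels, taken over every entry; Accuracy is the fraction of samples whose highest-probability class matches the true class.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, accuracy_score

# Hypothetical softmax outputs for 4 samples and their one-hot true labels
pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
y = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

# RMSE computed by hand and with scikit-learn (same result)
rmse_manual = np.sqrt(np.mean((pred - y) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y, pred))

# Accuracy: compare predicted class (argmax) with true class
acc = accuracy_score(np.argmax(y, axis=1), np.argmax(pred, axis=1))

print(rmse_manual, rmse_sklearn)  # both about 0.354
print(acc)                        # 0.75 (the third sample is misclassified)
```

Because squared error is symmetric, swapping the two arguments of `mean_squared_error` (as Example 2 does) gives the same value.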


In [None]:
# Example 2: Determine the model's RMSE and Accuracy

from sklearn.metrics import accuracy_score

# Measure RMSE 
apPred = apModel.predict(apX)
apScore = np.sqrt(metrics.mean_squared_error(apPred,apY))
print(f"Before save score (RMSE): {apScore}")

# Measure Accuracy
apPredict_classes = np.argmax(apPred,axis=1)
apExpected_classes = np.argmax(apY,axis=1)
apCorrect = accuracy_score(apExpected_classes,apPredict_classes)
print(f"Before save Accuracy: {apCorrect}")


If the code is correct, you should see output similar to the following:

### **Exercise 2: Determine the model's RMSE and Accuracy**

In the cell below, determine the RMSE and Accuracy of your `pimaModel`. Store the RMSE value in a variable called `pimaScore` and the Accuracy value in a variable called `pimaCorrect`. Print out these values as shown in Example 2 above.

In [None]:
# Insert your code for Exercise 2 here



If your code is correct you should see something similar to the following output:

In the next steps we save the trained network to disk, create a new network from the saved file without refitting it, and then make predictions again with the new network. If the network was saved and reloaded correctly, its RMSE and Accuracy should match the values above exactly.

### Example 3: Save the model

The code in the cell below saves the _trained_ neural network `apModel` as a file in two different file formats: JSON and HDF5. The code saves each file in the current working directory (`save_path = "."`). The filename of the JSON file is `apModel.json` while the filename of the HDF5 file is `apModel.h5`.

In [None]:
# Example 3: Save the model

# Save path is the current directory
save_path = "."

# Save neural network structure to JSON (no weights)
apModel_json = apModel.to_json()
with open(os.path.join(save_path,"apModel.json"), "w") as json_file:
    json_file.write(apModel_json)

# Save entire network to HDF5 (save everything)
apModel.save(os.path.join(save_path,"apModel.h5"))

# Print out the files in current directory
files = os.listdir()
print(files)

You should see output similar to the following (the exact list of files depends on what else is in your working directory):

The advantage of the JSON format is that it can be inspected visually: just click on the file name in the file browser panel. The JSON file preserves the model's _architecture_ (its layers, their sizes and their activation functions), which you can see by opening the file.

On the other hand, you **can't** usefully view the contents of the HDF5 file, because it is a binary file rather than plain text. However, you should save a trained model in the HDF5 format, since it **_preserves both the architecture and the values of the weights_** of the model's connections. Because the weights are preserved, you don't have to train/fit the model again.
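The sketch below illustrates that JSON keeps the architecture but not the weights. It assumes TensorFlow 2.x with `tf.keras`, and the small model is a hypothetical stand-in with the same layer sizes as `apModel` (7 inputs, 2 outputs). Rebuilding from the JSON string with `model_from_json()` gives the same layers and weight shapes, but the weights are freshly initialized, so the rebuilt network would have to be trained again.

```python
import json
import numpy as np
from tensorflow.keras.models import Sequential, model_from_json
from tensorflow.keras.layers import Dense, Input

# Hypothetical model with the same layer sizes as apModel
model = Sequential([
    Input(shape=(7,)),
    Dense(50, activation='relu'),
    Dense(25, activation='relu'),
    Dense(2, activation='softmax'),
])

# The JSON string is plain text describing only the architecture
model_json = model.to_json()
config = json.loads(model_json)
print(config['class_name'])

# Rebuild from JSON: same layers, but new randomly initialized weights
rebuilt = model_from_json(model_json)
pairs = list(zip(model.get_weights(), rebuilt.get_weights()))
same_shapes = all(a.shape == b.shape for a, b in pairs)
same_values = all(np.allclose(a, b) for a, b in pairs)
print("Same weight shapes:", same_shapes)
print("Same weight values:", same_values)
```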
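HDF5 stores data as a hierarchy of groups (like folders) and datasets (arrays) inside one binary file. The minimal sketch below uses `h5py` directly to write and read a tiny file; the file name and the group and dataset names are hypothetical and are not the layout Keras uses for `apModel.h5`. The first bytes of the file are the binary HDF5 signature, which is why a text viewer cannot display it meaningfully.

```python
import os
import numpy as np
import h5py

fname = "demo_weights.h5"  # hypothetical file name used only for this demo

# Write a tiny HDF5 file: a group holding two array datasets
with h5py.File(fname, "w") as f:
    layer = f.create_group("dense_demo")
    layer.create_dataset("kernel", data=np.arange(6, dtype=np.float32).reshape(3, 2))
    layer.create_dataset("bias", data=np.zeros(2, dtype=np.float32))

# Walk the hierarchy and read one array back
with h5py.File(fname, "r") as f:
    names = []
    f.visit(names.append)              # collects every group/dataset path
    kernel = f["dense_demo/kernel"][()]

# The first 8 bytes are the HDF5 file signature, not readable text
with open(fname, "rb") as fh:
    header = fh.read(8)

print(names)
print(kernel)
print(header)

os.remove(fname)  # clean up so the demo file doesn't clutter your directory
```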

### **Exercise 3: Save the model**

In the code cell below, save the _trained_ neural network `pimaModel` as a JSON file with the filename `pimaModel.json` and as an HDF5 file with the filename `pimaModel.h5`. Save both files to your current working directory (`save_path = "."`).

In [None]:
# Insert your code for Exercise 3 here



If your code is correct, you should see output similar to the following:

### Example 4: Create new model from saved model

Once a trained model has been saved in the HDF5 format, it is a simple matter to read the file to make an exact copy of the model using the Keras function `load_model()` as shown in the cell below. This new neural network is called `apModel_2` to differentiate it from the one built previously.  

In [None]:
# Example 4: Create new model from saved model

from tensorflow.keras.models import load_model

# Look in current folder
save_path = "."

# Create model2 from the saved model
apModel_2 = load_model(os.path.join(save_path,"apModel.h5"))

# Print out model summary
apModel_2.summary()


### **Exercise 4: Create new model from saved model**

In the cell below create a new neural network called `pimaModel_2` from the file `pimaModel.h5` in your current directory. Print out a summary of `pimaModel_2`.

In [None]:
# Insert your code for Exercise 4 here



If your code is correct, you should see something similar to the following:

### Example 5: Compare the Predictive Accuracy of the old and new models

The code in the cell below computes the RMSE and Accuracy of the new model `apModel_2` and prints them next to the values for the original `apModel` for comparison.

In [None]:
# Example 5: Determine new model's RMSE and Accuracy

# Measure RMSE error.
apPred_2 = apModel_2.predict(apX)
apScore_2 = np.sqrt(metrics.mean_squared_error(apPred_2,apY))
print(f"Before save score (RMSE): {apScore}")
print(f"After save score (RMSE) : {apScore_2}")

# Measure the accuracy
apPredict_classes_2 = np.argmax(apPred_2,axis=1)
apExpected_classes_2 = np.argmax(apY,axis=1)
apCorrect_2 = accuracy_score(apExpected_classes_2,apPredict_classes_2)
print(f"Before save Accuracy: {apCorrect}")
print(f"After save Accuracy : {apCorrect_2}")


If your code is correct you should see the following output:

You should note that the new model `apModel_2` has exactly the same predictive accuracy as the original model `apModel` built in Example 1.
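Example 5 compares two summary numbers. A stricter check is to compare every individual prediction of the original and reloaded models. The sketch below assumes TensorFlow 2.x with `tf.keras`; the random data, the small model, and the file name `demo_model.h5` are hypothetical stand-ins, and the file is written to a temporary directory so it does not clutter your working directory.

```python
import os
import tempfile
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Input

# Hypothetical data: 100 samples, 7 features, 2 one-hot classes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7)).astype("float32")
Y = np.eye(2)[rng.integers(0, 2, size=100)]

model = Sequential([
    Input(shape=(7,)),
    Dense(8, activation="relu"),
    Dense(2, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, Y, epochs=3, verbose=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.h5")
    model.save(path)              # architecture + weights in one HDF5 file
    restored = load_model(path)   # no refitting needed
    pred_before = model.predict(X, verbose=0)
    pred_after = restored.predict(X, verbose=0)

# Compare every predicted probability, not just RMSE and accuracy
print("Predictions match:", np.allclose(pred_before, pred_after))
print("Largest difference:", np.max(np.abs(pred_before - pred_after)))
```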

### **Exercise 5: Compare the Predictive Accuracy of the old and new models**

In the cell below write the code to compute the RMSE and Accuracy values for your `pimaModel_2` and print out these values along with the values computed previously in **Exercise 2**.

In [None]:
# Insert your code for Exercise 5 here



If your code is correct, you should see the following output:

You should note that your new model `pimaModel_2` has exactly the same predictive accuracy as the original model `pimaModel` that you built in **Exercise 1**.

## **Lesson Turn-in**

When you have completed all of the code cells, and run them in sequential order (the last code cell should be number 13), use the **File --> Print.. --> Save to PDF** to generate a PDF of your JupyterLab notebook. Save your PDF as `Class_03_3.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.