<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/master/Class_03_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

**Module 3: Introduction to TensorFlow**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Integrative Biology](https://sciences.utsa.edu/integrative-biology/), [UTSA](https://www.utsa.edu/)


### Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction
* **Part 3.2: Introduction to Tensorflow and Keras**
* Part 3.3: Saving and Loading a Keras Neural Network
* Part 3.4: Early Stopping in Keras to Prevent Overfitting
* Part 3.5: Extracting Weights and Manual Calculation


## Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow.

In [None]:
try:
    %tensorflow_version 2.x
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

### Lesson Setup

Run the next code cell to load necessary packages

In [None]:
# You MUST run this code cell first

# Import TensorFlow modules
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

# Import scikit-learn metrics
from sklearn import metrics

# Import other needed packages
import time
import numpy as np
import pandas as pd
import requests
import os
os.environ['tf.compat.v1.logging.set_verbosity'] = '1'
import shutil
path = '/'
memory = shutil.disk_usage(path)
dirpath = os.getcwd()

# Print out diagnostics
print("Your current working directory is : " + dirpath)
print("Disk", memory)
print("TensorFlow version:", tf.version.VERSION)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# Part 3.2: Introduction to Tensorflow and Keras

TensorFlow [[Cite:GoogleTensorFlow]](https://research.google/pubs/pub45381/) is an open-source software library for machine learning in various kinds of perceptual and language understanding tasks. It is currently used for research and production by different teams in many commercial Google products, such as speech recognition, Gmail, Google Photos, and search, many of which had previously used its predecessor DistBelief. TensorFlow was originally developed by the Google Brain team for Google's research and production purposes and later released under the Apache 2.0 open source license on November 9, 2015.

* [TensorFlow Homepage](https://www.tensorflow.org/)
* [TensorFlow GitHub](https://github.com/tensorflow/tensorflow)
* [TensorFlow Google Groups Support](https://groups.google.com/forum/#!forum/tensorflow)
* [TensorFlow Google Groups Developer Discussion](https://groups.google.com/a/tensorflow.org/forum/#!forum/discuss)
* [TensorFlow FAQ](https://www.tensorflow.org/resources/faq)

## Why TensorFlow

* Supported by Google
* Works well on Windows, Linux, and Mac
* Excellent GPU support
* Python is an easy to learn programming language
* Python is extremely popular in the data science community

## Deep Learning Tools
TensorFlow is not the only game in town. The biggest competitor to TensorFlow/Keras is PyTorch. Listed below are some of the deep learning toolkits actively being supported:

* **[TensorFlow](https://www.tensorflow.org/)** - Google's deep learning API.  The focus of this class, along with Keras.
* **[Keras](https://keras.io/)** - Acts as a higher-level API on top of TensorFlow.
* **[PyTorch](https://pytorch.org/)** - PyTorch is an open-source machine learning library based on the Torch library, used for computer vision and natural language processing applications. Facebook's AI Research lab primarily develops PyTorch. 

Other deep learning tools:

* **[Deeplearning4J](http://deeplearning4j.org/)** - Java-based. Supports all major platforms. GPU support in Java!
* **[H2O](http://www.h2o.ai/)** - Java-based.  

In my opinion, the two primary Python libraries for deep learning are PyTorch and Keras. Generally, PyTorch requires more lines of code to perform the deep learning applications presented in this course. This trait of PyTorch gives Keras an easier learning curve than PyTorch. However, if you are creating entirely new neural network structures in a research setting, PyTorch can make for easier access to some of the low-level internals of deep learning.

## Using TensorFlow Directly

Most of the time in the course, we will communicate with TensorFlow using Keras [[Cite:franccois2017deep]](https://www.manning.com/books/deep-learning-with-python), which allows you to specify the number of hidden layers and create the neural network. TensorFlow is a low-level mathematics API, similar to [Numpy](http://www.numpy.org/). However, unlike Numpy, TensorFlow is built for deep learning. TensorFlow can compile the computations you describe into compute graphs that run as highly efficient C++/[CUDA](https://en.wikipedia.org/wiki/CUDA) code.

------------------------------------

### **CUDA CODE**

**CUDA (Compute Unified Device Architecture)** is a parallel computing platform and programming model developed by NVIDIA. It enables developers to utilize the power of NVIDIA GPUs (Graphics Processing Units) for general-purpose computation.

CUDA code refers to the code written in CUDA programming language, which is an extension of C/C++. With CUDA, developers can write programs that run on GPUs, enabling massive parallel processing and acceleration of computations.

Deep neural networks (DNNs) are computationally intensive and often involve performing millions or billions of mathematical operations. Traditional CPUs (Central Processing Units) may not provide sufficient computational power for training and inference of these complex models.

This is where CUDA plays a crucial role. By utilizing CUDA code, developers can offload computationally intensive operations in deep neural networks to NVIDIA GPUs. GPUs excel at parallel processing and have thousands of cores that can perform calculations simultaneously. This parallelism greatly speeds up the computations required by DNNs, making training and inference significantly faster.

CUDA code enables the efficient utilization of GPU resources, such as parallel execution of threads, shared memory, and optimized memory access patterns. It provides libraries, functions, and APIs specifically designed for deep learning frameworks like TensorFlow, PyTorch, and Keras.

The importance of CUDA code for building deep neural networks lies in its ability to accelerate the training process. It reduces the time required for model training, allowing researchers and developers to experiment with larger datasets, more complex models, and deeper architectures. The accelerated training with CUDA ultimately leads to faster model convergence and the ability to iterate on network designs more quickly, accelerating the development of advanced neural networks.

-------------------------------


## TensorFlow Linear Algebra Examples

**TensorFlow** is a library for linear algebra. **Keras** is a higher-level abstraction for neural networks that you build upon TensorFlow. In this section, I will demonstrate some basic linear algebra that directly employs TensorFlow and does not use Keras. First, we will see how to multiply a row and column matrix using a TensorFlow constant.

A TensorFlow constant is a type of tensor, which is a multi-dimensional array, that holds a fixed value. In TensorFlow, a constant is an operation that produces a tensor with a specified shape and value, and its value does not change during the execution of a computational graph.

Constants are commonly used to store and represent fixed values or parameters in a TensorFlow program. For example, a constant may be used to define the learning rate of a neural network, the dimensions of input data, or the weights of a pre-trained model.

Defining a TensorFlow constant involves specifying its shape and value. Once created, the constant can be used as input for various operations, such as arithmetic operations, matrix operations, or as an input placeholder for a machine learning model.

It's important to note that **TensorFlow _constants_ are immutable**, meaning their values cannot be changed once they are assigned. This immutability allows TensorFlow to optimize and execute computations more efficiently, as it can assume the constant value remains the same throughout the graph execution. However, as will be shown below, TensorFlow variables are mutable -- that's why they are called **TensorFlow _variables_**.
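The difference between the two can be seen in a few lines. This is a minimal sketch, not part of the graded examples; the names `c`, `c_plus_one`, and `v` are made up for illustration.

```python
import tensorflow as tf

# A constant: arithmetic on it returns a NEW tensor and leaves the original alone
c = tf.constant([1.0, 2.0])
c_plus_one = c + 1.0
print(c.numpy())           # still holds the original values
print(c_plus_one.numpy())

# A variable: its stored values can be overwritten in place
v = tf.Variable([1.0, 2.0])
v.assign([4.0, 6.0])       # replace the stored values
v.assign_add([1.0, 1.0])   # add to the stored values in place
print(v.numpy())
```

The constant `c` never changes; only the variable `v` ends up holding different values than it started with.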

## Using a `constant op` to Create a `constant tensor`

In TensorFlow, a **_constant op_** refers to an _operation_ that creates and produces a **_constant tensor_** as its output. It is a specific type of operation used to generate tensors with a fixed value.

A constant op in TensorFlow has the following key properties:

* **Creation of a Tensor:** The primary purpose of a constant op is to create a tensor and assign a fixed value to it. The value can be specified explicitly, such as a scalar value, an array, or a multidimensional array. Once created, the constant tensor holds the same value throughout the execution of a TensorFlow graph.
* **Fixed Value:** As the name suggests, the value of a constant op remains constant; it does not change during the execution of the graph. This allows TensorFlow to optimize the computation and enhance performance, as it can assume the constant value remains unchanged.
* **Immutable:** Similar to other TensorFlow constants, a constant op is immutable. Once created, the constant tensor's value cannot be modified or updated. This property ensures that the value maintains its integrity during the execution, contributing to consistency and reproducibility.

Constant ops serve as _fundamental building blocks_ in TensorFlow. They are utilized for a range of purposes, including defining and setting up fixed parameters, storing constant values or literals, specifying dimensions, or providing inputs to various computations.


### Example 1: Create a Tensor using `Matmul`

The code in the cell below uses `tf.constant()` to create two TensorFlow constants, `matrix1` and `matrix2`. The code then uses the TensorFlow function `tf.matmul()` to multiply these constants together. The output of the `matmul` op is `product1`, which is itself a tensor. Finally, the code prints the contents of `product1` both directly and after converting it with `float()`.

In [None]:
# Example 1: Create a Tensor using Matmul

# Create a constant op that produces a 1x2 matrix.
#
# The value returned by tf.constant() is the constant tensor
# itself, which can be used right away.
matrix1 = tf.constant([[3.0, 3.0]])

# Create another constant that produces a 2x1 matrix.
matrix2 = tf.constant([[2.0],[2.0]])

# Create a matmul op that takes 'matrix1' and 'matrix2' as inputs.
# The returned value, 'product1', represents the result of the matrix
# multiplication.
product1 = tf.matmul(matrix1, matrix2)

print(product1)
print(float(product1))

If your code is working correctly you should see the following output:

~~~text
tf.Tensor([[12.]], shape=(1, 1), dtype=float32)
12.0
~~~
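To see where the `12` comes from, the same product can be checked with NumPy. This is only a sketch for comparison; the names `row`, `col`, and `product` are made up here. The shapes matter: a 1x2 matrix times a 2x1 matrix gives a 1x1 result.

```python
import numpy as np

row = np.array([[3.0, 3.0]])      # shape (1, 2)
col = np.array([[2.0], [2.0]])    # shape (2, 1)

# Matrix product: 3*2 + 3*2
product = row @ col
print(product.shape)              # (1, 1)
print(product.item())             # 12.0
```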

### **Exercise 1: Create a Constant Op using `Matmul`**

In the cell below, create two TensorFlow constants. Call the first constant `matrix3`, a 1x2 matrix with the values `[[6.0, 6.0]]`, and the second constant `matrix4`, a 2x1 matrix with the values `[[4.0], [4.0]]`. 

Use the `tf.matmul()` operation to generate the product of these two constants. Name the return value of the matmul op `product2`. Finally print out the contents of `product2` with and without the `float` modifier as shown in Example 1. 

In [None]:
# Insert your code for Exercise 1 here



If your code is correct you should see the following output:

~~~text
tf.Tensor([[48.]], shape=(1, 1), dtype=float32)
48.0
~~~

### Example 2: Subtract a constant from a variable

The code in Example 1 showed how to create two TensorFlow constants with the function, `tf.constant()` and then used the TensorFlow function, `tf.matmul()` to multiply these constants together.  

In Example 2, we use the TensorFlow operation `tf.Variable()` to create a TensorFlow variable called `x1`. We also create a TensorFlow constant called `a1`. We then use the TensorFlow operation `tf.subtract()` to subtract the constant from the variable to produce a tensor called `sub1`.

Finally, the contents of `sub1` are printed out.

In [None]:
# Example 2: Subtract a constant from a variable

# Create a tensorflow variable
x1 = tf.Variable([1.0, 2.0])

# Create a tensorflow constant
a1 = tf.constant([3.0, 3.0])

# Subtract 'a1' from 'x1' using a subtraction op.  
sub1 = tf.subtract(x1, a1)

# Print out the contents
print(sub1)
print(sub1.numpy())

If your code is working correctly you should see the following output:

~~~text
tf.Tensor([-2. -1.], shape=(2,), dtype=float32)
[-2. -1.]
~~~

### **Exercise 2: Subtract a constant from a variable**

In **Exercise 2**, create a TensorFlow variable called `x2` with the value `[2.0, 2.0]` and a TensorFlow constant called `a2` with the values `[1.0, 3.0]`. Perform a subtraction op to subtract `a2` from `x2` to generate a tensor called `sub2`. Print out the contents of `sub2`.

In [None]:
# Insert your code for Exercise 2 here



If your code is working correctly you should see the following output:

~~~text
tf.Tensor([ 1. -1.], shape=(2,), dtype=float32)
[ 1. -1.]
~~~

### Example 3: Change values in a TensorFlow variable

While tensorflow _constants_ are immutable (i.e. their value can't be changed once they are created), a TensorFlow _variable_ is just that -- **_variable_**! In other words, TensorFlow variables are _mutable_. 

Once a tensorflow variable has been created, its value can be changed using the `.assign()` method as shown in the cell below.


In [None]:
# Example 3: Change values in a TensorFlow variable

# Use the assign method
x1.assign([4.0, 6.0])

If your code is working correctly you should see the following output:

~~~text
<tf.Variable 'UnreadVariable' shape=(2,) dtype=float32, numpy=array([4., 6.], dtype=float32)>
~~~

### **Exercise 3: Change values in a TensorFlow variable**

For **Exercise 3**, change the values in the TensorFlow variable `x2` that you created in **Exercise 2** to `[4.0, 2.0]`.

In [None]:
# Insert your code for Exercise 3 here



If your code is working correctly you should see the following output:

~~~text
<tf.Variable 'UnreadVariable' shape=(2,) dtype=float32, numpy=array([4., 2.], dtype=float32)>
~~~

In the next section, we will see a TensorFlow example that has nothing to do with neural networks.

## TensorFlow Mandelbrot Set Example

Next, we examine another example where we use TensorFlow directly. To demonstrate that TensorFlow is a general mathematical library and not only a neural network tool, we will use it for a rendering task that involves no machine learning. The code presented here can render a [Mandelbrot set](https://en.wikipedia.org/wiki/Mandelbrot_set).

The code below **_only_** generates a Python function. As you should recall, a function, by itself, doesn't do anything. So don't expect to see any output when you run the next cell.

In [None]:
# Create functions for generating the Mandelbrot set using TensorFlow

import PIL.Image
from io import BytesIO
from IPython.display import Image, display

def render(a):
  a_cyclic = (a*0.3).reshape(list(a.shape)+[1])
  img = np.concatenate([10+20*np.cos(a_cyclic),
                        30+50*np.sin(a_cyclic),
                        155-80*np.cos(a_cyclic)], 2)
  img[a==a.max()] = 0
  a = img
  a = np.uint8(np.clip(a, 0, 255))
  f = BytesIO()
  return PIL.Image.fromarray(a)

#@tf.function
def mandelbrot_helper(grid_c, current_values, counts,cycles):
  
  for i in range(cycles):
    temp = current_values*current_values + grid_c
    not_diverged = tf.abs(temp) < 4
    current_values.assign(temp)
    counts.assign_add(tf.cast(not_diverged, tf.float32))

def mandelbrot(render_size,center,zoom,cycles):
  f = zoom/render_size[0]
  real_start = center[0]-(render_size[0]/2)*f
  real_end = real_start + render_size[0]*f 
  imag_start = center[1]-(render_size[1]/2)*f
  imag_end = imag_start + render_size[1]*f 

  real_range = tf.range(real_start,real_end,f,dtype=tf.float64)
  imag_range = tf.range(imag_start,imag_end,f,dtype=tf.float64)
  real, imag = tf.meshgrid(real_range,imag_range)
  grid_c = tf.constant(tf.complex(real, imag))
  current_values = tf.Variable(grid_c)
  counts = tf.Variable(tf.zeros_like(grid_c, tf.float32))

  mandelbrot_helper(grid_c, current_values,counts,cycles)
  return counts.numpy()

If your code was correct, you shouldn't see any output. 

With the above code defined, we can now calculate and render a Mandelbrot plot.

**Fun Fact:** The fractal corresponding to the Mandelbrot set has a finite area estimated at 1.506484 square units. Mathematicians haven’t pinpointed the exact number yet and don’t know whether it’s rational or not. On the other hand, the perimeter of the Mandelbrot set is infinite. Check out the [coastline paradox](https://en.wikipedia.org/wiki/Coastline_paradox) to learn about an interesting parallel of this weird fact in real life.
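To make the loop inside `mandelbrot_helper()` more concrete, here is a sketch of the same update rule applied to a single point in plain Python. The function name `escape_count` is made up for illustration. Unlike the TensorFlow version, which keeps updating every grid point for all cycles, this sketch stops as soon as a point passes the threshold.

```python
def escape_count(c, cycles=200, limit=4.0):
    """Count how many updates z -> z*z + c stay below the limit."""
    z = c                      # start at c, as current_values starts at grid_c
    count = 0
    for _ in range(cycles):
        z = z * z + c          # same update as in mandelbrot_helper()
        if abs(z) >= limit:    # the point has diverged
            break
        count += 1
    return count

print(escape_count(0j))        # never diverges, so the count reaches `cycles`
print(escape_count(1 + 0j))    # diverges after a couple of updates
print(escape_count(2 + 0j))    # diverges on the first update
```

Points that reach the full cycle count are the ones `render()` paints black; every other point gets a color based on how quickly it diverged.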

### Example 4: Generate a Mandelbrot image

The code in the cell below uses the function `mandelbrot()` created above to generate a Mandelbrot image with a render size of only 640 X 480 pixels. The values used for the render size can be changed. Larger values generate larger images, but take longer to compute. Since this is a comparatively small image, it should render in a relatively short period of time. The cell uses code to record the length of time your computer/laptop needed to render the image.      


In [None]:
# Example 4: Generate Mandelbrot image 640x480

# Record the start time in st
st = time.time()

# Set render size here
counts = mandelbrot(
    render_size=(640,480),
    center=(-0.5,0),
    zoom=4,   
    cycles=200
)  

# Generate the img using render() function
img = render(counts)

# Print out the image size
print(img.size)

# Record the end time in et
et = time.time()

# Print out time
seconds = int((et-st))
seconds = seconds % (24 * 3600)
hour = seconds // 3600
seconds %= 3600
minutes = seconds // 60
seconds %= 60
print("Elapsed time = %d:%02d:%02d" % (hour, minutes, seconds))

# Print the image to the notebook
img

# Uncomment the next line if you want to save your image to your laptop
#img.save("test.png")


Here is the output generated by a relatively powerful Windows workstation:

~~~text
(640, 480)
Elapsed time = 0:00:02
~~~

### **Exercise 4: Generate a Mandelbrot image**

In the cell below, write the code to generate a Mandelbrot image using Example 4 as a template. You should increase the render size. See if your computer can render an "HD" image (1920 X 1080 pixels). If that "works" (i.e. the Kernel doesn't stop) you could try a "4K" image (3840 X 2160 pixels). The Kernel died when I tried to generate a 4K Mandelbrot image, but maybe your computer is set up more efficiently. 

**WARNING: MAKE SURE TO _SAVE_ YOUR NOTEBOOK BEFORE YOU RUN EXERCISE 4!**

Before you run **Exercise 4** save your notebook just in case Jupyter Lab "chokes" trying to generate a large image. With a large image, it will take some time so be patient. 

In [None]:
# Insert your code for Exercise 4 here



Mandelbrot rendering programs are both simple and infinitely complex at the same time. This view shows the entire Mandelbrot universe simultaneously, as a view completely zoomed out. However, if you zoom in on any non-black portion of the plot, you will find infinite hidden complexity. 

# Introduction to Keras

[Keras](https://keras.io/) is a layer on top of TensorFlow that makes it much easier to create neural networks. Rather than define the graphs, as you see above, you set the individual layers of the network with a much more high-level API (Application Programming Interface). Unless you are researching entirely new structures of deep neural networks, it is unlikely that you need to program TensorFlow directly.  

**For this class, we will usually use TensorFlow _through_ Keras, rather than direct TensorFlow**


## Simple TensorFlow Regression

Example 5 below shows how to encode the Apple Quality dataset for regression and predict values. We will see if we can predict the ripeness of an apple based on the apple's size, weight, sweetness, crunchiness, and other features. Example 5 is divided into a number of steps to make the code easier to follow. 

**WARNING:** In the first step, the Apple Quality dataset is read from the course server. If you are on campus and connected to the UTSA network, you should be able to download the datafile to your laptop without any difficulty. However, to reach this file if you are off-campus, you will need to connect to the University using the VPN (Virtual Private Network) link that uses the [GlobalProtect software](vpn.uts.edu) which is free for current UTSA students.  

### Example 5 - Step 1: Create a DataFrame for the Apple Quality dataset

The code below downloads the Apple Quality dataset and creates a DataFrame called `df_0`. There are no missing values in this dataset, so we don't have to fill in any missing values. 

All of the data is numerical with the exception of the column `Quality` which can be either `bad` or `good`. The code below uses the Pandas `map()` method to convert the string `bad` to `0` and the string `good` to the value `1`. 
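Here is a small sketch of how `map()` behaves with a dictionary, using a made-up Series rather than the real dataset. The names `labels`, `codes`, and `unmatched` are hypothetical.

```python
import pandas as pd

mapping = {'bad': 0, 'good': 1}

# A made-up stand-in for the Quality column
labels = pd.Series(['bad', 'good', 'good', 'bad'])
codes = labels.map(mapping)
print(codes.tolist())          # [0, 1, 1, 0]

# Any value that is missing from the dictionary becomes NaN
unmatched = pd.Series(['good', 'unknown']).map(mapping)
print(unmatched.isna().tolist())   # [False, True]
```

Because unmatched values silently turn into `NaN`, it is worth checking the mapped column for missing values if the labels might contain typos.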


In [None]:
# Example 5 - Step 1: Read data and create DataFrame

df_0 = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/apple_quality.csv", 
    na_values=['NA', '?'])

# Define the mapping dictionary
mapping = {'bad': 0, 'good': 1}

# Map the strings in the Quality column to integers
df_0['Quality'] = df_0['Quality'].map(mapping)

# Make a copy of the DataFrame for Exercise 6
df_1 = df_0.copy()

# Create variable for later
appleNum = df_0['A_id']

# Set the max rows and max columns
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

# Display the DataFrame
display(df_0)

If your code is correct you should see the following output:

![__](http://biologicslab.co/BIO1173/images/class_3_2_Pic1.png)


### Example 5 - Step 2: Assign Independent and Dependent Variables

In [Regression Analysis](https://en.wikipedia.org/wiki/Regression_analysis), the goal is to estimate the relationships between a [dependent variable](https://en.wikipedia.org/wiki/Dependent_and_independent_variables) (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more [independent variables](https://en.wikipedia.org/wiki/Dependent_and_independent_variables)  (often called 'predictors', 'covariates', 'explanatory variables' or 'features').  

With multiple independent variables, the equation for a linear regression is 

> $ y_{i} = \alpha + \beta_{1} x_{i,1} + \beta_{2} x_{i,2} + \cdots + \beta_{n} x_{i,n} $

where $y_{i}$ is the **_dependent_** variable and the $x_{i,j}$ are the **_independent_** variables. In machine learning terms, the $y$-value is the **_response_** variable that we are trying to predict, while the $x$-values are the **_features_** that we use to predict the value of $y$. While we already know the values of $x$ and $y$, what we don't know are the values of the coefficients: the intercept $\alpha$ and one $\beta_{j}$ for each independent variable. 

In this example we will build a neural network that can "learn" the relationships that exist between different values of $x$ and the different values of $y$. In short, our neural network **_solves_** the multiple regression equation through trial-and-error. 
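To make "finding the coefficients" concrete, here is a sketch with made-up data where the relationship is exactly linear. In that special case the coefficients can be recovered directly with least squares, with no neural network needed. The data, the coefficient values, and all variable names below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 50 samples with 2 features each
X = rng.normal(size=(50, 2))
alpha = 1.0                      # intercept (chosen arbitrarily)
beta = np.array([2.0, -3.0])     # one coefficient per feature (chosen arbitrarily)
y = alpha + X @ beta             # the linear regression equation

# Recover the coefficients: add a column of ones for the intercept
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 3))         # approximately alpha, then the two betas
```

A neural network becomes useful when the relationship between the features and the response is not this simple, which is usually the case with real data.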

In Step 2 below, we define the $x$-values to be the values contained in the `df_0` DataFrame columns, `Size`, `Weight`, `Sweetness`, `Crunchiness`, `Juiciness`, `Acidity` and `Quality`. The code that does this is:

~~~text
# Assign the independent variables (x)
x_0 = df_0[['Size', 'Weight', 'Sweetness', 'Crunchiness',
       'Juiciness', 'Acidity', 'Quality']].values
~~~

The code also assigns the numbers in the column `Ripeness` to the response variable $y$ using this line of code:

~~~text
# Assign the dependent variables (y)
y_0 = df_0['Ripeness'].values
~~~

In both lines of code, we are using the Pandas attribute `.values` to convert the numerical values in the `df_0` DataFrame into Numpy arrays. The last line of code prints out the Numpy array created for the $y$-values.

In [None]:
# Example 5 - Step 2: Assign dependent and independent variables

# Assign the independent variables (x)
x_0 = df_0[['Size', 'Weight', 'Sweetness', 'Crunchiness',
       'Juiciness', 'Acidity', 'Quality']].values

# Assign the dependent variables (y)
y_0 = df_0['Ripeness'].values

# Print y_0 values
print(y_0)

If you compare the Numpy array printed above for the $y$-values, you will see that they are exactly the same as the `Ripeness` values that were printed out in Example 5 - Step 1 above.

### Example 5 - Step 3: Use Keras to Build a Neural Network

The next step is to build our neural network. Using Keras this step is relatively easy -- once you understand what the various commands mean.

We begin by telling Keras what kind of model we want. There are actually several different types of neural networks including:

* **Feedforward Neural Network (FNN):** The simplest type, where information flows from the input layer directly through hidden layers to the output layer without loops. It’s often used for classification or segmentation tasks.
* **Autoencoder:**  An unsupervised learning model that aims to reconstruct its own inputs. It’s used for dimensionality reduction and learning generative models.
* **Probabilistic Neural Network (PNN):**  A four-layer feedforward network that approximates class probability distributions using Parzen windows and non-parametric functions.
* **Recurrent Neural Network (RNN):**  Contains loops to allow information to persist over time. Useful for sequence data and time-series analysis.
* **Long Short-Term Memory (LSTM):**  A specialized RNN architecture that addresses the vanishing gradient problem and can learn long-term dependencies.
* **Convolutional Neural Network (CNN):**  Designed for image and spatial data, using convolutional layers to extract features.
* **Radial Basis Function Neural Network (RBFNN):**  Utilizes radial basis functions for activation, often used in function approximation.
* **Modular Neural Network:**  Composed of interconnected modules, each handling specific subtasks.
* **Generative Adversarial Network (GAN):**  Consists of a generator and a discriminator, used for generating new data samples.

For our regression model we only need a simple Feedforward Neural Network (FNN). In Keras this type of network is called `Sequential()`.


~~~text
# Specify the model type as sequential
model_0 = Sequential()
~~~

In our model we are going to have 4 layers:

* one input layer
* two hidden layers
* one output layer 

The next step in building our neural network is to tell Keras to "add" a layer containing `25` neurons using the Keras `model.add()` method. This layer is actually the first _hidden_ layer. However, it also tells Keras to add an input layer because it contains the **input_dim** parameter. This parameter tells Keras the number of inputs the dataset has. In this example there are 7 different inputs: 'Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Acidity', and 'Quality'. The code:

> `input_dim=x_0.shape[1]`

tells Keras that the network will need 7 input neurons: one input neuron for every column in the data set.

Here is the code that adds the first hidden layer:

~~~text
# Add the first hidden layer with 25 neurons
model_0.add(Dense(25, input_dim=x_0.shape[1], activation='relu')) 
~~~

For this first hidden layer we specify the `relu` type of activation with the code fragment `activation='relu'`. The activation type specifies how the combined **input** into a particular neuron is converted into an **output** that is sent on to all the neurons in the next layer.

Next, we tell Keras that we want a second hidden layer with 10 neurons. In general, the more hidden layers a neural network has, the "deeper the learning" the neural network is capable of obtaining. For example, the neural network behind [ChatGPT](https://datascience.stackexchange.com/questions/118273/specifics-about-chatgpts-architecture#:~:text=Number%20of%20layers%3A%2096%20Number%20of%20attention,heads%3A%2096%20Dimensions%20of%20its%20hidden%20layers%3A%2012288) reportedly has **96** layers, each with a hidden dimension of 12288.  
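The `relu` (rectified linear unit) activation is simple enough to write by hand. This NumPy sketch is for illustration only; Keras uses its own implementation, and the input values are made up.

```python
import numpy as np

def relu(z):
    """Return z where it is positive and 0 everywhere else."""
    return np.maximum(0.0, z)

# Made-up inputs arriving at five neurons
z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))
```

Negative inputs are replaced by zero, while positive inputs pass through unchanged.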

The code for adding the second hidden layer with 10 neurons is:

~~~text
# Add the second hidden layer with 10 neurons
model_0.add(Dense(10, activation='relu'))
~~~

As with the first hidden layer, we will use the `relu` type of activation. The layer type `Dense` tells Keras that **_every_** neuron in the previous layer should be connected to **_every_** neuron in this layer. Here, every neuron in the first hidden layer is connected to every neuron in the second hidden layer, and the same will be true between the second hidden layer and the output layer.

The output layer is where we will "find" our answer. In this case, the single neuron in the output layer will learn to "predict" the value of the dependent variable ($y$) given different values of the independent variables (the $x$'s). 

In other words, the numerical value in the output neuron, at the end of training, will predict the `Ripeness` of a particular apple given its 'Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Acidity', and 'Quality'. 

Notice that we don't specify an activation type for the output layer. When no activation is given, a `Dense` layer outputs the weighted sum of its inputs unchanged (a linear output), which is what we want when predicting a continuous value such as `Ripeness`.

~~~text
# Add the output layer with 1 neuron
model_0.add(Dense(1))
~~~
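Taken together, the three `Dense` layers amount to three matrix multiplications with an activation in between. The NumPy sketch below traces the shapes through a network of this size. The weights are random stand-ins for the values Keras would learn, and the 4 input rows are made-up apples, so the predictions themselves are meaningless.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Random stand-ins for the weights and biases that training would set
W1, b1 = rng.normal(size=(7, 25)), np.zeros(25)    # input -> hidden layer 1
W2, b2 = rng.normal(size=(25, 10)), np.zeros(10)   # hidden 1 -> hidden layer 2
W3, b3 = rng.normal(size=(10, 1)), np.zeros(1)     # hidden 2 -> output

x = rng.normal(size=(4, 7))    # 4 made-up apples with 7 features each

h1 = relu(x @ W1 + b1)         # first hidden layer
h2 = relu(h1 @ W2 + b2)        # second hidden layer
y_hat = h2 @ W3 + b3           # output layer, no activation

print(h1.shape, h2.shape, y_hat.shape)   # (4, 25) (4, 10) (4, 1)
```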

Once we have specified all of the different layers that we want in our model, the next step is to **_compile_** the model. The compile step sets up the framework for your model. It involves:

* Checking for format errors.
* Defining the loss function, which quantifies how well the model’s predictions match the actual target values.
* Choosing an optimizer (such as stochastic gradient descent) and, optionally, setting its learning rate.
* Selecting metrics to evaluate the model’s performance during training.

In our model, we will select the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) as the loss function, and 'adam' as the optimizer. The Adam optimizer is a popular algorithm used in deep learning that helps adjust the parameters of a neural network in real-time to improve its accuracy and speed. Adam stands for _Adaptive Moment Estimation_, which means that it adapts the learning rate of each parameter based on its historical gradients and momentum. 
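The mean squared error is easy to compute by hand. This sketch uses three made-up target values and predictions to show the arithmetic.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])   # made-up target values
y_pred = np.array([1.5, 2.0, 2.0])   # made-up predictions

# Errors are -0.5, 0.0 and 1.0; their squares are 0.25, 0.0 and 1.0
mse = np.mean((y_true - y_pred) ** 2)
print(mse)                            # 1.25 / 3, roughly 0.417
```

Because the errors are squared, one large miss raises the loss much more than several small ones.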

~~~text
# Compile the model with MSE loss function and Adam optimizer
model_0.compile(loss='mean_squared_error', optimizer='adam')
~~~
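For readers curious about what `optimizer='adam'` does internally, here is a sketch of the Adam update rule applied to a single parameter. It follows the published algorithm but is not the Keras implementation. The function name `adam_minimize`, the toy loss $(w-3)^2$, and the hyperparameter values (in particular the large learning rate of 0.1) are assumptions chosen to keep the demo short.

```python
import math

def adam_minimize(grad, w, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-parameter function given its gradient."""
    m = 0.0   # running average of gradients (momentum)
    v = 0.0   # running average of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_final = adam_minimize(lambda w: 2 * (w - 3.0), w=0.0)
print(w_final)   # should end up close to 3
```

The division by `sqrt(v_hat)` is what makes the step size adapt to each parameter's gradient history.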

The last line of code uses the Keras `model.summary()` function to print out a summary of our model.

In [None]:
# Example 5 - Step 3: Build the neural network 

# Specify the model type as sequential
model_0 = Sequential()

# Add the first hidden layer with 25 neurons
model_0.add(Dense(25, input_dim=x_0.shape[1], activation='relu')) 

# Add the second hidden layer with 10 neurons
model_0.add(Dense(10, activation='relu'))

# Add the output layer with a single neuron
model_0.add(Dense(1))

# Compile the model with MSE loss function and Adam optimizer
model_0.compile(loss='mean_squared_error', optimizer='adam')

# Print a summary of the model
model_0.summary()

If your code is correct you should see the following output:

~~~text
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 25)                200       
                                                                 
 dense_1 (Dense)             (None, 10)                260       
                                                                 
 dense_2 (Dense)             (None, 1)                 11        
                                                                 
=================================================================
Total params: 471
Trainable params: 471
Non-trainable params: 0
_________________________________________________________________
~~~

As you can see, our "compiled" model has 3 layers that are densely connected, meaning that each neuron in the first hidden layer is connected to every neuron in the second hidden layer, and every neuron in the second hidden layer is connected to the neuron in the output layer. (The input layer is not included in the summary.)

The "Trainable parameters" are the **_biases_** and the **_weights_** of all of the connections between the neurons. During "fitting" (training), the model will repeatedly adjust these weights and biases during each _epoch_ in an effort to improve the model's ability to predict the dependent variable.
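The parameter counts in the summary can be checked by hand. Each neuron in a `Dense` layer has one weight per input plus one bias. The short sketch below reproduces the numbers in the summary, assuming 7 input features; the helper name `dense_param_count` is made up for this illustration.

```python
def dense_param_count(n_inputs, n_neurons):
    """Weights (one per input per neuron) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

# Layer sizes taken from the summary above: 7 inputs -> 25 -> 10 -> 1
layer_sizes = [7, 25, 10, 1]

counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)

print(counts)  # [200, 260, 11]
print(total)   # 471
```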

### Example 5 - Step 4: "Fit" the Model

In machine learning, the term "fit the model" means to **_train_** the model using the data. During training, the model learns from the data to adjust its parameters (weights and biases) to minimize the loss function.

The **fit** step involves:

* Forward passes (feeding input data through the network).
* Backward passes (calculating derivatives using backpropagation).
* Updating weights based on gradients to improve predictions.
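The three steps above can be sketched in a few lines of NumPy. This is a deliberately tiny stand-in for what Keras does: a single linear neuron trained with plain gradient descent on made-up data, not the Adam optimizer or the apple dataset. All names and numbers here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Made-up data: 200 samples, 3 features, generated from a known linear rule
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3

w = np.zeros(3)   # weights start at zero
b = 0.0           # bias starts at zero
lr = 0.1          # learning rate (an illustrative choice)
losses = []

for epoch in range(50):
    pred = X @ w + b                      # forward pass
    error = pred - y
    losses.append(np.mean(error ** 2))    # mean squared error
    grad_w = 2 * X.T @ error / len(y)     # backward pass: dLoss/dw
    grad_b = 2 * np.mean(error)           # backward pass: dLoss/db
    w -= lr * grad_w                      # update weights
    b -= lr * grad_b                      # update bias

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

The loss shrinks from one pass to the next, which is the same pattern you will see in the Keras training output.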

The fit step is by far the most computationally demanding step. This is where GPUs and TPUs are used to speed up the training. With relatively small neural networks like this one, a relatively modern central processing unit (CPU) can handle all the computations involved in a reasonable time period.

The command:

~~~text
# Fit the model to the data
model_0.fit(x_0,y_0,verbose=2,epochs=100)
~~~

has the following 4 arguments: 

* the $x$-values
* the $y$-values
* the level of verbosity (how much feedback should be printed out during training)
* the number of `epochs`

In Step 4, the number of epochs is set to `100`. An epoch means training the neural network with all the training data for one cycle. In an epoch, all of the data is used exactly once. Keras feeds the data through the network in small batches; one forward pass plus one backward pass on a batch counts as one step.

With the verbosity set to `2`, Keras will print out the loss value, the number of milliseconds the epoch required, and the time per step. 
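The "steps" in that printout are batches of data. When no `batch_size` is given, Keras `fit()` uses batches of 32 samples, so the number of steps per epoch follows from the size of the dataset. A small sketch of the arithmetic (the helper name `steps_per_epoch` is made up for this illustration):

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

apple_steps = steps_per_epoch(4000)   # the Apple Quality dataset has 4000 apples
small_steps = steps_per_epoch(150)    # a smaller, 150-sample dataset

print(apple_steps)  # 125
print(small_steps)  # 5
```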

In [None]:
# Example 5 - Step 4 Fit the model 

# Fit the model to the data
model_0.fit(x_0,y_0,verbose=2,epochs=100)

Notice that the **loss value** decreases from `3.3706` after the 1st epoch (`Epoch 1/100`) to less than half that amount, `1.5958`, after the 5th epoch (`Epoch 5/100`).

This decrease in loss is due to the neural network **_learning_**. During each epoch, the network makes slight adjustments to its _trainable parameters_ (i.e. biases and connection weights). In the next epoch, it runs the complete dataset through the model again to see if the updated parameters do a better job of predicting the `Ripeness` of each apple in the dataset.

~~~text
Epoch 1/100
125/125 - 1s - loss: 3.3706 - 784ms/epoch - 6ms/step
Epoch 2/100
125/125 - 0s - loss: 2.2415 - 441ms/epoch - 4ms/step
Epoch 3/100
125/125 - 0s - loss: 1.9131 - 442ms/epoch - 4ms/step
Epoch 4/100
125/125 - 0s - loss: 1.7376 - 410ms/epoch - 3ms/step
Epoch 5/100
125/125 - 0s - loss: 1.5958 - 410ms/epoch - 3ms/step
Epoch 6/100
~~~

### **Exercise 5: Simple Tensorflow Regression**

For **Exercise 5** you are to build the same 4-layer feedforward neural network (FNN) demonstrated in Example 5. Use the DataFrame copy, `df_1`, made in Example 5 - Step 1 as your dataset. Therefore, your code should completely skip Example 5 - Step 1 and start with Step 2, where you define the values for the independent and dependent variables.

For **Exercise 5** your goal is to build a neural network that can predict the `Quality` of a particular apple (instead of `Ripeness`). Therefore, `Quality` will be the _dependent_ variable, $y$.  

You want your network to predict an apple's quality (either "bad" or "good") based on its Size, Weight, Sweetness, Crunchiness, Juiciness, Acidity, and Ripeness. Therefore, values in these columns will be the _independent_ variables, $x$.

To keep things straight, call your independent variables `x_1`, your dependent variable `y_1` and your neural network model `model_1`.

Instead of breaking your code into several smaller code cells (as was done in Example 5), your code should be in a **_single_** code cell.  


In [None]:
# Insert your code for Exercise 5 here



If your code for **Exercise 5** is correct, the output from your **Exercise 5** should start with:

~~~text
[1 1 0 ... 0 1 1]
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_3 (Dense)             (None, 25)                200       
                                                                 
 dense_4 (Dense)             (None, 10)                260       
                                                                 
 dense_5 (Dense)             (None, 1)                 11        
                                                                 
=================================================================
Total params: 471
Trainable params: 471
Non-trainable params: 0
_________________________________________________________________
Epoch 1/100
125/125 - 1s - loss: 0.7681 - 730ms/epoch - 6ms/step
Epoch 2/100
125/125 - 0s - loss: 0.2139 - 449ms/epoch - 4ms/step
Epoch 3/100
125/125 - 0s - loss: 0.1676 - 436ms/epoch - 3ms/step
Epoch 4/100
125/125 - 1s - loss: 0.1475 - 526ms/epoch - 4ms/step
Epoch 5/100
125/125 - 0s - loss: 0.1354 - 399ms/epoch - 3ms/step
Epoch 6/100
125/125 - 0s - loss: 0.1263 - 399ms/epoch - 3ms/step
~~~

and end with something similar to the following:

~~~text
Epoch 95/100
125/125 - 0s - loss: 0.0543 - 410ms/epoch - 3ms/step
Epoch 96/100
125/125 - 0s - loss: 0.0539 - 421ms/epoch - 3ms/step
Epoch 97/100
125/125 - 0s - loss: 0.0530 - 409ms/epoch - 3ms/step
Epoch 98/100
125/125 - 0s - loss: 0.0534 - 439ms/epoch - 4ms/step
Epoch 99/100
125/125 - 0s - loss: 0.0528 - 499ms/epoch - 4ms/step
Epoch 100/100
125/125 - 1s - loss: 0.0531 - 550ms/epoch - 4ms/step

<keras.callbacks.History at 0x20090ca8f10>
~~~

## Introduction to Neural Network Hyperparameters

If you look at the above code, you will see that the neural network contains four layers. The first is the input layer. It does not get its own `add()` call; instead, the **input_dim** parameter of the first `Dense` layer sets it to the number of inputs the dataset has. The network needs one input neuron for every column in the data set (including dummy variables).

There are also two hidden layers, with 25 and 10 neurons respectively. You might be wondering how the programmer chose these numbers. Selecting a hidden neuron structure is one of the most common questions about neural networks. Unfortunately, there is no right answer. These are hyperparameters: settings that can affect neural network performance, yet have no clearly defined means of setting them.

In general, more hidden neurons mean more capability to fit complex problems. However, too many neurons can lead to overfitting and lengthy training times. Too few can lead to underfitting the problem and will sacrifice accuracy. The number of layers is another hyperparameter. More layers allow the neural network to perform more of its own feature engineering and data preprocessing, but this also comes at the expense of training times and the risk of overfitting. Typically, neuron counts start larger near the input layer and shrink towards the output layer in a triangular fashion.

Some techniques use machine learning to optimize these values. These will be discussed later in this course.

## Controlling the Amount of Output

The program produces one line of output for each training epoch. You can reduce or eliminate this output with the `verbose` argument of the `fit` command:

* **verbose=0** - No progress output (use with Jupyter if you do not want output).
* **verbose=1** - Display progress bar, does not work well with Jupyter.
* **verbose=2** - Summary progress output (use with Jupyter if you want to know the loss at each epoch).

## Regression Prediction

Next, we will perform actual predictions. The program assigns these predictions to the **pred** variable. For Example 5, these will be predictions of apple **Ripeness** from the neural network. For Exercise 5, these will be predictions of apple **Quality** from the neural network.

### Example 6: Use model to make predictions

The code in the cell below uses Keras' `model.predict()` function to predict the `Ripeness` of each of the 4000 apples in the Apple Quality dataset based on its 'Size', 'Weight', 'Sweetness', 'Crunchiness','Juiciness', 'Acidity', and 'Quality'. The predictions are stored in a variable called `pred_0`. 

Keep in mind that these `Ripeness` **_predictions_** are being made by the neural network model, `model_0` after it was trained ('fitted') to the dataset. 

In [None]:
# Example 6: Predict the Ripeness of each apple in the dataset

# Use model_0 to make Ripeness predictions
pred_0 = model_0.predict(x_0)

# Print out the shape of pred_0
print(f"Shape of pred_0: {pred_0.shape}")

# Print out the Ripeness predictions
print(pred_0[0:10])

If your code is correct you should see something **_similar_** to the following output:

~~~text
125/125 [==============================] - 0s 1ms/step
Shape of pred_0: (4000, 1)
[[ 0.41945717]
 [ 0.7749809 ]
 [ 0.05281208]
 [-3.345723  ]
 [-1.6600199 ]
 [ 1.6624726 ]
 [-2.3613348 ]
 [ 1.0669048 ]
 [ 4.800175  ]
 [ 2.9758077 ]]
~~~

Notice that this is a 2D array. You can always see the dimensions of what Keras returns by printing out its `shape`, as was done above with `pred_0.shape`. Neural networks can return multiple values, so the result is always an array. Here the neural network only returns one value per prediction (there are 4000 apples, so 4000 predictions). However, a 2D array is needed because the neural network has the potential of returning more than one value.
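If you need the predictions as a plain 1D array, NumPy can flatten the result. The sketch below uses the first three predictions from the sample output above rather than a live Keras model, so the variable names `pred` and `flat` are only placeholders.

```python
import numpy as np

# First three predictions copied from the sample output above
pred = np.array([[0.41945717],
                 [0.7749809],
                 [0.05281208]])

print(pred.shape)          # (3, 1): 3 predictions, 1 value each
flat = pred.ravel()        # collapse to a 1D array
print(flat.shape)          # (3,)
print(flat[1])             # 0.7749809
```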

### **Exercise 6: Use model to make predictions**

In the cell below, use Keras' `model.predict()` function to predict the `Quality` of each of the 4000 apples in the Apple Quality dataset based on its 'Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Acidity', and 'Ripeness'. Store these predictions in a variable called `pred_1`.

Remember that the string values in the `Quality` column were previously mapped to integers in Example 5 - Step 1. So a `Quality` prediction near `0` would indicate `bad` while a prediction near `1` would indicate `good`.

In [None]:
# Insert your code for Exercise 6 here



If your code is correct you should see something **_similar_** to the following output:

~~~text
125/125 [==============================] - 0s 2ms/step
Shape of pred_1: (4000, 1)
[[ 0.89541984]
 [ 1.0507172 ]
 [ 0.00844564]
 [ 1.0844108 ]
 [ 0.69040906]
 [-0.00781254]
 [ 0.9231552 ]
 [ 0.9487672 ]
 [ 0.00710112]
 [-0.16769826]]
~~~
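Because the network was trained as a regression model, its `Quality` predictions are continuous numbers rather than clean `0`s and `1`s. One simple way to turn them into labels is to apply a cutoff. The sketch below uses the sample values shown above and an assumed cutoff of `0.5`; the cutoff is an illustrative choice, not something defined by the dataset or by Keras.

```python
import numpy as np

# Predictions copied from the sample output above
pred = np.array([0.89541984, 1.0507172, 0.00844564, 1.0844108, 0.69040906,
                 -0.00781254, 0.9231552, 0.9487672, 0.00710112, -0.16769826])

# Assumed cutoff: treat 0.5 or higher as "good" (1), anything lower as "bad" (0)
threshold = 0.5
labels = (pred >= threshold).astype(int)

print(labels)  # [1 1 0 1 1 0 1 1 0 0]
```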

### Example 7A: Determine the accuracy of the model's predictions

An obvious question is how good are the neural network's predictions?  Since we know the correct `Ripeness` for each apple in the dataset, we can measure how close each neural network prediction was to the actual value.

A common measure in regression analysis is the [Root-mean-square error (RMSE)](https://en.wikipedia.org/wiki/Root-mean-square_deviation).

RMSE measures the differences between predicted values and true values. The code in the cell below computes the RMSE of the `Ripeness` predictions made by `model_0` against the actual `Ripeness` values in the Apple Quality dataset. The RMSE is stored in a variable called `score_0`.


In [None]:
# Example 7A: Determine the RMSE for model_0

# Compute the RMSE (square root of the mean squared error)
score_0 = np.sqrt(metrics.mean_squared_error(y_0, pred_0))

# Print out the RMSE
print(f"Final score (RMSE) for model_0: {score_0}")

So what does the RMSE value printed out above mean?  

RMSE is always non-negative.  A value of 0 (almost never achieved in practice) would indicate a perfect fit to the data. 

In general, a lower RMSE is better than a higher one. However, comparisons across different types of data would be invalid because the measure is dependent on the scale of the numbers used.
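To see where the number comes from, here is a small hand-rolled version of the calculation on made-up values. It also shows the scale dependence mentioned above. The function name `rmse` and the sample numbers are assumptions for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error: square the errors, average them, take the square root."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Made-up values for illustration
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 5.0]

score = rmse(actual, predicted)
print(score)                      # sqrt(0.375), about 0.612

# Same data in units 10 times larger: the RMSE is 10 times larger too
score_scaled = rmse(np.multiply(actual, 10), np.multiply(predicted, 10))
print(score_scaled)               # about 6.12
```

If you try this helper on Keras output, flatten the predictions first (for example with `.ravel()`), since subtracting a `(4000,)` array from a `(4000, 1)` array would broadcast into a `(4000, 4000)` matrix.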

### Example 7B: Compare predictions to actual values

The number printed above summarizes, in a single value, how far the predictions made by `model_0` typically fall above or below the expected output.

We can also print out the first ten apples with their `Ripeness` predictions and their actual `Ripeness` to visually compare these values.

In [None]:
# Example 7B: Print out predictions and actual values

# Use for loop for printing values
for i in range(10):
    print(f"{i+1}. Apple number: {appleNum[i]}, Ripeness: {y_0[i]}, " 
          + f"predicted Ripeness: {pred_0[i]}")

If your code is correct you should see an output that is similar to the following:

~~~text
1. Apple number: 0, Ripeness: 0.329839797, predicted Ripeness: [-0.09674692]
2. Apple number: 1, Ripeness: 0.867530082, predicted Ripeness: [0.6076343]
3. Apple number: 2, Ripeness: -0.038033328, predicted Ripeness: [0.05641791]
4. Apple number: 3, Ripeness: -3.413761338, predicted Ripeness: [-3.7153735]
5. Apple number: 4, Ripeness: -1.303849429, predicted Ripeness: [-1.446837]
6. Apple number: 5, Ripeness: 1.914615916, predicted Ripeness: [1.9780008]
7. Apple number: 6, Ripeness: -1.847416733, predicted Ripeness: [-1.7325082]
8. Apple number: 7, Ripeness: 0.974437858, predicted Ripeness: [1.2185928]
9. Apple number: 8, Ripeness: 4.080920787, predicted Ripeness: [3.857073]
10. Apple number: 9, Ripeness: 1.620856772, predicted Ripeness: [2.3414161]
~~~

Clearly, `model_0` is doing a reasonable, but definitely **not** a perfect job of predicting the `Ripeness` of an individual apple. 

### **Exercise 7A: Determine the quality of the model's predictions**

In **Exercise 7A**, write the code to compute the [Root-mean-square error (RMSE)](https://en.wikipedia.org/wiki/Root-mean-square_deviation) for apple `Quality` predicted by `model_1`. Call the variable holding RMSE `score_1`. Print out the RMSE for `model_1`.


In [None]:
# Insert your code for Exercise 7A here



If your code is correct you should see something **_similar_** to the following output:

~~~text
Final score (RMSE) for model_1: 0.22579524176257046
~~~

### **Exercise 7B: Compare predictions to actual values**

In the cell below write the code to print out predictions made by `model_1` for the `Quality` of the first 10 apples as well as their actual `Quality` values, side-by-side. 

In [None]:
# Insert your code for Exercise 7B here



If your code is correct you should see an output that is **_similar_** to the following:

~~~text
1. Apple number: 0, Quality: 1, predicted Quality: [0.98019767]
2. Apple number: 1, Quality: 1, predicted Quality: [1.0774481]
3. Apple number: 2, Quality: 0, predicted Quality: [0.04945722]
4. Apple number: 3, Quality: 1, predicted Quality: [0.92150337]
5. Apple number: 4, Quality: 1, predicted Quality: [0.96303636]
6. Apple number: 5, Quality: 0, predicted Quality: [-0.057796]
7. Apple number: 6, Quality: 1, predicted Quality: [1.0382084]
8. Apple number: 7, Quality: 1, predicted Quality: [0.9843851]
9. Apple number: 8, Quality: 0, predicted Quality: [0.17345132]
10. Apple number: 9, Quality: 0, predicted Quality: [-0.16333194]
~~~

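Compared to `model_0`, `model_1` seems to be able to make more accurate predictions. This is consistent with the observation that the RMSE for `model_1` was smaller than the RMSE for `model_0`, although, as noted in Example 7A, RMSE depends on the scale of the data, so a comparison between `Ripeness` and `Quality` should be read with caution.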

## Simple TensorFlow Classification

**_Classification_** is how a neural network attempts to classify the input into one or more classes.  The simplest way of evaluating a classification network is to track the percentage of training set items classified incorrectly.  

We typically score human results in this manner.  For example, you might have taken multiple-choice exams in school in which you had to shade in a bubble for choices A, B, C, or D.  If you chose the wrong letter on one question of a 10-question exam, you would earn a 90%.  In the same way, we can grade computers; however, most classification algorithms do not merely choose A, B, C, or D.  Computers typically report a classification as their percent confidence in each class.  Figure 3.EXAM shows how a computer and a human might respond to question number 1 on an exam.

**Figure 3.EXAM: Classification Neural Network Output**
![Classification Neural Network Output](https://biologicslab.co/BIO1173/images/class-multi-choice.png)

As you can see, the human test taker marked the first question as "B." However, the computer test taker had an 80% (0.8) confidence in "B" and was also somewhat sure with 10% (0.1) on "A." The computer then distributed the remaining points to the other two.  

In the simplest sense, the machine would get 80% of the score for this question if the correct answer were "B." The computer would get only 5% (0.05) of the points if the correct answer were "D." 

We previously saw how to train a neural network to predict either the `Ripeness` or the `Quality` of an apple. In this last section, we will now see how to build a neural network to predict a **_class_**. In particular, we are going to build a neural network that can predict the **_species_** of Iris flower, based either on its sepal dimensions or on its petal dimensions.

The code to classify Iris flowers is similar to the example above; however, there are several important differences:

* The output neuron count matches the number of classes (in the case of Iris, 3).
* The Softmax transfer function is utilized by the output layer.
* The loss function is cross entropy.
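The last two items can be illustrated with a short NumPy sketch. It is not the Keras implementation, just the underlying formulas: softmax turns raw scores into probabilities that sum to 1, and cross entropy penalizes low confidence in the correct class. The raw scores are made up, and the exam confidences come from the multiple-choice example above.

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(y_true, y_prob):
    """Categorical cross entropy for one one-hot encoded example."""
    return -np.sum(y_true * np.log(y_prob))

# Hypothetical raw scores for three classes
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())

# The exam example: 80% confidence in "B", and "B" is the correct answer
exam_probs = np.array([0.10, 0.80, 0.05, 0.05])
answer_b = np.array([0, 1, 0, 0])
loss_b = cross_entropy(answer_b, exam_probs)
print(loss_b)   # -ln(0.8), about 0.223

# Same confidences, but the correct answer is "D": the loss is much larger
answer_d = np.array([0, 0, 0, 1])
loss_d = cross_entropy(answer_d, exam_probs)
print(loss_d)   # -ln(0.05), about 2.996
```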
  

## Example 8: Predict Iris Species

As was done in Example 5, the code for Example 8 will be divided into a series of steps to make it easier to follow the code logic.


### Example 8 - Step 1: Generate independent and dependent variables

In Step 1, the Iris flower dataset is read and stored in a DataFrame called `irisDF`. The independent variables that we will use for this model will be limited to the dimensions of the flower's **_sepals_**, i.e., the flower's sepal length and the flower's sepal width. These values will be stored in the variable `irisX_0`. The code for creating `irisX_0` is:

~~~text
# Create the independent variables (x) from SEPAL dimensions
irisX_0 = irisDF[['sepal_length', 'sepal_width']].values
~~~

In the Iris dataset, the `species` column notes the name of each flower as a string value, either `Iris-setosa`, `Iris-versicolor` or `Iris-virginica`. Before we can use the data in the `species` column as our dependent variable, we will need to One-Hot Encode it. The code that does this is as follows:

~~~text
# One-hot encode the Iris species
dummies = pd.get_dummies(irisDF['species'], dtype=int) # Classification
species = dummies.columns
irisY_0 = dummies.values
~~~

To make sure the One-Hot encoding worked correctly, the last line of code prints the first 10 rows of `irisY_0`.

In [None]:
# Example 8 - Step 1: Generate independent and dependent variables 

# Read the dataset into the DataFrame irisDF
irisDF = pd.read_csv(
    "http://biologicslab.co/BIO1173/data/iris.csv", na_values=['NA', '?'])

# Create the independent variables (x) from SEPAL dimensions
irisX_0 = irisDF[['sepal_length', 'sepal_width']].values

# One-hot encode the Iris species
dummies = pd.get_dummies(irisDF['species'], dtype=int) # Classification
species = dummies.columns
irisY_0 = dummies.values

# Print out the first 10 values in irisY_0
print(irisY_0[0:10])


If your code is correct you should see the following output:

~~~text
[[1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]]
~~~
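To see what One-Hot encoding does without loading the dataset, here is a small NumPy-only sketch. It is an alternative illustration, not a replacement for `pd.get_dummies()`; the four labels are made up, and the variable names are placeholders.

```python
import numpy as np

# A few made-up labels, using the species names from the dataset
labels = np.array(['Iris-virginica', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa'])

# classes: sorted unique names; index: position of each label within classes
classes, index = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes), dtype=int)[index]

print(classes)
print(one_hot)
```

Each row contains a single `1`, and its column position identifies the species. That is why the first ten rows printed above are all `[1 0 0]`: the first ten flowers in the file are all _Iris setosa_.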

### Example 8 - Step 2: Build the neural network

As mentioned above, the neural network that we will use to classify Iris flowers based on their sepal dimensions is very similar to the neural network in Example 5, with the following exceptions:

* The output neuron count matches the number of classes (in the case of Iris, 3).
* The `Softmax` transfer function is utilized by the output layer.
* The loss function is `cross entropy`.

The code in the cell below builds a 4-layer neural network called `irisModel_0`. Notice that the size of the input layer is specified by `input_dim=irisX_0.shape[1]`.

In [None]:
# Example 8 - Step 2: Build the neural network

# Build neural network
irisModel_0 = Sequential()
irisModel_0.add(Dense(50, input_dim=irisX_0.shape[1], activation='relu')) # Hidden 1
irisModel_0.add(Dense(25, activation='relu')) # Hidden 2
irisModel_0.add(Dense(irisY_0.shape[1],activation='softmax')) # Output

# Compile the model
irisModel_0.compile(loss='categorical_crossentropy', optimizer='adam')

# Print the model summary
irisModel_0.summary()


If your code is correct you should see the following output:

~~~text
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_6 (Dense)             (None, 50)                150       
                                                                 
 dense_7 (Dense)             (None, 25)                1275      
                                                                 
 dense_8 (Dense)             (None, 3)                 78        
                                                                 
=================================================================
Total params: 1,503
Trainable params: 1,503
Non-trainable params: 0
_________________________________________________________________
~~~

### Example 8 - Step 3: Fit the neural network

Since the compile step didn't report any errors, we can proceed to **_train_** ("fit") the network. The code below fits the model to the independent variables `irisX_0` and the dependent variables `irisY_0` for 100 epochs.

In [None]:
# Example 8 - Step 3: Fit the neural network

# Train the model for 100 epochs
irisModel_0.fit(irisX_0,irisY_0,verbose=2,epochs=100)

If your code is correct you should see something similar to the following:

~~~text
Epoch 1/100
5/5 - 0s - loss: 1.2033 - 298ms/epoch - 60ms/step
Epoch 2/100
5/5 - 0s - loss: 1.0882 - 38ms/epoch - 8ms/step
Epoch 3/100
5/5 - 0s - loss: 1.0726 - 51ms/epoch - 10ms/step

....

Epoch 95/100
5/5 - 0s - loss: 0.4981 - 20ms/epoch - 4ms/step
Epoch 96/100
5/5 - 0s - loss: 0.4973 - 18ms/epoch - 4ms/step
Epoch 97/100
5/5 - 0s - loss: 0.4976 - 20ms/epoch - 4ms/step
Epoch 98/100
5/5 - 0s - loss: 0.4985 - 19ms/epoch - 4ms/step
Epoch 99/100
5/5 - 0s - loss: 0.4954 - 18ms/epoch - 4ms/step
Epoch 100/100
5/5 - 0s - loss: 0.4944 - 21ms/epoch - 4ms/step

<keras.callbacks.History at 0x1d40c7cdf40>
~~~

### Example 8 - Step 4: Print out number of species found

The code below prints out the names of the species that are included in the dataset.

In [None]:
# Example 8 - Step 4: Print out number of species found

print(species)

If your code is correct you should see

~~~text
Index(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype='object')
~~~

### Example 8 - Step 5: Print out the predictions

Now that we have trained our neural network, `irisModel_0`, we would like to be able to use it. As before, we will generate predictions. Instead of giving us a single prediction per sample, as the neural network models did in Example 6 and **Exercise 6**, our new model will make **_3 predictions_**, one for each **_class_** in the dependent variable. Since there are 3 Iris species, each time we ask `irisModel_0` to make a prediction, it will predict the probability that the unknown flower is (1) _Iris setosa_, (2) _Iris versicolor_, and (3) _Iris virginica_.

In [None]:
# Example 8 - Step 5: Print out the predictions

# Compute the model predictions
irisPred_0 = irisModel_0.predict(irisX_0)

# Change print from scientific notation
np.set_printoptions(suppress=True)

# Print out the results
print(f"Shape of irisPred_0: {irisPred_0.shape}")
print(irisPred_0[0:10])

If your code is correct you should see something similar to the following:

~~~text
5/5 [==============================] - 0s 4ms/step
Shape of irisPred_0: (150, 3)
[[0.97666174 0.01666728 0.00667099]
 [0.7832944  0.14594442 0.0707612 ]
 [0.96728146 0.02339293 0.00932566]
 [0.95840997 0.02962664 0.01196341]
 [0.99154884 0.00618186 0.0022694 ]
 [0.9940876  0.00432969 0.00158269]
 [0.9932969  0.00494042 0.00176264]
 [0.9702553  0.02116432 0.00858041]
 [0.93333316 0.04713726 0.01952958]
 [0.8738986  0.08644547 0.03965589]]
~~~

Each row of numbers represents the model's prediction for one Iris flower in the dataset. The first column represents the **_probability_** that the flower's species is _I. setosa_, the second column is the probability for _I. versicolor_, and the third column is the probability for _I. virginica_. 

You should note two things about these predictions. First, generally, one column has a significantly higher probability than the other two columns. For the data above, the values in the first column are significantly higher, so these first ten flowers are all predicted to be _Iris setosa_. (This is what happens when you don't shuffle the data!)

The second thing to notice is that the 3 probabilities for any particular flower always add up to 1. For example, the cell below adds the probabilities for the second flower:

In [None]:
# Add up the probabilities
0.7832944 + 0.14594442 + 0.0707612

As you can see, it's pretty close to 1. That makes sense since there is a 100% probability that any flower will be one of the three possible species in the dataset.
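The same check can be done for many flowers at once with NumPy. The sketch below uses the first three rows of the sample output above, typed in by hand, so the array name `probs` is only a placeholder for the real `irisPred_0`.

```python
import numpy as np

# First three rows copied from the sample output above
probs = np.array([[0.97666174, 0.01666728, 0.00667099],
                  [0.7832944,  0.14594442, 0.0707612 ],
                  [0.96728146, 0.02339293, 0.00932566]])

# axis=1 adds across the columns, giving one total per flower
row_sums = probs.sum(axis=1)
print(row_sums)
print(np.allclose(row_sums, 1.0, atol=1e-6))
```

The totals differ from 1 only by tiny rounding errors in the printed values.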

### Example 8 - Step 6: Print out the actual values

For comparison, we can print out the actual values of the first 10 flowers in the dataset as shown in the cell below. These values were stored in the variable `irisY_0`. 

In [None]:
# Example 8 - Step 6: Print out the actual values

print(irisY_0[0:10])

### Example 8 - Step 7: Print out predicted and expected values

Usually, the program considers the column with the highest prediction to be the prediction of the neural network.  It is easy to convert the predictions to the expected Iris species.  The `np.argmax()` function finds the index of the maximum prediction for each row.

In [None]:
# Example 8 - Step 7: Print out predicted and expected values

# Find the maximum prediction for each row
irisPredict_0_classes = np.argmax(irisPred_0,axis=1)

# Find the expected value for each row
irisExpected_0_classes = np.argmax(irisY_0,axis=1)

# Print out the results
print(f"Predictions: {irisPredict_0_classes}")
print(f"Expected: {irisExpected_0_classes}")

If your code is correct you should see something similar to the following:

~~~text
Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 0 0 0 2 1 2 1 2 1 1 1 2 1 1 1 2 1 1 1 1 1 2 1 1 1 2 1
 1 2 2 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 2 2 1 1
 2 2 1 1 1 1 2 2 2 2 1 2 2 1 2 1 1 2 2 2 2 2 2 2 2 1 1 1 2 1 2 1 1 1 2 2 1
 1 1]
Expected: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
~~~

### Example 8 - Step 8: Convert index values into species names

It is not too difficult to turn these indexes back into the names of the Iris species. We use the species list `species` that we created earlier.

In [None]:
# Example 8 - Step 8: Convert index values into species names

print(species[irisPredict_0_classes[1:10]])

If your code is correct you should see:
~~~text
Index(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa'],
      dtype='object')
~~~
As mentioned above, these flowers (the second through the tenth in the dataset) were all predicted to be _I. setosa_.

### Example 8 - Step 9: Compute the accuracy score

Accuracy might be a more easily understood error metric.  It is essentially a test score.  For all of the Iris predictions, what percent were correct?  The downside is it does not consider how confident the neural network was in each prediction.

The code in the cell below uses the `accuracy_score()` function from the **scikit-learn** package (nickname `sklearn`) to compute the accuracy of the predictions made by `irisModel_0` and stores this value in a variable called `irisCorrect_0`.

In [None]:
# Example 8 - Step 9: Compute the accuracy score

from sklearn.metrics import accuracy_score

# Compute accuracy
irisCorrect_0 = accuracy_score(irisExpected_0_classes,irisPredict_0_classes)

# Print out the results
print(f"Accuracy: {irisCorrect_0}")

The accuracy of `irisModel_0` should be roughly 75%.
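Accuracy is simple enough to compute by hand, which can help demystify the score. The sketch below uses two short, made-up arrays of class indexes rather than the real model output.

```python
import numpy as np

# Made-up class indexes for eight flowers
expected  = np.array([0, 0, 1, 1, 2, 2, 2, 1])
predicted = np.array([0, 0, 1, 2, 2, 1, 2, 1])

# Comparing the arrays gives True/False per flower; the mean of that is the accuracy
correct = predicted == expected
accuracy = correct.mean()

print(correct.sum(), "of", len(expected), "correct")
print(f"Accuracy: {accuracy}")   # 0.75
```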

### Example 8 - Step 10A: Use the model to make an _ad hoc_ prediction

The code below performs an _ad hoc_ prediction. Suppose we measure the sepal length and width of a flower from an unknown Iris species. We can "feed" this information into our _trained_ neural network model and ask it to predict which species the flower came from. This is an example of an _ad hoc_ prediction.

In [None]:
# Example 8 - Step 10A: Use the model to make an ad hoc prediction

# Specify the sepal length and width for an unknown Iris flower
sample_flower = np.array( [[6.6,2.9]], dtype=float)

# Use the neural network to predict the species
irisPred = irisModel_0.predict(sample_flower)

# Print out the results
print(irisPred)
irisPred = np.argmax(irisPred)
print(f"Model predicts that the sepal dimensions {sample_flower} are most likely from: {species[irisPred]}")

If your code is correct you should see something similar to the following output:

~~~text
1/1 [==============================] - 0s 21ms/step
[[0.00267348 0.47305235 0.5242742 ]]
Model predicts that the sepal dimensions [[6.6 2.9]] are most likely from: Iris-virginica
~~~

### Example 8 - Step 10B: Use the model to make two _ad hoc_ predictions

You can also predict two sample flowers. Notice that the **argmax** in the second prediction requires **axis=1**?  Since we have a 2D array now, we must specify which axis to take the **argmax** over.  The value **axis=1** specifies we want the max column index for each row.


In [None]:
# Example 8 - Step 10B: Use the model to make two ad hoc predictions

# Specify the sepal length and width for the two Iris flowers
sample_flowers = np.array([[5.9, 2.8], [5.1, 3.5]], dtype=float)

# Use the neural network to predict the species
irisPred = irisModel_0.predict(sample_flowers)

# Print out the results
print(irisPred)
irisPred = np.argmax(irisPred, axis=1)
print(f"Model predicts that the sepal dimensions {sample_flowers} are most likely from: {species[irisPred]}")


If your code is correct you should see something similar to the following output:

~~~text
1/1 [==============================] - 0s 20ms/step
[[0.01684416 0.53766084 0.445495  ]
 [0.97666174 0.01666728 0.00667099]]
Model predicts that the sepal dimensions [[5.9 2.8]
 [5.1 3.5]] are most likely from: Index(['Iris-versicolor', 'Iris-setosa'], dtype='object')
~~~
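
The effect of the `axis` argument in Step 10B can be seen on the two rows of scores from the sample output above. This is a minimal sketch that uses only NumPy; the variable names are introduced here for illustration.

```python
import numpy as np

# Scores copied from the sample output of Step 10B (one row per flower)
probs = np.array([[0.01684416, 0.53766084, 0.445495],
                  [0.97666174, 0.01666728, 0.00667099]])

# No axis: the array is flattened first, so a single index is returned
flat_index = np.argmax(probs)

# axis=1: the column index of the largest value in each row
per_row = np.argmax(probs, axis=1)

# axis=0: the row index of the largest value in each column
per_col = np.argmax(probs, axis=0)

print(flat_index)  # 3
print(per_row)     # [1 0]
print(per_col)     # [1 0 0]
```

Only `axis=1` gives one class index per flower, which is what is needed to look up a species name for each sample.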

## **Exercise 8: Predict Iris Species**

For **Exercise 8** you will build the same neural network created above in Example 8. Call your new neural network `irisModel_1`. The primary difference will be the _features_ used to predict the species of an Iris flower. In Example 8, the independent variables ('features') were the _sepal_ dimensions (i.e., sepal length and sepal width). For your new model, `irisModel_1`, the independent variables will be the **_petal_** dimensions (i.e., petal length and petal width).

### **Exercise 8 - Step 1: Generate independent and dependent variables**

Use the cell below to generate your independent and dependent variables. You should call the values of your independent variables `irisX_1`, since they will be different from the x values in Example 8. For consistency, you should call the values of your dependent variable `irisY_1`.

You **don't** need to read the datafile again. You can just re-use the same DataFrame, `irisDF`, created in Example 8 - Step 1. The most important step will be the creation of the independent variables, `irisX_1`. Remember, your independent variables are the columns **PETAL LENGTH** and **PETAL WIDTH**!

You **don't** need to create new dummies or one-hot encode the dependent variable `irisY_1`. Simply add the following line of code: `irisY_1 = irisY_0`. The dependent variable will be the same as it was in Example 8.

To make sure your code is correct, print out the first 10 values in `irisY_1`.
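
A side note on the assignment `irisY_1 = irisY_0`: in Python this does not copy the data; it gives the same object a second name. That is harmless here because the exercise never modifies the array. The sketch below assumes `irisY_0` is a NumPy array (as its printed form suggests) and uses small hypothetical arrays, `toyY_0` and `toyY_1`, to show the difference between a second name and an independent copy.

```python
import numpy as np

# Hypothetical one-hot rows standing in for irisY_0
toyY_0 = np.array([[1, 0, 0], [0, 1, 0]])

toyY_1 = toyY_0             # second name for the same array
toyY_copy = toyY_0.copy()   # independent copy with the same values

print(toyY_1 is toyY_0)                    # True
print(toyY_copy is toyY_0)                 # False
print(np.array_equal(toyY_copy, toyY_0))   # True
```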

In [None]:
# Insert your code for Exercise 8 - Step 1 here




If your code is correct you should see the following output:

~~~text
[[1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]]
~~~

### **Exercise 8 - Step 2: Build the neural network**

In the cell below write the code to build your neural network. Call your new network `irisModel_1`. Since your new model is being trained on different data, it will make very different predictions than `irisModel_0` after it has been trained.

In [None]:
# Insert your code for Exercise 8 - Step 2 here



If your code is correct you should see something similar to the following output:

~~~text
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_9 (Dense)             (None, 50)                150       
                                                                 
 dense_10 (Dense)            (None, 25)                1275      
                                                                 
 dense_11 (Dense)            (None, 3)                 78        
                                                                 
=================================================================
Total params: 1,503
Trainable params: 1,503
Non-trainable params: 0
_________________________________________________________________
~~~
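
The `Param #` column in a summary like this one can be checked by hand. For a Dense layer, the count is the number of inputs times the number of units (the weights) plus one bias per unit. The helper `dense_params` below is a hypothetical name introduced for this sketch; the layer sizes are read off the summary above, with 2 input features (petal length and petal width).

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 2 input features, then Dense layers with 50, 25, and 3 units
layer_sizes = [2, 50, 25, 3]

# Pair each layer's input size with its number of units
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [150, 1275, 78]
print(sum(counts))  # 1503
```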


### **Exercise 8 - Step 3: Fit the model**

In the cell below write your code to fit your `irisModel_1` to `irisX_1` and `irisY_1`. Set the verbosity to 2 and set the training to 100 epochs.


In [None]:
# Insert your code for Exercise 8 - Step 3 here



If your code is correct you should see something similar to the following:

~~~text
Epoch 1/100
5/5 - 0s - loss: 1.1227 - 71ms/epoch - 14ms/step
Epoch 2/100
5/5 - 0s - loss: 0.9542 - 39ms/epoch - 8ms/step
Epoch 3/100
5/5 - 0s - loss: 0.8172 - 61ms/epoch - 12ms/step
Epoch 4/100
5/5 - 0s - loss: 0.7831 - 33ms/epoch - 7ms/step
Epoch 5/100
5/5 - 0s - loss: 0.7126 - 49ms/epoch - 10ms/step
.....

Epoch 95/100
5/5 - 0s - loss: 0.1546 - 16ms/epoch - 3ms/step
Epoch 96/100
5/5 - 0s - loss: 0.1497 - 16ms/epoch - 3ms/step
Epoch 97/100
5/5 - 0s - loss: 0.1502 - 16ms/epoch - 3ms/step
Epoch 98/100
5/5 - 0s - loss: 0.1468 - 16ms/epoch - 3ms/step
Epoch 99/100
5/5 - 0s - loss: 0.1456 - 17ms/epoch - 3ms/step
Epoch 100/100
5/5 - 0s - loss: 0.1438 - 17ms/epoch - 3ms/step

<keras.callbacks.History at 0x1481e80ab20>
~~~
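
The `5/5` at the start of each line is the number of batches processed per epoch. The quick check below assumes the 150-flower dataset and the Keras default batch size of 32; this is an assumption, since a `fit` call can set `batch_size` explicitly.

```python
import math

n_samples = 150   # number of flowers in the dataset
batch_size = 32   # Keras default for fit() when none is given (assumed here)

# The last batch is smaller, so round up
steps_per_epoch = math.ceil(n_samples / batch_size)

print(steps_per_epoch)  # 5
```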

### **Exercise 8 - Step 4: Print out number of species found**

In the cell below print out the names of the three Iris species.


In [None]:
# Insert your code for Exercise 8 - Step 4 here



If your code is correct you should see the following:

~~~text
Index(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype='object')
~~~

### **Exercise 8 - Step 5: Print out the predictions**

In the cell below compute the predictions made by `irisModel_1` and store these values in a new variable called `irisPred_1`. Print out the shape of `irisPred_1` and the predictions for the first 10 Iris flowers.

In [None]:
# Insert your code for Exercise 8 - Step 5 here



If your code is correct you should see something similar to the following:

~~~text
5/5 [==============================] - 0s 3ms/step
Shape of irisPred_1: (150, 3)
[[0.99408364 0.00583091 0.00008547]
 [0.99408364 0.00583091 0.00008547]
 [0.9955899  0.00433504 0.00007501]
 [0.99186146 0.0080388  0.00009962]
 [0.99408364 0.00583091 0.00008547]
 [0.9902014  0.00965793 0.00014074]
 [0.9955643  0.00435396 0.00008182]
 [0.99186146 0.0080388  0.00009962]
 [0.99408364 0.00583091 0.00008547]
 [0.9890809  0.01081453 0.00010463]]
~~~

### **Exercise 8 - Step 6: Print out the actual values**

In the cell below print out the actual values of the first 10 flowers in the dataset. These values were stored in the variable `irisY_1`. 

In [None]:
# Insert your code for Exercise 8 - Step 6 here



If your code is correct you should see the following output:

~~~text
[[1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]]
~~~

### **Exercise 8 - Step 7: Print out predicted and expected values**

In the cell below print out the predicted and expected values for `irisModel_1`. Use `np.argmax(irisPred_1, axis=1)` to create a variable called `irisPredict_1_classes`. Also use `np.argmax(irisY_1,axis=1)` to create the variable `irisExpected_1_classes`. Print out the contents of these two variables.

In [None]:
# Insert your code for Exercise 8 - Step 7 here



If your code is correct you should see something similar to the following:

~~~text
Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
Expected: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
~~~
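
Scanning two 150-element printouts by eye is error-prone. The sketch below shows one way to locate the mismatches with NumPy. To stay self-contained it rebuilds two arrays that match the sample output above (the names `expected` and `predicted` are introduced here for illustration); with your own results you would compare `irisPredict_1_classes` and `irisExpected_1_classes` instead, and the positions will likely differ.

```python
import numpy as np

# Expected classes: 50 flowers of each species, in order
expected = np.repeat([0, 1, 2], 50)

# Predicted classes, rebuilt to match the sample output above
predicted = expected.copy()
predicted[70] = 2                 # a class-1 flower predicted as class 2
predicted[[106, 119, 133]] = 1    # three class-2 flowers predicted as class 1

# Positions where the prediction disagrees with the expected class
wrong = np.flatnonzero(predicted != expected)

print(wrong.tolist())  # [70, 106, 119, 133]
print(len(wrong))      # 4
```

Four misclassified flowers out of 150 means 146 correct, or 146/150 ≈ 0.973.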

### **Exercise 8 - Step 9: Compute the accuracy score**

In the cell below compute the accuracy score for `irisModel_1` and store the value in a variable called `irisCorrect_1`. Print out the value of `irisCorrect_1`.

In [None]:
# Insert your code for Exercise 8 - Step 9 here



If your code is correct you should see something similar to the following:

~~~text
Accuracy: 0.9733333333333334
~~~

This accuracy for `irisModel_1` is substantially higher than the accuracy of `irisModel_0` created in Example 8. 

Since _exactly_ the same dataset and the same 4-layer neural network architecture were used for both Example 8 and **Exercise 8**, the best explanation for this difference in accuracy is that the **_petal_** dimensions differ more among these three Iris species than their **_sepal_** dimensions do.

You can verify this by visually inspecting the mean sepal and petal values shown in the Appendix at the end of this notebook.

### **Exercise 8 - Step 10A: Use the model to make an _ad hoc_ prediction**

Use `irisModel_1` to make an _ad hoc_ prediction of an unknown Iris species that had a petal length of 1.5 cm, and a petal width of 0.3 cm. Print out the model's prediction. 

In [None]:
# Insert your code for Exercise 8 - Step 10A here



If your code is correct you should see something similar to the following:

~~~text
1/1 [==============================] - 0s 21ms/step
[[0.99389863 0.00600599 0.00009542]]
Model predicts that a flower with petal dimensions [[1.5 0.3]] is: Iris-setosa
~~~
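
The double brackets in `[[1.5 0.3]]` matter: the sample is a 2D array with one row per flower and one column per feature, the same layout used for `sample_flower` in Example 8. The sketch below uses only NumPy to show the difference in shape; the variable names are introduced here for illustration.

```python
import numpy as np

# One flower as a 2D array: 1 row (flower) x 2 columns (features)
one_flower = np.array([[1.5, 0.3]], dtype=float)

# The same numbers as a 1D array: no separate row dimension
flat = np.array([1.5, 0.3], dtype=float)

print(one_flower.shape)            # (1, 2)
print(flat.shape)                  # (2,)

# A 1D array can be turned into a single-row 2D array with reshape
print(flat.reshape(1, -1).shape)   # (1, 2)
```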

### **Exercise 8 - Step 10B: Use the model to make two _ad hoc_ predictions**

Use `irisModel_1` to make two _ad hoc_ predictions. The first unknown flower has a petal length of 4.2 cm and a petal width of 1.3 cm; the second has a petal length of 5.1 cm and a petal width of 3.5 cm. As above, print out the model's predictions.

In [None]:
# Insert your code for Exercise 8 - Step 10B here




If your code is correct you should see something similar to the following:

~~~text
1/1 [==============================] - 0s 21ms/step
[[0.00292124 0.85300654 0.14407225]
 [0.         0.00028006 0.9997199 ]]
Model predicts the two flowers with petal dimensions [[4.2 1.3]
 [5.1 3.5]] are: Index(['Iris-versicolor', 'Iris-virginica'], dtype='object')
~~~

## **Lesson Turn-in**

When you have completed all of the code cells and run them in sequential order (the last code cell should be number 44, **not** counting the 4 code cells in the Appendix below), use **File --> Print.. --> Save to PDF** to generate a PDF of your JupyterLab notebook. Save your PDF as `Class_03_2.lastname.pdf`, where _lastname_ is your last name, and upload the file to Canvas.

## Appendix

In the cells below, the Pandas `df.groupby()` function is used to compute the mean values of sepal length, sepal width, petal length, and petal width in the three different Iris flower species, _I. setosa_, _I. versicolor_, and _I. virginica_.


In [None]:
# Compute the mean sepal length for each Iris species
sepal_length = irisDF.groupby('species')['sepal_length'].mean()
sepal_length

In [None]:
# Compute the mean sepal width for each Iris species
sepal_width = irisDF.groupby('species')['sepal_width'].mean()
sepal_width

In [None]:
# Compute the mean petal length for each Iris species
petal_length = irisDF.groupby('species')['petal_length'].mean()
petal_length

In [None]:
# Compute the mean petal width for each Iris species
petal_width = irisDF.groupby('species')['petal_width'].mean()
petal_width
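
As an aside, the per-column means above can also be gathered into a single table by selecting several columns after `groupby()`. The sketch below is a minimal illustration on a small DataFrame, `toyDF`, whose measurements are made-up toy values and not the real Iris data; only the column naming follows the cells above.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only)
toyDF = pd.DataFrame({
    'species': ['Iris-setosa', 'Iris-setosa',
                'Iris-virginica', 'Iris-virginica'],
    'sepal_length': [5.0, 5.2, 6.4, 6.8],
    'petal_length': [1.4, 1.6, 5.4, 5.8],
})

# One row per species, one column per selected measurement
toy_means = toyDF.groupby('species')[['sepal_length', 'petal_length']].mean()

print(toy_means)
```

Applied to the notebook's data, the same pattern would select all four measurement columns of `irisDF`, which puts the sepal and petal means side by side for comparison.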