<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/main/Class_03_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

**Module 3: Introduction to TensorFlow**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction
* **Part 3.2: Using Keras to Build Regression Models**
* Part 3.3: Using Keras to Build Classification Models
* Part 3.4: Saving and Loading a Keras Neural Network
* Part 3.5: Early Stopping in Keras to Prevent Overfitting

## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to ```/content/drive```  and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [None]:
# YOU MUST RUN THIS CELL FIRST

try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("Note: not using Google CoLab")
    COLAB = False

Make sure your GMAIL address is visible in the output above.

# **Using Keras to Build Regression Models**

[Keras](https://keras.io/) is a **software layer** or API (application programming interface) that "sits" on top of TensorFlow. This makes building neural networks much easier.

For example, the following code uses Keras to add one layer to a neural network called `model`:
~~~text
model.add(Dense(25, input_dim=x_0.shape[1], activation='relu'))
~~~

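When you run this line of code, this apparently simple function call, `model.add(Dense(25, ...))`, executes a large number of hidden commands "behind the scenes".

For example, here is a simplified dense layer, `SimpleDense`, adapted from the Keras documentation's example of writing a custom layer. Even this stripped-down version, which handles far fewer options than the real `Dense` layer, takes a fair amount of code: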

~~~text
from keras import ops
from keras.layers import Layer

class SimpleDense(Layer):
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    # Create the state of the layer (weights)
    def build(self, input_shape):
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
            trainable=True,
            name="kernel",
        )
        self.bias = self.add_weight(
            shape=(self.units,),
            initializer="zeros",
            trainable=True,
            name="bias",
        )

    # Defines the computation
    def call(self, inputs):
        return ops.matmul(inputs, self.kernel) + self.bias

# Instantiates the layer.
linear_layer = SimpleDense(4)

# This will also call `build(input_shape)` and create the weights.
y = linear_layer(ops.ones((2, 2)))
assert len(linear_layer.weights) == 2

# These weights are trainable, so they're listed in `trainable_weights`:
assert len(linear_layer.trainable_weights) == 2
~~~

Rather than asking you to write each layer of a network yourself with hundreds of lines of Python and TensorFlow code, this course asks you to use only a handful of Keras commands.

Unless you are researching entirely new deep neural network architectures, it is unlikely that you will need to program TensorFlow directly. In this class, we will usually use TensorFlow _through_ Keras, rather than programming TensorFlow directly.


**Neural Network Classification and Regression**
![Neural Network Classification and Regression](https://biologicslab.co/BIO1173/images/class_2_ann_class_reg.png "Neural Network Classification and Regression")

## Example 1: Simple TensorFlow/Keras Regression

In [Regression Analysis](https://en.wikipedia.org/wiki/Regression_analysis), the goal is to estimate the relationships between a [dependent variable](https://en.wikipedia.org/wiki/Dependent_and_independent_variables) (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more [independent variables](https://en.wikipedia.org/wiki/Dependent_and_independent_variables)  (often called 'predictors', 'covariates', 'explanatory variables' or 'features').  

With multiple independent variables, the equation for a linear regression is

> $Y_{i} = \alpha + \beta_{1} X_{i,1} + \beta_{2} X_{i,2} + \cdots + \beta_{n} X_{i,n}$

where $Y_{i}$ is the **_dependent_** variable for observation $i$ and $X_{i,1}, \ldots, X_{i,n}$ are the $n$ **_independent_** variables.

In machine-learning terms, $Y$ is the _response variable_ that we are trying to predict, while the $X$-values are the **_features_** that we use to predict $Y$. We already know the values of $X$ and $Y$; what we don't know are the coefficients: the intercept $\alpha$ and one coefficient $\beta_{j}$ for each independent variable.
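Estimating $\alpha$ and the $\beta$ coefficients is exactly what classical linear regression does. As a minimal sketch (synthetic data with coefficients we chose ourselves, not the Apple Quality dataset), NumPy's ordinary least-squares solver `np.linalg.lstsq` recovers the coefficients from known X and Y values:

~~~python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (invented for illustration): 200 observations, n = 3 features
n_obs, n_features = 200, 3
X = rng.normal(size=(n_obs, n_features))
true_alpha = 1.5
true_betas = np.array([2.0, -1.0, 0.5])

# Y_i = alpha + beta_1*X_i1 + beta_2*X_i2 + beta_3*X_i3 + a little noise
Y = true_alpha + X @ true_betas + rng.normal(scale=0.1, size=n_obs)

# Prepend a column of ones so the first fitted coefficient is the intercept alpha
X_design = np.column_stack([np.ones(n_obs), X])

# Least-squares estimate of [alpha, beta_1, beta_2, beta_3]
coefs, *_ = np.linalg.lstsq(X_design, Y, rcond=None)
alpha_hat, beta_hat = coefs[0], coefs[1:]

print(f"alpha estimate: {alpha_hat:.3f}")
print(f"beta estimates: {np.round(beta_hat, 3)}")
~~~

The estimates should land very close to the chosen values `1.5` and `[2.0, -1.0, 0.5]`. A neural network approaches the same kind of problem differently, by adjusting its weights step by step during training.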


Example 1 shows how to prepare the Apple Quality dataset for regression and make predictions. In particular, we will see whether we can predict the ripeness of an apple from its size, weight, sweetness, crunchiness, and other features. Example 1 is divided into 3 steps to make the code easier to follow.

### Example 1-Step 1: Read dataset and create a DataFrame

Most lessons in this course begin by downloading a dataset from the course HTTPS server and creating a DataFrame to hold the information. The code in the cell below downloads the Apple Quality dataset and creates a DataFrame called `ripeDF`. Many Python examples simply use `df` as the name of a DataFrame. Some lessons adopt this convention, but most of the time we will use a more descriptive name, as in this example, where the DataFrame is called `ripeDF`.

As is customary, after creating our new DataFrame, we display a portion of it to make sure the data was read correctly.


In [None]:
# Example 1-Step 1: Read data and create DataFrame

import pandas as pd

# Read the datafile and create DataFrame
ripeDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/apple_quality.csv",
    na_values=['NA', '?'])

# Create variable for later
ripeNum = ripeDF['A_id']

# Set the max rows and max columns
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

# Display the DataFrame
display(ripeDF)

If your code is correct you should see the following output:

![__](http://biologicslab.co/BIO1173/images/class_03/class_03_2_image01a.png)

Notice that the only column that contains non-numeric values is `Quality`.
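The `na_values=['NA', '?']` argument tells `pd.read_csv()` which strings to treat as missing values. A minimal sketch using a tiny invented CSV (read from an in-memory `io.StringIO` buffer instead of the course server; the numbers are made up) shows this behavior and a programmatic way to confirm which columns are non-numeric:

~~~python
import io
import pandas as pd

# Tiny invented CSV laid out like the Apple Quality file (values are made up)
csv_text = """A_id,Size,Weight,Ripeness,Quality
0,-3.97,-2.51,0.33,good
1,?,-2.96,0.87,good
2,-0.29,NA,-0.04,bad
"""

df = pd.read_csv(io.StringIO(csv_text), na_values=['NA', '?'])

# '?' and 'NA' become NaN, so Size and Weight are still read as numeric columns
missing_per_column = df.isna().sum()
print(missing_per_column)

# List the columns that are NOT numeric
non_numeric = df.select_dtypes(exclude='number').columns.tolist()
print(non_numeric)
~~~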


### Example 1-Step 2: Create Feature Vector

The first step in creating a feature vector is to convert any categorical values (strings) into numerical values. All of the data in the Apple Quality dataset is numeric with the exception of the column `Quality` which has 2 categorical string values, `bad` and `good`.

Instead of using One-Hot Encoding, the code below uses the Pandas `map()` method to convert the string `bad` to `0` and the string `good` to the value `1`.
~~~text
# Define the mapping dictionary
mapping = {'bad': 0, 'good': 1}
# Map the strings to integers
ripeDF['Quality'] = ripeDF['Quality'].map(mapping)
~~~
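One property of `map()` is worth knowing: any value that does not appear as a key in the dictionary is converted to `NaN` rather than raising an error. The sketch below uses an invented `Series` containing a stray capitalized `'Good'` to show this, along with a simple check for unmapped values:

~~~python
import pandas as pd

# Invented example values; note the unexpected capitalized 'Good'
quality = pd.Series(['bad', 'good', 'good', 'Good'])

mapping = {'bad': 0, 'good': 1}
encoded = quality.map(mapping)
print(encoded.tolist())

# Strings missing from the dictionary become NaN, so count them before training
n_unmapped = encoded.isna().sum()
print(f"Unmapped values: {n_unmapped}")
~~~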
The next step is to generate the X-values for the regression. In this example we want to use **ALL** of the numeric feature columns in the `ripeDF` DataFrame **except** `Ripeness` (and the identifier column `A_id`, which is just a row label). This should make sense: the Y-values are what the model must predict, so including them among the X-values would hand the model the answer.

The $X$-values are the numbers contained in the `ripeDF` DataFrame columns `Size`, `Weight`, `Sweetness`, `Crunchiness`, `Juiciness`, `Acidity` and `Quality`. One way to extract these **values** and create a Numpy array called `ripeX` is shown in this code chunk:
~~~text
# Generate X-values
ripeX = ripeDF[['Size', 'Weight', 'Sweetness', 'Crunchiness',
       'Juiciness', 'Acidity', 'Quality']].values
ripeX = np.asarray(ripeX).astype('float32')
~~~
Notice the double square brackets `[[ ]]` followed by `.values`. The inner brackets hold a _list_ of column names, so the indexing returns a DataFrame containing just those columns; `.values` then **_converts_** that DataFrame into a 2-D Numpy array called `ripeX`.

After we generate our X-values, we need to make sure all of the values are in the format Keras expects, which is `float32` (Keras's default floating-point type). You should **always** convert your X and Y values to `float32` to avoid possible data-type errors.

Now we can generate our Y-values. The Y-values are in the `Ripeness` column, and we can extract them into a Numpy array called `ripeY` with a similar code chunk:
~~~text
# Generate Y-values
ripeY = ripeDF['Ripeness'].values
ripeY = np.asarray(ripeY).astype('float32')
~~~
Again, we use the Pandas attribute `.values` to convert the numerical values in the `ripeDF` DataFrame into Numpy arrays, `ripeX` and `ripeY`. These are the X and Y values that we will use to train our neural network.
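To see what the double brackets and `.values` produce, here is a minimal sketch on a small invented DataFrame that uses a subset of the Apple Quality column names (the numbers are made up):

~~~python
import numpy as np
import pandas as pd

# Small invented DataFrame with a subset of the Apple Quality columns
df = pd.DataFrame({
    'A_id': [0, 1, 2],
    'Size': [-3.97, -1.20, -0.29],
    'Weight': [-2.51, -2.96, -1.13],
    'Ripeness': [0.33, 0.87, -0.04],
    'Quality': [1, 1, 0],
})

# Double brackets: a LIST of column names -> DataFrame -> 2-D array
X = df[['Size', 'Weight', 'Quality']].values.astype('float32')

# Single brackets: ONE column name -> Series -> 1-D array
y = df['Ripeness'].values.astype('float32')

print(type(df[['Size', 'Weight', 'Quality']]).__name__, X.shape, X.dtype)
print(type(df['Ripeness']).__name__, y.shape, y.dtype)
~~~

Pandas also offers `DataFrame.to_numpy(dtype='float32')`, which performs the extraction and the type conversion in a single call.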

The last line of code prints out the Numpy array created for the $Y$-values.

In [None]:
# Example 1-Step 2: Create Feature Vector

import pandas as pd
import numpy as np
from keras.utils import to_categorical

# Convert strings to integers
mapping = {'bad': 0, 'good': 1}  # define mapping
ripeDF['Quality'] = ripeDF['Quality'].map(mapping) # map

# Generate X values
ripeX = ripeDF[['Size', 'Weight', 'Sweetness', 'Crunchiness',
       'Juiciness', 'Acidity', 'Quality']].values
ripeX = np.asarray(ripeX).astype('float32')

# Generate Y values
ripeY = ripeDF['Ripeness'].values
ripeY = np.asarray(ripeY).astype('float32')

# Print Y values
print(ripeY)

If your code is correct, you should see that following output:
~~~text
[ 0.3298398   0.86753008 -0.03803333 ...  4.76385918  0.21448838
 -0.77657147]
~~~

If you compare the Numpy array printed above with the `Ripeness` values displayed in Example 1-Step 1, you will see that they match.

### Example 1-Step 3: Construct, compile and train the neural network

The last step is to build our neural network, compile and train it. Using Keras these steps are relatively easy -- once you understand what the various commands mean.

We begin by telling Keras what kind of model we want. There are several different types of neural networks. In this example we will use a **Feedforward Neural Network (FNN)**, the simplest type, in which information flows from the input layer through the hidden layers to the output layer without loops. Keras builds this kind of network as a `Sequential` model, a plain stack of layers.

~~~text
# Specify the model type
ripeModel = Sequential()
~~~
For our `ripeModel` model, we are going to have 4 layers:

* one input layer
* two hidden layers
* one output layer

As you might expect, we need to add each layer in the correct order, starting with the input layer.

The input layer is somewhat special since it needs to have one neuron for each X-value. In Keras it is declared with `Input()` rather than `Dense()`, and it does not appear as a separate row in the model summary. The line of code for adding the input layer is:
~~~text
ripeModel.add(Input(shape=(ripeX.shape[1],)))  # Input layer
~~~
Pay particular attention to the argument `shape=(ripeX.shape[1],)`. This argument tells Keras exactly how many input neurons should be put into the input layer. When students in the course build neural networks using "copy-and-paste", they often forget to change this argument, which will prevent their model from training.

In this example there are `7` different inputs: 'Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Acidity', and 'Quality', so the value of `ripeX.shape[1]` is `7`.

Next, we tell Keras that we want two **_hidden layers_**, the first with 64 neurons and the second with 10 neurons. The code for adding the two hidden layers is:

~~~text
ripeModel.add(Dense(64, activation='relu'))    # Hidden 1
ripeModel.add(Dense(10, activation='relu'))    # Hidden 2
~~~
The layer type `Dense` tells Keras that we want **_every_** neuron in a layer to be connected to **_every_** neuron in the previous layer.

The output layer is where we will "find" our answer. In a regression model, there is only a **_single neuron_** in the output layer, and the numerical value in this one output neuron is the model's prediction of `Ripeness` ($Y$) for a given apple. In other words, once the model is trained, the value of the output neuron is the predicted `Ripeness` of a particular apple given its 'Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Acidity', and 'Quality'.

~~~text
ripeModel.add(Dense(1)) # Output
~~~
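Notice that we don't specify an activation function for the output layer. When `activation` is omitted, a Keras `Dense` layer uses a linear (identity) activation, so the output neuron can produce any real number. That is what we want for a regression target such as `Ripeness`, which can be negative or positive.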
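What a `Dense` layer computes can be sketched in plain NumPy: each layer multiplies its inputs by a weight matrix, adds a bias vector, and (for the hidden layers) applies ReLU, $\max(0, z)$. This sketch assumes random normal weights and zero biases as stand-ins for trained values (Keras itself starts from `glorot_uniform` weights and zero biases, then learns them) and uses the same 7 → 64 → 10 → 1 shape as `ripeModel`:

~~~python
import numpy as np

rng = np.random.default_rng(2)

def relu(z):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(z, 0.0)

# Random stand-in weights (NOT trained values) with ripeModel's layer shapes
W1, b1 = rng.normal(size=(7, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 10)), np.zeros(10)
W3, b3 = rng.normal(size=(10, 1)), np.zeros(1)

X = rng.normal(size=(5, 7)).astype('float32')  # 5 invented apples, 7 features each

h1 = relu(X @ W1 + b1)   # Hidden 1: every input feeds every one of the 64 neurons
h2 = relu(h1 @ W2 + b2)  # Hidden 2: 10 neurons
out = h2 @ W3 + b3       # Output: no activation (linear), so any real value is possible

print(h1.shape, h2.shape, out.shape)
~~~

The hidden-layer values are clipped at zero by ReLU, while the output value is not, which is why leaving the output activation off suits regression.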

Once we have specified all of the layers that we want in our model, the next step is to **_compile_** the model. The compile step configures the model for training. It involves:

* Checking for format errors.
* Defining the loss function, which quantifies how well the model's predictions match the actual target values.
* Choosing an optimizer (such as stochastic gradient descent or Adam) and, optionally, its learning rate.
* Selecting metrics to evaluate the model's performance during training.

In our model, we will select the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) as the loss function, and 'adam' as the optimizer. The Adam optimizer is a popular algorithm in deep learning for adjusting a network's weights and biases during training. Adam stands for _Adaptive Moment Estimation_: it keeps running averages of each parameter's recent gradients (the first moment, which acts like momentum) and of their squares (the second moment), and uses them to adapt the step size of each parameter individually.
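The textbook Adam update rule can be sketched in a few lines of NumPy. This is an illustration, not Keras's internal code: it minimizes a toy loss $f(x) = (x-3)^2$ for a single parameter. The `beta1`, `beta2`, and `eps` values match the defaults of `keras.optimizers.Adam`, but the learning rate of `0.1` is an assumption chosen so the toy example converges quickly (the Keras default is `0.001`).

~~~python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Textbook Adam applied to one parameter x (illustrative sketch)."""
    x = float(x0)
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Bias correction: m and v start at 0, so early averages are too small
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step size is scaled by this parameter's own gradient history
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy loss f(x) = (x - 3)^2 has gradient 2(x - 3) and its minimum at x = 3
x_best = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(f"x after Adam: {x_best:.4f}")
~~~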

~~~text
# Compile model
ripeModel.compile(loss='mean_squared_error', optimizer='adam')
~~~

The next line of code uses the Keras `model.summary()` function to print out a summary of our model.
~~~text
# Print model (optional)
ripeModel.summary()
~~~
This step is optional.

The last line of code "fits the model to the data". This is computerspeak for training the model.

~~~text
# Train model
ripeModel.fit(ripeX,ripeY,verbose=2,epochs=100)
~~~

In [None]:
# Example 1-Step 3: Build the neural network

from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import Adam


# Specify the model type as sequential
ripeModel = Sequential()

# Add layers
ripeModel.add(Input(shape=(ripeX.shape[1],)))  # Input layer
ripeModel.add(Dense(64, activation='relu'))    # Hidden 1
ripeModel.add(Dense(10, activation='relu'))    # Hidden 2
ripeModel.add(Dense(1))  # Output layer

# Compile model
ripeModel.compile(loss='mean_squared_error', optimizer='adam')

# Print model
ripeModel.summary()

# Train model
ripeModel.fit(ripeX,ripeY,verbose=2,epochs=100)

If your code is correct you should see the following output:

~~~text

Model: "sequential_18"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense_37 (Dense)                     │ (None, 64)                  │             512 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_38 (Dense)                     │ (None, 10)                  │             650 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_39 (Dense)                     │ (None, 1)                   │              11 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 1,173 (4.58 KB)
 Trainable params: 1,173 (4.58 KB)
 Non-trainable params: 0 (0.00 B)
Epoch 1/100
125/125 - 1s - 9ms/step - loss: 2.4290
Epoch 2/100
125/125 - 0s - 2ms/step - loss: 1.7537
Epoch 3/100
125/125 - 0s - 2ms/step - loss: 1.5671
Epoch 4/100
125/125 - 0s - 2ms/step - loss: 1.4324
Epoch 5/100
125/125 - 0s - 2ms/step - loss: 1.3570

........................

Epoch 95/100
125/125 - 0s - 2ms/step - loss: 0.6189
Epoch 96/100
125/125 - 0s - 2ms/step - loss: 0.6410
Epoch 97/100
125/125 - 0s - 2ms/step - loss: 0.6214
Epoch 98/100
125/125 - 0s - 3ms/step - loss: 0.6280
Epoch 99/100
125/125 - 1s - 5ms/step - loss: 0.6262
Epoch 100/100
125/125 - 0s - 2ms/step - loss: 0.6237
<keras.src.callbacks.history.History at 0x7bcdefa27880>

~~~
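The `Param #` column in the summary can be checked by hand. A `Dense` layer with $n_{in}$ inputs and $n_{out}$ neurons has one weight per connection plus one bias per neuron, for $n_{in} \times n_{out} + n_{out}$ parameters. A short sketch of that arithmetic for `ripeModel` (the helper name `dense_params` is our own):

~~~python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# ripeModel's layer sizes: 7 inputs -> 64 -> 10 -> 1
layer_sizes = [7, 64, 10, 1]

counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts)       # [512, 650, 11]
print(sum(counts))  # 1173
~~~

Each parameter is stored as a 4-byte `float32`, so 1,173 × 4 = 4,692 bytes ≈ 4.58 KB, which matches the `Total params` line.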

-----------------------------------

## **"Fit" the Model?**

In machine learning, the phrase "fit the model" means to **_train_** the model using the numerical values in our dataset. During training, the model **_learns_** from the data. In essence, the model makes a prediction of what it "thinks" is the correct answer, and then adjusts its trainable parameters (weights and biases) in an effort to make its predictions more accurate (i.e. to minimize the loss function).

The **fit step** involves:

* Forward passes (feeding input data through the network).
* Backward passes (calculating derivatives using backpropagation).
* Updating weights based on gradients to improve predictions.
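The three parts of the fit step can be sketched by hand for the simplest possible model, $\hat{y} = w x + b$. This sketch uses invented data, full-batch plain gradient descent, and hand-derived gradients of the mean squared error; Keras instead uses mini-batches, automatic differentiation, and (here) the Adam optimizer, so treat this as an illustration of the idea only:

~~~python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: one feature, true relationship y = 2x + 1 plus noise
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0  # trainable parameters (weight and bias)
lr = 0.1         # learning rate
losses = []

for epoch in range(50):
    # 1. Forward pass: predictions and mean squared error loss
    pred = w * x + b
    error = pred - y
    losses.append(np.mean(error ** 2))

    # 2. Backward pass: derivatives of the MSE loss with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)

    # 3. Update: move each parameter a small step against its gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"w = {w:.2f}, b = {b:.2f}")
~~~

Because the invented data were generated with $w = 2$ and $b = 1$, the learned values should end up close to those, and the loss should fall sharply from its first value.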

The fit step is by far the most _computationally_ demanding step. This is where GPUs and TPUs are used to speed up training. For relatively small neural networks like this one, the **central processing unit (CPU)** in a reasonably modern laptop can handle all of the computations in a reasonable amount of time. Increasingly, newer laptops are also being equipped with "AI chips" that can speed up training significantly.

The command:
~~~text
# Fit the model to the data
ripeModel.fit(ripeX, ripeY, verbose=2, epochs=100)
~~~
has the following 4 arguments:

* the $X$-values
* the $Y$-values
* the level of _verbosity_ (how much feedback should be printed out during training)
* the number of `epochs`

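In the example above, the number of epochs is set to `100`. An epoch means training the neural network on all of the training data for one complete cycle, so during an epoch every sample is used exactly once. Keras does not process all 4000 apples at once; it splits them into batches (32 samples per batch by default) and performs one forward pass, one backward pass, and one weight update per batch. Each batch is one _step_, which is why each epoch reports `125/125` steps.

With the verbosity set to `2`, Keras prints one line per epoch showing the number of steps completed, the total time for the epoch, the average time per step, and the loss value.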
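The `125/125` in every log line follows from the batch arithmetic. A quick check, assuming the 4000-apple dataset and the Keras default `batch_size` of 32 (the `fit()` call in Example 1 does not set it):

~~~python
import math

n_samples = 4000  # apples in the dataset
batch_size = 32   # Keras default when fit() is not given batch_size
n_epochs = 100

steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * n_epochs

print(steps_per_epoch)  # 125
print(total_updates)    # 12500
~~~

So over 100 epochs the weights are updated 12,500 times, not just 100 times.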

-------------------------

Notice that the **loss value** decreases from `2.4290` after the 1st epoch (`Epoch 1/100`),
~~~text
Epoch 1/100
125/125 - 1s - 9ms/step - loss: 2.4290
~~~
to `0.6237` after the last epoch (`Epoch 100/100`), roughly one quarter of the starting value:
~~~text
Epoch 100/100
125/125 - 0s - 2ms/step - loss: 0.6237
~~~
This decrease in loss is due to the neural network **_learning_**. During each epoch, the network makes small adjustments to its _trainable parameters_ (i.e. biases and connection weights) after every batch of training samples. Each new epoch then runs the complete dataset through the updated model again, so the loss shows whether the adjusted parameters do a better job of predicting the `Ripeness` of each apple in the dataset.

## **Exercise 1: Simple Tensorflow Regression**

For **Exercise 1** you are to build the same feedforward neural network (FNN) demonstrated in Example 1. However, this time your goal is to build a regression neural network that can predict the `Acidity` of an apple. In other words, the column `Acidity` will be your response variable (Y-values).

As was done in Example 1, **Exercise 1** has been divided into 3 steps:

1. Read dataset and create a DataFrame
2. Create Feature Vector
3. Construct, compile and train the neural network

### **Exercise 1-Step 1: Read dataset and create a DataFrame**

In the cell below, write the code to create a DataFrame called `acidDF` by reading the Apple Quality dataset from the course HTTPS server. Set the display options to show `8` rows and `8` columns, and then display your `acidDF` DataFrame.

Make sure to change the code in Example 1 to the following:

~~~text
# Create variable for later
acidNum = acidDF['A_id']
~~~

Otherwise, you will get an incorrect answer later.

In [None]:
# Insert your code for Exercise 1-Step 1 here:



If your code is correct you should see the following output:

![__](http://biologicslab.co/BIO1173/images/class_03/class_03_2_image02a.png)


### **Exercise 1-Step 2: Create Feature Vector**

In the cell below, create a Feature Vector for your regression model. Start by mapping the categorical values in the column `Quality` to integers as was demonstrated in Example 1-Step 2.

Then generate your X-values. Remember that your model will be predicting `Acidity`, so you **don't** want to include the `Acidity` values with your other X-values. You can use this code chunk to generate your X-values:

~~~text
# Generate X values
acidX = acidDF[['Size', 'Weight', 'Sweetness', 'Crunchiness',
       'Juiciness', 'Ripeness', 'Quality']].values
acidX = np.asarray(acidX).astype('float32')
~~~
As you can see, the string `Acidity` has been replaced by the string `Ripeness`.

You can use this code chunk to generate your Y-values:

~~~text
# Generate Y values
acidY = acidDF['Acidity'].values
acidY = np.asarray(acidY).astype('float32')
~~~

As a check, print out your Numpy array `acidY`.

In [None]:
# Insert your code for Exercise 1-Step 2 here



If your code is correct, you should see the following output:
~~~text
[-0.49159048 -0.72280937  2.62163647 ... -1.33461139 -2.22971981
  1.59979646]
~~~

### **Exercise 1-Step 3: Construct, compile and train the neural network**

In the cell below, construct a regression neural network called `acidModel` by copying and pasting the code in Example 1-Step 3.

Don't forget to change the argument of the input layer so that it uses `acidX`. Your input layer should read:
~~~text
acidModel.add(Input(shape=(acidX.shape[1],)))  # Input layer
~~~
As mentioned above, students in this course often forget to change this argument, which prevents the model from training.

In [None]:
# Insert your code for Exercise 1-Step 3 here



If your code is correct, your output should start with something similar to the following:

~~~text
Model: "sequential_17"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense_34 (Dense)                     │ (None, 64)                  │             512 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_35 (Dense)                     │ (None, 10)                  │             650 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_36 (Dense)                     │ (None, 1)                   │              11 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 1,173 (4.58 KB)
 Trainable params: 1,173 (4.58 KB)
 Non-trainable params: 0 (0.00 B)
Epoch 1/100
125/125 - 1s - 10ms/step - loss: 3.6186
Epoch 2/100
125/125 - 0s - 3ms/step - loss: 2.8457
Epoch 3/100
125/125 - 0s - 3ms/step - loss: 2.5317
Epoch 4/100
125/125 - 0s - 2ms/step - loss: 2.3947
Epoch 5/100
125/125 - 0s - 2ms/step - loss: 2.2933
Epoch 6/100
125/125 - 0s - 3ms/step - loss: 2.2158



~~~

and end with something similar to the following:
~~~text
Epoch 95/100
125/125 - 1s - 5ms/step - loss: 0.9980
Epoch 96/100
125/125 - 0s - 2ms/step - loss: 0.9712
Epoch 97/100
125/125 - 0s - 2ms/step - loss: 0.9829
Epoch 98/100
125/125 - 0s - 3ms/step - loss: 0.9880
Epoch 99/100
125/125 - 1s - 5ms/step - loss: 0.9779
Epoch 100/100
125/125 - 0s - 2ms/step - loss: 0.9732
<keras.src.callbacks.history.History at 0x7bcdf04266e0>
~~~

In this particular example, your `acidModel` started with a loss (the mean squared error reported as `loss:`) of `3.6186` after the first epoch. At the end of `100` epochs, the model's predictions of acidity had improved substantially, with a loss of only `0.9732`.

Because Keras initializes the weights randomly, your output will likely be somewhat different.

## Introduction to Neural Network Hyperparameters

If you look at the above code, you will see that the neural network contains four layers. The first layer is the input layer, created with `Input(shape=...)`, where the programmer sets the shape to the number of inputs the dataset has. The network needs one input neuron for every feature column in the dataset (including dummy variables).

There are also two hidden layers, with 64 and 10 neurons, respectively. You might be wondering how the programmer chose these numbers. Selecting a hidden neuron structure is one of the most common questions about neural networks. Unfortunately, there is no single right answer. These numbers are _hyperparameters_: settings that can affect neural network performance, yet have no clearly defined means of setting them.

In general, more hidden neurons mean more capability to fit complex problems. However, too many neurons can lead to overfitting and lengthy training times, while too few can lead to underfitting and sacrifice accuracy. The number of layers is another hyperparameter. More layers allow the neural network to perform more of its own feature engineering and data preprocessing, but this also comes at the expense of training time and the risk of overfitting. You will often see neuron counts start larger near the input layer and shrink toward the output layer in a triangular fashion.

Some techniques use machine learning to optimize these values. These will be discussed later in this course.

## Controlling the Amount of Output

The program produces one line of output for each training epoch. You can change or eliminate this output with the `verbose` argument of the `fit()` command:

* **verbose=0** - No progress output (use with Jupyter if you do not want output).
* **verbose=1** - Display progress bar, does not work well with Jupyter.
* **verbose=2** - Summary progress output (use with Jupyter if you want to know the loss at each epoch).

## Regression Prediction

Next, we will make actual predictions. For Example 2, these will be predictions of apple **Ripeness** from `ripeModel`, stored in the variable `ripePred`; for Exercise 2, these will be predictions of apple **Acidity** from `acidModel`, stored in the variable `acidPred`.

### Example 2: Use Model to Make Predictions

The code in the cell below uses Keras' `model.predict()` function to predict the `Ripeness` of each of the 4000 apples in the Apple Quality dataset based on its 'Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Acidity', and 'Quality'. The predictions are stored in a variable called `ripePred`.

Keep in mind that these `Ripeness` **_predictions_** are being made by the neural network model, `ripeModel` after it was trained ('fitted') to the dataset for 100 epochs.

In [None]:
# Example 2: Predict the Ripeness of each apple in the dataset


# Use model to make Ripeness predictions
ripePred = ripeModel.predict(ripeX)

# Print shape of pred
print(f"Shape of ripePred: {ripePred.shape}")

# Print first 10 predictions
print(ripePred[0:10])

If your code is correct you should see something similar to the following output:
~~~text
125/125 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Shape of ripePred: (4000, 1)
[[-0.17432438]
 [ 0.6893991 ]
 [ 0.19412594]
 [-3.3996456 ]
 [-1.4036642 ]
 [ 1.2713469 ]
 [-1.7385985 ]
 [ 1.4339687 ]
 [ 3.6715446 ]
 [ 2.8603623 ]]
~~~


### **Exercise 2: Use model to make predictions**

In the cell below, use Keras' `model.predict()` function to predict the `Acidity` of each of the 4000 apples in the Apple Quality dataset based on its 'Size', 'Weight', 'Sweetness', 'Crunchiness','Juiciness', 'Ripeness' and `Quality`. Store these  predictions in a variable called `acidPred`.

Print out the shape of `acidPred` and your model's first `10` acidity predictions.

In [None]:
# Insert your code for Exercise 2 here



If your code is correct you should see something similar to the following output:
~~~text
125/125 [==============================] - 0s 2ms/step
Shape of acidPred: (4000, 1)
[[-0.96637136]
 [ 0.5449684 ]
 [ 2.9393399 ]
 [ 1.3829033 ]
 [ 1.7506608 ]
 [-2.8392882 ]
 [ 3.751297  ]
 [-1.6653696 ]
 [-2.7585254 ]
 [-1.591665  ]]
~~~

### Example 3: Determine the accuracy of the model's predictions

An obvious question is how good are the neural network's predictions?  Since we know the correct `Ripeness` for each apple in the dataset, we can measure how close each neural network prediction was to the actual value.

A common measure in regression analysis is the [Root-mean-square error (RMSE)](https://en.wikipedia.org/wiki/Root-mean-square_deviation).

RMSE measures the typical size of the differences between predicted values and true values, expressed in the same units as the values being predicted.
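Written out, $\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$, where $y_i$ is an actual value and $\hat{y}_i$ is the corresponding prediction. A minimal NumPy sketch of this formula (the helper name `rmse` is our own, not a library function):

~~~python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error: square the errors, average them, take the square root."""
    # Flatten both inputs so a (n, 1) prediction array lines up with a (n,) target array
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Invented values: only the third prediction is off, by 2
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

# A column vector (n, 1), like the output of model.predict(), gives the same result
print(rmse([1.0, 2.0, 3.0], np.array([[1.0], [2.0], [5.0]])))
~~~

The `.ravel()` calls matter: `model.predict()` returns an array of shape `(4000, 1)`, while `ripeY` has shape `(4000,)`, and subtracting those two shapes directly would make NumPy broadcast them into a `(4000, 4000)` array and give a meaningless result.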

The code in the cell below computes the RMSE of the `Ripeness` predictions made by `ripeModel` by comparing them with the actual `Ripeness` values in the Apple Quality dataset, which are stored in the array `ripeY`. The RMSE is stored in a new variable called `score`.


In [None]:
# Example 3: Determine the RMSE for ripeModel

import numpy as np
from sklearn import metrics

# Measure RMSE error (sklearn expects y_true first, then y_pred)
score = np.sqrt(metrics.mean_squared_error(ripeY, ripePred))

# Print out the RMSE
print(f"Final score (RMSE) for ripeModel: {score}")

If your code is correct, you should see something similar to the following output:
~~~text
Final score (RMSE) for ripeModel: 0.8224384784698486
~~~
So what does this RMSE value mean?

The answer is somewhat complicated. What can be said with certainty, is that an RMSE value equal to `0` would indicate `100%` perfect predictions. However, that rarely happens with real data.

We also know that RMSE is always non-negative (zero or positive), since it is the square root of an average of squared differences.

However, beyond that, interpreting the meaning of RMSE is not straightforward. In general, a lower RMSE is better than a higher one. However, comparing RMSE values across different types of data would be invalid, because the magnitude of the RMSE depends on the scale of the numbers being predicted. In other words, you will get a larger RMSE when trying to predict big numbers than small ones, even if the relative accuracy is the same.
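The scale effect is easy to demonstrate: multiplying both the actual values and the predictions by 10 leaves the quality of the fit unchanged but multiplies the RMSE by 10. A minimal sketch with invented numbers (the `rmse` helper is hypothetical, defined here for the example):

~~~python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error of two same-shaped arrays."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Invented measurements and predictions (every prediction is off by 0.5)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 1.5, 3.5, 3.5])

print(rmse(y_true, y_pred))            # 0.5
print(rmse(y_true * 10, y_pred * 10))  # 5.0: same fit, 10x larger RMSE
~~~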

### Example 4: Compare predictions to actual values

The best way to get a sense of how accurate the predictions of `ripeModel` were is to print out the actual values next to the model's predicted values.

The code in the cell below uses a `for` loop to print out the first `10` comparisons.

In [None]:
# Example 4: Print out predictions and actual values

import numpy as np

# set the print option to 4 decimal places
np.set_printoptions(formatter={'float': '{:.4f}'.format})

# For loop for printing values
for i in range(10):
    print(f"{i+1}. Apple:{ripeNum[i]}"
         + f"  Ripeness: {ripeY[i]}"
         + f"  Predicted: {ripePred[i]}")

If your code is correct you should see an output that is similar to the following:

~~~text
1. Apple:0  Ripeness: 0.3298397958278656  Predicted: [0.3509]
2. Apple:1  Ripeness: 0.867530107498169  Predicted: [0.5495]
3. Apple:2  Ripeness: -0.03803332895040512  Predicted: [0.1906]
4. Apple:3  Ripeness: -3.4137613773345947  Predicted: [-3.3089]
5. Apple:4  Ripeness: -1.303849458694458  Predicted: [-1.6270]
6. Apple:5  Ripeness: 1.9146158695220947  Predicted: [2.0036]
7. Apple:6  Ripeness: -1.8474167585372925  Predicted: [-1.4015]
8. Apple:7  Ripeness: 0.9744378328323364  Predicted: [1.3325]
9. Apple:8  Ripeness: 4.080920696258545  Predicted: [3.7945]
10. Apple:9  Ripeness: 1.620856761932373  Predicted: [3.3389]
~~~

By inspection, we can see that the `ripeModel` is doing a reasonable job, but definitely **not** a perfect job, of predicting the `Ripeness` of an individual apple.
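Because a Keras model's `predict()` method returns a 2-D array with one column per output (which is why each prediction above prints inside square brackets), it is worth flattening the predictions before doing arithmetic with the actual values. The sketch below uses the first three rows of the example output as stand-in data (the arrays `actual` and `predicted` are illustrative, not variables from the lesson) to print the absolute error for each apple. It also shows what goes wrong without flattening: NumPy broadcasting silently turns a `(3,)` minus `(3, 1)` subtraction into a `(3, 3)` array.

```python
import numpy as np

# Stand-in data: first three rows of the Example 4 output above
actual = np.array([0.3298, 0.8675, -0.0380])          # shape (3,)
predicted = np.array([[0.3509], [0.5495], [0.1906]])  # shape (3, 1), like predict()

# Flatten the predictions so both arrays have shape (3,)
pred_flat = predicted.flatten()
abs_error = np.abs(actual - pred_flat)

for i, (a, p, e) in enumerate(zip(actual, pred_flat, abs_error), start=1):
    print(f"{i}. Actual: {a:.4f}  Predicted: {p:.4f}  |Error|: {e:.4f}")

# Without flattening, broadcasting pairs every actual value with every prediction
print((actual - predicted).shape)   # (3, 3), not (3,)
```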

### **Exercise 3: Determine the quality of the model's predictions**

Write the code in the cell below to compute the [Root-mean-square error (RMSE)](https://en.wikipedia.org/wiki/Root-mean-square_deviation) for the apple `Acidity` values predicted by your `acidModel`. Print out the RMSE for your `acidModel`.


In [None]:
# Insert your code for Exercise 3 here



If your code is correct, you should see something similar to the following output:
~~~text
Final score (RMSE) for acidModel: 2.863641276499843
~~~

### **Exercise 4: Compare predictions to actual values**

In the cell below, write the code to print out the predictions made by your `acidModel` for the `Acidity` of the first 10 apples, side-by-side with their actual `Acidity` values. Don't forget to change the variable `ripeNum` to `acidNum`.

In [None]:
# Insert your code for Exercise 4 here



If your code is correct, you should see an output that is similar to the following:
~~~text
1. Apple:0  Acidity: -0.4915904700756073  Predicted: [0.8089]
2. Apple:1  Acidity: -0.722809374332428  Predicted: [0.8771]
3. Apple:2  Acidity: 2.621636390686035  Predicted: [1.6307]
4. Apple:3  Acidity: 0.7907232046127319  Predicted: [0.1031]
5. Apple:4  Acidity: 0.5019840598106384  Predicted: [-1.5020]
6. Apple:5  Acidity: -2.981523275375366  Predicted: [6.7723]
7. Apple:6  Acidity: 2.414170503616333  Predicted: [-2.2504]
8. Apple:7  Acidity: -1.4701250791549683  Predicted: [-0.4674]
9. Apple:8  Acidity: -4.8719048500061035  Predicted: [2.6443]
10. Apple:9  Acidity: 2.185607671737671  Predicted: [3.5449]
~~~

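Compared to the `ripeModel`, your `acidModel` makes noticeably **less** accurate predictions; for example, apple 6 has an actual `Acidity` of about -2.98 but a predicted value of about 6.77. This is consistent with the RMSE for your `acidModel` (about 2.86 in the example output) being larger than the RMSE for the `ripeModel` (about 0.82) created in Example 1.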
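One way to sanity-check an RMSE value, offered here as a supplementary sketch rather than part of the assignment, is to compare it with the RMSE of a naive "model" that always predicts the mean of the actual values (that baseline RMSE equals the standard deviation of the actual values). A ratio below 1 means the model beats simply guessing the average. The helper names `rmse` and `rmse_ratio` are hypothetical, and the arrays are the first 10 rows of the two example outputs above, rounded to 4 decimal places, so the ratios describe only those 10 apples.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two arrays of the same length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_ratio(y_true, y_pred):
    """Model RMSE divided by the RMSE of always predicting the mean."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    baseline = np.full_like(y_true, y_true.mean())
    return rmse(y_true, y_pred) / rmse(y_true, baseline)

# First 10 rows of the example outputs above, rounded to 4 decimals
ripe_actual = np.array([0.3298, 0.8675, -0.0380, -3.4138, -1.3038,
                        1.9146, -1.8474, 0.9744, 4.0809, 1.6209])
ripe_pred = np.array([0.3509, 0.5495, 0.1906, -3.3089, -1.6270,
                      2.0036, -1.4015, 1.3325, 3.7945, 3.3389])
acid_actual = np.array([-0.4916, -0.7228, 2.6216, 0.7907, 0.5020,
                        -2.9815, 2.4142, -1.4701, -4.8719, 2.1856])
acid_pred = np.array([0.8089, 0.8771, 1.6307, 0.1031, -1.5020,
                      6.7723, -2.2504, -0.4674, 2.6443, 3.5449])

# Ratios below 1 beat the mean baseline; ratios above 1 do worse than it
print(f"ripeModel RMSE / baseline RMSE: {rmse_ratio(ripe_actual, ripe_pred):.2f}")
print(f"acidModel RMSE / baseline RMSE: {rmse_ratio(acid_actual, acid_pred):.2f}")
```

Because the ratio divides out the units of the target variable, it is a somewhat fairer way to compare the two models than their raw RMSE values.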

## **Lesson Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your Colab notebook. Save your PDF as `Class_03_2.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

## **Poly-A Tail**


## **BASIC**

![__](https://upload.wikimedia.org/wikipedia/commons/7/7b/AtariBASIC.png)


**BASIC (Beginners' All-purpose Symbolic Instruction Code)** is a family of general-purpose, high-level programming languages designed for ease of use. The original version was created by John G. Kemeny and Thomas E. Kurtz at Dartmouth College in 1963. They wanted to enable students in non-scientific fields to use computers. At the time, nearly all computers required writing custom software, which only scientists and mathematicians tended to learn.

In addition to the programming language, Kemeny and Kurtz developed the Dartmouth Time-Sharing System (DTSS), which allowed multiple users to edit and run BASIC programs simultaneously on remote terminals. This general model became popular on minicomputer systems like the PDP-11 and Data General Nova in the late 1960s and early 1970s. Hewlett-Packard produced an entire computer line for this method of operation, introducing the HP2000 series in the late 1960s and continuing sales into the 1980s. Many early video games trace their history to one of these versions of BASIC.

The emergence of microcomputers in the mid-1970s led to the development of multiple BASIC dialects, including Microsoft BASIC in 1975. Due to the tiny main memory available on these machines, often 4 KB, a variety of Tiny BASIC dialects were also created. BASIC was available for almost any system of the era, and became the de facto programming language for home computer systems that emerged in the late 1970s. These PCs almost always had a BASIC interpreter installed by default, often in the machine's firmware or sometimes on a ROM cartridge.

BASIC declined in popularity in the 1990s, as more powerful microcomputers came to market and programming languages with advanced features (such as Pascal and C) became tenable on such computers. By then, most nontechnical personal computer users relied on pre-written applications rather than writing their own programs. In 1991, Microsoft released Visual Basic, combining an updated version of BASIC with a visual forms builder. This reignited use of the language and "VB" remains a major programming language in the form of VB.NET, while a hobbyist scene for BASIC more broadly continues to exist.

**Origin**

John G. Kemeny was the chairman of the Dartmouth College Mathematics Department. Based largely on his reputation as an innovator in math teaching, in 1959 the college won an Alfred P. Sloan Foundation award for \$500,000 to build a new department building. Thomas E. Kurtz had joined the department in 1956, and from the 1960s Kemeny and Kurtz agreed on the need for programming literacy among students outside the traditional STEM fields. Kemeny later noted that "Our vision was that every student on campus should have access to a computer, and any faculty member should be able to use a computer in the classroom whenever appropriate. It was as simple as that."

Kemeny and Kurtz had made two previous experiments with simplified languages, DARSIMCO (Dartmouth Simplified Code) and DOPE (Dartmouth Oversimplified Programming Experiment). These did not progress past a single freshman class. New experiments using Fortran and ALGOL followed, but Kurtz concluded these languages were too tricky for what they desired. As Kurtz noted, Fortran had numerous oddly formed commands, notably an "almost impossible-to-memorize convention for specifying a loop: DO 100, I = 1, 10, 2. Is it '1, 10, 2' or '1, 2, 10', and is the comma after the line number required or not?"

Moreover, the lack of any sort of immediate feedback was a key problem; the machines of the era used batch processing and took a long time to complete a run of a program. While Kurtz was visiting MIT, John McCarthy suggested that time-sharing offered a solution; a single machine could divide up its processing time among many users, giving them the illusion of having a (slow) computer to themselves.[8] Small programs would return results in a few seconds. This led to increasing interest in a system using time-sharing and a new language specifically for use by non-STEM students.

Kemeny wrote the first version of BASIC. The acronym BASIC comes from the name of an unpublished paper by Thomas Kurtz. The new language was heavily patterned on FORTRAN II: statements were one-to-a-line, numbers were used to indicate the target of loops and branches, and many of the commands were similar or identical to Fortran. However, the syntax was changed wherever it could be improved. For instance, the difficult-to-remember `DO` loop was replaced by the much easier to remember `FOR I = 1 TO 10 STEP 2`, and the end of the loop, which the `DO` statement marked with a line number, was instead indicated by `NEXT I`. Likewise, the cryptic `IF` statement of Fortran, whose syntax matched a particular instruction of the machine on which it was originally written, became the simpler `IF I=5 THEN GOTO 100`. These changes made the language much less idiosyncratic while still having an overall structure and feel similar to the original FORTRAN.
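For readers coming from Python, the BASIC loop `FOR I = 1 TO 10 STEP 2 ... NEXT I` maps onto Python's `range()`, with one catch: BASIC's `TO` bound is inclusive, while `range()` stops before its end value. The helper `basic_for` below is a hypothetical illustration (not part of any BASIC or Python library), and it only handles integer bounds with positive step sizes.

```python
def basic_for(start, to, step=1):
    """Mimic BASIC's FOR I = start TO to STEP step (integers, positive step).

    BASIC includes the TO value when the count lands on it exactly,
    so the Python stop value is pushed one past it.
    """
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    return list(range(start, to + 1, step))

print(basic_for(1, 10, 2))   # [1, 3, 5, 7, 9]
print(basic_for(1, 9, 2))    # [1, 3, 5, 7, 9]  (9 is included)
print(list(range(1, 9, 2)))  # [1, 3, 5, 7]     (range stops before 9)
```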

The project received a \$300,000 grant from the National Science Foundation, which was used to purchase a GE-225 computer for processing, and a Datanet-30 realtime processor to handle the Teletype Model 33 teleprinters used for input and output. A team of a dozen undergraduates worked on the project for about a year, writing both the DTSS system and the BASIC compiler.[7] The first version of the BASIC language was released on 1 May 1964.[10][11]

Initially, BASIC concentrated on supporting straightforward mathematical work, with matrix arithmetic support from its initial implementation as a batch language, and character string functionality being added by 1965. Usage in the university rapidly expanded, requiring the main CPU to be replaced by a GE-235,[7] and still later by a GE-635. By the early 1970s there were hundreds of terminals connected to the machines at Dartmouth, some of them remotely.

Wanting use of the language to become widespread, its designers made the compiler available free of charge. In the 1960s, software became a chargeable commodity; until then, it was provided without charge as a service with expensive computers, usually available only to lease. They also made it available to high schools in the Hanover, New Hampshire, area and regionally throughout New England on Teletype Model 33 and Model 35 teleprinter terminals connected to Dartmouth via dial-up phone lines, and they put considerable effort into promoting the language. In the following years, as other dialects of BASIC appeared, Kemeny and Kurtz's original BASIC dialect became known as Dartmouth BASIC.

New Hampshire recognized the accomplishment in 2019 when it erected a highway historical marker in Hanover describing the creation of "the first user-friendly programming language".[12]