<a href="https://colab.research.google.com/github/DavidSenseman/BIO1173/blob/main/Class_05_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 1173: Intro Computational Biology**

**Module 5: Regularization and Dropout**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Biology, Health and the Environment](https://sciences.utsa.edu/bhe/), [UTSA](https://www.utsa.edu/)

### Module 5 Material

* Part 5.1: Introduction to Regularization: Ridge and Lasso
* Part 5.2: Using K-Fold Cross Validation with Keras
* **Part 5.3: Using L1 and L2 Regularization with Keras to Decrease Overfitting**
* Part 5.4: Drop Out for Keras to Decrease Overfitting
* Part 5.5: Benchmarking Keras Deep Learning Regularization Techniques



## Google CoLab Instructions

You MUST run the following code cell to get credit for this class lesson. By running this code cell, you will map your GDrive to /content/drive and print out your Google GMAIL address. Your Instructor will use your GMAIL address to verify the author of this class lesson.

In [None]:
# You must run this cell first
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: Using Google CoLab")
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("**WARNING**: Your GMAIL address was **not** printed in the output below.")
    print("**WARNING**: You will NOT receive credit for this lesson.")
    COLAB = False

## **Datasets for Class_05_3**

For this lesson we will be using the [Obesity Prediction Dataset](https://www.kaggle.com/datasets/mrsimple07/obesity-prediction) for the Examples and the [Body Performance Dataset](https://www.kaggle.com/datasets/kukuroo3/body-performance-data) for the **Exercises**.  

### **Obesity Prediction Dataset**

The **_Obesity Prediction dataset_** provides comprehensive information on individuals' demographic characteristics, physical attributes, and lifestyle habits, aiming to facilitate the analysis and prediction of obesity prevalence.

The 7 categories of obesity/demographics measurements are:

* **Age:** The age of the individual, expressed in years (Mean=49.9 yrs +/- 18.1)
* **Gender:** The gender of the individual coded `Male` and `Female`
* **Height:** The height of the individual measured in centimeters (Mean=170 cm +/- 10.3)
* **Weight:** The weight of the individual measured in kilograms (Mean=71.2 kg +/- 15.5)
* **BMI:** Body mass index, a calculated metric derived from the individual's weight and height (Mean=24.9 +/- 6.19)
* **PhysicalActivityLevel:** This variable quantifies the individual's level of physical activity (Mean=2.53 +/- 1.12)
* **ObesityCategory:** Categorization of individuals based on their BMI into different obesity categories

The output from `opDF.info()` is:
~~~text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Age                    1000 non-null   int64  
 1   Gender                 1000 non-null   object
 2   Height                 1000 non-null   float64
 3   Weight                 1000 non-null   float64
 4   BMI                    1000 non-null   float64
 5   PhysicalActivityLevel  1000 non-null   int64  
 6   ObesityCategory        1000 non-null   object
dtypes: float64(3), int64(2), object(2)
memory usage: 54.8+ KB
~~~
As you can see, two columns, `Gender` and `ObesityCategory`, are non-numeric (`object`) and will need to be converted into numeric values. Since all columns have the same `Non-Null Count` (_n_=1000) there is no missing data.

The output from `opDF['ObesityCategory'].value_counts()` is as follows:
~~~text
ObesityCategory
Normal weight    371
Overweight       295
Obese            191
Underweight      143
Name: count, dtype: int64
~~~
As you can see, the column `ObesityCategory` has four categorical values.

The output from `opDF['PhysicalActivityLevel'].value_counts()` is as follows:
~~~text
PhysicalActivityLevel
4    259
3    255
2    247
1    239
Name: count, dtype: int64
~~~

There are four different activity levels, ranging from 1 to 4.

### **Body Performance dataset**

For the Exercises in this lesson, we will be using the [Body Performance dataset](https://www.kaggle.com/datasets/kukuroo3/body-performance-data) provided by the [Seoul Olympic Games Korea Sports Promotion Foundation](https://www.bigdata-culture.kr/bigdata/user/data_market/detail.do?id=ace0aea7-5eee-48b9-b616-637365d665c1).

This dataset has 12 categories of body performance for a relatively large number of men and women (_n_=13,393). To speed up the training of neural networks in the Exercises, we will only use a fraction of the total number.

The 12 categories of fitness measurements are:
* **age:** 20 ~64
* **gender:** M,F
* **height_cm:** (If you want to convert to feet, divide by 30.48)
* **weight_kg:**
* **body fat_%:**
* **diastolic:** diastolic blood pressure (min)
* **systolic:** systolic blood pressure (max)
* **gripForce:**
* **sit and bend forward_cm:**
* **sit-ups counts:**
* **broad jump_cm:**
* **class:** A,B,C,D ( A: best) / stratified

The output for the command `df.info()` is as follows:
~~~text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13393 entries, 0 to 13392
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   age                      13393 non-null  float64
 1   gender                   13393 non-null  object
 2   height_cm                13393 non-null  float64
 3   weight_kg                13393 non-null  float64
 4   body fat_%               13393 non-null  float64
 5   diastolic                13393 non-null  float64
 6   systolic                 13393 non-null  float64
 7   gripForce                13393 non-null  float64
 8   sit and bend forward_cm  13393 non-null  float64
 9   sit-ups counts           13393 non-null  float64
 10  broad jump_cm            13393 non-null  float64
 11  class                    13393 non-null  object
dtypes: float64(10), object(2)
~~~

As you can see, all but two columns, `gender` and `class`, are numeric. Since all columns have the same `Non-Null Count` there is no missing data.

## Create functions for this lesson

The code in the cell below creates 2 useful functions for this lesson, `elaspedTime(start, end)` and `rename_col_by_index(dataframe, index_mapping)`.

In [2]:
# Create functions

import pandas as pd

# Simple function to print out elapsed time
def elaspedTime(start,end):
    # Print out time
    seconds = int((end-start))
    seconds = seconds % (24 * 3600)
    hour = seconds // 3600
    seconds %= 3600
    minutes = seconds // 60
    seconds %= 60
    print("Elapsed time = %d:%02d:%02d" % (hour, minutes, seconds))
    print()

# Simple function to change column name in a dataframe
def rename_col_by_index(dataframe, index_mapping):
    dataframe.columns = [index_mapping.get(i, col) for i, col in enumerate(dataframe.columns)]
    return dataframe
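
The arithmetic inside `elaspedTime` can be checked on its own. The sketch below is not part of the lesson code: `format_elapsed` is a hypothetical helper that performs the same hours/minutes/seconds breakdown but returns the string instead of printing it.

```python
def format_elapsed(start, end):
    """Return elapsed time as H:MM:SS (wraps around after 24 hours)."""
    seconds = int(end - start) % (24 * 3600)
    hours, remainder = divmod(seconds, 3600)   # whole hours, leftover seconds
    minutes, secs = divmod(remainder, 60)      # whole minutes, leftover seconds
    return f"{hours}:{minutes:02d}:{secs:02d}"

print(format_elapsed(0, 90))     # 0:01:30
print(format_elapsed(0, 3725))   # 1:02:05
```

In the lesson, `start` and `end` come from `time.time()`, so the difference is a number of seconds.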

# **L1 and L2 Regularization to Decrease Overfitting**

L1 and L2 regularization are two common regularization techniques that can reduce the effects of overfitting [[Cite:ng2004feature]](http://cseweb.ucsd.edu/~elkan/254spring05/Hammon.pdf). These algorithms can either work with an objective function or as a part of the backpropagation algorithm. In both cases, the regularization algorithm is attached to the training algorithm by adding an objective.

## **L1 Regularization**
In the context of a neural network with L1 regularization, the objective function typically consists of two main components: the original loss function and the L1 regularization term. The objective function serves as a measure that the optimization algorithm aims to minimize during the training process.

The objective function with L1 regularization can be represented as:

`Objective function = Loss function + λ * L1 regularization term`

where:

* **Loss function:** The original loss function used to evaluate the performance of the neural network on the training data, such as the cross-entropy loss or mean squared error.
* **λ (lambda):** The regularization parameter that controls the strength of the L1 regularization penalty.
* **L1 regularization term:** The sum of absolute values of the weights in the neural network.

The addition of the L1 regularization term to the objective function encourages sparsity in the weights of the neural network by penalizing large weights. This helps prevent overfitting and can lead to a simpler and more interpretable model. The trade-off between minimizing the loss function and reducing the magnitude of the weights is controlled by the regularization parameter λ.

During the training process, the neural network adjusts its weights by minimizing the composite objective function, striking a balance between fitting the training data well (minimizing the loss function) and reducing model complexity (L1 regularization).
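
To make the formula concrete, here is a minimal sketch of the composite objective in plain Python. The function name `l1_objective`, the weight values, and the loss value are all made up for illustration; Keras computes the equivalent quantity internally during training.

```python
def l1_objective(loss, weights, lam):
    """Loss function + lambda * (sum of absolute weight values)."""
    l1_term = sum(abs(w) for w in weights)
    return loss + lam * l1_term

weights = [0.5, -1.5, 0.0, 2.0]   # illustrative weights (sum of |w| is 4.0)
data_loss = 0.30                  # illustrative loss value

for lam in (0.0, 1e-4, 1e-2):
    print(f"lambda={lam}: objective={l1_objective(data_loss, weights, lam):.4f}")
```

With λ = 0 the objective is just the loss; as λ grows, the same set of weights costs more, so the optimizer is pushed toward smaller weights.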

These algorithms work by adding a weight penalty to the neural network training. This penalty encourages the neural network to keep the weights to small values. Both L1 and L2 calculate this penalty differently. You can add this penalty calculation to the calculated gradients for gradient-descent-based algorithms, such as backpropagation. The penalty is negatively combined with the objective score for objective-function-based training, such as simulated annealing.
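
As a rough sketch of how the penalty enters a gradient-descent update, the code below adds the derivative of each penalty to the gradient of the loss for a single weight. It assumes the penalties are written as λ·|w| for L1 and λ·w² for L2; the helper names and the numeric values are illustrative only.

```python
def sign(w):
    """Return -1, 0 or 1 depending on the sign of w."""
    return (w > 0) - (w < 0)

def penalized_step(w, grad_loss, lr, lam, kind):
    """One gradient-descent update of a single weight with an L1 or L2 penalty."""
    if kind == "l1":
        penalty_grad = lam * sign(w)   # derivative of lam * |w|
    else:
        penalty_grad = 2 * lam * w     # derivative of lam * w**2
    return w - lr * (grad_loss + penalty_grad)

# With the loss gradient set to 0, only the penalty moves the weight
for w in (2.0, 0.02):
    l1_new = penalized_step(w, 0.0, lr=0.1, lam=0.1, kind="l1")
    l2_new = penalized_step(w, 0.0, lr=0.1, lam=0.1, kind="l2")
    print(f"w={w}: after L1 step {l1_new:.4f}, after L2 step {l2_new:.4f}")
```

The L1 step removes the same fixed amount from every nonzero weight, which is why small weights can be driven all the way to zero. The L2 step removes an amount proportional to the weight, so large weights shrink quickly while small weights shrink only slightly.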


## **L1 vs L2 Regularization**
Both L1 and L2 work similarly in that they penalize the size of the weights, but they do so in significantly different ways. L2 will force the weights into a pattern similar to a Gaussian distribution while L1 will force the weights into a pattern similar to a Laplace distribution, as demonstrated in the following figure.

![L1 vs L2](https://biologicslab.co/BIO1173/images/class_9_l1_l2.png "L1 vs L2")

As you can see, the L1 algorithm is more tolerant of weights further from 0, whereas the L2 algorithm is less tolerant. We will highlight other important differences between L1 and L2 in the following sections. Note also that both L1 and L2 compute their penalties from the weights only; bias values are not penalized. Keras allows [l1/l2 to be directly added to your network](http://tensorlayer.readthedocs.io/en/stable/modules/cost.html).
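
One way to see the link between the two distributions and the two penalties is to look at the negative log of each density: for a Gaussian it grows with w², for a Laplace it grows with |w|. The sketch below is illustrative only and assumes unit scale parameters (`sigma=1`, `b=1`).

```python
import math

def neg_log_gaussian(w, sigma=1.0):
    """Negative log-density of a zero-mean Gaussian."""
    return 0.5 * (w / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))

def neg_log_laplace(w, b=1.0):
    """Negative log-density of a zero-mean Laplace distribution."""
    return abs(w) / b + math.log(2 * b)

for w in (0.0, 0.5, 1.0, 3.0):
    # Subtract the value at 0 so only the part that depends on w remains
    l2_like = neg_log_gaussian(w) - neg_log_gaussian(0.0)   # equals 0.5 * w**2
    l1_like = neg_log_laplace(w) - neg_log_laplace(0.0)     # equals |w|
    print(f"w={w}: Laplace (L1-like) cost {l1_like:.3f}, Gaussian (L2-like) cost {l2_like:.3f}")
```

Near 0 the Gaussian cost is the smaller of the two, but far from 0 it grows much faster, which matches the statement that L2 is less tolerant of weights far from 0.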

## Example 1: L1 Regularization of a classification neural network

In Example 1, L1 regularization will be demonstrated using the Obesity Prediction dataset and a classification neural network that will predict the Obesity Category of individuals. To make the coding easier to follow, Example 1 has been broken down into 3 steps labeled "Step 1", "Step 2" and "Step 3".

### Example 1 - Step 1: Create feature vector

The code in the cell below reads the Obesity Prediction dataset, `obesity_prediction.csv`, from the course HTTPS server and creates a new DataFrame called `opDF`. The column `Gender` is mapped to integers, with the string `Male` mapped to `0` and `Female` mapped to `1`. The columns `Age`, `Height`, `Weight` and `BMI` are standardized to their z-scores. Since `ObesityCategory` is the y-value for this neural network, it is dropped when generating the list with the names of the columns to be used for the x-values (`opX_columns`).
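
For readers unfamiliar with z-scores, the sketch below shows the calculation on a short list using only the standard library. `zscore_list` is a hypothetical stand-in for `scipy.stats.zscore`, and the heights are made-up values; it uses the population standard deviation, which is scipy's default (`ddof=0`).

```python
import statistics

def zscore_list(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)   # population standard deviation (ddof=0)
    return [(v - mean) / std for v in values]

heights = [160.0, 170.0, 180.0]       # illustrative heights in cm
print(zscore_list(heights))
```

After standardization, each value is expressed as the number of standard deviations it lies above or below the column mean, so columns measured in different units end up on a comparable scale.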

Since we will be building a neural network for **_classification_** we need to One-Hot encode the column `ObesityCategory` using the following code chunk:
~~~text
# One-Hot encode the column containing the y-values
dummies = pd.get_dummies(opDF['ObesityCategory']) # Classification
obCategories = dummies.columns
opY = dummies.values
opY = np.asarray(opY).astype('float32')
~~~

Finally, the cell prints out the categorical values (names) that were One-Hot encoded using the "starred" print statement:
~~~text
# Print y categorical names
print(*obCategories)
~~~

In [None]:
# Example 1 - Step 1: Create feature vector

import pandas as pd
import numpy as np
import scipy.stats
from scipy.stats import zscore

# Read the data set
opDF = pd.read_csv(
    "https://biologicslab.co/BIO1173/data/obesity_prediction.csv",
#   index_col=0,
    sep=',',
    na_values=['NA','?'])

# Map Gender
mapping =  {'Male': 0,
            'Female': 1}
opDF['Gender'] = opDF['Gender'].map(mapping)

# Standardize ranges
opDF['Age'] = zscore(opDF['Age'])
opDF['Height'] = zscore(opDF['Height'])
opDF['Weight'] = zscore(opDF['Weight'])
opDF['BMI'] = zscore(opDF['BMI'])

# Generate list of columns for x
opX_columns = opDF.columns.drop('ObesityCategory')  # ObesityCategory is the y-value

# Generate x-values as numpy array
opX = opDF[opX_columns].values
opX = np.asarray(opX).astype('float32')

# One-Hot encode the column containing the y-values
dummies = pd.get_dummies(opDF['ObesityCategory']) # Classification
obCategories = dummies.columns
opY = dummies.values
opY = np.asarray(opY).astype('float32')

# Print y categorical names
print(*obCategories)

If your code is correct, you should see the following output:
~~~text
Normal weight Obese Overweight Underweight
~~~
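
The names are printed in alphabetical order because the one-hot columns are built from the sorted category names. The sketch below imitates that behavior in plain Python; `one_hot` is a hypothetical stand-in for `pd.get_dummies`, and the labels are made up.

```python
def one_hot(labels):
    """Return (sorted category names, one-hot rows) for a list of labels."""
    categories = sorted(set(labels))                     # alphabetical column order
    index = {name: i for i, name in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0.0] * len(categories)
        row[index[label]] = 1.0                          # single 1 marks the category
        rows.append(row)
    return categories, rows

labels = ["Obese", "Normal weight", "Underweight", "Overweight", "Obese"]
categories, rows = one_hot(labels)
print(*categories)
for label, row in zip(labels, rows):
    print(label, row)
```

Each row contains a single `1.0` in the column that matches the individual's category, which is the form the softmax output layer is trained against.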

### Example 1 - Step 2: L1 Regularization to Decrease Overfitting

We now create a Keras network with L1 regularization.

The specific Keras code chunk that adds L1 regularization is the following:
~~~text
activity_regularizer=regularizers.l1(1e-4)
~~~
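The L1 regularizer is added to each hidden layer, but **not** to the output layer. Because it is passed as `activity_regularizer`, the penalty here is computed on each layer's outputs (activations); Keras applies a penalty to the weights themselves when the regularizer is passed as `kernel_regularizer`. The model is defined as follows: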
~~~text
def create_and_compile_model(input_dim):
    model = Sequential()
    model.add(Input(shape=(input_dim,)))  # Input
    model.add(Dense(50, activation='relu', activity_regularizer=regularizers.l1(1e-4)))
    model.add(Dense(25, activation='relu', activity_regularizer=regularizers.l1(1e-4)))
    model.add(Dense(opY.shape[1], activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model
~~~
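
To illustrate the difference between penalizing a layer's outputs and penalizing its weights, here is a small plain-Python sketch of one dense ReLU layer. All values are made up, and any additional scaling Keras may apply to the activity penalty (for example across a batch) is left out; this only shows which numbers each penalty is summed over.

```python
def relu(v):
    return max(0.0, v)

def dense_relu(x, W, b):
    """Outputs of a dense layer: relu(x @ W + b), with W indexed as W[input][unit]."""
    return [relu(sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j])
            for j in range(len(b))]

x = [1.0, -2.0]                       # one illustrative input sample
W = [[0.5, -1.0], [0.25, 2.0]]        # illustrative weights (2 inputs, 2 units)
b = [0.1, 0.0]                        # illustrative biases
lam = 1e-4

outputs = dense_relu(x, W, b)
activity_penalty = lam * sum(abs(o) for o in outputs)             # summed over outputs
kernel_penalty = lam * sum(abs(w) for row in W for w in row)      # summed over weights

print("outputs:", outputs)
print("activity L1 penalty:", activity_penalty)
print("kernel L1 penalty:  ", kernel_penalty)
```

The activity penalty changes with every input sample because it depends on the outputs, whereas the kernel penalty depends only on the current weights.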

In [None]:
# Example 1 - Step 2: L1 Regularization to Decrease Overfitting

import numpy as np
import time

import keras
from keras.models import Sequential
from keras.layers import Dense, Input
from keras import regularizers

import sklearn
from sklearn.model_selection import KFold
from sklearn import metrics


# Set variables
EPOCHS=100 # number of epochs for each loop
numK=5     # Set number of K-folds

def create_and_compile_model(input_dim):
    model = Sequential()
    model.add(Input(shape=(input_dim,)))  # Input
    model.add(Dense(50, activation='relu', activity_regularizer=regularizers.l1(1e-4)))
    model.add(Dense(25, activation='relu', activity_regularizer=regularizers.l1(1e-4)))
    model.add(Dense(opY.shape[1], activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model

fold = 0
kf = KFold(n_splits=numK)
oos_y = []
oos_pred = []

# Record the start time in T_start
T_start = time.time()

print("STAND BY: TRAINING IS STARTING")

# Start Loop ------------------------------------------------------------#
for train, test in kf.split(opX):
    fold += 1
    print(f"Starting Fold #{fold}...")

    x_train = opX[train]
    y_train = opY[train]
    x_test = opX[test]
    y_test = opY[test]

    # Create and compile the model for this fold
    model = create_and_compile_model(opX.shape[1])

    # Train model for this fold
    model.fit(x_train, y_train, validation_data=(x_test, y_test), verbose=0, epochs=EPOCHS)

    # Use model to make predictions
    pred = model.predict(x_test)

    # Add actual y-values for the data used this fold
    oos_y.append(y_test)

    # Raw probabilities to chosen class (highest probability)
    pred = np.argmax(pred, axis=1)
    oos_pred.append(pred)

    # Measure this fold's accuracy
    y_compare = np.argmax(y_test, axis=1)  # For accuracy calculation
    score = metrics.accuracy_score(y_compare, pred)
    print(f"Fold score (accuracy): {score:.3f}")

# End Loop ------------------------------------------------------------#

# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y)
oos_pred = np.concatenate(oos_pred)
oos_y_compare = np.argmax(oos_y,axis=1) # For accuracy calculation

score = metrics.accuracy_score(oos_y_compare, oos_pred)
print(f"Final score (accuracy): {score:.3f}")

# Write the cross-validated prediction
oos_y = pd.DataFrame(oos_y)
oos_pred = pd.DataFrame(oos_pred)
oosDF = pd.concat( [oos_pred, oos_y],axis=1 )

# Record the end time in T_end
T_end = time.time()

# Print out elapsed time
elaspedTime(T_start,T_end)


If your code is correct, you should see something similar to the following output:
~~~text
STAND BY: TRAINING IS STARTING
Starting Fold #1...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
Fold score (accuracy): 0.975
Starting Fold #2...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
Fold score (accuracy): 0.970
Starting Fold #3...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
Fold score (accuracy): 0.960
Starting Fold #4...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step
Fold score (accuracy): 0.970
Starting Fold #5...
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
Fold score (accuracy): 0.990
Final score (accuracy): 0.973
Elapsed time = 0:01:30

~~~
The `Final score (accuracy): 0.973` is very high.
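
The `7/7` shown for each fold follows from how the data is split. The sketch below is a plain-Python imitation of an unshuffled K-fold split, not scikit-learn's own code; `kfold_indices` is a hypothetical helper, and the batch size of 32 is assumed to be the default used by `model.predict`.

```python
import math

def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for consecutive, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)    # first folds absorb any remainder
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

for fold, (train, test) in enumerate(kfold_indices(1000, 5), start=1):
    batches = math.ceil(len(test) / 32)          # prediction batches of 32 samples
    print(f"Fold #{fold}: train={len(train)}, test={len(test)}, predict batches={batches}")
```

With 1000 samples and 5 folds, every fold trains on 800 individuals and is tested on the remaining 200, and each individual appears in exactly one test fold.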

### Example 1 - Step 3: Print out actual and predicted y-values

The code in the cell below prints out the predicted and the actual Obesity Categories for the "out-of-sample" individuals. As mentioned previously, "out-of-sample" refers to data that was _not_ used in the process of developing the neural network model. It is only used to evaluate the accuracy and performance of the model on new, unseen data to assess its generalizability and potential for predicting future outcomes.

In [None]:
# Example 1 - Step 3: Print out actual and predicted y-values

# Rename columns
new_column_mapping = {0: 'Predicted Ob Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)

# Set display options
pd.set_option('display.max_rows', 8)
pd.set_option('display.max_columns', 8)

# Display DataFrame
display(oosDF)

If your code is correct you should see something similar to the following table:

![__](https://biologicslab.co/BIO1173/images/class_05_3_Exm1C.png)

By inspecting the output above, you can see that the model's predictions of the Obesity Category are very good for the out-of-sample individuals, as would be expected with a `Final score (accuracy): 0.973`. In the rows shown above, there were no errors in the model's predictions.
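
Reading such a table amounts to comparing the predicted class index with the position of the `1` in the one-hot "Actual" columns. The sketch below does this for four made-up rows; the predictions and actual values are hypothetical and are not taken from the lesson's output.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

categories = ["Normal weight", "Obese", "Overweight", "Underweight"]
predicted = [0, 2, 1, 3]                    # hypothetical predicted class indices
actual_one_hot = [[1, 0, 0, 0],             # hypothetical one-hot actual values
                  [0, 0, 1, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]

correct = 0
for pred, row in zip(predicted, actual_one_hot):
    true = argmax(row)                      # column holding the 1
    status = "correct" if pred == true else "WRONG"
    correct += pred == true
    print(f"predicted={categories[pred]:<14} actual={categories[true]:<14} {status}")

accuracy = correct / len(predicted)
print(f"accuracy = {accuracy:.2f}")
```

A row is a correct prediction when the number in the predicted column equals the index of the "Actual" column that contains the `1`.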

## **Exercise 1: L1 Regularization of a classification neural network**

For **Exercise 1**, you are to use the Body Performance dataset and build a classification neural network that will predict the fitness `class` of individuals. To make coding easier, **Exercise 1** has been broken down into 3 steps.

### **Exercise 1 - Step 1: Create feature vector**

In the cell below, read the Body Performance dataset, `bodyPerformance.csv`, from the course HTTPS server and create a new DataFrame called `bpBigDF`. Since this is a fairly large dataset, you can speed up training time by only using a part of it. Use this code chunk to create a DataFrame with only 20% of the samples:
~~~text
bpDF=bpBigDF.sample(frac=0.20)
~~~

You need to map the column `gender` with the categorical values `M` and `F` to integers.  

You should also standardize some, but not all of the other numeric values using this code chunk:
~~~text
# Standardize ranges
bpDF['age'] = zscore(bpDF['age'])
bpDF['height_cm'] = zscore(bpDF['height_cm'])
bpDF['weight_kg'] = zscore(bpDF['weight_kg'])
bpDF['diastolic'] = zscore(bpDF['diastolic'])
bpDF['systolic'] = zscore(bpDF['systolic'])
bpDF['gripForce'] = zscore(bpDF['gripForce'])
~~~

Since you are building a classification neural network to predict `class`, you will need to drop that column when creating your list of X columns:
~~~text
# Generate list of columns for x
bpX_columns = bpDF.columns.drop('class')  # class is y-value
~~~
Using this list, you can generate the x values for your model using the following code chunk:
~~~text
# Generate x-values as numpy array
bpX = bpDF[bpX_columns].values
bpX = np.asarray(bpX).astype('float32')
~~~

Since this is a classification neural network, you will also need to One-Hot encode the column `class` using the following code chunk:
~~~text
# One-Hot encode the column containing the y-values
dummies = pd.get_dummies(bpDF['class']) # Classification
bpCategories = dummies.columns
bpY = dummies.values
bpY = np.asarray(bpY).astype('float32')
~~~

Finally, print out the categorical values (names), `bpCategories`, that were One-Hot encoded, using the "starred" print statement:
~~~text
# Print y categorical names
print(*bpCategories)
~~~

In [None]:
# Insert your code for Exercise 1 - Step 1 here



If your code is correct, you should see the following output:
~~~text
A B C D
~~~

### **Exercise 1 - Step 2: L1 Regularization to Decrease Overfitting**

In the cell below, create a Keras network with L1 regularization to predict the fitness `class` in the Body Performance Dataset using the feature vector that you prepared in **Exercise 1 - Step 1**. The code in Example 1 - Step 2 should act as your template.

In [None]:
# Insert your code for Exercise 1 - Step 2 here



If your code is correct, you should see something similar to the following output:
~~~text
STAND BY: TRAINING IS STARTING
Starting Fold #1...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
Fold score (accuracy): 0.629
Starting Fold #2...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
Fold score (accuracy): 0.692
Starting Fold #3...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
Fold score (accuracy): 0.653
Starting Fold #4...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step
Fold score (accuracy): 0.632
Starting Fold #5...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
Fold score (accuracy): 0.639
Final score (accuracy): 0.649
Elapsed time = 0:02:30
~~~
The `Final score (accuracy): 0.649` is not especially accurate. We'll see how well it can classify fitness classes.

### **Exercise 1 - Step 3: Print out actual and predicted y-values**

In the cell below, print out the predicted and the actual Fitness `class` for the out-of-sample individuals. Label your columns using the following code chunk:
~~~text
# Rename columns
new_column_mapping = {0: 'Predicted Fitness Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)
~~~

In [None]:
# Insert your code for Exercise 1 - Step 3 here



If your code is correct you should see something similar to the following table:

![__](https://biologicslab.co/BIO1173/images/class_05/class_05_3_image01.png)

By inspecting the output above, you can see that the model's predictions of the fitness `class` are not perfect for the out-of-sample individuals. This should not come as a big surprise given a `Final score (accuracy): 0.649`.

You should be able to look at this output and recognize which subjects were correctly classified and which ones were incorrectly classified.

## **Exercise 2: L2 Regularization of a classification neural network**

L2 regularization might generate better results than L1 regularization under the following conditions:

* **Feature Correlation:** When features are highly correlated, L2 regularization tends to perform better than L1 regularization. L2 regularization encourages small coefficients but does not force them to zero, allowing correlated features to share the regularization penalty more evenly.
* **Data Noise:** In the presence of noisy data, L2 regularization can be more effective at smoothing out the noise due to its tendency to distribute the penalty more uniformly across all weights. L2 regularization helps prevent individual noisy data points from disproportionately influencing the model.
* **Model Stability:** L2 regularization often leads to more stable and well-conditioned models compared to L1 regularization. L2 regularization prevents the weights from growing excessively large, which can improve the numerical stability of the optimization process and enhance generalization performance.
* **Small Dataset:** When working with a small dataset, L2 regularization can help prevent overfitting by providing smoother and more continuous solutions. Because the L2 penalty grows with the square of a weight, it penalizes small weights more gently than L1 regularization does and shrinks weights gradually rather than zeroing them out, which can make it better suited to smaller datasets.
* **Uniform Impact on All Weights:** If the goal is to ensure that all weights have some level of regularization, rather than inducing sparsity, L2 regularization is preferred. L2 regularization treats all weights equally, promoting a more balanced impact on the model parameters.

In summary, L2 regularization often outperforms L1 regularization in scenarios where feature correlation, data noise, model stability, small dataset size, or a uniform impact on all weights are important considerations for achieving better results and improved generalization performance.
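
The feature-correlation point can be illustrated with a toy calculation. Suppose two features are identical, so the model only needs the two weights to add up to 1.0. The sketch below, which uses made-up weight splits, compares how each penalty scores different ways of dividing that total.

```python
# Each pair (w1, w2) sums to 1.0, so every split fits two identical features equally well
splits = [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5)]

results = []
for w1, w2 in splits:
    l1_penalty = abs(w1) + abs(w2)     # L1: sum of absolute values
    l2_penalty = w1 ** 2 + w2 ** 2     # L2: sum of squares
    results.append((w1, w2, l1_penalty, l2_penalty))
    print(f"w1={w1}, w2={w2}: L1 penalty={l1_penalty}, L2 penalty={l2_penalty}")

best_l2 = min(results, key=lambda r: r[3])
print("Split with the lowest L2 penalty:", best_l2[:2])
```

The L1 penalty is the same for every split, so it has no preference for sharing the weight, whereas the L2 penalty is lowest when the weight is divided evenly between the two correlated features.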

For **Exercise 2**, you are to again use the Body Performance dataset and build a classification neural network that will predict the fitness `class` of individuals. As before, **Exercise 2** has been broken down into 3 steps.

### **Exercise 2 - Step 1: Create feature vector**

In the cell below create a feature vector for the Body Performance dataset. You should use **exactly** the same code that you wrote for **Exercise 1 - Step 1**. As above, only use 20% of the dataset for your neural network model.

In [None]:
# Insert your code for Exercise 2 - Step 1 here



If your code is correct, you should see the following output:
~~~text
A B C D
~~~

### **Exercise 2 - Step 2: L2 Regularization to Decrease Overfitting**

In the cell below, create a Keras network with L2 regularization. The code for **Exercise 2 - Step 2** should be _identical_ to the code you wrote for **Exercise 1 - Step 2** except for the regularizer argument passed to each hidden layer. To enable L2 regularization you will need to make the following code change.

Change the line:
~~~text
activity_regularizer=regularizers.l1(1e-4)
~~~
to read:
~~~text
kernel_regularizer=regularizers.l2(0.01)
~~~
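The difference is easy to miss, so look closely: the argument name changes from `activity_regularizer` to `kernel_regularizer` (the penalty is now computed on the layer's weights rather than its outputs), the function changes from `l1` to `l2`, and the regularization strength changes from `1e-4` to `0.01`.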

As above, the L2 regularizer must be added to each hidden layer, but not to the output layer:
~~~text
def create_and_compile_model(input_dim):
    model = Sequential()
    model.add(Input(shape=(input_dim,)))  # Input
    model.add(Dense(50, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
    model.add(Dense(25, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
    model.add(Dense(bpY.shape[1], activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model
~~~

In [None]:
# Insert your code for Exercise 2 - Step 2 here



If your code is correct, you should see something similar to the following output:
~~~text
STAND BY: TRAINING IS STARTING
Starting Fold #1...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
Fold score (accuracy): 0.560
Starting Fold #2...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
Fold score (accuracy): 0.595
Starting Fold #3...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step
Fold score (accuracy): 0.614
Starting Fold #4...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
Fold score (accuracy): 0.610
Starting Fold #5...
17/17 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
Fold score (accuracy): 0.581
Final score (accuracy): 0.592
Elapsed time = 0:03:41

~~~
The `Final score (accuracy): 0.592` is even worse than the `0.649` score obtained with L1 Regularization in **Exercise 1 - Step 2**.

### **Exercise 2 - Step 3: Print out actual and predicted y-values**

In the cell below, print out the predicted and the actual Fitness `class` for the "out-of-sample" individuals. Label your columns using the following code chunk:
~~~text
# Rename columns
new_column_mapping = {0: 'Predicted Fitness Class', 1: 'Actual: 0'}
oosDF = rename_col_by_index(oosDF, new_column_mapping)
~~~

In [None]:
# Insert your code for Exercise 2 - Step 3 here



If your code is correct you should see something similar to the following table:

![__](https://biologicslab.co/BIO1173/images/class_05/class_05_3_image02.png)

By inspecting the output above, you can see that the model's predictions of the fitness `class` are not perfect for the out-of-sample individuals. This should not come as a big surprise given a `Final score (accuracy): 0.592`.

As above, the output shows that there were some errors in the model's predictions. Again, you should be able to look at this kind of output and spot the correct and the incorrect predictions.

## **Lesson Turn-in**

When you have completed and run all of the code cells, use the **File --> Print.. --> Save to PDF** to generate a PDF of your Colab notebook. Save your PDF as `Class_05_3.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.

## **Poly-A Tail**

## **Sol-20**

![__](https://upload.wikimedia.org/wikipedia/commons/5/5e/Processor_Technology_SOL_20_Computer.jpg)


The **Sol-20** was the first fully assembled microcomputer with a built-in keyboard and television output, what would later be known as a home computer. The design was the integration of an Intel 8080-based motherboard, a VDM-1 graphics card, the 3P+S I/O card to drive a keyboard, and circuitry to connect to a cassette deck for program storage. Additional expansion was available via five S-100 bus slots inside the machine. It also included swappable ROMs that the manufacturer called 'personality modules', containing a rudimentary operating system.

The design was originally suggested by Les Solomon, the editor of Popular Electronics. He asked Bob Marsh of Processor Technology if he could design a smart terminal for use with the Altair 8800. Lee Felsenstein, who shared a garage working space with Marsh, had previously designed such a terminal but never built it. Reconsidering the design using modern electronics, they agreed the best solution was to build a complete computer with a terminal program in ROM. Felsenstein suggested the name "Sol" because they were including "the wisdom of Solomon" in the box.

The Sol appeared on the cover of the July 1976 issue of Popular Electronics as a "high-quality intelligent terminal". It was initially offered in three versions; the Sol-PC motherboard in kit form, the Sol-10 without expansion slots, and the Sol-20 with five slots.

A Sol-20 was taken to the Personal Computing Show in Atlantic City in August 1976 where it was a hit, building an order backlog that took a year to fill. Systems began shipping late that year and were dominated by the expandable Sol-20, which sold for $1,495 in its most basic fully-assembled form. The company also offered schematics for the system for free for those interested in building their own.

The Sol-20 remained in production until 1979, by which point about 12,000 machines had been sold. By that time, the "1977 trinity" —the Apple II, Commodore PET and TRS-80— had begun to take over the market, and a series of failed new product introductions drove Processor Technology into bankruptcy. Felsenstein later developed the successful Osborne 1 computer, using much the same underlying design in a portable format.

### **History**

**Tom Swift Terminal**

Lee Felsenstein was one of the sysops of Community Memory, the first public bulletin board system. Community Memory opened in 1973, running on an SDS 940 mainframe that was accessed through a Teletype Model 33, essentially a computer printer and keyboard, in a record store in Berkeley, California. The cost of running the system was untenable: the teletype normally cost $1,500 (their first example was donated from Tymshare as junk), the modem another $300, and time on the SDS was expensive – in 1968, Tymshare charged $13 per hour (equivalent to $114 in 2023). Even the reams of paper output from the terminal were too expensive to be practical and the system jammed all the time. The replacement of the Model 33 with a Hazeltine glass terminal helped, but it required constant repairs.

Since 1973, Felsenstein had been looking for ways to lower the cost. One of his earliest designs in the computer field was the Pennywhistle modem, a 300 bits per second acoustic coupler that cost a fraction of the price of commercial models. When he saw Don Lancaster's TV Typewriter on the cover of the September 1973 Radio Electronics, he began adapting its circuitry as the basis for a design he called the Tom Swift Terminal. The terminal was deliberately designed to allow it to be easily repaired. Combined with the Pennywhistle, users would have a cost-effective way to access Community Memory.

In January 1975, Felsenstein saw a post on Community Memory by Bob Marsh asking if anyone would like to share a garage. Marsh was designing a fancy wood-cased digital clock and needed space to work on it. Felsenstein had previously met Marsh at school and agreed to split the $175 rent on a garage in Berkeley. Shortly after, Community Memory shut down for the last time, having burned out the relationship with its primary funding source, Project One, as well as the energy of its founding members.

**Processor Technology**

January 1975 was also the month that the Altair 8800 appeared on the front page of Popular Electronics, sparking off intense interest among the engineers of the rapidly growing Silicon Valley. Shortly thereafter, on 5 March 1975, Gordon French and Fred Moore held the first meeting of what would become the Homebrew Computer Club. Felsenstein took Marsh to one of the meetings, where Marsh saw an opportunity in supplying add-on cards for the Altair. In April, he formed Processor Technology with his friend Gary Ingram.

The new company's first product was a 4 kB DRAM memory card for the Altair. A similar card was already available from the Altair's designers, MITS, but it was almost impossible to get working properly. Marsh began offering Felsenstein contracts to draw schematics or write manuals for the products they planned to introduce. Felsenstein was still working on the terminal as well, and in July, Marsh offered to pay him to develop the video portion. This was essentially a version of the terminal where the data would be supplied by the main memory of the Altair rather than a serial port.

The result was the VDM-1, the first graphics card. The VDM-1 could display 16 lines of 64 characters per line, and included the complete ASCII character set with upper- and lower-case characters and a number of graphics characters like arrows and basic math symbols. An Altair equipped with a VDM-1 for output and Processor Technology's 3P+S card running a keyboard for input removed the need for a terminal, yet cost less than dedicated smart terminals like the Hazeltine.

**Intelligent terminal concept**

Before the VDM-1 was launched in late 1975, the only way to program the Altair was through its front-panel switches and LED lamps, or by purchasing a serial card and using a terminal of some sort. This was typically a Model 33, which still cost $1,500 if available. Normally the teletypes were not available – Teletype Corporation typically sold them only to large commercial customers, which led to a thriving market for broken-down machines that could be repaired and sold into the microcomputer market. Ed Roberts, who had developed the Altair, eventually arranged a deal with Teletype to supply refurbished Model 33s to MITS customers who had bought an Altair.

Les Solomon, whose Popular Electronics magazine launched the Altair, felt a low-cost smart terminal would be highly desirable in the rapidly expanding microcomputer market. In December 1975, Solomon traveled to Phoenix to meet with Don Lancaster to ask about using his TV Typewriter as a video display in a terminal. Lancaster seemed interested, so Solomon took him to Albuquerque to meet Roberts. The two immediately began arguing when Lancaster criticized the design of the Altair and suggested changes to better support expansion cards, demands that Roberts flatly refused. Any hopes of a partnership disappeared.
