# What we're going to cover



*   Architecture of a neural network regression model
*   Input & output shapes of a regression model (features and labels, also called independent and dependent variables)
*   Creating custom data to view and fit
*   Steps in modelling
    *   Creating, compiling, fitting, evaluating models
*   Different evaluation methods
*   Saving and loading models




## Regression Inputs and Outputs
Predicting sale price of a house

**Inputs:** predictors, covariates, features
  *   Number of bedrooms
  *   Bathrooms
  *   Garage space

**Outputs:**
  *   Sale Price

**Numerical Encoding**
  *   e.g. 1-hot encoding
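
As a rough illustration of what one-hot encoding does, here is a minimal sketch in plain Python. The `one_hot_encode` helper and the "garage type" column are made up for this example; in practice you would more likely use a library function such as `pd.get_dummies`.

```python
def one_hot_encode(values):
    """Map each distinct category to a vector containing a single 1."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # switch on the column for this category
        encoded.append(row)
    return categories, encoded

# Hypothetical "garage type" column, invented for illustration
garage_types = ["attached", "none", "detached", "attached"]
categories, encoded = one_hot_encode(garage_types)
print(categories)
print(encoded)
```

Each text value becomes a row of numbers, which is the form a neural network can take as input.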

## Input & output shapes

The input shape is set by the number of input features.

[bedroom, bathroom, garage]

shape = [3]

## Typical architecture of a classification neural network 

The word *typical* is on purpose, because the architecture of a classification neural network can vary widely depending on the problem you're working on.

However, there are some fundamentals all deep neural networks contain:
* An input layer.
* Some hidden layers.
* An output layer.

Much of the rest is up to the data analyst creating the model.

The following are some standard values you'll often use in your classification neural networks.

| **Hyperparameter** | **Binary Classification** | **Multiclass classification** |
| --- | --- | --- |
| Input layer shape | Same as number of features (e.g. 5 for age, sex, height, weight, smoking status in heart disease prediction) | Same as binary classification |
| Hidden layer(s) | Problem specific, minimum = 1, maximum = unlimited | Same as binary classification |
| Neurons per hidden layer | Problem specific, generally 10 to 100 | Same as binary classification |
| Output layer shape | 1 (one class or the other) | 1 per class (e.g. 3 for food, person or dog photo) |
| Hidden activation | Usually [ReLU](https://www.kaggle.com/dansbecker/rectified-linear-units-relu-in-deep-learning) (rectified linear unit) | Same as binary classification |
| Output activation | [Sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) | [Softmax](https://en.wikipedia.org/wiki/Softmax_function) |
| Loss function | [Cross entropy](https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_loss_function_and_logistic_regression) ([`tf.keras.losses.BinaryCrossentropy`](https://www.tensorflow.org/api_docs/python/tf/keras/losses/BinaryCrossentropy) in TensorFlow) | Cross entropy ([`tf.keras.losses.CategoricalCrossentropy`](https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy) in TensorFlow) |
| Optimizer | [SGD](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD) (stochastic gradient descent), [Adam](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam) | Same as binary classification |

***Table 1:*** *Typical architecture of a classification network.* ***Source:*** *Adapted from page 295 of [Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow Book by Aurélien Géron](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)*

Don't worry if not much of the above makes sense right now; we'll get plenty of experience as we go through this notebook.

Let's start by importing TensorFlow as the common alias `tf`. For this notebook, make sure you're using version 2.x+.

In [None]:
import tensorflow as tf
print(tf.__version__)

# Introduction to Regression with Neural Networks in TensorFlow

There are many definitions for a regression problem but in our case, we're going to simplify it: predicting a numerical variable based on some other combination of variables.

Predicting a number

# Creating data to view and fit

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Create features
X = np.array([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])

# Create labels
y = np.array([3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0])

# Visualize it
plt.scatter(X, y)

In [None]:
y == X + 10

### Input and Output Shapes

In [None]:
# Create a demo tensor for our housing price prediction problem
house_info = tf.constant(["bedroom", "bathroom", "garage"])
house_price = tf.constant([939700])
house_info, house_price

In [None]:
input_shape = X[0].shape
output_shape = y[0].shape
input_shape, output_shape

In [None]:
# Turn our NumPy arrays into tensors
X = tf.cast(tf.constant(X), dtype=tf.float32)
y = tf.cast(tf.constant(y), dtype=tf.float32)
X, y

## OLD
# Fit the model
# model.fit(X, y, epochs=5) # this will break with TensorFlow 2.7.0+
 
## New
# Fit the model
# model.fit(tf.expand_dims(X, axis=-1), y, epochs=5) # <- updated line

In [None]:
input_shape = X[0].shape
output_shape = y[0].shape
input_shape, output_shape

### Steps in modelling with TensorFlow

1. **Creating a model** - define the input and output layers, as well as the hidden layers of a deep learning model.
2. **Compiling a model** - define the loss function (in other words, the function which tells our model how wrong it is), the optimizer (tells our model how to improve the patterns it's learning) and evaluation metrics (what we can use to interpret the performance of our model).
3. **Fitting a model** - letting the model try to find patterns between X & y (features and labels)
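
To make the three steps a little more concrete, here is a minimal sketch in plain Python of what they amount to for a single-neuron model, `prediction = w * x + b`. This is not TensorFlow's implementation; the zero starting values, the learning rate of 0.01 and the full-batch update are assumptions chosen for illustration.

```python
# The same toy data as in this notebook: y = X + 10
X = [-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0]
y = [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0]

def mean_absolute_error(w, b):
    """Average absolute difference between predictions and labels."""
    return sum(abs((w * x + b) - t) for x, t in zip(X, y)) / len(X)

# 1. "Create" the model: one weight and one bias (assumed to start at zero)
w, b = 0.0, 0.0

# 2. "Compile": pick a loss (MAE above) and an optimizer setting (assumed value)
learning_rate = 0.01
initial_loss = mean_absolute_error(w, b)

# 3. "Fit": repeatedly nudge w and b in the direction that lowers the loss
for epoch in range(100):
    grad_w = grad_b = 0.0
    for x, t in zip(X, y):
        error = (w * x + b) - t
        sign = (error > 0) - (error < 0)  # derivative of |error| is its sign
        grad_w += sign * x / len(X)
        grad_b += sign / len(X)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_absolute_error(w, b)
print(initial_loss, final_loss, w, b)
```

The loss goes down over the 100 epochs but the fit is still far from `y = X + 10`, which is why the later sections look at ways to improve a model.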

In [None]:
# Optionally, the first layer can receive an 'input_shape' argument:
# model = tf.keras.Sequential()
# model.add(tf.keras.layers.Dense(8, input_shape=(16,)))

In [None]:
# Set random seed
tf.random.set_seed(42)

# 1. Create a model using the Sequential API
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(1)]
    )
# 2. Compile the model
model.compile(
    loss=tf.keras.losses.mae, # mae is short for mean absolute error
    optimizer=tf.keras.optimizers.SGD(), # sgd is short for stochastic gradient descent
    metrics=["mae"]
              )
# 3. Fit the model
model.fit(tf.expand_dims(X, axis=-1), y, epochs=5)
print('trained')


In [None]:
# Check out values for X and y
X, y

In [None]:
# Try and make a prediction

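# Pass the input with shape (1, 1): one sample, one feature (a bare Python list is rejected by some newer Keras versions)
y_pred = model.predict(tf.constant([[17.0]]))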
y_pred

## Improving our model

We can improve our model by altering the steps we took to create the model

1. **Creating a model** - we might add more layers, increase the number of hidden units (also called neurons) within each of the hidden layers, change the activation function of each layer.
2.  **Compiling a model** - we might change the optimization function or perhaps the **learning rate** of the optimization function.
3. **Fitting a model** - we might fit a model for more **epochs** (leave it training for longer) or on more data (give the model more examples to learn from)

In [None]:
# Let's rebuild our model

# 1. Create a model using the Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1)
])
# 2. Compile the model
model.compile(loss=tf.keras.losses.mae,
              optimizer=tf.keras.optimizers.SGD(),
              metrics=["mae"]
              )
              
# 3. Fit the model
model.fit(tf.expand_dims(X, axis=-1), y, epochs=100)
print('trained')

In [None]:
# Let's see if our model's prediction has improved...
model.predict(tf.constant([[17.0]]))  # shape (1, 1): one sample, one feature

In [None]:
# Write a model by yourself by changing 1 thing from the previous model

# 1. Create the model
model = tf.keras.Sequential([
                             tf.keras.layers.Dense(50, activation=None),
                             tf.keras.layers.Dense(1)
                             ])

# 2. Compile the model
model.compile(loss="mae",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),  # `lr` is deprecated in favour of `learning_rate`
              metrics=["mae"]
              )
              
# 3. Fit the model
model.fit(tf.expand_dims(X, axis=-1), y, epochs=66)


In [None]:
model.predict(tf.constant([[17.0]]))  # shape (1, 1): one sample, one feature

## Evaluating a model

In practice, a typical workflow you'll go through when building a neural network is:

```
Build a model > fit it > evaluate it > tweak a model > repeat...
```

When it comes to evaluation... there are 3 words you should memorize:
> "Visualize, visualize, visualize"

## Visualize:
* **The data** - What data are we working with and what does it look like?
* **The model** - What does our model look like?
* **Training the model** - How does a model perform while it learns?
* **Predictions** - How do the predictions of a model line up against the ground truth (the original labels)?

In [None]:
# Make a bigger dataset

X = tf.range(-100, 100, 4)
X

In [None]:
# Make labels for the dataset
y = X + 10
y

In [None]:
# Visualize the data
import matplotlib.pyplot as plt

plt.scatter(X, y)

### The 3 Sets

* Training set - the model learns from this data, which is typically 70-80% of the total data you have available.
* Validation set - the model gets tuned on this data, 10-15%
* Test set - the model gets evaluated on this data to test what it has learned, 10-15%
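
A three-way split can be sketched with simple slicing. The `three_way_split` helper below is hypothetical, and the 80/10/10 proportions are just one choice within the ranges listed above. It slices in order without shuffling; real datasets are usually shuffled first.

```python
def three_way_split(items, train_frac=0.8, val_frac=0.1):
    """Slice a sequence into training, validation and test portions."""
    n = len(items)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return items[:train_end], items[train_end:val_end], items[val_end:]

# The same 50 values that tf.range(-100, 100, 4) produces
samples = list(range(-100, 100, 4))
train, val, test = three_way_split(samples)
print(len(train), len(val), len(test))
```

The cells that follow use only two of the three sets (training and test), which is common for small toy problems.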

In [None]:
X_train = X[:40]
y_train = y[:40]
X_test = X[40:]
y_test = y[40:]

X_train, y_train, X_test, y_test

In [None]:
### Visualizing the data

plt.figure(figsize=(10, 7))
# plot training in blue
plt.scatter(X_train, y_train, c="b", label="Training data")
# plot testing in red
plt.scatter(X_test, y_test, c="r", label="Testing data")
# show a legend
plt.legend()


In [None]:
# Let's build a network for our data
# 1. Create a model
tf.random.set_seed(42)
model = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(10, input_shape=[1], name="input_layer"),
     tf.keras.layers.Dense(1, name="output_layer"), 
    ], name="model_1"
)

# 2. Compile the model
model.compile(loss="mae", optimizer="sgd", metrics=["mae"])

# 3. Fit the model
model.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100)


In [None]:
### Visualizing the model
# model.build()

model.summary()

* Total params - total number of parameters in the model.
* Trainable parameters - these are the parameters (patterns) the model can update as it trains.
* Non-trainable params - these parameters aren't updated during the training (this is typical when you bring in already learned patterns or parameters from other models during **transfer learning**)
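
The parameter counts in the summary can be checked by hand. A dense layer has one weight per input-unit pair plus one bias per unit. The sketch below (the `dense_params` helper is made up for illustration) applies that to a network with one input feature, a 10-unit dense layer and a 1-unit output layer, the same layout as the model built above.

```python
def dense_params(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units

# (inputs, units) for each layer: 1 feature -> 10 units -> 1 unit
layer_sizes = [(1, 10), (10, 1)]
per_layer = [dense_params(n_inputs, units) for n_inputs, units in layer_sizes]
total = sum(per_layer)
print(per_layer, total)
```

Changing the number of hidden units in `layer_sizes` shows how quickly the total grows.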

📖 **Resource:** For a more in-depth overview of the trainable parameters within a layer, check out MIT's introduction to deep learning video.

🔧 **Exercise:** Try playing around with the number of hidden units in the dense layer and see how that affects the number of parameters (total and trainable) by calling `model.summary()`.

In [None]:
# Let's fit our model to the training data
# model.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100)

In [None]:
# Get a summary of our model
model.summary()

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model=model, show_shapes=True)

### Visualizing model predictions

To visualize predictions, it's a good idea to plot them against the ground truth labels.

Often you'll see this in the form of y_test or y_true vs y_pred (ground truth vs your model's predictions).

In [None]:
# Make some predictions
y_pred = model.predict(X_test)
y_pred

In [None]:
y_test

# Let's create a plotting function

In [None]:
def plot_predictions(train_data=X_train,
                     train_labels=y_train,
                     test_data=X_test,
                     test_labels=y_test,
                     predictions=y_pred):
  """
  Plots training data, test data and compares predictions to ground truth labels.
  """
  plt.figure(figsize=(10, 7))
  # plot training in blue
  plt.scatter(train_data, train_labels, c="b", label="Training data")
  # plot testing in yellow
  plt.scatter(test_data, test_labels, c="y", label="Testing data")
  # plot predictions in red
  plt.scatter(test_data, predictions, c="r", label="Predictions")
  # show a legend
  plt.legend()

plot_predictions()


### Evaluating our model's predictions with regression evaluation metrics

Depending on the problem you're working on, there will be different evaluation metrics to evaluate your model's performance.

Since we're working on a regression problem, two of the main metrics are:
* MAE - mean absolute error, "on average, how wrong is each of my model's predictions"
* MSE - mean squared error, "square each error, then average them"
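
Both metrics are short enough to compute by hand. The sketch below uses plain Python and made-up labels and predictions so the arithmetic is visible; the function and variable names are chosen for this example only.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared errors (large errors count for more)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up values: the errors are 1, -2 and 4
example_true = [70.0, 74.0, 78.0]
example_pred = [69.0, 76.0, 74.0]

mae_value = mean_absolute_error(example_true, example_pred)  # (1 + 2 + 4) / 3
mse_value = mean_squared_error(example_true, example_pred)   # (1 + 4 + 16) / 3
print(mae_value, mse_value)
```

Because the errors are squared before averaging, the single error of 4 dominates the MSE far more than it does the MAE.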

In [None]:
# Evaluate the model on the test set
model.evaluate(X_test, y_test)

In [None]:
# Calculate the mean absolute error
print(tf.shape(y_pred), tf.shape(y_test))
mae = tf.keras.metrics.MeanAbsoluteError()
mae.reset_state()
mae.update_state(y_test, tf.squeeze(y_pred))
mae.result().numpy()


In [None]:
# Calculate the mean square error
mse = tf.keras.metrics.MeanSquaredError()
mse.reset_state()
mse.update_state(y_test, tf.squeeze(y_pred))
mse.result().numpy()

In [None]:
# Make some functions to reuse mae and mse (not necessary)
def mae(y_true, y_pred):
  mae = tf.keras.metrics.MeanAbsoluteError()
  mae.reset_state()
  mae.update_state(y_true, tf.squeeze(y_pred))
  return mae.result().numpy()

def mse(y_true, y_pred):
  mse = tf.keras.metrics.MeanSquaredError()
  mse.reset_state()
  mse.update_state(y_true, tf.squeeze(y_pred))
  return mse.result().numpy()

mae(y_test, y_pred), mse(y_test, y_pred)

### Running experiments to improve our model

```
Build > Fit > Eval. > Tweak > repeat
```
"Experiment, Experiment, Experiment"

1. Get more data - get more examples for your model to train on (more opportunities to learn patterns or relationships between features and labels).

2. Make your model larger (using a more complex model) - this might come in the form of more layers or more hidden units in each layer.

3. Train for longer - give your model more of a chance to find patterns in the data.


3 Modeling Experiments

1. `model_1` - same as original model, 1 layer, trained for 100 epochs
2. `model_2` - 2 layers, trained for 100 epochs
3. `model_3` - 2 layers, trained for 500 epochs


In [None]:
# 1. model_1
tf.random.set_seed(42)
model_1 = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(1),
    ], name="model_1"
)

# 2. Compile the model
model_1.compile(loss="mae", optimizer="sgd", metrics=["mae"])

# 3. Fit the model
model_1.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100)


In [None]:
# Make and plot predictions for model_1
y_preds_1 = model_1.predict(X_test)
plot_predictions(predictions=y_preds_1)
mae_1 = mae(y_test, y_preds_1)
mse_1 = mse(y_test, y_preds_1)
mae_1, mse_1

In [None]:
# 1. Build model_2
tf.random.set_seed(42)
model_2 = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(10),
     tf.keras.layers.Dense(1),
    ], name="model_2"
)

# 2. Compile the model
model_2.compile(loss="mae", optimizer="sgd", metrics=["mae", "mse"])

# 3. Fit the model
model_2.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100)

In [None]:
y_preds_2 = model_2.predict(X_test)
plot_predictions(predictions=y_preds_2)
mae_2 = mae(y_test, y_preds_2)
mse_2 = mse(y_test, y_preds_2)
mae_2, mse_2

In [None]:
# 1. Build model_3
tf.random.set_seed(42)
model_3 = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(10),
     tf.keras.layers.Dense(1),
    ], name="model_3"
)

# 2. Compile the model
model_3.compile(loss="mae", optimizer="sgd", metrics=["mae", "mse"])

# 3. Fit the model
model_3.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=500)

In [None]:
y_preds_3 = model_3.predict(X_test)
plot_predictions(predictions=y_preds_3)
mae_3 = mae(y_test, y_preds_3)
mse_3 = mse(y_test, y_preds_3)
mae_3, mse_3

### Comparing Results

🔑 **Note:** You want to start with small experiments (small models) and make sure they work and then increase their scale when necessary.


In [None]:
# Compare model results using pandas DataFrame
import pandas as pd

model_results = [["model_1", mae_1, mse_1],["model_2", mae_2, mse_2],["model_3", mae_3, mse_3]]
all_results = pd.DataFrame(model_results, columns=["model", "mae", "mse"])
all_results

In [None]:
for m in (model_1, model_2, model_3):
    m.summary()  # summary() prints the table itself and returns None, so don't wrap it in print()

**Note:** One of the main goals should be to minimize the time between your experiments. The more experiments you do, the more things you'll figure out which don't work and in turn, get closer to figuring out what does work.

Machine learning practitioner's motto: "*Experiment, Experiment, Experiment*"

One really good habit in ML modelling is to track the results of your experiments.

When doing so, it can be tedious if you're running lots of experiments.

Luckily, there are tools to help.

**Resource:** As you build more models, you'll want to look into using:

* [**TensorBoard**](https://www.tensorflow.org/tensorboard) - a component of the TensorFlow library to help track modelling experiments
* [**Weights & Biases**](https://wandb.ai/site) - a tool for tracking all kinds of ML experiments (plugs straight into TensorBoard)


### Saving our Models

Saving our models allows us to use them outside of Google Colab (or wherever they were trained), such as in a web or mobile app.

In [None]:
# Save using hdf5 format
model_1.save('model_1.h5')
model_2.save('model_2.h5')
model_3.save('model_3.h5')

# Load
loaded_model_1 = tf.keras.models.load_model('model_1.h5')
loaded_model_2 = tf.keras.models.load_model('model_2.h5')
loaded_model_3 = tf.keras.models.load_model('model_3.h5')

# Compare model predictions with saved predictions
model_2_preds = model_2.predict(X_test)
loaded_2_preds = loaded_model_2.predict(X_test)
model_2_preds == loaded_2_preds

## Download a model (or any other file) from Google Colab

If you want to download your files from Google Colab:

1. You can go to the "files" tab and right click on the file you're after and click "download"

2. Use code (see cell below)

3. Save it to Google Drive by connecting and copying it

In [None]:
# Download via code
from google.colab import files
# files.download("/content/model_3.h5")

# files.download("/content/model_2.h5")

# files.download("/content/model_1.h5")

In [None]:
# First, mount your Google Drive
# Save a file from Colab to Drive 
# !cp /content/model_1.h5 /content/drive/MyDrive/tensorflow_course

# A Larger Example

In [None]:
# Import required libs
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt


In [None]:
# Read in the insurance dataset
insurance = pd.read_csv('https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv')
insurance[:10]

Dependent variable: charges

Independent variables: age, sex, bmi, children, smoker, region

In [None]:
# pd.get_dummies automatically one-hot encodes appropriate columns
# dtype=float keeps the new columns numeric (newer pandas versions default to bool)
insurance_one_hot = pd.get_dummies(insurance, dtype=float)
insurance_one_hot.head()

In [None]:
# Create X and y values (features and labels)
X = insurance_one_hot.drop("charges", axis=1)
y = insurance_one_hot["charges"]
X.head(), y.head()


In [None]:
# Create training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
len(X), len(X_train), len(X_test)

In [None]:
# Build a neural network (sort of like model_2 above)
tf.random.set_seed(42)
# Create a model
insurance_model = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(10),
     tf.keras.layers.Dense(1)
    ]
)
# Compile
insurance_model.compile(loss="mae", optimizer="sgd", metrics=["mae"])

# Fit the model
insurance_model.fit(X_train, y_train, epochs=100)

In [None]:
# Evaluate results of insurance model on test data
insurance_model.evaluate(X_test, y_test)

In [None]:
y_train.median(), y_train.mean()

## Right now it looks like our model isn't performing too well... let's try to improve it

In [None]:
tf.random.set_seed(42)
insurance_model_adam = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(100),
     tf.keras.layers.Dense(10),
     tf.keras.layers.Dense(1)
    ]
)
# insurance_model_nadam = tf.keras.Sequential(
#     [
#      tf.keras.layers.Dense(100),
#      tf.keras.layers.Dense(10),
#      tf.keras.layers.Dense(1)
#     ]
# )
# Compile
insurance_model_adam.compile(loss="mae", optimizer="adam", metrics=["mae"])
# insurance_model_nadam.compile(loss="mae", optimizer="nadam", metrics=["mae", "mse"])
# Fit the model
history_adam = insurance_model_adam.fit(X_train, y_train, epochs=200, verbose=0)
# history_nadam = insurance_model_nadam.fit(X_train, y_train, epochs=200, verbose=0)
# Evaluate results of insurance model on test data
# insurance_model_adam.evaluate(X_test, y_test)

# What are Optimizers?
Optimizers define how neural networks learn.

They find the values of parameters such that a loss function is at its lowest.

**Types**

**Gradient Descent (GD):** Takes small steps iteratively until the loss stops decreasing.  The weights are only updated once after seeing the entire dataset.

**Stochastic Gradient Descent (SGD):** Updates weights after seeing each datapoint instead of the whole dataset.  This makes very noisy jumps that can move away from the optimal value.

**Mini-Batch Gradient Descent (MBGD):** Updates weights after a small batch of datapoints instead of after each individual one.

**SGD + Momentum:** Keeps a running average of past gradients, so datapoints that don't follow the momentum of other datapoints have less effect.  Can learn faster but can also overshoot.

**SGD + Momentum + Acceleration:** When unusual datapoints occur, the model decelerates and readjusts so it doesn't overshoot.  One remaining limitation is that the same learning rate is applied to every parameter.

**Adaptive Gradient Algorithm (AdaGrad):** Gradient-based optimization. The learning rate is adapted component-wise to the parameters by incorporating knowledge of past observations. As iterations go on, the learning rate is decreased.

**AdaDelta:** A more robust extension of AdaGrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, AdaDelta continues learning even when many updates have been done.

**Adam:** AdaDelta + expected value of past gradients (momentum).  Slow initially, but picks up speed over time.

**NAdam:** Adam + acceleration.
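
To give a feel for how two of these update rules differ, here is a minimal sketch in plain Python comparing plain gradient descent with the classical momentum update. The one-parameter loss `(w - 3) ** 2`, the learning rate and the momentum coefficient `beta` are all assumptions made for illustration; the adaptive methods (AdaGrad, AdaDelta, Adam, NAdam) are not covered here.

```python
def gradient(w):
    """Gradient of the toy loss (w - 3) ** 2, which is lowest at w = 3."""
    return 2.0 * (w - 3.0)

def plain_gradient_descent(w, learning_rate, steps):
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # step straight down the slope
    return w

def momentum_gradient_descent(w, learning_rate, beta, steps):
    velocity = 0.0
    for _ in range(steps):
        # Blend the previous step direction with the new gradient
        velocity = beta * velocity - learning_rate * gradient(w)
        w += velocity
    return w

w_plain = plain_gradient_descent(0.0, learning_rate=0.1, steps=500)
w_momentum = momentum_gradient_descent(0.0, learning_rate=0.1, beta=0.9, steps=500)
print(w_plain, w_momentum)
```

Both versions end up near `w = 3`. Printing `w` inside the momentum loop shows it swinging past 3 and back a few times, which is the overshooting mentioned above.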


In [None]:
# plot history (also known as a loss curve or training curve)
pd.DataFrame(history_adam.history).plot()
# pd.DataFrame(history_nadam.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs")


# Early Stopping
**Question:** How long should you train for?

It depends on the problem you're working on.  TensorFlow has created a solution called the EarlyStopping callback.

It is a TensorFlow component you can add to your model to stop training once it stops improving a certain metric.
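
The idea behind the `patience` setting can be sketched in a few lines of plain Python. This is a simplified illustration, not the callback's actual implementation (which has further options such as `min_delta` and `restore_best_weights`); the `epochs_until_stop` helper and the loss values are made up.

```python
def epochs_until_stop(losses, patience=3):
    """Return how many epochs run before training would be stopped."""
    best = float("inf")
    waited = 0  # epochs since the monitored value last improved
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(losses)  # never ran out of patience

# Made-up loss per epoch: improves for 3 epochs, then stalls
losses = [10.0, 8.0, 7.0, 7.2, 7.1, 7.3, 6.0]
stopped_at = epochs_until_stop(losses, patience=3)
print(stopped_at)
```

With a patience of 3, training stops before ever reaching the later improvement to 6.0, which is the trade-off to keep in mind when choosing the value.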

In [None]:
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
tf.random.set_seed(42)
insurance_model_early_stopping = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(100),
     tf.keras.layers.Dense(10),
     tf.keras.layers.Dense(1)
    ]
)
# Compile
insurance_model_early_stopping.compile(loss="mae", optimizer="adam", metrics=["mae"])
# Fit the model
history_early_stopping = insurance_model_early_stopping.fit(X_train, y_train, epochs=200, callbacks=[callback])

In [None]:
# plot history (also known as a loss curve or training curve)
pd.DataFrame(history_early_stopping.history).plot()
# pd.DataFrame(history_nadam.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs")


# Preprocessing data (normalization and standardization)

The goal is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the range of values. 

Many ML algos perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed.

**Example Algorithm Families:**
* Linear and logistic regression
* Nearest neighbors
* Neural networks
* Support vector machines with radial basis function kernels
* Principal components analysis
* Linear discriminant analysis

In terms of scaling values, neural networks tend to prefer normalization.

If you're not sure on which to use, you could try both and see which performs better.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

# Read in insurance df
insurance = pd.read_csv('https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv')
insurance

In [None]:
X["age"].plot(kind="hist")

In [None]:
X["bmi"].plot(kind="hist")

In [None]:
X["children"].value_counts()

# Scaling
<table >
  <tr>
    <th>Scaling Type</th>
    <th>What it does</th>
    <th>Scikit-Learn Function</th>
    <th>When to use</th>
  </tr>
  <tr>
    <td class="data">Scale (normalization)</td>
    <td class="data">Converts all values to between 0 and 1 whilst preserving the original distribution.</td>
    <td class="data">MinMaxScaler</td>
    <td class="data">Use as default scaler with neural networks.</td>
  </tr>
  <tr>
    <td>Standardization</td>
    <td>Removes the mean and divides each value by the standard deviation.</td>
    <td>StandardScaler</td>
    <td>Transform a feature to have close to normal distribution (caution: this reduces the effect of outliers).</td>
  </tr>
</table>
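
The two scaling types in the table can be written out by hand to see what they do to a column. This is a minimal plain-Python sketch with made-up ages, not Scikit-Learn's implementation; the standardization here divides by the population standard deviation, which is an assumption of this example.

```python
def min_max_scale(values):
    """Normalization: squeeze values into the range 0 to 1."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]

def standardize(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    return [(v - mean) / std for v in values]

ages = [18, 30, 42, 64]  # made-up values for illustration
ages_normalized = min_max_scale(ages)
ages_standardized = standardize(ages)
print(ages_normalized)
print(ages_standardized)
```

After normalization the smallest age maps to 0 and the largest to 1; after standardization the column has a mean of 0 and a standard deviation of 1.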


In [None]:
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
# Create a column transformer
ct = make_column_transformer(
    (MinMaxScaler(), ["age", "bmi", "children"]), # turn all values in these columns between 0 and 1
    (OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"]) # convert these text columns to one hot encoded
)

# Create X & y
X = insurance.drop('charges', axis=1)
y = insurance['charges']

# Build our train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the column transformer to our training data
ct.fit(X_train)

# Transform training and test data with normalization (MinMaxScaler) and OneHotEncoder
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test)



In [None]:
# View data
X_train.iloc[0], X_train_normal[0]  # iloc picks the first row by position, matching X_train_normal[0]

In [None]:
# Check shape
X_train.shape, X_train_normal.shape

Data has been normalized and one hot encoded.
Now let's build a neural network model on it.

In [None]:
# Build a neural network model to fit on our normalized data
tf.random.set_seed(42)
# Instantiate the model
normalized_model = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(100),
     tf.keras.layers.Dense(10),
     tf.keras.layers.Dense(1),
    ]
)

# Compile the model
normalized_model.compile(loss="mae", optimizer="adam", metrics=["mae"])

# Fit the model
normalized_history = normalized_model.fit(X_train_normal, y_train, epochs=100, verbose=0)

In [None]:
# Compare it to unprocessed data
# Create X and y values (features and labels)
X_unprocessed = insurance_one_hot.drop("charges", axis=1)
y_unprocessed = insurance_one_hot["charges"]
# Split unprocessed datasets
X_train_unprocessed, X_test_unprocessed, y_train_unprocessed, y_test_unprocessed = train_test_split(X_unprocessed, y_unprocessed, test_size=0.2, random_state=42)
tf.random.set_seed(42)
# Instantiate Model
unprocessed_model = tf.keras.Sequential(
    [
     tf.keras.layers.Dense(100),
     tf.keras.layers.Dense(10),
     tf.keras.layers.Dense(1)
    ]
)
# Compile
unprocessed_model.compile(loss="mae", optimizer="adam", metrics=["mae"])
# Fit the model
unprocessed_history = unprocessed_model.fit(X_train_unprocessed, y_train_unprocessed, epochs=100, verbose=0)

In [None]:
# Evaluate the model
normalized_results = normalized_model.evaluate(X_test_normal, y_test)
unnormalized_results = unprocessed_model.evaluate(X_test_unprocessed, y_test_unprocessed)
print(normalized_results, '\n\n', unnormalized_results)