## Preprocessing Data (Normalization and Standardization)
Many machine learning algorithms perform better or converge faster when features are on a similar scale and/or close to normally distributed. For neural networks, normalization is a common default; if you are not sure which scaling method to use, try both and compare the results. Algorithms that benefit from scaled features include:
- linear and logistic regression
- nearest neighbors
- neural networks
- support vector machines with radial basis function (RBF) kernels
- principal components analysis
- linear discriminant analysis
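
To see why scale matters for distance-based methods such as nearest neighbors, here is a minimal sketch. The samples, feature names, and feature ranges are made up for illustration and do not come from the insurance data.

```python
import numpy as np

# Hypothetical samples: [age in years, income in dollars]
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 50_500.0])  # much older than a, similar income
c = np.array([26.0, 52_000.0])  # similar age to a, slightly higher income

# Assumed feature ranges used for min-max scaling (illustrative only)
lo = np.array([18.0, 20_000.0])
hi = np.array([64.0, 120_000.0])

def min_max(x):
    """Scale each feature to [0, 1] using the assumed ranges."""
    return (x - lo) / (hi - lo)

def dist(u, v):
    """Euclidean distance between two samples."""
    return float(np.linalg.norm(u - v))

raw_ab, raw_ac = dist(a, b), dist(a, c)
scaled_ab = dist(min_max(a), min_max(b))
scaled_ac = dist(min_max(a), min_max(c))

print(raw_ab < raw_ac)        # unscaled: income dominates, so b looks closer to a
print(scaled_ac < scaled_ab)  # scaled: age counts too, so c is closer to a
```

Without scaling, the income column dominates the distance simply because its numbers are larger, which changes which neighbor counts as "nearest".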

## Feature Scaling

| Scaling Type | What it does | Scikit-Learn Function | When to use |
|:------------:|:------------:|:---------------------:|:-----------:|
| Normalization (min-max scaling) | Rescales all values to between 0 and 1 whilst preserving the shape of the original distribution | MinMaxScaler | Use as the default scaler with neural networks |
| Standardization | Subtracts the mean and divides each value by the standard deviation | StandardScaler | Use to give a feature zero mean and unit variance; it does not make a skewed feature normal (caution: outliers still influence the mean and standard deviation) |
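
As a rough sketch of what the two rows in the table compute, the formulas can be written out by hand with NumPy. The input values are made up; the formulas mirror the default behaviour of `MinMaxScaler` and `StandardScaler`, which uses the population standard deviation.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])  # made-up feature values

# Normalization (min-max scaling): (x - min) / (max - min)
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization: (x - mean) / std, with the population std (ddof=0)
x_std = (x - x.mean()) / x.std()

print(x_norm.min(), x_norm.max())  # smallest value maps to 0, largest to 1
print(x_std.mean(), x_std.std())   # approximately 0 and 1
```

In both cases the relative spacing of the values is unchanged; only the location and scale of the feature move.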

In [2]:
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
import matplotlib.pyplot as plt
plt.style.use('dark_background')
insurance = pd.read_csv('insurance.csv')

In [3]:
# Create a column transformer
ct = make_column_transformer(
    (MinMaxScaler(), ['age', 'bmi', 'children']), # normalize values between 0 and 1
    (OneHotEncoder(), ['sex', 'smoker', 'region']) # one-hot encode categorical variables
)

# Create X and y values
X = insurance.drop('charges', axis=1)
y = insurance['charges']

# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the column transformer to our training data
ct.fit(X_train)

# Transform training and test data with normalization and one hot encoding
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test)
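
The cell above fits the column transformer on the training split only and then applies it to both splits, so the scaling parameters never see the test data. A small sketch with made-up numbers shows one consequence: test values outside the training range can fall outside 0 to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[10.0], [20.0], [30.0]])  # made-up training feature
test = np.array([[15.0], [40.0]])           # 40 lies outside the training range

scaler = MinMaxScaler()
scaler.fit(train)                     # learns min=10 and max=30 from training data only
test_scaled = scaler.transform(test)  # reuses those training statistics

print(test_scaled.ravel())  # 0.25 and 1.5
```

This is expected behaviour rather than a bug: the model is evaluated on data scaled exactly the way it would be at prediction time.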


In [4]:
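# View the transformed data
# Note: .loc[0] selects the row labelled 0, while X_train_normal[0] is the first row
# by position, so after the shuffled split these are not necessarily the same sample
X_train.loc[0], X_train_normal[0]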

(age                19
 sex            female
 bmi              27.9
 children            0
 smoker            yes
 region      southwest
 Name: 0, dtype: object,
 array([0.60869565, 0.10734463, 0.4       , 1.        , 0.        ,
        1.        , 0.        , 0.        , 1.        , 0.        ,
        0.        ]))
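
When reading a side-by-side view like this, keep in mind that `train_test_split` shuffles rows but keeps their original index labels, so a label-based lookup and a positional lookup can refer to different rows. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical frame with default index labels 0, 1, 2
df = pd.DataFrame({"age": [19, 46, 33]})

# Reorder the rows the way a shuffle might; the labels travel with the rows
shuffled = df.loc[[1, 2, 0]]

by_label = shuffled.loc[0, "age"]            # row whose label is 0
by_position = shuffled.iloc[0]["age"]        # first row in the frame
first_array_row = shuffled.to_numpy()[0, 0]  # NumPy arrays carry no labels

print(by_label, by_position, first_array_row)  # 19 46 46
```

To compare a raw row with its transformed counterpart, `X_train.iloc[0]` is the lookup that matches `X_train_normal[0]`.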

In [5]:
# Check shapes
X_train.shape, X_train_normal.shape

((1070, 6), (1070, 11))

In [6]:
# Build a neural network on normalized data
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, input_shape=[11]),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.mae,
    metrics=['mae']
)

model.fit(
    X_train_normal,
    y_train,
    epochs=100,
    verbose=0
)

<keras.src.callbacks.History at 0x21a0e7d1a80>

In [7]:
# Evaluate on test data
# Since the model was trained on normalized data, it must be evaluated on normalized data
loss, mae = model.evaluate(X_test_normal, y_test, verbose=0)
loss, mae

(3442.244384765625, 3442.244384765625)
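
The two numbers are identical because the model was compiled with MAE as both the loss and the metric. MAE is the mean absolute difference between targets and predictions, so the result reads as "off by about 3442 on average, in the units of `charges`". A minimal sketch of the calculation with made-up values:

```python
import numpy as np

y_true = np.array([1000.0, 2000.0, 3000.0])  # made-up targets
y_pred = np.array([1500.0, 1800.0, 3300.0])  # made-up predictions

# Mean absolute error: average of |target - prediction|
abs_errors = np.abs(y_true - y_pred)  # 500, 200, 300
mae = float(abs_errors.mean())

print(mae)
```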