1. [Data Pre-Processing](#prepro)
1. [NN Model](#model)

<a name="prepro"></a>
# Data Pre-Processing

In [1]:
import pandas as pd 

df = pd.read_csv('insurance.csv')
df.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552


In [2]:
X = df.iloc[:, 0:6]
y = df.iloc[:, -1]

X.shape, y.shape

((1338, 6), (1338,))

In [3]:
X.describe(include='all')

Unnamed: 0,age,sex,bmi,children,smoker,region
count,1338.0,1338,1338.0,1338.0,1338,1338
unique,,2,,,2,4
top,,male,,,no,southeast
freq,,676,,,1064,364
mean,39.207025,,30.663397,1.094918,,
std,14.04996,,6.098187,1.205493,,
min,18.0,,15.96,0.0,,
25%,27.0,,26.29625,0.0,,
50%,39.0,,30.4,1.0,,
75%,51.0,,34.69375,2.0,,


Since __NNs cannot work with string data directly__, we need to convert our categorical features into numerical.

In [4]:
X = pd.get_dummies(X)
X.head()

Unnamed: 0,age,bmi,children,sex_female,sex_male,smoker_no,smoker_yes,region_northeast,region_northwest,region_southeast,region_southwest
0,19,27.9,0,1,0,0,1,0,0,0,1
1,18,33.77,1,0,1,1,0,0,0,1,0
2,28,33.0,3,0,1,1,0,0,0,1,0
3,33,22.705,0,0,1,1,0,0,1,0,0
4,32,28.88,0,0,1,1,0,0,1,0,0


In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [6]:
X_train.shape, X_test.shape

((896, 11), (442, 11))

In [7]:
y_train.shape, y_test.shape

((896,), (442,))

The usual preprocessing step for numerical variables, among others, is __standardization__ that rescales features to zero mean and unit variance. 

__Normalization__ is another way of preprocessing numerical data: it scales the numerical features to a fixed range - usually between 0 and 1.

In [8]:
from sklearn.preprocessing import Normalizer
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([('normalize', Normalizer(), ['age', 'bmi', 'children'])], remainder='passthrough')

X_train_norm = ct.fit_transform(X_train)
X_test_norm = ct.transform(X_test)

The name of the column transformer is “only numeric”, it applies a `Normalizer()` to the `age`, `bmi`, and `children` columns, and for the rest of the columns it just passes through. `ColumnTransformer()` returns NumPy arrays.

In [9]:
X_train_norm = pd.DataFrame(X_train_norm, columns = X_train.columns)
X_train_norm.head()

Unnamed: 0,age,bmi,children,sex_female,sex_male,smoker_no,smoker_yes,region_northeast,region_northwest,region_southeast,region_southwest
0,0.863808,0.503821,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
1,0.740865,0.670578,0.037993,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
2,0.827684,0.560894,0.018393,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
3,0.500102,0.865966,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
4,0.83269,0.553739,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0


In [10]:
from sklearn.preprocessing import StandardScaler

my_ct = ColumnTransformer([('scale', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
X_train_scaled = ct.fit_transform(X_train)
X_test_scaled = ct.transform(X_test)

X_train_scaled = pd.DataFrame(X_train_scaled, columns = X_train.columns)
X_train_scaled.head()

Unnamed: 0,age,bmi,children,sex_female,sex_male,smoker_no,smoker_yes,region_northeast,region_northwest,region_southeast,region_southwest
0,0.863808,0.503821,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
1,0.740865,0.670578,0.037993,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
2,0.827684,0.560894,0.018393,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
3,0.500102,0.865966,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
4,0.83269,0.553739,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0


In [11]:
X_train_scaled.shape

(896, 11)

In [12]:
X_train_scaled.describe()

Unnamed: 0,age,bmi,children,sex_female,sex_male,smoker_no,smoker_yes,region_northeast,region_northwest,region_southeast,region_southwest
count,896.0,896.0,896.0,896.0,896.0,896.0,896.0,896.0,896.0,896.0,896.0
mean,0.754101,0.625607,0.02239,0.487723,0.512277,0.790179,0.209821,0.256696,0.252232,0.25558,0.235491
std,0.133154,0.145262,0.025673,0.500128,0.500128,0.407408,0.407408,0.437054,0.434536,0.436431,0.424542
min,0.320877,0.306215,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.661468,0.510737,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
50%,0.795743,0.602983,0.017518,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
75%,0.859321,0.748948,0.038561,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0
max,0.951963,0.947121,0.133655,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


<a name="model"></a>
# 2. NN Model
1. [Input Layer](#input)
1. [Hidden Layers](#hidden)
1. [Output Layer](#output)
1. [Optimizers](#opt)
1. [Training & Evaluating](#train)

In [13]:
from tensorflow.keras.models import Sequential

# instantiate model
model = Sequential(name='my_model')
# check layers
model.layers

[]

A __fully-connected layer__ in which __all neurons connect to all neurons__ in the next layer.

<a name='input'></a>
## 2.1 Input Layer

Pay attention to the dimensions of the __weight__ and __bias__ parameter matrices. Since we chose to create a layer with three neurons, the number of outputs of this layer is 3. Hence, the bias parameter would be a vector of `(3, 1)` dimension.

![](http://content.codecademy.com/courses/deeplearning-with-tensorflow/implementing-neural-networks/layers_diagram.svg)

In [14]:
X.shape

(1338, 11)

In [15]:
from tensorflow.keras.layers import InputLayer
import tensorflow as tf

input_layer = InputLayer(input_shape=(X.shape[1], ))

model.add(input_layer)
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________


<a name='hidden'></a>
## 2.2 Hidden Layers

__Adding more layers__ to a NN naturally __increases the number of parameters__ to be tuned. With every layer, there are associated weight and bias vectors.

In following diagram below we show the size of parameter vectors with each layer. In our case, the __1st layer__’s weight matrix (red) has shape `(11, 64)` because we feed __11 features__ to __64 hidden neurons__. 

The __output layer__ (purple) has the weight matrix of shape `(64, 1)` because we have __64 input units__ and __1 neuron__ in the final layer.

![](http://content.codecademy.com/courses/deeplearning-with-tensorflow/implementing-neural-networks/hidden_layers_diagram.svg)

In [17]:
from tensorflow.keras.layers import Dense

model.add(Dense(128, activation='relu'))
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 128)               1536      
                                                                 
Total params: 1,536
Trainable params: 1,536
Non-trainable params: 0
_________________________________________________________________


<a name='output'></a>
## 2.3 Output Layer

The output layer shape depends on your task. In the case of regression, we need __one output for each sample__.

In our case, we are doing regression and wish to predict one number for each data point: the medical cost billed by health insurance indicated in the charges column in our data. Hence, our output layer has only __one neuron__.

Notice that you __don’t need to specify the input shape of this layer__ since Tensorflow with Keras can __automatically infer its shape from the previous layer__.

In [18]:
model.add(Dense(1))
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 128)               1536      
                                                                 
 dense_1 (Dense)             (None, 1)                 129       
                                                                 
Total params: 1,665
Trainable params: 1,665
Non-trainable params: 0
_________________________________________________________________


<a name='opt'></a>
## 2.4 Optimizers

While model __parameters__ are the ones that the model uses to make predictions, __hyperparameters__ determine the learning process (learning rate, number of iterations, optimizer type).

In [19]:
from tensorflow.keras.optimizers import Adam

opt = Adam(learning_rate=0.01)

model.compile(loss='mse', metrics=['mae'], optimizer=opt)

<a name='train'></a>
## 2.5 Training & Evaluating

In [20]:
model.fit(X_train_scaled, y_train, epochs=50, batch_size=3, verbose=1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x22d91b3a3d0>

In [22]:
val_mse, val_mae = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"MSE: {val_mse}\nMAE: {val_mae}")

MSE: 47914156.0
MAE: 5001.96728515625
