# Neural Network and Decision Tree Analysis

In this notebook I practice supervised learning techniques from a machine learning course. The insurance.csv dataset is used throughout, but I do not intend to draw any meaningful conclusions from it; the data serves only as a vehicle for implementing the techniques.

In [1]:
import pandas as pd
import numpy as np
import math

In [51]:
df = pd.read_csv("Datasets/insurance.csv")
df

      age     sex     bmi  children smoker     region      charges
0      19  female  27.900         0    yes  southwest  16884.92400
1      18    male  33.770         1     no  southeast   1725.55230
2      28    male  33.000         3     no  southeast   4449.46200
3      33    male  22.705         0     no  northwest  21984.47061
4      32    male  28.880         0     no  northwest   3866.85520
...   ...     ...     ...       ...    ...        ...          ...
1333   50    male  30.970         3     no  northwest  10600.54830
1334   18  female  31.920         0     no  northeast   2205.98080
1335   18  female  36.850         0     no  southeast   1629.83350
1336   21  female  25.800         0     no  southwest   2007.94500
1337   61  female  29.070         0    yes  northwest  29141.36030

[1338 rows x 7 columns]

The dataset consists of 1338 examples with 7 columns. The columns are:

| Feature | Values |
| ----------- | ----------- |
| age | 18-64 |
| sex | female/male |
| bmi | 16.0-53.1 |
| children | 0-5 |
| smoker | yes/no |
| region | southeast/southwest/northeast/northwest |
| charges | 1120-63800 |

# Neural Network Algorithm

A neural network is an algorithm loosely inspired by the brain. It is composed of an input layer, one or more hidden layers, and an output layer. The input layer accepts only numerical data, so the non-numeric columns must first be converted to numbers, applying one-hot encoding where necessary. One-hot encoding replaces a categorical column with one binary column per category, each marking whether a row belongs to that category.
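
As a minimal sketch of what one-hot encoding produces, the toy frames below (hypothetical values, not the insurance data) show `pd.get_dummies` creating one indicator column per category, and `drop_first=True` collapsing a two-category feature into a single 0/1 column.

```python
import pandas as pd

# Toy frame (hypothetical values) with one categorical column
toy = pd.DataFrame({"region": ["southwest", "southeast", "northwest", "southeast"]})

# One indicator column per category; exactly one of them is 1 in each row
one_hot = pd.get_dummies(toy, columns=["region"], dtype=int)
print(one_hot)

# drop_first=True keeps k-1 columns, which is enough for a two-category feature
binary = pd.get_dummies(
    pd.DataFrame({"smoker": ["yes", "no", "no"]}),
    columns=["smoker"], dtype=int, drop_first=True,
)
print(binary)
```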

In [38]:
# Convert 'sex' to a single binary column sex_male: 'male' -> 1, 'female' -> 0
df_encoded = pd.get_dummies(df, columns=['sex'], dtype=int, drop_first=True)

# Convert 'smoker' to a single binary column smoker_yes: 'yes' -> 1, 'no' -> 0
df_encoded = pd.get_dummies(df_encoded, columns=['smoker'], dtype=int, drop_first=True)

# One-hot encode the 'region' feature (one binary column per region)
df_encoded = pd.get_dummies(df_encoded, columns=['region'], dtype=int)
df_encoded

      age     bmi  children      charges  sex_male  smoker_yes  \
0      19  27.900         0  16884.92400         0           1   
1      18  33.770         1   1725.55230         1           0   
2      28  33.000         3   4449.46200         1           0   
3      33  22.705         0  21984.47061         1           0   
4      32  28.880         0   3866.85520         1           0   
...   ...     ...       ...          ...       ...         ...   
1333   50  30.970         3  10600.54830         1           0   
1334   18  31.920         0   2205.98080         0           0   
1335   18  36.850         0   1629.83350         0           0   
1336   21  25.800         0   2007.94500         0           0   
1337   61  29.070         0  29141.36030         0           1   

      region_northeast  region_northwest  region_southeast  region_southwest  
0                    0                 0                 0                 1  
1                  

I will use the neural network to estimate the probability that a given person is a smoker. The `smoker_yes` column is therefore the target: I store it in a separate array here and drop it from the input features in the next step.

In [4]:
smoker = df_encoded['smoker_yes'].to_numpy()
print(smoker.shape)

(1338,)


Before proceeding any further, I will scale every input feature using z-score normalization, which subtracts each column's mean and divides by its standard deviation.

In [40]:
# Apply z-score normalization to every feature of df_encoded (the target smoker_yes is dropped first)
df_encoded_drop = df_encoded.drop('smoker_yes', axis=1)
df_z_scaled = df_encoded_drop.copy()

for column in df_z_scaled.columns:
    df_z_scaled[column] = (df_z_scaled[column]-df_z_scaled[column].mean()) / df_z_scaled[column].std()
    
df_z_scaled

           age       bmi  children   charges  sex_male  region_northeast  \
0    -1.438227 -0.453151 -0.908274  0.298472 -1.010141         -0.565056   
1    -1.509401  0.509431 -0.078738 -0.953333  0.989221         -0.565056   
2    -0.797655  0.383164  1.580335 -0.728402  0.989221         -0.565056   
3    -0.441782 -1.305043 -0.908274  0.719574  0.989221         -0.565056   
4    -0.512957 -0.292447 -0.908274 -0.776512  0.989221         -0.565056   
...        ...       ...       ...       ...       ...               ...   
1333  0.768185  0.050278  1.580335 -0.220468  0.989221         -0.565056   
1334 -1.509401  0.206062 -0.908274 -0.913661 -1.010141          1.768415   
1335 -1.509401  1.014499 -0.908274 -0.961237 -1.010141         -0.565056   
1336 -1.295877 -0.797515 -0.908274 -0.930014 -1.010141         -0.565056   
1337  1.551106 -0.261290 -0.908274  1.310563 -1.010141         -0.565056   

      region_northwest  region_southeast  region_southwes
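
The column loop above can also be written as a single vectorized expression. The sketch below uses a toy frame with hypothetical values standing in for `df_encoded_drop`; it is one possible equivalent, not a change to the cell above.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) standing in for df_encoded_drop
toy = pd.DataFrame({"age": [19, 18, 28, 33], "bmi": [27.9, 33.77, 33.0, 22.705]})

# Column-wise z-score: pandas aligns the per-column mean and std with the columns
mu = toy.mean()
sigma = toy.std()  # sample standard deviation (ddof=1), the pandas default
toy_z = (toy - mu) / sigma

print(toy_z.mean())  # approximately 0 for each column
print(toy_z.std())   # approximately 1 for each column
```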

In [6]:
# Build a NumPy array of the input features
x_train = df_z_scaled[['age', 'bmi', 'children', 'charges', 'sex_male', 'region_northeast', 'region_northwest', 
                      'region_southeast', 'region_southwest']].to_numpy()
x_train.shape

(1338, 9)

In [8]:
# importing packages necessary to implement neural network
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras import Sequential
from tensorflow.keras.losses import MeanSquaredError, BinaryCrossentropy
from tensorflow.keras.activations import sigmoid

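Each neuron applies an activation function to its weighted input. Common choices are linear, sigmoid, and ReLU. Since the network should output the probability that a person is a smoker, the output layer uses a sigmoid activation, which maps any real number into the range (0, 1). The hidden layer uses a sigmoid activation as well.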
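
To make the three activations concrete, here is a small NumPy sketch of each. The function names are my own for illustration and are not the Keras implementations.

```python
import numpy as np

def linear(z):
    # Identity: the output can be any real number
    return z

def relu(z):
    # Zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1), so it can be read as a probability
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(linear(z))
print(relu(z))
print(sigmoid(z))
```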

In [15]:
model = Sequential(
    [
        tf.keras.Input(shape=(9,)),
        Dense(3, activation='sigmoid', name='layer1'),
        Dense(1, activation='sigmoid', name='layer2')
    ]
)

In [10]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (None, 3)                 30        
                                                                 
 layer2 (Dense)              (None, 1)                 4         
                                                                 
Total params: 34
Trainable params: 34
Non-trainable params: 0
_________________________________________________________________
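
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper below (`dense_params`, a name chosen for this sketch) reproduces the numbers.

```python
def dense_params(n_inputs, n_units):
    # One weight for every (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

layer1 = dense_params(9, 3)  # 9 input features -> 3 hidden units
layer2 = dense_params(3, 1)  # 3 hidden units -> 1 output unit
print(layer1, layer2, layer1 + layer2)  # 30 4 34
```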


In [16]:
# Inspect the weights and biases TensorFlow initialized: random weights, zero biases
W1, b1 = model.get_layer("layer1").get_weights()
W2, b2 = model.get_layer("layer2").get_weights()
print(f"W1{W1.shape}:\n", W1, f"\nb1{b1.shape}:", b1)
print(f"W2{W2.shape}:\n", W2, f"\nb2{b2.shape}:", b2)

W1(9, 3):
 [[-0.0243066  -0.6859954   0.5371651 ]
 [-0.46246472 -0.33147198  0.54098004]
 [ 0.322226   -0.69254774 -0.66457164]
 [-0.06176066  0.36415774  0.04277718]
 [ 0.35824233 -0.3874821  -0.5155927 ]
 [-0.6085216   0.6997232   0.3565182 ]
 [-0.00584257  0.42300636  0.40524262]
 [-0.16720092  0.31298     0.66041654]
 [-0.5707931  -0.4807679  -0.70085335]] 
b1(3,): [0. 0. 0.]
W2(3, 1):
 [[ 0.2804159]
 [-0.6430382]
 [ 0.7749101]] 
b2(1,): [0.]


In [18]:
model.compile(
    loss = tf.keras.losses.BinaryCrossentropy(),
    optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=0.01),
)
# epochs=10: the entire training set is passed through the network 10 times
model.fit(
    x_train,smoker,            
    epochs=10,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x29b096850>
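
The loss minimized during fitting is binary cross-entropy. The NumPy sketch below shows the standard formula on hypothetical labels and probabilities; the `eps` clipping is my own guard against `log(0)` and is not taken from this notebook.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities away from 0 and 1 so the logarithms stay finite
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_true = np.array([0.0, 1.0, 1.0, 0.0])     # hypothetical labels
confident = np.array([0.1, 0.9, 0.8, 0.2])  # predictions close to the labels
unsure = np.array([0.5, 0.5, 0.5, 0.5])     # predictions carrying no information

print(binary_cross_entropy(y_true, confident))
print(binary_cross_entropy(y_true, unsure))
```

Predictions closer to the true labels give a lower loss, which is what the optimizer pushes toward over the epochs.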

In [19]:
#After fitting, the weights have been updated
W1, b1 = model.get_layer("layer1").get_weights()
W2, b2 = model.get_layer("layer2").get_weights()
print(f"W1{W1.shape}:\n", W1, f"\nb1{b1.shape}:", b1)
print(f"W2{W2.shape}:\n", W2, f"\nb2{b2.shape}:", b2)

W1(9, 3):
 [[ 0.97016275  0.85319906 -0.9603485 ]
 [ 1.4447821   1.4506173  -1.4787594 ]
 [ 0.16927463  0.22694588 -0.1928326 ]
 [-3.2898934  -2.8743703   3.203796  ]
 [-0.0976601  -0.22181818  0.13050695]
 [-0.2908088   0.31043464  0.10585709]
 [-0.35587466  0.26202214  0.16500361]
 [-0.40883565  0.13569534  0.23385766]
 [-0.40078142  0.2355054   0.21645196]] 
b1(3,): [ 1.4255282  1.3723803 -1.46342  ]
W2(3, 1):
 [[-3.2533317]
 [-4.443698 ]
 [ 2.9462836]] 
b2(1,): [-0.23018911]
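
With the weight shapes above, the model's forward pass can be written by hand: each dense layer computes `activation(X @ W + b)`. The sketch below uses random placeholder weights of the same shapes rather than the trained values, so its numbers are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    # Hidden layer: (m, 9) @ (9, 3) + (3,) -> (m, 3)
    a1 = sigmoid(X @ W1 + b1)
    # Output layer: (m, 3) @ (3, 1) + (1,) -> (m, 1)
    return sigmoid(a1 @ W2 + b2)

# Placeholder inputs and weights (random, not the trained values printed above)
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 9))
W1, b1 = rng.normal(size=(9, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

probs = forward(X, W1, b1, W2, b2)
print(probs.shape)  # (2, 1)
```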


## Predictions
Now that we have a trained model, we can use it to make predictions. Because the model outputs a probability, turning that output into a decision requires a threshold; we will use 0.5.

In [58]:
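# Feature order: age, bmi, children, charges, sex_male,
# region_northeast, region_northwest, region_southeast, region_southwest
X_test = np.array([
    [18, 33.770, 1, 1725.55230, 1, 0, 1, 0, 0],  # neg example
    [19, 27.900, 0, 16884.92400, 0, 0, 0, 1, 0]])   # pos example
print(X_test.shape)

# Normalize the test examples with the column means and standard deviations of the training features.
# Note: np.std defaults to ddof=0, while the pandas .std() used for training defaults to ddof=1;
# with 1338 rows the difference is negligible.
col_means = np.mean(df_encoded_drop, axis=0)
col_means = col_means.values.reshape(1,-1)
print(col_means.shape)
col_std = np.std(df_encoded_drop, axis=0)
col_std = col_std.values.reshape(1,-1)
X_testn = (X_test - col_means) / col_std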


predictions = model.predict(X_testn)
print("predictions = \n", predictions)

(2, 9)
(1, 9)
predictions = 
 [[4.8179663e-04]
 [7.3478907e-01]]


To convert the probabilities to a decision, we apply a threshold:

In [60]:
yhat = (predictions >= 0.5).astype(int)
print(f"decisions = \n{yhat}")

decisions = 
[[0]
 [1]]
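
Once probabilities are thresholded, the decisions can be compared against known labels. The sketch below computes accuracy on hypothetical probabilities and labels; it is not an evaluation of the model trained above.

```python
import numpy as np

# Hypothetical predicted probabilities and true labels (not model output)
probs = np.array([[0.02], [0.73], [0.41], [0.88]])
labels = np.array([0, 1, 1, 1])

# Same 0.5 threshold as above, flattened to match the label shape
yhat = (probs >= 0.5).astype(int).ravel()

# Accuracy: fraction of decisions that agree with the labels
accuracy = float(np.mean(yhat == labels))
print(yhat, accuracy)
```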
