# Deep Learning

Regression

In [1]:
import pandas as pd

from sklearn.model_selection import train_test_split

# from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

from scikeras.wrappers import KerasClassifier, KerasRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, make_scorer

import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('insurance.csv') #load the dataset
print(df.shape)
df.head(3)

(1338, 7)


Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,1,27.9,0,1,southwest,16884.924
1,18,0,33.77,1,0,southeast,1725.5523
2,28,0,33.0,3,0,southeast,4449.462


In [3]:
# inspect categorical features
df.region.unique()

array(['southwest', 'southeast', '0rthwest', '0rtheast'], dtype=object)

In [4]:
# clean categorical features
df.region = df.region.replace('0', 'no', regex=True)
df.region.unique()

array(['southwest', 'southeast', 'northwest', 'northeast'], dtype=object)
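The cleaning step above replaces every `0` in the region names with `no`. The same substitution can be sketched on plain strings with the standard `re` module. The anchored pattern `^0` is an assumption added here as a stricter alternative that only touches a leading zero; it is not part of the notebook.

```python
import re

raw_regions = ["southwest", "southeast", "0rthwest", "0rtheast"]

# Same idea as df.region.replace('0', 'no', regex=True): replace every "0".
cleaned = [re.sub("0", "no", r) for r in raw_regions]

# Stricter variant (assumption, not in the notebook): only a leading "0".
cleaned_anchored = [re.sub(r"^0", "no", r) for r in raw_regions]

print(cleaned)
```

Both variants give the same result on this column because the stray `0` only ever appears at the start of a name.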

In [5]:
# Define X and y
X = df.iloc[:,0:6]
y = df.iloc[:,-1]

In [6]:
# one-hot encoding for categorical variables
X = pd.get_dummies(X) 
X.head(2)

Unnamed: 0,age,sex,bmi,children,smoker,region_northeast,region_northwest,region_southeast,region_southwest
0,19,1,27.9,0,1,0,0,0,1
1,18,0,33.77,1,0,0,0,1,0
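To make the encoding above concrete, here is a minimal pure-Python sketch of one-hot encoding for a single categorical column. The helper `one_hot` and its `prefix` parameter are hypothetical names used only for illustration; `pd.get_dummies` does the real work in the notebook.

```python
def one_hot(values, prefix):
    """Return (column_names, rows) with one 0/1 column per distinct value."""
    categories = sorted(set(values))                 # alphabetical column order
    columns = [f"{prefix}_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows

columns, rows = one_hot(
    ["southwest", "southeast", "northwest", "northeast"], prefix="region"
)
print(columns)
print(rows[0])   # encoding of "southwest"
```

Each row contains exactly one `1`, in the column that matches its category.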


`Note:`

Train, validation, and test splits work a little differently for neural networks. With a traditional ML algorithm, we usually split the dataset into a training set and a test set (commonly around 70% and 30%). In the training phase, we fit the model on the training data. To evaluate the model (i.e., to check how well it predicts on unseen data), we run it against the test data and get the predicted results. Since we already know the expected results, we compare the predicted and the real results to measure the accuracy of the model.
If the accuracy is not at the desired level, we repeat the process (train, test, compare) until it is.

With the neural network approach, we first split off the test set with train_test_split. During the fitting phase, the training set is split again, and the held-out part becomes the validation set. Finally, we test the model on the test set (unseen data) and compare the predicted results with the real results.
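The two-stage split described above can be sketched as simple arithmetic on row counts. This is an illustration under stated assumptions: the test size is rounded up and the validation cut is rounded down, which is how I understand `train_test_split` and Keras `validation_split` to behave. The helper `split_sizes` is hypothetical.

```python
def split_sizes(n_rows, test_pct, val_pct):
    """Return (train, validation, test) row counts using integer arithmetic."""
    n_test = (n_rows * test_pct + 99) // 100      # ceiling of n_rows * test_pct / 100
    n_fit = n_rows - n_test                       # rows passed to model.fit
    n_train = n_fit * (100 - val_pct) // 100      # floor: rows actually trained on
    n_val = n_fit - n_train                       # rows held out for validation
    return n_train, n_val, n_test

# 1338 rows, 20% test split, then 20% of the remainder for validation
n_train, n_val, n_test = split_sizes(1338, test_pct=20, val_pct=20)
print(n_train, n_val, n_test)
```

The three counts always add up to the original number of rows, so no sample is used in more than one role.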

In [7]:
# Split data
x_train, x_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = 0.2,
                                                    random_state = 42)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

(1070, 9) (268, 9) (1070,) (268,)


In [8]:
# standardize
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
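The cell above fits the scaler on the training data only and then reuses it on the test data. A minimal sketch of that idea on a toy column follows; the numbers are made up for illustration, and the population standard deviation is used, which matches how StandardScaler computes it.

```python
from statistics import mean, pstdev

train = [2, 4, 4, 4, 5, 5, 7, 9]   # toy training column (made-up values)
test = [7, 3]                      # toy test column (made-up values)

mu = mean(train)        # statistics come from the training data only
sigma = pstdev(train)   # population standard deviation

train_scaled = [(v - mu) / sigma for v in train]
test_scaled = [(v - mu) / sigma for v in test]   # reuses the training statistics

print(test_scaled)
```

Computing `mu` and `sigma` from the test column as well would leak information about unseen data into preprocessing.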

# Designing Model

In [9]:
# Creating a keras sequential object
model_regr = Sequential()

### Define model

In [15]:
#### INPUT LAYER
# This first Dense layer receives the input features. No input shape is declared,
# so Keras infers it the first time the model is called on data.
model_regr.add(Dense(units = X.shape[1] , activation = 'relu')) 


#### HIDDEN LAYER 1
# `Note:`
# How do we choose the number of hidden layers and the number of units per layer? That is a tough question, and there
# is no single good answer. A rule of thumb is to start with one hidden layer that has as many units as there are
# features in the dataset. However, this might not always work. We need to try things out and observe the learning curve.

# There are a number of activation functions, such as softmax and sigmoid,
# but ReLU (Rectified Linear Unit) is very effective in many applications, so we'll use it here.
model_regr.add(Dense(128, activation = 'relu'))

#### OUTPUT LAYER
# One unit with a linear activation, because we predict a single continuous value.
model_regr.add(Dense(1, activation = 'linear'))  
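As a rough check on the size of this architecture, the number of trainable parameters can be counted by hand: each Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 9 input features, as shown by the shape of the training data; `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 9                       # columns in X after one-hot encoding
layer_units = [n_features, 128, 1]   # the three Dense layers defined above

total = 0
n_inputs = n_features
for units in layer_units:
    total += dense_params(n_inputs, units)
    n_inputs = units                 # this layer's output feeds the next layer

print(total)
```

Once the model has been built, the same total should appear in the output of `model_regr.summary()`.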

### OPTIMIZER

In [16]:
# We have many optimizers, such as SGD (Stochastic Gradient Descent), Adam, RMSprop, and others.
# Adam is a common default choice because it addresses several issues of earlier optimizers.
opt = Adam(learning_rate = 0.01) # Adam's default learning rate is 0.001
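To give a feel for what Adam does with the learning rate, here is a simplified sketch of its update rule applied to a one-dimensional toy problem, minimizing f(x) = x². It is not the Keras implementation; the function `adam_minimize` is hypothetical, and the `beta1`, `beta2`, and `eps` values are the commonly cited defaults.

```python
import math

def adam_minimize(grad, x, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7, steps=500):
    """Run a fixed number of Adam steps on a scalar parameter x."""
    m = 0.0   # running average of gradients (first moment)
    v = 0.0   # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# f(x) = x**2 has gradient 2*x and its minimum at x = 0
x_final = adam_minimize(lambda x: 2 * x, x=1.0)
print(x_final)
```

Because the step is scaled by the ratio of the two running averages, the first updates have a size close to `lr` regardless of how large the raw gradient is.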

### COMPILE MODEL

In [17]:
# loss/cost 
# MSE, MAE, Huber loss  
model_regr.compile(loss='mse',  metrics=['mae'], optimizer=opt)  
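The three losses named in the comment above differ in how strongly they penalize large errors. A minimal pure-Python sketch of each follows, using made-up toy values and a Huber threshold of `delta=1.0` as an assumed default.

```python
def mse(y_true, y_pred):
    """Mean squared error: large errors are penalized quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: every error is penalized linearly."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for errors larger than delta."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(t - p)
        total += 0.5 * e * e if e <= delta else delta * (e - 0.5 * delta)
    return total / len(y_true)

y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [0.0, 0.0, 2.0, 2.0]   # two exact predictions, two errors of size 2
print(mse(y_true, y_pred), mae(y_true, y_pred), huber(y_true, y_pred))
```

MSE reacts most strongly to the two large errors, which is why it is sensitive to outliers such as unusually high insurance charges.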

### FIT THE MODEL

In [None]:
h = model_regr.fit(x_train, y_train, 
               validation_split=0.2, 
               epochs=100, 
               batch_size=10,
               verbose=1,
               # callbacks=[stop],  # early stopping is defined in the Training Phase section
               )

#### Model Summary

In [13]:
# view summary (the model must be built first, e.g. by fitting it; otherwise Keras raises the error below)
model_regr.summary()

ValueError: This model has not yet been built. Build the model first by calling `build()` or by calling the model on a batch of data.

#### Training Phase

Add early stopping to halt training when there is no further improvement.

In [None]:
# reference https://keras.io/api/callbacks/early_stopping/
stop = EarlyStopping(monitor='val_loss', # loss on the 20% validation split
                     mode='min', 
                     patience=30,
                     verbose=1)
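The patience logic behind this callback can be sketched in a few lines. This is a simplified illustration, not the Keras implementation: it ignores options such as `min_delta` and `restore_best_weights`, and the helper `early_stop_epoch` and the loss values are made up.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0                      # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:           # mode='min': lower is better
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.4]   # made-up validation losses
stopped_at = early_stop_epoch(losses, patience=3)
print(stopped_at)
```

With `patience=30` and `epochs=100`, as in the cell above, training only stops early if the validation loss fails to improve for 30 consecutive epochs.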

In [None]:
# create a variable to store our fitted model
h = model_regr.fit(x_train, y_train, 
               validation_split=0.2, 
               epochs=100, 
               batch_size=10,
               verbose=1,
               callbacks=[stop])

In [None]:
h.history.keys()

In [None]:
#plotting

fig, axs = plt.subplots(1,2,
                        figsize=(15, 6),
                        gridspec_kw={'hspace': 0.5, 'wspace': 0.2}) 
(ax1, ax2) = axs
ax1.plot(h.history['loss'], label='Train')
ax1.plot(h.history['val_loss'], label='Validation')
ax1.set_title('learning rate=' + str(0.01))
ax1.legend(loc="upper right")
ax1.set_xlabel("# of epochs")
ax1.set_ylabel("loss (MSE)")

ax2.plot(h.history['mae'], label='Train')
ax2.plot(h.history['val_mae'], label='Validation')
ax2.set_title('learning rate=' + str(0.01))
ax2.legend(loc="upper right")
ax2.set_xlabel("# of epochs")
ax2.set_ylabel("MAE")

#### Evaluation

In [None]:
# evaluate on the held-out test set (not the validation split used during fitting)
test_mse, test_mae = model_regr.evaluate(x_test, y_test, verbose = 1)

# GridSearchCV

### Function For Designing Model
Function that creates and returns your Keras sequential model (to use in the scikeras wrappers).

In [None]:
def design_model(features):
  # ANN model instance
  model_regr = Sequential()


  #### INPUT LAYER
  # first Dense layer: one unit per feature column of the data passed in
  model_regr.add(Dense(units = features.shape[1] , activation = 'relu')) 


  #### HIDDEN LAYER 1
  # There are a number of activation functions, such as softmax and sigmoid,
  # but ReLU (Rectified Linear Unit) is very effective in many applications, so we'll use it here.
  model_regr.add(Dense(128, activation = 'relu'))


  #### OUTPUT LAYER
  # one unit with a linear activation for a single continuous prediction
  model_regr.add(Dense(1, activation = 'linear'))  


  #### OPTIMIZER
  # We have many optimizers, such as SGD (Stochastic Gradient Descent), Adam, RMSprop, and others.
  # Adam is a common default choice because it addresses several issues of earlier optimizers.
  opt = Adam(learning_rate = 0.01)
  # loss/cost 
  # MSE, MAE, Huber loss  
  model_regr.compile(loss='mse',  metrics=['mae'], optimizer=opt)  


  return model_regr

Invoke our function, pass x_train as the argument, and save the result in a variable.

In [None]:
model_regr = design_model(x_train)

Fit `model_regr` on our training set.

In [None]:
model_regr.fit(x_train, y_train, 
               verbose = 1)

To use KerasRegressor, we define a function that creates and returns a Keras sequential model (the function above),
then pass that function, or as done here the model it returns, to the model argument when constructing the KerasRegressor class.

In [None]:
model = KerasRegressor(model = model_regr)

Grid search is computationally expensive, so we will use a small set of values here.

List of hyperparameters:
 1. the learning rate
 2. batch size
 3. number of epochs
 4. number of units per hidden layer
 5. activation functions.

In [None]:
param_grid = dict(epochs = [50,100],
                  batch_size = [1,10,50])

In [None]:
grid = GridSearchCV(estimator=model, 
                    param_grid=param_grid,
                    n_jobs=-1, # use all processor cores of our machine (faster!!)
                    scoring = 'r2',
                    return_train_score = True,
                    cv=3)

grid_result = grid.fit(x_train, y_train)
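To see why grid search gets expensive quickly, the number of model fits can be counted from the grid itself. The sketch below enumerates the combinations with `itertools.product`, using the same values as `param_grid` and `cv` above; it does not train anything.

```python
from itertools import product

param_grid = {"epochs": [50, 100], "batch_size": [1, 10, 50]}
cv = 3   # number of cross-validation folds

names = list(param_grid)
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

# every combination is trained once per fold
n_fits = len(combos) * cv

for combo in combos:
    print(combo)
print(n_fits)
```

Adding a third hyperparameter with, say, three candidate values would triple this count, which is why the grid here is kept small.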

In [None]:
grid_result.best_score_ , grid_result.best_params_
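The best score reported here is an R² value, because the grid search was configured with `scoring = 'r2'`. A minimal sketch of how R² is computed may help when reading it; `r2_score_manual` is a hypothetical helper and the values are made up.

```python
def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score_manual(y_true, [1.0, 2.0, 3.0, 4.0])    # exact predictions
baseline = r2_score_manual(y_true, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
print(perfect, baseline)
```

A score near 1 means the model explains most of the variance in charges, while a score near 0 means it does no better than always predicting the mean.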

### Summary


1. Preparing the data for learning:
   - separating features from labels using array slicing
   - determining the shape of the data
   - preprocessing the categorical variables using one-hot encoding
   - splitting the data into training and test sets
   - scaling the numerical features
2. Designing a Sequential model by chaining tf.keras.layers.Dense layers. The first Dense layer receives the input features, since no separate InputLayer() was added. The output layer needs one neuron because regression predicts a single value. The hidden layers use the relu activation function to handle complex dependencies in the data.
3. Choosing an optimizer from keras.optimizers with a specific learning rate hyperparameter.
4. Training the model, using model.fit() on the training data and training labels.
5. Setting the values for the learning hyperparameters: number of epochs and batch size.
6. Evaluating the model using model.evaluate() on the test data.
