###Dataset

#####The dataset covers the real estate market in Boston (US).
#####It contains 14 attributes.
#####The target variable is the median value of owner-occupied homes, in units of 1000 USD.

* CRIM: per capita crime rate by town
* ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
* INDUS: proportion of non-retail business acres per town
* CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
* NOX: nitric oxides concentration (parts per 10 million)
* RM: average number of rooms per dwelling
* AGE: proportion of owner-occupied units built prior to 1940
* DIS: weighted distances to five Boston employment centres
* RAD: index of accessibility to radial highways
* TAX: full-value property-tax rate per 10,000 USD
* PTRATIO: pupil-teacher ratio by town
* B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
* LSTAT: lower status of the population (%)
* MEDV: Median value of owner-occupied homes in 1000 USD's (Target)

###Objective


#####The objective of this exercise is to apply regression to predict the median value of owner-occupied homes, in units of 1000 USD.

We will build a machine learning regression model using the tensorflow.keras (tf.keras for short) API. Because the network used here has ReLU hidden layers, it is a neural-network regressor rather than a strictly linear model.

###Loading library

In [1]:
import numpy as np   #to perform calculations
import pandas as pd  #to read data
import matplotlib.pyplot as plt   #to visualize 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

###Loading Data

In [2]:
boston = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Training_set_boston.csv")

###View Data

In [3]:
#To view the first 5 rows 
boston.head()

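| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15.0234 | 0.0 | 18.1 | 0.0 | 0.614 | 5.304 | 97.3 | 2.1007 | 24.0 | 666.0 | 20.2 | 349.48 | 24.91 | 12.0 |
| 1 | 0.62739 | 0.0 | 8.14 | 0.0 | 0.538 | 5.834 | 56.5 | 4.4986 | 4.0 | 307.0 | 21.0 | 395.62 | 8.47 | 19.9 |
| 2 | 0.03466 | 35.0 | 6.06 | 0.0 | 0.4379 | 6.031 | 23.3 | 6.6407 | 1.0 | 304.0 | 16.9 | 362.25 | 7.83 | 19.4 |
| 3 | 7.05042 | 0.0 | 18.1 | 0.0 | 0.614 | 6.103 | 85.1 | 2.0218 | 24.0 | 666.0 | 20.2 | 2.52 | 23.29 | 13.4 |
| 4 | 0.7258 | 0.0 | 8.14 | 0.0 | 0.538 | 5.727 | 69.5 | 3.7965 | 4.0 | 307.0 | 21.0 | 390.95 | 11.28 | 18.2 |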


In [4]:
#Number of rows & columns of the dataset
print("Number of rows in this dataset: ", boston.shape[0])
print("Number of columns in this dataset: ", boston.shape[1])

Number of rows in this dataset:  404
Number of columns in this dataset:  14


###Determine the Input & Target variables

In [5]:
X = boston.drop("MEDV", axis=1)
y = boston.MEDV

In [6]:
#Store the number of input variable columns
num_input = X.shape[1]
print(num_input)

13


In [7]:
#View all input column names
X.columns

Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT'],
      dtype='object')

###Splitting Training and Testing (Validation) Dataset

In [8]:
#Splitting the data into training & testing (validation) sets, 80:20
X_train_un, X_test_un, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
#View the number of training & testing (validation) rows
print("Number of training set - ", X_train_un.shape[0])
print("Number of testing set - ", X_test_un.shape[0])

Number of training set -  323
Number of testing set -  81
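
The 323/81 split printed above follows from how an 80:20 split rounds on 404 rows. A minimal NumPy sketch, assuming the test share is rounded up (which is how scikit-learn's `train_test_split` treats a fractional `test_size`); the shuffling here is only illustrative and will not reproduce the exact rows selected by `random_state=42`:

```python
import math
import numpy as np

n_rows = 404          # rows in the training CSV, as printed earlier
test_fraction = 0.2

# Round the test share up; the remainder becomes the training share.
n_test = math.ceil(n_rows * test_fraction)
n_train = n_rows - n_test

# Shuffle the row indices and slice them into two disjoint groups.
rng = np.random.default_rng(42)
shuffled = rng.permutation(n_rows)
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

print(n_train, n_test)  # 323 81
```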


###Data Cleaning

In [10]:
scaler = StandardScaler()

#fit_transform: learn each feature's mean and standard deviation from the training set, then standardize the training set
X_train = scaler.fit_transform(X_train_un)
#transform: reuse the statistics learned from the training set to standardize the test set, so nothing is learned from the test data and it stays unseen by the model
X_test = scaler.transform(X_test_un)
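
To make the difference between fitting and transforming concrete, here is a small NumPy sketch of what standardization does. The arrays are made-up stand-ins for `X_train_un` and `X_test_un`, not rows from the Boston data:

```python
import numpy as np

# Toy stand-ins for X_train_un / X_test_un (hypothetical values).
train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
test = np.array([[2.0, 500.0]])

# "fit": learn the per-column mean and standard deviation from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)   # population std (ddof=0), which is what StandardScaler uses

# "transform": apply the training statistics to both sets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The scaled training columns end up with mean 0 and standard deviation 1. The test row is shifted and scaled by the same training statistics, so it does not need to have mean 0 itself.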

###Model Training

In [11]:
#Import library for neural network using tensorflow 
import tensorflow
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from numpy.random import seed

In [12]:
#Define model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(num_input,)))   #rectified linear unit
model.add(Dense(8, activation='relu'))
model.add(Dense(1))

#####Models can be defined with either the Sequential API or the Functional API (covered in later modules). Here we define the model with the Sequential API.

The Sequential API is the simplest API to get started with deep learning.

Our model accepts 13 input features, has two hidden layers with 10 and 8 nodes (both with ReLU activation), and ends with an output layer with one node that predicts a numerical value.

In [13]:
#Compiling model
from tensorflow.keras.optimizers import RMSprop
optimizer = RMSprop(0.01)  #0.01 is our learning rate

In [14]:
model.compile(loss='mean_squared_error', optimizer=optimizer)

#####Compiling the model involves 3 choices:
* Loss function: mean squared error (for regression) or cross-entropy (for classification)
* Optimizer: RMSprop (Root Mean Square Propagation)
* Learning rate: typically between 10^-6 and 1.0

The optimal learning rate can only be determined through trial and error.

A traditional default value for the learning rate is 0.1 or 0.01, which may be a good starting point for your problem.
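
RMSprop, named in the list above, divides each gradient by a running root-mean-square of recent gradients before applying the learning rate. The following is a simplified sketch on a one-parameter toy loss. The helper `rmsprop_minimize`, the decay `rho=0.9` and `eps=1e-7` are assumptions for illustration, and options such as momentum are left out, so this is not Keras's exact implementation:

```python
import numpy as np

def rmsprop_minimize(grad_fn, w0, learning_rate=0.01, rho=0.9, eps=1e-7, steps=500):
    """Simplified RMSprop: scale each step by a running RMS of recent gradients."""
    w = float(w0)
    avg_sq_grad = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # Exponential moving average of the squared gradient.
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g ** 2
        # Step size is the learning rate times the normalized gradient.
        w -= learning_rate * g / (np.sqrt(avg_sq_grad) + eps)
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```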

In [15]:
#Fitting model 
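seed_value = 42   #to get reproducible results (note: the weights were already initialized in In [12], so the seed would need to be set before defining the model for full reproducibility)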

tensorflow.random.set_seed(seed_value)
model.fit(X_train, y_train, epochs=10, batch_size=30, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f52a512b750>

* Epochs: number of complete passes through the training dataset
* Batch size: number of samples used to estimate the model error for each weight update within an epoch
* Verbose: 0, 1 or 2; controls how training progress is displayed

verbose=0 shows nothing (silent)

verbose=1 shows an animated progress bar

verbose=2 prints one line per epoch
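
As a quick arithmetic check of how epochs and batch size interact for the `fit` call above (323 training rows, `batch_size=30`, `epochs=10`), a short sketch:

```python
import math

n_train = 323      # training rows after the 80:20 split
batch_size = 30
epochs = 10

# Each epoch walks through the training set once, in chunks of `batch_size` rows;
# the last chunk is smaller when the row count is not a multiple of the batch size.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_rows = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_rows, total_updates)  # 11 23 110
```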

In [16]:
#Evaluating model
mse = model.evaluate(X_test, y_test)



In [17]:
print('MSE without tuning: {}'.format(mse))

MSE without tuning: 19.21466827392578


#####Evaluating the model requires a holdout dataset, i.e. data not used in the training process. Here that is X_test and y_test.

Our model reports a mean squared error of 19.21. What does this mean?
Subtract the predicted values (for the X_test data) from the actual values (y_test), square each difference, and take the mean (i.e. average) of the squares; the result in this case is 19.21. Since the target is in 1000 USD, the MSE is in squared units of 1000 USD.
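
The same calculation can be written out by hand. A minimal NumPy sketch with made-up actual and predicted prices (not values from this notebook):

```python
import numpy as np

# Hypothetical actual and predicted prices, in 1000 USD.
y_true = np.array([12.0, 20.0, 19.0, 13.5])
y_pred = np.array([14.0, 19.0, 22.0, 13.5])

errors = y_true - y_pred          # [-2, 1, -3, 0]
mse = np.mean(errors ** 2)        # (4 + 1 + 9 + 0) / 4
rmse = np.sqrt(mse)               # back in the units of the target

print(mse)   # 3.5
print(rmse)
```

Taking the square root brings the error back to the units of the target. By the same arithmetic, an MSE of 19.21 corresponds to a root mean squared error of about 4.38, i.e. roughly 4,380 USD.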

In [18]:
#Model prediction
#model.predict(X_test)

#####Calling model.predict(X_test) (commented out above) returns the model's predicted median housing price, in 1000 USD, for each row of the test set.

###Model Tuning

##### 3 hyperparameters can be tuned:
* Learning rate: the key hyperparameter
* Epochs
* Batch size

The learning rate is a scalar used to train a model via gradient descent. During each iteration, the gradient descent algorithm multiplies the learning rate by the gradient. The resulting product is called the gradient step.
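
A single gradient step can be worked through with a few numbers. The weights and gradient below are hypothetical values chosen only to show the arithmetic:

```python
import numpy as np

learning_rate = 0.01
weights = np.array([1.0, -2.0])      # hypothetical current weights
gradient = np.array([4.0, -10.0])    # hypothetical gradient of the loss w.r.t. the weights

gradient_step = learning_rate * gradient   # [0.04, -0.1]
new_weights = weights - gradient_step      # move against the gradient

print(gradient_step)
print(new_weights)
```

With a learning rate ten times larger, the step would be ten times larger as well, which is why this value has such a strong effect on training.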

We will automate hyperparameter tuning using Random Search and Keras Tuner.

Random Search is a hyperparameter search procedure performed on a defined grid of hyperparameters. However, not every hyperparameter combination is used to train a new model; only some are selected at random, and the performance of these temporary models is measured on validation data. Once the process is complete, the best-performing hyperparameters and model are chosen.
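
The idea can be sketched in plain Python, independent of Keras Tuner. Everything here is hypothetical: the `batch_size` entry in the search space, the `toy_score` function (a stand-in for "train a model and return its validation MSE"), and the `random_search` helper:

```python
import random

# Hypothetical search space; only learning_rate is tuned in this notebook.
search_space = {
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
}

def toy_score(params):
    """Stand-in for 'train a model and return its validation MSE' (lower is better)."""
    return abs(params["learning_rate"] - 1e-2) * 100 + abs(params["batch_size"] - 32) / 32

def random_search(space, score_fn, max_trials, seed=42):
    rng = random.Random(seed)
    n_combos = 1
    for values in space.values():
        n_combos *= len(values)
    tried, results = set(), []
    while len(results) < min(max_trials, n_combos):
        # Draw one value per hyperparameter at random.
        params = {name: rng.choice(values) for name, values in space.items()}
        key = tuple(sorted(params.items()))
        if key in tried:          # skip combinations that were already evaluated
            continue
        tried.add(key)
        results.append((score_fn(params), params))
    best = min(results, key=lambda r: r[0])
    return best, results

best, results = random_search(search_space, toy_score, max_trials=5)
print(len(results), best)
```

Only 5 of the 12 possible combinations are evaluated, and the best of those 5 is returned, which may or may not be the best combination in the whole grid.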

Keras Tuner is used to execute the hyperparameter tuning procedure. It is a library that helps you pick the optimal set of hyperparameters for your TensorFlow model.

In [19]:
#Install and import library
!pip install keras-tuner -q
import keras_tuner as kt   #the old `kerastuner` module name is deprecated



In [20]:
#Define a general architecture of model using user-defined function
def model_builder(hp):
  model = Sequential()
  model.add(Dense(10, activation='relu', input_shape=(num_input,)))
  model.add(Dense(8, activation='relu'))
  model.add(Dense(1))
  hp_learning_rate = hp.Choice('learning_rate', values = [1e-1, 1e-2, 1e-3, 1e-4]) # Tuning the learning rate (four different values to test: 0.1, 0.01, 0.001, 0.0001) #hp is an alias for Keras Tuner’s HyperParameters class
  optimizer = RMSprop(learning_rate = hp_learning_rate)                            # Defining the optimizer
  model.compile(loss='mse',metrics=['mse'], optimizer=optimizer)                   # Compiling the model 
  return model                                                                     # Returning the defined model

In [21]:
#Define hyperparameter grid to be validated
tuner_rs = kt.RandomSearch(
              model_builder,                # Takes hyperparameters (hp) and returns a Model instance
              objective = 'mse',            # Name of model metric to minimize or maximize ('mse' is the training MSE; 'val_mse' would score trials on the validation split)
              seed = 42,                    # Random seed for replication purposes
              max_trials = 5,               # Total number of trials (model configurations) to test at most. Note that the oracle may interrupt the search before max_trial models have been tested.
              directory='random_search')    # Path to the working directory (relative)

In [22]:
#Run the random search using the search method
tuner_rs.search(X_train, y_train, epochs=10, validation_split=0.2, verbose=1)

In [23]:
#Print the summary results of the hyperparameter tuning procedure
tuner_rs.results_summary()

Results summary
Results in random_search/untitled_project
Showing 10 best trials
<keras_tuner.engine.objective.Objective object at 0x7f52a26570d0>
Trial summary
Hyperparameters:
learning_rate: 0.01
Score: 79.75146484375
Trial summary
Hyperparameters:
learning_rate: 0.1
Score: 103.4114761352539
Trial summary
Hyperparameters:
learning_rate: 0.0001
Score: 249.3369140625
Trial summary
Hyperparameters:
learning_rate: 0.001
Score: 277.0947265625


In [24]:
#Retrieve the best model found by the tuner and evaluate it on the test set
model_tuned = tuner_rs.get_best_models(num_models=1)[0]
mse_tuned = model_tuned.evaluate(X_test, y_test)



In [25]:
print('MSE with tuning: {}'.format(mse_tuned))

MSE with tuning: [601.84326171875, 601.84326171875]


In [26]:
#Print the tuned model's architecture
model_tuned.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                140       
                                                                 
 dense_1 (Dense)             (None, 8)                 88        
                                                                 
 dense_2 (Dense)             (None, 1)                 9         
                                                                 
Total params: 237
Trainable params: 237
Non-trainable params: 0
_________________________________________________________________
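
The parameter counts in this summary can be reproduced by hand, since each Dense layer has one weight per (input, unit) pair plus one bias per unit. A short sketch using the layer widths from the model definition:

```python
# Layer widths: 13 input features, then Dense(10), Dense(8), Dense(1).
layer_sizes = [13, 10, 8, 1]

# A Dense layer has one weight per (input, unit) pair plus one bias per unit.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

print(params_per_layer)       # [140, 88, 9]
print(sum(params_per_layer))  # 237
```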


###Best model

#####The best model is the untuned neural network, with a mean squared error of 19.21 on the test set (versus 601.84 for the tuned model).

###Model prediction


#####Prediction requires new data for which a prediction is needed, i.e. data where you do not have the target values.

From an API perspective, you simply call a function to make a prediction of a class label, probability, or numerical value: whatever you designed your model to predict.

We have our new test data located at the given github location:

https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Testing_set_boston.csv

In [27]:
#Load new test data
new_test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Testing_set_boston.csv')

In [28]:
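# make a prediction
# NOTE: the model was trained on standardized features, but new_test_data is passed here unscaled;
# applying scaler.transform(new_test_data) first would keep the inputs consistent with training.
model.predict(new_test_data)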



array([[1670.8047],
       [1683.7003],
       [2412.1733],
       [1816.8334],
       [2263.1577],
       [1618.4404],
       [1592.7236],
       [1704.731 ],
       [2291.4988],
       [1666.6761],
       [1619.9502],
       [1523.4867],
       [1820.6334],
       [1669.7302],
       [1643.719 ],
       [2273.4504],
       [1551.9165],
       [2168.1052],
       [1556.1239],
       [2258.276 ],
       [1786.6687],
       [1666.9539],
       [1398.2715],
       [1683.9121],
       [2219.8167],
       [2227.6265],
       [1759.2947],
       [1941.2373],
       [1601.1027],
       [1674.981 ],
       [1639.8813],
       [1747.4865],
       [2254.9575],
       [2301.638 ],
       [2206.8506],
       [1415.5776],
       [1746.8546],
       [1768.758 ],
       [1626.1367],
       [1741.1818],
       [1631.3223],
       [1616.5911],
       [1500.9866],
       [1684.5378],
       [1704.5417],
       [2269.1848],
       [1415.6134],
       [1778.8352],
       [2272.3093],
       [1677.2786],


#####These are the model's predicted median housing prices, in 1000 USD, for the new data. They come from `model`, the untuned neural network.
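
The predictions above are in the thousands, whereas the MEDV values shown earlier are in the tens. One plausible explanation is that `new_test_data` reached the model without the scaling step that was applied to the training features. The sketch below illustrates that effect on synthetic data with a one-feature least-squares line (not the notebook's network); the feature range and the price relationship are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one raw feature on a large scale (values in the hundreds).
x_train_raw = rng.uniform(200.0, 700.0, size=(200, 1))
y_train = 30.0 - 0.02 * x_train_raw[:, 0]          # hypothetical price relationship

# Standardize with training statistics, then fit a least-squares line on the scaled feature.
mean, std = x_train_raw.mean(axis=0), x_train_raw.std(axis=0)
x_train = (x_train_raw - mean) / std
design = np.hstack([x_train, np.ones((len(x_train), 1))])
coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)

def predict(x):
    """Apply the fitted line; x is expected to be on the standardized scale."""
    return x @ coef[:1] + coef[1]

x_new_raw = np.array([[300.0], [650.0]])
raw_pred = predict(x_new_raw)                      # raw inputs: far outside the price range
scaled_pred = predict((x_new_raw - mean) / std)    # scaled inputs: close to 24 and 17

print(raw_pred)
print(scaled_pred)
```

In the notebook, the corresponding call would presumably be `model.predict(scaler.transform(new_test_data))`, assuming the new file has the same 13 columns in the same order as the training features.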

#####Further improvement:
https://towardsdatascience.com/hyperparameter-tuning-with-keras-tuner-283474fbfbe

#####Train, validate & test dataset
https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7

#####Note on ML & cleaning
https://towardsdatascience.com/how-to-avoid-potential-machine-learning-pitfalls-a08781f3518e
https://towardsdatascience.com/practical-guide-to-data-cleaning-in-python-f5334320e8e