# **House_Predictions using Boston_Housing Dataset with Regression Model** 

---

### **Regression uses targets (Continuous Values) instead of labels (Discrete Values)**

### Scikit-learn's Repeated K-Fold Method behaves little differently than the tradional K-Fold, as no. of split were 4 but scores were 5 at 100 epochs and after tuning the no. of epochs i.e. 500, the resultant scores were 6 

### Good thing is that overall final result got better i.e. from 17.97% to 16.81%

## ** Loading Dataset **

In [0]:
from keras.datasets import boston_housing

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()

## **Checking Shape of Data**

In [63]:
train_data.shape   # samples, features ; 2D tensor
                      # 404 training samples, with 13 features e.g. Crime Rate 

(404, 13)

In [64]:
test_data.shape

(102, 13)

## **Checking train_targets**

In [65]:
train_targets  # the median values of owner-occupied homes (in USD, according to 1970s period)

array([15.2, 42.3, 50. , 21.1, 17.7, 18.5, 11.3, 15.6, 15.6, 14.4, 12.1,
       17.9, 23.1, 19.9, 15.7,  8.8, 50. , 22.5, 24.1, 27.5, 10.9, 30.8,
       32.9, 24. , 18.5, 13.3, 22.9, 34.7, 16.6, 17.5, 22.3, 16.1, 14.9,
       23.1, 34.9, 25. , 13.9, 13.1, 20.4, 20. , 15.2, 24.7, 22.2, 16.7,
       12.7, 15.6, 18.4, 21. , 30.1, 15.1, 18.7,  9.6, 31.5, 24.8, 19.1,
       22. , 14.5, 11. , 32. , 29.4, 20.3, 24.4, 14.6, 19.5, 14.1, 14.3,
       15.6, 10.5,  6.3, 19.3, 19.3, 13.4, 36.4, 17.8, 13.5, 16.5,  8.3,
       14.3, 16. , 13.4, 28.6, 43.5, 20.2, 22. , 23. , 20.7, 12.5, 48.5,
       14.6, 13.4, 23.7, 50. , 21.7, 39.8, 38.7, 22.2, 34.9, 22.5, 31.1,
       28.7, 46. , 41.7, 21. , 26.6, 15. , 24.4, 13.3, 21.2, 11.7, 21.7,
       19.4, 50. , 22.8, 19.7, 24.7, 36.2, 14.2, 18.9, 18.3, 20.6, 24.6,
       18.2,  8.7, 44. , 10.4, 13.2, 21.2, 37. , 30.7, 22.9, 20. , 19.3,
       31.7, 32. , 23.1, 18.8, 10.9, 50. , 19.6,  5. , 14.4, 19.8, 13.8,
       19.6, 23.9, 24.5, 25. , 19.9, 17.2, 24.6, 13

# Preparing Data

---

## Normalizing the Data

### feature-wise normalization i.e. (feature - mean)/ standard deviation using Numpy
### Using Normalization, to prevent hetrogenous ranges feeding to a Neural Network or to feed homogenous range (each feature close to zero) to a NN.

In [0]:
mean = train_data.mean(axis = 0)   # features on axis 0
train_data -= mean
std = train_data.std(axis = 0)
train_data /= std

# Note: the quantities use for normalizing the test data, have been computed on train data (BEWARE: Never use computing on test data)

test_data -= mean
test_data /= std

# Building Network 

---

## Defining the Model 

### Example of Scalar Regression, i.e. in which a single continuous value will be predicted.

### No ACTIVATION function will be used on OUTPUT layer, because we are willing to predict a single unit, i.e. the last layer will be a linear layer, allowing the network to predict values in  any range. 

### Sigmoid is not used on last layer, because it will restrict the range (an output can take) between 0 to 1.

### MSE (Mean Squaared Error, squared difference of predictions and targets) is used as a loss function, because of the industry practices.

### MEA (Mean Absolute Error, the absolute difference value of predictions and targets i.e. a MAE of 0.5 will reflect that predictions are off by USD 500 from targets on average)



In [0]:
from keras import models
from keras import layers

# For Reusability, model is defined as a function.

def build_model():
  model = models.Sequential()
  model.add(layers.Dense(64, activation = 'relu', input_shape = (train_data.shape[1],))) # number of features (13) are used as vector in input shape
  model.add(layers.Dense(64, activation = 'relu'))
  model.add(layers.Dense(1)) # no activation function
  model.compile(optimizer = 'rmsprop', loss = 'mse', metrics = ['mae'])
  return model

# Validation Step : Using K-fold Cross-Validation Method

---

### Especially, when you have small dataset or else it will take more time to process
### In order to aviod increasing the variance between validation and training set split i.e. to allow more room for generalization



In [0]:
import numpy as np

In [0]:
from sklearn.model_selection import RepeatedKFold 

X = train_data
y = train_targets

#def RepeatedKFold():
kf = RepeatedKFold(n_splits=4, n_repeats=100, random_state=None) 

for train_index, test_index in kf.split(X):
  print("Train:", train_index, "Validation:",test_index)
  partial_train_data, val_data = X[train_index], X[test_index] 
  partial_train_targets, val_targets = y[train_index], y[test_index]

In [0]:
 # Build the Keras Models (already commpiled)
  model = build_model()
  # Train the model (in silence mode, verbose = 0)
  model.fit(partial_train_data, partial_train_targets, epochs = 100, batch_size = 1, verbose = 0)
  # Evaluate the model on the validation data
  val_mse, val_mae = model.evaluate(val_data, val_targets, verbose = 0)
  all_scores.append(val_mae)

In [81]:
all_scores

[2.0241272968821007,
 2.3239941054051467,
 2.779041604240342,
 2.360041854995312,
 2.426937988488981]

## All Scores

In [83]:
all_scores 

[2.0241272968821007,
 2.3239941054051467,
 2.779041604240342,
 2.360041854995312,
 2.426937988488981]

## Final Validation Score

In [84]:
np.mean(all_scores)  # we are still off by USD 2367 on average

2.382828570002377

## Updating the Epochs

In [0]:
from sklearn.model_selection import RepeatedKFold 

X = train_data
y = train_targets


#def RepeatedKFold():
kf = RepeatedKFold(n_splits=4, n_repeats=500, random_state=None) 

for train_index, test_index in kf.split(X):
  #print("Train:", train_index, "Validation:",test_index)
  partial_train_data, val_data = X[train_index], X[test_index] 
  partial_train_targets, val_targets = y[train_index], y[test_index]

In [0]:
 # Build the Keras Models (already commpiled)
  model = build_model()
  # Train the model (in silence mode, verbose = 0)
  model.fit(partial_train_data, partial_train_targets, epochs = 500, batch_size = 1, verbose = 0)
  # Evaluate the model on the validation data
  val_mse, val_mae = model.evaluate(val_data, val_targets, verbose = 0)
  all_scores.append(val_mae)

In [92]:
all_scores

[2.0241272968821007,
 2.3239941054051467,
 2.779041604240342,
 2.360041854995312,
 2.426937988488981,
 2.7708221803797355]

In [93]:
np.mean(all_scores) 

2.4474941717319365

# Training the final model

In [94]:
# Get a fresh compiled model.
model = build_model()

#Training on Entire Data
model.fit(train_data, train_targets, epochs = 80, batch_size = 16, verbose = 0)
test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)



# Final Result

In [95]:
test_mse_score   # We are still off $ 1797

16.81532871021944