## About the Dataset
We will be working with a dataset from the real estate industry in Boston (US). It contains 14 attributes: 13 input features and one output variable, MEDV, the median value of owner-occupied homes in thousands of USD.

* CRIM: per capita crime rate by town
* ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
* INDUS: proportion of non-retail business acres per town
* CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
* NOX: nitric oxides concentration (parts per 10 million)
* RM: average number of rooms per dwelling
* AGE: proportion of owner-occupied units built prior to 1940
* DIS: weighted distances to five Boston employment centres
* RAD: index of accessibility to radial highways
* TAX: full-value property-tax rate per 10,000 USD
* PTRATIO: pupil-teacher ratio by town
* B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
* LSTAT: percentage of the population with lower socioeconomic status
* MEDV: median value of owner-occupied homes in thousands of USD (output/target)


```python
# importing packages
import numpy as np               # to perform calculations
import pandas as pd              # to read data
import matplotlib.pyplot as plt  # to visualise
```

### Loading Data

```python
# read_csv() accepts a URL, so we pass the location of the CSV file hosted on GitHub
boston_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Training_set_boston.csv")
```

## View Data

```python
boston_data.head()    # show the first five rows
```

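|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15.0234 | 0.0 | 18.1 | 0.0 | 0.614 | 5.304 | 97.3 | 2.1007 | 24.0 | 666.0 | 20.2 | 349.48 | 24.91 | 12.0 |
| 1 | 0.62739 | 0.0 | 8.14 | 0.0 | 0.538 | 5.834 | 56.5 | 4.4986 | 4.0 | 307.0 | 21.0 | 395.62 | 8.47 | 19.9 |
| 2 | 0.03466 | 35.0 | 6.06 | 0.0 | 0.4379 | 6.031 | 23.3 | 6.6407 | 1.0 | 304.0 | 16.9 | 362.25 | 7.83 | 19.4 |
| 3 | 7.05042 | 0.0 | 18.1 | 0.0 | 0.614 | 6.103 | 85.1 | 2.0218 | 24.0 | 666.0 | 20.2 | 2.52 | 23.29 | 13.4 |
| 4 | 0.7258 | 0.0 | 8.14 | 0.0 | 0.538 | 5.727 | 69.5 | 3.7965 | 4.0 | 307.0 | 21.0 | 390.95 | 11.28 | 18.2 |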


### Separating Input Features and Output Features

```python
X = boston_data.drop('MEDV', axis=1)    # input variables/features
y = boston_data.MEDV                    # output variable (target)
```

### Splitting the Data

```python
# import train_test_split
from sklearn.model_selection import train_test_split

# split the data and capture the four outputs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# X_train: independent/input feature data for training the model
# y_train: dependent/output feature data for training the model
# X_test: independent/input feature data for testing the model; used to predict the output values
# y_test: true dependent/output values for X_test; we compare them with our predictions to measure the model's performance

# test_size=0.2: 20% of the rows go to the test set and the remaining 80% to the training set
# random_state=42: fixes the shuffling, so the same split is produced every time the code runs
```
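To make the 80/20 split concrete, here is a simplified sketch of a shuffled train/test split in NumPy. The helper `shuffled_split_indices` and the 100-row input are hypothetical; scikit-learn also rounds the test share up, but this is not its actual implementation, so the exact rows chosen will differ.

```python
import math
import numpy as np

def shuffled_split_indices(n_samples, test_size=0.2, random_state=42):
    """Hypothetical helper: return (train_idx, test_idx) for a shuffled split."""
    n_test = math.ceil(test_size * n_samples)   # the test share is rounded up
    n_train = n_samples - n_test                # every remaining row goes to training
    rng = np.random.RandomState(random_state)   # fixed seed -> identical permutation every run
    permutation = rng.permutation(n_samples)
    return permutation[:n_train], permutation[n_train:]

train_idx, test_idx = shuffled_split_indices(100)
print(len(train_idx), len(test_idx))   # 80 20
```

Calling the helper again with the same `random_state` returns exactly the same indices, which is what `random_state=42` buys in `train_test_split`.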

```python
# find the number of input features
n_features = X.shape[1]
print(n_features)
```

13


## Training our Model


### 1. Define the model

```python
from tensorflow.keras import Sequential     # import Sequential from tensorflow.keras
from tensorflow.keras.layers import Dense   # import Dense from tensorflow.keras.layers
from numpy.random import seed               # seeds NumPy's random number generator (used later for reproducibility)
import tensorflow
```

```python
# define the model: 13 inputs -> 10 ReLU units -> 8 ReLU units -> 1 linear output (predicted MEDV)
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
```
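The network above has 13 inputs, two hidden ReLU layers and one linear output. The NumPy sketch below reproduces that forward pass and counts the trainable parameters (a weight matrix plus a bias vector per `Dense` layer), a total that should match what `model.summary()` reports. The weights are hypothetical random values, not Keras's own initialization.

```python
import numpy as np

n_inputs = 13                         # number of input features, as printed above
layer_sizes = [n_inputs, 10, 8, 1]    # input -> Dense(10) -> Dense(8) -> Dense(1)

# Hypothetical random weights for illustration; Keras initializes Dense kernels differently
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    """Map a batch of shape (batch, n_inputs) to predictions of shape (batch, 1)."""
    h = relu(x @ weights[0] + biases[0])   # Dense(10, activation='relu')
    h = relu(h @ weights[1] + biases[1])   # Dense(8, activation='relu')
    return h @ weights[2] + biases[2]      # Dense(1): no activation, so the output is linear

n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(n_params)                                        # 237 = (13*10 + 10) + (10*8 + 8) + (8*1 + 1)
print(forward(rng.normal(size=(5, n_inputs))).shape)   # (5, 1)
```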

### 2. Compile the model

```python
# import the RMSprop optimizer
from tensorflow.keras.optimizers import RMSprop
optimizer = RMSprop(learning_rate=0.01)    # 0.01 is the learning rate
```
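RMSprop divides each gradient step by a running root-mean-square of recent gradients. The sketch below applies that update to a single parameter on the toy function f(w) = (w - 3)^2. The helper `rmsprop_minimize` is hypothetical, `rho=0.9` and `epsilon=1e-7` are assumed to match the Keras defaults, and the centred and momentum variants are left out.

```python
import numpy as np

def rmsprop_minimize(grad, w0, learning_rate=0.01, rho=0.9, epsilon=1e-7, steps=2000):
    """Hypothetical helper: minimize a one-parameter function with plain RMSprop."""
    w, v = float(w0), 0.0
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1.0 - rho) * g ** 2               # running average of squared gradients
        w -= learning_rate * g / (np.sqrt(v) + epsilon)  # step divided by the recent RMS gradient
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3
w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```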

```python
model.compile(loss='mean_squared_error', optimizer=optimizer)    # compile the model
```
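With `loss='mean_squared_error'`, training minimizes the average squared difference between true and predicted MEDV, and `model.evaluate` returns that same quantity on the data it is given. A minimal NumPy version, using the MEDV values of the first three rows shown by `head()` and hypothetical predictions:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [12.0, 19.9, 19.4]   # MEDV of rows 0-2 from head()
y_pred = [14.0, 18.9, 19.4]   # hypothetical predictions: errors of -2, 1 and 0
print(mean_squared_error(y_true, y_pred))   # about 1.667, i.e. (4 + 1 + 0) / 3
```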

### 3. Fitting the model

```python
seed_value = 42

# Fix every source of randomness so that repeated runs give the same result (recommended by Keras).
# Note: the model's weights were initialized when the model was defined above, before these seeds
# were set; to make the initial weights reproducible too, set these seeds before defining the model.

# 1. Set the `PYTHONHASHSEED` environment variable to a fixed value
#    (it only fully takes effect if set before the Python interpreter starts)
import os
os.environ['PYTHONHASHSEED'] = str(seed_value)

# 2. Set the `python` built-in pseudo-random generator to a fixed value
import random
random.seed(seed_value)

# 3. Set the `numpy` pseudo-random generator to a fixed value
seed(seed_value)    # numpy.random.seed, imported earlier

# 4. Set the `tensorflow` pseudo-random generator to a fixed value
tensorflow.random.set_seed(seed_value)

# fit the model: 10 passes over the training data, 30 examples per weight update
model.fit(X_train, y_train, epochs=10, batch_size=30, verbose=1)
```

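Seeding works because a pseudo-random generator restarted from the same seed produces the same sequence. A small demonstration with Python's `random` and NumPy (the helper `draw` is hypothetical); TensorFlow's `set_seed` is meant to play the same role for TensorFlow's random operations.

```python
import random
import numpy as np

def draw(seed_value):
    """Hypothetical helper: seed Python's and NumPy's generators, then draw three numbers from each."""
    random.seed(seed_value)
    np.random.seed(seed_value)
    return [random.random() for _ in range(3)], np.random.rand(3)

py_a, np_a = draw(42)
py_b, np_b = draw(42)   # same seed -> same sequence
py_c, np_c = draw(7)    # different seed -> different sequence

print(py_a == py_b, np.array_equal(np_a, np_b))   # True True
print(py_a == py_c)                               # False
```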

### 4. Evaluate the model

```python
model.evaluate(X_test, y_test)    # returns the loss (MSE) on the test set
```



64.87602233886719
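Because the MSE is in squared units, its square root (the RMSE) puts the error back on the scale of MEDV. Applied to the test MSE reported above, it suggests typical prediction errors of roughly 8 thousand USD for this particular run.

```python
import math

test_mse = 64.87602233886719       # value returned by model.evaluate above
test_rmse = math.sqrt(test_mse)    # same units as MEDV (thousands of USD)
print(round(test_rmse, 2))         # 8.05
```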

#### Hyperparameter Tuning
The hyperparameters tuned in this notebook are:
1. Learning Rate
2. Epochs
3. Batch Size

Try changing the values of these hyperparameters and compare the model's performance (the MSE returned by `model.evaluate`) on the test data, `X_test` and `y_test`.
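One caveat: choosing hyperparameters by their score on the test set lets information from the test set leak into the model choice, so the usual practice is to compare settings on a separate validation split and keep the test set for a final check (`model.fit` accepts a `validation_split` argument for this). The sketch below illustrates the idea with a toy linear model trained by mini-batch gradient descent on synthetic data; the data, the grid of values and the `train_and_evaluate` helper are all hypothetical stand-ins for the Keras network.

```python
import itertools
import numpy as np

# Hypothetical synthetic data (not the housing data): ys = 3*xs + 2 + small noise
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=200)
ys = 3.0 * xs + 2.0 + rng.normal(0.0, 0.1, size=200)

# Hold out the last 40 rows as a validation set for comparing hyperparameters
xs_train, ys_train = xs[:160], ys[:160]
xs_val, ys_val = xs[160:], ys[160:]

def train_and_evaluate(learning_rate, epochs, batch_size):
    """Hypothetical helper: fit ys ~ w*xs + b by mini-batch gradient descent, return validation MSE."""
    w, b = 0.0, 0.0
    shuffle_rng = np.random.default_rng(42)              # fixed seed so every run shuffles identically
    for _ in range(epochs):
        order = shuffle_rng.permutation(len(xs_train))   # reshuffle the training rows each epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]        # one mini-batch (the last may be smaller)
            err = w * xs_train[idx] + b - ys_train[idx]
            w -= learning_rate * 2.0 * np.mean(err * xs_train[idx])   # gradient of the batch MSE w.r.t. w
            b -= learning_rate * 2.0 * np.mean(err)                   # gradient of the batch MSE w.r.t. b
    return float(np.mean((w * xs_val + b - ys_val) ** 2))

# Try every combination of a small grid and keep the one with the lowest validation MSE
results = {
    (lr, ep, bs): train_and_evaluate(lr, ep, bs)
    for lr, ep, bs in itertools.product([0.01, 0.1], [10, 50], [16, 64])
}
best = min(results, key=results.get)
print("best (learning_rate, epochs, batch_size):", best)
print("validation MSE:", round(results[best], 4))
```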

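**Learning Rate**

The factor that scales each weight update made by the optimizer: a larger learning rate takes bigger steps, a smaller one takes smaller steps.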
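To see why the learning rate matters, consider plain gradient descent on the toy function f(w) = w^2, where each step multiplies w by (1 - 2 * learning_rate). A very small rate converges slowly, a moderate one converges quickly, and a rate that is too large overshoots and diverges. The helper `gradient_descent` is hypothetical and only illustrates the effect; it is not the Keras model.

```python
def gradient_descent(learning_rate, steps=20, w0=5.0):
    """Hypothetical helper: minimize f(w) = w**2 (gradient 2*w) with fixed-step gradient descent."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * 2.0 * w   # each step multiplies w by (1 - 2 * learning_rate)
    return w

for lr in (0.01, 0.1, 0.4, 1.1):
    # distance from the minimum at w = 0 after 20 steps
    print(lr, abs(gradient_descent(lr)))
```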

```python
####################### Complete example to check the performance of the model with different learning rates #######################################
# define the model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))

optimizer = RMSprop(learning_rate=0.1)    # 0.1 is the learning rate
model.compile(loss='mean_squared_error', optimizer=optimizer)    # compile the model

# fit the model
model.fit(X_train, y_train, epochs=10, batch_size=30, verbose=1)

# evaluate the model
print('The MSE value is: ', model.evaluate(X_test, y_test))
```

The MSE value is:  122.82394409179688


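```python
# Play with the learning rate
learning_rate = 0.001     # example value: replace with any floating-point number (e.g. 0.005, 0.05)
epochs = 10
# rebuild the model so it starts from fresh weights rather than the ones trained above
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
optimizer = RMSprop(learning_rate)
model.compile(loss='mean_squared_error', optimizer=optimizer)    # compile the model
model.fit(X_train, y_train, epochs=epochs, batch_size=30)       # fit the model
model.evaluate(X_test, y_test)                                  # evaluate the model
```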

**Epochs**

One full training pass over the training data, in which every example is seen once. An epoch therefore consists of N / batch size training iterations (weight updates), rounded up when N is not a multiple of the batch size, where N is the number of training examples.
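The iteration count per epoch can be computed directly. A sketch with a hypothetical helper `steps_per_epoch` and a hypothetical training-set size of 323 rows (the actual size of `X_train` may differ):

```python
import math

def steps_per_epoch(n_examples, batch_size):
    """Hypothetical helper: weight updates in one epoch; the last batch may be smaller."""
    return math.ceil(n_examples / batch_size)

n_train = 323                                # hypothetical number of training rows
print(steps_per_epoch(n_train, 30))          # 11 (10 full batches of 30 + 1 batch of 23)
print(steps_per_epoch(n_train, 30) * 10)     # 110 weight updates over 10 epochs
```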

```python
####################### Complete example to check the performance of the model with different epochs and learning rate = 0.1 #######################################
# define the model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))

optimizer = RMSprop(learning_rate=0.1)    # 0.1 is the learning rate
model.compile(loss='mean_squared_error', optimizer=optimizer)    # compile the model

# fit the model
model.fit(X_train, y_train, epochs=100, batch_size=30, verbose=1)

# evaluate the model
print('The MSE value is: ', model.evaluate(X_test, y_test))
```


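```python
# Play with epochs
learning_rate = 0.01
epochs = 50               # example value: replace with any integer
# rebuild the model so it starts from fresh weights rather than the ones trained above
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
optimizer = RMSprop(learning_rate)
model.compile(loss='mean_squared_error', optimizer=optimizer)    # compile the model
model.fit(X_train, y_train, epochs=epochs, batch_size=30)       # fit the model
model.evaluate(X_test, y_test)                                  # evaluate the model
```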

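```python
# Play with the learning rate and epochs
learning_rate = 0.005     # example value: replace with any floating-point number
epochs = 50               # example value: replace with any integer
# rebuild the model so it starts from fresh weights rather than the ones trained above
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
optimizer = RMSprop(learning_rate)
model.compile(loss='mean_squared_error', optimizer=optimizer)    # compile the model
model.fit(X_train, y_train, epochs=epochs, batch_size=30)       # fit the model
model.evaluate(X_test, y_test)                                  # evaluate the model
```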

**Batch Size**

The number of training examples processed together before each weight update (one training iteration).
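A sketch of how a training set is cut into shuffled mini-batches for one epoch; the helper `iterate_minibatches` and the 10-example input are hypothetical. With 10 examples and a batch size of 4, the last batch holds the 2 leftover examples.

```python
import numpy as np

def iterate_minibatches(n_examples, batch_size, seed=42):
    """Hypothetical helper: yield index arrays for one epoch of shuffled mini-batches."""
    order = np.random.default_rng(seed).permutation(n_examples)   # shuffle once per epoch
    for start in range(0, n_examples, batch_size):
        yield order[start:start + batch_size]                      # the last slice may be shorter

sizes = [len(batch) for batch in iterate_minibatches(10, batch_size=4)]
print(sizes)   # [4, 4, 2]
```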

```python
####################### Complete example to check the performance of the model with a different batch size while keeping epochs as 10 and learning rate as 0.1 #######################################
# define the model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))

optimizer = RMSprop(learning_rate=0.1)    # 0.1 is the learning rate
model.compile(loss='mean_squared_error', optimizer=optimizer)    # compile the model

# fit the model
model.fit(X_train, y_train, epochs=10, batch_size=40, verbose=1)

# evaluate the model
print('The MSE value is: ', model.evaluate(X_test, y_test))
```

The MSE value is:  441.9634094238281


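```python
# Play with the batch size
learning_rate = 0.01
epochs = 150
batch = 16                # example value: replace with any integer
# rebuild the model so it starts from fresh weights rather than the ones trained above
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
optimizer = RMSprop(learning_rate)
model.compile(loss='mean_squared_error', optimizer=optimizer)    # compile the model
model.fit(X_train, y_train, epochs=epochs, batch_size=batch)    # fit the model
model.evaluate(X_test, y_test)                                  # evaluate the model
```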

### 5. Make a Prediction

```python
# Load new test data
new_test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Testing_set_boston.csv')
```

```python
# make a prediction: one predicted MEDV value (in thousands of USD) per row of the new data
model.predict(new_test_data)
```
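`model.predict` returns a 2-D array with one row per house and one column (the single output unit), with values in thousands of USD. A sketch of flattening such an array and converting it to USD, with hypothetical predicted values standing in for the real output:

```python
import numpy as np

# Hypothetical output of model.predict: shape (n_houses, 1)
predictions = np.array([[21.4], [15.8], [33.1]])

medv_thousands = predictions.ravel()    # flatten shape (n, 1) -> (n,)
medv_usd = medv_thousands * 1000        # MEDV is expressed in thousands of USD
print(medv_thousands.shape)
print(medv_usd)
```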