<i>In this part, you will implement linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to ﬁrst collect information on recent houses sold and make a model of housing prices. The ﬁle ex1data2.txt contains a training set of housing prices in Portland, Oregon. The ﬁrst column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house. </i>

<b>STEP 1 : Feature Normalization</b>

We will start by loading and displaying some values from this dataset. By looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features diﬀer by orders of magnitude, ﬁrst performing feature scaling can make gradient descent converge much more quickly.


In [43]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

col_names = ['House Size', 'No. of Bedrooms', 'House Price']
df = pd.read_csv('ex1data2.txt', names = col_names)

In [44]:
df.head(5)

Unnamed: 0,House Size,No. of Bedrooms,House Price
0,2104,3,399900
1,1600,3,329900
2,2400,3,369000
3,1416,2,232000
4,3000,4,539900


In [45]:
def normalize_features(X):
    X_copy = X.copy(deep=True)
    X_copy = (X_copy - X_copy.mean())/X_copy.std()
    return X_copy

In [46]:
# we store this value to later normalize the value we make prediction for
houseSize_mean, bedroom_mean, price_mean = df.mean()
houseSize_sd, bedroom_sd, price_sd = df.std()

In [51]:
def compute_cost(X, Y, theta) :
    hypothesis_matrix = np.dot(X.T,theta)
    cost = np.sum(np.square(np.subtract(hypothesis_matrix, Y)), axis=0)
    cost = cost / (2 * len(X))
    print('Cost value is ' + str(cost))

In [55]:
def gradient_descent(X, theta, Y, num_of_iter, alpha):
    m = len(X)
    alpha = alpha / m
    temp_theta = np.matrix.copy(theta)
    for i in range(num_of_iter):
        #compute_cost(X, Y, theta)
        hypothesis_matrix = np.dot(X, theta)
        for j in range(len(theta)):
            value = (alpha * np.sum(np.multiply(np.subtract(hypothesis_matrix, Y), np.row_stack(X[:, j]))))
            #print(value)
            theta[j][0] = temp_theta[j][0] - value
        temp_theta = np.matrix.copy(theta)
        '''if i == 100 :
            print(temp_theta)'''
    return theta

In [56]:
def predict(theta):
    size = input('Enter size of house : ')
    no_of_bedrooms = input('Enter no of bedrooms : ')
    size_norm = (size - houseSize_mean)/ houseSize_sd
    no_of_bedrooms = (no_of_bedrooms - bedroom_mean) / bedroom_sd
    input_matrix = np.column_stack([1, float(size), float(no_of_bedrooms)])
    price = np.dot(input_matrix,theta)
    return price


def run():
    names = ['size', 'no_of_bedrooms', 'price']
    dataset = pd.read_csv('ex1data2.txt', names=names)
    dataset_norm = normalize_features(dataset)
    array = dataset_norm.values

    # initialize values
    x = array[:, 0:2]  
    x = np.column_stack([np.ones(len(x)), x])
    y = array[:, 2]
    y = np.row_stack(y)

    n = len(x[0]) # number of features
    theta = np.zeros(shape=(n, 1))

    # Gradient Descent
    GDtheta = gradient_descent(x, theta, y, 400, 0.01)
    print("theta from gradient descent: \n", GDtheta)

    profit = predict(GDtheta)
    print("profit predicted by gradient descent: ", profit)


if __name__ == '__main__':
    run()

('theta from gradient descent: \n', array([[-1.23287905e-16],
       [ 8.00441430e-01],
       [ 2.93790099e-02]]))
Enter size of house : 2104
Enter no of bedrooms : 3
('profit predicted by gradient descent: ', array([[1684.12219692]]))
