# Basics of Linear Regression - House prices

In this project, I aim to predict house prices from features such as size, number of bedrooms, number of floors and age. I will first write my own functions to implement the gradient descent algorithm using NumPy, and then cross-check the fitted parameters against scikit-learn.

In [77]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

In [78]:
houses = pd.read_csv("houses.txt",header=None)
houses.columns = ['Size (sqft)','Bedrooms','Floors','Age','Price (1000USD)']
print(houses.shape)
houses.head()

(100, 5)


|   | Size (sqft) | Bedrooms | Floors | Age | Price (1000USD) |
|---|---|---|---|---|---|
| 0 | 952.0 | 2.0 | 1.0 | 65.0 | 271.5 |
| 1 | 1244.0 | 3.0 | 1.0 | 64.0 | 300.0 |
| 2 | 1947.0 | 3.0 | 2.0 | 17.0 | 509.8 |
| 3 | 1725.0 | 3.0 | 2.0 | 42.0 | 394.0 |
| 4 | 1959.0 | 3.0 | 2.0 | 15.0 | 540.0 |


In [79]:
X = np.array(houses[['Size (sqft)','Bedrooms','Floors','Age']])
y = np.array(houses['Price (1000USD)'])

## The cost function

We start by defining the cost function $J(\mathbf{w},b)$, which is given by:

$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 $$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b $$
and $m$ is the number of observations

In [80]:
def compute_cost(X, y, w, b):
    # X is an ndarray (m, n), y is an ndarray (m,), w is an ndarray (n,) and b is a scalar
    m = X.shape[0]
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b
        cost = cost + (f_wb_i - y[i])**2
    cost = cost / (2 * m)    
    return cost
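
The loop above follows the formula term by term. As a hedged sketch, the same cost can also be written with matrix operations, which is usually faster for larger datasets. The function name `compute_cost_vectorized` and the small demo arrays are assumptions made for illustration; they are not part of the original notebook or of `houses.txt`.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    """Squared-error cost J(w, b) without an explicit Python loop."""
    m = X.shape[0]
    errors = X @ w + b - y              # f_wb(x_i) - y_i for every row, shape (m,)
    return np.dot(errors, errors) / (2 * m)

# Made-up data, used only to exercise the function
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_demo = np.array([1.0, 2.0, 3.0])
w_demo = np.array([0.5, -0.5])
b_demo = 1.0

cost_demo = compute_cost_vectorized(X_demo, y_demo, w_demo, b_demo)
print(cost_demo)
```

Here the errors are $[-0.5, -1.5, -2.5]$, so the cost is $8.75 / 6$.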

## Gradient Descent

Now we can minimise this function to find optimal parameters $\mathbf{w}$ and $b$, starting with some initial values. First we must compute the partial derivatives:

$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)})- y^{(i)})x^{(i)}_j$$
$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)})- y^{(i)})$$

for $j=0,...,n-1$, where $n$ is the number of features.

Using the partials, we can repeat the following until convergence:

$$ w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j}$$
$$ b= b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}$$

where $\alpha$ is a constant determining step size

In [81]:
def compute_gradient(X, y, w, b): 

    m,n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw
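
The two partial derivatives can also be computed in one pass with matrix products. The sketch below is one possible vectorised form, checked against a finite-difference approximation of the cost. The names `compute_gradient_vectorized` and `cost`, and the demo data, are hypothetical and only serve the illustration.

```python
import numpy as np

def cost(X, y, w, b):
    """Squared-error cost, used here only for the numerical check."""
    errors = X @ w + b - y
    return np.dot(errors, errors) / (2 * X.shape[0])

def compute_gradient_vectorized(X, y, w, b):
    """Return (dJ/db, dJ/dw) using matrix products instead of nested loops."""
    m = X.shape[0]
    errors = X @ w + b - y          # shape (m,)
    dj_dw = X.T @ errors / m        # shape (n,)
    dj_db = errors.sum() / m        # scalar
    return dj_db, dj_dw

# Made-up data, used only to exercise the function
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_demo = np.array([1.0, 2.0, 3.0])
w_demo = np.array([0.5, -0.5])
b_demo = 1.0

dj_db, dj_dw = compute_gradient_vectorized(X_demo, y_demo, w_demo, b_demo)

# Central finite difference for dJ/db as a sanity check
eps = 1e-6
dj_db_numeric = (cost(X_demo, y_demo, w_demo, b_demo + eps)
                 - cost(X_demo, y_demo, w_demo, b_demo - eps)) / (2 * eps)
print(dj_db, dj_db_numeric, dj_dw)
```

A finite-difference check like this is a cheap way to catch sign or indexing mistakes before running gradient descent.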

In [44]:
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    # The cost and gradient functions are passed in as arguments; num_iters sets the number of iterations to run
    
    w = w_in
    b = b_in
    J = []
    
    for i in range(num_iters):

        # Calculate the gradient at the current parameters
        dj_db,dj_dw = gradient_function(X, y, w, b)

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        
        # Save cost at each iteration
        J.append( cost_function(X, y, w, b))

        # Print the cost every 100 iterations
        if i % 100 == 0:
            print(f"Iteration {i:4d}: Cost {J[-1]:8.2f}   ")
        
    return w, b

## Initial Implementation
We can now run gradient descent on our training data to estimate the parameters. We should also check that the cost keeps decreasing as the number of iterations grows.

In [82]:
# initialize parameters, iterations and learning rate
initial_w = np.zeros(4,)
initial_b = 0.
iterations = 1000
alpha = 1.0e-7
# run gradient descent 
w_final, b_final = gradient_descent(X, y, initial_w, initial_b,
                                                    compute_cost, compute_gradient, 
                                                    alpha, iterations)
print(f"parameters found by gradient descent: b = {b_final:0.2f}, w = {w_final} ")

Iteration    0: Cost 44154.43   
Iteration  100: Cost  1565.13   
Iteration  200: Cost  1560.99   
Iteration  300: Cost  1556.92   
Iteration  400: Cost  1552.92   
Iteration  500: Cost  1549.00   
Iteration  600: Cost  1545.15   
Iteration  700: Cost  1541.36   
Iteration  800: Cost  1537.65   
Iteration  900: Cost  1534.01   
parameters found by gradient descent: b = 0.00, w = [ 2.54443890e-01 -1.51872883e-04 -5.74533712e-04 -5.63067399e-02] 
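
The log above only shows the cost every 100 iterations. One way to check that the cost never increases is to keep the full cost history and inspect the differences between consecutive values. The sketch below does this on synthetic data; `gradient_descent_with_history`, the random data and the tolerance are all assumptions for illustration, not the notebook's own function or dataset.

```python
import numpy as np

def gradient_descent_with_history(X, y, alpha, num_iters):
    """Batch gradient descent that also returns the cost after every update."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    history = []
    for _ in range(num_iters):
        errors = X @ w + b - y
        w = w - alpha * (X.T @ errors) / m
        b = b - alpha * errors.sum() / m
        new_errors = X @ w + b - y
        history.append(np.dot(new_errors, new_errors) / (2 * m))
    return w, b, np.array(history)

# Synthetic, noise-free data with known parameters (not from houses.txt)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = X_demo @ np.array([3.0, -2.0]) + 1.0

w_demo, b_demo, history = gradient_descent_with_history(
    X_demo, y_demo, alpha=0.1, num_iters=200)

# The cost should be non-increasing, up to a small floating-point tolerance
is_decreasing = bool(np.all(np.diff(history) <= 1e-12))
print(is_decreasing, w_demo, b_demo)
```

If the check fails, the learning rate is usually too large for the scale of the features.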


## Z-score normalisation

The features span very different ranges (size is in the thousands, while bedrooms and floors are single digits), which forces a tiny learning rate and slow convergence. To speed up gradient descent, we can apply z-score normalisation to each feature using this formula:
$$x^{(i)}_j = \dfrac{x^{(i)}_j - \mu_j}{\sigma_j}$$
where $j$ indexes the features, and $\mu_j$ and $\sigma_j$ are the mean and the standard deviation of feature $j$ respectively.

In [49]:
def ZSNorm(X):
    
    mu     = np.mean(X, axis=0)
    sigma  = np.std(X, axis=0)     
    
    X_norm = (X - mu) / sigma      

    return (X_norm, mu, sigma)
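
As a quick sanity check, each normalised column should have mean 0 and standard deviation 1. The sketch below verifies this on the size and bedroom columns of the five rows printed by `houses.head()`. The function name `zscore_normalize` and the new example `x_new` are assumptions for illustration.

```python
import numpy as np

def zscore_normalize(X):
    """Column-wise z-score normalisation; returns the scaled data, means and standard deviations."""
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)       # population standard deviation (ddof=0)
    return (X - mu) / sigma, mu, sigma

# Size and bedrooms from the five rows shown by houses.head()
X_demo = np.array([[952.0, 2.0],
                   [1244.0, 3.0],
                   [1947.0, 3.0],
                   [1725.0, 3.0],
                   [1959.0, 3.0]])

X_demo_norm, mu_demo, sigma_demo = zscore_normalize(X_demo)
print(X_demo_norm.mean(axis=0))     # approximately 0 for every column
print(X_demo_norm.std(axis=0))      # approximately 1 for every column

# A new example (hypothetical) must be scaled with the same mu and sigma
x_new = np.array([1340.0, 4.0])
x_new_norm = (x_new - mu_demo) / sigma_demo
print(x_new_norm)
```

Note that a feature with zero variance would cause a division by zero here, so constant columns need to be dropped or handled separately.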

In [72]:
X_ZSNorm, X_mu, X_sigma = ZSNorm(X)
initial_w = np.zeros(4,)
initial_b = 0.
iterations = 1000
alpha = 1.0e-1 # we are able to use a much larger learning rate after normalisation
w_norm, b_norm = gradient_descent(X_ZSNorm, y, initial_w, initial_b,
                                                    compute_cost, compute_gradient, 
                                                    alpha, iterations)
print(f"parameters found by gradient descent after normalisation: b = {b_norm:0.2f}, w = {w_norm} ")

Iteration    0: Cost 57326.42   
Iteration  100: Cost   221.73   
Iteration  200: Cost   219.71   
Iteration  300: Cost   219.71   
Iteration  400: Cost   219.71   
Iteration  500: Cost   219.71   
Iteration  600: Cost   219.71   
Iteration  700: Cost   219.71   
Iteration  800: Cost   219.71   
Iteration  900: Cost   219.71   
parameters found by gradient descent after normalisation: b = 362.24, w = [110.61335173 -21.47323884 -32.66070323 -37.77938362] 


After normalisation we can use a much larger learning rate, and the algorithm converges by around the 200th iteration. The final cost is 219.71, compared with 1534.01 at iteration 900 of the unnormalised run, which was still decreasing slowly.

## Predictions
We can now use the model trained on this dataset to predict the price of a house from its features. Suppose we want to consider a house with an area of 1340 sqft, 4 bedrooms and 2 floors that is 16 years old.

In [60]:
# First, normalize our example with the training mean and standard deviation
x_house = np.array([1340, 4, 2, 16])
x_house_norm = (x_house - X_mu) / X_sigma
x_house_predict = np.dot(x_house_norm, w_norm) + b_norm
print(f" predicted price of a house with 1340 sqft, 4 bedrooms, 2 floors, 16 years old = ${x_house_predict*1000:0.0f}")

 predicted price of a house with 1340 sqft, 4 bedrooms, 2 floors, 16 years old = $291480
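
The weights learned on normalised features can also be mapped back to the original units, so that predictions work directly on raw inputs. Substituting $x_j \to (x_j - \mu_j)/\sigma_j$ into the model gives $w^{raw}_j = w_j / \sigma_j$ and $b^{raw} = b - \sum_j w_j \mu_j / \sigma_j$. The sketch below illustrates this with the rounded parameters printed earlier; the values of `mu` and `sigma` are placeholders chosen for illustration, because the notebook does not print `X_mu` and `X_sigma`, so the resulting price is not expected to match the prediction above.

```python
import numpy as np

# Rounded parameters from the normalised gradient descent run
w_norm = np.array([110.61, -21.47, -32.66, -37.78])
b_norm = 362.24

# Placeholder statistics (assumed, NOT the real X_mu and X_sigma)
mu = np.array([1400.0, 2.7, 1.4, 38.0])
sigma = np.array([410.0, 0.65, 0.49, 26.0])

# Convert the parameters back to the original feature scale
w_raw = w_norm / sigma
b_raw = b_norm - np.dot(w_norm, mu / sigma)

x_house = np.array([1340.0, 4.0, 2.0, 16.0])
pred_from_norm = np.dot((x_house - mu) / sigma, w_norm) + b_norm
pred_from_raw = np.dot(x_house, w_raw) + b_raw
print(pred_from_norm, pred_from_raw)    # the two predictions agree
```

The raw-scale weights are also easier to interpret: `w_raw[0]` is the change in price (in thousands of dollars) per additional square foot.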


## Scikit Learn

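Finally, we use scikit-learn to normalise the features with `StandardScaler` and fit the model with `SGDRegressor`. `SGDRegressor` uses stochastic gradient descent rather than the batch updates implemented above, so its parameters should be close to ours but not identical.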

In [76]:
scaler = StandardScaler()
X_norm = scaler.fit_transform(X)
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y)
b_sk = sgdr.intercept_
w_sk = sgdr.coef_
print(f"scikit learn model parameters: w = {w_sk}, b = {b_sk}")
print(f"previous model parameters: w = {w_norm}, b = {b_norm}")

scikit learn model parameters: w = [110.15324212 -21.2043253  -32.37601808 -37.84995977], b = [362.22575239]
previous model parameters: w = [110.61335173 -21.47323884 -32.66070323 -37.77938362], b = 362.2395199999998
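
Both sets of parameters approximate the exact least-squares solution, which is also what the imported but unused `LinearRegression` would compute. As a hedged sketch of that reference point, the code below solves the least-squares problem directly with `np.linalg.lstsq` and compares it with batch gradient descent. It runs on synthetic data with made-up true parameters, not on `houses.txt`, and all variable names are assumptions.

```python
import numpy as np

# Synthetic, already-standardised features with made-up true parameters
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 4))
true_w = np.array([110.0, -21.0, -33.0, -38.0])
y_demo = X_demo @ true_w + 362.0 + rng.normal(scale=5.0, size=100)

# Exact least-squares solution: append a column of ones for the intercept
A = np.column_stack([X_demo, np.ones(len(X_demo))])
solution, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
w_exact, b_exact = solution[:-1], solution[-1]

# Batch gradient descent on the same data
w_gd = np.zeros(4)
b_gd = 0.0
alpha = 0.1
for _ in range(1000):
    errors = X_demo @ w_gd + b_gd - y_demo
    w_gd = w_gd - alpha * (X_demo.T @ errors) / len(y_demo)
    b_gd = b_gd - alpha * errors.mean()

print(w_exact, b_exact)
print(w_gd, b_gd)
```

On well-scaled features, batch gradient descent with enough iterations lands on the same parameters as the closed-form solution, while a stochastic method typically stops a small distance away.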
