## 3.2 Gradient Descent

Loss function (mean squared error): $$J(w) = \frac{1}{n} \sum \limits_{i=1}^{n} \left(y^{(i)}-\hat{y}^{(i)}\right)^{2}$$
where $n$ is the number of samples, $y^{(i)}$ is the observed value of sample $i$, and $\hat{y}^{(i)}$ is the corresponding predicted value.
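
The gradient used in the code cells of this section follows from writing the mean squared error in matrix form. A short derivation sketch, assuming predictions $\hat{y} = Xw$, with $X$ the design matrix (including the intercept column) and $\alpha$ the learning rate (a symbol introduced here for illustration):

$$J(w) = \frac{1}{n}(y - Xw)^{T}(y - Xw)$$

$$\nabla_{w} J(w) = -\frac{2}{n} X^{T}(y - Xw)$$

$$w \leftarrow w - \alpha \, \nabla_{w} J(w)$$

Each iteration moves $w$ a small step against the gradient, which lowers $J(w)$ as long as $\alpha$ is small enough.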

In [1]:
import pickle
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [2]:
## load data
def load_data():
    df = pd.read_csv('normalized.txt', header=None)
    df.columns = ['area', 'bdrm', 'price']
    df['intercept'] = 1  # constant column so that w[0] acts as the bias term
    X = df[['intercept', 'area', 'bdrm']].values
    y = df['price'].values
    return X, y

In [3]:
def compute_error(pred, truth):
    # compute the mean squared error between predictions and ground truth;
    # for (n, 1) column vectors the result is a (1, 1) array
    residual = pred - truth
    return (residual.T @ residual) / len(truth)

In [4]:
def compute_grad(X, y, w):
    # compute the gradient of the MSE with respect to w:
    #   grad = -(2/n) * X^T (y - Xw)
    # y and w must be column vectors of shape (n, 1) and (d, 1)
    n = X.shape[0]
    gradient = -(2 / n) * X.T @ (y - X @ w)
    
    return gradient
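
One way to gain confidence in an analytic gradient is to compare it with a central finite-difference estimate. The sketch below is an illustration only: it redefines `compute_grad` so that it runs on its own, and the helpers `mse` and `numerical_grad` as well as the random `*_demo` arrays are assumptions made for this check, not part of the assignment data.

```python
import numpy as np

def compute_grad(X, y, w):
    # Same formula as the cell above: gradient of the mean squared error
    n = X.shape[0]
    return -(2 / n) * X.T @ (y - X @ w)

def mse(X, y, w):
    # Mean squared error as a plain Python float (hypothetical helper)
    residual = y - X @ w
    return (residual.T @ residual).item() / len(y)

def numerical_grad(X, y, w, eps=1e-6):
    # Central finite differences, one coordinate of w at a time
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = eps
        grad[j, 0] = (mse(X, y, w + step) - mse(X, y, w - step)) / (2 * eps)
    return grad

# Synthetic data: intercept column plus two random features
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y_demo = rng.normal(size=(20, 1))
w_demo = rng.normal(size=(3, 1))

analytic = compute_grad(X_demo, y_demo, w_demo)
numeric = numerical_grad(X_demo, y_demo, w_demo)
max_diff = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_diff)
```

If the two gradients disagree by more than rounding error, the usual suspects are a missing factor of 2/n, a flipped sign, or `y` being passed as a 1-D array so that `y - X@w` broadcasts to an (n, n) matrix.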

In [5]:
## training with gradient descent
def train(X, y, learning_rate, max_num_iter=1000):
    """
    Estimate the parameter vector w with batch gradient descent.
    The mean squared error is recorded at every iteration and returned as error_history.
    """
    num_dim = X.shape[1]
    error_history = []
    
    # initialize w
    w = np.zeros((num_dim, 1), dtype=np.float32)
    
    # reshape y into a column vector of shape (n, 1)
    y = y.reshape(y.shape[0], 1)
    
    for steps in range(max_num_iter):
        # record the MSE of the current weights, then take one
        # gradient descent step over all training examples
        predictions = X@w
        error = compute_error(predictions, y)[0,0]
        error_history.append(error)
        w -= learning_rate*compute_grad(X, y, w)
    
    return w, error_history
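
For linear regression the minimizer of the MSE can also be computed directly by least squares, which gives a convenient sanity check for a gradient descent loop. The following is a minimal sketch on synthetic data; `gradient_descent`, `true_w`, and the `*_demo` arrays are hypothetical names chosen for this example, and the learning rate and iteration count are assumptions that happen to suit this data.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, num_iter=2000):
    # Plain batch gradient descent on the mean squared error
    w = np.zeros((X.shape[1], 1))
    n = X.shape[0]
    for _ in range(num_iter):
        grad = -(2 / n) * X.T @ (y - X @ w)
        w -= learning_rate * grad
    return w

# Synthetic problem with known weights and a little noise
rng = np.random.default_rng(1)
X_demo = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
true_w = np.array([[0.5], [2.0], [-1.0]])
y_demo = X_demo @ true_w + 0.01 * rng.normal(size=(100, 1))

w_gd = gradient_descent(X_demo, y_demo)

# Least-squares solution computed directly by NumPy
w_ls, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print("max abs difference:", np.max(np.abs(w_gd - w_ls)))
```

When the features are normalized, as in this section, both approaches should agree closely; a large gap usually points to a learning rate that is too small for the number of iterations, or too large to converge at all.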

In [6]:
## fit regression model and plot the MSE vs. number of iterations
X, y = load_data()
trained_w, error_history = train(X, y, learning_rate=0.3)
plt.plot(error_history)
plt.xlabel('iteration')
plt.ylabel('mean squared error')
plt.show()



## 3.3 Make predictions with the fitted linear regression model

In [7]:
# get the values used for normalization
with open('mean_std.pk', 'rb') as f:
    norm_params = pickle.load(f)
l, b, p = norm_params['area'], norm_params['n_bedroom'], norm_params['price']

In [8]:
# normalize the raw inputs (area 3150, 4 bedrooms) with the training statistics,
# predict in normalized units, then map the result back to the price scale
area = (3150 - l["mean"])/l["std"]
bedroom = (4 - b["mean"])/b["std"]
features = np.array([[1, area, bedroom]])
predictions = features@trained_w
pred_price = predictions*p["std"] + p["mean"]
 
# print out the predicted value
print("The w is {}, and pred_price is {}\n".format(trained_w, pred_price[0,0]))

The w is [[-6.1525851e-09]
 [ 8.8476592e-01]
 [-5.3178787e-02]], and pred_price is 493159.44818594644
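
The normalize, predict, and denormalize steps above can be wrapped in one reusable function. This is a sketch under stated assumptions: `predict_price` is a hypothetical helper, the dictionary layout mirrors the `mean`/`std` entries read from `mean_std.pk`, and the statistics and weights in `demo_params` and `demo_w` are made up for illustration rather than taken from the real data.

```python
import numpy as np

def predict_price(area, n_bedroom, w, norm_params):
    """Normalize raw inputs, apply the linear model, and undo the price scaling."""
    area_z = (area - norm_params["area"]["mean"]) / norm_params["area"]["std"]
    bdrm_z = (n_bedroom - norm_params["n_bedroom"]["mean"]) / norm_params["n_bedroom"]["std"]
    features = np.array([[1.0, area_z, bdrm_z]])  # leading 1.0 is the intercept term
    price_z = (features @ w).item()
    return price_z * norm_params["price"]["std"] + norm_params["price"]["mean"]

# Hypothetical statistics and weights, made up for illustration only
demo_params = {
    "area": {"mean": 2000.0, "std": 800.0},
    "n_bedroom": {"mean": 3.0, "std": 0.75},
    "price": {"mean": 340000.0, "std": 125000.0},
}
demo_w = np.array([[0.0], [0.9], [-0.05]])

# At the mean inputs both normalized features are zero, so with a zero
# intercept the prediction equals the mean price.
mean_prediction = predict_price(2000.0, 3.0, demo_w, demo_params)
print(mean_prediction)
```

Keeping the scaling logic in one place makes it harder to forget the final step of converting the normalized prediction back to a price.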

