# Multivariate Batch Gradient Descent

Linear regression with multiple variables is also known as "multivariate linear regression".
The multivariable form of the hypothesis function accommodating these multiple features is as follows:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n$$
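The sum above is a dot product, so with a leading column of ones (the conventional x0 = 1) the hypothesis can be evaluated for all examples at once. The following is a minimal NumPy sketch; the names `hypothesis`, `X_demo`, and `theta_demo` and the numbers are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def hypothesis(X, theta):
    """Return h_theta(x) for every row of X.

    X is an (m, n) array of features; theta has n + 1 entries,
    with theta[0] as the intercept.
    """
    X = np.asarray(X, dtype=float)
    # Prepend x0 = 1 so that theta[0] acts as the intercept term.
    X_aug = np.column_stack([np.ones(X.shape[0]), X])
    return X_aug @ np.asarray(theta, dtype=float)

# Two made-up examples with two features each (hypothetical values).
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
theta_demo = np.array([0.5, 1.0, 2.0])
predictions = hypothesis(X_demo, theta_demo)
```

For the first row this evaluates 0.5 + 1.0·1 + 2.0·2, which is the same term-by-term sum as the formula above.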

The gradient descent update has the same form as in the single-variable case; we simply repeat it for each of the n + 1 parameters θ0, …, θn (the intercept plus one parameter per feature).

We can speed up gradient descent by putting the input values in roughly the same range. Two techniques help with this: feature scaling and mean normalization. Feature scaling divides the input values by the range (the maximum value minus the minimum value) of the input variable, so that the new range is just 1. Mean normalization subtracts the average value of an input variable from each of its values, so that the new average is zero. To apply both techniques, adjust the input values as follows:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

where μi is the average of all values of feature i and si is either its range (max − min) or its standard deviation. The code in this notebook divides by the standard deviation.
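As a small illustration of the two choices of divisor, here is a hedged sketch that normalizes one feature by its range and by its standard deviation. The helper `mean_normalize` and the sample values are hypothetical and are not taken from the data set.

```python
import numpy as np

def mean_normalize(x, use_std=False):
    """Return (x - mean) / s, where s is the range or the standard deviation."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    # ddof=1 gives the sample standard deviation, the pandas default.
    s = x.std(ddof=1) if use_std else x.max() - x.min()
    return (x - mu) / s

sizes = np.array([1000.0, 2000.0, 3000.0])  # made-up values
scaled_by_range = mean_normalize(sizes)              # spans a range of exactly 1
scaled_by_std = mean_normalize(sizes, use_std=True)  # has unit standard deviation
```

Either way the result has mean zero; the two variants differ only in how widely the values are spread.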

In [2]:
# Import libraries
%matplotlib inline
from matplotlib import cm
import matplotlib.pyplot as plt
from matplotlib.ticker import LinearLocator, FormatStrFormatter
from mpl_toolkits.mplot3d.axes3d import Axes3D
import numpy as np
import pandas as pd
from scipy import stats



# Load data and specify column names
data = pd.read_csv("/Users/Valina/Documents/DSML-Projects/MLFoundations_CaseStudyApproach/data/ex1data2.txt", names=['size','nbedrooms','price'])


In [3]:
data.shape

(47, 3)

In [8]:
print(data.describe())

              size  nbedrooms          price
count    47.000000  47.000000      47.000000
mean   2000.680851   3.170213  340412.659574
std     794.702354   0.760982  125039.899586
min     852.000000   1.000000  169900.000000
25%    1432.000000   3.000000  249900.000000
50%    1888.000000   3.000000  299900.000000
75%    2269.000000   4.000000  384450.000000
max    4478.000000   5.000000  699900.000000


House sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge much more quickly.


In [None]:
# mean normalization and feature scaling (dividing by the standard deviation rather than the range)
newdata = pd.DataFrame()
newdata['size'] = (data['size']-data['size'].mean())/data['size'].std()
newdata['nbedrooms'] = (data['nbedrooms']-data['nbedrooms'].mean())/data['nbedrooms'].std()
newdata['price'] = (data['price']-data['price'].mean())/data['price'].std()

In [41]:
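# Define the cost function and its partial derivatives.
# Note the parameter order: par0 multiplies 'size', par1 multiplies 'nbedrooms',
# and par2 is the intercept.
def J(par0, par1, par2, df):
    # Half the mean squared error over all rows
    cost = (par0*df['size'] + par1*df['nbedrooms'] + par2 - df['price'])**2
    return cost.sum()/(2*len(df.index))

def dJ0(par0, par1, par2, df):
    # Partial derivative of J with respect to the intercept par2
    dcosti = (par0*df['size'] + par1*df['nbedrooms'] + par2 - df['price'])
    return dcosti.sum()/len(df.index)

def dJi(par0, par1, par2, df, i_par):
    # Partial derivative of J with respect to the slope of column i_par
    # (0 = 'size', 1 = 'nbedrooms')
    dcosti = (par0*df['size'] + par1*df['nbedrooms'] + par2 - df['price'])*(df.iloc[:,i_par])
    return dcosti.sum()/len(df.index)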
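
The three helper functions can also be written as one matrix computation: with a leading column of ones in the design matrix, the gradient of the cost is (1/m)·Xᵀ(Xθ − y). The sketch below is an alternative formulation, not the notebook's own code. The name `cost_and_gradient` and the toy data are assumptions, and the intercept comes first in `theta` here, unlike the parameter order used above.

```python
import numpy as np

def cost_and_gradient(theta, X_aug, y):
    """Return J(theta) and its gradient for linear regression.

    X_aug is an (m, n + 1) array whose first column is all ones.
    """
    m = len(y)
    residual = X_aug @ theta - y        # h_theta(x) - y for every example
    cost = (residual @ residual) / (2 * m)
    gradient = X_aug.T @ residual / m   # one partial derivative per parameter
    return cost, gradient

# Made-up data lying exactly on y = 1 + 2x (hypothetical values).
X_aug = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
cost_at_zero, grad_at_zero = cost_and_gradient(np.zeros(2), X_aug, y)
cost_at_fit, grad_at_fit = cost_and_gradient(np.array([1.0, 2.0]), X_aug, y)
```

At the exact fit both the cost and the gradient vanish, which is a quick sanity check for any implementation of the derivatives.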

In [42]:
# initialize parameters and variables
intercept2 = 0.0
slope0 = 0.0
slope1 = 0.0
cost_fn0 = 0.0
dcost_fn = 10000
alpha = 0.01
precision = 1e-06
i = 0

while dcost_fn > precision:
    i += 1
    # compute all updates from the current values, then assign them together
    theta2 = intercept2 - alpha*dJ0(slope0, slope1, intercept2, newdata)
    theta0 = slope0 - alpha*dJi(slope0, slope1, intercept2, newdata, 0)
    theta1 = slope1 - alpha*dJi(slope0, slope1, intercept2, newdata, 1)
    intercept2 = theta2
    slope0 = theta0
    slope1 = theta1
    # recalculate the cost and its change since the previous iteration
    cost_fn = J(slope0, slope1, intercept2, newdata)
    dcost_fn = abs(cost_fn - cost_fn0)
    # store the cost for the next iteration
    cost_fn0 = cost_fn
    
print(i, dcost_fn, cost_fn, slope0, slope1, intercept2)

778 9.95026441991e-07 0.130801255424 0.868438441107 -0.0368565383282 -1.10585299783e-16
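
The fitted intercept (about -1.1e-16) is effectively zero, which is expected because every column, including the price, was mean-normalized. One way to check a gradient descent result is to compare it with the closed-form least-squares solution. The sketch below does this on synthetic data, since it has to run without `ex1data2.txt`; the data, the fixed iteration budget, and the learning rate of 0.1 are assumptions for illustration and differ from the notebook's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing data (made-up numbers, not ex1data2.txt).
m = 50
features = rng.normal(size=(m, 2))
target = features @ np.array([0.9, -0.1]) + 0.05 * rng.normal(size=m)

def standardize(a):
    # Mean normalization and scaling by the sample standard deviation.
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

X_aug = np.column_stack([np.ones(m), standardize(features)])
y = standardize(target)

# Batch gradient descent with a fixed iteration budget.
theta_gd = np.zeros(3)
alpha = 0.1
for _ in range(5000):
    theta_gd -= alpha * X_aug.T @ (X_aug @ theta_gd - y) / m

# Closed-form least-squares solution for comparison.
theta_exact, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
max_abs_diff = np.max(np.abs(theta_gd - theta_exact))
```

If `max_abs_diff` is small, the iterative and closed-form answers agree. The same comparison could be run on `newdata` by building `X_aug` from its `size` and `nbedrooms` columns.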
