# Gradient Descent for Multivariable Linear Regression

| Size (sqft) | Number of bedrooms | Number of floors | Age of home | Price (1000s of dollars) |
| ----------- | ------------------ | ---------------- | ----------- | ------------------------ |
| 2104        | 5                  | 1                | 45          | 460                      |
| 1416        | 3                  | 2                | 40          | 232                      |
| 852         | 2                  | 1                | 35          | 178                      |


The dataset above is what we will use for this task: three training examples, each with four features, and the price as the target.

In [1]:
# From the table above: m = number of training examples, n = number of features
m=3
n=4

In [2]:
import numpy as np
# Let's also set how NumPy prints our data
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays

In [3]:
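# Step 1: Declare the features X and the labels Y
# Note: the table lists 2104 sqft for the first home, but this cell uses 2014.
# The output recorded further down was produced with 2014, so the value is kept as-is.
X=np.array([[2014,5,1,45],[1416,3,2,40],[852,2,1,35]])
Y=np.array([460,232,178])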

In [4]:
# Step 2: Now let's declare the initial values of our parameters. Since there are 4 features (n=4),
# there are 4 weights: w1, w2, w3, w4. There is a b too, as the reference equation is still f=wx+b.
w=np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
b=785.1811367994083

In [9]:
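# Step 3: Let's get to the main work and write the function that generates predictions.
# For once I want this to be vector-based from the start. For the scalar calculations, refer to the default notes.
def prediction(X,Y,w,b,m):
    # Y is not used here; it stays in the signature so the calls in later cells are unchanged.
    y_=[]
    for i in range(0,m,1):
        predict=np.dot(w,X[i])+b    # model output for training example i
        y_.append(predict)
    return(np.array(y_))            # returned as an array so callers can do element-wise arithmetic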
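
The loop above computes one dot product per training example. As a minimal sketch, the same predictions can be obtained with a single matrix-vector product. The helper name `prediction_vectorized` is an assumption made for this illustration and is not part of the notebook.

```python
import numpy as np

# Same data and parameters as in the cells above.
X = np.array([[2014, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

def prediction_vectorized(X, w, b):  # hypothetical helper name
    # (m, n) matrix times (n,) vector gives one prediction per row of X
    return X @ w + b

y_hat = prediction_vectorized(X, w, b)

# The loop version from the cell above, for comparison.
y_loop = np.array([np.dot(w, X[i]) + b for i in range(X.shape[0])])
print(y_hat, y_loop)
```

Both versions should give the same three numbers; the vectorized one simply lets NumPy do the looping.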

In [10]:
# Step 4: Let's move on to the next step and write a function that calculates the cost.
# General logic: we already have a function that calculates y_hat. This function only has to
# take the output of that function and use it to compute the cost.
def cost_func(X,Y,w,b,m,n):
    y_=np.array(prediction(X,Y,w,b,m))  # make sure we have an array for element-wise arithmetic
    error=y_-Y                          # the labels are stored in Y
    errorSquared=error*error
    summation=0
    for i in range(0,len(error),1):
        summation=summation+errorSquared[i]
    cost_function=summation/(2*m)
    return(cost_function)
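
As a quick check on the cost function, here is a minimal vectorized sketch of the same quantity, the sum of squared errors divided by 2m. The helper name `cost_vectorized` is an assumption made for this illustration.

```python
import numpy as np

# Same data and parameters as in the cells above.
X = np.array([[2014, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
Y = np.array([460, 232, 178])
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

def cost_vectorized(X, Y, w, b):  # hypothetical helper name
    m = X.shape[0]
    error = X @ w + b - Y                 # prediction minus label, shape (m,)
    return np.sum(error ** 2) / (2 * m)   # squared errors summed, divided by 2m

cost = cost_vectorized(X, Y, w, b)
print(cost)
```

With the X declared in this notebook, the result is about 207, which agrees with the first entry of J in the recorded output (2.07e+002).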

In [12]:
# Step 5: Let's move to the main guy. Create the function that computes the derivatives used by gradient descent.
def derivative(X,Y,w,b,m,n):
    dj_dw=np.zeros(n)   # one partial derivative per weight, initialised to zero
    dj_db=0
    y_=np.array(prediction(X,Y,w,b,m))
    error=y_-Y          # the labels are stored in Y
    # In the single-feature case we accumulated a single dj_dw. Here we accumulate
    # dj_dw1, dj_dw2, dj_dw3, dj_dw4, one for each feature.
    for i in range(0,m,1):
        for j in range(0,n,1):
            dj_dw[j]=dj_dw[j]+error[i]*X[i,j]
        dj_db=dj_db+error[i]
    dj_dw=dj_dw/m
    dj_db=dj_db/m
    return(dj_dw,dj_db)
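
The double loop above can also be written as one matrix product. The sketch below compares the two on the notebook's data. The names `gradient_vectorized` and `gradient_loops` are assumptions made for this illustration.

```python
import numpy as np

# Same data and parameters as in the cells above.
X = np.array([[2014, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
Y = np.array([460, 232, 178])
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

def gradient_vectorized(X, Y, w, b):  # hypothetical helper name
    m = X.shape[0]
    error = X @ w + b - Y        # shape (m,)
    dj_dw = X.T @ error / m      # shape (n,): each feature column weighted by the errors
    dj_db = np.sum(error) / m
    return dj_dw, dj_db

def gradient_loops(X, Y, w, b):  # hypothetical name; same double loop as the cell above
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        error_i = np.dot(w, X[i]) + b - Y[i]
        for j in range(n):
            dj_dw[j] += error_i * X[i, j]
        dj_db += error_i
    return dj_dw / m, dj_db / m

dw_vec, db_vec = gradient_vectorized(X, Y, w, b)
dw_loop, db_loop = gradient_loops(X, Y, w, b)
print(dw_vec, db_vec)
```

The entry for the size feature is far larger in magnitude than the others because that column holds values in the thousands.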
            

In [28]:
# Step 6: Let's move on to the boss and write the gradient descent algorithm.
def gradient_decent(X,Y,w,b,m,n,iters,alpha):
    j_array=[]  # list of the cost J at each iteration; converted to an array on return
    for i in range(0,iters,1):
        j_array.append(cost_func(X,Y,w,b,m,n))
        dw,db=derivative(X,Y,w,b,m,n)
        w=w-alpha*dw    # vector subtraction
        b=b-alpha*db    # scalar subtraction
    return(np.array(j_array),w,b)
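
For reference, the functions above correspond to the following equations, written here as a sketch in standard notation, with $m$ training examples, $n$ features, and learning rate $\alpha$ (alpha in the code).

$$
f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b,
\qquad
J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2
$$

$$
\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)},
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)
$$

$$
w_j := w_j - \alpha \frac{\partial J}{\partial w_j} \quad (j = 1, \dots, n),
\qquad
b := b - \alpha \frac{\partial J}{\partial b}
$$

Each pass of the loop evaluates both derivatives at the current parameters and only then updates w and b, so all parameters are updated simultaneously.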

In [29]:
# Let's run the whole process
iters=1000
J,W,B=gradient_decent(X,Y,w,b,3,4,iters,0.1)
print(J,W,B)

[2.07e+002 6.33e+012 3.25e+023 1.66e+034 8.52e+044 4.37e+055 2.24e+066
 1.15e+077 5.88e+087 3.02e+098 1.55e+109 7.92e+119 4.06e+130 2.08e+141
 1.07e+152 5.47e+162 2.80e+173 1.44e+184 7.36e+194 3.77e+205 1.93e+216
 9.91e+226 5.08e+237 2.60e+248 1.33e+259 6.84e+269 3.50e+280 1.80e+291
 9.21e+301       inf       inf       inf       inf       inf       inf
       inf       inf       inf       inf       inf       inf       inf
       inf       inf       inf       inf       inf       inf       inf
       inf       inf       inf       inf       inf       inf       inf
       inf       inf       inf       nan       nan       nan       nan
       nan       nan       nan       nan       nan       nan       nan
       nan       nan       nan       nan       nan       nan       nan
       nan       nan       nan       nan       nan       nan       nan
       nan       nan       nan       nan       nan       nan       nan
       nan       nan       nan       nan       nan       nan       nan
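
The cost above grows by a factor of roughly 1e10 per iteration until it overflows to inf and then nan, so the run diverged: a learning rate of 0.1 is too large for features on this scale (the size column is in the thousands). The sketch below estimates the largest stable step for this data and reruns the loop with a smaller one. The helper `run_gradient_descent` and the value alpha=5e-7 are assumptions made for this illustration, not part of the notebook.

```python
import numpy as np

# Same data and starting parameters as in the cells above.
X = np.array([[2014, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]], dtype=float)
Y = np.array([460, 232, 178], dtype=float)
w0 = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b0 = 785.1811367994083

def run_gradient_descent(X, Y, w, b, iters, alpha):  # hypothetical helper
    m = X.shape[0]
    costs = []
    for _ in range(iters):
        error = X @ w + b - Y
        costs.append(np.sum(error ** 2) / (2 * m))   # cost before the update
        w = w - alpha * (X.T @ error) / m
        b = b - alpha * np.sum(error) / m
    return np.array(costs), w, b

# For this quadratic cost, the step size must stay below 2 / lambda_max, where
# lambda_max is the largest eigenvalue of the Hessian of J with respect to (w, b).
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])     # column of ones accounts for b
lam_max = np.linalg.eigvalsh(X_aug.T @ X_aug / X.shape[0]).max()
alpha_limit = 2 / lam_max
print("largest stable alpha is about", alpha_limit)

# alpha = 5e-7 is an assumed value, chosen only because it is below that limit.
J_small, W_small, B_small = run_gradient_descent(X, Y, w0, b0, iters=1000, alpha=5e-7)
print(J_small[0], J_small[-1])
```

With the smaller step the cost stays finite and decreases, although progress is slow because the step is dictated by the largest feature.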
      

