# Linear Regression

In machine learning, linear regression is an algorithm that fits a line to a given dataset. The fitted line describes the relationship between the dependent variable Y and the independent (explanatory) variable X, and can be used to predict values of Y from X.
We are going to implement linear regression using vanilla gradient descent, an iterative optimization technique for finding the parameters.

In [4]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
import random
import math

# Create an n-dimensional dataset. Here the number of features is 3.
# Let m be the number of training examples; here m is 10.
X = np.random.rand(10,3)
y = np.random.rand(10,1)
print ('The Feature set is \n',X)
print ('The set of output values of test case is \n', y)


The Feature set is 
 [[ 0.97852419  0.92455041  0.85851857]
 [ 0.51976042  0.32056401  0.80391003]
 [ 0.9317216   0.15057869  0.86471609]
 [ 0.78869088  0.75622387  0.74278469]
 [ 0.68917156  0.32414605  0.23585787]
 [ 0.33046011  0.5238525   0.49533811]
 [ 0.86759617  0.40921653  0.32566359]
 [ 0.31287049  0.3995969   0.8253191 ]
 [ 0.28780861  0.68196634  0.83703102]
 [ 0.95015994  0.11576446  0.53697296]]
The set of output values of test case is 
 [[ 0.70239582]
 [ 0.75733771]
 [ 0.9964127 ]
 [ 0.63264805]
 [ 0.45813932]
 [ 0.15012015]
 [ 0.61330861]
 [ 0.0486289 ]
 [ 0.37916888]
 [ 0.10824838]]
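
Because `np.random.rand` draws new numbers on every run, the values printed above will differ each time the cell is executed. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` generator. The seed value 0 and the names `rng`, `X_seeded`, and `y_seeded` are illustrative choices, not part of the original notebook.

```python
import numpy as np

# Seeded generator: the same seed yields the same dataset on every run.
rng = np.random.default_rng(0)  # seed 0 is an arbitrary, illustrative choice

X_seeded = rng.random((10, 3))  # 10 training examples, 3 features, values in [0, 1)
y_seeded = rng.random((10, 1))  # 10 target values, one per training example

print(X_seeded.shape, y_seeded.shape)
```

Using separate names keeps the unseeded `X` and `y` from the cell above untouched.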


## Hypothesis Value

The hypothesis is the function that we fit to our dataset. We assume a hypothesis function with parameters B0, B1, B2, ..., Bn of the form h(x) = B0x0 + B1x1 + B2x2 + ... + Bnxn, with one parameter per term. The implementation below uses the three features of our dataset with parameters B[0], B[1], and B[2], and has no separate intercept term.

In [5]:
def hypothesis_value(B, X):
    # Weighted sum of the features of one example X using the parameters B
    h = 0
    for i in range(len(B)):
        h += B[i]*X[i]
    return h
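
The weighted sum computed by the loop above is a dot product, so it can also be written with a single NumPy call. The sketch below is one possible equivalent. The function name `hypothesis_value_vectorized` and the example values are hypothetical and only serve as illustration.

```python
import numpy as np

def hypothesis_value_vectorized(B, x):
    """Return B[0]*x[0] + B[1]*x[1] + ... as a single dot product."""
    return np.dot(B, x)

B_example = np.array([0.5, -1.0, 2.0])  # illustrative parameters
x_example = np.array([1.0, 2.0, 3.0])   # illustrative feature row

# 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 = 4.5
print(hypothesis_value_vectorized(B_example, x_example))
```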

## Gradient Descent 

Gradient descent is an optimization algorithm. It comes in several variants, such as stochastic gradient descent and batch gradient descent. Here we use vanilla batch gradient descent to find the parameter values that minimize our objective function (cost function).

In [6]:
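def grad_descent(X, y, B, i):
    # Gradient of the cost with respect to parameter B[i], summed over all training examples
    G = 0
    for j in range(len(X)):
        x = X[j]
        # After slicing out the row of the feature set that we want to evaluate, we call hypothesis_value to get the prediction for it.
        h = hypothesis_value(B, x)
        # y[j, 0] is a scalar, so G stays a scalar instead of a one-element array
        G = G + (h - y[j, 0])*X[j, i]
    return G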
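
For parameter i, the loop above sums (h(x_j) - y_j) * x_ji over all training examples j. That sum is the i-th component of X^T (X B - y), the gradient of the cost 0.5 * sum((h(x_j) - y_j)^2). A possible vectorized sketch computes all components at once. The name `gradient_vectorized` and the small demo arrays are assumptions made for illustration, not the notebook's data.

```python
import numpy as np

def gradient_vectorized(X, y, B):
    """Gradient of 0.5 * sum((X @ B - y)**2) with respect to B."""
    residuals = X @ B - y.ravel()  # h(x_j) - y_j for every training example
    return X.T @ residuals         # one gradient component per parameter

# Small illustrative dataset (hypothetical values)
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
y_demo = np.array([[1.0], [2.0]])
B_demo = np.array([0.5, 0.5])

print(gradient_vectorized(X_demo, y_demo, B_demo))  # [5. 7.]
```

Note that this evaluates every component at the same B, whereas calling `grad_descent` once per parameter lets the caller change B between calls.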

        

## Convergence 

This function determines whether our gradient descent algorithm has converged. We compare the summed gradient values from the last two parameter update steps: when they differ by less than a small tolerance, we treat the optimization as converged. A gradient that has stopped changing is used here as an indicator that the cost function has reached a minimum. Note that the function returns False once converged, because its result is used directly as the loop condition in the parameter update step.

In [7]:
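def convergence(g1, g2):
    # Returns False when the two values differ by less than the tolerance (converged),
    # and True when the loop should keep iterating.
    if abs(g1 - g2) < 0.000001:
        return False
    else:
        return True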
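
A commonly used alternative is to stop when the gradient itself is close to zero, rather than when its sum stops changing. The sketch below shows that variant under stated assumptions: the name `has_converged` and the default tolerance are hypothetical, and unlike `convergence` it returns True when the algorithm has converged.

```python
import numpy as np

def has_converged(gradient, tol=1e-6):
    """Return True once the Euclidean norm of the gradient falls below tol."""
    return np.linalg.norm(gradient) < tol

print(has_converged(np.array([1e-8, -2e-8, 0.0])))  # True
print(has_converged(np.array([0.1, 0.0, 0.0])))     # False
```

Checking the norm avoids the case where positive and negative gradient components cancel in a plain sum.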


## Parameter update 

In [8]:
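B = np.zeros(3)   # start with all parameters at zero
conv = True       # True while the loop should keep iterating
c = 0             # iteration counter
alpha = 0.5       # learning rate
while conv:
    gamma1 = 0    # sum of the gradient components in this iteration
    for i in range(3):
        gamma = grad_descent(X, y, B, i)
        # B[i] is updated immediately, so later gradients in this pass use the already-updated parameters
        B[i] = B[i] - alpha*gamma
        gamma1 += gamma
    if c == 0:
        # No previous iteration exists yet; offset by 1 so the first check cannot report convergence
        gamma2 = gamma1 - 1
    conv = convergence(gamma1, gamma2)
    c += 1
    gamma2 = gamma1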
    

print ('The parameter array is equal to \n')
print (B)
    

The parameter array is equal to 

[ 0.48784393 -0.01542429  0.27486055]
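
The loop above updates one parameter at a time. For comparison, the sketch below updates all parameters simultaneously from a single gradient and checks the result against NumPy's least-squares solver. It rests on several assumptions that are not in the original notebook: the function name `batch_gradient_descent`, the seeded demo data, the gradient-norm stopping rule, and a step size of 1 divided by the largest eigenvalue of X^T X, chosen because it keeps the simultaneous update from diverging.

```python
import numpy as np

def batch_gradient_descent(X, y, tol=1e-6, max_iter=100_000):
    """Minimise 0.5 * sum((X @ B - y)**2), updating all parameters at once."""
    y = y.ravel()
    B = np.zeros(X.shape[1])
    # Assumed step size: 1 / (largest singular value of X)**2,
    # i.e. 1 / (largest eigenvalue of X.T @ X).
    alpha = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(max_iter):
        gradient = X.T @ (X @ B - y)
        if np.linalg.norm(gradient) < tol:
            break
        B = B - alpha * gradient
    return B

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
X_demo = rng.random((10, 3))
y_demo = rng.random((10, 1))

B_gd = batch_gradient_descent(X_demo, y_demo)
B_exact = np.linalg.lstsq(X_demo, y_demo.ravel(), rcond=None)[0]
print(B_gd)
print(B_exact)
```

The two printed arrays should agree closely, since least squares has a single minimum that both methods target.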


## Test Case 

In this part we randomly generate an array of n elements (representing a test case) and use it as a test input to predict the value of the output (dependent) variable.

In [11]:
X_test = np.random.rand(3)
print ('With the test case of \n',X_test)
h = hypothesis_value(B,X_test)
print ('The predicted value of the output variable is   ', h)


With the test case of 
 [ 0.35110989  0.0405294   0.84553279]
The predicted value of the output variable is    0.403065300306
