# Multiple Linear Regression Model

Here we will develop a model to estimate the price of a house given the following parameters:
 -  Size
 -  No. of Bedrooms
 -  No. of Bathrooms
 -  Distance to City
 -  Age of House
 -  School Quality

For the model I will be using an AI-generated training set of 1000 records.

In [1]:
#importing required libraries

import numpy as np
import pandas as pd
import time

## Linear Regression Model

The equation of the linear regression model is given by:

$$ f(x_{1},x_{2},\dots,x_{n}) = \sum \limits _{i=1} ^{n} w_{i}x_{i}\:+\:B $$

Where:
- $f(x_{1},x_{2},\dots,x_{n})$ is the prediction of the model ($\hat{Y}$) for a given set of inputs
- $n$ is the no. of independent variables
- $x_{i}$ is the $i$-th independent variable
- $B$ is the y-intercept (bias)
- $w_{i}$ is the weight of $x_{i}$, i.e. the slope of the model with respect to $x_{i}$

## Cost Function

The model should give a good approximation of the known outputs for the given inputs. Therefore we use the least-squares approach to fit it.<br>
The cost function is the sum of the squared errors between the model's predictions and the known output values, normalized by $2m$.

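$$ J(w,B) = \frac{1}{2m} \sum \limits _{i=1} ^{m} [f(x_{i,1},x_{i,2},\dots,x_{i,n})-y_{i}]^2 $$

Where :
 - $m$ is the no. of training examples
 - $x_{i,j}$ is the value of the $j$-th independent variable in the $i$-th training example
 - $y_{i}$ is the actual output for the $i$-th training example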


In [None]:
def model_func(X,W,B):
    '''
    This function returns the output of the model
        X: input data as a numpy 2D array
        W: Weights as a numpy array
        B: Bias
    '''
    Y_hat=np.dot(X,W)+B
    return Y_hat

def cost_func(Y_hat,Y):
    '''
    This function returns the cost of the model
        Y_hat: predicted output as a numpy array
        Y: actual output as a numpy array
    '''
    return np.mean((Y_hat - Y)**2)/2
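
As a quick sanity check of the two formulas, the sketch below evaluates the prediction and the cost on numbers small enough to verify by hand. The values (`X_demo`, `W_demo`, `B_demo`, `Y_demo`) are made up for illustration and are not taken from the housing data set.

```python
import numpy as np

# Hypothetical toy values, chosen only so the arithmetic is easy to follow
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])      # m = 3 examples, n = 2 features
W_demo = np.array([0.5, -1.0])
B_demo = 2.0
Y_demo = np.array([1.0, 0.0, -1.0])

# Same expression as model_func: Y_hat = X.W + B
Y_hat_demo = np.dot(X_demo, W_demo) + B_demo    # [0.5, -0.5, -1.5]

# Same expression as cost_func: mean squared error divided by 2
cost_demo = np.mean((Y_hat_demo - Y_demo) ** 2) / 2

print(Y_hat_demo)
print(cost_demo)    # 0.125, since every residual is -0.5
```

Every residual here is $-0.5$, so the mean squared error is $0.25$ and the cost is half of that.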

We need to find the $w_{i}$ and $B$ that minimize the cost function: the lower the value of the cost function, the better the model approximates the training data.<br>
Differentiating $J(w,B)$ with respect to each parameter and setting the derivatives to zero gives a set of simultaneous equations that can be solved for the parameters.<br>
For data sets with a large number of inputs, solving these simultaneous equations can become expensive. So here we use the ___gradient descent algorithm___, an _iterative technique_, to find $w,B$.<br>

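$$ w_{k}^{(t)} = w_{k}^{(t-1)}-\alpha \frac {\partial{J(w,B)}}{\partial{w_{k}}} $$
$$ B^{(t)} = B^{(t-1)}-\alpha \frac {\partial{J(w,B)}}{\partial{B}} $$

Here $t$ is the iteration number and $\alpha$ is the learning rate. The derivatives are evaluated at the parameters from iteration $t-1$.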


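$$ \frac {\partial{J(w,B)}}{\partial{w_{k}}} = \frac{1}{m} \sum \limits _{i=1} ^{m} [w_{1}x_{i,1}+w_{2}x_{i,2}+\dots+w_{k}x_{i,k}+\dots+w_{n}x_{i,n}+B-y_{i}]\, x_{i,k} $$

$$ \frac {\partial{J(w,B)}}{\partial{B}} = \frac{1}{m} \sum \limits _{i=1} ^{m} [w_{1}x_{i,1}+w_{2}x_{i,2}+\dots+w_{n}x_{i,n}+B-y_{i}] $$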
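
One way to gain confidence in the derivative above is to compare it with a finite-difference estimate of the cost. The sketch below does this on random data. It assumes the derivative with respect to $B$ is the mean residual, and the helper names (`cost`, `analytic_gradients`, `numeric_gradients`) and the random test data are illustrative only, not part of the notebook.

```python
import numpy as np

def cost(X, Y, W, B):
    # J(w, B) = 1/(2m) * sum of squared residuals
    return np.mean((np.dot(X, W) + B - Y) ** 2) / 2

def analytic_gradients(X, Y, W, B):
    m = X.shape[0]
    residual = np.dot(X, W) + B - Y
    grad_W = np.dot(X.T, residual) / m   # one entry per weight w_k
    grad_B = np.mean(residual)
    return grad_W, grad_B

def numeric_gradients(X, Y, W, B, h=1e-6):
    # Central differences: (J(p + h) - J(p - h)) / (2h) for each parameter p
    grad_W = np.zeros_like(W)
    for k in range(W.size):
        step = np.zeros_like(W)
        step[k] = h
        grad_W[k] = (cost(X, Y, W + step, B) - cost(X, Y, W - step, B)) / (2 * h)
    grad_B = (cost(X, Y, W, B + h) - cost(X, Y, W, B - h)) / (2 * h)
    return grad_W, grad_B

# Random data used only for the check
rng = np.random.default_rng(0)
X_chk = rng.normal(size=(20, 3))
Y_chk = rng.normal(size=20)
W_chk = rng.normal(size=3)
B_chk = 0.7

gW_a, gB_a = analytic_gradients(X_chk, Y_chk, W_chk, B_chk)
gW_n, gB_n = numeric_gradients(X_chk, Y_chk, W_chk, B_chk)
gradients_match = bool(np.allclose(gW_a, gW_n, atol=1e-6) and np.isclose(gB_a, gB_n, atol=1e-6))
print(gradients_match)
```

Note that the vectorized form uses `X.T`, so that each weight's gradient sums over the training examples rather than over the features.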


In [None]:
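def para_update(X,Y,W,B,alpha,error,max_iter=100000):
    '''
    This function updates the parameters of the model by gradient descent
        X: input data as a numpy 2D array of shape (m, n)
        Y: output data as a numpy array of shape (m,)
        W: Weights as a numpy array of shape (n,)
        B: Bias
        alpha: learning rate
        error : allowed relative change of the parameters between iterations
        max_iter: safety limit on the number of iterations
    '''
    m = X.shape[0]
    eps = 1e-12  # avoids division by zero when W or B is 0
    # a loop replaces the recursion, which can exceed Python's recursion limit
    for _ in range(max_iter):
        residual = model_func(X,W,B) - Y
        Wn = W - alpha*np.dot(X.T,residual)/m   # dJ/dw = X^T (Y_hat - Y) / m
        Bn = B - alpha*np.mean(residual)        # dJ/dB = mean(Y_hat - Y)
        rel_error_w = np.max(np.abs(Wn-W))/(np.max(np.abs(W))+eps)
        rel_error_b = abs(Bn-B)/(abs(B)+eps)
        W,B = Wn,Bn
        if rel_error_w<error and rel_error_b<error:
            break
    return W,B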
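
To see gradient descent and the simultaneous-equation approach agree, the sketch below fits synthetic data both ways. Everything in it is an assumption made for illustration: the data is randomly generated rather than the housing set, the names (`X_syn`, `W_true`, `W_gd`, `W_ls`, and so on) are hypothetical, the loop runs a fixed number of steps instead of using a relative-error stopping rule, and `np.linalg.lstsq` stands in for solving the simultaneous equations directly.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 200, 3
X_syn = rng.normal(size=(m, n))          # synthetic features on a similar scale
W_true = np.array([3.0, -2.0, 0.5])      # made-up "true" weights
B_true = 4.0
Y_syn = X_syn @ W_true + B_true + rng.normal(scale=0.1, size=m)

# Plain gradient descent with a fixed number of steps
W_gd = np.zeros(n)
B_gd = 0.0
alpha = 0.1
for _ in range(2000):
    residual = X_syn @ W_gd + B_gd - Y_syn
    W_gd = W_gd - alpha * (X_syn.T @ residual) / m
    B_gd = B_gd - alpha * np.mean(residual)

# Least-squares fit: append a column of ones so the last coefficient is B
X_aug = np.column_stack([X_syn, np.ones(m)])
coef, *_ = np.linalg.lstsq(X_aug, Y_syn, rcond=None)
W_ls, B_ls = coef[:-1], coef[-1]

print(W_gd, B_gd)
print(W_ls, B_ls)
```

The synthetic features all have a similar scale, which is why a learning rate of 0.1 converges here. Features on very different scales, such as house size and number of bedrooms, would likely need a much smaller learning rate or standardization first.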

## Calculating Parameters for the Data set


[14 20 26]
