# Machine Learning With Python: Linear Regression With Multiple Variables

## Sample problem: predicting home prices in Monroe, New Jersey (USA)

The table below lists home prices in Monroe Township, NJ. Here the price depends on **area (square feet), number of bedrooms, and age of the home (in years)**. Given these prices, we have to predict the prices of new homes from their area, bedrooms, and age.

<img src="homeprices.jpg" style='height:200px;width:350px'>

Given these home prices, find the price of a home that has:

**3000 sq ft area, 3 bedrooms, 40 years old**

**2500 sq ft area, 4 bedrooms, 5 years old**

We will use linear regression with multiple variables here. The price can be calculated using the following equation:

<img src="equation.jpg" >

Here area, bedrooms, and age are called independent variables or **features**, whereas price is the dependent variable.
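
The equation is a weighted sum of the features plus an intercept. A minimal sketch of that arithmetic follows; the function name `predict_price` and the coefficient values are hypothetical placeholders chosen for round numbers, not fitted values.

```python
def predict_price(area, bedrooms, age, m1, m2, m3, b):
    """Evaluate price = m1*area + m2*bedrooms + m3*age + b."""
    return m1 * area + m2 * bedrooms + m3 * age + b

# Hypothetical coefficients, used only to illustrate the arithmetic.
example_price = predict_price(
    area=3000, bedrooms=3, age=40,
    m1=100.0, m2=20000.0, m3=-3000.0, b=200000.0,
)
print(example_price)
```

Fitting the model amounts to finding the values of `m1`, `m2`, `m3`, and `b` that best match the known prices.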

## Collecting Data

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model

In [2]:
df = pd.read_csv('homeprices.csv')
df

,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000
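
If `homeprices.csv` is not at hand, the same table can be rebuilt in memory. This sketch assumes the file holds exactly the rows shown above; `csv_text` and `df_inline` are illustrative names, not part of the original notebook.

```python
import io
import pandas as pd

# Rows copied from the table above; the empty bedrooms field is parsed as NaN.
csv_text = """area,bedrooms,age,price
2600,3,20,550000
3000,4,15,565000
3200,,18,610000
3600,3,30,595000
4000,5,8,760000
4100,6,8,810000
"""

df_inline = pd.read_csv(io.StringIO(csv_text))
print(df_inline.dtypes)        # bedrooms is float because it contains a missing value
print(df_inline.isna().sum())  # count of missing values per column
```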


In [3]:
import math
median_bedrooms = math.floor(df.bedrooms.median())
median_bedrooms

4

### Data Preprocessing: Fill NA values with the column median

In [4]:
df.bedrooms.median()

4.0

In [5]:
df.bedrooms = df.bedrooms.fillna(df.bedrooms.median())
df

,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,4.0,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000
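
The fill step can be checked in isolation. The sketch below rebuilds only the `bedrooms` column from the table (an assumption about the CSV contents) and relies on `Series.median` skipping NaN values by default.

```python
import numpy as np
import pandas as pd

# Bedrooms column as shown in the table, with one missing entry.
bedrooms = pd.Series([3.0, 4.0, np.nan, 3.0, 5.0, 6.0], name="bedrooms")

median_value = bedrooms.median()        # NaN is skipped (skipna=True by default)
filled = bedrooms.fillna(median_value)  # returns a new Series; the original is unchanged

print(median_value)
print(filled.isna().sum())
```

The median is used rather than the mean here because it stays a whole number of bedrooms for this data and is less sensitive to outliers.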


## Creating and Fitting the Model

In [6]:
reg = linear_model.LinearRegression()
reg.fit(df.drop('price', axis='columns'), df.price)  # fit(independent variables, dependent variable)

LinearRegression()

$\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b$

In [7]:
m = reg.coef_
b = reg.intercept_

In [13]:
m

array([  112.06244194, 23388.88007794, -3231.71790863])

In [9]:
b

221323.00186540443
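
As a cross-check, the same coefficients can be recovered by solving the ordinary least-squares problem directly with NumPy. This is a sketch under the assumption that the training data is exactly the filled table shown earlier; it is not how scikit-learn is implemented internally, only an independent route to the same solution.

```python
import numpy as np

# Features (area, bedrooms, age), with the missing bedrooms value already filled with 4.
X = np.array([
    [2600, 3, 20],
    [3000, 4, 15],
    [3200, 4, 18],
    [3600, 3, 30],
    [4000, 5, 8],
    [4100, 6, 8],
], dtype=float)
y = np.array([550000, 565000, 610000, 595000, 760000, 810000], dtype=float)

# Append a column of ones so the last solved parameter plays the role of the intercept b.
A = np.column_stack([X, np.ones(len(X))])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = params[:3], params[3]

print(coef)
print(intercept)
```

The printed values should agree with `m` and `b` above up to floating-point error.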

## Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old

In [15]:
predict_reg = reg.predict([[3000, 3, 40]]) # area, bedrooms, age
predict_reg

array([498408.25158031])

In [16]:
predict_eq = m[0]*3000 + m[1]*3 + m[2]*40 + b
predict_eq

498408.2515803069

In [17]:
predict_reg == predict_eq

array([ True])
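
The exact `==` check succeeds here, but floating-point results computed by two different routes are not guaranteed to match bit for bit. A tolerance-based comparison is a more robust habit. The sketch below uses the two values as printed above; the tolerance `rel_tol=1e-9` is an arbitrary illustrative choice.

```python
import math

model_prediction = 498408.25158031      # reg.predict output as displayed (rounded for printing)
manual_prediction = 498408.2515803069   # result of the hand-written equation

# Compare within a relative tolerance instead of requiring identical bits.
is_same = math.isclose(model_prediction, manual_prediction, rel_tol=1e-9)
print(is_same)
```

For NumPy arrays such as `predict_reg`, `np.isclose` performs the equivalent element-wise check.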

## Find the price of a home with 2500 sq ft area, 4 bedrooms, 5 years old

In [18]:
predict_reg = reg.predict([[2500, 4, 5]]) # area, bedrooms, age
predict_reg

array([578876.03748933])

In [19]:
predict_eq = m[0]*2500 + m[1]*4 + m[2]*5 + b
predict_eq

578876.0374893326

In [20]:
predict_reg == predict_eq

array([ True])
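
Both homes can also be scored in a single `predict` call. The following end-to-end sketch assumes the training data is the filled table shown earlier; `train`, `features`, `model`, and `new_homes` are illustrative names. Passing a DataFrame with the same column names used in fitting keeps the features aligned by name.

```python
import pandas as pd
from sklearn import linear_model

# Training data copied from the table, with the missing bedrooms value filled with the median (4).
train = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3.0, 4.0, 4.0, 3.0, 5.0, 6.0],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})
features = ["area", "bedrooms", "age"]

model = linear_model.LinearRegression()
model.fit(train[features], train["price"])

# One row per home to predict: (area, bedrooms, age).
new_homes = pd.DataFrame([[3000, 3, 40], [2500, 4, 5]], columns=features)
predictions = model.predict(new_homes)
print(predictions)
```

The two entries of `predictions` correspond to the two homes in the order given and should match the individual results above.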

---