### Machine Learning With Python: Linear Regression with Multiple Variables

Problem 4: House price prediction using linear regression with multiple variables, also known as multiple linear regression (often loosely called multivariate regression)

Given these home prices, find the price of a home that has:

    3000 sq ft area, 3 bedrooms, 40 years old
    2500 sq ft area, 4 bedrooms, 5 years old

We will use regression with multiple variables here. Price can be calculated using the following equation:

           y = m1*x1 + m2*x2 + m3*x3 + b

Here area, bedrooms and age (x1, x2, x3) are called independent variables or features, whereas price (y) is the dependent variable. m1, m2 and m3 are the coefficients and b is the intercept.
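
As a minimal sketch, the equation can be written as a plain Python function. The function name `predict_price` and the coefficient values in the example call are hypothetical placeholders chosen only to show the arithmetic; the real values come from training the model later in this notebook.

```python
def predict_price(area, bedrooms, age, m1, m2, m3, b):
    """Evaluate y = m1*x1 + m2*x2 + m3*x3 + b for a single home."""
    return m1 * area + m2 * bedrooms + m3 * age + b


# Hypothetical coefficients, chosen only to illustrate the calculation.
example_price = predict_price(area=1000, bedrooms=2, age=10,
                              m1=100.0, m2=5000.0, m3=-1000.0, b=50000.0)
print(example_price)
```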

### Topics:
* Data preprocessing: handling NA values
* Linear Regression using Multiple Variables.

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model

In [4]:
df = pd.read_csv("homeprices_mv.csv")
df

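|   | area | bedrooms | age | price  |
|---|------|----------|-----|--------|
| 0 | 2600 | 3.0      | 20  | 550000 |
| 1 | 3000 | 4.0      | 15  | 565000 |
| 2 | 3200 | NaN      | 18  | 610000 |
| 3 | 3600 | 3.0      | 30  | 595000 |
| 4 | 4000 | 5.0      | 8   | 760000 |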


One bedrooms value is missing, so I will fill it with the median of the bedrooms column, which is a reasonably safe assumption.
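
Before filling anything, it can help to confirm where the gaps are. The following is a minimal sketch, assuming the same five rows as the CSV typed inline so that it runs without the file; the name `df_check` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same rows as homeprices_mv.csv, typed inline so the sketch runs without the file.
df_check = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})

# Count missing values in each column.
missing_per_column = df_check.isna().sum()
print(missing_per_column)

# Show the rows that contain at least one missing value.
print(df_check[df_check.isna().any(axis=1)])
```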

### Data Preprocessing: Fill NA values with median value of a column

In [5]:
df.bedrooms.median()

3.5

The median is 3.5, a float. The number of bedrooms should be a whole number, so I round it down to an integer with math.floor, which needs the math module.

In [6]:
import math
median_bedrooms = math.floor(df.bedrooms.median())
median_bedrooms

3
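
Rounding down is one of several ways to turn 3.5 into a whole number, and the choice changes the fill value. A small sketch comparing the standard-library options (the variable name `median_value` is only illustrative):

```python
import math

median_value = 3.5

print(math.floor(median_value))  # rounds down
print(math.ceil(median_value))   # rounds up
print(round(median_value))       # built-in round uses round-half-to-even
print(int(median_value))         # truncates toward zero
```

With floor or int the missing value becomes 3; with ceil or round it would become 4, which would lead to a slightly different trained model.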

To fill missing values in a data frame we use the fillna function, which is available on a pandas Series.

In [9]:
df.bedrooms.fillna(median_bedrooms)


0    3.0
1    4.0
2    3.0
3    3.0
4    5.0
Name: bedrooms, dtype: float64

We get a new series in which the NaN value has been replaced with the median.
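
The point that fillna returns a new series, rather than changing the original, can be checked with a short sketch. The series below is typed inline to match the bedrooms column; the names `bedrooms` and `filled` are only illustrative.

```python
import numpy as np
import pandas as pd

bedrooms = pd.Series([3.0, 4.0, np.nan, 3.0, 5.0], name="bedrooms")

# fillna returns a filled copy by default.
filled = bedrooms.fillna(3)

# The original series still contains the NaN; only the returned copy is filled.
print(bedrooms.isna().sum())
print(filled.isna().sum())
```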

### Assign the new series back to the original column so that the data frame gets updated


In [10]:
df.bedrooms = df.bedrooms.fillna(median_bedrooms)
df

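|   | area | bedrooms | age | price  |
|---|------|----------|-----|--------|
| 0 | 2600 | 3.0      | 20  | 550000 |
| 1 | 3000 | 4.0      | 15  | 565000 |
| 2 | 3200 | 3.0      | 18  | 610000 |
| 3 | 3600 | 3.0      | 30  | 595000 |
| 4 | 4000 | 5.0      | 8   | 760000 |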


Now the data frame looks better: it has no missing values.

So the data preprocessing step is over.

### Summary

To summarize: before applying any machine learning model you need to preprocess your data. Real data is always messy, so you clean it, fix the errors, and prepare it. Only then do you train the model on this data.

- So now my data frame looks good
- I am all set to train my model

### Create Linear Regression Class Object

In [15]:
reg = linear_model.LinearRegression()   

### Training the Model

In [16]:
reg.fit(df[['area','bedrooms','age']],df.price)

LinearRegression()

Once this cell executes, the model is trained and ready!

### Coefficients and intercept

In [17]:
reg.coef_

array([   137.25, -26025.  ,  -6825.  ])

In [18]:
reg.intercept_

383724.9999999998

### Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old

In [19]:
reg.predict([[3000,3,40]])

array([444400.])
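
Recent scikit-learn releases (version 1.0 and later, as far as I am aware) warn that the input "does not have valid feature names" when a model fitted on a data frame is asked to predict from a plain list. The sketch below shows one way to avoid that: pass a data frame with the same column names. The training rows are typed inline so it runs without the CSV, and the names `train`, `features`, `model` and `query` are only illustrative.

```python
import pandas as pd
from sklearn import linear_model

# Training data after the missing bedrooms value was filled with 3.
train = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3.0, 4.0, 3.0, 3.0, 5.0],
    "age": [20, 15, 18, 30, 8],
    "price": [550000, 565000, 610000, 595000, 760000],
})
features = ["area", "bedrooms", "age"]

model = linear_model.LinearRegression()
model.fit(train[features], train["price"])

# Wrap the query in a DataFrame whose columns match the training features.
query = pd.DataFrame([[3000, 3, 40]], columns=features)
predicted = model.predict(query)
print(predicted)
```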

In [22]:
137.25*3000+-26025*3+-6825*40+383724.9999999998

444399.9999999998
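
The same manual check can be written as a matrix product, which prices several homes at once. This is a sketch with the coefficient and intercept values copied from the outputs above; the names `coef`, `intercept`, `homes` and `manual_prices` are only illustrative.

```python
import numpy as np

# Values copied from reg.coef_ and reg.intercept_ above.
coef = np.array([137.25, -26025.0, -6825.0])
intercept = 383724.9999999998

# Homes to price: [area, bedrooms, age] per row.
homes = np.array([
    [3000, 3, 40],
    [2500, 4, 5],
])

# y = X @ m + b, evaluated for every row of X.
manual_prices = homes @ coef + intercept
print(manual_prices)
```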

In [23]:
reg.predict([[3000,4,15]])

array([589000.])

**Find the price of a home with 2500 sq ft area, 4 bedrooms, 5 years old**

In [24]:
reg.predict([[2500,4,5]])

array([588625.])

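Here the predicted price is high compared with the homes in the list because this home is only 5 years old: the age coefficient is negative, so the model predicts higher prices for newer homes.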

***Exercise***

In the exercise folder (same level as this notebook on GitHub) there is hiring.csv. This file contains hiring statistics for a firm, such as the experience of a candidate, their written test score and personal interview score. Based on these 3 factors, HR will decide the salary. Given this data, you need to build a machine learning model for the HR department that can help them decide salaries for future candidates. Using this model, predict salaries for the following candidates:

2 yr experience, 9 test score, 6 interview score

12 yr experience, 10 test score, 10 interview score

Answer
53713.86 and 93747.79