<h2 style="color:green" align="center"> Machine Learning with Python: Linear Regression with Multiple Variables</h2>

<h3 style="color:purple">Sample problem: predicting home prices in Monroe, New Jersey (USA)</h3>

Below is a table of home prices in Monroe Township, NJ. Here the price depends on **area (square feet), number of bedrooms and age of the home (in years)**. Given these prices, we want to predict the prices of new homes from their area, bedrooms and age.

<img src="homeprices.jpg" style='height:200px;width:350px'>

Given these home prices, find the price of a home that has:

**3000 sq ft area, 3 bedrooms, 40 years old**

**2500 sq ft area, 4 bedrooms, 5 years old**

We will use linear regression with multiple variables. The price is modeled with the following equation:

<img src="equation.jpg" >

Here area, bedrooms and age are called independent variables, or **features**, whereas price is the dependent variable.
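Written out as text, a linear model with these three features has the form below. The symbols $m_1, m_2, m_3$ (one coefficient per feature) and $b$ (the intercept) are our own labels and may differ from the ones in the equation image; in scikit-learn they correspond to `reg.coef_` and `reg.intercept_`.

$$
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
$$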

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model

In [48]:
df = pd.read_csv('apartments.csv')
df

| | distance_to_city_center | rooms | size | price |
|---|---|---|---|---|
| 0 | 2.4 | 1.0 | 19.35 | 191.565 |
| 1 | 2.4 | 2.0 | 13.08 | 221.568 |
| 2 | 5.0 | 1.0 | 24.66 | 185.936 |
| 3 | 1.9 | 1.0 | 24.82 | 275.502 |
| 4 | 1.9 | 1.0 | 25.39 | 241.205 |
| ... | ... | ... | ... | ... |
| 71 | 5.4 | 2.0 | 41.31 | 325.000 |
| 72 | 3.0 | 2.0 | 41.94 | 373.088 |
| 73 | 11.3 | 1.0 | 34.18 | 177.702 |
| 74 | 4.0 | 2.0 | 34.30 | 264.110 |


**Data preprocessing: fill NA values with the median of the column**

In [49]:
df.rooms.median()

2.5

In [52]:
df.rooms = df.rooms.fillna(df.rooms.median())
df

| | distance_to_city_center | rooms | size | price |
|---|---|---|---|---|
| 0 | 2.4 | 1.0 | 19.35 | 191.565 |
| 1 | 2.4 | 2.0 | 13.08 | 221.568 |
| 2 | 5.0 | 1.0 | 24.66 | 185.936 |
| 3 | 1.9 | 1.0 | 24.82 | 275.502 |
| 4 | 1.9 | 1.0 | 25.39 | 241.205 |
| ... | ... | ... | ... | ... |
| 71 | 5.4 | 2.0 | 41.31 | 325.000 |
| 72 | 3.0 | 2.0 | 41.94 | 373.088 |
| 73 | 11.3 | 1.0 | 34.18 | 177.702 |
| 74 | 4.0 | 2.0 | 34.30 | 264.110 |

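The median fill can be checked on a small, self-contained frame. The sketch below uses hypothetical toy values (not the apartment data) and reads them from an in-memory CSV, so an empty cell turns into `NaN` the same way it would when loading `apartments.csv` with `pd.read_csv`. `Series.median()` skips `NaN` by default, so the fill value is the median of the observed values only.

```python
import io

import pandas as pd

# Hypothetical toy CSV: the 'rooms' value in the third data row is missing.
csv_text = """rooms,size,price
1,19.5,190.0
2,30.0,250.0
,28.0,240.0
3,45.0,330.0
4,60.0,400.0
"""
toy = pd.read_csv(io.StringIO(csv_text))

missing_before = toy["rooms"].isna().sum()  # number of NaNs that need filling
median_rooms = toy["rooms"].median()        # NaN is skipped: median of 1, 2, 3, 4 is 2.5
toy["rooms"] = toy["rooms"].fillna(median_rooms)

print(missing_before, median_rooms)
print(toy)
```

The median is used rather than the mean so that a few unusually large values cannot pull the fill value upward.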

In [54]:
reg = linear_model.LinearRegression()
reg.fit(df.drop('price',axis='columns'),df.price)

In [55]:
reg.coef_

array([-20.32150668, -13.98249648,   7.59329825])

In [56]:
reg.intercept_

132.08998868785307
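`LinearRegression` estimates `coef_` and `intercept_` by ordinary least squares. As a hedged illustration on synthetic data (the true coefficients 2, -3, 0.5 and intercept 10 are made up for this sketch), solving the same least-squares problem directly with `np.linalg.lstsq`, after appending a column of ones to the feature matrix, recovers the same values that scikit-learn reports.

```python
import numpy as np
from sklearn import linear_model

# Synthetic, noise-free data: y = 2*x1 - 3*x2 + 0.5*x3 + 10 (made-up coefficients).
rng = np.random.default_rng(0)
X = rng.random((20, 3)) * 10
true_coef = np.array([2.0, -3.0, 0.5])
y = X @ true_coef + 10.0

# scikit-learn's least-squares fit.
reg = linear_model.LinearRegression().fit(X, y)

# Same problem solved directly: the extra column of ones makes the last weight the intercept.
X_aug = np.column_stack([X, np.ones(len(X))])
beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print(reg.coef_, reg.intercept_)
print(beta[:3], beta[3])
```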

**Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old**

In [60]:
reg.predict([[3000, 3, 40]])



array([-60570.6456048])

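The same prediction can be computed by hand from the fitted `reg.coef_` and `reg.intercept_`; it returns the same value as `reg.predict` above, up to floating-point rounding:

In [61]:
np.dot(reg.coef_, [3000, 3, 40]) + reg.intercept_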
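`reg.predict` evaluates the regression equation row by row: each row's features are multiplied by `coef_`, summed, and `intercept_` is added. The sketch below checks this on made-up, noise-free training data (the column names `area`, `bedrooms`, `age` and all values are hypothetical). It also passes the new homes as a `DataFrame` with the training column names, because fitting on a `DataFrame` and then predicting on a plain list makes recent scikit-learn versions warn that X has no valid feature names.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Hypothetical training data; prices follow 100*area + 5000*bedrooms - 2000*age + 50000 exactly.
train = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3, 4, 4, 3, 5],
    "age": [20, 15, 18, 30, 8],
})
price = 100 * train["area"] + 5000 * train["bedrooms"] - 2000 * train["age"] + 50000

reg = linear_model.LinearRegression().fit(train, price)

# New homes: same column names and order as the training frame.
new_homes = pd.DataFrame({"area": [3000, 2500], "bedrooms": [3, 4], "age": [40, 5]})
predicted = reg.predict(new_homes)

# Manual evaluation of the regression equation: features @ coef_ + intercept_.
manual = new_homes.to_numpy() @ reg.coef_ + reg.intercept_

print(predicted)
print(manual)
```

Because the toy prices were generated exactly from the formula, both rows come out at the formula's values (285000 and 310000); on real data the fit is approximate.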

**Find the price of a home with 2500 sq ft area, 4 bedrooms, 5 years old**

In [62]:
reg.predict([[2500, 4, 5]])



array([-50689.64020114])

<h3>Exercise</h3>

In the exercise folder (at the same level as this notebook on GitHub) there is a file **hiring.csv**. It contains hiring statistics for a firm: each candidate's experience, written test score and personal interview score. Based on these 3 factors, HR decides the salary. Using this data, build a machine learning model that the HR department can use to decide salaries for future candidates, and use it to predict salaries for the following candidates:

**2 years of experience, test score 9, interview score 6**

**12 years of experience, test score 10, interview score 10**

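One possible starting point for the exercise reuses the steps from this notebook: fill missing feature values with each column's median, then fit `LinearRegression`. This is a sketch under assumptions: the function name `fit_salary_model` and the target column name `salary` are hypothetical, and it assumes every feature column in hiring.csv is already numeric, which you should check after loading the file (non-numeric entries would need converting first).

```python
import pandas as pd
from sklearn import linear_model


def fit_salary_model(df: pd.DataFrame, target: str = "salary") -> linear_model.LinearRegression:
    """Median-fill missing feature values, then fit a multiple linear regression.

    Hypothetical helper: assumes all columns except `target` are numeric features.
    """
    features = df.drop(columns=[target])
    features = features.fillna(features.median())  # per-column median; NaNs are skipped
    model = linear_model.LinearRegression()
    model.fit(features, df[target])
    return model
```

After loading the file with `pd.read_csv`, predict for the two candidates by passing a `DataFrame` whose columns match the feature columns, in the same order.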

<h3>Answer</h3>

53713.86 and 93747.79