<h2 style="color:green" align="center"> Machine Learning With Python: Linear Regression with Multiple Variables</h2>

Below is a table of home prices in Monroe Township, NJ. Here the price depends on **area (square feet), bedrooms and age of the home (in years)**. Given these prices, we have to predict the prices of new homes based on area, bedrooms and age.

<img src="homeprices.jpg" style='height:200px;width:350px'>

Given these home prices, find the price of a home that has:

**3000 sq ft area, 3 bedrooms, 40 years old**

**2500 sq ft area, 4 bedrooms, 5 years old**

<img src="equation.jpg" >
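Assuming `equation.jpg` shows the usual multiple linear regression form, the model fitted below is:

$$
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
$$

where $m_1, m_2, m_3$ are the coefficients scikit-learn stores in `lr.coef_` and $b$ is the intercept stored in `lr.intercept_`.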

# Importing Libraries

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd 
import os

# Fetching Data

In [4]:
df=pd.read_csv("homeprices.csv")        

In [5]:
df

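|   | area | bedrooms | age | price |
|---|---:|---:|---:|---:|
| 0 | 2600 | 3.0 | 20 | 550000 |
| 1 | 3000 | 4.0 | 15 | 565000 |
| 2 | 3200 | NaN | 18 | 610000 |
| 3 | 3600 | 3.0 | 30 | 595000 |
| 4 | 4000 | 5.0 | 8 | 760000 |
| 5 | 4100 | 6.0 | 8 | 810000 |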
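If `homeprices.csv` is not at hand, the same data can be rebuilt inline. This is a minimal sketch assuming the CSV holds exactly the six rows shown above; `isna().sum()` counts the missing values per column, which is a quick way to locate the gap before filling it.

```python
import numpy as np
import pandas as pd

# The six rows from the table above; the missing bedroom count is np.nan
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0, 6.0],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

# Number of missing values in each column; only bedrooms should have one
missing = df.isna().sum()
print(missing)
```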


# Taking care of missing data

**Data Preprocessing: Fill NA values with median value of a column**

In [6]:
median_value=df.bedrooms.median()

In [7]:
median_value

4.0

In [8]:
df.bedrooms=df.bedrooms.fillna(median_value)
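The same median fill can also be done with scikit-learn's `SimpleImputer`; this is an alternative to the notebook's pandas approach, not what the notebook uses. A sketch, assuming the bedroom values from the table above:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0, 6.0],
    "age": [20, 15, 18, 30, 8, 8],
})

# strategy="median" replaces each NaN with the median of its column
imputer = SimpleImputer(strategy="median")
df[["bedrooms"]] = imputer.fit_transform(df[["bedrooms"]])

print(imputer.statistics_)  # the learned median per column
print(df)
```

The imputer keeps the learned median in `statistics_`, so calling `imputer.transform` on new rows fills them with the same value instead of recomputing a median from the new data.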

In [9]:
df

|   | area | bedrooms | age | price |
|---|---:|---:|---:|---:|
| 0 | 2600 | 3.0 | 20 | 550000 |
| 1 | 3000 | 4.0 | 15 | 565000 |
| 2 | 3200 | 4.0 | 18 | 610000 |
| 3 | 3600 | 3.0 | 30 | 595000 |
| 4 | 4000 | 5.0 | 8 | 760000 |
| 5 | 4100 | 6.0 | 8 | 810000 |


In [10]:
# Split into features (x: every column except the last) and target (y: the last column, price)
x=df.iloc[:,:-1]
y=df.iloc[:,-1:]
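Slicing with `-1:` keeps `y` as a one-column DataFrame rather than a Series, which is why the predictions and `coef_` shown further down are nested arrays such as `[[498408.25...]]`. This sketch compares the two choices on the cleaned data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Cleaned data (missing bedrooms already filled with the median, 4.0)
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3.0, 4.0, 4.0, 3.0, 5.0, 6.0],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

x = df.iloc[:, :-1]     # area, bedrooms, age
y_2d = df.iloc[:, -1:]  # slice: one-column DataFrame, shape (6, 1)
y_1d = df.iloc[:, -1]   # single position: Series, shape (6,)
print(y_2d.shape, y_1d.shape)

# The target's shape carries through to predict()
new_home = pd.DataFrame([[3000, 3, 40]], columns=x.columns)
pred_2d = LinearRegression().fit(x, y_2d).predict(new_home)
pred_1d = LinearRegression().fit(x, y_1d).predict(new_home)
print(pred_2d.shape, pred_1d.shape)
```

Both fits give the same number; only the array shape differs, so a 1-D target is usually more convenient when a single value per prediction is wanted.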


In [11]:
print(x)
print(y)

   area  bedrooms  age
0  2600       3.0   20
1  3000       4.0   15
2  3200       4.0   18
3  3600       3.0   30
4  4000       5.0    8
5  4100       6.0    8
    price
0  550000
1  565000
2  610000
3  595000
4  760000
5  810000


# Creating the model

In [12]:
from sklearn.linear_model import LinearRegression
lr=LinearRegression()

In [13]:
lr.fit(x,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

**Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old**

In [14]:
pred1=lr.predict([[3000,3,40]])

In [15]:
pred1  # the home is 40 years old and the age coefficient is negative, so its age pulls the predicted price down

array([[498408.25158031]])
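The notebook answers only the first of the two questions. This sketch predicts both homes in one call, assuming the cleaned data shown above. Passing a DataFrame with the same column names used for fitting avoids the "X does not have valid feature names" warning that recent scikit-learn versions emit when a model fitted on a DataFrame receives a plain list.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Cleaned data (missing bedrooms filled with the median, 4.0)
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3.0, 4.0, 4.0, 3.0, 5.0, 6.0],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})
features = ["area", "bedrooms", "age"]

lr = LinearRegression().fit(df[features], df["price"])

# Both homes from the problem statement
new_homes = pd.DataFrame([[3000, 3, 40], [2500, 4, 5]], columns=features)
preds = lr.predict(new_homes)
print(preds.round(2))
```

Under these assumptions the second home (2500 sq ft, 4 bedrooms, 5 years old) comes out at roughly 578,876.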

In [16]:
lr.coef_

array([[  112.06244194, 23388.88007794, -3231.71790863]])

In [17]:
lr.intercept_

array([221323.0018654])
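The prediction is just the linear equation evaluated with these numbers. This sketch copies the printed `coef_` and `intercept_` values and recomputes `pred1` by hand; the per-feature contributions show how much the 40-year age subtracts.

```python
import numpy as np

# Values copied from lr.coef_ and lr.intercept_ above (order: area, bedrooms, age)
coef = np.array([112.06244194, 23388.88007794, -3231.71790863])
intercept = 221323.0018654

home = np.array([3000, 3, 40])  # area, bedrooms, age

# Contribution of each feature: m_i * x_i
contributions = coef * home
print(contributions.round(2))

# price = m1*area + m2*bedrooms + m3*age + b
price = contributions.sum() + intercept
print(round(price, 2))
```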

# 2nd Problem

In the exercise folder (at the same level as this notebook on GitHub) there is **hiring.csv**. This file contains hiring statistics for a firm: the candidate's experience, written test score and personal interview score. Based on these three factors, HR decides the salary. Given this data, build a machine learning model that helps the HR department decide salaries for future candidates, and use it to predict salaries for the following candidates:

**2 years of experience, test score 9, interview score 6**

**12 years of experience, test score 10, interview score 10**

In [20]:
hiring=pd.read_csv("hiring.csv")        

In [21]:
hiring.head()

|   | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---:|---:|---:|
| 0 | NaN | 8.0 | 9 | 50000 |
| 1 | NaN | 8.0 | 6 | 45000 |
| 2 | five | 6.0 | 7 | 60000 |
| 3 | two | 10.0 | 10 | 65000 |
| 4 | seven | 9.0 | 6 | 70000 |


In [22]:
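# Rename the columns to valid Python identifiers so they can be used as attributes (e.g. hiring.test_score_out_of_10)
hiring.columns = ['experience', 'test_score_out_of_10', 'interview_score_out_of_10',
                  'salary_in_dollars']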

# Taking care of missing data

In [23]:
hiring.experience = hiring.experience.fillna("zero")  # treat missing experience as zero years

In [24]:
hiring

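|   | experience | test_score_out_of_10 | interview_score_out_of_10 | salary_in_dollars |
|---|---|---:|---:|---:|
| 0 | zero | 8.0 | 9 | 50000 |
| 1 | zero | 8.0 | 6 | 45000 |
| 2 | five | 6.0 | 7 | 60000 |
| 3 | two | 10.0 | 10 | 65000 |
| 4 | seven | 9.0 | 6 | 70000 |
| 5 | three | 7.0 | 10 | 62000 |
| 6 | ten | NaN | 7 | 72000 |
| 7 | eleven | 7.0 | 8 | 80000 |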


In [25]:
from word2number import w2n  # third-party package: pip install word2number

In [26]:
hiring.experience=hiring.experience.apply(w2n.word_to_num)
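`word2number` is an extra dependency. Because only a handful of distinct words occur in this column, a plain lookup table is an alternative. This is a sketch: `WORD_TO_YEARS` is a hypothetical name and covers only the words that appear in `hiring.csv`; any other word would map to NaN, which makes it easy to spot.

```python
import pandas as pd

# Hypothetical lookup table covering only the words present in hiring.csv
WORD_TO_YEARS = {
    "zero": 0, "two": 2, "three": 3, "five": 5,
    "seven": 7, "ten": 10, "eleven": 11,
}

# The raw experience column, with the two missing entries as None
experience = pd.Series([None, None, "five", "two", "seven", "three", "ten", "eleven"])

# Treat missing experience as "zero" (as the notebook does), then map words to integers
years = experience.fillna("zero").map(WORD_TO_YEARS)
print(years.tolist())
```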

In [27]:
hiring

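|   | experience | test_score_out_of_10 | interview_score_out_of_10 | salary_in_dollars |
|---|---:|---:|---:|---:|
| 0 | 0 | 8.0 | 9 | 50000 |
| 1 | 0 | 8.0 | 6 | 45000 |
| 2 | 5 | 6.0 | 7 | 60000 |
| 3 | 2 | 10.0 | 10 | 65000 |
| 4 | 7 | 9.0 | 6 | 70000 |
| 5 | 3 | 7.0 | 10 | 62000 |
| 6 | 10 | NaN | 7 | 72000 |
| 7 | 11 | 7.0 | 8 | 80000 |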


In [28]:
# Fill the missing test score with the column median
hiring.test_score_out_of_10 = hiring.test_score_out_of_10.fillna(hiring.test_score_out_of_10.median())

In [29]:
hiring

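|   | experience | test_score_out_of_10 | interview_score_out_of_10 | salary_in_dollars |
|---|---:|---:|---:|---:|
| 0 | 0 | 8.0 | 9 | 50000 |
| 1 | 0 | 8.0 | 6 | 45000 |
| 2 | 5 | 6.0 | 7 | 60000 |
| 3 | 2 | 10.0 | 10 | 65000 |
| 4 | 7 | 9.0 | 6 | 70000 |
| 5 | 3 | 7.0 | 10 | 62000 |
| 6 | 10 | 8.0 | 7 | 72000 |
| 7 | 11 | 7.0 | 8 | 80000 |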


In [30]:
from sklearn.linear_model import LinearRegression
lr=LinearRegression()

In [31]:
X=hiring.iloc[:,:-1].values
Y=hiring.iloc[:,-1:].values

In [32]:
X

array([[ 0.,  8.,  9.],
       [ 0.,  8.,  6.],
       [ 5.,  6.,  7.],
       [ 2., 10., 10.],
       [ 7.,  9.,  6.],
       [ 3.,  7., 10.],
       [10.,  8.,  7.],
       [11.,  7.,  8.]])

In [33]:
Y

array([[50000],
       [45000],
       [60000],
       [65000],
       [70000],
       [62000],
       [72000],
       [80000]], dtype=int64)

In [34]:
lr.fit(X,Y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

**Predict the salary of a candidate with 2 years of experience, test score 9 and interview score 6**

In [35]:
lr.predict([[2,9,6]])

array([[53205.96797671]])

**Predict the salary of a candidate with 12 years of experience, test score 10 and interview score 10**

In [39]:
lr.predict([[12,10,10]])

array([[92002.18340611]])
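For reference, the whole salary exercise fits in one self-contained script. This sketch assumes the cleaned table shown above (experience converted to numbers, the missing test score filled with the median, 8.0) and uses a 1-D target, so each prediction is a plain number rather than a nested array.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Cleaned hiring data, copied from the table above
hiring = pd.DataFrame({
    "experience": [0, 0, 5, 2, 7, 3, 10, 11],
    "test_score_out_of_10": [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, 8.0, 7.0],
    "interview_score_out_of_10": [9, 6, 7, 10, 6, 10, 7, 8],
    "salary_in_dollars": [50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000],
})
features = ["experience", "test_score_out_of_10", "interview_score_out_of_10"]

lr = LinearRegression().fit(hiring[features], hiring["salary_in_dollars"])

# The two candidates from the problem statement
candidates = pd.DataFrame([[2, 9, 6], [12, 10, 10]], columns=features)
salaries = lr.predict(candidates)
print(salaries.round(2))

# Estimated salary change per extra year of experience or per extra score point
print(dict(zip(features, lr.coef_.round(2))))
```

All three coefficients come out positive, so on this data more experience and higher scores each raise the predicted salary.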