<h1 style="color:red; border-bottom: 4px solid gold;
           text-align: center;
           padding-bottom: 5px;">Machine Learning</h1>

<h3 style="color:purple">Sample problem: predicting home prices in Monroe, New Jersey (USA)</h3>

The table below contains home prices in Monroe Twp, NJ. Here the price depends on **area (square feet), number of bedrooms, and age of the home (in years)**. Given these prices, we have to predict the prices of new homes based on area, bedrooms, and age.

<img src="homeprices.jpg" style='height:200px;width:350px'>

Given these home prices, find the price of a home that has:

**3000 sq ft area, 3 bedrooms, 40 years old**

**2500 sq ft area, 4 bedrooms, 34 years old**

We will use regression with multiple variables here. The price can be calculated using the following equation:

<img src="equation.jpg" >
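
For readers who cannot see the image, here is a sketch of the equation in text form. It assumes the image shows the usual linear form for three features; the symbols $m_1$, $m_2$, $m_3$, and $b$ are labels chosen here, not taken from the image.

$$
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
$$

Under this reading, $m_1$, $m_2$, $m_3$ correspond to the three entries of `reg.coef_` and $b$ corresponds to `reg.intercept_`.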

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model
from word2number import w2n
import math

In [2]:
df = pd.read_csv('homeprices.csv')
df

Unnamed: 0,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000


**Data Preprocessing: Fill NA values with median value of a column**

In [3]:
df.bedrooms.median()

4.0

In [4]:
df.bedrooms = df.bedrooms.fillna(value=df.bedrooms.median())
df.bedrooms

0    3.0
1    4.0
2    4.0
3    3.0
4    5.0
5    6.0
Name: bedrooms, dtype: float64

In [5]:
reg = linear_model.LinearRegression()
reg.fit(df[["area","bedrooms","age"]],df.price)

In [6]:
reg.coef_

array([  112.06244194, 23388.88007794, -3231.71790863])

In [7]:
reg.intercept_

221323.00186540408
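
As a cross-check, the fitted coefficients and intercept can be reproduced without scikit-learn. The sketch below is not part of the original notebook. It assumes `LinearRegression` performs ordinary least squares with an intercept, and it retypes the table rows by hand, with the missing bedrooms value already filled with the median (4). The names `X_design` and `params` are illustrative.

```python
import numpy as np

# Rows from the table above; the missing bedrooms value is filled with the median (4).
X = np.array([
    [2600, 3, 20],
    [3000, 4, 15],
    [3200, 4, 18],
    [3600, 3, 30],
    [4000, 5, 8],
    [4100, 6, 8],
], dtype=float)
y = np.array([550000, 565000, 610000, 595000, 760000, 810000], dtype=float)

# Append a column of ones so the last fitted parameter acts as the intercept.
X_design = np.column_stack([X, np.ones(len(X))])

# Solve the least-squares problem min ||X_design @ params - y||.
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)
coef, intercept = params[:3], params[3]

print(coef)
print(intercept)
```

The printed values should agree with `reg.coef_` and `reg.intercept_` above up to floating-point rounding.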

**Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old**

**Find the price of a home with 2500 sq ft area, 4 bedrooms, 34 years old**

In [8]:
reg.predict(pd.DataFrame({"area":[3000,2500],
                          "bedrooms":[3,4],
                          "age":[40,34]}))

array([498408.25158031, 485156.21813898])
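
These predictions can also be checked by hand: each one is the coefficients multiplied by the feature values, plus the intercept. This is a minimal sketch that copies the numbers from the `reg.coef_` and `reg.intercept_` outputs above. The helper name `predict_price` is made up for illustration.

```python
# Coefficients and intercept copied from the reg.coef_ and reg.intercept_ outputs above.
coef = [112.06244194, 23388.88007794, -3231.71790863]
intercept = 221323.00186540408

def predict_price(area, bedrooms, age):
    """Evaluate the fitted linear equation for a single home."""
    return coef[0] * area + coef[1] * bedrooms + coef[2] * age + intercept

price_a = predict_price(3000, 3, 40)
price_b = predict_price(2500, 4, 34)
print(price_a, price_b)
```

Because the coefficients are copied at printed precision, the results match `reg.predict` only to within a small rounding error.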

<h3>Exercise</h3>

The exercise folder (at the same level as this notebook on GitHub) contains **hiring.csv**. This file holds hiring statistics for a firm, such as each candidate's experience, written test score, and personal interview score. Based on these 3 factors, HR decides the salary. Given this data, you need to build a machine learning model for the HR department that can help them decide salaries for future candidates. Use it to predict salaries for the following candidates:


**2 yr experience, 9 test score, 6 interview score**

**12 yr experience, 10 test score, 10 interview score**


In [9]:
E_data = pd.read_csv(r"C:\Users\rdank\OneDrive\Desktop\Self Taught\Git\Codebasics\Machine Learning\2_linear_reg_multivariate\Exercise\hiring.csv")
E_data.head()

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000


In [10]:
## Rename the columns
E_data = E_data.rename(columns={"test_score(out of 10)": "test_score",
                        "interview_score(out of 10)":"interview_score",
                        "salary($)":"salary"})

E_data.head()

Unnamed: 0,experience,test_score,interview_score,salary
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000


In [11]:
## Fill missing experience with "zero"
E_data.experience = E_data.experience.fillna(value = "zero")
E_data.experience

0      zero
1      zero
2      five
3       two
4     seven
5     three
6       ten
7    eleven
Name: experience, dtype: object

In [12]:
E_data.experience = E_data.experience.apply(w2n.word_to_num)
E_data.head()

Unnamed: 0,experience,test_score,interview_score,salary
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
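
If `word2number` is not available, a plain dictionary lookup is enough for the words in this column. This is a minimal sketch, not a replacement for the library. The table `WORD_TO_NUM` and the helper `word_to_int` are hypothetical names, and the table only covers zero through twelve.

```python
# Hypothetical lookup table; it only covers the spelled-out numbers zero through twelve.
WORD_TO_NUM = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "eleven": 11, "twelve": 12,
}

def word_to_int(word):
    """Convert a spelled-out number such as 'five' to an int."""
    key = word.strip().lower()
    if key not in WORD_TO_NUM:
        raise ValueError(f"unsupported number word: {word!r}")
    return WORD_TO_NUM[key]

# The experience values shown above, after filling missing entries with "zero".
experience_words = ["zero", "zero", "five", "two", "seven", "three", "ten", "eleven"]
experience_years = [word_to_int(w) for w in experience_words]
print(experience_years)
```

With pandas, the same helper could be passed to `E_data.experience.apply(...)` in place of `w2n.word_to_num`.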


In [13]:
math.floor((E_data.test_score).mean())

7

In [14]:
## Fill NA in test_score with the mean value, rounded down
E_data.test_score = E_data.test_score.fillna(value =  math.floor((E_data.test_score).mean()))
E_data

Unnamed: 0,experience,test_score,interview_score,salary
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,7.0,7,72000
7,11,7.0,8,80000


In [15]:
reg = linear_model.LinearRegression()
reg.fit(E_data[["experience","test_score","interview_score"]],y=E_data.salary)

<h2 style="color: gold;">Question</h2>

**2 yr experience, 9 test score, 6 interview score**

**12 yr experience, 10 test score, 10 interview score**

<h2 style="color: gold;">Answer</h2>

In [16]:
reg.predict(pd.DataFrame({"experience":[2,12],
                          "test_score" : [9,10],
                          "interview_score": [6,10]}))

array([53713.86677124, 93747.79628651])