
<h2 style="color:green" align="center"> Machine Learning With Python: Linear Regression Multiple Variables</h2>

<h3 style="color:purple">Sample problem: predicting home prices in Monroe, New Jersey (USA)</h3>

The table below contains home prices in Monroe Township, NJ. Here the price depends on **area (square feet), number of bedrooms, and age of the home (in years)**. Given these prices, we have to predict the prices of new homes based on area, bedrooms, and age.

Given these home prices, find the price of a home that has:

**3000 sq ft area, 3 bedrooms, 40 years old**

**2500 sq ft area, 4 bedrooms, 5 years old**

Here area, bedrooms, and age are called independent variables or **features**, whereas price is the dependent variable.
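
Under a linear model, this relationship can be written as a single equation. The symbols $m_1$, $m_2$, $m_3$ (coefficients) and $b$ (intercept) are notation assumed here for illustration; they are not names taken from the dataset:

$$
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
$$

Fitting the model means choosing $m_1$, $m_2$, $m_3$, and $b$ so that the predicted prices are as close as possible to the known ones.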

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model

In [2]:
df = pd.read_csv('homeprices.csv')
df

Unnamed: 0,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000


**Data preprocessing: fill NA values with the median of the column**

In [3]:
df.bedrooms.median()

4.0

In [4]:
df.bedrooms = df.bedrooms.fillna(df.bedrooms.median())
df

Unnamed: 0,area,bedrooms,age,price
0,2600,3.0,20,550000
1,3000,4.0,15,565000
2,3200,4.0,18,610000
3,3600,3.0,30,595000
4,4000,5.0,8,760000
5,4100,6.0,8,810000
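
To make the imputation step concrete, here is a minimal plain-Python sketch of what filling with the median does. It uses the standard library's `statistics.median` instead of pandas, and the list and variable names are illustrative only. Like pandas, it computes the median over the observed values and ignores the missing one.

```python
from statistics import median

# Bedroom counts from the table above; None marks the missing entry.
bedrooms = [3.0, 4.0, None, 3.0, 5.0, 6.0]

# Compute the median over the observed values only.
observed = [b for b in bedrooms if b is not None]
bedrooms_median = median(observed)

# Replace each missing entry with that median.
filled = [bedrooms_median if b is None else b for b in bedrooms]

print(bedrooms_median)
print(filled)
```

The median is used here rather than a fixed guess because it is a value that actually sits in the middle of the observed bedroom counts.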


In [5]:
reg = linear_model.LinearRegression()
reg.fit(df.drop('price',axis='columns'),df.price)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
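
`LinearRegression` fits an ordinary least squares model. As a rough cross-check, and not as a description of scikit-learn's internals, the same fit can be sketched with NumPy's `np.linalg.lstsq`. The arrays below are typed in by hand from the table above (with the missing bedroom value already filled with 4), and the names `X_with_bias` and `weights` are assumptions made for this example.

```python
import numpy as np

# Features (area, bedrooms, age) and prices from the table above,
# with the missing bedroom value already filled with the median (4).
X = np.array([
    [2600, 3, 20],
    [3000, 4, 15],
    [3200, 4, 18],
    [3600, 3, 30],
    [4000, 5, 8],
    [4100, 6, 8],
], dtype=float)
y = np.array([550000, 565000, 610000, 595000, 760000, 810000], dtype=float)

# Append a column of ones so the last fitted weight acts as the intercept.
X_with_bias = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve the least-squares problem: minimize ||X_with_bias @ weights - y||^2.
weights, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)
coefficients, intercept = weights[:3], weights[3]

print(coefficients)
print(intercept)
```

The printed values should agree, up to floating-point rounding, with `reg.coef_` and `reg.intercept_` from the fitted model.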

In [6]:
reg.coef_

array([  112.06244194, 23388.88007794, -3231.71790863])

In [7]:
reg.intercept_

221323.00186540396

**Find the price of a home with 3000 sq ft area, 3 bedrooms, 40 years old**

In [8]:
reg.predict([[3000, 3, 40]])

array([498408.25158031])

In [9]:
112.06244194*3000 + 23388.88007794*3 + -3231.71790863*40 + 221323.00186540384

498408.25157402386
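
The hand calculation above is just the coefficients multiplied by the features, plus the intercept. A small sketch of the same arithmetic as a reusable function follows; `predict_price` is a hypothetical helper written for this example, and the numbers are copied from the `reg.coef_` and `reg.intercept_` outputs.

```python
# Coefficients and intercept copied from the reg.coef_ and reg.intercept_ outputs above.
coefficients = [112.06244194, 23388.88007794, -3231.71790863]
intercept = 221323.00186540396


def predict_price(area, bedrooms, age):
    """Hypothetical helper: weighted sum of the features plus the intercept."""
    features = [area, bedrooms, age]
    return sum(c * x for c, x in zip(coefficients, features)) + intercept


price = predict_price(3000, 3, 40)
print(round(price, 2))
```

Because the coefficients are copied with limited precision, the result matches `reg.predict` only up to a small rounding difference, as in the cell above.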

**Find the price of a home with 2500 sq ft area, 4 bedrooms, 5 years old**

In [10]:
reg.predict([[2500, 4, 5]])

array([578876.03748933])

<h3>Exercise</h3>

The exercise folder (at the same level as this notebook on GitHub) contains **hiring.csv**. This file holds hiring statistics for a firm: each candidate's experience, written test score, and personal interview score. Based on these three factors, HR decides the salary. Given this data, build a machine learning model that helps the HR department decide salaries for future candidates. Use it to predict salaries for the following candidates:


**2 yr experience, 9 test score, 6 interview score**

**12 yr experience, 10 test score, 10 interview score**


<h3>Answer</h3>

53713.86 and 93747.79

In [12]:
pip install word2number

Collecting word2number
  Downloading https://files.pythonhosted.org/packages/4a/29/a31940c848521f0725f0df6b25dca8917f13a2025b0e8fcbe5d0457e45e6/word2number-1.1.zip
Building wheels for collected packages: word2number
  Building wheel for word2number (setup.py) ... done
  Created wheel for word2number: filename=word2number-1.1-cp37-none-any.whl size=5584 sha256=53ca8ac03ab9690133e440e755164e3b1f009fd505709375f461c605d6af1cbe
  Stored in directory: /root/.cache/pip/wheels/46/2f/53/5f5c1d275492f2fce1cdab9a9bb12d49286dead829a4078e0e
Successfully built word2number
Installing collected packages: word2number
Successfully installed word2number-1.1


In [13]:
from word2number import w2n

In [14]:
hiring = pd.read_csv("/content/hiring.csv")

In [15]:
hiring

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000


In [17]:
hiring.experience = hiring.experience.fillna("zero")

In [18]:
hiring

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,zero,8.0,9,50000
1,zero,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000


In [19]:
hiring.experience = hiring.experience.apply(w2n.word_to_num)
hiring

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,,7,72000
7,11,7.0,8,80000
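
The conversion above relies on the third-party `word2number` package. As a dependency-free alternative, and not the approach used in this notebook, a small lookup table is enough for the words that occur in hiring.csv. The names `WORD_TO_NUMBER` and `word_to_number` are hypothetical, introduced only for this sketch.

```python
# Hypothetical lookup table covering zero through eleven, enough for hiring.csv.
WORD_TO_NUMBER = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
}


def word_to_number(word):
    """Convert a number word (case-insensitive) to an int; raise ValueError if unknown."""
    try:
        return WORD_TO_NUMBER[word.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported number word: {word!r}") from None


# Experience column after missing values were replaced with "zero".
experience_words = ["zero", "zero", "five", "two", "seven", "three", "ten", "eleven"]
experience = [word_to_number(w) for w in experience_words]
print(experience)
```

A lookup table fails loudly on anything outside its range, which is reasonable for a small fixed dataset; for arbitrary number phrases a library such as `word2number` is the more practical choice.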


In [21]:
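import math
# Note: despite the variable name, this is the floor of the mean (55/7 ≈ 7.86 -> 7),
# not the median, which would be 8. The results below are based on this value of 7.
median_test_score = math.floor(hiring['test_score(out of 10)'].mean())
median_test_score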

7
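
The value 7 comes from flooring the mean of the observed test scores, which is not the same as their median. The following standard-library sketch compares the two, using the seven scores typed in from the table above; the variable names are illustrative.

```python
import math
from statistics import mean, median

# The seven observed test scores from the table above (the missing one is excluded).
scores = [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, 7.0]

floored_mean = math.floor(mean(scores))  # mean is 55 / 7, roughly 7.857, then floored
true_median = median(scores)             # middle value of the sorted scores

print(floored_mean)
print(true_median)
```

The two choices give different fill values (7 versus 8), so the fitted coefficients and the predicted salaries depend on which one is used. The answers stated in this notebook correspond to the fill value 7.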

In [22]:
hiring['test_score(out of 10)'] = hiring['test_score(out of 10)'].fillna(median_test_score)

In [23]:
hiring

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,7.0,7,72000
7,11,7.0,8,80000


In [26]:
reg1 = linear_model.LinearRegression()
reg1.fit(hiring[['experience','test_score(out of 10)','interview_score(out of 10)']],hiring['salary($)'])

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [27]:
reg1.coef_

array([2922.26901502, 2221.30909959, 2147.48256637])

In [28]:
reg1.intercept_

14992.65144669314

## Find the salary of an employee with 2 years of experience, a test score of 9, and an interview score of 6


In [29]:
reg1.predict([[2,9,6]])

array([53713.86677124])

In [30]:
2922.26901502*2+2221.30909959*9+2147.48256637*6+14992.65144669314

53713.86677126314

## Find the salary of an employee with 12 years of experience, a test score of 10, and an interview score of 10

In [31]:
reg1.predict([[12,10,10]])

array([93747.79628651])

In [32]:
2922.26901502*12+2221.30909959*10+2147.48256637*10+14992.65144669314

93747.79628653315

In [33]:
print("\U0001f600")

😀


😀Well Done Sanvee...