# Linear Regression

Linear regression is one of the simplest and most widely used machine learning algorithms. It is a statistical method for predictive analysis that models continuous (numeric) target variables such as sales, salary, age, or product price.

Linear regression models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name. The fitted model describes how the value of the dependent variable changes as the values of the independent variables change.
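As a minimal sketch of what fitting a linear relationship means, the snippet below solves an ordinary least-squares problem with NumPy. The data are made up for illustration (generated exactly from y = 2x + 1, with no noise) and are not part of the hiring dataset used later; the variable names are likewise assumptions.

```python
import numpy as np

# Toy data (an assumption for illustration): y = 2x + 1 exactly, no noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the model can learn an intercept.
X = np.column_stack([x, np.ones_like(x)])

# Least-squares solution of X @ [slope, intercept] ~= y.
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

print(round(float(slope), 6), round(float(intercept), 6))
```

Because the toy data lie exactly on a line, the recovered slope and intercept should match the generating values up to floating-point error. With noisy real data the fit instead minimizes the sum of squared residuals.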

### Import Libraries

In [3]:
!pip install word2number


In [4]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from word2number import w2n

### Import Dataset

In [5]:
df = pd.read_csv('hiring.csv')
df.head()

| | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | NaN | 8.0 | 9 | 50000 |
| 1 | NaN | 8.0 | 6 | 45000 |
| 2 | five | 6.0 | 7 | 60000 |
| 3 | two | 10.0 | 10 | 65000 |
| 4 | seven | 9.0 | 6 | 70000 |


### Preprocessing the Dataset

In [6]:
df.experience = df.experience.fillna('Zero')

In [7]:
df

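| | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | Zero | 8.0 | 9 | 50000 |
| 1 | Zero | 8.0 | 6 | 45000 |
| 2 | five | 6.0 | 7 | 60000 |
| 3 | two | 10.0 | 10 | 65000 |
| 4 | seven | 9.0 | 6 | 70000 |
| 5 | three | 7.0 | 10 | 62000 |
| 6 | ten | NaN | 7 | 72000 |
| 7 | eleven | 7.0 | 8 | 80000 |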
7,eleven,7.0,8,80000


In [8]:
# Convert number words (e.g. 'five') to integers with word_to_num
df.experience = df.experience.apply(w2n.word_to_num)

In [9]:
df

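| | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | 0 | 8.0 | 9 | 50000 |
| 1 | 0 | 8.0 | 6 | 45000 |
| 2 | 5 | 6.0 | 7 | 60000 |
| 3 | 2 | 10.0 | 10 | 65000 |
| 4 | 7 | 9.0 | 6 | 70000 |
| 5 | 3 | 7.0 | 10 | 62000 |
| 6 | 10 | NaN | 7 | 72000 |
| 7 | 11 | 7.0 | 8 | 80000 |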
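The conversion above relies on the `word2number` package. For a small, known vocabulary, a plain lookup table is a possible alternative that avoids the extra dependency. The sketch below is not how `word2number` works internally; the `WORD_TO_INT` table and the toy `experience` series are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Hypothetical lookup table; extend it to cover the words in your own data.
WORD_TO_INT = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
}

# Toy column mimicking some of the 'experience' values shown above.
experience = pd.Series(["Zero", "five", "two", "seven", "eleven"])

# Lower-case first so 'Zero' and 'zero' are treated the same.
# Words missing from the table become NaN instead of raising an error.
experience_num = experience.str.lower().map(WORD_TO_INT)

print(experience_num.tolist())
```

The trade-off is coverage: the lookup table only handles the words listed in it, whereas a dedicated parser also handles compound phrases.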


In [10]:
import math
# Floor of the mean test score (this is the mean, not the median)
mean_test_score_floor = math.floor(df['test_score(out of 10)'].mean())
mean_test_score_floor

7

In [11]:
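# Mean of the non-missing test scores, used to fill the missing value
mean_test_score = df['test_score(out of 10)'].mean()
mean_test_score

7.857142857142857

In [12]:
df['test_score(out of 10)'] = df['test_score(out of 10)'].fillna(mean_test_score)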

In [13]:
df

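| | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | 0 | 8.0 | 9 | 50000 |
| 1 | 0 | 8.0 | 6 | 45000 |
| 2 | 5 | 6.0 | 7 | 60000 |
| 3 | 2 | 10.0 | 10 | 65000 |
| 4 | 7 | 9.0 | 6 | 70000 |
| 5 | 3 | 7.0 | 10 | 62000 |
| 6 | 10 | 7.857143 | 7 | 72000 |
| 7 | 11 | 7.0 | 8 | 80000 |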
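Mean and median imputation can give different fill values, so it helps to check both before choosing one. The sketch below uses the eight test scores from the table above (with the one missing entry) and compares the two; the variable names are assumptions for illustration.

```python
import pandas as pd

# Test scores copied from the table above; row 6 is missing.
scores = pd.Series([8.0, 8.0, 6.0, 10.0, 9.0, 7.0, None, 7.0])

# Both statistics skip NaN by default.
mean_filled = scores.fillna(scores.mean())
median_filled = scores.fillna(scores.median())

print(mean_filled[6], median_filled[6])
```

The median is less sensitive to extreme values than the mean, which can matter on small datasets like this one.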


In [14]:
# Show column names
df.columns

Index(['experience', 'test_score(out of 10)', 'interview_score(out of 10)',
       'salary($)'],
      dtype='object')

### Feature Selection

In [15]:
predictors = ['experience', 'test_score(out of 10)', 'interview_score(out of 10)']
x = df[predictors]
y = df['salary($)']

### Split into Training and Test Sets

In [16]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [17]:
x_train.shape,x_test.shape

((6, 3), (2, 3))

In [18]:
y_train.shape,y_test.shape

((6,), (2,))
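To make the shapes above less mysterious, here is a small sketch of how `train_test_split` behaves on eight rows with `test_size=0.2`. The arrays are placeholder data, not the hiring dataset, and the names `X_demo` and `y_demo` are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 8 rows, 3 features, and one target value per row.
X_demo = np.arange(24).reshape(8, 3)
y_demo = np.arange(8)

# 20% of 8 rows is 1.6, which is rounded up to 2 test rows.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Re-running with the same random_state reproduces the same partition.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```

Fixing `random_state` makes the split repeatable, which keeps the scores reported later comparable between runs.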

### Apply Linear Regression

In [19]:
reg = LinearRegression()

In [20]:
model = reg.fit(x_train,y_train)

In [21]:
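# Predict salary for 5 years of experience, test score 6, interview score 7.
# Passing a DataFrame keeps the feature names the model was fitted with.
model.predict(pd.DataFrame([[5, 6, 7]], columns=predictors))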

array([58355.74491311])
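A prediction from a fitted `LinearRegression` is the intercept plus the dot product of the coefficients with the input features. The sketch below checks this on synthetic data generated from a known linear rule (an assumption for illustration, unrelated to the hiring data), using the `coef_` and `intercept_` attributes of the fitted estimator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): y = 3*x1 + 2*x2 + 5 exactly, no noise.
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 10, size=(20, 2))
y_syn = 3.0 * X_syn[:, 0] + 2.0 * X_syn[:, 1] + 5.0

demo_model = LinearRegression().fit(X_syn, y_syn)

# Reproduce predict() by hand: intercept + features . coefficients.
x_new = np.array([[4.0, 1.0]])
manual = demo_model.intercept_ + x_new @ demo_model.coef_

print(demo_model.predict(x_new), manual)
```

Inspecting `coef_` on the real model in the same way shows how much the predicted salary changes per unit of each predictor.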

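### Model Score (R²)

For a regressor, `score` returns the coefficient of determination R², not a classification accuracy. With only two rows in the test set, the test value below is a rough estimate at best.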

In [22]:
model.score(x_train,y_train)

0.945171325224806

In [23]:
model.score(x_test,y_test)

0.9287916364000982
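The value returned by `score` on a scikit-learn regressor is R² = 1 - SS_res / SS_tot. The sketch below computes it by hand on toy numbers; the helper name `r2_manual` and the data are assumptions for illustration.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]

# Perfect predictions leave no residual error.
perfect = r2_manual(y_true, y_true)

# Always predicting the mean of y_true (6.0) explains none of the variance.
baseline = r2_manual(y_true, [6.0] * 4)

print(perfect, baseline)
```

A score near 1 means the model explains most of the variance in the target, while a score at or below 0 means it does no better than always predicting the mean.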