# Multivariate Linear Regression

In multivariate linear regression, we model the relationship between the **dependent variable** (the variable we're trying to predict) and multiple **independent variables**. The equation takes the form:

y = b₀ + b₁x₁ + b₂x₂ + ... + bₙxₙ

- **y**: The dependent variable.
- **b₀**: The intercept (the value of y when all independent variables are 0).
- **b₁, b₂, ..., bₙ**: Coefficients for the independent variables **x₁, x₂, ..., xₙ**, respectively.

Each coefficient represents the change in the dependent variable for a one-unit change in the corresponding independent variable, while holding all other independent variables constant.

This equation extends the simple linear regression equation **y = mx + c** to accommodate multiple factors influencing the outcome.
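
As a minimal sketch of the equation above, the plain-Python function below evaluates it for a single observation. The helper name `predict` and the numbers `b0 = 1.0`, `b = [2.0, 3.0]` are hypothetical, chosen only to keep the arithmetic easy to check. Raising one feature by one unit while holding the other fixed changes the result by exactly that feature's coefficient.

```python
def predict(intercept, coefficients, features):
    """Return b0 + b1*x1 + ... + bn*xn for one observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))

# Hypothetical values, chosen only to make the arithmetic easy to follow
b0 = 1.0
b = [2.0, 3.0]

base = predict(b0, b, [4.0, 5.0])    # 1 + 2*4 + 3*5
bumped = predict(b0, b, [5.0, 5.0])  # x1 raised by one unit, x2 held constant

print(base)           # 24.0
print(bumped - base)  # 2.0, which equals b1
```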


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [None]:
df = pd.read_csv("hiring.csv")
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000
8,ten,6.0,7,68000
9,seven,8.0,6,70000


In [None]:
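# Map the spelled-out numbers to numeric values (each key listed once)
replace_1 = {"two": 2, "three": 3, "five": 5, "seven": 7, "ten": 10, "eleven": 11}
# Assign the result back instead of using inplace=True on a column selection
df['experience'] = df['experience'].replace(replace_1).astype(float)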
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,5.0,6.0,7,60000
3,2.0,10.0,10,65000
4,7.0,9.0,6,70000
5,3.0,7.0,10,62000
6,10.0,,7,72000
7,11.0,7.0,8,80000
8,10.0,6.0,7,68000
9,7.0,8.0,6,70000


In [None]:
df['experience'].mean()

6.444444444444445

In [None]:
df['experience'].median()

7.0

In [None]:
df['experience'].mode()

0     3.0
1     7.0
2    10.0
Name: experience, dtype: float64

In [None]:
# The median is an appropriate value for filling the NaN entries in this column
median = df['experience'].median()
df["experience"] = df["experience"].fillna(median)
df
# Looks better now

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,7.0,8.0,9,50000
1,7.0,8.0,6,45000
2,5.0,6.0,7,60000
3,2.0,10.0,10,65000
4,7.0,9.0,6,70000
5,3.0,7.0,10,62000
6,10.0,,7,72000
7,11.0,7.0,8,80000
8,10.0,6.0,7,68000
9,7.0,8.0,6,70000


In [None]:
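# Fill the missing test score by linear interpolation between neighbouring rows
df["test_score(out of 10)"] = df["test_score(out of 10)"].interpolate(method='linear')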

In [None]:
df
# Looks very tidy now

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,7.0,8.0,9,50000
1,7.0,8.0,6,45000
2,5.0,6.0,7,60000
3,2.0,10.0,10,65000
4,7.0,9.0,6,70000
5,3.0,7.0,10,62000
6,10.0,7.0,7,72000
7,11.0,7.0,8,80000
8,10.0,6.0,7,68000
9,7.0,8.0,6,70000
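
For a single missing value between two known neighbours, linear interpolation over equally spaced rows amounts to taking the mean of those neighbours. The sketch below shows this in plain Python, using the test scores from the table. The helper `fill_single_gaps` is hypothetical and handles only isolated gaps; it is not how pandas implements `interpolate`.

```python
def fill_single_gaps(values):
    """Fill isolated None entries with the mean of their two neighbours."""
    filled = list(values)
    for i in range(1, len(filled) - 1):
        left, right = filled[i - 1], filled[i + 1]
        if filled[i] is None and left is not None and right is not None:
            filled[i] = (left + right) / 2
    return filled

# Test scores from the table above, with the missing entry at index 6
test_scores = [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, None, 7.0, 6.0, 8.0]
filled_scores = fill_single_gaps(test_scores)

print(filled_scores[6])  # 7.0, the mean of the neighbours 7.0 and 7.0
```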


In [None]:
df.describe()

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
count,11.0,11.0,11.0,11.0
mean,6.545455,7.818182,7.727273,64454.545455
std,3.045115,1.401298,1.55505,9963.570006
min,2.0,6.0,6.0,45000.0
25%,4.0,7.0,6.5,61000.0
50%,7.0,8.0,7.0,67000.0
75%,8.5,8.5,9.0,70000.0
max,11.0,10.0,10.0,80000.0


In [None]:
from sklearn import linear_model
reg = linear_model.LinearRegression()

In [None]:
reg.fit(df[['experience', 'test_score(out of 10)','interview_score(out of 10)']], df[['salary($)']])
                                # INDEPENDENT VARIABLES                    # TARGET VARIABLE

In [None]:
reg.coef_
# b₁, b₂, b₃ (coefficients for experience, test score, and interview score)

array([[1934.76529264, 1481.60719781, 1560.79038361]])

In [None]:
reg.intercept_
# b₀

array([28146.49975561])

In [27]:
# Predict the salary for 15 years of experience, test score 9, interview score 8
reg.predict([[15,9,8]])



array([[82988.76699433]])
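
As a sanity check, the prediction can be reproduced by hand from the regression equation. The sketch below copies the intercept and coefficients from the `reg.intercept_` and `reg.coef_` outputs above and evaluates b₀ + b₁x₁ + b₂x₂ + b₃x₃ in plain Python. The variable names are illustrative only.

```python
# Values copied from the reg.intercept_ and reg.coef_ outputs above
b0 = 28146.49975561
coefficients = [1934.76529264, 1481.60719781, 1560.79038361]

# Same input as reg.predict: experience, test score, interview score
candidate = [15, 9, 8]

manual_prediction = b0 + sum(b * x for b, x in zip(coefficients, candidate))
print(round(manual_prediction, 2))  # 82988.77
```

The result agrees with the value returned by `reg.predict`, up to rounding of the printed coefficients.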