# Multiple Linear Regression

Made by Faris D. Qadri | 2021-07-09

Personal and professional use is allowed with permission from the author.

[Method explanation](https://en.m.wikiversity.org/wiki/Multiple_linear_regression)

[Source](https://medium.com/machine-learning-with-python/multiple-linear-regression-implementation-in-python-2de9b303fc0c)

[Notebook and data source](https://github.com/Harshita0109/Sales-Prediction)

## Libraries

In [1]:
# Necessary libraries

## Data handling
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

## Visualization
import matplotlib.pyplot as plt
import seaborn as sns

## Multiple linear regression model
from sklearn.linear_model import LinearRegression

## Accuracy check
from sklearn import metrics

## Dataset

In [2]:
df = pd.read_csv("advertising.csv")

In [3]:
df.head()

|   | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 12.0 |
| 3 | 151.5 | 41.3 | 58.5 | 16.5 |
| 4 | 180.8 | 10.8 | 58.4 | 17.9 |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   TV         200 non-null    float64
 1   Radio      200 non-null    float64
 2   Newspaper  200 non-null    float64
 3   Sales      200 non-null    float64
dtypes: float64(4)
memory usage: 6.4 KB


In [5]:
# Setting the features (x) and the target (y)
x = df[['TV', 'Radio', 'Newspaper']]
y = df['Sales']

In [6]:
# Splitting the data into training (70%) and test (30%) sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 100)
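
As a quick sanity check on the split sizes, the sketch below applies the same `test_size` and `random_state` to a synthetic 200-row frame. The random values and the `demo_` names are placeholders for illustration, not the advertising data. With 200 rows, a 0.3 test fraction should leave 140 rows for training and 60 for testing.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the advertising set (200 rows, 3 features).
rng = np.random.default_rng(0)
demo_x = pd.DataFrame(rng.random((200, 3)), columns=["TV", "Radio", "Newspaper"])
demo_y = pd.Series(rng.random(200), name="Sales")

# Same split settings as in the notebook cell above.
demo_x_train, demo_x_test, demo_y_train, demo_y_test = train_test_split(
    demo_x, demo_y, test_size=0.3, random_state=100
)

# Row counts of the training and test parts.
print(demo_x_train.shape, demo_x_test.shape)
```

Fixing `random_state` makes the shuffle repeatable, so the same rows end up in the test set on every run.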

## Implementation

In [7]:
# Model fitting 
mlr = LinearRegression()  
mlr.fit(x_train, y_train)

LinearRegression()
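
`LinearRegression` fits an ordinary least squares model. As a rough illustration of what that means (a sketch with NumPy, not scikit-learn's internal code), the snippet below recovers a known intercept and coefficients from noise-free synthetic data. All names and values in it are made up for the example.

```python
import numpy as np

# Synthetic, noise-free data generated from known (made-up) parameters.
rng = np.random.default_rng(0)
features = rng.random((50, 3))
true_intercept = 4.0
true_coefs = np.array([0.05, 0.11, 0.006])
target = true_intercept + features @ true_coefs

# Prepend a column of ones so the intercept is estimated with the coefficients.
design = np.column_stack([np.ones(len(features)), features])

# Least squares solution of design @ solution ≈ target.
solution, *_ = np.linalg.lstsq(design, target, rcond=None)
fitted_intercept, fitted_coefs = solution[0], solution[1:]

print(fitted_intercept, fitted_coefs)
```

Because the synthetic target has no noise, the fitted values should match the made-up parameters up to floating-point error. On real data such as the advertising set, the fit minimizes the squared error instead of reproducing the targets exactly.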

In [8]:
# Intercept and coefficients (one coefficient per feature column)
print("Intercept: ", mlr.intercept_)
print("Coefficients:")
list(zip(x.columns, mlr.coef_))

Intercept:  4.334595861728438
Coefficients:


[('TV', 0.05382910866725004),
 ('Radio', 0.1100122438855805),
 ('Newspaper', 0.00628995014613036)]
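
To show how the intercept and coefficients combine into a prediction, here is a minimal sketch that applies the fitted values printed above to the first row of `df.head()` (TV = 230.1, Radio = 37.8, Newspaper = 69.2). The variable names are illustrative only.

```python
# Fitted parameters copied from the output above.
intercept = 4.334595861728438
coefficients = {
    "TV": 0.05382910866725004,
    "Radio": 0.1100122438855805,
    "Newspaper": 0.00628995014613036,
}

# First row of the dataset, as shown by df.head().
first_row = {"TV": 230.1, "Radio": 37.8, "Newspaper": 69.2}

# Prediction = intercept + sum of (coefficient * feature value).
manual_prediction = intercept + sum(
    coefficients[name] * first_row[name] for name in coefficients
)

print(round(manual_prediction, 2))
```

The result is close to 21.3, against an actual `Sales` value of 22.1 for that row. This is the same arithmetic `mlr.predict` performs for each row.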

In [9]:
# Predicting
y_pred_mlr= mlr.predict(x_test)

# Predicted values
print("Prediction for test set: {}".format(y_pred_mlr))

Prediction for test set: [ 9.35221067 20.96344625 16.48851064 20.10971005 21.67148354 16.16054424
 13.5618056  15.39338129 20.81980757 21.00537077 12.29451311 20.70848608
  8.17367308 16.82471534 10.48954832  9.99530649 16.34698901 14.5758119
 17.23065133 12.56890735 18.55715915 12.12402775 20.43312609 17.78017811
 16.73623408 21.60387629 20.13532087 10.82559967 19.12782848 14.84537816
 13.13597397  9.07757918 12.07834143 16.62824427  8.41792841 14.0456697
  9.92050209 14.26101605 16.76262961 17.17185467 18.88797595 15.50165469
 15.78688377 16.86266686 13.03405813 10.47673934 10.6141644  20.85264977
 10.1517568   6.88471443 17.88702583 18.16013938 12.55907083 16.28189561
 18.98024679 11.33714913  5.91026916 10.06159509 17.62383031 13.19628335]


### End Result

In [10]:
# Comparison between actual value and the predicted value
mlr_compared = pd.DataFrame({'Actual value': y_test, 'Predicted value': y_pred_mlr})
mlr_compared.head()

|   | Actual value | Predicted value |
|---|---|---|
| 126 | 6.6 | 9.352211 |
| 104 | 20.7 | 20.963446 |
| 99 | 17.2 | 16.488511 |
| 92 | 19.4 | 20.10971 |
| 111 | 21.8 | 21.671484 |


In [11]:
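# Error metrics on the held-out test set
meanAbErr = metrics.mean_absolute_error(y_test, y_pred_mlr)
meanSqErr = metrics.mean_squared_error(y_test, y_pred_mlr)
rootMeanSqErr = np.sqrt(meanSqErr)

# Note: mlr.score(x, y) computes R squared on the full dataset (training and test rows),
# shown here as a percentage; mlr.score(x_test, y_test) would give the held-out value.
print('R squared: {:.2f}'.format(mlr.score(x,y)*100))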
print('Mean Absolute Error:', meanAbErr)
print('Mean Square Error:', meanSqErr)
print('Root Mean Square Error:', rootMeanSqErr)

R squared: 90.11
Mean Absolute Error: 1.2278183566589411
Mean Square Error: 2.6360765623280664
Root Mean Square Error: 1.6235998775338911
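
For reference, the metrics above can be written out by hand. The sketch below uses only the five actual/predicted pairs shown in the comparison table, so its numbers will differ from the full 60-row test-set results printed above. The variable names are illustrative.

```python
import math

# First five test rows from the comparison table (not the full test set).
actual = [6.6, 20.7, 17.2, 19.4, 21.8]
predicted = [9.352211, 20.963446, 16.488511, 20.10971, 21.671484]

errors = [a - p for a, p in zip(actual, predicted)]

# Mean absolute error: average size of the errors.
mae = sum(abs(e) for e in errors) / len(errors)

# Mean squared error and its square root.
mse = sum(e ** 2 for e in errors) / len(errors)
rmse = math.sqrt(mse)

# R squared: 1 - (residual sum of squares / total sum of squares).
mean_actual = sum(actual) / len(actual)
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r_squared = 1 - ss_res / ss_tot

print(mae, mse, rmse, r_squared)
```

MAE and RMSE are in the same units as `Sales`, which makes them easier to interpret than MSE. R squared is a unitless fraction, so it needs to be multiplied by 100 to be read as a percentage.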
