# Linear regression

In this exercise, we estimate the effect of advertising on sales.

The dataset covers three advertising channels: TV, radio and newspaper.

We first apply univariate linear regression to one channel at a time, then multivariate linear regression to all three together.

### Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

### Read data and display top rows

In [None]:
data = pd.read_csv("Advertising.csv")
data.head()

### Remove extra column

In [None]:
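# drop() returns a new DataFrame and leaves `data` itself unchanged,
# so the 'Unnamed: 0' column is still present in later cells.
data.drop(['Unnamed: 0'], axis=1)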

## Simple linear regression 

### Plot TV vs Sales data

In [None]:
plt.figure(figsize=(16, 8))
plt.scatter(
    data['TV'],
    data['sales'],
    c='black'
)
plt.xlabel("Money spent on TV ads ($)")
plt.ylabel("Sales ($)")
plt.show()

### Perform linear regression

In [None]:
X = data['TV'].values.reshape(-1,1)
y = data['sales'].values.reshape(-1,1)

reg = LinearRegression()
reg.fit(X, y)

# Print results
print(reg.coef_[0][0])
print(reg.intercept_[0])

print("The linear model is: Y = {:.5} + {:.5}X".format(reg.intercept_[0], reg.coef_[0][0]))
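
For reference, the intercept and slope that an ordinary least-squares fit reports for a single input can also be written in closed form. The following is a minimal sketch on synthetic data; the function name `fit_line` and the demo arrays are hypothetical and are not part of the notebook or of scikit-learn.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept and slope for y ≈ intercept + slope * x."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Synthetic points that lie exactly on Y = 2 + 0.5X (illustrative only)
x_demo = np.array([0.0, 10.0, 20.0, 30.0])
y_demo = 2.0 + 0.5 * x_demo
intercept, slope = fit_line(x_demo, y_demo)
print(f"The linear model is: Y = {intercept:.3f} + {slope:.3f}X")
```

Applied to the notebook's `X` and `y`, `fit_line(X, y)` should agree with `reg.intercept_` and `reg.coef_` up to floating-point rounding.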

### Plot regression line

In [None]:
predictions = reg.predict(X)

plt.figure(figsize=(16, 8))
plt.scatter(
    data['TV'],
    data['sales'],
    c='black'
)
plt.plot(
    data['TV'],
    predictions,
    c='blue',
    linewidth=2
)
plt.xlabel("Money spent on TV ads ($)")
plt.ylabel("Sales ($)")
plt.show()

# Exercise

1) What is the error when the TV advertising spend is 286.0?

2) Find the total error between the regression line and the data points.

3) What is the linear regression model for sales if we instead consider radio? What is the error when the radio advertising spend is 13.9? Find the total error between the regression line and the data points.

4) What is the linear regression model for sales if we consider newspaper? What is the error when the newspaper advertising spend is 3.7? Find the total error between the regression line and the data points.

5) Discuss the results.
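
The questions above do not say which definition of "error" to use. The sketch below assumes the absolute difference between actual and predicted values for a single point, and shows two common totals: the sum of absolute errors and the sum of squared errors. The line and the data are synthetic placeholders, not values from the advertising dataset.

```python
import numpy as np

# Hypothetical fitted line and synthetic data (not the advertising data)
intercept_demo, slope_demo = 1.0, 2.0
x_demo = np.array([1.0, 2.0, 3.0])
y_demo = np.array([3.5, 4.0, 7.5])

predicted = intercept_demo + slope_demo * x_demo      # values on the line
abs_errors = np.abs(y_demo - predicted)               # error at each data point
total_abs_error = abs_errors.sum()                    # sum of absolute errors
total_sq_error = np.sum((y_demo - predicted) ** 2)    # sum of squared errors

print("Per-point errors:", abs_errors)
print("Total absolute error:", total_abs_error)
print("Total squared error:", total_sq_error)
```

Whichever total you choose, use the same one for TV, radio and newspaper so that the comparison in question 5 is meaningful.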

## Multivariate linear regression 

### Split data into inputs and labels

In [None]:
Xs = data.drop(['sales', 'Unnamed: 0'], axis=1)
y = data['sales'].values.reshape(-1,1)

### Perform linear regression

In [None]:
reg = LinearRegression()
reg.fit(Xs, y)
print(reg.coef_)
print(reg.intercept_)
print("The linear model is: Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper".format(reg.intercept_[0], reg.coef_[0][0], reg.coef_[0][1], reg.coef_[0][2]))
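
To make the structure of the multivariate model concrete, here is a minimal sketch that solves the same kind of least-squares problem with NumPy alone. It assumes synthetic, noise-free data with three input columns standing in for TV, radio and newspaper; all names ending in `_demo` and the coefficients are made up for illustration.

```python
import numpy as np

# Synthetic inputs: three columns standing in for TV, radio and newspaper
X_demo = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 0.0, 1.0],
    [3.0, 1.0, 4.0],
    [4.0, 3.0, 2.0],
    [5.0, 5.0, 3.0],
])
true_coefs = np.array([0.5, 2.0, -1.0])
y_demo = 3.0 + X_demo @ true_coefs   # noise-free, so the fit recovers the model exactly

# Prepend a column of ones so the first fitted weight is the intercept
design = np.column_stack([np.ones(len(X_demo)), X_demo])
weights, *_ = np.linalg.lstsq(design, y_demo, rcond=None)
intercept, coefs = weights[0], weights[1:]

print("Intercept:", intercept)
print("Coefficients:", coefs)
```

Each coefficient is the change in the predicted output for a one-unit increase in that input while the other inputs are held fixed.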

### Evaluate the model
#### Calculate the $R^2$ value

In [None]:
reg.score(Xs, y)
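
For a regressor, `score` returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes that quantity by hand on synthetic values; the helper name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r_squared(y_true_demo, y_true_demo)                        # predictions match exactly
baseline = r_squared(y_true_demo, np.full(4, y_true_demo.mean()))    # always predict the mean
print(perfect, baseline)
```

On the notebook's data, `r_squared(y, reg.predict(Xs))` should agree with `reg.score(Xs, y)`.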

### Calculate the error for each data point

In [None]:
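y = data['sales']

# Predicted sales from the fitted coefficients (equivalent to reg.predict(Xs))
a = reg.intercept_[0] + reg.coef_[0][0]*data['TV'] + reg.coef_[0][1]*data['radio'] + reg.coef_[0][2]*data['newspaper']

print("Actual\tPredicted\tError")
# Loop over every row; a fixed range(0, 199) would skip the last of 200 rows
for i in range(len(y)):
    error = abs(y.values[i] - a.values[i])
    print(f'{y.values[i]:.3f}\t{a.values[i]:.3f}\t{error:.3f}')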

# Exercise

1) What is the error when the advertising spend for TV, radio and newspaper is 286.0, 13.9 and 3.7, respectively?

2) Find the total error between the multivariate regression model and the data points. Compare the errors of the univariate and multivariate regression models.

3) Can you add one more advertising channel to the data? If so, add a fourth input variable whose values are generated randomly between 20 and 300, then apply the multivariate regression model again.

4) Discuss the results.
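
As a starting point for question 3, random values in a given range can be generated with NumPy's random generator. This is a minimal sketch on a synthetic feature matrix; the array names, the row count and the seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded so the sketch is reproducible

n_rows = 5                                               # stand-in for the number of rows in the data
features_demo = rng.uniform(0, 50, size=(n_rows, 3))     # placeholder for TV, radio, newspaper
extra_channel = rng.uniform(20, 300, size=n_rows)        # uniform random values in [20, 300)

# Append the random values as a fourth input column
features_with_extra = np.column_stack([features_demo, extra_channel])
print(features_with_extra.shape)
```

With a pandas DataFrame, an array generated this way (with `size=len(data)`) can be assigned as a new column before refitting the model.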