# Title

**Exercise: B.1 - Simple Multi-linear Regression**

# Description
The aim of this exercise is to understand how to use multi-linear regression. We will fit a model for each combination of predictors and observe how the test MSE changes as the predictors change.

# Instructions:
- Read the file Advertising.csv as a dataframe.
- Form one model for each combination of predictors. For example, if you have 2 predictors, A and B, you will end up with 3 models: one with only A, one with only B, and one with both A and B.
- Split the data into train and test sets.
- Compute the MSE of each model on the test set.
- Print each predictor combination together with its MSE value.
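
The predictor combinations described above can also be generated rather than typed by hand. The following is a minimal sketch, assuming the three predictor names that appear in the dataset preview (`TV`, `Radio`, `Newspaper`); the variable name `predictors` is introduced here for illustration only.

```python
from itertools import combinations

# Predictor column names (assumed from the dataset preview)
predictors = ["TV", "Radio", "Newspaper"]

# Every non-empty subset of the predictors, from single columns up to all three
cols = [list(combo)
        for size in range(1, len(predictors) + 1)
        for combo in combinations(predictors, size)]

for combo in cols:
    print(combo)
```

With n predictors this produces 2^n - 1 combinations, so 3 predictors give 7 models.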


# Hints:

<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html" target="_blank">pd.read_csv(filename)</a> : Returns a pandas dataframe containing the contents of the file

<a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html" target="_blank">sklearn.preprocessing.normalize()</a> : Scales input vectors individually to unit norm (vector length).

<a href="https://numpy.org/doc/stable/reference/generated/numpy.interp.html" target="_blank">np.interp()</a> : Returns one-dimensional linear interpolation

<a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html" target="_blank">sklearn.model_selection.train_test_split()</a> : Splits the data into random train and test subsets

<a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html" target="_blank">sklearn.linear_model.LinearRegression()</a> : LinearRegression fits a linear model

<a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit" target="_blank">LinearRegression.fit()</a> : Fits the linear model to the training data

<a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.predict" target="_blank">LinearRegression.predict()</a> : Predicts using the linear model
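
As a rough illustration of how the split, fit, and predict hints fit together, here is a minimal sketch on synthetic data. The data, coefficients, and variable names are hypothetical stand-ins and are not taken from the Advertising dataset. It also computes the MSE by hand, as the mean of the squared test residuals, to show what `mean_squared_error` returns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical, not the Advertising dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, size=200)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)

# Fit on the training rows only, then predict the held-out rows
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE is the mean of the squared residuals on the test set
mse_manual = np.mean((y_test - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_test, y_pred)
print(mse_manual, mse_sklearn)
```

The two printed values should agree up to floating-point rounding.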


Note: This exercise is **auto-graded and you can make multiple attempts.**

In [1]:
#import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from prettytable import PrettyTable

### Reading the dataset

In [2]:
#Read the file "Advertising.csv"
df = pd.read_csv("Advertising.csv")

In [3]:
#Take a quick look at the data to list all the predictors
df.head()

Unnamed: 0,TV,Radio,Newspaper,Sales
0,230.1,37.8,69.2,22.1
1,44.5,39.3,45.1,10.4
2,17.2,45.9,69.3,9.3
3,151.5,41.3,58.5,18.5
4,180.8,10.8,58.4,12.9


### Create different multi predictor models 

In [5]:
### edTest(test_mse) ###
#List to store the MSE values
mse_list = []

#List of all predictor combinations to fit the curve
cols = [['TV'],['Radio'],['Newspaper'],['TV','Radio'],['TV','Newspaper'],['Radio','Newspaper'],['TV','Radio','Newspaper']]

for i in cols:
    #Set each of the predictors from the previous list as x
    x = df[i]
    
    
    #"Sales" column is the response variable
    y = df["Sales"]
    
   
    #Splitting the data into train-test sets with 80% training data and 20% testing data. 
    #Set random_state as 42 so the split is reproducible
    xtrain, xtest, ytrain, ytest = train_test_split(x, y, train_size=0.8, random_state=42)

    #Create a LinearRegression object and fit the model
    lreg = LinearRegression()
    lreg.fit(xtrain, ytrain)
    
    #Predict the response variable for the test set
    y_pred = lreg.predict(xtest)
    
    #Compute the MSE
    MSE = mean_squared_error(ytest, y_pred)
    
    #Append the MSE to the list
    mse_list.append(MSE)


### Display the MSE with predictor combinations

In [6]:
t = PrettyTable(['Predictors', 'MSE'])

#Loop to display the predictor combinations along with the MSE value of the corresponding model
for i in range(len(mse_list)):
    t.add_row([cols[i],mse_list[i]])

print(t)

+------------------------------+--------------------+
|          Predictors          |        MSE         |
+------------------------------+--------------------+
|            ['TV']            | 10.204654118800956 |
|          ['Radio']           | 23.248766588129108 |
|        ['Newspaper']         | 30.620733995242563 |
|       ['TV', 'Radio']        | 3.137948009068354  |
|     ['TV', 'Newspaper']      | 11.062557300662816 |
|    ['Radio', 'Newspaper']    |  23.2046437454446  |
| ['TV', 'Radio', 'Newspaper'] | 3.174097353976104  |
+------------------------------+--------------------+


### Comment on the trend of MSE values with changing predictor(s) combinations. 
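
One way to examine the trend is to rank the models by test MSE. The sketch below assumes the values printed in the table above, copied by hand and rounded to three decimals; the names `results` and `ranked` are introduced here for illustration. The ranking depends on this particular train-test split and may change with a different `random_state`.

```python
# (predictors, MSE) pairs copied from the printed table, rounded to three decimals
results = [
    (["TV"], 10.205),
    (["Radio"], 23.249),
    (["Newspaper"], 30.621),
    (["TV", "Radio"], 3.138),
    (["TV", "Newspaper"], 11.063),
    (["Radio", "Newspaper"], 23.205),
    (["TV", "Radio", "Newspaper"], 3.174),
]

# Sort from lowest to highest test MSE
ranked = sorted(results, key=lambda pair: pair[1])

for predictors, mse in ranked:
    print(f"{', '.join(predictors):<24} {mse:.3f}")
```

Reading the sorted list makes it easier to check whether adding a predictor to a model actually lowers its test MSE.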