# Problem Statement

You have been provided with a dataset that contains the advertising spend on different media channels and the corresponding sales of the XYZ firm. Evaluate the dataset to:
* Find the features (media channels) used by the firm.
* Find the sales figures for each channel.
* Create a model to predict the sales outcome.
* Split the data into training and testing datasets for the model.
* Calculate the mean squared error (MSE).

In [5]:
# Import the necessary library

import pandas as pd

In [7]:
# Import the advertising dataset; index_col=0 uses the first CSV column (row numbers) as the index

df_adv_data = pd.read_csv("advertising.csv", index_col=0)
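`index_col=0` tells pandas to treat the first CSV column as the row index instead of as data. The sketch below assumes, as the `head()` output suggests, that `advertising.csv` begins with an unnamed row-number column; it uses a small in-memory CSV built from the first three rows of the dataset (not the real file) to show the difference.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for advertising.csv: an unnamed first
# column holding row numbers, followed by the four data columns.
csv_text = """,TV,Radio,Newspaper,Sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
"""

# Without index_col, the row numbers become an extra "Unnamed: 0" column.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))  # ['Unnamed: 0', 'TV', 'Radio', 'Newspaper', 'Sales']

# With index_col=0, the first column becomes the index and only data columns remain.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df.columns))   # ['TV', 'Radio', 'Newspaper', 'Sales']
```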

In [8]:
# View top 5 records

df_adv_data.head()

|   | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 5 | 180.8 | 10.8 | 58.4 | 12.9 |


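Each row records the advertising spend on the **TV**, **Radio**, and **Newspaper** channels together with the resulting **Sales** figure.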

In [9]:
# View the size of the dataset (total number of cells: rows × columns)

df_adv_data.size

800

In [10]:
# View the shape of the dataset

df_adv_data.shape

(200, 4)

In [11]:
# View the columns of the dataset

df_adv_data.columns

Index(['TV', 'Radio', 'Newspaper', 'Sales'], dtype='object')

* **TV**, **Radio**, **Newspaper**: features (advertising spend per channel)
* **Sales**: response (the value to predict)

In [14]:
# Create a feature object from the columns

X_feature = df_adv_data[["TV","Radio","Newspaper"]]

In [15]:
# View the feature object 

X_feature.head()

|   | TV | Radio | Newspaper |
|---|---|---|---|
| 1 | 230.1 | 37.8 | 69.2 |
| 2 | 44.5 | 39.3 | 45.1 |
| 3 | 17.2 | 45.9 | 69.3 |
| 4 | 151.5 | 41.3 | 58.5 |
| 5 | 180.8 | 10.8 | 58.4 |


In [18]:
# Create the target object from the Sales column, which is the response variable

Y_target = df_adv_data[["Sales"]]

In [20]:
# View the target object

Y_target.head()

|   | Sales |
|---|---|
| 1 | 22.1 |
| 2 | 10.4 |
| 3 | 9.3 |
| 4 | 18.5 |
| 5 | 12.9 |
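Selecting the target with double brackets, `df_adv_data[["Sales"]]`, returns a one-column DataFrame (2-D), whereas single brackets, `df_adv_data["Sales"]`, return a Series (1-D). The 2-D target is why `linReg.coef_` later prints as a nested `[[...]]` array and `y_pred` has one column. A small sketch using the five rows shown above (an illustrative subset, not the full dataset):

```python
import pandas as pd

# The five rows from the head() output above (illustrative subset only).
df = pd.DataFrame(
    {
        "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
        "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
        "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
        "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
    },
    index=[1, 2, 3, 4, 5],
)

y_frame = df[["Sales"]]   # double brackets -> DataFrame, 2-D
y_series = df["Sales"]    # single brackets -> Series, 1-D

print(type(y_frame).__name__, y_frame.shape)    # DataFrame (5, 1)
print(type(y_series).__name__, y_series.shape)  # Series (5,)
```

Either form works with `LinearRegression.fit`; with the Series, `coef_` becomes a flat array of three values and `intercept_` a single float.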


In [21]:
# Split the data into training and testing sets.
# By default, train_test_split uses 75% of the rows for training and 25% for testing;
# random_state=1 makes the shuffle reproducible.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X_feature, Y_target, random_state=1)
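Conceptually, `train_test_split` shuffles the row positions with a seeded random generator and cuts them at the 75/25 boundary. The stdlib sketch below, built around a hypothetical helper `simple_train_test_split`, mimics that idea on row positions; it does not reproduce scikit-learn's exact random stream, so the same seed selects different rows than `random_state=1` does.

```python
import math
import random


def simple_train_test_split(n_rows, test_size=0.25, seed=1):
    """Hypothetical helper: return (train_positions, test_positions).

    Mirrors the idea behind sklearn's train_test_split (shuffle, then cut),
    but not its exact random-number stream.
    """
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)   # reproducible shuffle
    n_test = math.ceil(test_size * n_rows)   # 25% of 200 rows -> 50 test rows
    return positions[n_test:], positions[:n_test]


train_idx, test_idx = simple_train_test_split(200)
print(len(train_idx), len(test_idx))  # 150 50
```

The 150/50 sizes match the shapes printed in the next cell.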

In [22]:
# View shape of train and test data sets for both feature and response

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(150, 3)
(150, 1)
(50, 3)
(50, 1)


In [25]:
# Linear regression model

from sklearn.linear_model import LinearRegression
linReg = LinearRegression()

# Fit the model on the training data; the fitted model can then predict sales for new data

linReg.fit(x_train, y_train)

LinearRegression()
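`LinearRegression` fits an ordinary least-squares model: it chooses the intercept and one coefficient per feature that minimise the sum of squared residuals on the training data. A minimal NumPy sketch of the same idea, on synthetic data rather than the advertising dataset, solves the least-squares problem directly with `np.linalg.lstsq` after prepending a column of ones for the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption for illustration): 3 features with known true coefficients.
X = rng.uniform(0, 100, size=(150, 3))
true_intercept = 3.0
true_coef = np.array([0.05, 0.2, 0.0])
y = true_intercept + X @ true_coef + rng.normal(0, 0.5, size=150)

# Prepend a column of ones so the first fitted parameter is the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: minimises ||A @ params - y||^2.
params, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = params[0], params[1:]

print(intercept, coef)  # should land close to 3.0 and [0.05, 0.2, 0.0]
```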

In [26]:
# Print the intercept and the coefficients (in column order: TV, Radio, Newspaper)

print(linReg.intercept_)
print(linReg.coef_)

[2.87696662]
[[0.04656457 0.17915812 0.00345046]]
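The printed intercept and coefficients define the fitted equation Sales ≈ 2.877 + 0.0466 × TV + 0.1792 × Radio + 0.0035 × Newspaper. Holding the other channels fixed, one extra unit of TV spend is associated with about 0.047 more units of sales, Radio with about 0.179, and Newspaper with almost none. The sketch below plugs the first row of the dataset into that equation by hand, which is what `linReg.predict` does for every row; the helper `predict_sales` is hypothetical, and because the coefficients are copied at printed precision the result can differ from `predict` in the last decimals.

```python
# Intercept and coefficients copied from the linReg output above.
INTERCEPT = 2.87696662
COEF = {"TV": 0.04656457, "Radio": 0.17915812, "Newspaper": 0.00345046}


def predict_sales(tv, radio, newspaper):
    """Hypothetical helper: evaluate the fitted linear equation for one row."""
    return (INTERCEPT
            + COEF["TV"] * tv
            + COEF["Radio"] * radio
            + COEF["Newspaper"] * newspaper)


# First row of the dataset: TV=230.1, Radio=37.8, Newspaper=69.2 (actual Sales 22.1).
print(round(predict_sales(230.1, 37.8, 69.2), 2))  # 20.6
```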


In [27]:
# Prediction 

y_pred = linReg.predict(x_test)
y_pred

array([[21.70910292],
       [16.41055243],
       [ 7.60955058],
       [17.80769552],
       [18.6146359 ],
       [23.83573998],
       [16.32488681],
       [13.43225536],
       [ 9.17173403],
       [17.333853  ],
       [14.44479482],
       [ 9.83511973],
       [17.18797614],
       [16.73086831],
       [15.05529391],
       [15.61434433],
       [12.42541574],
       [17.17716376],
       [11.08827566],
       [18.00537501],
       [ 9.28438889],
       [12.98458458],
       [ 8.79950614],
       [10.42382499],
       [11.3846456 ],
       [14.98082512],
       [ 9.78853268],
       [19.39643187],
       [18.18099936],
       [17.12807566],
       [21.54670213],
       [14.69809481],
       [16.24641438],
       [12.32114579],
       [19.92422501],
       [15.32498602],
       [13.88726522],
       [10.03162255],
       [20.93105915],
       [ 7.44936831],
       [ 3.64695761],
       [ 7.22020178],
       [ 5.9962782 ],
       [18.43381853],
       [ 8.39408045],
       [14...   (output truncated; 50 predictions in total)

In [28]:
# Import the libraries required to calculate the mean squared error (MSE)

from sklearn import metrics
import numpy as np

In [33]:
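# Calculate the root mean squared error (RMSE), the square root of the MSE.
# RMSE is in the same units as Sales: lower values mean the predictions are closer
# to the actual test values. The MSE itself is the square of this result.

np.sqrt(metrics.mean_squared_error(y_test, y_pred))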

1.4046514230328953
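MSE is the average of the squared prediction errors, and RMSE is its square root, which puts the error back in the units of Sales. Squaring the result above gives the MSE asked for in the problem statement: 1.4047² ≈ 1.973. A plain-Python sketch with made-up numbers (not the notebook's test set) makes the relationship explicit; `compute_mse` and `compute_rmse` are hypothetical helpers, not scikit-learn functions.

```python
import math


def compute_mse(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def compute_rmse(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(compute_mse(y_true, y_pred))


# Toy values for illustration only.
actual = [3.0, 5.0, 2.0]
predicted = [2.5, 5.0, 4.0]

mse = compute_mse(actual, predicted)      # (0.25 + 0 + 4) / 3
rmse = compute_rmse(actual, predicted)
print(round(mse, 4), round(rmse, 4))      # 1.4167 1.1902
```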