In [1]:
import time
import random
from math import *
import operator
import pandas as pd
import numpy as np

# import plotting libraries
import matplotlib
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
%matplotlib inline 

import seaborn as sns
sns.set(style="white", color_codes=True)
sns.set(font_scale=1.5)

# import the ML algorithm
from sklearn.linear_model import LinearRegression
#from pandas.core import datetools

from statsmodels.tools.eval_measures import rmse

# import libraries for model validation
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split

# import libraries for metrics and reporting
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn import metrics

In [2]:
location = r"E:\MYLEARN\2-ANALYTICS-DataScience\datasets\Advertising.csv"

In [3]:
# load the training data from the Advertising data set
df_training = pd.read_csv(location)

In [4]:
df_training.head()

   Unnamed: 0     TV  radio  newspaper  sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9


### Making predictions
Our prediction function outputs an estimate of sales given:
- a company’s radio advertising spend
- our current values for Weight and Bias

Sales =  Bias + Weight * Radio 

Weight
the coefficient for the Radio independent variable. 
In machine learning we call coefficients weights.

Radio
the independent variable. In machine learning we call these variables features.

Bias
the intercept, where our line crosses the y-axis. 
In machine learning we call the intercept the bias. 
Bias offsets all predictions that we make.

### Cost function

Let’s use MSE (L2) as our cost function. 
MSE measures the average squared difference between an observation’s actual 
and predicted values.

The output is a single number representing the cost, or score, associated with our 
current set of weights. Our goal is to minimize MSE to improve the accuracy of our model.

### Math

#### Given our simple linear equation $y = mx + b$, we can calculate MSE as:

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - (mx_i + b))^2$$

- $N$ is the total number of observations (data points)
- $\frac{1}{N} \sum_{i=1}^{N}$ is the mean
- $y_i$ is the actual value of an observation and $mx_i + b$ is our prediction
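
As a minimal sketch of the formula above, MSE can be computed with a single vectorized NumPy expression. The function name `mse` and the toy arrays are assumptions made for illustration; they are not part of the Advertising data.

```python
import numpy as np

def mse(x, y, m, b):
    """Mean squared error of the line y = m*x + b on the points (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (m * x + b)      # y_i - (m*x_i + b) for every observation
    return np.mean(residuals ** 2)   # (1/N) * sum of the squared residuals

# Hypothetical toy data lying exactly on the line y = 2x
x_toy = [1.0, 2.0, 3.0]
y_toy = [2.0, 4.0, 6.0]

perfect_fit = mse(x_toy, y_toy, m=2.0, b=0.0)   # all residuals are zero
poor_fit = mse(x_toy, y_toy, m=1.0, b=0.0)      # residuals are 1, 2, 3
print(perfect_fit, poor_fit)
```

The worse the line fits the points, the larger the returned value, which is what makes MSE usable as a cost.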

### Gradient descent
#### To minimize MSE we use gradient descent, which repeatedly computes the gradient of our cost function and steps in the opposite direction.

Math

There are two parameters (coefficients) in our cost function we can control: weight m and bias b. 

Since we need to consider the impact each one has on the final prediction, we use partial derivatives. 

To find the partial derivatives, we use the Chain rule. 

We need the chain rule because $(y - (mx + b))^2$ is really 2 nested functions: 
     the inner function $u = y - (mx + b)$ and 
     the outer function $u^2$.
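
As a worked step, write the inner function for observation $i$ as $u_i = y_i - (mx_i + b)$ (the symbol $u_i$ is introduced here only for this derivation). The chain rule then gives the partial derivatives of a single squared error term:

$$\frac{\partial}{\partial m} u_i^2 = 2u_i \cdot \frac{\partial u_i}{\partial m} = 2u_i \cdot (-x_i) = -2x_i(y_i - (mx_i + b))$$

$$\frac{\partial}{\partial b} u_i^2 = 2u_i \cdot \frac{\partial u_i}{\partial b} = 2u_i \cdot (-1) = -2(y_i - (mx_i + b))$$

Averaging each expression over all $N$ observations gives the two components of the gradient of the MSE.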
    

Cost function:

$$f(m, b) = \frac{1}{N} \sum_{i=1}^{N} (y_i - (mx_i + b))^2$$

We can calculate the gradient of this cost function as:

$$f'(m, b) = \begin{bmatrix} \frac{\partial f}{\partial m} \\ \frac{\partial f}{\partial b} \end{bmatrix} = \begin{bmatrix} \frac{1}{N} \sum_{i=1}^{N} -2x_i(y_i - (mx_i + b)) \\ \frac{1}{N} \sum_{i=1}^{N} -2(y_i - (mx_i + b)) \end{bmatrix}$$
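
One way to gain confidence in these two formulas is to compare them against a finite-difference approximation of the cost. The sketch below does that on a small toy data set; the function names, the step size `eps`, and the data are assumptions chosen for illustration.

```python
import numpy as np

def cost(x, y, m, b):
    """MSE of the line y = m*x + b."""
    return np.mean((y - (m * x + b)) ** 2)

def analytic_gradient(x, y, m, b):
    """Gradient from the closed-form partial derivatives."""
    residuals = y - (m * x + b)
    df_dm = np.mean(-2 * x * residuals)
    df_db = np.mean(-2 * residuals)
    return df_dm, df_db

def numeric_gradient(x, y, m, b, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    df_dm = (cost(x, y, m + eps, b) - cost(x, y, m - eps, b)) / (2 * eps)
    df_db = (cost(x, y, m, b + eps) - cost(x, y, m, b - eps)) / (2 * eps)
    return df_dm, df_db

# Hypothetical toy data generated from y = 2x + 1
x_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

analytic = analytic_gradient(x_toy, y_toy, m=0.5, b=0.0)
numeric = numeric_gradient(x_toy, y_toy, m=0.5, b=0.0)
print(analytic, numeric)
```

Both components are negative at this point, so moving against the gradient increases the weight and the bias toward the values that generated the data.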


### Code

To solve for the gradient, we iterate through our data points using our current weight and bias values and take the average of the partial derivatives. 

The resulting gradient tells us the slope of our cost function at our current position (i.e. weight and bias) and the direction we should update to reduce our cost function (we move in the direction opposite the gradient). 

The size of our update is controlled by the <font color=blue> learning rate.</font>
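
To illustrate why the learning rate matters, the following sketch runs the same number of gradient steps with a small and a large learning rate on toy data. The helper names, the two rates, and the data are assumptions picked for this example; suitable rates depend on the scale of the features.

```python
import numpy as np

def cost(x, y, m, b):
    """MSE of the line y = m*x + b."""
    return np.mean((y - (m * x + b)) ** 2)

def gradient_step(x, y, m, b, learning_rate):
    """One gradient descent update of the weight m and the bias b."""
    residuals = y - (m * x + b)
    m -= learning_rate * np.mean(-2 * x * residuals)
    b -= learning_rate * np.mean(-2 * residuals)
    return m, b

def run(x, y, learning_rate, steps=20):
    """Start from m = b = 0, take `steps` updates, return the final cost."""
    m, b = 0.0, 0.0
    for _ in range(steps):
        m, b = gradient_step(x, y, m, b, learning_rate)
    return cost(x, y, m, b)

# Hypothetical toy data generated from y = 2x + 1
x_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

initial_cost = cost(x_toy, y_toy, 0.0, 0.0)
cost_small_lr = run(x_toy, y_toy, learning_rate=0.01)  # cost shrinks
cost_large_lr = run(x_toy, y_toy, learning_rate=0.5)   # updates overshoot and the cost grows
print(initial_cost, cost_small_lr, cost_large_lr)
```

On this data the small rate lowers the cost, while the large rate overshoots the minimum on every step and the cost grows instead.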


In [5]:
def predict_sales(radio, weight, bias):
    return weight*radio + bias

In [6]:
def cost_function(radio, sales, weight, bias):
    companies = len(radio)
    
    total_error = 0.0
    for i in range(companies):
        total_error += (sales[i] - (weight * radio[i] + bias)) ** 2
        
    return total_error / companies

In [7]:
def update_weights(radio, sales, weight, bias, learning_rate):
    weight_deriv = 0
    bias_deriv   = 0
    companies    = len(radio)

    for i in range(companies):
        
        # Calculate partial derivatives
        # -2x(y - (mx + b))
        weight_deriv += -2 * radio[i] * (sales[i] - (weight*radio[i] + bias))

        # -2(y - (mx + b))
        bias_deriv += -2 * (sales[i] - (weight*radio[i] + bias))

    # We subtract because the derivatives point in direction of steepest ascent
    weight -= (weight_deriv / companies) * learning_rate
    bias   -= (bias_deriv   / companies) * learning_rate

    return weight, bias
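
The same update can be written without an explicit Python loop. The sketch below is a vectorized variant under the assumption that the inputs are NumPy-compatible arrays; the name `update_weights_vectorized` is hypothetical, and `ravel()` is used so that a 2-D column such as `X.values` is handled as well.

```python
import numpy as np

def update_weights_vectorized(radio, sales, weight, bias, learning_rate):
    """One gradient descent step computed with array operations."""
    radio = np.asarray(radio, dtype=float).ravel()
    sales = np.asarray(sales, dtype=float).ravel()

    residuals = sales - (weight * radio + bias)

    # Averages of -2x(y - (mx + b)) and -2(y - (mx + b))
    weight_deriv = np.mean(-2 * radio * residuals)
    bias_deriv = np.mean(-2 * residuals)

    # Step against the gradient, scaled by the learning rate
    new_weight = weight - learning_rate * weight_deriv
    new_bias = bias - learning_rate * bias_deriv
    return new_weight, new_bias

# Hypothetical toy data, one step from weight = bias = 0
radio_toy = [1.0, 2.0, 3.0, 4.0]
sales_toy = [3.0, 5.0, 7.0, 9.0]
new_weight, new_bias = update_weights_vectorized(radio_toy, sales_toy, 0.0, 0.0, 0.01)
print(new_weight, new_bias)
```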

#### Training


Training a model is the process of iteratively improving your prediction equation by looping through the dataset multiple times, each time updating the weight and bias values in the direction indicated by the slope of the cost function (gradient). 

Training is complete when we reach an acceptable error threshold, or when subsequent training iterations fail to reduce our cost.

Before training we need to initialize our weights (set default values), set our hyperparameters (learning rate and number of iterations), and prepare to log our progress over each iteration.
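
The stopping rule described above can be sketched as a loop that ends once an iteration improves the cost by less than a tolerance. This is an illustrative variant, not the notebook's `train` function: the name `train_with_tolerance`, the zero initialization, and the default `tol` are assumptions.

```python
import numpy as np

def train_with_tolerance(x, y, learning_rate=0.05, max_iters=10000, tol=1e-9):
    """Gradient descent that stops when the cost stops improving."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()

    weight, bias = 0.0, 0.0                       # default starting values
    prev_cost = np.mean((y - (weight * x + bias)) ** 2)
    cost_history = [prev_cost]                    # progress log

    for _ in range(max_iters):
        residuals = y - (weight * x + bias)
        weight -= learning_rate * np.mean(-2 * x * residuals)
        bias -= learning_rate * np.mean(-2 * residuals)

        cost = np.mean((y - (weight * x + bias)) ** 2)
        cost_history.append(cost)

        # Stop when this iteration failed to reduce the cost by at least tol
        if prev_cost - cost < tol:
            break
        prev_cost = cost

    return weight, bias, cost_history

# Hypothetical toy data generated from y = 2x + 1
x_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

fit_weight, fit_bias, history = train_with_tolerance(x_toy, y_toy)
print(fit_weight, fit_bias, len(history) - 1)
```

On this toy data the loop stops well before `max_iters`, with the weight and bias close to the values 2 and 1 that generated the data.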

In [8]:
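def train(radio, sales, weight, bias, learning_rate, iters, prev_cost):
    # prev_cost is only used by the disabled early-stopping check below,
    # so every call currently runs all `iters` iterations.
    cost_history = []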

    for i in range(iters):
        weight, bias = update_weights(radio, sales, weight, bias, learning_rate)
        
        # Calculate cost for auditing purposes
        cost = cost_function(radio, sales, weight, bias)
        
#         if cost < prev_cost:
#             prev_cost = cost
#         else:
#             break;
                
        cost_history.append(cost)
        prev_cost = cost    

        # Log Progress
        # if i % 10 == 0:
        print ("iter: " + str(i) + \
               " cost: ", cost, \
               'Bias :', bias, \
               'Weight :', weight)

    return weight, bias, cost_history

In [9]:
# create a Python list of feature names
feature_cols = ['radio']

# use the list to select a subset of the original DataFrame
X = df_training[feature_cols]

# select a Series from the DataFrame
y = df_training['sales']

In [10]:
# first try the linear regression
linreg = LinearRegression()

linreg.fit(X, y)

intercept, coeff = linreg.intercept_, linreg.coef_


In [11]:
print('Intercept : {}, coeff : {}'.format(intercept, coeff ))

Intercept : 9.311638095158283, coeff : [0.20249578]


In [12]:
# cost
# cost_function(radio, sales, weight, bias)
orig_cost = cost_function (X.values, y.values, coeff, intercept)
orig_cost

array([18.09239775])

In [13]:
# run gradient descent, starting from the LinearRegression solution
# Hyperparameters
learning_rate    = 0.001
max_iteration    = 1000

weight           = coeff
bias             = intercept

print('Initial Intercept = {}, Coeff = {}\n'.format(bias, weight))

weight, bias, cost_history = train(X.values, 
                                   y.values, 
                                   coeff, 
                                   intercept, 
                                   learning_rate, 
                                   max_iteration,
                                   orig_cost+2)

print('\nFinal   Intercept = {}, Coeff = {}'.format(bias, weight))

print('\nCost history : \n', cost_history)

Initial Intercept = 9.311638095158283, Coeff = [0.20249578]

iter: 0 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 1 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 2 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 3 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 4 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 5 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 6 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 7 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 8 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 9 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 10 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 11 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
iter: 12 cost:  [18.09239775] Bias : [9.3116381] Weight : [0.20249578]
...
[output truncated: all remaining logged iterations report the same cost [18.09239775], Bias [9.3116381], and Weight [0.20249578]]

#### Model evaluation
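If our model is working and the learning rate is small enough, we should see our cost decrease after every iteration.

In the run above the cost stays at 18.09239775 on every iteration. Training started from the intercept and coefficient that LinearRegression had already fitted, and that least-squares solution minimizes MSE, so the gradient there is effectively zero and the updates change nothing. To watch the cost fall, start from different values, such as a weight and bias of 0.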
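
As a hedged illustration of how gradient descent behaves when it starts at a least-squares fit, the sketch below fits a line in closed form with `np.polyfit` and evaluates the MSE gradient at that fit. The synthetic data and its generating parameters are assumptions for this example, not the Advertising data.

```python
import numpy as np

# Synthetic, hypothetical data: a noisy line
rng = np.random.default_rng(0)
x_toy = rng.uniform(0, 50, size=200)
y_toy = 9.0 + 0.2 * x_toy + rng.normal(0, 4, size=200)

# Closed-form least-squares fit; for deg=1, polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x_toy, y_toy, deg=1)

# MSE gradient evaluated at the least-squares solution
residuals = y_toy - (slope * x_toy + intercept)
grad_weight = np.mean(-2 * x_toy * residuals)
grad_bias = np.mean(-2 * residuals)
print(grad_weight, grad_bias)
```

Both components are zero up to floating-point error, so an update from this point leaves the weight, the bias, and the cost unchanged no matter how many iterations run.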



#### Summary
With the fitted values for weight (about 0.2025) and bias (about 9.3116), we now have an equation that predicts sales from radio advertising spend.

Sales = 0.2025 * Radio + 9.3116
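
As a short usage sketch, the intercept and coefficient printed by LinearRegression earlier in this notebook can be plugged into the prediction function. The constant names and the default arguments are assumptions added for this example.

```python
# Values printed by LinearRegression earlier in this notebook
WEIGHT = 0.20249578
BIAS = 9.311638095158283

def predict_sales(radio, weight=WEIGHT, bias=BIAS):
    """Estimate sales from radio advertising spend."""
    return weight * radio + bias

# Radio spend of the first row shown by df_training.head()
predicted = predict_sales(37.8)
print(round(predicted, 2))
```

The first row's actual sales value is 22.1. A model with radio spend as its only feature does not account for TV or newspaper spend, so gaps of this size are to be expected.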