# Exercise: Linear Regression

You are given a dataset `Advertising.csv` that contains statistics about the Sales of a product in 200 different markets, together with advertising budgets in each of these markets for different media channels, namely, TV, Radio, and Newspaper. <br>
<br><br>
**The dataset (200x4) comprises four columns as follows:**
- TV: The advertising budget for television in thousands of dollars.
- Radio: The advertising budget for radio in thousands of dollars.
- Newspaper: The advertising budget for newspapers in thousands of dollars.
- Sales: The number of product units sold, represented in thousands.

<br><br>
**Now, you are required to import the sklearn library and build a linear regression model.**
- TV, Radio, and Newspaper are the <u>input features</u>.
- Sales is the <u>output</u>.

<br>
At a minimum, you should complete the following steps and answer the accompanying questions.

## Import Library & Read the Data

In [18]:
import pandas as pd
filename = "Advertising.csv"
df = pd.read_csv(filename)
df

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15.0,5612.0,1283.0,1015.0,472.0,1.4936,66900.0
1,-114.47,34.40,19.0,7650.0,1901.0,1129.0,463.0,1.8200,80100.0
2,-114.56,33.69,17.0,720.0,174.0,333.0,117.0,1.6509,85700.0
3,-114.57,33.64,14.0,1501.0,337.0,515.0,226.0,3.1917,73400.0
4,-114.57,33.57,20.0,1454.0,326.0,624.0,262.0,1.9250,65500.0
...,...,...,...,...,...,...,...,...,...
16995,-124.26,40.58,52.0,2217.0,394.0,907.0,369.0,2.3571,111400.0
16996,-124.27,40.69,36.0,2349.0,528.0,1194.0,465.0,2.5179,79000.0
16997,-124.30,41.84,17.0,2677.0,531.0,1244.0,456.0,3.0313,103600.0
16998,-124.30,41.80,19.0,2672.0,552.0,1298.0,478.0,1.9797,85800.0


## Data Exploration/Visualization

In [7]:
import numpy as np
# extract each input feature (X) column as a Python list
dataTV = np.array(df["TV"]).tolist()
dataRD = np.array(df["Radio"]).tolist()
dataNP = np.array(df["Newspaper"]).tolist()

# extract the output (Y) column, shared by all three single-feature models
dataSales = np.array(df["Sales"]).tolist()

## Define helper functions

In [8]:
def MSE(x,y,b,w1):
    """
    Find the Mean Squared Error between true outputs and predicted outputs
    Inputs: x - list, the input feature values
            y - list, the true output values
            b - float/int, bias
            w1 - float/int, weight

    Outputs: MSE - float
    """
    m = len(x)
    total_loss = 0
    for i in range(m):
        total_loss = total_loss + (y[i] - (b + w1 * x[i])) ** 2
    return total_loss/m #MSE

def update_bias_weights(x, y, b, w1, alpha):
    """
    Update the bias and weight with one step of gradient descent
    Inputs: x - list, the input feature values
            y - list, the true output values
            b - float/int, bias
            w1 - float/int, weight
            alpha - float, the learning rate used in gradient descent
    Outputs:
            (b, w1) - tuple, the updated bias and weight
    """
    m = len(x)
    db = 0
    dw1 = 0
    for i in range(m):
        db  = db  + 2 * (y[i] - (b + w1 * x[i])) * (-1)
        dw1 = dw1 + 2 * (y[i] - (b + w1 * x[i])) * (-x[i])

    db = db / m
    dw1 = dw1 / m

    # subtract because the gradient points in the direction of steepest ascent
    b = b - db * alpha
    w1 = w1 - dw1 * alpha

    return (b, w1)
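The derivative formulas used in the update step can be sanity-checked against a finite-difference approximation of the MSE. The following is a minimal sketch in plain Python; the function names (`mse`, `analytic_gradient`, `numeric_gradient`) and the toy data are illustrative assumptions and are not part of the notebook.

```python
def mse(x, y, b, w1):
    """Mean squared error of the line y_hat = b + w1 * x."""
    return sum((yi - (b + w1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)

def analytic_gradient(x, y, b, w1):
    """Partial derivatives of the MSE with respect to b and w1."""
    m = len(x)
    db = sum(-2 * (yi - (b + w1 * xi)) for xi, yi in zip(x, y)) / m
    dw1 = sum(-2 * xi * (yi - (b + w1 * xi)) for xi, yi in zip(x, y)) / m
    return db, dw1

def numeric_gradient(x, y, b, w1, eps=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    db = (mse(x, y, b + eps, w1) - mse(x, y, b - eps, w1)) / (2 * eps)
    dw1 = (mse(x, y, b, w1 + eps) - mse(x, y, b, w1 - eps)) / (2 * eps)
    return db, dw1

# Toy data, made up for illustration only
x_toy = [1.0, 2.0, 3.0, 4.0]
y_toy = [2.0, 4.1, 5.9, 8.2]

a_db, a_dw1 = analytic_gradient(x_toy, y_toy, b=0.5, w1=1.5)
n_db, n_dw1 = numeric_gradient(x_toy, y_toy, b=0.5, w1=1.5)
print("db :", a_db, n_db)
print("dw1:", a_dw1, n_dw1)
```

If the two columns agree to several decimal places, the hand-derived gradient is consistent with the loss being minimized.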

## Training function (combines the two functions above)

In [9]:
def train(x, y, b, w1, alpha, iterations):
    """
    Train the linear regression model for the specified number of iterations
    Inputs: x - list, the input feature values
            y - list, the true output values
            b - float/int, initial bias
            w1 - float/int, initial weight
            alpha - float, the learning rate used in gradient descent
            iterations - int, the number of gradient descent updates
    Outputs:
            (lHistory, b, w1) - tuple, the per-iteration MSE history and the final parameters
    """
    lHistory = []

    for i in range(iterations):
        b, w1 = update_bias_weights(x,y,b,w1,alpha)

        # find MSE after each iteration of updating the bias and weights
        loss = MSE(x,y,b,w1)
        lHistory.append(loss)

        if i < 5 or i >= iterations - 5:
            print("iter={:d} \t b={:.5f} \t W1={:.5f} \t MSE={}".format(i+1, b, w1, loss))
    return (lHistory, b, w1)  

## Simple Linear Regression
You need to build three different simple linear regression models, each with one of the variables (TV, Radio, Newspaper) as the feature (X).

Save the performance metrics (MSE, RMSE, R²) for each of the simple regression models and compare them later with a multi-variable linear regression model that uses all three variables as independent variables.
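The three metrics can be computed directly from a fitted bias and weight. The sketch below uses plain Python; the function name `regression_metrics` and the toy numbers are illustrative assumptions, not part of the exercise.

```python
import math

def regression_metrics(x, y, b, w1):
    """Return (MSE, RMSE, R^2) for the line y_hat = b + w1 * x."""
    m = len(x)
    y_hat = [b + w1 * xi for xi in x]
    # Residual sum of squares: error left over after the fit
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    # Total sum of squares: error of always predicting the mean of y
    y_mean = sum(y) / m
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    mse = ss_res / m
    rmse = math.sqrt(mse)
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2

# Toy data (made up for illustration), generated exactly by y = 1 + 2x
x_toy = [0.0, 1.0, 2.0, 3.0]
y_toy = [1.0, 3.0, 5.0, 7.0]
mse, rmse, r2 = regression_metrics(x_toy, y_toy, b=1.0, w1=2.0)
print(mse, rmse, r2)
```

On this toy data the line fits perfectly, so MSE and RMSE are 0 and R² is 1. For the exercise, the same function could be called once per feature list and the resulting tuples stored for the later comparison.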

In [17]:
initial_b = 0
initial_w1 = 0

alpha = 0.01
ite = 100000


lHist, b , w1 = train(dataTV, dataSales, initial_b, initial_w1, alpha, ite)

iter=1 	 b=0.28045 	 W1=48.21083 	 MSE=67072797.712930396
iter=2 	 b=-141.22554 	 W1=-27823.86783 	 MSE=22417771491462.227
iter=3 	 b=81687.70113 	 W1=16085770.93375 	 MSE=7.492704456171997e+18
iter=4 	 b=-47225785.22295 	 W1=-9299614799.30640 	 MSE=2.504290851968841e+24
iter=5 	 b=27302490913.30220 	 W1=5376356297617.32129 	 MSE=8.370105491200568e+29


OverflowError: (34, 'Result too large')
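In the log above, the MSE grows by a factor of roughly 3×10^5 on every iteration until the numbers overflow, which means gradient descent is diverging rather than converging. A likely cause is that the unscaled feature values make each step far too large for `alpha = 0.01`. Two common remedies are to use a much smaller learning rate or to standardize the feature before training. The sketch below illustrates the second option on synthetic data; the helper names (`standardize`, `gradient_descent`) and the generated values are assumptions for illustration, not the real Advertising data.

```python
import random

def standardize(values):
    """Rescale a list to zero mean and unit (population) standard deviation."""
    m = len(values)
    mean = sum(values) / m
    std = (sum((v - mean) ** 2 for v in values) / m) ** 0.5
    return [(v - mean) / std for v in values], mean, std

def gradient_descent(x, y, alpha, iterations):
    """Batch gradient descent on the MSE for y_hat = b + w1 * x, starting from zeros."""
    b, w1, m = 0.0, 0.0, len(x)
    for _ in range(iterations):
        errors = [(b + w1 * xi) - yi for xi, yi in zip(x, y)]
        db = 2 * sum(errors) / m
        dw1 = 2 * sum(e * xi for e, xi in zip(errors, x)) / m
        b, w1 = b - alpha * db, w1 - alpha * dw1
    return b, w1

# Synthetic stand-in for a budget/sales pair (assumed values, not the real data)
random.seed(0)
x_raw = [random.uniform(0, 300) for _ in range(200)]
y = [7.0 + 0.05 * xi + random.gauss(0, 1) for xi in x_raw]

x_scaled, x_mean, x_std = standardize(x_raw)
b_s, w_s = gradient_descent(x_scaled, y, alpha=0.01, iterations=2000)

# Convert the parameters back to the original (unscaled) units of x
w1 = w_s / x_std
b = b_s - w1 * x_mean
print("b={:.3f}  w1={:.4f}".format(b, w1))
```

With a standardized feature the same learning rate of 0.01 converges, and the recovered parameters land close to the values used to generate the synthetic data (7.0 and 0.05). In the notebook, passing a standardized copy of `dataTV` to `train` should have the same stabilizing effect.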

## Multi-variable Linear Regression

In [2]:
# Normalization/Scaling

In [3]:
# Model building
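One possible shape for the scaling and model-building steps, using sklearn as the exercise requires, is sketched below. It runs on a synthetic DataFrame that only borrows the column names from the exercise (the values and the coefficients used to generate `Sales` are made up), and the 80/20 train/test split is an assumption rather than something the exercise prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the exercise's column names (values are made up)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})
demo["Sales"] = 3 + 0.045 * demo["TV"] + 0.19 * demo["Radio"] + rng.normal(0, 1, 200)

X = demo[["TV", "Radio", "Newspaper"]]
y = demo["Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Chaining the steps means the scaler is fit on the training rows only
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# Evaluate on the held-out rows with the same three metrics as before
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print("MSE={:.3f}  RMSE={:.3f}  R2={:.3f}".format(mse, rmse, r2))
```

To apply this to the real data, `demo` would be replaced by the DataFrame loaded from `Advertising.csv`. Because the pipeline stores the scaler, calling `model.predict` on a new DataFrame with the same three columns applies the identical scaling before predicting.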

## Model Performance
Based on the metric values, which linear regression model appears to be the best fit among all the models?

## Prediction on Unseen Data
Use your best model from above to make a prediction.

In [4]:
# Predict for new, previously unseen data

new_data = pd.DataFrame({'TV': [200], 'Radio': [30], 'Newspaper': [50]})

**Is it a good idea to include as many features (X) as possible in a linear regression model for better performance? Please share your thoughts.**