# Simple Linear Regression
In this notebook, we'll use the advertising dataset from chapter 2 of ISLR to run a simple linear regression analysis of the relationship between unit sales and television advertising budget. Simple linear regression uses a single independent variable (also called a feature or predictor) and a single dependent variable (the response) to build a linear model that predicts future, unseen responses. Linear regression can also be used for inference, i.e. understanding the relationship between the predictor(s) and the response.
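
In symbols, with $X$ the TV budget and $Y$ unit sales, the model described above can be written as follows. The notation is the conventional one and is an assumption here, since the notebook does not name these quantities.

$$Y = \beta_0 + \beta_1 X + \epsilon$$

Here $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon$ is a random error term. Fitting the model produces estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, and the prediction for a budget $x$ is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$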

In this analysis, we will use only the TV advertising budget to predict unit sales.

## Read the dataset and visualize it
The first thing we want to do is read in the dataset and learn something about it. There are many ways to explore a dataset, but keep in mind that high-dimensional datasets (i.e. those that have more than two predictors) are harder to visualize. In this case, we have a single predictor, so a scatter plot of the predictor against the response is enough.

Here, we'll print the first few rows of our dataset, and then we'll plot TV budget vs. unit sales.

In [None]:
from sklearn import linear_model
import pandas as pd

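# Load the advertising data (assumes Advertising.csv is in the working directory)
df = pd.read_csv("Advertising.csv")

# Show the first five rows
df.head()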

Next, let's plot unit sales vs. TV budget.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(df.TV, df.Sales, color='black')
plt.xlabel('TV budget')
plt.ylabel('Unit sales')
plt.show()

## Fitting a model

So, given the plot above, does a linear model seem reasonable for describing the relationship between TV budget and unit sales? Well, it does, but do you notice anything unusual about the plot?

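The data above (Sales ~ TV) exhibit a pattern called **heteroscedasticity**: the variance of the response is not constant across the range of the predictor. Here, the spread in sales grows as the TV budget increases. Generally speaking, and depending on what you're trying to do, you should be cautious about using linear regression in the presence of heteroscedasticity. The fitted line can still be used for prediction, but the usual standard errors and confidence intervals become unreliable. For our purposes, we don't care, as we're just learning.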
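
One rough way to see this numerically is to fit a line and compare the spread of the residuals at low and high values of the predictor. The sketch below is only an illustration: it uses synthetic data as a stand-in for the advertising dataset, and the names `tv`, `sales`, `low`, and `high` are made up for this example.

```python
import numpy as np

# Synthetic stand-in for the advertising data (an assumption, not the real CSV):
# the noise standard deviation grows with the predictor.
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, size=500)
sales = 7.0 + 0.05 * tv + rng.normal(0, 0.5 + 0.01 * tv)

# Fit a straight line by least squares and compute the residuals.
slope, intercept = np.polyfit(tv, sales, deg=1)
residuals = sales - (intercept + slope * tv)

# Compare the residual spread in the lower and upper halves of the predictor range.
median_tv = np.median(tv)
low = residuals[tv < median_tv]
high = residuals[tv >= median_tv]
print('Residual std, low TV budgets:  %.2f' % low.std())
print('Residual std, high TV budgets: %.2f' % high.std())
```

If the two spreads are clearly different, as they should be for data generated this way, the constant-variance assumption is questionable. A split at the median is a crude check; formal tests exist but are beyond what we need here.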

In [None]:
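# Create an ordinary least squares linear regression model
lm = linear_model.LinearRegression()

# scikit-learn expects a 2-D feature matrix, so convert each Series to a one-column DataFrame
lm.fit(df.TV.to_frame(), df.Sales.to_frame())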

In [None]:
# The coefficients
# The response was passed as a one-column DataFrame, so coef_ is 2-D and intercept_ is 1-D
print('Coefficient: %.4f' % lm.coef_[0][0])
print('Intercept: %.2f' % lm.intercept_[0])
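
The coefficient and intercept printed above are ordinary least squares estimates. With a single predictor they have a simple closed form: the slope is the covariance of predictor and response divided by the variance of the predictor, and the intercept makes the line pass through the point of means. The sketch below works this out on synthetic data (an assumption; it does not use Advertising.csv) and checks the result against `np.polyfit`. The names `x`, `y`, `slope`, and `intercept` are illustrative only.

```python
import numpy as np

# Synthetic data standing in for (TV, Sales); the true values 7.0 and 0.05 are made up.
rng = np.random.default_rng(1)
x = rng.uniform(0, 300, size=200)
y = 7.0 + 0.05 * x + rng.normal(0, 1.0, size=200)

# Closed-form least squares estimates for one predictor.
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Cross-check against NumPy's least squares polynomial fit (degree 1).
# polyfit returns coefficients from the highest degree down: [slope, intercept].
check_slope, check_intercept = np.polyfit(x, y, deg=1)

print('Closed form: slope=%.4f, intercept=%.2f' % (slope, intercept))
print('np.polyfit:  slope=%.4f, intercept=%.2f' % (check_slope, check_intercept))
```

Applying the same two formulas to `df.TV` and `df.Sales` should reproduce the values that `LinearRegression` reports, up to floating-point rounding.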

In [None]:
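# Predicted sales for each TV budget in the dataset
sales_estimates = lm.predict(df.TV.to_frame())

# Plot the observations (red) and the fitted regression line (blue)
plt.scatter(df.TV, df.Sales, color='red')
plt.plot(df.TV, sales_estimates, color='blue', linewidth=2)
plt.xlabel('TV budget')
plt.ylabel('Unit sales')
plt.show()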
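
The same `predict` call also works on budgets that were not in the training data, which is the "future, unseen responses" use mentioned at the start. Here is a minimal, self-contained sketch; it fits on synthetic data rather than Advertising.csv, and `train`, `model`, and `new_budgets` are hypothetical names with made-up values.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic stand-in for Advertising.csv (an assumption; the values are made up).
rng = np.random.default_rng(2)
train = pd.DataFrame({"TV": rng.uniform(0, 300, size=200)})
train["Sales"] = 7.0 + 0.05 * train["TV"] + rng.normal(0, 1.0, size=200)

# Fit on a one-column DataFrame of features and a 1-D response.
model = linear_model.LinearRegression()
model.fit(train[["TV"]], train["Sales"])

# Hypothetical budgets not seen during fitting; keep the column name used for fitting.
new_budgets = pd.DataFrame({"TV": [50.0, 150.0, 250.0]})
predicted_sales = model.predict(new_budgets)

for budget, sales in zip(new_budgets["TV"], predicted_sales):
    print('TV budget %.0f -> predicted sales %.2f' % (budget, sales))
```

Because the response is passed as a 1-D Series here, `predict` returns a 1-D array. In the notebook above the response was a one-column DataFrame, so `lm.predict` returns a 2-D array with one column instead.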