# Linear Regression: Basics
***Part [1/2]***

---

### Objectives
- Perform simple **EDA**.
    - *Correlation Matrix*
    - *Scatterplots*
    
    
- **Create a model** and train it on data.


- Read the basics from a *`statsmodels`* **model summary.**
    - $\large R^2$
    - $\large P\text{-values}$


- Make **predictions** on the data.
    - *Learn about **residuals** and simple **regression metrics**.*

In [11]:
# Imports
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
from statsmodels.formula.api import ols

In [7]:
# Load in data.
df = pd.read_csv('data/advertising.csv', index_col=0)
df

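|     | TV    | radio | newspaper | sales |
|-----|-------|-------|-----------|-------|
| 1   | 230.1 | 37.8  | 69.2      | 22.1  |
| 2   | 44.5  | 39.3  | 45.1      | 10.4  |
| 3   | 17.2  | 45.9  | 69.3      | 9.3   |
| 4   | 151.5 | 41.3  | 58.5      | 18.5  |
| 5   | 180.8 | 10.8  | 58.4      | 12.9  |
| ... | ...   | ...   | ...       | ...   |
| 196 | 38.2  | 3.7   | 13.8      | 7.6   |
| 197 | 94.2  | 4.9   | 8.1       | 9.7   |
| 198 | 177.0 | 9.3   | 6.4       | 12.8  |
| 199 | 283.6 | 42.0  | 66.2      | 25.5  |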


## EDA
- *Correlation Matrix*

- *Scatterplots*

In [None]:
# Pairwise Pearson correlations; .abs() drops the sign so strengths compare directly.
corr = (
    df
    .corr()
    .abs()
    .round(3)
)
corr

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))

# Mask the upper triangle (including the diagonal); it mirrors the lower half.
# Note: np.bool was removed in NumPy 1.24, so use the builtin bool.
mask = np.triu(np.ones_like(corr, dtype=bool))

sns.heatmap(corr, annot=True, mask=mask, cmap='Blues', ax=ax)
plt.setp(ax.get_xticklabels(), rotation=0, ha='center')
plt.setp(ax.get_yticklabels(), rotation=0)

# Hide the fully masked top row and right-most column
# (this also avoids cut-off edge squares in some matplotlib versions).
ax.set_ylim(len(corr), 1)
ax.set_xlim(right=len(corr) - 1)

fig.tight_layout()
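The objectives also list scatterplots of each feature against the target. A minimal sketch, assuming the `TV`, `radio`, `newspaper` and `sales` columns from the table above; the helper `plot_feature_scatter` and the synthetic `demo` frame are hypothetical stand-ins, so in the notebook you would pass `df` instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_feature_scatter(data, features, target):
    """Draw one scatterplot per feature against the target, side by side."""
    fig, axes = plt.subplots(
        1, len(features), figsize=(4 * len(features), 4), sharey=True, squeeze=False
    )
    axes = axes.ravel()  # flatten the 1 x n grid into a 1-D array of Axes
    for ax, feature in zip(axes, features):
        ax.scatter(data[feature], data[target], alpha=0.6)
        ax.set_xlabel(feature)
        ax.set_title(f'{target} vs. {feature}')
    axes[0].set_ylabel(target)  # shared y-axis, so label only the first panel
    fig.tight_layout()
    return fig


# Hypothetical synthetic data with the same columns (NOT the advertising data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'TV': rng.uniform(0, 300, 200),
    'radio': rng.uniform(0, 50, 200),
    'newspaper': rng.uniform(0, 100, 200),
})
demo['sales'] = 3 + 0.045 * demo['TV'] + 0.19 * demo['radio'] + rng.normal(0, 1.5, 200)

fig = plot_feature_scatter(demo, ['TV', 'radio', 'newspaper'], 'sales')
```

A roughly linear cloud in a panel suggests that feature is a reasonable candidate for a linear model.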

## Modeling

### Train a Model: Simple Linear Regression
> ***One*** *predictive feature.*
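With `ols` imported from `statsmodels.formula.api`, a simple regression is specified by an R-style formula, `target ~ feature`, and trained with `.fit()`. A minimal sketch: choosing `TV` as the single predictor and using the synthetic `demo` frame are assumptions for illustration; in the notebook you would fit on `df`.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical synthetic data (NOT the advertising data): sales rises with TV spend.
rng = np.random.default_rng(42)
demo = pd.DataFrame({'TV': rng.uniform(0, 300, 200)})
demo['sales'] = 7 + 0.05 * demo['TV'] + rng.normal(0, 1, 200)

# 'sales ~ TV' regresses sales on TV; the formula API adds an intercept automatically.
simple_model = ols('sales ~ TV', data=demo).fit()

print(simple_model.params)     # fitted Intercept and TV slope
print(simple_model.summary())  # full statsmodels summary table
```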

#### Review Model Summary
> $R^2$, $P\text{-values}$
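$R^2$ is the share of the target's variance that the model explains: $R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$, where $SS_{\text{res}}$ sums the squared residuals and $SS_{\text{tot}}$ sums the squared deviations of the target from its mean. The sketch below computes it by hand with NumPy; the helper `r_squared` and the synthetic data are hypothetical. For an OLS fit with an intercept it should agree with the `R-squared` line of the summary (`model.rsquared`).

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # variation the model leaves unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical synthetic data (NOT the advertising data).
rng = np.random.default_rng(1)
x = rng.uniform(0, 300, 200)
y = 7 + 0.05 * x + rng.normal(0, 1, 200)

# np.polyfit returns coefficients highest power first: (slope, intercept).
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

print(round(r_squared(y, y_hat), 3))
```

With a single predictor, this $R^2$ also equals the squared Pearson correlation between `x` and `y`.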

### Train a Model: Multiple Linear Regression
> ***Multiple*** *predictive features.*
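Adding predictors only changes the formula: terms are joined with `+`. A minimal sketch, assuming all three advertising channels as predictors; the synthetic `demo` frame is a hypothetical stand-in for `df`, built so that `newspaper` has no true effect on `sales`.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical synthetic data (NOT the advertising data).
rng = np.random.default_rng(7)
n = 200
demo = pd.DataFrame({
    'TV': rng.uniform(0, 300, n),
    'radio': rng.uniform(0, 50, n),
    'newspaper': rng.uniform(0, 100, n),
})
# In this made-up data, newspaper spend does not enter the true relationship.
demo['sales'] = 3 + 0.045 * demo['TV'] + 0.19 * demo['radio'] + rng.normal(0, 1.5, n)

multi_model = ols('sales ~ TV + radio + newspaper', data=demo).fit()
print(multi_model.params)
```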

#### Review Model Summary
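Each coefficient row of the summary reports a $P$-value (column `P>|t|`, also available as `model.pvalues`) for the null hypothesis that the coefficient is zero. A minimal sketch of filtering predictors at a significance level; the 0.05 cutoff is a common convention rather than a rule, and the helper `significant_terms` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols


def significant_terms(model, alpha=0.05):
    """Names of non-intercept terms whose p-value falls below alpha."""
    pvals = model.pvalues.drop('Intercept')
    return list(pvals[pvals < alpha].index)


# Hypothetical synthetic data (NOT the advertising data); newspaper has no true effect.
rng = np.random.default_rng(7)
n = 200
demo = pd.DataFrame({
    'TV': rng.uniform(0, 300, n),
    'radio': rng.uniform(0, 50, n),
    'newspaper': rng.uniform(0, 100, n),
})
demo['sales'] = 3 + 0.045 * demo['TV'] + 0.19 * demo['radio'] + rng.normal(0, 1.5, n)

multi_model = ols('sales ~ TV + radio + newspaper', data=demo).fit()
print(multi_model.pvalues.round(4))
print('Significant at alpha = 0.05:', significant_terms(multi_model))
```

With `df` in place of `demo`, the same calls list which advertising channels clear the chosen cutoff.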

### Make Predictions
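A fitted `statsmodels` results object predicts with `model.predict(data)` and stores the residuals (observed minus fitted values) in `model.resid`. The metrics below are computed by hand with NumPy so the formulas stay visible: mean absolute error (MAE), mean squared error (MSE), and its square root (RMSE), which is in the same units as the target. The helper `regression_metrics` and the small arrays are hypothetical illustrations.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Residual-based error metrics for a regression model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred  # observed minus predicted
    mse = np.mean(residuals ** 2)
    return {
        'MAE': np.mean(np.abs(residuals)),  # average size of an error
        'MSE': mse,                         # penalizes large errors more heavily
        'RMSE': np.sqrt(mse),               # back in the target's units
    }


# Hypothetical observed and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

for name, value in regression_metrics(y_true, y_pred).items():
    print(f'{name}: {value:.4f}')
```

In the notebook, `regression_metrics(df['sales'], model.predict(df))` would score a fitted model on the data it was trained on.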