# Simple Regression

Linear regression is perhaps the simplest of all models. Do you remember the equation of a line, where `m` is the slope and `b` is the y-intercept (the point where the line crosses the y-axis)?

$$y=mx+b$$

This is a simple linear model because there is only one independent variable, `x`, and therefore only one coefficient, the slope `m`.
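
To make the formula concrete, here is a minimal sketch of how a slope and intercept can be estimated from data with the closed-form least-squares formulas. The helper name `fit_line` and the toy data are assumptions made for illustration; they are not part of the salary dataset used below.

```python
def fit_line(xs, ys):
    """Estimate slope m and intercept b of y = m*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line always passes through (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

m, b = fit_line(xs, ys)
print(m, b)  # 2.0 1.0
```

For a single feature, `LinearRegression` solves the same least-squares problem, so its `coef_` and `intercept_` play the roles of `m` and `b`.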

## Imports and load data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

%matplotlib inline

In [None]:
df = pd.read_csv("./SalaryData.csv")

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.isnull().values.any()

## Split data

Hold out 20% of the rows as a test set and train on the rest. The dependent variable (`Salary`) is separated from the independent variable (`YearsExperience`) later, just before the model is fit.
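
As a rough illustration of what a hold-out split does, the sketch below shuffles row indices with a fixed seed and sets aside a fraction of them for testing. It is a simplified stand-in written for this explanation, not the implementation of `train_test_split`; the helper `holdout_split` and the toy rows are hypothetical.

```python
import math
import random


def holdout_split(rows, test_size=0.2, seed=42):
    """Shuffle the rows reproducibly and hold out a fraction as a test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed gives a repeatable split
    n_test = math.ceil(len(rows) * test_size)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Hypothetical (years_experience, salary) rows, for illustration only
rows = [(float(i), 30000.0 + 9000.0 * i) for i in range(1, 11)]

train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))  # 8 2
```

Fixing the seed plays the same role as `random_state=42` in the cell below: rerunning the notebook produces the same split, so results stay comparable between runs.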

In [None]:
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)

In [None]:
df_copy = train_set.copy()

In [None]:
df_copy.shape

In [None]:
df_copy.head()

## Exploratory Data Analysis

In [None]:
df_copy.describe()

In [None]:
df_copy.corr()

In [None]:
df_copy.plot.scatter(x='YearsExperience', y='Salary')

In [None]:
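# Regression plot: scatter of the training data with a fitted line.
# Recent seaborn releases require x and y to be passed as keyword arguments.
sns.regplot(x='YearsExperience',  # Horizontal axis
            y='Salary',           # Vertical axis
            data=df_copy)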

## Predict

In [None]:
test_set_full = test_set.copy()

test_set = test_set.drop(["Salary"], axis=1)

In [None]:
test_set.head()

In [None]:
train_labels = df_copy["Salary"]

In [None]:
train_labels.head()

In [None]:
train_set_full = train_set.copy()

train_set = train_set.drop(["Salary"], axis=1)

In [None]:
train_set.head()

In [None]:
type(train_set)

In [None]:
lin_reg = LinearRegression()

lin_reg.fit(train_set, train_labels)

salary_pred = lin_reg.predict(test_set)

salary_pred

## Analyze Results

In [None]:
print("Coefficients: ", lin_reg.coef_)
print("Intercept: ", lin_reg.intercept_)

In [None]:
print(salary_pred)
print(test_set_full["Salary"])

In [None]:
lin_reg.score(test_set, test_set_full["Salary"])

In [None]:
r2_score(test_set_full["Salary"], salary_pred)
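
Both `lin_reg.score` and `r2_score` report the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. The sketch below computes it by hand to show what the number means. The slope, intercept, and test values are hypothetical placeholders, not results from the salary data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # error left after the fit
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # error of always predicting the mean
    return 1 - ss_res / ss_tot


# Hypothetical fitted line and test points, for illustration only
slope, intercept = 2.0, 1.0
x_test = [1.0, 2.0, 3.0, 4.0]
y_test = [3.5, 4.5, 7.5, 8.5]

# A prediction is just the line equation applied to each x
y_pred = [slope * x + intercept for x in x_test]

r2 = r_squared(y_test, y_pred)
print(round(r2, 4))  # 0.9412
```

A value of 1.0 means the predictions match the targets exactly, while 0.0 means the model does no better than always predicting the mean of the targets.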

In [None]:
plt.scatter(test_set_full["YearsExperience"], test_set_full["Salary"],  color='blue')
plt.plot(test_set_full["YearsExperience"], salary_pred, color='red', linewidth=2)