### Machine Learning Algorithm - Regression

#### Simple Linear Regression - SKLearn

Implementation using Python (NumPy, Pandas, Matplotlib, Seaborn & Scikit-Learn)

In [None]:
# Importing libraries
# -------------------
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Importing the dataset
# ---------------------
dataset = pd.read_csv('../input/headbrain-linear-regression/headbrain_dataset.csv')

In [None]:
dataset.shape

In [None]:
dataset.head()

In [None]:
dataset.tail()

In [None]:
# Checking for null values
dataset.isnull().sum()

In [None]:
dataset['Age Range'].max()

In [None]:
dataset['Age Range'].min()

In [None]:
sns.scatterplot(data=dataset, x="Head Size(cm^3)", y="Brain Weight(grams)")
plt.show()

In [None]:
sns.lmplot(x='Head Size(cm^3)', y='Brain Weight(grams)', hue='Gender', data=dataset)
plt.show()

In [None]:
sns.lmplot(x='Head Size(cm^3)', y='Brain Weight(grams)', hue='Age Range', data=dataset)
plt.show()

In [None]:
sns.regplot(x=dataset['Head Size(cm^3)'], y=dataset['Brain Weight(grams)'])
plt.show()

In [None]:
dataset.describe()

In [None]:
dataset.info()

In [None]:
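# Select columns by name rather than by position, so the code does not depend on column order
X = dataset['Head Size(cm^3)'].values      # predictor: head size
y = dataset['Brain Weight(grams)'].values  # target: brain weight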

In [None]:
# Plot the Input Data
# -------------------
plt.scatter(X, y, color='green', label='Data Points')
plt.xlabel('Head Size (cm^3)')
plt.ylabel('Brain Weight (grams)')
plt.legend()
plt.show()

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
# Simple Linear Regression - using Scikit-Learn
# ---------------------------------------------
# Model definition
# ----------------
model = LinearRegression()   # Ordinary least squares (OLS) Linear Regression

Link: https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares
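
For a single predictor, the ordinary least squares fit has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The cell below is a minimal sketch of that calculation. The toy arrays `x_toy` and `y_toy` are invented for illustration and are not taken from the head/brain dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data invented for illustration (NOT the head/brain dataset)
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_toy = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form OLS estimates for one predictor
x_mean, y_mean = x_toy.mean(), y_toy.mean()
slope = np.sum((x_toy - x_mean) * (y_toy - y_mean)) / np.sum((x_toy - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# The same fit with scikit-learn, for comparison
toy_model = LinearRegression().fit(x_toy.reshape(-1, 1), y_toy)

print("closed form :", slope, intercept)
print("scikit-learn:", toy_model.coef_[0], toy_model.intercept_)
```

Both lines should print the same slope and intercept, up to floating-point rounding.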

In [None]:
X.shape

In [None]:
X = X.reshape(-1, 1)  # scikit-learn expects a 2-D feature array of shape (n_samples, n_features)

In [None]:
X.shape
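
The reshape step can be seen in isolation on a small array. This is only an illustrative sketch; the array `values` is made up and stands in for the predictor column.

```python
import numpy as np

values = np.array([3.0, 1.5, 4.2])   # made-up 1-D array, shape (3,)
column = values.reshape(-1, 1)       # -1 lets NumPy infer the number of rows

print(values.shape)   # (3,)
print(column.shape)   # (3, 1)
```

The data is unchanged; only the shape differs, so each value becomes one sample with one feature.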

In [None]:
# Model fitting
# -------------
model.fit(X, y)   

In [None]:
model.coef_

In [None]:
model.intercept_
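
The two fitted attributes define the regression line: a prediction is `intercept_ + coef_[0] * x`. The sketch below checks this against `predict` on toy data. The names `x_toy`, `y_toy`, `toy_model` and `x_new` are hypothetical and the numbers are invented, chosen so that y = 2x + 1 exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data invented for illustration: y = 2x + 1 exactly
x_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

toy_model = LinearRegression().fit(x_toy, y_toy)
slope = toy_model.coef_[0]          # one coefficient per feature
intercept = toy_model.intercept_

# Predict for a new value by hand and with predict()
x_new = 10.0
manual = intercept + slope * x_new
from_predict = toy_model.predict(np.array([[x_new]]))[0]

print(manual, from_predict)
```

Both values should agree (about 21 for this toy line), which is a quick way to confirm how `coef_` and `intercept_` are used.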

In [None]:
# X

In [None]:
# Model prediction
# ----------------
y_pred = model.predict(X)

In [None]:
# y_pred

In [None]:
# Visualising: Plotting Values and Regression Line
# -------------------------------------------------
# Plotting values
plt.scatter(X, y, color='green', label='Scatter Points')
# Plotting regression line
plt.plot(X, y_pred, color='blue', label='Regression Line')

plt.title('Head Size vs Brain Weight')
plt.xlabel('Head Size(cm^3)')
plt.ylabel('Brain Weight(grams)')
plt.legend()
plt.show()

R-squared (the coefficient of determination) measures how closely the data follow the fitted regression line: it is the proportion of the variance in the target that the model explains.

In [None]:
model.score(X, y)  # Returns the coefficient of determination R^2 of the prediction

An R-squared score of 1 means perfect prediction; a score of 0 means the model does no better than always predicting the mean of the target.
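
As a minimal sketch of what the score computes, R-squared can be written as 1 minus the ratio of the residual sum of squares to the total sum of squares. The arrays `y_true` and `y_hat` below are invented for illustration and are not outputs of the model above.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true values and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.5])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

# Always predicting the mean gives a score of 0
r2_baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))

print(r2_manual, r2_score(y_true, y_hat), r2_baseline)
```

The manual value should match `r2_score`, and the mean-only baseline should score 0.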

In [None]:
from sklearn.metrics import r2_score

In [None]:
# Model evaluation
# ----------------
r_squared_score = r2_score(y, y_pred)  # R^2 (coefficient of determination) regression score function

In [None]:
# Format as a percentage; an f-string works whether r2_score returns a Python float or a NumPy float
print(f"{r_squared_score * 100:.2f} %")

## Happy Machine Learning :)