# Estimator Class

## Reference
- https://scikit-learn.org/stable/developers/contributing.html#rolling-your-own-estimator
- https://scikit-learn.org/stable/tutorial/basic/tutorial.html#learning-and-predicting
- https://scikit-learn.org/stable/modules/generated/sklearn.base.RegressorMixin.html
- https://scikit-learn.org/stable/modules/generated/sklearn.base.ClassifierMixin.html
- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
- https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- http://danielhnyk.cz/creating-your-own-estimator-scikit-learn/
- https://towardsdatascience.com/introduction-to-machine-learning-algorithms-linear-regression-14c4e325882a
- https://towardsdatascience.com/linear-regression-using-gradient-descent-in-10-lines-of-code-642f995339c0

## Table of Contents
1. Introduction
1. Create an estimator __object__
1. Create an estimator __class__

## 1. Introduction

An _estimator_ is an object that learns from input data and then converts (transforms) input data into output data. Typically, the output data are predictions of a target variable.

In scikit-learn, an estimator that makes predictions implements the `fit` and `predict` methods.

The `fit` method sets internal attributes of the object using information from one dataset. The `predict` method then uses these attributes to create predictions, possibly for other datasets.

Estimators have two key functions (methods):

- `fit(X, y)`: This sets internal parameters (attributes) based on the input data. The `X` argument usually contains the independent variables and `y` usually contains the dependent/target variable, both taken from the train dataset.

- `predict(X)`: This method performs the prediction. The `X` argument may be from the train dataset or from the test dataset.
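
The fit/predict pattern described above can be sketched as follows. This is only an illustration: the data are made up (`y` is exactly `2*x + 1`), and the 75/25 split and `random_state=0` are assumptions chosen for the example, not part of this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data: y is exactly 2*x + 1, so the fitted values are easy to check.
X = np.arange(20, dtype=float).reshape(-1, 1)   # shape (20, 1): one feature
y = 2.0 * X[:, 0] + 1.0                         # shape (20,)

# Hold out a quarter of the rows as a test dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)         # sets model.coef_ and model.intercept_ from the train data
test_pred = model.predict(X_test)   # applies those attributes to rows not seen during fit
```

Because the made-up relationship is exactly linear, `test_pred` should agree with `y_test` up to floating-point error.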

Import the `pandas` and `numpy` libraries. In addition, import the `LinearRegression` class which is an estimator.

In [0]:
import pandas  as pd
import numpy   as np
import sklearn as sk
from sklearn.linear_model import LinearRegression

Display the version numbers of the numpy, pandas and scikit-learn packages:

In [2]:
print('numpy  :',np.__version__)
print('pandas :',pd.__version__)
print('sklearn:',sk.__version__)

numpy  : 1.16.4
pandas : 0.24.2
sklearn: 0.21.3


Note that these version numbers may not be identical to those used in the references provided above.

## 2. Create an estimator object

In this section:
1. the iris dataset is used to create an independent feature and a dependent/target feature
1. an object of the `LinearRegression` class is created
1. this object is then fit to the previously created features, creating an intercept and a coefficient
1. the intercept and coefficient are displayed
1. predictions are created and are scored with the R squared metric

The iris dataset is used to demonstrate use of an object of the `LinearRegression` estimator class. 

Separate the independent feature `ind_pdf` (column 0, sepal length) and the dependent feature `dep_pdf` (column 3, petal width). Both are NumPy arrays.

In [3]:
from sklearn.datasets import load_iris
ind_pdf = load_iris().data[:,0:1]
ind_pdf.shape

(150, 1)

In [4]:
from sklearn.datasets import load_iris
dep_pdf = load_iris().data[:,3]
dep_pdf.shape

(150,)

Create an estimator object `linreg`, which is an instance of the `LinearRegression` class, created by calling the constructor (init method) of that class.

The constructor (init method) is called without any arguments here and so uses default values for all parameters, which are displayed below.

In [5]:
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
linreg

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Fit the `LinearRegression` object `linreg` to the independent data `ind_pdf` and dependent data `dep_pdf`.

In [6]:
linreg.fit(ind_pdf, dep_pdf)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Recall that the `fit` method returns the object itself.
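
Because `fit` returns the object itself, calls can be chained. A minimal sketch on made-up data (the arrays below are assumptions for illustration only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2*x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
returned = model.fit(X, y)
same_object = returned is model   # fit hands back the very same object

# Chaining: construct, fit and predict in one expression.
chained_pred = LinearRegression().fit(X, y).predict(X)
```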

The `intercept_` attribute of the `linreg` object stores the intercept, and the `coef_` attribute stores an array with the coefficients of the linear regression model fit to the data.

In this case there is only one coefficient, as there is only a single independent feature.

In [7]:
linreg.intercept_, linreg.coef_

(-3.200215004649192, array([0.75291757]))

Predictions are created. Display the first three.

In [17]:
linreg.predict(ind_pdf)[:3]

array([0.63966461, 0.48908109, 0.33849758])
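
For a single feature, each prediction is the intercept plus the coefficient times the feature value. The sketch below rebuilds the same model in a self-contained way (the names `X`, `y`, `model` and `manual` are chosen for this example) and checks the arithmetic against `predict`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

data = load_iris().data
X = data[:, 0:1]   # sepal length, kept 2-D as scikit-learn expects
y = data[:, 3]     # petal width

model = LinearRegression().fit(X, y)

# Prediction by hand: intercept + coefficient * feature value.
manual = model.intercept_ + model.coef_[0] * X[:, 0]
matches = np.allclose(manual, model.predict(X))
```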

The predictions made from `ind_pdf` are scored by comparing them to the actual values `dep_pdf`. The `score` method returns the R squared metric.

In [9]:
linreg.score(ind_pdf,dep_pdf)

0.6690276860464137
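
R squared is one minus the ratio of the residual sum of squares to the total sum of squares. As a sketch, it can be computed by hand and compared with `score` (the variable names here are chosen for this example):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

data = load_iris().data
X = data[:, 0:1]   # sepal length
y = data[:, 3]     # petal width

model = LinearRegression().fit(X, y)
pred = model.predict(X)

ss_res = np.sum((y - pred) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot
r2_score_method = model.score(X, y)
```

A value of 1 would mean the predictions match the actual values exactly; a value of 0 is what always predicting the mean of `y` would give.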

## 3. Create an estimator class

Every estimator class (in Python and used with scikit-learn) should
- define an init method, named `__init__`
- define two methods, `fit` and `predict`
- inherit the `BaseEstimator` class (and `TransformerMixin` if the estimator also defines a `transform` method)
- inherit the `RegressorMixin` or `ClassifierMixin` class, depending on whether it is a regressor or a classifier

The `fit` method should return `self`. The `predict` method should return:
- one prediction per row of `X`, in the same form as the `y` parameter to the `fit` method, if the estimator is a regressor or classifier
- cluster labels if the estimator inherits the [???]

The methods `get_params()` and `set_params()` are inherited from the `BaseEstimator` class and are useful for automatic hyperparameter tuning.
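
A minimal sketch of what `BaseEstimator` provides. The class `ConstantRegressor` and its `value` parameter are hypothetical, invented only for this illustration:

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone

class ConstantRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical estimator for illustration: always predicts `value`."""
    def __init__(self, value=0.0):
        # Store constructor arguments unchanged, under the same name.
        self.value = value

    def fit(self, X, y):
        return self

    def predict(self, X):
        return np.full(len(X), self.value)

est = ConstantRegressor(value=1.5)
params = est.get_params()    # {'value': 1.5}, read from the __init__ signature
est.set_params(value=2.0)    # updates the attribute in place
fresh = clone(est)           # new unfitted object with the same parameters
```

Tools that tune hyperparameters rely on these methods to read, change and copy an estimator's settings, which is why `__init__` should only store its arguments.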

In the remainder of this section:
- an estimator class `SimpleRegression` is created
- it is fit to the same data as above
- its predictions/scoring are compared to those provided by the `linreg` object above

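Recall that estimators should inherit `BaseEstimator`.
In addition, as we are creating a regressor, it should inherit `RegressorMixin`.
`TransformerMixin` is also imported and inherited below, although it only contributes `fit_transform`, which this regressor does not use.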

Import these classes:

In [0]:
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin

Define the `SimpleRegression` estimator class.

In [0]:
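class SimpleRegression(BaseEstimator, TransformerMixin, RegressorMixin):
  """Simple linear regression on one feature, fit by gradient descent."""
  def __init__(self, epochs=1000):
    # Only store constructor arguments here so get_params/set_params work.
    self.epochs = epochs
  def fit(self, X, y):
    alpha = 0.01  # learning rate
    n = len(X)
    a_0 = 0       # intercept
    a_1 = 0       # slope
    x = X[:,0]    # the single independent feature as a 1-D array
    for i in range(self.epochs):
      p = (a_0 + a_1*x)  # current predictions
      error = p - y
      # Step against the gradient of the mean squared error np.sum(error**2)/n.
      a_0 = a_0 - alpha * 2 * np.sum(error)/n
      a_1 = a_1 - alpha * 2 * np.sum(error * x)/n
    self.intercept_ = a_0
    self.coef_      = a_1
    return self
  def predict(self, X):
    # Use the first column so the result has the same shape as y: (n_samples,).
    return self.intercept_ + self.coef_*X[:,0]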
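
The update rule inside `fit` is gradient descent on the mean squared error. As a sketch, writing $e_i$ for the error of sample $i$, $L$ for the mean squared error and $\alpha$ for the learning rate (`alpha` in the code; the symbols $e_i$ and $L$ are notation introduced here), the two update lines follow from:

$$
\begin{aligned}
e_i &= a_0 + a_1 x_i - y_i \\
L(a_0, a_1) &= \frac{1}{n} \sum_{i=1}^{n} e_i^2 \\
\frac{\partial L}{\partial a_0} &= \frac{2}{n} \sum_{i=1}^{n} e_i,
\qquad
\frac{\partial L}{\partial a_1} = \frac{2}{n} \sum_{i=1}^{n} e_i x_i \\
a_0 &\leftarrow a_0 - \alpha \frac{\partial L}{\partial a_0},
\qquad
a_1 \leftarrow a_1 - \alpha \frac{\partial L}{\partial a_1}
\end{aligned}
$$

Each epoch therefore moves the intercept and the slope a small step in the direction that reduces the mean squared error.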

Create an object `reg` of this class and fit it to the data.

In [12]:
reg = SimpleRegression(epochs=10000)
reg.fit(ind_pdf,dep_pdf)

SimpleRegression(epochs=10000)

Display the `intercept_` and `coef_` attributes.

In [13]:
reg.intercept_, reg.coef_

(-3.128209194385261, 0.7408292479398456)

Notice they are similar to those from the `LinearRegression` object.

In [14]:
linreg.intercept_, linreg.coef_

(-3.200215004649192, array([0.75291757]))
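
One plausible reason the two sets of values are close but not identical is that gradient descent has not fully converged after 10,000 epochs. The sketch below repeats the same update rule as a standalone function (`gradient_descent` is a name chosen for this example) and compares the intercept against the closed-form least-squares fit from `np.polyfit` for two epoch counts:

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris().data
x, y = data[:, 0], data[:, 3]   # sepal length, petal width

def gradient_descent(x, y, epochs, alpha=0.01):
    """Same update rule as SimpleRegression.fit, written as a function."""
    a_0, a_1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = a_0 + a_1 * x - y
        a_0, a_1 = (a_0 - alpha * 2 * np.sum(error) / n,
                    a_1 - alpha * 2 * np.sum(error * x) / n)
    return a_0, a_1

# Closed-form least squares; polyfit returns the highest degree first.
slope, intercept = np.polyfit(x, y, deg=1)

# Distance between the gradient-descent intercept and the closed-form one.
gap_10k = abs(gradient_descent(x, y, 10_000)[0] - intercept)
gap_50k = abs(gradient_descent(x, y, 50_000)[0] - intercept)
```

If the explanation holds, `gap_50k` should be much smaller than `gap_10k`. Scaling the feature or raising the number of epochs are two common ways to close the gap.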

The R squared values are also similar.

In [15]:
linreg.score(ind_pdf, dep_pdf), reg.score(ind_pdf, dep_pdf)

(0.6690276860464137, 0.6688519781121873)

__The End__