### Customizing Sklearn
- Sklearn is an ML library built around classes and objects.
- Its API is organized around a few important kinds of object:
  - Estimators
  - Predictors
  - Transformers
  - Models

#### Sklearn Estimators:
Most sklearn classes build on the estimator interface, and many of them inherit from a common estimator base class. An estimator needs the following methods:
- .fit(X, y): fits the estimator to X and y
- .get_params(): returns the parameters of the estimator (here "parameters" means hyperparameters: they are not learned from the data, you set them yourself)
- .set_params(**params): changes the parameters of the estimator

**Sklearn Predictor adds:**
- .predict(X)

**Sklearn Transformer adds:**
- .transform(X,y=None)

**Sklearn Model adds:**
- .score(X,y)
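
As a quick illustration of these four roles, the sketch below checks which methods two built-in sklearn classes expose. The choice of `LinearRegression` and `StandardScaler` is only an example; any regressor and transformer would do.

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

reg = LinearRegression()
scaler = StandardScaler()

# Both objects are estimators: they expose fit, get_params and set_params.
estimator_api = all(
    hasattr(obj, method)
    for obj in (reg, scaler)
    for method in ("fit", "get_params", "set_params")
)

# The regressor is also a predictor and a model.
reg_is_predictor = hasattr(reg, "predict")
reg_is_model = hasattr(reg, "score")

# The scaler is a transformer, but it cannot predict.
scaler_is_transformer = hasattr(scaler, "transform")
scaler_is_predictor = hasattr(scaler, "predict")

print(estimator_api, reg_is_predictor, reg_is_model,
      scaler_is_transformer, scaler_is_predictor)
```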


### Examples Sklearn Estimator:

A dummy estimator that fits either the mean or the median of the data
- use_median: the parameter of the estimator (this is the hyperparameter)
- By sklearn convention, attributes that are learned during fit end with a trailing underscore, for example **value_**.
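
The trailing-underscore convention can be seen on a built-in estimator. The sketch below assumes `LinearRegression` as the example and uses `check_is_fitted` from `sklearn.utils.validation`, which looks for such attributes; the data (`X_demo`, `y_demo`) is made up for illustration.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression
from sklearn.utils.validation import check_is_fitted

# Illustrative data: y = 2x + 1
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 1.0

reg = LinearRegression(fit_intercept=True)  # fit_intercept is a hyperparameter

# Before fit, the learned attribute does not exist yet.
has_coef_before = hasattr(reg, "coef_")
try:
    check_is_fitted(reg)
    raised = False
except NotFittedError:
    raised = True

reg.fit(X_demo, y_demo)

# After fit, coef_ and intercept_ hold the learned values.
has_coef_after = hasattr(reg, "coef_")
check_is_fitted(reg)  # no exception now
```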

In [38]:
import numpy as np
class MyDummyEstimator:
    # defining the constructor
    def __init__(self, use_median=False):
        self.use_median = use_median
    
    def fit (self, X,y):
        if self.use_median:
            self.value_ = np.median(y)
        else:
            self.value_ = np.mean(y)
        return self # returning self is mandatory

    
    def get_params(self,deep=True):
        return dict(use_median = self.use_median)
    
    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self,parameter,value)
        return self

- The **.set_params()** and **.get_params()** methods are boilerplate code: you can inherit them from **BaseEstimator** instead of writing them yourself!

In [39]:
from sklearn.base import BaseEstimator

class MyDummyEstimator(BaseEstimator):
    # defining the constructor
    def __init__(self, use_median=False):
        self.use_median = use_median
    
    def fit (self, X,y):
        if self.use_median:
            self.value_ = np.median(y)
        else:
            self.value_ = np.mean(y)
        return self # returning self is mandatory

In [40]:
est = MyDummyEstimator(use_median=True)
est.get_params()

{'use_median': True}

- Now you can use get_params and set_params without defining them, because your estimator inherits from BaseEstimator
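
Inheriting from BaseEstimator gives a little more than the two methods. The sketch below redefines the dummy estimator so that it is self-contained, then shows that the inherited `set_params` rejects unknown parameter names and that `sklearn.base.clone` can copy the estimator. The toy data passed to `fit` is an assumption for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class MyDummyEstimator(BaseEstimator):
    def __init__(self, use_median=False):
        self.use_median = use_median

    def fit(self, X, y):
        self.value_ = np.median(y) if self.use_median else np.mean(y)
        return self


est = MyDummyEstimator(use_median=True).fit(None, [1, 2, 30])

# set_params returns the estimator itself, so calls can be chained.
params = est.set_params(use_median=False).get_params()

# Unknown parameter names are rejected instead of silently set.
rejected = False
try:
    est.set_params(not_a_param=1)
except ValueError:
    rejected = True

# clone copies the hyperparameters but not the fitted state.
fresh = clone(est)
fresh_is_unfitted = not hasattr(fresh, "value_")
```

Note that the hand-written `set_params` based on `setattr` would accept any name, so the inherited version is slightly stricter.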

#### Example Predictor
Now we want to add predict functionality

In [41]:
class MyDummyPredictor(BaseEstimator):
    # defining the constructor
    def __init__(self, use_median=False):
        self.use_median = use_median
    
    def fit (self, X,y):
        if self.use_median:
            self.value_ = np.median(y)
        else:
            self.value_ = np.mean(y)
        return self # returning self is mandatory
        
    def predict(self, X):
        out = np.empty(len(X))
        out.fill(self.value_)
        return out

In [42]:
X = np.arange(60).reshape([20,3])
y = np.arange(20)
y[-1]=200 # just adding an outlier

In [43]:
pred = MyDummyPredictor(use_median=False)
pred = pred.fit(X,y)
pred.predict(X)

array([18.55, 18.55, 18.55, 18.55, 18.55, 18.55, 18.55, 18.55, 18.55,
       18.55, 18.55, 18.55, 18.55, 18.55, 18.55, 18.55, 18.55, 18.55,
       18.55, 18.55])

In [44]:
pred = MyDummyPredictor(use_median=True)
pred = pred.fit(X,y)
pred.predict(X)

array([9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5,
       9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5])

#### Example Model:

In [45]:
from sklearn.metrics import r2_score

class MyDummyModel(BaseEstimator):
    # defining the constructor
    def __init__(self, use_median=False):
        self.use_median = use_median
    
    def fit (self, X,y):
        if self.use_median:
            self.value_ = np.median(y)
        else:
            self.value_ = np.mean(y)
        return self # returning self is mandatory
        
    def predict(self, X):
        out = np.empty(len(X))
        out.fill(self.value_)
        return out
    
    def score(self, X,y):
        pred = self.predict(X)
        return r2_score(y,pred)


In [46]:
model = MyDummyModel(use_median=True)

model = model.fit(X,y)
model.score(X,y)

-0.046499909870142186

- You can use your **model** in model validation schemes such as **cross_val_score**

In [47]:
from sklearn.model_selection import cross_val_score
cross_val_score(MyDummyModel(), X, y)

array([-363.378125  , -212.878125  , -102.378125  ,  -31.878125  ,
         -0.48610102])
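
The scores above are strongly negative because the default splitter takes contiguous, unshuffled folds, and y increases steadily, so each test fold has a very different mean from its training folds. A constant prediction can never score above 0 in R², since R² compares against the test fold's own mean. The sketch below repeats the experiment with shuffled folds; the `KFold` settings (`shuffle=True`, `random_state=0`) are an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score


class MyDummyModel(BaseEstimator):
    def __init__(self, use_median=False):
        self.use_median = use_median

    def fit(self, X, y):
        self.value_ = np.median(y) if self.use_median else np.mean(y)
        return self

    def predict(self, X):
        return np.full(len(X), self.value_, dtype=float)

    def score(self, X, y):
        return r2_score(y, self.predict(X))


# Same data as in the cells above.
X_demo = np.arange(60).reshape(20, 3)
y_demo = np.arange(20)
y_demo[-1] = 200  # outlier

# Shuffled folds mix low and high targets into every fold.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(MyDummyModel(use_median=True), X_demo, y_demo, cv=cv)
print(scores)  # one R² value per fold, none above 0
```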

---

#### Mixins in Sklearn

**Supervised Models:**
- base.RegressorMixin
- base.ClassifierMixin

**Clustering Models:**
- base.BiclusterMixin
- base.ClusterMixin

**Feature Transformation and Selection:**
- base.TransformerMixin
- feature_selection.SelectorMixin

You can also write your own mixins for behavior that sklearn does not provide!
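
To see what each mixin contributes, the sketch below checks for one method that each of these classes defines. It only inspects the classes and does not fit anything.

```python
from sklearn.base import (
    BiclusterMixin,
    ClassifierMixin,
    ClusterMixin,
    RegressorMixin,
    TransformerMixin,
)

# One method provided by each mixin.
provided = {
    "RegressorMixin.score": hasattr(RegressorMixin, "score"),
    "ClassifierMixin.score": hasattr(ClassifierMixin, "score"),
    "ClusterMixin.fit_predict": hasattr(ClusterMixin, "fit_predict"),
    "TransformerMixin.fit_transform": hasattr(TransformerMixin, "fit_transform"),
    "BiclusterMixin.get_submatrix": hasattr(BiclusterMixin, "get_submatrix"),
}

for name, present in provided.items():
    print(name, present)
```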



Now, say we want to make a transformer that scales the data. To do this we can use **BaseEstimator** and **TransformerMixin** as our superclasses!

In [48]:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class Standarizer(BaseEstimator, TransformerMixin):
    def __init__(self, mean_after_transform=0):
        self.mean_after_transform = mean_after_transform
    


    def fit(self, X, y=None):
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0)
        return self
    
    def transform(self, X):
        return (X - self.mean_) / self.std_ + self.mean_after_transform

In [49]:
X = np.arange(9).reshape(3,3)
st = Standarizer().fit(X)
st.transform(X)

array([[-1.22474487, -1.22474487, -1.22474487],
       [ 0.        ,  0.        ,  0.        ],
       [ 1.22474487,  1.22474487,  1.22474487]])

- We can use **.fit_transform()** because we inherited from TransformerMixin

In [50]:
st.fit_transform(X)

array([[-1.22474487, -1.22474487, -1.22474487],
       [ 0.        ,  0.        ,  0.        ],
       [ 1.22474487,  1.22474487,  1.22474487]])
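
As a sanity check, the sketch below compares this transformer with sklearn's built-in `StandardScaler`, which also divides by the population standard deviation, and then exercises the `mean_after_transform` hyperparameter. The class is redefined so the block runs on its own; the mixin is listed before BaseEstimator, which is the order sklearn's developer guide recommends, and the value 5 is arbitrary.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class Standarizer(TransformerMixin, BaseEstimator):
    def __init__(self, mean_after_transform=0):
        self.mean_after_transform = mean_after_transform

    def fit(self, X, y=None):
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_ + self.mean_after_transform


X_demo = np.arange(9, dtype=float).reshape(3, 3)

ours = Standarizer().fit_transform(X_demo)
reference = StandardScaler().fit_transform(X_demo)
same_as_sklearn = np.allclose(ours, reference)

# Shifting the output: every column now has mean 5 instead of 0.
shifted = Standarizer(mean_after_transform=5).fit_transform(X_demo)
column_means = shifted.mean(axis=0)
```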

#### Example: Models

The mixins implement scoring (R² for regressors, mean accuracy for classifiers) for:
- Regressor (RegressorMixin)
- Classifier (ClassifierMixin)

The mixins also add other attributes that sklearn uses internally. Regression is straightforward, but classifiers need some extra methods.

To use RegressorMixin, you need to implement the predict method.

In [53]:
from sklearn.base import RegressorMixin

class MyDummyRegressor(BaseEstimator, RegressorMixin):
    # defining the constructor
    def __init__(self, use_median=False):
        self.use_median = use_median
    
    def fit (self, X ,y):
        if self.use_median:
            self.value_ = np.median(y)
        else:
            self.value_ = np.mean(y)
        return self # returning self is mandatory
        
    def predict(self, X):
        out = np.empty(len(X))
        out.fill(self.value_)
        return out

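In [None]:
X = np.arange(60).reshape([20,3]) # restore the 20-row X; the Standarizer cells rebound X to a 3x3 array
dummy = MyDummyRegressor().fit(X,y)
dummy.score(X,y) # inherited from RegressorMixin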
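
The inherited score is the R² of the predictions, so it should match a manual `r2_score` call. The sketch below checks this on the same data as earlier; the class is redefined so the block is self-contained, with the mixin listed before BaseEstimator as sklearn's developer guide recommends.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import r2_score


class MyDummyRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, use_median=False):
        self.use_median = use_median

    def fit(self, X, y):
        self.value_ = np.median(y) if self.use_median else np.mean(y)
        return self

    def predict(self, X):
        return np.full(len(X), self.value_, dtype=float)


X_demo = np.arange(60).reshape(20, 3)
y_demo = np.arange(20)
y_demo[-1] = 200  # outlier

dummy = MyDummyRegressor().fit(X_demo, y_demo)
inherited = dummy.score(X_demo, y_demo)               # from RegressorMixin
manual = r2_score(y_demo, dummy.predict(X_demo))      # same quantity by hand
```

Predicting the training mean on the training data gives an R² of 0 up to floating-point error, which is the baseline that R² is defined against.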

For classification you need to implement:
- predict
- predict_proba, predict_log_proba, or decision_function
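
A minimal sketch of such a classifier follows. `MajorityClassifier` is a hypothetical name, not part of sklearn: it always predicts the most frequent training class and returns the training class frequencies as probabilities. It stores the labels in `classes_`, following sklearn's convention for classifiers, and lists the mixin before BaseEstimator as the developer guide recommends.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical dummy classifier that ignores X entirely."""

    def fit(self, X, y):
        # classes_ holds the sorted unique labels seen during fit.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.class_prior_ = counts / counts.sum()
        self.majority_ = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

    def predict_proba(self, X):
        # One row per sample, one column per class in classes_ order.
        return np.tile(self.class_prior_, (len(X), 1))


X_demo = np.zeros((6, 2))
y_demo = np.array([0, 0, 0, 0, 1, 1])

clf = MajorityClassifier().fit(X_demo, y_demo)
acc = clf.score(X_demo, y_demo)        # accuracy, inherited from ClassifierMixin
proba = clf.predict_proba(X_demo)
```

Here four of the six labels belong to the majority class, so the inherited score is 4/6.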