In [None]:
#default_exp benchmark
#hide
%load_ext autoreload
%autoreload 2

In [None]:
#hide
from nbdev.showdoc import show_doc

# Automatic benchmark model
> Functions to create a relevant, fast and reasonably well-performing benchmark

A Benchmark object has a similar API to a `scikit-learn` estimator: you build an instance with the desired arguments and fit it to the data later. It is a convenience wrapper that reads the training data, passes it through a simplified pipeline consisting of data imputation and a standard scaler, and then fits the benchmark model, calibrated with a grid search.

A `gingado` Benchmark object seeks to automatise a significant part of creating a benchmark model. Importantly, the Benchmark object also has a `compare` method that helps users evaluate whether candidate models are better than the benchmark; if one of them is, it becomes the new benchmark. This `compare` method takes as argument another fitted estimator (which can itself be a single estimator or a whole pipeline) or a list of fitted estimators.

Benchmarks start with default values that should perform reasonably well in most settings, but the user is also free to choose any of the benchmark's components by passing as arguments the data split, pipeline, and/or a dictionary of parameters for the hyperparameter tuning.
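
As a rough illustration of the workflow described above, the sketch below assembles an imputer, a standard scaler and a random forest into a `scikit-learn` pipeline and tunes it with a grid search. It uses only `scikit-learn`; the step names, the median imputation strategy and the small grid are assumptions chosen for the example, not necessarily what `gingado` does internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# mock data with a few missing values in the first column
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
X[::15, 0] = np.nan

# imputation -> scaling -> model (step names are illustrative)
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# calibrate the model with a small grid search
search = GridSearchCV(pipeline, param_grid={"forest__n_estimators": [10, 25]}, cv=3)
search.fit(X, y)

# the fitted search object behaves like an estimator
preds = search.predict(X)
```

The fitted `search` object plays the role that the `benchmark` attribute plays in a Benchmark object: it can predict directly and remembers the best parameters found.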

### Base class

`gingado` has a `ggdBenchmark` base class that contains the basic functionalities for Benchmark objects. It is not meant to be used by itself, but only as a superclass for Benchmark objects. `gingado` ships with two of these objects that subclass `ggdBenchmark`: `ClassificationBenchmark` and `RegressionBenchmark`. They are both described below in their respective sections.

Users are encouraged to submit a PR with their own benchmark models subclassing `ggdBenchmark`.

In [None]:
#hide
#export
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.model_selection import TimeSeriesSplit, StratifiedShuffleSplit, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.utils.metaestimators import available_if
from sklearn.utils.validation import check_is_fitted
from gingado.model_documentation import ModelCard

def _benchmark_has(attr):
    """This function is used in `ggdBenchmark` to check if the benchmark has certain attributes"""
    def check(self):
        getattr(self.benchmark, attr)
        return True
    return check
        
class ggdBenchmark(BaseEstimator):
    """
    The base class for gingado's Benchmark objects.
    """
    def _check_is_time_series(self, X, y=None):
        """
        Checks whether the data is a time series, and sets a data splitter
        accordingly if no data splitter is provided by the user
        Note: all data without an index (eg, a Numpy array) are considered to NOT be a time series
        """
        if hasattr(X, "index"):
            self.is_timeseries = pd.core.dtypes.common.is_datetime_or_timedelta_dtype(X.index)
        else:
            self.is_timeseries = False
        if self.is_timeseries and y is not None:
            if hasattr(y, "index"):
                self.is_timeseries = pd.core.dtypes.common.is_datetime_or_timedelta_dtype(y.index)
            else:
                self.is_timeseries = False

        if self.cv is None:
            self.cv = TimeSeriesSplit() if self.is_timeseries else StratifiedShuffleSplit()

    def _creates_estimator(self):
        if self.estimator is None:
            pass

    def _fit(self, X, y):
        self._check_is_time_series(X, y)

        X, y = self._validate_data(X, y)

        if hasattr(self.estimator, "random_state"):
            self.estimator.random_state = self.random_state

        if self.param_search and self.param_grid:                
            self.benchmark = self.param_search(estimator=self.estimator, param_grid=self.param_grid, scoring=self.scoring, verbose=self.verbose_grid)
        else:
            self.benchmark = self.estimator
            
        self.benchmark.fit(X, y)

        if self.auto_document is not None:
            self.document()

        return self
    
    def set_benchmark(self, estimator):
        self.benchmark = estimator

    def _read_candidate_params(self, candidates, ensemble_method):
        # one grid entry per candidate: the estimator itself plus its current parameters
        param_grid = []
        for model in candidates:
            check_is_fitted(model)
            param_grid.append({
                'candidate_estimator': [model],
                **{
                    'candidate_estimator__' + k: (v,)
                    for k, v in model.get_params().items()
                },
            })
        # optionally add an ensemble of all candidates as one more alternative
        if ensemble_method is not None:
            candidate_models = [('candidate_'+str(i+1), model) for i, model in enumerate(candidates)]
            voting = ensemble_method(estimators=candidate_models)
            ensemble = {'candidate_estimator': [voting]}
            param_grid.append(ensemble)
        return param_grid

    def compare(self, X, y, candidates, ensemble_method='object_default', update_benchmark=True):
        """
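        Uses a test dataset to compare the performance of the fitted benchmark model with one or more candidate models.
        The comparison is a grid search in which the benchmark, each candidate and (optionally)
        an ensemble of all of them are the alternatives for a single pipeline step.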
        """
        # Step 0: check if the benchmark is fitted
        check_is_fitted(self.benchmark)
        old_benchmark_params = self.benchmark.get_params()
        
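        # wrap a single estimator in a list; calling `list()` on a fitted ensemble
        # would iterate over its sub-estimators instead
        candidates = list(candidates) if isinstance(candidates, (list, tuple)) else [candidates]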
        list_candidates = [self.benchmark] + candidates
        
        est = self.benchmark.base_estimator_ if hasattr(self.benchmark, "base_estimator_") else self.benchmark.best_estimator_
        cand_pipeline = Pipeline([('candidate_estimator', est)])
        
        if ensemble_method == 'object_default':
            ensemble_method = self.ensemble_method
        cand_params = self._read_candidate_params(list_candidates, ensemble_method=ensemble_method)
        cand_grid = GridSearchCV(cand_pipeline, cand_params, verbose=self.verbose_grid).fit(X, y)
        
        self.model_comparison_ = cand_grid

        if update_benchmark:
            if cand_grid.best_estimator_.get_params() != old_benchmark_params:
                self.set_benchmark(cand_grid)
                print("Benchmark updated!")

        if self.auto_document is not None:
            self.document()
            
    def document(self):
        self.auto_document()

    @available_if(_benchmark_has("predict"))
    def predict(self, X, **predict_params):
        return self.benchmark.predict(X, **predict_params)

    @available_if(_benchmark_has("fit_predict"))
    def fit_predict(self, X, y=None, **predict_params):
        return self.benchmark.fit_predict(X, y, **predict_params)

    @available_if(_benchmark_has("predict_proba"))
    def predict_proba(self, X, **predict_proba_params):
        return self.benchmark.predict_proba(X, **predict_proba_params)

    @available_if(_benchmark_has("decision_function"))
    def decision_function(self, X):
        return self.benchmark.decision_function(X)
    
    @available_if(_benchmark_has("score_samples"))
    def score_samples(self, X):
        return self.benchmark.score_samples(X)

    @available_if(_benchmark_has("predict_log_proba"))
    def predict_log_proba(self, X, **predict_log_proba_params):
        return self.benchmark.predict_log_proba(X, **predict_log_proba_params)

In [None]:
show_doc(ggdBenchmark)

<h2 id="ggdBenchmark" class="doc_header"><code>class</code> <code>ggdBenchmark</code><a href="" class="source_link" style="float:right">[source]</a></h2>

> <code>ggdBenchmark</code>() :: `BaseEstimator`

The base class for gingado's Benchmark objects.
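
The time series check that the base class runs before fitting can be illustrated without `gingado`. The sketch below is a simplified re-implementation under stated assumptions: the helper names `has_time_index`, `is_time_series` and `default_splitter` are hypothetical, and the check relies on `isinstance` tests against public `pandas` index types rather than on the helper used in the class above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit, TimeSeriesSplit

def has_time_index(data):
    """True if `data` carries a datetime or timedelta index."""
    index = getattr(data, "index", None)
    return isinstance(index, (pd.DatetimeIndex, pd.TimedeltaIndex))

def is_time_series(X, y=None):
    """X must have a time index; if y is given, y must have one too."""
    if not has_time_index(X):
        return False
    return y is None or has_time_index(y)

def default_splitter(X, y=None):
    """Pick a splitter the same way the base class does when `cv` is None."""
    return TimeSeriesSplit() if is_time_series(X, y) else StratifiedShuffleSplit()

dates = pd.date_range("2024-01-01", periods=10, freq="D")
X_ts = pd.DataFrame({"x": np.arange(10.0)}, index=dates)
y_ts = pd.Series(np.arange(10.0), index=dates)
X_arr = np.zeros((10, 2))  # no index: never treated as a time series

print(type(default_splitter(X_ts, y_ts)).__name__)  # TimeSeriesSplit
print(type(default_splitter(X_arr)).__name__)       # StratifiedShuffleSplit
```

Data without an index, such as a plain NumPy array, falls through to the non-time-series branch, which mirrors the note in the `_check_is_time_series` docstring.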

### Classification tasks

The default benchmark for classification tasks is a [`RandomForestClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) object. By default, its `n_estimators` parameter is tuned with a grid search on the user's data.

In [None]:
#hide
#export
from gingado.model_documentation import ModelCard
from sklearn.base import ClassifierMixin
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

class ClassificationBenchmark(ggdBenchmark, ClassifierMixin):
    def __init__(self, 
    cv=None, 
    estimator=RandomForestClassifier(), 
    param_grid={'n_estimators': [50, 100, 250]}, 
    param_search=GridSearchCV, 
    scoring=None, 
    auto_document=ModelCard, 
    random_state=None,
    verbose_grid=False,
    ensemble_method=VotingClassifier):
        self.cv = cv
        self.estimator = estimator
        self.param_grid = param_grid
        self.param_search = param_search
        self.scoring = scoring
        self.auto_document = auto_document
        self.random_state = random_state
        self.verbose_grid = verbose_grid
        self.ensemble_method = ensemble_method

    def fit(self, X, y=None):
        self._fit(X, y)
        return self

In [None]:
show_doc(ClassificationBenchmark)

<h2 id="ClassificationBenchmark" class="doc_header"><code>class</code> <code>ClassificationBenchmark</code><a href="" class="source_link" style="float:right">[source]</a></h2>

> <code>ClassificationBenchmark</code>(**`cv`**=*`None`*, **`estimator`**=*`RandomForestClassifier()`*, **`param_grid`**=*`{'n_estimators': [50, 100, 250]}`*, **`param_search`**=*`GridSearchCV`*, **`scoring`**=*`None`*, **`auto_document`**=*`ModelCard`*, **`random_state`**=*`None`*, **`verbose_grid`**=*`False`*, **`ensemble_method`**=*`VotingClassifier`*) :: [`ggdBenchmark`](/gingado/benchmark.html#ggdBenchmark)

The base class for gingado's Benchmark objects.

In [None]:
from gingado.benchmark import ClassificationBenchmark
from sklearn.datasets import make_classification

# some mock up data
X, y = make_classification()

# the gingado benchmark
bm = ClassificationBenchmark(verbose_grid=3).fit(X, y)

# note that now the `bm` object can be used as an estimator
assert bm.predict(X).shape == y.shape

Fitting 5 folds for each of 3 candidates, totalling 15 fits
[CV 1/5] END ...................n_estimators=50;, score=0.850 total time=   0.1s
[CV 2/5] END ...................n_estimators=50;, score=0.950 total time=   0.1s
[CV 3/5] END ...................n_estimators=50;, score=0.950 total time=   0.0s
[CV 4/5] END ...................n_estimators=50;, score=0.900 total time=   0.0s
[CV 5/5] END ...................n_estimators=50;, score=0.900 total time=   0.1s
[CV 1/5] END ..................n_estimators=100;, score=0.900 total time=   0.1s
[CV 2/5] END ..................n_estimators=100;, score=0.950 total time=   0.1s
[CV 3/5] END ..................n_estimators=100;, score=0.950 total time=   0.1s
[CV 4/5] END ..................n_estimators=100;, score=0.900 total time=   0.1s
[CV 5/5] END ..................n_estimators=100;, score=0.900 total time=   0.1s
[CV 1/5] END ..................n_estimators=250;, score=0.900 total time=   0.3s
[CV 2/5] END ..................n_estimators=250;,

It is also simple to set a model that you have already fitted as the benchmark and still benefit from the other functionalities provided by the `Benchmark` class. The same applies to a saved version of a fitted model (eg, the model you are using in production) that you want to use as the benchmark.

In [None]:
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier().fit(X, y)

bm.set_benchmark(estimator=forest)

assert forest == bm.benchmark
assert hasattr(bm.benchmark, "predict")
assert bm.predict(X).shape == y.shape

### Regression tasks

The default benchmark for regression tasks is a [`RandomForestRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) object. By default, its `n_estimators` parameter is tuned with a grid search on the user's data.

In [None]:
#hide
#export
from sklearn.base import RegressorMixin
from sklearn.ensemble import RandomForestRegressor, VotingRegressor

class RegressionBenchmark(ggdBenchmark, RegressorMixin):
    def __init__(self, 
    cv=None, 
    estimator=RandomForestRegressor(), 
    param_grid={'n_estimators': [50, 100, 250]}, 
    param_search=GridSearchCV, 
    scoring=None, 
    auto_document=ModelCard, 
    random_state=None,
    verbose_grid=False,
    ensemble_method=VotingRegressor):
        self.cv = cv
        self.estimator = estimator
        self.param_grid = param_grid
        self.param_search = param_search
        self.scoring = scoring
        self.auto_document = auto_document
        self.random_state = random_state
        self.verbose_grid = verbose_grid
        self.ensemble_method = ensemble_method

    def fit(self, X, y=None):
        self._fit(X, y)
        return self

In [None]:
show_doc(RegressionBenchmark)

<h2 id="RegressionBenchmark" class="doc_header"><code>class</code> <code>RegressionBenchmark</code><a href="" class="source_link" style="float:right">[source]</a></h2>

> <code>RegressionBenchmark</code>(**`cv`**=*`None`*, **`estimator`**=*`RandomForestRegressor()`*, **`param_grid`**=*`{'n_estimators': [50, 100, 250]}`*, **`param_search`**=*`GridSearchCV`*, **`scoring`**=*`None`*, **`auto_document`**=*`ModelCard`*, **`random_state`**=*`None`*, **`verbose_grid`**=*`False`*, **`ensemble_method`**=*`VotingRegressor`*) :: [`ggdBenchmark`](/gingado/benchmark.html#ggdBenchmark)

The base class for gingado's Benchmark objects.

In [None]:
from gingado.benchmark import RegressionBenchmark
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

# some mock up data
X, y = make_regression()

# the gingado benchmark
bm = RegressionBenchmark(verbose_grid=3).fit(X, y)

# note that now the `bm` object can be used as an estimator
assert bm.predict(X).shape == y.shape

# the user might also like to set another model as the benchmark
forest = RandomForestRegressor().fit(X, y)
bm.set_benchmark(estimator=forest)

assert forest == bm.benchmark
assert hasattr(bm.benchmark, "predict")
assert bm.predict(X).shape == y.shape

Fitting 5 folds for each of 3 candidates, totalling 15 fits
[CV 1/5] END ...................n_estimators=50;, score=0.156 total time=   0.1s
[CV 2/5] END ...................n_estimators=50;, score=0.342 total time=   0.1s
[CV 3/5] END ...................n_estimators=50;, score=0.287 total time=   0.1s
[CV 4/5] END ...................n_estimators=50;, score=0.235 total time=   0.1s
[CV 5/5] END ...................n_estimators=50;, score=0.290 total time=   0.1s
[CV 1/5] END ..................n_estimators=100;, score=0.204 total time=   0.2s
[CV 2/5] END ..................n_estimators=100;, score=0.328 total time=   0.2s
[CV 3/5] END ..................n_estimators=100;, score=0.239 total time=   0.2s
[CV 4/5] END ..................n_estimators=100;, score=0.244 total time=   0.2s
[CV 5/5] END ..................n_estimators=100;, score=0.269 total time=   0.2s
[CV 1/5] END ..................n_estimators=250;, score=0.165 total time=   0.5s
[CV 2/5] END ..................n_estimators=250;,

In [None]:
bm.compare(X, y, forest)

Fitting 5 folds for each of 102 candidates, totalling 510 fits
[CV 1/5] END candidate_estimator=RandomForestRegressor(), candidate_estimator__bootstrap=True, candidate_estimator__ccp_alpha=0.0, candidate_estimator__criterion=squared_error, candidate_estimator__max_depth=None, candidate_estimator__max_features=auto, candidate_estimator__max_leaf_nodes=None, candidate_estimator__max_samples=None, candidate_estimator__min_impurity_decrease=0.0, candidate_estimator__min_samples_leaf=1, candidate_estimator__min_samples_split=2, candidate_estimator__min_weight_fraction_leaf=0.0, candidate_estimator__n_estimators=100, candidate_estimator__n_jobs=None, candidate_estimator__oob_score=False, candidate_estimator__random_state=None, candidate_estimator__verbose=0, candidate_estimator__warm_start=False;, score=0.128 total time=   0.2s
[CV 2/5] END candidate_estimator=RandomForestRegressor(), candidate_estimator__bootstrap=True, candidate_estimator__ccp_alpha=0.0, candidate_estimator__criterion=squa
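
The idea behind `compare` can be sketched with `scikit-learn` alone: the alternatives are placed in a single pipeline step, and a grid search picks the one with the best cross-validated score. This is a simplified illustration, not the method's exact code. In particular, a shallow decision tree stands in for the random forest benchmark to keep the example fast, and the per-parameter grid entries built by the class above are omitted.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=150, n_features=5, noise=5.0, random_state=0)

# stand-in benchmark, one candidate, and an ensemble of both
benchmark = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
candidate = LinearRegression().fit(X, y)
ensemble = VotingRegressor([("benchmark", benchmark), ("candidate", candidate)])

# a one-step pipeline whose only "hyperparameter" is the estimator itself
pipeline = Pipeline([("candidate_estimator", benchmark)])
param_grid = [{"candidate_estimator": [model]} for model in (benchmark, candidate, ensemble)]

comparison = GridSearchCV(pipeline, param_grid, cv=3).fit(X, y)
winner = comparison.best_params_["candidate_estimator"]
print(type(winner).__name__, round(comparison.best_score_, 3))
```

Each alternative is refitted on every fold, so the comparison reflects out-of-sample performance on the data passed to the search rather than the fit obtained on the original training data.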

### General comments on benchmarks

#### Scoring

`ClassificationBenchmark` and `RegressionBenchmark` both leave `scoring=None` by default, so model alternatives are compared with the estimator's own `score` method (mean accuracy for classifiers, R² for regressors), both during estimation of the benchmark model and when comparing this benchmark with candidate models. Users are encouraged to consider whether another scoring method is more suitable for their use case. More information on available scoring methods that are compatible with `gingado` Benchmark objects can be found [here](https://scikit-learn.org/stable/modules/model_evaluation.html).
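
To see why the choice of scoring can matter, the sketch below runs the same grid search twice on imbalanced mock data: once with the default scoring and once with `scoring="roc_auc"`. It uses `scikit-learn` directly; the class imbalance and the small grid are assumptions made for the example. A Benchmark object takes the same kind of value through its `scoring` argument.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# imbalanced mock data: roughly 85% of the samples belong to one class
X, y = make_classification(n_samples=200, weights=[0.85, 0.15], random_state=0)
grid = {"n_estimators": [10, 25]}

# scoring=None falls back to the estimator's `score` method (accuracy here)
default_search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3).fit(X, y)

# an explicit scoring string selects a different metric for model selection
auc_search = GridSearchCV(
    RandomForestClassifier(random_state=0), grid, cv=3, scoring="roc_auc"
).fit(X, y)

print("accuracy:", round(default_search.best_score_, 3))
print("roc_auc: ", round(auc_search.best_score_, 3))
```

With imbalanced classes, accuracy can look high even for a model that rarely predicts the minority class, which is one reason to consider a different metric.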

#### Data split

Please refer to [this page](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection) for more information on the different `Splitter` classes available in `scikit-learn`, and [this page](https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html#sphx-glr-auto-examples-model-selection-plot-cv-indices-py) for practical advice on how to choose a splitter for data that are not time series. Any one of these objects (or a custom splitter that is compatible with them) can be passed to a `Benchmark` object.

The API does not accept custom parameters for the splitters. Users who wish to use specific parameters should instantiate the `Splitter` with those parameters and pass the resulting object as the `cv` argument.
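
For instance, splitters with non-default parameters are created first and then handed over as objects. The sketch below only builds two such splitters and inspects their splits; the specific parameter values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, TimeSeriesSplit

X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)

# splitters configured with specific parameters
ts_cv = TimeSeriesSplit(n_splits=3, test_size=2)
shuffle_cv = StratifiedShuffleSplit(n_splits=4, test_size=0.25, random_state=0)

# time series splits always train on the past and test on the future
for train_idx, test_idx in ts_cv.split(X):
    print("train up to", train_idx.max(), "-> test", test_idx.tolist())

# stratified shuffle splits draw random, class-balanced test sets
for train_idx, test_idx in shuffle_cv.split(X, y):
    print("test", sorted(test_idx.tolist()))
```

Either object could then be supplied as the `cv` argument when building a Benchmark object, for example `ClassificationBenchmark(cv=shuffle_cv)`.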

#### Custom benchmarks

`gingado` provides users with two `Benchmark` objects out of the box: `ClassificationBenchmark` and `RegressionBenchmark`, to be used depending on the task at hand. Both classes derive from a base class `ggdBenchmark`, which implements methods that facilitate model comparison. Users who want to create a customised benchmark model have two options:

* the simpler possibility is to train the estimator as usual, and then assign the fitted estimator to a `Benchmark` object with `set_benchmark`.
* if the user wants more control over how the benchmark is fitted, they can create a class that subclasses `ggdBenchmark` and either implements custom `fit`, `predict` and `score` methods, or also subclasses [`scikit-learn`'s `BaseEstimator`](https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html).
  * In any case, if the user wants the benchmark to automatically detect whether the data is a time series and to document the model right after fitting, the `fit` method should call `self._fit` on the data. Otherwise, the user can simply implement any consistent logic in `fit` as the user sees fit (pun intended).
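
The pattern in the second option, a custom `fit` that delegates to a shared `_fit`, can be sketched as follows. `SketchBenchmarkBase` is a hypothetical stand-in for `ggdBenchmark` so that the example runs without `gingado`; it is not the library's code, and its `_fit` only marks where the shared steps would happen. The `RidgeBenchmark` class and its `alpha` parameter are likewise invented for the illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import Ridge

class SketchBenchmarkBase(BaseEstimator):
    """Hypothetical stand-in for `ggdBenchmark`; NOT gingado's actual code."""

    def _fit(self, X, y):
        # shared steps would live here (in gingado: time series check, documentation)
        self.shared_steps_ran_ = True
        self.benchmark.fit(X, y)
        return self

class RidgeBenchmark(RegressorMixin, SketchBenchmarkBase):
    """Custom benchmark with its own `fit` logic."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y=None):
        # custom logic first: build the model this benchmark should use
        self.benchmark = Ridge(alpha=self.alpha)
        # then hand over to the shared routine of the base class
        return self._fit(X, y)

    def predict(self, X):
        return self.benchmark.predict(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

bm = RidgeBenchmark(alpha=0.5).fit(X, y)
print(round(bm.score(X, y), 3))  # `score` is inherited from RegressorMixin
```

With the real base class, the same structure would apply: subclass `ggdBenchmark` instead of the stand-in and keep the call to `self._fit` at the end of `fit`.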
