<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

In [None]:
#| include: false
! [ -e /content ] && pip install -Uqq gingado nbdev # install or upgrade gingado on colab

In [None]:
#| include: false
#| echo: false
# Code below included to ensure compatibility with scikit-learn v1.1.x
from sklearn import set_config
set_config(display='text')

In [None]:
#| include: false
from nbdev.showdoc import show_doc

A Benchmark object has an API similar to that of a `scikit-learn` estimator: you build an instance with the desired arguments and fit it to the data later. A Benchmark is a convenience wrapper that reads the training data and passes it through a simplified pipeline of data imputation and a standard scaler. It then fits the benchmark estimator, whose hyperparameters are calibrated with a grid search.
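
For intuition, the fitting step described above can be approximated in plain `scikit-learn` as a pipeline (imputation, then standard scaling, then the estimator) tuned with `GridSearchCV`. This is a sketch under assumptions, not gingado's actual implementation. The step names, the `SimpleImputer` defaults and the reduced `n_estimators` values are illustrative; the smaller tree counts only keep the example fast. The `max_features` values mirror the default grid shown in the class signatures further below.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)

# Simplified pipeline: impute missing values, standardise features, then the estimator
pipe = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),
    ("estimator", RandomForestClassifier(oob_score=True, random_state=0)),
])

# Hyperparameter names are prefixed with the pipeline step they belong to
param_grid = {
    "estimator__n_estimators": [25, 50],  # reduced from the documented [100, 250]
    "estimator__max_features": ["sqrt", "log2", None],
}

# The grid search calibrates the estimator; the fitted search behaves like an estimator
search = GridSearchCV(pipe, param_grid=param_grid).fit(X, y)
print(search.best_params_)
```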

A `gingado` Benchmark object seeks to automate a significant part of creating a benchmark model. Importantly, the Benchmark object also has a `compare` method that helps users evaluate whether candidate models are better than the benchmark. If one of them is, it becomes the new benchmark. The `compare` method takes as argument another fitted estimator (which could itself be a single estimator or a whole pipeline) or a list of fitted estimators.
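
Conceptually, the comparison scores the benchmark and each candidate with cross-validation on the same data, and promotes the best candidate only if it beats the benchmark. The sketch below illustrates that logic with a hypothetical helper, `compare_and_update`, which is not part of gingado. It uses `cross_val_score` rather than the grid search gingado actually runs, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def compare_and_update(benchmark, candidates, X, y, update_benchmark=True):
    """Hypothetical helper: return the model to keep as benchmark and the mean CV scores."""
    # Accept a single estimator (or pipeline) or a list of them
    if not isinstance(candidates, (list, tuple)):
        candidates = [candidates]
    models = [benchmark, *candidates]
    # cross_val_score clones each model, so already-fitted estimators can be passed in
    scores = [cross_val_score(m, X, y, cv=5).mean() for m in models]
    best = models[int(np.argmax(scores))]
    if update_benchmark and best is not benchmark:
        print(f"Benchmark updated to {best}")
        return best, scores
    return benchmark, scores


# Linear synthetic data, so a linear candidate should beat the boosted trees
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
benchmark = AdaBoostRegressor(random_state=0).fit(X, y)
candidate = LinearRegression().fit(X, y)

new_benchmark, scores = compare_and_update(benchmark, candidate, X, y)
```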

Benchmarks start with default values that should perform reasonably well in most settings, but the user is also free to choose any of the benchmark's components by passing as arguments the data split, pipeline, and/or a dictionary of parameters for the hyperparameter tuning.

## Base class

`gingado` has a [`ggdBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#ggdbenchmark) base class that contains the basic functionalities for Benchmark objects. It is not meant to be used by itself, but only as a superclass for Benchmark objects. `gingado` ships with two such subclasses of [`ggdBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#ggdbenchmark): [`ClassificationBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#classificationbenchmark) and [`RegressionBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#regressionbenchmark). Both are described below in their respective sections.

Users are encouraged to submit a PR with their own benchmark models subclassing [`ggdBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#ggdbenchmark).

In [None]:
#| echo: false
#| output: asis
show_doc(ggdBenchmark)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/benchmark.py#L24){target="_blank" style="float:right; font-size:smaller"}

### ggdBenchmark

>      ggdBenchmark ()

The base class for gingado's Benchmark objects.

In [None]:
#| echo: false
#| output: asis
show_doc(ggdBenchmark.compare)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/benchmark.py#L96){target="_blank" style="float:right; font-size:smaller"}

### ggdBenchmark.compare

>      ggdBenchmark.compare (X:numpy.ndarray, y:numpy.ndarray, candidates,
>                            ensemble_method='object_default',
>                            update_benchmark:bool=True)

Use a testing dataset to compare the performance of the fitted benchmark model with one or more candidate models using a grid search

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| X | ndarray |  | Array-like data of shape (n_samples, n_features) |
| y | ndarray |  | Array-like data of shape (n_samples,) or (n_samples, n_targets) |
| candidates |  |  | Candidate estimator or list of candidate estimator(s) |
| ensemble_method | str | object_default |  |
| update_benchmark | bool | True | Whether to use the best performing candidate model as the new benchmark |

In [None]:
#| echo: false
#| output: asis
show_doc(ggdBenchmark.document)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/benchmark.py#L150){target="_blank" style="float:right; font-size:smaller"}

### ggdBenchmark.document

>      ggdBenchmark.document (documenter=None)

Document the benchmark model using the template in `documenter`

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| documenter | NoneType | None | A gingado Documenter or the documenter set in `auto_document` if None. |

## Classification tasks

The default benchmark for classification tasks is a [`RandomForestClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) object. Its parameters are fine-tuned in each case according to the user's data.

In [None]:
#| echo: false
#| output: asis
show_doc(ClassificationBenchmark)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/benchmark.py#L198){target="_blank" style="float:right; font-size:smaller"}

### ClassificationBenchmark

>      ClassificationBenchmark (cv=None,
>                               estimator=RandomForestClassifier(oob_score=True)
>                               , param_grid={'n_estimators': [100, 250],
>                               'max_features': ['sqrt', 'log2', None]},
>                               param_search=<class
>                               'sklearn.model_selection._search.GridSearchCV'>,
>                               scoring=None, auto_document=<class
>                               'gingado.model_documentation.ModelCard'>,
>                               random_state=None, verbose_grid=False,
>                               ensemble_method=<class
>                               'sklearn.ensemble._voting.VotingClassifier'>)

A gingado Benchmark object used for classification tasks

In [None]:
#| echo: false
#| output: asis
show_doc(ClassificationBenchmark.fit)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/benchmark.py#L220){target="_blank" style="float:right; font-size:smaller"}

### ClassificationBenchmark.fit

>      ClassificationBenchmark.fit (X:numpy.ndarray, y=None)

Fit the [`ClassificationBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#classificationbenchmark) model

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| X | ndarray |  | Array-like data of shape (n_samples, n_features) |
| y | NoneType | None | Array-like data of shape (n_samples,) or (n_samples, n_targets) or None |

In [None]:
from sklearn.datasets import make_classification
from gingado.benchmark import ClassificationBenchmark

In [None]:
# some mock-up data
X, y = make_classification()

# the gingado benchmark
bm = ClassificationBenchmark(verbose_grid=3).fit(X, y)

# note that now the `bm` object can be used as an estimator
assert bm.predict(X).shape == y.shape

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[CV 1/5] END max_features=sqrt, n_estimators=100;, score=1.000 total time=   0.1s
[CV 2/5] END max_features=sqrt, n_estimators=100;, score=0.750 total time=   0.1s
[CV 3/5] END max_features=sqrt, n_estimators=100;, score=0.950 total time=   0.1s
[CV 4/5] END max_features=sqrt, n_estimators=100;, score=1.000 total time=   0.1s
[CV 5/5] END max_features=sqrt, n_estimators=100;, score=0.950 total time=   0.1s
[CV 1/5] END max_features=sqrt, n_estimators=250;, score=1.000 total time=   0.3s
[CV 2/5] END max_features=sqrt, n_estimators=250;, score=0.750 total time=   0.3s
[CV 3/5] END max_features=sqrt, n_estimators=250;, score=0.950 total time=   0.3s
[CV 4/5] END max_features=sqrt, n_estimators=250;, score=1.000 total time=   0.3s
[CV 5/5] END max_features=sqrt, n_estimators=250;, score=0.900 total time=   0.3s
[CV 1/5] END max_features=log2, n_estimators=100;, score=1.000 total time=   0.1s
[CV 2/5] END max_features=log2, n_esti

Importantly, `gingado` automatically provides some information to help the user document the benchmark model. More specifically, [`ggdBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#ggdbenchmark) objects collect model information and store it under the key `info` of the `model_details` field of the model documentation.
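
The entries collected under `info` (for example `best_estimator_`, `best_params_` and `cv_results_`) are the fitted attributes of a `scikit-learn` search object. The sketch below gathers such attributes yourself, relying on the `scikit-learn` convention that fitted attributes end with an underscore. It illustrates that assumption only; it is not how gingado builds its model documentation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)
search = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid={"max_features": ["sqrt", None]},
).fit(X, y)

# Fitted attributes follow the scikit-learn convention of a trailing underscore;
# private names (leading underscore) are skipped
info = {
    name: value
    for name, value in vars(search).items()
    if name.endswith("_") and not name.startswith("_")
}
print(sorted(info))
```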

In [None]:
bm.model_documentation.show_json()

{'model_details': {'developer': 'Person or organisation developing the model',
  'datetime': '2022-09-24 00:37:21 ',
  'version': 'Model version',
  'type': 'Model type',
  'info': {'_estimator_type': 'classifier',
   'best_estimator_': RandomForestClassifier(max_features=None, oob_score=True),
   'best_index_': 4,
   'best_params_': {'max_features': None, 'n_estimators': 100},
   'best_score_': 0.95,
   'classes_': array([0, 1]),
   'cv_results_': {'mean_fit_time': array([0.11072736, 0.30638943, 0.12044678, 0.29842086, 0.11654415,
           0.31644664]),
    'std_fit_time': array([0.00281184, 0.01535436, 0.0065928 , 0.01652688, 0.00285051,
           0.00959558]),
    'mean_score_time': array([0.00769496, 0.0195919 , 0.00786419, 0.0195241 , 0.00688624,
           0.0202014 ]),
    'std_score_time': array([0.00060471, 0.00127096, 0.0007323 , 0.00172929, 0.00021744,
           0.00237004]),
    'param_max_features': masked_array(data=['sqrt', 'sqrt', 'log2', 'log2', None, None],
      

It is also simple to set a model that you have already fitted as the benchmark and still benefit from the other functionalities provided by the `Benchmark` class. This is also useful if you are using a saved version of a fitted model (e.g., the model you are using in production) and want to have it as the benchmark.

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
forest = RandomForestClassifier().fit(X, y)

bm.set_benchmark(estimator=forest)

assert forest == bm.benchmark
assert hasattr(bm.benchmark, "predict")
assert bm.predict(X).shape == y.shape
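
A model saved to disk works the same way: reload it and pass it to `set_benchmark`. The sketch below uses `joblib`, which `scikit-learn` itself depends on, and a temporary file path chosen only for illustration. The `set_benchmark` call is left as a comment because it needs a fitted gingado Benchmark object such as `bm` above.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(random_state=0)
production_model = RandomForestClassifier(random_state=0).fit(X, y)

# Persist the fitted model, as is typically done for a model in production
path = os.path.join(tempfile.mkdtemp(), "production_model.joblib")
joblib.dump(production_model, path)

# Later, possibly in another session: reload it and use it as the benchmark
reloaded = joblib.load(path)
# bm.set_benchmark(estimator=reloaded)
```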

## Regression tasks

The default benchmark for regression tasks is a [`RandomForestRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) object.  Its parameters are fine-tuned in each case according to the user's data.

In [None]:
#| echo: false
#| output: asis
show_doc(RegressionBenchmark)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/benchmark.py#L236){target="_blank" style="float:right; font-size:smaller"}

### RegressionBenchmark

>      RegressionBenchmark (cv=None,
>                           estimator=RandomForestRegressor(oob_score=True),
>                           param_grid={'n_estimators': [100, 250],
>                           'max_features': ['sqrt', 'log2', None]},
>                           param_search=<class
>                           'sklearn.model_selection._search.GridSearchCV'>,
>                           scoring=None, auto_document=<class
>                           'gingado.model_documentation.ModelCard'>,
>                           random_state=None, verbose_grid=False,
>                           ensemble_method=<class
>                           'sklearn.ensemble._voting.VotingRegressor'>)

A gingado Benchmark object used for regression tasks

In [None]:
#| echo: false
#| output: asis
show_doc(RegressionBenchmark.fit)

---

[source](https://github.com/dkgaraujo/gingado/blob/main/gingado/benchmark.py#L258){target="_blank" style="float:right; font-size:smaller"}

### RegressionBenchmark.fit

>      RegressionBenchmark.fit (X:numpy.ndarray, y=None)

Fit the [`RegressionBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#regressionbenchmark) model

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| X | ndarray |  | Array-like data of shape (n_samples, n_features) |
| y | NoneType | None | Array-like data of shape (n_samples,) or (n_samples, n_targets) or None |

In [None]:
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from gingado.benchmark import RegressionBenchmark

In [None]:
# some mock-up data
X, y = make_regression()

# the gingado benchmark
bm = RegressionBenchmark(verbose_grid=2).fit(X, y)

# note that now the `bm` object can be used as an estimator
assert bm.predict(X).shape == y.shape

# the user might also like to set another model as the benchmark
adaboost = AdaBoostRegressor().fit(X, y)
bm.set_benchmark(estimator=adaboost)

assert adaboost == bm.benchmark
assert hasattr(bm.benchmark, "predict")
assert bm.predict(X).shape == y.shape

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[CV] END ................max_features=sqrt, n_estimators=100; total time=   0.1s
[CV] END ................max_features=sqrt, n_estimators=100; total time=   0.1s
[CV] END ................max_features=sqrt, n_estimators=100; total time=   0.1s
[CV] END ................max_features=sqrt, n_estimators=100; total time=   0.1s
[CV] END ................max_features=sqrt, n_estimators=100; total time=   0.1s
[CV] END ................max_features=sqrt, n_estimators=250; total time=   0.3s
[CV] END ................max_features=sqrt, n_estimators=250; total time=   0.3s
[CV] END ................max_features=sqrt, n_estimators=250; total time=   0.3s
[CV] END ................max_features=sqrt, n_estimators=250; total time=   0.3s
[CV] END ................max_features=sqrt, n_estimators=250; total time=   0.3s
[CV] END ................max_features=log2, n_estimators=100; total time=   0.1s
[CV] END ................max_features=log2, n_est

Below we compare the benchmark (manually set above to the AdaBoost model) with two other candidate models: a Gaussian process and a linear Support Vector Machine (SVM).

In [None]:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.svm import LinearSVR

In [None]:
gauss_reg = GaussianProcessRegressor().fit(X, y)
svm_reg = LinearSVR().fit(X, y)

bm.compare(X, y, candidates=[gauss_reg, svm_reg])

Fitting 5 folds for each of 4 candidates, totalling 20 fits
[CV] END candidate_estimator=AdaBoostRegressor(), candidate_estimator__base_estimator=None, candidate_estimator__learning_rate=1.0, candidate_estimator__loss=linear, candidate_estimator__n_estimators=50, candidate_estimator__random_state=None; total time=   0.1s
[CV] END candidate_estimator=AdaBoostRegressor(), candidate_estimator__base_estimator=None, candidate_estimator__learning_rate=1.0, candidate_estimator__loss=linear, candidate_estimator__n_estimators=50, candidate_estimator__random_state=None; total time=   0.1s
[CV] END candidate_estimator=AdaBoostRegressor(), candidate_estimator__base_estimator=None, candidate_estimator__learning_rate=1.0, candidate_estimator__loss=linear, candidate_estimator__n_estimators=50, candidate_estimator__random_state=None; total time=   0.1s
[CV] END candidate_estimator=AdaBoostRegressor(), candidate_estimator__base_estimator=None, candidate_estimator__learning_rate=1.0, candidate_estimator

Note that when the benchmark object finds a model that performs better than it does, the user is informed that the benchmark has been updated and the new benchmark model is shown. This only happens when the argument `update_benchmark` is set to `True` (the default).

Below we can see how the models compared with each other, including the previous benchmark model and an ensemble of the previous benchmark and all the candidates. The table also shows how stable each model's performance was, as judged by the standard deviation of the scores across the validation folds.

In [None]:
import pandas as pd

pd.DataFrame(bm.benchmark.cv_results_)[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]

|    | params | mean_test_score | std_test_score | rank_test_score |
| -- | ------ | --------------- | -------------- | --------------- |
| 0 | {'candidate_estimator': (DecisionTreeRegressor... | 0.354251 | 0.092104 | 1 |
| 1 | {'candidate_estimator': GaussianProcessRegress... | -0.138693 | 0.137167 | 4 |
| 2 | {'candidate_estimator': LinearSVR(), 'candidat... | 0.298425 | 0.103647 | 2 |
| 3 | {'candidate_estimator': VotingRegressor(estima... | 0.216893 | 0.097473 | 3 |
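
The comparison can be approximated in plain `scikit-learn`. The log printed by `compare` shows the estimator itself being tuned as a hyperparameter (the `candidate_estimator=...` entries). This matches a common idiom: a one-step pipeline whose only step is swapped between the previous benchmark, each candidate and a `VotingRegressor` ensemble of all of them. Only the step name is taken from that log; the data, seeds and the one-step pipeline are assumptions, and gingado's internals may differ.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, VotingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVR

X, y = make_regression(random_state=0)
benchmark = AdaBoostRegressor(random_state=0)
candidates = [GaussianProcessRegressor(), LinearSVR(max_iter=10000)]

# An ensemble of the previous benchmark and all candidates is one more option
ensemble = VotingRegressor(
    [("benchmark", benchmark)] + [(f"candidate_{i}", c) for i, c in enumerate(candidates)]
)

# Single-step pipeline: the step itself (the estimator) is the tuned "hyperparameter"
pipe = Pipeline([("candidate_estimator", benchmark)])
param_grid = {"candidate_estimator": [benchmark, *candidates, ensemble]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5).fit(X, y)

# Mean score per option, its variability across folds, and its rank
results = pd.DataFrame(search.cv_results_)[
    ["param_candidate_estimator", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score"))
```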


## General comments on benchmarks

### Scoring

[`ClassificationBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#classificationbenchmark) and [`RegressionBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#regressionbenchmark) both use the default scoring method for comparing model alternatives, both when estimating the benchmark model and when comparing this benchmark with candidate models. With `scoring=None`, `scikit-learn` falls back to the estimator's own `score` method: accuracy for classifiers and R² for regressors. Users are encouraged to consider whether another scoring method is more suitable for their use case and, if so, to pass it through the `scoring` argument. More information on the available scoring methods that are compatible with `gingado` Benchmark objects can be found [here](https://scikit-learn.org/stable/modules/model_evaluation.html).
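
The sketch below contrasts the default scoring (the estimator's `score` method, i.e. R² for a regressor) with an explicit scoring string on the same cross-validation, using plain `scikit-learn` and synthetic data. In a gingado Benchmark the equivalent choice would presumably be made as `RegressionBenchmark(scoring='neg_mean_absolute_error')`. We assume here that this argument is forwarded to the search object.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = Ridge()

# Default scoring: the estimator's own `score` method (R² for regressors)
r2_scores = cross_val_score(model, X, y, cv=5)

# Explicit scoring: mean absolute error, negated so that higher is still better
mae_scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")

print(r2_scores.mean(), -mae_scores.mean())
```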

### Data split

`gingado` benchmarks rely on hyperparameter tuning to discover the benchmark specification that is most likely to perform best on the user's data. This tuning in turn depends on a data splitting strategy for the cross-validation. By default, `gingado` uses `StratifiedShuffleSplit` if the data are not a time series and `TimeSeriesSplit` otherwise.

Please refer to [this page](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection) for more information on the different `Splitter` classes available in `scikit-learn`, and to [this page](https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html#sphx-glr-auto-examples-model-selection-plot-cv-indices-py) for practical advice on how to choose a splitter for data that are not time series. Any one of these objects (or a custom splitter that is compatible with them) can be passed to a `Benchmark` object through the `cv` argument.

The API does not accept custom parameters for the splitters directly. Users who wish to use specific parameters should instead pass an instance of the desired `Splitter` class, already configured with those parameters.
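
For instance, a user who wants fewer folds and a gap between training and validation data would configure the splitter and pass the object itself, presumably as `RegressionBenchmark(cv=splitter)` given the `cv` argument in the class signatures. The parameter values below are arbitrary; the loop only prints the indices to show how the splitter divides 12 ordered observations.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 observations in time order

# Splitter configured with custom parameters; this object is what would be passed as `cv`
splitter = TimeSeriesSplit(n_splits=3, gap=1)

for train_idx, test_idx in splitter.split(X):
    # Training indices always precede validation indices, separated by `gap` observations
    print("train:", train_idx, "test:", test_idx)
```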

### Custom benchmarks

`gingado` provides users with two `Benchmark` objects out of the box: [`ClassificationBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#classificationbenchmark) and [`RegressionBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#regressionbenchmark), to be used depending on the task at hand. Both classes derive from a base class [`ggdBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#ggdbenchmark), which implements methods that facilitate model comparison. Users that want to create a customised benchmark model for themselves have two options:

* the simpler possibility is to train the estimator as usual, and then assign the fitted estimator to a `Benchmark` object with its `set_benchmark` method.

* if the user wants more control over the fitting process of estimating the benchmark, they can create a class that subclasses from [`ggdBenchmark`](https://dkgaraujo.github.io/gingado/benchmark.html#ggdbenchmark) and either implements custom `fit`, `predict` and `score` methods, or also subclasses from [`scikit-learn`'s `BaseEstimator`](https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html). 
  * In any case, if the user wants the benchmark to automatically detect whether the data are a time series and also to document the model right after fitting, the `fit` method should call `self._fit` on the data. Otherwise, the user can implement any consistent logic in `fit` as they see fit (pun intended).
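
The second option relies on the estimator contract of `fit`, `predict` and `score`. The sketch below shows that contract on a deliberately simple, hypothetical benchmark (`RecentMeanBenchmark` is not part of gingado) built only from `scikit-learn`'s `BaseEstimator` and `RegressorMixin`. In an actual gingado subclass you would also inherit from `ggdBenchmark`. If you want time-series detection and automatic documentation, you would also call `self._fit` inside `fit`, as described in the list above.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class RecentMeanBenchmark(RegressorMixin, BaseEstimator):
    """Hypothetical benchmark: predicts the mean of the last `window` training targets."""

    def __init__(self, window=5):
        # BaseEstimator derives get_params/set_params from the __init__ arguments
        self.window = window

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Fitted state lives in attributes ending with "_" (scikit-learn convention)
        self.level_ = float(np.mean(y[-self.window:]))
        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        return np.full(X.shape[0], self.level_)

    # `score` (R²) is inherited from RegressorMixin


X_demo = np.arange(20).reshape(-1, 1)
y_demo = np.arange(20, dtype=float)
model = RecentMeanBenchmark(window=5).fit(X_demo, y_demo)
# each prediction is 17.0, the mean of the last five targets (15 to 19)
print(model.predict(X_demo[:3]))
```

Because the class follows the `scikit-learn` conventions, it can also be tuned with `GridSearchCV` (e.g. over `window`) like any other estimator.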

In [None]:
#| echo: false
import nbdev; nbdev.nbdev_export()