# Evaluate metrics and record fit and score times

#### Authors:

* Juan Carlos Alfaro Jiménez

In this notebook, we evaluate the performance of an estimator by using cross-validation. Below, we detail the dataset used, the estimator tested and the results (in terms of score and time).

## 1. Data

First, we load the data to fit (`X`) and the target rankings to predict (`Y`). Before that, we obtain the identifier of the dataset (`data_id`) used to fetch the benchmark data from the [`OpenML` repository](https://www.openml.org/u/25829/data):

---

**Note:** All the parameters are provided by `guildai` via environment variables with the prefix `FLAG_`. Their values are strings, so numeric parameters are converted before use.

---

In [None]:
import os

In [None]:
# Environment variables are strings, while fetch_openml expects an integer identifier
data_id = int(os.environ.get("FLAG_DATA_ID"))
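
Since every flag arrives as a string, reading and converting can be folded into one step. The following is a minimal sketch under assumptions: `read_flag` is a hypothetical helper (it is not part of `guildai`), and the `FLAG_EXAMPLE_ID` value is made up for the demonstration.

```python
import os


def read_flag(name, cast=str, prefix="FLAG_"):
    """Read an environment variable and convert it with `cast`.

    Hypothetical helper (not part of guildai): it only wraps os.environ.
    """
    key = f"{prefix}{name}"
    try:
        raw = os.environ[key]
    except KeyError:
        # Fail early with the full variable name instead of a later TypeError.
        raise KeyError(f"Environment variable {key} is not set.") from None
    return cast(raw)


# Illustrative value only; in a real run guildai sets the variable.
os.environ["FLAG_EXAMPLE_ID"] = "42"
example_id = read_flag("EXAMPLE_ID", int)
print(example_id, type(example_id).__name__)
```

Raising on a missing variable avoids the less obvious error that `int(None)` would produce when `os.environ.get` returns `None`.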

Now, we can fetch the dataset:

In [None]:
from sklearn.datasets import fetch_openml

In [None]:
data = fetch_openml(data_id=data_id, as_frame=False)

In [None]:
X, Y = data["data"], data["target"]

Finally, let us print the name of the dataset:

In [None]:
name = data["details"]["name"]

In [None]:
print(f"The {name} dataset will be used.")

## 2. Estimator

Second, we initialize the estimator object (`estimator`) used to fit the data. Before that, we obtain the model (`model`) and the estimator type (`estimator_type`) to evaluate:

In [None]:
model = os.environ.get("GUILD_OP").split(":")[0]

In [None]:
estimator_type = os.environ.get("FLAG_ESTIMATOR_TYPE")

Now, we can get the estimator:

In [None]:
estimator = __import__(model).get_estimator(estimator_type)

Let us print some information about the estimator:

In [None]:
print(f"The {model} {estimator_type} will be tested.")

Then, we obtain the object (`preprocessing`) that randomly deletes labels from the training rankings. First, we get the probability of deleting a label (`probability`):

In [None]:
probability = os.environ.get("FLAG_PROBABILITY")

In [None]:
probability = float(probability)

Let us print the probability:

In [None]:
print(f"The probability to delete a label is {probability}.")

Now, we declare the random number generator (`rng`) to use for the deletion of a label:

---

**Note**: We initialize the `RandomState` instance outside of the function to delete a different set of labels on each fold of the cross-validation.

---

In [None]:
random_state = os.environ.get("FLAG_RANDOM_STATE")

In [None]:
random_state = int(random_state)

In [None]:
a = [False, True]

In [None]:
p = [1 - probability, probability]

In [None]:
from sklearn.utils import check_random_state

In [None]:
rng = check_random_state(random_state)

Next, we define the function that deletes labels by replacing each selected label with `-1`:

In [None]:
import numpy as np

In [None]:
# Return X unchanged and Y with each label replaced by -1 with probability `probability`
func = lambda X, Y: (X, np.where(rng.choice(a, Y.shape, p=p), -1, Y))
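
To make the deletion step concrete, here is a small sketch on made-up data. The toy rankings, the seed and the probability are assumptions chosen for illustration; only the masking logic (`rng.choice` followed by `np.where`) mirrors the notebook.

```python
import numpy as np

rng = np.random.RandomState(0)  # illustrative seed
probability = 0.5  # illustrative deletion probability

# Toy rankings: 4 samples, 3 labels each (made-up data).
Y = np.array([[1, 2, 3], [2, 1, 3], [3, 1, 2], [1, 3, 2]])

# Boolean mask: True marks a position whose label is deleted.
mask = rng.choice([False, True], Y.shape, p=[1 - probability, probability])

# Deleted positions are replaced by -1; the others keep their rank.
Y_missing = np.where(mask, -1, Y)
print(Y_missing)
```

Because the mask is drawn independently for every position, the number of deleted labels varies from one sample to the next.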

In [None]:
from imblearn import FunctionSampler

In [None]:
preprocessing = FunctionSampler(func=func, validate=False)

Finally, we integrate these objects within a pipeline:

In [None]:
from imblearn.pipeline import make_pipeline

In [None]:
estimator = make_pipeline(preprocessing, estimator)

## 3. Cross-validation strategy

Third, we define the strategy to evaluate the performance of the estimator on held-out data. In particular, we use an $ r \times k $ cross-validation method, that is, $ k $-fold cross-validation repeated $ r $ times with a different randomization in each repetition. Let us obtain the number of folds (`n_splits`) and the number of times the cross-validator is repeated (`n_repeats`):

In [None]:
n_splits = os.environ.get("FLAG_N_SPLITS")

In [None]:
n_splits = int(n_splits)

In [None]:
n_repeats = os.environ.get("FLAG_N_REPEATS")

In [None]:
n_repeats = int(n_repeats)

Then, we can initialize the cross-validation strategy:

In [None]:
from sklearn.model_selection import RepeatedKFold

In [None]:
cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=random_state)
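
As a quick illustration of what this strategy produces, the sketch below runs `RepeatedKFold` on six made-up samples. The data and the parameter values are assumptions chosen to keep the output short.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X_toy = np.arange(12).reshape(6, 2)  # 6 made-up samples

cv_toy = RepeatedKFold(n_splits=3, n_repeats=2, random_state=0)

# Each element is a (train_index, test_index) pair.
splits = list(cv_toy.split(X_toy))

# The total number of pairs is n_splits * n_repeats.
print(len(splits))
for train_index, test_index in splits:
    print(train_index, test_index)
```

Within each repetition every sample appears in exactly one test fold, so the estimator is fitted and scored `n_splits * n_repeats` times in total.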

## 4. Evaluation

Fourth, we obtain the array of scores of the estimator for each run of the cross-validation. Before that, we obtain the number of jobs to run in parallel (`n_jobs`) and the verbosity level (`verbose`):

In [None]:
n_jobs = os.environ.get("FLAG_N_JOBS")

In [None]:
n_jobs = int(n_jobs)

In [None]:
verbose = os.environ.get("FLAG_VERBOSE")

In [None]:
verbose = int(verbose)

Now, we validate the estimator:

In [None]:
from sklearn.model_selection import cross_validate

In [None]:
scores = cross_validate(estimator, X, Y, cv=cv, n_jobs=n_jobs, verbose=verbose, return_train_score=True, return_estimator=True)
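
The structure of the returned dictionary can be inspected on a small stand-in problem. The sketch below is not the notebook's experiment: the synthetic regression data, `LinearRegression` and the 3-fold splitter are assumptions used only to show which keys `cross_validate` returns with these options.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (not the benchmark used in this notebook).
X_toy, y_toy = make_regression(n_samples=30, n_features=3, random_state=0)

results = cross_validate(
    LinearRegression(),
    X_toy,
    y_toy,
    cv=KFold(n_splits=3),
    return_train_score=True,
    return_estimator=True,
)

# One entry per fold under each key.
print(sorted(results))
print(results["test_score"].shape)
```

The `estimator` entry holds one fitted estimator per fold, while the remaining entries are arrays with one value per fold.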

Moreover, we store the scores in a `csv` file:

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(scores)

In [None]:
df = df.drop("estimator", axis=1)

In [None]:
path = "scores.csv"

In [None]:
df.to_csv(path)

Furthermore, we save the estimators in a `tar.gz` file:

In [None]:
estimators = scores["estimator"]

In [None]:
import tarfile

In [None]:
tar = tarfile.open("estimators.tar.gz", "w:gz")

In [None]:
from joblib import dump

In [None]:
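for index, estimator in enumerate(estimators):
    # Name the file after the index of the cross-validation run
    name = f"estimator_{index}.joblib"

    # Serialize the fitted estimator to a temporary file
    dump(estimator, name)

    # Add the serialized estimator to the archive
    tar.add(name)

    # Remove the temporary file once it is archived
    os.remove(name)

In [None]:
# Close the archive so that all the data is written to disk
tar.close()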
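
For completeness, the sketch below shows one way to write such an archive with a context manager and to load its members back. It is an illustration under assumptions: plain dictionaries stand in for the fitted estimators, and everything is written to a temporary directory rather than to the working directory.

```python
import io
import os
import tarfile
import tempfile

from joblib import dump, load

# Stand-ins for fitted estimators (made-up objects for illustration).
objects = [{"fold": 0}, {"fold": 1}]

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "estimators.tar.gz")

    # The context manager closes the archive, which flushes it to disk.
    with tarfile.open(archive, "w:gz") as tar:
        for index, obj in enumerate(objects):
            member = os.path.join(tmp, f"estimator_{index}.joblib")
            dump(obj, member)
            # arcname stores the file under its base name only.
            tar.add(member, arcname=os.path.basename(member))
            os.remove(member)

    # Read each member into memory and deserialize it with joblib.
    with tarfile.open(archive, "r:gz") as tar:
        restored = [
            load(io.BytesIO(tar.extractfile(m).read())) for m in tar.getmembers()
        ]

print(restored)
```

As with any pickle-based format, files should only be loaded with `joblib.load` when they come from a trusted source.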

## 5. Results

Finally, we show the results from the experiment. Let us start with the test score:

In [None]:
test_score = scores["test_score"].mean(axis=0)

In [None]:
print(f"test_score: {test_score}")

We follow up with the train score:

In [None]:
train_score = scores["train_score"].mean(axis=0)

In [None]:
print(f"train_score: {train_score}")

We continue with the time for fitting the estimator:

In [None]:
fit_time = scores["fit_time"].mean(axis=0)

In [None]:
print(f"fit_time: {fit_time}")

Finally, we show the time for scoring the estimator:

In [None]:
score_time = scores["score_time"].mean(axis=0)

In [None]:
print(f"score_time: {score_time}")