# Evaluate metrics and record fit and score times

#### Authors:

* Juan Carlos Alfaro Jiménez

---

**Note:** All the parameters are provided by `guildai` via environment variables with prefix `FLAG`.

---
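Environment variables are always strings, so numeric flags need an explicit conversion before they reach an estimator or a splitter. The following is a minimal sketch of how that could be done; the helper name `read_flag` and its `cast` and `default` parameters are hypothetical and not part of `guildai`:

```python
import os


def read_flag(name, cast=str, default=None):
    """Read FLAG_<NAME> from the environment and convert it with `cast`.

    Hypothetical helper for illustration; not provided by guildai.
    """
    raw = os.environ.get(f"FLAG_{name.upper()}")
    if raw is None:
        # The flag was not set: fall back to the caller's default.
        return default
    return cast(raw)


# Simulate a flag set by the runner (illustrative value only).
os.environ["FLAG_N_SPLITS"] = "5"

n_splits_demo = read_flag("n_splits", cast=int)
```

Here `n_splits_demo` is an `int` rather than the raw string stored in the environment.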

In this notebook, we evaluate the metrics of an estimator using cross-validation and record its fit and score times.

## 1. Data

First, we load the data to fit (`X`) and the target rankings to predict (`Y`):

---

**Note:** We use the `fetch_openml` function because the benchmark data has been uploaded to the `OpenML` repository.

---

In [None]:
import os

In [None]:
data_id = int(os.environ["FLAG_DATA_ID"])  # environment variables are strings; fetch_openml expects an integer ID

In [None]:
from sklearn.datasets import fetch_openml

In [None]:
data = fetch_openml(data_id=data_id)

In [None]:
X, Y = data["data"], data["target"]

Let us print the name of the dataset:

In [None]:
name = data["details"]["name"]

In [None]:
f"The experiment will be run using the {name} dataset."

## 2. Estimator

Second, we initialize the estimator object (`estimator`) that will be fitted to the data:

In [None]:
module = os.environ.get("GUILD_OP").split(":")[0]

In [None]:
estimator_type = os.environ.get("FLAG_ESTIMATOR_TYPE")

In [None]:
import importlib

In [None]:
estimator = importlib.import_module(module).estimator  # also resolves dotted module names correctly
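The lookup above can be sketched in isolation. This example assumes the operation string has the form `<module>:<operation>`, as the `split(":")` call suggests; the helper name `load_attr` and the sample operation strings are hypothetical. It also shows why `importlib.import_module` is usually the safer choice for dotted module names: the built-in `__import__` returns the top-level package, not the submodule.

```python
import importlib


def load_attr(op_spec, attr):
    """Import the module named before ':' in `op_spec` and return `attr` from it.

    Hypothetical helper for illustration only.
    """
    module_name = op_spec.split(":")[0]
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Standard-library modules stand in for the real estimator module here.
sqrt = load_attr("math:run", "sqrt")

# For dotted names the two import mechanisms differ:
top_level = __import__("os.path")               # the top-level `os` package
submodule = importlib.import_module("os.path")  # the `os.path` submodule itself
```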

## 3. Cross-validation strategy

Third, we define the cross-validation strategy used to evaluate the estimator on held-out test folds. In particular, we use an $ r \times k $ scheme, that is, $ k $-fold cross-validation repeated $ r $ times:

In [None]:
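n_splits = int(os.environ["FLAG_N_SPLITS"])  # k: number of folds (environment variables are strings)

In [None]:
n_repeats = int(os.environ["FLAG_N_REPEATS"])  # r: number of repetitions

In [None]:
random_state = int(os.environ["FLAG_RANDOM_STATE"])  # seed for reproducible shuffling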

In [None]:
from sklearn.model_selection import RepeatedKFold

In [None]:
cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=random_state)
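To make the $ r \times k $ scheme concrete, here is a small sketch on toy data. The array `X_toy` and the values of `n_splits`, `n_repeats` and `random_state` are chosen for illustration only and are not the values used in the experiment:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Toy data: 6 samples with 2 features each (illustrative only).
X_toy = np.arange(12).reshape(6, 2)

# r = 2 repetitions of k = 3 folds.
cv_toy = RepeatedKFold(n_splits=3, n_repeats=2, random_state=0)

# Each element is a (train_indices, test_indices) pair.
splits = list(cv_toy.split(X_toy))

# Total number of train/test splits: r * k.
n_total = cv_toy.get_n_splits()

# Within one repetition, the test folds partition the samples.
test_sizes = [len(test) for _, test in splits]
first_repeat_test = np.concatenate([test for _, test in splits[:3]])
```

Every sample appears in exactly one test fold per repetition, so each sample is used for testing $ r $ times overall.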