<h1><center>Metadata Routing in scikit-learn</center></h1>
<h2><center>Case Study: Revenue-based scoring in <code>GridSearchCV</code></center></h2>
<h3><center>Adrin Jalali</center></h3>
<h4><center>github.com/adrinjalali/talks</center></h4>
<h4><center>@probabl.ai</center></h4>
<h4><center>April 2024</center></h4>

## Me
- PhD in interpretable methods for cancer diagnostics
- ML consulting
- Worked in an algorithmic privacy and fairness team
- Cofounder @probabl.ai, Open Source
- Open source
    - `scikit-learn`
    - `fairlearn`
    - `skops`

## ToC
- scikit-learn's API and limitations
    - nested cross validation scenario
    - `sample_weight`, `groups`, custom scorer and metadata
- Fix the issues with metadata routing
- Custom scorer using arbitrary metadata

## scikit-learn's API

### Simple Estimators
```python
estimator = LogisticRegression()
estimator.fit(X_train, y_train)
estimator.predict(X_test)
```

### Meta-Estimators
```python
est = GridSearchCV(LogisticRegression(), param_grid=...)
est.fit(X_train, y_train)
est.predict(X_test)
```

## Metadata
- In scikit-learn: `sample_weight`, `groups`
- Fairness related: `gender`, `zipcode`, `race`
- Business related: `transaction-revenue`, `customer-risk`

## Old scikit-learn (a.k.a. without metadata routing)

In [1]:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import (
    GridSearchCV, cross_validate,
    GroupKFold, KFold
)

rng = np.random.default_rng(42)
X = rng.random(size=(100, 10))
y = rng.integers(0, 2, size=100)
groups = rng.integers(0, 10, size=100)
revenue = rng.random(100)
sample_weight = rng.random(100)

## `sample_weight`

In [2]:
cv = KFold(n_splits=5)
model = LogisticRegression()
grid = {"C": [1, 10]}
search = GridSearchCV(model, grid, cv=cv, n_jobs=1)

In [3]:
search.fit(X, y)

In [4]:
# sample weights?
search.fit(X, y, sample_weight=sample_weight)

In [5]:
# sample weights?
outer_cv = KFold(n_splits=5)
outer_eval = cross_validate(
    search,
    X,
    y=y,
    cv=outer_cv,
    n_jobs=1,
    params={"sample_weight": sample_weight}
)
print(outer_eval["test_score"])

[0.5  0.35 0.35 0.45 0.3 ]


In [6]:
cv = KFold(n_splits=5)
model = make_pipeline(
    StandardScaler(),
    LogisticRegression()
)
grid = {"logisticregression__C": [1, 10]}
search = GridSearchCV(model, grid, cv=cv, n_jobs=1)

In [7]:
search.fit(X, y)

In [9]:
# sample weights?
search.fit(
    X, y,
    logisticregression__sample_weight=sample_weight,
    standardscaler__sample_weight=sample_weight,
)

In [11]:
# sample weights?
outer_cv = KFold(n_splits=5)
outer_eval = cross_validate(
    search,
    X,
    y=y,
    cv=outer_cv,
    n_jobs=1,
    params={
        "logisticregression__sample_weight": sample_weight,
        "standardscaler__sample_weight": sample_weight,
    }
)
print(outer_eval["test_score"])

[0.6  0.35 0.35 0.55 0.3 ]


## `groups`

In [12]:
inner_cv = GroupKFold(n_splits=5)
model = LogisticRegression()
grid = {"C": [1, 10]}
search = GridSearchCV(model, grid, cv=inner_cv, n_jobs=1)

In [13]:
search.fit(X, y)

ValueError: The 'groups' parameter should not be None.

In [14]:
# groups?
search.fit(X, y, groups=groups)

In [15]:
# groups?
outer_cv = GroupKFold(n_splits=5)
outer_eval = cross_validate(
    search,
    X,
    y=y,
    cv=outer_cv,
    n_jobs=1,
    groups=groups,
)

ValueError: 
All the 5 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 970, in fit
    self._run_search(evaluate_candidates)
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 1527, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 928, in evaluate_candidates
    for (cand_idx, parameters), (split_idx, (train, test)) in product(
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 377, in split
    for train, test in super().split(X, y, groups):
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 108, in split
    for test_index in self._iter_test_masks(X, y, groups):
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 120, in _iter_test_masks
    for test_index in self._iter_test_indices(X, y, groups):
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 563, in _iter_test_indices
    raise ValueError("The 'groups' parameter should not be None.")
ValueError: The 'groups' parameter should not be None.


In [16]:
# groups with a simple estimator
outer_cv = GroupKFold(n_splits=5)
outer_eval = cross_validate(
    LogisticRegression(),
    X,
    y=y,
    groups=groups,
    cv=outer_cv,
    n_jobs=1,
)
print(outer_eval["test_score"])

[0.5        0.41176471 0.31818182 0.52380952 0.35      ]


## Custom Scorer

In [17]:
from sklearn.metrics import fbeta_score, make_scorer

def my_score(y_true, y_pred, sample_weight=None):
    assert sample_weight is not None, "I LOVE sample weights!"
    return fbeta_score(
        y_true=y_true,
        y_pred=y_pred,
        sample_weight=sample_weight,
        beta=2,  # recall twice as important
    )

my_scorer = make_scorer(my_score)

In [18]:
# custom score?
cv = KFold(n_splits=5)
model = LogisticRegression()
grid = {"C": [1, 10]}
search = GridSearchCV(model, grid, cv=cv, n_jobs=1, scoring=my_scorer)

In [19]:
search.fit(X, y)

Traceback (most recent call last):
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 982, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 253, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 350, in _score
    return self._sign * self._score_func(y_true, y_pred, **scoring_kwargs)
  File "/tmp/ipykernel_2857/3865217053.py", line 4, in my_score
    assert sample_weight is not None, "I LOVE sample weights!"
AssertionError: I LOVE sample weights!


In [20]:
# sample weight?
search.fit(X, y, sample_weight=sample_weight)

Traceback (most recent call last):
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 982, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 253, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
  File "/home/adrin/miniforge3/envs/talks/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 350, in _score
    return self._sign * self._score_func(y_true, y_pred, **scoring_kwargs)
  File "/tmp/ipykernel_2857/3865217053.py", line 4, in my_score
    assert sample_weight is not None, "I LOVE sample weights!"
AssertionError: I LOVE sample weights!


[StackOverflow Q: sklearn GridSearchCV not using sample_weight in score function](https://stackoverflow.com/questions/49581104/sklearn-gridsearchcv-not-using-sample-weight-in-score-function/49598597#49598597)

## Summary
### `sample_weight`

In [None]:
cv = KFold(n_splits=5)
model = make_pipeline(
    StandardScaler(),
    LogisticRegression()
)
grid = {"logisticregression__C": [1, 10]}
search = GridSearchCV(model, grid, cv=cv, n_jobs=1)

search.fit(
    X, y,
    logisticregression__sample_weight=sample_weight,
    standardscaler__sample_weight=sample_weight
)

### `groups`

In [None]:
inner_cv = GroupKFold(n_splits=5)
model = LogisticRegression()
grid = {"C": [1, 10]}
search = GridSearchCV(model, grid, cv=inner_cv, n_jobs=1)

outer_cv = GroupKFold(n_splits=5)
outer_eval = cross_validate(
    search,
    X,
    y=y,
    groups=groups,
    cv=outer_cv,
    n_jobs=1,
)

### Custom Scorer and `sample_weight`

In [None]:
from sklearn.metrics import fbeta_score, make_scorer

def my_score(y_true, y_pred, sample_weight=None):
    assert sample_weight is not None, "I LOVE sample weights!"
    return fbeta_score(
        y_true=y_true,
        y_pred=y_pred,
        sample_weight=sample_weight,
        beta=2,  # recall twice as important
    )

my_scorer = make_scorer(my_score)

cv = KFold(n_splits=5)
model = LogisticRegression()
grid = {"C": [1, 10]}
search = GridSearchCV(model, grid, cv=cv, n_jobs=1, scoring=my_scorer)

search.fit(X, y, sample_weight=sample_weight)

## Metadata Routing
- Metadata: `sample_weight`, `groups`, etc.
- Consumer: `LogisticRegression`, scorers, CV splitters, etc.
- Router: `GridSearchCV`, `Pipeline`, `cross_validate`
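
The three roles can be sketched in plain Python. This is a toy illustration only: `ToyConsumer` and `ToyRouter` are hypothetical names, and the behaviour (including raising on unused keys) is an assumption made for the sketch, not scikit-learn's actual implementation.

```python
# Toy sketch of the consumer/router idea. Hypothetical classes,
# NOT scikit-learn's implementation.

class ToyConsumer:
    """Uses metadata, but only the keys it has requested."""

    def __init__(self, name, requested):
        self.name = name
        self.requested = set(requested)
        self.received = {}

    def fit(self, X, y, **metadata):
        # Record what arrived so the routing can be inspected afterwards.
        self.received = dict(metadata)
        return self


class ToyRouter:
    """Forwards each piece of metadata to the consumers that requested it."""

    def __init__(self, consumers):
        self.consumers = list(consumers)

    def fit(self, X, y, **metadata):
        # Collect every key requested by any wrapped consumer.
        requested = set()
        for consumer in self.consumers:
            requested |= consumer.requested

        # Assumption for this toy: metadata nobody asked for is an error.
        unused = set(metadata) - requested
        if unused:
            raise TypeError(f"No consumer requested: {sorted(unused)}")

        # Each consumer only sees the keys it requested.
        for consumer in self.consumers:
            routed = {
                key: value
                for key, value in metadata.items()
                if key in consumer.requested
            }
            consumer.fit(X, y, **routed)
        return self


scaler = ToyConsumer("scaler", requested=[])
clf = ToyConsumer("classifier", requested=["sample_weight"])
router = ToyRouter([scaler, clf])
router.fit([[0.0], [1.0]], [0, 1], sample_weight=[1.0, 2.0])

print(scaler.received)  # {}
print(clf.received)     # {'sample_weight': [1.0, 2.0]}
```

The point of the sketch: the caller passes metadata once, at the top, and the router decides who gets what based on explicit requests rather than on `step__param` naming.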

In [21]:
import sklearn
sklearn.set_config(enable_metadata_routing=True)

### `sample_weight`

In [23]:
cv = KFold(n_splits=5)
model = make_pipeline(
    StandardScaler().set_fit_request(sample_weight=True),
    LogisticRegression().set_fit_request(
        sample_weight=True
    ).set_score_request(
        sample_weight=True
    ),
)
grid = {"logisticregression__C": [1, 10]}
search = GridSearchCV(model, grid, cv=cv, n_jobs=1)

outer_eval = cross_validate(
    search,
    X,
    y=y,
    cv=cv,
    n_jobs=1,
    params={"sample_weight": sample_weight}
)
print(outer_eval["test_score"])

[0.60696754 0.27429827 0.31338554 0.46093804 0.38876022]


### `groups`

In [25]:
inner_cv = GroupKFold(n_splits=5)
model = LogisticRegression()
grid = {"C": [1, 10]}
search = GridSearchCV(model, grid, cv=inner_cv, n_jobs=1)

outer_cv = GroupKFold(n_splits=5)
# groups?
outer_eval = cross_validate(
    search,
    X,
    y=y,
    cv=outer_cv,
    n_jobs=1,
    params={
        "groups": groups
    }
)
print(outer_eval["test_score"])

[0.45       0.35294118 0.31818182 0.52380952 0.4       ]


### Custom Scorer and `sample_weight`

In [26]:
from sklearn.metrics import fbeta_score, make_scorer

def my_score(y_true, y_pred, sample_weight=None):
    assert sample_weight is not None, "I LOVE sample weights!"
    return fbeta_score(
        y_true=y_true,
        y_pred=y_pred,
        sample_weight=sample_weight,
        beta=2,  # recall twice as important
    )

my_scorer = make_scorer(my_score).set_score_request(sample_weight=True)

cv = KFold(n_splits=5)
model = LogisticRegression().set_fit_request(sample_weight=True)
grid = {"C": [1, 10]}
search = GridSearchCV(model, grid, cv=cv, n_jobs=1, scoring=my_scorer)

search.fit(X, y, sample_weight=sample_weight)

## Scorer consuming revenue

In [27]:
def my_score(y_true, y_pred, revenue=None):
    assert revenue is not None, "Where's my data?!!!"
    revenue = revenue / np.sum(revenue)
    score = np.sum(
        np.abs(y_true - y_pred) * revenue
    ) / len(y_true)
    return score

my_scorer = make_scorer(
    my_score, greater_is_better=False
).set_score_request(revenue=True)

cv = KFold(n_splits=5)
model = LogisticRegression()
grid = {"C": [1, 10]}
search = GridSearchCV(model, grid, cv=cv, n_jobs=1, scoring=my_scorer)

search.fit(X, y, revenue=revenue)
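
A small worked example of the metric above, with made-up numbers, may help in reading its values. The function name `revenue_weighted_error` and the toy arrays are illustrative and not part of the talk; the arithmetic mirrors `my_score`.

```python
import numpy as np


def revenue_weighted_error(y_true, y_pred, revenue):
    # Same arithmetic as my_score: normalise revenue to sum to 1,
    # weight each misclassification by it, then divide by n.
    revenue = revenue / np.sum(revenue)
    return np.sum(np.abs(y_true - y_pred) * revenue) / len(y_true)


toy_y_true = np.array([1, 0, 1, 1])
toy_y_pred = np.array([1, 1, 0, 1])  # samples 1 and 2 are wrong
toy_revenue = np.array([10.0, 30.0, 40.0, 20.0])

toy_error = revenue_weighted_error(toy_y_true, toy_y_pred, toy_revenue)
print(toy_error)  # about 0.175, i.e. (0.3 + 0.4) / 4
```

Since this is an error (lower is better), the scorer is built with `greater_is_better=False`, which makes `make_scorer` negate the value. That is why a scorer built this way reports negative scores.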

### All Metadata Together

In [28]:
def my_score(y_true, y_pred, revenue=None):
    assert revenue is not None, "Where's my data?!!!"
    revenue = revenue / np.sum(revenue)
    score = np.sum(
        np.abs(y_true - y_pred) * revenue
    ) / len(y_true)
    return score

my_scorer = make_scorer(
    my_score, greater_is_better=False
).set_score_request(revenue=True)


inner_cv = GroupKFold(n_splits=5)
model = make_pipeline(
    StandardScaler().set_fit_request(sample_weight=True),
    LogisticRegression().set_fit_request(sample_weight=True),
)
grid = {"logisticregression__C": [1, 10]}
search = GridSearchCV(model, grid, cv=inner_cv, n_jobs=1, scoring=my_scorer)

outer_cv = GroupKFold(n_splits=5)
outer_eval = cross_validate(
    search,
    X,
    y=y,
    cv=outer_cv,
    n_jobs=1,
    params={
        "groups": groups,
        "sample_weight": sample_weight,
        "revenue": revenue
    }
)
print(outer_eval["test_score"])

[-0.02935423 -0.03429426 -0.02871404 -0.0211812  -0.03127314]


### `set_{method}_request`
[Dynamically generated methods with a non-generic signature @EuroPython 2023](https://www.youtube.com/watch?v=1rf6HI-pYq8)