TunedThresholdClassifierCV failing inside a SearchCV object #29019

adrinjalali · 2024-05-14T14:06:28Z

I changed the existing example slightly, to put the estimator inside the SearchCV instead of tuning after the search. Here's the reproducer:

# %%
from sklearn.datasets import fetch_openml

# %%
credit_card = fetch_openml(data_id=1597, as_frame=True, parser="pandas")
credit_card.frame.info()

# %%
columns_to_drop = ["Class"]
data = credit_card.frame.drop(columns=columns_to_drop)
target = credit_card.frame["Class"].astype(int)

# %%
def business_metric(y_true, y_pred, amount):
    mask_true_positive = (y_true == 1) & (y_pred == 1)
    mask_true_negative = (y_true == 0) & (y_pred == 0)
    mask_false_positive = (y_true == 0) & (y_pred == 1)
    mask_false_negative = (y_true == 1) & (y_pred == 0)
    fraudulent_refuse = (mask_true_positive.sum() * 50) + amount[
        mask_true_positive
    ].sum()
    fraudulent_accept = -amount[mask_false_negative].sum()
    legitimate_refuse = mask_false_positive.sum() * -5
    legitimate_accept = (amount[mask_true_negative] * 0.02).sum()
    return fraudulent_refuse + fraudulent_accept + legitimate_refuse + legitimate_accept


# %%
import sklearn
from sklearn.metrics import make_scorer

sklearn.set_config(enable_metadata_routing=True)
business_scorer = make_scorer(business_metric).set_score_request(amount=True)

# %%
amount = credit_card.frame["Amount"].to_numpy()

# %%
from sklearn.model_selection import train_test_split

data_train, data_test, target_train, target_test, amount_train, amount_test = (
    train_test_split(
        data, target, amount, stratify=target, test_size=0.5, random_state=42
    )
)

# %%
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import TunedThresholdClassifierCV

logistic_regression = make_pipeline(StandardScaler(), LogisticRegression())

tuned_model = TunedThresholdClassifierCV(
    estimator=logistic_regression,
    scoring=business_scorer,
    thresholds=100,
    n_jobs=2,
)

param_grid = {"estimator__logisticregression__C": np.logspace(-6, 6, 13)}
model = GridSearchCV(tuned_model, param_grid, scoring=business_scorer).fit(
    data_train, target_train, amount=amount_train
)

It gives:

AttributeError: 'TunedThresholdClassifierCV' object has no attribute '_response_method'. Did you mean: 'response_method'?

This solves the issue temporarily:

tuned_model._response_method = "predict_proba"

cc @glemaitre @jeremiedbb

I'd argue this needs to be fixed for the release.

glemaitre · 2024-05-14T16:45:56Z

I opened #29021, where I replaced setting a private _response_method attribute in fit with a private method _get_response_method that can be called before fit to handle metadata routing.
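
As a rough illustration of the change described above, here is a toy sketch of the pattern, not the scikit-learn code. The class names and the describe_routing method are hypothetical stand-ins for any code path, such as metadata routing, that may run before fit. The "auto" resolution shown is an assumption made for the example.

```python
class AttributeSetInFit:
    """Hypothetical estimator that stores its response method only in fit."""

    def __init__(self, response_method="auto"):
        self.response_method = response_method

    def fit(self):
        # The private attribute only exists once fit has run.
        if self.response_method == "auto":
            self._response_method = ["predict_proba", "decision_function"]
        else:
            self._response_method = self.response_method
        return self

    def describe_routing(self):
        # Stand-in for routing code that can be called before fit.
        return {"response_method": self._response_method}


class MethodResolvedOnDemand:
    """Hypothetical estimator that resolves its response method on demand."""

    def __init__(self, response_method="auto"):
        self.response_method = response_method

    def _get_response_method(self):
        # Derived from the constructor parameter only, so no fit is needed.
        if self.response_method == "auto":
            return ["predict_proba", "decision_function"]
        return self.response_method

    def fit(self):
        return self

    def describe_routing(self):
        return {"response_method": self._get_response_method()}


# Reading the attribute before fit raises AttributeError.
try:
    AttributeSetInFit().describe_routing()
    failed_before_fit = False
except AttributeError:
    failed_before_fit = True

# The method-based variant works on an unfitted instance.
routing_before_fit = MethodResolvedOnDemand().describe_routing()
```

Because the second variant depends only on constructor parameters, it behaves the same on a clone that a search object has created but not yet fitted.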

adrinjalali added this to the 1.5 milestone May 14, 2024

github-actions bot added the Needs Triage label May 14, 2024

adrinjalali added the Bug and Blocker labels and removed the Needs Triage label May 14, 2024

OmarManzoor mentioned this issue May 15, 2024

TST check compatibility with metadata routing for *ThresholdClassifier* #29021

Merged

OmarManzoor closed this as completed in #29021 May 15, 2024
