TunedThresholdClassifierCV failing inside a SearchCV object #29019

Closed
adrinjalali opened this issue May 14, 2024 · 1 comment · Fixed by #29021
@adrinjalali
Member

I changed the existing example slightly, to put the estimator inside the SearchCV instead of tuning after the search. Here's the reproducer:

# %%
from sklearn.datasets import fetch_openml

# %%
credit_card = fetch_openml(data_id=1597, as_frame=True, parser="pandas")
credit_card.frame.info()

# %%
columns_to_drop = ["Class"]
data = credit_card.frame.drop(columns=columns_to_drop)
target = credit_card.frame["Class"].astype(int)

# %%
def business_metric(y_true, y_pred, amount):
    mask_true_positive = (y_true == 1) & (y_pred == 1)
    mask_true_negative = (y_true == 0) & (y_pred == 0)
    mask_false_positive = (y_true == 0) & (y_pred == 1)
    mask_false_negative = (y_true == 1) & (y_pred == 0)
    fraudulent_refuse = (mask_true_positive.sum() * 50) + amount[
        mask_true_positive
    ].sum()
    fraudulent_accept = -amount[mask_false_negative].sum()
    legitimate_refuse = mask_false_positive.sum() * -5
    legitimate_accept = (amount[mask_true_negative] * 0.02).sum()
    return fraudulent_refuse + fraudulent_accept + legitimate_refuse + legitimate_accept


# %%
import sklearn
from sklearn.metrics import make_scorer

sklearn.set_config(enable_metadata_routing=True)
business_scorer = make_scorer(business_metric).set_score_request(amount=True)

# %%
amount = credit_card.frame["Amount"].to_numpy()

# %%
from sklearn.model_selection import train_test_split

data_train, data_test, target_train, target_test, amount_train, amount_test = (
    train_test_split(
        data, target, amount, stratify=target, test_size=0.5, random_state=42
    )
)

# %%
import numpy as np  # needed for np.logspace in param_grid below
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import TunedThresholdClassifierCV

logistic_regression = make_pipeline(StandardScaler(), LogisticRegression())

tuned_model = TunedThresholdClassifierCV(
    estimator=logistic_regression,
    scoring=business_scorer,
    thresholds=100,
    n_jobs=2,
)

param_grid = {"estimator__logisticregression__C": np.logspace(-6, 6, 13)}
model = GridSearchCV(tuned_model, param_grid, scoring=business_scorer).fit(
    data_train, target_train, amount=amount_train
)

It gives:

AttributeError: 'TunedThresholdClassifierCV' object has no attribute '_response_method'. Did you mean: 'response_method'?

Setting the attribute manually before the search works around the issue temporarily:

tuned_model._response_method = "predict_proba"

cc @glemaitre @jeremiedbb

I'd argue this needs to be fixed for the release.

adrinjalali added this to the 1.5 milestone on May 14, 2024
github-actions bot added the Needs Triage label on May 14, 2024
adrinjalali added the Bug and Blocker labels and removed the Needs Triage label on May 14, 2024
@glemaitre
Member

I opened #29021, where I replaced setting a private _response_method attribute in fit with a private method _get_response_method that can be called before fit to handle the metadata routing.
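
For readers following along, the pattern described above can be sketched with a toy example. This is not scikit-learn's code: the class names `BeforeFix` and `AfterFix` and the `routing_info` method are hypothetical stand-ins for whatever reads the response method before `fit`, and the mapping of `"auto"` to a list of method names is an assumption made for illustration only.

```python
class BeforeFix:
    """Toy estimator (hypothetical): derived state only exists after fit."""

    def __init__(self, response_method="auto"):
        self.response_method = response_method

    def fit(self):
        # The private attribute is created here, so it is missing on an
        # unfitted instance.
        if self.response_method == "auto":
            self._response_method = ["predict_proba", "decision_function"]
        else:
            self._response_method = self.response_method
        return self

    def routing_info(self):
        # Stand-in for code that runs before fit (e.g. routing setup).
        # Raises AttributeError on an unfitted instance.
        return self._response_method


class AfterFix:
    """Toy estimator (hypothetical): derived state is computed on demand."""

    def __init__(self, response_method="auto"):
        self.response_method = response_method

    def _get_response_method(self):
        # Depends only on constructor parameters, so it is safe before fit.
        if self.response_method == "auto":
            return ["predict_proba", "decision_function"]
        return self.response_method

    def fit(self):
        return self

    def routing_info(self):
        return self._get_response_method()


if __name__ == "__main__":
    try:
        BeforeFix().routing_info()
    except AttributeError as exc:
        print("before fix:", exc)
    print("after fix:", AfterFix().routing_info())
```

Because the helper reads only constructor parameters, it gives the same answer before and after `fit`, which is what a meta-estimator such as a SearchCV object needs if it inspects the inner estimator before fitting it.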
