
GreedyRuleListClassifier has wildly varying performance and sometimes crashes #145

Closed
davidefiocco opened this issue Dec 5, 2022 · 4 comments


davidefiocco commented Dec 5, 2022

When running a number of experiments with different splits of a given dataset, I see that GreedyRuleListClassifier's accuracy varies wildly, and sometimes the code (see the for loop below) crashes with a KeyError.

So, for example running 10 experiments like this, with different random splits of the same set:

import sklearn.datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from imodels import GreedyRuleListClassifier

X, Y = sklearn.datasets.load_breast_cancer(as_frame=True, return_X_y=True)

model = GreedyRuleListClassifier(max_depth=10)

for i in range(10):
    try:
        X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
        model.fit(X_train, y_train, feature_names=X_train.columns)
        y_pred = model.predict(X_test)
        score = accuracy_score(y_test.values, y_pred)
        print("Accuracy:", score)
    except KeyError:
        print("Failed with KeyError")

This gives output along the lines of:

Accuracy: 0.6081871345029239
Failed with KeyError
Accuracy: 0.4619883040935672
Accuracy: 0.45614035087719296
Accuracy: 0.2222222222222222
Failed with KeyError
Failed with KeyError
Failed with KeyError
Accuracy: 0.18128654970760233
Failed with KeyError

Is this intended behavior? While my test dataset is smallish, the variation in accuracy is still surprising to me, and so is the KeyError. I'm using scikit-learn==1.0.2 and imodels==1.3.6, and can edit the issue here to add more details.

Incidentally, the same behaviour was observed in https://datascience.stackexchange.com/a/116283/50519, noticed by @jonnor.

Thanks!


csinva commented Dec 5, 2022

Thanks for raising this issue! Will look into it shortly...


csinva commented Dec 6, 2022

Hi @davidefiocco, just looked into it. I fixed the KeyError issue and pushed/bumped the imodels version, so if you upgrade with pip install --upgrade imodels and rerun, you should no longer get that error. Sorry about that; we haven't been maintaining this model well over time.

The accuracy does indeed fluctuate quite a lot for this dataset. GRL is a good algorithm when you are trying to identify a clear subgroup that has a high probability of being in a single class, but it does poorly at finding interactions, since each rule only ever assigns samples to class 1, and whatever remains after all rules is predicted as class 0.
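To make that one-sided prediction behavior concrete, here is a minimal pure-Python sketch (not imodels' actual implementation; the feature names and thresholds below are made up for illustration) of how a rule list of this shape predicts:

```python
# Minimal sketch of rule-list prediction: each rule can only assign
# class 1, and any sample no rule captures falls through to class 0.
def rule_list_predict(x, rules, default=0):
    """x: dict mapping feature name -> value; rules: (feature, threshold) pairs."""
    for feature, threshold in rules:
        if x[feature] > threshold:
            return 1  # the first rule that fires wins
    return default    # leftover samples are all predicted as class 0

# Hypothetical rules (made-up thresholds, for illustration only)
rules = [("mean radius", 17.0), ("mean concavity", 0.3)]
print(rule_list_predict({"mean radius": 20.0, "mean concavity": 0.1}, rules))  # 1
print(rule_list_predict({"mean radius": 12.0, "mean concavity": 0.1}, rules))  # 0
```

Because class 0 is only ever the fall-through default, the model cannot carve out a "class 0" subgroup directly, which is one reason it struggles with interactions.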

If you want to look into it further, you can visualize some of the models and see how they are overfitting (just add a call to model._print_list()).

@davidefiocco

Thanks so much @csinva! No worries at all, and kudos for your great work on imodels.
Thanks for the tips as well!


davidefiocco commented Mar 13, 2023

The performance of the model is not "wildly varying" anymore after @mcschmitz's fix of the behavior in #167, released with 1.3.17 (@csinva FYI!).

Accuracy: 0.9005847953216374
Accuracy: 0.9064327485380117
Accuracy: 0.8947368421052632
Accuracy: 0.9181286549707602
Accuracy: 0.8830409356725146
Accuracy: 0.8947368421052632
Accuracy: 0.8888888888888888
Accuracy: 0.9122807017543859
Accuracy: 0.8947368421052632
Accuracy: 0.8713450292397661
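For a quick summary of the spread, the ten accuracies above can be aggregated with the standard library (this summary is my addition, not part of the original report):

```python
import statistics

# The ten accuracies reported in the run above
accs = [0.9005847953216374, 0.9064327485380117, 0.8947368421052632,
        0.9181286549707602, 0.8830409356725146, 0.8947368421052632,
        0.8888888888888888, 0.9122807017543859, 0.8947368421052632,
        0.8713450292397661]
print(f"mean={statistics.mean(accs):.3f} stdev={statistics.stdev(accs):.3f}")
# mean ~0.896, stdev ~0.014: a far tighter spread than the pre-fix runs
```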
