Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 

MDL Rule Lists for prediction and data mining

The most recent version of MDL rule lists for classification, regression and subgroup discovery can be found on the https://github.com/HMProenca/RuleList repository or in PyPi https://pypi.org/project/rulelist/ and installed using:

pip install rulelist

The new version offers several improvements in terms of algorithmic performance and theory.

Algorithmic performance improvements:

  • It accepts numeric and categorical explanatory variables.
  • It accepts numeric or categorical target variables.
  • It accepts single or multiple target variables, which translates into: classification; regression; multi-target classification; multi-target regression.
  • It uses a beam-search for searching the candidates which makes it several orders of magnitude faster.
  • It can perform subgroup list discovery.

Theory improvements:

  • Uses the Normalized Maximum Likelihood (NML) to encode categorical variables, instead of the prequential plug-in code.
  • Uses Bayesian Gaussian code for numeric targets.

Example of usage:

import pandas as pd
from rulelist import RuleList
from sklearn import datasets
from sklearn.model_selection import train_test_split

task = 'prediction'
target_model = 'categorical'

data = datasets.load_breast_cancer()
Y = pd.Series(data.target)
X = pd.DataFrame(data.data)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3)

model = RuleList(task = task, target_model = target_model)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test.values,y_pred)

print(model)

Contact

If there are any questions or issues, please contact me by mail at hugo.manuel.proenca@gmail.com or open an issue here on Github.

Citation

In a machine learning (prediction) context for problems of classification, regression, multi-label classification, multi-category classification, or multivariate regression cite the corresponding bibtex of the first classification application of MDL rule lists:

@article{proencca2020interpretable,
  title={Interpretable multiclass classification by MDL-based rule lists},
  author={Proen{\c{c}}a, Hugo M and van Leeuwen, Matthijs},
  journal={Information Sciences},
  volume={512},
  pages={1372--1393},
  year={2020},
  publisher={Elsevier}
}

in the context of data mining and subgroup discovery please refer to subgroup lists:

@article{proencca2020discovering,
  title={Discovering outstanding subgroup lists for numeric targets using MDL},
  author={Proen{\c{c}}a, Hugo M and Gr{\"u}nwald, Peter and B{\"a}ck, Thomas and van Leeuwen, Matthijs},
  journal={arXiv preprint arXiv:2006.09186},
  year={2020}
} 

References

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages