MDL Rule Lists for prediction and data mining

The most recent version of MDL rule lists for classification, regression and subgroup discovery can be found in the https://github.com/HMProenca/RuleList repository or on PyPI at https://pypi.org/project/rulelist/, and can be installed using:

pip install rulelist

The new version offers several improvements in terms of algorithmic performance and theory.

Algorithmic performance improvements:

It accepts numeric and categorical explanatory variables.
It accepts numeric or categorical target variables.
It accepts single or multiple target variables, which translates into: classification; regression; multi-target classification; multi-target regression.
It uses beam search to explore candidate rules, which makes it several orders of magnitude faster than the previous version.
It can perform subgroup list discovery.
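
To illustrate the beam search mentioned above, here is a minimal, generic sketch. It is not the code used in the rulelist package: the function name `beam_search`, its parameters, and the toy scoring function are all hypothetical, and the real package scores candidates with an MDL-based criterion that is not reproduced here. The sketch only shows the general idea of refining rules one condition at a time while keeping a fixed number of the best candidates per level.

```python
from typing import Callable, List, Sequence, Tuple

Rule = Tuple[str, ...]  # a rule is modelled as a conjunction of condition names


def beam_search(
    conditions: Sequence[str],
    score: Callable[[Rule], float],
    beam_width: int = 3,
    max_depth: int = 2,
) -> Rule:
    """Return the highest-scoring conjunction found by a level-wise beam search."""
    beam: List[Rule] = [()]  # start from the empty rule
    best: Rule = ()
    best_score = score(best)
    for _ in range(max_depth):
        # Refine every rule in the beam with one additional condition.
        candidates = {
            tuple(sorted(rule + (c,)))
            for rule in beam
            for c in conditions
            if c not in rule
        }
        if not candidates:
            break
        # Keep only the beam_width best refinements for the next level.
        ranked = sorted(candidates, key=lambda r: (-score(r), r))
        beam = ranked[:beam_width]
        top_score = score(beam[0])
        if top_score > best_score:
            best, best_score = beam[0], top_score
    return best


# Toy score (hypothetical): sum of condition weights minus a length penalty.
weights = {"a": 1.0, "b": 2.0, "c": -1.0}


def toy_score(rule: Rule) -> float:
    return sum(weights[c] for c in rule) - 0.5 * len(rule)


best_rule = beam_search(list(weights), toy_score)
print(best_rule)
```

Because only `beam_width` candidates survive each level, the number of evaluated rules grows linearly with the depth instead of exponentially, at the cost of possibly missing the global optimum.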

Theory improvements:

Uses the Normalized Maximum Likelihood (NML) to encode categorical variables, instead of the prequential plug-in code.
Uses a Bayesian Gaussian code for numeric targets.
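
As a rough illustration of what an NML code for a categorical variable looks like, the sketch below computes the NML code length of a sample from its class counts. It is a standalone example, not the implementation used in the rulelist package: the function names are hypothetical, and it assumes the standard multinomial NML definition (maximum-likelihood code length plus the log of the normalising sum), with the normalising sum computed by the well-known linear-time recurrence over the number of categories. It uses plain floats, so it is only suitable for small samples.

```python
import math
from typing import Sequence


def multinomial_nml_complexity(n: int, k: int) -> float:
    """Normalising sum C(k, n) of the multinomial NML for k categories, n samples."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if n == 0 or k == 1:
        return 1.0
    c_prev = 1.0  # C(1, n)
    # Base case k = 2: direct sum over the count h of the first category.
    c_curr = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    # Recurrence: C(j + 2, n) = C(j + 1, n) + (n / j) * C(j, n).
    for j in range(1, k - 1):
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return c_curr


def nml_code_length(counts: Sequence[int]) -> float:
    """NML code length in bits of a categorical sample with the given class counts."""
    n = sum(counts)
    k = len(counts)
    # Code length under the maximum-likelihood estimate of the class probabilities.
    neg_log_lik = -sum(c * math.log2(c / n) for c in counts if c > 0)
    # Add the parametric complexity (regret) term.
    return neg_log_lik + math.log2(multinomial_nml_complexity(n, k))


bits = nml_code_length([1, 1])
print(f"{bits:.4f} bits")
```

The second term depends only on the sample size and the number of categories, not on the observed counts, which is what makes NML a one-part code with a fixed regret for every possible sample.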

Example of usage:

import pandas as pd
from rulelist import RuleList
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Learn a rule list for a prediction task with a categorical target (classification).
task = 'prediction'
target_model = 'categorical'

# Load the breast cancer dataset as pandas objects.
data = datasets.load_breast_cancer()
Y = pd.Series(data.target)
X = pd.DataFrame(data.data)

# Hold out 30% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

model = RuleList(task=task, target_model=target_model)
model.fit(X_train, y_train)

# Evaluate the predictions on the held-out data.
y_pred = model.predict(X_test)
print(accuracy_score(y_test.values, y_pred))

# Print the learned rule list.
print(model)

Contact

If you have any questions or issues, please contact me by email at hugo.manuel.proenca@gmail.com or open an issue here on GitHub.

Citation

In a machine learning (prediction) context, for classification, regression, multi-label classification, multi-category classification, or multivariate regression problems, cite the first classification application of MDL rule lists using the corresponding BibTeX entry:

@article{proencca2020interpretable,
  title={Interpretable multiclass classification by MDL-based rule lists},
  author={Proen{\c{c}}a, Hugo M and van Leeuwen, Matthijs},
  journal={Information Sciences},
  volume={512},
  pages={1372--1393},
  year={2020},
  publisher={Elsevier}
}

In the context of data mining and subgroup discovery, please cite subgroup lists:

@article{proencca2020discovering,
  title={Discovering outstanding subgroup lists for numeric targets using MDL},
  author={Proen{\c{c}}a, Hugo M and Gr{\"u}nwald, Peter and B{\"a}ck, Thomas and van Leeuwen, Matthijs},
  journal={arXiv preprint arXiv:2006.09186},
  year={2020}
}

References

Interpretable multiclass classification by MDL-based rule lists. Hugo M. Proença, Matthijs van Leeuwen. Information Sciences 512 (2020): 1372-1393. Also publicly available on arXiv. Experiments code (old version) available here.
Discovering outstanding subgroup lists for numeric targets using MDL. Hugo M. Proença, Thomas Bäck, Matthijs van Leeuwen. ECML-PKDD (2020). Experiments code available here.
