FeatureBinarizerFromTrees #77

Merged · 4 commits into Trusted-AI:master · Mar 24, 2020

Conversation

floidgilbert (Contributor)

Proposing a transformer compatible with FeatureBinarizer. The new transformer, FeatureBinarizerFromTrees, significantly shortens training times and often results in simpler rule sets. Please see examples/rbm/feature_binarizer_from_trees.ipynb for an overview and a formal performance comparison.

A test module is included. Detailed parameter information is available in the docstrings.
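As a minimal sketch of the intended workflow (hypothetical data and column names; it assumes, per the docstrings, that fit requires labels y because the thresholds come from decision trees, and that returnOrd=True makes transform return both the binarized frame and standardized ordinal features, as with FeatureBinarizer):

```python
import pandas as pd
from aix360.algorithms.rbm import FeatureBinarizerFromTrees

# Hypothetical toy data: any tabular DataFrame X with a binary target y will do.
X = pd.DataFrame({'age': [23, 45, 61, 38, 52, 29],
                  'income': [30_000, 52_000, 88_000, 41_000, 67_000, 35_000]})
y = pd.Series([0, 1, 1, 0, 1, 0])

# Unlike FeatureBinarizer, fitting requires y: the candidate thresholds are the
# split points of decision trees fit on (X, y).
fbt = FeatureBinarizerFromTrees(returnOrd=True)
fbt.fit(X, y)

# With returnOrd=True, transform also returns standardized ordinal features,
# matching FeatureBinarizer's behavior.
X_bin, X_std = fbt.transform(X)

# X_bin can then be passed to BooleanRuleCG, LogisticRuleRegression, etc.
```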

Thank you for sharing AIX360.

@dennislwei (Collaborator)

@floidgilbert This is really great. Thanks for contributing! The notebook examples/rbm/feature_binarizer_from_trees.ipynb is pretty compelling.

Just to confirm, the features returned by FeatureBinarizerFromTrees are all of the form [feature] [operation] [value], e.g. age >= 50, like FeatureBinarizer, right? In other words, it doesn't create interactions between two or more original features, leaving that to BooleanRuleCG, LogisticRuleRegression, etc. (Although such interactions are happening within the decision trees that FeatureBinarizerFromTrees uses.)
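For illustration, one quick way to inspect the form of the returned columns (hypothetical data; the threshold values shown are illustrative):

```python
import pandas as pd
from aix360.algorithms.rbm import FeatureBinarizerFromTrees

X = pd.DataFrame({'age': [23, 45, 61, 38, 52, 29]})
y = pd.Series([0, 1, 1, 0, 1, 0])

fbt = FeatureBinarizerFromTrees()
fbt.fit(X, y)
X_bin = fbt.transform(X)

# Each column is a single-feature condition indexed as (feature, operation, value);
# no interaction columns appear.
print(X_bin.columns.tolist())
# e.g. [('age', '<=', 41.5), ('age', '>', 41.5), ...]  (values are illustrative)
```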

@vijay-arya Does the test module tests/rbm/test_Feature_Binarizer_From_Trees.py look comparable to the tests for existing algorithms?

@floidgilbert (Contributor, Author)

@dennislwei The transformer does not produce interactions. Every split in the tree is considered an independent feature. FeatureBinarizerFromTrees attempts to maintain compatibility with FeatureBinarizer in almost every case. For example, public members like maps, enc, etc. are all included. There are only three practical compatibility differences:

  1. FeatureBinarizerFromTrees does not accept missing values. Because it fits scikit-learn decision trees, missing values must be imputed. It's difficult to generalize an appropriate imputation method, so users must do it themselves beforehand (see the sketch after this list).
  2. FeatureBinarizerFromTrees populates its ordinal member even when returnOrd=False. This is a matter of preference and convenience. It's nice to have the list of ordinal features available, but it could be changed if necessary.
  3. FeatureBinarizerFromTrees does not convert categorical feature values to strings in the transformed data frame's multi-index unless the user sets threshStr=True. There are a few reasons for this which I can explain if it's a problem. Perhaps I should rename the threshStr parameter to something more inclusive... I just took the name from FeatureBinarizer.
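A minimal sketch of point 1: imputing before fitting. The data and imputation strategy are hypothetical; any standard approach (here scikit-learn's SimpleImputer) works.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from aix360.algorithms.rbm import FeatureBinarizerFromTrees

# Hypothetical frame with missing values; the scikit-learn decision trees used
# internally cannot handle NaNs, so impute before fitting.
X = pd.DataFrame({'age': [23, np.nan, 61, 38, 52, 29],
                  'income': [30_000, 52_000, np.nan, 41_000, 67_000, 35_000]})
y = pd.Series([0, 1, 1, 0, 1, 0])

# Impute first (the strategy is the user's choice), then binarize as usual.
X_imp = pd.DataFrame(SimpleImputer(strategy='median').fit_transform(X),
                     columns=X.columns, index=X.index)

fbt = FeatureBinarizerFromTrees()
fbt.fit(X_imp, y)
X_bin = fbt.transform(X_imp)
```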

@vijay-arya (Collaborator) left a comment

@vijay-arya merged commit 5ef5213 into Trusted-AI:master on Mar 24, 2020