Bugfixes and change to default behavior of get_rules #18
I fixed two bugs I ran into:
1. If one of the feature columns is constant, its standard deviation is zero, and the Friedman scaling fails with a divide-by-zero error. I added a small constant to the denominator to prevent this.
2. If Cs is passed to RuleFit when instantiating the object, it is not passed on correctly to the LogisticRegression subroutine: the call should use self.Cs instead of Cs.
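
To illustrate the first fix, here is a minimal sketch of a zero-guarded scale factor. It is not the code in this PR: the helper name `scale_multipliers`, the `eps` value of 1e-12, and the 0.4 factor (the constant suggested in Friedman and Popescu's RuleFit paper) are all assumptions made for the example.

```python
from statistics import pstdev


def scale_multipliers(columns, eps=1e-12):
    """Hypothetical helper: one multiplier per feature column.

    Each multiplier is 0.4 / (std + eps). Adding eps keeps the
    denominator non-zero when a column is constant (std == 0).
    """
    return [0.4 / (pstdev(col) + eps) for col in columns]


columns = [
    [1.0, 2.0, 3.0],  # varying feature
    [5.0, 5.0, 5.0],  # constant feature, standard deviation is 0
]
multipliers = scale_multipliers(columns)
```

With `eps=0`, the second column would raise `ZeroDivisionError`. With the guard, its multiplier is very large but finite, so the fit can proceed.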
I also ran into quirky behavior with get_rules versus transform. When transforming, I got a matrix with 116 columns, corresponding to 116 transformed features. When inspecting the output of get_rules, however, I found only 115 rules. This was pretty frustrating, but I traced the mismatch to exclude_zero_coef being set to True, which eliminated one of the rules from the get_rules output. I think get_rules and transform should behave identically: variables with a zero coefficient are eliminated either from both or from neither. This change at least makes the two consistent.
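
One way to keep the two outputs aligned is to derive both from a single shared selection of rule indices. The sketch below is only an illustration of that idea with hypothetical standalone functions and toy data; it does not reproduce the library's actual get_rules or transform signatures.

```python
def active_indices(coefs, exclude_zero_coef):
    """Indices of the rules to keep; shared by both functions below."""
    if exclude_zero_coef:
        return [i for i, c in enumerate(coefs) if c != 0]
    return list(range(len(coefs)))


def get_rules(rules, coefs, exclude_zero_coef=False):
    """Hypothetical stand-in: (rule, coefficient) pairs for the kept rules."""
    keep = active_indices(coefs, exclude_zero_coef)
    return [(rules[i], coefs[i]) for i in keep]


def transform(rule_matrix, coefs, exclude_zero_coef=False):
    """Hypothetical stand-in: keep only the columns of the kept rules."""
    keep = active_indices(coefs, exclude_zero_coef)
    return [[row[i] for i in keep] for row in rule_matrix]


# Toy data: three rules, one of which has a zero coefficient.
rules = ["a > 1", "b <= 2", "c > 0"]
coefs = [0.5, 0.0, -1.2]
rule_matrix = [[1, 0, 1], [0, 1, 1]]

for flag in (True, False):
    n_rules = len(get_rules(rules, coefs, flag))
    n_cols = len(transform(rule_matrix, coefs, flag)[0])
    assert n_rules == n_cols  # counts match for either setting
```

Because both functions call `active_indices`, the number of reported rules always equals the number of transformed columns, whichever value the flag takes.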