
[New Feature] Interaction constraints #3135

Closed
BlueTea88 opened this issue Feb 26, 2018 · 11 comments

@BlueTea88
Contributor

Hi, I would like to add interaction constraints functionality to tree building.

Basically, this would constrain the combination of variables in each tree based on constraints specified by the user. Initial variables will still get selected in a greedy fashion (best variable at the time will be chosen) but subsequent variables will be limited to variables that have permitted interactions with the initial variables.

Potential benefits include:

  • Better predictive performance from focusing on interactions that work - whether through domain specific knowledge or algorithms that rank interactions
  • Less noise in predictions
  • More control to the user on what the model can fit (for example, the user may want to exclude some interactions even if they perform well due to regulatory constraints)
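The selection rule described above (greedy first split, then candidates restricted to permitted interactions) can be sketched in Python. This is illustrative pseudologic under one reasonable reading of the rule, not the actual tree-builder code; the function name and data layout are assumptions:

```python
def allowed_features(used, constraint_sets, all_features):
    """Return the features that may still be split on in a branch.

    used: set of features already split on along the current branch.
    constraint_sets: list of sets; each set is one permitted interaction group.
    all_features: every candidate feature.
    """
    if not used:
        # No split yet: the first feature is chosen greedily among all features.
        return set(all_features)
    # Otherwise a feature is allowed only if some permitted group
    # contains it together with every feature already used.
    return {f for f in all_features
            if any(used | {f} <= group for group in constraint_sets)}

# Example: V1 may interact with V2; V3, V4 and V5 may interact with each other.
groups = [{'V1', 'V2'}, {'V3', 'V4', 'V5'}]
feats = ['V1', 'V2', 'V3', 'V4', 'V5']
print(sorted(allowed_features({'V3'}, groups, feats)))  # ['V3', 'V4', 'V5']
```

Note that a feature already used remains allowed, so repeated splits on the same feature down a branch are not ruled out by this scheme.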

The idea is discussed briefly in the paper:
Delta Boosting Machine and its Application in Actuarial Modelling (Antonio et al., 2015)
https://actuaries.asn.au/Library/Events/ASTINAFIRERMColloquium/2015/AntonioEtAlDeltaBoostingPaper.pdf

@BlueTea88
Contributor Author

I have an experimental version in the repo below.
https://github.com/BlueTea88/xgboost/tree/int_cont

If possible, I would love to merge it to this main repository (see #3136).

It has two arguments:

  • int_constraints_flag - TRUE/FALSE whether to impose interaction constraints
  • int_constraints_list - permitted interactions specified as a list, where each item of the list is a set of column names that represents one permitted interaction (all column names listed in the set can be interacted with each other)

As an example, please see:
https://github.com/BlueTea88/xgboost/blob/int_cont/R-package/demo/interaction_constraints.R

Limitations

  • Currently only supports the exact greedy algorithm on multi-core
  • API only updated for R

@cassieqiao

I followed your example, and I assume int_constraints_list = list(c('V1','V2'), c('V3','V4','V5')) means V1 can only interact with V2, and vice versa; same idea for V3, V4 and V5. But I got the results below, where the constraints are basically not applied. Can you clarify? Thanks.

temp.int
[[1]]
[1] "V3" "V5"

[[2]]
[1] "V3" "V4"

[[3]]
[1] "V4" "V5"

[[4]]
[1] "V3" "V4" "V5"

[[5]]
[1] "V1" "V3" "V4" "V5"

[[6]]
[1] "V2" "V3" "V4" "V5"

[[7]]
[1] "V3" "V4" "V5" "V6"

[[8]]
[1] "V4" "V5" "V8"

[[9]]
[1] "V4" "V5" "V7"

[[10]]
[1] "V2" "V4" "V5"

Best,
Cassie
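Results like the list above can be checked mechanically by testing each tree's feature set against the constraint groups. A minimal sketch (the helper function is hypothetical; the variable names follow the example):

```python
def violates(tree_features, constraint_sets):
    """True if the features used in one tree are not all covered
    by a single permitted interaction group."""
    s = set(tree_features)
    return len(s) > 1 and not any(s <= g for g in constraint_sets)

# Constraints from the example, and a few of the reported trees.
groups = [{'V1', 'V2'}, {'V3', 'V4', 'V5'}]
trees = [['V3', 'V5'], ['V1', 'V3', 'V4', 'V5'], ['V4', 'V5', 'V8']]
print([violates(t, groups) for t in trees])  # [False, True, True]
```

Applied to the full list, trees such as [[5]] through [[10]] mix groups and would be flagged, which matches the observation that the constraints were not applied.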

@BlueTea88
Contributor Author

Hi Cassie,

The changes haven't been merged into the master repo yet. Just checking: did you try the code using xgboost built from my experimental branch?
https://github.com/BlueTea88/xgboost/tree/int_cont

Cheers,
Andrew

@BlueTea88
Contributor Author

If you are using Windows and you want an easy way to test interaction constraints, you can download the binary file here:
https://github.com/BlueTea88/xgboost/releases/download/v0.70-int/xgboost_0.7.0.zip

And install in R using:

remove.packages('xgboost')
# 'your-file-directory' is the folder containing the downloaded zip
install.packages(file.path('your-file-directory', 'xgboost_0.7.0.zip'), repos = NULL)

@hcho3
Collaborator

hcho3 commented Jul 4, 2018

Consolidating to #3439. This issue should be re-opened when you and others decide to actively work on implementing this feature. I look forward to working with you to get feature interaction constraints implemented.

@BlueTea88
Contributor Author

@hcho3 I've submitted a PR #3466 for this. Any chance this issue could be re-opened? Happy to make changes if required.

hcho3 reopened this Aug 16, 2018
@hcho3
Collaborator

hcho3 commented Aug 16, 2018

@BlueTea88 Sure. Let me know when your pull request is ready for review.

@BlueTea88
Contributor Author

@hcho3 It is ready for review now. Thanks

@yanyachen

Is there any way to constrain a feature not to interact with itself? I didn't find it in the documentation. Thanks.

@hcho3
Collaborator

hcho3 commented Sep 10, 2018

@yanyachen I don't think it's possible yet. Is it something you or others would find useful? If so, why and how?

@yanyachen

I think it's useful in these two slightly different scenarios:

  1. There is a very strong feature, but we suspect a small leakage (risk of overfitting because of that feature) and want to regularize against it.
  2. There is a small group of features, set A (e.g. FICO score), that is significantly more important than a set of weaker features, set B (e.g. credit utilization, age, education level). Note that the same information A contains can often also be learned from B; the features in B are simply overridden by A. So sometimes people may want to regularize the model to not rely too much on feature set A by intentionally fitting weaker trees using more of the weak features.

I think this is useful, but I also admit the feature may not be popular with others, or that a similar effect can be achieved through feature bootstrapping and then model bagging.
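The restriction being asked for, no repeated splits on one feature along a branch, could be expressed as an extra filter on the candidate set. An illustrative sketch, not an existing xgboost option:

```python
def candidates_without_self_interaction(used, all_features):
    """Exclude features already split on along the current branch,
    so a feature cannot 'interact' with itself via repeated splits."""
    return [f for f in all_features if f not in used]

# Once 'FICO' has been split on, it is removed from the candidate pool.
print(candidates_without_self_interaction({'FICO'}, ['FICO', 'age', 'utilization']))
# ['age', 'utilization']
```

Limiting a strong feature this way would cap its contribution at one split per branch, which is one interpretation of the regularization described above.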

lock bot locked as resolved and limited conversation to collaborators Dec 9, 2018