Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TREE] add interaction constraints #3466

Merged
merged 15 commits into from Sep 4, 2018
Merged

[TREE] add interaction constraints #3466

merged 15 commits into from Sep 4, 2018

Conversation

BlueTea88
Copy link
Contributor

@BlueTea88 BlueTea88 commented Jul 12, 2018

Add interaction constraints as mentioned in #3135 and as previously tried in #3448 before I stuffed up the commits while trying to rebase. At the moment both interaction and monotonic constraints cannot be applied at the same time. But will be able to add in this functionality once #3429 goes through.

Closes #3135

@hcho3
Copy link
Collaborator

hcho3 commented Aug 16, 2018

@BlueTea88 Can you write some tests in R? See R-package/tests/testthat directory for examples. Let me know if you need help with this.

Copy link
Collaborator

@hcho3 hcho3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. Couple things:

  • I'm wondering if there is more efficient way to pass interaction constraints to C++. Instead of 0's and 1's, we can pass the string in the form [[0, 1, 2], [3, 4]] and have the DMLC JSON parser parse it to std::vector<std::vector<int>>.
  • A future PR should expose this functionality to Python.

src/tree/split_evaluator.cc Outdated Show resolved Hide resolved
src/tree/split_evaluator.cc Outdated Show resolved Hide resolved
src/tree/split_evaluator.cc Outdated Show resolved Hide resolved
@hcho3
Copy link
Collaborator

hcho3 commented Aug 17, 2018

Here is a sample code for accepting [[1,2],[3,4,5]] as input:

#include <dmlc/json.h>
#include <string>
#include <vector>
#include <sstream>

int main(void) {
  std::vector<std::vector<int>> int_constraint;
  std::istringstream iss("[[1,2],[3,4,5]]");

  dmlc::JSONReader reader(&iss);
  reader.Read(&int_constraint);

  for (const auto& constraint : int_constraint) {
    std::cout << "Constraint group: ";
    for (int e : constraint) {
      std::cout << e << ",";
    }   
    std::cout << std::endl;
  }
  return 0;
}

@BlueTea88
Copy link
Contributor Author

@hcho3 OK, will try to add the things you mentioned

@BlueTea88
Copy link
Contributor Author

@hcho3 I added a test in R and made changes based on your comments. For the [[1,2],[3,4,5]] method of input, I think it is better and it sounds it would work. But I don't have much C++ experience at the moment so I'm not exactly sure how to implement it. Would new std::ostream and std::istream methods need to be added in param.h? (Sorry, I usually just code in R so it could take me a while if I try to do this part.)

@codecov-io
Copy link

codecov-io commented Aug 19, 2018

Codecov Report

Merging #3466 into master will increase coverage by 4.4%.
The diff coverage is 22.47%.

Impacted file tree graph

@@             Coverage Diff             @@
##             master    #3466     +/-   ##
===========================================
+ Coverage     45.69%   50.09%   +4.4%     
  Complexity      188      188             
===========================================
  Files           166      172      +6     
  Lines         13168    14015    +847     
  Branches        445      457     +12     
===========================================
+ Hits           6017     7021   +1004     
+ Misses         6947     6769    -178     
- Partials        204      225     +21
Impacted Files Coverage Δ Complexity Δ
include/xgboost/tree_model.h 33.66% <ø> (ø) 0 <0> (ø) ⬇️
src/tree/param.h 72.86% <100%> (+1.09%) 0 <0> (ø) ⬇️
src/tree/split_evaluator.cc 19.4% <15.85%> (-1.89%) 0 <0> (ø)
src/data/simple_csr_source.cc 85.91% <0%> (-12.68%) 0% <0%> (ø)
src/predictor/cpu_predictor.cc 64.93% <0%> (-3.56%) 0% <0%> (ø)
src/common/host_device_vector.cc 54.16% <0%> (-2.36%) 0% <0%> (ø)
python-package/xgboost/core.py 82.43% <0%> (-1.74%) 0% <0%> (ø)
tests/cpp/data/test_metainfo.cc 95.83% <0%> (-1.31%) 0% <0%> (ø)
src/learner.cc 29.04% <0%> (-0.52%) 0% <0%> (ø)
src/c_api/c_api.cc 23.63% <0%> (-0.33%) 0% <0%> (ø)
... and 55 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a13e29e...1138c48. Read the comment docs.

@hcho3
Copy link
Collaborator

hcho3 commented Aug 19, 2018

@BlueTea88 I can help you out with C++ part. Let me review your R test and get back to you.

R-package/R/utils.R Outdated Show resolved Hide resolved
src/tree/param.h Outdated Show resolved Hide resolved
R-package/demo/interaction_constraints.R Outdated Show resolved Hide resolved
@hcho3
Copy link
Collaborator

hcho3 commented Aug 22, 2018

@BlueTea88 Can you enable "Allow edits by maintainers"? I have a new commit that adds dmlc::JSONReader to the pull request.

@BlueTea88
Copy link
Contributor Author

@hcho3 Should be allowed now. Let me know if this still doesn't work.

@hcho3
Copy link
Collaborator

hcho3 commented Aug 22, 2018

@BlueTea88 It still doesn't work. Can you look at it again?

@BlueTea88
Copy link
Contributor Author

@hcho3 I have the "allow edits by maintainers" box checked. So i'm not sure why its not working. I've also sent you an invite to be a collaborator in case that can give you more access.

include/xgboost/tree_model.h Outdated Show resolved Hide resolved
@hcho3
Copy link
Collaborator

hcho3 commented Aug 23, 2018

@BlueTea88 Thanks, I was able to push my commit to your repo. The commit adds dmlc::JSONParser to understand constraints in form of [[0,1], [2,3,4]].

@BlueTea88
Copy link
Contributor Author

OK Thanks. I will make changes to the other parts.

@BlueTea88
Copy link
Contributor Author

@khotilov @hcho3 I have updated the R argument so interaction_constraints expects a list of feature IDs instead of column names (so now arguments are no longer dependent on column names being available). Also updated the R example to be more in line with the R coding style. Thanks @hcho3 for the changes to the C++ code.

@hcho3
Copy link
Collaborator

hcho3 commented Aug 30, 2018

@BlueTea88 @khotilov I think this PR is good to merge, once all lint errors are fixed

src/tree/split_evaluator.cc Outdated Show resolved Hide resolved
src/tree/split_evaluator.cc Outdated Show resolved Hide resolved
src/tree/split_evaluator.cc Outdated Show resolved Hide resolved
@hcho3
Copy link
Collaborator

hcho3 commented Aug 31, 2018

@BlueTea88 @khotilov I added a tutorial for feature interaction constraints. Let me know if the explanation makes sense.

Rendered doc: rendered_tutorial.zip

@BlueTea88
Copy link
Contributor Author

@hcho3 I think it makes sense. I like the graphs as well. Just the last line:
To use monotonic constraints, be sure to set the tree_method parameter to either exact or hist.
Is that supposed to be interaction constraints?

@hcho3
Copy link
Collaborator

hcho3 commented Aug 31, 2018

@BlueTea88 Yes, let me fix the typo

Copy link
Member

@khotilov khotilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last review pass, I promise :)

Nice tutorial @hcho3

CONTRIBUTORS.md Outdated Show resolved Hide resolved
R-package/R/utils.R Outdated Show resolved Hide resolved
doc/tutorials/feature_interaction_constraint.rst Outdated Show resolved Hide resolved
src/tree/split_evaluator.cc Outdated Show resolved Hide resolved
src/tree/split_evaluator.cc Outdated Show resolved Hide resolved
@khotilov
Copy link
Member

khotilov commented Sep 4, 2018

Looks good to go!
Thanks @BlueTea88 @hcho3

@hcho3 hcho3 merged commit 9254c58 into dmlc:master Sep 4, 2018
CodingCat pushed a commit to CodingCat/xgboost that referenced this pull request Sep 18, 2018
* add interaction constraints

* enable both interaction and monotonic constraints at the same time

* fix lint

* add R test, fix lint, update demo

* Use dmlc::JSONReader to express interaction constraints as nested lists; Use sparse arrays for bookkeeping

* Add Python test for interaction constraints

* make R interaction constraints parameter based on feature index instead of column names, fix R coding style

* Fix lint

* Add BlueTea88 to CONTRIBUTORS.md

* Short circuit when no constraint is specified; address review comments

* Add tutorial for feature interaction constraints

* allow interaction constraints to be passed as string, remove redundant column_names argument

* Fix typo

* Address review comments

* Add comments to Python test
@lock lock bot locked as resolved and limited conversation to collaborators Dec 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants