xgbfi C-API #2846
Conversation
Far0n commented on Oct 30, 2017
- C++ implementation of xgbfi
- XGBoosterGetFeatureInteractions C-API
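For orientation, here is a minimal sketch of what the proposed entry point could declare. The function name and the three limit parameters appear elsewhere in this PR; the output convention (a length plus an array of C strings) is an assumption modelled on the existing dump-style C-API calls, not the actual signature from this change.

```cpp
#include <xgboost/c_api.h>

// Sketch only: the name and the three limit parameters are taken from this PR;
// the output convention is assumed by analogy with other C-API functions that
// return arrays of strings (e.g. model dumps).
XGB_DLL int XGBoosterGetFeatureInteractions(BoosterHandle handle,
                                            int max_fi_depth,    // max interaction depth
                                            int max_tree_depth,  // max tree depth to consider
                                            int max_deepening,   // max deepening, as in xgbfi
                                            bst_ulong *out_len,  // number of interaction records
                                            const char ***out_interactions);  // CSV-like records
```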
(force-pushed from a4a6c23 to a52787c)
The code looks really good in general. I wondered if it would be better to try to use the existing tree model classes, but I think it's fine to define your own so long as they are encapsulated within the .cc file, as per my comment.
I would like to see some form of unit test.
Do you have plans to add any Python/R bindings for this functionality? This is not necessary for this PR, I'm just curious.
Also, I am interested in the choice to return the interactions as a vector of strings. What are the advantages of doing it this way instead of, for example, returning pairs of integer indices?
src/c_api/c_api.cc (outdated)
```cpp
xgbfi::XgbModel model = xgbfi::XgbModelParser::GetXgbModelFromDump(dump, ntrees);
xgbfi::FeatureInteractions fi = model.GetFeatureInteractions(max_fi_depth,
                                                             max_tree_depth,
                                                             max_deepening);
```
I think you don't need all of these classes here. Is it possible to have one single function (not method) GetFeatureInteractions() here? That way you do not need to define these classes in xgbfi.h, you can instead define them in the .cc file. This is cleaner as these classes are only specific to xgbfi and are not needed in other code.
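A sketch of the encapsulation being suggested, reusing the names visible in the snippet above; the exact signatures and types are illustrative, not the code as merged:

```cpp
// xgbfi.h -- only one free function is exposed to the rest of the code base.
#include <string>
#include <vector>

namespace xgbfi {
std::vector<std::string> GetFeatureInteractions(const std::string& model_dump,
                                                int max_fi_depth,
                                                int max_tree_depth,
                                                int max_deepening);
}  // namespace xgbfi

// xgbfi.cc -- XgbModel, XgbModelParser, FeatureInteractions, etc. live in an
// unnamed namespace and never leak outside this translation unit.
namespace {
class XgbModel { /* trees parsed from the dump, interaction aggregation ... */ };
}  // namespace

namespace xgbfi {
std::vector<std::string> GetFeatureInteractions(const std::string& model_dump,
                                                int max_fi_depth,
                                                int max_tree_depth,
                                                int max_deepening) {
  // Parse the text dump into the internal model and collect interactions here.
  return {};
}
}  // namespace xgbfi
```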
Sure, will do.
Thank you for the timely review @RAMitchell! I wanted to ship unit tests along with the Python bindings in a second commit to this PR. I do not plan to add R bindings here. A feature interaction has the structure of …
One reason for strings is that I want to avoid the feature-id to feature-name mapping in Python, and this CSV-like format is easily converted into a pandas DataFrame.
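To make the earlier question about the return type concrete, here is a small illustrative sketch of the two candidate shapes; the concrete record layout is not reproduced here and the type names are made up for the comparison:

```cpp
#include <string>
#include <utility>
#include <vector>

// Shape used in this PR: each interaction is one CSV-like string that already
// carries feature names and its statistics, so a Python caller can turn the
// whole vector into a pandas DataFrame without an id -> name lookup.
using InteractionsAsStrings = std::vector<std::string>;

// Alternative raised in review: return integer feature indices (e.g. pairs),
// leaving the index -> feature-name mapping and any formatting to the caller.
using InteractionsAsIndexPairs = std::vector<std::pair<int, int>>;
```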
(force-pushed from 7c27386 to 2fbadd0)
Codecov Report
```
@@            Coverage Diff             @@
##           master    #2846      +/-   ##
============================================
- Coverage     43.08%   42.32%    -0.77%
  Complexity      200      200
============================================
  Files           151      152       +1
  Lines         11518    11767     +249
  Branches       1167     1191      +24
============================================
+ Hits           4963     4980      +17
- Misses         6225     6457     +232
  Partials        330      330
```
Continue to review full report at Codecov.
(force-pushed from b1dc3a1 to 0e26ea2)
@RAMitchell Could you please review the changes? I don't know what to do with the following lint error, because I think the method get_feature_interactions is in the right place:
Thank you!
(force-pushed from d626d90 to ace0b82)
Looks good. My inclination would be to disable the lint error; I agree with you that the method is in the correct place. For the R build you probably need to add the new C++ file to the amalgamation. One of the Python failures is also due to flake8 warnings, so please address those as well. @khotilov can I get a second review?
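For context on the amalgamation remark: the R package compiles a single amalgamated translation unit (amalgamation/xgboost-all0.cc) that #includes the individual .cc sources, so a new source file has to be listed there as well. A rough sketch; the xgbfi.cc path is assumed for illustration and the surrounding include only shows the existing pattern:

```cpp
// amalgamation/xgboost-all0.cc (excerpt, illustrative)
// ... existing includes of the library's .cc sources ...
#include "../src/c_api/c_api.cc"

// New source added by this PR -- exact path assumed for illustration:
#include "../src/xgbfi.cc"
```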
I'll try to take a better look this weekend, but so far I've had just under an hour to look through the code and write some notes and questions.

@Far0n My first question is about understanding your intentions better: do you see this addition as a sort of "plugin" to xgboost that you would like to have here, keeping it as similar as reasonably possible in structure and function to your original tool (which you want to keep developing)? Or would you prefer it to be absorbed/integrated into xgboost? I'm asking because I see that your code is currently very loosely coupled with the core (just by passing a text dump of a model).

Again, those are just my observations so far, which are more for discussion than for action.
For obvious reasons, I don't like this dump parsing either, but my journey starting from the C API hit a private wall at the booster -> learner boundary. So how do I get at the model without changing core code? I like the idea of keeping it loosely coupled, because I have high hopes that something like treelite will provide this for any tree-based model. GBTreeModel also seems very much optimized for internal use.
Yes, the tree model is well hidden, in order to have the same interface in all GradientBoosters. However, we can change the core code for a good cause, and doing various tree-model analysis tasks is a good cause. I did it previously for my own hacking experiments, but we'll think about some clean option. Treelite is more for deployment than analysis. There's dmlc/treelite/issues/5 about the possibility of doing some transformations, though... Maybe @hcho3 can comment.
(force-pushed from 40ac241 to 3254874)
@Far0n Yes, treelite is currently geared toward deployment. I am actually debating whether there should be a separate project for model analysis and transformation. The workflow would be:
The original plan was to have the transformation step as part of treelite, but it may be beneficial to have a separate project for transformation and analysis. Let me get back to you on this decision.
Also, it would be nice if we could set a time to chat about what kind of model analysis you'd like to do. Can we schedule a time for a Google Hangout?
@RAMitchell That's great! I really want to come now. I'll let you know by the end of this week whether I'll be able to attend.
Make it happen @hcho3 :)
@Far0n I am able to attend h2oworld. I will see you next month.
Awesome @hcho3, looking forward to it!
So, what was the outcome of you all meeting in December?
Thank you @hcho3. I'm looking forward to it. Take your time though. :)