Monotone_constraints in LightGbm #1651
Comments
I agree this would be a useful addition to ML.NET. We provide a wrapper for LightGBM, but this parameter is not exposed. I will file this for our triage team to review and prioritize.
Here are links from LightGBM for adding monotone constraints. The version of LightGBM we are using in ML.NET is 2.2.1.1 -- need to confirm whether this version contains the support for monotone_constraints.
@daholste: Assuming I'm understanding this parameter correctly, this can also help with model stacking. Currently, LightGBM is allowed to map the output of the sub-models in the stack without the constraint that "as the sub-models' score increases/decreases, so should the final score". Hence it could map the sub-model scores non-monotonically.
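To make the stacking concern above concrete, here is a minimal sketch (not part of ML.NET or LightGBM; the helper name is invented for illustration) of the property a monotone constraint would guarantee: the final score never decreases as a sub-model's score increases.

```python
# Illustrative helper: given (sub_model_score, final_score) pairs, check
# whether the meta-learner's mapping is monotone non-decreasing.
def is_monotone_increasing(pairs):
    """True if final scores never decrease as the sub-model score increases."""
    pairs = sorted(pairs)            # sort by sub-model score
    finals = [f for _, f in pairs]   # final scores in that order
    return all(a <= b for a, b in zip(finals, finals[1:]))

# A constrained mapping satisfies the property; an unconstrained
# meta-learner is free to produce the second, non-monotone mapping.
print(is_monotone_increasing([(0.1, 0.2), (0.5, 0.4), (0.9, 0.8)]))  # True
print(is_monotone_increasing([(0.1, 0.2), (0.5, 0.9), (0.9, 0.3)]))  # False
```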
Update LightGBM and expose the arg. Note whether this fixes #1625 as well.
@glebuk: No need to update LightGBM. Our current LightGBM version is from 3 months ago and should already contain this support, so our current version of LightGBM should work without updating. Work item: expose the monotone_constraints parameter of LightGBM.
Hi @justinormont, @glebuk and @petterton, LightGBM allows for setting monotone constraints for each feature. For example, if you have a column called Features that is made up of 10 features, you can specify what constraint to use, in the order of the features, by setting the constraint value to either 1 for increasing, 0 for no constraint, or -1 for decreasing. So 1,-1,0,0,0..0 would apply an increasing constraint to the first feature, a decreasing constraint to the second feature, and no constraint to the remaining features. While having per-feature control is probably very powerful, I started to think about how the user would specify this in cases where they have a large number of features -- even with 20 or 30 features, no one wants to type in an array that long, as that is not only tedious but error prone. Do we need to control the constraint at a per-feature level? Or would applying the same constraint to all features suffice?
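The per-feature format described above can be sketched as follows (a stdlib-only illustration; the function names are invented, not part of either library, though the "1,-1,0" value convention matches LightGBM's monotone_constraints parameter). The second helper shows the "same constraint for all features" alternative raised in the question.

```python
# Sketch: parse a LightGBM-style monotone constraint string such as
# "1,-1,0,0" into a per-feature list; each value must be -1, 0, or 1.
def parse_constraints(spec: str) -> list:
    values = [int(tok) for tok in spec.split(",")]
    if any(v not in (-1, 0, 1) for v in values):
        raise ValueError("each constraint must be -1 (decreasing), "
                         "0 (none) or 1 (increasing)")
    return values

# Sketch of the "all or nothing" option: apply one constraint to every
# feature, so the user never types a long array by hand.
def uniform_constraints(constraint: int, num_features: int) -> list:
    return [constraint] * num_features

print(parse_constraints("1,-1,0,0"))  # [1, -1, 0, 0]
print(uniform_constraints(1, 5))      # [1, 1, 1, 1, 1]
```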
I think all or nothing is OK. I'd like to have control at the per-column level, but I'm not sure how to make it user friendly. My specific use case: I have a stacked model. The sub-models are rational, therefore I'd like them to be positive-monotone when combined to give the final score; I also want to feed some raw features to the final learner, as this stacking method shows promise. To get the same results, I currently duplicate the raw features. Two ways that are slightly user friendly:
This adds monotone constraint support to LightGBM. This is done through the LightGBM Options class via the MonotoneConstraints member. To handle the monotone constraints, this adds the ability to specify whether to use a positive constraint or a negative constraint, along with a range. Multiple ranges can be specified. This checkin also includes tests for the parsing of the ranges to validate the expected value that will be passed to LightGBM. This fixes dotnet#1651.
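The range-based specification the PR describes might look roughly like the following sketch. The "sign:start-end" syntax and the function name are invented for illustration (the PR's actual grammar is not shown in this thread); the only grounded part is the idea of expanding sign-plus-range pairs into the per-feature array LightGBM expects.

```python
# Hypothetical sketch: expand a spec like "1:0-2,-1:5-6" (a constraint
# sign plus an inclusive index range, comma-separated) into a per-feature
# constraint array, with 0 (no constraint) for unmentioned indices.
def expand_ranges(spec: str, num_features: int) -> list:
    result = [0] * num_features
    for part in spec.split(","):
        sign_str, rng = part.split(":")          # e.g. "-1", "5-6"
        start, end = (int(x) for x in rng.split("-"))
        for i in range(start, end + 1):          # ranges are inclusive
            result[i] = int(sign_str)
    return result

print(expand_ranges("1:0-2,-1:5-6", 8))  # [1, 1, 1, 0, 0, -1, -1, 0]
```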
After talking with Tom, this needs more thought on how it can be made more user friendly. The PR that I currently have here, #2330, has the user specifying the constraints based upon indices. This was primarily to handle the way LightGBM works, but ML.NET does not work this way, and using indices after concatenating columns into a Features column does not give a clear way to know which indices map to which features. Tom recommended doing something similar to how Categorical Features work: we manage the mapping of feature names to indices somewhere (such as in the metadata). This would allow the user to specify the constraints based upon a name (which would map to indices) rather than the specific indices. Also from talking with Tom, this work can be done post v1.0 as it would not involve any API changes. My vote is to pause on this for now and reinvestigate post 1.0. @shauheen and @TomFinley, feel free to comment if you have anything additional.
TL;DR: For the moment, I'd be quite happy to have purely positive / purely negative for all slots. I agree w/ @TomFinley. I have similar concerns about the usability. I would also like a style similar to what I think @TomFinley is proposing, the Categorical Features style; this is similar to the second style on my list. I thought it would be too hard, but if Tom thinks it's doable it seems like a great longer-term solution.
This also addresses the AutoML team's ability to use it for model stacking.
/cc @shauheen
What's the status on this? Reading the thread it seems to me like we can use monotonic constraints with LightGBM, but only if we constrain all features in the same way? Is there any documentation? |
@luisquintanilla, are you aware of any documentation on this? Closing this issue as it's been resolved by the PR, but please post any documentation on it here anyway.
@michaelgsharp we don't have docs that I'm aware of. If you can point me to the PR that solved this issue, or any related tests, we can add this to our docs backlog.
In my ML problem I get significantly better results from LightGbm by using the monotone_constraints parameter. I cannot see that this is available through the ML.NET interface. Could this be added?