Monotone_constraints in LightGbm #1651

Closed · Tracked by #6337
petterton opened this issue Nov 16, 2018 · 12 comments
@petterton

In my ML problem I get significantly better results from LightGbm by using the monotone_constraints parameter. I can not see that this is available through the ML.NET interface. Could this be added?

@najeeb-kazmi (Member)

I agree this would be a useful addition to ML.NET. We provide a wrapper for LightGBM but this parameter is not exposed. I will file this for our triage team to review and prioritize.

@najeeb-kazmi added the enhancement (New feature or request) and API (Issues pertaining the friendly API) labels Nov 16, 2018
@singlis (Member) commented Dec 21, 2018

Here are links from LightGBM for adding monotone constraints:
Issue filed here: microsoft/LightGBM#14
Committed here: microsoft/LightGBM#1314

The version of LightGBM we are using in ML.NET is 2.2.1.1 -- we need to confirm whether this version contains support for monotone_constraints.

@justinormont (Contributor)

@daholste: Assuming I'm understanding this parameter correctly, this can also help with model stacking. Currently, LightGBM is allowed to map the output of the sub-models in the stack without the constraint that "as a sub-model's score increases/decreases, so should the final score". Hence it could map as f(x) = ( x < 3 ? 1.0 : (x < 4 ? 0.0 : 2.0 )). When the sub-models correlate well with the label, we would likely benefit from a monotonically increasing meta-model for stacking.
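
For illustration, a minimal sketch of such a monotone meta-model using LightGBM's Python package (synthetic data; the assumption that each column of the meta-features is one sub-model's score is mine):

```python
import lightgbm as lgb
import numpy as np

# Synthetic stand-ins: each column of `scores` is one sub-model's output.
rng = np.random.default_rng(0)
scores = rng.random((1000, 3))
y = scores @ np.array([0.5, 0.3, 0.2]) + 0.05 * rng.standard_normal(1000)

# Constrain the meta-model to be non-decreasing in every sub-model score,
# so a higher sub-model score can never lower the final prediction.
meta = lgb.LGBMRegressor(monotone_constraints=[1, 1, 1], n_estimators=100)
meta.fit(scores, y)
```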

@glebuk added this to To Do in v0.10 Jan 14, 2019
@glebuk (Contributor) commented Jan 14, 2019

Update LightGBM and expose the arg. Note whether this fixes #1625 as well.

@justinormont (Contributor) commented Jan 17, 2019

@glebuk: No need to update LightGBM for monotone_constraints.

Our current LightGBM version is from 3 months ago; monotone_constraints was added 9 months ago.

So our current version of LightGBM should work without updating.

Work item should be: Expose the monotone_constraints parameter of LightGBM

@singlis self-assigned this Jan 18, 2019
@singlis moved this from To Do to In Progress in v0.10 Jan 18, 2019
@singlis (Member) commented Jan 23, 2019

Hi @justinormont, @glebuk and @petterton,

LightGBM allows for the setting of monotone constraints for each feature. For example, if you have a column called Features that is made up of 10 features, you can specify which constraint to use for each feature, in feature order, by setting the constraint value to 1 for increasing, 0 for no constraint, or -1 for decreasing. So 1,-1,0,0,0..0 would apply an increasing constraint to the first feature, a decreasing constraint to the second feature, and no constraint to the remaining features.
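
For example, with LightGBM's Python package directly, that looks like this (a minimal sketch on synthetic data; ML.NET would need to forward an equivalent vector):

```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(1000)

# 1 = increasing, -1 = decreasing, 0 = unconstrained, in feature order.
params = {
    "objective": "regression",
    "monotone_constraints": [1, -1, 0, 0],
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)
```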

While per-feature control is probably very powerful, I started thinking about how the user would specify this when they have a large number of features -- even with 20 or 30 features, no one wants to type in an array that long, as that is not only tedious but error-prone.

Do we need to control the constraint at a per feature level? Or would applying the same constraint to all features suffice?

@justinormont (Contributor) commented Jan 23, 2019

I think all or nothing is ok.

I'd like to have control at the per column level, but I'm not sure how to make it user friendly.

My specific use case: I have a stacked model. The sub-models are rational, therefore I'd like them to be positive-monotone when combined to give the final score; I also want to feed some raw features to the final learner, as this stacking method shows promise. To get the same results, I currently duplicate the raw features as NegRawFeat = RawFeat * -1, then xf=Concat{Features:SubModelScores,NegRawFeat,RawFeat} tr=LogisticRegression{nn=+}. This accomplishes the goal (for LR, though not for LightGBM), though it loses the slot names, making feature importance difficult.
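
Roughly the same trick sketched in scikit-learn terms (synthetic data; LinearRegression(positive=True) stands in for the nn=+ non-negativity option, since sklearn's LogisticRegression has no such flag):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
sub_scores = rng.random((500, 3))   # stand-ins for sub-model outputs
raw = rng.random((500, 2))          # raw features also fed to the meta-model
y = sub_scores @ np.array([0.5, 0.3, 0.2]) + raw[:, 0] - raw[:, 1]

# Duplicate the raw features negated, then force non-negative coefficients:
# a positive weight on `raw` gives an increasing effect and a positive
# weight on `-raw` a decreasing one, so the fit stays monotone per input.
X = np.hstack([sub_scores, raw, -raw])
meta = LinearRegression(positive=True).fit(X, y)
```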

Two ways that are slightly user-friendly:

  • I suppose we could have three input columns: { Features, PosFeatures, NegFeatures }, where Features is the normal one as currently. Then users can concatenate the columns they want to be positive-monotone into PosFeatures. I think inventing new input columns is confusing to users, so this is bad.
  • I don't think it is possible, but another route would be to take in a single Features column, as currently, then the user specifies the names of the columns (within the Features column) that they want as positive/negative. I don't think it's possible to locate the slots within the Features column corresponding to the user's specified columns. Perhaps match on slot names?

singlis added a commit to singlis/machinelearning that referenced this issue Jan 30, 2019
Adds support for monotone constraints in LightGBM. This is done through the LightGBM Options class via the MonotoneConstraints member. To handle the monotone constraints, this adds the ability to specify whether to use a positive constraint or a negative constraint along with a range. Multiple ranges can be specified.

This checkin also includes tests for the parsing of the ranges to validate the expected value that will be passed to LightGBM.

This fixes dotnet#1651.
@shauheen removed this from In Progress in v0.10 Jan 31, 2019
@shauheen added this to In Progress in v0.11 Jan 31, 2019
@singlis (Member) commented Feb 7, 2019

After talking with Tom, this needs more thought on how it can be made more user-friendly. The PR that I currently have here (#2330) has the user specifying the constraints based upon indices. This was primarily to handle the way LightGBM works, but ML.NET does not work this way, and using indices after concatenating columns into a Features column does not give a clear way to know which indices map to which features.
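
Schematically, an index/range-based spec expands into LightGBM's positional vector something like this (an illustrative sketch only, not the PR's actual syntax):

```python
# Illustrative only -- not the PR's actual syntax. Expand range entries
# like "+:0-2" (increasing over slots 0..2) into LightGBM's positional
# vector of 1 / -1 / 0 per feature.
def expand_ranges(spec, num_features):
    vec = [0] * num_features
    for entry in spec.split(","):
        sign, span = entry.split(":")
        lo, hi = (int(i) for i in span.split("-"))
        for i in range(lo, hi + 1):
            vec[i] = 1 if sign == "+" else -1
    return vec

print(expand_ranges("+:0-2,-:5-7", 10))  # [1, 1, 1, 0, 0, -1, -1, -1, 0, 0]
```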

Tom recommended doing something similar to how Categorical Features works: we would manage the mapping of feature names to indices somewhere (such as in the metadata). This would allow the user to specify the constraints by name (which would map to indices) rather than by the specific indices.
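
The name-based version would look roughly like this (again a hypothetical sketch; the feature names and resolution step are invented for illustration):

```python
# Hypothetical user-facing spec: constraints keyed by feature name, with
# the framework resolving slot names to indices behind the scenes.
feature_names = ["age", "price", "clicks", "region_0", "region_1"]  # slot names
constraints_by_name = {"age": 1, "price": -1}

monotone_vector = [constraints_by_name.get(n, 0) for n in feature_names]
print(monotone_vector)  # [1, -1, 0, 0, 0]
```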

Also from talking with Tom, this work can be done post v1.0, as it would not require any API changes.

My vote is to pause on this for now and reinvestigate post 1.0.

@shauheen and @TomFinley feel free to comment if you have anything additional.

@justinormont (Contributor)

TL;DR: For the moment, I'd be quite happy to have purely positive / purely negative for all slots.


I agree w/ @TomFinley. I have similar concerns about the usability.

I would also like something in the style of what I think @TomFinley is proposing (the Categorical Features style). This is similar to the second option on my list; I thought it would be too hard, but if Tom thinks it's doable, it seems like a great longer-term solution.
#1651 (comment)

  • I don't think it is possible, but another route would be to take in a single Features column, as currently, then the user specifies the names of the columns (within the Features column) that they want as positive/negative. I don't think it's possible to locate the slots within the Features column corresponding to the user's specified columns. Perhaps match on slot names?

For the moment, I'd be quite happy to have purely positive / purely negative for all slots. This covers the AutoML team's model-stacking use case:

#1651 (comment)

@daholste: Assuming I'm understanding this parameter correctly, this can also help with model stacking. Currently, LightGBM is allowed to map the output of the sub-models in the stack without the constraint that "as a sub-model's score increases/decreases, so should the final score". Hence it could map as f(x) = ( x < 3 ? 1.0 : (x < 4 ? 0.0 : 2.0 )). When the sub-models correlate well with the label, we would likely benefit from a monotonically increasing meta-model for stacking.

#1651 (comment)

My specific use case: I have a stacked model. The sub-models are rational, therefore I'd like them to be positive-monotone when combined to give the final score; I also want to feed some raw features to the final learner, as this stacking method shows promise. To get the same results, I currently duplicate the raw features as NegRawFeat = RawFeat * -1, then xf=Concat{Features:SubModelScores,NegRawFeat,RawFeat} tr=LogisticRegression{nn=+}. This accomplishes the goal (for LR, though not for LightGBM), though it loses the slot names, making feature importance difficult.

/cc @shauheen

@shauheen added this to To Do in Backlog via automation Mar 1, 2019
@shauheen removed this from In Progress in v0.11 Mar 1, 2019
@harishsk added the P2 (Priority of the issue for triage purpose: needs to be fixed at some point) label Jan 10, 2020
@KyBroecker

What's the status on this? Reading the thread it seems to me like we can use monotonic constraints with LightGBM, but only if we constrain all features in the same way? Is there any documentation?

@michaelgsharp (Member)

@luisquintanilla are you aware of any documentation on this? Closing this issue as it's been resolved by the PR, but please post any documentation on it here anyway.

@luisquintanilla (Contributor)

@michaelgsharp we don't have docs that I'm aware of. If you can point me to the PR that solved this issue or any related tests, we can add this to our docs backlog.

@ghost locked as resolved and limited conversation to collaborators Jan 5, 2023