SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel #242
if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
regressionTreeNode = regressionTreeNode.left;
} else {
if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {
I think we should keep it consistent; isn't it <= in XGBoost and similar?
This is a pretty big change, and I think it shouldn't be necessary.
The == going left or right is an interesting question! One way to maintain backwards compatibility whilst supporting "go right instead of left" could be via an optional configuration element, e.g.
- "params" : { "trees" : [ ... ] }
+ "params" : { "trees" : [ ... ], "splitToRight" : true }
And if there is a scoreNode change then explainNode requires a matching change, I think.
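A minimal sketch of how such a flag could keep scoring and explanation in step. This is not Solr's actual code: the `SplitSketch` class, its `Node` type, and the `goesLeft` helper are hypothetical names made up for illustration, and `splitToRight` is only the option proposed in this thread. The idea is that both traversals call one shared comparison, so they cannot drift apart.

```java
public class SplitSketch {

    /** Minimal stand-in for a tree node; NOT Solr's actual RegressionTreeNode. */
    public static final class Node {
        final int featureIndex;
        final float threshold;
        final float value; // leaf output; unused on branch nodes
        final Node left;
        final Node right;

        /** Leaf node. */
        public Node(float value) {
            this.featureIndex = -1;
            this.threshold = 0f;
            this.value = value;
            this.left = null;
            this.right = null;
        }

        /** Branch node. */
        public Node(int featureIndex, float threshold, Node left, Node right) {
            this.featureIndex = featureIndex;
            this.threshold = threshold;
            this.value = 0f;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Single place that decides the branch, shared by score() and explain(). */
    static boolean goesLeft(Node node, float[] featureVector, boolean splitToRight) {
        float v = featureVector[node.featureIndex];
        // splitToRight (hypothetical flag): an equal value goes right instead of left.
        return splitToRight ? v < node.threshold : v <= node.threshold;
    }

    public static float score(Node root, float[] featureVector, boolean splitToRight) {
        Node node = root;
        while (!node.isLeaf()) {
            node = goesLeft(node, featureVector, splitToRight) ? node.left : node.right;
        }
        return node.value;
    }

    public static String explain(Node root, float[] featureVector, boolean splitToRight) {
        StringBuilder sb = new StringBuilder();
        Node node = root;
        while (!node.isLeaf()) {
            boolean left = goesLeft(node, featureVector, splitToRight);
            sb.append("feature[").append(node.featureIndex).append("] ")
              .append(left ? "left" : "right").append(" of ").append(node.threshold).append("; ");
            node = left ? node.left : node.right;
        }
        return sb.append("leaf=").append(node.value).toString();
    }
}
```

With a single split at threshold 1.0 and a feature value of exactly 1.0, `score` returns the left leaf when the flag is false and the right leaf when it is true, and `explain` reports the same branch in both cases.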
We ran into this with a specific XGBoost model in which a variable can take the value 0 or 1 and the split condition is 1.0. In that case, if we go left by default, the right path is never evaluated, which silently degrades model performance.
But I agree it's a rather big change, and there might be issues with a) other tree models that split to the left and b) backwards incompatibility, so @cpoerschke's suggestion makes a lot of sense.
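To make the 0/1 case concrete, here is a small sketch comparing the two comparisons for a binary feature and a threshold of 1.0. The class and method names are hypothetical and exist only for this illustration. It assumes, as reported in this thread, that the XGBoost model sends a value equal to the threshold to the right.

```java
public class BinarySplitDemo {

    /** Inclusive comparison: an equal value goes left. */
    public static boolean goesLeftInclusive(float value, float threshold) {
        return value <= threshold;
    }

    /** Strict comparison: an equal value goes right. */
    public static boolean goesLeftStrict(float value, float threshold) {
        return value < threshold;
    }

    public static void main(String[] args) {
        float threshold = 1.0f;
        // A binary feature can only be 0 or 1.
        for (float v : new float[] {0.0f, 1.0f}) {
            System.out.println("value=" + v
                + " inclusive->" + (goesLeftInclusive(v, threshold) ? "left" : "right")
                + " strict->" + (goesLeftStrict(v, threshold) ? "left" : "right"));
        }
    }
}
```

Under the inclusive comparison both possible values go left, so the right subtree is unreachable; under the strict comparison 0 goes left and 1 goes right.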
I think == should always be on the left. But I agree with splitToRight for backward compatibility.
@diegoceccarelli not sure if there are different tree models that either split to left or right when equal. At least in the case of the XGBoost model, it seems to split to the right by default. I'll check if I can add that flag functionality on the PR.
"weight" : "1f",
"root": {
"feature": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs",
"threshold": "1.0f",
We should verify this; I think the training algorithm generates branches as <= going left and > going right.
@@ -249,5 +274,5 @@ public void multipleAdditiveTreesTestUnknownFeature(){
});
assertEquals(expectedException.toString(), ex.toString());
}
unintended?
I took a stab at the 'splitToRight' flag implementation. Please review and let me know of any issues, thanks!
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution!
https://issues.apache.org/jira/browse/SOLR-15569
Description
Fixes an issue where the MultipleAdditiveTreesModel does not split correctly when the tree node value equals the split threshold. This was discovered while testing a translated XGBoost model for LTR and getting slightly different score results.
NOTE: The previous logic split to the left and also added a NODE_SPLIT_SLACK that has been removed. Not sure if this original logic was part of another model, or served another purpose.