[SPARK-21623][ML]fix RF doc #18832
Test build #80199 has finished for PR 18832 at commit
|
The comment is not wrong. It's added for when we are finding the best split, to compute the right child stats from the left child stats. We would have just used the stats that are already available on the node

    var gainAndImpurityStats: ImpurityStats = if (level == 0) {
      null
    } else {
      node.stats
    }

Otherwise, instead of parent stats we would just use |
node.stats is ImpurityStats, and parentStats is Array[Double]; they are different. Maybe this comment should apply to node.stats, but not to parentStats. Is my understanding wrong? |
parentStats is used in this code: binAggregates.getParentImpurityCalculator(), which is used in every iteration. |
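To make the point about unordered features concrete, here is a minimal sketch of deriving right-child stats from parent stats and left-child stats. It assumes the stats are per-class label counts held in an Array[Double]; `UnorderedSplitSketch`, `rightFromParent` and `gini` are hypothetical names for illustration, not Spark's private API.

```scala
// Hypothetical, simplified stand-ins; these are not Spark's actual classes.
object UnorderedSplitSketch {
  // Per-class label counts for a set of rows (assumed representation).
  type Stats = Array[Double]

  // Right-child stats derived element-wise as parent minus left.
  def rightFromParent(parent: Stats, left: Stats): Stats = {
    require(parent.length == left.length, "stats must have the same length")
    parent.zip(left).map { case (p, l) => p - l }
  }

  // Gini impurity from class counts: 1 - sum((count / total)^2).
  def gini(stats: Stats): Double = {
    val total = stats.sum
    if (total == 0.0) 0.0
    else 1.0 - stats.map(c => (c / total) * (c / total)).sum
  }
}
```

Because only the left-child counts need to be aggregated per candidate split in this sketch, the parent counts have to be available every time a split is evaluated, not just at the root.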
I don't agree the comment is misleading. It might be confusing, but that's something different. The reason that the [...] If you want to change it, then something like "Parent stats need to be explicitly tracked in the [...]" I doubt that's much clearer. Just to note, this comment was intended for developers anyway, since it's all private APIs. |
I see your point. |
No, I don't think so. Computing parent stats is a very small fraction of the time and memory compared with the overall |
I agree with you. Do you think we should update the comment to help others understand the code? |
If you want to change it, that's fine. I think it's fine either way. |
Thanks @sethah. |
@mpjlu can you either close or update the change to reflect your input and Seth's? |
Thanks @srowen, I revised the comment per Seth's suggestion: "Parent stats need to be explicitly tracked in the DTStatsAggregator because the parent [[Node]] object does not have ImpurityStats on the first iteration." |
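As a rough illustration of the revised comment, the sketch below models a root node that carries no stats on the first iteration, so the aggregator accumulates the parent's label counts itself. All names (`SketchNode`, `SketchAggregator`, `aggregate`) are hypothetical and the model is heavily simplified; it is not how DTStatsAggregator is actually implemented.

```scala
// Hypothetical, simplified model; names do not match Spark's private API.
object ParentStatsSketch {
  // A node whose stats are absent until a split has been computed for it.
  final case class SketchNode(stats: Option[Array[Double]])

  // Tracks per-class counts for the node currently being split.
  final class SketchAggregator(numClasses: Int) {
    private val parentStats = new Array[Double](numClasses)

    // Add one labeled row (with an optional weight) to the parent counts.
    def update(label: Int, weight: Double = 1.0): Unit = {
      parentStats(label) += weight
    }

    // Return a copy so callers cannot mutate the internal buffer.
    def getParentStats: Array[Double] = parentStats.clone()
  }

  // Build parent stats from the rows that reach the node.
  def aggregate(labels: Seq[Int], numClasses: Int): Array[Double] = {
    val agg = new SketchAggregator(numClasses)
    labels.foreach(l => agg.update(l))
    agg.getParentStats
  }
}
```

In this model, a root created as `SketchNode(stats = None)` has nothing to read from, while `aggregate(Seq(0, 0, 1), numClasses = 2)` still yields the parent counts needed to evaluate splits.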
Test build #80331 has finished for PR 18832 at commit
|
Test build #80333 has finished for PR 18832 at commit
|
merged to master |
What changes were proposed in this pull request?
The comment on parentStats in RF is wrong.
parentStats is not only used in the first iteration; it is used in every iteration for unordered features.
How was this patch tested?