[SPARK-19953][ML] Random Forest Models use parent UID when being fit #17296
Test build #74557 has finished for PR 17296 at commit
ping @jkbradley @MLnick
Seems fine. Are there other instances of this happening? I'm wondering why the test cases did not pick this up... seems like we should have a standard test case for it?
Thanks @MLnick! I checked and didn't see this happening anywhere else. It's not great to put into a test case because it requires training to get the model. It could be tacked onto an existing test, but I don't know if it's really worth it. The Python tests will eventually cover this after SPARK-10931, which is where I discovered it.
Hmm, I would prefer to test it, though I do get that it's pretty tricky to do generically. I don't think training a tiny model on a couple of data points will add too much overhead.
… added check to suites where missing
@MLnick, I found an existing
cc @MLnick @jkbradley, I updated with the latest and added a check that the model uid matches its parent's. I don't think it's great that this check is tacked on to various other tests, because that makes it easy to forget when adding additional algorithms. Hopefully this is good enough for now to get this fix in, and I can still follow up with another JIRA to refactor basic checks like this and make them more consistent.
Test build #75488 has finished for PR 17296 at commit
High-level seems good now, though there are new conflicts in …
Did you create a JIRA to track the broader issue of making the testing more generic? Or at least we could perhaps try to "enforce" the tests through a test trait (e.g. …). Of course we would still need to ensure new tests implement the trait, but if all existing tests are adapted in this way, it provides the blueprint going forward. The only other way I can think of would be via some reflection approach (but the correct form of dataset would need to be generated for each estimator...).
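The test-trait idea in the comment above could look roughly like the following. This is only a sketch in plain Scala with no Spark or test-framework dependency; every name in it (HasUid, SimpleEstimator, SimpleModel, UidCheck, MeanEstimator, MeanModel, MeanSuite) is hypothetical and not part of Spark. A suite that mixes in the trait has to supply an estimator and a tiny dataset, and gets the uid check without having to remember to write it.

```scala
// Hypothetical minimal interfaces; Spark's real Estimator/Model types are richer.
trait HasUid { def uid: String }
trait SimpleModel extends HasUid
trait SimpleEstimator[M <: SimpleModel] extends HasUid {
  def fit(data: Seq[Double]): M
}

// Mixing this in forces a suite to provide an estimator and a tiny dataset;
// the parent-uid check is then inherited rather than hand-written per suite.
trait UidCheck[M <: SimpleModel] {
  def estimator: SimpleEstimator[M]
  def tinyData: Seq[Double]

  def checkModelUid(): Unit = {
    val est = estimator
    val model = est.fit(tinyData)
    assert(model.uid == est.uid,
      s"model uid ${model.uid} does not match parent uid ${est.uid}")
  }
}

// Toy estimator/model pair used only to exercise the trait.
final class MeanModel(val uid: String, val mean: Double) extends SimpleModel
final class MeanEstimator(val uid: String) extends SimpleEstimator[MeanModel] {
  // Passes its own uid to the model it produces.
  def fit(data: Seq[Double]): MeanModel = new MeanModel(uid, data.sum / data.size)
}

object MeanSuite extends UidCheck[MeanModel] {
  val estimator = new MeanEstimator("mean_1")
  val tinyData = Seq(1.0, 2.0, 3.0)
}
```

Calling MeanSuite.checkModelUid() trains on three points and compares the two uids. As the comment notes, this does not stop a new suite from skipping the trait altogether; it only makes the expected pattern explicit.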
…nt-uid-SPARK-19953
Test build #75551 has finished for PR 17296 at commit
@MLnick this should be good to go. I made https://issues.apache.org/jira/browse/SPARK-20234 to address some better consistency in these basic checks.
LGTM, merged to master. Thanks for creating the follow up JIRA.
What changes were proposed in this pull request?
The ML RandomForestClassificationModel and RandomForestRegressionModel were not using the estimator parent UID when being fit. This change fixes that so the models can be properly identified with their parents.
How was this patch tested?
Existing tests.
Added a check to verify that the model uid matches that of the parent, then renamed checkCopy to checkCopyAndUids and verified that it was called by one test for each ML algorithm.
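To make the bug concrete, here is a minimal sketch in plain Scala with no Spark dependency. ToyEstimator, ToyModel, Uid, fitBuggy, and fitFixed are hypothetical stand-ins, not Spark's classes, and the uid format is only assumed to be a prefix plus a random suffix. It shows how a model built through a constructor that invents its own uid loses the link to its estimator, and what kind of comparison a check such as the one added to checkCopyAndUids would make.

```scala
import java.util.UUID

// Hypothetical helper: produces a prefixed random identifier.
object Uid {
  def random(prefix: String): String =
    prefix + "_" + UUID.randomUUID().toString.takeRight(12)
}

// Hypothetical model: the auxiliary constructor invents a fresh uid,
// unrelated to whichever estimator produced the model.
final class ToyModel(val uid: String, val numTrees: Int) {
  def this(numTrees: Int) = this(Uid.random("toy"), numTrees)
}

// Hypothetical estimator with both the buggy and the fixed construction.
final class ToyEstimator(val uid: String = Uid.random("toy")) {
  // Buggy: the model ends up with its own random uid.
  def fitBuggy(numTrees: Int): ToyModel = new ToyModel(numTrees)
  // Fixed: the estimator's uid is passed through to the model.
  def fitFixed(numTrees: Int): ToyModel = new ToyModel(uid, numTrees)
}

object UidDemo {
  def main(args: Array[String]): Unit = {
    val est = new ToyEstimator()
    val buggy = est.fitBuggy(3)
    val fixed = est.fitFixed(3)
    // The comparison a parent-uid check performs.
    assert(fixed.uid == est.uid)
    assert(buggy.uid != est.uid)
    println(s"estimator=${est.uid} fixed=${fixed.uid} buggy=${buggy.uid}")
  }
}
```

The point of the sketch is that the buggy path still type-checks and produces a working model, which is why only an explicit uid comparison in the tests would catch it.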