[MLlib][Doc] Seed fix in mllib naive bayes example #7477
|
Can one of the admins verify this patch? |
|
I can't reproduce this. On master, this results in a test set with 1 element; changing the seed to 13 yields a test set with 2 elements. I don't mind changing it, but are you sure? |
|
I'm getting an empty test set with spark-core_2.11 version 1.4.0 and spark-mllib_2.11 version 1.4.0, used as SBT dependencies. |
|
OK. This could be down to differences in Spark, Scala, or the RNG across versions. If 13 works in your case and still works for the plain-vanilla master + 2.10 build, I think we can just make this change. |
|
That's what I thought. I can confirm that version 1.4.1 still results in an empty test set. |
|
We use Java's Random in randomSplit, which shouldn't be affected by the Scala version. Which JVM did you use? |
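Because the split is driven by a seeded pseudo-random generator, a given seed always produces the same split on a given implementation, and a tiny dataset can legitimately end up with no test rows. The sketch below illustrates that idea only. It assumes a simplified per-row Bernoulli assignment driven by a single `java.util.Random`; Spark's own `randomSplit` sampler is more involved and may differ. The class `SeededSplitSketch` and method `split` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededSplitSketch {
    // Hypothetical helper (not Spark's sampler): each row goes to the test
    // set with probability testFraction, using one Random seeded once.
    // Returns [train, test].
    static List<List<Integer>> split(List<Integer> rows, double testFraction, long seed) {
        Random rng = new Random(seed);
        List<Integer> train = new ArrayList<>();
        List<Integer> test = new ArrayList<>();
        for (Integer row : rows) {
            // nextDouble() is uniform in [0, 1), so this is a Bernoulli(testFraction) draw.
            if (rng.nextDouble() < testFraction) {
                test.add(row);
            } else {
                train.add(row);
            }
        }
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(train);
        parts.add(test);
        return parts;
    }

    public static void main(String[] args) {
        // Assumed tiny dataset of 6 rows and a 60/40 split, for illustration only.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            rows.add(i);
        }
        // Print the train/test sizes for a range of seeds; some may give an empty test set.
        for (long seed = 0; seed < 20; seed++) {
            List<List<Integer>> parts = split(rows, 0.4, seed);
            System.out.println("seed=" + seed
                    + " train=" + parts.get(0).size()
                    + " test=" + parts.get(1).size());
        }
        // Reusing a seed reproduces exactly the same split.
        System.out.println("deterministic: "
                + split(rows, 0.4, 11L).equals(split(rows, 0.4, 11L)));
    }
}
```

Running it lists which seeds leave the test set empty for this toy setup; the specific seeds will not match Spark's, since the sampling scheme is deliberately simplified.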
|
1.8.0_45, Windows 8 |
|
I'm on 1.8.0_45 / OS X |
|
Of course, no problem :). I'll prepare the commit later today. |
Previous seed resulted in empty test data set.
|
@mengxr Done. I added a few rows to the sample data and reverted the seed change. However, I had to add two rows for each class so that the test set would not be empty after the split; adding one row didn't resolve the problem. |
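As a rough back-of-envelope for why adding rows helps, assume each of $n$ rows independently lands in the test set with probability $p$ (an idealization of a seeded Bernoulli split; the test fraction $p = 0.4$ and the row counts below are illustrative assumptions, not taken from the example's data). Over random seeds, the chance of an empty test set then shrinks geometrically with $n$:

$$
P(\text{test set empty}) = (1-p)^n, \qquad
p = 0.4:\quad n = 6 \Rightarrow 0.6^{6} \approx 0.047, \qquad n = 10 \Rightarrow 0.6^{10} \approx 0.006.
$$

For any single fixed seed the outcome is deterministic rather than probabilistic, so a particular small addition of rows can still leave the test set empty, as happened here with one extra row.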
|
LGTM. Merged into master (skipped Jenkins). Thanks! |