Skip to content

Conversation

@mupakoz
Copy link
Contributor

@mupakoz mupakoz commented Jul 17, 2015

Previous seed resulted in empty test data set.

@mupakoz mupakoz changed the title [MLlib][Doc] Fix mllib naive bayes example [MLlib][Doc] Seed fix in mllib naive bayes example Jul 17, 2015
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@srowen
Copy link
Member

srowen commented Jul 17, 2015

I can't reproduce this. On master this results in a test set with 1 element. changing to 13 yields a test set with 2 elements. I don't mind changing it but are you sure?

@mupakoz
Copy link
Contributor Author

mupakoz commented Jul 17, 2015

I'm getting an empty test set with spark-core_2.11 version 1.4.0 and spark-mllib_2.11 v. 1.4.0 used as SBT dependencies.

@srowen
Copy link
Member

srowen commented Jul 17, 2015

OK. This could be down to difference in Spark, Scala, the RNG across versions. If 13 works in your case, and still works for the plain-vanilla master + 2.10 build, I think we can just make this change.

@mupakoz
Copy link
Contributor Author

mupakoz commented Jul 17, 2015

So I thought. I can confirm that version 1.4.1 still results in empty test set.

@mengxr
Copy link
Contributor

mengxr commented Jul 17, 2015

We use Java's Random in randomSplit, which shouldn't be affected by the Scala version. Which JVM did you use?

@mupakoz
Copy link
Contributor Author

mupakoz commented Jul 17, 2015

1.8.0_45, Windows 8

@srowen
Copy link
Member

srowen commented Jul 17, 2015

I'm on 1.8.0_45 / OS X

@mengxr
Copy link
Contributor

mengxr commented Jul 18, 2015

@mupakoz I got the same result as @srowen . I don't think changing the seed is the solution. Instead, we should add more rows to sample_naive_bayes_data.txt. Could you do that instead?

@mupakoz
Copy link
Contributor Author

mupakoz commented Jul 18, 2015

Of course, no problem :). I'll prepare the commit later today.

Previous seed resulted in empty test data set.
@mupakoz
Copy link
Contributor Author

mupakoz commented Jul 18, 2015

@mengxr done. I added a few rows to the sample data and reverted the seed change. However, I had to add two rows for each class for the test set not to be empty after the split. Adding one row didn't resolve the problem.

@mengxr
Copy link
Contributor

mengxr commented Jul 18, 2015

LGTM. Merged into master (skipped Jenkins). Thanks!

@asfgit asfgit closed this in b9ef7ac Jul 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants