[MLlib][Doc] Seed fix in mllib naive bayes example #7477

mupakoz · 2015-07-17T19:13:44Z

Previous seed resulted in empty test data set.

AmplabJenkins · 2015-07-17T19:17:13Z

Can one of the admins verify this patch?

srowen · 2015-07-17T19:40:18Z

I can't reproduce this. On master this results in a test set with 1 element. Changing the seed to 13 yields a test set with 2 elements. I don't mind changing it, but are you sure?

mupakoz · 2015-07-17T19:48:52Z

I'm getting an empty test set with spark-core_2.11 and spark-mllib_2.11, both version 1.4.0, used as SBT dependencies.

srowen · 2015-07-17T19:52:13Z

OK. This could be down to differences in Spark, Scala, or the RNG across versions. If 13 works in your case, and still works for the plain-vanilla master + 2.10 build, I think we can just make this change.

mupakoz · 2015-07-17T19:57:57Z

So I thought. I can confirm that version 1.4.1 still results in an empty test set.

mengxr · 2015-07-17T19:59:26Z

We use Java's Random in randomSplit, which shouldn't be affected by the Scala version. Which JVM did you use?
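For illustration, a seeded weighted split can be sketched as one uniform draw per row, with the row assigned to whichever cumulative-weight interval the draw falls in. This is a minimal sketch and not Spark's implementation: Python's `random.Random` stands in for the JVM generator, the name `random_split` is hypothetical, and seeds used here will not reproduce the splits Spark produces.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed):
    """Assign each row to exactly one split using a single uniform draw per row.

    Hypothetical helper for illustration; it is not Spark's randomSplit.
    """
    total = sum(weights)
    # Cumulative upper bounds of the normalized weights, e.g. [0.6, 1.0].
    bounds = list(accumulate(w / total for w in weights))
    bounds[-1] = 1.0  # guard against floating-point rounding in the last bound
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            if u < upper:
                splits[i].append(row)
                break
    return splits


rows = list(range(6))  # stand-in for a very small data set
first = random_split(rows, [0.6, 0.4], seed=11)
second = random_split(rows, [0.6, 0.4], seed=11)
print(first == second)  # the same seed and the same generator give the same split

# Seeds (in this sketch only) that leave the second split empty.
empty_seeds = [s for s in range(200) if not random_split(rows, [0.6, 0.4], s)[1]]
print(empty_seeds)
```

With a fixed seed the outcome depends only on the generator and on the order in which rows are visited, so a difference in either between environments could change which rows land in the test split.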

mupakoz · 2015-07-17T20:00:01Z

1.8.0_45, Windows 8

srowen · 2015-07-17T20:15:10Z

I'm on 1.8.0_45 / OS X

mengxr · 2015-07-18T03:11:30Z

@mupakoz I got the same result as @srowen. I don't think changing the seed is the solution. Instead, we should add more rows to sample_naive_bayes_data.txt. Could you do that instead?

mupakoz · 2015-07-18T06:12:53Z

Of course, no problem :). I'll prepare the commit later today.

mupakoz · 2015-07-18T07:45:13Z

@mengxr done. I added a few rows to the sample data and reverted the seed change. However, I had to add two rows for each class for the test set not to be empty after the split. Adding one row didn't resolve the problem.
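As a rough illustration of why a larger sample file makes an empty test set less likely: if each row is assigned to the test split independently with probability p, the chance that the split receives no rows at all is (1 - p)^n for n rows. The sketch below assumes a 60/40 split and row counts of 6, 9, and 12; both are assumptions for illustration, not values taken from this thread. For any single fixed seed the outcome is deterministic, so this only describes how often an arbitrary seed would fail.

```python
def p_empty_split(n_rows, split_weight):
    """Probability that a split with the given weight receives none of n_rows rows,
    assuming each row is assigned independently."""
    return (1.0 - split_weight) ** n_rows


# Hypothetical row counts; the 0.4 test weight is an assumption.
for n in (6, 9, 12):
    print(n, p_empty_split(n, 0.4))
```

Under these assumptions the probability falls geometrically with the number of rows, which is consistent with enlarging the data set being a more robust fix than picking a different seed.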

mengxr · 2015-07-18T17:13:37Z

LGTM. Merged into master (skipped Jenkins). Thanks!

mupakoz changed the title ~~[MLlib][Doc] Fix mllib naive bayes example~~ [MLlib][Doc] Seed fix in mllib naive bayes example Jul 17, 2015

Mllib Naive Bayes example data set enlarged

f5d41ee

Previous seed resulted in empty test data set.

mupakoz force-pushed the patch-1 branch from e4a53d1 to f5d41ee on July 18, 2015 07:41

asfgit closed this in b9ef7ac Jul 18, 2015
