[SPARK-2864][MLLIB] fix random seed in word2vec; move model to local #1790
```diff
@@ -246,22 +246,24 @@ class Word2Vec(
     }

     val newSentences = sentences.repartition(parallelism).cache()
     val seed = 5875483L
```
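The hunk fixes one seed right after the sentences are repartitioned. When several partitions train in parallel, each one usually needs its own random stream, and a common approach is to derive a distinct but deterministic seed per partition index from the base seed. The sketch below assumes that approach; `partitionSeed`, `sampleStreams`, and the mixing constant are hypothetical and only illustrate the idea, not what Word2Vec actually does.

```scala
import scala.util.Random

// Hypothetical mixing scheme: combine the base seed with the partition index so
// that every partition gets a different, but reproducible, seed.
def partitionSeed(baseSeed: Long, partitionIndex: Int): Long =
  baseSeed ^ ((partitionIndex + 1).toLong * 1000003L)

// Draw a few samples per partition, as each training task might do.
def sampleStreams(baseSeed: Long, numPartitions: Int, n: Int): Seq[Seq[Int]] =
  (0 until numPartitions).map { idx =>
    val rng = new Random(partitionSeed(baseSeed, idx))
    Seq.fill(n)(rng.nextInt(1000))
  }

val seed = 5875483L
val runA = sampleStreams(seed, numPartitions = 4, n = 5)
val runB = sampleStreams(seed, numPartitions = 4, n = 5)
```

Two runs with the same base seed produce identical per-partition streams, which is what makes a fixed seed reproducible even after repartitioning.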
This fixes the seed not just for unit tests but for every call. Wouldn't it be better to make the RNG injectable via a discreet package-private setter and let the tests inject a seeded RNG?
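A minimal sketch of the reviewer's suggestion, assuming a trainer that draws a fresh seed by default and exposes a setter so tests can pin it. `Trainer`, `setSeed`, and `initWeights` are hypothetical names. The contrast it shows: a hardcoded seed makes every call identical, while a random default plus a setter keeps normal runs independent and tests reproducible.

```scala
import scala.util.Random

// Hypothetical trainer: each instance starts with a fresh random seed, which
// tests can override. In a library this setter could be package-private.
class Trainer {
  private var seed: Long = Random.nextLong()

  def setSeed(newSeed: Long): this.type = {
    seed = newSeed
    this
  }

  // Stand-in for random weight initialization, values in [-0.5, 0.5).
  def initWeights(n: Int): Vector[Double] = {
    val rng = new Random(seed)
    Vector.fill(n)(rng.nextDouble() - 0.5)
  }
}

// A test injects a fixed seed and gets identical weights on every run.
val w1 = new Trainer().setSeed(42L).initWeights(3)
val w2 = new Trainer().setSeed(42L).initWeights(3)
```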
Added setters and made the seed configurable.
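The follow-up commit ("add setters and make a default constructor") points to a builder-style API. The sketch below assumes that shape: a no-argument constructor with defaults and chainable setters returning `this.type`. The class name `Word2VecParams`, the parameter names, their defaults, and the `require` checks are assumptions for illustration, not the actual MLlib code.

```scala
import scala.util.Random

// Hypothetical parameter holder: default constructor plus chainable setters.
class Word2VecParams {
  private var vectorSize = 100
  private var learningRate = 0.025
  private var numIterations = 1
  private var seed = Random.nextLong()

  def setVectorSize(size: Int): this.type = {
    require(size > 0, s"vectorSize must be positive but got $size")
    vectorSize = size
    this
  }

  def setLearningRate(rate: Double): this.type = {
    require(rate > 0.0, s"learningRate must be positive but got $rate")
    learningRate = rate
    this
  }

  def setNumIterations(n: Int): this.type = {
    require(n > 0, s"numIterations must be positive but got $n")
    numIterations = n
    this
  }

  def setSeed(s: Long): this.type = {
    seed = s
    this
  }

  override def toString: String =
    s"Word2VecParams(vectorSize=$vectorSize, learningRate=$learningRate, " +
      s"numIterations=$numIterations, seed=$seed)"
}

val params = new Word2VecParams()
  .setVectorSize(50)
  .setLearningRate(0.05)
  .setNumIterations(2)
  .setSeed(42L)
```

Returning `this.type` keeps the chain typed as the concrete class, so a subclass can add its own setters without breaking chaining.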
@mengxr LGTM. We may need a better implementation of TopK. It is also worth trying to change the starting alpha in each iteration.
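One common alternative to sorting every candidate when only the top k are needed (for example, the most similar words) is a bounded min-heap of size k, which takes O(n log k) time instead of O(n log n). A minimal sketch, assuming candidates arrive as `(word, score)` pairs; `topK` is a hypothetical helper, not the MLlib implementation.

```scala
import scala.collection.mutable

// Hypothetical helper: keep the k highest-scoring items with a min-heap of size k.
def topK(candidates: Iterator[(String, Double)], k: Int): Seq[(String, Double)] = {
  require(k >= 0, s"k must be non-negative but got $k")
  // PriorityQueue is a max-heap, so reversing the score ordering puts the
  // smallest kept score at the head, ready to be evicted.
  val byScore = Ordering.by[(String, Double), Double](_._2).reverse
  val heap = mutable.PriorityQueue.empty[(String, Double)](byScore)
  candidates.foreach { c =>
    if (heap.size < k) heap.enqueue(c)
    else if (k > 0 && c._2 > heap.head._2) {
      heap.dequeue()
      heap.enqueue(c)
    }
  }
  // Return the survivors from highest to lowest score.
  heap.toSeq.sortBy(-_._2)
}

val scores = Seq("king" -> 0.9, "queen" -> 0.85, "apple" -> 0.1, "prince" -> 0.8, "car" -> 0.05)
val best = topK(scores.iterator, 2)
```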
Jenkins, test this please.
Jenkins, where are you?
Jenkins, test this please.
Jenkins, retest this please.
Jenkins, test this please.
QA tests have started for PR 1790. This patch merges cleanly.
QA results for PR 1790:
Merged into both master and branch-1.1.
It also moves the model to local in order to map `RDD[String]` to `RDD[Vector]`.

@Ishiihara

Author: Xiangrui Meng <meng@databricks.com>

Closes #1790 from mengxr/word2vec-fix and squashes the following commits:

a87146c [Xiangrui Meng] add setters and make a default constructor
e5c923b [Xiangrui Meng] fix random seed in word2vec; move model to local

(cherry picked from commit cc491f6)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
It also moves the model to local in order to map `RDD[String]` to `RDD[Vector]`.

@Ishiihara
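Spark does not allow RDD operations to be nested inside another RDD's transformation, so a model backed by an RDD cannot be called from inside `rdd.map`. A model that holds its vectors in a plain local map can instead be serialized into the closure. The sketch below illustrates that idea without a Spark dependency: `LocalWordModel` is a hypothetical class, and a Scala `Seq` stands in for the RDD. With Spark, the analogous call would be `words.map(model.transform)` on an `RDD[String]`.

```scala
// Hypothetical local model: word vectors live in an in-memory Map, so the whole
// model is an ordinary Serializable value that can be shipped inside a closure.
class LocalWordModel(private val vectors: Map[String, Array[Float]]) extends Serializable {
  // Look up a word's vector; unknown words are treated as an error.
  def transform(word: String): Array[Float] =
    vectors.getOrElse(word, throw new IllegalStateException(s"$word not in vocabulary"))
}

val model = new LocalWordModel(Map(
  "spark" -> Array(0.1f, 0.2f),
  "mllib" -> Array(0.3f, 0.4f)
))

// A Seq stands in for RDD[String]; mapping yields one vector per word.
val words = Seq("spark", "mllib")
val embedded = words.map(model.transform)
```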