Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2864][MLLIB] fix random seed in word2vec; move model to local #1790

Closed
wants to merge 2 commits into from

Conversation

mengxr
Copy link
Contributor

@mengxr mengxr commented Aug 5, 2014

It also moves the model to local in order to map RDD[String] to RDD[Vector].

@Ishiihara

@@ -246,22 +246,24 @@ class Word2Vec(
}

val newSentences = sentences.repartition(parallelism).cache()
val seed = 5875483L
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does more than fix the seed for unit tests, but for every call. Is it not a bit better to make the RNG injectable via a discreet package-private setter and let the tests inject a seeded RNG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added setters and made seed configurable.

@Ishiihara
Copy link
Contributor

@mengxr LGTM. We may need better implementation of TopK. It also worth trying to change the starting alpha in each iteration.

@mengxr
Copy link
Contributor Author

mengxr commented Aug 5, 2014

Jenkins, test this please.

@mengxr
Copy link
Contributor Author

mengxr commented Aug 5, 2014

Jenkins, where are you?

@mengxr
Copy link
Contributor Author

mengxr commented Aug 5, 2014

Jenkins, test this please.

@mengxr
Copy link
Contributor Author

mengxr commented Aug 5, 2014

Jenkins, retest this please.

@mengxr
Copy link
Contributor Author

mengxr commented Aug 5, 2014

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Aug 5, 2014

QA tests have started for PR 1790. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17949/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 5, 2014

QA results for PR 1790:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Word2Vec extends Serializable with Logging {

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17949/consoleFull

@mengxr
Copy link
Contributor Author

mengxr commented Aug 5, 2014

Merged into both master and branch-1.1.

asfgit pushed a commit that referenced this pull request Aug 6, 2014
It also moves the model to local in order to map `RDD[String]` to `RDD[Vector]`.

Ishiihara

Author: Xiangrui Meng <meng@databricks.com>

Closes #1790 from mengxr/word2vec-fix and squashes the following commits:

a87146c [Xiangrui Meng] add setters and make a default constructor
e5c923b [Xiangrui Meng] fix random seed in word2vec; move model to local

(cherry picked from commit cc491f6)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
@asfgit asfgit closed this in cc491f6 Aug 6, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
It also moves the model to local in order to map `RDD[String]` to `RDD[Vector]`.

Ishiihara

Author: Xiangrui Meng <meng@databricks.com>

Closes apache#1790 from mengxr/word2vec-fix and squashes the following commits:

a87146c [Xiangrui Meng] add setters and make a default constructor
e5c923b [Xiangrui Meng] fix random seed in word2vec; move model to local
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants