[SPARK-3097][MLlib] Word2Vec performance improvement #1932

Closed
wants to merge 5 commits into from

Ishiihara
Contributor

@mengxr Please review the code. Adding weights in reduceByKey soon.

Each partition now outputs model entries only for the words that appeared in that partition, and the per-partition results are combined with reduceByKey before the model is merged. In general, this implementation is about 30s faster than the implementation that uses one big array.
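
The idea described above can be illustrated with a minimal sketch. It uses plain Scala collections in place of an RDD so that it runs without Spark on the classpath; with a real RDD the grouping step would be a `reduceByKey` call. The names `SparseModelMerge`, `partitionUpdates`, and `reduceByKeyLocal` are hypothetical, the fixed `+0.5f` update is only a stand-in for the real training step, and plain summation is an assumption (the description mentions that weights are still to be added), so this is not the code in the patch.

```scala
object SparseModelMerge {
  type Vec = Array[Float]

  // Element-wise sum of two equal-length vectors.
  def add(a: Vec, b: Vec): Vec = {
    require(a.length == b.length, "vectors must have the same length")
    val out = new Array[Float](a.length)
    var i = 0
    while (i < a.length) { out(i) = a(i) + b(i); i += 1 }
    out
  }

  // Hypothetical per-partition step: update a local copy of the model and
  // emit (wordIndex, vector) pairs only for the words seen in this partition.
  def partitionUpdates(wordIndices: Seq[Int], model: Array[Vec]): Seq[(Int, Vec)] = {
    val local = model.map(_.clone())
    val touched = scala.collection.mutable.LinkedHashSet.empty[Int]
    for (w <- wordIndices) {
      // Stand-in for the real gradient update (assumption for illustration).
      var i = 0
      while (i < local(w).length) { local(w)(i) += 0.5f; i += 1 }
      touched += w
    }
    touched.toSeq.map(w => (w, local(w)))
  }

  // Local stand-in for RDD.reduceByKey(add): sum all vectors that share a key.
  def reduceByKeyLocal(pairs: Seq[(Int, Vec)]): Map[Int, Vec] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(add) }
}
```

With a four-word vocabulary and two partitions that see words `0, 1, 1` and `1, 3`, only three entries are emitted and merged; word `2` never leaves a partition, which is where the saving over shipping one big array per partition would come from.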

Ishiihara changed the title from "[SPARK-2907][MLlib] Word2Vec performance improve" to "[SPARK-2907][MLlib] Word2Vec performance improvement" on Aug 13, 2014
Ishiihara changed the title from "[SPARK-2907][MLlib] Word2Vec performance improvement" to "[MLlib] Word2Vec performance improvement" on Aug 14, 2014
@mengxr
Contributor

mengxr commented Aug 14, 2014

Jenkins, test this please.

@@ -34,7 +34,7 @@ import org.apache.spark.mllib.rdd.RDDFunctions._
import org.apache.spark.rdd._
import org.apache.spark.util.Utils
import org.apache.spark.util.random.XORShiftRandom

import org.apache.spark.util.collection.PrimitiveKeyOpenHashMap
/**
Contributor

add an empty line after imports

@SparkQA

SparkQA commented Aug 14, 2014

QA tests have started for PR 1932. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18547/consoleFull

@SparkQA

SparkQA commented Aug 14, 2014

QA results for PR 1932:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18547/consoleFull

@mengxr
Contributor

mengxr commented Aug 16, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 16, 2014

QA tests have started for PR 1932 at commit cad2011.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 17, 2014

QA tests have finished for PR 1932 at commit cad2011.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Aug 18, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 18, 2014

QA tests have started for PR 1932 at commit d5377a9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 18, 2014

QA tests have finished for PR 1932 at commit d5377a9.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Ishiihara changed the title from "[MLlib] Word2Vec performance improvement" to "[SPARK-3097][MLlib] Word2Vec performance improvement" on Aug 18, 2014
@mengxr
Contributor

mengxr commented Aug 18, 2014

LGTM. Merged into master and branch-1.1. Thanks!

asfgit pushed a commit that referenced this pull request Aug 18, 2014
mengxr Please review the code. Adding weights in reduceByKey soon.

Only output model entry for words appeared in the partition before merging and use reduceByKey to combine model. In general, this implementation is 30s or so faster than implementation using big array.

Author: Liquan Pei <liquanpei@gmail.com>

Closes #1932 from Ishiihara/Word2Vec-improve2 and squashes the following commits:

d5377a9 [Liquan Pei] use syn0Global and syn1Global to represent model
cad2011 [Liquan Pei] bug fix for synModify array out of bound
083aa66 [Liquan Pei] update synGlobal in place and reduce synOut size
9075e1c [Liquan Pei] combine syn0Global and syn1Global to synGlobal
aa2ab36 [Liquan Pei] use reduceByKey to combine models

(cherry picked from commit 3c8fa50)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
@asfgit asfgit closed this in 3c8fa50 Aug 18, 2014
@loveconan1988

------------------ Original message ------------------
From: "asfgit";notifications@github.com;
Sent: Monday, August 18, 2014, 2:31 PM
To: "apache/spark"spark@noreply.github.com;

Subject: Re: [spark] [SPARK-3097][MLlib] Word2Vec performance improvement(#1932)

Closed #1932 via 3c8fa50.



xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014