[SPARK-34189][ML] w2v findSynonyms optimization #31276
Train a model with https://en.wikipedia.org/wiki/Word2vec as the training data;
Performance test:
Results:
Test build #134320 has finished for PR 31276 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Test build #134347 has finished for PR 31276 at commit
Kubernetes integration test status success
Friendly ping @srowen
Looks OK, merge when ready.
Merged to master, thanks @srowen!
Closes apache#31276 from zhengruifeng/w2v_findSynonyms_opt.
Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
What changes were proposed in this pull request?
1. Use Guava Ordering instead of BoundedPriorityQueue;
2. Use local variables;
3. Avoid the conversion ml.vector -> mllib.vector.
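As a rough illustration of the first change, here is a minimal sketch of selecting the top-k entries with Guava's `Ordering.greatestOf` rather than pushing every candidate through a bounded priority queue. It is not the code from this PR: `TopKSketch`, `topK`, and the sample words and scores are hypothetical names and data invented for the example, and it assumes Guava is on the classpath (as it is in a Spark build).

```scala
import com.google.common.collect.{Ordering => GuavaOrdering}
// JavaConverters is deprecated in Scala 2.13 but still available there.
import scala.collection.JavaConverters._

// Hypothetical helper, not part of Spark.
object TopKSketch {
  /** Return the k highest-scoring (word, score) pairs, best first. */
  def topK(words: Array[String], scores: Array[Float], k: Int): Array[(String, Float)] = {
    require(k > 0, "k must be positive")
    require(words.length == scores.length, "one score per word")

    // Compare pairs by score only.
    val byScore = new GuavaOrdering[(String, Float)] {
      override def compare(a: (String, Float), b: (String, Float)): Int =
        java.lang.Float.compare(a._2, b._2)
    }

    // Lazily pair each word with its score; the full input is never sorted.
    val pairs = Iterator.tabulate(words.length)(i => (words(i), scores(i)))

    // greatestOf returns at most k elements, ordered from greatest to least.
    byScore.greatestOf(pairs.asJava, k).asScala.toArray
  }

  def main(args: Array[String]): Unit = {
    val words = Array("king", "queen", "apple", "prince")
    val scores = Array(0.91f, 0.87f, 0.12f, 0.79f)
    // Selects "king" and then "queen", the two highest scores.
    topK(words, scores, 2).foreach { case (w, s) => println(s"$w\t$s") }
  }
}
```

In this sketch the scores are assumed to be precomputed (for example, cosine similarities against a query vector); how the PR itself computes and ranks them is not shown in the text above.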
Why are the changes needed?
This PR is about 30% faster than the existing implementation.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing test suites.