-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22156][MLLIB] Fix update equation of learning rate in Word2Vec.scala #19372
Conversation
You should make a JIRA as it's a non-trivial behavior change. |
Thank you for your comment, @srowen. |
I think the PR is incorrect:
|
Thank you for your comment, @LowikC. Correct update formula based on your comment is alpha = learningRate *
(1 - numPartitions * wordCount.toDouble + (k - 1) * trainWordsCount /
(numIterations * trainWordsCount + 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct parentheses
learningRate * | ||
(1 - numPartitions * wordCount.toDouble / (numIterations * trainWordsCount + 1)) | ||
alpha = learningRate * | ||
(1 - numPartitions * wordCount.toDouble + (k - 1) * trainWordsCount / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need numPartitions * wordCount.toDouble + (k - 1) * trainWordsCount
between parentheses
alpha = learningRate * (1 - (numPartitions * wordCount.toDouble + (k - 1) * trainWordsCount) / (numIterations * trainWordsCount + 1))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh... Thanks! I fixed it.
What do you think @LowikC ? |
Test build #3939 has finished for PR 19372 at commit
|
Looks good to me. |
Thank you for your reviews, @LowikC. Like this? val totalWordsCounts = numIterations * trainWordsCount + 1
val numWordsProcessedInPreviousIterations = (k - 1) * trainWordsCount
alpha = learningRate *
(1 - (numPartitions * wordCount.toDouble + numWordsProcessedInPreviousIterations) /
totalWordsCounts)
if (alpha < learningRate * 0.0001) alpha = learningRate * 0.0001
logInfo("wordCount = " + (wordCount + numWordsProcessedInPreviousIterations) +
", alpha = " + alpha) |
I updated the results of word2vec example based on this PR in the first comment. |
if (alpha < learningRate * 0.0001) alpha = learningRate * 0.0001 | ||
logInfo("wordCount = " + wordCount + ", alpha = " + alpha) | ||
logInfo("wordCount = " + (wordCount + numWordsProcessedInPreviousIterations) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you update this again, you can use string interpolation: logInfo(s"wordCount = ${wordCount + ...}, alpha = $alpha")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK with me if OK with @LowikC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for me
Test build #3943 has finished for PR 19372 at commit
|
Merged to master |
Thank you for your kindful reviews! |
What changes were proposed in this pull request?
Current equation of learning rate is incorrect when
numIterations
>1
.This PR is based on original C code.
cc: @mengxr
How was this patch tested?
manual tests
I modified this example code.
numIteration=1
Code
Result
numIteration=5
Code
Result