
[SPARK-14322][MLLIB] Use treeAggregate instead of reduce in OnlineLDA…


## What changes were proposed in this pull request?

OnlineLDAOptimizer uses RDD.reduce in two places where it could use treeAggregate, which can cause scalability issues. This should be an easy fix.
It is also a bug: the reduce function (`_ += _`) modifies its first argument in place, so aggregate or treeAggregate, which seed the fold with a fresh zero value, should be used instead.
See this line and a few lines below it.
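The in-place mutation hazard described above can be shown in a minimal sketch (plain Python, not Spark or Breeze; the lists stand in for per-partition statistics matrices):

```python
from functools import reduce

# Schematic illustration of the bug: a reduce function like `_ += _`
# mutates its first argument. If the framework reuses a partition's
# result as the accumulator, that cached data is silently corrupted.
partitions = [[1, 2], [3, 4], [5, 6]]  # stand-ins for per-partition stats

def merge(a, b):
    a += b          # in-place mutation, like `_ += _` on a Breeze matrix
    return a

merged = reduce(merge, partitions)
print(partitions[0])  # [1, 2, 3, 4, 5, 6] -- the first input was modified!

# Safe variant: seed the fold with a fresh zero value, as
# aggregate/treeAggregate do, so no input is ever mutated.
partitions = [[1, 2], [3, 4], [5, 6]]
merged = reduce(merge, partitions, [])  # [] plays the role of the zero value
print(partitions[0])  # [1, 2] -- inputs untouched
```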

## How was this patch tested?
unit tests

Author: Yuhao Yang <>

Closes #12106 from hhbyyh/ldaTreeReduce.
hhbyyh authored and jkbradley committed Apr 6, 2016
1 parent db0b06c commit 8cffcb60deb82d04a5c6e144ec9927f6f7addc8b
Showing with 3 additions and 2 deletions.
  1. +3 −2 mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
```diff
@@ -451,10 +451,11 @@ final class OnlineLDAOptimizer extends LDAOptimizer {
       Iterator((stat, gammaPart))
-    val statsSum: BDM[Double] = stats.map(_._1).reduce(_ += _)
+    val statsSum: BDM[Double] = stats.map(_._1).treeAggregate(BDM.zeros[Double](k, vocabSize))(
+      _ += _, _ += _)
     val gammat: BDM[Double] = breeze.linalg.DenseMatrix.vertcat(
-      stats.map(_._2).reduce(_ ++ _).map(_.toDenseMatrix): _*)
+      stats.map(_._2).flatMap(list => list).collect().map(_.toDenseMatrix): _*)
     val batchResult = statsSum :* expElogbeta.t

     // Note that this is an optimization to avoid batch.count
```
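Besides fixing the mutation bug, treeAggregate addresses the scalability concern: partition results are combined in layers rather than being merged one by one at a single point. A schematic sketch of that layered-combining idea (plain Python with a hypothetical `tree_combine` helper, not the Spark API; Spark additionally applies a per-partition seqOp before combining):

```python
# Hypothetical helper: merge a list of per-partition results in
# pairwise layers, the way a tree aggregation combines them, instead
# of folding them sequentially at one node.
def tree_combine(values, comb):
    while len(values) > 1:
        # pair up neighbours and merge each pair (one "tree level")
        values = [comb(values[i], values[i + 1]) if i + 1 < len(values)
                  else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

total = tree_combine([1, 2, 3, 4, 5], lambda a, b: a + b)
print(total)  # 15
```

Each level halves the number of intermediate results, so no single step has to merge all partition outputs at once.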
