[SPARK-14322][MLLIB] Use treeAggregate instead of reduce in OnlineLDAOptimizer

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-14322

OnlineLDAOptimizer uses RDD.reduce in two places where it could use treeAggregate. This can cause scalability issues. This should be an easy fix.

This is also a bug since it modifies the first argument to reduce, so we should use aggregate or treeAggregate.

See this line: https://github.com/apache/spark/blob/f12f11e578169b47e3f8b18b299948c0670ba585/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L452 and a few lines below it.

## How was this patch tested?

unit tests

Author: Yuhao Yang <email@example.com>

Closes #12106 from hhbyyh/ldaTreeReduce.
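To illustrate the pattern the description refers to, here is a minimal sketch of summing per-document statistics with `treeAggregate` rather than `reduce`. It is not the code from `LDAOptimizer.scala`: the object name `TreeAggregateSketch`, the helper `addInPlace`, the sample data, and the local master setting are all hypothetical, and the sketch assumes Spark core is on the classpath. The point it tries to show is that an in-place update is safe in `aggregate`/`treeAggregate` because the accumulator starts from the zero value, whereas with `reduce` the first argument can be an element of the RDD itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch; names and data are illustrative, not taken from LDAOptimizer.scala.
object TreeAggregateSketch {

  // Adds `v` into `acc` element-wise and returns `acc`.
  // Mutating `acc` is acceptable here because, under aggregate/treeAggregate,
  // `acc` always originates from the zero value, never from an RDD element.
  def addInPlace(acc: Array[Double], v: Array[Double]): Array[Double] = {
    require(acc.length == v.length, "vectors must have the same length")
    var i = 0
    while (i < acc.length) {
      acc(i) += v(i)
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption made only so the sketch runs standalone.
    val conf = new SparkConf().setAppName("TreeAggregateSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val k = 3
      val stats = sc.parallelize(Seq(
        Array(1.0, 2.0, 3.0),
        Array(4.0, 5.0, 6.0),
        Array(7.0, 8.0, 9.0),
        Array(10.0, 11.0, 12.0)), numSlices = 4)

      // seqOp folds each element into a partition-local accumulator;
      // combOp merges partial sums in a multi-level tree of the given depth,
      // so fewer partial results are merged on the driver at once.
      val total = stats.treeAggregate(new Array[Double](k))(
        seqOp = addInPlace,
        combOp = addInPlace,
        depth = 2)

      println(total.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Passing the same `addInPlace` to `stats.reduce` would mutate whichever array happens to arrive as the first argument, which is the bug the description points out; with a fresh zero array the input arrays are left untouched.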
Showing with 3 additions and 2 deletions.