New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5315][Streaming] Fix reduceByWindow Java API not work bug #4104
Conversation
Test build #25757 has started for PR 4104 at commit
|
Test build #25757 has finished for PR 4104 at commit
|
Test PASSed. |
) ++ Seq( | ||
// SPARK-5315 Spark Streaming Java API returns Scala DStream | ||
ProblemFilters.exclude[MissingMethodProblem]( | ||
"org.apache.spark.streaming.api.java.JavaDStreamLike.reduceByWindow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed since you didn't remove the old API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Patrick, Mima test complains about this, so I added it here.
Besides, what's your opinion about deprecating the old API or modifying it if this API is actually not correct? I'm not sure which is the good way, but @srowen suggested me to deprecate rather than modify.
Thanks a lot :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it doesn't make sense that Mima would complain since no methods have been removed. Can you double-check by trying a clean build without the exclusion? or else I am worried we're overlooking some obscure reason that this change still change the API, but I can't figure out why that could be. It could be a false positive in Mima.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will do a clean test to double-check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi all, I tested with a clean build, still Mima complains about it, seems it is false positive.
[error] * method reduceByWindow(org.apache.spark.api.java.function.Function2,org.apache.spark.streaming.Duration,org.apache.spark.streaming.Duration)org.apache.spark.streaming.api.java.JavaDStream in trait org.apache.spark.streaming.api.java.JavaDStreamLike does not have a correspondent in old version
[error] filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.streaming.api.java.JavaDStreamLike.reduceByWindow")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually not a false positive. Since JavaDStreamLike is a trait, it boils down to an abstract class. If someone has inherited their own class from DStreamLike, and had a function reduceByKeyAndWindow with this signature (the new one), his code will break as his code will now have to "override" the method. So technically it is correctly. But I dont envision any one doing this, so this binary compatibility break is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, my understanding of this issue never fails to fail. In a recent PR, I think the decision was to give up on supporting user code extending JavaRDDLike
so indeed this is OK.
Since no methods are deleted I assume this is a Mima glitch. I suppose the exclusion must stay. |
*/ | ||
@deprecated("Use Java compatible version", "1.3.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Use Java-compatible version of reduceByWindow"
Can you update this patch to deal with the merge conflicts. |
Thanks TD, done with code rebase. |
Test build #25967 has started for PR 4104 at commit
|
Test build #25967 has finished for PR 4104 at commit
|
Test PASSed. |
LGTM. Merging this. Thanks! |
commit ea74365 Author: Xiangrui Meng <meng@databricks.com> Date: Thu Jan 22 22:09:13 2015 -0800 [SPARK-3541][MLLIB] New ALS implementation with improved storage This PR adds a new ALS implementation to `spark.ml` using the pipeline API, which should be able to scale to billions of ratings. Compared with the ALS under `spark.mllib`, the new implementation 1. uses the same algorithm, 2. uses float type for ratings, 3. uses primitive arrays to avoid GC, 4. sorts and compresses ratings on each block so that we can solve least squares subproblems one by one using only one normal equation instance. The following figure shows performance comparison on copies of the Amazon Reviews dataset using a 16-node (m3.2xlarge) EC2 cluster (the same setup as in http://databricks.com/blog/2014/07/23/scalable-collaborative-filtering-with-spark-mllib.html): ![als-wip](https://cloud.githubusercontent.com/assets/829644/5659447/4c4ff8e0-96c7-11e4-87a9-73c1c63d07f3.png) I keep the `spark.mllib`'s ALS untouched for easy comparison. If the new implementation works well, I'm going to match the features of the ALS under `spark.mllib` and then make it a wrapper of the new implementation, in a separate PR. TODO: - [X] Add unit tests for implicit preferences. Author: Xiangrui Meng <meng@databricks.com> Closes apache#3720 from mengxr/SPARK-3541 and squashes the following commits: 1b9e852 [Xiangrui Meng] fix compile 5129be9 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-3541 dd0d0e8 [Xiangrui Meng] simplify test code c627de3 [Xiangrui Meng] add tests for implicit feedback b84f41c [Xiangrui Meng] address comments a76da7b [Xiangrui Meng] update ALS tests 2a8deb3 [Xiangrui Meng] add some ALS tests 857e876 [Xiangrui Meng] add tests for rating block and encoded block d3c1ac4 [Xiangrui Meng] rename some classes for better code readability add more doc and comments 213d163 [Xiangrui Meng] org imports 771baf3 [Xiangrui Meng] chol doc update ca9ad9d [Xiangrui Meng] add unit tests for chol b4fd17c [Xiangrui Meng] add unit tests for NormalEquation d0f99d3 [Xiangrui Meng] add tests for LocalIndexEncoder 80b8e61 [Xiangrui Meng] fix imports 4937fd4 [Xiangrui Meng] update ALS example 56c253c [Xiangrui Meng] rename product to item bce8692 [Xiangrui Meng] doc for parameters and project the output columns 3f2d81a [Xiangrui Meng] add doc 1efaecf [Xiangrui Meng] add example code 8ae86b5 [Xiangrui Meng] add a working copy of the new ALS implementation commit e0f7fb7 Author: jerryshao <saisai.shao@intel.com> Date: Thu Jan 22 22:04:21 2015 -0800 [SPARK-5315][Streaming] Fix reduceByWindow Java API not work bug `reduceByWindow` for Java API is actually not Java compatible, change to make it Java compatible. Current solution is to deprecate the old one and add a new API, but since old API actually is not correct, so is keeping the old one meaningful? just to keep the binary compatible? Also even adding new API still need to add to Mima exclusion, I'm not sure to change the API, or deprecate the old API and add a new one, which is the best solution? Author: jerryshao <saisai.shao@intel.com> Closes apache#4104 from jerryshao/SPARK-5315 and squashes the following commits: 5bc8987 [jerryshao] Address the comment c7aa1b4 [jerryshao] Deprecate the old one to keep binary compatible 8e9dc67 [jerryshao] Fix JavaDStream reduceByWindow signature error commit 3c3fa63 Author: jerryshao <saisai.shao@intel.com> Date: Thu Jan 22 21:58:53 2015 -0800 [SPARK-5233][Streaming] Fix error replaying of WAL introduced bug Because of lacking of `BlockAllocationEvent` in WAL recovery, the dangled event will mix into the new batch, which will lead to the wrong result. Details can be seen in [SPARK-5233](https://issues.apache.org/jira/browse/SPARK-5233). Author: jerryshao <saisai.shao@intel.com> Closes apache#4032 from jerryshao/SPARK-5233 and squashes the following commits: f0b0c0b [jerryshao] Further address the comments a237c75 [jerryshao] Address the comments e356258 [jerryshao] Fix bug in unit test 558bdc3 [jerryshao] Correctly replay the WAL log when recovering from failure
reduceByWindow
for Java API is actually not Java compatible, change to make it Java compatible.Current solution is to deprecate the old one and add a new API, but since old API actually is not correct, so is keeping the old one meaningful? just to keep the binary compatible? Also even adding new API still need to add to Mima exclusion, I'm not sure to change the API, or deprecate the old API and add a new one, which is the best solution?