-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-11198][STREAMING][KINESIS] Support de-aggregation of records during recovery #9403
Conversation
Test build #44784 has finished for PR 9403 at commit
|
Tested that this patch successfully de-aggregates in recovery as well. |
Test build #44941 has finished for PR 9403 at commit
|
Test build #44944 has finished for PR 9403 at commit
|
Test build #44956 has finished for PR 9403 at commit
|
Test build #44975 has finished for PR 9403 at commit
|
test this again. |
@@ -56,6 +56,8 @@ class KinesisStreamSuite extends KinesisFunSuite | |||
private var ssc: StreamingContext = null | |||
private var sc: SparkContext = null | |||
|
|||
protected val aggregateTestData: Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn it easier to have a parameter in the constructor? Less code while subclassing class WithAggregationKinesisStreamSuite extends KinesisStreamTests(aggregateTestData = true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can also work :)
overall the code looks fine, but Kinesis tests are not passing |
Test build #45049 has finished for PR 9403 at commit
|
Test build #45149 has finished for PR 9403 at commit
|
test this please |
The failure may be because protobuf 2.6.1 was not enabled. BTW, if 2.6.1 is enabled, shouldn't protobuf 2.6.1 break other stuff in Spark and make the end-to-end test fail? |
@zsxwing probably not. Locally tests pass. They fail on jenkins for some reason. I've enabled a lot of the logging to look deeper into it. |
Test build #45181 has finished for PR 9403 at commit
|
Test build #45249 has started for PR 9403 at commit |
test this please |
Test build #45255 has finished for PR 9403 at commit
|
Test build #45343 has finished for PR 9403 at commit
|
Huh. Didn't change much but the tests passed this time. I wonder if it was a Java 7 vs. 8 mismatch... Just to be sure, will re-run tests |
test this please |
Test build #45377 has finished for PR 9403 at commit
|
Aww man. The Kinesis tests passed, hive tests failed. Re-running |
test this please |
Test build #45391 has finished for PR 9403 at commit
|
test this please |
1 similar comment
test this please |
Test build #45422 has finished for PR 9403 at commit
|
LGTM. Merging this to master and 1.6. Thanks @brkyvz |
…uring recovery While the KCL handles de-aggregation during the regular operation, during recovery we use the lower level api, and therefore need to de-aggregate the records. tdas Testing is an issue, we need protobuf magic to do the aggregated records. Maybe we could depend on KPL for tests? Author: Burak Yavuz <brkyvz@gmail.com> Closes #9403 from brkyvz/kinesis-deaggregation. (cherry picked from commit 26062d2) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Test build #45440 has finished for PR 9403 at commit
|
Despite being closed, I'm confused about why this causes hive unit tests to fail. Is this unrelated?Also, the publisher in the unit test looks like it's publishing an intentionally non-aggregated stream. Is that expected? Am I reading this incorrectly? |
Hi @lordnynex. Why did you think this causes Hive tests to fail? That was most probably a flaky test. |
KinesisStreamTests in test.py is broken because of this PR. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46896/testReport/(root)/KinesisStreamTests/test_kinesis_stream/ Is the new dependency the failure cause? The PR builds for this PR actually didn't report the Python failure because of #9669. |
KinesisStreamTests in test.py is broken because of #9403. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46896/testReport/(root)/KinesisStreamTests/test_kinesis_stream/ Because Streaming Python didn’t work when merging #9403, the PR build didn’t report the Python test failure actually. This PR just disabled the test to unblock #10039 Author: Shixiong Zhu <shixiong@databricks.com> Closes #10047 from zsxwing/disable-python-kinesis-test.
While the KCL handles de-aggregation during the regular operation, during recovery we use the lower level api, and therefore need to de-aggregate the records.
@tdas Testing is an issue, we need protobuf magic to do the aggregated records. Maybe we could depend on KPL for tests?