[SPARK-11198][STREAMING][KINESIS] Support de-aggregation of records during recovery #9403

Closed
brkyvz wants to merge 12 commits from brkyvz/kinesis-deaggregation


brkyvz
Contributor

@brkyvz brkyvz commented Nov 2, 2015

The KCL handles de-aggregation during regular operation, but during recovery we use the lower-level API and therefore need to de-aggregate the records ourselves.

@tdas Testing is an issue: we need protobuf magic to produce the aggregated records. Maybe we could depend on the KPL for tests?
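
For readers unfamiliar with the term, the idea of de-aggregation can be sketched with a toy model. This is not the KPL's protobuf wire format and not the code in this patch; the names `StreamRecord`, `Single`, `Aggregated`, and `deaggregate` are assumptions made up for illustration. The real recovery path would presumably delegate this step to the KCL's own de-aggregation support rather than hand-written logic.

```scala
object DeaggregationSketch {
  // Toy model (an assumption for illustration): a stream record is either a
  // single user record or a container that packs several user records.
  sealed trait StreamRecord
  final case class Single(data: String) extends StreamRecord
  final case class Aggregated(parts: Seq[String]) extends StreamRecord

  // Flatten a batch so that callers only ever see individual user records.
  // Non-aggregated records pass through unchanged; order is preserved.
  def deaggregate(records: Seq[StreamRecord]): Seq[Single] =
    records.flatMap {
      case single: Single    => Seq(single)
      case Aggregated(parts) => parts.map(Single(_))
    }

  def main(args: Array[String]): Unit = {
    // A fetched batch that mixes plain and aggregated records.
    val fetched = Seq(Single("a"), Aggregated(Seq("b", "c")), Single("d"))
    deaggregate(fetched).foreach(r => println(r.data))
  }
}
```

In this sketch a batch that mixes plain and aggregated records comes back as the four user records a, b, c, d in their original order, which is the behavior a recovery path needs if it is to match regular operation.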

@SparkQA

SparkQA commented Nov 2, 2015

Test build #44784 has finished for PR 9403 at commit de3e91b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * case class RepartitionByExpression(

@brkyvz
Contributor Author

brkyvz commented Nov 2, 2015

Tested that this patch successfully de-aggregates in recovery as well.

@SparkQA

SparkQA commented Nov 3, 2015

Test build #44941 has finished for PR 9403 at commit 6b0b290.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 3, 2015

Test build #44944 has finished for PR 9403 at commit 023055c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 3, 2015

Test build #44956 has finished for PR 9403 at commit 5f7a763.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 4, 2015

Test build #44975 has finished for PR 9403 at commit 7c1edd7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Nov 4, 2015

test this again.

@@ -56,6 +56,8 @@ class KinesisStreamSuite extends KinesisFunSuite
private var ssc: StreamingContext = null
private var sc: SparkContext = null

protected val aggregateTestData: Boolean
Contributor


Isn't it easier to have a parameter in the constructor? That means less code when subclassing: class WithAggregationKinesisStreamSuite extends KinesisStreamTests(aggregateTestData = true)

Contributor Author


That can also work :)
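
The constructor-parameter suggestion in this thread can be sketched as follows. This is a simplified stand-in, not the suite from the patch: the real class extends KinesisFunSuite, which is omitted here so the snippet compiles on its own, and `testDataMode`, `WithoutAggregationKinesisStreamSuite`, and `ConstructorParamSketch` are hypothetical names added for illustration.

```scala
// Simplified stand-in for the real suite (assumption: the actual class
// extends KinesisFunSuite and holds the tests; both are left out here).
abstract class KinesisStreamTests(aggregateTestData: Boolean) {
  // Hypothetical helper: reports which kind of test data the suite uses.
  def testDataMode: String =
    if (aggregateTestData) "aggregated" else "non-aggregated"
}

// Each variant is a one-line subclass; no abstract member needs overriding.
class WithAggregationKinesisStreamSuite
  extends KinesisStreamTests(aggregateTestData = true)

// Hypothetical counterpart name for the non-aggregated case.
class WithoutAggregationKinesisStreamSuite
  extends KinesisStreamTests(aggregateTestData = false)

object ConstructorParamSketch {
  def main(args: Array[String]): Unit = {
    println(new WithAggregationKinesisStreamSuite().testDataMode)
    println(new WithoutAggregationKinesisStreamSuite().testDataMode)
  }
}
```

Compared with a `protected val aggregateTestData: Boolean` declared in the base class, each subclass here needs no body at all, which is the "less code while subclassing" point raised in the review.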

@tdas
Contributor

tdas commented Nov 4, 2015

overall the code looks fine, but Kinesis tests are not passing

@SparkQA

SparkQA commented Nov 5, 2015

Test build #45049 has finished for PR 9403 at commit 83a0f13.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 6, 2015

Test build #45149 has finished for PR 9403 at commit 8bcd6c3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz
Contributor Author

brkyvz commented Nov 6, 2015

test this please

@zsxwing
Member

zsxwing commented Nov 6, 2015

The failure may be because protobuf 2.6.1 was not enabled. BTW, if 2.6.1 is enabled, shouldn't protobuf 2.6.1 break other stuff in Spark and make the end-to-end test fail?

@brkyvz
Contributor Author

brkyvz commented Nov 6, 2015

@zsxwing Probably not. The tests pass locally but fail on Jenkins for some reason. I've enabled a lot more logging to look deeper into it.

@SparkQA

SparkQA commented Nov 6, 2015

Test build #45181 has finished for PR 9403 at commit 4acf47e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 6, 2015

Test build #45249 has started for PR 9403 at commit 66fade4.

@brkyvz
Contributor Author

brkyvz commented Nov 6, 2015

test this please

@SparkQA

SparkQA commented Nov 6, 2015

Test build #45255 has finished for PR 9403 at commit 66fade4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 9, 2015

Test build #45343 has finished for PR 9403 at commit 2ef1717.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz
Contributor Author

brkyvz commented Nov 9, 2015

Huh. Didn't change much but the tests passed this time. I wonder if it was a Java 7 vs. 8 mismatch... Just to be sure, will re-run tests

@brkyvz
Contributor Author

brkyvz commented Nov 9, 2015

test this please

@SparkQA

SparkQA commented Nov 9, 2015

Test build #45377 has finished for PR 9403 at commit 2ef1717.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz
Contributor Author

brkyvz commented Nov 9, 2015

Aww man. The Kinesis tests passed, but the Hive tests failed. Re-running.

@brkyvz
Contributor Author

brkyvz commented Nov 9, 2015

test this please

@SparkQA

SparkQA commented Nov 9, 2015

Test build #45391 has finished for PR 9403 at commit 2ef1717.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz
Contributor Author

brkyvz commented Nov 9, 2015

test this please

@brkyvz
Contributor Author

brkyvz commented Nov 9, 2015

test this please

@SparkQA

SparkQA commented Nov 10, 2015

Test build #45422 has finished for PR 9403 at commit 932485a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Nov 10, 2015

LGTM. Merging this to master and 1.6. Thanks @brkyvz

@asfgit asfgit closed this in 26062d2 Nov 10, 2015
asfgit pushed a commit that referenced this pull request Nov 10, 2015
…uring recovery

While the KCL handles de-aggregation during the regular operation, during recovery we use the lower level api, and therefore need to de-aggregate the records.

tdas Testing is an issue, we need protobuf magic to do the aggregated records. Maybe we could depend on KPL for tests?

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #9403 from brkyvz/kinesis-deaggregation.

(cherry picked from commit 26062d2)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@SparkQA

SparkQA commented Nov 10, 2015

Test build #45440 has finished for PR 9403 at commit 932485a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lordnynex

Although this PR is closed, I'm confused about why it causes the Hive unit tests to fail. Is that unrelated? Also, the publisher in the unit test looks like it's intentionally publishing a non-aggregated stream. Is that expected, or am I reading this incorrectly?

@brkyvz
Contributor Author

brkyvz commented Nov 18, 2015

Hi @lordnynex. Why did you think this causes Hive tests to fail? That was most probably a flaky test.
Regarding publishing a non-aggregated stream: yes, that is correct. Not everyone uses the KPL, so we need to make sure we support both non-aggregated and aggregated streams :)

@zsxwing
Member

zsxwing commented Nov 30, 2015

KinesisStreamTests in test.py is broken because of this PR. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46896/testReport/(root)/KinesisStreamTests/test_kinesis_stream/

Is the new dependency the cause of the failure?

The PR builds for this PR actually didn't report the Python failure because of #9669.

asfgit pushed a commit that referenced this pull request Dec 1, 2015
KinesisStreamTests in test.py is broken because of #9403. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46896/testReport/(root)/KinesisStreamTests/test_kinesis_stream/

Because Streaming Python didn’t work when #9403 was merged, the PR build didn’t actually report the Python test failure.

This PR just disabled the test to unblock #10039

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10047 from zsxwing/disable-python-kinesis-test.
@brkyvz brkyvz deleted the kinesis-deaggregation branch February 3, 2019 20:57