[SPARK-13843][Streaming]Move streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages #11672

Closed


zsxwing
Member

@zsxwing zsxwing commented Mar 12, 2016

What changes were proposed in this pull request?

Currently there are a few sub-projects, each integrating Streaming with a different external source. Now that we have a better way to include external libraries (Spark packages), and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages:

  • streaming-flume
  • streaming-akka
  • streaming-mqtt
  • streaming-zeromq
  • streaming-twitter

These are ancillary packages, and given the overhead of maintaining them, running their tests, and dealing with PR failures, it is better to maintain them outside Spark. In addition, these projects can have their own release cycles, so we can release them faster.

I have already copied these projects to https://github.com/spark-packages

How was this patch tested?

Jenkins tests

@SparkQA

SparkQA commented Mar 12, 2016

Test build #52978 has finished for PR 11672 at commit a02d1e5.

  • This patch fails executing the dev/run-tests script.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 12, 2016

Test build #52989 has finished for PR 11672 at commit d6bd942.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 12, 2016

Test build #52990 has finished for PR 11672 at commit 29eeba1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 12, 2016

FWIW I think that's a grand idea. These are fairly ancillary packages, and the overhead of maintaining them, running tests, and patching failures probably doesn't justify keeping them in the core project. There might even be more packages for which this is true.

@SparkQA

SparkQA commented Mar 12, 2016

Test build #53010 has finished for PR 11672 at commit 563614f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ksakellis

I would also like to see the Kafka modules removed in a similar way. We have had trouble balancing Spark's compatibility requirements against Kafka's ever-breaking client APIs. A clean separation would make it easier for user applications to declare which Kafka client version they want to use.

@zsxwing zsxwing changed the title [WIP]Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages [WIP][test-maven]Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages Mar 13, 2016
@zsxwing
Member Author

zsxwing commented Mar 13, 2016

retest this please

@SparkQA

SparkQA commented Mar 13, 2016

Test build #53017 has finished for PR 11672 at commit 563614f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing zsxwing changed the title [WIP][test-maven]Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages [SPARK-13843][Streaming]Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages Mar 13, 2016
@zsxwing
Member Author

zsxwing commented Mar 13, 2016

retest this please

@zsxwing
Member Author

zsxwing commented Mar 13, 2016

cc @rxin @tdas @JoshRosen

@zsxwing
Member Author

zsxwing commented Mar 13, 2016

@ksakellis Most Streaming users use Kafka, so moving it may affect a lot of people. I think it's better to discuss how to support Kafka 0.9 (SPARK-13252) before deciding whether to move Kafka out.

@SparkQA

SparkQA commented Mar 13, 2016

Test build #53023 has finished for PR 11672 at commit 563614f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Mar 14, 2016

retest this please

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53043 has finished for PR 11672 at commit 563614f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@harishreedharan
Contributor

I agree with @ksakellis on this one. It would be great if we could pull Kafka out as well. I understand that a lot of users might find it difficult, but if you think about it, most people use the plugins via mvn anyway (since we don't actually package them in our assembly). I am not sure what the policy is if we pull it into a different repo, or whether we can keep the same groupId and artifactId, but that could be an alternative and most likely would not break too many users.

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53047 has finished for PR 11672 at commit 97fcb46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53078 has finished for PR 11672 at commit 01dadc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Merge conflict is probably my fault (I modified a file that this PR deleted).

@SparkQA

SparkQA commented Mar 14, 2016

Test build #53095 has finished for PR 11672 at commit 806dba8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class ShuffleServiceHeartbeat extends BlockTransferMessage

@zsxwing
Member Author

zsxwing commented Mar 14, 2016

@JoshRosen could you take a look at this PR?

@harishreedharan
Contributor

Hi @zsxwing, I think the discussion on supporting Kafka 0.9 should happen if we decide to keep Kafka in Spark itself.
At this point, I think the piece that would benefit most from moving out of Spark is the Kafka integration, since that is where most of the API compatibility issues are.

I really think we should discuss moving Kafka out and come to an agreement on that as well.

@rxin
Contributor

rxin commented Mar 14, 2016

Can we have a JIRA ticket to discuss that? I am not sure this GitHub PR is the place to discuss moving Kafka out.

@harishreedharan
Contributor

Thanks @rxin. Opened SPARK-13877 to discuss this

@rxin
Contributor

rxin commented Mar 14, 2016

OK, apparently the recent MiMa failures on the master branch are caused by the mqtt dependency not being in Maven. We can either fix that with some build hack or just merge this PR.

Since these are the least contentious choices, I'm going to merge this pull request. We should discuss Kafka/Kinesis separately.

@rxin
Contributor

rxin commented Mar 14, 2016

Merging in master.

@zsxwing zsxwing changed the title [SPARK-13843][Streaming]Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages [SPARK-13843][Streaming]Move streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages Mar 14, 2016
@JoshRosen
Contributor

FYI this has been merged, but the GitHub mirror is lagging; see https://git-wip-us.apache.org/repos/asf?p=spark.git. Until the GitHub merge goes through, I think PR builds are going to continue to fail, since the merge commits that we test are automatically generated by GitHub.

@asfgit asfgit closed this in 06dec37 Mar 15, 2016
@zsxwing zsxwing deleted the remove-external-pkg branch March 15, 2016 04:09
@steveloughran
Contributor

This didn't cut the relevant doc/streaming-*.md files BTW

@zsxwing
Member Author

zsxwing commented Mar 18, 2016

> This didn't cut the relevant doc/streaming-*.md files BTW

Thanks for the reminder. I will submit a PR to remove them after I copy all of them to the new projects.

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…aming-zeromq, streaming-akka, streaming-twitter to Spark packages

Author: Shixiong Zhu <shixiong@databricks.com>

Closes apache#11672 from zsxwing/remove-external-pkg.