[SPARK-17935][SQL]Add KafkaForeachWriter in external kafka-0.8.0 for structured streaming module #15483

zhangxinyu1

What changes were proposed in this pull request?

I propose adding KafkaForeachWriter.scala to external kafka-0.8.0.
It is useful for anyone who needs to write results to Kafka from the structured streaming module. It can be used with ForeachSink, as shown below:

val query = input.writeStream
  .outputMode("append")
  .foreach(new KafkaForeachWriter[Row](producerProps, topics, input.schema))
  .start()
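
For readers unfamiliar with the interface, a ForeachWriter implementation generally has the shape sketched below. This is a minimal illustration and not the KafkaForeachWriter in this patch: the class name `SketchKafkaWriter`, the single `topic` parameter, and the choice to send each row as a comma-separated string are all hypothetical. It also assumes the `org.apache.kafka.clients.producer` client is on the classpath and that `producerProps` already carries the broker list and String serializers.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sketch; not the KafkaForeachWriter proposed in this pull request.
class SketchKafkaWriter(producerProps: Properties, topic: String)
    extends ForeachWriter[Row] {

  // The producer is created in open() on the executor because it cannot be
  // serialized and shipped from the driver.
  @transient private var producer: KafkaProducer[String, String] = _

  // Called once per partition for each trigger.
  override def open(partitionId: Long, version: Long): Boolean = {
    producer = new KafkaProducer[String, String](producerProps)
    true // always process the partition; this sketch does no deduplication
  }

  // Called for every row in the partition.
  override def process(value: Row): Unit = {
    // Assumption: each row is written as its comma-separated string form.
    producer.send(new ProducerRecord[String, String](topic, value.mkString(",")))
  }

  // Called when the partition is finished, with the error if one occurred.
  override def close(errorOrNull: Throwable): Unit = {
    if (producer != null) producer.close()
  }
}
```

Note that this sketch opens and closes a producer for every partition of every trigger, which is the kind of overhead a shared producer cache would avoid.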

How was this patch tested?

Unit test in KafkaForeachWriterSuite.scala

@zhangxinyu1 zhangxinyu1 changed the title Add KafkaForeachWriter in external kafka-0.8.0 for structured streaming module [SPARK-17935][SQL]Add KafkaForeachWriter in external kafka-0.8.0 for structured streaming module Oct 14, 2016
@AmplabJenkins

Can one of the admins verify this patch?

@zhangxinyu1
Author

@marmbrus Could you take a look? Please tell me if you have any suggestions.
Thanks!

@marmbrus
Contributor

Thanks for working on this! However, I'm not sure that this is something that we should merge into the core repository (though I think it's an awesome example of how to use the ForeachWriter interface! You should consider making a blog post or even a Spark package). Specifically, when we add support for writing to Kafka, I think we'll want to do it both through the DataSource API and through the Sink API. This would let users seamlessly write batch and streaming jobs to Kafka in Java, Scala, Python, R and SQL using the same code.

@zhangxinyu1
Author

zhangxinyu1 commented Oct 21, 2016

@marmbrus
Yes, I also think KafkaSink is better for users than KafkaForeachWriter. I'm trying to complete KafkaSink these days. As with KafkaSource in external kafka-0-10-sql, there is a producer cache on each executor that stores KafkaProducers, which avoids repeatedly creating and closing producers.
Do you think KafkaSink is necessary?
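
To make the caching idea concrete, here is a minimal sketch of a per-JVM cache that creates a resource at most once per key and reuses it afterwards. It is an illustration only, not the cache in KafkaSink: the name `ResourceCache`, its methods, and the use of `AutoCloseable` as a stand-in for a Kafka producer are hypothetical, and a real cache would also need a policy for evicting idle producers.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Hypothetical sketch: caches one closeable resource per key (for example,
// one producer per distinct producer configuration) within a single JVM.
final class ResourceCache[K, V <: AutoCloseable](create: K => V) {
  private val cache = new ConcurrentHashMap[K, V]()

  // Returns the cached resource for `key`, creating it on first use.
  // computeIfAbsent runs `create` at most once per key, even under contention.
  def getOrCreate(key: K): V =
    cache.computeIfAbsent(key, new JFunction[K, V] {
      override def apply(k: K): V = create(k)
    })

  // Closes and removes every cached resource, e.g. on executor shutdown.
  def closeAll(): Unit = {
    val it = cache.values().iterator()
    while (it.hasNext) {
      it.next().close()
      it.remove()
    }
  }
}
```

With a cache like this, tasks running on the same executor would call `getOrCreate` with their producer configuration and share one producer instead of opening a new connection for each partition.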

@marmbrus
Contributor

It would be good to post on the design / interfaces before you get too far.

@zhangxinyu1
Author

@marmbrus
I wrote a short design doc for KafkaSink in JIRA: https://issues.apache.org/jira/browse/SPARK-17935
Could you please take a look?

@marmbrus
Contributor

I will try to take a look soon. Can we close this PR until we have an updated implementation?

@zhangxinyu1
Author

Sure. Thanks!
