
[SPARK-27064][SS] create StreamingWrite at the beginning of streaming execution #23981

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

According to the design, the life cycle of StreamingWrite should match that of the read-side MicroBatch/ContinuousStream: one instance per run of the streaming query, not one per epoch.

This PR fixes it.
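The life-cycle change described above can be sketched as follows. This is an illustrative, simplified model, not the actual StreamExecution code: the trait body, `SimpleWrite`, `QueryRun`, and `runEpoch` are all hypothetical stand-ins showing one write instance created at query start and reused across epochs.

```scala
// Hypothetical, minimal stand-in for the streaming write interface.
trait StreamingWrite {
  def commitEpoch(epochId: Long): Unit
}

class SimpleWrite(queryId: String) extends StreamingWrite {
  var commits: List[Long] = Nil
  def commitEpoch(epochId: Long): Unit = commits = epochId :: commits
}

// After this PR: the write is created once, when the query run starts,
// and every epoch reuses it -- matching the read-side stream's life cycle.
// (Before, a fresh write was built for each epoch.)
class QueryRun(queryId: String) {
  val write = new SimpleWrite(queryId) // created once, at the start of the run
  def runEpoch(epochId: Long): Unit = write.commitEpoch(epochId)
}

val run = new QueryRun("query-1")
(0L to 2L).foreach(run.runEpoch)
```

All three epochs commit through the single `write` instance held by the run, which is the invariant this PR establishes.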

How was this patch tested?

existing tests

```diff
@@ -585,7 +585,7 @@ abstract class StreamExecution(
       options: Map[String, String],
       inputPlan: LogicalPlan): StreamingWrite = {
     val writeBuilder = table.newWriteBuilder(new DataSourceOptions(options.asJava))
-      .withQueryId(runId.toString)
+      .withQueryId(id.toString)
```
Contributor Author


This is an unrelated change, but it was obviously a mistake: the sink doesn't care about the runId, which changes after a query restart. The sink needs the id, which is stable across the life cycle of the streaming query.

No built-in streaming sink uses this id, so this is for future-proofing.
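The distinction the comment relies on can be shown in a short sketch. This is a hypothetical model (the `QueryHandle` case class is invented for illustration): a streaming query's `id` is fixed for its whole life and recovered from the checkpoint, while `runId` is regenerated on every (re)start, so a sink keyed by runId would lose continuity across restarts.

```scala
import java.util.UUID

// Hypothetical handle pairing the two identifiers a streaming query carries.
case class QueryHandle(id: UUID, runId: UUID)

// First start of the query: both ids are minted.
val queryId  = UUID.randomUUID()
val firstRun = QueryHandle(queryId, UUID.randomUUID())

// After a restart: the same id (recovered from the checkpoint), a fresh runId.
val secondRun = QueryHandle(queryId, UUID.randomUUID())
```

`firstRun.id == secondRun.id` holds while the runIds differ, which is why the write builder should receive `id`, not `runId`.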

@cloud-fan
Contributor Author

cloud-fan commented Mar 5, 2019

cc @jose-torres @gatorsmile

@SparkQA

SparkQA commented Mar 5, 2019

Test build #103059 has finished for PR 23981 at commit 3261ed5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class WriteToMicroBatchDataSource(write: StreamingWrite, query: LogicalPlan)

@SparkQA

SparkQA commented Mar 6, 2019

Test build #103078 has finished for PR 23981 at commit f835a5c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 6, 2019

Test build #103092 has finished for PR 23981 at commit f835a5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jose-torres
Contributor

LGTM.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 13, 2019

Test build #103412 has finished for PR 23981 at commit f835a5c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 13, 2019

Test build #103426 has finished for PR 23981 at commit f835a5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

thanks, merging to master!

@cloud-fan cloud-fan closed this in d3813d8 Mar 13, 2019
mccheah pushed a commit to palantir/spark that referenced this pull request May 15, 2019
… execution

## What changes were proposed in this pull request?

According to the [design](https://docs.google.com/document/d/1vI26UEuDpVuOjWw4WPoH2T6y8WAekwtI7qoowhOFnI4/edit?usp=sharing), the life cycle of `StreamingWrite` should be the same as the read side `MicroBatch/ContinuousStream`, i.e. each run of the stream query, instead of each epoch.

This PR fixes it.

## How was this patch tested?

existing tests

Closes apache#23981 from cloud-fan/dsv2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>