
[SPARK-27064][SS] create StreamingWrite at the beginning of streaming execution #23981

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

According to the design, the life cycle of StreamingWrite should match that of the read-side MicroBatch/ContinuousStream: one instance per run of the streaming query, not one per epoch.

This PR fixes it.
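The life-cycle change described above can be sketched as follows. This is an illustrative, simplified model, not the actual StreamExecution code: the trait body, `SimpleWrite`, `QueryRun`, and `runEpoch` are all hypothetical stand-ins showing one write instance created at query start and reused across epochs.

```scala
// Hypothetical, minimal stand-in for the streaming write interface.
trait StreamingWrite {
  def commitEpoch(epochId: Long): Unit
}

class SimpleWrite(queryId: String) extends StreamingWrite {
  var commits: List[Long] = Nil
  def commitEpoch(epochId: Long): Unit = commits = epochId :: commits
}

// After this PR: the write is created once, when the query run starts,
// and every epoch reuses it -- matching the read-side stream's life cycle.
// (Before, a fresh write was built for each epoch.)
class QueryRun(queryId: String) {
  val write = new SimpleWrite(queryId) // created once, at the start of the run
  def runEpoch(epochId: Long): Unit = write.commitEpoch(epochId)
}

val run = new QueryRun("query-1")
(0L to 2L).foreach(run.runEpoch)
```

All three epochs commit through the single `write` instance held by the run, which is the invariant this PR establishes.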

How was this patch tested?

existing tests

```diff
@@ -585,7 +585,7 @@ abstract class StreamExecution(
       options: Map[String, String],
       inputPlan: LogicalPlan): StreamingWrite = {
     val writeBuilder = table.newWriteBuilder(new DataSourceOptions(options.asJava))
-      .withQueryId(runId.toString)
+      .withQueryId(id.toString)
```
Contributor Author


This is an unrelated change, but it was obviously a mistake: the sink doesn't care about the runId, which changes after a query restart. The sink needs the id, which is stable across the life cycle of the streaming query.

No built-in streaming sink uses this id, so this is for future-proofing.
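The distinction the comment relies on can be shown in a short sketch. This is a hypothetical model (the `QueryHandle` case class is invented for illustration): a streaming query's `id` is fixed for its whole life and recovered from the checkpoint, while `runId` is regenerated on every (re)start, so a sink keyed by runId would lose continuity across restarts.

```scala
import java.util.UUID

// Hypothetical handle pairing the two identifiers a streaming query carries.
case class QueryHandle(id: UUID, runId: UUID)

// First start of the query: both ids are minted.
val queryId  = UUID.randomUUID()
val firstRun = QueryHandle(queryId, UUID.randomUUID())

// After a restart: the same id (recovered from the checkpoint), a fresh runId.
val secondRun = QueryHandle(queryId, UUID.randomUUID())
```

`firstRun.id == secondRun.id` holds while the runIds differ, which is why the write builder should receive `id`, not `runId`.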

@cloud-fan
Contributor Author

cloud-fan commented Mar 5, 2019

cc @jose-torres @gatorsmile

@SparkQA

SparkQA commented Mar 5, 2019

Test build #103059 has finished for PR 23981 at commit 3261ed5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class WriteToMicroBatchDataSource(write: StreamingWrite, query: LogicalPlan)

@SparkQA

SparkQA commented Mar 6, 2019

Test build #103078 has finished for PR 23981 at commit f835a5c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 6, 2019

Test build #103092 has finished for PR 23981 at commit f835a5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jose-torres
Contributor

LGTM.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 13, 2019

Test build #103412 has finished for PR 23981 at commit f835a5c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 13, 2019

Test build #103426 has finished for PR 23981 at commit f835a5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

thanks, merging to master!

@cloud-fan cloud-fan closed this in d3813d8 Mar 13, 2019
mccheah pushed a commit to palantir/spark that referenced this pull request May 15, 2019
… execution

## What changes were proposed in this pull request?

According to the [design](https://docs.google.com/document/d/1vI26UEuDpVuOjWw4WPoH2T6y8WAekwtI7qoowhOFnI4/edit?usp=sharing), the life cycle of `StreamingWrite` should be the same as the read side `MicroBatch/ContinuousStream`, i.e. each run of the stream query, instead of each epoch.

This PR fixes it.

## How was this patch tested?

existing tests

Closes apache#23981 from cloud-fan/dsv2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>