
[SPARK-13146][SQL] Management API for continuous queries #11030

Closed
wants to merge 24 commits

Conversation


tdas commented Feb 2, 2016

Management API for Continuous Queries

API for getting status of each query

  • Whether active or not
  • Unique name of each query
  • Status of the sources and sinks
  • Exceptions

API for managing each query

  • Immediately stop an active query
  • Waiting for a query to terminate, either normally or with an error

API for managing multiple queries

  • Listing all active queries
  • Getting an active query by name
  • Waiting for any one of the active queries to be terminated

API for listening to query life cycle events

  • ContinuousQueryListener API for query start, progress and termination events.
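
A rough Scala sketch of how these APIs could be used together. This is illustrative only: `startQuery()` stands in for however the query is started via DataStreamWriter, and the manager entry point and listener callback names follow the description above rather than the exact diff.

    // Sketch only: startQuery() is a hypothetical stand-in for starting a
    // ContinuousQuery via DataStreamWriter.
    val query: ContinuousQuery = startQuery()

    // Status of a single query
    query.isActive           // whether the query is active or not
    query.name               // unique name of the query
    query.exception          // the exception that terminated it, if any

    // Managing a single query
    query.awaitTermination() // block until it terminates, normally or with error
    query.stop()             // stop it immediately

    // Managing multiple queries
    val manager = sqlContext.streams
    manager.active.foreach(q => println(q.name)) // list all active queries
    val named = manager.get("my-query")          // get an active query by name
    manager.awaitAnyTermination()                // wait for any one to terminate

    // Listening to query life cycle events (callback and event names assumed)
    manager.addListener(new ContinuousQueryListener {
      override def onQueryStarted(event: QueryStarted): Unit = ()
      override def onQueryProgress(event: QueryProgress): Unit = ()
      override def onQueryTerminated(event: QueryTerminated): Unit = ()
    })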

"""
""".stripMargin

def assert(condition: => Boolean, message: String): Unit = {
Contributor Author:

These are a bunch of changes to make sure that any `assert` or `eventually` that is needed within testStream is wrapped in a try/catch, so that we can catch failures and enrich the message with the testState.

Contributor:

I like this method, but I wouldn't shadow an existing method.

Contributor Author:

Renamed to verify.
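
For reference, a minimal sketch in Scala of what the renamed helper might look like; `failTest` is a hypothetical helper assumed to fail the test with the testState appended, and the actual StreamTest code may differ.

    import scala.util.control.NonFatal

    // Evaluate the condition lazily; any assertion error or non-fatal
    // exception thrown while evaluating it is caught and re-reported with
    // the enriched message.
    def verify(condition: => Boolean, message: String): Unit = {
      try {
        if (!condition) failTest(message) // failTest: hypothetical, adds testState
      } catch {
        case NonFatal(e) => failTest(s"$message\n\t$e")
      }
    }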


SparkQA commented Feb 2, 2016

Test build #50589 has finished for PR 11030 at commit 40c6444.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ContinuousQueryManager


test("awaitAnyTermination") {
Contributor Author:

These tests are likely to be flaky because of timing-related issues. Other than making the timings more coarse, I am not sure how else to test awaitTermination's behavior.
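
As an illustration of the coarse-timing approach, a sketch (in Scala; `startQuery()` is a hypothetical helper, and `awaitAnyTermination(timeoutMs)` is assumed to return whether any query terminated within the timeout):

    test("awaitAnyTermination with coarse timeouts") {
      val query = startQuery() // hypothetical helper that starts a query

      // Nothing has terminated yet, so a short timeout should return false;
      // keeping it far below any real termination time makes this stable.
      assert(!sqlContext.streams.awaitAnyTermination(timeoutMs = 100))

      query.stop()

      // After the stop, even a generous timeout should return true quickly.
      assert(sqlContext.streams.awaitAnyTermination(timeoutMs = 10000))
    }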


tdas commented Feb 2, 2016

@marmbrus @zsxwing Please review!


SparkQA commented Feb 2, 2016

Test build #50597 has finished for PR 11030 at commit b6c2517.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -84,6 +84,17 @@ final class DataStreamWriter private[sql](df: DataFrame) {
}

/**
* Specifies a name to the [[ContinuousQuery]] to be started. This name must be unique among
Contributor:

"Specifies a name for this [[ContinuousQuery]]. The name must be unique for any active query, and will be auto assigned if unspecified."

Contributor Author:

DataStreamWriter is not a ContinuousQuery. Looking at the Scala doc page of DataStreamWriter, I think the phrase "this ContinuousQuery" is confusing.

Contributor:

It doesn't have to be exactly that text, but "Specifies a name to the ContinuousQuery to be started" doesn't sound right. In other parts of the documentation we say "for the underlying datastream".

How about just "Specifies a name for this query."... and then the details about auto-assignment and uniqueness.

Contributor Author:

Still confusing: "this query"... this is not a query. That's why it makes more sense if the name is associated directly with the start() method.

val query = ds.streamTo.path("some-path").format("text").start("query-name")

OR

val query = ds.streamTo("path").format("text").start("query-name")

Then it's easy to write the docs for def start(name: String): "@param name Name of the query".

Member:

> Then it's easy to write the docs for def start(name: String): "@param name Name of the query".

There is already a start(path: String)
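
To spell the clash out: both overloads would take a single String, so they could not coexist on DataStreamWriter. A sketch (`startQuery` and `defaultPath` are illustrative placeholders):

    // Existing method: the argument is the output path.
    def start(path: String): ContinuousQuery = startQuery(path, name = None)

    // Proposed method: the argument would be the query name -- but this has
    // exactly the same signature as start(path: String), so the pair would
    // not even compile.
    // def start(name: String): ContinuousQuery = startQuery(defaultPath, Some(name))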


SparkQA commented Feb 9, 2016

Test build #50954 has finished for PR 11030 at commit 144adbb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 9, 2016

Test build #50955 has finished for PR 11030 at commit 5c3c690.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* If the query has terminated with an exception, then the exception will be thrown.
*
* If the query has terminated, then all subsequent calls to this method will either return
* `true` immediately (if the query was terminated by `stop()`), or throw the exception
Member:

nit: this method doesn't return a Boolean.
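
For context on the nit, a sketch of the two variants being discussed (bodies omitted): the doc text above fits the timed overload, while the no-arg method returns Unit.

    // Blocks until the query terminates; if the query failed, rethrows its
    // exception instead of returning anything.
    def awaitTermination(): Unit

    // Blocks for up to timeoutMs; returns true if the query terminated
    // within the timeout. This is the variant that returns a Boolean.
    def awaitTermination(timeoutMs: Long): Boolean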


zsxwing commented Feb 9, 2016

Just some nits. Otherwise LGTM


tdas commented Feb 10, 2016

I addressed the multi-failure case and added a unit test for it.


SparkQA commented Feb 10, 2016

Test build #51020 has finished for PR 11030 at commit d0003cf.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 10, 2016

Test build #51021 has finished for PR 11030 at commit b0d5533.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 10, 2016

Test build #51022 has finished for PR 11030 at commit d7b1d97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


tdas commented Feb 10, 2016

test this again.


tdas commented Feb 10, 2016

test this please.


tdas commented Feb 10, 2016

retest this


SparkQA commented Feb 10, 2016

Test build #2529 has finished for PR 11030 at commit d7b1d97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 10, 2016

Test build #51030 has finished for PR 11030 at commit d7b1d97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


zsxwing commented Feb 10, 2016

Found the failure cause:

The Hadoop FileSystem API wraps InterruptedException in an IOException...

java.io.IOException: java.lang.InterruptedException
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:508)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
    at org.apache.spark.sql.execution.streaming.FileStreamSource.writeBatch(FileStreamSource.scala:186)
    at org.apache.spark.sql.execution.streaming.FileStreamSource.fetchMaxOffset(FileStreamSource.scala:115)
    at org.apache.spark.sql.execution.streaming.FileStreamSource.getNextBatch(FileStreamSource.scala:139)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$3.applyOrElse(StreamExecution.scala:182)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$3.applyOrElse(StreamExecution.scala:179)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
    at scala.collection.AbstractIterator.to(Iterator.scala:1194)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:251)
    at org.apache.spark.sql.execution.streaming.StreamExecution.attemptBatch(StreamExecution.scala:179)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:123)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:74)

So if we call stop() (which interrupts the stream thread) while a batch is writing a file, we will get an IOException...

One workaround is adding an addCheck() above this line: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala#L339
Then we can make sure that no batch is writing any file when we call stop().

Another option is to allow this exception in this test, since we randomly start and stop the stream multiple times.
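
As a sketch of the second option, the test (or the stream's run loop) could treat an IOException caused by an interrupt as a normal stop. How the interrupt surfaces (as the cause or only in the message) varies by Hadoop version, so both checks below are assumptions, and `runBatches()` is a hypothetical stand-in for StreamExecution's batch loop:

    import java.io.IOException

    // Heuristic: Hadoop's Shell may set the InterruptedException as the
    // cause, or only mention it in the message, depending on the version.
    def isWrappedInterrupt(e: Throwable): Boolean = e match {
      case io: IOException =>
        io.getCause.isInstanceOf[InterruptedException] ||
          (io.getMessage != null && io.getMessage.contains("InterruptedException"))
      case _ => false
    }

    try {
      runBatches() // hypothetical stand-in for the streaming batch loop
    } catch {
      case e: Throwable if isWrappedInterrupt(e) =>
        () // stop() interrupted a batch mid-write; treat as a clean shutdown
    }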


SparkQA commented Feb 10, 2016

Test build #51055 has finished for PR 11030 at commit 9caec83.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 11, 2016

Test build #51062 has finished for PR 11030 at commit 458199b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


zsxwing commented Feb 11, 2016

LGTM

asfgit closed this in 0902e20 on Feb 11, 2016