[SPARK-2973][SQL] Lightweight SQL commands without distributed jobs when calling .collect() #2215

liancheng · 2014-08-30T09:51:37Z

By overriding executeCollect() in the physical plan classes of all commands, we can avoid kicking off a distributed job when collecting the result of a SQL command, e.g. sql("SET").collect().

Previously, Command.sideEffectResult returned a Seq[Any], and the execute() method in subclasses of Command typically converted that to a Seq[Row] and then parallelized it into an RDD. With this PR, sideEffectResult is required to return a Seq[Row] directly, so that executeCollect() can use it as-is and be factored into the Command parent class.
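To illustrate the idea without depending on Spark, here is a minimal, self-contained sketch. `Row`, `FakeCluster`, `FakeRDD`, and `SetCommand` are simplified stand-ins invented for this example, not the real Spark classes; only the names `Command`, `sideEffectResult`, `execute()`, and `executeCollect()` come from this PR. The job counter is an assumption used to make the saved job visible.

```scala
// Spark-free model of the pattern described above. All types here are
// hypothetical stand-ins, not the actual Spark SQL implementation.
object LightweightCommandSketch {
  // Stand-in for org.apache.spark.sql.Row.
  final case class Row(values: Any*)

  // Stand-in for a cluster: counts how many "distributed jobs" were launched.
  final class FakeCluster {
    var jobsLaunched: Int = 0
    def parallelize(rows: Seq[Row]): FakeRDD = new FakeRDD(rows, this)
  }

  // Stand-in for RDD[Row]: collecting it costs one job.
  final class FakeRDD(rows: Seq[Row], cluster: FakeCluster) {
    def collect(): Array[Row] = {
      cluster.jobsLaunched += 1
      rows.toArray
    }
  }

  trait Command {
    def cluster: FakeCluster

    // Output rows of the command, already in Row form.
    def sideEffectResult: Seq[Row]

    // RDD path, kept for callers that need an RDD.
    def execute(): FakeRDD = cluster.parallelize(sideEffectResult)

    // Driver-only path: returns the rows without launching a job.
    def executeCollect(): Array[Row] = sideEffectResult.toArray
  }

  // Hypothetical command that lists settings as "key=value" rows.
  final class SetCommand(val cluster: FakeCluster, settings: Map[String, String])
      extends Command {
    // lazy val so the side effect runs at most once.
    lazy val sideEffectResult: Seq[Row] =
      settings.toSeq.sorted.map { case (k, v) => Row(s"$k=$v") }
  }
}
```

In this model, calling `executeCollect()` leaves `jobsLaunched` untouched, while `execute().collect()` increments it, which is the difference the PR is after.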

SparkQA · 2014-08-30T09:54:04Z

QA tests have started for PR 2215 at commit 995bdd8.

This patch merges cleanly.

SparkQA · 2014-08-30T11:16:57Z

QA tests have finished for PR 2215 at commit 995bdd8.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2014-08-31T00:58:38Z

sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala

-  }
+  def execute(): RDD[Row] = context.sparkContext.parallelize(executeCollect(), 1)
+
+  override def executeCollect(): Array[Row] = sideEffectResult.map(Row(_)).toArray


Is there a reason we can't just define these in the super class command?

liancheng · 2014-09-01

Good idea. Refactored a bit: Command.sideEffectResult now returns Seq[Row], and Command.executeCollect() simply returns sideEffectResult.toArray.

SparkQA · 2014-09-01T06:44:10Z

QA tests have started for PR 2215 at commit e0e12e9.

This patch merges cleanly.

SparkQA · 2014-09-01T08:20:51Z

QA tests have finished for PR 2215 at commit e0e12e9.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2014-09-02T06:29:07Z

QA tests have started for PR 2215 at commit 5a0e16c.

This patch merges cleanly.

SparkQA · 2014-09-02T08:16:46Z

QA tests have finished for PR 2215 at commit 5a0e16c.

This patch passes unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class ByteArrayChunkOutputStream(chunkSize: Int) extends OutputStream

marmbrus · 2014-09-03T03:35:11Z

sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala

-    val rows = sideEffectResult.map { line => new GenericRow(Array[Any](line)) }
-    context.sparkContext.parallelize(rows, 1)
-  }
+  def execute(): RDD[Row] = context.sparkContext.parallelize(sideEffectResult, 1)


Can this also be in Command? The implementation looks the same everywhere.
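A rough sketch of what this suggestion amounts to, again without Spark: `execute()` is defined once in the parent trait and concrete commands only supply their rows. `Row`, `FakeRDD`, `parallelize`, `ExplainLike`, and `CacheLike` are hypothetical names made up for this example; the real code calls `context.sparkContext.parallelize(sideEffectResult, 1)` as shown in the diff above.

```scala
// Spark-free model of factoring execute() into the parent trait.
// All types and command names here are hypothetical stand-ins.
object FactoredExecuteSketch {
  // Stand-in for org.apache.spark.sql.Row.
  final case class Row(values: Any*)

  // Stand-in for RDD[Row]; records how many partitions were requested.
  final case class FakeRDD(rows: Seq[Row], numPartitions: Int)

  // Stand-in for SparkContext.parallelize.
  def parallelize(rows: Seq[Row], numSlices: Int): FakeRDD =
    FakeRDD(rows, numSlices)

  trait Command {
    def sideEffectResult: Seq[Row]

    // Defined once here, so concrete commands no longer repeat it.
    def execute(): FakeRDD = parallelize(sideEffectResult, 1)
  }

  // Each command only describes its own output rows.
  final class ExplainLike(plan: String) extends Command {
    lazy val sideEffectResult: Seq[Row] =
      plan.split("\n").toSeq.map(line => Row(line))
  }

  final class CacheLike(table: String) extends Command {
    // A command with no output still gets a valid (empty) result.
    lazy val sideEffectResult: Seq[Row] = Seq.empty[Row]
  }
}
```

Both commands inherit the same single-partition `execute()`, so a change to how results are parallelized would need to be made in one place only.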

liancheng · 2014-09-03T07:25:20Z

ok to test

SparkQA · 2014-09-03T07:29:43Z

QA tests have started for PR 2215 at commit 3fbef60.

This patch merges cleanly.

SparkQA · 2014-09-03T09:11:28Z

QA tests have finished for PR 2215 at commit 3fbef60.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2014-09-03T19:26:25Z

Build failure caused by unrelated GraphX test suite.

retest this please.

liancheng · 2014-09-03T21:06:29Z

test this please

SparkQA · 2014-09-03T21:09:19Z

QA tests have started for PR 2215 at commit 3fbef60.

This patch merges cleanly.

SparkQA · 2014-09-03T22:48:17Z

QA tests have finished for PR 2215 at commit 3fbef60.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2014-09-03T23:12:47Z

Build failure caused by streaming test suite.

retest this please.

liancheng · 2014-09-03T23:29:09Z

ok to test

SparkQA · 2014-09-03T23:34:24Z

QA tests have started for PR 2215 at commit 3fbef60.

This patch merges cleanly.

SparkQA · 2014-09-04T01:22:54Z

QA tests have finished for PR 2215 at commit 3fbef60.

This patch passes unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class SparkListenerBlockManagerAdded(time: Long, blockManagerId: BlockManagerId, maxMem: Long)
- case class SparkListenerBlockManagerRemoved(time: Long, blockManagerId: BlockManagerId)
- case class SparkListenerApplicationStart(appName: String, appId: Option[String], time: Long,

…hen calling .collect()

By overriding `executeCollect()` in physical plan classes of all commands, we can avoid to kick off a distributed job when collecting result of a SQL command, e.g. `sql("SET").collect()`.

Previously, `Command.sideEffectResult` returns a `Seq[Any]`, and the `execute()` method in sub-classes of `Command` typically convert that to a `Seq[Row]` then parallelize it to an RDD. Now with this PR, `sideEffectResult` is required to return a `Seq[Row]` directly, so that `executeCollect()` can directly leverage that and be factored to the `Command` parent class.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#2215 from liancheng/lightweight-commands and squashes the following commits:

3fbef60 [Cheng Lian] Factored execute() method of physical commands to parent class Command
5a0e16c [Cheng Lian] Passes test suites
e0e12e9 [Cheng Lian] Refactored Command.sideEffectResult and Command.executeCollect
995bdd8 [Cheng Lian] Cleaned up DescribeHiveTableCommand
542977c [Cheng Lian] Avoids confusion between logical and physical plan by adding package prefixes
55b2aa5 [Cheng Lian] Avoids distributed jobs when execution SQL commands

Adds logical and physical command classes for the "add jar" command. Note that this PR conflicts with and should be merged after #2215.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2242 from liancheng/add-jar and squashes the following commits:

e43a2f1 [Cheng Lian] Updates AddJar according to conventions introduced in #2215
b99107f [Cheng Lian] Added test case for ADD JAR command
095b2c7 [Cheng Lian] Also forward ADD JAR command to Hive
9be031b [Cheng Lian] Trims Jar path string
8195056 [Cheng Lian] Added support for the "add jar" command

Adds logical and physical command classes for the "add jar" command. Note that this PR conflicts with and should be merged after apache#2215.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#2242 from liancheng/add-jar and squashes the following commits:

e43a2f1 [Cheng Lian] Updates AddJar according to conventions introduced in apache#2215
b99107f [Cheng Lian] Added test case for ADD JAR command
095b2c7 [Cheng Lian] Also forward ADD JAR command to Hive
9be031b [Cheng Lian] Trims Jar path string
8195056 [Cheng Lian] Added support for the "add jar" command

Conflicts:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala

liancheng added 3 commits August 30, 2014 02:26

Avoids distributed jobs when execution SQL commands

55b2aa5

Avoids confusion between logical and physical plan by adding package prefixes

542977c

Cleaned up DescribeHiveTableCommand

995bdd8

marmbrus reviewed Aug 31, 2014

Refactored Command.sideEffectResult and Command.executeCollect

e0e12e9

Passes test suites

5a0e16c

liancheng mentioned this pull request Sep 3, 2014

[SPARK-2219][SQL] Added support for the "add jar" command #2242

Closed

marmbrus reviewed Sep 3, 2014

Factored execute() method of physical commands to parent class Command

3fbef60

asfgit closed this in f48420f Sep 4, 2014

liancheng added a commit to liancheng/spark that referenced this pull request Sep 4, 2014

Updates AddJar according to conventions introduced in apache#2215

e43a2f1

liancheng deleted the lightweight-commands branch September 24, 2014 00:05
