Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2973][SQL] Lightweight SQL commands without distributed jobs when calling .collect() #2215

Closed
wants to merge 6 commits into from

Conversation

liancheng
Copy link
Contributor

By overriding executeCollect() in physical plan classes of all commands, we can avoid to kick off a distributed job when collecting result of a SQL command, e.g. sql("SET").collect().

Previously, Command.sideEffectResult returns a Seq[Any], and the execute() method in sub-classes of Command typically convert that to a Seq[Row] then parallelize it to an RDD. Now with this PR, sideEffectResult is required to return a Seq[Row] directly, so that executeCollect() can directly leverage that and be factored to the Command parent class.

@SparkQA
Copy link

SparkQA commented Aug 30, 2014

QA tests have started for PR 2215 at commit 995bdd8.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 30, 2014

QA tests have finished for PR 2215 at commit 995bdd8.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
def execute(): RDD[Row] = context.sparkContext.parallelize(executeCollect(), 1)

override def executeCollect(): Array[Row] = sideEffectResult.map(Row(_)).toArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't just define these in the super class command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Refactored a bit, now Command.sideEffectResult return Seq[Row] and Command.executeCollect() simply returns sideEffectResult.toArray.

@SparkQA
Copy link

SparkQA commented Sep 1, 2014

QA tests have started for PR 2215 at commit e0e12e9.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 1, 2014

QA tests have finished for PR 2215 at commit e0e12e9.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 2, 2014

QA tests have started for PR 2215 at commit 5a0e16c.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 2, 2014

QA tests have finished for PR 2215 at commit 5a0e16c.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ByteArrayChunkOutputStream(chunkSize: Int) extends OutputStream

val rows = sideEffectResult.map { line => new GenericRow(Array[Any](line)) }
context.sparkContext.parallelize(rows, 1)
}
def execute(): RDD[Row] = context.sparkContext.parallelize(sideEffectResult, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this also be in Command? The implementation looks the same everywhere.

@liancheng
Copy link
Contributor Author

ok to test

@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have started for PR 2215 at commit 3fbef60.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have finished for PR 2215 at commit 3fbef60.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

Build failure caused by unrelated GraphX test suite.

retest this please.

@liancheng
Copy link
Contributor Author

test this please

@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have started for PR 2215 at commit 3fbef60.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have finished for PR 2215 at commit 3fbef60.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

Build failure caused by streaming test suite.

retest this please.

@liancheng
Copy link
Contributor Author

ok to test

@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have started for PR 2215 at commit 3fbef60.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 4, 2014

QA tests have finished for PR 2215 at commit 3fbef60.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerBlockManagerAdded(time: Long, blockManagerId: BlockManagerId, maxMem: Long)
    • case class SparkListenerBlockManagerRemoved(time: Long, blockManagerId: BlockManagerId)
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String], time: Long,

@asfgit asfgit closed this in f48420f Sep 4, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…hen calling .collect()

By overriding `executeCollect()` in physical plan classes of all commands, we can avoid to kick off a distributed job when collecting result of a SQL command, e.g. `sql("SET").collect()`.

Previously, `Command.sideEffectResult` returns a `Seq[Any]`, and the `execute()` method in sub-classes of `Command` typically convert that to a `Seq[Row]` then parallelize it to an RDD. Now with this PR, `sideEffectResult` is required to return a `Seq[Row]` directly, so that `executeCollect()` can directly leverage that and be factored to the `Command` parent class.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#2215 from liancheng/lightweight-commands and squashes the following commits:

3fbef60 [Cheng Lian] Factored execute() method of physical commands to parent class Command
5a0e16c [Cheng Lian] Passes test suites
e0e12e9 [Cheng Lian] Refactored Command.sideEffectResult and Command.executeCollect
995bdd8 [Cheng Lian] Cleaned up DescribeHiveTableCommand
542977c [Cheng Lian] Avoids confusion between logical and physical plan by adding package prefixes
55b2aa5 [Cheng Lian] Avoids distributed jobs when execution SQL commands
liancheng added a commit to liancheng/spark that referenced this pull request Sep 4, 2014
asfgit pushed a commit that referenced this pull request Sep 5, 2014
Adds logical and physical command classes for the "add jar" command.

Note that this PR conflicts with and should be merged after #2215.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2242 from liancheng/add-jar and squashes the following commits:

e43a2f1 [Cheng Lian] Updates AddJar according to conventions introduced in #2215
b99107f [Cheng Lian] Added test case for ADD JAR command
095b2c7 [Cheng Lian] Also forward ADD JAR command to Hive
9be031b [Cheng Lian] Trims Jar path string
8195056 [Cheng Lian] Added support for the "add jar" command
@liancheng liancheng deleted the lightweight-commands branch September 24, 2014 00:05
tbfenet added a commit to tbfenet/spark that referenced this pull request Dec 10, 2014
Adds logical and physical command classes for the "add jar" command.

Note that this PR conflicts with and should be merged after apache#2215.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#2242 from liancheng/add-jar and squashes the following commits:

e43a2f1 [Cheng Lian] Updates AddJar according to conventions introduced in apache#2215
b99107f [Cheng Lian] Added test case for ADD JAR command
095b2c7 [Cheng Lian] Also forward ADD JAR command to Hive
9be031b [Cheng Lian] Trims Jar path string
8195056 [Cheng Lian] Added support for the "add jar" command

Conflicts:
	sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
	sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala
	sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants