
[SPARK-9986][SPARK-9991][SPARK-9993][SQL]Create a simple test framework for local operators #8464

Closed · wants to merge 9 commits

Conversation


@zsxwing zsxwing commented Aug 26, 2015

This PR includes the following changes:

  • Add LocalNodeTest for local operator tests and add unit tests for FilterNode and ProjectNode.
  • Add LimitNode and UnionNode and their unit tests to show how to use LocalNodeTest. (SPARK-9991, SPARK-9993)
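The core idea of `LocalNodeTest` can be sketched in isolation: run a local operator over known input and compare its output with an expected answer. This is a minimal standalone sketch, with `Seq[Int]` standing in for rows of `InternalRow`; the names `checkAnswer` and `wrap` are illustrative, not the PR's actual signatures.

```scala
// Simplified sketch of the LocalNodeTest pattern (illustrative names only).
object LocalNodeTestSketch {
  // A local operator is modeled here as a plain iterator over rows.
  type Node = Iterator[Int]

  // Wrap the input in the operator under test and compare against `expected`,
  // optionally sorting both sides so row order does not matter.
  def checkAnswer(input: Seq[Int],
                  wrap: Node => Node,
                  expected: Seq[Int],
                  sortAnswers: Boolean = true): Unit = {
    val actual = wrap(input.iterator).toSeq
    val (a, e) =
      if (sortAnswers) (actual.sorted, expected.sorted) else (actual, expected)
    assert(a == e, s"expected $e but got $a")
  }

  def main(args: Array[String]): Unit = {
    // A "FilterNode" is just a filtering iterator in this sketch.
    checkAnswer(Seq(3, 1, 2), node => node.filter(_ > 1), Seq(2, 3))
  }
}
```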


zsxwing commented Aug 26, 2015

cc @rxin

override def open(): Unit = {
  iterator = data.iterator
}
private var iter: Iterator[InternalRow] = _
Contributor:

Why not just assign the value here (then we could use a val)? Your changes to FilterNode and ProjectNode also assign values to predicate/project immediately instead of in open().
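The suggestion, sketched with simplified types (plain `Int` data rather than the PR's actual `InternalRow`, and hypothetical class names): initialize the field eagerly at construction so it can be an immutable val, instead of deferring assignment to open() with a mutable var.

```scala
// Before (sketch): initialization deferred to open(), so the field must be a var.
class SeqScanNodeVar(data: Seq[Int]) {
  private var iter: Iterator[Int] = _
  def open(): Unit = { iter = data.iterator }
}

// After (sketch): eager assignment lets the field be an immutable val,
// and there is no window where the field is null before open() is called.
class SeqScanNodeVal(data: Seq[Int]) {
  private val iter: Iterator[Int] = data.iterator
}
```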


SparkQA commented Aug 26, 2015

Test build #41628 has finished for PR 8464 at commit 4e101ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
    • case class UnionNode(children: Seq[LocalNode]) extends LocalNode


rxin commented Aug 27, 2015

I talked to @marmbrus and we thought it'd make more sense to just have the operators be iterators themselves, and then we will create new instances of these somewhere else.
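That design can be sketched as follows: each operator is itself an `Iterator` over rows, so composing operators is just wrapping one iterator in another. Again a simplified standalone sketch (`Int` rows instead of `InternalRow`; the `*Sketch` names are illustrative, though `LimitNode(limit, child)` matches the shape reported in the build output above).

```scala
// Sketch of iterator-style local operators, per the discussion above.
trait LocalNodeSketch extends Iterator[Int]

// Leaf operator: scans an in-memory sequence.
case class SeqScanNodeSketch(data: Seq[Int]) extends LocalNodeSketch {
  private val iter = data.iterator
  override def hasNext: Boolean = iter.hasNext
  override def next(): Int = iter.next()
}

// Unary operator: passes through at most `limit` rows from its child.
case class LimitNodeSketch(limit: Int, child: LocalNodeSketch)
  extends LocalNodeSketch {
  private var taken = 0
  override def hasNext: Boolean = taken < limit && child.hasNext
  override def next(): Int = { taken += 1; child.next() }
}
```

With this shape there is no open()/close() lifecycle to manage in tests: `LimitNodeSketch(2, SeqScanNodeSketch(Seq(5, 6, 7))).toSeq` evaluates the pipeline directly.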


zsxwing commented Aug 27, 2015

I removed OpenCloseIterator and made LocalNode an Iterator, as discussed with @rxin. Also updated the description of this PR.

CatalystTypeConverters.createToCatalystConverter(StructType.fromAttributes(output))
new SeqScanNode(
  output,
  df.collect().map(r => converter(r).asInstanceOf[InternalRow]))
Contributor:

We can use df.queryExecution.toRdd.collect() to get the internal rows directly.
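The difference, sketched with Spark 1.5-era API names (this fragment assumes an existing DataFrame `df` with schema attributes `output`, so it is not self-contained or compiled here):

```scala
// Before: collect external Rows, then convert each one back to an InternalRow.
val converter = CatalystTypeConverters.createToCatalystConverter(
  StructType.fromAttributes(output))
val internalRows = df.collect().map(r => converter(r).asInstanceOf[InternalRow])

// After: read the InternalRows off the query execution directly,
// skipping the internal-to-external-to-internal round trip.
val internalRows2 = df.queryExecution.toRdd.collect()
```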

Member Author:

Cool. Fixed it.


SparkQA commented Aug 27, 2015

Test build #41676 has finished for PR 8464 at commit 62b8d24.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
    • case class UnionNode(children: Seq[LocalNode]) extends LocalNode


SparkQA commented Aug 27, 2015

Test build #41681 has finished for PR 8464 at commit 22e7bc0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
    • case class UnionNode(children: Seq[LocalNode]) extends LocalNode

* @param sortAnswers if true, the answers will be sorted by their toString representations prior
* to being compared.
*/
protected def checkAnswer2(
Contributor:

Just name this checkAnswer.

Member Author:

It needs to be checkAnswer2 because it takes a default parameter, sortAnswers, and default parameters don't work together with overloading.

Contributor:

Can you use overloading instead of defaults for that as well?

Member Author:

If so, the type inference doesn't work well... E.g.,

    checkAnswer(
      testData,
      node => FilterNode(condition.expr, node),
      testData.filter(condition).collect()
    )

will need to change to

    checkAnswer(
      testData,
      (node: LocalNode) => FilterNode(condition.expr, node),
      testData.filter(condition).collect()
    )

Member Author:

It reminds me of some old arguments about overloading in the Scala community. It seems that they don't like overloading and don't want to improve it: https://groups.google.com/forum/#!msg/scala-language/h7akCAFnu8c/dmReTTsW11gJ


SparkQA commented Aug 28, 2015

Test build #41717 has finished for PR 8464 at commit 7dcd502.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
    • case class UnionNode(children: Seq[LocalNode]) extends LocalNode


import org.apache.spark.sql.test.SharedSQLContext

class FilterNodeSuite extends LocalNodeTest with SharedSQLContext {
Contributor:

So I was thinking it would be great if we could get rid of the SQLContext in the test cases for this local stuff.

Member Author:

I just wanted to reuse SQLTestData. It would be easy if we could use DataFrame to test the local stuff. I feel that as long as we make sure the main code of LocalNode doesn't use SQLContext, we don't need to get rid of SQLContext in the test cases.

Contributor:

Note that SQLTestData is going away though.... #7406

Member Author:

> Note that SQLTestData is going away though.... #7406

Didn't notice that. I will add test data for each test case manually.

Member Author:

Just realized we need to call DataFrame.resolve to create a Column. It looks like it's hard to get rid of SQLContext in tests alone. I think it's better to add an Analyzer for LocalNode in a separate PR.


rxin commented Aug 30, 2015

OK I'm going to merge this so we have the infrastructure in. We can fix issues later.

@asfgit asfgit closed this in 13f5f8e Aug 30, 2015
@zsxwing zsxwing deleted the local-execution branch August 31, 2015 07:12