[SPARK-13031] [SQL] cleanup codegen and improve test coverage #10977

davies · 2016-01-29T06:26:06Z

Enable whole-stage codegen during tests even when only one operator supports it.
Split doProduce() into two APIs: upstream() and doProduce().
Generate a prefix for the fresh names of each operator.
Pass UnsafeRow to the parent directly (avoids calling getters and creating the UnsafeRow again).
Fix bugs and tests.
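
To illustrate the per-operator prefix for fresh names, here is a minimal Python sketch. It is not Spark's code: the class `FreshNames`, its `prefix` attribute, and the `agg_`/`filter_` prefixes are all hypothetical and only show how a prefix makes generated variable names traceable to the operator that produced them.

```python
import itertools


class FreshNames:
    """Toy name generator (hypothetical, not Spark's implementation)."""

    def __init__(self):
        # One shared counter keeps every generated name unique.
        self._counter = itertools.count()
        # Set by whichever operator is currently generating code.
        self.prefix = ""

    def fresh(self, name):
        # The prefix shows which operator owns the variable.
        return f"{self.prefix}{name}{next(self._counter)}"


names = FreshNames()
names.prefix = "agg_"
agg_value = names.fresh("value")
names.prefix = "filter_"
filter_value = names.fresh("value")
print(agg_value, filter_value)
```

With a prefix, two operators that both ask for a variable called `value` get names that are distinct and easy to tell apart when reading the generated code.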

This PR reopens #10944 and fixes the bug.

rxin · 2016-01-29T06:28:48Z

What's the bug?

davies · 2016-01-29T06:34:19Z

When there are no aggregate functions, it did not generate the output using resultExpression, which has only literals (I was misled by the comment in AggregateIterator).
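
A toy Python model of that case, as a sketch only: the function `aggregate_without_functions` and its parameters are hypothetical and do not mirror Spark's classes. It shows why the output must come from evaluating the result expressions even when there is nothing to aggregate.

```python
def aggregate_without_functions(rows, group_key, result_exprs):
    """Emit one output row per distinct group (hypothetical toy model)."""
    seen = []
    for row in rows:
        key = group_key(row)
        if key not in seen:
            seen.append(key)
    # Evaluate every result expression for each group instead of
    # emitting the grouping key itself.
    return [tuple(expr(key) for expr in result_exprs) for key in seen]


rows = [{"k": "a"}, {"k": "b"}, {"k": "a"}]
literal_one = lambda key: 1  # a literal ignores its input
output = aggregate_without_functions(rows, lambda r: r["k"], [literal_one])
print(output)  # [(1,), (1,)]
```

If the grouping key were emitted directly, the output would be `("a",)` and `("b",)` rather than the literal the query asked for.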

rxin · 2016-01-29T07:12:53Z

Thanks - can you add a test case that would catch this? In the long run, we should beef up our own test coverage and not rely on HiveCompatibilitySuite.

davies · 2016-01-29T07:26:55Z

The way we manage HiveCompatibilitySuite is actually better than our unit tests (SQL queries and golden results in text format). Even if we don't want to be compatible with Hive, it's still good to have those tests (just don't call them HiveCompatibilitySuite) and to manage them in a similar way.
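
The golden-result idea can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses the standard-library `sqlite3` module instead of Spark, keeps the golden text inline rather than in a file, and the helpers `run_query` and `check_golden` are hypothetical names.

```python
import sqlite3


def run_query(conn, sql):
    """Render a result set as text, one tab-separated row per line."""
    return "\n".join("\t".join(str(v) for v in row) for row in conn.execute(sql))


def check_golden(conn, sql, golden_text):
    """Compare the rendered result with the stored golden text."""
    return run_query(conn, sql) == golden_text.strip("\n")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2), ("a", 3)])

query = "SELECT k, SUM(v) FROM t GROUP BY k ORDER BY k"
golden = "a\t4\nb\t2\n"  # in practice this would be read from a golden file
print(check_golden(conn, query, golden))
```

Adding a test this way means writing a query and recording its expected text output, with no assertion code per case.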

rxin · 2016-01-29T07:29:01Z

Sure, it's a good idea to use that golden file infrastructure. Given that we don't have that yet, can you just add a test case?

rxin · 2016-01-29T07:30:31Z

The issue here is that we want test cases that are targeted at specific problems, and the Hive ones are not (they are just a giant black box we took to bootstrap coverage). Not to mention that a targeted test helps you catch problems with aggregations earlier, without rerunning the entire Hive suite.

SparkQA · 2016-01-29T07:32:51Z

Test build #50351 has finished for PR 10977 at commit 951e2cd.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-01-29T08:17:34Z

@rxin Added.

rxin · 2016-01-29T08:20:22Z

LGTM

SparkQA · 2016-01-29T08:40:17Z

Test build #50359 has finished for PR 10977 at commit ffa8e6b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-29T09:34:53Z

Test build #2473 has finished for PR 10977 at commit ffa8e6b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-01-29T09:59:58Z

I've merged this.

Davies Liu added 3 commits January 27, 2016 00:04

cleanup whole stage codegen

b4db006

improve stddev and variance

70a7c7e

fix aggregation without functions

951e2cd

Davies Liu added 2 commits January 29, 2016 00:08

Revert "improve stddev and variance"

f90b38d

This reverts commit 70a7c7e.

add regression test

ffa8e6b

asfgit closed this in 55561e7 Jan 29, 2016
