[SPARK-30590][SQL] Untyped select API cannot take typed column expression that needs input type #27499
Conversation
Test build #118060 has finished for PR 27499 at commit
retest this please
Test build #118062 has finished for PR 27499 at commit
Test build #118077 has finished for PR 27499 at commit
```diff
@@ -352,8 +352,7 @@ object functions {
    * @group agg_funcs
    * @since 1.3.0
    */
-  def count(columnName: String): TypedColumn[Any, Long] =
-    count(Column(columnName)).as(ExpressionEncoder[Long]())
+  def count(columnName: String): Column = count(Column(columnName))
```
This seems to me to be wrongly a `TypedColumn`. `Count` is a `DeclarativeAggregate`.
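To make the compatibility stakes concrete, here is a minimal sketch (assuming a `SparkSession` named `spark`; the variable names are illustrative) of how the return type steers overload resolution in `Dataset.select`:

```scala
import org.apache.spark.sql.functions.count

val ds = spark.range(10)  // Dataset[java.lang.Long] with a single "id" column

// Old signature: def count(columnName: String): TypedColumn[Any, Long]
//   -> ds.select(count("id")) picks the typed select overload and is
//      statically a Dataset[Long].
// Signature in this diff: def count(columnName: String): Column
//   -> the same call resolves to the untyped select and is a DataFrame,
//      so callers that bound the result to Dataset[Long] stop compiling.
val counted = ds.select(count("id"))
```

This is the source-compatibility concern raised in the following comments.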
This is a breaking change, right?
At least https://github.com/apache/spark/pull/27499/files#diff-2c67e6ae3d5115b5521681f6ef871b1dR43 is broken.
It seems like the right change, but let's revert this line considering it's the code freeze period.
Ok. :)
project/MimaExcludes.scala (Outdated)

```diff
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryListener#QueryStartedEvent.this"),
+
+    // [SPARK-30590][SQL] Untyped select API cannot take typed column expression
+    ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.functions.count")
```
Put it under 3.0 exclude rules temporarily. The version number in the master branch is still 3.0.0.
Test build #118080 has finished for PR 27499 at commit
Test build #118189 has finished for PR 27499 at commit
retest this please
Test build #118196 has finished for PR 27499 at commit
retest this please
Test build #118197 has finished for PR 27499 at commit
retest this please.
Test build #118209 has finished for PR 27499 at commit
Seems fine except #27499 (comment). Might need to update the title and PR description too.
Test build #118777 has finished for PR 27499 at commit
Updated the description. I think the title is still ok?
Test build #118800 has finished for PR 27499 at commit
Test build #118813 has finished for PR 27499 at commit
also cc @dongjoon-hyun
```scala
        }
        if (isSimpleEncoder) {
          // This typed column produces simple type output that can be fit into untyped `DataFrame`.
          typedCol.withInputType(exprEnc, logicalPlan.output)
```
Previously we didn't call `withInputType` for `count`, right?
`count` has no `TypedAggregateExpression`. `withInputType` only works on `TypedAggregateExpression`.
I mean, `df.select(count("*"))` works without calling `withInputType`, right?
Yes, for a `TypedColumn` that doesn't contain a `TypedAggregateExpression`, `withInputType` is a no-op, so you don't need to call it for `df.select(count("*"))`.
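A minimal sketch of the distinction in this exchange (assuming a `SparkSession` named `spark`; `typed.sumLong` from `org.apache.spark.sql.expressions.scalalang` is used here only as an example of a column that does carry a `TypedAggregateExpression`):

```scala
import org.apache.spark.sql.functions.{count, lit}
import org.apache.spark.sql.expressions.scalalang.typed
import spark.implicits._

val df = spark.range(5).toDF("id")

// count("*") is a TypedColumn, but it wraps a plain Count aggregate with
// no TypedAggregateExpression inside, so withInputType is a no-op and the
// untyped select (forced here by mixing in a plain Column) resolves fine.
df.select(count("*"), lit(1)).show()

// typed.sumLong builds a TypedColumn whose TypedAggregateExpression keeps
// an empty input deserializer until withInputType binds the input encoder;
// the typed select overload does that binding, the untyped one does not.
val needsInput = typed.sumLong[Long](identity)
Seq(1L, 2L, 3L).toDS().select(needsInput).show()
```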
So here we are supporting more cases than before?
Oh, I get your point now. Yea, we should not allow more cases than before.
Test build #118897 has finished for PR 27499 at commit
retest this please.
```scala
   */
  private[sql] def needInputType: Boolean = {
    expr.find {
      case ta: TypedAggregateExpression if ta.inputDeserializer.isEmpty => true
```
nit: `case ta: TypedAggregateExpression => ta.inputDeserializer.isEmpty`
ok.
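Applied, the helper reads as below (a sketch of just this method; the surrounding `TypedColumn` members are elided):

```scala
// Whether the expression contains a typed aggregate whose input type and
// schema still need to be bound by `withInputType`.
private[sql] def needInputType: Boolean = {
  expr.find {
    case ta: TypedAggregateExpression => ta.inputDeserializer.isEmpty
    case _ => false
  }.isDefined
}
```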
LGTM. Let's highlight that it only refines the error message in the PR description.
```scala
    // Passes typed columns to untyped `Dataset.select` API.
    val err = intercept[AnalysisException] {
      df.select(fooAgg(1), fooAgg(2), fooAgg(3), fooAgg(4), fooAgg(5), fooAgg(6))
```
Not related to this PR, just a note: we have 5 overloads of typed `select`, and typed `count` is supported in both the typed and untyped `select`. That said, if we add a 6th overload of typed `select`, it can break queries that call the untyped `select` with 6 typed `count`s.

I'm not sure what the best way forward is. Maybe we should add a new method, `typedSelect`, to disambiguate from the untyped version.
Yea, to be clear: if we add a 6th overload of typed `select`, a call to the untyped `select` with 6 typed `count`s could return `Dataset[(Long, Long, ...)]` instead of `DataFrame`.

I think you meant something like the existing `selectUntyped`? Although its naming is confusing.
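A sketch of the hazard (overload signatures abridged from `Dataset`; assume a `Dataset` named `ds` with a `value` column and the `TypedColumn`-returning `count`):

```scala
// Dataset has typed select overloads for exactly 1 to 5 typed columns:
//   def select[U1](c1: TypedColumn[T, U1]): Dataset[U1]
//   ...
//   def select[U1, U2, U3, U4, U5](...): Dataset[(U1, U2, U3, U4, U5)]
// plus the untyped variadic overload:
//   def select(cols: Column*): DataFrame

val c = count("value")       // TypedColumn[Any, Long], which is also a Column

ds.select(c, c, c, c, c)     // typed overload -> Dataset[(Long, Long, Long, Long, Long)]
ds.select(c, c, c, c, c, c)  // falls through to untyped select -> DataFrame

// A hypothetical 6th typed overload would silently retype the second call
// to Dataset[(Long, Long, Long, Long, Long, Long)], a source-breaking change.
```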
Test build #118909 has finished for PR 27499 at commit
Test build #118931 has finished for PR 27499 at commit
retest this please
Test build #118944 has finished for PR 27499 at commit
```diff
-    Project(cols.map(_.named), logicalPlan)
+    val untypedCols = cols.map {
+      case typedCol: TypedColumn[_, _] =>
+        if (!typedCol.needInputType) {
```
Just noticed: why don't we inline this method? Then we can centralize the changes here. The methods in `TypedColumn` can still be accessed by Java users who ignore `private[spark]`, so it's better to avoid adding them if we can.
Oh, OK. I previously didn't want to make `select` look complicated. Inlined it now.
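For reference, a sketch of the inlined check in the untyped `select` (simplified from the final `Dataset.scala` change; the error message wording is illustrative):

```scala
def select(cols: Column*): DataFrame = withPlan {
  val untypedCols = cols.map {
    case typedCol: TypedColumn[_, _] =>
      // Inlined: does this typed column carry a TypedAggregateExpression
      // whose input deserializer has not been bound by withInputType?
      val needsInputType = typedCol.expr.find {
        case ta: TypedAggregateExpression => ta.inputDeserializer.isEmpty
        case _ => false
      }.isDefined
      if (needsInputType) {
        throw new AnalysisException(s"Typed column $typedCol that needs input type and " +
          "schema cannot be passed in untyped `select` API. " +
          "Use the typed `Dataset.select` API instead.")
      }
      typedCol
    case other => other
  }
  Project(untypedCols.map(_.named), logicalPlan)
}
```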
Test build #118984 has finished for PR 27499 at commit
thanks, merging to master/3.0!
Thanks! I will open a JIRA for discussion of the typed select API.
+1 LGTM too
[SPARK-30590][SQL] Untyped select API cannot take typed column expression that needs input type

Closes #27499 from viirya/SPARK-30590.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 160c144)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Created SPARK-30983 for discussion of the typed select API.
[SPARK-30590][SQL] Untyped select API cannot take typed column expression that needs input type

Closes apache#27499 from viirya/SPARK-30590.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?

This patch proposes to throw a clear analysis exception if the untyped `Dataset.select` takes a typed column expression that needs an input type.

### Why are the changes needed?

`Dataset` provides a few typed `select` helper functions to select typed column expressions. The maximum number of typed columns supported is 5. Selecting more than 5 typed columns silently calls the untyped `Dataset.select` and can cause a weird unresolved error, like:

```
org.apache.spark.sql.AnalysisException: unresolved operator 'Aggregate [fooagg(FooAgg(1), None, None, None, input[0, int, false] AS value#114, assertnotnull(cast(value#114 as int)), input[0, int, false] AS value#113, IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), None, None, None, input[0, int, false] AS value#119, assertnotnull(cast(value#119 as int)), input[0, int, false] AS value#118, IntegerType, IntegerType, false) AS foo_agg_2#121, fooagg(FooAgg(3), None, None, None, input[0, int, false] AS value#124, assertnotnull(cast(value#124 as int)), input[0, int, false] AS value#123, IntegerType, IntegerType, false) AS foo_agg_3#126, fooagg(FooAgg(4), None, None, None, input[0, int, false] AS value#129, assertnotnull(cast(value#129 as int)), input[0, int, false] AS value#128, IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), None, None, None, input[0, int, false] AS value#134, assertnotnull(cast(value#134 as int)), input[0, int, false] AS value#133, IntegerType, IntegerType, false) AS foo_agg_5#136, fooagg(FooAgg(6), None, None, None, input[0, int, false] AS value#139, assertnotnull(cast(value#139 as int)), input[0, int, false] AS value#138, IntegerType, IntegerType, false) AS foo_agg_6#141];;
'Aggregate [fooagg(FooAgg(1), None, None, None, input[0, int, false] AS value#114, assertnotnull(cast(value#114 as int)), input[0, int, false] AS value#113, IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), None, None, None, input[0, int, false] AS value#119, assertnotnull(cast(value#119 as int)), input[0, int, false] AS value#118, IntegerType, IntegerType, false) AS foo_agg_2#121, fooagg(FooAgg(3), None, None, None, input[0, int, false] AS value#124, assertnotnull(cast(value#124 as int)), input[0, int, false] AS value#123, IntegerType, IntegerType, false) AS foo_agg_3#126, fooagg(FooAgg(4), None, None, None, input[0, int, false] AS value#129, assertnotnull(cast(value#129 as int)), input[0, int, false] AS value#128, IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), None, None, None, input[0, int, false] AS value#134, assertnotnull(cast(value#134 as int)), input[0, int, false] AS value#133, IntegerType, IntegerType, false) AS foo_agg_5#136, fooagg(FooAgg(6), None, None, None, input[0, int, false] AS value#139, assertnotnull(cast(value#139 as int)), input[0, int, false] AS value#138, IntegerType, IntegerType, false) AS foo_agg_6#141]
+- Project [_1#6 AS a#13, _2#7 AS b#14, _3#8 AS c#15, _4#9 AS d#16, _5#10 AS e#17, _6#11 AS F#18]
   +- LocalRelation [_1#6, _2#7, _3#8, _4#9, _5#10, _6#11]

  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:431)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:430)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:430)
```

However, fully disallowing typed columns as input to the untyped `select` API would break current usage like `count`, which is a `TypedColumn` in `functions`. In order to keep compatibility, we should allow current usage of certain `TypedColumn`s as input to the untyped `select` API. For the `TypedColumn`s that would cause an unresolved exception, we should explicitly let users know that they are incorrectly calling the untyped `select` with typed columns that need an input type.

### Does this PR introduce any user-facing change?

Yes, but this PR only refines the error message. When users call the `Dataset.select` API with a typed column that needs an input type, an analysis exception is thrown; previously an unresolved error was thrown.

### How was this patch tested?

Unit tests.
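As a quick illustration of the refined behavior, a sketch modeled on the test above (assuming a `SparkSession` named `spark` and a ScalaTest context for `intercept`; `FooAgg` here is a stand-in for the aggregator named in the error trace):

```scala
import org.apache.spark.sql.{AnalysisException, Encoder, Encoders, Row}
import org.apache.spark.sql.expressions.Aggregator
import spark.implicits._

// Sums the i-th column of a Row; its input type (Row) must be bound by
// the typed select machinery before the column can resolve.
case class FooAgg(i: Int) extends Aggregator[Row, Int, Int] {
  def zero: Int = 0
  def reduce(b: Int, r: Row): Int = b + r.getInt(i - 1)
  def merge(b1: Int, b2: Int): Int = b1 + b2
  def finish(b: Int): Int = b
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}
def fooAgg(i: Int) = FooAgg(i).toColumn.name(s"foo_agg_$i")

val df = Seq((1, 2, 3, 4, 5, 6)).toDF("a", "b", "c", "d", "e", "f")

// Six typed columns overflow the five typed select overloads and fall
// through to the untyped select, which now fails fast with a clear
// AnalysisException instead of an unresolved-operator error.
val err = intercept[AnalysisException] {
  df.select(fooAgg(1), fooAgg(2), fooAgg(3), fooAgg(4), fooAgg(5), fooAgg(6))
}
```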