
[SPARK-16730][SQL] Implement function aliases for type casts #14364

Closed
wants to merge 4 commits

Conversation

petermaxlee
Contributor

What changes were proposed in this pull request?

Spark 1.x supports using Hive type names as function names for casts, e.g.

```sql
SELECT int(1.0);
SELECT string(2.0);
```

The above queries work in Spark 1.x because Spark 1.x falls back to Hive for unimplemented functions, but they break in Spark 2.0 because that fallback was removed.

This patch implements function aliases using an analyzer rule for the following cast functions:

  • boolean
  • tinyint
  • smallint
  • int
  • bigint
  • float
  • double
  • decimal
  • date
  • timestamp
  • binary
  • string

How was this patch tested?

Added end-to-end tests in SQLCompatibilityFunctionSuite.
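For context, a minimal sketch of the kind of end-to-end check such a suite could contain is shown below (the test name, helper usage, and expected values here are assumptions, not the verbatim suite contents):

```scala
// Hypothetical sketch of an end-to-end test, assuming the suite extends
// Spark's QueryTest with a shared SparkSession so that sql() and
// checkAnswer() are available.
test("SPARK-16730 cast alias functions") {
  checkAnswer(sql("SELECT boolean(1)"), Row(true))
  checkAnswer(sql("SELECT int(1.0)"), Row(1))
  checkAnswer(sql("SELECT bigint(1.0)"), Row(1L))
  checkAnswer(sql("SELECT string(2.0)"), Row("2.0"))
}
```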

@petermaxlee
Contributor Author

@cloud-fan this is an alternative implementation using FunctionRegistry.

Let me know if you prefer this one over #14362.

expression[BitwiseXor]("^"),

// Cast aliases (SPARK-16730)
castAlias("boolean", BooleanType),
Contributor


In Hive, if a user creates a UDF called boolean, will Hive throw an exception or will it override the type-casting one?

Contributor Author


boolean is just a normal function in Hive (the same as, for example, acos), so it behaves the way any normal function does.

@SparkQA

SparkQA commented Jul 26, 2016

Test build #62872 has finished for PR 14364 at commit 8b087b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 27, 2016

Test build #62902 has finished for PR 14364 at commit b8fbcab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

castAlias("tinyint", ByteType),
castAlias("smallint", ShortType),
castAlias("int", IntegerType),
castAlias("bigint", LongType),
Contributor


Using LongType.simpleString instead of bigint looks better. Same for the others.

Contributor Author


I think that's actually worse, because it makes it less clear what the function name is when reading this source file. Also, if for some reason we change LongType.simpleString in the future, these functions will subtly break.

Contributor


ok agree
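For reference, the disagreement above is only about how the registered name is spelled; both forms resolve to the same string, since LongType.simpleString is "bigint". A tiny standalone illustration (assuming spark-catalyst is on the classpath; the object name is made up):

```scala
import org.apache.spark.sql.types.LongType

object SimpleStringDemo {
  def main(args: Array[String]): Unit = {
    // The reviewer's suggestion would derive the alias name from the type:
    println(LongType.simpleString) // prints "bigint"
    // The patch keeps the literal "bigint" so the registered name is visible
    // at the call site and cannot drift if simpleString ever changes.
  }
}
```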

@cloud-fan
Contributor

mostly LGTM, thanks for working on it!

@SparkQA

SparkQA commented Jul 27, 2016

Test build #62909 has finished for PR 14364 at commit 3b78da3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

/**
* Creates a function lookup registry for cast aliases (SPARK-16730).
Contributor


NIT: should "...function lookup registry..." be "...function registry lookup entry..." or something similar?

@hvanhovell
Contributor

A few minor comments. LGTM otherwise.

@petermaxlee
Contributor Author

I've updated the pull request based on @hvanhovell's comment.

@SparkQA

SparkQA commented Jul 27, 2016

Test build #62928 has finished for PR 14364 at commit 5bbe995.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
Cast(args.head, dataType)
}
(name, (expressionInfo[Cast](name), builder))
Contributor


So for whatever cast alias we describe, we will always show Cast's description, right? Is it the same in Hive?

Contributor Author


Yes, this is a limitation. It's not what Hive does, because Hive does not actually have a single cast expression; it has one cast expression per target type. I think it's a pretty small detail, and fixing it would require a lot of work.
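Pulling together the fragments quoted in this review, the castAlias helper in FunctionRegistry presumably looks roughly like the following (a reconstruction, not the verbatim patch; in particular the arity check and its error message are assumptions):

```scala
/** Creates a function registry lookup entry for cast aliases (SPARK-16730). */
private def castAlias(
    name: String,
    dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = {
  // Build a one-argument function that simply wraps its input in a Cast
  // to the target type.
  val builder = (args: Seq[Expression]) => {
    if (args.size != 1) {
      // Assumed wording; the exact check in the patch may differ.
      throw new AnalysisException(s"Function $name accepts only one argument")
    }
    Cast(args.head, dataType)
  }
  // Reuses Cast's ExpressionInfo for DESCRIBE output, which is the
  // limitation discussed above.
  (name, (expressionInfo[Cast](name), builder))
}
```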

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jul 28, 2016
## What changes were proposed in this pull request?
Spark 1.x supports using Hive type names as function names for casts, e.g.
```sql
SELECT int(1.0);
SELECT string(2.0);
```

The above queries work in Spark 1.x because Spark 1.x falls back to Hive for unimplemented functions, but they break in Spark 2.0 because that fallback was removed.

This patch implements function aliases using an analyzer rule for the following cast functions:
- boolean
- tinyint
- smallint
- int
- bigint
- float
- double
- decimal
- date
- timestamp
- binary
- string

## How was this patch tested?
Added end-to-end tests in SQLCompatibilityFunctionSuite.

Author: petermaxlee <petermaxlee@gmail.com>

Closes apache#14364 from petermaxlee/SPARK-16730-2.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jul 28, 2016
(cherry picked from commit 11d427c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Contributor

thanks, merging to master and 2.0!

@petermaxlee
Contributor Author

Great. Thanks for merging.
