
[SPARK-16730][SQL] Implement function aliases for type casts #14364

Closed
wants to merge 4 commits

Conversation

petermaxlee
Contributor

What changes were proposed in this pull request?

Spark 1.x supports using Hive type names as function names for casts, e.g.

```sql
SELECT int(1.0);
SELECT string(2.0);
```

The above queries work in Spark 1.x because Spark 1.x falls back to Hive for unimplemented functions, but they break in Spark 2.0 because that fallback was removed.

This patch implements function aliases using an analyzer rule for the following cast functions:

  • boolean
  • tinyint
  • smallint
  • int
  • bigint
  • float
  • double
  • decimal
  • date
  • timestamp
  • binary
  • string

How was this patch tested?

Added end-to-end tests in SQLCompatibilityFunctionSuite.
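For context, a minimal sketch of the kind of end-to-end check such a suite could contain is shown below (the test name, helper usage, and expected values here are assumptions, not the verbatim suite contents):

```scala
// Hypothetical sketch of an end-to-end test, assuming the suite extends
// Spark's QueryTest with a shared SparkSession so that sql() and
// checkAnswer() are available.
test("SPARK-16730 cast alias functions") {
  checkAnswer(sql("SELECT boolean(1)"), Row(true))
  checkAnswer(sql("SELECT int(1.0)"), Row(1))
  checkAnswer(sql("SELECT bigint(1.0)"), Row(1L))
  checkAnswer(sql("SELECT string(2.0)"), Row("2.0"))
}
```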

@petermaxlee
Contributor Author

@cloud-fan this is an alternative implementation using FunctionRegistry.

Let me know if you prefer this one over #14362.

expression[BitwiseXor]("^"),

// Cast aliases (SPARK-16730)
castAlias("boolean", BooleanType),
Contributor


In Hive, if a user creates a UDF called boolean, will Hive throw an exception or will it override the type-casting one?

Contributor Author


boolean is just a normal function in Hive (the same as, for example, acos), so it behaves the way any normal function does.

@SparkQA

SparkQA commented Jul 26, 2016

Test build #62872 has finished for PR 14364 at commit 8b087b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 27, 2016

Test build #62902 has finished for PR 14364 at commit b8fbcab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

castAlias("tinyint", ByteType),
castAlias("smallint", ShortType),
castAlias("int", IntegerType),
castAlias("bigint", LongType),
Contributor


Using LongType.simpleString instead of bigint looks better. Same for the others.

Contributor Author


I think that's actually worse, because it makes it less clear what the function name is when reading this source file. Also, if for some reason we change LongType.simpleString in the future, these functions will subtly break.

Contributor


ok agree
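For reference, the disagreement above is only about how the registered name is spelled; both forms resolve to the same string, since LongType.simpleString is "bigint". A tiny standalone illustration (assuming spark-catalyst is on the classpath; the object name is made up):

```scala
import org.apache.spark.sql.types.LongType

object SimpleStringDemo {
  def main(args: Array[String]): Unit = {
    // The reviewer's suggestion would derive the alias name from the type:
    println(LongType.simpleString) // prints "bigint"
    // The patch keeps the literal "bigint" so the registered name is visible
    // at the call site and cannot drift if simpleString ever changes.
  }
}
```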

@cloud-fan
Contributor

mostly LGTM, thanks for working on it!

@SparkQA

SparkQA commented Jul 27, 2016

Test build #62909 has finished for PR 14364 at commit 3b78da3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

/**
* Creates a function lookup registry for cast aliases (SPARK-16730).
Contributor


NIT: should "...function lookup registry..." be "...function registry lookup entry..." or something similar?

@hvanhovell
Contributor

A few minor comments. LGTM otherwise.

@petermaxlee
Contributor Author

I've updated the pull request based on @hvanhovell's comment.

@SparkQA

SparkQA commented Jul 27, 2016

Test build #62928 has finished for PR 14364 at commit 5bbe995.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
Cast(args.head, dataType)
}
(name, (expressionInfo[Cast](name), builder))
Contributor


So for whatever cast alias we describe, we will always show Cast's description, right? Is it the same in Hive?

Contributor Author


Yes, this is a limitation. It's not what Hive does, because Hive does not actually have a single cast expression; it has one cast expression per target type. I think it's a pretty small detail, and fixing it would require a lot of work.
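Pulling together the fragments quoted in this review, the castAlias helper in FunctionRegistry presumably looks roughly like the following (a reconstruction, not the verbatim patch; in particular the arity check and its error message are assumptions):

```scala
/** Creates a function registry lookup entry for cast aliases (SPARK-16730). */
private def castAlias(
    name: String,
    dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = {
  // Build a one-argument function that simply wraps its input in a Cast
  // to the target type.
  val builder = (args: Seq[Expression]) => {
    if (args.size != 1) {
      // Assumed wording; the exact check in the patch may differ.
      throw new AnalysisException(s"Function $name accepts only one argument")
    }
    Cast(args.head, dataType)
  }
  // Reuses Cast's ExpressionInfo for DESCRIBE output, which is the
  // limitation discussed above.
  (name, (expressionInfo[Cast](name), builder))
}
```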

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jul 28, 2016
## What changes were proposed in this pull request?
Spark 1.x supports using Hive type names as function names for casts, e.g.
```sql
SELECT int(1.0);
SELECT string(2.0);
```

The above queries work in Spark 1.x because Spark 1.x falls back to Hive for unimplemented functions, but they break in Spark 2.0 because that fallback was removed.

This patch implements function aliases using an analyzer rule for the following cast functions:
- boolean
- tinyint
- smallint
- int
- bigint
- float
- double
- decimal
- date
- timestamp
- binary
- string

## How was this patch tested?
Added end-to-end tests in SQLCompatibilityFunctionSuite.

Author: petermaxlee <petermaxlee@gmail.com>

Closes apache#14364 from petermaxlee/SPARK-16730-2.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jul 28, 2016
(cherry picked from commit 11d427c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Contributor

thanks, merging to master and 2.0!

@petermaxlee
Contributor Author

Great. Thanks for merging.
