[SPARK-38176][SQL] ANSI mode: allow implicitly casting String to other simple types #35478

Closed

@gengliangwang (Member) commented Feb 10, 2022

What changes were proposed in this pull request?

Compared to the default behavior, the current ANSI type coercion rules don't allow the following cases:

  • Comparing String with other simple types, e.g. str_col > date'2022-01-01'
  • Arithmetic operations containing String and other simple types
  • Union/Intersect/Except containing String and other simple types
  • SQL functions that expect non-string types but receive string input
  • Other SQL operators

This PR removes the limitation. After the change, the String type can be implicitly cast to Long/Double/Date/Timestamp/Boolean/Binary.

Note that Byte/Short/Int are not on the precedence list of String: str_col > 1 will become cast(str_col as long) > 1L. This avoids string parsing errors when the string is out of the range of Byte/Short/Int in comparison/arithmetic/union operations.
The same design applies to Float/Decimal (especially Decimal): for SQL operators containing Float/Decimal and String, the type coercion system will convert both to Double.
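
To make the rule above concrete, here is a minimal sketch of how the common type could be picked when one operand is a String. It is a simplified model only: the `StringCoercionSketch` object, the `SimpleType` hierarchy, and `commonTypeWithString` are hypothetical names for illustration and are not the classes or methods touched by this PR.

```scala
// Hypothetical, simplified model of the rule described above.
// These types are stand-ins, not Spark's org.apache.spark.sql.types classes.
object StringCoercionSketch {
  sealed trait SimpleType
  case object ByteT extends SimpleType
  case object ShortT extends SimpleType
  case object IntT extends SimpleType
  case object LongT extends SimpleType
  case object FloatT extends SimpleType
  case object DoubleT extends SimpleType
  final case class DecimalT(precision: Int, scale: Int) extends SimpleType
  case object DateT extends SimpleType
  case object TimestampT extends SimpleType
  case object BooleanT extends SimpleType
  case object BinaryT extends SimpleType
  case object StringT extends SimpleType

  // Common type for an operator with one String operand and one operand
  // of type `other`.
  def commonTypeWithString(other: SimpleType): SimpleType = other match {
    // Byte/Short/Int are not on String's precedence list: widen to Long so
    // that out-of-range strings do not fail to parse.
    case ByteT | ShortT | IntT | LongT => LongT
    // Float/Decimal are treated the same way: both sides become Double.
    case FloatT | DoubleT | DecimalT(_, _) => DoubleT
    // The remaining simple types are cast targets as-is.
    case DateT | TimestampT | BooleanT | BinaryT => other
    case StringT => StringT
  }
}
```

For example, `StringCoercionSketch.commonTypeWithString(StringCoercionSketch.IntT)` yields `LongT`, which mirrors str_col > 1 becoming cast(str_col as long) > 1L.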

Why are the changes needed?

The purpose of the current limitation is to prevent potential String parsing errors under ANSI mode. However, after researching real-world Spark SQL queries, I found that many users are actually using String as Date/Timestamp/Numeric/etc. in their queries. For example, the intent of the query where str_col > date'2022-01-01' is quite obvious, but users have to rewrite it as where cast(str_col as date) > date'2022-01-01' under ANSI mode.
To make the migration to ANSI mode easier, I suggest removing this limitation. Let's treat it as an extension in our SQL dialect.

Does this PR introduce any user-facing change?

Yes, this allows implicitly casting String to other simple types under ANSI mode.

How was this patch tested?

Unit tests

@gengliangwang (Member Author)

| Double | Double |
| Date | Date -> Timestamp |
| Timestamp | Timestamp |
| String | String, Long -> Double, Date -> Timestamp, Boolean, Interval, Binary |
Contributor

I find this line a bit hard to understand. Shall we add some explanation below?

Some(DoubleType)

// If the target type is any Decimal type, convert the String type literal as Double type.
case (StringType, DecimalType) if isInputFoldable =>
// If the target type is any Decimal type, convert the String type literal as the default
Contributor

nit: not literal anymore

Contributor

please update other similar comments in this file.

@gengliangwang (Member Author)

Merging to master to unblock https://issues.apache.org/jira/browse/SPARK-38154
