
[SPARK-17013][SQL] handle corner case for negative integral literal #14599

Closed
wants to merge 1 commit

Conversation

@cloud-fan (Contributor)

What changes were proposed in this pull request?

Spark 2.0 parses negative numeric literals as the unary minus of positive literals. This introduces problems for edge cases such as -9223372036854775808 (Long.MinValue), which gets parsed as decimal instead of bigint because the positive part, 9223372036854775808, overflows the bigint range.

This PR fixes it by making a negative integral value a direct literal in the parser.
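A minimal, self-contained Scala sketch of the edge case (independent of Spark's parser): the positive digits alone overflow Long, so negating them after the fact cannot yield Long.MinValue, while parsing the sign together with the digits succeeds.

import scala.util.Try

object NegativeLiteralEdgeCase extends App {
  // Parsing the sign together with the digits succeeds:
  println("-9223372036854775808".toLong)     // -9223372036854775808 (Long.MinValue)

  // Parsing the digits first and negating afterwards cannot work,
  // because 9223372036854775808 > Long.MaxValue:
  println(Try("9223372036854775808".toLong)) // Failure(java.lang.NumberFormatException)
}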

How was this patch tested?

number-format.sql

@cloud-fan (Contributor, Author)

cc @rxin @petermaxlee @hvanhovell

@petermaxlee (Contributor)

@cloud-fan I have a small patch that is a little more comprehensive (it makes the behavior consistent across all the data types).

https://github.com/apache/spark/compare/master...petermaxlee:SPARK-17013?expand=1

I have not submitted it yet because it depends on #14598.

@cloud-fan (Contributor, Author)

@petermaxlee, your fix looks good, but should we make them consistent? I checked Postgres and MySQL; they don't have type suffixes, e.g. Y, L, etc. I think we followed Hive for this feature, and Hive throws an exception for select -9223372036854775808L from xxx. It looks to me like the type suffix is more of a cast operator and has higher precedence than -.

@petermaxlee (Contributor)

That's a good question. I would expect this not to be a "precedence" thing but simply a numeric literal, since there are no operators involved here. For example, in Scala (or Java), you can write -9223372036854775808L, which is a valid Long value (Long.MinValue).
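A quick illustration of the Scala behavior mentioned above: the compiler accepts the literal only with the sign attached.

// Scala accepts the negated literal as a single Long value...
val minLong: Long = -9223372036854775808L
assert(minLong == Long.MinValue)

// ...but the positive digits alone are rejected at compile time:
// val tooBig: Long = 9223372036854775808L   // error: integer number too large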

I suspect this is actually a bug in Hive?

@cloud-fan (Contributor, Author)

cc @yhuai for the hive part.

@SparkQA commented Aug 11, 2016

Test build #63601 has finished for PR 14599 at commit 65f6d6c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -627,6 +627,7 @@ quotedIdentifier
 number
     : DECIMAL_VALUE #decimalLiteral
     | SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral
+    | MINUS INTEGER_VALUE #negativeIntegerLiteral
     | INTEGER_VALUE #integerLiteral
@hvanhovell (Contributor) commented on the diff, Aug 11, 2016


Why not change this case into MINUS? INTEGER_VALUE #integerLiteral? That also works, and it would save quite a bit of code in the AstBuilder.

@hvanhovell (Contributor) commented Aug 11, 2016

The L, S & Y suffixes come from Hive.

The fix proposed by @petermaxlee has a potential problem when we try to parse something like a-1. This will be tokenized into an IDENTIFIER and an INTEGER_VALUE instead of an IDENTIFIER, a MINUS, and an INTEGER_VALUE; the first tokenization results in a ParseException. This is caused by the greedy behavior of the lexer: the INTEGER_VALUE rule is a better fit than the combination of the MINUS and INTEGER_VALUE rules.

This PR does not have that problem because it uses a parser rule, and we can set precedence there by ordering the rules (the sequence in which they are defined): here, binary minus takes precedence over unary minus. We could add this rule for the other data types, but that is IMO merely a matter of aesthetics.
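A minimal Scala sketch of the conversion this parser rule enables (illustrative only, not Spark's actual AstBuilder code; the helper name is hypothetical): because the sign stays inside the literal, the value is range-checked as a whole, and only genuinely out-of-range values fall back to decimal.

import scala.util.Try

// Hypothetical helper mirroring the "try bigint first, fall back to
// decimal" behavior discussed above.
def toNumericLiteral(raw: String): Any =
  Try(raw.toLong)               // "-9223372036854775808" parses as Long.MinValue
    .getOrElse(BigDecimal(raw)) // out of Long range, e.g. "-9223372036854775809" -> decimal

// toNumericLiteral("-9223372036854775808") => -9223372036854775808 (bigint)
// toNumericLiteral("-9223372036854775809") => BigDecimal("-9223372036854775809")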

@petermaxlee (Contributor)

Thanks. How about this one: #14608? It takes the approach here but applies it to all types for consistency.

@cloud-fan (Contributor, Author)

Closing in favor of #14608.

cloud-fan closed this on Aug 12, 2016