
[SPARK-17013][SQL] handle corner case for negative integral literal #14599

Closed
wants to merge 1 commit

Conversation

@cloud-fan (Contributor)

What changes were proposed in this pull request?

Spark 2.0 parses negative numeric literals as the unary minus of positive literals. This introduces problems for edge cases such as -9223372036854775808 (Long.MinValue), which gets parsed as decimal instead of bigint because the positive part, 9223372036854775808, overflows the bigint range.

This PR fixes it by making a negative integral value a direct literal in the parser.
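A minimal, self-contained Scala sketch of the edge case (independent of Spark's parser): the positive digits alone overflow Long, so negating them after the fact cannot yield Long.MinValue, while parsing the sign together with the digits succeeds.

import scala.util.Try

object NegativeLiteralEdgeCase extends App {
  // Parsing the sign together with the digits succeeds:
  println("-9223372036854775808".toLong)     // -9223372036854775808 (Long.MinValue)

  // Parsing the digits first and negating afterwards cannot work,
  // because 9223372036854775808 > Long.MaxValue:
  println(Try("9223372036854775808".toLong)) // Failure(java.lang.NumberFormatException)
}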

How was this patch tested?

number-format.sql

@cloud-fan (Contributor, Author)

cc @rxin @petermaxlee @hvanhovell

@petermaxlee (Contributor)

@cloud-fan I have a small patch that is a little more comprehensive (it makes the behavior consistent across all the data types).

https://github.com/apache/spark/compare/master...petermaxlee:SPARK-17013?expand=1

I have not submitted it yet because it depends on #14598.

@cloud-fan (Contributor, Author)

@petermaxlee, your fix looks good, but should we make them consistent? I checked Postgres and MySQL; they don't have type suffixes, e.g. Y, L, etc. I think we followed Hive for this feature, and Hive throws an exception for select -9223372036854775808L from xxx. It looks to me like the type suffix is more of a cast operator and has higher precedence than -.

@petermaxlee (Contributor)

That's a good question. I would expect this not to be a "precedence" thing but simply a numeric literal, since there are no operators involved here. For example, in Scala (or Java), you can write -9223372036854775808L, which is a valid Long value (Long.MinValue).
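A quick illustration of the Scala behavior mentioned above: the compiler accepts the literal only with the sign attached.

// Scala accepts the negated literal as a single Long value...
val minLong: Long = -9223372036854775808L
assert(minLong == Long.MinValue)

// ...but the positive digits alone are rejected at compile time:
// val tooBig: Long = 9223372036854775808L   // error: integer number too large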

I suspect this is actually a bug in Hive?

@cloud-fan (Contributor, Author)

cc @yhuai for the hive part.

@SparkQA commented Aug 11, 2016

Test build #63601 has finished for PR 14599 at commit 65f6d6c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -627,6 +627,7 @@ quotedIdentifier
 number
     : DECIMAL_VALUE #decimalLiteral
     | SCIENTIFIC_DECIMAL_VALUE #scientificDecimalLiteral
+    | MINUS INTEGER_VALUE #negativeIntegerLiteral
     | INTEGER_VALUE #integerLiteral
@hvanhovell (Contributor) commented on the diff, Aug 11, 2016


Why not change this case into MINUS? INTEGER_VALUE #integerLiteral? That also works, and it would save quite a bit of code in the AstBuilder.

@hvanhovell (Contributor) commented Aug 11, 2016

The L, S & Y suffixes come from Hive.

The fix proposed by @petermaxlee has a potential problem when we try to parse something like a-1. This will be tokenized into an IDENTIFIER and an INTEGER_VALUE instead of an IDENTIFIER, a MINUS, and an INTEGER_VALUE; the first tokenization results in a ParseException. This is caused by the greedy behavior of the lexer: the INTEGER_VALUE rule is a better fit than the combination of the MINUS and INTEGER_VALUE rules.

This PR does not have that problem because it uses a parser rule, and we can set precedence there by ordering the rules (the sequence in which they are defined): here, binary minus takes precedence over unary minus. We could add this rule for the other data types, but that is IMO merely a matter of aesthetics.
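A minimal Scala sketch of the conversion this parser rule enables (illustrative only, not Spark's actual AstBuilder code; the helper name is hypothetical): because the sign stays inside the literal, the value is range-checked as a whole, and only genuinely out-of-range values fall back to decimal.

import scala.util.Try

// Hypothetical helper mirroring the "try bigint first, fall back to
// decimal" behavior discussed above.
def toNumericLiteral(raw: String): Any =
  Try(raw.toLong)               // "-9223372036854775808" parses as Long.MinValue
    .getOrElse(BigDecimal(raw)) // out of Long range, e.g. "-9223372036854775809" -> decimal

// toNumericLiteral("-9223372036854775808") => -9223372036854775808 (bigint)
// toNumericLiteral("-9223372036854775809") => BigDecimal("-9223372036854775809")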

@petermaxlee (Contributor)

Thanks. How about this one: #14608? It takes the approach here but applies it to all types for consistency.

@cloud-fan (Contributor, Author)

Closing in favor of #14608.

cloud-fan closed this on Aug 12, 2016