New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values #26449
Conversation
cc @MaxGekk @maropu @cloud-fan, thanks in advance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, could you regenerate benchmark results for JDK 11 - IntervalBenchmark-jdk11-results.txt
@@ -1,25 +1,29 @@ | |||
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.1 | |||
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz | |||
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.14.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So old jdk? ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated and regen-ed with both new jdk8 and jdk11, thanks.
Test build #113501 has finished for PR 26449 at commit
|
Test build #113515 has finished for PR 26449 at commit
|
Test build #113516 has finished for PR 26449 at commit
|
Can you file a new jira for this for better traceability? |
Done, thanks for your suggestion @maropu Here is the ticket: |
@@ -41,3 +41,12 @@ select max(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 se | |||
|
|||
-- min | |||
select min(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 seconds') t(v); | |||
|
|||
-- SPARK-29605: cast string to intervals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be SPARK-29822
because this is newly added test coverage for this PR SPARK-29822
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I just rm it from here, thanks
Hi, @yaooqinn .
|
@dongjoon-hyun, thanks for review. The pr description is updated |
Test build #113553 has finished for PR 26449 at commit
|
val PREFIX, | ||
BEGIN_VALUE, | ||
PARSE_SIGN, | ||
TRIM_VALUE, | ||
PARSE_UNIT_VALUE, | ||
FRACTIONAL_PART, | ||
BEGIN_UNIT_NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In BEGIN_UNIT_NAME
we also trim the spaces. Can we have a consistent way to do it? Now there are 2 ways:
- have an intermedia state to trim the spaces
- trim the spaces at the beginning of a state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add a TRIM_UNIT
state to unify these 2 ways
val PREFIX, | ||
BEGIN_VALUE, | ||
PARSE_SIGN, | ||
TRIM_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make the name clearer? e.g. TRIM_BEFORE_PARSE_UNIT_VALUE
@@ -425,11 +425,15 @@ object IntervalUtils { | |||
} | |||
|
|||
private object ParseState extends Enumeration { | |||
type ParseState = Value | |||
|
|||
val PREFIX, | |||
BEGIN_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can rename it to TRIM_BEFORE_PARSE_VALUE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we may change PARSE_UNIT_VALUE
to PARSE_VALUE
too
How about renaming them all as,
val PREFIX,
BODY,
SIGN,
TRIM_BEFORE_VALUE,
VALUE,
VALUE_FRACTIONAL_PART,
TRIM_BEFORE_UNIT,
UNIT_BEGIN,
UNIT_SUFFIX,
UNIT_END = Value
Seems loud and clear enough for each state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does BODY
mean? others LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interval (PREFIX, BODY)
BODY (SIGN, VALUE , UNIT)+
Seems not persuadable enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or NEXT_VALUE_UNIT
? NEXT_VALUE_UNIT_PAIR
BEGIN_UNIT_NAME, | ||
UNIT_NAME_SUFFIX, | ||
END_UNIT_NAME = Value | ||
NEXT_VALUE_UNIT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about TRIM_BEFORE_SIGN
to be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool. good naming
Test build #113571 has finished for PR 26449 at commit
|
Test build #113568 has finished for PR 26449 at commit
|
Test build #113575 has finished for PR 26449 at commit
|
Test build #113569 has finished for PR 26449 at commit
|
@@ -65,3 +65,12 @@ select make_interval(1, 2, 3, 4); | |||
select make_interval(1, 2, 3, 4, 5); | |||
select make_interval(1, 2, 3, 4, 5, 6); | |||
select make_interval(1, 2, 3, 4, 5, 6, 7.008009); | |||
|
|||
-- cast string to intervals | |||
select cast(v as interval) from values ('1 second') t(v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: select cast('1 second' as interval)
Test build #113578 has finished for PR 26449 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
With the latest string to literal optimization #26256, some interval strings can not be cast when there are some spaces between signs and unit values. After state
PARSE_SIGN
, it directly goes toPARSE_UNIT_VALUE
when takes a space character as the end. So when there are some white spaces come before the real unit value, it fails to parse, we should add a new state likeTRIM_VALUE
to trim all these spaces.How to re-produce, which aim the revisions since #26256 is merged
Why are the changes needed?
bug fix
Does this PR introduce any user-facing change?
no
How was this patch tested?