[SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values #26449

Closed
wants to merge 14 commits into from


yaooqinn
Member

@yaooqinn yaooqinn commented Nov 9, 2019

What changes were proposed in this pull request?

Since the string-to-interval casting optimization in #26256, some interval strings cannot be cast when there are spaces between the sign and the unit value. After the PARSE_SIGN state, the parser goes directly to PARSE_UNIT_VALUE, which treats a space character as the end of the value. So when white space precedes the actual unit value, parsing fails. We should add a new state, such as TRIM_VALUE, to skip these spaces.

How to reproduce (applies to revisions after #26256 was merged):

select cast(v as interval) from values ('+     1 second') t(v);
select cast(v as interval) from values ('-     1 second') t(v);
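
To illustrate the idea, here is a minimal, hypothetical sketch of such a state machine in Scala. It is not the actual `IntervalUtils` code: the names `SignedValueSketch` and `parseSignedValue` and the reduced three-state enumeration are assumptions made for illustration, and the sketch only handles an optional sign followed by an integer value.

```scala
// Hypothetical, simplified sketch; not the real IntervalUtils parser.
object SignedValueSketch {
  private object State extends Enumeration {
    val PARSE_SIGN, TRIM_VALUE, PARSE_UNIT_VALUE = Value
  }
  import State._

  /** Parses inputs such as "+     1" or "-42"; returns None on malformed input. */
  def parseSignedValue(input: String): Option[Long] = {
    var state = PARSE_SIGN
    var negative = false
    var value = 0L
    var sawDigit = false
    var ok = true
    var i = 0
    while (ok && i < input.length) {
      val c = input.charAt(i)
      state match {
        case PARSE_SIGN =>
          if (c == '-') {
            negative = true
            i += 1
          } else if (c == '+') {
            i += 1
          }
          // With no explicit sign, the same character is re-read in the next state.
          state = TRIM_VALUE
        case TRIM_VALUE =>
          // Skip white space between the sign and the first digit.
          if (Character.isWhitespace(c)) i += 1 else state = PARSE_UNIT_VALUE
        case PARSE_UNIT_VALUE =>
          if (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0')
            sawDigit = true
            i += 1
          } else {
            ok = false // anything other than a digit is rejected in this sketch
          }
      }
    }
    if (ok && sawDigit) Some(if (negative) -value else value) else None
  }
}
```

Without the `TRIM_VALUE` state, the first space after the sign would reach `PARSE_UNIT_VALUE` and be rejected, which mirrors the failure described above.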

Why are the changes needed?

bug fix

Does this PR introduce any user-facing change?

no

How was this patch tested?

  1. unit tests (UT)
  2. new benchmark test

@yaooqinn
Member Author

yaooqinn commented Nov 9, 2019

cc @MaxGekk @maropu @cloud-fan, thanks in advance.

Member

@MaxGekk MaxGekk left a comment

LGTM. Could you regenerate the benchmark results for JDK 11 (IntervalBenchmark-jdk11-results.txt)?

@@ -1,25 +1,29 @@
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.1
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.14.6
Member

So old jdk? ;-)

Member Author

Updated and regenerated with both the newer JDK 8 and JDK 11, thanks.

@SparkQA

SparkQA commented Nov 9, 2019

Test build #113501 has finished for PR 26449 at commit 8f0dba0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 10, 2019

Test build #113515 has finished for PR 26449 at commit 65e2a1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 10, 2019

Test build #113516 has finished for PR 26449 at commit 8851f35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Nov 10, 2019

Can you file a new jira for this for better traceability?

@yaooqinn
Member Author

> Can you file a new jira for this for better traceability?

Done, thanks for your suggestion @maropu

Here is the ticket:
https://issues.apache.org/jira/browse/SPARK-29822

@yaooqinn yaooqinn changed the title [SPARK-29605][SQL][FOLLOWUP] Fix cast error when there are spaces between signs and values [SPARK-29822][SQL] Fix cast error when there are spaces between signs and values Nov 10, 2019
@@ -41,3 +41,12 @@ select max(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 se

-- min
select min(cast(v as interval)) from VALUES ('1 seconds'), ('4 seconds'), ('3 seconds') t(v);

-- SPARK-29605: cast string to intervals
Member

This should be SPARK-29822, because this test coverage is newly added in this PR.

Member Author

I guess I'll just remove it from here, thanks.

@dongjoon-hyun
Member

Hi, @yaooqinn.
I know this PR is valid on the master branch. Could you add a pointer in the PR description to what caused this? In 3.0-preview, it worked as follows.

spark-sql> select cast(v as interval) from values ('+     1 second') t(v);
interval 1 seconds
Time taken: 2.011 seconds, Fetched 1 row(s)

spark-sql> select cast(v as interval) from values ('-     1 second') t(v);
interval -1 seconds
Time taken: 0.033 seconds, Fetched 1 row(s)

spark-sql> select version();
3.0.0 007c873ae34f58651481ccba30e8e2ba38a692c4
Time taken: 1.181 seconds, Fetched 1 row(s)

@yaooqinn yaooqinn changed the title [SPARK-29822][SQL] Fix cast error when there are spaces between signs and values [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values Nov 11, 2019
@yaooqinn
Member Author

@dongjoon-hyun, thanks for the review. The PR description is updated.

@SparkQA

SparkQA commented Nov 11, 2019

Test build #113553 has finished for PR 26449 at commit 1e8272c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val PREFIX,
BEGIN_VALUE,
PARSE_SIGN,
TRIM_VALUE,
PARSE_UNIT_VALUE,
FRACTIONAL_PART,
BEGIN_UNIT_NAME,
Contributor

In BEGIN_UNIT_NAME we also trim the spaces. Can we have a consistent way to do it? Now there are 2 ways:

  1. have an intermediate state to trim the spaces
  2. trim the spaces at the beginning of a state.
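
The two styles can be contrasted with a small, hypothetical Scala sketch. The object name `TrimStylesSketch`, both method names, and the `DONE` state are assumptions for illustration only; each method returns the index at which the unit name starts.

```scala
// Hypothetical sketch contrasting the two trimming styles; not Spark code.
object TrimStylesSketch {
  private object S extends Enumeration {
    val TRIM_BEFORE_UNIT, UNIT_BEGIN, DONE = Value
  }
  import S._

  // Style 1: a dedicated intermediate state consumes the spaces, so
  // UNIT_BEGIN only ever sees the first character of the unit name.
  def unitStartWithTrimState(s: String, from: Int): Int = {
    var i = from
    var state = TRIM_BEFORE_UNIT
    while (i < s.length && state != DONE) {
      state match {
        case TRIM_BEFORE_UNIT =>
          if (s.charAt(i) == ' ') i += 1 else state = UNIT_BEGIN
        case _ =>
          state = DONE // the unit name starts at index i
      }
    }
    i
  }

  // Style 2: the state that begins the unit name skips leading spaces itself.
  def unitStartWithInlineTrim(s: String, from: Int): Int = {
    var i = from
    var state = UNIT_BEGIN
    while (i < s.length && state != DONE) {
      if (s.charAt(i) == ' ') i += 1 else state = DONE
    }
    i
  }
}
```

Both produce the same index; the difference is only whether the trimming is visible as its own state, which is the consistency question raised here.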

Member Author

we can add a TRIM_UNIT state to unify these 2 ways

val PREFIX,
BEGIN_VALUE,
PARSE_SIGN,
TRIM_VALUE,
Contributor

@cloud-fan cloud-fan Nov 11, 2019

can we make the name clearer? e.g. TRIM_BEFORE_PARSE_UNIT_VALUE

@@ -425,11 +425,15 @@ object IntervalUtils {
}

private object ParseState extends Enumeration {
type ParseState = Value

val PREFIX,
BEGIN_VALUE,
Contributor

we can rename it to TRIM_BEFORE_PARSE_VALUE

Member Author

@yaooqinn yaooqinn Nov 11, 2019

so we may change PARSE_UNIT_VALUE to PARSE_VALUE too

How about renaming them all as follows:

    val PREFIX,
        BODY,
        SIGN,
        TRIM_BEFORE_VALUE,
        VALUE,
        VALUE_FRACTIONAL_PART,
        TRIM_BEFORE_UNIT,
        UNIT_BEGIN,
        UNIT_SUFFIX,
        UNIT_END = Value

Seems loud and clear enough for each state

Contributor

what does BODY mean? others LGTM

Member Author

interval (PREFIX, BODY)
BODY (SIGN, VALUE, UNIT)+

That doesn't seem persuasive enough, though.

Member Author

@yaooqinn yaooqinn Nov 11, 2019

Or NEXT_VALUE_UNIT? Or NEXT_VALUE_UNIT_PAIR?

BEGIN_UNIT_NAME,
UNIT_NAME_SUFFIX,
END_UNIT_NAME = Value
NEXT_VALUE_UNIT,
Contributor

how about TRIM_BEFORE_SIGN to be consistent?

Member Author

cool. good naming
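
Putting the suggestions in this thread together, the renamed states might look like the sketch below. This is an assumption assembled from the comments above (the earlier proposal with `BODY` replaced by `TRIM_BEFORE_SIGN`), not a copy of the merged code, and the enclosing object name `ParseStateSketch` is hypothetical.

```scala
// Hypothetical summary of the naming discussion; not copied from the merged patch.
object ParseStateSketch extends Enumeration {
  type ParseState = Value

  val PREFIX,
      TRIM_BEFORE_SIGN,
      SIGN,
      TRIM_BEFORE_VALUE,
      VALUE,
      VALUE_FRACTIONAL_PART,
      TRIM_BEFORE_UNIT,
      UNIT_BEGIN,
      UNIT_SUFFIX,
      UNIT_END = Value
}
```

With this naming, every place where spaces are skipped is an explicit `TRIM_BEFORE_*` state, which matches the request for a consistent way to trim.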

@SparkQA

SparkQA commented Nov 11, 2019

Test build #113571 has finished for PR 26449 at commit 83b9ae3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 11, 2019

Test build #113568 has finished for PR 26449 at commit cb8aa6b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 11, 2019

Test build #113575 has finished for PR 26449 at commit 34bf719.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 11, 2019

Test build #113569 has finished for PR 26449 at commit 0a301a1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -65,3 +65,12 @@ select make_interval(1, 2, 3, 4);
select make_interval(1, 2, 3, 4, 5);
select make_interval(1, 2, 3, 4, 5, 6);
select make_interval(1, 2, 3, 4, 5, 6, 7.008009);

-- cast string to intervals
select cast(v as interval) from values ('1 second') t(v);
Contributor

nit: select cast('1 second' as interval)

@SparkQA

SparkQA commented Nov 11, 2019

Test build #113578 has finished for PR 26449 at commit cb83761.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in d06a9cc Nov 11, 2019