[SPARK-29980][SQL] Whitespaces handling for Cast and BinaryOperation between StringType and NumericTypes #26618

Closed
wants to merge 1 commit into from

yaooqinn
Member

What changes were proposed in this pull request?

Consider the following query and how different SQL engines evaluate it.

select cast('1 ' as int) as v1, '1 ' = 1 as v2

spark 1.6

NULL true

spark 2.1

NULL true

spark 2.2

NULL NULL

spark 2.3

NULL NULL

spark 2.4

NULL NULL

hive

NULL true

PostgreSQL

postgres=# select cast('1 ' as int) as v1, '1 ' = 1 as v2;
 v1 | v2
----+----
  1 | t
(1 row)

presto

presto> select cast('1 ' as int) as v1, '1 ' = 1 as v2;
Query 20191120_060530_00002_f5kcs failed: line 1:38: '=' cannot be applied to varchar(2), integer
select cast('1 ' as int) as v1, '1 ' = 1 as v2

presto> select cast('1 ' as int) as v1, '1 ' = '1 ' as v2;
Query 20191120_060545_00003_f5kcs failed: Cannot cast '1 ' to INT

Spark's behavior has not been stable across versions because type coercion changed as of 2.2.
Personally, I think what PostgreSQL and Presto do here is more reasonable and consistent.

Currently, this pull request follows PostgreSQL. This behavior change might need further discussion.
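
To make the two semantics concrete, here is a minimal sketch in Python. It is not Spark's or PostgreSQL's implementation. The helper names (`strict_cast_int`, `trimming_cast_int`, `equals_int`) are hypothetical, `None` stands in for SQL NULL, and 32-bit overflow is not modeled. It only illustrates why a cast that rejects surrounding whitespace yields `NULL NULL` for the query above, while a cast that trims first yields `1` and `true`.

```python
import re
from typing import Callable, Optional

# Optional sign followed by ASCII digits only; no surrounding whitespace allowed.
_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def strict_cast_int(s: str) -> Optional[int]:
    """Hypothetical strict cast: any surrounding whitespace makes the result NULL (None)."""
    return int(s) if _INT_PATTERN.fullmatch(s) else None


def trimming_cast_int(s: str) -> Optional[int]:
    """Hypothetical lenient cast: trim whitespace first, then apply the strict rule.

    Assumption: Python's str.strip() is used as a stand-in for whatever set of
    whitespace characters an engine actually trims.
    """
    return strict_cast_int(s.strip())


def equals_int(s: str, n: int, cast: Callable[[str], Optional[int]]) -> Optional[bool]:
    """Model `string = int` by casting the string side; NULL propagates through `=`."""
    value = cast(s)
    return None if value is None else value == n


if __name__ == "__main__":
    # Strict semantics: both columns come out as NULL.
    print(strict_cast_int("1 "), equals_int("1 ", 1, strict_cast_int))
    # Trimming semantics: the cast succeeds and the comparison is true.
    print(trimming_cast_int("1 "), equals_int("1 ", 1, trimming_cast_int))
```

Under these assumptions, the cast and the comparison agree with each other as long as both go through the same cast function. That agreement is the consistency argued for above.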

Why are the changes needed?

To handle dirty data automatically and to stay consistent with older Spark versions.

Does this PR introduce any user-facing change?

ad ut

@yaooqinn
Member Author

cc @cloud-fan @maropu @dongjoon-hyun @gatorsmile @HyukjinKwon, thanks for reviewing in advance.

@wangyum
Member

wangyum commented Nov 21, 2019

We had a discussion before: #24872

@yaooqinn
Member Author

We had a discussion before: #24872

oops, will close this.

@yaooqinn yaooqinn closed this Nov 21, 2019
@SparkQA

SparkQA commented Nov 21, 2019

Test build #114191 has finished for PR 26618 at commit dc95213.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
