Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45561][SQL] Add proper conversions for TINYINT in MySQLDialect #43390

Closed
wants to merge 8 commits into from

Conversation

michaelzhan-db
Copy link
Contributor

@michaelzhan-db michaelzhan-db commented Oct 16, 2023

What changes were proposed in this pull request?

Change MySql Dialect to convert catalyst TINYINT into MySQL TINYINT rather than BYTE and INTEGER. BYTE does not exist in MySQL. The same applies to MsSqlServerDialect.

Why are the changes needed?

Since BYTE type does not exist in MySQL, any casts that could be pushed down involving BYTE type would fail.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT pass.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Oct 16, 2023
@michaelzhan-db michaelzhan-db changed the title [SPARK-45561] Add proper conversions for SMALLINT and TINYINT in MySQLDialect [SPARK-45561] Add proper conversions for TINYINT in MySQLDialect and MsSqlServerDialect Oct 17, 2023
@beliefer
Copy link
Contributor

#43232 mapped TINYINT to ShortType for MsSqlServerDialect.
It looks very interesting. The behavior of these two PRs is somewhat opposite.

@beliefer
Copy link
Contributor

@michaelzhan-db Could you add tests? Please refer #43232

@beliefer
Copy link
Contributor

cc @gengliangwang @cloud-fan

@@ -128,6 +128,7 @@ private object MsSqlServerDialect extends JdbcDialect {
case BinaryType => Some(JdbcType("VARBINARY(MAX)", java.sql.Types.VARBINARY))
case ShortType if !SQLConf.get.legacyMsSqlServerNumericMappingEnabled =>
Some(JdbcType("SMALLINT", java.sql.Types.SMALLINT))
case ByteType => Option(JdbcType("TINYINT", java.sql.Types.TINYINT))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Ms SQL TINYINT sufficient for negative values of spark's ByteType?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@michaelzhan-db The range of MsSqlServer's TINYINT from 0 to 255. But Spark ByteType could represent the negative value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I have been too hasty in creating the PR. I will update the PR and JIRA to just MySQL. Thanks for catching that!

@michaelzhan-db michaelzhan-db changed the title [SPARK-45561] Add proper conversions for TINYINT in MySQLDialect and MsSqlServerDialect [SPARK-45561] Add proper conversions for TINYINT in MySQLDialect Oct 17, 2023
@@ -54,10 +55,10 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
conn.prepareStatement("INSERT INTO tbl VALUES (42,'fred')").executeUpdate()
conn.prepareStatement("INSERT INTO tbl VALUES (17,'dave')").executeUpdate()

conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), "
conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), tiny TINYINT, "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add tiny TINYINT at last?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to maintain the rough order of increasing size of numeric type but I am ok with adding it at the end instead.

Copy link
Contributor

@beliefer beliefer Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reduce the change if we add it at the end. It is easy to review.

@@ -43,7 +43,8 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
override val usesIpc = false
override val jdbcPort: Int = 3306
override def getJdbcUrl(ip: String, port: Int): String =
s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
s"jdbc:mysql://$ip:$port/" +
s"mysql?user=root&password=rootpass&allowPublicKeyRetrieval=true&useSSL=false"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add allowPublicKeyRetrieval and useSSL?

Comment on lines 46 to 47
s"jdbc:mysql://$ip:$port/" +
s"mysql?user=root&password=rootpass"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore the two lines.

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@MaxGekk MaxGekk changed the title [SPARK-45561] Add proper conversions for TINYINT in MySQLDialect [SPARK-45561][SQL] Add proper conversions for TINYINT in MySQLDialect Oct 23, 2023
@MaxGekk
Copy link
Member

MaxGekk commented Oct 24, 2023

@michaelzhan-db Does Spark 3.4 suffer from the same issue?

@MaxGekk
Copy link
Member

MaxGekk commented Oct 24, 2023

+1, LGTM. Merging to master/3.5.
Thank you, @michaelzhan-db and @yaooqinn @beliefer @cloud-fan for review.

@MaxGekk MaxGekk closed this in 5092c89 Oct 24, 2023
MaxGekk pushed a commit that referenced this pull request Oct 24, 2023
### What changes were proposed in this pull request?

Change MySql Dialect to convert catalyst TINYINT into MySQL TINYINT rather than BYTE and INTEGER. BYTE does not exist in MySQL. The same applies to MsSqlServerDialect.

### Why are the changes needed?

Since BYTE type does not exist in MySQL, any casts that could be pushed down involving BYTE type would fail.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT pass.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43390 from michaelzhan-db/SPARK-45561.

Lead-authored-by: Michael Zhang <m.zhang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 5092c89)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants