
[SPARK-45139][SQL] Add DatabricksDialect to handle SQL type conversion #42896

Status: Closed

Conversation

sadikovi (Contributor)

What changes were proposed in this pull request?

This PR adds a DatabricksDialect to Spark so that users can query Databricks clusters and Databricks SQL warehouses with precise SQL type conversion and identifier quoting, instead of handling both manually in their code.
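For context, a Spark JDBC dialect of this kind typically overrides three hooks on `JdbcDialect`: `canHandle` to claim matching JDBC URLs, `getCatalystType` to correct the JDBC-to-Catalyst type mapping, and `quoteIdentifier` for backtick quoting. The sketch below is illustrative only: the object name `DatabricksDialectSketch` and the specific type mappings shown are assumptions for demonstration, not necessarily what the merged patch contains.

```scala
import java.sql.Types

import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types._

// Hedged sketch of a Databricks dialect; the merged DatabricksDialect
// may map a different set of types.
private case object DatabricksDialectSketch extends JdbcDialect {

  // Claim JDBC URLs produced by the Databricks JDBC driver.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:databricks")

  // Override mappings where the generic dialect's defaults are imprecise.
  // Returning None falls back to Spark's default JDBC type mapping.
  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      md: MetadataBuilder): Option[DataType] = sqlType match {
    case Types.TINYINT  => Some(ByteType)   // assumed mapping
    case Types.SMALLINT => Some(ShortType)  // assumed mapping
    case Types.REAL     => Some(FloatType)  // assumed mapping
    case _              => None
  }

  // Databricks SQL quotes identifiers with backticks, not double quotes;
  // embedded backticks are escaped by doubling them.
  override def quoteIdentifier(colName: String): String =
    s"`${colName.replace("`", "``")}`"
}
```

Once registered via `JdbcDialects.registerDialect`, a dialect like this is consulted automatically for any JDBC read or write whose URL it claims through `canHandle`.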

Why are the changes needed?

The PR fixes type conversion and makes it easier to query Databricks clusters.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I added unit tests in JDBCSuite to check conversion.

Was this patch authored or co-authored using generative AI tooling?

No.

@sadikovi sadikovi changed the title [SPARK-45139] Add DatabricksDialect to handle SQL type conversion [SPARK-45139][SQL] Add DatabricksDialect to handle SQL type conversion Sep 13, 2023
@github-actions github-actions bot added the SQL label Sep 13, 2023
sadikovi (Contributor, Author)

cc @HyukjinKwon @cloud-fan

yaooqinn (Member) left a comment:

:)

dongjoon-hyun (Member) left a comment:


+1, LGTM. Although I didn't test the PR, I believe it works perfectly. :)

dongjoon-hyun (Member)

Merged to master for Apache Spark 4.0.
The Python failures in CIs are irrelevant to this DatabricksDialect.

sadikovi (Contributor, Author)

Thank you @dongjoon-hyun!

I forgot to mention in the PR description that I did test the dialect manually on a few tables and queries. For example, this query works:

scala> val df = spark.read.format("jdbc")
  .option("url", "jdbc:databricks://<host>.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/23982398239232323/0912-023325-neasdf78;AuthMech=3;UID=token;PWD=<token>")
  .option("dbtable", "ivan_test")
  .option("driver", "com.databricks.client.jdbc.Driver") // for some reason, the driver was not loading automatically on my machine
  .load()
df: org.apache.spark.sql.DataFrame = [a: int, b: string ... 3 more fields]

scala> df.show
+---+---+----+---+---+                                                          
|  a|  b|   c|  d|  e|
+---+---+----+---+---+
|  1|  2|true|3.4|5.6|
+---+---+----+---+---+

while it fails without the dialect with some type conversion errors.

dongjoon-hyun (Member)

Thank you!
