
[SPARK-45139][SQL] Add DatabricksDialect to handle SQL type conversion #42896

Status: Closed

Conversation

sadikovi (Contributor)

What changes were proposed in this pull request?

This PR adds a DatabricksDialect to Spark so that users can query Databricks clusters and Databricks SQL warehouses with precise SQL type conversion and identifier quoting, instead of handling both manually in their code.
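For context, a Spark JDBC dialect of this kind typically overrides three hooks on `JdbcDialect`: `canHandle` to claim matching JDBC URLs, `getCatalystType` to correct the JDBC-to-Catalyst type mapping, and `quoteIdentifier` for backtick quoting. The sketch below is illustrative only: the object name `DatabricksDialectSketch` and the specific type mappings shown are assumptions for demonstration, not necessarily what the merged patch contains.

```scala
import java.sql.Types

import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types._

// Hedged sketch of a Databricks dialect; the merged DatabricksDialect
// may map a different set of types.
private case object DatabricksDialectSketch extends JdbcDialect {

  // Claim JDBC URLs produced by the Databricks JDBC driver.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:databricks")

  // Override mappings where the generic dialect's defaults are imprecise.
  // Returning None falls back to Spark's default JDBC type mapping.
  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      md: MetadataBuilder): Option[DataType] = sqlType match {
    case Types.TINYINT  => Some(ByteType)   // assumed mapping
    case Types.SMALLINT => Some(ShortType)  // assumed mapping
    case Types.REAL     => Some(FloatType)  // assumed mapping
    case _              => None
  }

  // Databricks SQL quotes identifiers with backticks, not double quotes;
  // embedded backticks are escaped by doubling them.
  override def quoteIdentifier(colName: String): String =
    s"`${colName.replace("`", "``")}`"
}
```

Once registered via `JdbcDialects.registerDialect`, a dialect like this is consulted automatically for any JDBC read or write whose URL it claims through `canHandle`.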

Why are the changes needed?

The PR fixes type conversion and makes it easier to query Databricks clusters.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I added unit tests in JDBCSuite to check conversion.

Was this patch authored or co-authored using generative AI tooling?

No.

@sadikovi sadikovi changed the title [SPARK-45139] Add DatabricksDialect to handle SQL type conversion [SPARK-45139][SQL] Add DatabricksDialect to handle SQL type conversion Sep 13, 2023
@github-actions github-actions bot added the SQL label Sep 13, 2023
sadikovi (Contributor, Author)

cc @HyukjinKwon @cloud-fan

yaooqinn (Member) left a comment:

:)

dongjoon-hyun (Member) left a comment:


+1, LGTM. Although I didn't test the PR, I believe it works perfectly. :)

dongjoon-hyun (Member)

Merged to master for Apache Spark 4.0.
The Python failures in CIs are irrelevant to this DatabricksDialect.

sadikovi (Contributor, Author)

Thank you @dongjoon-hyun!

I forgot to mention in the PR description that I did test the dialect manually on a few tables and queries. For example, this query works:

scala> val df = spark.read.format("jdbc")
  .option("url", "jdbc:databricks://<host>.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/23982398239232323/0912-023325-neasdf78;AuthMech=3;UID=token;PWD=<token>")
  .option("dbtable", "ivan_test")
  .option("driver", "com.databricks.client.jdbc.Driver") // for some reason, the driver was not loading automatically on my machine
  .load()
df: org.apache.spark.sql.DataFrame = [a: int, b: string ... 3 more fields]

scala> df.show
+---+---+----+---+---+                                                          
|  a|  b|   c|  d|  e|
+---+---+----+---+---+
|  1|  2|true|3.4|5.6|
+---+---+----+---+---+

while it fails without the dialect with some type conversion errors.

dongjoon-hyun (Member)

Thank you!
