Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22016][SQL] Add HiveDialect for JDBC connection to Hive #19238

Closed
wants to merge 5 commits into from

Conversation

danielfx90
Copy link

What changes were proposed in this pull request?

Added a HiveDialect for JDBC connection to Hive.
It overrides two methods:

  • canHandle
  • quoteIdentifier

How was this patch tested?

It passes the added tests and it was used with a real Hive instance with real data.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@gatorsmile
Copy link
Member

Why not directly connecting to Hive metastore?

@danielfx90
Copy link
Author

@gatorsmile if Hive lies on the same infrastructure as the application, then the metastore should definitely solve the issue, but a connection over JDBC is needed when data comes from an external source which only exposes such a connection through its Hive server. We encountered this and ended up adding the HiveDialect to solve it.

assert(df3.collect() === Array(Row(21519, 1234)))
}
assert(df3.collect() === Array(Row(21519, 1234))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ')' is wrong. Line 1105~1107 from the original have indentation issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must have changed when formatting the code using the IDE. Scalastyle checks passed though, but let me rollback that anyway.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun done! Thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, actually, I meant the original Spark code is also wrong in terms of indentation. You can fix the indentation of original line 1105~1107 here. :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun You are right! I misread the parenthesis. I think now is correct. Thank you for the observation :)

@gatorsmile
Copy link
Member

I can see the value, but it does not perform well in most cases if we using JDBC connection. Instead of adding the extra dialect to upstream, could you please add Hive as a separate data source? Thanks!

https://spark.apache.org/third-party-projects.html

@danielfx90
Copy link
Author

Seems logical. Then, unless someone disagrees, feel free to close this PR and we will create a new spark package with this feature in a new repository.

Thanks!

@paulstaab
Copy link
Contributor

This merge request would partly solve https://issues.apache.org/jira/browse/SPARK-21063

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants