[SPARK-20557][SQL] Support for db column type TIMESTAMP WITH TIME ZONE #17832
Conversation
@@ -223,6 +223,9 @@ object JdbcUtils extends Logging {
      case java.sql.Types.STRUCT => StringType
      case java.sql.Types.TIME => TimestampType
      case java.sql.Types.TIMESTAMP => TimestampType
      case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => TimestampType
      case -101 => TimestampType
What is -101?
-101 is the code Oracle databases use for TIMESTAMP WITH TIME ZONE. There is unfortunately no equivalent constant in the java.sql.Types enum (https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html).
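As a hedged illustration of this point (not Spark's actual code — the object and method names below are invented for the sketch), the relationship between the standard `java.sql.Types` constants and Oracle's vendor-specific -101 can be shown like this:

```scala
// Sketch: Oracle's JDBC driver reports TIMESTAMP WITH TIME ZONE with the
// vendor-specific type code -101 (oracle.jdbc.OracleTypes.TIMESTAMPTZ),
// which has no constant in java.sql.Types. The JDBC 4.2 standard added
// TIMESTAMP_WITH_TIMEZONE (2014), but Oracle's driver does not report it.
object OracleTypeCodes {
  // Value of oracle.jdbc.OracleTypes.TIMESTAMPTZ in the Oracle driver
  val TIMESTAMPTZ: Int = -101

  def describe(sqlType: Int): String = sqlType match {
    case java.sql.Types.TIMESTAMP               => "TIMESTAMP"
    case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => "TIMESTAMP WITH TIME ZONE (JDBC 4.2)"
    case TIMESTAMPTZ                            => "TIMESTAMP WITH TIME ZONE (Oracle -101)"
    case other                                  => s"unknown ($other)"
  }
}

println(OracleTypeCodes.describe(-101)) // TIMESTAMP WITH TIME ZONE (Oracle -101)
```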
Could you add a test case to OracleIntegrationSuite?
I gave it a try. The fix works in my local Docker test environment. Below is how we run the Docker integration tests:

build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive-thriftserver -Phive -DskipTests install

Before running the Docker tests, you need to set the Docker environment variables. The following does the magic:

eval $(docker-machine env default)
build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 compile test
Feel free to ping me if you hit any issue.
Hi @JannikArndt, could you add a comment describing -101 here?
Write fix: 5 min ✅
Write test: 5 min ✅
Run test: 3 h. ✅
Why was this -101 thing put here instead of in the Oracle dialect?
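To sketch the reviewer's point: Spark lets each vendor dialect override the type mapping via `JdbcDialect.getCatalystType`, so a vendor-only code like -101 could live in the Oracle dialect instead of the shared `JdbcUtils` mapping. The following is a simplified, self-contained model of that pattern — the trait and object names are invented for illustration and are not Spark's actual classes:

```scala
// Simplified model of Spark's dialect pattern (illustrative, not Spark code):
// vendor-specific type codes are handled in a per-database dialect rather
// than in the generic JDBC mapping shared by all sources.
sealed trait CatalystType
case object TimestampType extends CatalystType

trait JdbcDialectSketch {
  // Mirrors the shape of a dialect hook: return Some(..) to override the
  // generic mapping, None to fall through to it.
  def getCatalystType(sqlType: Int, typeName: String): Option[CatalystType]
}

object OracleDialectSketch extends JdbcDialectSketch {
  private val TIMESTAMPTZ = -101 // oracle.jdbc.OracleTypes.TIMESTAMPTZ

  override def getCatalystType(sqlType: Int, typeName: String): Option[CatalystType] =
    sqlType match {
      case TIMESTAMPTZ => Some(TimestampType) // Oracle-only code stays in the Oracle dialect
      case _           => None                // defer to the generic mapping
    }
}

println(OracleDialectSketch.getCatalystType(-101, "TIMESTAMP WITH TIME ZONE")) // Some(TimestampType)
```

This keeps `JdbcUtils` free of magic numbers that only one driver emits, which is the design direction the follow-up PR eventually took.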
Can you please rebase your PR? It's including a lot of changes from other people.
@felixcheung Done
ok to test
+1. LGTM. I also tested with the docker integration test, with and without this patch.
Yes, I also did it. Thanks! @dongjoon-hyun @JannikArndt LGTM
Test build #76498 has finished for PR 17832 at commit
Thanks! Merging to master.
@JannikArndt Could you check whether we can do the same thing for
NVM, I will do it in my PR. Thanks! |
…ialect

## What changes were proposed in this pull request?

In the previous PRs, apache#17832 and apache#17835, we convert `TIMESTAMP WITH TIME ZONE` and `TIME WITH TIME ZONE` to `TIMESTAMP` for all the JDBC sources. However, this conversion could be risky since it does not respect our SQL configuration `spark.sql.session.timeZone`. In addition, each vendor might have different semantics for these two types. For example, Postgres simply returns `TIMESTAMP` types for `TIMESTAMP WITH TIME ZONE`. Such support should be added case by case.

This PR reverts the general support of `TIMESTAMP WITH TIME ZONE` and `TIME WITH TIME ZONE` for JDBC sources, except the Oracle dialect. When supporting Oracle's `TIMESTAMP WITH TIME ZONE`, we only support it when the JVM default timezone is the same as the user-specified configuration `spark.sql.session.timeZone` (whose default is the JVM default timezone). We still treat `TIMESTAMP WITH TIME ZONE` as `TIMESTAMP` when fetching the values via the Oracle JDBC connector, whose client converts timestamp values with time zone to timestamp values using the local JVM default timezone (a test case is added to `OracleIntegrationSuite.scala` in this PR to show the behavior). Thus, to avoid any future behavior change, we will not support it if the JVM default timezone is different from `spark.sql.session.timeZone`.

There is no regression because the previous two PRs were only merged into the unreleased master branch.

## How was this patch tested?

Added test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#19939 from gatorsmile/timezoneUpdate.
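The guard described in that follow-up (only support Oracle's `TIMESTAMP WITH TIME ZONE` when the JVM default time zone matches `spark.sql.session.timeZone`) could be sketched as follows. The function name is an assumption for illustration, not Spark's actual code:

```scala
import java.util.TimeZone

// Hedged sketch of the time-zone guard: the Oracle JDBC client converts
// TIMESTAMP WITH TIME ZONE values using the JVM default zone, so the
// mapping is only safe when that zone matches the session time zone.
def supportsTimestampWithTimeZone(sessionTimeZone: String): Boolean =
  TimeZone.getDefault.getID == sessionTimeZone

// spark.sql.session.timeZone defaults to the JVM default, so the guard
// passes out of the box and only rejects an explicitly mismatched setting.
val session = TimeZone.getDefault.getID
println(supportsTimestampWithTimeZone(session)) // true when they match
```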
What changes were proposed in this pull request?
SparkSQL can now read from a database table with column type TIMESTAMP WITH TIME ZONE.
How was this patch tested?
Tested against Oracle database.
@JoshRosen, you seem to know the class, would you look at this? Thanks!