
[SPARK-20557][SQL] Support for db column type TIMESTAMP WITH TIME ZONE #17832

Closed
wants to merge 1 commit into from

Conversation

JannikArndt
Contributor

What changes were proposed in this pull request?

Spark SQL can now read from a database table with a column of type TIMESTAMP WITH TIME ZONE.

How was this patch tested?

Tested against an Oracle database.

@JoshRosen, you seem to know this class; would you take a look? Thanks!
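For context, a minimal user-facing sketch of what this enables (connection URL, credentials, and table name are hypothetical, and the Oracle JDBC driver must be on the classpath): reading an Oracle table whose column is declared TIMESTAMP WITH TIME ZONE so that it is inferred as Catalyst's TimestampType rather than being rejected as an unsupported type.

```scala
import org.apache.spark.sql.SparkSession

object ReadOracleTimestampTz {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tz-read").getOrCreate()

    // Hypothetical connection URL, credentials, and table name.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/XEPDB1")
      .option("user", "scott")
      .option("password", "tiger")
      .option("dbtable", "orders_with_tz") // has a TIMESTAMP WITH TIME ZONE column
      .load()

    // With this patch, that column shows up as TimestampType in the schema.
    df.printSchema()
    spark.stop()
  }
}
```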

@@ -223,6 +223,9 @@ object JdbcUtils extends Logging {
case java.sql.Types.STRUCT => StringType
case java.sql.Types.TIME => TimestampType
case java.sql.Types.TIMESTAMP => TimestampType
case java.sql.Types.TIMESTAMP_WITH_TIMEZONE
=> TimestampType
case -101 => TimestampType
Member

What is -101?

Contributor Author

-101 is the code Oracle databases use for TIMESTAMP WITH TIME ZONE. Unfortunately there is no equivalent constant in java.sql.Types (https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html).
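For illustration, a small sketch of where that code surfaces (connection URL, credentials, and table name are hypothetical): the Oracle JDBC driver reports the vendor-specific value through ResultSetMetaData.getColumnType, since java.sql.Types has no matching constant (the driver also appears to expose it as oracle.jdbc.OracleTypes.TIMESTAMPTZ).

```scala
import java.sql.DriverManager

// Minimal sketch; URL, credentials, and table name are hypothetical.
object InspectOracleTypeCode {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/XEPDB1", "scott", "tiger")
    try {
      val rs = conn.createStatement().executeQuery(
        "SELECT created_at FROM orders_with_tz") // TIMESTAMP WITH TIME ZONE column
      val md = rs.getMetaData
      // Expected to print -101, the code the JdbcUtils mapping above matches on.
      println(s"JDBC type code: ${md.getColumnType(1)}")
    } finally {
      conn.close()
    }
  }
}
```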

Member

Could you add a test case to OracleIntegrationSuite?
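Not the exact OracleIntegrationSuite scaffolding (the real suite provisions Oracle via docker and supplies the JDBC URL), but a standalone sketch of the check such a test would make; the URL, table, and column are hypothetical:

```scala
import java.util.Properties

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.TimestampType

object TimestampTzSchemaCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tz-schema-check").getOrCreate()

    // Hypothetical URL; in OracleIntegrationSuite this comes from the docker container.
    val jdbcUrl = "jdbc:oracle:thin:system/oracle@//localhost:1521/xe"

    // Table assumed to contain a column `t TIMESTAMP WITH TIME ZONE`.
    val df = spark.read.jdbc(jdbcUrl, "ts_with_timezone", new Properties)

    // Core assertion: the column is read as Catalyst TimestampType.
    // Oracle returns unquoted identifiers in upper case, hence "T".
    assert(df.schema("T").dataType == TimestampType)
    spark.stop()
  }
}
```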

Member

I gave it a try. The fix works in my local docker test environment. Below is how we run the docker integration tests.

build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0  -Phive-thriftserver -Phive -DskipTests  install

Before running the docker tests, you need to set the docker env variables. The following does the magic:
eval $(docker-machine env default)

build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11  compile test

Feel free to ping me if you hit any issues.

Member

Hi, @JannikArndt
Could you add a comment describing -101 here?
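For example, the requested comment could be as simple as something along these lines (wording is just a suggestion):

```scala
case -101 => TimestampType // Oracle-specific code for TIMESTAMP WITH TIME ZONE
```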

Contributor Author

Write fix: 5 min ✅
Write test: 5 min ✅
Run test: 3 h. ✅


Why was this -101 thing put here instead of in the Oracle dialect?
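A sketch of the alternative being suggested here, roughly what moving the mapping into an Oracle-specific dialect could look like (illustrative only, not the actual Spark OracleDialect, which lives in org.apache.spark.sql.jdbc):

```scala
import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.{DataType, MetadataBuilder, TimestampType}

// Illustrative sketch only, not Spark's built-in OracleDialect.
object OracleDialectSketch extends JdbcDialect {
  // Oracle's vendor-specific JDBC code for TIMESTAMP WITH TIME ZONE.
  private val TIMESTAMP_TZ = -101

  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    if (sqlType == TIMESTAMP_TZ) Some(TimestampType) else None
  }
}
```

A custom dialect along these lines could be registered with JdbcDialects.registerDialect; in Spark itself the built-in OracleDialect plays this role, which is roughly the direction the follow-up change quoted further down eventually took.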

@felixcheung
Member

Can you please rebase your PR? It's including a lot of changes from other people.

@JannikArndt
Contributor Author

@felixcheung Done

@gatorsmile
Member

ok to test

@dongjoon-hyun
Member

+1. LGTM. I also tested with the docker integration tests, with and without this patch.

@gatorsmile
Member

Yes. I also did it. Thanks! @dongjoon-hyun @JannikArndt

LGTM

@SparkQA

SparkQA commented May 5, 2017

Test build #76498 has finished for PR 17832 at commit 113fd53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@gatorsmile
Member

@JannikArndt Could you check whether we can do the same thing for TIME WITH TIME ZONE?

@asfgit asfgit closed this in b31648c May 5, 2017
@gatorsmile
Member

NVM, I will do it in my PR. Thanks!

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 12, 2017
…ialect

## What changes were proposed in this pull request?
In the previous PRs, apache#17832 and apache#17835, we converted `TIMESTAMP WITH TIME ZONE` and `TIME WITH TIME ZONE` to `TIMESTAMP` for all the JDBC sources. However, this conversion could be risky since it does not respect our SQL configuration `spark.sql.session.timeZone`.

In addition, each vendor might have different semantics for these two types. For example, Postgres simply returns `TIMESTAMP` for `TIMESTAMP WITH TIME ZONE`. Such support should be added case by case. This PR reverts the general support of `TIMESTAMP WITH TIME ZONE` and `TIME WITH TIME ZONE` for JDBC sources, except for the Oracle dialect.

When supporting Oracle's `TIMESTAMP WITH TIME ZONE`, we only support it when the JVM default timezone is the same as the user-specified configuration `spark.sql.session.timeZone` (whose default is the JVM default timezone). Currently, we still treat `TIMESTAMP WITH TIME ZONE` as `TIMESTAMP` when fetching the values via the Oracle JDBC connector, whose client converts timestamp values with time zone to timestamp values using the local JVM default timezone (a test case is added to `OracleIntegrationSuite.scala` in this PR to show the behavior). Thus, to avoid any future behavior change, we do not support it if the JVM default timezone is different from `spark.sql.session.timeZone`.

No regression, because the previous two PRs were only merged to the unreleased master branch.

## How was this patch tested?
Added the test cases

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#19939 from gatorsmile/timezoneUpdate.
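As an aside, a minimal sketch (not the exact code from apache#19939) of the timezone-consistency condition the commit message above describes: the Oracle `TIMESTAMP WITH TIME ZONE` mapping is only kept when `spark.sql.session.timeZone` matches the JVM default timezone.

```scala
import java.util.TimeZone

import org.apache.spark.sql.SparkSession

object SessionTimeZoneCheck {
  // True when spark.sql.session.timeZone matches the JVM default timezone, the
  // precondition described in the commit message for keeping the Oracle mapping.
  def sessionTzMatchesJvm(spark: SparkSession): Boolean = {
    val sessionTz = spark.conf.get("spark.sql.session.timeZone", TimeZone.getDefault.getID)
    TimeZone.getTimeZone(sessionTz).getID == TimeZone.getDefault.getID
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tz-check").getOrCreate()
    println(s"session timezone consistent with JVM default: ${sessionTzMatchesJvm(spark)}")
    spark.stop()
  }
}
```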