Skip to content

[SPARK-2190][SQL] Specialized ColumnType for Timestamp#1440

Closed
liancheng wants to merge 5 commits intoapache:masterfrom
liancheng:timestamp-column-type
Closed

[SPARK-2190][SQL] Specialized ColumnType for Timestamp#1440
liancheng wants to merge 5 commits intoapache:masterfrom
liancheng:timestamp-column-type

Conversation

@liancheng
Copy link
Contributor

JIRA issue: SPARK-2190

Added specialized in-memory column type for Timestamp. Whitelisted all timestamp related Hive tests except timestamp_udf, which is timezone sensitive.

@SparkQA
Copy link

SparkQA commented Jul 16, 2014

QA tests have started for PR 1440. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16725/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 16, 2014

QA results for PR 1440:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16725/consoleFull

@liancheng
Copy link
Contributor Author

Hmm, just realized Timestamp.toString normalizes date and time according to current timezone and makes almost all timestamp related tests timezone sensitive. (Wouldn't notice this if I were in US...) Guess we have to blacklist them for now, and this will revert part of #1396.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, SimpleDateFormat is not thread-safe, so def should be used instead of val.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked TimestampWritable.java, it's indeed handled with a thread local variable. Thanks for pointing this out!

@liancheng
Copy link
Contributor Author

Confirmed that the following test cases are timezone sensitive and blacklisted them (by first remove all timestamp related golden answers, run them in my local timezone to generate new golden answers, then manually change my timezone settings and rerun these tests):

  • timestamp_1
  • timestamp_2
  • timestamp_3 *
  • timestamp_udf *

[*] Reverted from #1396.

@SparkQA
Copy link

SparkQA commented Jul 16, 2014

QA tests have started for PR 1440. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16728/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 16, 2014

QA tests have started for PR 1440. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16729/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 16, 2014

QA results for PR 1440:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16728/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 16, 2014

QA results for PR 1440:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16729/consoleFull

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Spark style would probably prefer != here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I read somewhere ne has better performance in certain cases...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see... I forgot that ne/eq do reference equality... in the case of null I would imagine there is no difference, but this is probably fine then.

@SparkQA
Copy link

SparkQA commented Jul 16, 2014

QA tests have started for PR 1440. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16749/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 17, 2014

QA results for PR 1440:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16749/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 21, 2014

QA tests have started for PR 1440. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16894/consoleFull

@liancheng
Copy link
Contributor Author

Rebased to the most recent master, generated all golden answer files after setting timezone of my local machine to America/Los_Angeles and hardcoded the timezone in HiveCompatibilitySuite to enable those timezone sensitive tests. Hope Jenkins is happy with this.

@SparkQA
Copy link

SparkQA commented Jul 21, 2014

QA tests have started for PR 1440. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16895/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 21, 2014

QA results for PR 1440:
- This patch PASSES unit tests.

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16894/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 21, 2014

QA results for PR 1440:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16895/consoleFull

@marmbrus
Copy link
Contributor

Thanks! I've merged this into master.

@asfgit asfgit closed this in cd273a2 Jul 21, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
JIRA issue: [SPARK-2190](https://issues.apache.org/jira/browse/SPARK-2190)

Added specialized in-memory column type for `Timestamp`. Whitelisted all timestamp related Hive tests except `timestamp_udf`, which is timezone sensitive.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#1440 from liancheng/timestamp-column-type and squashes the following commits:

e682175 [Cheng Lian] Enabled more timezone sensitive Hive tests.
53a358f [Cheng Lian] Fixed failed test suites
01b592d [Cheng Lian] Fixed SimpleDateFormat thread safety issue
2a59343 [Cheng Lian] Removed timezone sensitive Hive timestamp tests
45dd05d [Cheng Lian] Added Timestamp specific in-memory columnar representation
@liancheng liancheng deleted the timestamp-column-type branch September 5, 2014 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants