Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

TAJO-1925: Improve hive compatibility with TIMESTAMP partition column. #853

Open
wants to merge 38 commits into
base: master
Choose a base branch
from

Conversation

blrunner
Copy link
Contributor

@blrunner blrunner commented Nov 6, 2015

This patch contains following modifications:

  • When making partition directories, use partition name which are accepted in the format YYYY-MM-DD HH:MM:SS.MS.
  • When pruning partitions, convert UTC of EvalNode to customer timezone.
  • When building partition filter in catalog, convert UTC of algebra expression to customer timezone.
  • Add timestamp cast operation with the number of milliseconds.

And I found that the patch ran successfully with MySQLStore and HiveCatalogStore.

@blrunner
Copy link
Contributor Author

blrunner commented Nov 9, 2015

I fixed some bugs and added description about timezone with hive. For the reference, I tested the patch with tpc-h 1g dataset as following:

  • on Hive
hive> CREATE TABLE timeline(event string) PARTITIONED BY (happened timestamp);

hive> insert overwrite table timeline partition(happened=1446608033000) select n_name from nation where n_nationkey = 0;

hive> insert overwrite table timeline partition(happened=1446608037000) select n_name from nation where n_nationkey = 1;

hive> insert overwrite table timeline partition(happened=854419808037) select n_name from nation where n_nationkey = 2;

hive> insert overwrite table timeline partition(happened=1446712555983) select n_name from nation where n_nationkey = 3;

hive> select * from timeline;
OK
BRAZIL  1997-01-28 11:50:08.037
ALGERIA 2015-11-04 12:33:53
ARGENTINA   2015-11-04 12:33:57
CANADA  2015-11-05 17:35:55.983
  • on Tajo with HiveCatalogStore
default> SET TIME ZONE 'Asia/Seoul';

default> select * from timeline;
event,  happened
-------------------------------
BRAZIL,  1997-01-28 11:50:08.037
ALGERIA,  2015-11-04 12:33:53
ARGENTINA,  2015-11-04 12:33:57
CANADA,  2015-11-05 17:35:55.983
(4 rows, 0.092 sec, 0 B selected)

default> select * from timeline where happened = TIMESTAMP '2015-11-04 12:33:57';
event,  happened
-------------------------------
ARGENTINA,  2015-11-04 12:33:57
(1 rows, 0.126 sec, 0 B selected)

default> insert into timeline select n_name, TIMESTAMP '2015-11-06 14:45:48.066' from nation where n_nationkey = 10;
default> insert into timeline select n_name, TIMESTAMP '2015-11-06 14:45:48.75' from nation where n_nationkey = 11;

default> select * from timeline;
event,  happened
-------------------------------
BRAZIL,  1997-01-28 11:50:08.037
ALGERIA,  2015-11-04 12:33:53
ARGENTINA,  2015-11-04 12:33:57
CANADA,  2015-11-05 17:35:55.983
IRAN,  2015-11-06 14:45:48.066
IRAQ,  2015-11-06 14:45:48.75
(6 rows, 0.087 sec, 0 B selected)
  • on Tajo with MySQLStore
default> SET TIME ZONE 'Asia/Seoul';

default> CREATE EXTERNAL TABLE default.timeline (event TEXT) USING TEXT
PARTITION BY COLUMN(happened TIMESTAMP) LOCATION 'hdfs://localhost:9010/user/hive/warehouse/timeline';

default> select * from timeline where happened = TIMESTAMP '1997-01-28 11:50:08.037';
event,  happened
-------------------------------
BRAZIL,  1997-01-28 11:50:08.037
(1 rows, 0.074 sec, 0 B selected)

default> select * from timeline where happened = TIMESTAMP '2015-11-04 12:33:57';
event,  happened
-------------------------------
ARGENTINA,  2015-11-04 12:33:57
(1 rows, 0.033 sec, 0 B selected)

default> alter table timeline repair partition;

default> insert into timeline select n_name, TIMESTAMP '2015-11-09 16:47:45.012' from nation where n_nationkey = 24;

default> select * from timeline;
event,  happened
-------------------------------
BRAZIL,  1997-01-28 11:50:08.037
ALGERIA,  2015-11-04 12:33:53
ARGENTINA,  2015-11-04 12:33:57
CANADA,  2015-11-05 17:35:55.983
IRAN,  2015-11-06 14:45:48.066
IRAQ,  2015-11-06 14:45:48.75
UNITED STATES,  2015-11-09 16:47:45.012
(7 rows, 0.036 sec, 0 B selected)

default> select * from timeline where happened = TIMESTAMP '2015-11-09 16:47:45.012';
event,  happened
-------------------------------
UNITED STATES,  2015-11-09 16:47:45.012

@blrunner
Copy link
Contributor Author

Rebased. Basically, Tajo column partition focus on Hive partition compatibility. Currently, Hive just created timestamp partition directories using date format. I think that Tajo just need to follow current Hive version rule. Because even if we implement long value time for timestamp partition directory, Hive can't handle it.

…O-1925

Conflicts:
	tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java
@hyunsik
Copy link
Member

hyunsik commented Dec 4, 2017

Hi @blrunner,

It has been long time. IMO, This patch needs to be included in 0.12.0 release. If you are Ok, could your rebase it? Otherwise, I can do it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants