Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

TAJO-1345: Implement logical plan part and DDL executor for alter partition.#618

Closed
blrunner wants to merge 28 commits intoapache:masterfrom
blrunner:TAJO-1345
Closed

TAJO-1345: Implement logical plan part and DDL executor for alter partition.#618
blrunner wants to merge 28 commits intoapache:masterfrom
blrunner:TAJO-1345

Conversation

@blrunner
Copy link
Copy Markdown
Contributor

See the title. The main objective of this issue is to implement the logical planning part and the DDL executor part for alter table partition support.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. it's very trivial, but this line seems unnecessary.

@jihoonson
Copy link
Copy Markdown
Contributor

Hi @blrunner, thank you for the patch. I left trivial comments only.

In addition to them, I have one more comment.
Add and drop partition statements seems to be allowed for only file based stores such as HDFS or local file system, right? If so, the partition location should be verified.
Also, the existence of partition location should be checked.

@blrunner
Copy link
Copy Markdown
Contributor Author

blrunner commented Jul 5, 2015

Hi @jihoonson

Thanks for your detailed review. I updated the patch using your comments a few days ago.
And I tested the partition location rule for hive. When adding partition on managed table or external table in hive, if the partition's directory doesn't exist, hive make the directory by force. And when dropping partition on managed table, hive will remove the directory by force, But when dropping partition on external table, hive will not remove the directory. If users want to remove the directory with drop partition statement, they need to specify PURGE option. It seems good to me because tajo partition had been made in reference to hive partition.

@blrunner
Copy link
Copy Markdown
Contributor Author

blrunner commented Jul 5, 2015

Hi @jihoonson

I have updated the patch as following:

  • Added PURGE option.
  • When adding partition to a table, the partition's location will be made by force.
  • When dropping partition for a managed table, the location and meta data will be removed.
  • When dropping partition for an external table, the location will not be remove. Only meta data will be remove. If users want to remove the location, users need to specify PURGE option.
  • Added a document for alter table statement.

Could you review the patch again?

@jihoonson
Copy link
Copy Markdown
Contributor

@blrunner thanks for your nice work. I totally agree on that we follow Hive's partition policy.
However, I found the invalid partition key generation error as follows.

default> create table partitioned_nation (n_name text, n_comment text) partition by column (n_nationkey int8, n_regionkey int8) as select * from nation;

default> \d partitioned_nation

table name: default.partitioned_nation
table uri: hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation
store type: CSV
number of rows: 25
volume: 267 B
Options: 
    'text.delimiter'='|'

schema: 
n_name  TEXT
n_comment   TEXT

Partitions: 
type:COLUMN
columns::n_nationkey (INT8), n_regionkey (INT8)

default> \dfs -ls hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/n_nationkey=0
Found 5 items
drwxr-xr-x   - jihoonson supergroup          0 2015-07-06 16:29 hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/n_nationkey=0/n_regionkey= haggle. carefully final deposits detect slyly agai
drwxr-xr-x   - jihoonson supergroup          0 2015-07-06 16:29 hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/n_nationkey=0/n_regionkey= pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t
drwxr-xr-x   - jihoonson supergroup          0 2015-07-06 16:29 hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/n_nationkey=0/n_regionkey=rns. blithely bold courts among the closely regular packages use furiously bold platelets%3F
drwxr-xr-x   - jihoonson supergroup          0 2015-07-06 16:29 hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/n_nationkey=0/n_regionkey=s. ironic, unusual asymptotes wake blithely r
drwxr-xr-x   - jihoonson supergroup          0 2015-07-06 16:29 hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/n_nationkey=0/n_regionkey=ven packages wake quickly. regu

As you can see, the actual values of the partition key n_regionkey are those of n_comment column.

@jihoonson
Copy link
Copy Markdown
Contributor

This problem looks not to be related to this patch. I'll finish my review soon.

@jihoonson
Copy link
Copy Markdown
Contributor

@blrunner, I have some more comments as follows.

  • The error message should be improved when the given column is not a partition key. Here is an example.
default> alter table partitioned_nation add partition (n_comment=1, n_nationkey=2) location 'hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/col1=test/col2=2';
ERROR: n_comment
  • I wonder that what will happen when an add partition statement is executed with the already existing location as follows.
  • The add partition statement does not create the full path of partition as follows.
default> alter table partitioned_nation add partition (n_regionkey=1, n_nationkey=2) location 'hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/col1=test/col2=2';
OK
default> \dfs -ls hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/col1=test
default> 

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be wrong.

@blrunner
Copy link
Copy Markdown
Contributor Author

blrunner commented Jul 7, 2015

Thanks @hyunsik .
I've updated the document. :)

@blrunner
Copy link
Copy Markdown
Contributor Author

blrunner commented Jul 7, 2015

Thanks @jihoonson and @hyunsik.
I've update the patch using your comments. :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imports in this source code are unused. Please remove them. I know that this is not related to your changes.

@hyunsik
Copy link
Copy Markdown
Member

hyunsik commented Jul 14, 2015

In overall, the patch looks good to me. I leaved few comments.

@blrunner
Copy link
Copy Markdown
Contributor Author

@hyunsik

Thank you for your detailed review.
I've updated the patch using your comments.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message means that the table of tableName does not have the partitionKey column.
It would be better to print the partitionKey column is not the partition key of the tableName table.

@jihoonson
Copy link
Copy Markdown
Contributor

It seems that there is another problem. That is, given a partitioned table, alter table add partition is executed in two ways.
Here is an example.

tpch> create table partitioned_nation (n_name text, n_comment text) using text partition by column (n_regionkey int8, n_nationkey int8) as select n_name, n_comment, n_regionkey, n_nationkey from nation;
tpch> \dfs -ls /tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1
Found 5 items
drwxr-xr-x   - jihoon supergroup          0 2015-07-15 00:22 /tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=1
drwxr-xr-x   - jihoon supergroup          0 2015-07-15 00:22 /tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=17
drwxr-xr-x   - jihoon supergroup          0 2015-07-15 00:22 /tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=2
drwxr-xr-x   - jihoon supergroup          0 2015-07-15 00:22 /tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=24
drwxr-xr-x   - jihoon supergroup          0 2015-07-15 00:22 /tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=3

As you can see in the above, the partition of (n_regionkey=1,n_nationkey=2) exists after creating the partition table.
When I run the following alter table add partition statement, it is successfully executed. However, I think that this should be failed because there already exists a partition for the same partition key.

tpch> alter table partitioned_nation add partition (n_regionkey=1, n_nationkey=2) location 'hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/col1=test2/col2=3';
OK

In addition, when I run the following alter table add partition statement, it is failed as I expected above. But, the inconsistent execution of alter table add partition must be fixed.

tpch> alter table partitioned_nation add partition (n_regionkey=1, n_nationkey=2) location 'hdfs://localhost:7020/tajo/warehouse/default/partitioned_nation/col1=test1/col2=2';
ERROR: partition "n_regionkey=1/n_nationkey=2 already exist in "partitioned_nation"

@blrunner
Copy link
Copy Markdown
Contributor Author

Hi @jihoonson

Thank you for your detailed test. I also think that it looks like a problem. But currently, when creating partitions using DDL, tajo just make partitioned directories and doesn't add partitions to catalog. But when executing alter table add partition, this statement just check partitions from catalog. Thus it doesn't know partitions which are generated by DDL. This issue would be resolved automatically at TAJO-1346.

@jihoonson
Copy link
Copy Markdown
Contributor

Ok. But the last query is failed even though the location is different.

@blrunner
Copy link
Copy Markdown
Contributor Author

Catalog check the identity of partition using the name of partition instead of location. In above case, the name of partition is n_regionkey=1,n_nationkey=2. If you didn't execute alter table drop partition statement, the partition of (n_regionkey=1,n_nationkey=2) exists on catalog. Could you execute alter table add partition statement after executing alter table drop partition statement?

@blrunner
Copy link
Copy Markdown
Contributor Author

Sorry @jihoonson

I misunderstood the question and will add the exception for this case. :-)

@blrunner
Copy link
Copy Markdown
Contributor Author

@jihoonson

I added the exception for above case.
If there is a directory which is assumed to be a partitioned directory, tajo will throw exception as follows.

default> create table partitioned_nation (n_name text, n_comment text) using text partition by column (n_regionkey int8, n_nationkey int8) as select n_name, n_comment, n_regionkey, n_nationkey from nation;

default> \dfs -ls /tajo/warehouse/default/partitioned_nation/n_regionkey=1
Found 5 items
drwxr-xr-x   - blrunner supergroup          0 2015-07-15 16:21 /tajo/warehouse/default/partitioned_nation/n_regionkey=1/n_nationkey=1
drwxr-xr-x   - blrunner supergroup          0 2015-07-15 16:21 /tajo/warehouse/default/partitioned_nation/n_regionkey=1/n_nationkey=17
drwxr-xr-x   - blrunner supergroup          0 2015-07-15 16:21 /tajo/warehouse/default/partitioned_nation/n_regionkey=1/n_nationkey=2
drwxr-xr-x   - blrunner supergroup          0 2015-07-15 16:21 /tajo/warehouse/default/partitioned_nation/n_regionkey=1/n_nationkey=24
drwxr-xr-x   - blrunner supergroup          0 2015-07-15 16:21 /tajo/warehouse/default/partitioned_nation/n_regionkey=1/n_nationkey=3

default> alter table partitioned_nation add partition (n_regionkey=1, n_nationkey=2) location 'hdfs://localhost:9010/tajo/warehouse/default/partitioned_nation/col1=test2/col2=3';
ERROR: There is a directory which is assumed to be a partitioned directory : hdfs://localhost:9010/tajo/warehouse/default/partitioned_nation/n_regionkey=1/n_nationkey=2

default>     alter table partitioned_nation add partition (n_regionkey=1, n_nationkey=2) location 'hdfs://localhost:9010/tajo/warehouse/default/partitioned_nation/col1=test1/col2=2';
ERROR: There is a directory which is assumed to be a partitioned directory : hdfs://localhost:9010/tajo/warehouse/default/partitioned_nation/n_regionkey=1/n_nationkey=2

@jihoonson
Copy link
Copy Markdown
Contributor

@blrunner thanks for update.
However, there still remains a similar problem when dropping a partition. That is, newly added partitions with alter table statement are successfully dropped, but others are not.
Here is an example.

tpch> \d partitioned_nation

table name: tpch.partitioned_nation
table uri: hdfs://localhost:7020/tajo/warehouse/warehouse/tpch/partitioned_nation
store type: text
number of rows: 25
volume: 2.1 kB
Options: 
    'text.delimiter'='|'

schema: 
n_name  TEXT
n_comment   TEXT

Partitions: 
type:COLUMN
columns::n_regionkey (INT8), n_nationkey (INT8)

tpch> \dfs -ls hdfs://localhost:7020/tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1
Found 5 items
drwxr-xr-x   - jihoon supergroup          0 2015-07-17 10:20 hdfs://localhost:7020/tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=1
drwxr-xr-x   - jihoon supergroup          0 2015-07-17 10:20 hdfs://localhost:7020/tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=17
drwxr-xr-x   - jihoon supergroup          0 2015-07-17 10:20 hdfs://localhost:7020/tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=2
drwxr-xr-x   - jihoon supergroup          0 2015-07-17 10:20 hdfs://localhost:7020/tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=24
drwxr-xr-x   - jihoon supergroup          0 2015-07-17 10:20 hdfs://localhost:7020/tajo/warehouse/warehouse/tpch/partitioned_nation/n_regionkey=1/n_nationkey=3

tpch> alter table partitioned_nation drop partition (n_regionkey=1, n_nationkey=2);
ERROR: "n_regionkey=1/n_nationkey=2" is not the partition of "partitioned_nation".
tpch> alter table partitioned_nation drop partition (n_regionkey=10, n_nationkey=2);
OK

As you can see in this example, the partition of (n_regionkey=1, n_nationkey=2) exists, but dropping it is failed.

The fundamental problem seems that partition information is not maintained by catalog. I believe that this will be resolved in https://issues.apache.org/jira/browse/TAJO-1346, and thus the above problem will be resolved, too.
So, I'll give +1. Thanks for your long time work.

@blrunner
Copy link
Copy Markdown
Contributor Author

Hi @jihoonson

Thanks for your review of great depth.
As you commented, above case will be resolved automatically in TAJO-1346. I'll commit it to the master branch soon. :-)

@asfgit asfgit closed this in 3dba8c7 Jul 17, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants