Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

TAJO-1644: When inserting empty data into a partitioned table, existing data would be removed.#601

Closed
blrunner wants to merge 16 commits intoapache:masterfrom
blrunner:TAJO-1644
Closed

TAJO-1644: When inserting empty data into a partitioned table, existing data would be removed.#601
blrunner wants to merge 16 commits intoapache:masterfrom
blrunner:TAJO-1644

Conversation

@blrunner
Copy link
Copy Markdown
Contributor

When inserting empty data into a partitioned table, existing data would be removed. Tajo provides column value partition which is hive-style partition. In hive, when inserting empty data into a partition, there are two cases. If you use dynamic partitions, existing data never would be removed. But if you don't use dynamic partitions, existing data would be removed. When inserting a data to partition, tajo user don't specify each column and each column value. So, it is similar to dynamic partition of hive. It seems to update deletion logic of partitioned table.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if you rename verifyMaintainExistingData to verifyKeptExistingData.

@blrunner
Copy link
Copy Markdown
Contributor Author

Thanks @hyunsik

I've just updated the patch using your comments. :)

@hyunsik
Copy link
Copy Markdown
Member

hyunsik commented Jun 19, 2015

Hi @blrunner,

Thank you for your work. How about the insert overwrite behavior for not partitioned table? Does that the same semantic?

@blrunner
Copy link
Copy Markdown
Contributor Author

Hi @hyunsik

Thank you for your review. I also agree with you. But when inserting empty data into a non partitioned table in hive, existing data would be removed always. I'm a bit anxious about user's confusion between hive and tajo.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about TABLE_PARTITION_NO_RESULT_OVERWRITE_ENABLED for the config name? It seems to be more intuitive for me.

@blrunner
Copy link
Copy Markdown
Contributor Author

@hyunsik

Thank you for your detailed review.
I've just updated the patch using your comments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one more suggestion. It would be great if you use PARTITION instead of TABLE_PARTITION. It is still clear because we only use the term 'partition' to indicate table partition.

@hyunsik
Copy link
Copy Markdown
Member

hyunsik commented Jun 24, 2015

+1
Looks good to me. I leaved one trivial comment. If you agree, you can commit the patch after you reflect my comment.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants