[Feature][Spark]: Support drop partition #918

Closed
1 of 2 tasks
hellojinsilei opened this issue Dec 12, 2022 · 2 comments · Fixed by #940
Labels: module:mixed-spark (Spark module for Mixed Format), type:feature (Feature Requests)

@hellojinsilei
Contributor

hellojinsilei commented Dec 12, 2022

Description

Arctic Spark currently cannot drop partitions, but dropping partitions is a necessary feature.
It would be good to support SQL like this:
ALTER TABLE table_identifier DROP [ IF EXISTS ] partition_spec [PURGE]

Use case/motivation

No response

Describe the solution

No response

Subtasks

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

@hellojinsilei hellojinsilei added the type:feature Feature Requests label Dec 12, 2022
@hellojinsilei hellojinsilei added this to the Release 0.4.1 milestone Dec 13, 2022
@hellojinsilei hellojinsilei added the module:mixed-spark Spark module for Mixed Format label Dec 13, 2022
@hellojinsilei
Contributor Author

hellojinsilei commented Dec 14, 2022

What are the semantics of drop partition in Arctic?

In Iceberg, the only related operation is dropping a partition field. That is a metadata operation and does not change any existing table data.
In Hive, dropping a partition deletes the data files and the table structure information for that partition.

Should Hive-backed tables follow the Hive implementation while non-Hive tables follow the Iceberg approach? If so, how do we keep the table structure consistent between the two? Or is there a better way?
@zhoujinsong @baiyangtx cc

@baiyangtx
Contributor

baiyangtx commented Dec 14, 2022

> What are the semantics of arctic drop partition?
>
> In Iceberg, It only has the operation to drop the partition field, and dropping a partition field is a metadata operation and does not change any of the existing table data. In Hive, dropping partition will delete data files and the table structure information.
>
> Do we use hive tables to refer to the implementation of Hive, while non-hive tables refer to the implementation method of Iceberg? But how do we ensure that the table structure is consistent in this way? Or is there a better way? @zhoujinsong @baiyangtx cc

I think the drop partition operation in Arctic should be converted to delete from table where pt=xxx. It would simply remove the file references from the snapshot and remove the partition metadata in HMS. Since this is a DDL statement, Arctic should not write an EQ_DELETE_FILE to the change store.
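To make the proposed conversion concrete, here is a minimal Python sketch of how a partition spec could be turned into the equivalent delete statement. The function name `drop_partition_to_delete` and the table name `db.sample` are hypothetical and only for illustration; this is not Arctic's implementation, and it deliberately refuses values that would need escaping.

```python
def drop_partition_to_delete(table, partition_spec):
    """Build a DELETE statement equivalent to dropping one partition.

    Hypothetical helper for illustration only. `partition_spec` maps
    partition column names to literal values, e.g. {"pt": "2022-12-12"}.
    """
    if not partition_spec:
        raise ValueError("partition_spec must name at least one partition column")

    def literal(value):
        # Strings are single-quoted; values that would need escaping are
        # rejected because this sketch does not implement SQL escaping.
        if isinstance(value, str):
            if "'" in value or "\\" in value:
                raise ValueError("escaping is out of scope for this sketch")
            return "'" + value + "'"
        return str(value)

    # One equality condition per partition column, joined with AND.
    predicate = " AND ".join(
        f"{column} = {literal(value)}" for column, value in partition_spec.items()
    )
    return f"DELETE FROM {table} WHERE {predicate}"


print(drop_partition_to_delete("db.sample", {"pt": "2022-12-12"}))
```

Because every condition is an equality on a partition column, the resulting filter always lines up with whole partitions, which is what allows the delete to be handled without rewriting data.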

The Iceberg optimizer rule RewriteDelete handles the delete statement. If it is a metadata delete, Iceberg will not write any files.
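The idea of a metadata-only delete can be illustrated with a toy Python model. This is an assumption-laden sketch, not Iceberg's or Arctic's API: `DataFile` and `metadata_delete` are hypothetical names, and a snapshot is modelled as a plain list of file references tagged with their partition values.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical stand-in for a data file reference held by a snapshot.
    path: str
    partition: dict  # partition column -> value, e.g. {"pt": "a"}


def metadata_delete(snapshot, partition_spec):
    """Drop every file reference whose partition matches `partition_spec`.

    Returns (kept, removed). Nothing is written: the new snapshot is the
    old list minus the matching references, with no delete files produced.
    """
    wanted = set(partition_spec.items())
    kept, removed = [], []
    for data_file in snapshot:
        # A file matches when it carries all requested partition values.
        if wanted <= set(data_file.partition.items()):
            removed.append(data_file)
        else:
            kept.append(data_file)
    return kept, removed


snapshot = [
    DataFile("pt=a/f1.parquet", {"pt": "a"}),
    DataFile("pt=a/f2.parquet", {"pt": "a"}),
    DataFile("pt=b/f3.parquet", {"pt": "b"}),
]
kept, removed = metadata_delete(snapshot, {"pt": "a"})
print([f.path for f in kept])     # ['pt=b/f3.parquet']
print([f.path for f in removed])  # ['pt=a/f1.parquet', 'pt=a/f2.parquet']
```

In this model the operation only works because the filter matches entire files. A filter that selected some rows inside a file could not be served this way and would need row-level deletes instead.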
