[Feature][Spark]: Support drop partition #918

hellojinsilei · 2022-12-12T08:40:51Z

Description

Arctic Spark currently cannot drop partitions, and dropping partitions is a necessary feature.
It would be good to support SQL like this:
ALTER TABLE table_identifier DROP [ IF EXISTS ] partition_spec [PURGE]

Use case/motivation

No response

Describe the solution

No response

Subtasks

No response

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

hellojinsilei · 2022-12-14T06:06:20Z

What are the semantics of drop partition in Arctic?

In Iceberg, the only related operation is dropping a partition field. That is a metadata operation and does not change any of the existing table data.
In Hive, dropping a partition deletes the data files and the associated table structure information.

Should Hive tables follow the Hive implementation, while non-Hive tables follow the Iceberg approach? If so, how do we ensure that the table structure stays consistent? Or is there a better way?
cc @zhoujinsong @baiyangtx

baiyangtx · 2022-12-14T08:24:34Z

> What are the semantics of drop partition in Arctic?
>
> In Iceberg, the only related operation is dropping a partition field. That is a metadata operation and does not change any of the existing table data. In Hive, dropping a partition deletes the data files and the associated table structure information.
>
> Should Hive tables follow the Hive implementation, while non-Hive tables follow the Iceberg approach? If so, how do we ensure that the table structure stays consistent? Or is there a better way? cc @zhoujinsong @baiyangtx

I think the drop partition operation in Arctic should be converted to delete from table where pt=xxx: simply remove the file references from the snapshot and remove the partition metadata in HMS. Because this is a DDL statement, Arctic should not write an EQ_DELETE_FILE to the change store.

The Iceberg optimizer rule RewriteDelete handles the delete statement. If it is a metadata delete, Iceberg will not write any files.
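
To make the proposed conversion concrete, here is a minimal sketch of how a partition spec could be turned into the equivalent delete statement. It is only an illustration: the function name `partition_spec_to_delete` and the table name `db.sample` are hypothetical, and this is not the code merged in #940.

```python
def partition_spec_to_delete(table: str, spec: dict) -> str:
    """Build the DELETE statement that matches exactly one partition.

    `spec` maps partition column names to values, e.g. {"pt": "2022-12-14"}.
    Hypothetical helper for illustration only, not part of Arctic.
    """
    if not spec:
        raise ValueError("partition spec must name at least one column")

    predicates = []
    for column, value in spec.items():
        if isinstance(value, str):
            # Quote string values and escape embedded single quotes.
            escaped = value.replace("'", "''")
            literal = f"'{escaped}'"
        else:
            literal = str(value)
        predicates.append(f"{column} = {literal}")

    # Every partition column is constrained by equality, so the predicate
    # covers whole partitions and never part of a data file.
    return f"DELETE FROM {table} WHERE " + " AND ".join(predicates)


print(partition_spec_to_delete("db.sample", {"pt": "2022-12-14"}))
```

Because the predicate only constrains partition columns by equality, such a delete can in principle be resolved as a metadata delete that drops file references without writing new files, which is the behavior described above.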

hellojinsilei added the type:feature Feature Requests label Dec 12, 2022

hellojinsilei added this to the Release 0.4.1 milestone Dec 13, 2022

hellojinsilei added the module:mixed-spark Spark module for Mixed Format label Dec 13, 2022

baiyangtx assigned hellojinsilei Dec 13, 2022

hellojinsilei mentioned this issue Dec 14, 2022

[ARCTIC-918][SPARK]Feature support drop partition in Spark #940

Merged

3 tasks

baiyangtx closed this as completed in #940 Jan 5, 2023
