[HUDI-7781] Filter wrong partitions when using hoodie.datasource.write.partitions.to.delete #11260

Merged
danny0405 merged 2 commits into apache:master from Zouxxyy:dev/fix_delete_partition on May 22, 2024

Zouxxyy (Contributor) commented May 20, 2024

Change Logs

For example, when the actual partition structure is dt=2021-12-09/hh=1 but hoodie.datasource.write.partitions.to.delete is mistakenly set to dt=2021-12-09, the partition dt=2021-12-09 is written to the metadata of the replacecommit.

If HMS partition synchronization is configured at the same time, an error is always reported:

[image: screenshot of the reported error]

Impact

Filters out wrong partitions configured in hoodie.datasource.write.partitions.to.delete.
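
To illustrate the intended behavior, here is a minimal sketch of the filtering idea in Python. It is not the code changed in this PR. The function name `resolve_partitions_to_delete` is hypothetical, and the sketch assumes that the option holds a comma-separated list, that `*` matches any run of characters (including `/`), and that the list of existing partition paths is already known.

```python
import re
from typing import List


def resolve_partitions_to_delete(configured: str, existing: List[str]) -> List[str]:
    """Keep only configured entries that correspond to existing partition paths.

    Hypothetical helper: entries containing '*' are treated as wildcards,
    all other entries must equal an existing partition path exactly.
    """
    resolved: List[str] = []
    for entry in (e.strip() for e in configured.split(",")):
        if not entry:
            continue
        if "*" in entry:
            # Assumption: '*' behaves like the regex '.*'.
            regex = re.compile(re.escape(entry).replace(r"\*", ".*"))
            matched = [p for p in existing if regex.fullmatch(p)]
        else:
            matched = [p for p in existing if p == entry]
        for p in matched:
            if p not in resolved:
                resolved.append(p)
    return resolved


existing_partitions = [
    "dt=2021-12-09/hh=1",
    "dt=2021-12-09/hh=2",
    "dt=2021-12-10/hh=1",
]

# The mistaken value from the description matches nothing and is dropped.
print(resolve_partitions_to_delete("dt=2021-12-09", existing_partitions))
# A full path is kept as-is.
print(resolve_partitions_to_delete("dt=2021-12-09/hh=1", existing_partitions))
```

With this kind of filtering, a value such as dt=2021-12-09 no longer reaches the replacecommit metadata, because it does not name any existing partition path.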

Risk level (write none, low, medium or high below)

low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label May 20, 2024
danny0405 (Contributor) commented:

> When the actual partition structure is dt=2021-12-09/hh=1, but mistakenly config dt=2021-12-09 in hoodie.datasource.write.partitions.to.delete, then, the partition dt=2021-12-09 will be written to the metadata of replacecommit.

Should we just supplement the secondary partition paths so that the whole dt=2021-12-09 partition can be deleted? This might be the original intention of the user.

@danny0405 danny0405 added area:table-service Table services area:sql SQL interfaces labels May 20, 2024
Zouxxyy (Contributor, Author) commented May 20, 2024

> Should we just supplement the secondary partition paths so that the whole dt=2021-12-09 partition can be deleted? This might be the original intention of the user.

Yes, the user can just set hoodie.datasource.write.partitions.to.delete = dt=2021-12-09/*
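
As a rough illustration of the wildcard form, the write options could look like the following. This is a sketch only: the table name is a placeholder, and the commented-out write call assumes a SparkSession, a DataFrame `df`, and a `base_path` that are not shown here.

```python
# Hypothetical option set for a delete_partition write that uses a wildcard.
hudi_options = {
    "hoodie.table.name": "my_table",  # placeholder table name
    "hoodie.datasource.write.operation": "delete_partition",
    # Delete every second-level partition under dt=2021-12-09.
    "hoodie.datasource.write.partitions.to.delete": "dt=2021-12-09/*",
}

# With Spark available, the options would typically be passed along these lines:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```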

danny0405 (Contributor) commented:

> > Should we just supplement the secondary partition paths so that the whole dt=2021-12-09 partition can be deleted? This might be the original intention of the user.
>
> Yes, the user can just set hoodie.datasource.write.partitions.to.delete = dt=2021-12-09/*

I mean that when the user declares the partition path as dt=2021-12-09, we handle it the same way as dt=2021-12-09/*.

Zouxxyy (Contributor, Author) commented May 20, 2024

> I mean that when the user declares the partition path as dt=2021-12-09, we handle it the same way as dt=2021-12-09/*.

This looks good, but consider three-level partitions such as dt, hh, mm: if the user configures dt=1/mm=1, do we need to expand it to dt=1/hh=*/mm=1 on the user's behalf?

I think keeping it simple is OK: the user can either specify * or give the full path.
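
To make the three-level case concrete, here is a small sketch of how an explicit wildcard selects partitions while the abbreviated form selects none. The helper `matches` is hypothetical, and it assumes `*` is interpreted like the regex `.*`.

```python
import re


def matches(pattern: str, partition: str) -> bool:
    """Hypothetical matcher: '*' in the pattern stands for any run of characters."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, partition) is not None


partitions = [
    "dt=1/hh=0/mm=1",
    "dt=1/hh=1/mm=1",
    "dt=1/hh=1/mm=2",
    "dt=2/hh=0/mm=1",
]

# The explicit wildcard picks every hh under dt=1 with mm=1.
with_wildcard = [p for p in partitions if matches("dt=1/hh=*/mm=1", p)]
# The abbreviated form is not a full path, so nothing matches.
abbreviated = [p for p in partitions if matches("dt=1/mm=1", p)]

print(with_wildcard)
print(abbreviated)
```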

danny0405 (Contributor) commented:

> This looks good, but consider three-level partitions such as dt, hh, mm: if the user configures dt=1/mm=1, do we need to expand it to dt=1/hh=*/mm=1 on the user's behalf?

I'm +1 on this.

Zouxxyy (Contributor, Author) commented May 22, 2024

@danny0405 When hive_style_partitioning is false, partition paths look like 2016/03/15, and it is difficult to work out where * should be added automatically.
Besides, I think that for this type of configuration, where partitions are set manually, the full path and * are sufficient: there is no engine to parse them, and this is how the option is currently used.
If more flexible solutions are needed, SQL can be used instead of manual configuration.
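
The difficulty with non-hive-style paths can be sketched as follows. The helper `hive_style_keys` is hypothetical and only illustrates the point: hive-style segments carry their field names, so a skipped level could in principle be detected, whereas plain value segments carry no such information.

```python
from typing import List, Optional


def hive_style_keys(partition_path: str) -> Optional[List[str]]:
    """Return the field names of a hive-style path, or None if any segment lacks 'key='."""
    keys: List[str] = []
    for segment in partition_path.split("/"):
        key, sep, _value = segment.partition("=")
        if not sep or not key:
            return None  # not hive-style: the segment is a bare value
        keys.append(key)
    return keys


# Hive-style: the names show that 'hh' was skipped between 'dt' and 'mm'.
print(hive_style_keys("dt=1/mm=1"))
# Non-hive-style: nothing says which level '15' belongs to.
print(hive_style_keys("2016/15"))
```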

@Zouxxyy Zouxxyy closed this May 22, 2024
@Zouxxyy Zouxxyy reopened this May 22, 2024
@danny0405 danny0405 merged commit c7d2fc0 into apache:master May 22, 2024
@Zouxxyy Zouxxyy deleted the dev/fix_delete_partition branch May 22, 2024 10:02