
[HUDI-5823][RFC-65] RFC for Partition TTL Management #8062

Merged
merged 9 commits into apache:master on Dec 8, 2023

Conversation


@stream2000 stream2000 commented Feb 27, 2023

Change Logs

RFC-65 Partition Lifecycle Management

Impact

None

Risk level (write none, low medium or high below)

Low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@stream2000 stream2000 force-pushed the rfc-65-partition-ttl-management branch 2 times, most recently from 48f8ccd to b2bdc0c on February 27, 2023 11:22
@stream2000 stream2000 changed the title [HUDI-5823] RFC for Partition TTL Management [HUDI-5823][RFC-65] RFC for Partition TTL Management Feb 27, 2023
@leesf leesf self-assigned this Feb 28, 2023
@codope codope left a comment (Member)

@stream2000 Thanks for putting up an RFC. Left some comments to consider. Can you also share the PR in the dev list thread?


- RelativePath. Relative path to the base path, or we can call it PartitionPath.
- LastModifiedTime. Last instant time that modified the partition. We need this information to support the `KEEP_BY_TIME` policy.
- PartitionTotalFileSize. The sum of all valid data file sizes in the partition, to support the `KEEP_BY_SIZE` policy. We calculate only the latest file slices in the partition currently.
Member

Can you define valid data file? Is it just the latest version of each file id i.e. latest file slice?

Contributor Author

Yes, we calculate only the latest file slice. If we want to calculate all the file slices instead of just the latest one, we can add a config to control the behavior or add another stat field. Here we choose to calculate only the latest file slice because we think it reflects the real data size of the file group.
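
For illustration, here is a minimal sketch of how that per-partition size could be computed from the latest file slices, assuming Hudi's `HoodieTableFileSystemView` API; `fsView` and `partitionPath` are supplied by the caller, and the method name is illustrative rather than the final implementation.

```java
import org.apache.hudi.common.table.view.HoodieTableFileSystemView;

// Hedged sketch: PartitionTotalFileSize as the sum of base + log file sizes of
// only the latest file slice of each file group, as discussed above.
long computePartitionTotalFileSize(HoodieTableFileSystemView fsView, String partitionPath) {
  return fsView.getLatestFileSlices(partitionPath)
      .mapToLong(slice -> {
        long baseFileSize = slice.getBaseFile().map(bf -> bf.getFileSize()).orElse(0L);
        long logFilesSize = slice.getLogFiles().mapToLong(lf -> lf.getFileSize()).sum();
        return baseFileSize + logFilesSize;
      })
      .sum();
}
```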

1. Gather partitions statistics.
2. Apply each TTL policy. For explicit policies, find sub-partitions matching the `PartitionSpec` defined in the policy and check whether there are expired partitions according to the policy type and size. For the default policy, find partitions that do not match any explicit policy and check if they are expired.
3. For all expired partitions, call delete_partition to delete them, which will generate a replace commit and mark all files in those partitions as replaced. For pending clustering and compaction that affect the target partition to delete, [HUDI-5553](https://issues.apache.org/jira/browse/HUDI-5553) introduces a pre-check that will throw an exception, and further improvement could be discussed in [HUDI-5663](https://issues.apache.org/jira/browse/HUDI-5663).
4. The clean service will then delete all replaced file groups.
Member

It would be great if you could also discuss how the read path will work. How do we maintain a consistent filesystem view for readers given that the delete partition operation can take time?

Contributor Author

delete_partition is already implemented in the current hudi master branch. When we call delete_partition to delete a list of partitions, the executor will list all files of the partitions to delete and store them in the replacecommit metadata. After the replace commit is committed, all filesystem views that have seen the replace commit will exclude the files that were replaced in it.
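
To make the four-step flow above concrete, here is a hedged sketch of a TTL run built on the existing delete_partition path. The two helper methods are hypothetical stand-ins for the policy evaluation, and the write-client bootstrapping is simplified; this is not the final orchestration.

```java
import java.util.Collections;
import java.util.List;
import org.apache.hudi.client.SparkRDDWriteClient;
import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
import org.apache.hudi.common.table.timeline.HoodieTimeline;

// Hedged sketch of a TTL run following the four steps above. deletePartitions() issues the
// replace commit (step 3) and the regular clean service later removes the replaced file groups (step 4).
class PartitionTtlRunner {
  void run(SparkRDDWriteClient<?> writeClient) {
    List<String> candidates = listCandidatePartitions();            // step 1: gather partitions + stats
    List<String> expired = findExpiredPartitions(candidates);       // step 2: apply the configured TTL policies
    if (!expired.isEmpty()) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      writeClient.startCommitWithTime(instantTime, HoodieTimeline.REPLACE_COMMIT_ACTION);
      writeClient.deletePartitions(expired, instantTime);            // step 3: replace commit marks files as replaced
    }
    // step 4: a later clean (inline or async) physically deletes the replaced file groups.
  }

  // Hypothetical placeholder for step 1 (partition statistics gathering).
  List<String> listCandidatePartitions() {
    return Collections.emptyList();
  }

  // Hypothetical placeholder for step 2 (policy evaluation over the gathered partitions).
  List<String> findExpiredPartitions(List<String> candidates) {
    return Collections.emptyList();
  }
}
```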

@nsivabalan nsivabalan left a comment (Contributor)

can you help me understand how the orchestration or scheduling of these policies is done. Is it manual via a sql procedure? for eg, how often is this clean up job triggered?

RFC should cover that as well. and definitely, we should avoid adding/storing data in adhoc json files if possible.
If we plan to run this once a day or at a much lower cadence, it's not that bad an idea to collect stats on an on-demand basis, clean up and move on, rather than storing the stats and ensuring we keep them updated after every commit. That would add some additional overhead to regular writes to ensure the stats are aligned with the actual data.


1. Users want to manage partition TTL not only by expiration time but also by sub-partition count and sub-partition size. So we need to support the following three different TTL policy types.
1. **KEEP_BY_TIME**. Partitions will expire N days after their last modified time.
2. **KEEP_BY_COUNT**. Keep N sub-partitions for a high-level partition. When the sub-partition count exceeds the limit, delete the partitions with smaller partition values until the sub-partition count meets the policy configuration.
Contributor

can you help us understand the use-case here. I mean, I am trying to get an understanding of the sub-partitions here. in hudi, we have only one partitioning, but it could be multi-leveled. so, trying to see if we can keep it high level.

Contributor

I feel both (2) and (3) are very much catered towards multi-field partitioning, like a ProductId/datestr based partitioning. can we lay out high level strategies for one-level partitioning as well, in addition to multi-field partitioning?

Contributor

is it possible to simplify the strategies so that we can achieve it for both single and multi field partitioning? for eg,
TTL any partition whose last mod time (last time when data was added/updated) is > 1 month. this will work for both single-field partitioning (datestr) and multi-field partitioning (productId/datestr).
Open to ideas.

Contributor

we should also call out that the sub-partitioning might work only for day based or time based sub-partitioning, right? for eg, lets say the partitioning is datestr/productId. how do we know, out of 1000 productIds under a given day, which 100 are older or newer (assuming all 1000 were created in the same commit)?

1. **KEEP_BY_TIME**. Partitions will expire N days after their last modified time.
2. **KEEP_BY_COUNT**. Keep N sub-partitions for a high-level partition. When the sub-partition count exceeds the limit, delete the partitions with smaller partition values until the sub-partition count meets the policy configuration.
3. **KEEP_BY_SIZE**. Similar to KEEP_BY_COUNT, but ensures that the sum of the data sizes of all sub-partitions does not exceed the policy configuration.
2. Users need to set different policies for different partitions. For example, the hudi table is partitioned by two fields (user_id, ts). For partition(user_id='1'), we set a policy to keep 100G of data across all sub-partitions, and for partition(user_id='2') we set a policy that all sub-partitions will expire 10 days after their last modified time.
Contributor

we should be able to add regex and achieve this. for eg,
Map<PartitionRegex/StaticPartition -> TTL policy>
so, this map can have multiple entries as well.
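
As a hedged illustration of this suggestion, a lookup like the following could resolve a per-partition policy from such a map. `TtlPolicy` is a placeholder type parameter for whatever policy object the RFC eventually settles on; none of these names come from the RFC itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hedged sketch of the suggested Map<PartitionRegex/StaticPartition -> TTL policy> lookup.
class TtlPolicyResolver<TtlPolicy> {
  // Insertion order matters: the first matching spec wins.
  private final Map<Pattern, TtlPolicy> policies = new LinkedHashMap<>();

  void addPolicy(String partitionSpecOrRegex, TtlPolicy policy) {
    // A static partition path such as "user_id=1/20230101" is also a valid regex literal.
    policies.put(Pattern.compile(partitionSpecOrRegex), policy);
  }

  Optional<TtlPolicy> resolve(String partitionPath) {
    return policies.entrySet().stream()
        .filter(e -> e.getKey().matcher(partitionPath).matches())
        .map(Map.Entry::getValue)
        .findFirst(); // empty => fall back to the default policy
  }
}
```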

We have three main considerations when designing the TTL policy:

1. Users want to manage partition TTL not only by expiration time but also by sub-partition count and sub-partition size. So we need to support the following three different TTL policy types.
1. **KEEP_BY_TIME**. Partitions will expire N days after their last modified time.
Contributor

what is last mod time? is it referring to new inserts, or updates as well?

Contributor Author

Yes, inserts/updates will both be treated as modifications to the partition. And we track them by looking at the commit/deltacommit write stats.

Contributor

How to fetch the info when a commit is archived?

Contributor Author

In the latest version of the RFC, we use the max instant time of the committed file slices in the partition as the partition's last modified time, for simplicity. Otherwise, we would need some extra mechanism to get the last modified time. In our internal version, we maintain an extra JSON file and update it incrementally as new instants are committed, to get the real modified time for the partition. Alternatively, we could use the metadata table to track the last modified time. What do you think about this?
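
As a hedged sketch of the simplification described above, the last modified time could be approximated from the file system view as follows; the method name is illustrative and using the latest file slices here is an assumption, not the final mechanism.

```java
import org.apache.hudi.common.model.FileSlice;
import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
import org.apache.hudi.common.util.Option;

// Hedged sketch: approximate a partition's last modified time as the max instant time
// of its committed (latest) file slices, as described in the comment above.
Option<String> getLastModifiedTime(HoodieTableFileSystemView fsView, String partitionPath) {
  return Option.fromJavaOptional(
      fsView.getLatestFileSlices(partitionPath)
          .map(FileSlice::getBaseInstantTime)
          .max(String::compareTo)); // Hudi instant times sort lexicographically
}
```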

We have three main considerations when designing the TTL policy:

1. Users want to manage partition TTL not only by expiration time but also by sub-partition count and sub-partition size. So we need to support the following three different TTL policy types.
1. **KEEP_BY_TIME**. Partitions will expire N days after their last modified time.
Contributor

When retiring the old/unused/not-accessed partitions, another approach we are taking internally is:
(a) stash the partitions to be cleaned up in a .stashedForDeletion folder (at the .hoodie level).
(b) partitions stashed for deletion wait in the folder for a week (or the time dictated by the policy) before actually getting deleted. In cases where we realize that something has been accidentally deleted (like a bad policy configuration, TTL exclusion not configured, etc.), we can always move it back from the stash to quickly recover from the TTL event.
(c) We shall configure policies for .stashedForDeletion// subfolders to manage the appropriate tiering level (whether to be moved to a warm/cold tier, etc.).
(d) in addition to the deletePartitions() API, which would stash the folder (instead of deleting) based on the configs, we would need a restore API to move the subfolders/files back to their original location.
(e) Metadata left by the delete operation is to be synced with the MDT to keep the file listing metadata in sync with the file system. (In cases where replication to a different region is supported, this would also warrant applying the changes to the replicated copies of the data.)

Contributor Author

Maybe we can add the stash/restore mechanism to the replace commit/clean process of hudi instead of dealing with it in TTL management? TTL management should only decide which partitions are outdated and call delete_partition to delete them. If we want to retain the deleted data, we can add an extra mechanism in the delete_partition method.

@SteNicholas (Member)

@stream2000, could we also introduce record ttl management? Partition ttl management and record ttl management both need the ttl policy.


So here we have the TTL policy definition:
```java
public class HoodiePartitionTTLPolicy {
Member

@stream2000, could we introduce a HoodieTTLPolicy interface? Then HoodiePartitionTTLPolicy would implement HoodieTTLPolicy. HoodieRecordTTLPolicy could also implement this interface in the future.
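
A hedged sketch of what that interface layering might look like; the type names follow this suggestion and the partitionSpec/policyValue fields mentioned elsewhere in the thread, but the exact shape is purely illustrative.

```java
// Hedged sketch of the suggested interface layering; names and fields are illustrative only.
enum TTLPolicyType { KEEP_BY_TIME }

interface HoodieTTLPolicy {
  TTLPolicyType getPolicyType();
  long getPolicyValue();          // e.g. number of days to retain for KEEP_BY_TIME
}

// Partition-level policy; a future HoodieRecordTTLPolicy could implement the same interface.
class HoodiePartitionTTLPolicy implements HoodieTTLPolicy {
  private final TTLPolicyType policyType;
  private final long policyValue;
  private final String partitionSpec; // regex or static partition value the policy applies to

  HoodiePartitionTTLPolicy(TTLPolicyType policyType, long policyValue, String partitionSpec) {
    this.policyType = policyType;
    this.policyValue = policyValue;
    this.partitionSpec = partitionSpec;
  }

  @Override public TTLPolicyType getPolicyType() { return policyType; }
  @Override public long getPolicyValue() { return policyValue; }
  public String getPartitionSpec() { return partitionSpec; }
}
```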

Contributor Author

@SteNicholas Nice suggestion! I will take it into consideration when implementing the policy and make sure we can integrate more types of TTL policies in the future.

@stream2000 (Contributor Author)

@codope @nsivabalan @SteNicholas @nbalajee Sorry for the delayed response, I've been busy with other things these days. Will address comments and update the RFC.

@stream2000 (Contributor Author)

@stream2000, could we also introduce record ttl management? Partition ttl management and record ttl management both need the ttl policy.

@SteNicholas It will be nice to have record-level ttl management~ We can open another RFC for record-level TTL management and concentrate on Partition TTL Management in this RFC. We can also discuss more on how to make it easy to integrate other types of ttl policy in this RFC.

@stream2000 stream2000 force-pushed the rfc-65-partition-ttl-management branch from cd94fd5 to d3144f1 Compare May 31, 2023 07:27

stream2000 commented May 31, 2023

@nsivabalan @SteNicholas Hi, thanks for your review. I have updated the RFC according to your comments.

  1. Simplify the TTL policy: now we will only support KEEP_BY_TIME, so we don't need to compare partition values between partitions.
  2. Remove the persistent JSON stats mechanism. We can gather partition stats each time we do TTL management.
  3. Add more detail about executing TTL management. We will support async table services for TTL management, and both the spark and flink engines can execute the async service.
  4. Consider record-level TTL policies when designing the partition TTL policy.

And here is the answer to your remaining question:

Why store the policy in hoodie.properties instead of using write config?

I think we need to store the policy somewhere in hudi, otherwise users need to store the policy themselves. Consider when a user has 1000 partitions and wants to set TTL policies for 500 of them; it's better to store the policy metadata in hudi.

Hoping for another round of review!

@SteNicholas SteNicholas left a comment (Member)

@stream2000, thanks for updating this RFC. I left comments on the updates.

}

// Partition spec for which the policy takes effect, could be a regex or a static partition value
private String partitionSpec;
Member

Could the partitionSpec support multi-level partitions?

Contributor Author

Yes, it supports regex expressions for partitions as well as static partition values.

Member

@stream2000, for example, there is a Hudi table partitioned by date and hour. Meanwhile, the user wants to configure a ttl of one year. How could the user configure this ttl with the current policy definition? Set the policyValue to a year and the partitionSpec to */*?

@stream2000 stream2000 left a comment (Contributor Author)

Thanks for your review!

check if there are expired partitions according to the policy type and size. For default policy, find partitions that
do not match any explicit policy and check if they are expired.
3. For all expired partitions, call delete_partition to delete them, which will generate a replace commit and mark all
files in those partitions as replaced. For pending clustering and compaction that affect the target partition to
Member

If there is pending clustering or compaction, the ttl policy execution should clean and delete the expired file groups after the clustering and compaction complete. The user doesn't need to do anything for the pending clustering and compaction case.

@SteNicholas (Member)

@stream2000, do you have any updates?


xicm commented Jul 21, 2023

Partition TTL can cover most scenarios and will bring a lot of convenience.
I have a simple record level TTL idea. We configure a general TTL, update the query api to ensure that expired data can't be queried, and then delete the expired records when we do compaction or clustering.


zyclove commented Sep 14, 2023

Can this feature be supported in advance? Especially hope for the next version after 0.14.

@stream2000 (Contributor Author)

Can this feature be supported in advance? Especially hope for the next version after 0.14.

I hope so, too! Will work on this feature in the next few weeks and push POC code to my personal repo ASAP.

@stream2000 stream2000 left a comment (Contributor Author)

@geserdugarov Thanks very much for your review! I think the most important part of the design to be confirmed is whether we need to provide a feature-rich but complicated TTL policy in the first place or just implement a simple but extensible policy.

@geserdugarov @codope @nsivabalan @vinothchandar Hope for your opinion for this ~

The TTL strategies are similar to existing table service strategies. We can define TTL strategies just like defining a clustering/clean/compaction strategy:

```properties
hoodie.partition.ttl.management.strategy=KEEP_BY_TIME
Contributor Author

We can change the word TTL to lifecycle if we want to support other types of data expiration policies. In the current version of this RFC, we do not plan to support other types of policies like KEEP_BY_SIZE, since they rely on the condition that partition values are comparable, which is not always true. However, we do have some users with this kind of requirement, so we need to make the policy extensible.

}
```

Users can provide their own implementation of `PartitionTTLManagementStrategy` and hudi will help delete the expired partitions.
Contributor Author

Thanks for your review~

it's just providing TTL functionality by custom implementation of PartitionTTLManagementStrategy is not user friendly.

For common users that want to do TTL management, they just need to add some write configs to their ingestion job. In the simplest situation, they may just add the following configs to enable async ttl management:

hoodie.partition.ttl.days.retain=10
hoodie.ttl.management.async.enabled=true

The TTL policy will default to KEEP_BY_TIME and we can automatically detect expired partitions by their last modified time and delete them.

However, this simple policy may not suit all users; they may need to implement their own partition TTL policy. So we provide an abstraction, PartitionTTLManagementStrategy, that returns expired partitions as users define them, and hudi will help delete them.
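
A hedged sketch of what such a user-defined strategy could look like. The shape of PartitionTTLManagementStrategy (a constructor taking a HoodieTable, getExpiredPartitionPaths(), getPartitionPathsForTTLManagement()) follows the fragments discussed later in this thread and is not final; isExpiredByCustomLogic() is a hypothetical user rule.

```java
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hudi.table.HoodieTable;

// Hedged sketch of a user-defined strategy under the proposed abstraction.
public class CustomPartitionTTLStrategy extends PartitionTTLManagementStrategy {

  public CustomPartitionTTLStrategy(HoodieTable hoodieTable) {
    super(hoodieTable);
  }

  @Override
  public List<String> getExpiredPartitionPaths() {
    return getPartitionPathsForTTLManagement().stream()
        .filter(this::isExpiredByCustomLogic)
        .collect(Collectors.toList());
  }

  private boolean isExpiredByCustomLogic(String partitionPath) {
    // e.g. consult an external catalog, a business calendar, downstream consumption state, etc.
    return false;
  }
}
```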

This is the main scope for TTL, and we shouldn't allow more flexibility.
A customized implementation of PartitionTTLManagementStrategy would allow doing anything with partitions. It could still be a PartitionManagementStrategy, but then we shouldn't name it with the TTL part.

A customized implementation of PartitionTTLManagementStrategy will only provide a list of expired partitions, and hudi will help delete them. We can rename PartitionTTLManagementStrategy to PartitionLifecycleManagementStrategy since we do not always delete partitions by time.

The `KeepByTimePartitionTTLManagementStrategy` will calculate the `lastModifiedTime` for each input partition. If the duration between now and `lastModifiedTime` for a partition is larger than what `hoodie.partition.ttl.days.retain` configures, `KeepByTimePartitionTTLManagementStrategy` will mark this partition as expired. We use day as the unit of expiration time since it is commonly used for data lakes. Open to ideas on this.

we will use the largest commit time of committed file groups in the partition as the partition's
`lastModifiedTime`. So any write (including normal DMLs, clustering, etc.) with a larger instant time will change the partition's `lastModifiedTime`.
Contributor Author

We should always think twice before making format changes. We have considered using HoodiePartitionMetadata to provide the lastModifiedTime; however, the current implementation of partition metadata does not support transactions and is never modified after its first creation. Not sure it is a good choice for maintaining the lastModifiedTime.
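
As a hedged worked example of the day-based check described in the KeepByTimePartitionTTLManagementStrategy paragraph above: a partition is expired when now minus lastModifiedTime exceeds hoodie.partition.ttl.days.retain. The yyyyMMddHHmmss instant-time layout is an assumption (pre-1.0 format), not something stated in the RFC text.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hedged sketch of the KEEP_BY_TIME expiration check.
// Assumes instant times use Hudi's pre-1.0 yyyyMMddHHmmss layout (format assumption).
boolean isPartitionExpired(String lastModifiedInstantTime, long daysRetain) {
  DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
  LocalDateTime lastModified =
      LocalDateTime.parse(lastModifiedInstantTime.substring(0, 14), formatter);
  return Duration.between(lastModified, LocalDateTime.now()).toDays() > daysRetain;
}
```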


For the first version of TTL management, we do not plan to implement a complicated strategy (for example, using an array to store strategies, introducing partition regex, etc.). Instead, we add a new abstract method `getPartitionPathsForTTLManagement` in `PartitionTTLManagementStrategy` and provide a new config `hoodie.partition.ttl.management.partition.selected`.

If `hoodie.partition.ttl.management.partition.selected` is set, `getPartitionPathsForTTLManagement` will return the partitions provided by this config. If not, `getPartitionPathsForTTLManagement` will return all partitions in the hudi table.
Contributor Author

It will be a write config, and we will only apply the TTL policy to the partitions specified by this config. If this config is not specified, we will apply the ttl policy to all partitions in the hudi table.
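
A hedged sketch of that selection logic, written as a fragment meant to sit inside the strategy class (the hoodieTable/config fields mirror the constructor shown later in this thread). The comma-separated encoding of the selected-partition config is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.hudi.common.fs.FSUtils;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.hudi.table.HoodieTable;

// Hedged sketch of getPartitionPathsForTTLManagement() as described above.
protected HoodieTable hoodieTable;
protected HoodieWriteConfig config;

public List<String> getPartitionPathsForTTLManagement() {
  String selected = config.getString("hoodie.partition.ttl.management.partition.selected");
  if (selected != null && !selected.isEmpty()) {
    // Apply TTL only to the partitions explicitly selected by the user (assumed comma-separated).
    return Arrays.asList(selected.split(","));
  } else {
    // Otherwise consider every partition in the hudi table.
    return FSUtils.getAllPartitionPaths(hoodieTable.getContext(), config.getMetadataConfig(), config.getBasePath());
  }
}
```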


TTL management strategies will only be applied to partitions returned by `getPartitionPathsForTTLManagement`.

Thus, if users want to apply different strategies for different partitions, they can run TTL management multiple times, selecting different partitions and applying the corresponding strategies. And we may provide a batch interface in the future to simplify this.
Contributor Author

I propose allowing the user to set only the spec and TTL values, with support for add, show, and delete operations.

So, if I want to implement 6 partition policies, then I should prepare 6 PartitionTTLManagementStrategys and call them 6 times?

We can implement an advanced TTL management policy in the future to provide the ability to set different policies for different partitions. To achieve this, we can introduce a new hoodie.partition.ttl.management.strategy and implement the logic you mentioned. For most users, a simple policy that TTLs any partition whose last mod time (the last time data was added/updated) is older than the configured retention will be enough.

@geserdugarov @nsivabalan What do you think about this?


### Apply different strategies for different partitions

For some specific users, they may want to apply different strategies for different partitions. For example, they may have multiple partition fields (productId, day). For partitions under `product=1` they want to keep data for 30 days, while for partitions under `product=2` they want to keep data for only 7 days.
Contributor Author

We can implement two independent ttl policies, one simple and another advanced, as I commented below.

@stream2000 stream2000 changed the title [HUDI-5823][RFC-65] RFC for Partition TTL Management [HUDI-5823][RFC-65] RFC for Partition Lifecycle Management Oct 20, 2023
@stream2000 (Contributor Author)

Hi community, I have implemented a POC version of partition lifecycle management in this PR: #9723. This PR provides a simple KEEP_BY_TIME strategy and can do partition lifecycle management inline after commit. This POC implementation will help in understanding the design mentioned in this RFC. Hope for your review 😄


In some classic hudi use cases, users partition hudi data by time and are only interested in data from a recent period
of time. The outdated data is useless and costly; we need a lifecycle management mechanism to prevent the
dataset from growing infinitely.
Contributor

Guess you are talking about GDPR?

Contributor

To clarify: this is very different, the proposal for this doc only yields target partitions, it does not remove the whole partition.


```properties
hoodie.partition.lifecycle.management.strategy=KEEP_BY_TIME
hoodie.partition.lifecycle.management.strategy.class=org.apache.hudi.table.action.lifecycle.strategy.KeepByTimePartitionLifecycleManagementStrategy
Contributor

hoodie.partition.lifecycle.management.strategy -> hoodie.partition.ttl.strategy ?

Contributor Author

Using the word ttl means that we can't apply a time-unrelated partition deletion strategy in the future. However, it is more suitable for the current implementation since we only provide a time-based strategy. What do you think? Should we use ttl directly or retain some extensibility?


| Config key | Remarks | Default |
|---------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|---------|
| hoodie.partition.lifecycle.management.async.enabled | Enable running of lifecycle management service, asynchronously as writes happen on the table. | False |
Contributor

Should we hand the cleaning duty to the cleaner service? Like we do for the delete partition commands: just generate a delete partition commit metadata on the timeline and the cleaner will take care of it.

Contributor

What I really care about is the conflict resolution for the operation, does OCC apply here? Do we have some support for basic concurrent modification?

Contributor Author

Yes, we will use the delete partition command to delete the expired partitions; this is mentioned in the Background section. And conflict resolution for the operation is the same as what we do in the delete partition command.


zyclove commented Nov 26, 2023

Will version 1.0 support this feature? This feature is greatly needed.
@danny0405 @stream2000

@stream2000 stream2000 force-pushed the rfc-65-partition-ttl-management branch from 932376f to 409fbdd on November 30, 2023 07:19
@stream2000 (Contributor Author)

For 1.0.0 and later hudi versions, which support efficient completion time queries on the timeline (#9565), we can get a partition's lastModifiedTime by scanning the timeline and getting the last write commit for the partition. Also, for efficiency, we can store the partitions' last modified times and the current completion time in the replace commit metadata. The next time we need to calculate the partitions' last modified times, we can build them incrementally from the replace commit metadata of the last ttl management run.

@danny0405 Added a new lastModifiedTime calculation method for 1.0.0 and later hudi versions. We plan to implement the file-listing-based lastModifiedTime first and implement the timeline-based lastModifiedTime calculation in a separate PR. This will make it easier for users on earlier hudi versions to pick the feature into their code base.

I have addressed all comments according to online/offline discussions. If there is no other concern, we can move on with this~
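
A hedged sketch of the incremental caching idea described above: store each partition's lastModifiedTime in the TTL replace commit's extra metadata so the next run can build on it instead of rescanning the whole timeline. The key prefix and the flat string encoding are assumptions, not part of the RFC.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hudi.common.model.HoodieCommitMetadata;

// Hedged sketch: cache partition -> lastModifiedTime in the replace commit's extra metadata.
class TtlCommitMetadataCache {
  static final String LAST_MODIFIED_KEY_PREFIX = "hudi.ttl.partition.lastModified."; // assumed key prefix

  void store(HoodieCommitMetadata commitMetadata, Map<String, String> partitionToLastModified) {
    partitionToLastModified.forEach((partition, instant) ->
        commitMetadata.addMetadata(LAST_MODIFIED_KEY_PREFIX + partition, instant));
  }

  Map<String, String> load(HoodieCommitMetadata commitMetadata) {
    Map<String, String> result = new HashMap<>();
    commitMetadata.getExtraMetadata().forEach((key, value) -> {
      if (key.startsWith(LAST_MODIFIED_KEY_PREFIX)) {
        result.put(key.substring(LAST_MODIFIED_KEY_PREFIX.length()), value);
      }
    });
    return result;
  }
}
```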


```properties
hoodie.partition.ttl.management.strategy=KEEP_BY_TIME
hoodie.partition.ttl.management.strategy.class=org.apache.hudi.table.action.ttl.strategy.KeepByTimePartitionTTLManagementStrategy
Contributor

hoodie.partition.ttl.strategy.class ?

Contributor Author

Yes, sure. Will change it later.

*
* @return Expired partition paths.
*/
public abstract List<String> getExpiredPartitionPaths();
Contributor

Do we need to pass around basic info like the table path or the meta client so that users can customize more easily? For example, how can a customized class know the table path?

Contributor Author

see poc code: https://github.com/apache/hudi/pull/9723/files#diff-854a305c2cf1bbe8545bbe78866f53e84a489b60390ea1cfc7f64ed61165a0aaR33

Will provide a constructor like:

public PartitionTTLManagementStrategy(HoodieTable hoodieTable) {
    this.writeConfig = hoodieTable.getConfig();
    this.hoodieTable = hoodieTable;
 }

Contributor

Then make it clear in the pseudocode.


For file groups generated by a replace commit, the commit time may not reveal the real insert/update time of the data in the file group. However, we can assume that we won't cluster a partition that has had no new writes for a long time when using this strategy. In the future, we may introduce a more accurate mechanism to get the `lastModifiedTime` of a partition, for example using the metadata table.

For 1.0.0 and later hudi versions, which support efficient completion time queries on the timeline (#9565), we can get a partition's `lastModifiedTime` by scanning the timeline and getting the last write commit for the partition. Also, for efficiency, we can store the partitions' last modified times and the current completion time in the replace commit metadata. The next time we need to calculate the partitions' last modified times, we can build them incrementally from the replace commit metadata of the last ttl management run.
Contributor

We will only support the feature in 1.0, right? Before 1.0, we do not have a good way to know when a log file is committed in a file slice, and that might be a snag when calculating the lastCommitTime.

Contributor Author

You're right. We should support this feature in 1.0 only to have an accurate lastCommitTime.

} else {
// Return All partition paths
return FSUtils.getAllPartitionPaths(hoodieTable.getContext(), config.getMetadataConfig(), config.getBasePath());
}
@danny0405 danny0405 Dec 1, 2023 (Contributor)

Where does the hoodieTable get instantiated?

Contributor Author

In the constructor of PartitionTTLManagementStrategy like

  protected final HoodieTable hoodieTable;
  protected final HoodieWriteConfig writeConfig;
  public PartitionTTLManagementStrategy(HoodieTable hoodieTable) {
    this.writeConfig = hoodieTable.getConfig();
    this.hoodieTable = hoodieTable;
  }

Contributor

Let's make the pseudocode clear.

@geserdugarov (Contributor)

@geserdugarov Thanks very much for your review! I think the most important part of the design to be confirmed is whether we need to provide a feature-rich but complicated TTL policy in the first place or just implement a simple but extensible policy.

@geserdugarov @codope @nsivabalan @vinothchandar Hope for your opinion for this ~

Yes, you're right about the main difference in our ideas.

@stream2000 stream2000 left a comment (Contributor Author)

@danny0405 @geserdugarov Thanks for your review! I have addressed all comments, hope for your opinions.


The TTL strategies are similar to existing table service strategies. We can define TTL strategies just like defining a clustering/clean/compaction strategy:

```properties
hoodie.partition.ttl.management.strategy=KEEP_BY_TIME
Contributor Author

It's better to provide the hoodie.partition.ttl.strategy abstraction so that users can implement their own strategies and reuse the infrastructure provided by RFC-65 to delete expired partitions. For example, a user may want to add a new strategy that deletes partitions based on their own logic.

In the current design, we will implement KEEP_BY_TIME only. However, that doesn't mean we won't need other kinds of ttl strategies in the future.




}
```

Users can provide their own implementation of `PartitionTTLManagementStrategy` and hudi will help delete the expired partitions.
Contributor Author

Unfortunately, I don't see any other strategies except KEEP_BY_TIME for partition level TTL.

size: not applicable, could be trigger for lower levels

We need to make the design extensible and make it easy for users to plug in their own expiration logic. I think it's better to have a strategy-level abstraction.

The `KeepByTimePartitionTTLManagementStrategy` will calculate the `lastModifiedTime` for each input partition. If the duration between now and `lastModifiedTime` for a partition is larger than what `hoodie.partition.ttl.days.retain` configures, `KeepByTimePartitionTTLManagementStrategy` will mark this partition as expired. We use day as the unit of expiration time since it is commonly used for data lakes. Open to ideas on this.

we will use the largest commit time of committed file groups in the partition as the partition's
`lastModifiedTime`. So any write (including normal DMLs, clustering, etc.) with a larger instant time will change the partition's `lastModifiedTime`.
Contributor Author

Again, leveraging .hoodie_partition_metadata would bring a format change, and it doesn't support any kind of transaction currently. As discussed with @danny0405, in 1.0.0 and later versions which support efficient completion time queries on the timeline (#9565), we will have a more elegant way to get the lastCommitTime.
You can see the updated RFC for details.




### Apply different strategies for different partitions

For some specific users, they may want to apply different strategies for different partitions. For example, they may have multiple partition fields (productId, day). For partitions under `product=1` they want to keep data for 30 days, while for partitions under `product=2` they want to keep data for only 7 days.
Contributor Author

You can file an RFC about your design of the wildcard-spec version of the TTL strategy, making it clear how users would use this kind of strategy, the data structure of the strategy, how to store the strategies (and the problems with that), etc. We can discuss the issues with this kind of strategy in that RFC.

And the implementation of this advanced strategy can be added in the future by implementing a new kind of ttl strategy.

I prefer to provide a simple strategy that we can configure with a few simple and easy-to-understand write configs. @danny0405 What do you think?


For the first version of TTL management, we do not plan to implement a complicated strategy (for example, using an array to store strategies, introducing partition regex, etc.). Instead, we add a new abstract method `getPartitionPathsForTTLManagement` in `PartitionTTLManagementStrategy` and provide a new config `hoodie.partition.ttl.management.partition.selected`.

If `hoodie.partition.ttl.management.partition.selected` is set, `getPartitionPathsForTTLManagement` will return the partitions provided by this config. If not, `getPartitionPathsForTTLManagement` will return all partitions in the hudi table.
Contributor Author

We can remove hoodie.partition.ttl.management.partition.selected after we implement the advanced strategy in the future.


TTL management strategies will only be applied to partitions returned by `getPartitionPathsForTTLManagement`.

Thus, if users want to apply different strategies for different partitions, they can run TTL management multiple times, selecting different partitions and applying the corresponding strategies. And we may provide a batch interface in the future to simplify this.
Contributor Author

You can file an RFC about this, and we can discuss your design in detail. It doesn't conflict with the design in RFC-65.


geserdugarov commented Dec 5, 2023

@danny0405 @geserdugarov Thanks for your review! I have addressed all comments, hope for your opinions.

@stream2000 Thank you for your reply. As you mentioned, I've written all my ideas in a separate RFC to simplify the detailed discussion: #10248
@danny0405 @stream2000 Do you mind me asking for your comments?

@danny0405 (Contributor)

@geserdugarov I'm gonna merge this one first, can you file your suggested change PR incrementally based on this one? And we can move the discussions there.

@danny0405 danny0405 merged commit 60b668f into apache:master Dec 8, 2023
2 checks passed

geserdugarov commented Dec 8, 2023

@geserdugarov I'm gonna merge this one first, can you file your suggested change PR incrementally based on this one? And we can move the discussions there.

@danny0405 Ok, I will work on it.

@stream2000 stream2000 changed the title [HUDI-5823][RFC-65] RFC for Partition Lifecycle Management [HUDI-5823][RFC-65] RFC for Partition TTL Management Dec 27, 2023