[HUDI-6725] Support efficient completion time queries on the timeline #9565
// The 'startTime' should be outside the eager loading range, so switch to lazy loading.
// This operation is resource-costly.
HoodieArchivedTimeline.loadInstants(metaClient,
    new EQTsFilter(startTime),
We might switch to a point-query API on Parquet if one becomes available.
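A minimal, self-contained sketch of the lazy fallback discussed above, assuming the archived timeline can be modeled as an in-memory list of (start time, completion time) pairs. `ArchivedRecord`, `EqualsStartTimeFilter`, and `LazyArchiveLookup` are hypothetical names, not Hudi APIs: the linear scan stands in for the costly `HoodieArchivedTimeline.loadInstants` read with `EQTsFilter`, and `java.util.Optional` stands in for Hudi's `Option`. Hits are cached so a repeated query skips the scan.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in for one archived instant: its start time and completion time.
final class ArchivedRecord {
  final String startTime;
  final String completionTime;

  ArchivedRecord(String startTime, String completionTime) {
    this.startTime = startTime;
    this.completionTime = completionTime;
  }
}

// Hypothetical equality filter on the start time, similar in spirit to EQTsFilter.
final class EqualsStartTimeFilter implements Predicate<ArchivedRecord> {
  private final String startTime;

  EqualsStartTimeFilter(String startTime) {
    this.startTime = startTime;
  }

  @Override
  public boolean test(ArchivedRecord candidate) {
    return startTime.equals(candidate.startTime);
  }
}

// Hypothetical lazy lookup: answers from the cache, otherwise scans the "archive" once per miss.
final class LazyArchiveLookup {
  private final List<ArchivedRecord> archive; // stands in for the archived timeline files
  private final Map<String, String> startToCompletion = new HashMap<>();
  private int archiveScans = 0; // number of costly archive reads performed

  LazyArchiveLookup(List<ArchivedRecord> archive) {
    this.archive = archive;
  }

  Optional<String> completionTime(String startTime) {
    String cached = startToCompletion.get(startTime);
    if (cached != null) {
      return Optional.of(cached);
    }
    // Cache miss: filter the archive by start time (the expensive path).
    archiveScans++;
    Predicate<ArchivedRecord> filter = new EqualsStartTimeFilter(startTime);
    for (ArchivedRecord entry : archive) {
      if (filter.test(entry)) {
        startToCompletion.put(entry.startTime, entry.completionTime);
        return Optional.of(entry.completionTime);
      }
    }
    // Misses are not cached, so a later query for the same time scans again.
    return Optional.empty();
  }

  int archiveScans() {
    return archiveScans;
  }
}
```

A point-query API, if one existed, would presumably replace only the scan inside `completionTime`; the cache around it could stay unchanged.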
...udi-client-common/src/main/java/org/apache/hudi/client/timeline/CompletionTimeQueryView.java
private final Map<String, String> startToCompletionInstantTimeMap;

/**
 * The start instant time to eagerly load from; by default, the last 3 days of completed instants are loaded.
Is there an internal config to control the last N days of completed instants?
Probably in a follow-up PR. For this patch, we can first expose it as a constructor parameter.
Please file a follow-up JIRA.
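A sketch of the default eager-loading window with the start instant exposed as a constructor parameter, as proposed in this thread. It assumes Hudi-style 17-digit instant times (`yyyyMMddHHmmssSSS`) interpreted in local time. `EagerLoadWindow` and its methods are hypothetical, and the 3-day default mirrors the Javadoc above rather than any real config key.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical holder for the eager-loading start instant.
final class EagerLoadWindow {
  // Assumed instant time layout: 17 digits, yyyyMMddHHmmssSSS.
  private static final DateTimeFormatter INSTANT_FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");
  // Mirrors the "last 3 days" default mentioned in the Javadoc.
  private static final Duration DEFAULT_WINDOW = Duration.ofDays(3);

  private final String eagerStartInstant;

  // Explicit start instant supplied by the caller (the proposed constructor parameter).
  EagerLoadWindow(String eagerStartInstant) {
    this.eagerStartInstant = eagerStartInstant;
  }

  // Default: eagerly load completed instants from the 3 days before `now`.
  EagerLoadWindow(LocalDateTime now) {
    this(now.minus(DEFAULT_WINDOW).format(INSTANT_FORMAT));
  }

  String eagerStartInstant() {
    return eagerStartInstant;
  }

  // Fixed-width numeric strings compare lexicographically in time order.
  boolean isEagerlyLoaded(String instantTime) {
    return instantTime.compareTo(eagerStartInstant) >= 0;
  }
}
```

For example, `new EagerLoadWindow(LocalDateTime.of(2023, 8, 31, 12, 0))` yields the start instant `20230828120000000`. Keeping the explicit-string constructor leaves room for a later config (or a value inferred from archival settings) without changing callers of the default.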
if (completionTime != null) {
  return Option.of(completionTime);
}
if (HoodieTimeline.compareTimestamps(startTime, GREATER_THAN, this.firstInstantOnActiveTimeline)) {
I'm not following this logic. If startTime > firstInstantOnActiveTimeline, that doesn't mean the instant is still pending, right? Maybe I'm missing something. Can you explain with this example? Say firstInstantOnActiveTimeline has start time t0 and completion time t1, and another instant has start time t2 and completion time t3. Here t2 > t0, but that instant is completed.
> Not following this logic. If startTime > firstInstantOnActiveTimeline, it doesn't mean that instant is still pending right.
It does mean it is pending. The #load method already puts all completed instants of the active timeline into the map, so if the map does not contain startTime as a key, the instant is pending.
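A sketch of the decision described in the reply, assuming (as stated there) that the map already holds every completed instant of the active timeline, and that instant times are fixed-width numeric strings, so `String.compareTo` orders them the way `HoodieTimeline.compareTimestamps` is assumed to. `CompletionTimeLookup` and its `Kind` enum are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical classifier for a start-time lookup against the active timeline.
final class CompletionTimeLookup {
  enum Kind { COMPLETED, PENDING, NEEDS_ARCHIVE_LOOKUP }

  // Assumed to hold every completed instant of the active timeline (start -> completion).
  private final Map<String, String> startToCompletion;
  private final String firstInstantOnActiveTimeline;

  CompletionTimeLookup(Map<String, String> startToCompletion, String firstInstantOnActiveTimeline) {
    this.startToCompletion = new HashMap<>(startToCompletion);
    this.firstInstantOnActiveTimeline = firstInstantOnActiveTimeline;
  }

  Kind classify(String startTime) {
    // 1. A completed active instant is always found in the map first.
    if (startToCompletion.containsKey(startTime)) {
      return Kind.COMPLETED;
    }
    // 2. Later than the first active instant yet absent from the map: it cannot be
    //    completed (it would have been loaded), so it must still be pending.
    if (startTime.compareTo(firstInstantOnActiveTimeline) > 0) {
      return Kind.PENDING;
    }
    // 3. Otherwise it may have been archived: fall back to the costly lazy lookup.
    return Kind.NEEDS_ARCHIVE_LOOKUP;
  }
}
```

On the reviewer's example, with the map {t0 → t1, t2 → t3} and first active instant t0, querying t2 returns `COMPLETED` even though t2 > t0, because the map is checked before the timestamp comparison. Only a start time that is both later than t0 and absent from the map is classified as pending.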
 * The constructor.
 *
 * @param metaClient   The table meta client.
 * @param startInstant The earliest instant time to eagerly load from; by default, the last 3 days of completed instants are loaded.
This needs to match, or be inferred from, the archival frequency.
Change Logs
Add a tool to efficiently query completion times on both the active and archived timelines.
Impact
none
Risk level (write none, low, medium, or high below)
none
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change.
… ticket number here and follow the instructions to make changes to the website.
Contributor's checklist