Skip to content

[HUDI-5241] Optimize HoodieDefaultTimeline API#7241

Merged
YannByron merged 3 commits intoapache:masterfrom
YannByron:hudi_dtl
Nov 28, 2022
Merged

[HUDI-5241] Optimize HoodieDefaultTimeline API#7241
YannByron merged 3 commits intoapache:masterfrom
YannByron:hudi_dtl

Conversation

@YannByron
Copy link
Contributor

Change Logs

  • rename the origin getInstants to getInstantsAsStream.
  • add a new getInstants that return a list.
  • make sure that only use getInstants interface when using this.instants.

Impact

LOW

Risk level (write none, low medium or high below)

LOW

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@YannByron YannByron force-pushed the hudi_dtl branch 3 times, most recently from 8b5df16 to 72fc3f6 Compare November 22, 2022 02:52
@YannByron
Copy link
Contributor Author

@hudi-bot run azure

@YannByron YannByron requested a review from codope November 24, 2022 15:21
Copy link
Member

@xushiyan xushiyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codope do you have concern on the public API changes? like from users' perspective

Comment on lines +338 to +342
Stream<HoodieInstant> getInstants();
default Stream<HoodieInstant> getInstantsAsStream() {
return getInstants().stream();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we follow the existing convention to keep default impl. in HoodieDefaultTimeline? getInstants() is not implemented at this level hence getInstantsAsStream() may not be optimized to directly call stream(); for e.g. there could be a timeline subclass which is backed by stream of instants so you can return the stream directly instead of getInstants().stream(). let's leave the impl. to subclasses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. have fixed.

Comment on lines +349 to +351
default List<HoodieInstant> getAndCopyInstants() {
return new ArrayList<>(getInstants());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

also we should use new ArrayList<> for getInstants() and never expose instants to users. so we should just have getInstants, and make javadoc clear about the behavior and usage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Comment on lines +339 to +344
Stream<HoodieInstant> getInstantsAsStream();

/**
* @return Get tht list of instants
*/
List<HoodieInstant> getInstants();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be ok to change. I have checked the usages of HoodieTimeline in other query engines Trino/Presto. We are good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @codope

@codope
Copy link
Member

codope commented Nov 28, 2022

@YannByron Just curious as to how much perf improvements does this change bring?

@codope codope added the area:performance Performance optimizations label Nov 28, 2022
@YannByron
Copy link
Contributor Author

@YannByron Just curious as to how much perf improvements does this change bring?

It is not about perf improvements. It make api suitable for users' perception, and convenient to extend, like RFC-36(https://cwiki.apache.org/confluence/display/HUDI/RFC-36%3A+HUDI+Metastore+Server)

@YannByron YannByron merged commit 78a0047 into apache:master Nov 28, 2022
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:performance Performance optimizations

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants