perf: cache row if it is a transformed row #12113
Merged
suneet-s merged 2 commits into apache:master on Feb 15, 2022
FrankChen021
reviewed
Jan 4, 2022
    } else {
      return row.getTimestamp();
    }
    return DateTimes.utc(getTimestampFromEpoch());
Member
Why not cache the DateTime object instead of the long? That would avoid creating many temporary objects.
Contributor
Author
Sure, fixed in f444ad2.
It's not material to performance, but it also shouldn't hurt.
FrankChen021
approved these changes
Jan 5, 2022
b1f5408 to f444ad2
Contributor
Thanks for the optimization, @jasonk000! The test failure appears unrelated.
Reduce the number of times a timestamp is parsed and read if the row is a transformed row.
During ingestion, the parsing and appending code paths repeatedly read the timestamp from the row. For a TransformedInputRow, each of these reads can be an expensive step.
Caching the transformed timestamp significantly reduces CPU time: in this example, the share of CPU spent reading timestamps drops from 14% to 4%, a reduction of 10 percentage points. This change converts the processing from lazily loading the timestamp through the Transform on every access to evaluating the transform eagerly once and keeping the result.
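The difference between the lazy and eager approaches can be illustrated with a minimal sketch. This is not the code from this PR: the class names `UncachedTimestampRow` and `CachedTimestampRow` are hypothetical, a `LongSupplier` stands in for the expensive transform, and `java.time.Instant` stands in for the Joda `DateTime` used in Druid so that the example runs on the JDK alone.

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical row that re-runs the (possibly expensive) transform on every read.
final class UncachedTimestampRow {
    private final LongSupplier timestampTransform;

    UncachedTimestampRow(LongSupplier timestampTransform) {
        this.timestampTransform = timestampTransform;
    }

    long getTimestampFromEpoch() {
        return timestampTransform.getAsLong();
    }

    Instant getTimestamp() {
        // Allocates a new object on each call.
        return Instant.ofEpochMilli(getTimestampFromEpoch());
    }
}

// Hypothetical row that evaluates the transform once, eagerly, and keeps the object.
final class CachedTimestampRow {
    private final Instant timestamp;

    CachedTimestampRow(LongSupplier timestampTransform) {
        this.timestamp = Instant.ofEpochMilli(timestampTransform.getAsLong());
    }

    long getTimestampFromEpoch() {
        return timestamp.toEpochMilli();
    }

    Instant getTimestamp() {
        // Returns the cached instance; no transform call, no allocation.
        return timestamp;
    }
}

final class TimestampCacheDemo {
    public static void main(String[] args) {
        AtomicInteger evaluations = new AtomicInteger();
        // Stand-in for an expensive transform; counts how often it runs.
        LongSupplier transform = () -> {
            evaluations.incrementAndGet();
            return 1_641_254_400_000L;
        };

        UncachedTimestampRow uncached = new UncachedTimestampRow(transform);
        for (int i = 0; i < 3; i++) {
            uncached.getTimestamp();
        }
        System.out.println("uncached evaluations: " + evaluations.get()); // 3

        evaluations.set(0);
        CachedTimestampRow cached = new CachedTimestampRow(transform);
        for (int i = 0; i < 3; i++) {
            cached.getTimestamp();
        }
        System.out.println("cached evaluations: " + evaluations.get()); // 1
    }
}
```

The trade-off in this sketch is that the eager version always pays for one evaluation, even if the timestamp is never read. That is a reasonable cost when, as described above, ingestion reads the timestamp repeatedly.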
Before:

After:

This PR has: