Skip to content

Comments

perf: cache row if it is a transformed row#12113

Merged
suneet-s merged 2 commits intoapache:masterfrom
jasonk000:cache-timestamp
Feb 15, 2022
Merged

perf: cache row if it is a transformed row#12113
suneet-s merged 2 commits intoapache:masterfrom
jasonk000:cache-timestamp

Conversation

@jasonk000
Copy link
Contributor

Reduce the number of times a timestamp is parsed and read if the row is a transformed row.

During ingestion the parsing and appending functionality requires repeated access to the timestamp information from the row. In the case of a TransformedInputRow, this could be an expensive step:

{
  "type": "expression",
  "name": "__time",
  "expression": "(point_seconds * 1000.0) + (point_nanos / 1000000.0)"
},

By caching this row, a significant reduction in CPU time can be achieved, in this example from 14% to 4% of CPU involved in reading Timestamps, a 10% reduction. This change converts the processing from lazy load of timestamp during Transform to an eager transform, and then keeps the result.

Before:
image

After:
image

This PR has:

  • been self-reviewed.
    • N/A using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • N/A added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • N/A added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • N/A added integration tests.
  • been tested in a test Druid cluster.

} else {
return row.getTimestamp();
}
return DateTimes.utc(getTimestampFromEpoch());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not cache the DateTime object instead of the long? By doing that we can save many temp objects creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed in f444ad2.

It's not material to performance, but it also shouldn't hurt.

@suneet-s
Copy link
Contributor

Thanks for the optimization @jasonk000 ! The test failure appears unrelated.

@suneet-s suneet-s merged commit 26bc4b7 into apache:master Feb 15, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants