Skip to content

Warn if cache size of lookup is beyond max size#11863

Merged
zachjsh merged 12 commits intoapache:masterfrom
zachjsh:warn-if-lookup-beyond-max-size
Nov 4, 2021
Merged

Warn if cache size of lookup is beyond max size#11863
zachjsh merged 12 commits intoapache:masterfrom
zachjsh:warn-if-lookup-beyond-max-size

Conversation

@zachjsh
Copy link
Contributor

@zachjsh zachjsh commented Nov 1, 2021

Description

Enhanced the ExtractionNamespace interface in lookups-cached-global core extension with the ability to set a maxHeapPercentage for the cache of the respective namespace. The reason for adding this functionality, is make it easier to detect when a lookup table grows to a size that the underlying service cannot handle, because it does not have enough memory. The default value of maxHeap for the interface is -1, which indicates that no maxHeapPercentage has been set. For the JdbcExtractionNamespace and UriExtractionNamespace implementations, the default value is null, which will cause the respective service that the lookup is loaded in, to warn when its cache is beyond mxHeapPercentage of the service's configured max heap size. If a positive non-null value is set for the namespace's maxHeapPercentage config, this value will be honored for all services that the respective lookup is loaded onto, and consequently log warning messages when the cache of the respective lookup grows beyond this respective percentage of the services configured max heap size. Warnings are logged every time that either Uri based or Jdbc based lookups are regenerated, if the maxHeapPercentage constraint is violated. No other implementations will log warnings at this time. No error is thrown when the size exceeds the maxHeapPercentage at this time, as doing so could break functionality for existing users. Previously the JdbcCacheGenerator generated its cache by materializing all rows of the underling table in memory at once; this made it difficult to log warning messages in the case that the results from the jdbc query were very large and caused the service to run out of memory. To help with this, this pr makes it so that the jdbc query results are instead streamed through an iterator.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@zachjsh zachjsh requested review from jon-wei and loquisgon November 1, 2021 06:40
@lgtm-com
Copy link

lgtm-com bot commented Nov 1, 2021

This pull request introduces 2 alerts when merging d38e711 into 90640bb - view on LGTM.com

new alerts:

  • 2 for Implicit narrowing conversion in compound assignment

},
"pollPeriod":"PT5M"
"pollPeriod":"PT5M",
"maxSize": 5242880
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd make more sense to express this as a percentage of available heap instead of a fixed byte limit, since the amount of available memory can differ across the various nodes where lookups might be loaded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, fixed.

if ((key instanceof String) && (value instanceof String)) {
return ((String) (key)).length() + ((String) (value)).length();
} else {
LOG.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases where the value is not a string? If so, suggest adding support for those cases, this log seems likely to result in excessive logging since it's per entry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for suggestion, should be fixed now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you adjust so that it doesn't potentially log on every value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in populate function // this top level check so that we dont keep logging inability to determine

@zachjsh zachjsh requested a review from jon-wei November 2, 2021 07:57
@zachjsh zachjsh merged commit 1d6df48 into apache:master Nov 4, 2021
@zachjsh zachjsh deleted the warn-if-lookup-beyond-max-size branch November 4, 2021 01:32
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants