Skip to content

Conversation

@scarlin-cloudera
Copy link
Contributor

…y query

In the current code base, a new CachingRelMetadataProvider is created on every query. This causes a cache miss within JaninoRelMetadataProvider, which stores the provider in the key. As a result, a new MetadataHandler is created multiple times in every query, which causes code generation and is a small hit on performance

This commit caches HiveDefaultRelMetadataProviders globally which stores a RelMetadataProvider that can be reused. Ultimately, this class could most likely be removed as well, but this commit limits the scope of the change in a first pass.

What changes were proposed in this pull request?

Stated in commit message

Why are the changes needed?

This will help performance and possibly have a smaller memory footprint.

Does this PR introduce any user-facing change?

No

How was this patch tested?

This doesn't introduce new functionality, so current regression tests should be good. I ran this through the debugger and made sure the Calcite JaninoMetadataProvider.load3 method was not called after the first query was called with the same conf variables.

…y query

In the current code base, a new CachingRelMetadataProvider is created on every
query.  This causes a cache miss within JaninoRelMetadataProvider, which stores
the provider in the key. As a result, a new MetadataHandler is created multiple
times in every query, which causes code generation and is a small hit on performance

This commit caches HiveDefaultRelMetadataProviders globally which stores a
RelMetadataProvider that can be reused. Ultimately, this class could most
likely be removed as well, but this commit limits the scope of the change in
a first pass.
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
No Duplication information No Duplication information

Comment on lines +158 to +163
bldr.put(HiveConf.ConfVars.HIVE_EXECUTION_ENGINE,
conf.getVar(HiveConf.ConfVars.HIVE_EXECUTION_ENGINE));
bldr.put(HiveConf.ConfVars.HIVE_CBO_EXTENDED_COST_MODEL,
conf.getBoolVar(HiveConf.ConfVars.HIVE_CBO_EXTENDED_COST_MODEL));
bldr.put(HiveConf.ConfVars.MAPREDMAXSPLITSIZE,
conf.getLongVar(HiveConf.ConfVars.MAPREDMAXSPLITSIZE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these settings can be changed without hs2 restart?

Copy link
Member

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a change claims to improve performance ideally it should be accompanied with a (micro) benchmark.

Moreover, adding a new global cache shared among queries is a significant and possibly risky change. Note that we can compile queries in parallel and I am not even sure if the generated metadata handler code or the HiveDefaultRelMetadataProvider is thread safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants