
[CALCITE-2828] Fix VolcanoPlanner.validate() to handle cost propagation properly #1054

Open
StevenMPhillips wants to merge 1 commit into apache:main from StevenMPhillips:CALCITE-2828

@StevenMPhillips

Fix VolcanoPlanner.validate() to handle cost propagation properly

When getCost(rel) is called, the node's nonCumulativeCost() is computed. When CachingRelMetadataProvider is used, metadata (row count, cost, etc.) is cached for future use. To make sure that stale metadata is not used, each RelOptPlanner provides getRelMetadataTimestamp(rel), which is used to invalidate the cache: if a cached entry's timestamp differs from getRelMetadataTimestamp(rel), the entry is not used.
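To make the invalidation rule concrete, here is a minimal Java sketch of a cache that stores each value together with the timestamp in force when it was computed, and ignores the entry once the caller supplies a different timestamp. It assumes the caller passes in the planner's current timestamp for the rel; `TimestampedMetadataCache`, `get` and `computations` are hypothetical names for illustration, not Calcite's `CachingRelMetadataProvider` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical, simplified model of a timestamp-validated metadata cache.
 * NOT Calcite's CachingRelMetadataProvider; names are illustrative only.
 */
final class TimestampedMetadataCache<K, V> {
  /** A cached value plus the timestamp that was current when it was computed. */
  private static final class Entry<T> {
    final T value;
    final long timestamp;

    Entry(T value, long timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private int computations = 0;

  /**
   * Returns the cached value for {@code key} only if it was stored with
   * {@code currentTimestamp}; otherwise recomputes it and stores it under
   * {@code currentTimestamp}.
   */
  V get(K key, long currentTimestamp, Supplier<V> compute) {
    Entry<V> e = entries.get(key);
    if (e != null && e.timestamp == currentTimestamp) {
      return e.value; // hit: the stored timestamp still matches
    }
    computations++; // miss: no entry yet, or its timestamp is out of date
    V value = compute.get();
    entries.put(key, new Entry<>(value, currentTimestamp));
    return value;
  }

  /** Number of times the supplier was invoked, i.e. the number of misses. */
  int computations() {
    return computations;
  }
}

class TimestampedMetadataCacheDemo {
  public static void main(String[] args) {
    TimestampedMetadataCache<String, Double> rowCounts = new TimestampedMetadataCache<>();
    // "rel#5" is a hypothetical rel id; the planner-supplied timestamp is 1.
    System.out.println(rowCounts.get("rel#5", 1, () -> 100.0)); // miss: prints 100.0
    System.out.println(rowCounts.get("rel#5", 1, () -> 999.0)); // hit: prints 100.0
    // The planner now reports timestamp 2, so the old entry is ignored.
    System.out.println(rowCounts.get("rel#5", 2, () -> 50.0));  // miss: prints 50.0
    System.out.println(rowCounts.computations());               // prints 2
  }
}
```

The correctness of this scheme rests entirely on the timestamp: two lookups that see the same number are assumed to see the same planner state.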

The problem here is that VolcanoPlanner uses the timestamp of the rel's current RelSubset as getRelMetadataTimestamp(rel). Since a rel can belong to multiple RelSubsets, this results in inconsistent cache hits and misses. For example, suppose a rel belongs to RelSubset#1 and RelSubset#2, with relMetadataTimestamp values of 1 and 2, respectively. If the rel updates its cost in the context of RelSubset#1 first, the cache entry is stored with timestamp 1, so when the same rel looks up its metadata in RelSubset#2's context, the lookup misses. This makes inefficient use of the cache. The main problem, however, is incorrect cache hits: for example, an earlier metadata query in the context of RelSubset#2 populated the cache with timestamp 2, but later, in the context of RelSubset#1, the planner believes the cached entry is valid and uses the stale metadata.
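Both failure modes can be reproduced in a toy Java model, under the assumption that each subset keeps an independent timestamp counter and that the rel's metadata timestamp is read from whichever subset is current. `ToySubset`, `SingleRelCache` and the cost strings are hypothetical stand-ins rather than Calcite classes, and the sketch only illustrates the problem; it does not reproduce the change in this PR.

```java
/**
 * Hypothetical toy model, not Calcite code: each subset has its own counter,
 * and a rel's "metadata timestamp" is whatever its current subset reports.
 */
final class ToySubset {
  final String name;
  long timestamp; // per-subset counter, bumped when the subset changes

  ToySubset(String name, long timestamp) {
    this.name = name;
    this.timestamp = timestamp;
  }
}

/** Caches one metadata value for a single rel, validated only by timestamp. */
final class SingleRelCache {
  private String cachedValue;     // null means nothing cached yet
  private long cachedTimestamp;
  private int misses = 0;

  /**
   * Looks up the rel's metadata in the context of {@code current};
   * {@code freshValue} is what a recomputation would return right now.
   */
  String lookup(ToySubset current, String freshValue) {
    long ts = current.timestamp; // timestamp taken from the rel's current subset
    if (cachedValue != null && cachedTimestamp == ts) {
      return cachedValue; // considered valid only because the numbers match
    }
    misses++;
    cachedValue = freshValue;
    cachedTimestamp = ts;
    return freshValue;
  }

  int misses() {
    return misses;
  }
}

class SubsetTimestampDemo {
  public static void main(String[] args) {
    // Problem 1: the same rel queried from two subsets keeps evicting its own entry.
    ToySubset s1 = new ToySubset("RelSubset#1", 1);
    ToySubset s2 = new ToySubset("RelSubset#2", 2);
    SingleRelCache thrash = new SingleRelCache();
    for (int i = 0; i < 2; i++) {
      thrash.lookup(s1, "cost A");
      thrash.lookup(s2, "cost A");
    }
    System.out.println(thrash.misses()); // prints 4: every lookup missed

    // Problem 2: an entry written under RelSubset#2 is wrongly reused for RelSubset#1.
    ToySubset t1 = new ToySubset("RelSubset#1", 1);
    ToySubset t2 = new ToySubset("RelSubset#2", 2);
    SingleRelCache stale = new SingleRelCache();
    stale.lookup(t2, "cost A");                      // miss: stored with timestamp 2
    t1.timestamp++;                                  // RelSubset#1 changes; its counter is now 2
    System.out.println(stale.lookup(t1, "cost B")); // prints "cost A": a false hit
  }
}
```

The second case shows why per-subset counters are unsafe for validating a shared cache: equal numbers coming from different subsets say nothing about whether the underlying planner state is the same.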

Change-Id: Iefb630f5813ba497b7fbc0144c8fd6050e59b1a3

@hsyuan (Member) left a comment


Hi @StevenMPhillips, can you add a test case for this change?

@danny0405 force-pushed the master branch 2 times, most recently from 80f411d to ca27fe9 (November 30, 2019 07:52)
@vlsi force-pushed the master branch 2 times, most recently from 49cb002 to 8768a23 (December 29, 2019 12:07)
@julianhyde force-pushed the main branch 2 times, most recently from 8a5cf83 to cf7f71b (June 8, 2023 21:21)