New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-25900: Materialized view registry does not clean non existing views at refresh #2984
HIVE-25900: Materialized view registry does not clean non existing views at refresh #2984
Conversation
e4c58a2
to
4c5ebe7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @kasakrisz . I went over the PR and left some comments.
While reviewing this part of code I get the impression that the registry/cache is not thread safe. It also seems that multiple threads do perform operations on this registry/cache concurrently which makes me think that we should invest in making it thread safe.
The changes in this PR stem from the fact that we may end up with an OOM cause we do not invalidate the entries in the cache. Cache invalidation though could come almost for free if we used some existing cache implementation (such as those provided by Guava). The additional benefit is that most existing cache implementations are thread-safe by design.
/** | ||
* Registry for materialized views. The goal of this cache is to avoid parsing and creating | ||
* logical plans for the materialized views at query runtime. When a query arrives, we will | ||
* just need to consult this cache and extract the logical plans for the views (which had | ||
* already been parsed) from it. This cache lives in HS2. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit a confusing that we have the same javadoc twice (here and in the top level class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed one of them
* just need to consult this cache and extract the logical plans for the views (which had | ||
* already been parsed) from it. This cache lives in HS2. | ||
*/ | ||
public interface MaterializedViewsRegistry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to have both a class (HiveMaterializedViewsRegistry
) and an inner interface (MaterializedViewsRegistry
)? This pattern is not very easy to follow and it is not clear what are the responsibilities of each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to extract the Dummy and Default implementations to separate classes. The MaterializedViewsRegistry
interface is the common interface of the two implementations. Moved it to a separate file.
|
||
private final MaterializedViewsCache materializedViewsCache = new MaterializedViewsCache(); | ||
/** | ||
* Adds a newly created materialized view to the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Use "cache" or "registry" but not both. Probably we should follow the name of the class/interface.
It seems that some methods mention registry while others mention cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed cache to registry, since this is more like a registry.
/** | ||
* Returns the materialized views in the cache for the given database. | ||
* | ||
* @return the collection of materialized views, or the empty collection if none | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The javadoc does not correspond to what this method does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
default List<HiveRelOptMaterialization> getRewritingMaterializedViews(String querySql) { | ||
return Collections.emptyList(); | ||
} | ||
|
||
default boolean isEmpty() { | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
public interface MaterializedViewObjects { | ||
List<Table> getAllMaterializedViewObjectsForRewriting() throws HiveException; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new interface gives the impression that we are adding complexity instead of simplifying things. Keeping a List
of tables looks simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent was to make the Loader
testable. The MaterializedViewObjects
represents the HMS client. The interface enables us to mock it in tests. Please see TestHiveMaterializedViewsRegistryLoader
.
By default the getAllMaterializedViewObjectsForRewriting
method is called periodically by the registry.
Could you please show an example with List
of tables?
protected final HiveConf hiveConf; | ||
protected final MaterializedViewsRegistry materializedViewsRegistry; | ||
protected final MaterializedViewObjects materializedViewObjects; | ||
/* Whether the cache has been initialized or not. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bogus comment?
/* Whether the cache has been initialized or not. */ | |
/* Whether the cache has been initialized or not. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
|
||
private Loader(Hive db) { | ||
this.db = db; | ||
public static class Loader implements Runnable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the class need to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to package private.
|
||
for (HiveRelOptMaterialization materialization : materializedViewsRegistry.getRewritingMaterializedViews()) { | ||
Table mvTableInCache = HiveMaterializedViewUtils.extractTable(materialization); | ||
Table mvTableInHMS = materializedViewObjects.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we keep materializedViewObjects
up-to-date with the HMS? It seems that we obtain the views once when the Loader
is created but then I don't see it changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update is scheduled at a fixed rate:
pool.scheduleAtFixedRate(new Loader(db.getConf(), SINGLETON, objects), 0, period, TimeUnit.SECONDS);
private PerfLogger getPerfLogger() { | ||
SessionState ss = new SessionState(hiveConf); | ||
ss.setIsHiveServerQuery(true); // All is served from HS2, we do not need e.g. Tez sessions | ||
SessionState.start(ss); | ||
return SessionState.getPerfLogger(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the perf logger? Using a session just to obtain a logger seems a bit of an overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added by https://issues.apache.org/jira/browse/HIVE-21344 and I think the intention was to monitor the refresh rate.
4c5ebe7
to
bc1be69
Compare
f1a0a5e
to
733f011
Compare
…sRegistry to a separate file
…e unit tests does not call the init method.
733f011
to
9fd7738
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
What changes were proposed in this pull request?
HiveMaterializedViewsRegistry
to a dummyDummyMaterializedViewsRegistry
and the defaultInMemoryMaterializedViewsRegistry
Why are the changes needed?
Dropped MVs stayed forver in the cache if the Drop operation was done by a different HS2 instance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?