New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-26564: Separate query live operation log and historical operation log #3621
Conversation
@dengzhhu653 can you help review this? Thank you! |
@@ -61,6 +60,7 @@ public void setUp() throws Exception { | |||
HiveConf.setIntVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_WEBUI_MAX_HISTORIC_QUERIES, 1); | |||
HiveConf.setIntVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_WEBUI_PORT, 8080); | |||
HiveConf.setBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST, true); | |||
HiveConf.setBoolVar(hiveConf, HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this property should set to false when HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED is enabled? we do not want to set this in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java#L80
if (hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST)) {
isRemoveLogs = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS);
Instead of changing the code there, I thought it may be better set it to false for this test case. In production no need to change this setting.
TimeUnit.MILLISECONDS); | ||
scheduledExecutorService.shutdown(); | ||
} else { | ||
log.info("Closing operation log {} without delay", operationLog); | ||
operationLog.close(); | ||
OperationLogManager.closeOperation(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we put OperationLogManager.closeOperation(this)
into operationLog.close()
? so we can only take care of SQLOperation's log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I did not set inside operationLog.close is that OperationLog is a wrapper of the log file itself and has no knowledge of the operation.
The idea overall makes sense to me, thank you for the contribution! By the way, which kinds of problem do you meet regarding the original design? |
Thank you @dengzhhu653 so much for your review! The reason for this change is that we want to add a feature to persist the hive QueryInfo and OperationLog, for example, persist the information on HDFS or other storage, for ephemeral clusters. Right now only a limited number of historical queries are available in hs2 UI. So OperationLogManager seems an ideal place to do that, or some other ways outside hive that look at the historical operation log location, but current mixing live queries inside historical dir makes it harder to differentiate. |
|
Hi @yigress, the failed tests seem related, cloud you please fix them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, pending tests.
Thank you @dengzhhu653 for the review! The current test failures are unrelated. |
@zabetak ping ping, latest tests failures are unrelated. Can you help proceed? Thank you! |
Hey @yigress , I'm still busy with other reviews. Will get back to this ASAP. |
@zabetak ping ping reminder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merged to master. Thanks @yigress |
What changes were proposed in this pull request?
Proposed change is to separate live query's operation log and historical operation log. Upon operation close, OperationLogManager.closeOperation is called to move the operation log from session directory to historical log dir. OperationLogManager is only responsible to clean up historical operation logs.
Why are the changes needed?
This reduces confusion when admin look at operation logs. Also make it easier to add feature to persist history query info and logs, for example onto hdfs/cloud.
Does this PR introduce any user-facing change?
No
How was this patch tested?
tested locally