Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-26564: Separate query live operation log and historical operation log #3621

Merged
merged 3 commits into from Oct 6, 2022
Merged

Conversation

yigress
Copy link
Contributor

@yigress yigress commented Sep 26, 2022

What changes were proposed in this pull request?

Proposed change is to separate live query's operation log and historical operation log. Upon operation close, OperationLogManager.closeOperation is called to move the operation log from session directory to historical log dir. OperationLogManager is only responsible to clean up historical operation logs.

Why are the changes needed?

This reduces confusion when admin look at operation logs. Also make it easier to add feature to persist history query info and logs, for example onto hdfs/cloud.

Does this PR introduce any user-facing change?

No

How was this patch tested?

tested locally

@yigress
Copy link
Contributor Author

yigress commented Sep 26, 2022

@dengzhhu653 can you help review this? Thank you!

@@ -61,6 +60,7 @@ public void setUp() throws Exception {
HiveConf.setIntVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_WEBUI_MAX_HISTORIC_QUERIES, 1);
HiveConf.setIntVar(hiveConf, HiveConf.ConfVars.HIVE_SERVER2_WEBUI_PORT, 8080);
HiveConf.setBoolVar(hiveConf, HiveConf.ConfVars.HIVE_IN_TEST, true);
HiveConf.setBoolVar(hiveConf, HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this property should set to false when HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED is enabled? we do not want to set this in production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java#L80
if (hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST)) {
isRemoveLogs = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_TESTING_REMOVE_LOGS);

Instead of changing the code there, I thought it may be better set it to false for this test case. In production no need to change this setting.

TimeUnit.MILLISECONDS);
scheduledExecutorService.shutdown();
} else {
log.info("Closing operation log {} without delay", operationLog);
operationLog.close();
OperationLogManager.closeOperation(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we put OperationLogManager.closeOperation(this) into operationLog.close()? so we can only take care of SQLOperation's log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did not set inside operationLog.close is that OperationLog is a wrapper of the log file itself and has no knowledge of the operation.

@dengzhhu653
Copy link
Member

@dengzhhu653 can you help review this? Thank you!

The idea overall makes sense to me, thank you for the contribution! By the way, which kinds of problem do you meet regarding the original design?

@yigress
Copy link
Contributor Author

yigress commented Sep 27, 2022

Thank you @dengzhhu653 so much for your review! The reason for this change is that we want to add a feature to persist the hive QueryInfo and OperationLog, for example, persist the information on HDFS or other storage, for ephemeral clusters. Right now only a limited number of historical queries are available in hs2 UI. So OperationLogManager seems an ideal place to do that, or some other ways outside hive that look at the historical operation log location, but current mixing live queries inside historical dir makes it harder to differentiate.

@sfc-gh-aixu
Copy link

@dengzhhu653 can you help review this? Thank you!

@dengzhhu653
Copy link
Member

Hi @yigress, the failed tests seem related, cloud you please fix them?

Copy link
Member

@dengzhhu653 dengzhhu653 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, pending tests.

@yigress
Copy link
Contributor Author

yigress commented Sep 28, 2022

Thank you @dengzhhu653 for the review! The current test failures are unrelated.
@zabetak as pending reviewer, can you help review? Thank you!

@yigress
Copy link
Contributor Author

yigress commented Sep 29, 2022

@zabetak ping ping, latest tests failures are unrelated. Can you help proceed? Thank you!

@zabetak
Copy link
Contributor

zabetak commented Sep 29, 2022

Hey @yigress , I'm still busy with other reviews. Will get back to this ASAP.

@yigress
Copy link
Contributor Author

yigress commented Oct 4, 2022

@zabetak ping ping reminder

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sunchao sunchao merged commit 6efbcae into apache:master Oct 6, 2022
@sunchao
Copy link
Member

sunchao commented Oct 6, 2022

Merged to master. Thanks @yigress

rkirtir pushed a commit to rkirtir/hive that referenced this pull request Oct 7, 2022
DongWei-4 pushed a commit to DongWei-4/hive that referenced this pull request Oct 28, 2022
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Dec 15, 2022
yeahyung pushed a commit to yeahyung/hive that referenced this pull request Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants