Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-27023: Add setting to prevent tez session from being opened duri… #4015

Merged
merged 1 commit into from
Jan 13, 2024

Conversation

maswin
Copy link
Contributor

@maswin maswin commented Feb 3, 2023

What changes were proposed in this pull request?

A setting to disable tez session being preemptively opened during startup of hive cli.

Why are the changes needed?

DML only operations won't need a tez session to be opened. Oozie action that launches hive cli for DML only operations need this setting. Sometimes before the tez session thread could complete, the oozie hive action completes its DML operations and clears the scratch folders causing the tez session to fail with following exception:

Application application_1667416881396_24229473 failed 3 times due to AM Container for appattempt_1667416881396_24229473_000003 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2023-02-02 19:02:12.139]File does not exist: hdfs://<name_node>/tmp/<db>/_tez_session_dir/4050c4b0-b7af-4eda-832b-399c954eb576/.tez/application_1667416881396_24229473/tez.session.local-resources.pbjava.io.FileNotFoundException: File does not exist: hdfs://<name_node>/tmp/<db>/_tez_session_dir/4050c4b0-b7af-4eda-832b-399c954eb576/.tez/application_1667416881396_24229473/tez.session.local-resources.pbat org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1529)at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1522)at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1537)at

Does this PR introduce any user-facing change?

New config value, otherwise no.

How was this patch tested?

Manually tested

@sonarcloud
Copy link

sonarcloud bot commented Feb 3, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@aturoczy
Copy link

aturoczy commented Apr 4, 2023

@abstractdog @ramesh0201 So in this case with this config you can avoid to have a tez session? Is this safe? I can imagine a use case for it. What do you think?

@abstractdog
Copy link
Contributor

I don't get when this exactly happens: in the supported infrastructure (beeline + hiveserver2) if we don't configure an initial session pool in hiveserver2 (hive.server2.tez.initialize.default.sessions=false), there shouldn't be any tez session
how does this happen? hive cli is not supported any more

@maswin
Copy link
Contributor Author

maswin commented Jun 13, 2023

I don't get when this exactly happens: in the supported infrastructure (beeline + hiveserver2) if we don't configure an initial session pool in hiveserver2 (hive.server2.tez.initialize.default.sessions=false), there shouldn't be any tez session how does this happen? hive cli is not supported any more

@abstractdog - The issue can happen when Hive action is used in Oozie - https://github.com/apache/oozie/blob/d4e36739d798589567ac331b348e8786c876aefa/sharelib/hive/src/main/java/org/apache/oozie/action/hadoop/HiveMain.java#L312
It directly calls the CliDriver class which spawns a tez session.

@abstractdog
Copy link
Contributor

thanks for clarifying @maswin
even if HiveCLI is not supported, oozie is still using that, so it makes sense to fix this, but introducing a new property doesn't seem to address the root cause, which is:

Sometimes before the tez session thread could complete, the oozie hive action completes its DML operations and clears the scratch folders causing the tez session to fail with following exception

this makes me think that in case there are operations using the tez session and then oozie does some DML operations, this can still happen, right? so the root cause looks independent of the fact that you reproduced it with DML-only scenario, so this fix is more likely hiding the problem, which is cleaning up scratch directories before tez sessions are closed, which is unexpected and needs to be fixed

@maswin
Copy link
Contributor Author

maswin commented Jun 13, 2023

this makes me think that in case there are operations using the tez session and then oozie does some DML operations, this can still happen, right? so the root cause looks independent of the fact that you reproduced it with DML-only scenario, so this fix is more likely hiding the problem, which is cleaning up scratch directories before tez sessions are closed, which is unexpected and needs to be fixed

Ahh, true.

This happens because the tez session is opened in a separate thread in async mode via beginStart() -

if (HiveConf.getBoolVar(conf, ConfVars.HIVE_CLI_TEZ_SESSION_ASYNC)) {

And before closing the session we don't wait for the tez session to close.
I have add a call for endStart() in the finally block now. That should probably fix the root cause.

But it might also be beneficial to provide the setting to disable opening tez session everytime due to resource wastage.

@abstractdog
Copy link
Contributor

@maswin : basically it makes sense to prevent tez sessions to be opened at all

  1. we already have HIVE_CLI_TEZ_SESSION_ASYNC, let's have HIVE_CLI_TEZ_SESSION_LAZY_START for better naming
  2. clarify in HiveConf what that means, I mean having both "async" and "lazy" could be very confusing at first sight: "async" is about starting SessionState and not waiting for tez session to be started, and "lazy" is about not starting tez session at all while starting SessionState

@abstractdog
Copy link
Contributor

@maswin: I just reopened this, I think it looks good, let me merge if the tests pass

@abstractdog abstractdog self-requested a review September 1, 2023 08:50
@sonarcloud
Copy link

sonarcloud bot commented Sep 1, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

warning The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

Copy link

github-actions bot commented Nov 1, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

@abstractdog
Copy link
Contributor

reopened again as I forgot about this :)

@abstractdog
Copy link
Contributor

@maswin can you please rebase this PR on top of master?

@maswin
Copy link
Contributor Author

maswin commented Jan 12, 2024

@maswin can you please rebase this PR on top of master?

@abstractdog : Done

Copy link

sonarcloud bot commented Jan 12, 2024

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@github-actions github-actions bot removed the stale label Jan 13, 2024
@abstractdog abstractdog merged commit 5d0f250 into apache:master Jan 13, 2024
6 checks passed
mdayakar pushed a commit to mdayakar/hive that referenced this pull request Jan 15, 2024
…ng startup (apache#4015) (Alagappan Maruthappan reviewed by Laszlo Bodor)
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Feb 9, 2024
…ng startup (apache#4015) (Alagappan Maruthappan reviewed by Laszlo Bodor)
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Mar 7, 2024
…ng startup (apache#4015) (Alagappan Maruthappan reviewed by Laszlo Bodor)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants