Skip to content

[SPARK-36390][SQL] Replace SessionState.close with SessionState.detachSession#33621

Closed
cxzl25 wants to merge 7 commits intoapache:masterfrom
cxzl25:SPARK-36390
Closed

[SPARK-36390][SQL] Replace SessionState.close with SessionState.detachSession#33621
cxzl25 wants to merge 7 commits intoapache:masterfrom
cxzl25:SPARK-36390

Conversation

@cxzl25
Copy link
Contributor

@cxzl25 cxzl25 commented Aug 3, 2021

What changes were proposed in this pull request?

SPARK-35286 replace SessionState.start with SessionState.setCurrentSessionState, but SessionState.close will create a HiveMetaStoreClient , connect to the Hive Meta Store Server, and then load all functions.

SPARK-35556 (Remove close HiveClient's SessionState) When the Hive version used is greater than or equal to 2.1, SessionState.close is not called and the resource dir of HiveClient is not cleaned up.

Why are the changes needed?

Clean up the hive resources dir temporary directory.
Avoid wasting resources and accelerate the exit speed.

Does this PR introduce any user-facing change?

No

How was this patch tested?

add UT

@github-actions github-actions bot added the SQL label Aug 3, 2021
@cxzl25
Copy link
Contributor Author

cxzl25 commented Aug 3, 2021

SessionState.close -> Hive.create -> registerAllFunctions
image

@cxzl25
Copy link
Contributor Author

cxzl25 commented Aug 3, 2021

cc @wangyum @yaooqinn

@wangyum
Copy link
Member

wangyum commented Aug 3, 2021

ok to test.

@wangyum
Copy link
Member

wangyum commented Aug 3, 2021

cc @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Aug 3, 2021

Test build #141990 has finished for PR 33621 at commit 9091c24.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46501/

@SparkQA
Copy link

SparkQA commented Aug 3, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46501/

@SparkQA
Copy link

SparkQA commented Aug 3, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46512/

@SparkQA
Copy link

SparkQA commented Aug 3, 2021

Test build #142000 has finished for PR 33621 at commit 13a71eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member

wangyum commented Aug 4, 2021

Could we change this line to Hive threadLocalHive = Hive.getWithoutRegisterFns(sessionConf);?

It seems that this change makes hive.downloaded.resources.dir can not be removed.

@HyukjinKwon
Copy link
Member

Thanks for pinging me @wangyum. The fix seems making sense

@cxzl25
Copy link
Contributor Author

cxzl25 commented Aug 4, 2021

Could we change this line to Hive threadLocalHive = Hive.getWithoutRegisterFns(sessionConf);?

HIVE-17368 avoid creating a threadLocalHive in the unCacheDataNucleusClassLoaders method.

https://github.com/apache/hive/blob/branch-3.0/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L1772

It seems that this change makes hive.downloaded.resources.dir can not be removed.

AFIAK, because SessionState.start is not used, the resources directory will not be created.
Because SPARK-35286 avoids creating session dir, that is, it will not call createSessionDirs.

  /**
   * Create dirs & session paths for this session:
   * 1. HDFS scratch dir
   * 2. Local scratch dir
   * 3. Local downloaded resource dir
   * 4. HDFS session path
   * 5. hold a lock file in HDFS session dir to indicate the it is in use
   * 6. Local session path
   * 7. HDFS temp table space
   * @param userName
   * @throws IOException
   */
  private void createSessionDirs(String userName) throws IOException {

@yaooqinn
Copy link
Member

yaooqinn commented Aug 4, 2021

can we have a test with ADD JAR command to see whether the resource dir being removed successfully after detachSession

@cxzl25
Copy link
Contributor Author

cxzl25 commented Aug 4, 2021

HIVE-12853 (>=2.1) will create the hive.downloaded.resources.dir directory when constructing sessionstate.
https://github.com/apache/hive/blob/branch-2.1/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L389

SPARK-35556 (Remove close HiveClient's SessionState) did not clean up the resource dir even when the Hive version was greater than or equal to 2.1.

Currently need to clean up the resource dir and then SessionState#detachSession.

@cxzl25
Copy link
Contributor Author

cxzl25 commented Aug 4, 2021

can we have a test with ADD JAR command to see whether the resource dir being removed successfully after detachSession

I can add a UT to check the resource dir cleanup.

SPARK-34955 removed the behavior of ADD JAR calling runSqlHive, now ADD JAR will not put jar in the resource directory.

@cxzl25
Copy link
Contributor Author

cxzl25 commented Aug 5, 2021

The current SparkSQLCLIDriver calls SessionState.close, so its download resource dir has been cleaned up, but HiveClientImpl and SparkSQLCLIDriver have not been cleaned up.
In this PR, I clean up these resource dirs.

@SparkQA
Copy link

SparkQA commented Aug 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46602/

@SparkQA
Copy link

SparkQA commented Aug 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46602/

@SparkQA
Copy link

SparkQA commented Aug 5, 2021

Test build #142091 has finished for PR 33621 at commit 086c388.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

cc @juliuszsompolski FYI

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46658/

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Test build #142146 has finished for PR 33621 at commit d0c228b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46658/

@juliuszsompolski
Copy link
Contributor

cc @dragqueen95 FYI

@SparkQA
Copy link

SparkQA commented Aug 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46739/

@SparkQA
Copy link

SparkQA commented Aug 9, 2021

Test build #142227 has finished for PR 33621 at commit d3086a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BlockSavedOnDecommissionedBlockManagerException(blockId: BlockId)
  • case class EnsureRequirements(optimizeOutRepartition: Boolean = true) extends Rule[SparkPlan]

@SparkQA
Copy link

SparkQA commented Aug 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46739/

@cxzl25
Copy link
Contributor Author

cxzl25 commented Aug 31, 2021

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 10, 2021
# Conflicts:
#	sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
@SparkQA
Copy link

SparkQA commented Dec 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50534/

@SparkQA
Copy link

SparkQA commented Dec 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50534/

@SparkQA
Copy link

SparkQA commented Dec 10, 2021

Test build #146059 has finished for PR 33621 at commit 1435e77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot closed this Dec 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants