[SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2 #2887

liancheng · 2014-10-22T06:51:59Z

HiveThriftServer2 creates a global singleton SessionState instance and overrides HiveContext to inject the SessionState object. This messes up SessionState initialization and causes problems.

This PR replaces the global SessionState with HiveContext.sessionState to avoid the initialization conflict. Also HiveContext reuses existing started SessionState if any (this is required by SparkSQLCLIDriver, which uses specialized CliSessionState).

SparkQA · 2014-10-22T06:54:43Z

QA tests have started for PR 2887 at commit 04b374d.

This patch merges cleanly.

SparkQA · 2014-10-22T07:35:34Z

QA tests have finished for PR 2887 at commit 04b374d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-22T07:35:37Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22024/
Test FAILed.

SparkQA · 2014-10-22T08:24:50Z

QA tests have started for PR 2887 at commit 0c5bd26.

This patch merges cleanly.

SparkQA · 2014-10-22T09:24:22Z

QA tests have finished for PR 2887 at commit 0c5bd26.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-22T09:24:25Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22032/
Test PASSed.

liancheng · 2014-10-23T06:42:53Z

...iftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala

-        s"LOAD DATA LOCAL INPATH '$dataFilePath' OVERWRITE INTO TABLE test",
-        "CACHE TABLE test")
+      val queries =
+        s"""SET spark.sql.shuffle.partitions=3;


This SET command is used as a regression test of SPARK-4037.

zhzhan · 2014-10-24T20:37:36Z

@liancheng I was just aware of this issue. In the commit this morning #2241, I have following change in HiveContext.scala, because I didn't realize that the state has changed to lazy eval. Could you check whether it effects your PR, and revert it back if necessary?

// Session state must be initilized before the CommandProcessor is created .
SessionState.start(sessionState)

liancheng · 2014-10-26T10:06:34Z

Sure, will rebase this PR to fix that.

zhzhan · 2014-10-27T22:51:11Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala

+        if (state.err == null) state.err = new PrintStream(outputBuffer, true, "UTF-8")
+        (state.getConf, state)
+      }
+      .get


Should SessionState.start(state) be invoked here? Otherwise, it relies on other code to initialize the state, and we lose the track whether the state is initialized, and other code may call SessionState.start() multiple times, resulting in unexpected behavior. Correct me if I am wrong.

I didn't call SessionState.start(...) here because before Hive 0.13.1 support is merged in, it's always called when necessary in HiveContext.runHive. Just noticed that this call was removed in the most recent master branch. I consider this change dangerous because the current SessionState may be switched at some other place and cause subtle errors.

However, add another SessionState.start(...) call here is harmless and does help to track when and whether the session state is properly initialized.

In this case, we may call SessionState.start with the session state multiple times. In my point of view, we should check whether SessionState.get is null or not, and then decide whether we need to call SessionState.start.

We should consider initializing the session state in the constructor, maybe by forcing the realization of the lazy val. Matei just bumped into a problem where running "SHOW TABLES" as the first command null pointers. Lets make sure that this PR fixes those sorts of problems too.

Yea, all commands/DDLs that are directly delegated to Hive suffer this issue, and this PR fixes it.

zhzhan · 2014-10-27T22:53:20Z

@liancheng I send a PR #2967 against similar issue. Please take a look. My major concern is that the SessionState.start(state) may be invoked multiple times. If you think PR #2967 is duplicated with this one, please let me know and I will close it.

liancheng · 2014-10-28T08:31:13Z

Hey @zhzhan, left some comments both here and in your PR. Considering all the reasons I mentioned in these comments, would you mind to close #2967? Hive SessionState is really an annoying and error prone abstraction, thanks all the same for working on this somewhat tedious but important issue!

SparkQA · 2014-10-28T10:25:04Z

Test build #22357 has started for PR 2887 at commit 49b1c5b.

This patch merges cleanly.

liancheng · 2014-10-28T10:48:38Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala

      proc match {
        case driver: Driver =>
+          driver.init()


@zhzhan This line was removed in #2241 [1]. Is there any reason behind?

[1] https://github.com/apache/spark/pull/2241/files#diff-ff50aea397a607b79df9bec6f2a841dbL295

I check the function getCommandProcessor and found it has already initialize the driver, and we don't need to init it again.

I see, thanks.

SparkQA · 2014-10-28T10:58:38Z

Test build #22357 has finished for PR 2887 at commit 49b1c5b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-28T10:58:41Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22357/
Test FAILed.

liancheng · 2014-10-28T13:45:40Z

retest this please

AmplabJenkins · 2014-10-28T14:02:23Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22360/
Test FAILed.

liancheng · 2014-10-28T14:09:22Z

retest this please

SparkQA · 2014-10-28T14:12:27Z

Test build #22361 has started for PR 2887 at commit 49b1c5b.

This patch merges cleanly.

SparkQA · 2014-10-28T14:47:00Z

Test build #22361 has finished for PR 2887 at commit 49b1c5b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-28T14:47:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22361/
Test FAILed.

liancheng · 2014-10-28T15:53:45Z

According to the line number information in the Hive failure output, the test was executed against Hive 0.13.1, while the assembly was built with -Phive-0.12.0.

zhzhan · 2014-10-28T16:27:08Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala

+      // Spark SQL Hive support uses a single `SessionState` for all Hive operations and breaks
+      // session isolation under multi-user scenarios (i.e. HiveThriftServer2).
+      // TODO Fix session isolation
+      SessionState.start(sessionState)


See above comments. Here we may start the same sessionState in the same thread multiple times.
How do you think following check?

if (SessionState.get == null) SessionState.start(sessionState)

Hm, sessionState do gets started within the same thread multiple times. However, at least for Hive 0.12.0, I think SessionState.start(sessionState) should be an idempotent operation, calling it multiple times shouldn't hurt. Maybe this doesn't hold anymore for Hive 0.13.1.

Another issue that really puzzles me is this line from the Jenkins test failure log:

at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:342)

The line number suggests that apparently Hive 0.13.1 was used (see here), but this Jenkins build was triggered with

-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-0.12.0

Anyway, I'll follow your suggestion to avoid starting sessionState multiple times and try again. Thanks!

Also, to avoid multiple start, I guess this is better?

if (SessionState.get() != sessionState) { SessionState.start(sessionState) }

I think now the default run-test is hive-0.13. So that is right.
Didn't go through the run-test detail, but there is another line
SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive"

Ah I see where the problem is: the assembly jar is built with -Phive-0.12.0 [1] while the test is executed with -Phive [2], which implies -Phive-0.13.1.

[1]

spark/dev/run-tests

Line 143 in 7768a80

BUILD_MVN_PROFILE_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-0.12.0"

[2]

spark/dev/run-tests

Line 170 in 7768a80

SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive"

I check with some hive experts here, and was told that the session state should not be started again and again. It may work, but not guaranteed. In hive, there is no such use case.

I see, thanks for this important information!

BTW, just reproduced the above Jenkins build failure locally by:

Building the assembly jar with -Phadoop-2.3 -Phive-0.12.0

Running ./sbt/sbt -Phadoop-2.3 -Phive "hive/test-only *.HiveCompatibilitySuite -- -z groupby1"

Will fix dev/run-tests altogether.

zhzhan · 2014-10-28T18:40:54Z

@liancheng With the check if (SessionState.get() != sessionState), ./sbt/sbt -Phadoop-2.3 -Phive "hive/test-only *.HiveCompatibilitySuite -- -z groupby1" is running OK. Thanks for clean this up.

SparkQA · 2014-10-28T23:20:09Z

Test build #22393 has started for PR 2887 at commit a28fef5.

This patch merges cleanly.

liancheng · 2014-10-28T23:25:09Z

@zhzhan Building assembly jar with -Phive-0.12.0 but run tests without it is introduced in marmbrus@8f6b09a, and was done on purpose. Avoiding multiple SessionState.start() do fixes this failure on my machine. Trying my luck on Jenkins. Thanks for the suggestion :)

SparkQA · 2014-10-28T23:35:13Z

Test build #22395 has started for PR 2887 at commit 8446675.

This patch merges cleanly.

liancheng · 2014-10-28T23:39:51Z

@marmbrus This should be ready to go once Jenkins nods.

SparkQA · 2014-10-29T00:13:56Z

Test build #22393 has finished for PR 2887 at commit a28fef5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-29T00:13:58Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22393/
Test PASSed.

SparkQA · 2014-10-29T00:31:08Z

Test build #22395 has finished for PR 2887 at commit 8446675.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-29T00:31:11Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22395/
Test PASSed.

marmbrus · 2014-10-29T02:36:16Z

Okay, I tested this manually with a SHOW TABLES on a fresh context and it seems to init correctly. Merging this to master. Thanks!

marmbrus · 2014-10-29T02:38:02Z

Actually... sorry going to wait for the hive 13 thrift server PR so that we run tests on the server as a sanity check.

marmbrus · 2014-10-31T20:28:09Z

test this please

SparkQA · 2014-10-31T20:35:01Z

Test build #22638 has started for PR 2887 at commit 8446675.

This patch merges cleanly.

SparkQA · 2014-10-31T21:29:20Z

Test build #22638 has finished for PR 2887 at commit 8446675.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-31T21:29:22Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22638/
Test PASSed.

liancheng · 2014-10-31T23:27:46Z

@marmbrus This should be ready to go.

scwf · 2014-11-01T02:22:13Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala

+      // Spark SQL Hive support uses a single `SessionState` for all Hive operations and breaks
+      // session isolation under multi-user scenarios (i.e. HiveThriftServer2).
+      // TODO Fix session isolation
+      if (SessionState.get() != sessionState) {


In which case this condition will be true?

For example, Spark SQL CLI uses a global CliSessionState instance, which inherits from SessionState. Also, this can be useful to fix the session isolation problem HiveThriftServer2 currently suffers from.

marmbrus · 2014-11-01T22:03:40Z

Thanks! Merged to master.

…riftServer2 `HiveThriftServer2` creates a global singleton `SessionState` instance and overrides `HiveContext` to inject the `SessionState` object. This messes up `SessionState` initialization and causes problems. This PR replaces the global `SessionState` with `HiveContext.sessionState` to avoid the initialization conflict. Also `HiveContext` reuses existing started `SessionState` if any (this is required by `SparkSQLCLIDriver`, which uses specialized `CliSessionState`). Author: Cheng Lian <lian@databricks.com> Closes apache#2887 from liancheng/spark-4037 and squashes the following commits: 8446675 [Cheng Lian] Removes redundant Driver initialization a28fef5 [Cheng Lian] Avoid starting HiveContext.sessionState multiple times 49b1c5b [Cheng Lian] Reuses existing started SessionState if any 3cd6fab [Cheng Lian] Fixes SPARK-4037 Conflicts: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala

This PR backports apache#2843 to branch-1.1. The key difference is that this one doesn't support Hive 0.13.1 and thus always returns `0.12.0` when `spark.sql.hive.version` is queried. 6 other commits on which apache#2843 depends were also backported, they are: - apache#2887 for `SessionState` lifecycle control - apache#2675, apache#2823 & apache#3060 for major test suite refactoring and bug fixes - apache#2164, for Parquet test suites updates - apache#2493, for reading `spark.sql.*` configurations Author: Cheng Lian <lian@databricks.com> Author: Cheng Lian <lian.cs.zju@gmail.com> Author: Michael Armbrust <michael@databricks.com> Closes apache#3113 from liancheng/get-info-for-1.1 and squashes the following commits: d354161 [Cheng Lian] Provides Spark and Hive version in HiveThriftServer2 for branch-1.1 0c2a244 [Michael Armbrust] [SPARK-3646][SQL] Copy SQL configuration from SparkConf when a SQLContext is created. 3202a36 [Michael Armbrust] [SQL] Decrease partitions when testing 7f395b7 [Cheng Lian] [SQL] Fixes race condition in CliSuite 0dd28ec [Cheng Lian] [SQL] Fixes the race condition that may cause test failure 5928b39 [Cheng Lian] [SPARK-3809][SQL] Fixes test suites in hive-thriftserver faeca62 [Cheng Lian] [SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2

liancheng mentioned this pull request Oct 22, 2014

[SPARK-3939][SQL] NPE caused by SessionState.out not set in thriftserver2 #2812

Closed

liancheng reviewed Oct 23, 2014
View reviewed changes

This was referenced Oct 27, 2014

[SPARK-3791][SQL] Provides Spark version and Hive version in HiveThriftServer2 #2843

Closed

[SPARK-3826][SQL]enable hive-thriftserver to support hive-0.13.1 #2685

Closed

zhzhan reviewed Oct 27, 2014
View reviewed changes

liancheng added 2 commits October 28, 2014 16:41

Fixes SPARK-4037

3cd6fab

Reuses existing started SessionState if any

49b1c5b

liancheng force-pushed the spark-4037 branch from 0c5bd26 to 49b1c5b Compare October 28, 2014 10:19

liancheng reviewed Oct 28, 2014
View reviewed changes

zhzhan reviewed Oct 28, 2014
View reviewed changes

Avoid starting HiveContext.sessionState multiple times

a28fef5

Removes redundant Driver initialization

8446675

scwf reviewed Nov 1, 2014
View reviewed changes

asfgit closed this in ad0fde1 Nov 1, 2014

liancheng mentioned this pull request Nov 5, 2014

[SPARK-3971][SQL] Backport #2843 to branch-1.1 #3113

Closed

[SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2 #2887

[SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2 #2887

Conversation

liancheng commented Oct 22, 2014

SparkQA commented Oct 22, 2014

SparkQA commented Oct 22, 2014

AmplabJenkins commented Oct 22, 2014

SparkQA commented Oct 22, 2014

SparkQA commented Oct 22, 2014

AmplabJenkins commented Oct 22, 2014

Choose a reason for hiding this comment

zhzhan commented Oct 24, 2014

liancheng commented Oct 26, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhzhan commented Oct 27, 2014

liancheng commented Oct 28, 2014

SparkQA commented Oct 28, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SparkQA commented Oct 28, 2014

AmplabJenkins commented Oct 28, 2014

liancheng commented Oct 28, 2014

AmplabJenkins commented Oct 28, 2014

liancheng commented Oct 28, 2014

SparkQA commented Oct 28, 2014

SparkQA commented Oct 28, 2014

AmplabJenkins commented Oct 28, 2014

liancheng commented Oct 28, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhzhan commented Oct 28, 2014

SparkQA commented Oct 28, 2014

liancheng commented Oct 28, 2014

SparkQA commented Oct 28, 2014

liancheng commented Oct 28, 2014

SparkQA commented Oct 29, 2014

AmplabJenkins commented Oct 29, 2014

SparkQA commented Oct 29, 2014

AmplabJenkins commented Oct 29, 2014

marmbrus commented Oct 29, 2014

marmbrus commented Oct 29, 2014

marmbrus commented Oct 31, 2014

SparkQA commented Oct 31, 2014

SparkQA commented Oct 31, 2014

AmplabJenkins commented Oct 31, 2014

liancheng commented Oct 31, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marmbrus commented Nov 1, 2014