Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4103][SQL] Clean up SessionState in HiveContext by using ThreadLocal varaible in Hive #2967

Closed
wants to merge 26 commits into from

Conversation

zhzhan
Copy link
Contributor

@zhzhan zhzhan commented Oct 27, 2014

Currently the sessionState in HiveContext is lazy evaluated, and to start the session, SessionState.start() has to be invoked to initialize it. Using ThreadLocal variable natively in hive to track the availability of SessionState and initialize it if required. It avoid multiple thread sharing the same SessionState or single thread invoke SessionState.start with the same SessionState multiple times, which seems to cause unexpected behavior, and break the contract defined in hive. Refer below comments form SessionState in hive.

/**

  • set current session to existing session object if a thread is running
  • multiple sessions - it must call this method with the new session object
  • when switching from one session to another.
  • @throws HiveException
    */
    public static SessionState start(SessionState startSs) {
    ...
    }

…HiveContext is lazy evaluated, and to start the session, SessionState.start() has to be invoked to initialize it. Using ThreadLocal variable natively in hive to track the availability of SessionState and initialize it if required. It avoid multiple thread sharing the same SessionState or single thread invoking SessionState.start with the same SessionState multiple times, which seems to cause unexpected behavior.
@zhzhan
Copy link
Contributor Author

zhzhan commented Oct 27, 2014

This PR may duplicate with Spark-4037 partially. @liancheng Please take a look at the code change

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@zhzhan zhzhan changed the title Spark 4103 [SPARK-4103][SQL] Clean up SessionState in HiveContext by using ThreadLocal varaible in Hive Oct 27, 2014
SessionState.start(ss)
ss.err = new PrintStream(outputBuffer, true, "UTF-8")
ss.out = new PrintStream(outputBuffer, true, "UTF-8")
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use lazy eval, we have to relies on SessionState.start(sessionState) to initialize it. It may be invoked multiple times in various locations. Here we relies on checking ThreadLocal variable and initialize it at the first call to avoid this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments here:

  1. HiveContext.hiveconf should be retrieved from the current SessionState if SessionState.get() is non-null. That's why hiveconf and sessionState are always initialized together in #2887
  2. IO redirection and Hive properties propagation also need to be performed when SessionState.get() returns a non-null SessionState started elsewhere. This is required by Spark SQL CLI since SparkSQLCLIDriver creates a CliSessionState before HiveContext initialization.
  3. I guess you're trying to make HiveContext play well with multi-session scenarios by introducing getSessionState() here? I had also considered this issue earlier, but unfortunately IMO proper multiple Hive session support may require major refactoring over HiveContext (e.g. make it absolutely thread safe). Also, Spark SQL Hive support will probably be reimplemented against the up coming foreign data source API. So I'd rather defer this major task later.

(As a globally used thread local object, Hive SessionState is really annoying...)

@zhzhan zhzhan closed this Oct 28, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants