Skip to content

Conversation

@nitin2goyal
Copy link

Re-use HiveConf in HiveQl

@marmbrus
Copy link
Contributor

marmbrus commented May 6, 2015

ok to test

@SparkQA
Copy link

SparkQA commented May 6, 2015

Test build #32027 has finished for PR 5880 at commit aec478e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor

Probably it's better to get the hiveconf from SessionState instead of just new HiveConf().

@marmbrus
Copy link
Contributor

marmbrus commented May 8, 2015

Using that when SessionState.get() is not null seems reasonable. But if its null I'd just create one so that parsing doesn't have a dependence on a full running copy of hive.

@nitin2goyal
Copy link
Author

Closing this PR over master as it has been fixed as part of SPARK-7411. Will create separate PRs for 1.2 and 1.3 branch

@nitin2goyal
Copy link
Author

@chenghao-intel @marmbrus (as per my understanding) Looks like SessionState is threadlocal and gets SET only for "NativeCommands" which gets executed on hive. For Spark SQL commands, its not set and each time creates a new hive conf for Spark SQL queries and defeating the actual purpose of this issue.

@chenghao-intel
Copy link
Contributor

SessionState is a threadlocal variable internally maintained by Hive code, hence hive can isolate the configurations, context etc per user session.

@nitin2goyal
Copy link
Author

@chenghao-intel I get that. So, should we do something like :-

lazy val cachedHiveConf = new HiveConf()

private[this] def hiveConf(): HiveConf = {
val ss = SessionState.get() // SessionState is lazy initializaion, it can be null here
if (ss == null) {
cachedHiveConf
} else {
ss.getConf
}
}

@chenghao-intel
Copy link
Contributor

Oh, thanks for noticing this. But the answer is not necessary!
In user application, the HiveContext will create the SessionState (indirectly) before SQLParser being called/created, and then val ss = SessionState.get() will always be non-null.

The reason that I put the new HiveConf() is for unit test purpose for HiveQl, which can run independently without HiveContext being initialized.

Sorry, I should put more comment here.

@nitin2goyal
Copy link
Author

Sorry for not being clear in my first comment but I meant that for queries like "select a,b,c from table where a=10" where table is cached as schemaRDD (spark-1.2), SessionState.get() is always returning null. Do you see why this might be happening?

@chenghao-intel
Copy link
Contributor

In Spark 1.2, the SessionState should be initialized in HiveContext (or probably in SparkSQLCLIDriver if you are using the CLI). Can you describe how to reproduce that? Or are you using the HiveContext, not SQLContext?

@nitin2goyal
Copy link
Author

Yes, I am using HiveContext and calling hiveContext.sql, basically following line gets executed :-

https://github.com/apache/spark/blob/branch-1.2/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L94

If this problem seems to be just with 1.2, can we make the change as I suggested in 1.2 branch?

@chenghao-intel
Copy link
Contributor

Hmm, seems a bug in Spark 1.2, will cause some performance overhead (creating the HiveConf object everytime), but, we should always initialize the SessionState before the HiveQL called.

sessionState is a lazy val variable of HiveContext, and does not initialize even the .sql() method called. A better place to fix that will be trigger the initialization prior to the sql(), I think it has been fixed in the latest code, can you confirm that?

@nitin2goyal
Copy link
Author

Yes, seems like it is being done in 1.4

https://github.com/apache/spark/blob/branch-1.4/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L167

Should we do exactly like this in 1.2 as well?

@chenghao-intel
Copy link
Contributor

Actually the HiveContext changed a lot since 1.2, it will be great if you can figure out a simple / nature way to fix this. In general, we just need to trigger the initialization of the sessionState in HiveContext after HiveContext instance being created.

@nitindexter nitindexter deleted the dev-niting branch October 18, 2015 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants