[SPARK-11000][YARN] Load metadata.Hive class only when hive.metastore.uris was set to avoid booting the database twice #9026

Closed
wants to merge 2 commits into from

SaintBacchus
Contributor

obtainTokenForHiveMetastore in yarn's Client.scala initializes Hive, which creates a connection to the metastore database.
The metastore client in HiveContext also creates a connection to the database. With the default Derby database this fails, because the database ends up being booted twice.
So I set a dedicated value for javax.jdo.option.ConnectionURL in obtainTokenForHiveMetastore to avoid this issue.

/cc @KaiXinXiaoLei

@SparkQA

SparkQA commented Oct 8, 2015

Test build #43390 has finished for PR 9026 at commit 0fab8c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SaintBacchus
Contributor Author

/cc @marmbrus @liancheng

@KaiXinXiaoLei

LGTM.

@srowen
Member

srowen commented Oct 13, 2015

This looks reasonable, but as in SPARK-9776, should there ever be two HiveContexts open, or is this triggered without that? Is it a problem that there are two database instances then? That is, the problem seems legitimate if the metastore DB has already been initialized but something else is initializing it again.

@SaintBacchus
Contributor Author

@srowen In this issue there is only one HiveContext, but there will be two metadata.Hive instances, in two different class loaders. The implementation of metadata.Hive creates its own database instance when the class is loaded.
So we have to point the javax.jdo.option.ConnectionURL configuration at a temp dir to avoid the problem I mentioned.
This logic is actually modeled on the implementation of SparkSQLCLIDriver.

@srowen
Member

srowen commented Oct 13, 2015

Makes sense. Is it a problem that we actually have two metastores? Maybe not. That's my only question, looking at this from the outside.

@SaintBacchus
Contributor Author

Actually there are two metastores. In hive-1.2.1, metadata.Hive creates the metastore in a static code block when the class is first used. Since Spark has two class loaders (the main class loader and the Hive metastore class loader), there will be two metastores.

@srowen
Member

srowen commented Oct 13, 2015

OK I confess I don't know this aspect well but I think you explained how this is different from just accidentally making two HiveContexts here. To my knowledge it sounds reasonable.

"org.datanucleus.store.rdbms.adapter.DerbyAdapter")

val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
val hive = hiveClass.getMethod("get").invoke(null, hiveConf.asInstanceOf[Object])
Contributor

So, the original problem is caused by this line, right?

Since the hive object is only used inside the condition on L1301, can't you move the original line inside that if and fix the problem? You don't need delegation tokens when using Derby (and hive.metastore.uris would be empty in that case), so there's no point in even trying to call this class if Derby is being used.

Contributor Author

Good idea
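
For readers following along, the suggestion in this thread can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: MetastoreTokenSketch, hasRemoteMetastore, and obtainToken are hypothetical names, a plain Map stands in for HiveConf, and the loadHive callback stands in for the reflective call that loads org.apache.hadoop.hive.ql.metadata.Hive. The real condition in Client.scala may check more than hive.metastore.uris.

```scala
object MetastoreTokenSketch {
  // Hypothetical stand-in for HiveConf: a plain key/value map.
  type Conf = Map[String, String]

  /** True only when hive.metastore.uris is set to a non-blank value. */
  def hasRemoteMetastore(conf: Conf): Boolean =
    conf.get("hive.metastore.uris").exists(_.trim.nonEmpty)

  /**
   * Runs `loadHive` (the expensive, side-effecting step) only when a remote
   * metastore is configured. With an embedded Derby setup the callback is
   * never invoked, so the Hive class is never loaded from this code path.
   */
  def obtainToken[T](conf: Conf)(loadHive: () => T): Option[T] =
    if (hasRemoteMetastore(conf)) Some(loadHive()) else None
}
```

With a Derby-only configuration the callback never runs, so this code path cannot boot the database a second time. Delegation tokens are only fetched when a remote metastore is actually configured.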

@SparkQA

SparkQA commented Oct 16, 2015

Test build #43830 has finished for PR 9026 at commit 58dcf4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Oct 16, 2015

Much better. :-) LGTM

@vanzin
Contributor

vanzin commented Oct 16, 2015

ah, btw, can you fix the pr description to reflect the actual change? thanks!

@SaintBacchus SaintBacchus changed the title from "[SPARK-11000][YARN]Bug fix: Derby have booted the database twice in yarn security mode." to "[SPARK-11000][YARN] Load metadata.Hive class only when hive.metastore.uris was set to avoid booting the database twice" on Oct 17, 2015
@JoshRosen
Contributor

@vanzin, looks like the description has now been updated, so this should be ready for another look I think.

@vanzin
Contributor

vanzin commented Oct 17, 2015

The description still mentions the old approach, just the title was changed. I can delete the description while merging though.

@asfgit asfgit closed this in e2dfdbb Oct 17, 2015