
[SPARK-6749] [SQL] Make metastore client robust to underlying socket connection loss #6912

Status: Closed · wants to merge 4 commits

Conversation

@ericl (Contributor) commented Jun 19, 2015

This works around a bug in the underlying RetryingMetaStoreClient (HIVE-10384) by refreshing the metastore client on Thrift exceptions. We attempt to emulate the proper Hive behavior by retrying only as configured by hiveconf.
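The refresh-and-retry idea described above can be sketched generically. This is a hedged illustration, not the PR's actual code: `RetryClient`, `TransientException`, and `createClient` are hypothetical stand-ins for the client wrapper, the Thrift-level exception, and the client factory.

```scala
// Hypothetical sketch of retry-with-client-refresh. All names here are
// illustrative; the real PR wraps Hive's metastore client instead.
object RetryClient {
  // Stand-in for a transient Thrift/socket-level failure.
  final class TransientException(msg: String) extends RuntimeException(msg)

  /** Runs `body` against a client; on a transient failure, sleeps, rebuilds
    * the client via `createClient`, and retries up to `retryLimit` times. */
  def callWithRetry[C, T](createClient: () => C,
                          retryLimit: Int,
                          retryDelayMillis: Long)(body: C => T): T = {
    var client = createClient()
    var numTries = 0
    while (true) {
      try {
        return body(client)
      } catch {
        case e: TransientException =>
          numTries += 1
          if (numTries > retryLimit) throw e
          Thread.sleep(retryDelayMillis)
          client = createClient() // refresh the underlying client
      }
    }
    throw new IllegalStateException("unreachable")
  }
}
```

The key point, matching the description, is that the retry budget and delay come from configuration rather than being hard-coded.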

@ericl (Contributor Author) commented Jun 19, 2015

@yhuai

@SparkQA commented Jun 20, 2015

Test build #35340 has finished for PR 6912 at commit 7c8ee16.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// We use hive's conf for compatibility.
private val retryLimit = conf.getIntVar(HiveConf.ConfVars.METASTORETHRIFTFAILURERETRIES)
private val retryDelayMillis = 1000 * conf.getIntVar(
  HiveConf.ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY)
Review comment (Contributor):
oh, Hive 0.14 changed it to

METASTORE_CLIENT_CONNECT_RETRY_DELAY("hive.metastore.client.connect.retry.delay", "1s",
        new TimeValidator(TimeUnit.SECONDS),
        "Number of seconds for the client to wait between consecutive connection attempts"),

Unfortunately we can't use getTimeVar() yet since we compile against Hive 0.13.
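Since getTimeVar() is unavailable when compiling against Hive 0.13, one way to stay compatible with both formats is to parse the raw config value by hand. This is only a sketch under that assumption; `parseDelaySeconds` is a hypothetical helper, not anything in the PR.

```scala
// Hypothetical helper: accepts either a plain integer ("1", Hive 0.13 style)
// or a value with a time-unit suffix ("1s", Hive 0.14 TimeValidator style).
object DelayParse {
  def parseDelaySeconds(raw: String): Long = {
    val trimmed = raw.trim
    if (trimmed.nonEmpty && trimmed.forall(_.isDigit)) {
      trimmed.toLong // bare number of seconds
    } else {
      val (num, unit) = trimmed.span(_.isDigit) // split "30s" into ("30", "s")
      unit match {
        case "s"  => num.toLong
        case "ms" => num.toLong / 1000
        case other =>
          throw new IllegalArgumentException(s"Unsupported time unit: $other")
      }
    }
  }
}
```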
@SparkQA commented Jun 23, 2015

Test build #35512 has finished for PR 6912 at commit 980b3e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 23, 2015

Test build #35585 has finished for PR 6912 at commit 0e3a74e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor) commented Jun 23, 2015

test this please.

@SparkQA commented Jun 23, 2015

Test build #35599 has finished for PR 6912 at commit 0e3a74e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    s"(${retryLimit - numTries} tries remaining)", e)
  Thread.sleep(retryDelayMillis)
  try {
    client = Hive.get(conf, true)
Review comment (Contributor):
If we call this method from a different thread than the one that created this client wrapper, the conf may be different (the class loader associated with the conf may be different). Let me think about the potential impact of this change.

Review comment (Contributor):
How about we use the Hive conf returned by state.getConf? This is the conf we used to create the original client.

Reply (Contributor Author):
Done
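The suggestion adopted above, reusing the conf that built the original client rather than whatever conf the refreshing thread sees, can be illustrated with a self-contained sketch. `Conf` and `makeClient` below are stand-ins, not Hive's actual types or the PR's code.

```scala
// Illustrative sketch: the wrapper captures the conf at construction time and
// always refreshes with that same conf, so a refresh triggered from another
// thread cannot pick up a different conf (or class loader).
final case class Conf(id: String)

class ClientWrapper(initialConf: Conf, makeClient: Conf => AnyRef) {
  // Conf captured when the wrapper was created; refreshes reuse it.
  private val conf: Conf = initialConf
  @volatile private var client: AnyRef = makeClient(conf)

  def refresh(): Unit = synchronized {
    client = makeClient(conf) // same conf as the original client
  }

  def currentClient: AnyRef = client
}
```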

@SparkQA commented Jun 24, 2015

Test build #35628 has finished for PR 6912 at commit 2d54b55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor) commented Jun 24, 2015

LGTM. I am merging this to master.

@asfgit asfgit closed this in 50c3a86 Jun 24, 2015