[SPARK-26533][SQL][test-hadoop2.7] Support query auto timeout cancel on thriftserver #28991

Closed
wants to merge 8 commits into from


leoluan2009
Contributor

@leoluan2009 leoluan2009 commented Jul 3, 2020

Support query auto cancelling when running too long on thriftserver.

What changes were proposed in this pull request?

Why are the changes needed?

In some cases, we run the Thrift server as a long-running application.
Sometimes we want no query to run longer than a given time.
In these cases, we can enable automatic cancellation of long-running queries, which releases resources for other queries to run.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UT

@leoluan2009
Contributor Author

leoluan2009 commented Jul 3, 2020

@wangyum please help to review
thanks!


test("SPARK-26533: Support query auto timeout cancel on thriftserver") {
withJdbcStatement() { statement =>
statement.setQueryTimeout(1)
Member

Could you test the cases 0 and -1? They mean no limit?

   /**
     ...
     * @param seconds the new query timeout limit in seconds; zero means
     *        there is no limit
     * @exception SQLException if a database access error occurs,
     * this method is called on a closed <code>Statement</code>
     *            or the condition {@code seconds >= 0} is not satisfied
     * @see #getQueryTimeout
     */
    void setQueryTimeout(int seconds) throws SQLException;

Contributor Author

thanks, fixed
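
The JDBC contract quoted in this thread (zero means no limit, negative values are rejected) can be sketched as a small check. This is only an illustration: `QueryTimeoutContract`, `normalize`, and `hasLimit` are hypothetical names that do not appear in the patch, and rejecting negative values follows the Javadoc rather than anything this thread confirms about the Hive driver.

```java
import java.sql.SQLException;

// Hypothetical helper illustrating the Statement.setQueryTimeout contract.
final class QueryTimeoutContract {
    /** Returns the timeout in seconds; 0 means "no limit". Negative input is rejected. */
    static long normalize(int seconds) throws SQLException {
        if (seconds < 0) {
            throw new SQLException("Query timeout must be >= 0, got " + seconds);
        }
        return seconds;
    }

    /** True when a cancel task would need to be scheduled for this timeout. */
    static boolean hasLimit(long seconds) {
        return seconds > 0;
    }
}
```

With this shape, a test for the cases 0 and -1 would expect no cancellation for 0 and an `SQLException` for -1.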

@@ -204,6 +205,13 @@ private[hive] class SparkExecuteStatementOperation(
parentSession.getUsername)
setHasResultSet(true) // avoid no resultset for async run

if(queryTimeout > 0) {
Executors.newSingleThreadScheduledExecutor
.schedule(new Runnable {
Member

nit format:

      Executors.newSingleThreadScheduledExecutor.schedule(new Runnable {
          override def run(): Unit = timeoutCancel()
        }, queryTimeout, TimeUnit.SECONDS)

Contributor Author

thanks, fixed
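
The scheduling pattern discussed in this thread can be sketched in plain Java as follows. `TimeoutDemo` and its members are hypothetical stand-ins, not Spark classes. The call to `shutdown()` is an addition that the quoted diff does not show: with the default executor policy, an already scheduled delayed task still runs after `shutdown()`, and the timer thread is released afterwards.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an operation that can be cancelled on timeout.
final class TimeoutDemo {
    final AtomicBoolean timedOut = new AtomicBoolean(false);

    // Stands in for the real cancel logic.
    void timeoutCancel() {
        timedOut.set(true);
    }

    /** Schedules timeoutCancel() once and returns the timer so callers can await it. */
    ScheduledExecutorService scheduleTimeout(long timeout, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.schedule(this::timeoutCancel, timeout, unit);
        // Assumption, not in the patch: release the thread once the task has run.
        timer.shutdown();
        return timer;
    }
}
```

Whether the real operation should share one timer across statements instead of creating one per statement is a design question this sketch does not settle.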

if (!getStatus.getState.isTerminal) {
logInfo(s"Timeout and Cancel query with $statementId ")
cleanup()
setState(OperationState.TIMEDOUT)
Member

nit: logInfo is located just before HiveThriftServer2.eventManager.onXXX?

        cleanup()
        setState(OperationState.TIMEDOUT)
        logInfo(s"Timeout and Cancel query with $statementId ")
        HiveThriftServer2.eventManager.onStatementCanceled(statementId)

logInfo(s"Close statement with $statementId")
HiveThriftServer2.eventManager.onOperationClosed(statementId)

Contributor Author

thanks, fixed

Contributor

Please setState before cleanup. It's an open bug, see #28912

Member

> Please setState before cleanup. It's an open bug, see #28912

Yea, nice suggestion! Might be better to add tests for that case, too.

Contributor Author

> Please setState before cleanup. It's an open bug, see #28912

Fixed, thanks for your review
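
The ordering requested in this thread can be illustrated with a toy model. It rests on an assumption this thread does not spell out: cleanup interrupts the running query, and the query's failure handler records ERROR unless a terminal state is already set. `ToyOperation` and all of its members are hypothetical and are not Spark code.

```java
// Toy model, not Spark code: shows why the terminal state is set before cleanup.
final class ToyOperation {
    enum State { RUNNING, ERROR, TIMEDOUT }

    private State state = State.RUNNING;

    synchronized State state() {
        return state;
    }

    // Called when execution fails, for example after the query is interrupted.
    synchronized void onExecutionError() {
        if (state == State.RUNNING) {
            state = State.ERROR;
        }
    }

    // Stands in for cancelling jobs; here the failure callback fires immediately.
    private void cleanup() {
        onExecutionError();
    }

    synchronized void timeoutCancel(boolean setStateFirst) {
        if (state != State.RUNNING) {
            return; // already terminal, nothing to do
        }
        if (setStateFirst) {
            state = State.TIMEDOUT; // terminal state is visible before cleanup runs
            cleanup();
        } else {
            cleanup(); // the failure handler wins the race and records ERROR
            if (state == State.RUNNING) {
                state = State.TIMEDOUT;
            }
        }
    }
}
```

In this model, `timeoutCancel(true)` ends in TIMEDOUT while `timeoutCancel(false)` ends in ERROR, which is the kind of misreported state the reviewers want to avoid.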

@@ -349,6 +357,17 @@ private[hive] class SparkExecuteStatementOperation(
}
}

def timeoutCancel(): Unit = {
Member

Could we inline this method in the line 211?

Contributor Author

I am not sure this method can be inlined, because it must be synchronized on the SparkExecuteStatementOperation object:
def timeoutCancel(): Unit = { synchronized {

Member

Ah, I see. btw, def timeoutCancel() -> private def timeoutCancel()?

Contributor Author

> Ah, I see. btw, def timeoutCancel() -> private def timeoutCancel()?

Fixed, thanks!

@@ -32,7 +32,8 @@
CLOSED(TOperationState.CLOSED_STATE, true),
ERROR(TOperationState.ERROR_STATE, true),
UNKNOWN(TOperationState.UKNOWN_STATE, false),
-PENDING(TOperationState.PENDING_STATE, false);
+PENDING(TOperationState.PENDING_STATE, false),
+TIMEDOUT(TOperationState.CANCELED_STATE, true); //do not want to change TOperationState in hive 1.2
Member

What does //do not want to change TOperationState in hive 1.2 mean?

Contributor Author

The TOperationState class was generated by Thrift, so we should not change it directly.

Contributor

This will not work and never be used with hive-1.2 anyway, because it's not in HIVE_CLI_SERVICE_PROTOCOL_V8.
But adding the state here does not hurt, it allows it to compile with hive-1.2.

Member

Yes. Only hive-2.3 supports timeout.
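
The mapping in the diff for this thread, where a new server-side state is reported through an existing wire value, can be sketched like this. `WireState` and `ServerState` are hypothetical stand-ins for the Thrift-generated `TOperationState` and for `OperationState`; only the states needed for the illustration are included.

```java
// Hypothetical stand-in for the Thrift-generated enum, which is not edited by hand.
enum WireState { RUNNING_STATE, CANCELED_STATE, ERROR_STATE }

// Server-side states. TIMEDOUT reuses an existing wire value, so the wire enum stays unchanged.
enum ServerState {
    RUNNING(WireState.RUNNING_STATE, false),
    CANCELED(WireState.CANCELED_STATE, true),
    ERROR(WireState.ERROR_STATE, true),
    TIMEDOUT(WireState.CANCELED_STATE, true);

    private final WireState wireState;
    private final boolean terminal;

    ServerState(WireState wireState, boolean terminal) {
        this.wireState = wireState;
        this.terminal = terminal;
    }

    WireState toWireState() {
        return wireState;
    }

    boolean isTerminal() {
        return terminal;
    }
}
```

A client of this sketch cannot tell a timeout from an ordinary cancel by the wire state alone, so it would have to rely on the error message.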

@maropu maropu changed the title [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver Jul 3, 2020
@wangyum
Member

wangyum commented Jul 3, 2020

ok to test.

@maropu
Member

maropu commented Jul 5, 2020

ok to test

@SparkQA

SparkQA commented Jul 6, 2020

Test build #124982 has finished for PR 28991 at commit 262d306.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu changed the title [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver [SPARK-26533][SQL][test-hive1.2][test-hadoop2.7] Support query auto timeout cancel on thriftserver Jul 6, 2020
@maropu
Member

maropu commented Jul 6, 2020

retest this please

@SparkQA

SparkQA commented Jul 6, 2020

Test build #125039 has finished for PR 28991 at commit 262d306.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -874,6 +874,22 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
assert(rs.getString(1) === expected.toString)
}
}

test("SPARK-26533: Support query auto timeout cancel on thriftserver") {
Member

Could you check whether it throws an exception (when hive1.2 is used internally) by using HiveUtils.builtinHiveVersion or the other related variables?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125039/testReport/org.apache.spark.sql.hive.thriftserver/HiveThriftBinaryServerSuite/SPARK_26533__Support_query_auto_timeout_cancel_on_thriftserver/

Error Message
java.sql.SQLException: Method not supported
Stacktrace
sbt.ForkMain$ForkError: java.sql.SQLException: Method not supported
	at org.apache.hive.jdbc.HiveStatement.setQueryTimeout(HiveStatement.java:739)
	at org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$97(HiveThriftServer2Suites.scala:880)
	at 

Contributor Author

fixed, thanks
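
The failure in this thread comes from a driver whose `setQueryTimeout` throws `SQLException: Method not supported`. The reviewer suggests branching on the built-in Hive version. A different technique, shown here only as a sketch, is to probe the driver and treat the exception as "unsupported". `TimeoutSupport`, `trySetQueryTimeout`, and `fakeStatement` are hypothetical names, and the proxy-based fake exists only to make the example runnable without a server.

```java
import java.lang.reflect.Proxy;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical probe for drivers that may not implement setQueryTimeout.
final class TimeoutSupport {
    /** Returns true if the driver accepted the timeout, false if it rejected the call. */
    static boolean trySetQueryTimeout(Statement statement, int seconds) {
        try {
            statement.setQueryTimeout(seconds);
            return true;
        } catch (SQLException e) {
            // Crude: any SQLException is treated as "not supported" in this sketch.
            return false;
        }
    }

    /** Test double: a Statement whose setQueryTimeout either succeeds or is rejected. */
    static Statement fakeStatement(boolean supported) {
        return (Statement) Proxy.newProxyInstance(
            TimeoutSupport.class.getClassLoader(),
            new Class<?>[] { Statement.class },
            (proxy, method, args) -> {
                if (method.getName().equals("setQueryTimeout") && !supported) {
                    throw new SQLException("Method not supported");
                }
                return null; // every other call is a no-op in this fake
            });
    }
}
```

A test could then assert the timeout behaviour only when the probe returns true, and assert the exception otherwise.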

assert(e.contains("Query timed out after"))

statement.setQueryTimeout(0)
statement.execute("select java_method('java.lang.Thread', 'sleep', 3000L)")
Member

Could you change query to: SELECT 'test1', java_method('java.lang.Thread', 'sleep', 3000L); and assert the result to make the test more robust.

Contributor Author

fixed, thanks

@SparkQA

SparkQA commented Jul 6, 2020

Test build #125063 has finished for PR 28991 at commit 24997ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 6, 2020

Test build #125060 has finished for PR 28991 at commit 3317120.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@wangyum wangyum left a comment

Could we add a config: spark.sql.thriftServer.queryTimeout?

@maropu
Member

maropu commented Jul 6, 2020

> Could we add a config: spark.sql.thriftServer.queryTimeout?

Does the thriftserver support a global config for timeout? This PR itself proposes a per-statement config for timeout, though.

@wangyum
Member

wangyum commented Jul 6, 2020

Yes. Hive 2.3 supports the global config for timeout: https://issues.apache.org/jira/browse/HIVE-13760.

@SparkQA

SparkQA commented Jul 6, 2020

Test build #125077 has finished for PR 28991 at commit 23b43f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@leoluan2009
Contributor Author

@wangyum @maropu should we support the global config for timeout?

@wangyum
Member

wangyum commented Jul 8, 2020

Yes. This will be useful for users who manage queries with SLAs. @maropu @juliuszsompolski What do you think?

@maropu
Member

maropu commented Jul 8, 2020

Yea, looks okay to add it. But it might be better to make another PR for the support.

@juliuszsompolski
Contributor

Looks okay to add it to me as well.

@leoluan2009
Contributor Author

ok, I will make another PR to support global config for timeout
@maropu @wangyum @juliuszsompolski thanks for your review!

@maropu
Member

maropu commented Aug 3, 2020

Could you resolve the conflict?

@leoluan2009
Contributor Author

@maropu can you take a look, I have resolved the conflict, thanks

@SparkQA

SparkQA commented Aug 7, 2020

Test build #127204 has finished for PR 28991 at commit 3a07fc5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Aug 7, 2020

@leoluan2009 Could you fix the build failure?

@SparkQA

SparkQA commented Aug 11, 2020

Test build #127300 has finished for PR 28991 at commit 4ca75ba.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 11, 2020

Test build #127303 has finished for PR 28991 at commit da76e1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@leoluan2009
Contributor Author

@maropu all builds are OK now, can you take a look again? thanks

Member

@maropu maropu left a comment

Looks okay cc: @dongjoon-hyun @cloud-fan

@@ -86,20 +86,15 @@ private void initOperationLogCapture(String loggingMode) {
}

public ExecuteStatementOperation newExecuteStatementOperation(HiveSession parentSession,
Contributor

Why do you need to replace the previous methods here?

Member

I think this update is just to pass queryTimeout into newExecuteStatementOperation.

@@ -87,7 +87,7 @@ private void initOperationLogCapture(String loggingMode) {
}

public ExecuteStatementOperation newExecuteStatementOperation(HiveSession parentSession,
-String statement, Map<String, String> confOverlay, boolean runAsync)
+String statement, Map<String, String> confOverlay, boolean runAsync, long queryTimeout)
Contributor

Should we change these at all since queryTimeout is supported only in hive-2.3?

Member

This update is needed to add the queryTimeout param in SparkSQLOperationManager#newExecuteStatementOperation:
https://github.com/apache/spark/pull/28991/files#diff-9d2cd65aaeae992250b5f40d8c289287R48
I looked around the related code and I think the current one is the simplest. If you have a better idea or another suggestion, please let me know in #29933.

@bogdanghit
Contributor

Made a pass. I would reduce the unnecessary changes in hive-1.2 and avoid collapsing the two newExecuteStatementOperation methods into one.

@maropu
Member

maropu commented Aug 23, 2020

@leoluan2009 Are you still here? Could you address the review comments above?

@maropu
Member

maropu commented Aug 23, 2020

retest this please

@SparkQA

SparkQA commented Aug 23, 2020

Test build #127805 has finished for PR 28991 at commit da76e1a.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Sep 28, 2020

kindly ping @leoluan2009

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-26533][SQL][test-hive1.2][test-hadoop2.7] Support query auto timeout cancel on thriftserver [SPARK-26533][SQL][test-hadoop2.7] Support query auto timeout cancel on thriftserver Oct 5, 2020
@maropu maropu closed this in d9ee33c Oct 22, 2020