[SPARK-11996][Core]Make the executor thread dump work again #9976

Closed
wants to merge 3 commits into from

zsxwing
Member

@zsxwing zsxwing commented Nov 25, 2015

In the previous implementation, the driver needed to know the executor's listening address in order to send the thread dump request. With Netty RPC, however, the executor doesn't listen on any port, so the executor thread dump feature is broken.

This patch fixes that by having the driver send the thread dump request through the endpointRef stored in BlockManagerMasterEndpoint.
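
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from this patch and does not use Spark's RPC classes: `EndpointRef`, `MasterRegistry`, `ExecutorEndpoint`, and `executorThreadDump` are hypothetical names made up for the example, and the message name `TriggerThreadDump` is likewise just an illustrative choice. The point it shows is that the driver looks up a reference the executor registered earlier and sends the request through it, so it never needs a host and port for the executor.

```scala
import java.lang.management.ManagementFactory
import scala.collection.mutable

object ThreadDumpSketch {

  // Hypothetical stand-in for an RPC endpoint reference: a handle the driver
  // can send a request through without knowing the executor's address.
  trait EndpointRef {
    def ask(message: Any): Any
  }

  // Illustrative request message asking an executor for its thread dump.
  case object TriggerThreadDump

  // Hypothetical executor-side endpoint. It answers a thread dump request
  // with the names of the threads currently live in this JVM.
  class ExecutorEndpoint extends EndpointRef {
    override def ask(message: Any): Any = message match {
      case TriggerThreadDump =>
        ManagementFactory.getThreadMXBean
          .dumpAllThreads(false, false)
          .map(_.getThreadName)
          .toSeq
      case other =>
        throw new IllegalArgumentException(s"Unexpected message: $other")
    }
  }

  // Hypothetical driver-side registry, playing the role the description
  // assigns to BlockManagerMasterEndpoint: it keeps the reference each
  // executor supplied when it registered.
  class MasterRegistry {
    private val endpoints = mutable.HashMap.empty[String, EndpointRef]

    def register(executorId: String, ref: EndpointRef): Unit = {
      endpoints.update(executorId, ref)
    }

    def getExecutorEndpointRef(executorId: String): Option[EndpointRef] =
      endpoints.get(executorId)
  }

  // Driver side: resolve the stored reference, then send the request
  // through it. Returns None when the executor is not registered.
  def executorThreadDump(
      registry: MasterRegistry,
      executorId: String): Option[Seq[String]] =
    registry.getExecutorEndpointRef(executorId).map { ref =>
      ref.ask(TriggerThreadDump).asInstanceOf[Seq[String]]
    }

  def main(args: Array[String]): Unit = {
    val registry = new MasterRegistry
    registry.register("1", new ExecutorEndpoint)
    executorThreadDump(registry, "1").foreach(_.foreach(println))
  }
}
```

In this sketch an unknown executor ID yields `None` rather than a connection error, which is one reasonable way for a UI to show that no thread dump is available.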

@zsxwing
Member Author

zsxwing commented Nov 25, 2015

/cc @vanzin

@SparkQA

SparkQA commented Nov 25, 2015

Test build #46703 has finished for PR 9976 at commit 7188561.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

- import org.apache.spark.rpc.{ThreadSafeRpcEndpoint, RpcEnv, RpcCallContext, RpcEndpoint}
- import org.apache.spark.util.ThreadUtils
+ import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+ import org.apache.spark.util.{Utils, ThreadUtils}
Contributor

super nit: order.

@vanzin
Contributor

vanzin commented Nov 25, 2015

LGTM.

@SparkQA

SparkQA commented Nov 25, 2015

Test build #46706 has finished for PR 9976 at commit 43d0f3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 26, 2015

Test build #46712 has finished for PR 9976 at commit d626bfc.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Nov 26, 2015

retest this please

@SparkQA

SparkQA commented Nov 26, 2015

Test build #46734 has finished for PR 9976 at commit d626bfc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Nov 26, 2015

retest this please

@SparkQA

SparkQA commented Nov 26, 2015

Test build #46746 has finished for PR 9976 at commit d626bfc.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 26, 2015

Test build #2118 has finished for PR 9976 at commit d626bfc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Nov 26, 2015

retest this please

@SparkQA

SparkQA commented Nov 26, 2015

Test build #46776 has finished for PR 9976 at commit d626bfc.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Nov 27, 2015

Test build #46776 has finished for PR 9976 at commit d626bfc.

This patch fails from timeout after a configured wait of 250m.
This patch merges cleanly.
This patch adds no public classes.

Not sure why this happens frequently. I saw some weird logs in the build:

[info] MQTTStreamSuite:
[info] - mqtt input stream (1 second, 879 milliseconds)
[info] Test run started
[info] Test org.apache.spark.streaming.mqtt.JavaMQTTStreamSuite.testMQTTStream started
[info] Test run finished: 0 failed, 0 ignored, 1 total, 0.292s
[info] ScalaTest
[info] Run completed in 18 minutes, 2 seconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 2, Failed 0, Errors 0, Passed 2

The tests themselves took only a few seconds, but the total run time was 18 minutes. Could this be caused by the global Ivy lock, for example another build on the same machine taking a long time to resolve its dependencies?

@zsxwing
Member Author

zsxwing commented Nov 27, 2015

retest this please

@SparkQA

SparkQA commented Nov 27, 2015

Test build #46786 has finished for PR 9976 at commit d626bfc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 27, 2015

I'm going to merge this.

asfgit pushed a commit that referenced this pull request Nov 27, 2015
In the previous implementation, the driver needs to know the executor listening address to send the thread dump request. However, in Netty RPC, the executor doesn't listen to any port, so the executor thread dump feature is broken.

This patch makes the driver use the endpointRef stored in BlockManagerMasterEndpoint to send the thread dump request to fix it.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #9976 from zsxwing/executor-thread-dump.

(cherry picked from commit 0c1e72e)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in 0c1e72e Nov 27, 2015
@zsxwing zsxwing deleted the executor-thread-dump branch November 27, 2015 03:06