[SPARK-16139][TEST] Add logging functionality for leaked threads in tests #19893
Conversation
Logging the leaked threads in a more grep-friendly format would be nice; you could easily create a thread leak report.
Good point, I've also struggled to collect all actual problems from Jenkins build :)
    # Each line contains a new regex string which will be evaluated with matches
    # Empty lines or lines starting with # will be skipped

    rpc-client.*
If this is just for testing support, I personally think there's no need to create a config file and read it. Hard-coding filtering rules may be just fine. Neutral on this.
Agree with Sean here - there's not a really obvious use case for having this independent of the class where it's used. Putting it into the code means that the whitelist feature is self-documenting, and you don't have to go through any indirection to find this file.
Plus I think moving this into SparkFunSuite means you can get rid of the file loading logic in 'object SparkFunSuite'.
    @@ -683,7 +683,7 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
        val conf = new SparkConf().set("spark.speculation", "true")
        sc = new SparkContext("local", "test", conf)

    -   val sched = new FakeTaskScheduler(sc, ("execA", "host1"), ("execB", "host2"))
    +   sched = new FakeTaskScheduler(sc, ("execA", "host1"), ("execB", "host2"))
What was this change about? not shadowing?
Here, originally, the newly created instance was stored in a local variable, so it was never saved to the member field and never freed properly. With this change the afterEach method stops it and frees up the resources.
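To make the bug concrete, here is a minimal sketch of the shadowing issue and the fix. The `Scheduler` and `SuiteSketch` names are hypothetical stand-ins; the real code involves `FakeTaskScheduler` and the suite's `sched` member.

```scala
// Hypothetical stand-ins for FakeTaskScheduler and the test suite.
class Scheduler {
  var stopped = false
  def stop(): Unit = stopped = true
}

class SuiteSketch {
  var sched: Scheduler = _ // member that afterEach() knows how to clean up

  def buggyTest(): Scheduler = {
    val sched = new Scheduler() // local val shadows the member: leaks
    sched
  }

  def fixedTest(): Unit = {
    sched = new Scheduler() // assign the member so afterEach() can stop it
  }

  def afterEach(): Unit = if (sched != null) sched.stop()
}
```

In the buggy variant, `afterEach()` sees a null member and never stops the instance the test actually created.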
I have gathered statistics manually about the current state. I've grepped unit-tests.log across the whole build:
Do any of those leaked threads look like they might be real issues to fix? You could paste the results here, minus anything you know isn't a problem.
I've just started to take a deeper look at it and found some patterns. Namely, we can exclude all netty.* threads, and ForkJoinPool.* is most of the time (but not always) created inside Scala by the global ExecutionContext. All in all I'm far from having a good picture, but I'll exclude these entries.
On the other hand, globalEventExecutor.* and dag-scheduler-event-loop were real issues in the tests I've looked at.
Here is a list, but it definitely contains false positives.
Test build #4006 has finished for PR 19893 at commit
I've taken a look at the failed test, but it seems unrelated.
    @@ -52,6 +62,23 @@ abstract class SparkFunSuite
        getTestResourceFile(file).getCanonicalPath
      }

    + private def saveThreadNames(): Unit = {
Suggest turning this into runningThreadNames(): Set[String], and then you can use this method both in beforeAll() and in printRemainingThreadNames() (line 70). And you can maybe put the whitelist logic here as well.
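A sketch of the suggested shape: one helper shared by the pre-audit snapshot and the post-audit report, with the whitelist applied in one place. The patterns below are examples only (the real list lives in SparkFunSuite), and this uses scala.collection.JavaConverters rather than the deprecated JavaConversions.

```scala
import scala.collection.JavaConverters._

object ThreadSnapshot {
  // Example patterns only; not the final whitelist.
  val threadWhiteList = Set("rpc-client.*", "netty.*")

  // Usable both in beforeAll() and in printRemainingThreadNames().
  def runningThreadNames(): Set[String] =
    Thread.getAllStackTraces.keySet().asScala
      .map(_.getName)
      .filterNot(name => threadWhiteList.exists(name.matches(_)))
      .toSet
}
```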
Config file removed and refactored as you suggested. It's much simpler now, thanks :)
    private def printRemainingThreadNames(): Unit = {
      val currentThreadNames = Thread.getAllStackTraces.keySet().map(_.getName).toSet
      val whitelistedThreadNames = currentThreadNames.
nit: '.' goes on next line
Moved.
    private def printRemainingThreadNames(): Unit = {
      val currentThreadNames = Thread.getAllStackTraces.keySet().map(_.getName).toSet
      val whitelistedThreadNames = currentThreadNames.
        filterNot(s => SparkFunSuite.threadWhiteList.exists(s.matches(_)))
style: .filterNot { s =>
Style applied.
      val whitelistedThreadNames = currentThreadNames.
        filterNot(s => SparkFunSuite.threadWhiteList.exists(s.matches(_)))
      val remainingThreadNames = whitelistedThreadNames.diff(beforeAllTestThreadNames)
      if (!remainingThreadNames.isEmpty) {
remainingThreadNames.nonEmpty
Changed.
    @@ -72,3 +99,27 @@ abstract class SparkFunSuite
      }

    }

    object SparkFunSuite
      extends Logging {
move to previous line
Object removed due to previous review items.
In the meantime I've analysed a couple of cases and found netty-related threads; I've added them to the whitelist.
    import org.apache.spark.internal.Logging
    import org.apache.spark.util.AccumulatorContext
    import org.scalatest.{BeforeAndAfterAll, FunSuite, Outcome}

    import scala.collection.JavaConversions._
Wonder why the style checker didn't complain, but scala.* imports should be in the previous position.
I've executed "reorganize imports". Shouldn't that solve such problems?
I don't know what that is.
The import order is described in http://spark.apache.org/contributing.html, section "Imports".
Thanks for the guidance. I've set up the intellij imports organizer as described and fixed with it.
    @@ -34,12 +36,24 @@ abstract class SparkFunSuite
      with Logging {
      // scalastyle:on

    + val threadWhiteList = Set(
    +   "rpc-client.*", "rpc-server.*", "shuffle-client.*", "shuffle-server.*",
It would be nice to add comments explaining why the threads are whitelisted. Without an explanation to the contrary, I don't think any of these should be whitelisted.
For the new netty-related patterns I've added documentation. Could somebody help me out with rpc and shuffle? All I can see is that, for example, TaskSetManagerSuite.test("TaskSet with no preferences") creates a lot of them, and I don't see any test issue.
Temporarily removed rpc and shuffle. I'll put them back when proper documentation can be written.
I've made a deep dive into what these threads are and put documentation on each entry. I'll execute a build with them and let's see the new numbers.
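The shape being described might look like this. The patterns and explanations below are illustrative examples drawn from the discussion above, not the final list committed to Spark.

```scala
// Each whitelist entry documents why the matching threads are considered benign.
val threadWhiteList = Set(
  // Netty internals create worker threads that are reclaimed on their own
  // schedule, independent of the suite under test.
  "netty.*",
  // ForkJoinPool workers are most of the time spawned lazily by Scala's
  // global ExecutionContext rather than leaked by a particular test.
  "ForkJoinPool.*"
)
```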
ok to test
    val currentThreadNames = runningThreadNames()
    val whitelistedThreadNames = currentThreadNames
      .filterNot { s => threadWhiteList.exists(s.matches(_)) }
    val remainingThreadNames = whitelistedThreadNames.diff(beforeAllTestThreadNames)
nit: I think this would be better written as:

    val remainingThreadNames = runningThreadNames.diff(beforeAllTestThreadNames)
      .filterNot { s => threadWhiteList.exists(s.matches(_)) }

(although putting the whitelist filtering into runningThreadNames() would still make this more concise).
The reason is that it's not obvious to the reader why you whitelist 'after' threads but not 'before' - clearer to whitelist the diff.
Compressed.
Test build #84577 has finished for PR 19893 at commit
Test build #84574 has finished for PR 19893 at commit
Test build #84580 has finished for PR 19893 at commit
Test build #84606 has finished for PR 19893 at commit
Test build #84601 has finished for PR 19893 at commit
@squito I mean another JIRA, because it needs deeper analysis and discussion.
gentle ping @jiangxb1987
Test build #85290 has finished for PR 19893 at commit
LGTM, only some nits. Also cc @cloud-fan
    }

    private def printRemainingThreadNames(): Unit = {
      val suiteName = this.getClass.getName
nit:
val shortSuiteName = this.getClass.getName.replaceAll("org.apache.spark", "o.a.s")
Fixed.
        s"thread names: ${remainingThreadNames.mkString(", ")} =====\n")
      }
    } else {
      logWarning(s"\n\n===== THREAD AUDIT POST ACTION CALLED " +
nit: remove 's' before the string.
Fixed.
    protected override def beforeAll(): Unit = {
      doThreadPreAudit
      super.beforeAll
nit: super.beforeAll(), and also super.afterAll().
Fixed.
Test build #85315 has finished for PR 19893 at commit
Most of the places where you're overriding doThreadAuditInSparkFunSuite, it seems like the code is just not correct, and that you can just fix it instead of overriding that behavior.
      with Logging {
      // scalastyle:on

    + protected val doThreadAuditInSparkFunSuite = true
Can we call this just doThreadAudit or enableThreadAudit?
Given the way this is being used elsewhere, a better name is probably enableAutoThreadAudit, or something.
I was thinking about proper naming before. The last suggested one is definitely better: it doesn't pin down the exact place where the audit happens, but it also doesn't suggest that auditing is completely turned off.
Renamed to enableAutoThreadAudit.
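A simplified sketch of how the renamed flag gates the audit. The class and method bodies are illustrative, not Spark's actual SparkFunSuite code; suites that need manual control would override the flag to false and invoke the audit methods themselves.

```scala
import scala.collection.JavaConverters._

abstract class AuditedSuiteSketch {
  // Suites that manage the audit manually override this to false.
  protected val enableAutoThreadAudit: Boolean = true

  private var snapshot: Set[String] = Set.empty

  protected def doThreadPreAudit(): Unit =
    snapshot = Thread.getAllStackTraces.keySet().asScala.map(_.getName).toSet

  // Returns the potential leaks: threads running now that were not in the snapshot.
  protected def doThreadPostAudit(): Set[String] = {
    val now = Thread.getAllStackTraces.keySet().asScala.map(_.getName).toSet
    now.diff(snapshot)
  }

  def beforeAll(): Unit = if (enableAutoThreadAudit) doThreadPreAudit()
}
```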
    /**
     * Thread audit for test suites.
     *
     * Thread audit happens normally in [[SparkFunSuite]] automatically when a new test suite created.
You shouldn't describe the behavior of SparkFunSuite here. You should instead document the flag in SparkFunSuite that controls whether this code is triggered.
All the comments in the rest of this class are related to SparkFunSuite and overriding its default behavior, so they're better placed in SparkFunSuite and not here too.
Good point, moving.
I'd just remove this paragraph since this class is independent of SparkFunSuite.
    /**
     * During [[SparkContext]] creation BlockManager
     * creates event loops. One is wrapped inside
nit: line wrapped too early.
Fixed.
    protected def doThreadPostAudit(): Unit = printRemainingThreadNames

    private def snapshotRunningThreadNames(): Unit = {
      threadNamesSnapshot = runningThreadNames
nit: call with () if you declare the method with ().
Fixed.
    protected def doThreadPreAudit(): Unit = snapshotRunningThreadNames
    protected def doThreadPostAudit(): Unit = printRemainingThreadNames

    private def snapshotRunningThreadNames(): Unit = {
Can't you just inline this in doThreadPreAudit since it's the only call site and this is a private method?
Inlined.
      threadNamesSnapshot = runningThreadNames
    }

    private def printRemainingThreadNames(): Unit = {
Same reasoning as above. Just inline.
Inlined.
    @@ -39,6 +41,7 @@ class SessionStateSuite extends SparkFunSuite
      protected var activeSession: SparkSession = _

      override def beforeAll(): Unit = {
    +   doThreadPreAudit()
Isn't the problem here that this is not calling super.beforeAll()? If you do that, you don't need to override doThreadAuditInSparkFunSuite nor call doThreadPostAudit below.
Fixed. My intention was to change the tests' behaviour as little as possible; in this case it doesn't matter.
      private var targetAttributes: Seq[Attribute] = _
      private var targetPartitionSchema: StructType = _

      override def beforeAll(): Unit = {
    +   doThreadPreAudit()
Same thing here. This should be calling super.beforeAll().
Fixed.
    + override protected val doThreadAuditInSparkFunSuite = false

      protected override def beforeAll(): Unit = {
    +   doThreadPreAudit()
This looks like the same situation, but because this is a trait, it kinda relies on the suites to call beforeAll and afterAll correctly... if you don't want to audit all suites, you could write a comment explaining the situation.
It's kind of similar but not the same. Comment added.
      override def beforeAll(): Unit = {
        // Reuse the singleton session
        activeSession = spark
    +   doThreadPreAudit()
Same thing. Just call super.beforeAll() correctly.
Fixed.
Test build #85722 has finished for PR 19893 at commit
     *
     * class MyTestSuite extends SparkFunSuite {
     *
     *   override val doThreadAuditInSparkFunSuite = false
enableAutoThreadAudit now
Changed.
     * Thread audit happens normally here automatically when a new test suite created.
     * The only prerequisite for that is that the test class must extend [[SparkFunSuite]].
     *
     * There are some test suites which are doing initialization before [[SparkFunSuite#beforeAll]]
Better: "It is possible to override the default thread audit behavior by setting enableAutoThreadAudit to false and manually calling the audit methods, if desired. For example: // Code"
Changed.
    trait SharedSQLContext extends SQLTestUtils with SharedSparkSession {

      /**
       * Auto thread audit is turned off here intentionally and done manually.
I'm still not quite convinced that this is needed. I still think that any reported leaks here are caused by bugs in the test suites and not because of this. The code you have here is basically the same thing as SparkFunSuite.
For example, if a suite extending this does not call super.beforeAll() but calls super.afterAll(), won't you get false positives in the output?
Same thing, but it's meant to solve a different problem (it changes the execution order). Please see the execution order with and without this change described in my previous post:
As a next step analysed SQL test flow. Here are the steps:
1. SharedSparkSession.beforeAll called which initialise SparkSession and SQLContext
2. SparkFunSuite.beforeAll creates a thread snapshot
3. Test code runs
4. SparkFunSuite.afterAll prints out the possible leaks
5. SharedSparkSession.afterAll stops SparkSession
Not sure if I understand right, but this will not report false positives. The only problem I see here is that it's not going to report SparkSession and SQLContext related leaks.
As you mentioned before, this code should find SparkContext related threading issues, which applies here as well. This is not fulfilled at the moment, and my proposal is to fix it this way:
1. SparkFunSuite.beforeAll creates a thread snapshot
2. SharedSparkSession.beforeAll called which initialise SparkSession and SQLContext
3. Test code runs
4. SharedSparkSession.afterAll stops SparkSession
5. SparkFunSuite.afterAll prints out the possible leaks
With this change I don't see any false positives and missed threads.
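The proposed ordering can be illustrated with a small linearization sketch. The trait names are hypothetical stand-ins for SparkFunSuite and SharedSparkSession; the buffer records the order in which the steps actually run.

```scala
import scala.collection.mutable.ListBuffer

val order = ListBuffer.empty[String]

// Stand-in for SparkFunSuite: snapshot on the way in, report on the way out.
trait FunSuiteLike {
  def beforeAll(): Unit = order += "snapshot threads"
  def afterAll(): Unit = order += "report leaks"
}

// Stand-in for SharedSparkSession with the proposed ordering.
trait SharedSessionLike extends FunSuiteLike {
  override def beforeAll(): Unit = {
    super.beforeAll()            // 1. snapshot first ...
    order += "init SparkSession" // 2. ... then create the shared session
  }
  override def afterAll(): Unit = {
    order += "stop SparkSession" // 4. stop the session first ...
    super.afterAll()             // 5. ... then report, so its threads are gone
  }
}

val suite = new SharedSessionLike {}
suite.beforeAll()
order += "run tests"             // 3. test code runs
suite.afterAll()
```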
Please share your ideas on this topic.
Your concern is valid, but this change is not intended to cover that issue. The problem you mentioned is addressed in ThreadAudit; namely, it prints out the following message in such cases:
THREAD AUDIT POST ACTION CALLED WITHOUT PRE ACTION IN SUITE...
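The guard could look roughly like this. This is a hypothetical sketch of the idea, not the exact ThreadAudit code; it returns the report as a string rather than logging, to keep the sketch self-contained.

```scala
class ThreadAuditSketch(suiteName: String) {
  private var snapshot: Option[Set[String]] = None

  def doThreadPreAudit(): Unit =
    snapshot = Some(Set(Thread.currentThread().getName))

  // If the post action runs without a prior snapshot, warn instead of diffing.
  def doThreadPostAudit(): String = snapshot match {
    case Some(before) =>
      s"audited ${before.size} baseline thread(s) for $suiteName"
    case None =>
      s"THREAD AUDIT POST ACTION CALLED WITHOUT PRE ACTION IN SUITE $suiteName"
  }
}
```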
I'm not sure I understand your explanation, and I definitely don't understand what's going on from the comment in the code. What I'm asking is for the comment here to explain not what the code is doing, but why it's doing it.
Basically, if instead of the code you have here, you just called super.beforeAll and super.afterAll, without disabling enableAutoThreadAudit, what will break and why? That's what the comment should explain.
Yeah, now I see your point. Description changed.
Test build #85802 has finished for PR 19893 at commit
Test build #86035 has finished for PR 19893 at commit
Checked the failure, but it seems unrelated.
retest this please
Have you had a chance to look at the hive tests? There's a whole bunch of reported thread leaks there. Hive tests behave differently from all of the others in that they share a spark session across suites, not just within a suite.
Examples of reported thread leaks:
- broadcast-exchange-441
- block-manager-ask-thread-pool-22
- ForkJoinPool-289-worker-13
- HMSHandler #19
And a whole bunch of others.
      /**
       * Suites extending [[SharedSQLContext]] are sharing resources (eg. SparkSession) in their tests.
       * Such resources are initialized by the suite before thread audit takes thread snapshot and
Sorry, but this still does not explain why this is happening. It's just stating that it is.
For example, in SharedSparkSession, there is this code:

    protected override def afterAll(): Unit = {
      super.afterAll()
      if (_spark != null) {
        _spark.sessionState.catalog.reset()
        _spark.stop()
        _spark = null
      }
    }

If you move super.afterAll() to after the session is stopped, won't that solve the problem and avoid this?
Your suggestion solves one part of the problem. The other one lies here:

    protected override def beforeAll(): Unit = {
      initializeSession()
      // Ensure we have initialized the context before calling parent code
      super.beforeAll()
    }

The session is initialized before the thread snapshot; this should also happen in the opposite order. Because I've seen the comment in the code, I decided not to change it.
Why is the latter a problem? At worst you'll have fewer threads after the suite finishes than when it started, which should be fine, no? The problem is having leaked threads, not the other way around.
Ok, I think I see your point. Still, the comment here is confusing. Can't this be done in SharedSparkSession instead, where that initialization happens, so that it's clear what it's talking about?
I think this is an easier to understand comment about what's going on here:
/**
* Suites extending [[SharedSQLContext]] are sharing resources (eg. SparkSession) in their tests.
* That trait initializes the spark session in its [[beforeAll()]] implementation before the
* automatic thread snapshot is performed, so the audit code could fail to report threads leaked
* by that shared session.
*
* The behavior is overridden here to take the snapshot before the spark session is initialized.
*/
Sorry for the noise.
Much better phrased and compressed explanation; applied. I agree that it would be better to move this functionality into SharedSparkSession, but it would lead too far in terms of the number of modifications: SharedSparkSession would have to extend SparkFunSuite, which I don't think is worth the effort. The other option I see also doesn't help understanding, namely moving the manual thread audit into SparkFunSuite and leaving enableAutoThreadAudit = false in SharedSQLContext, but splitting functionality that way rarely helps. Ideas?
Related to hive, please see my comment from 11 Dec 2017.
Why not disable the thread audit in the hive module? You added that functionality already; it should be pretty trivial to use it.
Test build #86050 has finished for PR 19893 at commit
Thread audit disabled in hive.
Test build #86093 has finished for PR 19893 at commit
Test build #86094 has finished for PR 19893 at commit
Merging to master. It would be nice to file a separate bug to eventually look at how to do this on the spark-hive module (or maybe it's just not worth the effort).
@vanzin @squito @srowen @jiangxb1987 @henryr |
What changes were proposed in this pull request?
Lots of our tests don't properly shut down everything they create, and end up leaking lots of threads. For example, TaskSetManagerSuite doesn't stop the extra TaskScheduler and DAGScheduler it creates. There are a couple more instances, e.g. in DAGSchedulerSuite.
This PR adds the possibility to print out the list of threads that were not properly stopped after a test suite has executed. The format is the following:
With the help of this, leaked threads have been identified in TaskSetManagerSuite. My intention is to hunt down and fix such bugs in later PRs.
How was this patch tested?
Manual: executed the TaskSetManagerSuite tests and found where the leaking threads are.
Automated: passed Jenkins.