[SPARK-17816] [Core] Fix ConcurrentModificationException issue in BlockStatusesAccumulator #15371
Conversation
ok to test

Test build #66425 has finished for PR 15371 at commit

Unfortunately, this PR doesn't fix the java.util.ConcurrentModificationException. I can still repro it. I will spend more time on it tomorrow morning.
Yeah I'm not sure this is the issue. It's being modified while it's being serialized.
("Block ID" -> id.toString) ~
("Status" -> blockStatusToJson(status))
})
val blockAccumulator =
I don't see why this would help. You add a wrapper, but synchronizing your local access to it doesn't do anything because nothing else is synchronizing on it.

PS: can the `List[(BlockId, BlockStatus)]` type just be part of the match predicate?
@@ -281,7 +281,7 @@ private[spark] object JsonProtocol {
   ("Finish Time" -> taskInfo.finishTime) ~
   ("Failed" -> taskInfo.failed) ~
   ("Killed" -> taskInfo.killed) ~
-  ("Accumulables" -> JArray(taskInfo.accumulables.map(accumulableInfoToJson).toList))
+  ("Accumulables" -> JArray(taskInfo.accumulables.toList.map(accumulableInfoToJson)))
What does this do -- just puts a copy before the work of mapping? I could see how that would tend to help.
This wasn't the root cause but it's something nice to have. If you prefer, I can revert this line.
Seems OK if it's related cleanup, and potentially helps a closely related manifestation
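The effect of moving `toList` ahead of `map` can be sketched in plain Scala (hypothetical `accumulables` buffer and `render` function, not Spark's actual classes): taking the immutable snapshot first means the subsequent `map` iterates data no other thread can mutate.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for taskInfo.accumulables: a mutable buffer
// that another thread may still be appending to.
val accumulables = ListBuffer(1, 2, 3)

def render(x: Int): String = s"value=$x"

// Before: accumulables.map(render).toList iterates the live mutable
// collection and only copies afterwards.
// After: snapshot first, then map over the immutable copy.
val snapshot = accumulables.toList // point-in-time immutable copy
val json = snapshot.map(render)

println(json)
```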
One comment: once you've figured out the proper fix, please add some comments inline so they don't get accidentally removed in the future.
@seyfe Thanks for reporting this one. Actually, it's different from SPARK-17463. Could you create a new ticket for this issue, please? The cause is that we send a mutable TaskInfo to listeners but may still update TaskInfo's fields (e.g., accumulables) in another thread. Ideally, all events sent to the listeners should be immutable.
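A minimal sketch of the immutable-event idea described above, using hypothetical stand-ins (`MutableTaskInfo`, `TaskEndEvent`), not Spark's real TaskInfo:

```scala
import scala.collection.mutable.ArrayBuffer

// Immutable event handed to listeners.
final case class TaskEndEvent(accumulables: List[String])

// Hypothetical mutable object the scheduler keeps updating.
class MutableTaskInfo {
  val accumulables = ArrayBuffer.empty[String]
  // Posting an immutable snapshot instead of the live object means later
  // mutations in the scheduler thread cannot race with listeners.
  def toEvent: TaskEndEvent = TaskEndEvent(accumulables.toList)
}

val info = new MutableTaskInfo
info.accumulables += "internal.metrics.updatedBlockStatuses"
val event = info.toEvent
info.accumulables += "added.after.posting" // does not affect the event

println(event.accumulables)
```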
Hi @zsxwing. I have a fix ready and am testing it now. I will create a new ticket and send an updated PR today.
Test build #66473 has finished for PR 15371 at commit
@seyfe The issue is
@zsxwing.
@seyfe The comment you are deleting explains why it's safe: the driver doesn't modify
@zsxwing.
I also want to point out that below is the core part of the fix. The rest of the code changes are side effects of it.
Dumb question, but how big is this? The solution here is to copy the data structure, which is a good defensive move, as long as it's not big and nothing is actually relying on observing changes to the underlying data. Is that valid?
Hi @srowen, this PR doesn't introduce any extra data copy operations; it moves the data copy code into BlockStatusesAccumulator.

I checked the data size using 3 different pipelines. 99% of the time the ArrayList has fewer than 4 items. There was one case where it maxed out at 4000 items, but that happened less than 1% of the time. I ran my test with 4000 executors; I think that is why this 4000 number came up.

I debated other options as well, e.g. moving the JSON serialization.

I don't know the answer to the second part of your question (below), but the existing behavior is not changed. The only change is that we convert the ArrayList to a Scala List inside a synchronized block, so we won't get java.util.ConcurrentModificationException.
I see the new copy (of course, or else this wouldn't help), but where is a copy removed? I'm probably overlooking it.
Hi @srowen, this is the part where I removed the extra copy operation. I changed this line because this conversion is already done by BlockStatusesAccumulator.
@seyfe I'm taking my words back. Yeah.
override def value: java.util.List[(BlockId, BlockStatus)] = _seq
// `asScala` accesses the internal values using `java.util.Iterator` so needs to be synchronized
override def value: List[(BlockId, BlockStatus)] = {
  _seq.synchronized {
`_seq.synchronized` is wrong. `Collections.synchronizedList` uses its internal `mutex` to lock instead of `this`.

Why change them to Scala List? Just changing this one to `java.util.Collections.unmodifiableList(new ArrayList[(BlockId, BlockStatus)](_seq))` should be enough.
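The suggested approach could look roughly like this in plain Scala (hypothetical `_seq`, not the actual accumulator code):

```scala
import java.util.{ArrayList, Collections, List => JList}

// Hypothetical _seq, standing in for the accumulator's internal
// Collections.synchronizedList.
val _seq: JList[String] = Collections.synchronizedList(new ArrayList[String]())
_seq.add("rdd_1_0")

// new ArrayList(_seq) copies via _seq.toArray(), which the synchronized
// wrapper already locks internally, so no external synchronized block is
// needed for the copy itself; the unmodifiable wrapper then hands callers
// a safe, point-in-time view.
val value: JList[String] =
  Collections.unmodifiableList(new ArrayList[String](_seq))

println(value.size)
```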
Thanks for the review @zsxwing. I checked the Java doc and it says that getting an iterator is not thread safe and suggests the usage below; that's why I did `_seq.synchronized`.

https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html

```java
List list = Collections.synchronizedList(new ArrayList());
...
synchronized (list) {
    Iterator i = list.iterator(); // Must be in synchronized block
    while (i.hasNext())
        foo(i.next());
}
```

Regarding your second question, `JsonProtocol` uses it as a Scala collection; that is why I converted it to a Scala collection, so we won't need to convert it again.
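The same contract applies when draining such a list through `asScala`, since the wrapper iterates via `java.util.Iterator`. A minimal sketch with a hypothetical list (not the accumulator's actual field):

```scala
import java.util.{ArrayList, Collections, List => JList}
import scala.collection.JavaConverters._

// A synchronized list like the accumulator's internal one.
val blocks: JList[String] = Collections.synchronizedList(new ArrayList[String]())
blocks.add("rdd_0_0")
blocks.add("rdd_0_1")

// asScala wraps the Java list and traverses it through java.util.Iterator,
// so per the Collections.synchronizedList contract the traversal must run
// while holding the list's monitor (its internal mutex is the list itself).
val snapshot: List[String] = blocks.synchronized {
  blocks.asScala.toList
}

println(snapshot)
```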
1. Sorry, I didn't notice this line (`mutex = this;`) in `Collections.synchronizedList`.
2. I just took a look at `CollectionAccumulator`. I think we can just make `BlockStatusesAccumulator` extend `CollectionAccumulator`. This would eliminate the duplicated code.
Regarding #2, I took a look at `CollectionAccumulator` as well and it seems like a good idea. Let me give it a try.
Test build #66534 has finished for PR 15371 at commit

Test build #66535 has finished for PR 15371 at commit

Test build #66537 has finished for PR 15371 at commit
Hi @zsxwing, the test failed with the error below. I don't think it's related to my change. Should we just re-run the test, or do you have any suggestions?
retest this please
Test build #66539 has finished for PR 15371 at commit
I don't know if it's related, but I found a bug in the last iteration. We also need to override the copy and copyAndReset methods; otherwise it throws a java.lang.ClassCastException.
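The failure mode can be illustrated with minimal stand-in classes (not Spark's AccumulatorV2 API): the inherited method constructs the base type, so code that casts the result back to the subclass throws.

```scala
// Hypothetical simplified accumulator hierarchy.
class CollectionAcc {
  // Inherited implementation returns the base type.
  def copyAndReset(): CollectionAcc = new CollectionAcc
}

class BlockStatusesAcc extends CollectionAcc {
  // Without this override, copyAndReset() returns a plain CollectionAcc,
  // and a later `.asInstanceOf[BlockStatusesAcc]` on the result throws
  // java.lang.ClassCastException.
  override def copyAndReset(): BlockStatusesAcc = new BlockStatusesAcc
}

val acc = new BlockStatusesAcc
val reset = acc.copyAndReset()
println(reset.isInstanceOf[BlockStatusesAcc])
```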
Test build #66541 has finished for PR 15371 at commit
@zsxwing, I built it and the test pipelines work fine, so the fix is good. But I don't know how to fix the MiMa tests. Would you mind helping me with this?
@seyfe I think we can remove
@zsxwing, I think that is a good idea. I searched, and that is the only place we use
Test build #66627 has finished for PR 15371 at commit

retest this please

Test build #66632 has finished for PR 15371 at commit
da2311a to 5e00dc3
Test build #66669 has finished for PR 15371 at commit
LGTM. Thanks! Merging to master and
There are some conflicts with 2.0. @seyfe could you submit a PR for branch-2.0, please? Thanks!
…kStatusesAccumulator

Change the BlockStatusesAccumulator to return an immutable object when the value method is called. Existing tests, plus I verified this change by running a pipeline which consistently reproduces this issue. The stack trace for the exception is in the pull request description below.

Author: Ergin Seyfe <eseyfe@fb.com>

Closes apache#15371 from seyfe/race_cond_jsonprotocal.
What changes were proposed in this pull request?

Change the BlockStatusesAccumulator to return an immutable object when the value method is called.

How was this patch tested?

Existing tests, plus I verified this change by running a pipeline which consistently reproduces this issue. This is the stack trace for this exception:

```
java.util.ConcurrentModificationException
	at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
	at java.util.ArrayList$Itr.next(ArrayList.java:851)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
	at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
	at scala.collection.TraversableLike$class.to(TraversableLike.scala:590)
	at scala.collection.AbstractTraversable.to(Traversable.scala:104)
	at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:294)
	at scala.collection.AbstractTraversable.toList(Traversable.scala:104)
	at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:314)
	at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$5.apply(JsonProtocol.scala:291)
	at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$5.apply(JsonProtocol.scala:291)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:291)
	at org.apache.spark.util.JsonProtocol$$anonfun$taskInfoToJson$12.apply(JsonProtocol.scala:283)
	at org.apache.spark.util.JsonProtocol$$anonfun$taskInfoToJson$12.apply(JsonProtocol.scala:283)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:283)
	at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:145)
	at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:76)
```