[SPARK-34731][CORE] Avoid ConcurrentModificationException when redacting properties in EventLoggingListener

### What changes were proposed in this pull request?

Change DAGScheduler to pass a clone of the Properties object, rather than the original object, to the SparkListenerJobStart event.

### Why are the changes needed?

DAGScheduler might modify the Properties object (e.g., in addPySparkConfigsToProperties) after firing off the SparkListenerJobStart event. Since the handler for that event (onJobStart in EventLoggingListener) iterates over the elements of the Properties object, this sometimes results in a ConcurrentModificationException.

This can be demonstrated using these steps:
```
$ bin/spark-shell --conf spark.ui.showConsoleProgress=false \
--conf spark.executor.cores=1 --driver-memory 4g --conf \
"spark.ui.showConsoleProgress=false" \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=/tmp/spark-events
...
scala> (0 to 500).foreach { i =>
     |   val df = spark.range(0, 20000).toDF("a")
     |   df.filter("a > 12").count
     | }
21/03/12 18:16:44 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception
java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)
```

I've not actually seen a ConcurrentModificationException in onStageSubmitted, only in onJobStart. However, both handlers iterate over the Properties object, so for safety's sake I pass a clone to SparkListenerStageSubmitted as well.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By repeatedly running the reproduction steps from above.

Closes #31826 from bersprockets/elconcurrent.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit f8a8b34)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
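
The idea behind the fix can be illustrated outside Spark with a minimal sketch. It is not the DAGScheduler code: `JobStartEvent`, `eventWithSnapshot`, and the property keys are hypothetical names invented for this example, and the copy is made with the standard `Properties.clone()` method. The sketch only shows that an event built from a clone is unaffected by later changes to the live object, so whoever iterates over the event's properties sees a stable snapshot.

```java
import java.util.Properties;

public class PropertiesSnapshotSketch {

    // Hypothetical stand-in for a listener event that carries job properties.
    static final class JobStartEvent {
        final Properties properties;

        JobStartEvent(Properties properties) {
            this.properties = properties;
        }
    }

    // Builds the event from a copy, so the producer can keep mutating `live`
    // while a consumer iterates over the event's own Properties instance.
    static JobStartEvent eventWithSnapshot(Properties live) {
        return new JobStartEvent((Properties) live.clone());
    }

    public static void main(String[] args) {
        Properties live = new Properties();
        live.setProperty("example.key", "value");

        JobStartEvent event = eventWithSnapshot(live);

        // The producer modifies the original after the event has been created.
        live.setProperty("added.later", "value");

        System.out.println(event.properties.containsKey("added.later")); // false
        System.out.println(live.containsKey("added.later"));             // true
    }
}
```

Note that `clone()` makes a shallow copy: the key/value table is duplicated, which is enough here because property keys and values are immutable strings.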