
[SPARK-13186][Streaming]migrate away from SynchronizedMap #11250

Closed · wants to merge 3 commits

Conversation

huaxingao (Contributor)

trait SynchronizedMap in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Change to java.util.concurrent.ConcurrentHashMap instead.
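For context, a minimal sketch of the migration this PR performs, under hypothetical names (`countNonEmpty` and the map shape are illustrative, not the PR's actual code): the deprecated `SynchronizedMap` mixin is replaced by Java's `ConcurrentHashMap`, wrapped with `asScala` when Scala collection operations are needed.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

object MigrationSketch {
  // Deprecated pattern this PR removes (shown as a comment, since the
  // trait itself is what is being migrated away from):
  //   val m = new mutable.HashMap[Int, String]
  //     with mutable.SynchronizedMap[Int, String]

  // Replacement: a Java ConcurrentHashMap. Individual put/get calls are
  // thread-safe without external locking; asScala exposes the map to
  // Scala collection operations like count.
  def countNonEmpty(m: ConcurrentHashMap[Int, String]): Int =
    m.asScala.count(_._2.nonEmpty)
}
```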

huaxingao (Contributor, Author)

@srowen @holdenk

Could you please take a look at this PR? I ran the Python streaming tests cleanly on my local machine before I submitted the PR.

Thanks a lot!!

})

ssc.remember(Minutes(60)) // remember all the batches so that they are all saved in checkpoint
ssc.start()

- def numBatchesWithData: Int = collectedData.count(_._2._2.nonEmpty)
+ def numBatchesWithData: Int = collectedData.asScala.count(_._2._2.nonEmpty)
srowen (Member)

I think that we have a problem in lines like this, still. I think this is what @holdenk was alluding to. This returns a wrapper on the collection, and then iterates over it to count non-empty elements. But it may be modified by the put above while that happens, throwing ConcurrentModificationException. We'd have to clone it, or synchronize on the whole object while counting (the latter is probably better).

In that case, it may not add any value to use Java's ConcurrentHashMap. Synchronizing access to mutable.HashMap is the same and doesn't require using a Java type.
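A minimal sketch of the alternative Sean describes, assuming a simplified map shape (the real `collectedData` holds nested tuples): keep a plain `mutable.HashMap` and take the map's own lock around every operation, including the whole iteration, so a concurrent put cannot interleave with the count.

```scala
import scala.collection.mutable

object SyncCountSketch {
  // Every access to this map, reads and writes alike, goes through
  // collectedData.synchronized { ... }.
  val collectedData = new mutable.HashMap[Int, String]()

  def put(k: Int, v: String): Unit =
    collectedData.synchronized { collectedData.put(k, v) }

  // Holding the lock for the entire traversal prevents a concurrent
  // modification from being observed mid-iteration.
  def numBatchesWithData: Int =
    collectedData.synchronized { collectedData.count(_._2.nonEmpty) }
}
```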

huaxingao (Contributor, Author)

@srowen
Thanks for your comment. For the 5 files I changed, I will remove the usage of Java ConcurrentHashMap, and use mutable.HashMap instead. I will wrap every mutable.HashMap operation in a synchronized block.

Contributor

That could work, we can also just use things like collectedData.values().asScala.count(_._2.nonEmpty)
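A sketch of why this alternative is safe, assuming a simplified value type: `ConcurrentHashMap`'s iterators are weakly consistent, so iterating its values never throws `ConcurrentModificationException`; the trade-off is that the count is a point-in-time snapshot that may or may not reflect writes made while the traversal is in progress.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

object WeaklyConsistentSketch {
  val collectedData = new ConcurrentHashMap[Int, String]()

  // Weakly consistent iteration: no locking needed and no
  // ConcurrentModificationException, but concurrent puts may or may
  // not be visible to an in-progress traversal.
  def numBatchesWithData: Int =
    collectedData.values().asScala.count(_.nonEmpty)
}
```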

@@ -268,9 +270,9 @@ abstract class KinesisStreamTests(aggregateTestData: Boolean) extends KinesisFun

// Verify that the recomputed RDDs are KinesisBackedBlockRDDs with the same sequence ranges
// and return the same data
- val times = collectedData.keySet
+ val times = collectedData.synchronized { collectedData.keySet }
Member

I think you'd either have to make a copy of the key set being returned, or synchronize the entire foreach block below. The set is (I believe) backed by the collection and can be modified during iteration.
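A sketch of the two options described here, under hypothetical names: `keySet` on a mutable map is a live view backed by the map, so either snapshot the keys into an independent collection before iterating, or hold the map's lock around the entire loop.

```scala
import scala.collection.mutable

object SnapshotKeysSketch {
  val collectedData = new mutable.HashMap[Long, String]()

  // Option 1: copy the keys out under the lock; the resulting Seq is
  // independent of the map and safe to iterate afterwards.
  def timesSnapshot: Seq[Long] =
    collectedData.synchronized { collectedData.keySet.toSeq }

  // Option 2: keep the live keySet view but hold the lock for the
  // whole foreach, so no writer can modify the map mid-iteration.
  def forEachKey(f: Long => Unit): Unit =
    collectedData.synchronized { collectedData.keySet.foreach(f) }
}
```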

srowen (Member)

srowen commented Feb 21, 2016

Jenkins test this please

SparkQA

SparkQA commented Feb 21, 2016

Test build #51627 has finished for PR 11250 at commit 3de8e40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member)

srowen commented Feb 22, 2016

Merged to master

@asfgit asfgit closed this in 8f35d3e Feb 22, 2016
huaxingao (Contributor, Author)

@srowen @holdenk
Thank you very much for your help!!
