Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConcurrentModificationException when working with measurements #1444

Open
petebankhead opened this issue Jan 4, 2024 · 14 comments · May be fixed by #1466
Open

ConcurrentModificationException when working with measurements #1444

petebankhead opened this issue Jan 4, 2024 · 14 comments · May be fixed by #1466
Labels

Comments

@petebankhead
Copy link
Member

petebankhead commented Jan 4, 2024

Reopening this as it's very similar and still happening in QuPath 0.5.0
A user had an annotation inside which there was another annotation filled with detections (over 5000)
When runing "Delaunay cluster features 2D" we ran into

Error running plugin: java.util.ConcurrentModificationException
java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException
at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
at qupath.lib.plugins.AbstractTaskRunner.awaitCompletion(AbstractTaskRunner.java:147)
at qupath.lib.plugins.AbstractTaskRunner.runTasks(AbstractTaskRunner.java:117)
at qupath.lib.gui.TaskRunnerFX.runTasks(TaskRunnerFX.java:106)
at qupath.lib.plugins.AbstractPlugin.runPlugin(AbstractPlugin.java:147)
at qupath.lib.gui.ParameterDialogWrapper$1.run(ParameterDialogWrapper.java:177)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.ConcurrentModificationException
at java.base/java.util.ArrayList.checkForComodification(Unknown Source)
at java.base/java.util.ArrayList.equals(Unknown Source)
at java.base/java.util.WeakHashMap.matchesKey(Unknown Source)
at java.base/java.util.WeakHashMap.get(Unknown Source)
at java.base/java.util.Collections$SynchronizedMap.get(Unknown Source)
at qupath.lib.measurements.NumericMeasurementList$AbstractNumericMeasurementList.getNameMap(NumericMeasurementList.java:142)
at qupath.lib.measurements.NumericMeasurementList$AbstractNumericMeasurementList.close(NumericMeasurementList.java:133)
at qupath.lib.measurements.NumericMeasurementList$FloatList.close(NumericMeasurementList.java:352)
at qupath.opencv.features.DelaunayTriangulation.addClusterMeasurements(DelaunayTriangulation.java:466)
at qupath.opencv.features.DelaunayClusteringPlugin$DelaunayRunnable.run(DelaunayClusteringPlugin.java:215)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

I can ask them to share a QuPath project if it's useful to you.

The issue appeared on a built from source linux QuPath if that's of any use.

All the best and happy new year

Originally posted by @lacan in #1182 (comment)

@petebankhead
Copy link
Member Author

Creating a new issue because from the stack trace it looks like a different issue (related to measurement lists, not object descendants) so any fix should reference this instead.

@Rylern
Copy link
Contributor

Rylern commented Feb 9, 2024

I can ask them to share a QuPath project if it's useful to you.

That would be useful because I was not able to replicate the issue (on MacOS, I will try on Linux later).

However, based on the exception, I think the problem is that the names ArrayList of the AbstractNumericMeasurementList class is accessed from multiple threads without synchronization. Using a CopyOnWriteArrayList instead of an ArrayList may solve this issue.

@petebankhead
Copy link
Member Author

However, based on the exception, I think the problem is that the names ArrayList of the AbstractNumericMeasurementList class is accessed from multiple threads without synchronization. Using a CopyOnWriteArrayList instead of an ArrayList may solve this issue.

I think the list shouldn't be directly accessed elsewhere, and the put method (which calles list.add) is synchronized. Maybe the issue is that the clear() method isn't synchronized?

@Rylern
Copy link
Contributor

Rylern commented Feb 9, 2024

I think both read and write access should be synchronized. Right now, line 142 and line 241 can be executed at the same time.

@petebankhead
Copy link
Member Author

Ah, interesting... yes that looks like a horribly subtle concurrency bug. Synchronizing getNameMap() might fix it?

It does seem that the only time getNameMap() can be called at the same time as modifying the list is during a call to close(). I'd be concerned about any possible inconsistency if we switch to CopyOnWriteArrayList (but perhaps just because I'm not familiar enough with its behavior).

I tried to replicate the issue with

def pathObject = PathObjects.createDetectionObject(ROIs.createEmptyROI())

java.util.stream.IntStream.range(1, 1000)
    .parallel()
    .each(i -> {
        def ml = pathObject.getMeasurementList()
        ml.put("Hello " + i, Math.random())
        ml.close()
    })

...but I failed to get it to throw any exception.

@Rylern
Copy link
Contributor

Rylern commented Feb 9, 2024

Synchronizing getNameMap() might fix it?

I think all access to names should be synchronized. If it's not the case, lines 207 and 241 could be executed at the same time for example. But this may drops the performances of the class. I will read more about concurrency in Java to exactly know what to do in such situations.

Switching to CopyOnWriteArrayList is not necessary if all access are synchronized.

@Rylern
Copy link
Contributor

Rylern commented Feb 13, 2024

So I confirm that all access to any mutable variable should be synchronized. From "Concurrency in Practice":

Whenever more than one thread accesses a given state variable, and one of them might write to it, they all must coordinate their access to it using synchronization.

I can refactor NumericMeasurementList to make it thread-safe.

@petebankhead
Copy link
Member Author

I think that I hadn't really appreciated that using the list as key in the map would result in equals being called and checking all elements.

The purpose was to ensure that identical string lists aren't duplicated for all objects - since there can be millions, which could easily cost hundreds of MB overhead.

Basically, we'd like measurements to be accessible like Map<String, double>, but that doesn't exist in Java, and Map<String, Double> would have much more overhead. Therefore instead we use List<String> and either float[] or double[], where entries in the list correspond to entries in the array. Most of the time, objects will have the same keys/strings - but this isn't enforced (and won't be true when measurements are being added, since the list will be growing). Therefore we want to be able to check when lists are duplicated, and use one instead.

The current design is probably very suboptimal, but it's quite core to QuPath (for performance, memory use and serialization) so any major change would need to be very carefully checked.

@Rylern Rylern linked a pull request Feb 14, 2024 that will close this issue
@Rylern
Copy link
Contributor

Rylern commented Feb 14, 2024

See #1466. I didn't change anything about the design, I just added synchronization.

Today I was able to reproduce the exception above. #1466 solved it.

@petebankhead
Copy link
Member Author

Today I was able to reproduce the exception above

Great! How?

#1466 solved it.

Looks good, is there a way to check if it has any significant performance impact?

I'll check this out as well when I can (I expect it's fine, I've just had synchronization surprise me in the past).

@Rylern
Copy link
Contributor

Rylern commented Feb 14, 2024

Great! How?

  • I opened CMU-1.tiff in a project.
  • I defined an annotation covering almost the entire image.
  • I defined another annotation also almost covering the entire image.
  • I made the 2nd annotation a child of the first.
  • I ran Cell Detection inside the 2nd annotation with default parameters. I got 99465 detections.
  • I saved the image.

The exception occurs each time I run Delaunay cluster features 2D with Add cluster measurements checked. It's a bit weird because I did the same thing yesterday but I didn't have the exception.

Looks good, is there a way to check if it has any significant performance impact?

It has, because running Delaunay cluster features 2D takes around 2.67s without the PR and around 3.71s with the PR (for 99465 detections).

@petebankhead
Copy link
Member Author

petebankhead commented Feb 14, 2024

Wow, thanks, that replicates the issue for me too.

This sounds like a bug / intuitive behavior within the Delaunay triangulation. It's concerning that measurements can be added multiple times to the same objects. It suggests that the results might not be fully deterministic, depending upon the status of the object hierarchy and precisely which annotations are selected.

I'm reluctant to fix the underlying issue in a 0.0.x release, but we should try to replace the command entirely. An implementation with DelaunayTools should be cleaner than the current OpenCV-based one.

As I understand it, this shows that the existing Delaunay command should not be used for nested annotations that contain detections.

Single annotations, or annotations arranged in a 'flat' way (so that the same detection is not a descendent of more than one selected annotation) should be ok.

@Rylern
Copy link
Contributor

Rylern commented Feb 14, 2024

OK, so should I discard the PR? If this is the only issue occuring with NumericMeasurementList, it means that this class doesn't have to be thread-safe, so performance can be saved by keeping the current implementation.

@petebankhead
Copy link
Member Author

Oh no, please keep the PR for now!

I'll check it out in more detail soon - but you've demonstrated that there is a concurrency bug with the measurement list. It just wouldn't have arisen if the Delaunay command wasn't buggy too. Similarly, the performance probably wouldn't have changed noticeably if the Delaunay command wasn't problematic... so this may not be a major issue in other contexts.

One thing to check would be 'Add smoothed measurements' with lots of cells, since this should add a lot of measurements in parallel - but I think only one thread should be accessing each measurement list. Therefore I hope synchronization doesn't cause substantial overhead.

In any case, I think MeasurementList implementations should be thread-safe - so we should address this in either v0.5.1 or v0.6.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants