
Improve spotfinding #758

Merged: 12 commits (improve_spotfinding into master), Jun 7, 2019

Conversation

@jmp1985 (Contributor) commented Apr 17, 2019

This pull request implements an extended dispersion threshold algorithm for spot finding, to try to address #752.

The algorithm does the following steps:

  1. Compute the dispersion image
  2. Compute the dispersion mask as normal
  3. Erode the dispersion mask by half the kernel size
  4. Expand the kernel slightly
  5. Compute the mean pixel value excluding the dispersion mask
  6. Compute the strong mask as normal

The rationale for the algorithm is that the dispersion mask is computed by testing the null hypothesis that all the pixels in the local area are drawn from a Poisson distribution. If this hypothesis is rejected then the pixel is marked as True in the dispersion mask.
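The Poisson test above can be sketched in a few lines of NumPy. This is a simplified, hedged stand-in for the DIALS C++ implementation: the function name `dispersion_mask`, the sample-variance estimator, and the exact threshold formula are illustrative assumptions, not the production code.

```python
import numpy as np

def dispersion_mask(image, kernel=3, n_sigma=6.0):
    """Flag pixels whose local index of dispersion (variance / mean)
    is inconsistent with the Poisson null hypothesis (var ~= mean)."""
    # All (kernel x kernel) windows around each interior pixel.
    win = np.lib.stride_tricks.sliding_window_view(image, (kernel, kernel))
    local_mean = win.mean(axis=(-1, -2))
    local_var = win.var(axis=(-1, -2), ddof=1)
    n = kernel * kernel
    # Under the Poisson null, var/mean is ~1 with a standard deviation
    # of roughly sqrt(2 / (n - 1)); reject beyond n_sigma of that.
    threshold = 1.0 + n_sigma * np.sqrt(2.0 / (n - 1))
    with np.errstate(divide="ignore", invalid="ignore"):
        disp = np.where(local_mean > 0, local_var / local_mean, 0.0)
    mask = np.zeros(image.shape, dtype=bool)
    pad = kernel // 2
    mask[pad:pad + disp.shape[0], pad:pad + disp.shape[1]] = disp > threshold
    return mask
```

A pixel sitting on a flat background passes the test; any pixel whose local window contains a strong spike fails it and gets marked, which is exactly why the mask spreads out by half the kernel size around each spot.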

If a strong pixel is at the edge of the local area then the pixel at the centre of the box is marked True. This means that the dispersion mask tends to contain the strong pixels plus a band of pixels half the kernel size wide around them. Therefore, for true strong spots we can safely erode the dispersion mask by half the kernel size. This results in a mask which fits the spot almost exactly.
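The erosion step can be illustrated with a small NumPy sketch. `erode` is a hypothetical helper using a brute-force square structuring element; the real implementation computes a distance metric in two passes instead, as noted in the pass counts below.

```python
import numpy as np

def erode(mask, radius):
    """Binary erosion with a (2*radius+1) x (2*radius+1) square
    structuring element: a pixel survives only if every pixel in
    its neighbourhood was set in the input mask."""
    size = 2 * radius + 1
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    win = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return win.all(axis=(-1, -2))
```

For a 3x3 kernel, radius = 3 // 2 = 1, so the band of masked background pixels around each spot is peeled away while the spot core survives.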

Finally, we want to select only the strong pixels (sometimes the eroded mask still incorporates weak pixels). The problem we had before is that we were using a mean calculated including the strong pixels; this was inevitable if we wanted to do the whole thing with a single summed area table. However, we can recompute the mean excluding the dispersion-masked pixels, using a slightly expanded kernel to ensure we have plenty of pixels to choose from. This actually gives a pretty good estimate of the background, from which we can then threshold the strong pixels as done previously.
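The masked mean can be computed with two summed area tables: one over the pixel values with flagged pixels zeroed, and one over the per-pixel keep counts. Dividing the two window sums gives the local background mean with flagged pixels excluded. This is a hedged sketch: `masked_local_mean`, its signature, and the edge clamping are assumptions made for illustration, not the DIALS API.

```python
import numpy as np

def masked_local_mean(image, exclude, kernel=5):
    """Local mean of `image` over a kernel x kernel window,
    ignoring pixels flagged True in `exclude`."""
    keep = (~exclude).astype(np.float64)
    vals = image * keep  # flagged pixels contribute zero

    def sat(a):
        # Summed area table with a zero border: s[i, j] = sum(a[:i, :j]).
        p = np.pad(a, ((1, 0), (1, 0)))
        return p.cumsum(axis=0).cumsum(axis=1)

    s_vals, s_keep = sat(vals), sat(keep)
    r = kernel // 2
    h, w = image.shape
    # Window bounds for every pixel, clamped at the image edges.
    y0 = np.clip(np.arange(h) - r, 0, h)
    y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w)
    x1 = np.clip(np.arange(w) + r + 1, 0, w)

    def window_sum(s):
        # Standard four-corner summed-area-table lookup.
        return s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]

    count = window_sum(s_keep)
    total = window_sum(s_vals)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(count > 0, total / count, 0.0)
```

Each table is built in one pass over the image, which is why excluding the masked pixels only costs one extra summed area table rather than a per-pixel rescan.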

The end result is that we get much better segmentation of the strong spots. However, the algorithm is a bit slower (currently a lot slower, since I have only implemented the debug version; once we are happy that the algorithm is an improvement I will implement a more optimised version). Even once optimised, I think it will still require more passes over the image data:

  1. One pass to compute the initial summed area table.
  2. One pass to compute the dispersion mask.
  3. Two passes to compute the distance metric for the erosion.
  4. One pass to compute the eroded dispersion mask.
  5. One pass to compute the second summed area table.
  6. One pass to compute the final mask.

However, if slower still means better then this may be acceptable.

Here are some examples of the output:

The raw data from an I23 image:
[image: data_1_0001_image_close]

The threshold from the same image:
[image: data_1_0001_threshold_close]

The spot finding results with strong pixels marked:
[image: data_1_0001_pixels]

I have also modified the largest spot size to be 1000 pixels (10x10x10) since this will tend to find larger spots.

Another nice side effect is that we could now get pretty decent background estimates from the spot finding at little extra cost, which might also make the change desirable.

Next steps are to test thoroughly, particularly on the data that @rjgildea has been looking at. Then, once we are satisfied, we need to optimise the code!

@jmp1985 (Contributor, Author) commented Apr 17, 2019

One thing to note is that I found a kernel size of 3x3 actually performed well in the majority of cases; however, one CCD dataset I had with really massive spots did better with a 5x5 kernel. I don't think a kernel any larger than that should be required in most cases.

@ndevenish (Member):

> However, if slower still means better then this may be acceptable.

It's my understanding that we've explicitly rejected improving the spotfinding in the past because speed was a much higher priority. I'm assuming this will sit as an alternate method, but do you have any quantification of what the actual improvements are?

- Do the dispersion threshold
- Erode by half the kernel size
- Expand the kernel slightly
- Compute the local mean excluding dispersion-thresholded (i.e. non-background) pixels
- Compute the strong pixel mask

@rjgildea (Contributor):

Anecdotally the spot-finding results look a lot better, e.g. kernel_size=3,3 for image 144 of the i23 germanate dataset:
Before:
[image]
After:
[image]
Comparison of merging statistics with the old algorithm (b=before) and the new algorithm (a=after):
[charts: i_over_sigma_mean, r_pim]
Possibly still some improvements to be made by increasing the kernel size slightly, but the changes are a lot smaller than before.

@rjgildea (Contributor):

For the i03_thaum_eiger dataset there seems to be a lot less dependence of the integration results on kernel size:
[charts: i_over_sigma_mean, r_pim]

@rjgildea (Contributor):

Similarly for I04_Thaum_eiger:
[charts: i_over_sigma_mean, r_pim]

@rjgildea (Contributor):

I04 thaumatin pilatus (i04_bag_training):
[charts: i_over_sigma_mean, r_pim]

@rjgildea (Contributor):

VMXi thaumatin:
[charts: i_over_sigma_mean, r_pim]

@jmp1985 (Contributor, Author) commented Apr 24, 2019

@rjgildea Thanks for doing these tests. The results look positive.

rjgildea referenced this pull request, Apr 25, 2019:
Fixes the case of looking at old pickles which do not have the n_signal column; it was looked for when setting the defaults and failed.
@jmp1985 (Contributor, Author) commented Apr 26, 2019

I've now added a faster implementation and done a crude benchmark using the bag training data.

The old spot finding algorithm is obviously faster but extracts fewer spots, mainly due to filtering out small spots of fewer than 3 pixels. Since the spots extracted by the new algorithm are generally larger, this is less of a problem. However, 4 spots were filtered out for having more than 1000 pixels. The new algorithm takes approximately 1.6 times as long to run but, as can be seen above, gives much better results.

Old:
Saved 116082 reflections to strong.mpack
Time taken: 122.089928 seconds

New:
Saved 126782 reflections to strong.mpack
Time taken: 198.506274 seconds

@jmp1985 (Contributor, Author) commented Apr 26, 2019

The average of ten runs:

Old:
Time taken: 124.1504409 seconds

New:
Time taken: 175.5127746 seconds

@graeme-winter (Contributor):

Fails for me with Eiger data in xia2:

Grey-Area with :) $ cat DEFAULT/NATIVE/SWEEP1/index/2_dials.find_spots.log 
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 2.dev.258-g67c4870f5
The following parameters have been modified:

output {
  reflections = "2_SWEEP1_strong.pickle"
  experiments = "2_SWEEP1_experiments.json"
}
spotfinder {
  write_hot_mask = True
  hot_mask_prefix = "2_hot_mask"
  scan_range = 1 488
  filter {
    min_spot_size = 3
  }
  mp {
    nproc = 8
  }
  threshold {
    dispersion {
      min_local = 0
    }
  }
}
input {
  experiments = /Users/graeme/Projects/chef-dI/improved-spotfinding/with/DEFAULT/NATIVE/SWEEP1/index/1_SWEEP1_experiments.json
}

Configuring spot finder from input parameters
--------------------------------------------------------------------------------
Finding strong spots in imageset 0
--------------------------------------------------------------------------------


Finding spots in image 1 to 488...
Setting chunksize=61
Extracting strong pixels from images
 Using multiprocessing with 8 parallel job(s)



easy_mp crash detected; subprocess trace: ----
Traceback (most recent call last):
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/job_scheduler.py", line 64, in job_cycle
    value = target( *args, **kwargs )
  File "/Users/graeme/svn/cctbx/modules/dials/util/mp.py", line 191, in __call__
    result.append(self.func(i))
  File "/Users/graeme/svn/cctbx/modules/dials/algorithms/spot_finding/finder.py", line 285, in __call__
    result = self.function(task)
  File "/Users/graeme/svn/cctbx/modules/dials/algorithms/spot_finding/finder.py", line 147, in __call__
    threshold_mask = self.threshold_function.compute_threshold(im, mk)
  File "/Users/graeme/svn/cctbx/modules/dials/extensions/dispersion_extended_spotfinder_threshold_ext.py", line 66, in compute_threshold
    return self._algorithm(image, mask)
  File "/Users/graeme/svn/cctbx/modules/dials/algorithms/spot_finding/threshold.py", line 194, in __call__
    algorithm = threshold.DispersionExtendedThreshold(
'module' object has no attribute 'DispersionExtendedThreshold'
----------------------------------------------


Traceback (most recent call last):
  File "/Users/graeme/svn/cctbx/build/../modules/dials/command_line/find_spots.py", line 224, in <module>
    halraiser(e)
  File "/Users/graeme/svn/cctbx/build/../modules/dials/command_line/find_spots.py", line 222, in <module>
    script.run()
  File "/Users/graeme/svn/cctbx/build/../modules/dials/command_line/find_spots.py", line 153, in run
    reflections = flex.reflection_table.from_observations(experiments, params)
  File "/Users/graeme/svn/cctbx/modules/dials/array_family/flex.py", line 208, in from_observations
    return find_spots(experiments)
  File "/Users/graeme/svn/cctbx/modules/dials/algorithms/spot_finding/finder.py", line 770, in __call__
    table, hot_mask = self._find_spots_in_imageset(imageset)
  File "/Users/graeme/svn/cctbx/modules/dials/algorithms/spot_finding/finder.py", line 867, in _find_spots_in_imageset
    r, h = extract_spots(imageset[j0:j1])
  File "/Users/graeme/svn/cctbx/modules/dials/algorithms/spot_finding/finder.py", line 478, in __call__
    return self._find_spots(imageset)
  File "/Users/graeme/svn/cctbx/modules/dials/algorithms/spot_finding/finder.py", line 582, in _find_spots
    callback=process_output,
  File "/Users/graeme/svn/cctbx/modules/dials/util/mp.py", line 272, in batch_multi_node_parallel_map
    preserve_exception_message=True,
  File "/Users/graeme/svn/cctbx/modules/dials/util/mp.py", line 172, in multi_node_parallel_map
    preserve_exception_message=preserve_exception_message,
  File "/Users/graeme/svn/cctbx/modules/dials/util/mp.py", line 45, in parallel_map
    preserve_exception_message=preserve_exception_message,
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/easy_mp.py", line 628, in parallel_map
    result = res()
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
    self.traceback( exception = self.exception() )
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 117, in __call__
    self.raise_handler( exception = exception )
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 134, in raise_with_traceback
    raise_(type(exception), exception, self.traceback)
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/mainthread.py", line 100, in poll
    value = target( *args, **kwargs )
  File "/Users/graeme/svn/cctbx/modules/dials/util/mp.py", line 88, in __call__
    preserve_exception_message=self.preserve_exception_message,
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/easy_mp.py", line 628, in parallel_map
    result = res()
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
    self.traceback( exception = self.exception() )
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 117, in __call__
    self.raise_handler( exception = exception )
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 147, in raise_with_stacktrace
    self.stacktrace( exception = exception )
  File "/Users/graeme/svn/cctbx/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 88, in __call__
    raise exception
AttributeError: Please report this error to dials-support@lists.sourceforge.net: 'module' object has no attribute 'DispersionExtendedThreshold'

@graeme-winter (Contributor):

@rjgildea asks "did you type Make"

No, I did not.

@rjgildea (Contributor) commented May 24, 2019

FYI: for testing both algorithms:

$ dials.find_spots -c -e2 | grep algorithm
    algorithm = dispersion *dispersion_extended helen single

@graeme-winter (Contributor):

On the understanding that we will add a background estimate in the future, I agree this is a good idea.

branch started in pre-precommitbx era
@@ -61,7 +61,7 @@ def generate_phil_scope():
         "to be accepted by the filtering algorithm."
         .type = int(value_min=1)

-max_spot_size = 100
+max_spot_size = 1000
Member:

Slightly worried that this might have a negative effect on the standard spotfinding?

Contributor:

max_spot_size=100 is overly conservative, especially in the context of smaller eiger pixels (remember this is the number of spot pixels in 3 dimensions).

Member:

And I suppose rings picked up will either be noise or eliminated by the other filters.

@ndevenish (Member) commented Jun 5, 2019

Do we really want to move to this as default immediately?

Otherwise, I see no reason not to merge this now; it mostly just inserts new things without affecting the existing structure.

@graeme-winter (Contributor):

@ndevenish it makes spot finding slower but better; that said, I am happy to see this merged (size change and all).

re: sizes, could we have 100 for 2D spots and 1000 for 3D?

@Anthchirp Anthchirp merged commit 137eff7 into master Jun 7, 2019
@Anthchirp Anthchirp deleted the improve_spotfinding branch June 7, 2019 11:02