Improve spotfinding #758
Conversation
One thing to note is that I found that the kernel size of 3x3 actually performed well in the majority of cases; however, one CCD dataset I had with really massive spots did better with a kernel of 5x5. I don't think it should require a kernel any larger than that in the majority of cases.
It's my understanding that we've explicitly rejected improving the spotfinding in the past because speed was a much higher priority. I'm assuming this will sit as an alternate method, but do you have any quantification of what the actual improvements are?
- Do the dispersion threshold
- Erode by half the kernel size
- Expand the kernel slightly
- Compute the local mean excluding dispersion-thresholded (i.e. non-background) pixels
- Compute the strong pixel mask
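The steps above can be sketched in NumPy/SciPy. This is a minimal illustration, not the DIALS implementation: the function name, the dispersion cut-off formula, the kernel expansion of +2, and the sigma defaults are all assumptions for the example, and the real code uses summed-area tables rather than `uniform_filter`.

```python
import numpy as np
from scipy import ndimage


def extended_dispersion_threshold(image, kernel=(3, 3), sigma_b=6.0, sigma_s=3.0):
    """Sketch of the extended dispersion threshold described above.

    The cut-off formula and sigma values are illustrative assumptions,
    not the production DIALS defaults.
    """
    image = image.astype(float)

    # 1. Dispersion threshold: flag pixels whose local index of dispersion
    #    (variance / mean) exceeds the Poisson expectation of ~1.
    local_mean = ndimage.uniform_filter(image, size=kernel)
    local_sq_mean = ndimage.uniform_filter(image * image, size=kernel)
    local_var = local_sq_mean - local_mean * local_mean
    n = kernel[0] * kernel[1]
    dispersion = np.divide(
        local_var, local_mean, out=np.zeros_like(image), where=local_mean > 0
    )
    disp_mask = dispersion > 1.0 + sigma_b * np.sqrt(2.0 / (n - 1))

    # 2. Erode by half the kernel size to shrink the mask back onto the spot.
    eroded = ndimage.binary_erosion(disp_mask, structure=np.ones(kernel))

    # 3. Recompute the local mean over a slightly expanded kernel,
    #    excluding dispersion-flagged (non-background) pixels.
    expanded = tuple(k + 2 for k in kernel)
    keep = (~disp_mask).astype(float)
    bg_sum = ndimage.uniform_filter(image * keep, size=expanded)
    bg_n = ndimage.uniform_filter(keep, size=expanded)
    bg_mean = np.divide(bg_sum, bg_n, out=np.zeros_like(image), where=bg_n > 0)

    # 4. Strong pixel mask: eroded pixels significantly above the background.
    return eroded & (image > bg_mean + sigma_s * np.sqrt(np.maximum(bg_mean, 0.0)))
```

On a flat background with one bright pixel, this flags exactly that pixel as strong while the dispersion mask alone would flag the surrounding half-kernel band as well.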
Force-pushed from a3b7c4f to 1e2e2fb.
@rjgildea Thanks for doing these tests. The results look positive.
Fixes the case of looking at old pickles which do not have the n_signal column; it was looked up when setting the defaults and failed.
I've now added a faster implementation and done a crude benchmark using the bag training data. The old spot-finding algorithm is obviously faster but extracts fewer spots, mainly due to filtering out small spots of < 3 pixels. Since the spots extracted by the new algorithm are generally larger, this is less of a problem. However, 4 spots were filtered out for having more than 1000 pixels. The new algorithm takes approximately 1.6 times as long to run but, as can be seen above, gives much better results.
The average of ten runs: old vs. new.
Fails for me with Eiger data in xia2:
@rjgildea asks "did you type make?" No, I did not.
FYI: for testing both algorithms:
On the understanding that we will add a background estimate in the future, I agree this is a good idea.
@@ -61,7 +61,7 @@ def generate_phil_scope():
           "to be accepted by the filtering algorithm."
           .type = int(value_min=1)

-    max_spot_size = 100
+    max_spot_size = 1000
Slightly worried that this might have a negative effect on the standard spotfinding?
max_spot_size=100 is overly conservative, especially in the context of smaller Eiger pixels (remember this is the number of spot pixels in 3 dimensions).
And I suppose rings picked up will either be noise or eliminated by the other filters.
Do we really want to move to this as the default immediately? Otherwise, I see no reason not to merge this now; it mostly just inserts new things without affecting the existing structure.
@ndevenish: it makes spot finding slower but better. That said, I am happy to see this merged (size change and all). Re: sizes, could we have 100 for 2D spots and 1000 for 3D?
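That suggestion could be sketched as a dimension-dependent default (a hypothetical helper for illustration, not part of this PR):

```python
def default_max_spot_size(scan_is_3d: bool) -> int:
    """Hypothetical helper following the suggestion in the thread:
    100 pixels for 2D (still) images, 1000 for 3D sweeps, since
    max_spot_size counts pixels across all three dimensions."""
    return 1000 if scan_is_3d else 100
```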
This pull request implements an extended dispersion threshold algorithm for spot finding to address #752.
The algorithm does the following steps:
The rationale for the algorithm is that the dispersion mask is computed by testing the null hypothesis that all the pixels in the local area are drawn from a Poisson distribution. If this hypothesis is rejected then the pixel is marked as True in the dispersion mask.
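The test can be illustrated numerically: for Poisson counts the index of dispersion (variance-to-mean ratio) is close to 1, while a local area straddling a strong spot mixes two intensity levels and inflates it. A small simulation (the intensity values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure background: Poisson counts, so variance / mean is close to 1
# and the null hypothesis is not rejected.
bg = rng.poisson(lam=10.0, size=10000)
print(bg.var() / bg.mean())  # close to 1

# A local area mixing background and spot pixels: the between-level
# spread inflates the variance far beyond the Poisson expectation.
mixed = np.concatenate([rng.poisson(10.0, 5000), rng.poisson(200.0, 5000)])
print(mixed.var() / mixed.mean())  # much greater than 1
```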
If a strong pixel is at the edge of the local area then the pixel at the centre of the box is marked True. This means that the dispersion image tends to contain the strong pixels plus a band of pixels 1/2 the kernel size around them. Therefore, for true strong spots we can safely erode the dispersion mask by 1/2 the kernel size. This results in a mask which pretty much fits the spot exactly.
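The erosion step can be demonstrated with `scipy.ndimage` on a toy mask, assuming a 3x3 kernel so the surrounding band is one pixel wide:

```python
import numpy as np
from scipy import ndimage

# A 3x3 spot flagged by the dispersion test grows a one-pixel band
# (half of a 3x3 kernel) around it, giving a 5x5 region in the mask...
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True  # spot plus its half-kernel band

# ...and eroding by half the kernel size recovers just the 3x3 spot.
eroded = ndimage.binary_erosion(mask, structure=np.ones((3, 3)))
print(eroded.sum())  # 9: only the 3x3 spot survives
```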
Finally, we want to select only the strong pixels (sometimes the eroded mask still incorporates weak pixels); however, the problem we had before is that we were using the mean calculated including the strong pixels. This was inevitable if we wanted to do the whole thing with a single summed area table. However, we can recompute the mean excluding the dispersion-masked pixels with a slightly expanded kernel to ensure we have plenty of pixels to choose from. This actually gives a pretty good estimate of the background, from which we can then threshold the strong pixels as done previously.
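The masked background mean can be sketched as a normalised convolution: filter the image with excluded pixels zeroed, filter the keep-mask, and divide. This is an illustration with `uniform_filter`, not the summed-area-table code:

```python
import numpy as np
from scipy import ndimage


def masked_local_mean(image, exclude, size=5):
    """Local mean over a size x size window, ignoring excluded
    (dispersion-flagged) pixels -- a sketch of the background estimate."""
    image = image.astype(float)
    keep = (~exclude).astype(float)
    total = ndimage.uniform_filter(image * keep, size=size)  # sum of kept pixels
    count = ndimage.uniform_filter(keep, size=size)          # fraction kept
    return np.where(count > 0, total / count, 0.0)
```

On a flat background with a masked-out bright pixel, the estimate stays at the background level everywhere, whereas a plain local mean would be pulled up near the spot.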
The end result is that we get much better segmentation of the strong spots. However, the algorithm is a bit slower (currently a lot slower, since I have only implemented the debug version; once we are happy that the algorithm is an improvement I will implement a more optimised version). Even once optimised, I think it will still require more passes over the image data.
However, if slower still means better then this may be acceptable.
Here are some examples of the output:
The raw data from an I23 image
The threshold from the same image
The spot finding results with strong pixels marked
I have also increased the maximum spot size to 1000 pixels (10x10x10), since this algorithm will tend to find larger spots.
Another nice side effect is that we could now get pretty decent background estimates from the spot finding at little extra cost, which might also make the change desirable.
Next steps are to test thoroughly, particularly on the data that @rjgildea has been looking at. Then, once we are satisfied, we need to optimise the code!