
dials.integrate background.algorithm=simple for integrating detectors #706

Closed
dagewa opened this issue Feb 13, 2019 · 16 comments

dagewa commented Feb 13, 2019

The default background.algorithm=glm is the best choice for photon-counting detectors; however, my experience suggests that background.algorithm=simple is more appropriate for integrating detectors. The reason seems to be the inverse of the logic that makes the GLM approach better for counts: pixel values on integrating detectors are better described by a normal distribution than by a Poisson distribution. Assuming a Poisson distribution tends to overestimate the background level, so integrating beyond the resolution limit produces intensities with a negative bias. This affects not just CCDs but other types of integrating detector too, such as certain detectors used in electron diffraction and, most likely, detectors used at XFELs, like the CS-PAD.
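As a toy illustration of that bias (this is not the actual DIALS GLM code): if the true background is normal with a small mean, then forcing non-negativity, as any Poisson-based model implicitly does, shifts the estimate upwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy background from an integrating detector: roughly normal around a small
# mean, so negative pixel values are legitimate measurements.
pixels = rng.normal(loc=2.0, scale=3.0, size=100_000)

# Under a normal model the background estimate is just the sample mean.
normal_estimate = pixels.mean()

# A Poisson model cannot represent negative values; crudely emulating that
# restriction by clipping at zero biases the estimate upwards.
poisson_like_estimate = np.clip(pixels, 0.0, None).mean()

print(normal_estimate)        # close to the true mean of 2.0
print(poisson_like_estimate)  # noticeably higher
```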

At the moment the background algorithm is a user choice, but very few users know that the default may not be the most appropriate one. Are there ways we could make this choice automatically? It seems like detector_helper_sensors could be put to work here (though it is not a one-to-one mapping; some chips can operate in either counting or integrating mode).

phyy-nx commented Feb 13, 2019 via email

dagewa commented Feb 14, 2019

Good point - an inappropriate default can cut both ways. Auto should solve most issues for detectors we know about. For detector types that could be either counting or integrating, there is some chance the appropriate response model would be recorded in metadata (he says, while fighting natural pessimism).

dagewa commented Feb 28, 2019

@jmp1985 wondering if you have an opinion here?

jmp1985 commented Feb 28, 2019

@dagewa I agree that having an auto which knows about the detector type would be useful. In general, if the pixels can assume negative values then the simple method may be the best solution; it is hard to argue that such values could be correctly modelled by a Poisson distribution (in particular, thinking of those data you had with average background < 0!)

dagewa commented Mar 5, 2019

It sounds like, to do this right, we need something in addition to the sensor type property: at minimum an integrating-vs-counting choice that is set by the format class and then used by background.algorithm=auto within dials.integrate.

Even better would be a measure of detector response statistics. Back in Waterman & Evans (2010) we summarised this by a 'cascade factor' γ and 'pixel factor' ψ, and showed how the expressions for integrated intensity errors could be extended to take these into account. For a (true) photon counting detector γ=1 and ψ=0, so that's easy. Unfortunately for integrating detectors the actual values of these parameters are not well-known and can be difficult to measure.

It might be reasonable to add these parameters to the detector model anyway, and use γ≠1 simply as a flag to determine the behaviour of background.algorithm=auto without yet extending the integrated intensity error estimates. At the moment, the choice of background algorithm can be critical and can be the difference between successful integration or not for data from some types of detector.

I'd welcome other opinions on this.

phyy-nx commented Mar 6, 2019 via email

dagewa commented Mar 7, 2019

For a Rayonix CCD module we measured a cascade factor γ=1.41, which is not too far from the γ=1.54 obtained by simulation from manufacturer specs. The pixel factor used in the simulation was ψ=4.38. We didn't compare this with measurement, but the true value is probably close, as this factor depends mainly on read noise, which manufacturers publish. ψ has a lower limit of 1/12 set by digitisation error.
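That 1/12 figure is just the variance of the uniform rounding error introduced by digitisation, which is easy to verify numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Digitisation rounds the true signal to the nearest ADU, so the rounding
# error is uniform on [-0.5, 0.5) ADU, whose variance is 1/12.
signal = rng.uniform(0.0, 1000.0, size=1_000_000)
error = np.round(signal) - signal

print(error.var())  # approximately 1/12 ≈ 0.0833
```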

I don't know what the defaults should be for other detectors though. It would be quite a bit of work to figure them all out. CCDs are probably fairly similar, but we can't be sure without some experiments (which no one is likely to do).

Mainly the effect would be to mitigate the underestimation of integrated intensity errors rather than affect the intensities themselves, although profile fitting results would be affected slightly as the errors are used during the fit. It is already well known that the errors on integrated intensities are wrong, which is usually 'dealt with' by inflating them during scaling (see SdFac, SdAdd and SdB in Aimless). By providing less bad initial estimates the main effect would probably be that the inflation factor becomes smaller. So, I think the effect might be fairly minor, but at least the model would be more realistic (unless the factors are given crazy values). Perhaps it would even be possible to optimise the parameters for particular detector types according to how much the scaling program has to inflate errors afterwards, avoiding a lot of difficult experiments.

I suspect the value of investing much time in this is limited though, at least for CCDs (CS PAD etc. may be a different matter). At the moment I would be more interested in using this merely as a flag to indicate when a detector was integrating, not counting, and the simple background algorithm should be switched on.

phyy-nx commented Mar 7, 2019 via email

graeme-winter commented Mar 7, 2019 via email

@dagewa dagewa added this to the DIALS 2.0 milestone May 2, 2019
dagewa commented May 2, 2019

I agree we should keep things simple at first, starting with an integrating vs counting enum, allowing future expansion. This should be set by the format class, depending on the detector type and mode of operation. This alone will allow background.algorithm=auto to be implemented.

The current default behaviour is inappropriate for integrating detectors such as CCDs and the CS-PAD, so I see this as an important enhancement, and I would suggest a useful feature to have in place for DIALS 2.0.

dagewa commented May 2, 2019

@jmp1985 I am happy to start the work here as I raised the issue, but it'd be good to have a conversation first about how to go about it.

jmp1985 commented May 2, 2019

Hi @dagewa. I will pick this up when I am back in Diamond next week.

jmp1985 commented May 7, 2019

Hi @dagewa. I think it would be very useful in the long term to have the more thorough description you suggest.

Another simple way of selecting with auto would be to query the trusted range of the detector. Counting detectors should not have negative pixel values in the trusted range, so we could use glm in that case and simple otherwise. Or are there cases where the pixel values are non-negative but we still cannot assume they are Poisson distributed?
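A minimal sketch of that heuristic (the function name is illustrative, not a DIALS API; it assumes the trusted range is available as a (min, max) pair, as on dxtbx panels):

```python
def choose_background_algorithm(trusted_range):
    """Pick a background algorithm from a detector's (min, max) trusted range.

    Heuristic from this thread: a counting detector should never report
    negative values, so a trusted range extending below zero suggests an
    integrating detector, for which the 'simple' algorithm is preferred.
    """
    lower, _ = trusted_range
    return "glm" if lower >= 0 else "simple"

print(choose_background_algorithm((0, 65535)))     # -> glm
print(choose_background_algorithm((-100, 65535)))  # -> simple
```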

dagewa commented May 7, 2019

@jmp1985 Interesting point. For CCDs and certain types of detector used in electron microscopes it is true that negative values are 'allowed' measurements, assuming that any applied bias or pedestal level has been subtracted from the data first. In those cases it is certainly true that the trusted_range should start from a negative value, if it is set correctly.

It would be interesting to query the registry and look at what trusted_range is currently set to for each Format. I wonder if we are setting this appropriately in each case.

This wouldn't work in every case though. The mixed-mode detector @phyy-nx mentions would be an example where the assumption breaks down.

dagewa commented May 7, 2019

After a quick look I think it is clear we don't handle trusted_range properly, or at least we can't trust it to identify integrating detectors. There are image plates and CCDs where trusted_range is set to start from 0 (see FormatRAXISII.py, FormatDIP2030b.py and FormatSMVRigakuSaturn.py), and some weird logic is used for SMV ADSC, where raw_data has the image pedestal value subtracted from it but underload = pedestal - 1.

@dagewa dagewa added this to To do in DIALS 2.0 via automation Jul 2, 2019
@dagewa dagewa modified the milestones: DIALS 2.0, DIALS 2.1 Jul 5, 2019
@dagewa dagewa removed this from To do in DIALS 2.0 Jul 5, 2019
dagewa commented Dec 5, 2019

It seems the proposals here got lost by becoming too complicated.

How about I start by creating background.algorithm=auto as the default option, setting it to glm if gain == 1 and simple otherwise? I assert that background.algorithm=glm is not what you want when gain != 1.
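A sketch of that gain-based resolution (a hypothetical helper for illustration, not the code that was eventually committed):

```python
def resolve_background_algorithm(requested, gain):
    """Resolve the background algorithm as proposed above.

    'requested' is the user's choice ('auto', 'glm' or 'simple'); an explicit
    choice always wins, and 'auto' picks glm only for gain == 1.
    """
    if requested != "auto":
        return requested
    return "glm" if gain == 1.0 else "simple"

print(resolve_background_algorithm("auto", 1.0))    # -> glm
print(resolve_background_algorithm("auto", 0.8))    # -> simple
print(resolve_background_algorithm("simple", 1.0))  # -> simple
```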

dagewa added a commit that referenced this issue Dec 6, 2019
Choose between glm and simple methods depending on whether the
detector appears to be counting or not (#706)
dagewa added a commit that referenced this issue Dec 16, 2019
@dagewa dagewa closed this as completed in e8eafc9 Dec 18, 2019