
Integrating detector background model variance #1692

Merged: 11 commits from integrating-detector-variance into main, May 24, 2021

Conversation

@dagewa (Member) commented May 7, 2021

The variance calculation for the "simple" background model with a Constant3dModel or a Constant2dModel appeared to be incorrect. It was being calculated as var[k] = mean[k] / (count - 1);, which means that if the background mean value is negative (which can happen with integrating detectors) then the variance can be set negative, which is clearly wrong.
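The failure mode can be seen in a small Python sketch (illustrative only; the actual DIALS code is C++, and the function name here is hypothetical). An estimate of this form implicitly assumes Poisson-like counts, so a negative background mean produces an impossible negative variance:

```python
def poisson_style_variance(pixels):
    """Variance estimate of the form mean / (count - 1), as in the original
    code. Negative whenever the background mean is negative."""
    n = len(pixels)
    mean = sum(pixels) / n
    return mean / (n - 1)

# Pedestal-subtracted data from an integrating detector can have negative bias:
pixels = [-3.0, -1.0, -2.0, 0.0]
print(poisson_style_variance(pixels))  # -0.5, an impossible variance
```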

This PR sets the variance to the actual sample variance of the background pixels using a standard one-pass variance algorithm. Two tests fail:

Results (5.93s):
       6 passed
       2 failed
         - test/algorithms/background/test_modeller.py:159 TestPoisson.test_constant2d_modeller
         - test/algorithms/background/test_modeller.py:181 TestPoisson.test_constant3d_modeller

I might need help from @jmp1985 in figuring out what these tests are supposed to show and why they fail.
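For reference, a minimal Python sketch of the kind of one-pass sample variance referred to above (illustrative, not DIALS code): it accumulates the sum and sum of squares in a single pass and, unlike the mean-based estimate, cannot go negative apart from rounding error. Note that the subtraction of two large, nearly equal accumulators is exactly the precision problem that motivates the later switch to Welford's algorithm in this PR.

```python
def one_pass_variance(pixels):
    """Naive one-pass sample variance (Bessel-corrected) from running sums."""
    n = 0
    s = 0.0   # running sum of values
    ss = 0.0  # running sum of squared values
    for x in pixels:
        n += 1
        s += x
        ss += x * x
    # Mathematically non-negative, but ss - s*s/n can lose precision.
    return (ss - s * s / n) / (n - 1)

print(one_pass_variance([4.0, 6.0, 5.0, 7.0, 3.0]))  # 2.5
```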

@dagewa dagewa requested a review from jmp1985 May 7, 2021 16:19
@dagewa (Member, Author) commented May 7, 2021

Perhaps what is meant by the variance of the background model here is actually the square of the standard error of the mean? In that case the value requires division by an additional factor of N. I would appreciate clarification regarding what the variances() of a background model are supposed to represent.

@dagewa (Member, Author) commented May 7, 2021

NB an additional division by count does indeed make those tests pass 🤔

@dagewa (Member, Author) commented May 7, 2021

Thinking about this some more, I think the further division by N makes sense. It implies that the variances() of the constant background model are not in fact sample variances of the values of the background pixels, but the squared standard error of the mean of those background pixels. This is equivalently the expected variance of the mean, so is correctly a measure of the uncertainty in the modelled background value. I've pushed a commit that makes that change. The tests all now pass. I'd like to look more closely into the effect on data processing before merging and maybe add an additional test that ensures the calculation does equal the SEM^2.
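A minimal Python sketch of the relationship described above (illustrative only, using the standard library rather than DIALS code): the squared standard error of the mean is the Bessel-corrected sample variance divided by a further factor of N, and it is this quantity that the constant model's variances() apparently reports.

```python
import statistics

def sem_squared(pixels):
    """Squared standard error of the mean: the Bessel-corrected sample
    variance divided by an additional factor of N."""
    return statistics.variance(pixels) / len(pixels)

vals = [4.0, 6.0, 5.0, 7.0, 3.0]
# sample variance = 2.5; SEM^2 = 2.5 / 5 = 0.5
print(sem_squared(vals))  # 0.5
```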

@dagewa dagewa requested a review from phyy-nx May 7, 2021 21:20
@dagewa (Member, Author) commented May 7, 2021

@phyy-nx this change affects anyone working with integrating detectors, when the "simple" background algorithm is used.

@phyy-nx (Member) commented May 7, 2021

Thanks, I've been following the thread loosely today. dials.stills_process overrides the background algorithm to simple, but enforces the linear2d method. Is that affected? I remember checking it pretty carefully and thinking it lined up with Leslie 1999 pretty well at the time.

@dagewa (Member, Author) commented May 7, 2021

Right, I haven't touched linear2d. I'll look through the other methods though to see that all are consistent.

@dagewa (Member, Author) commented May 10, 2021

Surprisingly, the new calculation makes absolutely no difference to the integration results. This implies that the variance calculated here is not actually used. Investigating further...

@jmp1985 (Contributor) commented May 11, 2021

> Thinking about this some more, I think the further division by N makes sense. It implies that the variances() of the constant background model are not in fact sample variances of the values of the background pixels, but the squared standard error of the mean of those background pixels. […]

Hi David

So the variance in this case is, as you suspected, the expected variance of the mean, so it looks like what you have implemented is correct.

> Thanks, I've been following the thread loosely today. dials.stills_process overrides the background algorithm to simple, but enforces the linear2d method. Is that affected? […]

That's right, the original implementation was basically done as described in the Leslie 1999 paper.

> Surprisingly the new calculation makes absolutely no difference to the integration results. This implies that the variance calculation made here is not actually used. Investigating further...

So the variance calculation should have an effect on the variance of the integrated intensities, is this not the case?

@jmp1985 (Contributor) commented May 11, 2021

Actually, taking a closer look at the integration code, the variance of the integrated intensities is calculated without reference to this variance (so I guess I implemented this here just to give a value for the variance in the absence of integration results).

See for example https://github.com/dials/dials/blob/main/algorithms/integration/sum/summation.h line 84.

In that implementation, the variance of the integrated intensity is given by |I_signal| + |I_background| * (1 + N_sigma / N_background), so it should never be negative, at least. I think I used the absolute values precisely because of the problem with negative backgrounds in ED data. I was never 100% happy with this, but it was the best solution I could think of at the time.
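A hypothetical Python paraphrase of the quoted formula (summation.h itself is C++; reading N_sigma as the number of signal/foreground pixels is an assumption made here for illustration). The absolute values keep the result positive even when the background estimate is negative:

```python
def summation_variance(i_signal, i_background, n_signal, n_background):
    """Sketch of the quoted variance formula for a summation-integrated
    intensity: |I_signal| + |I_background| * (1 + N_sigma / N_background).
    n_signal stands in for N_sigma (assumed to be the foreground pixel count).
    """
    return abs(i_signal) + abs(i_background) * (1.0 + n_signal / n_background)

# Even with a negative background estimate the result stays positive:
print(summation_variance(100.0, -20.0, 25, 100))  # 125.0
```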

Commit message:

> Use Welford variance calculation rather than the naive algorithm, which suffers from loss of precision. Use Bessel's correction for the sample variance.
>
> This leads to a small positive increase in values of refl["background.dispersion"]
@dagewa (Member, Author) commented May 13, 2021

Thanks @jmp1985 for the information. I'm developing a fuller understanding of what's going on in this code now. The calculation I changed previously was the estimation of the variances of the background model parameters, not the variances of the background model values; however, as these models are for a constant background, the two concepts coincide up to a scale factor of the number of pixels.

I added some explanatory comments and then moved on to another variance calculation. This time it is the variance of the background values that is being calculated, but only as an intermediate step to setting the refl["background.dispersion"] value. I changed this to also use the Welford one-pass algorithm, which does not suffer the serious loss of precision of the naive algorithm. I also included the standard bias correction for sample size. I believe this is necessary if one is ever going to compare background.dispersion between spots that have different numbers of background pixels. As a result, the background.dispersion values are now slightly larger than they were previously.
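The Welford update referred to above can be sketched as follows (illustrative Python, not the DIALS C++ implementation). Each observation updates a running mean and the accumulated sum of squared deviations, avoiding the catastrophic cancellation of the naive sum-of-squares approach; Bessel's correction is applied at the end:

```python
def welford_variance(pixels):
    """One-pass (Welford) sample variance with Bessel's correction."""
    n = 0
    mean = 0.0
    m2 = 0.0  # accumulated sum of squared deviations from the running mean
    for x in pixels:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both old and new mean
    return m2 / (n - 1)

print(welford_variance([4.0, 6.0, 5.0, 7.0, 3.0]))  # 2.5
```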

This change still has no effect on integrated intensity error estimation because, as you point out, that is done separately in summation.h and fitting.h. Changing those might be more complicated because at that point we don't know what type of background model was used, so instead of estimating the variance from the model fit we just assume each background pixel is a Poisson source.

@dagewa (Member, Author) commented May 19, 2021

After a bit more testing, it appears that moderately negative background values are already tolerated by dials.integrate. There are no more failures in integration when a zero pedestal is applied to data from the Ceta-D that has a negative bias than when a pedestal of, say, -100 is applied. However, more reflections are rejected during scaling when there is no pedestal.

I would like to continue making changes to the intensity error estimates for integrating detectors, but in a separate PR, as that would no longer amount to simply changing the background model variance. I believe the changes here are worth merging, as they improve the calculation of the background model parameter variances and of the refl["background.dispersion"] values, even though they have no effect on integration error estimates.

@ndevenish ndevenish merged commit 05ac7be into main May 24, 2021
@ndevenish ndevenish deleted the integrating-detector-variance branch May 24, 2021 15:05