
Integrating detector background model variance #1692

Merged: 11 commits from integrating-detector-variance into main, May 24, 2021

Conversation

@dagewa (Member) commented May 7, 2021

The variance calculation for the "simple" background model with a Constant3dModel or a Constant2dModel appeared to be incorrect. It was being calculated as var[k] = mean[k] / (count - 1);, which means that if the background mean value is negative (which can happen with integrating detectors) then the variance can be set negative, which is clearly wrong.
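The failure mode can be seen in a small Python sketch (illustrative only; the actual DIALS code is C++, and the function name here is hypothetical). An estimate of this form implicitly assumes Poisson-like counts, so a negative background mean produces an impossible negative variance:

```python
def poisson_style_variance(pixels):
    """Variance estimate of the form mean / (count - 1), as in the original
    code. Negative whenever the background mean is negative."""
    n = len(pixels)
    mean = sum(pixels) / n
    return mean / (n - 1)

# Pedestal-subtracted data from an integrating detector can have negative bias:
pixels = [-3.0, -1.0, -2.0, 0.0]
print(poisson_style_variance(pixels))  # -0.5, an impossible variance
```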

This PR sets the variance to the actual sample variance of the background pixels using a standard one-pass variance algorithm. Two tests fail:

Results (5.93s):
       6 passed
       2 failed
         - test/algorithms/background/test_modeller.py:159 TestPoisson.test_constant2d_modeller
         - test/algorithms/background/test_modeller.py:181 TestPoisson.test_constant3d_modeller

I might need help from @jmp1985 in figuring out what these tests are supposed to show and why they fail.
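For reference, a minimal Python sketch of the kind of one-pass sample variance referred to above (illustrative, not DIALS code): it accumulates the sum and sum of squares in a single pass and, unlike the mean-based estimate, cannot go negative apart from rounding error. Note that the subtraction of two large, nearly equal accumulators is exactly the precision problem that motivates the later switch to Welford's algorithm in this PR.

```python
def one_pass_variance(pixels):
    """Naive one-pass sample variance (Bessel-corrected) from running sums."""
    n = 0
    s = 0.0   # running sum of values
    ss = 0.0  # running sum of squared values
    for x in pixels:
        n += 1
        s += x
        ss += x * x
    # Mathematically non-negative, but ss - s*s/n can lose precision.
    return (ss - s * s / n) / (n - 1)

print(one_pass_variance([4.0, 6.0, 5.0, 7.0, 3.0]))  # 2.5
```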

@dagewa dagewa requested a review from jmp1985 May 7, 2021 16:19
@dagewa (Member, Author) commented May 7, 2021

Perhaps what is meant by the variance of the background model here is actually the square of the standard error of the mean? In that case the value requires division by an additional factor of N. I would appreciate clarification regarding what the variances() of a background model are supposed to represent.

@dagewa (Member, Author) commented May 7, 2021

NB an additional division by count does indeed make those tests pass 🤔

@dagewa (Member, Author) commented May 7, 2021

Thinking about this some more, I think the further division by N makes sense. It implies that the variances() of the constant background model are not in fact sample variances of the values of the background pixels, but the squared standard error of the mean of those background pixels. This is equivalently the expected variance of the mean, so is correctly a measure of the uncertainty in the modelled background value. I've pushed a commit that makes that change. The tests all now pass. I'd like to look more closely into the effect on data processing before merging and maybe add an additional test that ensures the calculation does equal the SEM^2.
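A minimal Python sketch of the relationship described above (illustrative only, using the standard library rather than DIALS code): the squared standard error of the mean is the Bessel-corrected sample variance divided by a further factor of N, and it is this quantity that the constant model's variances() apparently reports.

```python
import statistics

def sem_squared(pixels):
    """Squared standard error of the mean: the Bessel-corrected sample
    variance divided by an additional factor of N."""
    return statistics.variance(pixels) / len(pixels)

vals = [4.0, 6.0, 5.0, 7.0, 3.0]
# sample variance = 2.5; SEM^2 = 2.5 / 5 = 0.5
print(sem_squared(vals))  # 0.5
```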

@dagewa dagewa requested a review from phyy-nx May 7, 2021 21:20
@dagewa (Member, Author) commented May 7, 2021

@phyy-nx this change affects anyone working with integrating detectors, when the "simple" background algorithm is used.

@phyy-nx (Member) commented May 7, 2021

Thanks, I've been following the thread loosely today. dials.stills_process overrides the background algorithm to simple, but enforces the linear2d method. Is that affected? I remember checking it pretty carefully and thinking it lined up with Leslie 1999 pretty well at the time.

@dagewa (Member, Author) commented May 7, 2021

Right, I haven't touched linear2d. I'll look through the other methods though to see that all are consistent.

@dagewa (Member, Author) commented May 10, 2021

Surprisingly, the new calculation makes absolutely no difference to the integration results. This implies that the variance calculated here is not actually used. Investigating further...

@jmp1985 (Contributor) commented May 11, 2021

> Thinking about this some more, I think the further division by N makes sense. It implies that the variances() of the constant background model are not in fact sample variances of the values of the background pixels, but the squared standard error of the mean of those background pixels. […]

Hi David

So the variance in this case is, as you suspected, the expected variance of the mean, so it looks like what you have implemented is correct.

> Thanks, I've been following the thread loosely today. dials.stills_process overrides the background algorithm to simple, but enforces the linear2d method. Is that affected? […]

That's right, the original implementation was basically done as described in the Leslie 1999 paper.

> Surprisingly the new calculation makes absolutely no difference to the integration results. This implies that the variance calculation made here is not actually used. Investigating further...

So the variance calculation should have an effect on the variance of the integrated intensities, is this not the case?

@jmp1985 (Contributor) commented May 11, 2021

Actually, taking a closer look at the integration code, the variance of the integrated intensities is calculated without reference to this variance (so I guess I implemented this here just to give a value for the variance in the absence of integration results).

See for example https://github.com/dials/dials/blob/main/algorithms/integration/sum/summation.h line 84.

In that implementation, the variance of the integrated intensity is given by |I_signal| + |I_background| * (1 + N_sigma / N_background), so it should never be negative, at least. I think I used the absolute values precisely because of the problem with negative backgrounds in ED data. I was never 100% happy with this, but it was the best solution I could think of at the time.
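A hypothetical Python paraphrase of the quoted formula (summation.h itself is C++; reading N_sigma as the number of signal/foreground pixels is an assumption made here for illustration). The absolute values keep the result positive even when the background estimate is negative:

```python
def summation_variance(i_signal, i_background, n_signal, n_background):
    """Sketch of the quoted variance formula for a summation-integrated
    intensity: |I_signal| + |I_background| * (1 + N_sigma / N_background).
    n_signal stands in for N_sigma (assumed to be the foreground pixel count).
    """
    return abs(i_signal) + abs(i_background) * (1.0 + n_signal / n_background)

# Even with a negative background estimate the result stays positive:
print(summation_variance(100.0, -20.0, 25, 100))  # 125.0
```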

Commit message:

> Use Welford variance calculation rather than the naive algorithm, which suffers from loss of precision. Use Bessel's correction for the sample variance.
>
> This leads to a small positive increase in values of refl["background.dispersion"]
@dagewa (Member, Author) commented May 13, 2021

Thanks @jmp1985 for the information. I'm developing a fuller understanding of what's going on in this code now. The calculation I changed previously was the estimation of the variances of the background model parameters, not the variances of the background model values; however, as these models are for a constant background, the two concepts coincide up to a scale factor of the number of pixels.

I added some explanatory comments and then moved on to another variance calculation. This time it is the variance of the background values that is being calculated, but only as an intermediate step to setting the refl["background.dispersion"] value. I changed this to also use the Welford one-pass algorithm, which does not suffer the serious loss of precision of the naive algorithm. I also included the standard bias correction for sample size. I believe this is necessary if one is ever going to compare background.dispersion between spots that have different numbers of background pixels. As a result, the background.dispersion values are now slightly larger than they were previously.
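The Welford update referred to above can be sketched as follows (illustrative Python, not the DIALS C++ implementation). Each observation updates a running mean and the accumulated sum of squared deviations, avoiding the catastrophic cancellation of the naive sum-of-squares approach; Bessel's correction is applied at the end:

```python
def welford_variance(pixels):
    """One-pass (Welford) sample variance with Bessel's correction."""
    n = 0
    mean = 0.0
    m2 = 0.0  # accumulated sum of squared deviations from the running mean
    for x in pixels:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both old and new mean
    return m2 / (n - 1)

print(welford_variance([4.0, 6.0, 5.0, 7.0, 3.0]))  # 2.5
```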

This change still has no effect on integrated intensity error estimation because, as you point out, that is done separately in summation.h and fitting.h. Changing those might be more complicated because at that point we don't know what type of background model was used, so instead of estimating the variance from the model fit we just assume each background pixel is a Poisson source.

@dagewa (Member, Author) commented May 19, 2021

After a bit more testing, it appears that moderately negative background values are already tolerated by dials.integrate. There are no more failures in integration when a zero pedestal is applied to data from the Ceta-D that has a negative bias than when a pedestal of, say, -100 is applied. However, more reflections are rejected during scaling when there is no pedestal.

I would like to continue making changes to the intensity error estimates for integrating detectors, but in a separate PR, as that would no longer amount to simply changing the background model variance. I believe the changes here are worth merging, as they improve the calculation of the background model parameter variances and of the refl["background.dispersion"] values, even though they have no effect on integration error estimates.

@ndevenish ndevenish merged commit 05ac7be into main May 24, 2021
@ndevenish ndevenish deleted the integrating-detector-variance branch May 24, 2021 15:05