Improve handling of profile/sum intensities for scaling. #1300

jbeilstenedmands · 2020-06-12T08:24:14Z

When intensity_choice=combine is used in scaling (the default), also use reflections for which only one of the profile and summation intensity estimates is available: combine the estimates if both are available, otherwise use the one estimate that succeeded.
The current behaviour is that only reflections with both profile and summation estimates are used.
This part of the scaling code could do with some refactoring, but I am leaving that for now to keep the changeset clean in case we want to add this to current releases.

If only one estimate present, use that if intensity_choice=combine
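
A minimal sketch of the selection rule described above, in plain Python. The names `Reflection`, `combine_placeholder` and `choose_intensity` are hypothetical and are not the DIALS API; in particular, the simple average stands in for whatever weighting the scaling code actually applies when both estimates exist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reflection:
    # Hypothetical container: None means that estimate failed.
    profile: Optional[float]
    summation: Optional[float]


def combine_placeholder(profile: float, summation: float) -> float:
    # Assumption: a plain average, used only as a stand-in for the
    # real profile/summation combination performed during scaling.
    return 0.5 * (profile + summation)


def choose_intensity(refl: Reflection, require_both: bool = False) -> Optional[float]:
    """Return the intensity to use for scaling, or None to exclude the reflection."""
    if refl.profile is not None and refl.summation is not None:
        return combine_placeholder(refl.profile, refl.summation)
    if require_both:
        # Previous behaviour: reflections with a single estimate are dropped.
        return None
    # Behaviour described in this PR: fall back to whichever estimate succeeded.
    return refl.profile if refl.profile is not None else refl.summation


reflections = [
    Reflection(10.0, 12.0),   # both estimates available
    Reflection(None, 8.0),    # profile fitting failed
    Reflection(5.0, None),    # summation failed
    Reflection(None, None),   # nothing usable
]
previous = [choose_intensity(r, require_both=True) for r in reflections]
proposed = [choose_intensity(r) for r in reflections]
```

With the four example reflections, the previous rule keeps only the first, while the proposed rule keeps the first three; a reflection with neither estimate is excluded in both cases.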

jbeilstenedmands · 2020-06-12T12:58:49Z

For reference, testing on the X4-wide dataset with the change from #1297

masked_refl_profile_fitting + master scaling, d_min=1.1

                             ----------Merging statistics----------                    
        
        Resolution: 42.36 - 1.10
        Observations: 75598
        Unique reflections: 14700
        Redundancy: 5.1
        Completeness: 96.66%
        Mean intensity: 320.1
        Mean I/sigma(I): 5.0
        R-merge: 0.102
        R-meas:  0.113
        R-pim:   0.047
        
        
        Statistics by resolution bin:
         d_max  d_min   #obs  #uniq   mult.  %comp       <I>  <I/sI>    r_mrg   r_meas    r_pim   cc1/2   cc_ano
         42.39   2.99   4696    864    5.44 100.00    2905.7    40.3    0.040    0.044    0.018   0.999*   0.650*
          2.99   2.37   4295    792    5.42  99.50     992.9    16.7    0.085    0.094    0.039   0.993*   0.344*
          2.37   2.07   4652    782    5.95 100.00     669.5    11.4    0.120    0.131    0.052   0.984*  -0.061
          2.07   1.88   4401    773    5.69  99.87     348.2     6.3    0.160    0.176    0.071   0.963*  -0.044
          1.88   1.75   4228    771    5.48 100.00     217.0     4.2    0.197    0.216    0.087   0.961*  -0.212
          1.75   1.64   4438    756    5.87 100.00     163.4     3.1    0.212    0.232    0.093   0.950*  -0.127
          1.64   1.56   4392    756    5.81 100.00     109.3     2.1    0.249    0.273    0.110   0.929*  -0.114
          1.56   1.49   3989    768    5.19  99.48      73.3     1.4    0.303    0.335    0.141   0.900*  -0.233
          1.49   1.44   4131    752    5.49 100.00      59.7     1.1    0.328    0.362    0.152   0.887*  -0.215
          1.44   1.39   4309    750    5.75  99.87      47.4     0.9    0.393    0.432    0.176   0.832*  -0.053
          1.39   1.34   4322    752    5.75 100.00      30.7     0.6    0.481    0.527    0.213   0.844*  -0.040
          1.34   1.30   3854    747    5.16  99.73      22.9     0.4    0.575    0.638    0.271   0.789*  -0.132
          1.30   1.27   3863    739    5.23  99.60      21.3     0.4    0.654    0.726    0.310   0.670*  -0.052
          1.27   1.24   4114    741    5.55 100.00      17.4     0.3    0.751    0.827    0.342   0.738*  -0.051
          1.24   1.21   4180    741    5.64  99.60      13.9     0.2    0.937    1.030    0.421   0.680*   0.013
          1.21   1.18   3814    749    5.09 100.00      12.4     0.2    1.050    1.172    0.507   0.684*  -0.047
          1.18   1.16   2845    713    3.99  95.07       9.7     0.1    1.352    1.554    0.743   0.534*  -0.052
          1.16   1.14   2074    640    3.24  87.19       8.2     0.1    1.374    1.617    0.825   0.546*   0.016
          1.14   1.12   1663    579    2.87  79.10       5.5     0.1    2.031    2.415    1.275   0.471*   0.133
          1.12   1.10   1338    535    2.50  70.58       4.3     0.0    2.190    2.671    1.492   0.290*  -0.008
         42.36   1.10  75598  14700    5.14  96.66     320.1     5.0    0.102    0.113    0.047   0.998*   0.363*

masked_refl_profile_fitting + this PR, d_min=1.1

                     ----------Merging statistics----------                    
        
        Resolution: 42.36 - 1.10
        Observations: 91798
        Unique reflections: 14838
        Redundancy: 6.2
        Completeness: 97.57%
        Mean intensity: 297.7
        Mean I/sigma(I): 4.3
        R-merge: 0.150
        R-meas:  0.163
        R-pim:   0.063
        
        
        Statistics by resolution bin:
         d_max  d_min   #obs  #uniq   mult.  %comp       <I>  <I/sI>    r_mrg   r_meas    r_pim   cc1/2   cc_ano
         42.39   2.99   4990    868    5.75 100.00    2848.2    37.2    0.049    0.054    0.022   0.998*   0.521*
          2.99   2.37   4975    800    6.22 100.00     908.3    14.1    0.128    0.140    0.055   0.983*   0.265*
          2.37   2.07   5134    782    6.57 100.00     631.5     9.6    0.163    0.177    0.067   0.971*  -0.141
          2.07   1.88   5166    776    6.66 100.00     305.1     5.1    0.250    0.270    0.102   0.940*  -0.087
          1.88   1.75   5124    772    6.64 100.00     174.8     3.2    0.346    0.374    0.140   0.897*  -0.316
          1.75   1.64   5172    756    6.84 100.00     134.7     2.4    0.342    0.370    0.138   0.900*  -0.082
          1.64   1.56   5166    756    6.83 100.00      87.1     1.6    0.402    0.435    0.163   0.800*  -0.145
          1.56   1.49   5253    773    6.80 100.00      48.7     1.0    0.532    0.576    0.218   0.636*  -0.205
          1.49   1.44   5130    752    6.82 100.00      45.0     0.8    0.510    0.550    0.205   0.770*  -0.085
          1.44   1.39   5133    751    6.83 100.00      38.1     0.7    0.548    0.593    0.224   0.656*  -0.053
          1.39   1.34   5125    752    6.82 100.00      25.2     0.5    0.627    0.678    0.253   0.655*  -0.129
          1.34   1.30   5147    750    6.86 100.00      16.4     0.3    0.773    0.836    0.314   0.711*  -0.154
          1.30   1.27   5048    744    6.78 100.00      15.7     0.3    0.839    0.910    0.346   0.539*  -0.121
          1.27   1.24   5039    741    6.80 100.00      14.1     0.2    0.964    1.044    0.396   0.429*  -0.111
          1.24   1.21   5069    744    6.81 100.00      11.4     0.2    1.079    1.167    0.439   0.660*   0.008
          1.21   1.18   4740    749    6.33 100.00      10.0     0.2    1.244    1.356    0.530   0.677*  -0.061
          1.18   1.16   3623    726    4.99  96.80       8.2     0.1    1.518    1.695    0.735   0.592*  -0.010
          1.16   1.14   2770    660    4.20  89.92       7.1     0.1    1.690    1.924    0.890   0.484*  -0.002
          1.14   1.12   2180    606    3.60  82.79       4.6     0.0    2.587    3.004    1.474   0.354*  -0.032
          1.12   1.10   1814    580    3.13  76.52       3.3     0.0    2.818    3.339    1.739   0.259*   0.062
         42.36   1.10  91798  14838    6.19  97.57     297.7     4.3    0.150    0.163    0.063   0.996*   0.250*

So the total number of observations increases from 75598 to 91798, but the merging stats look worse, although I don't like to read too much into the quality of the X4-wide dataset. More suitable datasets to follow.

jbeilstenedmands · 2020-06-12T15:22:50Z

Okay, so I ran this on the I23 FutA dataset from #1291, with #1297 also applied.

For AUTOMATIC/DEFAULT/SAD                    Overall    Low     High
High resolution limit                           1.83    4.95    1.83
Low resolution limit                           75.98   76.04    1.86
Completeness                                   61.7    95.7    43.5
Multiplicity                                    5.6     5.7     3.4
I/sigma                                        12.0    50.7     0.6
Rmerge(I)                                     0.120   0.054   0.309
Rmerge(I+/-)                                  0.116   0.051   0.277
Rmeas(I)                                      0.133   0.060   0.366
Rmeas(I+/-)                                   0.141   0.062   0.392
Rpim(I)                                       0.055   0.025   0.193
Rpim(I+/-)                                    0.080   0.035   0.277
CC half                                       0.997   0.999   0.866
Wilson B factor                              20.910
Anomalous completeness                         61.3    95.8    42.6
Anomalous multiplicity                          2.9     3.0     1.7
Anomalous correlation                        -0.296  -0.209  -0.438
Anomalous slope                               0.308
dF/F                                          0.097
dI/s(dI)                                      0.546
Total observations                           167168   13601    3564
Total unique                                  29741    2383    1051
Assuming spacegroup: P 1 2 1
Unit cell (with estimated std devs):
57.2912(17)  75.9776(18) 64.377(4)
90.0        101.306(5)   90.0

Running dials.merge on the scaled files with a comparable d_min=1.93 to compare to #1291:

                     ----------Merging statistics----------                    
        
        Resolution: 75.98 - 1.93
        Observations: 148992
        Unique reflections: 26272
        Redundancy: 5.7
        Completeness: 64.37%
        Mean intensity: 6443.2
        Mean I/sigma(I): 13.2
        R-merge: 0.117
        R-meas:  0.130
        R-pim:   0.054
        
        
        Statistics by resolution bin:
         d_max  d_min   #obs  #uniq   mult.  %comp       <I>  <I/sI>    r_mrg   r_meas    r_pim   cc1/2   cc_ano
         76.04   5.24  11205   2002    5.60  95.29   16766.7    51.4    0.051    0.057    0.024   0.999*  -0.108
          5.24   4.16  10239   1894    5.41  91.94   19282.9    38.9    0.074    0.082    0.034   0.990*  -0.193
          4.16   3.63   9664   1856    5.21  90.10   20140.8    28.1    0.101    0.113    0.049   0.999*  -0.310
          3.63   3.30   9137   1699    5.38  82.88   11166.9    18.7    0.124    0.138    0.057   0.987*  -0.365
          3.30   3.06   7940   1465    5.42  71.74    7072.7    12.6    0.130    0.143    0.059   0.988*  -0.466
          3.06   2.88   8179   1397    5.85  68.15    4624.6     9.3    0.153    0.167    0.066   0.985*  -0.507
          2.88   2.74   7621   1391    5.48  68.42    3500.7     6.9    0.158    0.173    0.070   0.987*  -0.436
          2.74   2.62   8001   1320    6.06  65.12    2996.8     6.6    0.192    0.209    0.082   0.974*  -0.259
          2.62   2.52   7767   1302    5.97  63.48    2076.9     4.5    0.175    0.191    0.075   0.981*  -0.596
          2.52   2.43   7056   1227    5.75  60.47    1829.0     3.8    0.193    0.212    0.086   0.972*  -0.525
          2.43   2.36   7443   1197    6.22  59.58    1697.7     3.6    0.198    0.215    0.084   0.970*  -0.550
          2.36   2.29   6957   1133    6.14  55.76    1460.3     3.1    0.220    0.240    0.093   0.972*  -0.442
          2.29   2.23   6072   1154    5.26  56.40    2045.4     3.7    0.374    0.418    0.184   0.806*  -0.141
          2.23   2.17   7073   1106    6.40  54.06    1204.4     2.5    0.196    0.213    0.082   0.978*  -0.393
          2.17   2.12   6528   1066    6.12  53.35    1006.9     2.1    0.296    0.322    0.125   0.951*  -0.365
          2.12   2.08   5702   1070    5.33  52.79    1031.0     2.0    0.412    0.456    0.191   0.807*  -0.078
          2.08   2.04   5330   1031    5.17  50.86    1156.6     2.1    0.480    0.539    0.240   0.686*  -0.219
          2.04   2.00   6793   1025    6.63  50.12     578.8     1.3    0.288    0.312    0.118   0.961*  -0.356
          2.00   1.96   5643    954    5.92  47.27     402.0     0.9    0.394    0.433    0.173   0.712*  -0.330
          1.96   1.93   4642    983    4.72  47.88     520.2     1.0    0.501    0.562    0.249   0.661*  -0.239
         75.98   1.93 148992  26272    5.67  64.37    6443.2    13.2    0.117    0.130    0.054   0.997*  -0.296

I.e. for d_min=1.93 we now have 148992 reflections, compared to 64696 for xia2-3dii and 26876 for xia2-dials master. However, the lack of anomalous signal still seems to be an issue.

jbeilstenedmands · 2020-06-12T15:46:05Z

Comparing beta-lactamase dataset.

316116 (PR) vs 313539 (master) total reflections. Stats look slightly worse for this PR, although I guess that is expected: by including data that are perhaps not quite as good/reliable (in this case, reflections where profile fitting failed but summation was successful), there is less overall agreement/precision within the data.

Anthchirp · 2020-06-15T08:28:43Z

util/filter_reflections.py

+    elif all(i in intensity_choice for i in ["sum | profile"]):
+        reducer = SumORPrfIntensityReducer


Would this be clearer?

Suggested change

-    elif all(i in intensity_choice for i in ["sum | profile"]):
-        reducer = SumORPrfIntensityReducer
+    elif {"sum | profile"} <= set(intensity_choice):
+        reducer = SumORPrfIntensityReducer

Also, does intensity_choice need to be a list, or can it be a genuine set, given that as far as I can tell we don't care about the order of elements?

jbeilstenedmands · 2020-06-15

True, this can be a set rather than a list.
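
For illustration, a small self-contained check that the two forms of the condition discussed in this review agree when intensity_choice is a list or a set of strings. The helper names `matches_all` and `matches_subset` are hypothetical and exist only for this comparison.

```python
def matches_all(intensity_choice, required):
    # Original form: every required entry must be a member of intensity_choice.
    return all(item in intensity_choice for item in required)


def matches_subset(intensity_choice, required):
    # Suggested form: the required entries must be a subset of intensity_choice.
    return set(required) <= set(intensity_choice)


required = ["sum | profile"]
candidates = [
    ["sum | profile"],
    ["profile"],
    ["sum", "profile"],
    [],
    {"sum | profile", "scale"},
]
results = [
    (matches_all(choice, required), matches_subset(choice, required))
    for choice in candidates
]
```

One caveat: if intensity_choice were a plain string rather than a list or set, the two forms would differ, because `in` on a string is a substring test while `set()` on a string yields its individual characters.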

jbeilstenedmands added 4 commits June 12, 2020 09:19

Update handling of prf/sum reflections for scaling.

b77ccd1

If only one estimate present, use that if intensity_choice=combine

Update tests for handling of prf/sum in scaling

929e9db

Fix tests and code for prf/sum handling for scaling

121f6f5

Fix logging issue

e06f64f

Add back removed function, fix function name call.

dade496

jbeilstenedmands marked this pull request as ready for review June 15, 2020 08:19

Anthchirp reviewed Jun 15, 2020


jbeilstenedmands added 2 commits June 18, 2020 10:21

Add tests to verify new behaviour

67ea4df

Add newsfragment. Fixes #1293

64361f6

jbeilstenedmands merged commit 9d2b5d4 into master Jun 18, 2020

rjgildea mentioned this pull request Jul 20, 2020

dials.integrate: should not copy input to output #1342

Closed

Anthchirp deleted the scaling_intensity_handling branch October 2, 2020 15:25
