Improve handling of profile/sum intensities for scaling. #1300

Merged: 7 commits, Jun 18, 2020

Contributor

@jbeilstenedmands jbeilstenedmands commented Jun 12, 2020

When using intensity_choice=combine in scaling (the default), also use reflections for which only one of the profile/summation intensity estimates is available, i.e. combine the estimates if both are available, otherwise just use the one estimate that succeeded.
The current behaviour is that only reflections with both profile and summation estimates are used.
This part of the scaling code could do with some refactoring; however, I am leaving that for now to give a clean changeset in case we want to add this to the current releases.
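
As a rough illustration of the selection rule described above (not the actual dials.scale code), the following sketch keeps every reflection that has at least one successful estimate. The function name, argument names and boolean success flags are hypothetical, and the inverse-variance weighted mean is only a stand-in for whatever combination the scaling code really applies when both estimates exist.

```python
import numpy as np


def select_intensities(i_prf, var_prf, i_sum, var_sum, has_prf, has_sum):
    """Pick one intensity per reflection (hypothetical helper, not DIALS API).

    has_prf / has_sum flag whether profile fitting / summation succeeded.
    """
    i_prf, var_prf = np.asarray(i_prf, float), np.asarray(var_prf, float)
    i_sum, var_sum = np.asarray(i_sum, float), np.asarray(var_sum, float)
    has_prf, has_sum = np.asarray(has_prf, bool), np.asarray(has_sum, bool)

    both = has_prf & has_sum
    only_prf = has_prf & ~has_sum
    only_sum = has_sum & ~has_prf

    # Reflections with no successful estimate stay as NaN.
    intensity = np.full(i_prf.shape, np.nan)
    variance = np.full(i_prf.shape, np.nan)

    # Both estimates available: combine them.
    # ASSUMPTION: inverse-variance weighting is a placeholder combination rule.
    w_prf = 1.0 / var_prf[both]
    w_sum = 1.0 / var_sum[both]
    intensity[both] = (w_prf * i_prf[both] + w_sum * i_sum[both]) / (w_prf + w_sum)
    variance[both] = 1.0 / (w_prf + w_sum)

    # Only one estimate available: use it as-is (the behaviour this PR adds).
    intensity[only_prf], variance[only_prf] = i_prf[only_prf], var_prf[only_prf]
    intensity[only_sum], variance[only_sum] = i_sum[only_sum], var_sum[only_sum]

    usable = has_prf | has_sum
    return intensity, variance, usable
```

Under the previous behaviour, only the `both` subset would have been passed on to scaling; here the `usable` mask also covers the `only_prf` and `only_sum` cases.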

@jbeilstenedmands
Contributor Author

For reference, testing on the X4-wide dataset with the change from #1297

masked_refl_profile_fitting + master scaling, d_min=1.1

                             ----------Merging statistics----------                    
        
        Resolution: 42.36 - 1.10
        Observations: 75598
        Unique reflections: 14700
        Redundancy: 5.1
        Completeness: 96.66%
        Mean intensity: 320.1
        Mean I/sigma(I): 5.0
        R-merge: 0.102
        R-meas:  0.113
        R-pim:   0.047
        
        
        Statistics by resolution bin:
         d_max  d_min   #obs  #uniq   mult.  %comp       <I>  <I/sI>    r_mrg   r_meas    r_pim   cc1/2   cc_ano
         42.39   2.99   4696    864    5.44 100.00    2905.7    40.3    0.040    0.044    0.018   0.999*   0.650*
          2.99   2.37   4295    792    5.42  99.50     992.9    16.7    0.085    0.094    0.039   0.993*   0.344*
          2.37   2.07   4652    782    5.95 100.00     669.5    11.4    0.120    0.131    0.052   0.984*  -0.061
          2.07   1.88   4401    773    5.69  99.87     348.2     6.3    0.160    0.176    0.071   0.963*  -0.044
          1.88   1.75   4228    771    5.48 100.00     217.0     4.2    0.197    0.216    0.087   0.961*  -0.212
          1.75   1.64   4438    756    5.87 100.00     163.4     3.1    0.212    0.232    0.093   0.950*  -0.127
          1.64   1.56   4392    756    5.81 100.00     109.3     2.1    0.249    0.273    0.110   0.929*  -0.114
          1.56   1.49   3989    768    5.19  99.48      73.3     1.4    0.303    0.335    0.141   0.900*  -0.233
          1.49   1.44   4131    752    5.49 100.00      59.7     1.1    0.328    0.362    0.152   0.887*  -0.215
          1.44   1.39   4309    750    5.75  99.87      47.4     0.9    0.393    0.432    0.176   0.832*  -0.053
          1.39   1.34   4322    752    5.75 100.00      30.7     0.6    0.481    0.527    0.213   0.844*  -0.040
          1.34   1.30   3854    747    5.16  99.73      22.9     0.4    0.575    0.638    0.271   0.789*  -0.132
          1.30   1.27   3863    739    5.23  99.60      21.3     0.4    0.654    0.726    0.310   0.670*  -0.052
          1.27   1.24   4114    741    5.55 100.00      17.4     0.3    0.751    0.827    0.342   0.738*  -0.051
          1.24   1.21   4180    741    5.64  99.60      13.9     0.2    0.937    1.030    0.421   0.680*   0.013
          1.21   1.18   3814    749    5.09 100.00      12.4     0.2    1.050    1.172    0.507   0.684*  -0.047
          1.18   1.16   2845    713    3.99  95.07       9.7     0.1    1.352    1.554    0.743   0.534*  -0.052
          1.16   1.14   2074    640    3.24  87.19       8.2     0.1    1.374    1.617    0.825   0.546*   0.016
          1.14   1.12   1663    579    2.87  79.10       5.5     0.1    2.031    2.415    1.275   0.471*   0.133
          1.12   1.10   1338    535    2.50  70.58       4.3     0.0    2.190    2.671    1.492   0.290*  -0.008
         42.36   1.10  75598  14700    5.14  96.66     320.1     5.0    0.102    0.113    0.047   0.998*   0.363*

masked_refl_profile_fitting + this PR, d_min=1.1

                     ----------Merging statistics----------                    
        
        Resolution: 42.36 - 1.10
        Observations: 91798
        Unique reflections: 14838
        Redundancy: 6.2
        Completeness: 97.57%
        Mean intensity: 297.7
        Mean I/sigma(I): 4.3
        R-merge: 0.150
        R-meas:  0.163
        R-pim:   0.063
        
        
        Statistics by resolution bin:
         d_max  d_min   #obs  #uniq   mult.  %comp       <I>  <I/sI>    r_mrg   r_meas    r_pim   cc1/2   cc_ano
         42.39   2.99   4990    868    5.75 100.00    2848.2    37.2    0.049    0.054    0.022   0.998*   0.521*
          2.99   2.37   4975    800    6.22 100.00     908.3    14.1    0.128    0.140    0.055   0.983*   0.265*
          2.37   2.07   5134    782    6.57 100.00     631.5     9.6    0.163    0.177    0.067   0.971*  -0.141
          2.07   1.88   5166    776    6.66 100.00     305.1     5.1    0.250    0.270    0.102   0.940*  -0.087
          1.88   1.75   5124    772    6.64 100.00     174.8     3.2    0.346    0.374    0.140   0.897*  -0.316
          1.75   1.64   5172    756    6.84 100.00     134.7     2.4    0.342    0.370    0.138   0.900*  -0.082
          1.64   1.56   5166    756    6.83 100.00      87.1     1.6    0.402    0.435    0.163   0.800*  -0.145
          1.56   1.49   5253    773    6.80 100.00      48.7     1.0    0.532    0.576    0.218   0.636*  -0.205
          1.49   1.44   5130    752    6.82 100.00      45.0     0.8    0.510    0.550    0.205   0.770*  -0.085
          1.44   1.39   5133    751    6.83 100.00      38.1     0.7    0.548    0.593    0.224   0.656*  -0.053
          1.39   1.34   5125    752    6.82 100.00      25.2     0.5    0.627    0.678    0.253   0.655*  -0.129
          1.34   1.30   5147    750    6.86 100.00      16.4     0.3    0.773    0.836    0.314   0.711*  -0.154
          1.30   1.27   5048    744    6.78 100.00      15.7     0.3    0.839    0.910    0.346   0.539*  -0.121
          1.27   1.24   5039    741    6.80 100.00      14.1     0.2    0.964    1.044    0.396   0.429*  -0.111
          1.24   1.21   5069    744    6.81 100.00      11.4     0.2    1.079    1.167    0.439   0.660*   0.008
          1.21   1.18   4740    749    6.33 100.00      10.0     0.2    1.244    1.356    0.530   0.677*  -0.061
          1.18   1.16   3623    726    4.99  96.80       8.2     0.1    1.518    1.695    0.735   0.592*  -0.010
          1.16   1.14   2770    660    4.20  89.92       7.1     0.1    1.690    1.924    0.890   0.484*  -0.002
          1.14   1.12   2180    606    3.60  82.79       4.6     0.0    2.587    3.004    1.474   0.354*  -0.032
          1.12   1.10   1814    580    3.13  76.52       3.3     0.0    2.818    3.339    1.739   0.259*   0.062
         42.36   1.10  91798  14838    6.19  97.57     297.7     4.3    0.150    0.163    0.063   0.996*   0.250*

So the total number of observations increases from 75598 to 91798; however, the merging stats look worse, although I don't like to read too much into the quality of the X4-wide dataset. More suitable datasets to follow.
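
To put the X4-wide comparison in relative terms, here is a small sketch that computes the percentage change of a few overall statistics. The numbers are copied from the two overall rows of the tables; the dictionary layout and the function name are just for illustration.

```python
# Overall statistics copied from the two X4-wide merging tables.
master = {"observations": 75598, "multiplicity": 5.14, "i_over_sigma": 5.0, "r_pim": 0.047}
this_pr = {"observations": 91798, "multiplicity": 6.19, "i_over_sigma": 4.3, "r_pim": 0.063}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


changes = {key: percent_change(master[key], this_pr[key]) for key in master}

for key, value in changes.items():
    print(f"{key:>14s}: {value:+.1f}%")
```

This shows roughly a fifth more observations, alongside a lower overall I/sigma and a higher overall R-pim.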

@jbeilstenedmands
Contributor Author

jbeilstenedmands commented Jun 12, 2020

Okay, so I ran this on the I23 FutA dataset from #1291, with #1297 also applied.

For AUTOMATIC/DEFAULT/SAD                    Overall    Low     High
High resolution limit                           1.83    4.95    1.83
Low resolution limit                           75.98   76.04    1.86
Completeness                                   61.7    95.7    43.5
Multiplicity                                    5.6     5.7     3.4
I/sigma                                        12.0    50.7     0.6
Rmerge(I)                                     0.120   0.054   0.309
Rmerge(I+/-)                                  0.116   0.051   0.277
Rmeas(I)                                      0.133   0.060   0.366
Rmeas(I+/-)                                   0.141   0.062   0.392
Rpim(I)                                       0.055   0.025   0.193
Rpim(I+/-)                                    0.080   0.035   0.277
CC half                                       0.997   0.999   0.866
Wilson B factor                              20.910
Anomalous completeness                         61.3    95.8    42.6
Anomalous multiplicity                          2.9     3.0     1.7
Anomalous correlation                        -0.296  -0.209  -0.438
Anomalous slope                               0.308
dF/F                                          0.097
dI/s(dI)                                      0.546
Total observations                           167168   13601    3564
Total unique                                  29741    2383    1051
Assuming spacegroup: P 1 2 1
Unit cell (with estimated std devs):
57.2912(17)  75.9776(18) 64.377(4)
90.0        101.306(5)   90.0 

Running dials.merge on the scaled files with d_min=1.93, for comparison with #1291:

                     ----------Merging statistics----------                    
        
        Resolution: 75.98 - 1.93
        Observations: 148992
        Unique reflections: 26272
        Redundancy: 5.7
        Completeness: 64.37%
        Mean intensity: 6443.2
        Mean I/sigma(I): 13.2
        R-merge: 0.117
        R-meas:  0.130
        R-pim:   0.054
        
        
        Statistics by resolution bin:
         d_max  d_min   #obs  #uniq   mult.  %comp       <I>  <I/sI>    r_mrg   r_meas    r_pim   cc1/2   cc_ano
         76.04   5.24  11205   2002    5.60  95.29   16766.7    51.4    0.051    0.057    0.024   0.999*  -0.108
          5.24   4.16  10239   1894    5.41  91.94   19282.9    38.9    0.074    0.082    0.034   0.990*  -0.193
          4.16   3.63   9664   1856    5.21  90.10   20140.8    28.1    0.101    0.113    0.049   0.999*  -0.310
          3.63   3.30   9137   1699    5.38  82.88   11166.9    18.7    0.124    0.138    0.057   0.987*  -0.365
          3.30   3.06   7940   1465    5.42  71.74    7072.7    12.6    0.130    0.143    0.059   0.988*  -0.466
          3.06   2.88   8179   1397    5.85  68.15    4624.6     9.3    0.153    0.167    0.066   0.985*  -0.507
          2.88   2.74   7621   1391    5.48  68.42    3500.7     6.9    0.158    0.173    0.070   0.987*  -0.436
          2.74   2.62   8001   1320    6.06  65.12    2996.8     6.6    0.192    0.209    0.082   0.974*  -0.259
          2.62   2.52   7767   1302    5.97  63.48    2076.9     4.5    0.175    0.191    0.075   0.981*  -0.596
          2.52   2.43   7056   1227    5.75  60.47    1829.0     3.8    0.193    0.212    0.086   0.972*  -0.525
          2.43   2.36   7443   1197    6.22  59.58    1697.7     3.6    0.198    0.215    0.084   0.970*  -0.550
          2.36   2.29   6957   1133    6.14  55.76    1460.3     3.1    0.220    0.240    0.093   0.972*  -0.442
          2.29   2.23   6072   1154    5.26  56.40    2045.4     3.7    0.374    0.418    0.184   0.806*  -0.141
          2.23   2.17   7073   1106    6.40  54.06    1204.4     2.5    0.196    0.213    0.082   0.978*  -0.393
          2.17   2.12   6528   1066    6.12  53.35    1006.9     2.1    0.296    0.322    0.125   0.951*  -0.365
          2.12   2.08   5702   1070    5.33  52.79    1031.0     2.0    0.412    0.456    0.191   0.807*  -0.078
          2.08   2.04   5330   1031    5.17  50.86    1156.6     2.1    0.480    0.539    0.240   0.686*  -0.219
          2.04   2.00   6793   1025    6.63  50.12     578.8     1.3    0.288    0.312    0.118   0.961*  -0.356
          2.00   1.96   5643    954    5.92  47.27     402.0     0.9    0.394    0.433    0.173   0.712*  -0.330
          1.96   1.93   4642    983    4.72  47.88     520.2     1.0    0.501    0.562    0.249   0.661*  -0.239
         75.98   1.93 148992  26272    5.67  64.37    6443.2    13.2    0.117    0.130    0.054   0.997*  -0.296

i.e. for d_min=1.93 we now have 148992 observations, compared to 64696 for xia2-3dii and 26876 for xia2-dials master. However, the lack of anomalous signal still seems to be an issue.

@jbeilstenedmands
Contributor Author

Comparing the beta-lactamase dataset.
[Plots: cc_one_half, mean_redundancy, r_pim]

316116 (PR) vs 313539 (master) total reflections. The stats look slightly worse for this PR, although I guess that by including data that are perhaps not quite as good/reliable (in this case, reflections where profile fitting failed but summation was successful), there is less overall agreement/precision within the data.

@jbeilstenedmands jbeilstenedmands marked this pull request as ready for review June 15, 2020 08:19
Comment on lines +115 to +116
elif all(i in intensity_choice for i in ["sum | profile"]):
    reducer = SumORPrfIntensityReducer
Member

Would this be clearer?

Suggested change
- elif all(i in intensity_choice for i in ["sum | profile"]):
-     reducer = SumORPrfIntensityReducer
+ elif {"sum | profile"} <= set(intensity_choice):
+     reducer = SumORPrfIntensityReducer

Also, does intensity_choice need to be a list, or can it be a genuine set, given that as far as I can tell we don't care about the order of the elements?

Contributor Author

True, this can be a set rather than a list.
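
For anyone comparing the two forms discussed in this thread, here is a minimal standalone sketch showing that they agree when intensity_choice is a list or a set. The wrapper function names and the "profile" label are hypothetical, used only for the demonstration; only the "sum | profile" label comes from the diff above.

```python
REQUIRED = ["sum | profile"]


def check_with_all(intensity_choice):
    # Original form: every required label must be a member of the container.
    return all(i in intensity_choice for i in REQUIRED)


def check_with_subset(intensity_choice):
    # Suggested form: the required labels are a subset of the choices.
    return set(REQUIRED) <= set(intensity_choice)


# "profile" is a made-up label for a non-matching choice.
candidates = [["sum | profile"], {"sum | profile"}, ["profile"], set(), []]

for choice in candidates:
    assert check_with_all(choice) == check_with_subset(choice)
    print(repr(choice), check_with_subset(choice))
```

One caveat: the two forms only diverge if intensity_choice is a bare string, because `in` then does substring matching while `set()` splits the string into characters. That is one more reason to hold it as a list or set of labels.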

@jbeilstenedmands jbeilstenedmands merged commit 9d2b5d4 into master Jun 18, 2020
@Anthchirp Anthchirp deleted the scaling_intensity_handling branch October 2, 2020 15:25