Output scaling overall merging statistics in the xia2 style. #1312

Merged
merged 8 commits into from
Jul 16, 2020

Conversation

jbeilstenedmands
Contributor

The end of the scaling output now looks something like the example below. A bonus is that a few additional anomalous quality indicators are now output. The only statistic in the xia2 output that is not reported here is the Wilson B factor, as this should be calculated after merging and truncating.

            ----------Merging statistics by resolution bin----------           

 d_max  d_min   #obs  #uniq   mult.  %comp       <I>  <I/sI>    r_mrg   r_meas    r_pim   cc1/2   cc_ano
 69.31   3.29  22774   3392    6.71  98.63     532.9    53.1    0.038    0.041    0.016   0.999*   0.353*
  3.29   2.61  22323   3319    6.73  97.53     193.6    36.0    0.050    0.054    0.021   0.998*   0.393*
  2.61   2.28  22557   3259    6.92  96.79      99.3    26.6    0.065    0.071    0.027   0.997*   0.324*
  2.28   2.07  22203   3244    6.84  95.98      67.6    20.3    0.079    0.085    0.032   0.996*   0.281*
  2.07   1.93  21523   3169    6.79  95.25      44.0    15.1    0.099    0.108    0.041   0.992*   0.190*
  1.93   1.81  21673   3188    6.80  94.40      25.1    10.4    0.136    0.147    0.056   0.991*   0.186*
  1.81   1.72  21276   3154    6.75  93.98      15.6     7.3    0.180    0.195    0.074   0.986*   0.162*
  1.72   1.65  20912   3106    6.73  93.13      10.4     5.3    0.231    0.251    0.096   0.978*   0.132*
  1.65   1.58  21666   3123    6.94  92.64       8.4     4.4    0.275    0.297    0.112   0.973*   0.053*
  1.58   1.53  20697   3083    6.71  92.03       6.3     3.5    0.326    0.354    0.135   0.955*   0.106*
  1.53   1.48  20174   3067    6.58  91.61       5.0     2.8    0.382    0.415    0.161   0.939*   0.031
  1.48   1.44  20819   3031    6.87  90.80       3.7     2.2    0.473    0.512    0.194   0.915*   0.034
  1.44   1.40  18344   2987    6.14  90.46       3.2     1.8    0.523    0.571    0.227   0.878*   0.001
  1.40   1.37  13881   2480    5.60  72.83       2.7     1.5    0.584    0.645    0.267   0.812*  -0.001
  1.37   1.33   9578   1749    5.48  52.71       2.4     1.3    0.638    0.704    0.292   0.748*  -0.019
  1.33   1.31   6726   1287    5.23  38.73       2.3     1.2    0.654    0.726    0.306   0.735*   0.007
  1.31   1.28   4850   1001    4.85  29.69       2.0     1.0    0.746    0.833    0.362   0.602*  -0.020
  1.28   1.26   2923    700    4.18  21.03       1.8     0.8    0.854    0.974    0.453   0.403*   0.032
  1.26   1.23   1214    407    2.98  12.37       1.5     0.6    0.897    1.070    0.569   0.282*  -0.156
  1.23   1.21    305    191    1.60   5.64       1.6     0.5    0.741    0.981    0.636   0.490*  -0.408
 69.19   1.21 316418  48937    6.47  72.90      69.3    12.8    0.065    0.070    0.027   0.999*   0.329*


               ----------Summary of merging statistics----------               

                                             Overall    Low     High
High resolution limit                           1.21    3.29    1.21
Low resolution limit                           69.19   69.31    1.23
Completeness                                   72.9    98.6     5.6
Multiplicity                                    6.5     6.7     1.6
I/sigma                                        12.8    53.1     0.5
Rmerge(I)                                     0.065   0.038   0.741
Rmerge(I+/-)                                  0.055   0.031   0.695
Rmeas(I)                                      0.070   0.041   0.981
Rmeas(I+/-)                                   0.065   0.037   0.983
Rpim(I)                                       0.027   0.016   0.636
Rpim(I+/-)                                    0.035   0.019   0.695
CC half                                       0.999   0.999   0.490
Anomalous completeness                         71.6    99.1     1.4
Anomalous multiplicity                          3.3     3.5     1.3
Anomalous correlation                         0.329   0.353  -0.408
Anomalous slope                               0.807
dF/F                                          0.070
dI/s(dI)                                      0.959
Total observations                           316418   22774     305
Total unique                                  48937    3392     191

Writing html report to dials.scale.html
Saving the scaled experiments to scaled.expt
Saving the scaled reflections to scaled.refl
See dials.github.io/dials_scale_user_guide.html for more info on scaling options

@jbeilstenedmands changed the title from "Output scaling overall merging statistics in the xia2-style." to "Output scaling overall merging statistics in the xia2 style." on Jun 25, 2020
@graeme-winter
Contributor

Well, I obviously approve of the suggestion! 🙂

@jbeilstenedmands
Contributor Author

One potential issue is that, before a resolution cutoff is applied, the high-resolution bin will often just be noise, which may be confusing or unsightly. One way around this would be to use the resolutionizer code to suggest a resolution limit, and then report the high-resolution bin using that limit. Thoughts, @graeme-winter?

@graeme-winter
Contributor

Valid concern. I would suggest using the resolutionizer code to determine an outer bin, then showing:

  • overall
  • overall to chosen limit
  • inner
  • outer to chosen limit

unless %USER% has set a limit, in which case use that. By a happy coincidence, I think this is exactly what xia2.small_molecule does 🤔
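
The column choice described above could be sketched roughly as follows. This is a minimal illustration only: `summary_columns` and its parameters are hypothetical names, not functions from DIALS or xia2, and the fallback to the original three columns when no limit is available is an assumption.

```python
def summary_columns(suggested_limit=None, user_limit=None):
    """Pick the summary columns to report (hypothetical helper).

    A user-supplied limit takes precedence over the suggested one.
    With no limit at all, fall back to the plain three-column layout.
    """
    limit = user_limit if user_limit is not None else suggested_limit
    if limit is None:
        return ["Overall", "Low", "High"]
    return [
        "Overall",
        f"Overall to {limit:.2f}",
        "Low",
        f"High to {limit:.2f}",
    ]


print(summary_columns(suggested_limit=2.18))
print(summary_columns(suggested_limit=2.18, user_limit=2.5))
print(summary_columns())
```

Testing `user_limit is not None` rather than truthiness keeps the precedence rule explicit and avoids treating a limit of `0` as "unset".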

@jbeilstenedmands
Contributor Author

Example output when the suggested resolution limit falls within the measured range:

            ----------Merging statistics by resolution bin----------           

 d_max  d_min   #obs  #uniq   mult.  %comp       <I>  <I/sI>    r_mrg   r_meas    r_pim   cc1/2   cc_ano
 72.29   4.85  35119   2123   16.54  90.96      99.1    35.1    0.125    0.129    0.027   0.998*   0.013
  4.85   3.85  35008   2204   15.88  94.03      83.1    30.5    0.147    0.151    0.033   0.996*   0.027
  3.85   3.36  34497   2206   15.64  95.50      41.3    23.8    0.192    0.197    0.044   0.995*  -0.010
  3.36   3.05  34699   2266   15.31  96.26      20.3    16.0    0.314    0.323    0.071   0.989*   0.029
  3.05   2.83  33927   2202   15.41  95.45      13.9    12.3    0.405    0.417    0.094   0.975*   0.002
  2.83   2.67  34209   2268   15.08  97.38      10.7     9.3    0.526    0.541    0.123   0.963*   0.013
  2.67   2.53  35104   2291   15.32  97.61       8.2     7.3    0.627    0.645    0.145   0.942*  -0.018
  2.53   2.42  32881   2239   14.69  96.22       6.4     5.5    0.762    0.787    0.186   0.384*  -0.008
  2.42   2.33  27903   2214   12.60  94.94       5.1     3.9    0.939    0.974    0.252   0.870*   0.007
  2.33   2.25  20854   2161    9.65  92.47       4.2     2.7    0.966    1.014    0.299   0.599*  -0.021
  2.25   2.18  16347   2062    7.93  88.54       3.6     2.0    1.308    1.394    0.464   0.333*   0.014
  2.18   2.12  12857   2020    6.36  85.99       3.0     1.4    1.097    1.189    0.445   0.006   0.004
  2.12   2.06  10391   1968    5.28  84.07       2.8     1.0    1.156    1.298    0.563   0.007  -0.008
  2.06   2.01   8006   1865    4.29  81.09       2.6     0.7    1.330    1.512    0.695  -0.001   0.016
  2.01   1.97   6184   1847    3.35  78.76       2.7     0.6    1.721    2.070    1.108   0.004   0.040
  1.97   1.92   4482   1673    2.68  71.56       0.4     0.5    1.188    1.414    0.747   0.030   0.005
  1.92   1.89   3037   1454    2.09  62.70       3.3     0.4    1.534    1.968    1.209  -0.004   0.250
  1.89   1.85   1692   1059    1.60  45.61       9.1     0.3    1.663    2.162    1.364   0.047  -1.000
  1.85   1.82    655    528    1.24  22.65       2.8     0.2    1.860    2.583    1.786   0.019   0.000
  1.82   1.79    194    173    1.12   7.49       3.4     0.1   -2.792   -3.948   -2.792   0.072   0.000
 72.23   1.79 388046  36823   10.54  78.99      18.6     9.1    0.296    0.311    0.086   0.583*   0.003


Resolution limit suggested from cc1/2 fit (limit cc1/2=0.3): 2.18

               ----------Summary of merging statistics----------               

                                            Suggested   Low    High  Overall
High resolution limit                           2.18    5.92    2.18    1.79
Low resolution limit                           72.23   72.27    2.22   72.23
Completeness                                   94.5    90.1    87.3    79.0
Multiplicity                                   14.1    16.5     7.5    10.5
I/sigma                                        13.4    41.1     1.9     9.1
Rmerge(I)                                     0.260   0.107   1.452   0.296
Rmerge(I+/-)                                  0.260   0.106   1.450   0.293
Rmeas(I)                                      0.268   0.109   1.555   0.311
Rmeas(I+/-)                                   0.274   0.111   1.620   0.316
Rpim(I)                                       0.063   0.023   0.537   0.086
Rpim(I+/-)                                    0.084   0.032   0.699   0.108
CC half                                       0.899   0.998   0.311   0.583
Anomalous completeness                         62.8    74.4    48.9    44.4
Anomalous multiplicity                          8.4     9.0     4.8     6.7
Anomalous correlation                         0.000   0.008  -0.003   0.003
Anomalous slope                               0.791                   0.688
dF/F                                          0.100                   0.235
dI/s(dI)                                      0.683                   0.492
Total observations                           340548   19123    8309  388046
Total unique                                  24236    1161    1103   36823

Writing html report to dials.scale.html
Saving the scaled experiments to scaled.expt
Saving the scaled reflections to scaled.refl
See dials.github.io/dials_scale_user_guide.html for more info on scaling options
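
To make the "limit cc1/2=0.3" line in the output above more concrete, here is a deliberately simplified sketch. It is not the method used by DIALS, which reports a limit "from cc1/2 fit"; this version just scans the bins and stops at the first one whose cc1/2 falls below the threshold. The helper name `naive_resolution_limit` is hypothetical, and the data are copied from the per-bin table above.

```python
# (d_min, cc1/2) per resolution bin, copied from the table above,
# ordered from low to high resolution.
bins = [
    (4.85, 0.998), (3.85, 0.996), (3.36, 0.995), (3.05, 0.989), (2.83, 0.975),
    (2.67, 0.963), (2.53, 0.942), (2.42, 0.384), (2.33, 0.870), (2.25, 0.599),
    (2.18, 0.333), (2.12, 0.006), (2.06, 0.007), (2.01, -0.001), (1.97, 0.004),
    (1.92, 0.030), (1.89, -0.004), (1.85, 0.047), (1.82, 0.019), (1.79, 0.072),
]


def naive_resolution_limit(bins, cc_half_limit=0.3):
    """Return d_min of the last bin before cc1/2 first drops below the limit.

    Hypothetical helper: a plain threshold scan, not a curve fit.
    Returns None if even the first bin is below the limit.
    """
    limit = None
    for d_min, cc_half in bins:
        if cc_half < cc_half_limit:
            break
        limit = d_min
    return limit


print(naive_resolution_limit(bins))  # 2.18
```

On this particular dataset the scan happens to land on the same 2.18 as the reported suggestion, but that agreement should not be expected in general: a fit smooths over noisy bins, whereas a scan is sensitive to a single outlying value.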

@jbeilstenedmands marked this pull request as ready for review on June 26, 2020 13:00
@graeme-winter
Contributor

Just ran this:

                                             Overall    Low     High
High resolution limit                           1.08    2.94    1.08
Low resolution limit                          102.23  102.69    1.10
Completeness                                   88.6   100.0    16.7
Multiplicity                                   20.2    25.0     1.1
I/sigma                                        21.9    63.6     1.7
Rmerge(I)                                     0.088   0.052   0.240
Rmerge(I+/-)                                  0.086   0.052   0.259
Rmeas(I)                                      0.090   0.053   0.333
Rmeas(I+/-)                                   0.090   0.053   0.363
Rpim(I)                                       0.018   0.011   0.229
Rpim(I+/-)                                    0.025   0.014   0.254
CC half                                       0.999   0.999   0.880
Anomalous completeness                         84.7   100.0     1.5
Anomalous multiplicity                         10.8    14.2     1.0
Anomalous correlation                        -0.037  -0.067   0.000
Anomalous slope                               0.956
dF/F                                          0.050
dI/s(dI)                                      0.799
Total observations                          1828128  138425     958
Total unique                                  90451    5533     842

So worked on 1st cut 🙂

@graeme-winter
Contributor

Change set looks sensible.

I wonder if we should pull the equivalent code out of xia2 & just use this?

Will go look at the change sets in more detail now.

Contributor

@graeme-winter left a comment

I have made a few comments, most of them pretty minor but I think it would be good to look at them before merging this. The actual output change I am fine with, but wondering if some housekeeping while you're looking at this stuff would be a good idea?

algorithms/merging/merge.py (outdated, resolved)
algorithms/merging/merge.py (outdated, resolved)
algorithms/scaling/observers.py (resolved)
max_current_res = merging_stats.bins[-1].d_min
cut_merging_statistics_result = None
cut_anom_merging_statistics_result = None
if r_cc - max_current_res > 0.005:
Contributor

0.005 because?

Contributor

Additional: I dislike magic numbers, and also the significance of these depends very much on the value under comparison

Contributor Author

We report resolution stats to 2dp, so this is meant to be "if the same to within two decimal places". xia2's magic number here is 0.004 🤷‍♂️ 🙃
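
One way to express "the same to within two decimal places" without a tolerance constant is to compare the values as they would be printed. This is only a sketch of an alternative, not the code in this change set; `same_at_reported_precision` is a hypothetical name.

```python
def same_at_reported_precision(d_a, d_b, decimals=2):
    """Return True if two resolution values print identically at `decimals` places.

    Hypothetical helper: compares the formatted strings, so the check is tied
    to the reporting precision rather than to a separate tolerance value.
    """
    return f"{d_a:.{decimals}f}" == f"{d_b:.{decimals}f}"


print(same_at_reported_precision(2.18, 1.79))    # clearly different limits
print(same_at_reported_precision(1.214, 1.211))  # both print as 1.21
print(same_at_reported_precision(2.184, 2.186))  # print as 2.18 and 2.19
```

The last case shows where the two approaches differ: the values are only 0.002 apart, so a 0.005 tolerance treats them as equal, while the string comparison treats them as different because they are reported differently.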

algorithms/scaling/observers.py (outdated, resolved)
        for f, k in zip(row_format, row_data)
    )
except TypeError:
    formatted = "(error)"
Contributor

🤔 not sure I like this one...
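
One possible alternative to the blanket `except TypeError`, sketched under the assumption that the failures come from missing (`None`) values, such as the blank Low/High cells for "Anomalous slope" in the summary tables. The helper `format_cells` and its defaults are hypothetical and not part of this change set.

```python
def format_cells(values, fmt="%.3f", width=8):
    """Format each value with `fmt`, leaving a blank cell for None.

    Hypothetical helper: missing values are handled explicitly, so a genuine
    TypeError (for example a string passed to a float format) still raises
    instead of being hidden behind an "(error)" placeholder.
    """
    return "".join(
        ("" if v is None else fmt % v).rjust(width) for v in values
    )


row = format_cells([0.807, None, None])
print(repr(row))
```

The trade-off is that unexpected input types now fail loudly, which may or may not be desirable at the very end of a long scaling job.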

report/analysis.py (outdated, resolved)
report/analysis.py (outdated, resolved)
report/test_analysis.py (outdated, resolved)
else:
    cc_f = fit(s_s[i:], cc_s[i:], 6)

logger.debug("rch: fits")
Contributor

rch: ?

@@ -19,7 +19,7 @@
from dials.util.export_mtz import MADMergedMTZWriter, MergedMTZWriter
from dials.report.analysis import (
    make_merging_statistics_summary,
    make_xia2_style_statistics_summary,
    table_1_summary,
Contributor

👍

@jbeilstenedmands
Contributor Author

Fixed most issues now and marked them as resolved. Anything left is carried over from existing code, and I think it is fine to leave as is.


if cc_half_method == "sigma_tau":
    cc_s = flex.double(
        [b.cc_one_half_sigma_tau for b in merging_statistics.bins]
Member

Suggested change
- [b.cc_one_half_sigma_tau for b in merging_statistics.bins]
+ b.cc_one_half_sigma_tau for b in merging_statistics.bins

flex constructors work with generators, no need to explicitly construct a list and then throw it away. Applies to this line and a couple more times down the file.
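
The same pattern can be illustrated without cctbx by using the standard-library `array` type as a stand-in for `flex.double`. This is an analogy only: the bin objects below are made-up placeholders, and just the attribute name is taken from the diff.

```python
from array import array
from types import SimpleNamespace

# Stand-in bin objects; the attribute name mirrors the one in the diff.
bins = [SimpleNamespace(cc_one_half_sigma_tau=v) for v in (0.998, 0.942, 0.333)]

# A generator expression is passed straight to the constructor,
# so no temporary list is built and then discarded.
cc_s = array("d", (b.cc_one_half_sigma_tau for b in bins))

print(list(cc_s))
```

When a generator expression is the sole argument of a call, its own parentheses can be dropped, which is what the suggested change relies on.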

@@ -182,28 +181,184 @@ def _batch_bins_and_data(batches, values, function_to_apply):
    return batch_bins, data


def make_merging_statistics_summary(dataset_statistics):
    """Format merging statistics information into an output string."""
    formats = collections.OrderedDict(
Member

CPython 3.6+ (and Python 3.7+) dictionaries are ordered by default. No more need to use OrderedDict.
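
A short sketch of what this means for a table of row formats. The keys are taken from the summary output earlier in this thread; the format strings and the layout widths are assumptions for illustration, not the ones in `report/analysis.py`.

```python
# A plain dict preserves insertion order, so rows print in the order written.
formats = {
    "High resolution limit": "%.2f",
    "Low resolution limit": "%.2f",
    "Completeness": "%.1f",
    "Multiplicity": "%.1f",
}

# Overall values from the first summary table in this thread.
values = {
    "Multiplicity": 6.5,
    "Completeness": 72.9,
    "Low resolution limit": 69.19,
    "High resolution limit": 1.21,
}

lines = [f"{key:<30}{fmt % values[key]:>8}" for key, fmt in formats.items()]
print("\n".join(lines))
```

One difference worth keeping in mind: equality between two `OrderedDict` instances is order-sensitive, whereas equality between plain dicts is not, so any code that relies on that comparison would need checking before switching.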

@codecov
codecov bot commented Jul 16, 2020

Codecov Report

Merging #1312 into master will decrease coverage by 0.05%.
The diff coverage is 88.23%.

@@            Coverage Diff             @@
##           master    #1312      +/-   ##
==========================================
- Coverage   64.26%   64.20%   -0.06%     
==========================================
  Files         616      616              
  Lines       69688    69782      +94     
  Branches     9505     9529      +24     
==========================================
+ Hits        44786    44805      +19     
- Misses      23160    23212      +52     
- Partials     1742     1765      +23     
Impacted Files                       Coverage Δ
algorithms/scaling/algorithm.py      84.05% <ø> (-0.06%) ⬇️
algorithms/scaling/observers.py      92.98% <77.77%> (-1.48%) ⬇️
report/analysis.py                   95.20% <88.73%> (-4.80%) ⬇️
algorithms/merging/merge.py          83.06% <100.00%> (+0.13%) ⬆️
command_line/scale.py                90.81% <100.00%> (ø)
report/test_analysis.py              100.00% <100.00%> (ø)
command_line/report.py               74.36% <0.00%> (-5.84%) ⬇️
report/plots.py                      89.25% <0.00%> (-1.50%) ⬇️
algorithms/scaling/test_scale.py     98.77% <0.00%> (-1.23%) ⬇️
algorithms/integration/report.py     86.59% <0.00%> (-0.69%) ⬇️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d064f02...994d80c.
