Implement unique experiment identifiers #902

jbeilstenedmands · 2019-08-29T10:39:49Z

Overview
This pull request implements the generation and use of unique experiment identifiers throughout the dials processing workflow. The motivation is to allow experiments and reflection tables to be matched without relying on the ordering of these objects, either on loading data or during processing. This feature has already proved useful in dials.scale and dials.cosym.

These identifiers (a uuid string by default) are stored in experiment.identifier, and the reflection_table.experiment_identifiers() map is used to record the numerical_id -> identifier mapping.
When a new experiment is created, a unique identifier is generated - this only happens in dials.import and dials.index. When a reflection table is created, the experiment_identifiers map is populated - this happens in spot-finding, indexing and integration. Other programs in the standard dials workflow add to the existing table, so the mapping is preserved.
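
As a rough illustration of the bookkeeping described above (a toy model, not the DIALS implementation), the sketch below gives each experiment a uuid string and records a numerical id -> identifier map on the table side. `ToyExperiment` and `ToyTable` are hypothetical stand-ins, not DIALS classes.

```python
import uuid


class ToyExperiment:
    """Hypothetical stand-in for an experiment carrying a unique identifier."""

    def __init__(self):
        # A uuid string, generated once when the experiment is created.
        self.identifier = str(uuid.uuid4())


class ToyTable:
    """Hypothetical stand-in for a reflection table."""

    def __init__(self, ids):
        self.ids = list(ids)  # the per-reflection "id" column
        self.id_map = {}      # numerical id -> identifier string


experiments = [ToyExperiment(), ToyExperiment()]
table = ToyTable([0, 0, 1, 1, 1])

# Populate the map when the table is created, one entry per experiment.
for numerical_id, expt in enumerate(experiments):
    table.id_map[numerical_id] = expt.identifier

# A reflection's experiment can now be found without relying on list order.
lookup = {expt.identifier: expt for expt in experiments}
owner_of_last = lookup[table.id_map[table.ids[-1]]]
```

The point of the indirection is that the numerical id is only meaningful within one table, while the identifier string stays valid when files are loaded in a different order.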

So far I have only implemented and tested the uuid identifier generation for the sweeps case. For the stills case, I have added function calls in the relevant locations in stills_indexer (which would currently generate a uuid); I invite @asmit3 and collaborators to add to this PR and modify this as they wish, as I am unfamiliar with the ins and outs of the stills_process workflow.

People can probably ignore the changes to the scaling files if reviewing this. I shall continue to review the dials programs for the 'potential issue' (see below). However, the tests pass locally for me and I wanted to get this PR out there for feedback.

Detailed description of behaviour changes
The behavioural changes are primarily to help programs which handle the input of multiple datasets in separate files - which will likely have duplicate numerical ids in the reflection tables.

To implement safe handling of data on loading, regardless of order, there is a new function to perform the role of flatten_experiments and flatten_reflections, which also reorders the reflection tables to match the order of the experiments using the experiment identifiers:
refls, expts = reflections_and_experiments_from_files(refl_file_objs, expt_file_objs)
(Note that the id columns in the tables are not renumbered at this point. Renumbering was something I introduced recently in flatten_reflections for "flatten_reflections should wrangle experiment identifiers gracefully" #656, but I agree that this functionality should occur elsewhere, see below.)
So far I have only replaced the flatten functions in dials.scale, but this can be rolled out to other programs if people are happy about this.
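
The reordering step described above can be pictured with a small pure-Python sketch. It assumes one dataset per table and represents each table as a plain dict; `reorder_tables` is a hypothetical helper for illustration, not the function added in this PR.

```python
def reorder_tables(tables, experiment_identifiers):
    """Return tables ordered to match experiment_identifiers.

    Each toy table is a dict with an "id_map" of numerical id -> identifier.
    Assumes exactly one dataset per table, purely for illustration.
    """
    by_identifier = {}
    for table in tables:
        (identifier,) = table["id_map"].values()  # exactly one dataset
        by_identifier[identifier] = table
    return [by_identifier[i] for i in experiment_identifiers]


expt_ids = ["aaa", "bbb", "ccc"]
tables = [
    {"name": "t_c", "id_map": {0: "ccc"}},
    {"name": "t_a", "id_map": {0: "aaa"}},
    {"name": "t_b", "id_map": {0: "bbb"}},
]
ordered = reorder_tables(tables, expt_ids)
names = [t["name"] for t in ordered]
```

Note that the numerical ids are left untouched here (every table still uses 0), which mirrors the statement that renumbering does not happen at this point.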
To ensure that multiple reflection tables can be joined together, one must ensure that the column id values do not clash. To do this, one can use the renumber_table_id_columns(reflection_tables) function in dials.util.multi_dataset_handling. There is also a split_reflection_tables_on_ids(reflection_tables) function for splitting multi-dataset tables on ids.
Therefore for programs that handle reflection tables with multiple datasets, the following calls are the suggested way to load the data, to provide a list of experiments and reflections with different numerical ids:

# load data and make sure the order of refls and expts matches
refls, expts = reflections_and_experiments_from_files(refl_file_objs, expt_file_objs)

# renumber id columns if necessary to avoid duplicate numerical ids across tables
refls = renumber_table_id_columns(refls)

# split multi-dataset tables if necessary to a list of single dataset tables
refls = split_reflection_tables_on_ids(refls)
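
To make the effect of the last two calls concrete, here is a toy sketch of renumbering and splitting on plain dicts. `renumber` and `split_on_ids` are hypothetical illustrations of the idea, not the DIALS functions, and the sketch assumes every value in the id column has an entry in the map (so it does not cover -1 entries).

```python
def renumber(tables):
    """Give every dataset a distinct numerical id across all toy tables."""
    next_id = 0
    for table in tables:
        old_to_new = {}
        for old_id in sorted(table["id_map"]):
            old_to_new[old_id] = next_id
            next_id += 1
        table["ids"] = [old_to_new[i] for i in table["ids"]]
        table["id_map"] = {old_to_new[k]: v for k, v in table["id_map"].items()}
    return tables


def split_on_ids(tables):
    """Split multi-dataset toy tables into single-dataset tables."""
    out = []
    for table in tables:
        for num_id, identifier in sorted(table["id_map"].items()):
            out.append(
                {
                    "ids": [i for i in table["ids"] if i == num_id],
                    "id_map": {num_id: identifier},
                }
            )
    return out


tables = [
    {"ids": [0, 0, 1], "id_map": {0: "aaa", 1: "bbb"}},
    {"ids": [0, 0], "id_map": {0: "ccc"}},  # clashes with id 0 above
]
single = split_on_ids(renumber(tables))
summary = [(t["ids"], t["id_map"]) for t in single]
```

After both steps each table holds one dataset, and no numerical id is shared between tables, so they could be joined again without ambiguity.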

Potential issue
One problematic aspect is that performing a selection on a multi-dataset reflection table based on the "id" column may no longer be safe, as the identifier map is not updated (which could be an issue if trying to recombine the tables). Therefore code like this, to select data from the 0th experiment:
new_refl = refl.select(refl["id"] == 0)
should be replaced with one of

new_refl = refl.select_on_id_values([0])
new_refl = refl.select_on_experiment_identifiers([experiments[0].identifier])

or should be modified to

new_refl = refl.select(refl["id"] == 0)
new_refl.clean_experiment_identifiers_map()
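
The problem can be reproduced in a toy model. The sketch below is not DIALS code: `select` and `clean_map` are hypothetical helpers on plain dicts that mimic a row selection which copies the map unchanged, followed by an explicit clean-up.

```python
def select(table, keep):
    """Row selection that copies the map unchanged (the unsafe behaviour)."""
    return {
        "ids": [i for i, k in zip(table["ids"], keep) if k],
        "id_map": dict(table["id_map"]),
    }


def clean_map(table):
    """Drop map entries whose numerical id no longer appears in the table."""
    present = set(table["ids"])
    table["id_map"] = {k: v for k, v in table["id_map"].items() if k in present}


table = {"ids": [0, 0, 1, 1], "id_map": {0: "aaa", 1: "bbb"}}
subset = select(table, [i == 0 for i in table["ids"]])
stale = dict(subset["id_map"])  # still lists "bbb" although no rows use id 1
clean_map(subset)
```

Before the clean-up the subset still claims to contain dataset "bbb", which is the kind of inconsistency that could cause trouble when tables are recombined.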

I'm continuing to check the repository for places where this could be an issue.

graeme-winter · 2019-08-29T14:16:13Z

Looking at this now. Will try side-by-side comparison to get a sense of the real differences...

graeme-winter · 2019-08-29T14:41:33Z

Experiment 3:
Experiment identifier: 3c3a61b3-ee6a-45d1-a79d-2fbb6615ec71

certainly looks like a UUID 🙂

graeme-winter · 2019-08-29T14:48:32Z

OK, hypothesis test:

dials.index 10_SWEEP3_strong.expt 2_SWEEP1_strong.refl 2_SWEEP1_strong.expt 10_SWEEP3_strong.refl 6_SWEEP2_strong.expt 14_SWEEP4_strong.refl 14_SWEEP4_strong.expt 6_SWEEP2_strong.refl

input {
  experiments = 10_SWEEP3_strong.expt
  experiments = 2_SWEEP1_strong.expt
  experiments = 6_SWEEP2_strong.expt
  experiments = 14_SWEEP4_strong.expt
  reflections = 2_SWEEP1_strong.refl
  reflections = 10_SWEEP3_strong.refl
  reflections = 14_SWEEP4_strong.refl
  reflections = 6_SWEEP2_strong.refl
}

Found max_cell: 31.1 Angstrom
Setting d_min: 0.61
FFT gridding: (256,256,256)
Number of centroids used: 22366

Gives:

--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 722       | 6241        | 10.4%     |
| 1        | 473       | 4681        | 9.2%      |
| 2        | 1827      | 3268        | 35.9%     |
| 3        | 2975      | 2857        | 51.0%     |
--------------------------------------------------

Actually, the full command was:

dials.index 10_SWEEP3_strong.expt 2_SWEEP1_strong.refl 2_SWEEP1_strong.expt 10_SWEEP3_strong.refl 6_SWEEP2_strong.expt 14_SWEEP4_strong.refl 14_SWEEP4_strong.expt 6_SWEEP2_strong.refl  'auto_reduction.action=fix' 'indexing.method=fft3d' 'indexing.nproc=8' 'filter_ice=false' 'reflections_per_degree=100' 'close_to_spindle_cutoff=0.020000' 'outlier.algorithm=auto' 'min_cell=3' 'output.experiments=17_indexed.expt' 'output.reflections=17_indexed.refl'

whilst

dials.index *SWEEP*  'auto_reduction.action=fix' 'indexing.method=fft3d' 'indexing.nproc=8' 'filter_ice=false' 'reflections_per_degree=100' 'close_to_spindle_cutoff=0.020000' 'outlier.algorithm=auto' 'min_cell=3' 'output.experiments=17_indexed.expt' 'output.reflections=17_indexed.refl'

i.e. in order

input {
  experiments = 10_SWEEP3_strong.expt
  experiments = 14_SWEEP4_strong.expt
  experiments = 2_SWEEP1_strong.expt
  experiments = 6_SWEEP2_strong.expt
  reflections = 10_SWEEP3_strong.refl
  reflections = 14_SWEEP4_strong.refl
  reflections = 2_SWEEP1_strong.refl
  reflections = 6_SWEEP2_strong.refl
}

gives

--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 4662      | 492         | 90.5%     |
| 1        | 4564      | 531         | 89.6%     |
| 2        | 6763      | 228         | 96.7%     |
| 3        | 5331      | 501         | 91.4%     |
--------------------------------------------------

i.e. more to do... the scans are not assigned ids when imported by xia2, which explains something... will probably need to make some fixes in there once this is merged.

graeme-winter · 2019-08-29T14:51:33Z

Also that xia2 job failed on the second sweep with

Traceback (most recent call last):
  File "/Users/graeme/svn/cctbx/build/../modules/dials/command_line/integrate.py", line 694, in <module>
    script.run()
  File "/Users/graeme/svn/cctbx/build/../modules/dials/command_line/integrate.py", line 377, in run
    reflections.extend(rubbish)
RuntimeError: Please report this error to dials-support@lists.sourceforge.net: dials Error: /Users/graeme/svn/cctbx/modules/dials/array_family/boost_python/flex_reflection_table.cc(597): Experiment identifiers do not match

=> will try some manual runs

jbeilstenedmands · 2019-08-29T14:54:39Z

Sorry, should have made it clearer that I've only added the order resolving to dials.scale at the moment, but appreciate that is not very useful for testing, so will add to all programs now. The second thing with integrate is probably a real issue, will look into.

jbeilstenedmands · 2019-08-29T15:59:31Z

Indexing should now be fixed, unsure about the integration issue at the moment

graeme-winter · 2019-08-30T12:15:37Z

OK, @jbeilstenedmands we are now cooking on gas 🙂

graeme-winter · 2019-08-30T12:53:59Z

@jbeilstenedmands thanks for the updates - now behaves in the way I would expect... properly testing now

graeme-winter · 2019-08-30T13:08:20Z

Worked through some data - joint indexing works as it used to behaviour wise, everything else makes sense so e.g.

Grey-Area master :) $ diff integrated.expt ../branch/integrated.expt 
6c6
<       "identifier": "", 
---
>       "identifier": "bbcf665d-1a19-441b-87c0-56537ed3d50f", 
17c17
<       "identifier": "", 
---
>       "identifier": "5aee9ff3-d2af-48d4-8c00-d2661e484824",

looks right

jbeilstenedmands · 2019-08-30T13:10:35Z

A thought, do we ever use an imported.expt or strong.refl with a post-indexing datafile, as this will not work as the identifiers are different?

graeme-winter · 2019-08-30T13:13:14Z

You can re-index a strong reflection file with an indexed experiment...

Grey-Area one-reindex :( $ dials.index split_0.expt ../one/strong.refl 
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 2.dev.760-g461c39c55
The following parameters have been modified:

input {
  experiments = split_0.expt
  reflections = ../one/strong.refl
}

Found max_cell: 31.3 Angstrom

Indexed crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.823, 16.883, 23.891, 89.997, 89.979, 90.005)
    Space group: P 1
    U matrix:  {{-0.5516, -0.6277,  0.5493},
                { 0.8337, -0.4342,  0.3412},
                { 0.0244,  0.6462,  0.7628}}
    B matrix:  {{ 0.2073,  0.0000,  0.0000},
                { 0.0000,  0.0592,  0.0000},
                {-0.0001, -0.0000,  0.0419}}
    A = UB:    {{-0.1144, -0.0372,  0.0230},
                { 0.1728, -0.0257,  0.0143},
                { 0.0050,  0.0383,  0.0319}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5596      | 522         | 91.5%     |
--------------------------------------------------

################################################################################
Starting refinement (macro-cycle 1)
################################################################################


Summary statistics for 5593 observations matched to predictions:
------------------------------------------------------------------------
|                   | Min    | Q1       | Med        | Q3      | Max   |
------------------------------------------------------------------------
| Xc - Xo (mm)      | -3.231 | -0.04597 | 0.00611    | 0.04907 | 2.47  |
| Yc - Yo (mm)      | -1.976 | -0.08399 | -0.0007289 | 0.09323 | 1.69  |
| Phic - Phio (deg) | -4.137 | -0.07088 | 0.01164    | 0.07935 | 5.96  |
| X weights         | 215.4  | 373.4    | 394.4      | 402.8   | 405.6 |
| Y weights         | 197.5  | 347.5    | 380        | 399.5   | 405.6 |
| Phi weights       | 228.1  | 294.1    | 298.1      | 300     | 300   |
------------------------------------------------------------------------

Detecting centroid outliers using the Tukey algorithm
763 reflections have been flagged as outliers

Summary statistics for 4830 observations matched to predictions:
-------------------------------------------------------------------------
|                   | Min     | Q1       | Med       | Q3      | Max    |
-------------------------------------------------------------------------
| Xc - Xo (mm)      | -0.2929 | -0.04152 | 0.005233  | 0.04231 | 0.2357 |
| Yc - Yo (mm)      | -0.3618 | -0.07847 | -0.004461 | 0.07292 | 0.444  |
| Phic - Phio (deg) | -0.3203 | -0.04857 | 0.01978   | 0.08028 | 0.313  |
| X weights         | 235.9   | 377.5    | 395.7     | 403     | 405.6  |
| Y weights         | 197.5   | 350.7    | 381       | 399.3   | 405.6  |
| Phi weights       | 228.1   | 294.3    | 298       | 300     | 300    |
-------------------------------------------------------------------------

There are 16 parameters to refine against 4830 reflections in 3 dimensions

Refinement steps:
------------------------------------------------
| Step | Nref | RMSD_X   | RMSD_Y   | RMSD_Phi |
|      |      | (mm)     | (mm)     | (deg)    |
------------------------------------------------
| 0    | 4830 | 0.068531 | 0.11688  | 0.095684 |
| 1    | 4830 | 0.058289 | 0.098136 | 0.081775 |
| 2    | 4830 | 0.05708  | 0.089541 | 0.080328 |
| 3    | 4830 | 0.057451 | 0.087404 | 0.079957 |
| 4    | 4830 | 0.056425 | 0.085956 | 0.078045 |
| 5    | 4830 | 0.053391 | 0.082936 | 0.073133 |
| 6    | 4830 | 0.047965 | 0.078008 | 0.064004 |
| 7    | 4830 | 0.04336  | 0.074752 | 0.055323 |
| 8    | 4830 | 0.042213 | 0.074628 | 0.052413 |
| 9    | 4830 | 0.042145 | 0.074792 | 0.052033 |
| 10   | 4830 | 0.042143 | 0.074808 | 0.052008 |
| 11   | 4830 | 0.042143 | 0.074809 | 0.052007 |
------------------------------------------------
RMSD no longer decreasing

RMSDs by experiment:
---------------------------------------------
| Exp | Nref | RMSD_X  | RMSD_Y  | RMSD_Z   |
| id  |      | (px)    | (px)    | (images) |
---------------------------------------------
| 0   | 4830 | 0.24502 | 0.43494 | 0.26004  |
---------------------------------------------

Refined crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7901(3), 16.7614(11), 23.7295(16), 90.0019(12), 90.0015(12), 90.0098(12))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8335, -0.4363,  0.3391},
                { 0.0280,  0.6461,  0.7627}}
    B matrix:  {{ 0.2088,  0.0000,  0.0000},
                { 0.0000,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0421}}
    A = UB:    {{-0.1152, -0.0374,  0.0232},
                { 0.1740, -0.0260,  0.0143},
                { 0.0059,  0.0386,  0.0321}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5596      | 522         | 91.5%     |
--------------------------------------------------

Indexed crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7901(3), 16.7614(11), 23.7295(16), 90.0019(12), 90.0015(12), 90.0098(12))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8335, -0.4363,  0.3391},
                { 0.0280,  0.6461,  0.7627}}
    B matrix:  {{ 0.2088,  0.0000,  0.0000},
                { 0.0000,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0421}}
    A = UB:    {{-0.1152, -0.0374,  0.0232},
                { 0.1740, -0.0260,  0.0143},
                { 0.0059,  0.0386,  0.0321}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5630      | 488         | 92.0%     |
--------------------------------------------------

################################################################################
Starting refinement (macro-cycle 2)
################################################################################


Summary statistics for 5593 observations matched to predictions:
-----------------------------------------------------------------------
|                   | Min    | Q1       | Med       | Q3      | Max   |
-----------------------------------------------------------------------
| Xc - Xo (mm)      | -2.307 | -0.02735 | 0.0002192 | 0.02922 | 2.461 |
| Yc - Yo (mm)      | -1.957 | -0.04946 | -0.00517  | 0.04708 | 1.716 |
| Phic - Phio (deg) | -8.605 | -0.0368  | 0.0006295 | 0.03402 | 5.47  |
| X weights         | 215.4  | 373.7    | 394.4     | 402.8   | 405.6 |
| Y weights         | 197.5  | 347.9    | 380.2     | 399.5   | 405.6 |
| Phi weights       | 228.1  | 294.1    | 298       | 300     | 300   |
-----------------------------------------------------------------------

Detecting centroid outliers using the Tukey algorithm
700 reflections have been flagged as outliers

Summary statistics for 4893 observations matched to predictions:
--------------------------------------------------------------------------
|                   | Min     | Q1       | Med        | Q3      | Max    |
--------------------------------------------------------------------------
| Xc - Xo (mm)      | -0.1282 | -0.02543 | -0.0005181 | 0.02489 | 0.1734 |
| Yc - Yo (mm)      | -0.1996 | -0.04489 | -0.005109  | 0.04197 | 0.2641 |
| Phic - Phio (deg) | -0.1749 | -0.0322  | 0.002899   | 0.03317 | 0.1833 |
| X weights         | 235.9   | 379.6    | 396.3      | 403.1   | 405.6  |
| Y weights         | 197.5   | 353.5    | 382.8      | 399.9   | 405.6  |
| Phi weights       | 228.1   | 294.1    | 297.9      | 300     | 300    |
--------------------------------------------------------------------------

There are 16 parameters to refine against 4893 reflections in 3 dimensions

Refinement steps:
------------------------------------------------
| Step | Nref | RMSD_X   | RMSD_Y   | RMSD_Phi |
|      |      | (mm)     | (mm)     | (deg)    |
------------------------------------------------
| 0    | 4893 | 0.03987  | 0.068776 | 0.047111 |
| 1    | 4893 | 0.039835 | 0.068146 | 0.047293 |
| 2    | 4893 | 0.039815 | 0.068009 | 0.047435 |
| 3    | 4893 | 0.039812 | 0.067966 | 0.04747  |
| 4    | 4893 | 0.039792 | 0.067959 | 0.047412 |
| 5    | 4893 | 0.039735 | 0.067977 | 0.047261 |
| 6    | 4893 | 0.039665 | 0.06803  | 0.047044 |
| 7    | 4893 | 0.03963  | 0.068099 | 0.046889 |
| 8    | 4893 | 0.039622 | 0.068127 | 0.046841 |
| 9    | 4893 | 0.039622 | 0.068131 | 0.046835 |
| 10   | 4893 | 0.039622 | 0.068131 | 0.046835 |
------------------------------------------------
RMSD no longer decreasing

RMSDs by experiment:
---------------------------------------------
| Exp | Nref | RMSD_X  | RMSD_Y  | RMSD_Z   |
| id  |      | (px)    | (px)    | (images) |
---------------------------------------------
| 0   | 4893 | 0.23036 | 0.39611 | 0.23418  |
---------------------------------------------

Refined crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7881(2), 16.7536(7), 23.7208(11), 90.0033(10), 90.0073(10), 90.0134(10))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8334, -0.4364,  0.3390},
                { 0.0280,  0.6461,  0.7628}}
    B matrix:  {{ 0.2089,  0.0000,  0.0000},
                { 0.0000,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0422}}
    A = UB:    {{-0.1153, -0.0374,  0.0232},
                { 0.1741, -0.0260,  0.0143},
                { 0.0059,  0.0386,  0.0322}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5596      | 522         | 91.5%     |
--------------------------------------------------

Indexed crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7881(2), 16.7536(7), 23.7208(11), 90.0033(10), 90.0073(10), 90.0134(10))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8334, -0.4364,  0.3390},
                { 0.0280,  0.6461,  0.7628}}
    B matrix:  {{ 0.2089,  0.0000,  0.0000},
                { 0.0000,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0422}}
    A = UB:    {{-0.1153, -0.0374,  0.0232},
                { 0.1741, -0.0260,  0.0143},
                { 0.0059,  0.0386,  0.0322}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5597      | 521         | 91.5%     |
--------------------------------------------------

################################################################################
Starting refinement (macro-cycle 3)
################################################################################


Summary statistics for 5593 observations matched to predictions:
------------------------------------------------------------------------
|                   | Min    | Q1       | Med        | Q3      | Max   |
------------------------------------------------------------------------
| Xc - Xo (mm)      | -2.27  | -0.02788 | -0.0007741 | 0.02872 | 2.514 |
| Yc - Yo (mm)      | -1.951 | -0.04738 | -0.00369   | 0.04655 | 1.722 |
| Phic - Phio (deg) | -8.996 | -0.0349  | 0.002666   | 0.03429 | 5.491 |
| X weights         | 215.4  | 373.7    | 394.4      | 402.8   | 405.6 |
| Y weights         | 197.5  | 347.9    | 380.2      | 399.5   | 405.6 |
| Phi weights       | 228.1  | 294.1    | 298        | 300     | 300   |
------------------------------------------------------------------------

Detecting centroid outliers using the Tukey algorithm
718 reflections have been flagged as outliers

Summary statistics for 4875 observations matched to predictions:
-------------------------------------------------------------------------
|                   | Min     | Q1       | Med       | Q3      | Max    |
-------------------------------------------------------------------------
| Xc - Xo (mm)      | -0.1284 | -0.02595 | -0.001249 | 0.02407 | 0.1794 |
| Yc - Yo (mm)      | -0.2164 | -0.04304 | -0.00369  | 0.04079 | 0.2296 |
| Phic - Phio (deg) | -0.1805 | -0.03021 | 0.004821  | 0.03308 | 0.1749 |
| X weights         | 235.9   | 380.1    | 396.5     | 403.2   | 405.6  |
| Y weights         | 197.5   | 353.9    | 383.2     | 400     | 405.6  |
| Phi weights       | 228.1   | 294.2    | 297.9     | 300     | 300    |
-------------------------------------------------------------------------

There are 16 parameters to refine against 4875 reflections in 3 dimensions

Refinement steps:
------------------------------------------------
| Step | Nref | RMSD_X   | RMSD_Y   | RMSD_Phi |
|      |      | (mm)     | (mm)     | (deg)    |
------------------------------------------------
| 0    | 4875 | 0.039855 | 0.067744 | 0.046671 |
| 1    | 4875 | 0.039901 | 0.067593 | 0.046755 |
| 2    | 4875 | 0.039938 | 0.067526 | 0.046814 |
| 3    | 4875 | 0.03995  | 0.067507 | 0.04683  |
| 4    | 4875 | 0.039948 | 0.067509 | 0.046817 |
| 5    | 4875 | 0.039941 | 0.067523 | 0.04678  |
| 6    | 4875 | 0.039931 | 0.067549 | 0.046726 |
| 7    | 4875 | 0.039925 | 0.067572 | 0.046685 |
| 8    | 4875 | 0.039923 | 0.06758  | 0.046672 |
| 9    | 4875 | 0.039923 | 0.067581 | 0.04667  |
------------------------------------------------
RMSD no longer decreasing

RMSDs by experiment:
---------------------------------------------
| Exp | Nref | RMSD_X  | RMSD_Y  | RMSD_Z   |
| id  |      | (px)    | (px)    | (images) |
---------------------------------------------
| 0   | 4875 | 0.23211 | 0.39291 | 0.23335  |
---------------------------------------------

Refined crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7874(2), 16.7512(7), 23.7180(10), 90.0027(10), 90.0073(10), 90.0146(10))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8334, -0.4364,  0.3390},
                { 0.0281,  0.6460,  0.7628}}
    B matrix:  {{ 0.2089,  0.0000,  0.0000},
                { 0.0001,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0422}}
    A = UB:    {{-0.1153, -0.0374,  0.0232},
                { 0.1741, -0.0261,  0.0143},
                { 0.0059,  0.0386,  0.0322}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5596      | 522         | 91.5%     |
--------------------------------------------------
Target d_min_final reached: finished with refinement
Saving refined experiments to indexed.expt
Saving refined reflections to indexed.refl

which apparently still works

jbeilstenedmands · 2019-08-30T13:19:05Z

Okay, perhaps index is a bit special for this kind of use as it assigns new identifiers anyway, do the identifiers in the indexed.{expt,refl} files match in this case?

graeme-winter · 2019-08-30T13:34:36Z

Time passes....

Grey-Area one-reindex :) $ dials.show indexed.expt |grep -i ident
Experiment identifier: 63aaf39c-4245-4b3f-817d-9c3f06a167ce
  identifier: PILATUS 2M, S/N 24-0107 Diamond

Grey-Area one-reindex :) $ dials.python
Python 2.7.15 (default, Oct 10 2018, 09:01:12) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dials.array_family import flex
>>> r = flex.reflection_table.from_file("indexed.refl")
>>> help(r)

>>> r.experiment_identifiers()
<dials_array_family_flex_ext.experiment_id_map object at 0x1101837d0>
>>> for x in _:
...   print x
... 
(0, '63aaf39c-4245-4b3f-817d-9c3f06a167ce')

computer says yes

graeme-winter

With the recent changes this now behaves like I would expect (without challenging by e.g. switching order of input files around)

Unexpectedly it also works when I switch around the order of input files:

input {
  experiments = ../one/imported.expt
  experiments = ../two/imported.expt
  reflections = ../two/strong.refl
  reflections = ../one/strong.refl
}

and gives numerically identical output:

Grey-Area shuffle :) $ diff dials.index.log  ../dials.index.log 
4a5,7
> output {
>   split_experiments = True
> }
6,9c9,12
<   experiments = ../one/imported.expt
<   experiments = ../two/imported.expt
<   reflections = ../two/strong.refl
<   reflections = ../one/strong.refl
---
>   experiments = one/imported.expt
>   experiments = two/imported.expt
>   reflections = one/strong.refl
>   reflections = two/strong.refl
720a724
> Splitting experiments before output

So to me this delivers on the original promise of using unique identifiers.

I would also like to see feedback from at least @ndevenish and @dagewa, covering the technical side and the API user's point of view. As an end user this does what I want and makes sense => 👍

graeme-winter · 2019-08-30T13:54:26Z

... that said

dials.index (two scans input) [split_experiments=true]

then

dials.split_experiments indexed.*

Only gives you one set of indexed reflections in split_0.* and no unindexed reflections, and no reflections at all and a boom in split_1.* - when looking at these with the reciprocal lattice viewer. Looking at indexed.* behaves exactly as you would expect.

So I think split_experiments needs some more attention...

graeme-winter · 2019-08-30T13:58:55Z

On this last one... I get the same behaviour with master so perhaps a red herring. Or a little red 🐛

graeme-winter · 2019-08-30T14:38:35Z

Putative fix for the last one in #906

Anthchirp · 2019-08-30T21:21:38Z

command_line/reindex.py

-    reflections = flatten_reflections(params.input.reflections)
-    experiments = flatten_experiments(params.input.experiments)
+    reflections, experiments = reflections_and_experiments_from_files(
+        params.input.reflections, params.input.experiments


Wondering about this pattern.
Is there any benefit in exposing `params.input.reflections`/`params.input.experiments` to the application at all, or could/should this be integrated into `params.input`/the `parse_args` call?

This is a good suggestion but more wide-ranging - possibly a follow up pull request?

dagewa · 2019-09-05T08:34:30Z

If new_refl = refl.select(refl["id"] == 0) is now unsafe, would it be reasonable to decorate the select method for reflection tables so that this now calls clean_experiment_identifiers_map() automatically afterwards?
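
One way the suggestion above might look, as a minimal sketch on a toy class: `cleans_identifier_map` is a hypothetical decorator and `ToyTable` a hypothetical stand-in for the reflection table, so nothing here reflects how the real select method is implemented.

```python
import functools


def cleans_identifier_map(method):
    """Wrap a selection method so the result's map is cleaned afterwards."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        result.clean_experiment_identifiers_map()
        return result

    return wrapper


class ToyTable:
    """Hypothetical stand-in for a reflection table."""

    def __init__(self, ids, id_map):
        self.ids = list(ids)
        self.id_map = dict(id_map)

    def clean_experiment_identifiers_map(self):
        # Keep only the entries whose numerical id is still present.
        present = set(self.ids)
        self.id_map = {k: v for k, v in self.id_map.items() if k in present}

    @cleans_identifier_map
    def select(self, keep):
        kept = [i for i, k in zip(self.ids, keep) if k]
        return ToyTable(kept, self.id_map)


table = ToyTable([0, 0, 1], {0: "aaa", 1: "bbb"})
subset = table.select([i == 0 for i in table.ids])
```

The caller keeps writing an ordinary select call, while the clean-up happens on the returned table and the original table is left untouched.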

dagewa · 2019-09-05T08:46:56Z

dials.combine_experiments overwrites reflection id according to input order

dials/command_line/combine_experiments.py

Line 529 in 7a239f2

sub_ref["id"] = flex.int(len(sub_ref), global_id)

but does nothing with experiment_identifiers. Is this a problem?

graeme-winter · 2019-09-05T08:49:04Z

@dagewa probably yes... I think split and combine both give interesting and perhaps unexpected behaviour

See also #907 - I think there is more we could be doing here to keep things consistent (and outlawing experiment id -1)

dagewa · 2019-09-05T08:59:56Z

@graeme-winter I see. I think we need to figure out this interesting/unexpected behaviour and exercise it in some new tests.

I wonder if some of these problems go away if reflection tables always kept their experiment_identifiers maps updated automatically. So, for example, as above calling clean_experiment_identifiers_map on select, but also updating this map when a reflection table id column is touched.

I'd like it if we could work with reflection tables the way we always have done with the additional bookkeeping all being done behind the scenes. I'm not worried about the replacement of flatten_reflections etc., but I am concerned about standard flex array operations leading to inconsistencies or subtle bugs.

codecov-io · 2019-09-05T09:01:12Z

Codecov Report

Merging #902 into master will decrease coverage by 0.05%.
The diff coverage is 79.9%.

@@            Coverage Diff             @@
##           master     #902      +/-   ##
==========================================
- Coverage   67.33%   67.28%   -0.06%     
==========================================
  Files         608      605       -3     
  Lines       69294    69166     -128     
  Branches     8465     8454      -11     
==========================================
- Hits        46659    46537     -122     
+ Misses      20798    20793       -5     
+ Partials     1837     1836       -1

Impacted Files	Coverage Δ
test/util/test_nexus.py	`94.25% <ø> (ø)`	⬆️
command_line/model_background.py	`0% <ø> (ø)`	⬆️
util/masking/__init__.py	`85.27% <ø> (ø)`	⬆️
libtbx_refresh.py	`0% <ø> (ø)`	⬆️
command_line/integrate.py	`61.91% <ø> (ø)`	⬆️
command_line/find_spots.py	`76.47% <ø> (ø)`	⬆️
command_line/find_hot_pixels.py	`80.7% <ø> (ø)`	⬆️
command_line/dials_import.py	`64.03% <ø> (ø)`	⬆️
command_line/import_stream.py	`0% <ø> (ø)`	⬆️
test/command_line/test_generate_mask.py	`100% <ø> (ø)`	⬆️
... and 61 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jbeilstenedmands · 2019-09-05T09:26:03Z

@dagewa yes I think we should decorate the reflection table select method to discreetly update the identifiers, good idea

jbeilstenedmands added 19 commits August 28, 2019 14:09

Set experiment identifier on import

7788e6d

Set identifiers map when creating strong.refl

17c2b97

Assign identifiers in indexing

99c323c

Update identifiers handling in integration

a4b3599

Update multi dataset handling for unique identifiers

79dc942

Add function to resolve file ordering with identifiers

eda3809

Update flattening of data and multi-dataset handling

Update image exclusion with new identifiers

cfc5873

Update are_identifiers_consistent definition

1f0a6a0

Allow -1 id entries in the table

Update dials.scale and dials.compute_delta_cchalf

be95444

Fix test_export.py

9505d6a

Update tests for handling loading of reflections and experiments

001cea4

Set identifiers in import. Add test that identifier is set.

260963e

Update indexing tests and fixes to dials.index

b5a0dba

Add some integration tests with identifiers

35e3ab3

Add tests that identifiers are set in spotfinding

3fdde03

Test for preservation of experiment identifiers throughout processing

1a7f228

Update exclude_images test

d43a7d7

Update dataset exclusion handling in scaling

03f331a

Replace string identifier with numerical id in scaling, filtering

f5f3fcf

Tidy up scaling files, update selecting function name.

jbeilstenedmands requested review from ndevenish, graeme-winter, asmit3 and dagewa August 29, 2019 10:39

jbeilstenedmands added 2 commits August 29, 2019 11:41

Add missing init file from test/util

7c9c3ac

Fix typo and flake8 warnings

7464385

Flake8 placation

461c39c

graeme-winter approved these changes Aug 30, 2019

Anthchirp reviewed Aug 30, 2019

Anthchirp mentioned this pull request Oct 21, 2019

unpythonic data structures: experiment_identifiers, reflection_tables #862

Closed

graeme-winter mentioned this pull request Dec 3, 2019

DIALS 2.1 #993

Closed

Anthchirp added this to the DIALS 2.1 milestone Dec 4, 2019

jbeilstenedmands mentioned this pull request Dec 16, 2019

Handle selecting a reflection table with experiment identifiers #1077

Merged

jbeilstenedmands added a commit that referenced this pull request Dec 17, 2019

Add reflection table sorting functions.

26e13a3

This is a noninvasive section from previously-approved PR #902

jbeilstenedmands mentioned this pull request Jan 9, 2020

Use experiment identifiers from import #1086

Merged

jbeilstenedmands closed this Jan 9, 2020

Anthchirp deleted the expt_identifiers branch January 23, 2020 16:53

Anthchirp mentioned this pull request Aug 26, 2021

stills_indexer: Add a deterministic option for experiment identifiers #1864

Closed

huwjenkins mentioned this pull request Feb 9, 2022

dials.scale exclude_datasets and experiment identifiers #2006

Closed
