Implement unique experiment identifiers #902

Closed
wants to merge 24 commits into from

jbeilstenedmands
Contributor

Overview
This pull request implements the generation and use of unique experiment identifiers throughout the dials processing workflow. The motivation is to allow experiments and reflection tables to be matched without relying on the ordering of these objects (either on loading data or during processing). This feature has already proved useful in dials.scale and dials.cosym.

These identifiers (a uuid string by default) are stored in experiment.identifier, and the reflection_table.experiment_identifiers() map is used to record the numerical_id -> identifier mapping.
When a new experiment is created, a unique identifier is generated - this only happens in dials.import and dials.index. When a reflection table is created, the experiment_identifiers map is populated - this happens in spot-finding, indexing and integration. Other programs in the standard dials workflow add to the existing table, so the mapping is preserved.
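The bookkeeping described above can be sketched in plain Python. This is only an illustration under stated assumptions: `Experiment` and `build_identifier_map` are hypothetical stand-ins, not the dials classes or API. The only details taken from the description are that an identifier is a uuid string by default and that the table records a numerical_id -> identifier mapping.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical stand-in for an experiment that carries an identifier."""

    # A fresh uuid string is generated whenever a new experiment is created.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))


def build_identifier_map(experiments):
    """Return {numerical_id: identifier}.

    This plays the role described for reflection_table.experiment_identifiers(),
    but it is a plain dict, not the dials object.
    """
    return {i: expt.identifier for i, expt in enumerate(experiments)}


experiments = [Experiment(), Experiment()]
id_map = build_identifier_map(experiments)
```

With this in place, a table can be matched to its experiment by looking up the identifier rather than by relying on list position.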

So far I have only implemented and tested the uuid identifier generation for the sweeps case. For the stills case, I have added in function calls in the relevant locations in stills_indexer (which would currently generate a uuid); I invite @asmit3 and collaborators to add to this PR to modify this as you wish as I am unfamiliar with the ins and outs of the stills_process workflow.

People can probably ignore the changes to the scaling files if reviewing this. I shall continue to review the dials programs for the 'potential issue' (see below). However, the tests pass locally for me and I wanted to get this PR out there for feedback.

Detailed description on behaviour changes
The behavioural changes are primarily to help programs which handle the input of multiple datasets in separate files - which will likely have duplicate numerical ids in the reflection tables.

  1. To implement safe handling of data on loading, regardless of order, there is a new function that takes over the role of flatten_experiments and flatten_reflections and also reorders the reflection tables to match the order of the experiments, using the experiment identifiers:
    refls, expts = reflections_and_experiments_from_files(refl_file_objs, expt_file_objs)
    (Note that the id columns in the tables are not renumbered at this point. Renumbering was something I introduced recently in flatten_reflections for "flatten_reflections should wrangle experiment identifiers gracefully" (#656), but I agree that this functionality should live elsewhere; see below.)
    So far I have only replaced the flatten functions in dials.scale, but this can be rolled out to other programs if people are happy about this.

  2. To ensure that multiple reflection tables can be joined together, one must ensure that the column id values do not clash. To do this, one can use the renumber_table_id_columns(reflection_tables) function in dials.util.multi_dataset_handling. There is also a split_reflection_tables_on_ids(reflection_tables) function for splitting multi-dataset tables on ids.

  3. Therefore for programs that handle reflection tables with multiple datasets, the following calls are the suggested way to load the data, to provide a list of experiments and reflections with different numerical ids:

# load data and make sure the order of refls and expts matches
refls, expts =  reflections_and_experiments_from_files(refl_file_objs, expt_file_objs)

# renumber id columns if necessary to avoid duplicate numerical ids across tables
refls = renumber_table_id_columns(refls)

# split multi-dataset tables if necessary to a list of single dataset tables
refls = split_reflection_tables_on_ids(refls)
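To make the three steps above concrete, here is a minimal sketch in plain Python. It is not the implementation in dials.util.multi_dataset_handling: tables are modelled as dicts with an "id" list and an "identifiers" map, and `reorder_tables`, `renumber_ids` and `split_on_ids` are hypothetical stand-ins for the functions named above. The reordering step additionally assumes one dataset per input table.

```python
def reorder_tables(tables, experiment_identifiers):
    """Order single-dataset tables to follow the experiment identifier order."""
    by_identifier = {}
    for table in tables:
        (identifier,) = table["identifiers"].values()  # one dataset per table
        by_identifier[identifier] = table
    return [by_identifier[i] for i in experiment_identifiers]


def renumber_ids(tables):
    """Give every dataset a distinct numerical id across all tables."""
    next_id = 0
    renumbered = []
    for table in tables:
        mapping = {}
        for old_id in sorted(table["identifiers"]):
            mapping[old_id] = next_id
            next_id += 1
        renumbered.append(
            {
                "id": [mapping[i] for i in table["id"]],
                "identifiers": {
                    mapping[k]: v for k, v in table["identifiers"].items()
                },
            }
        )
    return renumbered


def split_on_ids(tables):
    """Split multi-dataset tables into a list of single-dataset tables."""
    split = []
    for table in tables:
        for dataset_id in sorted(table["identifiers"]):
            split.append(
                {
                    "id": [i for i in table["id"] if i == dataset_id],
                    "identifiers": {dataset_id: table["identifiers"][dataset_id]},
                }
            )
    return split


# Two files, both using numerical id 0, supplied in the "wrong" order.
expt_identifiers = ["aaa", "bbb"]
tables = [
    {"id": [0, 0, 0], "identifiers": {0: "bbb"}},
    {"id": [0, 0], "identifiers": {0: "aaa"}},
]
tables = split_on_ids(renumber_ids(reorder_tables(tables, expt_identifiers)))
```

After these calls the first table belongs to "aaa" with id 0 and the second to "bbb" with id 1, so the tables can be joined without the id values clashing.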

Potential issue
One problematic aspect is that performing a selection on a multi-dataset reflection table based on the "id" column may no longer be safe, as the identifier map is not updated (which could be an issue when trying to recombine the tables). Therefore code like this, which selects data from the 0th experiment:
new_refl = refl.select(refl["id"] == 0)
should be replaced with one of

new_refl = refl.select_on_id_values([0])
new_refl = refl.select_on_experiment_identifiers([experiments[0].identifier])

or should be modified to

new_refl = refl.select(refl["id"] == 0)
new_refl.clean_experiment_identifiers_map()

I'm continuing to check the repository for places where this could be an issue.
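The stale-map problem can be illustrated without dials. In the sketch below, `Table` is a hypothetical stand-in for a reflection table (an id column plus an id -> identifier map), and its `select` is assumed to copy the map unchanged, which is the behaviour described above for a plain selection.

```python
class Table:
    """Hypothetical stand-in: an id column plus an id -> identifier map."""

    def __init__(self, ids, identifiers):
        self.ids = list(ids)
        self.identifiers = dict(identifiers)

    def select(self, mask):
        # Rows are filtered, but the identifier map is carried over untouched.
        kept = [i for i, keep in zip(self.ids, mask) if keep]
        return Table(kept, self.identifiers)

    def clean_experiment_identifiers_map(self):
        # Drop map entries whose numerical id no longer occurs in the id column.
        present = set(self.ids)
        self.identifiers = {
            k: v for k, v in self.identifiers.items() if k in present
        }


refl = Table([0, 0, 1, 1, 1], {0: "uuid-a", 1: "uuid-b"})
new_refl = refl.select([i == 0 for i in refl.ids])
stale_map = dict(new_refl.identifiers)  # still lists experiment 1
new_refl.clean_experiment_identifiers_map()
```

Before the clean-up call the selected table still advertises "uuid-b" even though it holds no reflections with id 1; afterwards only "uuid-a" remains.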

@graeme-winter
Contributor

Looking at this now. Will try side-by-side comparison to get a sense of the real differences...

@graeme-winter
Contributor

graeme-winter commented Aug 29, 2019

Experiment 3:
Experiment identifier: 3c3a61b3-ee6a-45d1-a79d-2fbb6615ec71

certainly looks like a UUID 🙂
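"Looks like a UUID" can be checked with the standard library. The snippet below parses the identifier shown above and inspects its version and variant fields. That the string is a random (version 4) UUID is read off the string itself; the thread does not state which generator dials calls.

```python
import uuid

identifier = "3c3a61b3-ee6a-45d1-a79d-2fbb6615ec71"

# uuid.UUID raises ValueError if the string is not a well-formed UUID.
parsed = uuid.UUID(identifier)

# Version 4 with the RFC 4122 variant is what uuid.uuid4() produces.
is_random_uuid = parsed.version == 4 and parsed.variant == uuid.RFC_4122
```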

@graeme-winter
Contributor

OK, hypothesis test:

dials.index 10_SWEEP3_strong.expt 2_SWEEP1_strong.refl 2_SWEEP1_strong.expt 10_SWEEP3_strong.refl 6_SWEEP2_strong.expt 14_SWEEP4_strong.refl 14_SWEEP4_strong.expt 6_SWEEP2_strong.refl

input {
  experiments = 10_SWEEP3_strong.expt
  experiments = 2_SWEEP1_strong.expt
  experiments = 6_SWEEP2_strong.expt
  experiments = 14_SWEEP4_strong.expt
  reflections = 2_SWEEP1_strong.refl
  reflections = 10_SWEEP3_strong.refl
  reflections = 14_SWEEP4_strong.refl
  reflections = 6_SWEEP2_strong.refl
}

Found max_cell: 31.1 Angstrom
Setting d_min: 0.61
FFT gridding: (256,256,256)
Number of centroids used: 22366

Gives:

--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 722       | 6241        | 10.4%     |
| 1        | 473       | 4681        | 9.2%      |
| 2        | 1827      | 3268        | 35.9%     |
| 3        | 2975      | 2857        | 51.0%     |
--------------------------------------------------

(actually:

dials.index 10_SWEEP3_strong.expt 2_SWEEP1_strong.refl 2_SWEEP1_strong.expt 10_SWEEP3_strong.refl 6_SWEEP2_strong.expt 14_SWEEP4_strong.refl 14_SWEEP4_strong.expt 6_SWEEP2_strong.refl  'auto_reduction.action=fix' 'indexing.method=fft3d' 'indexing.nproc=8' 'filter_ice=false' 'reflections_per_degree=100' 'close_to_spindle_cutoff=0.020000' 'outlier.algorithm=auto' 'min_cell=3' 'output.experiments=17_indexed.expt' 'output.reflections=17_indexed.refl'

whilst

dials.index *SWEEP*  'auto_reduction.action=fix' 'indexing.method=fft3d' 'indexing.nproc=8' 'filter_ice=false' 'reflections_per_degree=100' 'close_to_spindle_cutoff=0.020000' 'outlier.algorithm=auto' 'min_cell=3' 'output.experiments=17_indexed.expt' 'output.reflections=17_indexed.refl'

i.e. in order

input {
  experiments = 10_SWEEP3_strong.expt
  experiments = 14_SWEEP4_strong.expt
  experiments = 2_SWEEP1_strong.expt
  experiments = 6_SWEEP2_strong.expt
  reflections = 10_SWEEP3_strong.refl
  reflections = 14_SWEEP4_strong.refl
  reflections = 2_SWEEP1_strong.refl
  reflections = 6_SWEEP2_strong.refl
}

gives

--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 4662      | 492         | 90.5%     |
| 1        | 4564      | 531         | 89.6%     |
| 2        | 6763      | 228         | 96.7%     |
| 3        | 5331      | 501         | 91.4%     |
--------------------------------------------------

i.e. more to do... the scans are not assigned ids when imported by xia2, which explains something... will probably need to make some fixes in there once this is merged.
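The two input orderings compared above can be reproduced with a few lines of Python. This is only a sketch of the position-based pairing hypothesis being tested: it assumes the glob expansion matches plain string sorting (true for these ASCII names) and that experiments and reflections are paired by list position.

```python
names = ["2_SWEEP1", "6_SWEEP2", "10_SWEEP3", "14_SWEEP4"]

# Lexicographic order, as produced by the *SWEEP* glob: "10" sorts before "2".
glob_order = sorted(names)

# Order of experiments and reflections as parsed from the shuffled command line.
shuffled_expts = ["10_SWEEP3", "2_SWEEP1", "6_SWEEP2", "14_SWEEP4"]
shuffled_refls = ["2_SWEEP1", "10_SWEEP3", "14_SWEEP4", "6_SWEEP2"]

# Pairs that would be wrongly matched if pairing is purely by position.
mismatched = [(e, r) for e, r in zip(shuffled_expts, shuffled_refls) if e != r]
```

In the shuffled run every experiment is paired with reflections from a different sweep, whereas with the glob both lists share the same order, which is consistent with the difference between the two tables.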

@graeme-winter
Contributor

Also that xia2 job failed on the second sweep with

Traceback (most recent call last):
  File "/Users/graeme/svn/cctbx/build/../modules/dials/command_line/integrate.py", line 694, in <module>
    script.run()
  File "/Users/graeme/svn/cctbx/build/../modules/dials/command_line/integrate.py", line 377, in run
    reflections.extend(rubbish)
RuntimeError: Please report this error to dials-support@lists.sourceforge.net: dials Error: /Users/graeme/svn/cctbx/modules/dials/array_family/boost_python/flex_reflection_table.cc(597): Experiment identifiers do not match

=> will try some manual runs
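For readers unfamiliar with where such an error could come from, here is a guess at the kind of check involved, written in Python. The C++ in flex_reflection_table.cc is not shown in this thread, so the logic below is an assumption: `extend_identifier_map` is a hypothetical helper that merges two id -> identifier maps and refuses to proceed when the same numerical id points at two different identifiers.

```python
def extend_identifier_map(target, other):
    """Merge `other` into `target`, rejecting conflicting identifiers.

    Assumed behaviour only: a shared numerical id must map to the same
    identifier in both tables, otherwise the extend is refused.
    """
    for numerical_id, identifier in other.items():
        existing = target.get(numerical_id)
        if existing is not None and existing != identifier:
            raise RuntimeError("Experiment identifiers do not match")
        target[numerical_id] = identifier
    return target


master_map = {0: "uuid-a"}
try:
    # Same numerical id, different identifier: this mirrors the failing case.
    extend_identifier_map(master_map, {0: "uuid-b"})
    clash_detected = False
except RuntimeError:
    clash_detected = True
```

Under this assumption, extending a table with reflections whose id 0 belongs to a different experiment would fail in the way the traceback shows, while extending with distinct ids would succeed.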

@jbeilstenedmands
Contributor Author

Sorry, should have made it clearer that I've only added the order resolving to dials.scale at the moment, but appreciate that is not very useful for testing, so will add to all programs now. The second thing with integrate is probably a real issue, will look into.

@jbeilstenedmands
Contributor Author

Indexing should now be fixed, unsure about the integration issue at the moment

@graeme-winter
Contributor

OK, @jbeilstenedmands we are now cooking on gas 🙂

@graeme-winter
Contributor

@jbeilstenedmands thanks for the updates - now behaves in the way I would expect... properly testing now

@graeme-winter
Contributor

Worked through some data: joint indexing behaves as it used to, and everything else makes sense, e.g.

Grey-Area master :) $ diff integrated.expt ../branch/integrated.expt 
6c6
<       "identifier": "", 
---
>       "identifier": "bbcf665d-1a19-441b-87c0-56537ed3d50f", 
17c17
<       "identifier": "", 
---
>       "identifier": "5aee9ff3-d2af-48d4-8c00-d2661e484824", 

looks right

@jbeilstenedmands
Contributor Author

A thought: do we ever use an imported.expt or strong.refl together with a post-indexing data file? That would not work, as the identifiers would be different.

@graeme-winter
Contributor

You can re-index a strong reflection file with an indexed experiment...

Grey-Area one-reindex :( $ dials.index split_0.expt ../one/strong.refl 
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 2.dev.760-g461c39c55
The following parameters have been modified:

input {
  experiments = split_0.expt
  reflections = ../one/strong.refl
}

Found max_cell: 31.3 Angstrom

Indexed crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.823, 16.883, 23.891, 89.997, 89.979, 90.005)
    Space group: P 1
    U matrix:  {{-0.5516, -0.6277,  0.5493},
                { 0.8337, -0.4342,  0.3412},
                { 0.0244,  0.6462,  0.7628}}
    B matrix:  {{ 0.2073,  0.0000,  0.0000},
                { 0.0000,  0.0592,  0.0000},
                {-0.0001, -0.0000,  0.0419}}
    A = UB:    {{-0.1144, -0.0372,  0.0230},
                { 0.1728, -0.0257,  0.0143},
                { 0.0050,  0.0383,  0.0319}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5596      | 522         | 91.5%     |
--------------------------------------------------

################################################################################
Starting refinement (macro-cycle 1)
################################################################################


Summary statistics for 5593 observations matched to predictions:
------------------------------------------------------------------------
|                   | Min    | Q1       | Med        | Q3      | Max   |
------------------------------------------------------------------------
| Xc - Xo (mm)      | -3.231 | -0.04597 | 0.00611    | 0.04907 | 2.47  |
| Yc - Yo (mm)      | -1.976 | -0.08399 | -0.0007289 | 0.09323 | 1.69  |
| Phic - Phio (deg) | -4.137 | -0.07088 | 0.01164    | 0.07935 | 5.96  |
| X weights         | 215.4  | 373.4    | 394.4      | 402.8   | 405.6 |
| Y weights         | 197.5  | 347.5    | 380        | 399.5   | 405.6 |
| Phi weights       | 228.1  | 294.1    | 298.1      | 300     | 300   |
------------------------------------------------------------------------

Detecting centroid outliers using the Tukey algorithm
763 reflections have been flagged as outliers

Summary statistics for 4830 observations matched to predictions:
-------------------------------------------------------------------------
|                   | Min     | Q1       | Med       | Q3      | Max    |
-------------------------------------------------------------------------
| Xc - Xo (mm)      | -0.2929 | -0.04152 | 0.005233  | 0.04231 | 0.2357 |
| Yc - Yo (mm)      | -0.3618 | -0.07847 | -0.004461 | 0.07292 | 0.444  |
| Phic - Phio (deg) | -0.3203 | -0.04857 | 0.01978   | 0.08028 | 0.313  |
| X weights         | 235.9   | 377.5    | 395.7     | 403     | 405.6  |
| Y weights         | 197.5   | 350.7    | 381       | 399.3   | 405.6  |
| Phi weights       | 228.1   | 294.3    | 298       | 300     | 300    |
-------------------------------------------------------------------------

There are 16 parameters to refine against 4830 reflections in 3 dimensions

Refinement steps:
------------------------------------------------
| Step | Nref | RMSD_X   | RMSD_Y   | RMSD_Phi |
|      |      | (mm)     | (mm)     | (deg)    |
------------------------------------------------
| 0    | 4830 | 0.068531 | 0.11688  | 0.095684 |
| 1    | 4830 | 0.058289 | 0.098136 | 0.081775 |
| 2    | 4830 | 0.05708  | 0.089541 | 0.080328 |
| 3    | 4830 | 0.057451 | 0.087404 | 0.079957 |
| 4    | 4830 | 0.056425 | 0.085956 | 0.078045 |
| 5    | 4830 | 0.053391 | 0.082936 | 0.073133 |
| 6    | 4830 | 0.047965 | 0.078008 | 0.064004 |
| 7    | 4830 | 0.04336  | 0.074752 | 0.055323 |
| 8    | 4830 | 0.042213 | 0.074628 | 0.052413 |
| 9    | 4830 | 0.042145 | 0.074792 | 0.052033 |
| 10   | 4830 | 0.042143 | 0.074808 | 0.052008 |
| 11   | 4830 | 0.042143 | 0.074809 | 0.052007 |
------------------------------------------------
RMSD no longer decreasing

RMSDs by experiment:
---------------------------------------------
| Exp | Nref | RMSD_X  | RMSD_Y  | RMSD_Z   |
| id  |      | (px)    | (px)    | (images) |
---------------------------------------------
| 0   | 4830 | 0.24502 | 0.43494 | 0.26004  |
---------------------------------------------

Refined crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7901(3), 16.7614(11), 23.7295(16), 90.0019(12), 90.0015(12), 90.0098(12))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8335, -0.4363,  0.3391},
                { 0.0280,  0.6461,  0.7627}}
    B matrix:  {{ 0.2088,  0.0000,  0.0000},
                { 0.0000,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0421}}
    A = UB:    {{-0.1152, -0.0374,  0.0232},
                { 0.1740, -0.0260,  0.0143},
                { 0.0059,  0.0386,  0.0321}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5596      | 522         | 91.5%     |
--------------------------------------------------

Indexed crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7901(3), 16.7614(11), 23.7295(16), 90.0019(12), 90.0015(12), 90.0098(12))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8335, -0.4363,  0.3391},
                { 0.0280,  0.6461,  0.7627}}
    B matrix:  {{ 0.2088,  0.0000,  0.0000},
                { 0.0000,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0421}}
    A = UB:    {{-0.1152, -0.0374,  0.0232},
                { 0.1740, -0.0260,  0.0143},
                { 0.0059,  0.0386,  0.0321}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5630      | 488         | 92.0%     |
--------------------------------------------------

################################################################################
Starting refinement (macro-cycle 2)
################################################################################


Summary statistics for 5593 observations matched to predictions:
-----------------------------------------------------------------------
|                   | Min    | Q1       | Med       | Q3      | Max   |
-----------------------------------------------------------------------
| Xc - Xo (mm)      | -2.307 | -0.02735 | 0.0002192 | 0.02922 | 2.461 |
| Yc - Yo (mm)      | -1.957 | -0.04946 | -0.00517  | 0.04708 | 1.716 |
| Phic - Phio (deg) | -8.605 | -0.0368  | 0.0006295 | 0.03402 | 5.47  |
| X weights         | 215.4  | 373.7    | 394.4     | 402.8   | 405.6 |
| Y weights         | 197.5  | 347.9    | 380.2     | 399.5   | 405.6 |
| Phi weights       | 228.1  | 294.1    | 298       | 300     | 300   |
-----------------------------------------------------------------------

Detecting centroid outliers using the Tukey algorithm
700 reflections have been flagged as outliers

Summary statistics for 4893 observations matched to predictions:
--------------------------------------------------------------------------
|                   | Min     | Q1       | Med        | Q3      | Max    |
--------------------------------------------------------------------------
| Xc - Xo (mm)      | -0.1282 | -0.02543 | -0.0005181 | 0.02489 | 0.1734 |
| Yc - Yo (mm)      | -0.1996 | -0.04489 | -0.005109  | 0.04197 | 0.2641 |
| Phic - Phio (deg) | -0.1749 | -0.0322  | 0.002899   | 0.03317 | 0.1833 |
| X weights         | 235.9   | 379.6    | 396.3      | 403.1   | 405.6  |
| Y weights         | 197.5   | 353.5    | 382.8      | 399.9   | 405.6  |
| Phi weights       | 228.1   | 294.1    | 297.9      | 300     | 300    |
--------------------------------------------------------------------------

There are 16 parameters to refine against 4893 reflections in 3 dimensions

Refinement steps:
------------------------------------------------
| Step | Nref | RMSD_X   | RMSD_Y   | RMSD_Phi |
|      |      | (mm)     | (mm)     | (deg)    |
------------------------------------------------
| 0    | 4893 | 0.03987  | 0.068776 | 0.047111 |
| 1    | 4893 | 0.039835 | 0.068146 | 0.047293 |
| 2    | 4893 | 0.039815 | 0.068009 | 0.047435 |
| 3    | 4893 | 0.039812 | 0.067966 | 0.04747  |
| 4    | 4893 | 0.039792 | 0.067959 | 0.047412 |
| 5    | 4893 | 0.039735 | 0.067977 | 0.047261 |
| 6    | 4893 | 0.039665 | 0.06803  | 0.047044 |
| 7    | 4893 | 0.03963  | 0.068099 | 0.046889 |
| 8    | 4893 | 0.039622 | 0.068127 | 0.046841 |
| 9    | 4893 | 0.039622 | 0.068131 | 0.046835 |
| 10   | 4893 | 0.039622 | 0.068131 | 0.046835 |
------------------------------------------------
RMSD no longer decreasing

RMSDs by experiment:
---------------------------------------------
| Exp | Nref | RMSD_X  | RMSD_Y  | RMSD_Z   |
| id  |      | (px)    | (px)    | (images) |
---------------------------------------------
| 0   | 4893 | 0.23036 | 0.39611 | 0.23418  |
---------------------------------------------

Refined crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7881(2), 16.7536(7), 23.7208(11), 90.0033(10), 90.0073(10), 90.0134(10))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8334, -0.4364,  0.3390},
                { 0.0280,  0.6461,  0.7628}}
    B matrix:  {{ 0.2089,  0.0000,  0.0000},
                { 0.0000,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0422}}
    A = UB:    {{-0.1153, -0.0374,  0.0232},
                { 0.1741, -0.0260,  0.0143},
                { 0.0059,  0.0386,  0.0322}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5596      | 522         | 91.5%     |
--------------------------------------------------

Indexed crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7881(2), 16.7536(7), 23.7208(11), 90.0033(10), 90.0073(10), 90.0134(10))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8334, -0.4364,  0.3390},
                { 0.0280,  0.6461,  0.7628}}
    B matrix:  {{ 0.2089,  0.0000,  0.0000},
                { 0.0000,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0422}}
    A = UB:    {{-0.1153, -0.0374,  0.0232},
                { 0.1741, -0.0260,  0.0143},
                { 0.0059,  0.0386,  0.0322}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5597      | 521         | 91.5%     |
--------------------------------------------------

################################################################################
Starting refinement (macro-cycle 3)
################################################################################


Summary statistics for 5593 observations matched to predictions:
------------------------------------------------------------------------
|                   | Min    | Q1       | Med        | Q3      | Max   |
------------------------------------------------------------------------
| Xc - Xo (mm)      | -2.27  | -0.02788 | -0.0007741 | 0.02872 | 2.514 |
| Yc - Yo (mm)      | -1.951 | -0.04738 | -0.00369   | 0.04655 | 1.722 |
| Phic - Phio (deg) | -8.996 | -0.0349  | 0.002666   | 0.03429 | 5.491 |
| X weights         | 215.4  | 373.7    | 394.4      | 402.8   | 405.6 |
| Y weights         | 197.5  | 347.9    | 380.2      | 399.5   | 405.6 |
| Phi weights       | 228.1  | 294.1    | 298        | 300     | 300   |
------------------------------------------------------------------------

Detecting centroid outliers using the Tukey algorithm
718 reflections have been flagged as outliers

Summary statistics for 4875 observations matched to predictions:
-------------------------------------------------------------------------
|                   | Min     | Q1       | Med       | Q3      | Max    |
-------------------------------------------------------------------------
| Xc - Xo (mm)      | -0.1284 | -0.02595 | -0.001249 | 0.02407 | 0.1794 |
| Yc - Yo (mm)      | -0.2164 | -0.04304 | -0.00369  | 0.04079 | 0.2296 |
| Phic - Phio (deg) | -0.1805 | -0.03021 | 0.004821  | 0.03308 | 0.1749 |
| X weights         | 235.9   | 380.1    | 396.5     | 403.2   | 405.6  |
| Y weights         | 197.5   | 353.9    | 383.2     | 400     | 405.6  |
| Phi weights       | 228.1   | 294.2    | 297.9     | 300     | 300    |
-------------------------------------------------------------------------

There are 16 parameters to refine against 4875 reflections in 3 dimensions

Refinement steps:
------------------------------------------------
| Step | Nref | RMSD_X   | RMSD_Y   | RMSD_Phi |
|      |      | (mm)     | (mm)     | (deg)    |
------------------------------------------------
| 0    | 4875 | 0.039855 | 0.067744 | 0.046671 |
| 1    | 4875 | 0.039901 | 0.067593 | 0.046755 |
| 2    | 4875 | 0.039938 | 0.067526 | 0.046814 |
| 3    | 4875 | 0.03995  | 0.067507 | 0.04683  |
| 4    | 4875 | 0.039948 | 0.067509 | 0.046817 |
| 5    | 4875 | 0.039941 | 0.067523 | 0.04678  |
| 6    | 4875 | 0.039931 | 0.067549 | 0.046726 |
| 7    | 4875 | 0.039925 | 0.067572 | 0.046685 |
| 8    | 4875 | 0.039923 | 0.06758  | 0.046672 |
| 9    | 4875 | 0.039923 | 0.067581 | 0.04667  |
------------------------------------------------
RMSD no longer decreasing

RMSDs by experiment:
---------------------------------------------
| Exp | Nref | RMSD_X  | RMSD_Y  | RMSD_Z   |
| id  |      | (px)    | (px)    | (images) |
---------------------------------------------
| 0   | 4875 | 0.23211 | 0.39291 | 0.23335  |
---------------------------------------------

Refined crystal models:
model 1 (5596 reflections):
Crystal:
    Unit cell: (4.7874(2), 16.7512(7), 23.7180(10), 90.0027(10), 90.0073(10), 90.0146(10))
    Space group: P 1
    U matrix:  {{-0.5519, -0.6262,  0.5507},
                { 0.8334, -0.4364,  0.3390},
                { 0.0281,  0.6460,  0.7628}}
    B matrix:  {{ 0.2089,  0.0000,  0.0000},
                { 0.0001,  0.0597,  0.0000},
                { 0.0000,  0.0000,  0.0422}}
    A = UB:    {{-0.1153, -0.0374,  0.0232},
                { 0.1741, -0.0261,  0.0143},
                { 0.0059,  0.0386,  0.0322}}
--------------------------------------------------
| Imageset | # indexed | # unindexed | % indexed |
--------------------------------------------------
| 0        | 5596      | 522         | 91.5%     |
--------------------------------------------------
Target d_min_final reached: finished with refinement
Saving refined experiments to indexed.expt
Saving refined reflections to indexed.refl

which apparently still works

@jbeilstenedmands
Contributor Author

Okay, perhaps index is a bit special for this kind of use, as it assigns new identifiers anyway. Do the identifiers in the indexed.{expt,refl} files match in this case?

@graeme-winter
Contributor

Time passes....

Grey-Area one-reindex :) $ dials.show indexed.expt |grep -i ident
Experiment identifier: 63aaf39c-4245-4b3f-817d-9c3f06a167ce
  identifier: PILATUS 2M, S/N 24-0107 Diamond

Grey-Area one-reindex :) $ dials.python
Python 2.7.15 (default, Oct 10 2018, 09:01:12) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dials.array_family import flex
>>> r = flex.reflection_table.from_file("indexed.refl")
>>> help(r)

>>> r.experiment_identifiers()
<dials_array_family_flex_ext.experiment_id_map object at 0x1101837d0>
>>> for x in _:
...   print x
... 
(0, '63aaf39c-4245-4b3f-817d-9c3f06a167ce')

computer says yes

Contributor

@graeme-winter graeme-winter left a comment

With the recent changes this now behaves like I would expect (without challenging by e.g. switching order of input files around)

Unexpectedly it also works when I switch around the order of input files:

input {
  experiments = ../one/imported.expt
  experiments = ../two/imported.expt
  reflections = ../two/strong.refl
  reflections = ../one/strong.refl
}

and gives numerically identical output:

Grey-Area shuffle :) $ diff dials.index.log  ../dials.index.log 
4a5,7
> output {
>   split_experiments = True
> }
6,9c9,12
<   experiments = ../one/imported.expt
<   experiments = ../two/imported.expt
<   reflections = ../two/strong.refl
<   reflections = ../one/strong.refl
---
>   experiments = one/imported.expt
>   experiments = two/imported.expt
>   reflections = one/strong.refl
>   reflections = two/strong.refl
720a724
> Splitting experiments before output

So to me this delivers on the original promise of using unique identifiers.

I would also like to see, at the least, technical and API-user feedback from @ndevenish and @dagewa. As an end user this does what I want and makes sense => 👍

@graeme-winter
Contributor

... that said

dials.index (two scans input) [split_experiments=true]

then

dials.split_experiments indexed.*

Only gives you one set of indexed reflections in split_0.* and no unindexed reflections, and no reflections at all and a boom in split_1.* - when looking at these with the reciprocal lattice viewer. Looking at indexed.* behaves exactly as you would expect.

So I think split_experiments needs some more attention...

@graeme-winter
Contributor

On this last one... I get the same behaviour with master so perhaps a red herring. Or a little red 🐛

@graeme-winter
Contributor

Putative fix for the last one in #906

reflections = flatten_reflections(params.input.reflections)
experiments = flatten_experiments(params.input.experiments)
reflections, experiments = reflections_and_experiments_from_files(
params.input.reflections, params.input.experiments
Member

Wondering about this pattern.
Is there any benefit in exposing params.input.reflections/params.input.experiments to the application at all, or could/should this be integrated into params.input/the parse_args call?

Contributor

This is a good suggestion but more wide-ranging - possibly a follow up pull request?

@dagewa
Member

dagewa commented Sep 5, 2019

If new_refl = refl.select(refl["id"] == 0) is now unsafe, would it be reasonable to decorate the select method for reflection tables so that this now calls clean_experiment_identifiers_map() automatically afterwards?
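One way the suggestion could look is sketched below. It is a hypothetical illustration, not a patch against dials: `Table` is a stand-in for the reflection table, and `cleans_identifier_map` is an invented decorator name. The idea is simply to wrap `select` so that the returned table has its identifier map pruned before the caller sees it.

```python
import functools


def cleans_identifier_map(method):
    """Wrap a selecting method so the result's identifier map is cleaned."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        result.clean_experiment_identifiers_map()
        return result

    return wrapper


class Table:
    """Hypothetical stand-in: an id column plus an id -> identifier map."""

    def __init__(self, ids, identifiers):
        self.ids = list(ids)
        self.identifiers = dict(identifiers)

    def clean_experiment_identifiers_map(self):
        # Keep only identifiers whose numerical id is still present.
        present = set(self.ids)
        self.identifiers = {
            k: v for k, v in self.identifiers.items() if k in present
        }

    @cleans_identifier_map
    def select(self, mask):
        kept = [i for i, keep in zip(self.ids, mask) if keep]
        return Table(kept, self.identifiers)


refl = Table([0, 0, 1], {0: "uuid-a", 1: "uuid-b"})
new_refl = refl.select([i == 0 for i in refl.ids])
```

With the wrapper in place, existing call sites of the form refl.select(refl["id"] == 0) would not need to change, at the cost of an extra pass over the id column on every selection.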

@dagewa
Member

dagewa commented Sep 5, 2019

dials.combine_experiments overwrites reflection id according to input order

sub_ref["id"] = flex.int(len(sub_ref), global_id)

but does nothing with experiment_identifiers. Is this a problem?
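A sketch of what keeping the map in step with the overwritten ids might involve, using plain dicts instead of reflection tables. The `combine` function is hypothetical and assumes one dataset per input table; it mirrors the global_id assignment quoted above and carries each identifier across to its new numerical id.

```python
def combine(tables):
    """Concatenate single-dataset tables, renumbering ids by input order.

    Each identifier is re-keyed under the new global id so that the map
    stays consistent with the overwritten id column.
    """
    combined_ids = []
    combined_map = {}
    for global_id, table in enumerate(tables):
        (identifier,) = table["identifiers"].values()  # one dataset per table
        combined_ids.extend([global_id] * len(table["id"]))
        combined_map[global_id] = identifier
    return {"id": combined_ids, "identifiers": combined_map}


tables = [
    {"id": [0, 0], "identifiers": {0: "uuid-a"}},
    {"id": [0, 0, 0], "identifiers": {0: "uuid-b"}},
]
combined = combine(tables)
```

Without the re-keying step, the second table's reflections would end up with id 1 while its map still recorded "uuid-b" under id 0.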

@graeme-winter
Copy link
Contributor

@dagewa probably yes... I think split and combine both give interesting and perhaps unexpected behaviour

See also #907 - I think there is more we could be doing here to keep things consistent (and outlawing experiment id -1)

@dagewa
Member

dagewa commented Sep 5, 2019

@graeme-winter I see. I think we need to figure out this interesting/unexpected behaviour and exercise it in some new tests.

I wonder if some of these problems go away if reflection tables always kept their experiment_identifiers maps updated automatically. So, for example, as above calling clean_experiment_identifiers_map on select, but also updating this map when a reflection table id column is touched.

I'd like it if we could work with reflection tables the way we always have done with the additional bookkeeping all being done behind the scenes. I'm not worried about the replacement of flatten_reflections etc., but I am concerned about standard flex array operations leading to inconsistencies or subtle bugs.

@codecov-io

Codecov Report

Merging #902 into master will decrease coverage by 0.05%.
The diff coverage is 79.9%.

@@            Coverage Diff             @@
##           master     #902      +/-   ##
==========================================
- Coverage   67.33%   67.28%   -0.06%     
==========================================
  Files         608      605       -3     
  Lines       69294    69166     -128     
  Branches     8465     8454      -11     
==========================================
- Hits        46659    46537     -122     
+ Misses      20798    20793       -5     
+ Partials     1837     1836       -1
Impacted Files Coverage Δ
test/util/test_nexus.py 94.25% <ø> (ø) ⬆️
command_line/model_background.py 0% <ø> (ø) ⬆️
util/masking/__init__.py 85.27% <ø> (ø) ⬆️
libtbx_refresh.py 0% <ø> (ø) ⬆️
command_line/integrate.py 61.91% <ø> (ø) ⬆️
command_line/find_spots.py 76.47% <ø> (ø) ⬆️
command_line/find_hot_pixels.py 80.7% <ø> (ø) ⬆️
command_line/dials_import.py 64.03% <ø> (ø) ⬆️
command_line/import_stream.py 0% <ø> (ø) ⬆️
test/command_line/test_generate_mask.py 100% <ø> (ø) ⬆️
... and 61 more

Powered by Codecov. Last update d91a010...461c39c. Read the comment docs.

@jbeilstenedmands
Contributor Author

@dagewa yes I think we should decorate the reflection table select method to discreetly update the identifiers, good idea

5 participants