## Part 1: Indexing and Geometry Refinement

The first step in data processing is to determine the diffraction geometry and crystal orientation by analyzing the locations of strong Bragg peaks. These steps are performed using [DIALS](https://dials.github.io), a suite of programs for integrating and scaling Bragg data. For our purposes, we will use only the spot-finding, indexing, and geometry refinement tools. Normally these steps would be followed by integration and scaling, but we will skip those to save time. For a more complete *DIALS* tutorial, see [Processing in Detail](https://dials.github.io/documentation/tutorials/processing_in_detail_betalactamase.html).

Typically, *DIALS* programs are run from the command line. Here we run *DIALS* from within the notebook, using an exclamation point to execute each command in a shell. To run a command, click in the cell and press Shift-Enter.

### Metadata import

The image file headers contain metadata specifying the experimental geometry, such as the X-ray wavelength, detector distance, and oscillation increment per image. The command `dials.import` reads the image headers and outputs a JSON-formatted text file, `imported.expt`, containing the relevant metadata. Run the following command to import the insulin dataset:

In [1]:
!dials.import images/insulin_2_1

DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.11
The following parameters have been modified:

input {
  experiments = <image files>
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatCBFMiniPilatusCHESS_6MSN127.FormatCBFMiniPilatusCHESS_6MSN127'>
  num images: 500
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------
Writing experiments to imported.expt

The imported geometry can be examined in Jupyter Lab by double-clicking *imported.expt* in the file browser.

`dials.show` can also be used to print the parameters, as follows:

In [2]:
!dials.show imported.expt

DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
The following parameters have been modified:

input {
  experiments = imported.expt
}

Experiment 0:
Experiment identifier: 2ec933bd-b255-a677-6cda-57c5eb4d2ddd
Image template: /nfs/chess/user/spm82/mdx/meth-enzymol-tutorial/images/insulin_2_1/insulin_2_1_####.cbf.bz2
Detector:
Panel:
  name: Panel
  type: SENSOR_PAD
  identifier: 
  pixel_size:{0.172,0.172}
  image_size: {2463,2527}
  trusted_range: {-1,1.0098e+06}
  thickness: 1
  material: Si
  mu: 3.92851
  gain: 1
  pedestal: 0
  fast_axis: {1,0,0}
  slow_axis: {0,-1,0}
  origin: {-217.491,213.713,-260.05}
  distance: 260.05
  pixel to millimeter strategy: ParallaxCorrectedPxMmStrategy
    mu: 3.92851
    t0: 1


Max resolution (at corners): 1.155825
Max resolution (inscribed):  1.485702

Beam:
    wavelength: 0.9768
    sample to source direction : {0,0,1}
    divergence: 0
    sigma divergence: 0
    polarization normal: {0,1,0}
    polarization fract
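
The two "Max resolution" values can be roughly cross-checked from the printed geometry using Bragg's law. The following is a minimal sketch, not the calculation *DIALS* itself performs: it assumes a flat detector normal to the beam and uses only the values shown in the output above.

```python
import math

# Values copied from the dials.show output above.
wavelength = 0.9768                      # Angstrom
pixel_size = 0.172                       # mm
image_size = (2463, 2527)                # pixels (fast, slow)
origin = (-217.491, 213.713, -260.05)    # mm, lab frame
distance = 260.05                        # mm

# With fast_axis = {1,0,0} and slow_axis = {0,-1,0}, the direct beam (along z)
# crosses the panel at these distances from the origin corner.
beam_fast = -origin[0]   # mm along the fast axis
beam_slow = origin[1]    # mm along the slow axis

width = image_size[0] * pixel_size
height = image_size[1] * pixel_size

# Distances from the beam center to the panel edges along each axis.
edges_fast = (beam_fast, width - beam_fast)
edges_slow = (beam_slow, height - beam_slow)


def d_spacing(radius_mm):
    """Bragg's law, d = lambda / (2 sin(theta)), for a spot at a given radius."""
    two_theta = math.atan2(radius_mm, distance)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


# Largest circle around the beam center that fits entirely on the panel.
r_inscribed = min(edges_fast + edges_slow)
# Distance from the beam center to the farthest corner of the panel.
r_corner = math.hypot(max(edges_fast), max(edges_slow))

d_inscribed = d_spacing(r_inscribed)
d_corner = d_spacing(r_corner)
print(f"inscribed: {d_inscribed:.3f} A, corner: {d_corner:.3f} A")
```

The results should come out close to the values reported by `dials.show`. Small differences may arise from details this sketch ignores, such as the parallax correction.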

### Beamstop mask

A beamstop mask must also be generated and added to `imported.expt`. The mask is not essential for Bragg data processing, but it is important to create one for later import into *mdx2*. The mask can be drawn using the graphical image viewer that comes with *DIALS*:

```bash
dials.image_viewer imported.expt
```

Alternatively, the following will generate an appropriate circular mask for the insulin dataset using `dials.generate_mask` and `dials.apply_mask`.

In [3]:
!dials.generate_mask imported.expt untrusted.circle=1264,1242,50
!dials.apply_mask imported.expt input.mask=pixels.mask output.experiments=imported.expt

The following parameters have been modified:

untrusted {
  circle = 1264 1242 50
}
input {
  experiments = imported.expt
}

DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
Writing mask to pixels.mask
The following parameters have been modified:

input {
  mask = "pixels.mask"
  experiments = imported.expt
}
output {
  experiments = "imported.expt"
}

Writing experiments to imported.expt
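
As a quick sanity check on the mask, the circle center can be compared with the beam center implied by the detector origin. This is a minimal sketch using the values printed by `dials.show`; it assumes the three numbers passed to `untrusted.circle` are the x center, y center, and radius in pixels.

```python
# Values copied from the dials.show output and the dials.generate_mask command.
pixel_size = 0.172                 # mm
origin = (-217.491, 213.713)       # mm, x and y of the panel origin
circle = (1264, 1242, 50)          # x center, y center, radius (pixels, assumed)

# With fast_axis = {1,0,0} and slow_axis = {0,-1,0}, the beam center in pixels is:
beam_x = -origin[0] / pixel_size
beam_y = origin[1] / pixel_size

# Distance between the beam center and the mask center, in pixels.
offset = ((beam_x - circle[0]) ** 2 + (beam_y - circle[1]) ** 2) ** 0.5
radius_mm = circle[2] * pixel_size

print(f"beam center: ({beam_x:.1f}, {beam_y:.1f}) px")
print(f"offset from mask center: {offset:.2f} px; mask radius: {radius_mm:.1f} mm")
```

The mask center lies within one pixel of the beam center, so the circle is centered on the direct beam position.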

### Spotfinding

Next, `dials.find_spots` locates Bragg peaks. This requires reading all of the diffraction images, so it can take a while depending on computational resources.

In [4]:
!dials.find_spots imported.expt

DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.11
The following parameters have been modified:

input {
  experiments = imported.expt
}

Setting spotfinder.filter.min_spot_size=3
Configuring spot finder from input parameters
--------------------------------------------------------------------------------
Finding strong spots in imageset 0
--------------------------------------------------------------------------------

Finding spots in image 1 to 500...
Setting nproc=1
Setting chunksize=20
Extracting strong pixels from images
 Using multiprocessing with 1 parallel job(s)

Found 1600 strong pixels on image 1
Found 1569 strong pixels on image 2
Found 1304 strong pixels on image 3
Found 1484 strong pixels on image 4
Found 1568 strong pixels on image 5
Found 1653 strong pixels on image 6
Found 1766 strong pixels on image 7
Found 1719 strong pixels on image 8
Found 1966 strong pixels on image 9
Found 1949 strong pixels on image 10
Found 1729 strong pi
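
The per-image counts in this output can be tabulated to look for abrupt changes over the scan. The sketch below parses a few lines copied from the output above; the 50% threshold is arbitrary and chosen only for illustration. For a full run, the same function could be applied to the text of the spot-finding log (assumed here to be saved as `dials.find_spots.log`).

```python
import re

# A few lines copied from the dials.find_spots output above.
log_text = """\
Found 1600 strong pixels on image 1
Found 1569 strong pixels on image 2
Found 1304 strong pixels on image 3
Found 1484 strong pixels on image 4
Found 1568 strong pixels on image 5
"""

pattern = re.compile(r"Found (\d+) strong pixels on image (\d+)")


def strong_pixel_counts(text):
    """Return {image_number: strong_pixel_count} parsed from the log text."""
    return {int(img): int(n) for n, img in pattern.findall(text)}


counts = strong_pixel_counts(log_text)
mean = sum(counts.values()) / len(counts)

# Flag images whose count differs from the mean by more than 50%
# (an arbitrary threshold, for illustration only).
outliers = [img for img, n in counts.items() if abs(n - mean) > 0.5 * mean]
print(f"mean strong pixels per image: {mean:.0f}; outliers: {outliers}")
```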

### Indexing

The indexing step assigns Miller indices to each diffraction spot. When processing an unknown crystal, it is usually necessary to re-index the data after space group determination. Here, the space group is known (the insulin crystal is I 2<sub>1</sub>3, space group number 199), and we can avoid having to re-index later by passing this information to `dials.index`.

In [5]:
!dials.index imported.expt strong.refl space_group=199

DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.11
The following parameters have been modified:

indexing {
  known_symmetry {
    space_group = "I 21 3"
  }
}
input {
  experiments = imported.expt
  reflections = strong.refl
}

Found max_cell: 74.2 Angstrom
Setting d_min: 1.45
FFT gridding: (256,256,256)
Number of centroids used: 50261
Candidate solutions:
+-------------------------------------+----------+----------------+------------+-------------+-------------------+-----------+-----------------+-----------------+
| unit_cell                           |   volume |   volume score |   #indexed |   % indexed |   % indexed score |   rmsd_xy |   rmsd_xy score |   overall score |
|-------------------------------------+----------+----------------+------------+-------------+-------------------+-----------+-----------------+-----------------|
| 68.61 68.61 68.61 109.5 109.5 109.5 |   248622 |           0.02 |      48839 |          97 |              0.04
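
The candidate cell in this table has angles of 109.5° rather than the 90° expected for a cubic crystal. This is consistent with the table listing the primitive cell of the body-centered (I) lattice. The following sketch, based on standard lattice geometry rather than on anything specific to *DIALS*, converts the primitive cell edge to the conventional cubic cell and checks the volume against the table.

```python
import math

a_prim = 68.61  # Angstrom, primitive cell edge from the candidate solution table

# For a body-centered cubic lattice, the primitive cell is rhombohedral
# with angle arccos(-1/3) and edge a * sqrt(3) / 2.
alpha = math.degrees(math.acos(-1.0 / 3.0))
a_cubic = 2.0 * a_prim / math.sqrt(3.0)

# The conventional I-centered cell contains two lattice points,
# so the primitive cell has half its volume.
v_cubic = a_cubic ** 3
v_prim = v_cubic / 2.0

print(f"alpha = {alpha:.1f} deg, a_cubic = {a_cubic:.2f} A, v_prim = {v_prim:.0f} A^3")
```

The primitive volume computed this way should agree with the `volume` column of the table to within rounding of the cell edge.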

For diffuse scattering, it's important to verify that all (or nearly all) peaks are indexed. Unindexed peaks can indicate split Bragg reflections, multiple lattices, or salt crystals contaminating the signal, or that the crystal slipped during data collection.

Check that indexing was successful by examining the *dials.index* output.

Find the table with "% indexed".

```
+------------+-------------+---------------+-------------+
|   Imageset |   # indexed |   # unindexed | % indexed   |
|------------+-------------+---------------+-------------|
|          0 |       50694 |           417 | 99.2%       |
+------------+-------------+---------------+-------------+
```

The fraction of indexed peaks should be close to 100%. A low fraction may suggest incorrect experimental geometry, multiple lattices, twinning, salt or ice diffraction, or other issues. The indexing results should be inspected by running `dials.image_viewer indexed.expt`. In our case, 99.2% of strong peaks were indexed (50694 indexed and 417 unindexed).
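
The percentage in the table follows directly from the two counts. As a small worked example, the row can be parsed and the fraction recomputed:

```python
# Row copied from the "% indexed" table above.
row = "|          0 |       50694 |           417 | 99.2%       |"

# Drop the outer "|" characters, then split the row into cells.
cells = [cell.strip() for cell in row.strip("|").split("|")]
imageset, n_indexed, n_unindexed = (int(c) for c in cells[:3])

n_strong = n_indexed + n_unindexed
percent_indexed = 100.0 * n_indexed / n_strong
print(f"{n_strong} strong spots, {percent_indexed:.1f}% indexed")
```

The total of indexed and unindexed spots is the number of strong spots found by `dials.find_spots`.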

Find the table called "RMSDs by experiment".

```
RMSDs by experiment:
+-------+--------+----------+----------+------------+
|   Exp |   Nref |   RMSD_X |   RMSD_Y |     RMSD_Z |
|    id |        |     (px) |     (px) |   (images) |
|-------+--------+----------+----------+------------|
|     0 |   4999 |  0.21024 |   0.2213 |    0.20059 |
+-------+--------+----------+----------+------------+
```

This table reports the root-mean-square deviation (RMSD) between the predicted and observed spot centroids. RMSDs below ~1 (in pixels for X and Y, and in images for Z) indicate that the indexing solution was successful and that a single set of geometric parameters can describe the entire dataset accurately. However, if the crystal slips slightly during data collection or the lattice constants change, the spot predictions can be improved by fitting a scan-varying geometric model.

### Geometry refinement

Next we use `dials.refine` to fit a scan-varying model of the crystal geometry. This can be important if the lattice constants change due to radiation damage, or if the crystal is not held rigidly in the loop and slips during data collection.

In [6]:
!dials.refine indexed.expt indexed.refl

DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.11
The following parameters have been modified:

input {
  experiments = indexed.expt
  reflections = indexed.refl
}

Configuring refiner
Setting outlier.nproc=1

Summary statistics for 50635 observations matched to predictions:
+-------------------+---------+----------+-----------+---------+--------+
|                   |     Min |       Q1 |       Med |      Q3 |    Max |
|-------------------+---------+----------+-----------+---------+--------|
| Xc - Xo (mm)      | -0.3512 | -0.02151 |  0.002323 | 0.02393 | 0.4996 |
| Yc - Yo (mm)      | -0.3488 | -0.02623 |  0.001705 | 0.02626 | 0.3831 |
| Phic - Phio (deg) |  -0.958 | -0.01668 | -0.001109 |  0.0151 |  1.247 |
| X weights         |   251.7 |    386.5 |     401.2 |   404.9 |  405.6 |
| Y weights         |   239.2 |    372.6 |     395.7 |   403.8 |  405.6 |
| Phi weights       |   871.2 |     1199 |      1200 |    1200 |   1200 |
+-----------------

To see if the spot predictions improved, find the table "RMSDs by experiment" in the text output:

```
RMSDs by experiment:
+-------+--------+----------+----------+------------+
|   Exp |   Nref |   RMSD_X |   RMSD_Y |     RMSD_Z |
|    id |        |     (px) |     (px) |   (images) |
|-------+--------+----------+----------+------------|
|     0 |  45511 |  0.17561 |  0.18017 |    0.19491 |
+-------+--------+----------+----------+------------+
```

Compared with the corresponding table from `dials.index`, the RMSDs have improved, as expected. However, the improvement is modest: the indexing solution already had small RMSDs, so refinement was not strictly necessary in our case.
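
To put a number on the change, the two "RMSDs by experiment" tables can be compared directly. This is a simple illustration using the values printed above. Note that the tables are computed over different numbers of reflections (4999 and 45511), so the comparison is only approximate.

```python
# RMSDs copied from the dials.index and dials.refine tables above.
rmsd_index = {"X (px)": 0.21024, "Y (px)": 0.2213, "Z (images)": 0.20059}
rmsd_refine = {"X (px)": 0.17561, "Y (px)": 0.18017, "Z (images)": 0.19491}

# Relative reduction in each RMSD, as a percentage of the dials.index value.
improvement = {
    key: 100.0 * (rmsd_index[key] - rmsd_refine[key]) / rmsd_index[key]
    for key in rmsd_index
}

for key, pct in improvement.items():
    print(f"RMSD_{key}: {rmsd_index[key]:.3f} -> {rmsd_refine[key]:.3f} ({pct:.1f}% lower)")
```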

### Diagnostic statistics and plots

The `dials.report` command generates an HTML file with interactive plots. The report should be examined closely to catch any issues that could complicate analysis of the diffuse scattering.

In [7]:
!dials.report refined.expt refined.refl

DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
The following parameters have been modified:

input {
  experiments = refined.expt
  reflections = refined.refl
}

Analysing strong spots
 Selected 51111 strong reflections.........................................0.01s
Analysing reflection centroids
 Selected 45511 refined reflections........................................0.01s
 Analysing centroid differences with I/Sigma > 0
 Analysing centroid differences in x/y with I/Sigma > 0
 Analysing centroid differences in z with I/Sigma > 0
 Analysing centroid differences vs phi with I/Sigma > 0
Analysing reflection intensities
 Selecting only integrated reflections...Analysing reference profiles
 Skipping: following required fields not present:
  intensity.prf.value
  intensity.prf.variance
  profile.correlation
Analysing scan-varying crystal model
Writing html report to: dials.report.html

Open the file `dials.report.html` in a web browser. The file can also be opened by double-clicking it in Jupyter Lab, although certain elements may not render correctly; if you are unable to expand the menus, click "Trust HTML" in the upper left corner of the window.

Expand the tab called "Analysis of scan varying model". These plots should be checked to see whether the unit cell axes change significantly during data collection. Abrupt changes in the orientation parameters may indicate that the crystal slipped during data collection.

Expand the tab called "Analysis of reflection centroids". Ideally, the errors in the X and Y positions should appear random across the detector face. In our dataset, the errors cluster in rectangular blocks corresponding to the detector modules. If sub-pixel accuracy is desired, the module alignments can be calibrated. Next, examine the plots of spot centroid errors. The spread of centroids around the mean should show a single compact peak. If multiple peaks are observed, the crystal may be twinned or multiple lattices may be present (e.g. due to cracking).

### Background image metadata

An additional dataset was acquired for background subtraction in *mdx2*. We'll use `dials.import` and `dials.apply_mask` to create a metadata file `background.expt` that will be used in Part 4 of this tutorial.

In [8]:
!dials.import images/insulin_2_bkg output.experiments=background.expt
!dials.apply_mask background.expt input.mask=pixels.mask output.experiments=background.expt

DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.11
The following parameters have been modified:

output {
  experiments = "background.expt"
}
input {
  experiments = <image files>
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatCBFMiniPilatusCHESS_6MSN127.FormatCBFMiniPilatusCHESS_6MSN127'>
  num images: 50
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------
Writing experiments to background.expt
The following parameters have been modified:

input {
  mask = "pixels.mask"
  experiments = background.expt
}
output {
  experiments = "background.expt"
}

Writing experiments to background.expt
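
Outside of a notebook, the processing commands from this part could be collected into a short script. The following is a sketch only: the command lines are copied from the cells above, while `COMMANDS` and `run_all` are hypothetical names introduced here for illustration, and the script assumes the *DIALS* programs are on the `PATH`.

```python
import shutil
import subprocess

# Commands copied from the cells in this part, in the order they were run.
COMMANDS = [
    ["dials.import", "images/insulin_2_1"],
    ["dials.generate_mask", "imported.expt", "untrusted.circle=1264,1242,50"],
    ["dials.apply_mask", "imported.expt", "input.mask=pixels.mask",
     "output.experiments=imported.expt"],
    ["dials.find_spots", "imported.expt"],
    ["dials.index", "imported.expt", "strong.refl", "space_group=199"],
    ["dials.refine", "indexed.expt", "indexed.refl"],
    ["dials.report", "refined.expt", "refined.refl"],
    ["dials.import", "images/insulin_2_bkg", "output.experiments=background.expt"],
    ["dials.apply_mask", "background.expt", "input.mask=pixels.mask",
     "output.experiments=background.expt"],
]


def run_all(commands=COMMANDS):
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        if shutil.which(argv[0]) is None:
            raise RuntimeError(f"{argv[0]} not found on PATH")
        print("$", " ".join(argv))
        # check=True raises CalledProcessError if the command exits with an error.
        subprocess.run(argv, check=True)
```

Calling `run_all()` from the tutorial directory would then repeat the steps above in order.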