Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dials.export: Add option to specify composition of asymmetric unit so a complete SHELX instruction file can be written. #2623

Merged
merged 8 commits into from Apr 16, 2024
1 change: 1 addition & 0 deletions newsfragments/2623.feature
@@ -0,0 +1 @@
``dials.export``: Add option to specify composition of asymmetric unit for SHELX ``.ins`` file output.
13 changes: 13 additions & 0 deletions src/dials/command_line/export.py
Expand Up @@ -45,6 +45,11 @@
XDS format exports a models.expt file as XDS.INP and XPARM.XDS files. If a
reflection file is given it will be exported as a SPOT.XDS file.

SHELX format exports intensity data in HKLF 4 format for use in the SHELX suite
of programs. As this file format does not contain unit cell parameters or
symmetry information a minimal instruction file is also written. Optionally the
expected contents of the asymmetric unit can be added to this instruciton file.

PETS format exports intensity data and diffraction data in the CIF format
used by PETS. This is primarily intended to produce files suitable for
dynamic diffraction refinement using Jana2020, which requires this format.
Expand Down Expand Up @@ -72,6 +77,11 @@
dials.export indexed.refl format=xds
dials.export models.expt format=xds
dials.export models.expt indexed.refl format=xds

# Export to shelx
dials.export scaled.expt scaled.refl format=shelx
dials.export scaled.expt scaled.refl format=shelx shelx.hklout=dials.hkl
dials.export scaled.expt scaled.refl format=shelx composition=C3H7NO2S
"""

phil_scope = parse(
Expand Down Expand Up @@ -249,6 +259,9 @@
ins = dials.ins
.type = path
.help = "The output ins file"
composition = CH
.type = str
.help = "The chemical composition of the asymmetric unit"
scale = True
.type = bool
.help = "Scale reflections to maximise output precision in SHELX 8.2f format"
Expand Down
38 changes: 37 additions & 1 deletion src/dials/util/export_shelx.py
@@ -1,9 +1,11 @@
from __future__ import annotations

import logging
import re
from math import isclose

from cctbx import crystal, miller
from cctbx.eltbx import chemical_elements
from iotbx.shelx.write_ins import LATT_SYMM

from dials.algorithms.scaling.scaling_library import determine_best_unit_cell
Expand Down Expand Up @@ -96,12 +98,13 @@ def export_shelx(scaled_data, experiment_list, params):
_write_ins(
experiment_list,
best_unit_cell=params.mtz.best_unit_cell,
composition=params.shelx.composition,
ins_file=params.shelx.ins,
)
logger.info(f"Written {params.shelx.ins}")


def _write_ins(experiment_list, best_unit_cell, ins_file):
def _write_ins(experiment_list, best_unit_cell, composition, ins_file):
sg = experiment_list[0].crystal.get_space_group()
unit_cells = []
wavelengths = []
Expand Down Expand Up @@ -167,3 +170,36 @@ def _write_ins(experiment_list, best_unit_cell, ins_file):
)
)
LATT_SYMM(f, sg)
logger.info(
f"Using composition {composition} to write SFAC & UNIT instructions"
)
sorted_composition = _parse_compound(composition)
f.write("SFAC %s\n" % " ".join(sorted_composition))
f.write(
"UNIT %s\n"
% " ".join(
[str(sorted_composition[e] * sg.order_z()) for e in sorted_composition]
)
)
f.write("HKLF 4\n") # Intensities
f.write("END\n")


def _parse_compound(composition):
elements = chemical_elements.proper_caps_list()[:94]
m = re.findall(r"([A-Z][a-z]?)(\d*)\s*", composition)
dagewa marked this conversation as resolved.
Show resolved Hide resolved
if all(e[1] == "" for e in m):
# Assume user has just supplied list of elements so UNIT instruction cannot be calculated
result = {element: 0 for element, count in m}
else:
result = {element: int(count or 1) for element, count in m}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought occurs to me. Does SHELX input support non-stoichiometric formulae? Should int be float?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it is not an error for shelxl, but it would be unconventional. I haven't found any examples of non-integer values after UNIT online. Wouldn't this just be represented by the occupancy column for the atom?

As for shelxt, it ignores what's in UNIT entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNIT is element count in unit cell not asu. I was multiplying the formula entered by Z. Quite common to have fractions of the formula in the asu but fractions of an atom in the unit cell ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep this as int. Convenient to have the composition multiplied by Z 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I now understand @benjaminhwilliams's comment. Take an example like CCDC 1880252. In that case I think with the current code you'd want to write 'C6 H5 B0.25 F P0.25' if you scaled the data in I-4. So maybe we should have float here and convert to int when writing the UNIT card? Or allow users to set Z (as in _cell_formula_units_Z) to override space_group().order_z()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, gotcha. Allow float input and convert to int makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking some more about this probably the best approach is to not allow non-integer values but add a parameter shelx.zprime so for the example above the user would enter 'C24 H20 B F4 P' and zprime=0.25 and get Z = 2.

In that case should shelx.composition be renamed shelx.formula? or should we have both - where shelx.composition is just element types and shelx.formula is always interpreted as a formula even when there no numbers.

# Check for elements without scattering factors in SHELXL
unmatched = list(set(result).difference(elements))
if unmatched:
raise Sorry(f"Element(s) {' '.join(unmatched)} in {composition} not recognised")

if "C" in result:
# move C to start of list
elements.insert(0, elements.pop(5))
sorted_result = {e: result[e] for e in elements if e in result}
return sorted_result
33 changes: 33 additions & 0 deletions tests/command_line/test_export.py
Expand Up @@ -584,6 +584,39 @@ def test_shelx_ins_best_unit_cell(dials_data, tmp_path):
assert result == pytest.approx(cell_esds[instruction], abs=0.001)


def test_shelx_ins_composition(dials_data, tmp_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding a test which actually executes shelxt if available to verify?

# Call dials.export
result = subprocess.run(
[
shutil.which("dials.export"),
"intensity=scale",
"format=shelx",
"composition=C3H7NO2S",
dials_data("l_cysteine_4_sweeps_scaled", pathlib=True)
/ "scaled_20_25.expt",
dials_data("l_cysteine_4_sweeps_scaled", pathlib=True)
/ "scaled_20_25.refl",
],
cwd=tmp_path,
capture_output=True,
)
assert not result.returncode and not result.stderr
assert (tmp_path / "dials.ins").is_file()

sfac_unit = {
"SFAC": "C H N O S",
"UNIT": "12 28 4 8 4",
}

with (tmp_path / "dials.ins").open() as fh:
for line in fh:
tokens = line.split()
instruction = tokens[0]
if instruction in sfac_unit:
result = " ".join(tokens[1:6])
assert result == sfac_unit[instruction]


def test_export_sum_or_profile_only(dials_data, tmp_path):
expt = dials_data("insulin_processed", pathlib=True) / "integrated.expt"
refl = dials_data("insulin_processed", pathlib=True) / "integrated.refl"
Expand Down