# Overview of the CrystFEL-based processing

This notebook implements a processing workflow that combines CrystFEL tools with custom Python scripts. The steps are:

1. **Run Indexamajig** (`gandalf_iterator`)  
   - Perform peak finding, indexing, and integration for each HDF5 file in the specified folder, varying the beam center coordinates on a grid within a given radius.

2. **Evaluate IQM** (`automate_evaluation`)  
   - Parse stream files for indexing quality metrics (IQMs), apply weights, and identify the best results.

3. **Merge** (`merge`)  
   - Merge the best result stream file to refine cell parameters and symmetry.

4. **Convert**  
   - **to `.hkl`** for SHELX (`convert_hkl_crystfel_to_shelx`).  
   - **to `.mtz`** for downstream crystallographic tools (`convert_hkl_to_mtz`).

Before proceeding, make sure that preprocessing (peak finding, center finding, and center refinement) has been completed and that the required environment is set up (CrystFEL, Python packages, etc.).
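
As a quick pre-flight check, the sketch below tests whether the CrystFEL executables and the helper modules imported later in this notebook can be found. It is only an illustration: the lists are assumptions based on the imports and log output in this notebook (`partialator` appears in the merge output), and `check_environment` is a hypothetical helper, not part of the workflow scripts.

```python
import importlib.util
import shutil

# Assumed requirements, taken from the imports and logs in this notebook.
REQUIRED_EXECUTABLES = ["indexamajig", "partialator"]
REQUIRED_MODULES = [
    "gandalf_radial_iterator",
    "automate_evaluation",
    "merge",
    "convert_hkl_crystfel_to_shelx",
    "convert_hkl_to_mtz",
]


def check_environment(executables=REQUIRED_EXECUTABLES, modules=REQUIRED_MODULES):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {}
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        report[f"executable:{exe}"] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[f"module:{mod}"] = importlib.util.find_spec(mod) is not None
    return report


for name, found in check_environment().items():
    print(f"{'OK     ' if found else 'MISSING'} {name}")
```

Anything reported as `MISSING` should be installed or added to `PATH` / `PYTHONPATH` before running the cells below.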


# Run Indexamajig with options for peakfinding, indexing and integration

In [None]:
from gandalf_radial_iterator import gandalf_iterator

geomfile_path = "/Users/xiaodong/Desktop/simulations/LTA/LTAsim.geom"       # .geom file
cellfile_path = "/Users/xiaodong/Desktop/simulations/LTA/LTA.cell"          # .cell file

input_path = "/Users/xiaodong/Desktop/simulations/LTA/simulation-24"        # folder with .h5 files; also used as output folder

output_file_base = "LTA"    # output files will be named <output_file_base>_<xcoord>_<ycoord>.stream

num_threads = 8             # number of CPU threads to use
x, y = 512.5, 512.5         # initial beam center from where iterations will start

"""Define the grid and maximum radius for iterations.
For example, max_radius = 1 with step = 0.2 gives 81 iterations.
Iterations will start at the center and move radially outwards.
"""
max_radius = 1              # maximum radius in pixels
step = 0.2                  # grid granularity in pixels

extra_flags=[
# PEAKFINDING
"--no-revalidate",
"--no-half-pixel-shift",
"--peaks=cxi", 
"--min-peaks=15",
# INDEXING
"--indexing=xgandalf",
"--tolerance=10,10,10,5",
"--no-refine",
"--xgandalf-sampling-pitch=5",
"--xgandalf-grad-desc-iterations=1",
"--xgandalf-tolerance=0.02",
# INTEGRATION
"--integration=rings",
"--int-radius=4,5,9",
"--fix-profile-radius=70000000",
# OUTPUT
"--no-non-hits-in-stream",
]

"""Examples of extra flags (see the CrystFEL documentation: https://www.desy.de/~twhite/crystfel/manual-indexamajig.html):"""

""" Basic options
"--highres=n",
"--no-image-data",
"""

""" Peakfinding
"--peaks=cxi",
"--peak-radius=inner,middle,outer",
"--min-peaks=n",
"--median-filter=n",
"--filter-noise",
"--no-revalidate",
"--no-half-pixel-shift",

"--peaks=peakfinder9",
"--min-snr=1",
"--min-snr-peak-pix=6",
"--min-snr-biggest-pix=1",
"--min-sig=9",
"--min-peak-over-neighbour=5",
"--local-bg-radius=5",

"--peaks=peakfinder8",
"--threshold=45",
"--min-snr=3",
"--min-pix-count=3",
"--max-pix-count=500",
"--local-bg-radius=9",
"--min-res=30",
"--max-res=500",
"""

""" Indexing
"--indexing=xgandalf",

"--tolerance=tol",
"--no-check-cell",
"--no-check-peaks",
"--multi",
"--no-retry",
"--no-refine",

"--xgandalf-sampling-pitch=n",
"--xgandalf-grad-desc-iterations=n",
"--xgandalf-tolerance=n",
"--xgandalf-no-deviation-from-provided-cell",
"--xgandalf-max-lattice-vector-length=n",
"--xgandalf-min-lattice-vector-length=n",
"--xgandalf-max-peaks=n",
"--xgandalf-fast-execution",
"""

""" Integration
"--fix-profile-radius=n",
"--fix-divergence=n",
"--integration=rings",
"--int-radius=4,5,10",
"--push-res=n",
"--overpredict",
"--cell-parameters-only",
"""

""" Output
"--no-non-hits-in-stream",
"--no-peaks-in-stream",
"--no-refls-in-stream",
"--serial-offset",
"""

gandalf_iterator(
    x,
    y,
    geomfile_path,
    cellfile_path,
    input_path,
    output_file_base,
    num_threads,
    max_radius=max_radius,
    step=step,
    extra_flags=extra_flags,
)
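
To see where the figure of 81 iterations for `max_radius = 1`, `step = 0.2` comes from, the grid can be enumerated directly. The sketch below is an illustration only and not the code inside `gandalf_iterator`; `radial_grid` is a hypothetical helper that assumes a square grid clipped to a circle and visited in order of increasing distance from the starting center.

```python
import math


def radial_grid(x0, y0, max_radius, step):
    """List beam centers on a square grid within max_radius, nearest first."""
    ratio = max_radius / step
    n = int(math.floor(ratio + 1e-9))  # grid steps that fit along one axis
    # Work in integer grid indices so the radius test avoids float round-off.
    offsets = [
        (i, j)
        for i in range(-n, n + 1)
        for j in range(-n, n + 1)
        if i * i + j * j <= ratio * ratio + 1e-9
    ]
    # Sort by squared distance so iteration starts at the center.
    offsets.sort(key=lambda ij: ij[0] ** 2 + ij[1] ** 2)
    return [(round(x0 + i * step, 6), round(y0 + j * step, 6)) for i, j in offsets]


centers = radial_grid(512.5, 512.5, max_radius=1, step=0.2)
print(len(centers))   # 81 grid points for these settings
print(centers[0])     # the starting center comes first
```

Because the number of grid points grows roughly with `(max_radius / step) ** 2`, and each point is a full `indexamajig` run, it is worth checking the count before launching a long job.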


# Evaluate the IQM with chosen weights for all frames across all index results

In [2]:
from automate_evaluation import automate_evaluation

# Enter folder with stream file results from indexamajig. 
# Note that ALL stream files in the folder will be processed.
stream_file_folder = "/Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2" 

weights_list = [
    (1, 1, 1, 1, 1, 1)
]

"""
Each weight corresponds to one of the six metrics used in calculating the combined IQM value.
The combined IQM is computed by first normalizing each metric across all stream files, then 
multiplying each normalized metric by its assigned weight, and finally summing the results.
The order (or keys) of the weights must match the following metrics:

- 'weighted_rmsd'
- 'fraction_outliers'
- 'length_deviation'
- 'angle_deviation'
- 'peak_ratio'
- 'percentage_indexed'

Multiple weight combinations can be specified if needed.
"""

automate_evaluation(stream_file_folder, weights_list, indexing_tolerance=2)

Evaluating multiple stream files with weights: (1, 1, 1, 1, 1, 1)


Processing chunks in LTA_-513.3_-512.1.stream: 100%|██████████| 100/100 [00:04<00:00, 20.63chunk/s]
Processing chunks in LTA_-513.3_-513.1.stream: 100%|██████████| 100/100 [00:05<00:00, 19.62chunk/s]
Processing chunks in LTA_-513.1_-511.9.stream: 100%|██████████| 100/100 [00:05<00:00, 17.96chunk/s]
Processing chunks in LTA_-512.9_-512.1.stream: 100%|██████████| 100/100 [00:08<00:00, 11.28chunk/s]
Processing chunks in LTA_-512.9_-513.1.stream: 100%|██████████| 100/100 [00:10<00:00,  9.93chunk/s]
Processing chunks in LTA_-512.7_-512.9.stream: 100%|██████████| 100/100 [00:11<00:00,  8.45chunk/s]
Processing chunks in LTA_-512.3_-512.9.stream: 100%|██████████| 100/100 [00:12<00:00,  8.28chunk/s]
Processing chunks in LTA_-512.7_-512.5.stream: 100%|██████████| 100/100 [00:12<00:00,  7.79chunk/s]
Processing chunks in LTA_-513.3_-512.3.stream: 100%|██████████| 100/100 [00:07<00:00, 12.84chunk/s]
Processing chunks in LTA_-512.1_-511.7.stream: 100%|██████████| 100/100 [00:04<00:00, 21.95chunk/s]


Combined metrics CSV written to /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/metric_values_IQM_1_1_1_1_1_1.csv
Best results stream file written to /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1.stream
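
The weighting scheme described in the cell above (normalize each metric, multiply by its weight, sum) can be sketched as follows. This is not the code inside `automate_evaluation`: min-max scaling is an assumption, since the notebook does not state the normalization type, and the input values are made up purely for illustration.

```python
# Metric order as listed in the evaluation cell.
METRICS = [
    "weighted_rmsd",
    "fraction_outliers",
    "length_deviation",
    "angle_deviation",
    "peak_ratio",
    "percentage_indexed",
]


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_iqm(rows, weights):
    """Weighted sum of normalized metrics, one value per row."""
    columns = {m: min_max_normalize([row[m] for row in rows]) for m in METRICS}
    return [
        sum(w * columns[m][i] for m, w in zip(METRICS, weights))
        for i in range(len(rows))
    ]


# Hypothetical metric values for three indexing results of one frame.
rows = [
    dict(zip(METRICS, (0.5, 0.10, 0.2, 0.1, 0.9, 80.0))),
    dict(zip(METRICS, (0.7, 0.20, 0.4, 0.3, 0.8, 60.0))),
    dict(zip(METRICS, (0.9, 0.30, 0.6, 0.5, 0.7, 40.0))),
]
print(combined_iqm(rows, weights=(1, 1, 1, 1, 1, 1)))
```

Setting a weight to 0 removes that metric from the combined value, which is a simple way to test how sensitive the selection is to a single metric.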


# Merge the best results stream file

In [3]:
from merge import merge

stream_file = "/Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1.stream"
pointgroup = "m-3m"
num_threads = 24
iterations = 5

output_dir = merge(
    stream_file,
    pointgroup=pointgroup,
    num_threads=num_threads,
    iterations=iterations,
)

if output_dir is not None:
    print("Merging done. Results are in:", output_dir)

Running partialator for stream file: /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1.stream


Partialator Progress:   0%|          | 0/7 [00:00<?, ?Residual/s]

Partialator Progress: 100%|██████████| 7/7 [00:01<00:00,  6.78Residual/s]

Partialator completed for stream file: /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1.stream
Merging done. Results are in: /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1_merge_5_iter
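
The conversion cells below need the merge output directory. If the merge cell is not re-run in the same session, the directory can be reconstructed from the stream file path. The naming pattern used here (`<stream name>_merge_<iterations>_iter`) is inferred from the log output above and is an assumption, not a documented guarantee of `merge`; `expected_merge_dir` is a hypothetical helper.

```python
from pathlib import Path


def expected_merge_dir(stream_file, iterations):
    """Rebuild the merge output directory from the pattern seen in the log."""
    stream_path = Path(stream_file)
    # Replace only the file name: drop ".stream", append the merge suffix.
    return stream_path.with_name(f"{stream_path.stem}_merge_{iterations}_iter")


stream_file = (
    "/Users/xiaodong/Desktop/simulations/LTA/simulation-24/"
    "xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1.stream"
)
print(expected_merge_dir(stream_file, iterations=5))
```

Prefer the value returned by `merge` when it is available, and check that the reconstructed directory actually exists before passing it on.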





# Convert to SHELX-compatible .hkl

In [2]:
from convert_hkl_crystfel_to_shelx import convert_hkl_crystfel_to_shelx 
output_dir = "/Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1_merge_5_iter"
convert_hkl_crystfel_to_shelx(output_dir)

[INFO] Converting crystfel.hkl to shelx.hkl in directory: /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1_merge_5_iter
[INFO] Conversion to shelx.hkl completed successfully in: /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1_merge_5_iter/shelx
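
For reference, SHELX reads reflections in the fixed-width HKLF 4 layout: three 4-character integer indices followed by intensity and sigma in 8-character fields with two decimals, terminated by a record of zeros. The sketch below only illustrates that layout. It is not the implementation of `convert_hkl_crystfel_to_shelx`, and the function names and reflection values are hypothetical.

```python
def format_hklf4_line(h, k, l, intensity, sigma):
    """Format one reflection as a fixed-width HKLF 4 record (3I4, 2F8.2)."""
    return f"{h:4d}{k:4d}{l:4d}{intensity:8.2f}{sigma:8.2f}"


def hklf4_lines(reflections):
    """Yield HKLF 4 records for (h, k, l, I, sigma) tuples plus the terminator."""
    for h, k, l, intensity, sigma in reflections:
        yield format_hklf4_line(h, k, l, intensity, sigma)
    # An all-zero record marks the end of the reflection list.
    yield format_hklf4_line(0, 0, 0, 0.0, 0.0)


# Hypothetical reflections, for illustration only.
example = [(1, 2, 3, 100.0, 5.0), (-1, 0, 2, 12.5, 1.25)]
for line in hklf4_lines(example):
    print(line)
```

Intensities larger than 99999.99 do not fit the 8-character field, so a converter has to rescale the data in that case.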


# Convert to mtz

In [3]:
from convert_hkl_to_mtz import convert_hkl_to_mtz
cellfile_path = "/Users/xiaodong/Desktop/simulations/LTA/LTA.cell"  # If already defined above, comment out this line
convert_hkl_to_mtz(output_dir, cellfile_path=cellfile_path)

[INFO] Converting crystfel.hkl to output.mtz in directory: /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1_merge_5_iter
[INFO] Conversion to output.mtz completed successfully in: /Users/xiaodong/Desktop/simulations/LTA/simulation-24/xgandalf_iterations_max_radius_1_step_0.2/merged_IQM_1_1_1_1_1_1_merge_5_iter
