Quick start with experimental data

Kartik Ayyer edited this page Feb 25, 2020 · 10 revisions

Here is a tutorial for analyzing experimental data. We recommend completing at least one simulated reconstruction before working through this tutorial; the quick start with simulated data page walks you through one.

We will first get the data from the CXIDB, generate the configuration file, convert the data, and then run the reconstruction.

Setting things up

As in the simulation case, we recommend creating a reconstruction directory to keep things separate. You can do this easily with the following script from the root directory:

./dragonfly_init -t spi

This should create a folder called spi_0001 and compile the various executables. In the next few sections, we will create the configuration file, replacing the default config.ini.

Data source

We will work with data collected at the AMO end-station of the Linac Coherent Light Source (LCLS). This data was collected as part of a Single Particle Imaging (SPI) Initiative experiment in July/August 2015. The data has been published as Reddy et al., Scientific Data 4, 170079 (2017).

As a first step, you will need to download the single hits from CXIDB. Follow the hdf5 link and download the 13 HDF5 files into a folder. Within each of these files, the dataset photonConverter/pnccdBack/photonCount contains photon-converted data from the 4x4 down-sampled pnCCD detector.
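Once the files are downloaded, you can peek at one of them. Here is a minimal sketch using h5py; the filename below is a hypothetical example, and the snippet only opens the file if it is actually present:

```python
import os

# Dataset path from the CXIDB entry; the filename is a hypothetical example
DATASET = 'photonConverter/pnccdBack/photonCount'
fname = 'amo86615_182_PR772_single.h5'

if os.path.exists(fname):
    import h5py  # only needed once the files have been downloaded
    with h5py.File(fname, 'r') as f:
        frames = f[DATASET]
        # One 2D frame per single-particle hit on the down-sampled pnCCD
        print(frames.shape, frames.dtype)
```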

Experimental geometry

The experimental parameters section of the configuration file is given below:

[parameters]
detd = 586
lambda = 7.75
detsize = 260 257
pixsize = 0.3
stoprad = 40
ewald_rad = 650.
polarization = x

Most of the parameters are self-explanatory. For units and other details, see the configuration file page. The only parameter not seen in the simulation examples is ewald_rad, the radius of curvature of the Ewald sphere in voxels. The size of the 3D grid is determined by the distance (in voxels) to the highest-resolution detector pixel. The value of 650 was chosen to give a reasonable oversampling ratio, and generates a 3D volume of 125x125x125 voxels.
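To see roughly where the 125-voxel grid comes from, here is a back-of-the-envelope sketch. The corner-pixel geometry and the grid-size convention 2*ceil(qmax)+3 are our assumptions for illustration, not something stated on this page:

```python
import math

# Values from the [parameters] section (lengths in mm)
detd = 586.0            # detector distance
pixsize = 0.3           # pixel size
nx, ny = 260, 257       # detsize
ewald_rad = 650.0       # Ewald sphere radius in voxels

# Radial distance of the farthest (corner) pixel from the beam axis
r_max = math.hypot(nx / 2., ny / 2.) * pixsize
theta = math.atan2(r_max, detd)          # maximum scattering angle

# For elastic scattering |q| scales as 2 sin(theta/2); ewald_rad converts to voxels
qmax = ewald_rad * 2. * math.sin(theta / 2.)
size = 2 * int(math.ceil(qmax)) + 3      # assumed grid-size convention

print(round(qmax, 1), size)  # qmax lands just above 60 voxels, giving a 125^3 grid
```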

Detector file

Here is the make_detector section of the configuration file:

[make_detector]
in_mask_file = aux/mask_pnccd_back_260_257.byt
out_detector_file = data/det_pnccd_back.dat

In order to generate the detector, we will use a custom mask file representing the different pixel types. After adding this section to the configuration file, run ./utils/make_detector.py in the reconstruction folder.
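The mask is a raw file with one byte per pixel flagging its type. As a rough, self-contained sketch of how such a file can be written and tallied (the 0 = good / 2 = bad convention and the temporary paths here are our assumptions for illustration):

```python
import collections
import os
import tempfile

NX, NY = 260, 257  # detsize from the [parameters] section

# Build a dummy mask: everything good (0) except one column marked bad (2)
mask = bytearray(NX * NY)   # all zeros = all good pixels (assumed convention)
for i in range(NX):
    mask[i * NY] = 2        # hypothetical bad strip, just for illustration

path = os.path.join(tempfile.mkdtemp(), 'mask_pnccd_back_260_257.byt')
with open(path, 'wb') as f:
    f.write(bytes(mask))

# Read it back and tally pixel types, as a detector generator would consume it
with open(path, 'rb') as f:
    data = f.read()
counts = collections.Counter(data)
print(sorted(counts.items()))  # [(0, 66560), (2, 260)]
```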

Data conversion

In order to convert the HDF5 data to emc files, run the following command for each file:

./utils/convert/h5toemc.py -d photonConverter/pnccdBack/photonCount <HDF5_file>

This will create a file in the data folder with the same base name as the HDF5 file, but with an .emc extension. To see other options, run ./utils/convert/h5toemc.py -h. Other data- and geometry-conversion utilities are available in the utils/convert/ folder, but they are not needed for this data set. You can inspect the data using the frame viewer (./utils/frameviewer.py); click 'Random' a few times to see how the frames look.

The emc file header contains the total number of pixels in the detector, which the conversion script reads from the configuration file. To avoid warnings during the reconstruction, create the [parameters] section of the configuration file before converting the data.
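The pixel count in question can be derived from the config file. A minimal sketch of reading detsize with Python's standard configparser (the in-memory config text stands in for the real config.ini):

```python
import configparser
import io

# Minimal stand-in for the [parameters] section of config.ini
CONFIG_TEXT = """
[parameters]
detd = 586
lambda = 7.75
detsize = 260 257
pixsize = 0.3
"""

config = configparser.ConfigParser()
config.read_file(io.StringIO(CONFIG_TEXT))

# detsize holds two integers: pixels along each detector axis
nx, ny = (int(t) for t in config.get('parameters', 'detsize').split())
num_pix = nx * ny
print(num_pix)  # 66820 pixels in the 260x257 detector
```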

EMC parameters

Here is the emc section of the configuration file:

[emc]
in_photons_list = amo86615.txt
in_detector_file = make_detector:::out_detector_file
output_folder = data/
log_file = EMC.log
num_div = 10
need_scaling = 1
beta = 0.001
beta_schedule = 1.41421356 10

First, the file named by in_photons_list needs to be created. This is just a text file listing the locations of all the emc files, e.g.

data/amo86615_182_PR772_single.emc
data/amo86615_183_PR772_single.emc
...
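The list can be written by hand or with a short script. Here is a sketch using Python's glob; the dummy files and the temporary directory stand in for the real data/ folder and its 13 converted files:

```python
import glob
import os
import tempfile

data_dir = tempfile.mkdtemp()   # stand-in for the reconstruction's data/ folder
for tag in ('182', '183', '184'):
    open(os.path.join(data_dir, 'amo86615_%s_PR772_single.emc' % tag), 'w').close()

# Sort so the photons list comes out in a stable, reproducible order
emc_files = sorted(glob.glob(os.path.join(data_dir, '*.emc')))

list_path = os.path.join(data_dir, 'amo86615.txt')
with open(list_path, 'w') as f:
    f.write('\n'.join(emc_files) + '\n')

print(len(emc_files))  # 3 dummy files in this sketch; 13 for the real data set
```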

The other new parameters are beta and beta_schedule. The β parameter is described in detail in the Dragonfly paper. In a nutshell, the orientation probability distribution is raised to the power of β, and beta_schedule specifies the so-called deterministic annealing schedule, where β is multiplied by a factor of sqrt(2) every 10 iterations. These measures aid smooth convergence when the signal is high, as it very much is in this data set.
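As a sketch of how this schedule plays out (the exact iteration at which each multiplication kicks in is our assumption about the indexing convention):

```python
import math

beta0 = 0.001          # initial beta from the [emc] section
factor = 1.41421356    # multiplier from beta_schedule (sqrt(2))
period = 10            # iterations between multiplications

def beta_at(iteration):
    """Beta used at a given 1-based iteration (indexing convention assumed)."""
    return beta0 * factor ** ((iteration - 1) // period)

for it in (1, 10, 11, 21, 41):
    print(it, beta_at(it))

# After every 2*period iterations beta doubles, since sqrt(2)^2 = 2
assert math.isclose(beta_at(21), 2 * beta0, rel_tol=1e-6)
```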

Running and monitoring

On a single computer, one can just run

./emc 100

However, since there is a lot of data, this may be quite slow. We recommend running this at your friendly neighborhood cluster with MPI. On four 32-thread nodes at CFEL, the time per iteration drops from around 800 s at the beginning to around 240 s by the 100th iteration.

You can monitor the reconstruction using the autoplot GUI described on its wiki page.

Check your work

Here is our output after 100 iterations: