
PyTomGUI Tutorial

McHaillet edited this page Jan 11, 2023 · 158 revisions

Table of Contents

  1. Introduction

  2. Data

  3. Initialization

  4. Stage: Data transfer

    4.1 Starting a new project

    4.2 Job submission settings

    4.3 Collecting data

    4.4 Motion correction

  5. Stage: Tomographic reconstruction

    5.1 Selecting tilt-series

    5.2 Creating a markerfile

    5.3 Aligning tilt-series

    5.4 CTF correction

    5.5 Reconstructing tomograms

  6. Stage: Particle picking

    6.1 Template matching

    6.2 Removing mismatches and false positives

  7. Stage: Subtomogram analysis

    7.1 Reconstructing subtomograms

    7.2 Aligning subtomograms

    7.3 Classifying subtomograms

    7.4 Re-aligning after classification

    7.5 Improving resolution by limiting tilt angles

    7.6 Tight mask for determining resolution

  8. Conclusion

1. Introduction

(Back to top)

The aim of this tutorial is to introduce you to the computational workflow in cryo-electron tomography (cryo-ET). Cryo-ET is an imaging technique used to obtain high-resolution three-dimensional images of biological objects such as macromolecules and cells in their near-native environment. Samples are tilted as they are imaged, resulting in a set of 2D images (a tilt series) that can be combined to form a three-dimensional (3D) reconstruction (figure). In cryo-ET, samples are immobilized in non-crystalline ice (imaged at temperatures below -150°C), allowing them to be imaged without dehydration or chemical fixation, processes which could disrupt or distort biological structures.

figure: Schematic of the electron tomography setup. The sample is tilted between -60° and +60°. The recorded Tilt-series images are combined into a tomographic reconstruction. From: Eikos, Own work, CC BY-SA 4.0

In the first step of the tutorial you will reconstruct several three-dimensional tomograms from measured tilt series. By quantitative measures such as Fourier Shell Correlation (FSC), the resolution of a raw tomogram is barely better than 50 Å. Nevertheless, higher-resolution information is present in your tomograms but is difficult to distinguish from the background; by eye you can often make out detail beyond this level. Larger macromolecular complexes can be identified visually in tomograms.

The resolution of the tomographic reconstruction is limited by the maximal dose the sample tolerates. To extend the resolution, identical parts of the tomogram can be averaged. This is called subtomogram averaging. Resolutions better than 4 Å have been achieved this way. In the second part of the tutorial you will select regions of interest within a reconstructed tomogram, and you will align the subtomograms to obtain a higher-resolution structure of the object of interest. In many cases, the subtomograms will be structurally heterogeneous because they correspond to different states of the molecule under scrutiny. Finally, you will 'classify' the subtomograms into structurally more homogeneous bins, allowing you to analyze major structural differences in the data.

2. Data

(Back to top)

In this tutorial we will process four tilt series that are described in Khoshouei et al. (Journal of Structural Biology, 2016). The sample in the tilt series consists of purified mammalian 80S ribosomes. The data were recorded using SerialEM on a K2 detector in counting mode, using varying defocus values. The dataset EMPIAR-10064 is available from the Electron Microscopy Public Image Archive (EMPIAR). You can download the data by following the link. Specifically, we will use the four mixed tilt series that have mixedCTEM as a prefix. The tilt series have already been motion corrected, so we will skip this step in the processing pipeline. For completeness, we do describe how to apply it to uncorrected datasets.

3. Initialization

(Back to top)

We assume you have installed PyTomGUI (and its dependencies) at this point. For installation notes, see the installation page.

To run PyTomGUI we need to make sure the conda environment is loaded and, if you need to run motion correction or CTF correction, that IMOD and MotionCor2 are on your path. The pytom environment is activated by running conda activate pytom_env. On our cluster we work with modules, and there you will need to run the following lines to activate PyTomGUI (where [latest] should be replaced by the pytom version number):

module load imod/4.10.25 motioncor2/1.2.1 miniconda/pytom/[latest]
condaactivate

4. Stage: Data transfer

(Back to top)

The four tilt series are downloaded as mrc stacks, which is not a default data type of PyTomGUI. For PyTomGUI to be able to incorporate the tilt series, each tilt image has to be saved as a separate mrc file. Furthermore, a meta file has to be generated. In the tutorial folder within PyTom you will find a script named mrcs2mrc.py that helps you with this. Before running it, create an output directory to unpack the tilts (e.g. mkdir [YOUR_DIR_NAME]); all stacks can be unpacked to the same directory. Please run the script on the downloaded files with mixed defocus as follows:

pytom /path/to/pytom/installation/tutorials/tutorial_1/mrcs2mrc.py -f /path/to/[SOME_STACK.mrc] -t /path/to/[YOUR_DIR_NAME]

Execute this command individually for each of the following downloaded mrc stacks:

mixedCTEM_tomo1.mrc 
mixedCTEM_tomo2.mrc 
mixedCTEM_tomo3.mrc 
mixedCTEM_tomo4.mrc

Take into account that they need to be preceded by the path to their directory.

After executing the script you will find in the target directory a meta file (in mdoc format) and all the separate tilt images saved as mrc files. We need this directory in the next step for transferring the data to our PyTomGUI project.
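The four invocations can also be scripted. The following is a minimal sketch in Python that only assembles the command lines; the helper name build_unpack_commands and the example directories are hypothetical, while the -f and -t options are the ones shown in the command above.

```python
from pathlib import PurePosixPath

# The four stacks named in this tutorial.
STACKS = [f"mixedCTEM_tomo{i}.mrc" for i in range(1, 5)]


def build_unpack_commands(script, stack_dir, out_dir):
    """Return one argument list per stack, mirroring the command shown above."""
    return [
        ["pytom", str(script), "-f", str(PurePosixPath(stack_dir) / name), "-t", str(out_dir)]
        for name in STACKS
    ]


commands = build_unpack_commands(
    "/path/to/pytom/installation/tutorials/tutorial_1/mrcs2mrc.py",
    "/path/to/downloads",      # hypothetical location of the downloaded stacks
    "/path/to/YOUR_DIR_NAME",  # output directory, created beforehand
)
for argv in commands:
    # Printing only; pass argv to subprocess.run(argv, check=True) to execute.
    print(" ".join(argv))
```

Printing the commands first makes it easy to check the paths before anything is executed.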

4.1 Starting a new project

(Back to top)

Now we are ready to start PyTomGUI. Open the GUI by executing in terminal:

pytomGUI

Before we see the main view, we first need to create a new project (or open an existing one).

Go to the Project dropdown menu and click New

A popup screen will appear and ask for a project name as well as where to save it. Importantly, you need to select the folder first.

Click Browse to select a place to store the project

Give the project a name (e.g. practice) by extending the path. In the popup screen you will end up with a line saying: /path/to/projectname

Press Create

The GUI will now open dialogs for your queue system options. Select the correct value for your job submission system.

The GUI will now pop up (figure). From this screen we will continue working. There are two panels: the left panel—or main panel—where the major processing stages are selected, the right panel where the individual actions of each stage are performed. The right panel is divided by tabs to access the actions required to complete the stage. Some actions still have different execution options.

Any time during processing you can press the Save icon in the upper bar. Once that is done, you can continue at a later moment with your project. Reopening a project can be done by selecting Open from the Project dropdown menu.

figure: Main screen PyTomGUI.

4.2 Job submission settings

(Back to top)

This step is highly dependent on your cluster environment. So make sure to set this up correctly.

It is important to ensure that the queue settings have been set properly for submitting jobs to HPC systems. This mainly concerns module loading for jobs, as the current shell environment is not transferred on job submission. Additionally, the queue parameters need to be correct (such as the number of cores per node, the name of the queue, and the queuing system). Open the PyTomGUI settings by clicking the settings icon on the top bar. In our case we set it up as follows:

From Job Submission Parameters select All

From Select Modules click only the pytom module

Add the Custom Command condaactivate

For MotionCorrection jobs additionally add the motioncor2 module. For CTFDetermination, SingleCTFCorrection, and BatchCTFCorrection add the imod module.

4.3 Collecting data

(Back to top)

The data is already collected but needs to be imported into your own project folder to start processing. Therefore you first need to go to the Data Transfer stage in the left panel.

Click on Data Transfer

Go to the Data Collection tab

Activate Data Collection

Now we can transfer the individual tilt images (as created above) to our project. The Nanograph Folder and Mdoc Folder are the same in our case. First, select the data for the Nanograph Folder.

Set filetype from tif to mrc in the dropdown menu

Press Browse

Open the folder of the individual tilt images

Then select the data for the Mdoc Folder and execute.

Set filetype to mdoc in the dropdown menu

Open the folder of the individual tilt images

Press Run

With Run the data is imported into your own project folder. This takes a few seconds. At the bottom of the screen there is a bar showing the percentage of the data that has been imported. Once that is finished you can continue.

4.4 Motion correction

(Back to top)

As the data for this tutorial is already motion corrected we will not have to use the Motion Correction tab.

The input TIF files are composed of a series of frames for each tilt angle. To compensate for beam-induced motion of the sample, a series of frames is recorded in rapid succession. The individual frames can be aligned during an initial preprocessing step called motion correction. PyTomGUI uses MotionCor2 for motion correction. For motion correction one can supply additional files, such as a gain reference file, which MotionCor2 uses to correct for the differing probabilities of pixels to record a signal. You can select whether you want to align patches within the image, and whether you want to crop the images in Fourier space, i.e., to downsample the images (for example from super-resolution to normal resolution). In the example, those options are not used.

NOTE: We will not use patches, nor binning or gain correction. If a file called gain_ref.dm4 is present in your data folder, this can be used for gain correction.

5. Stage: Tomographic reconstruction

(Back to top)

We have successfully transferred the data into our project folder. The next step is tomographic reconstruction of the data. Therefore, we need to activate the Tomographic Reconstruction stage in the left panel.

Click on the Enable Stage dropdown menu on top

Select Tomographic Reconstruction

In the left panel the second button is now activated.

5.1 Selecting tilt-series

(Back to top)

During this step the tilt images will be made square as K2 images have small differences in x and y dimensions. Furthermore, pixels with extreme values will be corrected to the mean value. Besides these corrections, the data for each tilt series will be copied to separate folders. The tomograms will be numbered from _000 onwards.
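To make the two corrections concrete, here is a minimal sketch in plain Python. It is not PyTom's implementation: the centered crop, the threshold of five standard deviations, and the function names are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def crop_to_square(image):
    """Center-crop a 2D list of rows to n x n, with n the shorter side."""
    rows, cols = len(image), len(image[0])
    n = min(rows, cols)
    r0, c0 = (rows - n) // 2, (cols - n) // 2
    return [row[c0:c0 + n] for row in image[r0:r0 + n]]


def replace_outliers(image, n_sigma=5.0):
    """Set pixels further than n_sigma standard deviations from the mean to the mean."""
    values = [v for row in image for v in row]
    mu, sigma = mean(values), pstdev(values)
    return [[mu if abs(v - mu) > n_sigma * sigma else v for v in row] for row in image]


# A 4 x 6 toy image becomes 4 x 4 after cropping.
toy = [[0] * 6 for _ in range(4)]
print(len(crop_to_square(toy)), len(crop_to_square(toy)[0]))
```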

Go to the Select Tomograms tab

Press Refresh Tab

Now a table with the four tilt series will become visible. Please check all four boxes under create. You can do this by clicking all checkboxes individually, or by checking the box behind Apply to all. After checking the boxes the putative name of the tomogram is filled in automatically.

Check the create boxes for all tomograms (or click the top one to activate all)

In the fourth column, the type of Input File can be selected. Select Raw Nanographs as we have not performed any motion correction within PyTom, and therefore use the 'raw' downloaded images. The Bin Factor IMOD can be left untouched as we are not importing from IMOD.

Select Raw Nanographs from the input files column

Press Run

It takes a few minutes before all the folders are made. You can follow the progress of this step in the progress bar.

5.2 Creating a markerfile

(Back to top)

(In case you experience issues in this step, we provide some pregenerated marker files in the tutorial directory of pytom [pytom_folder]/tutorials/tutorial_1/markerfiles/tomogram_00*/markerfile.txt. These need to be placed in [pytom_project]/03_Tomographic_Reconstruction/tomogram_00*/sorted/. The reference markers for series 000, 001, 002, 003 are 4, 0, 4, 7.)

The rough alignment of individual tilt images is known from experiment, but not sufficiently accurate for a meaningful reconstruction. Thus, we need to correct for differences in rotation, translation, and magnification between individual tilt images. For this we have added gold particles (fiducials) to our sample. They form points of reference in each tilt image. The aim of this step is to trace their relative position in each of the tilt images. You have to track at least five, evenly dispersed gold particles that are present in each tilt series. The goal here is to annotate these markers across each tilt series. Watch our video tutorial Create Markerfile to get a brief overview.

Go to the Create Markerfile tab

Click Create Markerfile

Two windows will pop up: one for display of tilt-series images, and one with settings.

Select a tomogram name in the Settings pop-up window

Recommended values for display (Binning Factor Reading) and fiducial detection (Binning Factor Finding Fiducials) are 4 and 12, respectively.

Press Load Tilt Images

While the GUI is loading something in the background it will freeze. No worries, just wait for it to finish. After doing the background calculation, it will be fully functional again.

After a few seconds, the raw tilt images should show up in the window. You can navigate through the display of the tilt images with some hotkeys (see video link above).

The next steps consist of indexing fiducials and storing them in a markerfile. We start with automatic fiducial detection. The Settings window contains some parameters that influence automatic detection. Angle tilt Axis (degrees), Reference Frame, Pixel Size (Å), and Fiducial Size (Å) are filled in automatically based on specifications in the meta file of the series. However, the mdoc files of this tilt series did not contain the correct pixel size, so here we update it to the proper value of 2.62 Å (repeat for each tilt-series loaded).

Set pixel size to 2.62 (A)

For fiducial detection we will use the LoG algorithm. The starting values are appropriate for this tilt-series. The LoG filters the image based on fiducial size, so this value needs to be set to the right size of 100 Å. Leave the checkbox marked for searching all frames.
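To get a feeling for these numbers, the fiducial size can be converted to pixels at the binning factors recommended above. This is plain arithmetic; the helper name size_in_pixels is hypothetical.

```python
def size_in_pixels(size_angstrom, pixel_size_angstrom, binning=1):
    """Convert a physical size to pixels at a given binning factor."""
    return size_angstrom / (pixel_size_angstrom * binning)


# 100 Å fiducials at 2.62 Å per pixel: unbinned, display binning, detection binning.
for binning in (1, 4, 12):
    print(binning, round(size_in_pixels(100, 2.62, binning), 1))
# 1 38.2
# 4 9.5
# 12 3.2
```

At a binning factor of 12 a gold bead spans only about three pixels, which is one reason the markers are later recentered in the unbinned images.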

Click Find Fiducials (or press F)

The tilt images will now show red markers around gold beads. As the gold particles are also visible by eye, we can assess how well PyTom detected them. Play around with the Accuracy level and Threshold cc_map parameters to see how they influence detection. After changing the values, re-execute Find Fiducials to see the effects on detection. Decreasing the threshold and increasing sensitivity will detect more fiducials but also increase the number of false positives. Settle on values that show good agreement with visual inspection of the markers. Once you have satisfactory detection, continue with the following steps.

Two important terms:

  • Reference frame: the 0 degrees projection image in the tilt-series.
  • Reference marker: the marker located closest to the tilt axis.
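Both terms can be expressed as simple selections. The sketch below assumes the tilt axis is a straight line through the image center, with its angle measured from the image y-axis; this convention and the function names are assumptions, and the GUI may define the axis differently.

```python
import math


def reference_frame(tilt_angles):
    """Index of the tilt image whose angle is closest to 0 degrees."""
    return min(range(len(tilt_angles)), key=lambda i: abs(tilt_angles[i]))


def distance_to_axis(marker, center, axis_angle_deg=0.0):
    """Perpendicular distance of a marker (x, y) to the assumed tilt axis."""
    a = math.radians(axis_angle_deg)
    ax, ay = math.sin(a), math.cos(a)  # unit vector along the axis
    dx, dy = marker[0] - center[0], marker[1] - center[1]
    return abs(dx * ay - dy * ax)


def reference_marker(markers, center, axis_angle_deg=0.0):
    """Index of the marker closest to the assumed tilt axis."""
    return min(
        range(len(markers)),
        key=lambda i: distance_to_axis(markers[i], center, axis_angle_deg),
    )


print(reference_frame([-60, -30, -2, 1, 30]))                                # 3
print(reference_marker([(10, 50), (48, 5), (90, 90)], center=(50, 50)))      # 1
```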

Before we start with indexing, we can already remove some problematic annotations. For good annotation, gold beads need (1) to remain in the image frame throughout the full tilt-series and (2) to be isolated in each frame; beads that lie close together can become hard to tell apart at high tilt angles. Additionally, (3) fiducials that are not properly recognized by automatic detection can be removed. We therefore want to deselect markers that fail any of these criteria. To do this, it is important to understand that the indexing step annotates markers based on the reference frame. Each identified fiducial in the reference frame will be indexed and tracked through the tilt images. If you can already see that one of the detected beads in this frame cannot be properly tracked through the series (due to (1), (2), or (3)), you should deselect it from the reference frame. Keep in mind that in later steps we need a reference marker that is located as close to the tilt axis as possible. Try not to deselect a potential reference marker.

Deselection can be done as follows:

  1. Center the enlarged crop-out on the fiducial by left-clicking in the main image pane.

  2. Deselect the annotation by right-clicking the red circle in the enlarged crop-out.

Once this is done, the frame shifts can be detected and the markers can be indexed.

Click Detect Frame Shifts (or press D)

The following step will take a few seconds:

Click Index Fiducials (or press I)

You should now see that the markers are colored and numbered. Again cycle through the tilt images to see how well each marker is tracked. With the LoG they should be robustly tracked. If not, you can solve it through manual adjustment. However, if the indexing is very bad, you can consider re-executing Find Fiducials which will remove all the current indexing and redo automatic fiducial detection.

Click Manually adjust Markers

A list of the currently indexed markers will appear, showing how many annotations have been made for each marker through the tilt series. Pressing Select Marker after clicking on a marker in the list will allow you to manually adjust the annotations of that specific marker. Pressing Next Missing Fiducial (hotkey: >) or Prev Missing Fiducial (hotkey: <) will automatically take you to frames where the annotation is missing. In these frames you can update an annotation by deselecting incorrect ones (right-click) and selecting correct markers (left-click) in the crop-out. Besides these missing frames, you also need to ensure the marker is tracked correctly (i.e. does not jump). Once you are satisfied with the selection, you can redo Detect Frame Shifts and Index Fiducials and the marker list will be updated. It is possible that the marker again shows a missing annotation; in that case repeat the manual adjustment and re-index until the marker is fully annotated. Repeat the above steps for each marker until they are all fully annotated. Once this is done you can save the markerfile.

Press Create Markerfile

Select all the markers and move them to the right column by pressing '>>'

The next step will pinpoint the markers to higher resolution in the unbinned tomogram. It might take a few seconds to execute.

Press Recenter Markers

You will see R-scores written behind each marker once it finishes execution. You can now write the markers to a txt file.

Press Save Markers

Note down which of the markers is located closest to the tilt axis, as you will need it for the next actions. You can also open Check Align Errors to see the R-scores of the markers and how much they deviate per tilt. The marker that lies best on the tilt axis often has the lowest R-scores.

Now start creating markerfiles for each of the remaining tomograms by repeating the above steps, before you continue with alignment.

5.3 Aligning tilt-series

(Back to top)

In alignment we apply a transformation to each tilt image based on the marker annotations in our markerfile. This corrects for translational and rotational shifts, and magnification differences. The aligned images can be used in the next step for CTF correction.
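As an illustration of what such a transformation does, the sketch below applies a magnification, a rotation, and a shift to a single image coordinate. The order of operations, the rotation about the image center, and the function name are assumptions made for this example; they do not describe PyTom's internal conventions.

```python
import math


def align_point(x, y, rotation_deg, shift_x, shift_y, magnification, center):
    """Scale and rotate (x, y) about the center, then translate."""
    cx, cy = center
    a = math.radians(rotation_deg)
    dx, dy = (x - cx) * magnification, (y - cy) * magnification
    xr = dx * math.cos(a) - dy * math.sin(a)
    yr = dx * math.sin(a) + dy * math.cos(a)
    return xr + cx + shift_x, yr + cy + shift_y


# Doubling the magnification and shifting by (5, -5), without rotation.
print(align_point(60, 50, 0, 5, -5, 2.0, (50, 50)))
```

In an actual alignment, one such set of parameters is estimated per tilt image from the marker positions and then applied to the whole image.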

Go to the Alignment tab

Click Batch Alignment

Press Refresh Tab

A list of alignment-ready tomograms will appear. You can select the ones that you would like to align; in this case you will use all four tilt series. Select the reference marker for each tilt series, as written down in the previous step, from the dropdown menus in the Ref. Marker column. First Angle, Last Angle, and Ref. Image are filled automatically from the meta file. For the remaining parameters:

Parameter        Value
Exp. Rot Angle   5

Fill in the parameters

Press Run

It might take a few minutes for alignment to finish. You will get a notification on successful completion.

If you want to submit this step to a node of your cluster, you can activate the queue box. This option will be available for many steps from now on. The details for job submission can be inspected via the Settings icon at the top of the GUI. Clicking Settings will open a pop-up window, in the Queuing Parameters tab you can inspect/change the settings for specific actions via the Job Submission Parameters dropdown menu.

Via the plot icon at the top of the GUI, or via Plot in the Tools dropdown menu, you can view alignment scores. Go to Alignment Errors and refresh; your alignment scores should pop up. If one of the scores is higher than 3, the alignment is too far off, and it is advisable to redo the alignment for this tilt-series. You could reduce or increase the number of indexed gold markers to get a better distribution over the tomogram, or change one of the reference markers. Alternatively, you can replace the markers with the pregenerated files in the tutorial folder of the PyTom distribution.

IMPORTANT NOTE ON JOBS AND LOGFILES

From here on, jobs will store their submission scripts and logfiles in the PyTomGUI project directory. The GUI provides some tools to view all this information. You can click the image of the stack in the icon bar on top or select Queue from the Tools dropdown menu; here you can switch between Local Jobs and Queued Jobs (i.e. jobs submitted to the queuing system). Press the refresh button to load the latest information. From there you can view the .sh file used to execute the job (Open Job), view the log file (Open Log), see the running status (Running), and kill a running job (Terminate). Select the checkbox for the action and press Run at the bottom left.

5.4 CTF correction

(Back to top)

All TEM images are subject to convolution with a Contrast-Transfer Function (CTF). Approximate deconvolution is ultimately required to get a faithful image of the specimen, analogous to the typical workflow in single particle analysis. A peculiarity of cryo-ET is that the images have a strong defocus gradient due to the tilting of the specimen. This defocus gradient has to be considered when determining the CTF parameters (defocus, astigmatism, phase shift) and also in the CTF correction. PyTomGUI employs CTFplotter from IMOD to determine the CTF parameters (CTF Determination) and PyTom’s own procedure for CTF Correction (phase flipping). Both the determination of CTF parameters and the actual CTF correction require the tilt angle. Hence, CTF correction is performed after the tilt series alignment in the PyTomGUI workflow. Watch our video tutorial CTF Determination to get a brief overview.

To start the CTF correction click the tab CTF Correction in the Tomographic Reconstruction stage. We start with the determination of the CTF parameters by CTF plotter in the CTF determination panel (Figure 6). The Folder Sorted & Aligned Tilt Images can be found in .../03_Tomographic_Reconstruction/tomogram_xxx/sorted/. Enter 3000 for the Expected Defocus value, which is important for good starting parameters for the search.

figure: This panel sets the parameters for invoking CTFplotter.

Pressing Execute command opens CTF plotter (Figure 7). Visit Guide to Ctfplotter for a detailed tutorial. If the agreement between data and simulation is reasonable, save the parameter table in the interface (preferably using Fit each view separately). This procedure has to be repeated for each tilt series.

In the Fitting Params widget, set:

  • X1 start and end to 0.07 and 0.20, respectively
  • Check the find astigmatism box

In the Angle Range & Tile Selection widget, set:

  • Middle tilt angle is 0.0
  • Check fit each view separately
  • Then autofit all views

Wait for the IMOD CTF determination to finish and then press Save to File. Repeat for each tilt-series.

figure: CTF plotter invoked by Execute Command button in CTF determination. The purple curve is the experimental radially averaged power spectrum and the green curve is the simulated one for the determined defocus value.

After determination of the CTF parameters the actual CTF correction can be performed (beta), optionally in batch for a set of tomograms. In Fourier space phases are flipped in specific areas, whereby the image is divided into small stripes to account for the defocus gradient.
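The two ideas in this paragraph, a defocus that varies per strip and the flipping of phases, can be sketched as follows. This is not PyTom's implementation: the function names, the sign convention of the defocus offset, and the example values (taken to be in nanometres) are assumptions for illustration only.

```python
import math


def strip_defocus(nominal_defocus, distance_from_axis, tilt_deg):
    """Defocus of a strip parallel to the tilt axis (one length unit throughout).

    The sign of the offset depends on the geometry convention (assumption).
    """
    return nominal_defocus + distance_from_axis * math.tan(math.radians(tilt_deg))


def phase_flip(fourier_values, ctf_values):
    """Negate the Fourier coefficients wherever the CTF is negative."""
    return [f if c >= 0 else -f for f, c in zip(fourier_values, ctf_values)]


# A strip 262 nm from the tilt axis at a tilt of 60 degrees (illustrative values).
print(round(strip_defocus(3000.0, 262.0, 60.0), 1))
print(phase_flip([1 + 1j, 2, 3], [0.5, -0.2, 0.0]))
```

Because the offset grows with both the distance from the tilt axis and the tilt angle, strips far from the axis in high-tilt images need a noticeably different CTF than the image center.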

The tilt-series can be CTF corrected in batch mode. For this, go to the Batch Correction tab, activate the checkboxes for the tomograms, and click Run. You can submit this job to a queuing system, but only if you have set up the parameters correctly. These jobs can also be run on GPUs, in which case you need to specify a GPU index for each job.

5.5 Reconstructing tomograms

(Back to top)

After alignment and CTF correction, you can compute a 3D reconstruction of your sample. PyTom supports two types of 3D reconstruction: weighted backprojection (WBP) and Iterative Nonuniform Fourier Reconstruction (INFR). WBP comes with two different weighting schemes: r*-weighting, which increases the weights linearly with frequency, and a scheme that accounts for the precise angular sampling (going back to Harauz and van Heel). The former typically results in over-amplification of high-frequency noise, which is reduced with the alternative weighting scheme. The iterative INFR can achieve more accurate weighting and reconstruction at the expense of computational cost. Watch our video tutorial Tomogram Reconstruction to get a brief overview.
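A weighting that grows linearly with frequency can be illustrated in one dimension. The sketch below is a generic ramp filter normalized to 1 at the highest frequency; the function name and the normalization are assumptions, and it does not reproduce PyTom's exact weighting.

```python
def ramp_weights(n):
    """Weights proportional to |frequency| for n samples in FFT ordering."""
    # Signed frequencies in cycles per sample: 0, 1/n, ..., then the negative half.
    freqs = [(i if i < (n + 1) // 2 else i - n) / n for i in range(n)]
    max_freq = max(abs(f) for f in freqs)
    return [abs(f) / max_freq for f in freqs]


print(ramp_weights(8))
# [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
```

The highest frequencies receive the largest weights, which is why this scheme also amplifies high-frequency noise.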

Go to the Reconstruction tab

Choose Batch Reconstruction

You will apply WBP to all tomograms. Leave the default settings for First Angle and Last Angle; they should be -60 to 54 for 000 and 001, and -60 to 56 for 002 and 003. The alignment should correspond to the reference marker you selected for each tomogram. The rest can be filled in as follows (sorted indicates images without CTF correction):

Parameter    Value
ctf          sorted
Weighting    -1
Bin Factor   8

If you want to run on GPUs, fill in the appropriate GPU id in the last column (0 for example). Activate queue to submit your reconstruction to nodes of your cluster (in Settings you can specify your job submission system). On GPU nodes, GPU indexing always starts counting from 0.

Fill in the parameters

Press Run

You can open a tomogram with the PyTomGUI volume viewer by pressing the box icon, or by selecting Plot3d from the Tools dropdown menu. Navigate to [project]/04_Particle_Picking/Tomograms and select one of the reconstructions.

6. Stage: Particle picking

(Back to top)

The next step of the processing workflow is to determine the coordinates of potentially interesting features, typically particles of a specific type. There are two options supported to obtain these coordinates: manual picking and template matching. In this tutorial we will start with the latter—an automated approach—but do some manual correction afterwards. Watch our video tutorial about Manual Picking to get an overview of the particle picking GUI. First enable the next stage.

Click on the Enable Stage dropdown menu on top

Select Particle Picking

The workflow of this stage will consist of template matching, candidate extraction, and manual validation of the extracted candidates.

6.1 Template matching

(Back to top)

Template matching cross-correlates tomograms with a particle template, confined by a mask. High correlation scores indicate positions likely to contain the particle. The template has much smaller dimensions than the tomogram, but an identical pixel size. The mask needs to have the same dimensions as the template. EM and mrc files are accepted as input for the template and mask.

Go to the Template Matching tab

Go to Single and check the Template Matching box

For this tutorial we provide a template and mask that can be found in /path/to/pytom/tutorials/tutorial_1/templates (emd_5592_21A.mrc and mask_20_8_1.mrc). Normally you would generate a template via the script template_generation.py that is run from the command line with options for ctf correction and low-pass filtering. The mask can be created from the GUI via Template Matching -> Single clicking the Create button for mask.
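Since the template needs the same pixel size as the tomogram, it helps to compute the pixel size of the binned reconstruction. The check below uses the values from this tutorial (2.62 Å per pixel, Bin Factor 8); the helper name is hypothetical.

```python
def binned_pixel_size(pixel_size_angstrom, bin_factor):
    """Pixel size of a reconstruction made from binned tilt images."""
    return pixel_size_angstrom * bin_factor


tomogram_pixel_size = binned_pixel_size(2.62, 8)
print(round(tomogram_pixel_size, 2))  # 20.96
```

The result, roughly 21 Å, is consistent with the 21A in the name of the provided template, assuming that this suffix denotes its pixel size.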

As a side note, it is always a good idea to visually inspect your template and mask. This can for example be done in the GUI via the 3D viewer that can be opened from the toolbar. Your mask should always fully encompass your protein and should have a smooth falloff to prevent correlation artifacts in Fourier space.

Open the Batch Template Match panel

Press Refresh Tab

A dialog window will open, which first asks for tomograms, then template(s), and finally mask(s) to be used in the batch processing. The left dialog panel provides file navigation, while the right panel shows files ready for selection. Importantly, files need to be double clicked (!) in the left section to move them to the right section, before clicking OK.

Select the tomograms

Select the template

Select the mask

First, you will execute template matching only for the first tomogram (000) by clicking both the Run and Mirrored box. The template and mask should be automatically filled by the selected files. Wedge Angle 1 and Wedge Angle 2 can be left at the default parameter. For the remaining parameters see the table:

Parameter    Value
angle list   angles_12.85_7112.em
Start Z      115
End Z        350

We reduce the Z space for template matching to reduce computation time. The top and bottom part of a reconstruction generally contain some empty overhang.

Template matching can be submitted to your queuing system or run on GPUs. You can fill in the same GPU index for all jobs, in which case PyTom will run them one by one, or you can manually fill in a different GPU index for each job, in which case they can run in parallel.

Fill in the parameters

Press Run

Note that template matching will take significant amounts of time on CPU, ~1 hour on 16 cores. On GPU times should be ~5 minutes per tomogram. The Run and Mirrored box executes template matching twice: once with the provided template, and once with a mirrored version of the template. Detectors can sometimes mirror projections, resulting in a mirrored version of the protein. Executing this step with two versions of the template we can determine the correct orientation. The correctly-handed version will provide significantly higher correlation scores.

Once template matching has finished (which you can check via the queue pop-up window), we will extract the candidates. Maxima of the resulting correlation volume specify likely particle locations. To determine coordinates of maxima of the correlation volume and the corresponding orientations that yielded the highest correlation scores the corresponding output files need to be analyzed using the Batch Extract sub-tab within the Template Matching action.

Open the Batch Extract panel in Template Matching

Press Refresh Tab

In the dialog window you can select the candidate particle lists with high correlation scores found by Template Matching. These are located in the cross correlation folder for each tomogram, .../04_Particle_Picking/Template_Matching/cross_correlation/tomogram_000_WBP/. However, PyTomGUI will fill these in automatically. Do this now for the regular and mirrored xml files.

Select the files

You can now specify the parameters in the table that appears in the GUI. File Name Particle List and Output Dir Subtomograms are automatically filled out, but you can adjust the name of the particle list. For the remaining parameters:

Parameter       Value
Size (pixels)   10
#Candidates     1000
Min. Score      0.001

This will select the 1000 highest ranking candidate particles with a correlation score above 0.001. This step is fast so not many nodes are required.
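The selection logic can be sketched as a greedy pass over the correlation peaks, sorted by score. This is an illustration rather than PyTom's extraction code: it interprets Size (pixels) as a radius around each accepted peak within which no further candidate is taken, which is an assumption, and the function name is hypothetical.

```python
import math


def extract_candidates(peaks, n_candidates, min_score, radius):
    """Pick up to n_candidates peaks above min_score, at least radius apart.

    peaks is a list of (score, (x, y, z)) tuples.
    """
    selected = []
    for score, pos in sorted(peaks, key=lambda p: p[0], reverse=True):
        if score <= min_score or len(selected) == n_candidates:
            break
        if all(math.dist(pos, kept) > radius for _, kept in selected):
            selected.append((score, pos))
    return selected


peaks = [(0.9, (0, 0, 0)), (0.8, (3, 0, 0)), (0.7, (20, 0, 0)), (0.0005, (40, 0, 0))]
print(extract_candidates(peaks, n_candidates=1000, min_score=0.001, radius=10))
# [(0.9, (0, 0, 0)), (0.7, (20, 0, 0))]
```

In this toy example the second peak is skipped because it lies within 10 pixels of a better one, and the last peak falls below the minimum score.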

Fill in the parameters

Press Run

At the top of the GUI you can select the Plot icon to view correlation scores from the extracted particle lists. To assess whether the template should be mirrored, we compare the correlation scores of the regular and mirrored candidates.

Open the Plot pop-up window

Go to Template Matching Results tab

Select the regular particleList and the particleList Mirrored

Press Plot

figure: Cross-correlation scores of mirrored and non-mirrored template matching can be plotted against each other to find an optimal cutoff for correlation score.

The plot shows the correlation scores of the two particle lists. The particles found by the original template should have higher correlation scores than those found by the mirrored template. This tells us the particles have the same handedness as our template. Additionally, we might also see a crossing point of the two curves if we extract enough particles. This is the point where detected particles mix with the background noise. It can provide a good cutoff point for particle extraction up to a minimal score.
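If you prefer to estimate the crossing point numerically rather than by eye, the idea can be sketched as follows. This is a minimal illustration that assumes you have already read the correlation scores of both particle lists into plain Python lists; the function name `crossing_cutoff` is hypothetical and not part of PyTom.

```python
def crossing_cutoff(regular_scores, mirrored_scores):
    """Return (rank, score) where the regular curve meets the mirrored one.

    Both score lists are sorted from high to low, as in the plot. The first
    rank at which the regular score is no longer above the mirrored score is
    taken as the crossing point. Returns None if the curves never cross.
    """
    regular = sorted(regular_scores, reverse=True)
    mirrored = sorted(mirrored_scores, reverse=True)
    for rank, (reg, mir) in enumerate(zip(regular, mirrored)):
        if reg <= mir:
            return rank, reg
    return None


# Toy scores: the curves cross at the fourth-ranked candidate.
regular = [0.40, 0.35, 0.30, 0.12, 0.10]
mirrored = [0.20, 0.18, 0.15, 0.13, 0.11]
print(crossing_cutoff(regular, mirrored))
```

The returned score is a candidate value for the Minimal Score used later during manual inspection. Real curves are noisy, so treat it as a starting point.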

Based on the original template performing better than the mirrored one, we can now execute Template Matching on the other tomograms without mirroring the template. The parameters are the same as above, but only activate the Run box.

Repeat Template Matching for the remaining tomograms

Extract candidates from Template Matching

6.2 Removing mismatches and false positives

(Back to top)

Every candidate list contains some falsely annotated particles. Having a significant fraction of outliers in your selection might hinder further processing steps. So, it is necessary to clean up your template matched candidates. The GUI can be used in several ways to deselect particles. In this tutorial we will use two methods for cleaning up the particle lists. The first procedure is done at this stage (through the Manual Picking tab), the second will be done by classification during subtomogram analysis.

Go to the Manual Picking tab

This tab can also be used for full manual particle picking, but here you will use it to load the extracted particles from template matching and deselect false particles.

Click Browse

Open a tomogram

The tomogram will open in a new window, and a second, elongated window (the particle window) opens as well (figure). By scrolling in the first window you can move through the tomogram along the z-direction. This window offers some options for adjusting the view. The Gaussian Filter can be set to smooth the image, while the Step Size makes scrolling faster by skipping some of the z-slices. Pertaining to the particles: Size Selection sets the radius of the selection around each particle, and Number Particles shows the number of particles currently selected. Minimal Score and Maximum Score apply to the correlation score of particles loaded from your template matching results. Set these options before loading the particle list, because adjusting them afterwards is slower.

Set Size Selection to 20

Activate Gaussian filter with value 0.8

Now you will load one of the lists of extracted candidates from template matching corresponding to the currently opened tomogram. There is a button for opening particle lists on the topside of the tomogram viewing window. In the dialog window you can opt for either opening Coordinate Files (txt format) or Particle List (xml format) in the dropdown menu. Particle List will show the xml files created by candidate extraction.

Open Particle List

You will see that blue circles have been drawn around particles in the tomogram, and the additional window will now show crop-outs of these selections in the tomogram. Increasing the size has expanded the circles to include surrounding sections of the particles. This facilitates distinguishing selected proteins in the particle window. Scrolling through the particle window shows particles of increasingly poor quality. The selections at the bottom will likely not be particles, but noise (or gold beads). Left-clicking one of the particles will show the error score in the top bar of the particle window. Additionally, the tomogram view will move to the selected place and change the selected circle to red (this takes a few seconds).

By changing the Minimal Score many bad particles can be deselected. You can insert here the correlation score previously determined in your plot.

Set Minimal Score

Now you will manually inspect the remaining particles. By right-clicking particles in the particle window, you can deselect them. Take care: once a particle is deselected, the action cannot be reversed in the window. The deselection here relies on qualitative assessment: some selections will be too small or contain gold beads. Focus mainly on deselecting those; the others will be taken care of during Subtomogram Analysis.

Deselect Particles

Save your updated selection by pressing the Save icon at the top of the viewer.

Save the new Particle List

Now that the process has been explained, you can repeat the deselection step for the remaining tomograms.

Once you have finished for all tomograms, the individual lists can be combined into a larger particle list.

Go to the Create Particle List tab

Open Batch

Press Refresh Tab

A dialog window appears where you can select the lists that you want to combine. Double-click a file to move it from left to right, where it's staged for combination. Particle lists can be found in [project]/04_ParticlePicking/PickedParticles and have names of the following format particleList_TM_tomogram_###_WBP_emd_5592_21A.xml. Their name is formatted by tilt-series id and template name. You need to stage one for each tomogram.

After pressing OK a table appears in which you can adjust some final parameters. Make sure that the angles for tomogram 000 and 001 are set to 30 for wedge 1 and 36 for wedge 2, while 002 and 003 should have 30 and 34, respectively. Then press Run (no need to select queue).
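The wedge angles above follow from the tilt range of each tilt-series: each wedge angle is 90 degrees minus the absolute value of the corresponding extreme tilt angle. A small sketch of this arithmetic (the helper name `wedge_angles` is made up for this example):

```python
def wedge_angles(first_tilt, last_tilt):
    """Return (wedge 1, wedge 2) in degrees for a tilt range.

    Wedge 1 corresponds to the most negative tilt angle and wedge 2 to the
    most positive one; each is the unsampled angle up to 90 degrees.
    """
    return 90 - abs(first_tilt), 90 - abs(last_tilt)


print(wedge_angles(-60, 54))  # tomograms 000 and 001
print(wedge_angles(-60, 56))  # tomograms 002 and 003
```

The same rule applies whenever the tilt range is restricted in later steps.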

figure: Viewer for selecting particles. To enhance contrast a Gaussian Filter can be applied. The Size Selection specifies the diameter of your particles (in pixels) and hence the displayed circles. Step Size specifies the number of z-slices to be moved upon pressing the arrow buttons.

7. Stage: Subtomogram analysis

(Back to top)

Now we are ready for subtomogram analysis. Start by enabling the next stage. At this point we assume you are familiar with the basic steps and layout of PyTom and will therefore skip detailed commands. Steps in this part of the processing generally require significant time to complete, even on a cluster. We note for each step an approximation of computational time on our own cluster. Clearly, computation times can vary between computers/clusters, but it gives an indication for planning your data processing.

7.1 Reconstructing subtomograms

(Back to top)

First you need to reconstruct the subtomograms corresponding to the coordinates specified in the respective particle list. After enabling the stage, you select Reconstruct Subtomograms, and therein Batch Reconstruction. Upon refreshing the tab you can select the cleaned-up, combined particle list created in the previous section. This will load a particle list for each tomogram in a table. In the alignment column you can specify the reference marker you wrote down earlier for each tomogram. We will also now switch to using the CTF-corrected images (can be specified in the ctf column). Set the following parameters for each tilt series:

Parameter Value
first angle -60
last angle 54 or 56 (i.e. the max angle)
ctf sorted_ctf (the ctf corrected images)
Bin Factor Recon 8
Weighting -1
Size Subtomos 50
Bin Subtomos 4

Bin Factor Recon is the binning factor used for tomogram reconstruction in the tomographic reconstruction stage and is needed for calculating the coordinates, in this case 8. Size subtomos specifies how many pixels the reconstructed box should contain along each axis, here 50, with a corresponding Bin Subtomos of 4. Offset and Polishing will be left empty.

Subtomograms can be reconstructed for multiple tomograms simultaneously by increasing the number of nodes when submitting to a cluster system. Subtomograms can also be reconstructed on multiple GPUs (or a single GPU); for this you can fill in comma-separated GPU ids (e.g. 0,1,2,3). PyTomGUI will arrange that each tomogram is run sequentially and that the reconstruction load is divided over the GPUs. Do not assign a different GPU to each tomogram, as PyTomGUI will still run them sequentially.

Activate Run boxes

Select number of nodes or GPUs

Press Run

Computation time per tomogram with 900 particles on a single node of 10 CPUs with 20 threads is approximately 5 minutes. On 4 GPUs (GeForce GTX 1080 Ti) it takes less than 1 minute.

IMPORTANT NOTE ON BINNING

Binning drastically increases the speed of coming steps. Alternatively you can also specify a binning factor in each subtomogram analysis step (and hence keep the subtomograms unbinned), but the subtomos then still need to be downsampled every time they are loaded, which can be costly. In this case we just want to get an initial alignment to filter out some false positives, so binning the subtomos is fine. Consider the maximal resolution you need for the classification, and bin the subtomograms appropriately. In this case we have 4 x 2.62 $Å$ = 10.48 $Å$ pixels, which gives a Nyquist limit of 2 x 10.48 $Å$ = 20.96 $Å$ (i.e. a Nyquist frequency of 1/20.96 $Å^{-1}$).
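The relation between binning, pixel size and the best attainable resolution is simple enough to check for any binning factor. The following sketch uses hypothetical helper names and only reproduces the arithmetic from this note:

```python
def binned_pixel_size(pixel_size, binning):
    """Pixel size (in Angstrom) after binning by an integer factor."""
    return pixel_size * binning


def nyquist_resolution(pixel_size):
    """Finest resolvable period (in Angstrom): two pixels per period."""
    return 2 * pixel_size


UNBINNED_PIXEL_SIZE = 2.62  # Angstrom, the pixel size used in this tutorial

for binning in (4, 2, 1):
    size = binned_pixel_size(UNBINNED_PIXEL_SIZE, binning)
    print(f"bin {binning}: {size:.2f} A/pixel, Nyquist {nyquist_resolution(size):.2f} A")
```

If the resolution of your average approaches the Nyquist value for the current binning, the subtomograms need to be reconstructed with a lower binning factor before the resolution can improve further.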

We generally recommend binning the subtomograms during reconstruction, as it makes it easier to examine different parameters relatively quickly.

7.2 Aligning subtomograms

(Back to top)

Alignment of subtomograms to a common coordinate system, which then allows meaningful averaging, relies on iterative algorithms. In PyTom, quasi-expectation maximization approaches iteratively optimize the correlation of particles with an average. In particular, the 3D rotational search is vast as three Euler angles need to be sampled. PyTom supports two flavors of iterative subtomogram alignment: Fast Rotational Matching (FRM) and real space alignment (with GLocal).

Real space alignment tends to be more accurate because it can be focused on specific features of the molecule of interest with a mask, but the orientation sampling takes much longer in real space. As a consequence, only a limited range of orientations can be sampled in a single iteration, decreasing the coverage compared to the FRM alignment. Thus, real space alignment is a good option if the approximate orientations of the particles are already known. This is the case if the particles have been detected with template matching or if FRM alignment is executed beforehand.

To use real space alignment, choose the tab Align Subtomograms with the specification GLocal (Gold-standard Local alignment) (Figure 20). Required inputs are a Particle List (with good approximations of orientations, i.e. from template matching or initial FRM alignment) and a Mask. The particle list is the same as the combined list we just used to make the subtomogram reconstructions. An initial external reference model can be provided or, when left out, the average of the subtomograms will be calculated as a first step and used as a reference.

The mask needs to be created in correspondence to the reconstructed subtomogram box sizes. Generate a mask with the following parameters:

Mask
Dimension (pixels) 50
Radius (pixels) 19
Smooth 1.2
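To give an idea of what these three mask parameters describe, here is a pure-Python sketch of a spherical mask with a soft edge. The Gaussian falloff outside the radius is an assumption made for this illustration; the mask produced by PyTomGUI may use a different edge profile, and the function name `spherical_mask` is hypothetical.

```python
import math


def spherical_mask(dimension, radius, smooth):
    """Build a cubic mask (nested lists) with a soft-edged sphere.

    Voxels within `radius` of the box center are 1. Outside the radius the
    value decays as a Gaussian of width `smooth` (assumed edge profile).
    """
    center = dimension // 2
    mask = []
    for x in range(dimension):
        plane = []
        for y in range(dimension):
            row = []
            for z in range(dimension):
                r = math.sqrt((x - center) ** 2 + (y - center) ** 2 + (z - center) ** 2)
                if r <= radius:
                    row.append(1.0)
                else:
                    row.append(math.exp(-(((r - radius) / smooth) ** 2)))
            plane.append(row)
        mask.append(plane)
    return mask


mask = spherical_mask(dimension=50, radius=19, smooth=1.2)
# Values along a line from the center towards the edge of the box.
print([round(mask[25][25][z], 3) for z in (25, 44, 45, 47, 49)])
```

The mask dimension has to match the subtomogram box size (here 50 pixels), and the radius should leave some room between the particle and the box edge.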

The pixel size here should always be the pixel size in Angstrom that we have in our subtomograms. The original size is 2.62A and we binned the subtomograms 4 times, i.e. we get 10.48A. For parameters use the following values (remaining can be left at default):

Parameter Value
Pixel size 10.48 A
Number of iterations 4
Particle Diameter 270
Number of angular shells 3
Angular Increment 3
Binning Factor 1

(In case you did not bin the subtomograms, you could potentially do that here as well.)

Activate queue or use GPUs if possible. Multiple GPUs are assigned with comma separation (e.g. 0,1,2,3).

Press Generate Command

The execution command will be displayed. You can still change parameters in the command at this point (in case of queue: the number of nodes used (-N) or the maximum execution time (--time)).

Execute command

Computation time on 4 cluster nodes, each with 10 CPUs and 20 threads, for a total of 80 threads: ~2 hours. On 4 GPUs (GeForce GTX 1080 Ti), execution took ~40 minutes.

Results of the GLocal subtomogram alignment will be stored in /path/to/project/05_Subtomogram_Analysis/Alignment/GLocal/. For each iteration, the average, the average filtered to the resolution according to FSC and the respective particle list (with updated orientations) will be stored. Average-FinalFiltered_[resolution].em contains the final averaged structure with the resolution of the structure in the file name (the final resolution can also be viewed at the bottom of the job log file via Tools > Queue). When applied to the particles found by template matching the average rapidly converges, and should be in the range of ~20 A. The resulting structures can be viewed through the manual picking 3D viewer or using external software (such as chimera).

figure: Subtomogram average resulting from GLocal alignment of particles selected by template matching. This average has a resolution of ~20 A.

7.3 Classifying subtomograms

(Back to top)

PyTom includes two distinct classification approaches: Constrained Principal Component Analysis-based classification (CPCA) and auto-focused classification. CPCA is designed to classify previously aligned subtomograms according to main features in a user-defined mask. Auto-focused classification aims to align subtomograms simultaneously with classification and to automatically focus the classification on the most variable parts of the data. In this tutorial we will use auto-focused classification.

While CPCA classification requires accurately pre-aligned particles and typically prior knowledge on which feature to classify for, auto-focused classification attempts to alleviate these requirements. The procedure continuously monitors the variance between the class averages and focuses the classification on the regions where they differ (hence the name: auto-focused). Class assignments are based on a voting procedure: pairwise comparisons are made for each investigated particle (i.e., the particle is compared to class 1 and 2, class 1 and 3, class 2 and 3, and so forth) and the particle is assigned to the class that has most ‘wins’ in these comparisons. You access this classification option through Auto Focus in the Classify Subtomograms tab (Figure 24).
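The voting step can be illustrated with a short sketch. It only shows how wins are counted over all class pairs; the pairwise comparison itself is passed in as a function, since in auto-focused classification it depends on the focused difference mask of that class pair. All names here are hypothetical, and the tie-breaking rule (lowest class label) is an assumption of this example.

```python
from collections import Counter
from itertools import combinations


def assign_class(particle, classes, compare):
    """Assign a particle to the class that wins most pairwise comparisons.

    `compare(particle, a, b)` must return the winning class label, a or b.
    Ties are broken by taking the lowest class label (assumption).
    """
    wins = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        wins[compare(particle, a, b)] += 1
    best = max(wins.values())
    return min(c for c in classes if wins[c] == best)


# Toy comparison: a fixed similarity score per class stands in for the
# focused correlation that would be computed per class pair.
toy_scores = {0: 0.2, 1: 0.5, 2: 0.4}


def toy_compare(particle, a, b):
    return a if toy_scores[a] >= toy_scores[b] else b


print(assign_class("particle_0", [0, 1, 2], toy_compare))
```

With three classes each particle takes part in three comparisons; with four classes, six.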

Autofocused classification is performed on the specified Particle List. The Focussed Mask allows you to restrict the classification area; for example, membranes can be flexible and you may want to exclude this area from the classification to rather focus on other structural features. The Alignment Mask is required for the subtomogram alignment at each classification iteration. The Noise Percentage is an estimate of the fraction of particles that likely do not correspond to any meaningful class and should rather be assigned to a ‘junk’ basket (here 10%). The STD Threshold Diff Map determines the auto-focused mask: areas above this threshold are used for the classification (here 0.4 times the standard deviation). The Particle Density threshold focuses on particularly negative densities – typically classification targets densities that are present / absent in the classes at this resolution.

Choose the Particle List from the last GLocal iteration (in [project]/05_Subtomogram_Analysis/Alignment/GLocal find the latest .xml file; they start with the iteration number, e.g. 3-ParticleList.xml). For the Classification and Alignment Mask you can pick the same mask for both, namely the one created for GLocal alignment (50 pixels wide, 19 pixels radius, and smoothing 1.2). You can specify an output folder; PyTomGUI will also attempt to create one based on the particle list name. Finally, leave all parameters on default, except set the Max Bandwidth to 13px, roughly 1/4 of the box size (50 px), which is half of Nyquist. Binning is 1 as we already binned the subtomograms. Submitting the job to queue is advisable.

Press Generate command

Press Execute command

Computation time is approximately 2 hours on 4 nodes with 20 cores each (total of 80 cores).

The convergence of AutoFocus classification can be checked via the log file (Tools > Queue). Each iteration will write a matrix with the class changes. The first iterations should show a matrix where the diagonal elements (the number of particles that stay in the same class) are similar to the off-diagonal elements (the class changes). For later iterations the numbers on the diagonal should increase while the off-diagonal numbers should decrease, meaning that the class assignment is becoming more stable.
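To make clear how such a class-change matrix is read, the sketch below builds one from the class labels of two consecutive iterations. It is an illustration only: the function names are hypothetical, the labels are assumed to be available as plain lists, and particles in the noise class (-1) are skipped.

```python
def class_change_matrix(previous, current, num_classes):
    """Count particles moving from class `old` (row) to class `new` (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for old, new in zip(previous, current):
        if old < 0 or new < 0:
            continue  # skip particles assigned to the noise class (-1)
        matrix[old][new] += 1
    return matrix


def stable_fraction(matrix):
    """Fraction of counted particles that stayed in the same class."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total if total else 0.0


# Toy labels for six particles in two consecutive iterations.
previous = [0, 0, 1, 1, 2, 2]
current = [0, 1, 1, 1, 2, 0]
matrix = class_change_matrix(previous, current, num_classes=3)
for row in matrix:
    print(row)
print(f"stable fraction: {stable_fraction(matrix):.2f}")
```

A stable fraction that rises towards 1 over the iterations corresponds to the diagonal-dominated matrices described above.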

figure: All positive classes of autofocus classification (0,1,2,3). To view them, run chimera [project]/05_Subtomogram_Analysis/Classification/AutoFocus/[job]/iter[i]_class?.em, issue tile in the chimera command line, and set the density threshold in the chimera Volume Viewer for each density to a negative value. PyTom works with negative densities, in contrast to software such as RELION.

7.4 Re-aligning after classification

(Back to top)

To start re-aligning we first need to select the classes from the previous step that we want to continue with. Secondly, we will reconstruct the subtomograms again but with lower binning, as we have already hit Nyquist with the first GLocal alignment.

In the Particle Picking stage go to the Alter Particle List tab and check the box for Extract Particles from XML. For the Particle List browse to the classification folder ([project]/05_Subtomogram_Analysis/Classification/AutoFocus/[job]/classified_pl_iter[latest].xml) and select the latest iteration. Click the By Class checkbox and fill in 0,1,2,3 to extract all classes. Autofocus has attempted to assign all false positives to the noise class -1, so we will not extract those. After pressing Adjust!, save your particle list with a new name. Then deselect Extract Particles from XML and activate the Change Parameters checkbox. Select as input the file you just saved with the extracted classes, activate multiply shifts and multiply them by 2, and Adjust! to a new file. This will correct the shifts for the 2x binned subtomograms that we are about to make.
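The factor of 2 follows from the change in binning: shifts are stored in pixels of the subtomograms they were determined on, so going from 4x to 2x binned subtomograms doubles them, just as the box size doubles from 50 to 100. A minimal sketch of this bookkeeping (the function name and the tuple representation of a shift are assumptions for this example; in practice the Alter Particle List tab does this for you):

```python
def rescale_for_binning(shift, box_size, old_bin, new_bin):
    """Convert a shift (in pixels) and a box size between binning factors."""
    factor = old_bin / new_bin
    new_shift = tuple(component * factor for component in shift)
    new_box_size = int(round(box_size * factor))
    return new_shift, new_box_size


# A shift determined on 4x binned subtomograms, converted to 2x binning.
shift, box = rescale_for_binning((1.5, -2.0, 0.25), box_size=50, old_bin=4, new_bin=2)
print(shift, box)
```

The same factor applies again when moving from 2x binned to unbinned subtomograms.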

Now we can redo the subtomogram reconstruction step, but with a subtomogram binning factor of 2 and box size of 100. Please use the particle list created above. All other parameters can remain the same as before. Importantly, Bin Factor Recon should remain at 8 and the ctf column should point to the sorted_ctf images!

Using the classified and 2x binned subtomograms, we can rerun GLocal alignment. For this, generate a mask with dimension 100 px, radius 37 px and smooth 2.5, and change the pixel size to (2 * 2.62A =) 5.24A, while leaving other parameters as specified above. After this alignment your resolution should be around 15 Å (as seen in the files in the GLocal folder or the log files of the GLocal job).

figure: Subtomogram average resulting from 2x binned subtomograms, after AutoFocus classification. The resolution rapidly converges to ~15 A.

7.5 Improving resolution by limiting tilt angles

(Back to top)

In order to improve the resolution of the reconstruction further we need to unbin the subtomograms and address the issue of sample deformation/deterioration during image collection. We do this by only using the first 21 recorded images.

In the tilt alignment step in the tomogram reconstruction stage (aligning tilt-series), you can specify the tilt angles for an alignment. Please limit your angles from -20 to 20 degrees. Now alter your particle list (of the final GLocal iteration from the previous step) in the Alter Particle List tab to have two wedge angles of 70 degrees (90 minus the maximum tilt of 20 degrees), and multiply the shifts again by 2 to account for unbinning. Save your particle list under a different name. Then, reconstruct the subtomograms unbinned (i.e. subtomo size is 200 and subtomo bin is 1) with the first and last angle set to -20 and 20, respectively. Finally, rerun GLocal alignment, now with a new mask (dimension 200 px, radius 75 px, smooth 5) and pixel size set to 2.62A. The resolution after this alignment should now be between 13 and 14 Å.

figure: Subtomogram average resulting from subtomograms with limited range of tilt angles (-20 deg to 20 deg). The resolution converges to ~13 A.

7.6 Tight mask for determining resolution

(Back to top)

The current resolution of the model is an underestimation of the real resolution, because the mask is relatively large compared to the object. We are thus including a significant amount of noise in the resolution estimate. By creating a mask that is tighter to the object, this problem can be avoided. There is, however, a risk of making the mask too tight, which leads to an overestimation of the resolution of the model.

Go to the Validation tab. Select average-Final-Even.mrc and average-Final-Odd.mrc from the last GLocal job as volume 1 and volume 2, create a mask with the default parameters around average-FinalFiltered_[res].mrc from the GLocal output, and select an output folder. Make sure the pixel size is set to 2.62A. The job can potentially be run on a GPU. For visualization you can check the plot results checkbox. PyTomGUI will also write out the raw values to your output folder as a .dat file, so that you can visualize them with a tool of your choice.

The randomize phases option tests against a phase-randomized object in addition to the half-map test. PyTom should report a resolution of ~11 A at this point.
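If you want to read the resolution off the raw FSC values yourself, the threshold crossing can be located as sketched below. This is not PyTom's implementation: it assumes the FSC curve is available as a list with one value per Fourier shell (shell 0 being the zero-frequency shell), converts shell index k to a resolution of box size x pixel size / k, and interpolates linearly between the two shells around the 0.143 threshold. The function name and the toy curve are made up for this example, and the layout of the .dat file is not assumed here.

```python
def fsc_resolution(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (in Angstrom) where the FSC first drops below `threshold`.

    `fsc[k]` is the correlation in Fourier shell k. Returns None if the
    curve never drops below the threshold.
    """
    for shell in range(1, len(fsc)):
        if fsc[shell] < threshold:
            previous, current = fsc[shell - 1], fsc[shell]
            # Linear interpolation between the last shell above the
            # threshold and the first shell below it.
            fraction = (previous - threshold) / (previous - current)
            crossing = (shell - 1) + fraction
            if crossing <= 0:
                return None
            return box_size * pixel_size / crossing
    return None


# Toy curve with only five shells, for a 200 pixel box at 2.62 A/pixel.
toy_fsc = [1.0, 0.9, 0.5, 0.2, 0.1]
resolution = fsc_resolution(toy_fsc, box_size=200, pixel_size=2.62)
print(f"{resolution:.1f} A")
```

A real curve for a 200 pixel box has 100 shells up to Nyquist, and dips caused by the CTF can cross the threshold early, so it is worth comparing the number with the plot.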

figure: Fourier Shell Correlation of the half maps from the final subtomogram alignment. The CTF rings can be seen in the correlation. Correlation drops below the gold-standard threshold of 0.143 around 10.5 A.

8. Conclusion

(Back to top)

You have made your first steps in becoming a cryo-electron tomography processing expert! PyTom comes with many features that you can read about on the wiki. You may especially be interested in integrating part of the workflow, such as template matching, into your processing pipeline. Check the pages about interfacing with other software for more information.
