B.A.Himes edited this page Apr 21, 2022 · 79 revisions

The PRIMARY PAPER describing emClarity may be found at Nature Methods

The source code for the TUTORIAL included in your "docs" folder may be found at Thomas Frosio's github

A PROTOCOL including troubleshooting from the Zhang lab using vs 1.5.0.2 is submitted for review.

The strategies there will apply to newer versions, so aside from the purpose of replication, I recommend the most current version.
A Mailing List hosted on Google Groups is a good place to search for common questions/answers and post new ones if needed.

ANNOUNCEMENTS


2022-April-20

vs 1.6.1

Fixes:

  • a bug that was breaking higher-order symmetry
Adds:
  • Supports Nvidia Ampere architecture natively.
  • Now saves unmasked halfmaps for EMDB deposition (found in your FSC folder, labelled in an obvious way)
  • Now saves XML FSC plot for upload to EMDB
Breaks:
  • ⚠️ Requires upgrading to Matlab 2021a MCR ⚠️

2021-June-11

vs 1.5.3.11

  • fixes bug in emClarity avg that now requires Raw_angleSearch to be defined so the program knows to use helical or "regular" angular convention.

2021-June-3

vs 1.5.3.10

  • fixes bug with bead removal inconsistency and for the case where no model exists
  • fixes helical alignment and enables symmetry around the helical axis
👍 The helical axis must now be on Y (not Z)
  • fixes incomplete commit from Jan 24 that set the alignment shape mask == fsc shape mask with hardcoded params. The equality still holds; however, specifying the following parameters will now also modify your alignment mask.
👇 default values shown
shape_mask_threshold=2.5
shape_mask_lowpass=14
You can test a series of these masks by also specifying
shape_mask_test=1
then running
emClarity fsc paramX.m X RawAlignment
This will save the shape mask in your FSC directory, including the lowpass and threshold in the file name.
⚠️ Generally, a higher threshold will result in a tighter mask, and increasing the lowpass may help to "re-capture" regions that are sometimes lost for flexible samples. This is fast to calculate, and should be easy to script.
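
Since the mask test is described as easy to script, here is a minimal sketch of how the parameter variants for such a sweep might be generated. It only manipulates the text of a parameter file; the helper names (with_overrides, mask_sweep) are hypothetical and not part of emClarity, and it assumes each parameter sits on its own name=value line.

```python
import itertools
import re


def with_overrides(param_text, overrides):
    """Return param_text with each name=value line replaced, or appended if absent."""
    lines = param_text.splitlines()
    for name, value in overrides.items():
        new_line = f"{name}={value}"
        pattern = re.compile(rf"^\s*{re.escape(name)}\s*=")
        for idx, line in enumerate(lines):
            if pattern.match(line):
                lines[idx] = new_line
                break
        else:
            # Parameter not present yet: add it at the end of the file.
            lines.append(new_line)
    return "\n".join(lines) + "\n"


def mask_sweep(param_text, thresholds, lowpasses):
    """Yield ((threshold, lowpass), new_param_text) for every combination.

    shape_mask_test=1 is set in every variant so that the fsc run writes
    the mask to disk.
    """
    for threshold, lowpass in itertools.product(thresholds, lowpasses):
        overrides = {
            "shape_mask_threshold": threshold,
            "shape_mask_lowpass": lowpass,
            "shape_mask_test": 1,
        }
        yield (threshold, lowpass), with_overrides(param_text, overrides)
```

Each variant could then be written over a backed-up copy of paramX.m in turn, followed by emClarity fsc paramX.m X RawAlignment; the lowpass and threshold in the saved mask's file name tell the results apart.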

2021-May-14

  • Minor, supersedes 1.5.3.08
LTS 1.5.3.09
  • Minor, print line errors in BH_eraseBeads. No need to update from 1.5.3.06/7. Use this if on older version.
LTS 1.5.3.08

2021-May-13

  • Minor changes to keep the files input to autoAlign unmodified. (tilt and modified stacks all end up in fixedStacks)
LTS 1.5.3.07

2021-May-12

⚠️ Deployed files were not linking quite right. Working Beta: LTS 1.5.3.06

Not Working Beta: LTS 1.5.3.05

Fixes several bugs and changes a couple of default options. We are currently testing from start to finish on several data sets, and should have a stable release by 1.5.3.10

Bug fixes

  • changed the file path parsing in emClarity.m to be safe for directory names with dots
  • fix asymmetric unit randomization in templateSearch v2 that helps to reduce wedge bias by randomizing detected angles to one within the symmetry group. Previously only worked with old symmetry operator and cyclic sym. Now works with new symmetry operator, CX, I, I2, O.
⚠️ This symmetry is taken from your parameter file. The symmetry option on the CLI is now ignored, and will be removed in future versions. For now, leaving in place to not break scripts.
  • fixes file check on bead eraser model when not present
  • Many fixes to the suite of programs that support autoAlign, see commit log ( 39210f8 ) if you are curious
🤿 autoAlign is now working very nicely on a wide array of data with or without beads. If you have beads, please add:
autoAli_refine_on_beads=true
  • For rectangular images, significant data loss is possible depending on the orientation of your tilt-axis. Fixes the bug that did not always correctly determine if a 90 degree pre-rotation would prevent data loss (by introducing a new empirical rather than analytical check.)

Changed defaults

  • Since the default templateSearch now sets a threshold automatically, you can override this to get the original behavior
Override_threshold_and_return_N_peaks=N
  • The default option for bead erasure is before ctf correction - I still think it makes sense to do this after ctf correction, but haven't sorted out all the logic. If you want to test on your own data:
erase_beads_after_ctf=true

2021-May-07

Working version: LTS 1.5.3.03

  • After more extensive testing, it appears matlab2020b and 2021a both run substantially slower. It is not clear why, so I am returning to matlab2019a, which should hopefully also resolve compatibility issues many centos7 users were having.
  • In 1.5.2 I recommended switching to the "newer" version of template matching. This is now the default
    1. The old version may be accessed by running with "v1" e.g.
      $ emClarity v1 templateSearch tiltN tomoN ref.mrc symmetry <gpuIDX>
    2. Previously the additional "CTF" CLI argument was needed for the newer template search - this is no longer the case, simply run:
      $ emClarity templateSearch tiltN tomoN ref.mrc symmetry <gpuIDX>
    3. This means you must have your CTF corrected tomograms. Since you have no metadata (subtomo.mat) yet, run
      $ emClarity ctf 3d paramN.m templateSearch
    4. The parameter Tmp_threshold=300 previously returned 300 of your highest scoring peaks (after excluding surrounding voxels based on your particleRadius or, if specified, your Peak_mRadius). If you wish to retain this behavior, you now need to add Override_threshold_and_return_N_peaks=300. Otherwise, the new behavior is as follows:
      1. Tmp_threshold=300 will be taken as an estimate of the number of particles.
      2. From this value a score threshold will be calculated that should result in fewer false positives (~10% of estimated) and allows for up to 2x the estimated number, as this may vary from tomo to tomo. If there are significant departures from Gaussian noise (e.g. a carbon edge), this may fail, hence the option to override.
  • the newer symmetry parameter is now enforced in all programs, and must be in your parameter file.
    • symmetry=C1 [C2..CX,O,I]
    • other symmetries may be added on request
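
To make the automatic threshold described for Tmp_threshold more concrete, the sketch below shows one way such a threshold could be derived under a Gaussian noise assumption. This is an illustration only, not emClarity's actual implementation: the function names, the assumption that scores are expressed in standard deviations above the mean, and the use of the number of search positions are all hypothetical.

```python
from statistics import NormalDist


def score_threshold(n_estimated, n_search_positions, false_positive_fraction=0.10):
    """Score (in standard deviations) above which pure Gaussian noise is
    expected to yield about false_positive_fraction * n_estimated peaks."""
    expected_false_positives = false_positive_fraction * n_estimated
    tail_probability = expected_false_positives / n_search_positions
    return NormalDist().inv_cdf(1.0 - tail_probability)


def select_peaks(scores, n_estimated, n_search_positions):
    """Keep peaks above the threshold, capped at 2x the estimated particle count."""
    threshold = score_threshold(n_estimated, n_search_positions)
    kept = sorted((s for s in scores if s > threshold), reverse=True)
    return kept[: 2 * n_estimated]
```

If the score distribution departs strongly from Gaussian (the carbon edge case mentioned above), a threshold computed this way would be unreliable, which is the situation Override_threshold_and_return_N_peaks is meant for.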

2021-January-25

Working beta version: LTS 1.5.2.0

  • See previous for minimum IMOD install, but you should probably use >= 4.11.0
  • recommend CTF corrected template matching (emClarity v2 templateSearch paramT.m tomo N ref sym gpuIDX CTF)
    • Note the CTF at the end; if omitted, the contrast will be inverted and your search will fail
    • Note the sym in the above call is now (C1, C3, I, etc.) as symmetry=C3 in the parameter file
    • use_new_grid_search is enabled by default, so the search is constrained by symmetry automatically (i.e. for C3 180,10 will search 120,10)
    • emClarity ctf 3d paramT.m templateSearch, is used to create the tomograms for this process
    • The same command, but with phakePhasePlate=1 set will now produce the same size tomo as specified for templateSearch, but with a _filtered.rec suffix. This way you can more easily edit your template matching results.
    • Resolution will be limited to the lower of (Nyquist, lowResCut) where lowResCut defaults to 28 if not specified in the param file.
  • switched from Matlab 19a to 20b. See the MCR on the install page for instructions if needed
  • if you haven't updated in a while, please check out the last few pages on the announcements below.
Good luck!

2020-June-10

LTS 1.5.1.0

Several functions now are enabled that use code I've been working on directly in CUDA. To prevent headaches for users, select CUDA libraries are now distributed with emClarity. You will notice when you unpack the zip, you will now have a set up like:

  emClarity_1.5.1.0/
    docs/
    lib/
      libcufft.so.10
      libcublasLt.so.10
    bin/
      deps/
        IMOD and cisTEM dependencies
  NOTE: This increases the binary size from ~70 to 250 MB, but it means that you don't have to update or install the cuda-toolkit to any specific version. I hope this is a reasonable tradeoff : )

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

emClarity now requires IMOD >= 4.10.43

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Auto masking (shape mask) can now be modified via your parameter file.

Turning the mask on/off uses a boolean (1/0)

  * flgFscShapeMask=1

The lowpass filter used to initially select regions to dilate can be changed. Lowering the resolution will tend to help when density outside your particle is being retained in the mask.

   * shape_mask_lowpass=18

The number of standard deviations above the mean that "high intensity" voxels are selected from the lowpass filtered map also affects the regions initially included.

  * shape_mask_threshold=2.4

Finally, to test the effect of the mask, this boolean can be set to true.

  * shape_mask_test=0

Running >$ emClarity fsc paramN.m N RawAlignment with this parameter set will create the mask and save it in your project/FSC folder.
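
As a rough illustration of what shape_mask_threshold controls, the sketch below selects "high intensity" voxels as those more than a given number of standard deviations above the mean. It is a simplification under stated assumptions: the input is taken to be the already lowpass filtered map flattened to a list, the population standard deviation is used, and the later dilation step is not shown. The function name is hypothetical.

```python
from statistics import fmean, pstdev


def high_intensity_voxels(values, shape_mask_threshold=2.4):
    """Return a boolean list marking voxels above mean + threshold * sigma."""
    mean = fmean(values)
    sigma = pstdev(values)
    cutoff = mean + shape_mask_threshold * sigma
    return [v > cutoff for v in values]
```

Raising shape_mask_threshold raises the cutoff, so fewer voxels seed the mask and the result tends to be tighter.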




The 3d sampling function (elsewhere called a 3d CTF) is now calculated on the fly using new code written in cuda. This is now enabled by default, but may still be disabled by setting:

  * use_v2_SF3D=0

emClarity autoAlign functionality has been extended to include additional options and refinement on the gold beads after iterative patch tracking.

  Default values and a brief description:
  Lowpass filter used in alignment. (Note a median filter is applied in the preprocessing step as well. This is only used for tilt series alignment.)
    * autoAli_max_resolution=18
  Maximum pixel size used for alignment in ang/pix. If you have a pixel size of 2.0, then your max binning = 5 (with default below)
    * autoAli_min_sampling_rate=10
  Minimum pixel size used for alignment in ang/pix. If you have a pixel size of 2.0, then your minimum binning = 2 (with default below)
    * autoAli_max_sampling_rate=3
  Making this larger will result in more patches, and more local areas in later iterations, but may also decrease accuracy.
    * autoAli_patch_size_factor=4
  When more than ~7 gold beads are present, turning this on can substantially improve the auto alignment
    * autoAli_refine_on_beads=0
  The number of initial patch tracking iterations where in-plane rotations are not fit. This can help for series that are very jumpy or poorly tracked in the scope.
    * autoAli_n_iters_no_rotation=3
  Influences the number of patches
    * autoAli_patch_overlap=0.5
  Each binning will be run for this many iterations.
   * autoAli_iterations_per_bin=3
  Max shift for the alignment, equal to autoAli_max_shift_in_pixels / ( nIteration^autoAli_max_shift_factor) +1
    * autoAli_max_shift_in_pixels=10
    * autoAli_max_shift_factor=1
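
The binning and max shift arithmetic above can be sketched as follows. The rounding directions (floor for the maximum binning, ceil for the minimum) are assumptions chosen to reproduce the two examples given for a 2.0 ang/pix pixel size, and the max shift is returned unrounded because the rounding used by emClarity is not stated here. Function names are hypothetical.

```python
import math


def binning_range(pixel_size, min_sampling_rate=10.0, max_sampling_rate=3.0):
    """Return (min_binning, max_binning) for a pixel size in ang/pix.

    min_sampling_rate is the largest pixel size used (coarsest binning);
    max_sampling_rate is the smallest pixel size used (finest binning).
    """
    max_binning = math.floor(min_sampling_rate / pixel_size)
    min_binning = math.ceil(max_sampling_rate / pixel_size)
    return min_binning, max_binning


def max_shift(n_iteration, max_shift_in_pixels=10.0, max_shift_factor=1.0):
    """Max shift = autoAli_max_shift_in_pixels / (nIteration^factor) + 1."""
    return max_shift_in_pixels / (n_iteration ** max_shift_factor) + 1
```

With the defaults, binning_range(2.0) gives (2, 5), and the allowed shift shrinks from 11 pixels in the first iteration to 6 in the second.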

Symmetry is now extended beyond CX to include I/I2, O, and DX. Only CX and O have been confirmed. The old default will be used unless you add the new parameter to your param file

  * symmetry=C3
  * symmetry=O
  * symmetry=I2
      etc etc

Improvements to the interpolation to reduce memory usage and increase speed.

New parameter to reduce large shifts in Z after running tomoCPR:

  By default it is still off. Use this if you see X shifts (5th column in your mapBack/project_...tltxf) that
  look like -5 -4 -3 -2 -1 0 1 2 3 4 5
  * eucentric_fit=0
  * eucentric_minTilt=15 # Lower tilts are less useful in the fitting of the eucentric_fit. Does not affect application.

Bead model is now updated after tomoCPR so later iterations still erase properly.

Subtomogram alignment now defaults to the v2 program. The main difference is the use of the new faster interpolation. This can lead to significant speedups, especially for symmetric particles.

  NOTE: There is a memory problem, evident from "core.***" files being dumped out. This doesn't seem to affect the alignment, and it only happens in the compiled version, not when run from interactive matlab, so I haven't been able to track it down.

emClarity avg now uses the new interpolator classification

emClarity check now prints out substantially more user/installation/computer related information by calling BH_checkInstall.sh prior to BH_checkInstall.m

All parallel pool objects (and their mutex locks) are now written to the temporary EMC_CACHE_DIR, which helps to avoid collisions when running many jobs on the cluster at once. This is especially useful for the following two functions, which can now be run at a finer grain.

  ALIGNRAW
  rather than 
    * emClarity alignRaw paramN.m N
  you may now run
    * for i in $(seq 1 M) ; do emClarity alignRaw paramN.m [N,i,M] ; done
    * emClarity alignRaw paramN.m N
    WHERE:
      N = Cycle number
      M = the number of total jobs to split into. (should not be > number of tomograms)
      i = index from 1 --> M
    NOTE:
      * This still needs to be scripted. You can run each as a job on your cluster or send to the background on local machines. 
      * The loop above is an illustration, but would actually be in serial not parallel, and would be slower.
      * You will also need to have the number of GPUS adjusted. I've been keeping two params.     
        1) paramN.m ( used for average, ctf 3d etc.)
        2) paramN_split.m ( nGPUs nCpuCores adjusted for the split run.)
    NOTE:
      * You must run the "regular" command when all the split commands are done, which will read the alignments from your project/alignResume/cyc....txt files and update the database.
  TOMOCPR
  rather than
    * emClarity tomoCPR paramN.m N
  you may now run
    * for i in $(seq 1 M) ; do emClarity tomoCPR paramN.m [N,i,M] ; done
    * emClarity tomoCPR paramN.m [N,0,0]
  NOTE: 
    Similar logic as alignRaw. The syntax is a little different in the "cleanup run" 
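
Because the loops above run serially, here is a minimal sketch of launching the split jobs concurrently on a local machine and then running the cleanup command. It uses only the command syntax shown above; the helper names are hypothetical, emClarity is assumed to be on your PATH, and the split parameter file is assumed to carry the adjusted nGPUs and nCpuCores as described. On a cluster you would submit each command as its own job instead.

```python
import subprocess


def split_commands(program, param_file, cycle, n_jobs):
    """Build one emClarity command per split job, using the [N,i,M] syntax."""
    return [
        ["emClarity", program, param_file, f"[{cycle},{i},{n_jobs}]"]
        for i in range(1, n_jobs + 1)
    ]


def cleanup_command(program, param_file, cycle):
    """Final command: plain cycle number for alignRaw, [N,0,0] for tomoCPR."""
    last_arg = str(cycle) if program == "alignRaw" else f"[{cycle},0,0]"
    return ["emClarity", program, param_file, last_arg]


def run_split(program, split_param, full_param, cycle, n_jobs):
    """Start all split jobs at once, wait for them, then run the cleanup."""
    procs = [
        subprocess.Popen(cmd)
        for cmd in split_commands(program, split_param, cycle, n_jobs)
    ]
    exit_codes = [p.wait() for p in procs]
    if any(code != 0 for code in exit_codes):
        raise RuntimeError(f"split jobs failed with exit codes {exit_codes}")
    subprocess.run(cleanup_command(program, full_param, cycle), check=True)
```

For example, run_split("alignRaw", "param4_split.m", "param4.m", 4, 3) would start three alignRaw jobs and, once all have finished, run emClarity alignRaw param4.m 4 to update the database.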

2020-April-10

LTS 1.5.0.4 is up this morning, including the autoAlign function which will align your tilt-series and also find any gold beads.

run as

$ emClarity autoAlign param0.m tilt1.st tilt1.rawtlt estimatedTiltAxis

  • The file extensions on the tilt-series and tilt angles are not fixed
  • Estimated tilt-axis is in degrees as in IMOD
  • This will create the "fixedStacks" directory for you, including all files needed for emClarity to run, and a list of the basenames for each tilt-series to use in early preprocessing.
It seems to work pretty well. There are three optional parameters (for now, could expose more) for you to experiment with.

autoAli_max_resolution

  • default is currently 18 Angstrom.
    • I would suggest trying lower resolution first if you are having trouble.
autoAli_max_sampling_rate
  • default is currently 3 angpix.
    • Using a finer sampling rate can be very slow, and seems to add little benefit.
autoAli_patch_size_factor
  • default is currently 4.
    • This could be the most important. In the first four iterations, the patch size is = NX / iIter. So for iIter=1 there is only 1 patch, e.g. coarse alignment. After the fourth iteration the patch size is = NX / (iIter * autoAli_patch_size_factor) so a larger value for this parameter will result in smaller patches and therefore more points in the local alignment. This increases runtime and also can be wildly inaccurate on some data sets if set too large.
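
The patch size rule described for autoAli_patch_size_factor can be written out as a small sketch. The function name is hypothetical, and the result is left unrounded because any rounding applied internally is not stated here.

```python
def patch_size(nx, i_iter, patch_size_factor=4):
    """Patch size in pixels for a given iteration (1-based).

    Iterations 1-4: NX / iIter, so iteration 1 is a single full-size patch.
    Later iterations: NX / (iIter * autoAli_patch_size_factor).
    """
    if i_iter <= 4:
        return nx / i_iter
    return nx / (i_iter * patch_size_factor)
```

For a 4000 pixel wide image with the default factor, the patch size goes 4000, 2000, ~1333, 1000 over the first four iterations and drops to 200 in the fifth, which shows why a large factor quickly produces many small patches.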
This function currently just uses the CPU, as it is automating underlying IMOD functions. They seem to use up to 6 threads per process, which could probably be controlled more specifically using environment variables. It would be great if someone could let me know about this.

I've been running 8-10 tilt-series in parallel.

TODO - control threading. TODO - enable use of fastScratchDisk, or even better tmpfs or shm as there is a decent amount of disk i/o

2020-April-8

LTS 1.5.0.x is available for testing. I would expect somewhat frequent updates over the next couple of weeks as you all find bugs.

Many fixes from versions 1.2 - 1.4.X. Also exposed a few "hidden" features. Details to come. A new tutorial is in progress, but nearing completion, in the docs/ folder.

Also please note the run script (emClarity_1_5_0_X_v19a) should be updated as it fixes a problem with collision of parallel workers that matlab creates.

2019-July-11

Uploaded emClarity 1.4.3

Some bigger changes:

  • tomoCPR -
    • a 3d CTF corrected tomogram for the full tilt-series is calculated to estimate the background noise instead of just using the individual reconstructions. This should increase the accuracy, particularly for cellular samples.
    • now a default random subset of at most 500 subtomograms/ tilt-series will be used. You may change this behavior by setting the parameter "tomoCPR_randomSubset"
  • ctf estimate/update
    • 2d interpolation is now set to default to linear due to memory problems with K3 data. When Fourier interpolation is used, the image is padded to be square - this is a problem with K3 data. You may set to Fourier interpolation with the parameter "useFourierInterp"
  • alignRaw -
    • translation search restriction according to Peak_mRadius is changed to work properly for oblate particles and improved to reduce cumulative shifts from exceeding the desired range.
    • The XYZ shifts printed to the log/terminal are now in Angstrom (previously pixels) and in the particle's reference frame, which should make it easier to monitor the alignment.
    • Ability to reset just the shifts to some previous point in time (see below.)
  • geometry -
    • Option to reset the XYZ alignments to previous cycle. This should be done prior to averaging. For example, if you have just run "alignRaw param4.m 4" and want to revert to cycle 1 positions while keeping the angles. Run
>$ emClarity geometry param4.m 4 RawAlignment RevertXYZ [1,0,0] STD
  • ctf 3d -
    • When filtering a full reconstruction for analysis you may now add a negative sign to the first value to get white density. The default is [100,2]
    • You may now think of the first value as a way to fine tune the strength of the filter. E.g. Strongest --> weakest
phakePhasePlate=[200,2]
phakePhasePlate=[50,2]
phakePhasePlate=[200,1]
phakePhasePlate=[50,1]

2019-Apr-18

Uploaded emClarity 1.3.0

Minor change in template search to better accommodate searching for filaments. Also adds a description of related parameters for template searching and alignment.

Brief notes on previous releases can be found in the Change Log

emClarity

(enhanced Macro-molecular CLassification and Alignment for highResolution In situ TomographY) is a collection of GPU-accelerated software developed to enable determination of biological structures at resolutions better than 1nm from heterogeneous specimens imaged by cryo-Electron Tomography.

Please have a look at the roadmap page to send you on your way.

Bugs and features should be submitted through the "issues" tab above, and general discussion is encouraged through the google user group.

Overview

Information is grouped into three primary categories:

1) Tutorial videos

    _The quickest way to get up and running_

2) User guides

    _More thorough discussion of features and the theory behind them_

3) Developer guides

    _Nuts and bolts behind the scenes_