DeepETPicker is an open-source, deep-learning-based software with a user-friendly interface for picking 3D particles rapidly and accurately from cryo-electron tomograms. By exploiting weak labels, a lightweight architecture, and GPU-accelerated pooling operations, it significantly reduces both annotation cost and inference time, while a Gaussian-type mask and a customized architecture design greatly improve accuracy.
Note: DeepETPicker is implemented in PyTorch.
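For intuition only, the sketch below illustrates how a Gaussian-type weak label could be generated from a single particle coordinate. The function name, parameters, and sigma scaling are illustrative assumptions and not DeepETPicker's internal label-generation code.

```python
import numpy as np

def gaussian_weak_label(shape, center, diameter, sigma_scale=0.25):
    """Illustrative sketch (not DeepETPicker's exact method): place a
    spherical Gaussian blob of the given label diameter at one particle
    coordinate inside an empty volume. `shape` and `center` are (z, y, x)."""
    zz, yy, xx = np.meshgrid(
        np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]),
        indexing="ij",
    )
    dist2 = (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    sigma = diameter * sigma_scale
    label = np.exp(-dist2 / (2 * sigma ** 2))
    # Truncate the Gaussian outside the label diameter so each blob stays local.
    label[dist2 > (diameter / 2) ** 2] = 0.0
    return label.astype(np.float32)

# Example: a 64^3 volume with one particle at its center and a label diameter of 10 voxels.
mask = gaussian_weak_label((64, 64, 64), (32, 32, 32), diameter=10)
```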
- Linux (Ubuntu 18.04.5 LTS; CentOS)
- NVIDIA GPU
The following instructions assume that pip and anaconda or miniconda are available. If you have an old deepetpicker environment installed, first remove it with:
conda env remove --name deepetpicker
The first step is to create a new conda virtual environment:
conda create -n deepetpicker -c conda-forge python=3.8.3 -y
Activate the environment:
conda activate deepetpicker
To download the code, run:
git clone https://github.com/cbmi-group/DeepETPicker
cd DeepETPicker
Next, install PyTorch and the related packages needed by DeepETPicker:
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch -y
pip install -r requirement.txt
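As an optional sanity check (not part of the official instructions), you can verify that PyTorch and CUDA are visible inside the new environment:

```python
import torch

# Expect version 1.7.1 and CUDA available if the installation above succeeded.
print(torch.__version__)
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU detected")
```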
- If using the GUI on Linux, you will need to install the following extended dependencies for Qt.
- For CentOS, install the packages with:

sudo yum install -y mesa-libGL libXext libSM libXrender fontconfig xcb-util-wm xcb-util-image xcb-util-keysyms xcb-util-renderutil libxkbcommon-x11

- For Ubuntu, install the packages with:

sudo apt-get install -y libgl1-mesa-glx libglib2.0-dev libsm6 libxrender1 libfontconfig1 libxcb-icccm4 libxcb-image0 libxcb-keysyms1 libxcb-render-util0 libxcb-shape0 libxcb-xinerama0 libxcb-xkb1 libxkbcommon-x11-dev libdbus-1-3
To run DeepETPicker, do:

conda activate deepetpicker
python PATH_TO_DEEPETPICKER/main.py

Note: `PATH_TO_DEEPETPICKER` is the directory where the code is located.
- Non-GUI version

In addition to the GUI version of DeepETPicker, we also provide a non-GUI version for users who are familiar with Python and deep learning. It consists of four processes: preprocessing, train config generation, training, and testing. A sample tutorial can be found in `.bin/bash_command.md`.
The following steps are required in order to run DeepETPicker with Docker:
- Install Docker.
  Note: the Docker engine version should be >= 19.03. The Docker image of DeepETPicker is 7.21 GB; please ensure that there is enough disk space.
- Install the NVIDIA Container Toolkit for GPU support.
- Download the Docker image of DeepETPicker:

docker pull docker.io/lgl603/deepetpicker:latest
- Run the Docker image of DeepETPicker:

docker run --gpus=all -itd \
    --restart=always \
    --shm-size=100G \
    -e DISPLAY=unix$DISPLAY \
    --name deepetpicker \
    -p 50022:22 \
    --mount type=bind,source='/host_path/to/data',target='/container_path/to/data' \
    lgl603/deepetpicker:latest
- The option `--shm-size` sets the size of the shared memory of the Docker container.
- The option `--mount` mounts a file or directory on the host machine into the Docker container, where source='/host_path/to/data' denotes the data directory that actually exists on the host machine and target='/container_path/to/data' is the directory where '/host_path/to/data' is mounted inside the container.
  Note: '/host_path/to/data' should be made writable by running the bash command chmod -R 777 '/host_path/to/data'. '/host_path/to/data' should be replaced by the data directory that actually exists on the host machine. For convenience, '/container_path/to/data' can be set the same as '/host_path/to/data'.
- DeepETPicker can be used directly on this machine, and it can also be used from another machine in the same LAN.
  - To open DeepETPicker directly on this machine:

ssh -X test@'ip_address' DeepETPicker
# where the 'ip_address' of the DeepETPicker container can be obtained as follows:
docker inspect --format='{{.NetworkSettings.IPAddress}}' deepetpicker
  - To connect to this server remotely and open the DeepETPicker software from a client machine:

ssh -X -p 50022 test@ip DeepETPicker

Here `ip` is the IP address of the server machine, and the password is `password`.
Installation time: the Docker image of DeepETPicker is 7.21 GB, and the installation time depends on your network speed. With a fast enough connection, installation can be completed within a few minutes.
Detailed tutorials for two sample datasets, SHREC2021 and EMPIAR-10045, are provided. The main steps of DeepETPicker include preprocessing, training, inference, and particle visualization.
- Data preparation

Before launching the graphical user interface, we recommend creating a single folder to save the inputs and outputs of DeepETPicker. Inside this base folder, create a subfolder to store raw data. This raw_data folder should contain:
  - tomograms (with extension .mrc or .rec)
  - coordinate files with the same names as the tomograms except for the extension (with extension *.csv, *.coords or *.txt; generally, *.coords is recommended)
Here, we provide two sample datasets, EMPIAR-10045 and SHREC_2021, for particle picking so that you can learn the processing flow of DeepETPicker better and faster. The sample datasets can be downloaded in one of two ways:
- Baidu Netdisk Link: https://pan.baidu.com/s/1aijM4IgGSRMwBvBk5XbBmw; verification code: cbmi
- Zenodo Link: https://zenodo.org/records/12512436
- Data structure

The data should be organized as follows:

├── /base/path
│   ├── raw_data
│   │   ├── tomo1.coords
│   │   └── tomo1.mrc
│   │   └── tomo2.coords
│   │   └── tomo2.mrc
│   │   └── tomo3.mrc
│   │   └── tomo4.mrc
│   │   └── ...
For the above data, tomo1.mrc and tomo2.mrc can be used as train/val datasets, since they both have coordinate files (manual annotation). If a tomogram has no manual annotation (such as tomo3.mrc), it can only be used as a test dataset.
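For illustration only, the following sketch (with a placeholder base path) scans raw_data and reports which tomograms have matching coordinate files and can therefore serve as train/val data:

```python
from pathlib import Path

# Placeholder base path; replace with your own base folder.
raw_data = Path("/base/path/raw_data")

tomos = sorted(raw_data.glob("*.mrc")) + sorted(raw_data.glob("*.rec"))
for tomo in tomos:
    coords = tomo.with_suffix(".coords")  # .csv or .txt are also accepted
    if coords.exists():
        print(f"{tomo.name}: annotated -> usable for train/val")
    else:
        print(f"{tomo.name}: no coordinates -> test only")
```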
- Input & Output

Launch the graphical user interface of DeepETPicker. On the Preprocessing page, please set some key parameters as follows:

Input
- Dataset name: e.g. SHREC_2021_preprocess
- Base path: path to base folder
- Coords path: path to raw_data folder
- Coords format: .csv, .coords or .txt
- Tomogram path: path to raw_data folder
- Tomogram format: .mrc or .rec
- Number of classes: multiple classes of macromolecules can also be localized separately
Output
- Label diameter (in voxels): the diameter of the generated weak label, which is usually smaller than the average diameter of the particles. Empirically, you can set it as large as possible, but it should be smaller than the real diameter.
- Ocp diameter (in voxels): the real diameter of the particles. Empirically, to obtain good picking results, we recommend adjusting the particle size to the range of 20~30 voxels by binning (see the arithmetic sketch after this list). For particles of multiple classes, their diameters should be separated by commas.
- Configs: if you click 'Save configs', this is the path to the file containing all the parameters filled in on this page
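To illustrate the binning recommendation for the Ocp diameter above, here is a small arithmetic sketch; the 56-voxel diameter is an arbitrary example, not taken from the sample datasets:

```python
# Illustrative arithmetic for choosing a binning factor so the particle
# diameter lands in the recommended 20~30 voxel range.
original_diameter = 56  # particle diameter in the unbinned tomogram (voxels)

for bin_factor in (1, 2, 3, 4):
    binned = original_diameter / bin_factor
    in_range = 20 <= binned <= 30
    print(f"bin {bin_factor}: diameter ~{binned:.0f} voxels {'<- recommended' if in_range else ''}")
```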
Note: Before `Training of DeepETPicker`, please do `Preprocessing` first to ensure that the basic parameters required for training are provided.
In practice, default parameters can give you a good enough result.
Training parameter description:
- Dataset name: e.g. SHREC_2021_train
- Dataset list: get the list of train/val tomograms. The first column denotes particle number, the second column denotes tomogram name, the third column denotes tomogram ids. If you have n tomograms, the ids will be {0, 1, 2, ..., n-1}.
- Train dataset ids: tomograms used for training. You can click `Dataset list` first to obtain the dataset ids. One or multiple tomograms can be used as training tomograms, but make sure that the training dataset ids are selected from {0, 1, 2, ..., n-1}, where n is the total number of tomograms obtained from `Dataset list`. We provide two ways to set dataset ids (illustrated in the sketch after this list):
  - 0, 2, ...: different tomogram ids separated by commas.
  - 0-m: the ids {0, 1, 2, ..., m-1} will be selected. Note: this notation can only be used for tomograms with continuous ids.
- Val dataset ids: tomograms used for validation. You can click `Dataset list` first to obtain the dataset ids. Note: only one tomogram can be selected as the val dataset.
- Number of classes: the particle classes you want to pick
- Batch size: the number of samples processed before the model is updated. It is limited by your GPU memory; reducing this parameter might help if you encounter an out-of-memory error.
- Patch size: the size of the subtomogram patches. It needs to be a multiple of 8. It is recommended that this value is not less than 64; the default value is 72.
- Padding size: a hyperparameter of the overlap-tile strategy. Usually it is between 6 and 12; the default value is 12.
- Learning rate: the step size at each iteration while moving toward a minimum of a loss function.
- Max epoch: total training epochs. The default value 60 is usually sufficient.
- GPU id: the GPUs used for training, e.g. 0,1,2,3 denotes using GPUs 0-3. You can run the following command to get the information of available GPUs: nvidia-smi.
- Save Configs: save the configs listed above. The saved configuration contains all the parameters filled in on this page, which can be loaded directly via `Load configs` next time instead of filling them in again.
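As referenced above, the two dataset-id notations can be interpreted as in the following sketch; `parse_dataset_ids` is a hypothetical helper for illustration, not DeepETPicker's own parser:

```python
def parse_dataset_ids(spec: str):
    """Hypothetical parser for the two id notations described above:
    '0, 2, 5' -> [0, 2, 5]; '0-4' -> [0, 1, 2, 3] (ids 0..m-1 for '0-m')."""
    spec = spec.strip()
    if "-" in spec and "," not in spec:
        start, m = spec.split("-")
        return list(range(int(start), int(m)))
    return [int(tok) for tok in spec.split(",") if tok.strip()]

print(parse_dataset_ids("0, 2, 5"))  # [0, 2, 5]
print(parse_dataset_ids("0-4"))      # [0, 1, 2, 3]
```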
In practice, default parameters can give you a good enough result.
Inference parameter description:
- Train Configs: path to the configuration file saved in the `Training` step.
- Network weights: path to the model generated in the `Training` step.
- Patch size & Pad_size: in this stage the tomogram is scanned with a patch size of N and a stride S, where S = N - 2*Pad_size (see the sketch after this list).
- GPU id: the GPUs used for inference, e.g. 0,1,2,3 denotes using GPUs 0-3. You can run the following command to get the information of available GPUs: nvidia-smi.
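As referenced above, the stride relation can be checked with a few lines; the tomogram dimension below is an arbitrary example:

```python
import math

patch_size = 72                       # N, must be a multiple of 8
pad_size = 12                         # overlap-tile padding
stride = patch_size - 2 * pad_size    # S = N - 2*Pad_size

tomo_dim = 512                        # arbitrary example: one tomogram dimension in voxels
n_patches = math.ceil(tomo_dim / stride)  # approximate patch count along this axis
print(f"stride = {stride} voxels, ~{n_patches} patches along a {tomo_dim}-voxel axis")
```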
Coord format conversion

The predicted coordinates with extension *.coords have four columns: class_id, x, y, z. To facilitate subsequent subtomogram averaging, coordinate file format conversion is provided (a short parsing sketch follows at the end of this subsection).
- Coords path: the path of the coordinate data predicted by the trained DeepETPicker.
- Output format: three output formats are supported: *.box for EMAN2, *.star for RELION, and *.coords for RELION.
Note: after converting the coordinates to *.coords, one can obtain coordinates in RELION 4 format directly using the script at https://github.com/cbmi-group/DeepETPicker/blob/main/utils/coords_to_relion4.py.
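For users scripting their own downstream steps, a *.coords file with the four columns above can be read as in the following sketch; the file name is a placeholder, and whitespace-separated columns are assumed. For production use, prefer the GUI converter or the coords_to_relion4.py script above.

```python
# Placeholder path to a prediction produced by DeepETPicker.
coords_file = "tomo1.coords"

particles = []
with open(coords_file) as f:
    for line in f:
        fields = line.split()
        if len(fields) < 4:
            continue
        class_id, x, y, z = fields[:4]
        particles.append((int(float(class_id)), float(x), float(y), float(z)))

print(f"read {len(particles)} particles")
```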
- Showing Tomogram

You can click the `Load tomogram` button on this page to load the tomogram.
- Showing Labels

After loading the coordinates file by clicking `Load labels`, you can click `Show result` to visualize the labels. The label diameter and width can also be tuned in the GUI.
- Parameter Adjustment

To improve the visibility of particles, Gaussian filtering and histogram equalization are provided:
  - Filter: when choosing Gaussian, a Gaussian filter is applied to the displayed tomogram; kernel_size and sigma (standard deviation) can be tuned to adjust the visual effect.
  - Contrast: when choosing hist-equ, histogram equalization is performed.
- Manual picking

After loading the tomogram and pressing 'enable', you can pick particles manually by double-clicking the left mouse button on the slices. If you want to delete an erroneous label, just right-click it. You can specify a different category id per class. Always remember to save the results when you finish.
- Position Slider

You can scan through the volume in the x, y and z directions by changing their values. For z-axis scanning, the Up/Down arrow shortcut keys can be used.
If you encounter any problems during installation or use of DeepETPicker, please contact us by email guole.liu@ia.ac.cn. We will help you as soon as possible.
If you use this code for your research, please cite our paper DeepETPicker: Fast and accurate 3D particle picking for cryo-electron tomography using weakly supervised deep learning.
@article{DeepETPicker,
  title={DeepETPicker: Fast and accurate 3D particle picking for cryo-electron tomography using weakly supervised deep learning},
  author={Guole Liu and Tongxin Niu and Mengxuan Qiu and Yun Zhu and Fei Sun and Ge Yang},
  journal={Nature Communications},
  year={2024}
}