In this work, we developed Vislocas, which identifies potential protein mis-localization events from IHC images, to mark different cancer subtypes. Vislocas combines CNN and vision transformer to capture IHC image features at both global and local levels. Vislocas can be trained from scratch to create an end-to-end system that can identify protein subcellular localizations directly from IHC images.
- Ubuntu 9.4.0-1ubuntu1~20.04.1
- RTX 2080 Ti(11GB) * 6
Requirements | Release |
---|---|
CUDA | 11.3 |
Python | 3.8.15 |
pytorch | 1.11.0 |
torchvision | 0.12.0 |
torchaudio | 0.11.0 |
cudatoolkit | 11.3 |
pandas | 1.2.4 |
fvcore | 0.1.5 |
opencv-python | 4.6.0.66 |
timm | 0.6.12 |
scipy | 1.9.3 |
einops | 0.6.0 |
matplotlib | 3.5.1 |
scikit-learn | 1.1.2 |
tensorboard | 2.11.0 |
adabelief-pytorch | 0.2.0 |
This folder stores the code files for data loading.
- ihc.py
This file includes ihc data loading code.
- build.py
This file includes building dataset code.
- loader.py
This file includes constructing loader code.
This folder stores the code files for data preparing.
- IF
This folder stores the code files used to analyse and process the IF labels.
- IHC
This folder stores the code files used to analyse, process and download the IHC data.
- pathology
This folder stores the code files used to analyse, process and download the pathology data.
Download and save the data annotation information to this folder.
All benchmark data has been deposited at Zenodo (https://doi.org/10.5281/zenodo.10632698)
The following files should be placed directly in the data
directory.
File | Descriptrion |
---|---|
data1.csv | The whole benchmark dataset |
data_train.csv | The full training set |
data_test.csv | The independent set |
data_train_spliti_foldj.csv | The training set for the jth-fold cross-validation of the i-th division |
data_val_spliti_foldj.csv | The validation set for the jth-fold cross-validation of the i-th division |
data_IHC_analysis.csv | Data statistics for the benchmark data set |
annotations.csv | Location annotation information for IF images |
tissueUrl.csv | All normal IHC images and their URLs in the HPA database. |
pathologyUrl.csv | All pathology IHC images and their URLs in the HPA database. |
normalWithAnnotation.csv | IHC data information with matching IF labels |
normalLabeled.csv | IHC data with labels |
The following files should be placed directly in the data/cancer
directory.
File | Descriptrion |
---|---|
normalGlioma.csv | The normal data for glioma |
patholotyGlioma.csv | The patholoty data for glioma |
normalMelanoma.csv | The normal data for melanoma |
patholotyMelanoma.csv | The patholoty data for melanoma |
normalSkinCancer.csv | The normal data for skin cancer |
patholotySkinCancer.csv | The patholoty data for skin cancer |
screenedNormalData.csv | Normal data filtered to the same image quality as the benchmark dataset |
screenedPathologyData.csv | Pathology tissue data filtered to the same image quality as the benchmark dataset |
This folder stores model-related code files, including Visloacas model code, loss function code, and model training-related code.
- cvr_utils
This folder stores convolutional tokenizer and transformer encoder module code.
- cct.py
This file includes Visloacas model code.
- classifier_model.py
This file includes load model code.
- train_classifier.py
This file includes model training-related code.
- loss.py
This file includes loss function code.
- criterion.py
This file includes criterion-related code.
This folder stores code files for model training, prediction, biomarker prediction, etc.
- train.py
This file includes model training code.
- test.py
This file includes model testing code.
- multi-instance.py
This file includes the code that aggregates the image-level results into protein-level results.
- cancerTest.py
This file includes the codes for screening biomarkers of cancer subtypes.
- cal_metrics.py
This file includes the code that calculates the performance metrics.
This folder stores the optimiser, scheduler, checkpoint and other utilities code files.
- checkpoint.py
This file includes checkpoint code.
- config_defaults.py
This file includes the parameter configuration code.
- distributed.py
This file includes the code for distributed training.
- eval_metrics.py
This file includes the utilities code for calculating the performance metrics.
- optimizer.py
This file includes the optimizer code.
- scheduler.py
This file includes the scheduler code.
This folder is used to store the output log messages.
This folder is used to store the output models and prediction results.
This folder is used to store the downloaded images.
All benchmark data has been deposited at Zenodo (https://doi.org/10.5281/zenodo.10632698)
The download address of the images can be obtained from file tissueUrl.csv
and pathologyUrl.csv
.
The program is written in Python 3.8.15 and to run the code we provide, so you need to install the environment.yml
through inputting the following command in command line mode:
conda env create -f environment.yml
- Train the Vislocas model, and save trained model to
./results/IHC
.
sh train.sh
- Predict subcellular location results for each image, and save the results to
./results/IHC
.
sh test.sh
- Aggregates the image-level results into protein-level results and calculate the performance metrics of the model.
sh cal_metrics.sh
In the result files obtained,
statistic_crossValidation_Vislocas_mlce_lr-000005_bn_drop-01_attn-drop-01_drop-path-01_batch12_seed6293_wd-005_aug_no-normalized_metrics.csv
andstatistic_proteinLevel_crossValidation_Vislocas_mlce_lr-000005_bn_drop-01_attn-drop-01_drop-path-01_batch12_seed6293_wd-005_aug_no-normalized_metrics.csv
are the image level results and protein level results, respectively. - Predict mis-localization-related cancer subtype biomarker, and save the results to
./results/cancer
.
sh cancerTest.sh