A framework to train weakly supervised deep learning models on whole-slide images.
Install the dependencies of the project with conda:
$ mamba env create --file conda.env.yaml
The configuration file tells the software which resources to use: the metadata file, file locations, the artifact directory, and the target variable. See the example configuration in configs/example_config.yaml
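The sketch below only illustrates the kind of settings described above; the key names and paths are assumptions, not the project's actual schema, so consult configs/example_config.yaml for the real layout:

```yaml
# Hypothetical configuration sketch -- see configs/example_config.yaml
# for the key names the project actually uses.
common:
  metadata_file: inputs/example_metadata.tsv  # slide list and target columns
  slide_dir: /data/slides                     # where the whole-slide images live
  artifact_dir: outputs/                      # models, splits, heatmaps
  target_label: tumor_type                    # metadata column used as target
```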
The metadata file is used by the software to keep track of the slides and their classification targets. Note that you can set several targets per slide by adding new target columns. A slide can be skipped for a given target by setting its value in that column to NaN. See the example metadata file in inputs/example_metadata.tsv.
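The NaN convention can be illustrated with a small, self-contained sketch; the column names and the exact skipping logic are made up for illustration and may differ from the project's implementation:

```python
import csv
import io

# Hypothetical metadata with two target columns; "NaN" marks slides that
# should be skipped for that particular target.
metadata_tsv = """slide\ttumor_type\tgrade
slide_01\tcarcinoma\thigh
slide_02\tNaN\tlow
slide_03\tadenoma\tNaN
"""

def slides_for_target(tsv_text, target_column):
    """Return the slides that have a usable label in target_column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["slide"] for row in reader if row[target_column] != "NaN"]

print(slides_for_target(metadata_tsv, "tumor_type"))  # slide_02 is skipped
print(slides_for_target(metadata_tsv, "grade"))       # slide_03 is skipped
```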
The weights of the feature extractor model can be downloaded here: RetCCL. Download the weights and add the RetCCL_resnet50_weights.pth file to the pretrained_models directory.
The weights of the tile filtering models can be downloaded here.
More information regarding these networks can be found here.
The common.target_label setting in the config file selects which column of the metadata file is used as the target variable. The config file has many other parameters that can be changed; check the configs/example_config.yaml file.
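For example, switching the pipeline to a different target column only requires changing that one setting (the key path and column name below are illustrative, not taken from the project):

```yaml
# Hypothetical: point the pipeline at another metadata column
common:
  target_label: grade
```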
The pipeline is split into the following steps:
- create_tiles: create a tile map of all slides and filter tiles of interest according to the pretrained filter NN
- extract_features: run the feature extraction network on all tiles of interest
- create_splits: create the train/test splits for all folds
- train_model: train the model on all folds
- create_att_heatmaps: generate attention heatmaps from the trained NN results
Run each step in order with the run_steps.sh script.
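Conceptually, the script just executes the five steps above in order, aborting on the first failure. A minimal sketch of that idea follows; the actual contents of run_steps.sh may differ, and the per-step invocation is a guess, hence commented out:

```shell
#!/bin/sh
set -e  # stop at the first failing step

for step in create_tiles extract_features create_splits train_model create_att_heatmaps; do
  echo "running step: $step"
  # python -m wsi_mil.$step --config configs/example_config.yaml  # hypothetical call
done
```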
wsi-mil code is released under the GPLv3 License.
The models implemented and used here are re-implementations of, or inspired by, existing works: