Title

Finding Hidden Features Responsible for Machine Learning Failures

Bin Folder
- heatmap.py - for construction of heatmaps
- roc.py - outputs an roc graph with area under the curve score
- roc-multi.py - outputs multiple roc graphs in one image with area under the curve scores
- test.py, train.py - respective scripts for testing and training the model
- test-ensemble.py - tests the 3 best epochs of our model and returns each epoch's passed probabilities for each image
Config Folder
- The .csv files are generated from scripts from our data preprocessing repository: https://github.com/askovdal/fhfrmlf
- config.json - includes the configuration of our model
data
- dataset.py - used to load data for train.py and test.py in Bin Folder
- imgaug.py - script for augmenting images
- utils.py - includes utility functions such as border padding and transformation
Model folder
- Backbone folder
  - init.py - initialization file
  - densenet.py - includes the different densenet architectures that can be called in the config.json file
  - inception.py - includes the inception architecture --
  - vgg.py - includes the different vgg architectures --
- attention_map.py - implements the attention map specified in the config.json file
- classifier.py - runs the classifier combination specified in the config.json file, giving the resulting model
- global_pool.py - implements the different global pooling operations, our study uses PCAM
- utils.py - includes utility functions to get normalization and optimiser types, as well as transforming tensor to numpy.
Utils folder
- heatmaper.py - implements the method for creating the heatmaps from the global pooling operation, our PCAM example is specified in this paper: https://arxiv.org/abs/2005.14480
- misc.py - includes a learning rate scheduling function
PCAM.png - An image describing the composition of the model used for our research
requirements.txt - A statement of what packages are needed and the minimal working version of them.

How to reproduce our results

Technical prerequisites

To reproduce our results, the following is needed:

4 CUDA GPUs
16 CPU cores
+64 GB RAM

These requirements are due to the large data size and model complexity.

Software requirements

The following software is needed to run the scripts:

Python 3
pip packages listed in requirements.txt

Training the model

Run the following script using python 3. The resulting model will be saved to logdir-50-50

python3 bin/train.py config/config.json logdir-50-50 --num_workers 16 --device_ids "0,1,2,3" --verbose True

Testing the model

Run the following scripts. The model will be tested on the three different experimental setups described in Section 5.1.1

python3 logdir-50-50/classification/bin/test.py --model_path "logdir-50-50/" --in_csv_path "config/pneumothorax-mixed-p-tube-split.csv" --out_csv_path "logdir-50-50/tube-split/test.csv" --device_ids "0,1,2,3" --num_workers 16
python3 logdir-50-50/classification/bin/test.py --model_path "logdir-50-50/" --in_csv_path "config/pneumothorax-mixed-p-w-tube.csv" --out_csv_path "logdir-50-50/w-tube/test.csv" --device_ids "0,1,2,3" --num_workers 16
python3 logdir-50-50/classification/bin/test.py --model_path "logdir-50-50/" --in_csv_path "config/pneumothorax-mixed-p-wo-tube.csv" --out_csv_path "logdir-50-50/wo-tube/test.csv" --device_ids "0,1,2,3" --num_workers 16

To generate the ROC curves, run the following script.

python3 logdir-50-50/classification/bin/roc-multi.py plot --plot_path0 "logdir-50-50/w-tube/" --plot_path1 "logdir-50-50/tube-split/" --plot_path2 "logdir-50-50/wo-tube/"

To generate heatmaps, create a txt file called images.txt containing the images you want to create heatmaps on, and run the following script. The heatmaps will be saved to the heatmaps folder.

python3 logdir-50-50/classification/bin/heatmap.py logdir-50-50/best.ckpt logdir-50-50/cfg.json images.txt heatmaps/ --device_ids '0'

Testing the ensemble

When training the model, the 3 best epochs were saved as best1.ckpt, best2.ckpt and best3.ckpt. Run the following script to test all and save their predictions.

python3 logdir-50-50/classification/bin/test-ensemble.py --model_path "logdir-50-50/" --in_csv_path "config/pneumothorax-mixed-p-w-tube.csv" --out_csv_path "logdir-50-50/ensemble-w-tube/test.csv" --device_ids "0,1,2,4" --num_workers 16

The resulting CSV-file can be used to create a correlation matrix with the script correlation.py in https://github.com/askovdal/fhfrmlf.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Title

Contents

How to reproduce our results

Technical prerequisites

Software requirements

Training the model

Testing the model

Testing the ensemble

About

Releases

Packages

Contributors 6

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 85 Commits
bin		bin
config		config
data		data
model		model
utils		utils
.gitignore		.gitignore
PCAM.png		PCAM.png
README.md		README.md
requirements.txt		requirements.txt

askovdal/fhfrmlf-model

Folders and files

Latest commit

History

Repository files navigation

Title

Contents

How to reproduce our results

Technical prerequisites

Software requirements

Training the model

Testing the model

Testing the ensemble

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Languages

Packages