This code builds on the https://github.com/binli123/dsmil-wsi repository, so the organisation of the datasets folder is largely inherited from it. For faster computation, the csv features were converted into hdf5 and pt files, as in https://github.com/mahmoodlab/CLAM.
- DHMC_MetaData_Release_1.0.csv - downloaded from https://bmirds.github.io/LungCancer/; gives the predominant LUAD pattern
- tcga_classes_extended_info.csv - see https://github.com/GeorgeBatch/TCGA-lung-histology-download/
- tcga_dsmil_test_ids.csv - see https://github.com/GeorgeBatch/TCGA-lung-histology-download/
- tcia_cptac_md5sum_hashes.txt - see https://github.com/GeorgeBatch/TCIA-CPTAC-lung-histology-download
- tcia_cptac_luad_lusc_cohort.csv - see https://github.com/GeorgeBatch/TCIA-CPTAC-lung-histology-download
- tcia_cptac_string_2_ouh_labels.csv - unique values taken from tcia_cptac_luad_lusc_cohort.csv and manually mapped to labels inspired by OUH (Oxford University Hospitals) reports
Columns include the label (LUAD vs LUSC) and paths to features:
- features_csv_file_path
- h5_file_path
- pt_file_path
mapping = {
"LUAD": 0,
"LUSC": 1,
}
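As a minimal sketch of how this mapping could be applied when building a label file, the snippet below encodes string diagnoses as integers. The column names `slide_id` and `diagnosis`, the example rows, and the helper `encode_labels` are hypothetical and chosen only for illustration; the actual label files are produced by the notebook in this repository.

```python
import csv
import io

# Mapping taken from the text above.
mapping = {
    "LUAD": 0,
    "LUSC": 1,
}

# Hypothetical input rows (assumed column names, not the repository's schema).
raw_csv = """slide_id,diagnosis
slide_a,LUAD
slide_b,LUSC
slide_c,LUAD
"""


def encode_labels(rows, mapping):
    """Return copies of the rows with an integer 'label' column added."""
    encoded = []
    for row in rows:
        diagnosis = row["diagnosis"]
        if diagnosis not in mapping:
            # Fail loudly rather than silently assigning a default class.
            raise ValueError(f"unexpected diagnosis: {diagnosis!r}")
        encoded.append({**row, "label": mapping[diagnosis]})
    return encoded


rows = encode_labels(csv.DictReader(io.StringIO(raw_csv)), mapping)
labels = [row["label"] for row in rows]
print(labels)
```

Raising an error on an unmapped diagnosis is a design choice of this sketch: it keeps an unexpected string from being quietly counted as one of the two classes.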
DHMC has only LUAD slides, so all entries in the label field are 0.
TCGA has both LUAD and LUSC slides, so entries in the label field include 0 and 1.
Run the labels creation code notebook. The code will create the files in labels/experiment-label-files/.
Note that the combined dataset for training/validation is not the same as in the paper, since the in-house DART dataset is not publicly available. The test set, however, is the same as in the paper and is fully available for both the 8-label task and the 5-label task.
George Batchkala is supported by Fergus Gleeson and the EPSRC Centre for Doctoral Training in Health Data Science (EP/S02428X/1). The work was done as part of the DART Lung Health Program (UKRI grant 40255).
The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.