pip install -r requirements.txt
Dataset preparation: download the corresponding datasets and set the data folder in configuration YAML files.
- CIFAR-10 can be downloaded via `torchvision.datasets.CIFAR10()`
- Indoor is accessible on Indoor Scene Recognition
- Kvasir Capsule is accessible on Simula
- ISIC 2019 is accessible on ISIC Challenge
The core data valuation and selection code is provided in `./data_value/data_value.py`.
Modify the configurations in `config/default.yaml` (including the dataset path and training parameters), and run the following command:
python adaptive_selection.py
The random seed is the same as in the paper, so the train-validation-test split should be the same as well. The splits are also recorded in the `split_indices` folder so that they can be loaded manually. To do so, load the dataset via our `Dataset`s. If there is no official split, split it into train and test sets using `train.npy` (the training and validation indices). Then split the train set into training and validation sets using `train_no_val.npy` (the training indices excluding the validation indices).
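The manual split described above could be sketched roughly as follows. This is a minimal illustration, not the repository's own loader: the helper name `resolve_splits` is hypothetical, and it assumes that `train.npy` holds indices into the full dataset while `train_no_val.npy` holds positions within the train split. If `train_no_val.npy` instead stores full-dataset indices, the validation set would be `np.setdiff1d(train_val_idx, train_no_val_idx)`.

```python
import numpy as np


def resolve_splits(n_samples, train_val_idx, train_no_val_pos):
    """Return (train, val, test) index arrays into the full dataset.

    Hypothetical helper. Assumes `train_val_idx` (from train.npy) indexes
    the full dataset and `train_no_val_pos` (from train_no_val.npy) indexes
    positions inside that train split.
    """
    train_val_idx = np.asarray(train_val_idx)
    train_no_val_pos = np.asarray(train_no_val_pos)

    # Test set: every sample that is not listed in train.npy.
    test_idx = np.setdiff1d(np.arange(n_samples), train_val_idx)

    # Mark which positions of the train split are kept for training;
    # the remaining positions form the validation set.
    is_train = np.zeros(len(train_val_idx), dtype=bool)
    is_train[train_no_val_pos] = True

    return train_val_idx[is_train], train_val_idx[~is_train], test_idx


# Example usage (file locations inside split_indices are assumptions):
# train_val_idx = np.load("split_indices/train.npy")
# train_no_val_pos = np.load("split_indices/train_no_val.npy")
# train_idx, val_idx, test_idx = resolve_splits(
#     len(dataset), train_val_idx, train_no_val_pos
# )
```

The returned arrays can then be used to index the dataset, for example with `torch.utils.data.Subset`.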
As for GradMatch, we modify the official implementation.
- For the proposed method, run an observation experiment first by setting `observe: True` and `fullset: True` in `config/default.yaml`. Run adaptive data selection and obtain the observed GradNorm scores for each epoch.
- Run `coreset_selection.py` and obtain the selected coreset indices. The indices will be stored in `./data/coreset` together with those of the compared methods. Previous run results are provided in the `./data/coreset` indices folder.
- Run `adaptive_selection.py` with `config/coreset.yaml` for verification.
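Before launching the observation experiment, it can help to confirm that both flags are actually enabled in the configuration file. The check below is a small sketch using PyYAML. The function name `observation_flags_set` is hypothetical, and it assumes that `observe` and `fullset` are top-level keys in the YAML file, which may not match the actual layout of `config/default.yaml`.

```python
import yaml  # PyYAML


def observation_flags_set(config_path):
    """Return True if both `observe` and `fullset` are enabled.

    Hypothetical helper. Assumes both keys sit at the top level of the file.
    """
    with open(config_path, "r", encoding="utf-8") as f:
        cfg = yaml.safe_load(f) or {}
    return cfg.get("observe") is True and cfg.get("fullset") is True


# Example usage:
# if not observation_flags_set("config/default.yaml"):
#     raise SystemExit("Set observe: True and fullset: True first.")
```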
As for the other baselines, we select the coresets by modifying and running DeepCore.