An Empirical Study of Adaptive Data Selection on Real-world Images

Getting Started

pip install -r requirements.txt

Dataset preparation: download the corresponding datasets and set the data folder in configuration YAML files.

CIFAR-10 can be downloaded via torchvision.datasets.CIFAR10()
Indoor is available from the Indoor Scene Recognition page
Kvasir Capsule is available from Simula
ISIC 2019 is available from the ISIC Challenge site
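
As a minimal sketch of the CIFAR-10 step, the helper below wraps torchvision.datasets.CIFAR10() with download=True. The function name load_cifar10 and the folder in the usage comment are hypothetical and not part of this repository. The folder you choose should match the data folder set in the configuration YAML files.

```python
from pathlib import Path


def load_cifar10(data_root, train=True):
    """Download CIFAR-10 into data_root (if missing) and return the dataset.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    # Imported lazily so the helper can be defined without torchvision installed.
    from torchvision.datasets import CIFAR10

    root = Path(data_root).expanduser()
    root.mkdir(parents=True, exist_ok=True)
    # train=True returns the official training split, train=False the test split.
    return CIFAR10(root=str(root), train=train, download=True)


# Usage (the folder name is an assumption; use the one from your YAML config):
# train_set = load_cifar10("./datasets/cifar10", train=True)
# test_set = load_cifar10("./datasets/cifar10", train=False)
```

CIFAR-10 ships with an official train-test split, so only the train-validation split needs to come from the recorded indices.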

Adaptive Data Selection

The core data valuation and selection code is provided in ./data_value/data_value.py.

Modify the configurations in config/default.yaml (including dataset path and training parameters), and run the following command:

python adaptive_selection.py

The random seed is the same as in the paper, so the train-validation-test split should be reproduced exactly. The splits are also recorded in the split_indices folder in case you need to load them manually: load the dataset via our Datasets; if the dataset has no official split, split it into train and test using train.npy (the training and validation indices); then split the train portion into train and validation using train_no_val.npy (the training indices excluding the validation indices).
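
The manual loading procedure can be sketched as follows. This is an illustration under stated assumptions, not the repository's loader: it assumes that train.npy and train_no_val.npy both hold indices into the full dataset (the same index space), and the function names and the split_dir argument are hypothetical. If train_no_val.npy instead indexes into the train split, map those indices through train.npy before calling split_indices.

```python
from pathlib import Path


def split_indices(train_val_idx, train_no_val_idx, num_samples=None):
    """Derive train, validation, and (optionally) test indices.

    Assumes both inputs index into the full dataset (an assumption).
    """
    train_val = [int(i) for i in train_val_idx]
    train_only = [int(i) for i in train_no_val_idx]
    train_only_set = set(train_only)
    train_val_set = set(train_val)
    if not train_only_set <= train_val_set:
        raise ValueError("train_no_val indices must be a subset of train indices")
    # Validation = indices in train.npy that are absent from train_no_val.npy.
    val = [i for i in train_val if i not in train_only_set]
    # Test = everything outside train.npy; only needed without an official split.
    test = None
    if num_samples is not None:
        test = [i for i in range(num_samples) if i not in train_val_set]
    return train_only, val, test


def load_split_files(split_dir):
    """Load the two index files from split_dir (hypothetical folder layout)."""
    import numpy as np  # imported lazily; only needed when reading .npy files

    split_dir = Path(split_dir)
    return np.load(split_dir / "train.npy"), np.load(split_dir / "train_no_val.npy")


# Usage, with a dataset object `full_set` of known length:
# train_val_idx, train_no_val_idx = load_split_files("split_indices/<dataset>")
# train_idx, val_idx, test_idx = split_indices(train_val_idx, train_no_val_idx, len(full_set))
```

The resulting index lists can be passed to torch.utils.data.Subset to build the three subsets.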

For GradMatch, we use a modified version of the official implementation.

Coreset Selection and Verification

For the proposed method, first run an observation experiment by setting observe: True and fullset: True in config/default.yaml, then run adaptive data selection to obtain the observed GradNorm scores for each epoch.
Run coreset_selection.py to obtain the selected coreset indices. The indices are stored in ./data/coreset together with those of the compared methods. Indices from previous runs are already provided in ./data/coreset.
Run adaptive_selection.py with config/coreset.yaml for verification.
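
To make the idea of turning per-epoch scores into a coreset concrete, here is a small sketch. The selection rule actually used by coreset_selection.py is not described in this README, so the rule below (average each sample's GradNorm score over epochs, then keep the highest-scoring fraction) is an assumption chosen for illustration, and select_coreset is a hypothetical name.

```python
def select_coreset(scores_per_epoch, fraction):
    """Pick the samples with the highest mean score across epochs.

    scores_per_epoch: list of per-epoch score lists, one score per sample.
    fraction: share of samples to keep, in (0, 1].
    Illustrative stand-in only; not the rule implemented in coreset_selection.py.
    """
    if not scores_per_epoch:
        raise ValueError("need scores from at least one epoch")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    num_samples = len(scores_per_epoch[0])
    if any(len(epoch) != num_samples for epoch in scores_per_epoch):
        raise ValueError("every epoch must score the same number of samples")

    num_epochs = len(scores_per_epoch)
    # Mean score of each sample over all observed epochs.
    mean_scores = [
        sum(epoch[i] for epoch in scores_per_epoch) / num_epochs
        for i in range(num_samples)
    ]
    # Keep at least one sample, rounding the budget to the nearest integer.
    budget = max(1, int(round(fraction * num_samples)))
    ranked = sorted(range(num_samples), key=lambda i: mean_scores[i], reverse=True)
    # Return the kept indices in ascending order for stable storage.
    return sorted(ranked[:budget])


# Two epochs of scores for four samples, keeping half of them.
scores = [[0.1, 0.9, 0.5, 0.3], [0.2, 0.8, 0.6, 0.1]]
coreset = select_coreset(scores, fraction=0.5)
```

Returning sorted indices keeps the stored coreset independent of the ranking order, which makes it easier to compare against the index files of other methods.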

For the other baselines, we select the coresets by modifying and running DeepCore.


ZhenyuTANG2023/data_selection
