ForBlindRev/AIBias

This is the code for our paper "Fair and accurate age prediction using distribution aware data curation and augmentation".

Basic Overview


Data

We used six datasets in total. For pre-training, we used the IMDB-WIKI dataset, which is split into two sub-datasets: WIKI and IMDB. For analysis and for curating our Balanced Dataset, we used the UTK-Face, MORPH-2, MegaAge-Asian and APPA-REAL datasets. For the generalization test, we used the FG-NET dataset, which comes from a completely different distribution. These datasets can be downloaded or purchased via the following links:

After downloading the datasets, move them to the ./data folder and extract each one into its corresponding folder.
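
Before pre-processing, it can help to confirm that every dataset was extracted where the scripts will look for it. The sketch below is only an illustration: the folder names in `EXPECTED_DATASETS` and the helper `missing_datasets` are hypothetical, since this README does not state the exact folder names, so adjust them to match your local layout.

```python
from pathlib import Path

# Hypothetical folder names (assumption): the README does not specify them.
EXPECTED_DATASETS = [
    "IMDB-WIKI",
    "UTK-Face",
    "MORPH-2",
    "MegaAge-Asian",
    "APPA-REAL",
    "FG-NET",
]


def missing_datasets(data_root, expected=EXPECTED_DATASETS):
    """Return the expected dataset folders that are absent or empty."""
    root = Path(data_root)
    missing = []
    for name in expected:
        folder = root / name
        # A folder counts as missing if it does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_datasets("./data")
    if absent:
        print("Missing or empty dataset folders:", ", ".join(absent))
    else:
        print("All expected dataset folders are present.")
```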


Data pre-processing

After downloading and unzipping the data in the ./data folder, go to the pre-processing folder and run the following command to construct the Balanced Dataset.

python data_preprocess.py -dir <PATH_TO_DATA> -train_save_path <PATH_TO_TRAIN_DATA> -test_save_path <PATH_TO_TEST_DATA>
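
For intuition about what balancing a dataset by age can look like, here is a minimal sketch that caps the number of samples per age bin. It is not the logic of data_preprocess.py, which this README does not describe. The function name `balance_by_age`, the 10-year bin width, and the default rule of capping every bin at the size of the smallest one are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_by_age(samples, bin_width=10, max_per_bin=None, seed=0):
    """Subsample (image_path, age) pairs so no age bin exceeds max_per_bin.

    Illustrative assumption: when max_per_bin is None, every bin is capped
    at the size of the smallest non-empty bin.
    """
    if not samples:
        return []
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible

    # Group samples into fixed-width age bins, e.g. 20-29 -> bin 2.
    bins = defaultdict(list)
    for path, age in samples:
        bins[int(age) // bin_width].append((path, age))

    if max_per_bin is None:
        max_per_bin = min(len(items) for items in bins.values())

    balanced = []
    for key in sorted(bins):
        items = bins[key]
        if len(items) > max_per_bin:
            items = rng.sample(items, max_per_bin)
        balanced.extend(items)
    return balanced
```

Capping large bins removes data from over-represented ages. An alternative is to oversample or augment the rare bins, which is closer in spirit to the augmentation step later in this README.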

Results

After balancing, the dataset has the following distribution:

Training and Testing

When the data is ready, run train.py to train the model and test.py to test it.

python train.py -datafolder <PATH_TO_DATA_FOLDER> -opt <OPT_METHOD> -train_path <PATH_TO_TRAIN_DATA> -test_path <PATH_TO_TEST_DATA> -model_name <MODEL_NAME> -dataset <DATASET_NAME> -num_epoches <NUM_EPOCHS> -lr <LEARNING_RATE> -pretrained_model <PATH_TO_PRETRAINED_MODEL>
python test.py -test_path <PATH_TO_TEST_DATA> -result_folder <PATH_TO_SAVE_RESULTS> -trained_model <PATH_TO_TRAINED_MODEL> 
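
Because the goal is age prediction that is fair as well as accurate, it can be useful to look at the error per age group and not only overall. The sketch below computes the mean absolute error (MAE) overall and per age range from lists of true and predicted ages. It is an illustrative helper, not part of test.py: the function name `mae_by_age_group`, the default bin edges, and the assumption that predictions are available as plain lists are all hypothetical.

```python
def mae_by_age_group(y_true, y_pred, bin_edges=(0, 20, 40, 60, 120)):
    """Return overall MAE plus MAE for each [lo, hi) age range with samples.

    The default bin edges are an illustrative assumption, not the groups
    used in the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    report = {"overall": sum(errors) / len(errors)}

    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Group by the true age, so each range reflects the real population.
        group = [e for t, e in zip(y_true, errors) if lo <= t < hi]
        if group:
            report["%d-%d" % (lo, hi)] = sum(group) / len(group)
    return report
```

A large gap between the per-group values and the overall value suggests the model is less accurate for some age ranges, which is the kind of imbalance that curation and augmentation aim to reduce.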

Data Augmentation and OOD Retrieval

After training, run data_augmentation.py to perform the augmentation and OOD selection and obtain the augmented data.

python data_augmentation.py -train_path <PATH_TO_TRAINING_DATA> -model_path <PATH_TO_TRAINED_MODEL> -in_path <PATH_TO_IN_DISTRIBUTION_DATA> -out_path <PATH_TO_OUT_OF_DISTRIBUTION_DATA> -batch_size <BATCH_SIZE> -quantile <QUANTILE_TO_SPLIT_DATA> -save_path <PATH_TO_SAVE_BALANCED_AUG_DATA> -aug_save_path <PATH_TO_SAVE_AUG_DATA>
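
The -quantile argument suggests that samples are split by comparing an OOD score against a quantile threshold. The sketch below shows one generic way such a split can work, under stated assumptions: a higher score means more out-of-distribution, the threshold is a quantile of the in-distribution scores, and candidates at or below the threshold are kept. The actual scoring function and the direction of the split in data_augmentation.py are not described in this README, and the names `quantile` and `select_by_ood_score` are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, 0 <= q <= 1."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1.0 - frac) + ordered[upper] * frac


def select_by_ood_score(in_scores, candidates, q=0.95):
    """Split candidates by a quantile threshold on in-distribution scores.

    candidates is a list of (sample_id, score) pairs. Assumption: higher
    score means more out-of-distribution, so ids with score <= threshold
    are kept. Reverse the comparison if your scores run the other way.
    """
    threshold = quantile(in_scores, q)
    kept = [sid for sid, score in candidates if score <= threshold]
    return kept, threshold
```

For example, `select_by_ood_score([0.1, 0.2, 0.3, 0.4, 0.5], [("a", 0.25), ("b", 0.35)], q=0.5)` uses the median in-distribution score as the threshold and keeps only sample "a".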

Results

Augmentation OOD-Scores

Augmented Data Training and Testing

Similarly, run train.py and test.py to train and test the model on the augmented data.

python train.py -datafolder <PATH_TO_DATA_FOLDER> -opt <OPT_METHOD> -train_path <PATH_TO_TRAIN_DATA> -test_path <PATH_TO_TEST_DATA> -model_name <MODEL_NAME> -dataset <DATASET_NAME> -num_epoches <NUM_EPOCHS> -lr <LEARNING_RATE> -trained_model <PATH_TO_PRETRAINED_MODEL>
