
What Should Be Balanced in a "Balanced" Dataset?

Paper details

Haiyu Wu, Kevin W. Bowyer, What Should Be Balanced in a "Balanced" Face Recognition Dataset?, BMVC, 2023, arXiv:2304.09818.

Abstract

The issue of demographic disparities in face recognition accuracy has attracted increasing attention in recent years. Various face image datasets have been proposed as 'fair' or 'balanced' to assess the accuracy of face recognition algorithms across demographics. These datasets typically balance the number of identities and images across demographics. It is important to note that the number of identities and images in an evaluation dataset are not driving factors for 1-to-1 face matching accuracy. Moreover, balancing the number of identities and images does not ensure balance in other factors known to impact accuracy, such as head pose, brightness, and image quality. We demonstrate these issues using several recently proposed datasets. To improve the ability to perform less biased evaluations, we propose a bias-aware toolkit that facilitates creation of cross-demographic evaluation datasets balanced on factors mentioned in this paper.

Citation

If you use any part of our code or dataset, please cite our paper and the VGGFace2 paper.

@article{wu2023real,
  title={A Real Balanced Dataset For Understanding Bias? Factors That Impact Accuracy, Not Numbers of Identities and Images},
  author={Wu, Haiyu and Bowyer, Kevin W},
  journal={arXiv preprint arXiv:2304.09818},
  year={2023}
}

@inproceedings{cao2018vggface2,
  title={Vggface2: A dataset for recognising faces across pose and age},
  author={Cao, Qiong and Shen, Li and Xie, Weidi and Parkhi, Omkar M and Zisserman, Andrew},
  booktitle={2018 13th IEEE international conference on automatic face \& gesture recognition (FG 2018)},
  pages={67--74},
  year={2018},
  organization={IEEE}
}

Dataset Collection

!!!!!UPDATE!!!!!

Now you can download the dataset at BA-test and extract the images by running:

python3 get_images.py -zx .zx/file/path
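
For reference, the extraction step amounts to unpacking the downloaded archive into a local folder. Below is a minimal sketch, assuming the archive is a standard zip file; the -dest option and output directory are placeholders, not the actual interface of get_images.py.

import argparse
import zipfile
from pathlib import Path

# Hypothetical stand-in for get_images.py: unpack a downloaded BA-test archive.
parser = argparse.ArgumentParser()
parser.add_argument("-zx", help="path to the downloaded BA-test archive")
parser.add_argument("-dest", default="./BA-test/images", help="output directory (assumed)")
args = parser.parse_args()

dest = Path(args.dest)
dest.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(args.zx) as zf:
    zf.extractall(dest)  # extract every file in the archive
    print(f"Extracted {len(zf.namelist())} files to {dest}")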

Other Way

You need to have the VGGFace2 (original version) dataset first, then use file_path_extractor.py to collect the image paths:

python3 file_path_extractor.py -s folder/path/of/vggface2 -d ./ -sfn vggface2 -end_with jpg
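
Conceptually, this step is a recursive directory walk. Here is a minimal sketch, assuming file_path_extractor.py simply writes one .jpg path per line to the output file (argument parsing omitted); it is not the repository's implementation.

from pathlib import Path

def collect_image_paths(source_dir, dest_file, suffix="jpg"):
    # Recursively find images under source_dir and write one path per line.
    paths = sorted(str(p) for p in Path(source_dir).rglob(f"*.{suffix}"))
    Path(dest_file).write_text("\n".join(paths) + "\n")
    return len(paths)

# Mirrors: -s folder/path/of/vggface2 -d ./ -sfn vggface2 -end_with jpg
n = collect_image_paths("folder/path/of/vggface2", "./vggface2.txt")
print(f"Collected {n} image paths")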

Then run image_collection.py to collect the BA-test images and store the image paths in the "BA-test.txt" file.

python3 image_collection.py -im_path vggface2.txt -label ./BA-test/labels.csv
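
In essence, this step filters the VGGFace2 path list down to the images listed in labels.csv. The rough sketch below assumes labels.csv has an image_path column holding identity/filename entries; the real column names and any file-copying logic may differ.

import csv
from pathlib import Path

def select_ba_test_images(im_path, label_csv, out_file="BA-test.txt"):
    # Paths named in the label file (column name "image_path" is an assumption).
    with open(label_csv, newline="") as f:
        wanted = {row["image_path"] for row in csv.DictReader(f)}

    # Keep only VGGFace2 paths whose identity/filename suffix appears in the labels.
    selected = [line for line in Path(im_path).read_text().splitlines()
                if "/".join(line.split("/")[-2:]) in wanted]

    Path(out_file).write_text("\n".join(selected) + "\n")
    return len(selected)

select_ba_test_images("vggface2.txt", "./BA-test/labels.csv")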

The last step is to use run_face_alignment.py from the img2pose repo to crop and align the images.

Benchmark Collection

After you have the BA-test dataset, you can simply run image_collection.py to collect the benchmark images.

python3 image_collection.py -im_path BA-test.txt -label ./benchmark/benchmark_labels.csv -dest ./benchmark/images -n BA-test_benchmark
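
As a quick sanity check (not part of the repository), you can compare the number of collected images against the number of rows in benchmark_labels.csv; the paths below assume the default layout from the command above.

import csv
from pathlib import Path

# Count collected benchmark images and label rows; they should match.
n_images = sum(1 for _ in Path("./benchmark/images").rglob("*.jpg"))
with open("./benchmark/benchmark_labels.csv", newline="") as f:
    n_labels = sum(1 for _ in csv.DictReader(f))
print(f"{n_images} images collected, {n_labels} label rows")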

