You don't need no dataset: Leveraging synthetic datasets for training deep classifiers.

This package contains modules to generate fair datasets from a biased StyleGAN2 model and to use them for training classifiers. As an example, it supports face-swap detection.

Zero-Shot Racially Balanced Dataset Generation Using StyleGAN2

Paper (arXiv)

Setup

Our work depends heavily on NVIDIA's StyleGAN GitHub repository (the stylegan3 codebase, which also contains the StyleGAN2 implementation we use). It can be cloned and extracted into this directory as follows.

$ git clone https://github.com/NVlabs/stylegan3 && mv stylegan3/* ./

Fair Dataset Generation

To generate data via the proposed latent space exploration approach, you will have to run the script independently for each racial group you are interested in. The codebase currently supports 6 racial groups - Black, Indian, Asian, Hispanic, White, and Middle Eastern - based on the DeepFace ethnicity classifier.

The --mode parameter can be set to generate (to generate samples in the W space) or generate_z (to generate samples in the Z space). The --start_iter and --end_iter arguments refer to seed values and help control parallel runs of the script. The script supports the following racial-group labels for --target_race: ["asian", "white", "middle_eastern", "black", "latino", "indian"]

You first need to generate synthetic identities using the proposed evolutionary algorithm. We observed a quality-versus-diversity trade-off between mutating in the Z and W+ latent spaces, which should be kept in mind when deciding which one to prefer. In general, based on the results reported in the paper, we recommend the Z space, which achieves similar results with higher-quality images.

$ python3 main_latent_exploration.py --target_race <racial group> --mode <mutate_z, mutate> --start_iter 0 --end_iter 1000 
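Since --start_iter and --end_iter refer to seed values, the seed range can be split across parallel runs. A minimal sketch, where the group and the split are only illustrative:

$ # illustrative split of seeds 0-1000 into two parallel runs for one group
$ python3 main_latent_exploration.py --target_race black --mode mutate_z --start_iter 0 --end_iter 500 &
$ python3 main_latent_exploration.py --target_race black --mode mutate_z --start_iter 500 --end_iter 1000 &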

After this, multiple samples per identity can be generated using the following command.

$ python3 main_latent_exploration.py --target_race <racial group> --mode <generate_z, generate> --start_iter 0 --end_iter 1000 
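Because the script is run independently per racial group, building a balanced dataset amounts to repeating the steps above for each group. A minimal shell sketch for the generation step, using the group labels listed above (adjust the mode and seed range as needed):

$ # sketch: run the sample-generation step once per supported group label
$ for race in asian white middle_eastern black latino indian; do python3 main_latent_exploration.py --target_race "$race" --mode generate_z --start_iter 0 --end_iter 1000; done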

You can also generate samples using random rejection sampling.

$ python3 main_latent_exploration.py --target_race <racial group> --mode rejection_sampling

Upcoming

Support for generating combinations of demographics.

If you use this package, please consider citing the corresponding paper.

@inproceedings{jain2023zero,
  title={Zero-shot racially balanced dataset generation using an existing biased StyleGAN2},
  author={Jain, Anubhav and Memon, Nasir and Togelius, Julian},
  booktitle={International Joint Conference on Biometrics (IJCB 2023)},
  year={2023}
}

A Dataless FaceSwap Detection Approach Using Synthetic Images

Paper

This package contains the codebase for the paper titled "A Dataless FaceSwap Detection Approach Using Synthetic Images", which was accepted at IJCB 2022. The paper proposes a privacy-preserving approach to detecting faceswaps using faces that don't exist. We make use of synthetic images generated with StyleGAN3 and show that this approach has additional benefits, such as reduced bias, more generalizable features, and generalization to unseen faceswap models and datasets.

Setup

We currently support two models for generating faceswaps - SimSwap and SberSwap. These need to be cloned and extracted into this repository for the code to work; you can do so using the commands below. However, we use our own face cropper, so the insightface_func directory needs to be removed.

$ git clone https://github.com/neuralchen/SimSwap
$ rm -r SimSwap/insightface_func && mv SimSwap/* ./
$ git clone https://github.com/ai-forever/sber-swap
$ rm -r sber-swap/insightface_func && mv sber-swap/* ./

We also use the StyleGAN3 GitHub repository for generating synthetic images. It can similarly be cloned and extracted as follows:

$ git clone https://github.com/NVlabs/stylegan3 && mv stylegan3/* ./

Training using Synthetic Images

To train the faceswap detection model - XceptionNet - using synthetic images, run the following command in your terminal.

$ python3 main.py --mode train --swap_model {simswap or sberswap} --batch_size 12 --n_steps 2000 --save_model_suffix synthetic

Training using Real Data

$ python3 main.py --mode train_real_gpu --swap_model {simswap or sberswap} --batch_size 12 --n_steps 2000 --save_model_suffix real

Testing

You can test using your own trained models or the pretrained models we provide. In either case, you can evaluate on the FFHQ dataset, CelebA-HQ, ADFES, or any of the subsets of the FaceForensics++ dataset. You need to download and extract the respective dataset into this working directory.

$ python3 main.py --mode test --test_dataset {ffhq, celeba-hq, adfes, ff-neuraltextures, ff-face2face, ff-faceswap, ff-faceshifter or ff-google} --full_test_model_path /path/to/the/test/model
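To evaluate one model on several datasets in a single pass, the test command can be wrapped in a loop. A minimal sketch, where the chosen datasets are only illustrative and the model path is a placeholder:

$ # sketch: evaluate the same model on a few of the supported test datasets
$ for ds in ffhq celeba-hq adfes ff-faceswap; do python3 main.py --mode test --test_dataset $ds --full_test_model_path /path/to/the/test/model; done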

If you use this package, please consider citing the corresponding paper.

@inproceedings{Jain_IJCB_2022,
  author={Jain, Anubhav and Memon, Nasir and Togelius, Julian},
  title={A Dataless FaceSwap Detection Approach Using Synthetic Images},
  booktitle={International Joint Conference on Biometrics (IJCB 2022)},
  year={2022}
}
