Researchers have proposed using human preference feedback data to fine-tune text-to-image generative models. However, the scalability of human feedback collection has been limited by its reliance on manual annotation. Therefore, we develop and test a method to automatically annotate user preferences from their spontaneous facial expression reactions to the generated images. We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images. Specifically, AU4 (brow lowerer) reflects negative evaluations of the generated image, whereas AU12 (lip corner puller) reflects positive evaluations. These findings are useful in two ways. First, we can automatically annotate user preferences between image pairs with a substantial difference in these AU responses, with an accuracy significantly outperforming state-of-the-art scoring models. Second, directly integrating the AU responses with the scoring models improves their consistency with human preferences. Finally, this method of automatic annotation with facial expression analysis can potentially be generalized to other generation tasks.
The primary dependencies include NumPy, pandas, SciPy, Matplotlib, seaborn, OpenCV, MediaPipe, PyTorch, Torchvision, Transformers, CLIP, BLIP, ImageReward, and HPS v2.
The datasets used to train the AU models, DISFA and DISFA+, should be stored at "../FER_datasets/DISFA" and "../FER_datasets/DISFAPlus" respectively (paths specified in config.py).
The FERGI dataset is available for research purposes. Please request it by filling out this form. The dataset should be stored in the "data" folder. Although the raw dataset is not provided in the GitHub repository, the processed facial features of the videos in the dataset have already been provided in the "data" folder.
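Before running the preprocessing scripts, it can help to confirm that the dataset directories are where the code expects them. The following is a minimal sketch, not part of the repository: the helper `missing_datasets` is a hypothetical name, and the paths are hard-coded from the text above rather than read from config.py.

```python
from pathlib import Path

# Paths quoted in the text above; in the repository they are set in config.py.
DATASET_DIRS = {
    "DISFA": Path("../FER_datasets/DISFA"),
    "DISFA+": Path("../FER_datasets/DISFAPlus"),
}


def missing_datasets(dirs=DATASET_DIRS):
    """Return the names of datasets whose directory does not exist."""
    return [name for name, path in dirs.items() if not path.is_dir()]


if __name__ == "__main__":
    for name in missing_datasets():
        print(f"Missing dataset directory for {name}: {DATASET_DIRS[name]}")
```

If the paths in config.py are changed, the dictionary above would need to be updated to match.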
Multiple pretrained models are used in our model training and analysis. They need to be downloaded from the following links and stored in the "pretrained_models" folder.
The face recognition model that is fine-tuned to train the AU recognition model can be downloaded here. The download link is provided in the GitHub repository of InsightFace. After downloading, rename the file to "glint360k_cosface_r50_fp16_0.1.pth" and store it in the "pretrained_models" folder.
The face detection model can be downloaded here. The download link is provided in the official document of MediaPipe.
The facial landmark detection model can be downloaded here. The download link is provided in the GitHub repository of pytorch_face_landmark.
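The renaming step for the face recognition model can be sketched as follows. This is an illustrative helper only: `install_backbone` is a hypothetical name, and the name of the file as downloaded is not stated here, so it is passed in as an argument. Only the target file name and the "pretrained_models" folder come from the instructions above.

```python
import shutil
from pathlib import Path

# Target file name required by the instructions above.
TARGET_NAME = "glint360k_cosface_r50_fp16_0.1.pth"


def install_backbone(downloaded, models_dir="pretrained_models"):
    """Move a downloaded checkpoint into models_dir under the expected name."""
    models_dir = Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)
    target = models_dir / TARGET_NAME
    shutil.move(str(downloaded), str(target))
    return target
```

For example, `install_backbone("path/to/downloaded_file.pth")` would place the checkpoint at "pretrained_models/glint360k_cosface_r50_fp16_0.1.pth"; the source path in this call is a placeholder.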
Run preprocess_DISFA.py and preprocess_DISFAPlus.py for preprocessing the AU datasets.
Run DISFAwithPlus_train_model.py to train the AU recognition model. The trained AU models will be saved in the folder "AU_models". The AU model used for the following analysis can also be downloaded here. Note that this model is trained on DISFA and DISFA+ and thus should be used for research purposes only. If you use this model in your paper, you should also cite the papers of DISFA and DISFA+ (see the terms of use for DISFA and DISFA+) in addition to our paper.
Run clips_facial_process.py to process the facial features of the videos in the FERGI dataset. The results from our model have already been provided in the "data" folder.
Run preprocess_image_data.py, preprocess_baseline_data.py, and preprocess_reaction_data.py for preprocessing the data of generated images, the data of baseline videos, and the data of reaction videos in the FERGI dataset respectively. The results are saved in the "preparation" folder.
Run filter_participants_based_on_AU4.py to exclude participants with unreliable or unstable AU4 estimation from the following classification (Section 6.2 in the paper). The result is saved in the "preparation" folder.
Run image_preference_binary_classification_based_on_ranking.py for binary classification of image preferences (Section 6.2 in the paper). The results are saved in the "results" folder.
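To illustrate the idea behind annotating preferences from AU responses, here is a minimal sketch. It is not the logic of image_preference_binary_classification_based_on_ranking.py: the function `annotate_pair`, the scalar AU4 response per image, and the threshold `min_diff` are all assumptions made for illustration. The sketch relies only on the finding stated above that AU4 (brow lowerer) reflects negative evaluations, so the image with the weaker AU4 response is treated as preferred, and pairs without a substantial difference are left unannotated.

```python
def annotate_pair(au4_a, au4_b, min_diff=0.5):
    """Return 'A' or 'B' for the preferred image, or None if the AU4
    responses do not differ by at least min_diff (hypothetical threshold)."""
    diff = au4_a - au4_b
    if abs(diff) < min_diff:
        return None  # difference too small to annotate
    # A stronger AU4 response suggests a more negative evaluation.
    return "B" if diff > 0 else "A"


if __name__ == "__main__":
    print(annotate_pair(1.0, 0.2))
    print(annotate_pair(0.3, 0.4))
```

Leaving low-difference pairs unannotated trades coverage for accuracy, which matches the description of annotating only pairs with a substantial difference in AU responses.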
Run result_analysis.ipynb for analyzing and visualizing the results (Sections 6.1 and 6.2 in the paper). The visualizations are saved in the "figures" folder.
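The script steps above can be chained in the order they are listed. The sketch below is a convenience wrapper, not part of the repository. It assumes that each script is run from the repository root and takes no command-line arguments, which is not confirmed here. The notebook result_analysis.ipynb is left out and would be run separately.

```python
import subprocess
import sys

# Scripts in the order given in the steps above.
PIPELINE = [
    "preprocess_DISFA.py",
    "preprocess_DISFAPlus.py",
    "DISFAwithPlus_train_model.py",
    "clips_facial_process.py",
    "preprocess_image_data.py",
    "preprocess_baseline_data.py",
    "preprocess_reaction_data.py",
    "filter_participants_based_on_AU4.py",
    "image_preference_binary_classification_based_on_ranking.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one command (as an argument list) per script."""
    return [[python, script] for script in scripts]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)` executes them. Since the processed facial features and a trained AU model are already provided, the first four scripts can be removed from `PIPELINE` to skip those steps.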
@article{feng2023fergi,
title={FERGI: Automatic Annotation of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction},
author={Feng, Shuangquan and Ma, Junhua and de Sa, Virginia R},
journal={arXiv preprint arXiv:2312.03187},
year={2023}
}