Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention (EFA)
This is the official PyTorch implementation for the ICCV 2025 paper "Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention".
Follow these steps to set up the environment:
```bash
# Create conda environment
conda create -n efa python=3.9
conda activate efa

# Install required packages
pip install -r requirements.txt

# Install modified diffusers library
pip install -e diffusers
```

Download the model from Hugging Face:
```bash
mkdir -p pretrained_models
cd pretrained_models
git lfs install
git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
```

For mask generation, we use Grounded-SAM-2. Please refer to the Grounded-SAM-2 repository for installation and setup instructions. It is required for generating segmentation masks for the training data.
Generate training datasets for different attribute types:
```bash
# Generate datasets for gender, race, and gender×race combinations
bash scripts/create_dataset.sh [GPU_ID]
```

This script creates three datasets: `datasets/gender`, `datasets/race`, and `datasets/gxr`.
After running this script, use Grounded-SAM-2 to generate the corresponding mask images in the `datasets/{target_bias}_mask` folders. We use the text prompt "person." for mask generation.
The final folder structure will be:
```
datasets/
├── gender/
├── gender_mask/
├── race/
├── race_mask/
├── gxr/
└── gxr_mask/
```
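Before training, it can be useful to confirm that every dataset folder has its matching mask folder. The helper below is a small illustrative sketch (not part of the repository) that checks the layout above with only the standard library:

```python
import os

# Hypothetical helper (not part of the repo): verify that each dataset
# folder under `root` has a matching "<bias>_mask" folder before training.
def find_missing_mask_dirs(root, biases=("gender", "race", "gxr")):
    """Return the expected sub-folders of `root` that do not exist yet."""
    expected = []
    for bias in biases:
        expected.append(bias)
        expected.append(f"{bias}_mask")
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]
```

Running `find_missing_mask_dirs("datasets")` should return an empty list once all six folders are in place.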
```bash
# Gender
bash scripts/train_gender.sh [GPU_ID]

# Race
bash scripts/train_race.sh [GPU_ID]

# Gender × Race
bash scripts/train_gxr.sh [GPU_ID]
```

During training, checkpoints are saved in `exps/{target_bias}`.
Tip: when tuning hyperparameters, start with high `reg_w` and `seg_w` values, for which EFA outputs closely match the original SD outputs. Then gradually decrease them, validating with `val_professions` in `winobias_cfg`, to find values that balance debiasing against preserving non-target features.
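The coarse-to-fine search described in the tip can be sketched as a simple decay schedule. The starting weights and decay factor below are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch of the coarse-to-fine hyperparameter search.
# reg_w0, seg_w0, and the decay factor are hypothetical starting points;
# each candidate pair would be trained and validated with val_professions.
def sweep_schedule(reg_w0=10.0, seg_w0=10.0, factor=0.5, steps=4):
    """Yield (reg_w, seg_w) candidate pairs, shrinking both weights each step."""
    return [(reg_w0 * factor**i, seg_w0 * factor**i) for i in range(steps)]
```

The idea is to stop decreasing once debiasing quality plateaus but non-target features begin to drift from the original SD outputs.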
```bash
# Gender
bash scripts/test_gender.sh [GPU_ID]

# Race
bash scripts/test_race.sh [GPU_ID]

# Gender × Race
bash scripts/test_gxr.sh [GPU_ID]
```

In the scripts, `template_key` selects the text prompt template: `0` for the basic template and `1` for the complex template.
Sample results comparing the original SD with EFA-enhanced generation will be available in the exps/{target_bias} folders after running inference:
- `winobias_epc{test_epc}/images`: images generated with EFA
- `winobias_epc{test_epc}/original`: images generated from the original SD
- `winobias_epc{test_epc}/original_face_mask`: detected face regions
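For side-by-side inspection, EFA outputs can be matched to original-SD outputs that share a filename. The helper below is a hypothetical convenience (not part of the repository), assuming the output layout described above:

```python
import os

# Hypothetical helper: pair EFA outputs with the original-SD outputs that
# share a filename, so they can be compared side by side. `exp_dir` is
# assumed to follow the exps/{target_bias} layout described above.
def pair_outputs(exp_dir, test_epc):
    efa_dir = os.path.join(exp_dir, f"winobias_epc{test_epc}", "images")
    orig_dir = os.path.join(exp_dir, f"winobias_epc{test_epc}", "original")
    shared = sorted(set(os.listdir(efa_dir)) & set(os.listdir(orig_dir)))
    return [(os.path.join(efa_dir, f), os.path.join(orig_dir, f))
            for f in shared]
```

Each returned tuple gives the EFA image path and its original-SD counterpart for the same prompt and seed.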
EFA is implemented based on the diffusers library with modifications to the following key components:
- `diffusers/src/diffusers/models/unet_2d_condition_exp.py`: modified UNet with EFA support
- `diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_exp.py`: enhanced pipeline
- `adaptive_attention.py`: core EFA attention mechanism
- `adaptive_unet.py`: adaptive UNet wrapper
If you find this work useful, please cite:
```bibtex
@InProceedings{Park_2025_ICCV,
    author    = {Park, Jeonghoon and Lee, Juyoung and Chung, Chaeyeon and Lee, Jaeseong and Choo, Jaegul and Gu, Jindong},
    title     = {Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {17567-17576}
}
```

Our code is based on the InterpretDiffusion repository.