
Mitigating Gender Bias In Captioning System

This is the PyTorch implementation for the WWW (The Web Conference) 2021 paper “Mitigating Gender Bias In Captioning System” [Link][cite]. Recent studies have shown that captioning datasets, such as the COCO dataset, may contain severe social bias, which can lead to unintentional discrimination in the models trained on them. In this work, we focus specifically on the gender bias problem.


Requirements

  • pytorch==0.4.1
  • torchvision==0.2.1
  • python==3.6
  • scikit-image==0.16.2
  • numpy==1.18.1
  • h5py==2.10.0

COCO-GB: Dataset Creation and Analysis

The COCO-GB datasets are created to quantify gender bias in models. We construct COCO-GB v1 from a widely used split and create a gender-balanced secret test dataset. COCO-GB v2 is created by reorganizing the train/test split so that the gender-object joint distribution in the training set differs substantially from that in the test set.

  • Gender-object joint distribution of COCO training dataset

  • Gender-object joint distribution of original COCO test dataset

  • Gender-object joint distribution of COCO-GB v1 secret test dataset

  • Gender-object joint distribution of COCO-GB v2 test dataset
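The figures above compare gender-object joint distributions across splits. As a rough illustration of what such a distribution is, the sketch below labels each image male or female by majority vote over its captions and counts co-occurrences with the image's object labels. The gender word lists, the voting rule, and the function names are assumptions for illustration only, not the procedure used to build COCO-GB.

```python
import re
from collections import Counter

# Hypothetical gender word lists, for illustration only.
MALE_WORDS = {"man", "men", "male", "boy", "boys", "gentleman", "guy", "he", "his"}
FEMALE_WORDS = {"woman", "women", "female", "girl", "girls", "lady", "ladies", "she", "her"}


def caption_gender(caption):
    """Return 'male', 'female', or None if the caption has no (or mixed) gender words."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    has_male = bool(tokens & MALE_WORDS)
    has_female = bool(tokens & FEMALE_WORDS)
    if has_male == has_female:
        return None
    return "male" if has_male else "female"


def joint_distribution(samples):
    """Estimate P(gender, object) from (captions, object_labels) pairs, one pair per image."""
    counts = Counter()
    for captions, objects in samples:
        votes = Counter(caption_gender(c) for c in captions)
        male, female = votes["male"], votes["female"]
        if male == female:  # no gendered captions, or a tie: skip the image
            continue
        gender = "male" if male > female else "female"
        for obj in set(objects):
            counts[(gender, obj)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


samples = [
    (["A man riding a skateboard.", "A boy on a skateboard."], ["skateboard"]),
    (["A woman holding an umbrella.", "A lady with an umbrella."], ["umbrella", "handbag"]),
    (["A person walking a dog."], ["dog"]),
]
dist = joint_distribution(samples)
print(dist)
```

Here each of the three counted (gender, object) pairs gets probability 1/3, and the gender-neutral image is skipped. A distribution computed this way on the training split can then be compared with the same quantity on a test split.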

Benchmarking Captioning Models on COCO-GB v1

To reveal the gender bias in existing models, we use gender prediction performance to quantify the bias learned by the models. Models are trained on the Karpathy split; caption quality is measured on the original test split, and gender prediction performance is evaluated on the COCO-GB v1 secret test dataset.

Evaluation of Gender Prediction Accuracy

python Benchmarking_existing_models/ 

The caption results of the baseline models are saved in the folder Benchmarking_existing_models/json_results.
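To make "gender prediction performance" concrete, here is a minimal sketch that scores generated captions against ground-truth gender labels. It assumes the result files follow the standard COCO caption results format (a list of {"image_id", "caption"} objects) and that each test image has a single 'male' or 'female' label; the word lists, the correct/wrong/neutral breakdown, and all names are hypothetical and not taken from the benchmark scripts.

```python
import json
import re

# Hypothetical gender word lists; the benchmark scripts may use different ones.
MALE_WORDS = {"man", "men", "male", "boy", "boys", "gentleman", "guy", "he", "his"}
FEMALE_WORDS = {"woman", "women", "female", "girl", "girls", "lady", "ladies", "she", "her"}


def predicted_gender(caption):
    """Map a generated caption to 'male', 'female', or 'neutral'."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    has_male = bool(tokens & MALE_WORDS)
    has_female = bool(tokens & FEMALE_WORDS)
    if has_male == has_female:  # no gender word, or both: treat as neutral
        return "neutral"
    return "male" if has_male else "female"


def gender_metrics(results, gt_gender):
    """Fractions of correct, wrong and neutral gender predictions.

    results: COCO-style list of {"image_id": ..., "caption": ...} dicts.
    gt_gender: hypothetical mapping image_id -> 'male' or 'female'.
    """
    counts = {"correct": 0, "wrong": 0, "neutral": 0}
    for r in results:
        gt = gt_gender.get(r["image_id"])
        if gt is None:  # image is not in the gender-labelled test set
            continue
        pred = predicted_gender(r["caption"])
        if pred == "neutral":
            counts["neutral"] += 1
        elif pred == gt:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no results overlap the gender-labelled images")
    return {k: v / n for k, v in counts.items()}


# In practice the results would be read with json.load() from a result file;
# an inline string keeps this example self-contained.
results = json.loads("""[
  {"image_id": 1, "caption": "a man riding a horse"},
  {"image_id": 2, "caption": "a man holding an umbrella"},
  {"image_id": 3, "caption": "a person playing tennis"},
  {"image_id": 4, "caption": "a woman cooking in a kitchen"}
]""")
gt_gender = {1: "male", 2: "female", 3: "female", 4: "female"}
print(gender_metrics(results, gt_gender))
# {'correct': 0.5, 'wrong': 0.25, 'neutral': 0.25}
```

Reporting neutral captions separately from wrong ones distinguishes a model that hedges ("a person") from one that confidently guesses the wrong gender.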

Evaluation of Caption Quality

Please download the COCO official Evaluation Tool from

Image Captioning Model with Guided Attention

We propose a novel Guided Attention Image Captioning model (GAIC) that mitigates gender bias by self-supervising the model's visual attention. GAIC has two complementary streams that encourage the model to explore correct gender features. The training pipeline can seamlessly incorporate extra supervision to accelerate the self-exploration process. In addition, GAIC is model-agnostic and can be easily applied to various captioning models.
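The paragraph above describes GAIC only at a high level. One way to set up attention self-guidance is sketched below, borrowing the soft-masking formulation of the Guided Attention Inference Network (GAIN, Li et al., 2018): erase the regions the model attends to when producing a gender word, caption the masked image again, and penalize any remaining confidence in gender words. This is an assumption for illustration, not the repository's implementation; sigma, omega, and the function names are hypothetical.

```python
import numpy as np


def soft_mask(attention, sigma=0.5, omega=10.0):
    """Differentiable threshold T(A) = sigmoid(omega * (A - sigma)); close to 1 where A > sigma."""
    return 1.0 / (1.0 + np.exp(-omega * (attention - sigma)))


def mask_attended_regions(image, attention, sigma=0.5, omega=10.0):
    """Erase attended pixels: I* = I - T(A) * I.

    image: (3, H, W) float array; attention: (H, W) map in [0, 1] for a gender word.
    """
    t = soft_mask(attention, sigma, omega)
    return image - t[None, :, :] * image


def self_guidance_loss(gender_word_probs):
    """Mean probability the model assigns to gender words when captioning the masked image.

    Minimising it pushes the attention map to cover all of the gender evidence,
    because any evidence left unmasked would still let the model predict a gender.
    """
    return float(np.mean(gender_word_probs))


image = np.ones((3, 4, 4))
attention = np.zeros((4, 4))
attention[:2, :2] = 1.0  # pretend the model attends to the top-left quadrant
masked = mask_attended_regions(image, attention)
print(masked[0].round(3))
```

In a full model this term would be weighted and added to the usual captioning cross-entropy loss, and the attention map would come from the captioner itself rather than being hand-set as here.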

Data pipeline

See create_input_files() in

This reads the downloaded data and saves the following files:

  • An HDF5 file for each split containing the images as an (I, 3, 256, 256) tensor, where I is the number of images in the split. Pixel values are still in the range [0, 255] and are stored as unsigned 8-bit integers.
  • A JSON file for each split with a list of N_c * I encoded captions, where N_c is the number of captions sampled per image. The captions are in the same order as the images in the HDF5 file, so the i-th caption corresponds to the (i // N_c)-th image.
  • A JSON file for each split with a list of N_c * I caption lengths. The i-th value is the length of the i-th caption, which corresponds to the (i // N_c)-th image.
  • A JSON file containing the word_map, the word-to-index dictionary.
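The file layout above leaves the caption encoding implicit. The sketch below assumes a common convention in PyTorch captioning pipelines: each caption becomes `<start> w1 ... wn <end>` padded with `<pad>` to a fixed length, unknown words map to `<unk>`, and the stored length counts the start and end tokens. The tiny word_map, max_len, and function names are hypothetical; the image array is kept in memory here, whereas the actual script writes it to an HDF5 file.

```python
import json
import numpy as np

# Hypothetical word map; in the real pipeline it is built from the training captions.
word_map = {"<pad>": 0, "a": 1, "man": 2, "riding": 3, "horse": 4,
            "<unk>": 5, "<start>": 6, "<end>": 7}


def encode_caption(tokens, word_map, max_len):
    """Encode as <start> w1 .. wn <end> <pad>..., with unknown words mapped to <unk>.

    Returns the encoded list (length max_len + 2) and the caption length n + 2.
    """
    enc = ([word_map["<start>"]]
           + [word_map.get(w, word_map["<unk>"]) for w in tokens]
           + [word_map["<end>"]]
           + [word_map["<pad>"]] * (max_len - len(tokens)))
    return enc, len(tokens) + 2


captions_per_image = 2  # N_c
max_len = 10
raw_captions = [  # one inner list of N_c tokenised captions per image
    ["a man riding a horse".split(), "a man on a horse".split()],
    ["a horse".split(), "a man".split()],
]

# (I, 3, 256, 256) uint8 array; the real script stores this in the split's HDF5 file.
images = np.zeros((len(raw_captions), 3, 256, 256), dtype=np.uint8)

enc_captions, cap_lens = [], []
for caps in raw_captions:
    assert len(caps) == captions_per_image
    for tokens in caps:
        enc, length = encode_caption(tokens, word_map, max_len)
        enc_captions.append(enc)
        cap_lens.append(length)


def image_index(i):
    """The i-th caption belongs to image i // N_c."""
    return i // captions_per_image


captions_json = json.dumps(enc_captions)
lengths_json = json.dumps(cap_lens)
```

Because captions are appended image by image, index arithmetic alone (i // N_c) recovers the image for any caption, so no explicit image ID needs to be stored alongside each caption.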

Training Baseline Model


We choose the Att model as the baseline and train it for 5 epochs on the COCO dataset.
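The Att model itself is not described in this section. Assuming it is a soft-attention captioner in the style of Show, Attend and Tell, the core computation at each decoding step is additive attention over image regions, sketched below in NumPy. The weight names, dimensions, and the 14 × 14 grid are illustrative assumptions.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


def soft_attention(features, hidden, W_enc, W_dec, w_att):
    """Additive (Bahdanau-style) attention over image regions.

    features: (L, D) encoder features for L regions (e.g. a 14 x 14 grid -> L = 196)
    hidden:   (H,) decoder hidden state at the current time step
    W_enc: (D, A), W_dec: (H, A), w_att: (A,) learned projections (hypothetical names)
    Returns the context vector (D,) and the attention weights alpha (L,).
    """
    scores = np.tanh(features @ W_enc + hidden @ W_dec) @ w_att  # (L,)
    alpha = softmax(scores)
    context = alpha @ features  # attention-weighted sum of region features
    return context, alpha


rng = np.random.default_rng(0)
L, D, H, A = 196, 8, 6, 5
features = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
context, alpha = soft_attention(features, hidden,
                                rng.standard_normal((D, A)),
                                rng.standard_normal((H, A)),
                                rng.standard_normal(A))
```

The context vector is fed to the decoder together with the previous word to predict the next word; the weights alpha indicate which image regions the model relied on at that step.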

Training GAIC Model


We construct the GAIC model on top of the baseline model and fine-tune it on the fine-tune set for 1 epoch. To train the GAICes model, set supervised_training = True.
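How supervised_training changes the training objective is not spelled out here. A minimal sketch, again assuming a GAIN-style formulation in which extra supervision is a pixel-wise squared error between the model's attention map and a person segmentation mask, gated by the flag. The mask source, the weights alpha and beta, and the function names are hypothetical.

```python
import numpy as np


def extra_supervision_loss(attention, person_mask):
    """Mean squared error between an (H, W) attention map and a binary (H, W) person mask."""
    return float(np.mean((attention - person_mask) ** 2))


def total_loss(caption_loss, guidance_loss, attention, person_mask,
               supervised_training=False, alpha=1.0, beta=1.0):
    """Captioning loss + weighted self-guidance loss (+ mask supervision if enabled)."""
    loss = caption_loss + alpha * guidance_loss
    if supervised_training:
        loss += beta * extra_supervision_loss(attention, person_mask)
    return loss


attention = np.array([[1.0, 0.0], [0.5, 0.0]])
person_mask = np.array([[1.0, 0.0], [1.0, 0.0]])
print(total_loss(2.0, 0.5, attention, person_mask))                            # 2.5
print(total_loss(2.0, 0.5, attention, person_mask, supervised_training=True))  # 2.5625
```

With the flag off, the attention map is shaped only by self-guidance; with it on, the mask term directly pulls attention toward the person region, which is one way extra supervision could speed up the self-exploration described earlier.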

Evaluation of Caption Quality and Gender Accuracy


This code evaluates the model's caption quality on the original test dataset and its gender prediction accuracy on COCO-GB v1 or COCO-GB v2.

Show Qualitative Results


This code visualizes the attention maps for input images.
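A minimal sketch of how a flat vector of attention weights can be turned into a visualization, assuming 196 weights on a 14 × 14 grid and a square RGB image. It uses nearest-neighbour upsampling and a red tint in plain NumPy; the repository's own script may use smoother interpolation and a colormap instead. Function names and the blending strength are hypothetical.

```python
import numpy as np


def upsample_nearest(att, size):
    """Nearest-neighbour upsampling of a square (g, g) map to (size, size)."""
    idx = (np.arange(size) * att.shape[0]) // size
    return att[np.ix_(idx, idx)]


def overlay_attention(image, alpha, grid=14, strength=0.6):
    """Tint attended pixels red.

    image: (S, S, 3) uint8 RGB image; alpha: (grid * grid,) attention weights.
    """
    att = alpha.reshape(grid, grid).astype(np.float64)
    att = (att - att.min()) / (att.max() - att.min() + 1e-8)  # rescale to [0, 1]
    att = upsample_nearest(att, image.shape[0])
    heat = np.zeros(image.shape, dtype=np.float64)
    heat[..., 0] = 255.0 * att  # red channel carries the attention
    w = strength * att[..., None]  # per-pixel blending weight
    blended = (1.0 - w) * image + w * heat
    return np.clip(blended, 0, 255).astype(np.uint8)


image = np.full((256, 256, 3), 128, dtype=np.uint8)
alpha = np.zeros(196)
alpha[0] = 1.0  # all attention on the top-left grid cell
vis = overlay_attention(image, alpha)
```

Rescaling each map to [0, 1] before blending makes maps from different words or images visually comparable, at the cost of hiding differences in absolute attention mass.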