Online Images Amplify Gender Bias

This repository contains all of the replication materials for Guilbeault et al. (2024), "Online Images Amplify Gender Bias": https://www.nature.com/articles/s41586-024-07068-x

This repository contains:
- An R script for replicating all main and supplementary analyses from the raw metadata ("master script.R").
- The raw image data collected and coded from Google and Wikipedia (see below for details).
- The raw data for all experiments (see below for details).
- Python code for training a skip-gram model from scratch ("word2vec_training.ipynb").
- The word2vec model we retrained on a recent sample of online news ("word2vec-retrained.model").
- A Python notebook that demonstrates how to extract a gender dimension from a word embedding model and position social categories along this dimension, applied to our retrained word2vec model ("gender_parity_retrained_word2vec.ipynb").
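For readers unfamiliar with the idea of a gender dimension, here is a minimal sketch of one common approach: average the normalized difference vectors of female/male anchor word pairs, then score each social category by its cosine similarity to that axis. This is an illustration only and is not necessarily the exact procedure used in the notebook. The toy vectors, the anchor pairs, and the function names `gender_dimension` and `gender_score` are all assumptions made for this example. With the retrained model you would presumably load it with gensim (`Word2Vec.load`) and look up vectors with `model.wv[word]` instead of using the toy dictionary.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def gender_dimension(vectors, pairs):
    """Average the normalized (female - male) difference vectors.

    `pairs` is a list of (female_word, male_word) anchors. The anchor
    words used here are illustrative, not the list used in the paper.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for female, male in pairs:
        f = normalize(vectors[female])
        m = normalize(vectors[male])
        diff = normalize([a - b for a, b in zip(f, m)])
        total = [t + d for t, d in zip(total, diff)]
    return normalize(total)


def gender_score(vectors, word, axis):
    """Cosine similarity between a word and the gender axis.

    Positive values lean toward the female pole, negative values
    toward the male pole.
    """
    w = normalize(vectors[word])
    return sum(a * b for a, b in zip(w, axis))


# Toy 3-dimensional embeddings (hypothetical values for illustration).
toy_vectors = {
    "woman": [0.9, 0.1, 0.3],
    "man": [-0.9, 0.1, 0.3],
    "she": [0.8, 0.2, 0.1],
    "he": [-0.8, 0.2, 0.1],
    "nurse": [0.5, 0.7, 0.2],
    "carpenter": [-0.4, 0.6, 0.3],
}

anchor_pairs = [("woman", "man"), ("she", "he")]
axis = gender_dimension(toy_vectors, anchor_pairs)
scores = {w: gender_score(toy_vectors, w, axis) for w in ("nurse", "carpenter")}

for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

In this toy setup the sign of the score indicates which pole a category sits closer to, and the magnitude indicates how strongly.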

Notes:

1.0. The datasets used in this study are too large to upload as CSV files to this repository, so they can be downloaded from the following Google Drive links (you will need to update the file paths in the replication R script so that it loads the data from your local computer):

Rothe, Rasmus, Radu Timofte, and Luc Van Gool. “Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks.” International Journal of Computer Vision 126, no. 2 (April 1, 2018): 144–57. https://doi.org/10.1007/s11263-016-0940-3.

1.2. The data associated with the Wikipedia-based Image Text Dataset is from the following paper:

Srinivasan, Krishna, Karthik Raman, Jiecao Chen, Michael Bendersky, and Marc Najork. “WIT: Wikipedia-Based Image Text Dataset for Multimodal Multilingual Machine Learning.” In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2443–49. New York, NY, USA: Association for Computing Machinery, 2021. https://doi.org/10.1145/3404835.3463257.

The WIT data did not originally contain the age and gender classifications. Our dataset includes the gender and social category classifications for images from this dataset, which we collected through human crowdsourcing on MTurk (see the full paper for methodological details).
