# Exploring and Mitigating Gender Bias in Word Embeddings

## 🛠️ Setup

In [None]:
!pip install gensim

In [None]:
import os
import re
import sys
import csv
import string
import unicodedata

import numpy             as np
import pandas            as pd

import matplotlib.colors as colors
import matplotlib.pyplot as plt
import seaborn           as sns

from matplotlib.pyplot import figure

import gensim
from gensim.models import KeyedVectors

from numpy.linalg import norm

In [None]:
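# matplotlib >= 3.6 renamed 'seaborn-pastel' to 'seaborn-v0_8-pastel'
plt.style.use('seaborn-v0_8-pastel' if 'seaborn-v0_8-pastel' in plt.style.available else 'seaborn-pastel')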

In [None]:
# Load the Drive helper and mount
from google.colab import drive
drive.mount('/content/drive')

## Debiasing Static Word Embeddings

### Word Embeddings

***What are Word Embeddings?***

**Word embeddings** represent words as points in a continuous vector space, arranged so that semantically similar words are mapped to nearby vectors. Because words become vectors, mathematical operations such as addition and subtraction can be applied to them. Word embeddings are widely used in natural language processing tasks such as machine translation and text classification, since they turn discrete words into numerical input that machine learning models can consume. There are several ways to generate them, including prediction-based neural approaches such as word2vec, count-based methods that factorize word co-occurrence statistics such as Latent Semantic Analysis (LSA), and GloVe, which fits a log-bilinear model to global co-occurrence counts.
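"Nearby" is usually measured with cosine similarity, the cosine of the angle between two vectors. A minimal sketch of a nearest-neighbour lookup, using three hypothetical 3-dimensional toy vectors (not values from the GoogleNews model, whose vectors have 300 dimensions); the helper names `cosine_similarity` and `nearest` are illustrative:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(query, vocab, exclude=()):
    """Return the vocabulary word whose vector is most cosine-similar to `query`."""
    candidates = {w: v for w, v in vocab.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine_similarity(query, candidates[w]))

# Hypothetical toy vectors: "doctor" and "physician" point in similar directions.
toy = {
    "doctor":    np.array([0.9, 0.1, 0.3]),
    "physician": np.array([0.8, 0.2, 0.4]),
    "banana":    np.array([0.1, 0.9, 0.0]),
}

print(nearest(toy["doctor"], toy, exclude={"doctor"}))
```

Cosine similarity ignores vector length, so only the direction of each embedding matters for "nearness".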

In [None]:
# load pretrained model
model = KeyedVectors.load_word2vec_format('/content/drive/MyDrive/MT/Pretrained Models/archive.zip (Unzipped Files)/GoogleNews-vectors-negative300.bin', binary=True)

In [None]:
# vocabulary size (gensim >= 4.0; on gensim 3.x use model.vocab)
w2v_vocabulary = model.key_to_index
len(w2v_vocabulary)

3000000

***Problem statement:*** The word2vec vocabulary contains gender-neutral English terms such as "doctor", yet the vectors learned from the Google News corpus encode the gender stereotypes present in that text.

### Identifying Gender Bias

#### Probing Gender Analogies

In [None]:
model.most_similar(positive=['mother', 'male'],
                   negative=['female'])

[('stepfather', 0.7652485370635986),
 ('father', 0.7571325898170471),
 ('grandmother', 0.7490994930267334),
 ('aunt', 0.7424759864807129),
 ('daughter', 0.7276815176010132),
 ('son', 0.7222350239753723),
 ('stepmother', 0.693554162979126),
 ('siblings', 0.6783632040023804),
 ('maternal_grandmother', 0.6747703552246094),
 ('niece', 0.6711771488189697)]
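`most_similar(positive=..., negative=...)` answers an analogy with vector arithmetic. The sketch below re-implements the idea on toy data rather than calling gensim: it adds the unit-normalized positive vectors, subtracts the unit-normalized negative ones, and ranks the remaining words by cosine similarity to the result, skipping the query words themselves (gensim likewise leaves input words out of its results). The vectors and the helpers `unit` and `analogy` are hypothetical:

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)

def analogy(vocab, positive, negative, topn=3):
    """Rank vocabulary words by cosine similarity to sum(positive) - sum(negative)."""
    query = unit(sum(unit(vocab[w]) for w in positive) - sum(unit(vocab[w]) for w in negative))
    excluded = set(positive) | set(negative)
    scores = {w: float(unit(v) @ query) for w, v in vocab.items() if w not in excluded}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:topn]

# Hypothetical toy vectors with dimensions (parent, gender: + female / - male, other).
toy = {
    "mother": np.array([1.0,  1.0, 0.2]),
    "father": np.array([1.0, -1.0, 0.2]),
    "female": np.array([0.0,  1.0, 0.1]),
    "male":   np.array([0.0, -1.0, 0.1]),
    "tree":   np.array([-1.0, 0.0, 1.0]),
}

print(analogy(toy, positive=["mother", "male"], negative=["female"]))
```

Because input words are excluded, the analogy can never return "mother" itself, which is why related gendered words dominate the real outputs above.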

In [None]:
model.most_similar(positive=['scientist', 'female'],
                   negative=['male'])

[('researcher', 0.6796072125434875),
 ('physicist', 0.6226291060447693),
 ('microbiologist', 0.5891815423965454),
 ('biochemist', 0.5856112837791443),
 ('geneticist', 0.579893171787262),
 ('biologist', 0.5766334533691406),
 ('professor', 0.5546311140060425),
 ('molecular_biologist', 0.5460183620452881),
 ('geochemist', 0.5431622266769409),
 ('ecologist', 0.5383110046386719)]

In [None]:
model.most_similar(positive=['scientist', 'she'],
                   negative=['he'])

[('researcher', 0.6531404256820679),
 ('geneticist', 0.52978515625),
 ('biologist', 0.5268193483352661),
 ('physicist', 0.5161564350128174),
 ('Researcher', 0.5157882571220398),
 ('doctoral_student', 0.5141373872756958),
 ('biochemist', 0.5123931169509888),
 ('professor', 0.507071316242218),
 ('microbiologist', 0.5021166801452637),
 ('marine_biologist', 0.49474790692329407)]

In [None]:
model.most_similar(positive=['doctor', 'she'],
                   negative=['he'])

[('nurse', 0.6588720679283142),
 ('gynecologist', 0.6471721529960632),
 ('nurse_practitioner', 0.6255377531051636),
 ('midwife', 0.600278377532959),
 ('pediatrician', 0.5921323299407959),
 ('dermatologist', 0.5582225322723389),
 ('ob_gyn', 0.5563921928405762),
 ('pharmacist', 0.5559877753257751),
 ('doctors', 0.5544068217277527),
 ('nurse_midwife', 0.554105281829834)]

### Solution:

#### Identify gender subspace

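First, we define gender-specific terms (English words that are gendered by definition, such as "she" and "he"), group them into female/male pairs, and take the difference between the vectors in each pair. The Singular Value Decomposition (SVD) of these difference vectors then gives the bias subspace: the top right singular vector(s) point along the dominant gender direction. As a simplification, this notebook uses a single difference vector, `she - he`, as the bias direction b.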
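The SVD variant described above can be sketched as follows. This is a minimal illustration, not the notebook's own method: the helper `gender_direction` is hypothetical, and it is checked on synthetic vectors in which every pair differs mainly along a known axis, so the recovered direction should line up with that axis:

```python
import numpy as np

def gender_direction(vectors, pairs, k=1):
    """Estimate a k-dimensional gender subspace from (female, male) word pairs via SVD."""
    # One row per pair: vector(female word) - vector(male word).
    diffs = np.stack([vectors[f] - vectors[m] for f, m in pairs])
    # Rows of vt are orthonormal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    directions = vt[:k].copy()
    # Singular vectors have an arbitrary sign; orient the first toward the female word of pair 0.
    if directions[0] @ diffs[0] < 0:
        directions[0] = -directions[0]
    return directions

# Synthetic check: each toy pair differs along the first axis, plus small noise.
rng = np.random.default_rng(0)
dim = 50
axis = np.zeros(dim)
axis[0] = 1.0
pairs = [("she", "he"), ("woman", "man"), ("girl", "boy"), ("female", "male")]
toy = {}
for f, m in pairs:
    base = rng.normal(size=dim)
    toy[f] = base + axis + 0.01 * rng.normal(size=dim)
    toy[m] = base - axis + 0.01 * rng.normal(size=dim)

b = gender_direction(toy, pairs)[0]
print(round(float(b @ axis), 3))
```

With the notebook's `model`, the same helper could presumably be called as `gender_direction(model, [("she", "he"), ("woman", "man"), ("girl", "boy"), ("female", "male")])`, since `KeyedVectors` supports `model[word]` lookups; `k > 1` returns a multi-dimensional subspace.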

In [None]:
# words that are gendered by definition
gender_specific_words = ['boy', 'man', 'girl', 'woman', 'male', 'female', 'she', 'he']

In [None]:
b1 = model.get_vector('she') - model.get_vector('he')

In [None]:
bias_direction = b1

#### Hard de-biasing

##### Neutralize

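Given the bias direction b, we remove the gender component from each embedding w by subtracting its projection onto b, i.e. replacing w with w - (w·b / ||b||²) b. The method intends this for gender-neutral terms only; as a simplification, the cells below apply it to every vector in the vocabulary, including the gender-specific words.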
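The neutralize step is an orthogonal projection. A minimal NumPy sketch on a hypothetical 5-word, 4-dimensional matrix `E` and a hypothetical direction `b` (neither taken from the GoogleNews model), processing all rows at once with `np.outer`; the helper name `neutralize` is illustrative:

```python
import numpy as np

def neutralize(embedding, b):
    """Remove each row's component along direction b: w - (w·b / ||b||²) b."""
    coeffs = (embedding @ b) / (b @ b)       # one projection coefficient per row, shape (n_words,)
    return embedding - np.outer(coeffs, b)   # shape (n_words, dim)

rng = np.random.default_rng(1)
E = rng.normal(size=(5, 4))                  # 5 hypothetical words, 4 dimensions
b = np.array([1.0, 2.0, 0.0, -1.0])          # hypothetical bias direction (need not be unit length)
E_neutral = neutralize(E, b)
print(np.round(E_neutral @ b, 10))           # every row is now orthogonal to b
```

Dividing by `b @ b` means `b` does not have to be normalized, and any component of a word vector that is orthogonal to `b` is left untouched.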

In [None]:
embedding = model.vectors

In [None]:
# subtract each row's projection onto b, w - (w·b / ||b||²) b, for all rows at once
model_debiased_embedding = embedding - np.outer((embedding @ bias_direction) / norm(bias_direction) ** 2, bias_direction)

In [None]:
model_debiased_embedding.shape

(3000000, 300)

##### Equalize

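The final step, equalize, handles pairs of gender-specific words such as "boy" and "girl". It moves both words so that they share the same bias-free component and differ only along the bias direction, by equal and opposite amounts; as a result, both are equidistant from every neutralized word, and "boy" ends up exactly as masculine as "girl" is feminine.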
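A minimal sketch of equalizing a single pair, assuming unit-length word vectors (the helper `equalize` and all toy vectors are hypothetical, not from the GoogleNews model). After the call, a word with no component along `b` has the same dot product with both members of the pair:

```python
import numpy as np

def equalize(w1, w2, b):
    """Equalize a gender-specific pair (w1, w2) about the direction b.

    Assumes w1 and w2 are unit-length. Returns two unit vectors that share the
    same bias-free part nu and differ only along b, by equal and opposite amounts.
    """
    b = b / np.linalg.norm(b)
    mu = (w1 + w2) / 2
    nu = mu - (mu @ b) * b                       # mean with its bias component removed
    scale = np.sqrt(max(0.0, 1.0 - nu @ nu))     # offset along b that restores unit length
    sign = 1.0 if (w1 - w2) @ b >= 0 else -1.0   # w1 stays on its original side of b
    return nu + sign * scale * b, nu - sign * scale * b

rng = np.random.default_rng(2)
b = np.array([1.0, 0.0, 0.0, 0.0])          # hypothetical gender direction
girl = rng.normal(size=4)
girl /= np.linalg.norm(girl)                # hypothetical unit vectors for a gendered pair
boy = rng.normal(size=4)
boy /= np.linalg.norm(boy)
neutral = np.array([0.0, 0.3, -0.5, 0.8])   # hypothetical neutralized word (no component along b)

new_girl, new_boy = equalize(girl, boy, b)
print(neutral @ new_girl, neutral @ new_boy)  # equal similarities
```

The square-root term is what keeps both outputs at unit length: `nu` is orthogonal to `b`, so the squared norm of each result is `nu @ nu + scale ** 2 = 1`.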

In [None]:
equalize_pairs = [("estrogen", "testosterone")]

In [None]:
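# unit-length bias direction (the formulas below assume ||b|| = 1)
b = bias_direction / norm(bias_direction)

for (word1, word2) in equalize_pairs:
  # row indices in the embedding matrix (gensim >= 4.0; on gensim 3.x use model.vocab[word].index)
  idx1, idx2 = model.key_to_index[word1], model.key_to_index[word2]
  # start from the original unit-normalized vectors, since neutralization removed their gender component
  embedding_word1 = embedding[idx1] / norm(embedding[idx1])
  embedding_word2 = embedding[idx2] / norm(embedding[idx2])
  mean_embeddings = (embedding_word1 + embedding_word2) / 2.

  # nu: the mean of the pair with its component along b removed
  nu = mean_embeddings - (mean_embeddings @ b) * b
  # offset along b that gives both equalized vectors unit length
  scale = np.sqrt(max(0., 1 - norm(nu) ** 2))
  # keep word1 on the side of b it originally occupied
  if np.dot(embedding_word1 - embedding_word2, b) < 0:
    scale = -scale

  # write the equalized pair back into the debiased matrix
  model_debiased_embedding[idx1] = nu + scale * b
  model_debiased_embedding[idx2] = nu - scale * b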