# Workshop // Exploring Gender Bias in Word Embedding

## https://learn.responsibly.ai/word-embedding

Powered by [`responsibly`](https://docs.responsibly.ai/) - Toolkit for auditing and mitigating bias and fairness of machine learning systems 🔎🤖🧰

# Part Eleven: Your Turn! SOLUTIONS
<big>⌨️</big>

Note: The first two tasks require a basic background in Python programming. For the last task, you need some experience with Machine Learning and Natural Language Processing (NLP) as well.

In [None]:
from responsibly.we import load_w2v_small

w2v_small = load_w2v_small()

## Task 1 - Racial bias

Let's explore racial bias using Tolga's approach. We will use the [`responsibly.we.BiasWordEmbedding`](http://docs.responsibly.ai/word-embedding-bias.html#ethically.we.bias.BiasWordEmbedding) class. `GenderBiasWE` is a sub-class of `BiasWordEmbedding`.

In [None]:
from responsibly.we import BiasWordEmbedding

w2v_small_racial_bias = BiasWordEmbedding(w2v_small, only_lower=True)

💎💎💎 Identify the racial direction using the `sum` method
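
As a rough illustration of what a `sum`-style direction could look like (this is our reading of the approach, not the actual code of `responsibly`): sum the vectors of each group, normalize each sum, and take the normalized difference. The toy 3-dimensional vectors and the helper `sum_direction` are hypothetical.

```python
import numpy as np


def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def sum_direction(group1_vectors, group2_vectors):
    """Hypothetical helper: unit direction pointing from group 2 to group 1."""
    group1_sum = np.sum(group1_vectors, axis=0)
    group2_sum = np.sum(group2_vectors, axis=0)
    return normalize(normalize(group1_sum) - normalize(group2_sum))


# Toy 3-d "embeddings" standing in for the two lists of names
group1 = np.array([[1.0, 0.2, 0.0], [0.9, 0.1, 0.1]])
group2 = np.array([[0.1, 1.0, 0.0], [0.2, 0.8, 0.1]])

direction = sum_direction(group1, group2)
print(direction, np.linalg.norm(direction))
```

In the real task, the library looks up the name vectors in the word embedding for you.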

In [None]:
white_common_names = ['Emily', 'Anne', 'Jill', 'Allison', 'Laurie', 'Sarah', 'Meredith', 'Carrie',
                      'Kristen', 'Todd', 'Neil', 'Geoffrey', 'Brett', 'Brendan', 'Greg', 'Matthew',
                      'Jay', 'Brad']

black_common_names = ['Aisha', 'Keisha', 'Tamika', 'Lakisha', 'Tanisha', 'Latoya', 'Kenya', 'Latonya',
                      'Ebony', 'Rasheed', 'Tremayne', 'Kareem', 'Darnell', 'Tyrone', 'Hakim', 'Jamal',
                      'Leroy', 'Jermaine']

w2v_small_racial_bias._identify_direction('Whites', 'Blacks',
                                          definitional=(white_common_names, black_common_names),
                                          method='sum')

Use the neutral profession names to measure the racial bias

In [None]:
from responsibly.we.data import BOLUKBASI_DATA

neutral_profession_names = BOLUKBASI_DATA['gender']['neutral_profession_names']

In [None]:
neutral_profession_names[:10]

In [None]:
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_racial_bias.plot_projection_scores(neutral_profession_names, n_extreme=20, ax=ax);

Calculate the direct bias measure
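
For intuition, the direct bias measure of Bolukbasi et al. is the average of |cos(w, direction)|^c over the neutral words. Below is a minimal NumPy sketch on toy 2-dimensional vectors; the helper names are hypothetical, and we assume `c=1`, which may differ from the default that `calc_direct_bias` uses.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def direct_bias(word_vectors, direction, c=1):
    """Mean of |cos(w, direction)|^c over the given word vectors."""
    return float(np.mean([abs(cosine(w, direction)) ** c
                          for w in word_vectors]))


direction = np.array([1.0, 0.0])
neutral = np.array([[0.0, 2.0],    # orthogonal to the direction
                    [3.0, 0.0]])   # parallel to the direction

print(direct_bias(neutral, direction))  # 0.5
```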

In [None]:
w2v_small_racial_bias.calc_direct_bias(neutral_profession_names)

## Task 2 - Your WEAT test

Open the [word embedding demo page in the `responsibly` documentation](http://docs.responsibly.ai/notebooks/demo-word-embedding-bias.html#it-is-possible-also-to-expirements-with-new-target-word-sets-as-in-this-example-citizen-immigrant), and look at how the function `calc_weat_pleasant_unpleasant_attribute` is used. What was that experiment trying to test? What was the result? Can you come up with other experiments?
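
To make the WEAT score less of a black box, here is a minimal sketch of the effect size as defined by Caliskan et al., computed on toy 2-dimensional vectors. The helper names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is our assumption; the library's implementation may differ in such details.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, attr_a, attr_b):
    """s(w, A, B): mean similarity to A minus mean similarity to B."""
    return (np.mean([cosine(w, a) for a in attr_a])
            - np.mean([cosine(w, b) for b in attr_b]))


def weat_effect_size(target_x, target_y, attr_a, attr_b):
    """Difference of mean associations, scaled by the overall spread."""
    s_x = [association(x, attr_a, attr_b) for x in target_x]
    s_y = [association(y, attr_a, attr_b) for y in target_y]
    return float((np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1))


# Toy vectors: X leans towards attribute A, Y leans towards attribute B
X = np.array([[1.0, 0.1], [0.9, 0.2]])
Y = np.array([[0.1, 1.0], [0.2, 0.9]])
A = np.array([[1.0, 0.0]])
B = np.array([[0.0, 1.0]])

print(weat_effect_size(X, Y, A, B))
```

A positive value means the first target set is more associated with the first attribute set than the second target set is.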

In [None]:
from responsibly.we import calc_weat_pleasant_unpleasant_attribute

In [None]:
targets = {'first_target': {'name': 'White common names',
                            'words': white_common_names},
          'second_target': {'name': 'Black common names',
                            'words': black_common_names}}

calc_weat_pleasant_unpleasant_attribute(w2v_small, **targets,
                                        pvalue_kwargs={'method': 'approximate'})

## Task 3 - Sentiment Analysis

#### Notes:
1. This task requires some background with NLP, particularly with training a text classifier in Python.
2. Our goal is to learn how word embeddings might affect a downstream application from a gender bias perspective. Because the focus is on learning, we won't follow the best practices in NLP or use the most advanced techniques.

One way to examine bias in word embeddings is through a downstream application. Here we will use a sentiment analysis classifier of tweets: given a tweet, the system infers the *valence* of the sentiment expressed in it. The valence is a real number between 0 and 1, where 0 is the negative end and 1 is the positive end.

The system is going to be rather simple and consists of three components:

1. Preprocessing (e.g., removing stopwords and punctuation, [tokenization](https://en.wikipedia.org/wiki/Text_segmentation#Word_segmentation))
2. Transforming each tweet's tokens into a single 300-dimensional vector.
3. Applying logistic regression to predict the valence.

Our goal is to assess how the word embedding, in its original version and in its neutralize-"debiased" version, affects the bias of the system. We are going to build two versions of the system, each using one of the two word embeddings, and compare their performance on the [Equity Evaluation Corpus (EEC)](http://saifmohammad.com/WebPages/Biases-SA.html), which is designed to assess gender bias in sentiment analysis systems.

**Reference:**
Kiritchenko, S., & Mohammad, S. M. (2018). [Examining gender and race bias in two hundred sentiment analysis systems](https://arxiv.org/pdf/1805.04508.pdf). arXiv preprint arXiv:1805.04508.

### Data

First, let's load the "Affect in Tweets" datasets, taken from the [SemEval 2018](https://competitions.codalab.org/competitions/17751#learn_the_details-datasets) competition. We have training, development, and test datasets. We will use only the first and the last, but feel free to use the development dataset to select models and tune hyperparameters with cross-validation.

There are three columns:

1. `Tweet` - The tweet itself as a string, the input.
2. `Intensity Score` - The valence of the tweet's sentiment in the range [0, 1], the output.
3. `Affect Dimension` - You can ignore it. It is `'valence'` for all of the data points.


In [None]:
import pandas as pd


train_df = pd.read_csv('../data/SemEval2018-Task1-all-data/English/V-reg/2018-Valence-reg-En-train.txt',
                       sep='\t', index_col=0)
dev_df = pd.read_csv('../data/SemEval2018-Task1-all-data/English/V-reg/2018-Valence-reg-En-dev.txt',
                       sep='\t', index_col=0)
test_df = pd.read_csv('../data/SemEval2018-Task1-all-data/English/V-reg/2018-Valence-reg-En-test-gold.txt',
                       sep='\t', index_col=0)

In [None]:
# A few examples

train_df.head()

In [None]:
# Convert all the labels from real numbers into boolean values,
# setting the threshold at 0.5, and creating a new column named
# `label`

train_df['label'] = train_df['Intensity Score'] > 0.5
dev_df['label'] = dev_df['Intensity Score'] > 0.5
test_df['label'] = test_df['Intensity Score'] > 0.5

Now, let's download the **complete** word2vec word embedding (which is not restricted to lowercase words) and load it using the `gensim` Python package.

In [None]:
!wget https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz

In [None]:
from gensim.models import KeyedVectors

# Limit vocabulary to top-500K most frequent words
VOCAB_SIZE = 500000

# Load the word2vec
w2v_model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz',
                                              binary=True,
                                              limit=VOCAB_SIZE)

In [None]:
# Get the vector embedding for a word
w2v_model['home']

In [None]:
# Check whether there is an embedding for a word
'bazinga' in w2v_model

### Preprocessing & feature extraction

Before we transform a tweet into a vector of 300 dimensions, it should be broken into tokens ("words") and be cleaned. You can do that with various Python packages for NLP, such as [NLTK](https://www.nltk.org/) and 
[spaCy](https://spacy.io/). Feel free to use them if you would like to! We will use the basic preprocessing functionality that comes with [`gensim`](https://radimrehurek.com/gensim/parsing/preprocessing.html).

In [None]:
from gensim.parsing.preprocessing import (preprocess_string,
                                          strip_tags,
                                          strip_punctuation,
                                          strip_multiple_whitespaces,
                                          strip_numeric,
                                          remove_stopwords)


# We pick a subset of the default filters,
# in particular, we do not take
# strip_short() and stem_text().
FILTERS = [strip_punctuation,
           strip_tags,
           strip_multiple_whitespaces,
           strip_numeric,
           remove_stopwords]

# See how the sentence is transformed into tokens (words)
preprocess_string('This is a "short" text!', FILTERS)

After preprocessing all the tweets, we get tokens. We transform each token into a 300d vector using the word embedding and then compute the *average* vector. It will have 300 dimensions as well. This vector serves as the values of the features for each tweet. 

Note these two possible pitfalls:

1. Make sure that the token exists in the word embedding.
2. Sometimes, there are tweets without any token found in the word embedding. Discard these tweets from the data. Keep in mind that you should discard the labels as well.

Write the function `generate_text_features(text, w2v)` that takes a string `text` and a word embedding `w2v` and produces the features of the text using the method described above. The function should return a NumPy array of length 300.

In [None]:
### SOLUTION ###

import numpy as np


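def generate_text_features(text, w2v):
    preprocessed_text = preprocess_string(text, FILTERS)
    # Keep only the tokens that exist in the word embedding
    vectors = [w2v[token] for token in preprocessed_text
               if token in w2v]
    # No token found: return NaN so the tweet can be discarded later
    if not vectors:
        return np.nan
    return np.mean(vectors, axis=0)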

Now, use this function to produce the features for all three datasets (training, validation, test).

In [None]:
### SOLUTION ###

def generate_dataset_features(df, w2v, text_col, label_col=None):
    features = df[text_col].apply(lambda t: generate_text_features(t, w2v))

    # Texts without any known token yield NaN instead of a vector
    na_mask = features.isna()
    features = features[~na_mask]
    X = np.stack(features)

    # Drop the matching labels and binarize the scores by rounding
    y = (df[label_col][~na_mask].round().astype(int)
         if label_col else None)

    return X, y


X_train, y_train = generate_dataset_features(train_df, w2v_model,
                                             'Tweet', 'Intensity Score')
X_dev, y_dev = generate_dataset_features(dev_df, w2v_model,
                                         'Tweet', 'Intensity Score')
X_test, y_test = generate_dataset_features(test_df, w2v_model,
                                           'Tweet', 'Intensity Score')

### Training a classifier

The next step is straightforward: train a logistic regression model on the dataset. Report the accuracy on the training and test datasets.

We recommend using [`sklearn.linear_model.LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).

In [None]:
### SOLUTION ###

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression().fit(X_train, y_train)
print(clf.score(X_train, y_train), clf.score(X_test, y_test))

### Evaluate gender bias in the downstream application

The **Equity Evaluation Corpus (EEC)** consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders.

We focus on the sentences related to gender. Every sentence is built out of three elements:

1. Person (e.g., `he`, `this woman`, `my uncle`, `my mother`)
2. Emotion word (e.g., `anger`, `happy`, `gloomy`, `amazing`)
3. Template (e.g., `<person subject> feels <emotion word>`).

These are combined to form a sentence, for example:
* he feels anger
* she feels anger
* this woman feels happy
* this man feels happy

Thanks to this systematic construction from templates, the sentences are paired by gender, i.e., the EEC data is built of pairs of sentences that are identical except for a gendered noun or pronoun (e.g., `she`-`he`, `my mother`-`my father`). For sentiment analysis, there is no reason for a system to assign different predictions to the paired sentences! So if we find such a difference, it could point to a potential gender bias in the downstream application.
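
The pairing can be pictured with a tiny sketch that fills one template with matched person words. The template string and the word lists below are illustrative stand-ins, not the actual EEC construction code.

```python
# Hypothetical template and word lists, mirroring the examples above
TEMPLATE = '{person} feels {emotion}'

PERSON_PAIRS = [('she', 'he'),
                ('this woman', 'this man'),
                ('my mother', 'my father')]
EMOTION_WORDS = ['anger', 'happy']

# Each pair of sentences differs only in the person word
sentence_pairs = [(TEMPLATE.format(person=female, emotion=emotion),
                   TEMPLATE.format(person=male, emotion=emotion))
                  for female, male in PERSON_PAIRS
                  for emotion in EMOTION_WORDS]

for female_sentence, male_sentence in sentence_pairs:
    print(female_sentence, '|', male_sentence)
```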

#### Keep in mind, this is only an operationalization of gender bias in a sentiment analysis system. All the issues with a single concrete measurement arise here as well! We should always take into account the human context in which the system is deployed!


The following cell only prepares the data, and it is not important to understand it; nevertheless, make sure you run it!

In [None]:
# 🛠 Prepare the EEC data, no need to dig into this cell

eec_df = pd.read_csv('../data/Equity-Evaluation-Corpus/Equity-Evaluation-Corpus.csv')

# Remove the sentences for evaluating racial bias
gender_eec_df = eec_df[eec_df['Race'].isna()].copy()

# Create an identifier to match sentence pairs
# The EEC data comes without this matching
MALE_PERSONS = ('he', 'this man', 'this boy', 'my brother', 'my son', 'my husband',
                'my boyfriend', 'my father', 'my uncle', 'my dad', 'him')

FEMALE_PERSONS = ('she', 'this woman', 'this girl', 'my sister', 'my daughter', 'my wife',
                  'my girlfriend', 'my mother', 'my aunt', 'my mom', 'her')

MALE_IDENTIFIER = dict(zip(MALE_PERSONS, FEMALE_PERSONS))
FEMALE_IDENTIFIER = dict(zip(FEMALE_PERSONS, FEMALE_PERSONS))

PERSON_MATCH_WORDS = {**MALE_IDENTIFIER,
                      **FEMALE_IDENTIFIER}

gender_eec_df['PersonIdentifier'] = gender_eec_df['Person'].map(PERSON_MATCH_WORDS)

gender_eec_df = gender_eec_df.sort_values(['Gender', 'Template', 'Emotion word', 'PersonIdentifier'])

gender_split_index = len(gender_eec_df) // 2

# Create two DataFrames, one for the female sentences and one for the male sentences
female_eec_df = gender_eec_df[:gender_split_index].reset_index(drop=False)
male_eec_df = gender_eec_df[gender_split_index:].reset_index(drop=False)

In [None]:
female_eec_df.head()

In [None]:
male_eec_df.head()

Note that the two DataFrames are paired by index. If we take the *k*-th row of each one, the two sentences differ only in the matched person word:

In [None]:
k = 543  # change my value and run the cell again!
female_eec_df.iloc[k]['Sentence'], male_eec_df.iloc[k]['Sentence']

Compute the probability estimations of the classifier for the female and male parts in the EEC data. If you use `sklearn`, then the classifier's method `predict_proba` is your friend for that!

In [None]:
### SOLUTION ###

X_male_eec, _ = generate_dataset_features(male_eec_df, w2v_model, 'Sentence')
X_female_eec, _ = generate_dataset_features(female_eec_df, w2v_model, 'Sentence')

male_eec_df['probs_orig'] = clf.predict_proba(X_male_eec)[:, 1]
female_eec_df['probs_orig'] = clf.predict_proba(X_female_eec)[:, 1]

### Do the same for the neutralize-"debiased" word2vec

Repeat all the previous steps with the neutralize-"debiased" word2vec to produce the probability estimates for the EEC data from a classifier that uses that word embedding.

#### Neutralize-"debias" the word embedding

Hints:
1. Use [`responsibly.we.GenderBiasWE`](https://docs.responsibly.ai/word-embedding-bias.html). 
2. Look for the method `debias`.
3. Set the `method` argument to `'neutralize'`. 
4. Make sure that you set `inplace=True` to save memory. Note that you won't be able to work with the original word embedding after that.
5. Validate that the neutralize-"debias" was applied by computing the direct bias measure with the method `calc_direct_bias`.
6. After the bias mitigation, the word embedding itself (as a `gensim` `KeyedVectors`) is accessible through the attribute `model`.
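
Geometrically, neutralizing a word vector means removing its component along the bias direction and rescaling it to unit length. The sketch below shows this on a toy 3-dimensional vector; the function `neutralize` is a hypothetical helper for illustration, not the `debias` method of the library.

```python
import numpy as np


def neutralize(vector, direction):
    """Remove the component along `direction`, then rescale to unit length."""
    direction = direction / np.linalg.norm(direction)
    projection = np.dot(vector, direction) * direction
    neutralized = vector - projection
    return neutralized / np.linalg.norm(neutralized)


bias_direction = np.array([1.0, 0.0, 0.0])
word_vector = np.array([0.6, 0.8, 0.0])

neutralized = neutralize(word_vector, bias_direction)
print(neutralized)
# The neutralized vector has no component left along the bias direction
print(np.dot(neutralized, bias_direction))
```

This is also why the direct bias of the neutral words should drop to (nearly) zero after the step.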

In [None]:
### SOLUTION ###

from responsibly.we import GenderBiasWE

gbwe = GenderBiasWE(w2v_model)
gbwe.debias('neutralize', inplace=True)
print(gbwe.calc_direct_bias())
w2v_db_model = gbwe.model

#### Generate features with the "debiased" word embedding and train a new classifier

Check the classifier's accuracy on the training and the test data - did the "debiasing" of the word embeddings hurt the classifier performance?

In [None]:
### SOLUTION ###

X_db_train, y_db_train = generate_dataset_features(train_df, w2v_db_model,
                                                   'Tweet', 'Intensity Score')
X_db_dev, y_db_dev = generate_dataset_features(dev_df, w2v_db_model,
                                               'Tweet', 'Intensity Score')
X_db_test, y_db_test = generate_dataset_features(test_df, w2v_db_model,
                                                 'Tweet', 'Intensity Score')


clf_db = LogisticRegression().fit(X_db_train, y_db_train)
print(clf_db.score(X_db_train, y_db_train), clf_db.score(X_db_test, y_db_test))

#### Compute the probability estimations for the male and female sentences in the EEC data with the new classifier

In [None]:
### SOLUTION ###

X_db_male_eec, _ = generate_dataset_features(male_eec_df, w2v_db_model, 'Sentence')
X_db_female_eec, _ = generate_dataset_features(female_eec_df, w2v_db_model, 'Sentence')

male_eec_df['probs_db'] = clf_db.predict_proba(X_db_male_eec)[:, 1]
female_eec_df['probs_db'] = clf_db.predict_proba(X_db_female_eec)[:, 1]

### Gender bias analysis

Now we are ready to put it all together. You have two classifiers, both trained on the same dataset but with different word embeddings. The first used the original word2vec, and the other used the word2vec that had undergone the neutralize-"debias" process. We computed the probability estimates for the EEC data twice, once with each classifier.


**Think about how to evaluate the impact of replacing the word embedding concerning gender bias. Keep in mind that the female and male EEC data is paired!**

#### Your analysis can take two points of view (there are more, but you can start with these):
1. Analyze the difference between the female and male probability estimations for each system *separately* and compare the results.
2. Analyze the difference of differences; start with the difference of probability estimations between the paired female and male sentences for each system, and then compare the two differences.


#### A few possible ideas of what to do:
1. Plot distributions ([`seaborn.displot`](https://seaborn.pydata.org/generated/seaborn.displot.html#seaborn.displot))
2. Compute the [effect size](https://en.wikipedia.org/wiki/Effect_size#Cohen's_d)
3. Perform statistical hypothesis testing to check whether the means are equal using the paired t-test ([`scipy.stats.ttest_rel`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html))
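
For the effect size, one option is Cohen's d. Below is a minimal sketch with two hypothetical helpers: the pooled-standard-deviation form from the linked article, and a paired variant (mean of the differences over their standard deviation), which may suit the paired EEC sentences better. The probabilities are made up for illustration.

```python
import numpy as np


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (two-group form)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (((len(a) - 1) * a.var(ddof=1)
                   + (len(b) - 1) * b.var(ddof=1))
                  / (len(a) + len(b) - 2))
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def cohens_d_paired(a, b):
    """Paired variant: mean of the differences over their std."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff.mean() / diff.std(ddof=1))


# Made-up probabilities for four paired sentences
male_probs = [0.62, 0.55, 0.71, 0.48]
female_probs = [0.60, 0.50, 0.70, 0.49]

print(cohens_d(male_probs, female_probs))
print(cohens_d_paired(male_probs, female_probs))
```

With the real data, you would pass the `probs_orig` (or `probs_db`) columns of the male and female DataFrames instead of the toy lists.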

In [None]:
### SOLUTION ###

import matplotlib.pyplot as plt
import seaborn as sns

diff_orig = male_eec_df['probs_orig'] - female_eec_df['probs_orig']
diff_db = male_eec_df['probs_db'] - female_eec_df['probs_db']


##############################


_, ax = plt.subplots()
sns.kdeplot(male_eec_df['probs_orig'], label='original-male', ax=ax)
sns.kdeplot(female_eec_df['probs_orig'], label='original-female', ax=ax)
sns.kdeplot(male_eec_df['probs_db'], label='db-male', ax=ax)
sns.kdeplot(female_eec_df['probs_db'], label='db-female', ax=ax)
ax.legend()

_, ax = plt.subplots()
sns.ecdfplot(diff_orig, label='original', ax=ax)
sns.ecdfplot(diff_db, label='db', ax=ax)
ax.legend()

_, ax = plt.subplots()
sns.kdeplot(diff_orig, label='original', ax=ax)
sns.kdeplot(diff_db, label='db', ax=ax)
ax.legend()


##############################


from scipy.stats import ttest_rel


print(ttest_rel(male_eec_df['probs_orig'], female_eec_df['probs_orig']))

print(ttest_rel(male_eec_df['probs_db'], female_eec_df['probs_db']))

#### What is your conclusion? What would be your next steps?

Consider:

1. Group the analysis by the EEC columns (e.g., by emotion)
2. Try another classifier (e.g., `sklearn.ensemble.RandomForestClassifier`)
3. Change the bias mitigation method to *hard* instead of *neutralize*.
4. Analyze the training data from a gender perspective
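
As a possible starting point for the first idea, the paired differences can be summarized per group with a pandas `groupby`. The DataFrame below is made up, and the column names (`Emotion`, `probs_male`, `probs_female`) are assumptions for illustration; adapt them to the columns you actually have.

```python
import pandas as pd

# Made-up rows standing in for the paired EEC results
toy_df = pd.DataFrame({
    'Emotion': ['anger', 'anger', 'joy', 'joy'],
    'probs_male': [0.30, 0.34, 0.80, 0.78],
    'probs_female': [0.28, 0.30, 0.81, 0.79],
})

# Paired difference per sentence pair
toy_df['diff'] = toy_df['probs_male'] - toy_df['probs_female']

# Summarize the differences separately for each emotion
summary = toy_df.groupby('Emotion')['diff'].agg(['mean', 'std', 'count'])
print(summary)
```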



Refer to this paper for some ideas:
[Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems](http://saifmohammad.com/WebDocs/EEC/ethics-StarSem-final_with_appendix.pdf). Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of *Sem, New Orleans, LA, USA, June 2018.

#### Finding
The results of the paired t-tests suggest that the "neutralize" method reduced the gender bias in the sentiment analysis system **AS IT IS MEASURED BY THE EEC DATA**.