# Responsible Data Science
# Bias in Word Embeddings

### Powered by [`ethically`](https://docs.ethically.ai/) - Toolkit for Auditing and Mitigating Bias and Fairness of Machine Learning Systems 🔎🤖🔧

## by Shlomi Hod


![](images/banner.png)

### Legend:
# 💎 Important
# 🛠️ Setup/Technical (aka the code is not important)
# 🦄 Out of scope 

![](images/banner.png)

## Install `ethically`

In [None]:
!pip install --user ethically

![](images/banner.png)

## Motivation: Why Learn Word Embeddings?

### One-Hot Representation

![](https://www.tensorflow.org/images/audio-image-text.png)
<small>Source: [Tensorflow Documentation](https://www.tensorflow.org/tutorials/representation/word2vec)</small>
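
To make the limitation of one-hot vectors concrete, here is a minimal sketch with a hypothetical three-word vocabulary (the words and their order are assumptions for illustration). Every word gets its own axis, so any two different words look equally unrelated.

```python
import numpy as np

# Hypothetical toy vocabulary, for illustration only.
vocab = ['cat', 'dog', 'university']
one_hot = np.eye(len(vocab))  # row i is the one-hot vector of vocab[i]

cat, dog, university = one_hot

# Distinct one-hot vectors are orthogonal, so their dot product carries
# no information about how similar the words are.
print(cat @ dog, cat @ university)
```

A dense embedding, in contrast, can place `cat` closer to `dog` than to `university`.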


## 💎 Idea: Embedding a word in an n-dimensional space

### Distributional Hypothesis
> "a word is characterized by the company it keeps" - John Rupert Firth

#### Training (out of scope): using *word-context* relationships from a corpus

### Distance ~ Meaning Similarity

## 🦄 Examples (algorithms and pre-trained models)
- [Word2Vec](https://code.google.com/archive/p/word2vec/)
- [GloVe](https://nlp.stanford.edu/projects/glove/)
- [fastText](https://fasttext.cc/)
- [ELMo](https://allennlp.org/elmo) (contextualized)

## Let's play with Word2Vec word embeddings...!

[Word2Vec](https://code.google.com/archive/p/word2vec/) - Google News - 100B tokens, 3M vocab, cased, 300d vectors - only lowercase vocab extracted

Loaded using the [ethically](http://docs.ethically.ai) package: the function [`ethically.we.load_w2v_small`]() returns a [gensim](https://radimrehurek.com/gensim/) [KeyedVectors](https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.KeyedVectors) object.

In [None]:
# 🛠️ ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
from ethically.we import load_w2v_small

w2v_small = load_w2v_small()

In [None]:
# vocabulary size

len(w2v_small.vocab)

In [None]:
# get the vector of the word "home"

print('home =', w2v_small['home'])

In [None]:
# the word embedding dimension, in this case, is 300

len(w2v_small['home'])

In [None]:
# all the word vectors are normalized (= have norm equal to one)

from numpy.linalg import norm

norm(w2v_small['home'])

## 💎 Demo - Measuring Distance between Words

![](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7e/Sphere_wireframe_10deg_6r.svg/480px-Sphere_wireframe_10deg_6r.svg.png)

🦄 Technical term: [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)

In [None]:
w2v_small['cat'] @ w2v_small['cats']

In [None]:
w2v_small['cat'] @ w2v_small['dog']

In [None]:
w2v_small['cat'] @ w2v_small['cow']

In [None]:
w2v_small['cat'] @ w2v_small['university']
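
The `@` products above equal the cosine similarity only because every vector has norm one. A minimal sketch with hypothetical 3-d vectors (not taken from Word2Vec) shows the general formula and the unit-vector shortcut side by side; `cosine_similarity` is a helper defined here, not a library function.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v, for vectors of any length."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical toy vectors, for illustration only.
u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, 1.0])

# After normalizing to unit length, a plain dot product gives the same value.
u_unit = u / np.linalg.norm(u)
v_unit = v / np.linalg.norm(v)

print(cosine_similarity(u, v))
print(u_unit @ v_unit)
```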

## 🛠️ Demo - Visualizing Word Embeddings in 2D using t-SNE

<small>Source: [Google's Seedbank](https://research.google.com/seedbank/seed/pretrained_word_embeddings)</small>

In [None]:
from sklearn.manifold import TSNE
from matplotlib import pylab as plt

# take the words ranked 200 to 600 by frequency in the corpus
words = w2v_small.index2word[200:600]

# convert the words to vectors
embeddings = [w2v_small[word] for word in words]

# perform T-SNE
words_embedded = TSNE(n_components=2).fit_transform(embeddings)

# ... and visualize!
plt.figure(figsize=(20, 20))
for i, label in enumerate(words):
    x, y = words_embedded[i, :]
    plt.scatter(x, y)
    plt.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points',
                 ha='right', va='bottom')
plt.show()

## Demo - Most Similar

What are the most similar (= closest) words to a given word?

In [None]:
w2v_small.most_similar('cat')

## Demo - Doesn't Match

Given a list of words, which one doesn't match?

The word furthest from the mean of all the words.

In [None]:
w2v_small.doesnt_match('breakfast cereal dinner lunch'.split())
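
The rule stated above (pick the word furthest from the mean) can be sketched on hypothetical 2-d toy vectors. This illustrates the idea only and is not gensim's actual implementation; the function and the vectors are assumptions made for the example.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Hypothetical toy vectors: three "meal" words and one outlier.
toy = {
    'breakfast': unit(np.array([1.0, 0.1])),
    'lunch': unit(np.array([1.0, 0.0])),
    'dinner': unit(np.array([1.0, -0.1])),
    'cereal': unit(np.array([0.2, 1.0])),
}

def doesnt_match(words, vectors):
    """Return the word whose vector is least similar to the mean vector."""
    mean = unit(np.mean([vectors[w] for w in words], axis=0))
    return min(words, key=lambda w: vectors[w] @ mean)

print(doesnt_match(list(toy), toy))
```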

## Demo - Vector Arithmetic

In [None]:
# nature + science = ?

w2v_small.most_similar(positive=['nature', 'science'])

## 💎 More Vector Arithmetic

![](https://www.tensorflow.org/images/linear-relationships.png)
<small>Source: [Tensorflow Documentation](https://www.tensorflow.org/tutorials/representation/word2vec)</small>

## Demo - Vector Analogy

In [None]:
# man:king :: woman:?
# king - man + woman = ?

w2v_small.most_similar(positive=['king', 'woman'],
                       negative=['man'])

In [None]:
w2v_small.most_similar(positive=['east', 'west'],
                       negative=['south'])
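
What `positive` and `negative` do can be sketched with plain vector arithmetic. The toy vectors and the `analogy` helper below are hypothetical and only mimic the idea: add the positive vectors, subtract the negative ones, and return the nearest remaining word. As in gensim's `most_similar`, the query words themselves are excluded from the candidates.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Hypothetical 2-d toy vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
toy = {
    'king': unit(np.array([1.0, 1.0])),
    'queen': unit(np.array([1.0, -1.0])),
    'man': unit(np.array([0.0, 1.0])),
    'woman': unit(np.array([0.0, -1.0])),
    'apple': unit(np.array([-1.0, 0.0])),
}

def analogy(positive, negative, vectors):
    """Nearest word to sum(positive) - sum(negative), excluding the inputs."""
    query = unit(sum(vectors[w] for w in positive)
                 - sum(vectors[w] for w in negative))
    candidates = [w for w in vectors if w not in positive + negative]
    return max(candidates, key=lambda w: vectors[w] @ query)

# king - man + woman = ?
print(analogy(['king', 'woman'], ['man'], toy))
```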

## Demo - Shift Context
<small>Source: [Google's Seedbank](https://research.google.com/seedbank/seed/pretrained_word_embeddings)</small>

In [None]:
def shift_context(sentence, from_context, to_context):
    new_sentence = []
    for word in sentence.split():
        if word in w2v_small:
            word = w2v_small.most_similar(positive=[word, to_context],
                                          negative=[from_context])[0][0]
        new_sentence.append(word)

    return ' '.join(new_sentence)

In [None]:
sentence = 'restaurant serving coffee with cream and bread'

print(shift_context(sentence, 'regular', 'fancy'))

![](images/banner.png)

# Gender Bias
Keep in mind that the data comes from Google News, so the writers are professional journalists.

### Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). [Man is to computer programmer as woman is to homemaker? Debiasing word embeddings](https://arxiv.org/abs/1607.06520). NIPS 2016.

## Gender appropriate he-she analogies

In [None]:
# she:sister :: he:?
# sister - she + he = ?

w2v_small.most_similar(positive=['sister', 'he'],
                       negative=['she'])

```
queen-king
waitress-waiter
sister-brother
mother-father
ovarian_cancer-prostate_cancer
convent-monastery
```

## Gender stereotype he-she analogies

In [None]:
w2v_small.most_similar(positive=['nurse', 'he'],
                       negative=['she'])

```
sewing-carpentry
nurse-surgeon
blond-burly
giggle-chuckle
sassy-snappy
volleyball-football
registered_nurse-physician
interior_designer-architect
feminism-conservatism
vocalist-guitarist
diva-superstar
cupcakes-pizzas
housewife-shopkeeper
softball-baseball
cosmetics-pharmaceuticals
petite-lanky
charming-affable
hairdresser-barber
```

## 💎 Gender Direction

# $\overrightarrow{she} - \overrightarrow{he}$

In [None]:
gender_direction = w2v_small['she'] - w2v_small['he']

gender_direction /= norm(gender_direction)

In [None]:
# make sure that all the vectors are normalized!

from numpy.testing import assert_almost_equal

length_vectors = norm(w2v_small.vectors, axis=1)

assert_almost_equal(actual=length_vectors,
                    desired=1,
                    decimal=5)

In [None]:
gender_direction @ w2v_small['architect']

In [None]:
gender_direction @ w2v_small['interior_designer']

In practice, we calculate the gender direction from multiple definitional pairs of words to get a better estimate (a single word may have more than one meaning):

- woman - man
- girl - boy
- she - he
- mother - father
- daughter - son
- gal - guy
- female - male
- her - his
- herself - himself
- Mary - John
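
One way to combine several definitional pairs into a single direction, in the spirit of the PCA approach described by Bolukbasi et al., is sketched below on synthetic data. The pairs are random toy vectors rather than real embeddings, and the code is not claimed to match what `ethically` does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 pairs in 4-d that differ mainly along axis 0.
true_direction = np.array([1.0, 0.0, 0.0, 0.0])
pairs = []
for _ in range(5):
    base = rng.normal(size=4)
    noise = 0.05 * rng.normal(size=4)
    pairs.append((base + true_direction + noise, base - true_direction))

# Center each pair around its own midpoint, then stack both halves.
centered = []
for a, b in pairs:
    center = (a + b) / 2
    centered.extend([a - center, b - center])

# The first right-singular vector is the first principal component.
_, _, vt = np.linalg.svd(np.array(centered))
direction = vt[0]

# The sign of a principal component is arbitrary, so orient it toward
# the first word of the first pair.
if direction @ (pairs[0][0] - pairs[0][1]) < 0:
    direction = -direction

print(direction.round(2))
```

Using several pairs averages out the noise that any single pair carries.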

## Generating Gender Analogies

### a:x::b:y when a-b = `gender_direction`
### x - y ~ gender_direction

#### How?
1. Look for two words that are close to each other, with distance smaller than 1 (think why)
2. Take their difference, and normalize
3. Project the normalized difference on the gender direction, and order by the magnitude
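
The three steps above can be sketched on a hypothetical toy vocabulary. The vectors and the fixed `gender_direction` below are assumptions for illustration; `generate_analogies` in `ethically` may differ in its details.

```python
from itertools import combinations

import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Hypothetical 3-d toy vectors; axis 0 plays the role of gender.
toy = {
    'queen': unit(np.array([0.4, 1.0, 0.0])),
    'king': unit(np.array([-0.4, 1.0, 0.0])),
    'nurse': unit(np.array([0.3, 0.0, 1.0])),
    'surgeon': unit(np.array([-0.3, 0.1, 1.0])),
    'table': unit(np.array([0.0, -1.0, 0.2])),
}
gender_direction = np.array([1.0, 0.0, 0.0])

analogies = []
for x, y in combinations(toy, 2):
    difference = toy[x] - toy[y]
    distance = np.linalg.norm(difference)
    if 0 < distance < 1:  # step 1: keep only pairs that are close
        # steps 2 and 3: normalize the difference and project it
        score = (difference / distance) @ gender_direction
        # orient each pair as (she-side, he-side)
        pair = (x, y) if score > 0 else (y, x)
        analogies.append((abs(score), pair))

# order by the magnitude of the projection, largest first
analogies.sort(reverse=True)
for score, pair in analogies:
    print(pair, round(score, 2))
```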

In [None]:
from ethically.we import GenderBiasWE

w2v_small_gender_bias = GenderBiasWE(w2v_small,
                                     only_lower=True)

In [None]:
import pandas as pd
from IPython import display

# the first line is for forcing the display of all the 150 rows
with pd.option_context('display.max_rows', 150):
    display.display(w2v_small_gender_bias.generate_analogies(150))

## 💎 So What?

### Downstream Application

### Toy Example - Search Engine Ranking

- "MIT PhD. Student"
- "doctoral candidate" ~ "PhD. student"
- John:computer programmer :: Mary:homemaker

### Universal Embeddings
- Pre-trained on a large corpus
- Plugged into downstream task models (sentiment analysis, classification, translation …)
- Improved performance

### State of the Art
[The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
](http://jalammar.github.io/illustrated-bert/)

## Measuring Bias in Word Embeddings

# Think-Pair-Share

# Basic Idea: Use gender-neutral words!

# Professions!

### Projections

In [None]:
from ethically.we.data import BOLUKBASI_DATA

neutral_profession_names = BOLUKBASI_DATA['gender']['neutral_profession_names']

In [None]:
neutral_profession_names[:10]

In [None]:
w2v_small[neutral_profession_names[0]] @ gender_direction

In [None]:
import matplotlib.pylab as plt

f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_bias.plot_projection_scores(n_extreme=20, ax=ax);

### Direct Bias

1. Project each **neutral profession name** on the gender direction
2. Calculate the absolute value of each projection
3. Average them all

In [None]:
neutral_profession_projections = [w2v_small[word] @ w2v_small_gender_bias.direction
                                  for word in neutral_profession_names]

abs_neutral_profession_projections = [abs(proj) for proj in neutral_profession_projections]

sum(abs_neutral_profession_projections) / len(abs_neutral_profession_projections)

In [None]:
w2v_small_gender_bias.calc_direct_bias()

### Indirect Bias - EXTRA
Similarity due to shared "gender direction" projection

In [None]:
w2v_small_gender_bias.generate_closest_words_indirect_bias('softball',
                                                           'football')
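
As a rough sketch of the idea, the indirect bias between two words can be read as the share of their similarity that disappears once the gender direction is removed from both. The formula below follows my reading of the definition in Bolukbasi et al.; the helper name and toy vectors are hypothetical, and this is not claimed to be how `ethically` computes it.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def indirect_bias(w, v, direction):
    """Share of the similarity w @ v that comes from the given direction.

    Assumes w, v and direction are unit vectors and that w @ v is not zero.
    """
    w_perp = w - (w @ direction) * direction
    v_perp = v - (v @ direction) * direction
    cos_perp = (w_perp @ v_perp) / (np.linalg.norm(w_perp) * np.linalg.norm(v_perp))
    return (w @ v - cos_perp) / (w @ v)

direction = np.array([1.0, 0.0, 0.0])

# Two words that are similar only through the shared direction component.
shared = indirect_bias(np.array([0.6, 0.8, 0.0]),
                       np.array([0.6, 0.0, 0.8]),
                       direction)

# Two words with no component along the direction at all.
unrelated = indirect_bias(unit(np.array([0.0, 1.0, 1.0])),
                          np.array([0.0, 1.0, 0.0]),
                          direction)

print(shared, unrelated)
```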

## Correlation of neutral profession projections between Word2Vec and fastText

![](http://docs.ethically.ai/_images/demo-words-embedding-bias_50_0.png)

(can be generated with the method `GenderBiasWE.plot_bias_across_words_embeddings`)

## Debias

### Neutralize

In this case, we will remove the gender projection from all the words except the gender-specific ones (such as `man` and `woman`), and then normalize.

🦄 We need to "learn" which words in the vocabulary are gender-specific.
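
Removing the projection and re-normalizing can be sketched for a single vector as follows. The `neutralize` helper and the toy vector for `home` are hypothetical; the sketch assumes the direction has unit length.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def neutralize(vector, direction):
    """Remove the component along a unit-length direction, then re-normalize."""
    projection = (vector @ direction) * direction
    return unit(vector - projection)

# Hypothetical toy vectors; axis 0 plays the role of the gender direction.
direction = np.array([1.0, 0.0, 0.0])
home = unit(np.array([0.2, 0.9, 0.4]))

home_neutral = neutralize(home, direction)

# Projection on the direction before and after neutralizing.
print(home @ direction, home_neutral @ direction)
```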

In [None]:
w2v_small_gender_debias = w2v_small_gender_bias.debias(method='neutralize', inplace=False)

In [None]:
print('home:',
      'before =', w2v_small_gender_bias.model['home'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['home'] @ w2v_small_gender_debias.direction)

In [None]:
print('man:',
      'before =', w2v_small_gender_bias.model['man'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.direction)

In [None]:
print('woman:',
      'before =', w2v_small_gender_bias.model['woman'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.direction)

In [None]:
w2v_small_gender_debias.calc_direct_bias()

In [None]:
f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_debias.plot_projection_scores(n_extreme=20, ax=ax);

### Equalize

- Do you see that `man` and `woman` have different projections on the gender direction?

- This may lead to different similarities (distances) to neutral words, such as `kitchen`

In [None]:
w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.model['kitchen']

In [None]:
w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.model['kitchen']

In [None]:
BOLUKBASI_DATA['gender']['equalize_pairs'][:10]
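
A minimal sketch of the equalize step for one pair, following my reading of Bolukbasi et al.: both words keep their shared part that is orthogonal to the gender direction and are placed at equal and opposite positions along it. The `equalize` helper and the toy vectors are hypothetical, and the sketch assumes unit-length inputs whose projections on the direction differ.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def equalize(a, b, direction):
    """Make unit vectors a and b differ only along the unit-length direction."""
    mu = (a + b) / 2
    mu_b = (mu @ direction) * direction  # part of the midpoint along the direction
    nu = mu - mu_b                       # shared part, orthogonal to the direction
    scale = np.sqrt(1 - nu @ nu)         # keeps the results at unit length
    equalized = []
    for w in (a, b):
        w_b = (w @ direction) * direction
        offset = w_b - mu_b
        equalized.append(nu + scale * offset / np.linalg.norm(offset))
    return equalized

# Hypothetical toy vectors; axis 0 plays the role of the gender direction.
direction = np.array([1.0, 0.0, 0.0])
woman = unit(np.array([0.3, 0.5, 0.2]))
man = unit(np.array([-0.6, 0.4, 0.1]))
kitchen = unit(np.array([0.0, 0.7, 0.3]))  # already neutralized

woman_eq, man_eq = equalize(woman, man, direction)

print(woman @ kitchen, man @ kitchen)        # similarities before equalizing
print(woman_eq @ kitchen, man_eq @ kitchen)  # similarities after equalizing
```

After equalizing, any neutralized word is equally similar to both members of the pair.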

### Hard Debias = Neutralize + Equalize

In [None]:
w2v_small_gender_debias = w2v_small_gender_bias.debias(method='hard', inplace=False)

In [None]:
print('home:',
      'before =', w2v_small_gender_bias.model['home'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['home'] @ w2v_small_gender_debias.direction)

In [None]:
print('man:',
      'before =', w2v_small_gender_bias.model['man'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.direction)

In [None]:
print('woman:',
      'before =', w2v_small_gender_bias.model['woman'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.direction)

In [None]:
w2v_small_gender_debias.calc_direct_bias()

In [None]:
w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.model['kitchen']

In [None]:
w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.model['kitchen']

In [None]:
f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_debias.plot_projection_scores(n_extreme=20, ax=ax);

In [None]:
# 🛠️ the first line is for forcing the display of all the 100 rows
with pd.option_context('display.max_rows', 100):
    display.display(w2v_small_gender_debias.generate_analogies(100))

### Compare Performance

After debiasing, the performance of the word embedding on standard benchmarks gets only slightly worse!

In [None]:
w2v_small_gender_bias.evaluate_words_embedding()

In [None]:
w2v_small_gender_debias.evaluate_words_embedding()

![](images/banner.png)

# 💎 So What?

We removed the gender bias, **as we defined it**, from a word embedding. Is there any impact on a downstream application?


### Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K. W. (2018). [Gender bias in coreference resolution: Evaluation and debiasing methods](https://par.nsf.gov/servlets/purl/10084252). NAACL-HLT 2018.


#### WinoBias Dataset
![](images/coref-example.png)


#### Stereotypical Occupations
![](images/coref-occupations.png)

#### Results
![](images/coref-results.png)


EE = UW End-to-end Neural Coreference Resolution System


### Zhao, J., Zhou, Y., Li, Z., Wang, W., & Chang, K. W. (2018). [Learning gender-neutral word embeddings](https://arxiv.org/pdf/1809.01496.pdf). EMNLP 2018.

#### Another debiasing method (tailor-made for the GloVe training process)

![](images/gn-glove-results.png)

![](images/banner.png)

# 💎 Have we really removed the bias?

Let's look at another metric, called **WEAT** (Word Embedding Association Test), which is inspired by the **IAT** (Implicit Association Test) from psychology.

### Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). [Semantics derived automatically from language corpora contain human-like biases.](http://www.cs.bath.ac.uk/~jjb/ftp/CaliskanEtAl-authors-full.pdf) Science, 356(6334), 183-186.


### Ingredients

1. Target words (e.g., Male vs. Female)

2. Attribute words (e.g., Math vs. Arts)

### 🛠️ Recipe

#### Part I
For each word in a target word set (e.g., `he`):
1. calculate its mean similarity to the words in the **first** attribute set (e.g., `Math`)
2. calculate its mean similarity to the words in the **second** attribute set (e.g., `Arts`)
3. take the difference - a **measure of the association of one target word with the attributes** (e.g., `he` will be positive)

**Association of one target word:** Mean of `he` @ `[science, technology, ...]` MINUS Mean of `he` @ `[poetry, dance, ...]`


#### Part II
For each target word set, sum the **measures of association** of its words, then take the difference between the two sums - this is the **WEAT** score

- Sum of **association of one target word** over `[he, brother, ...]`
- Minus
- Sum of **association of one target word** over `[she, sister, ...]`
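
The recipe above can be sketched directly in NumPy. The word sets below are hypothetical 2-d unit vectors, and the sketch computes only the raw score, not the effect size or p-value; it is not the `ethically` implementation.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def association(word, first_attribute, second_attribute):
    """Part I: mean similarity to the first attribute set minus the second."""
    return (np.mean([word @ a for a in first_attribute])
            - np.mean([word @ b for b in second_attribute]))

def weat_score(first_target, second_target, first_attribute, second_attribute):
    """Part II: summed associations of one target set minus the other."""
    first = sum(association(x, first_attribute, second_attribute)
                for x in first_target)
    second = sum(association(y, first_attribute, second_attribute)
                 for y in second_target)
    return first - second

# Hypothetical toy vectors standing in for the four word sets.
male = [unit(np.array([1.0, 0.2])), unit(np.array([0.9, 0.1]))]
female = [unit(np.array([0.1, 1.0])), unit(np.array([0.2, 0.9]))]
science = [unit(np.array([1.0, 0.0])), unit(np.array([0.8, 0.3]))]
arts = [unit(np.array([0.0, 1.0])), unit(np.array([0.3, 0.8]))]

score = weat_score(male, female, science, arts)
print(score)
```

A positive score means the first target set is more associated with the first attribute set than the second target set is; swapping the two target sets flips the sign.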

In [None]:
from ethically.we.weat import WEAT_DATA

weat_gender_science_arts = WEAT_DATA[7]

In [None]:
weat_gender_science_arts['first_attribute']

In [None]:
weat_gender_science_arts['second_attribute']

In [None]:
weat_gender_science_arts['first_target']

In [None]:
weat_gender_science_arts['second_target']

In [None]:
from ethically.we import calc_all_weat

calc_all_weat(w2v_small_gender_bias.model, filter_by='model', with_original_finding=True,
              with_pvalue=True, pvalue_kwargs={'method': 'approximate'}).iloc[7:8]

### Important Note: Our results are weaker because we use a reduced Word2Vec 


#### Results from the Paper (computed on the complete Word2Vec):

![](images/weat-w2v.png)


#### Caveat about comparing WEAT to the IAT

- Individuals (IAT) vs. Words (WEAT)
- Therefore, the meaning of the effect size and p-value is totally different!

## Let's go back to our question - did we remove the bias?

### Gonen, H., & Goldberg, Y. (2019). [Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them](https://arxiv.org/pdf/1903.03862.pdf). arXiv preprint arXiv:1903.03862.

They used multiple methods; we'll show only two:
1. WEAT
2. Neutral words clustering

In [None]:
w2v_small_gender_bias.calc_direct_bias()

In [None]:
w2v_small_gender_debias.calc_direct_bias()

### I. WEAT - before and after

In [None]:
calc_all_weat(w2v_small_gender_bias.model, filter_by='model', with_original_finding=True,
              with_pvalue=True, pvalue_kwargs={'method': 'approximate'}).iloc[7:8]

In [None]:
calc_all_weat(w2v_small_gender_debias.model, filter_by='model', with_original_finding=True,
              with_pvalue=True, pvalue_kwargs={'method': 'approximate'}).iloc[7:8]

Note: In the paper they got a stronger result, probably because they used the complete Word2Vec (in this example, p-value of 0.0467).

### II. Clustering Neutral Gender Words

In [None]:
w2v_vocab = set(w2v_small_gender_bias.model.vocab.keys())

# 🦄 how we got these words - read the paper by Bolukbasi et al. for details
all_gender_specific_words = set(BOLUKBASI_DATA['gender']['specific_full_with_definitional'])

all_gender_neutral_words = w2v_vocab - all_gender_specific_words

print('#vocab =', len(w2v_vocab),
      '#specific =', len(all_gender_specific_words),
      '#neutral =', len(all_gender_neutral_words))

In [None]:
neutral_words_gender_projections = [(w2v_small_gender_bias.project_on_direction(word), word)
                                    for word in all_gender_neutral_words]

neutral_words_gender_projections.sort()

In [None]:
neutral_words_gender_projections[-10:]

In [None]:
neutral_words_gender_projections[:10]

In [None]:
_, sorted_biased_neutral_words = zip(*neutral_words_gender_projections)

female_biased_neutral_words = sorted_biased_neutral_words[-500:]
male_biased_neutral_words = sorted_biased_neutral_words[:500]

biased_neutral_words = female_biased_neutral_words + male_biased_neutral_words

y_gender = [False] * 500 + [True] * 500

len(biased_neutral_words), len(y_gender)

### 🛠️ Plotting Clusters

In [None]:
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

def plot_clustered(model, biased_neutral_words, y_gender, ax=None):
    
    if ax is None:
        f, ax = plt.subplots(figsize=(10, 10))
    

    vectors = [model[word] for word in biased_neutral_words]
    
    y_cluster = KMeans(n_clusters=2, random_state=0).fit_predict(vectors)

    embedded_vectors = TSNE(n_components=2, random_state=0).fit_transform(vectors)

    ax.scatter(embedded_vectors[:, 0],
               embedded_vectors[:, 1],
               c=y_cluster)
    
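    # KMeans cluster ids are arbitrary, so score the better of the two labelings
    accuracy = accuracy_score(y_gender, y_cluster)
    return max(accuracy, 1 - accuracy)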

In [None]:
f, axes = plt.subplots(1, 2, figsize=(20, 10))


acc_biased = plot_clustered(w2v_small_gender_bias.model, biased_neutral_words, y_gender, ax=axes[0])
axes[0].set_title(f'Biased - Acc={acc_biased}')

acc_debiased = plot_clustered(w2v_small_gender_debias.model, biased_neutral_words, y_gender, ax=axes[1])
axes[1].set_title(f'Debiased - Acc={acc_debiased}');

Note: In the paper they got a stronger result, 92.5% accuracy for the debiased model.

### 💎 Strong words from the paper (my emphasis):

> The experiments ...
reveal a **systematic bias** found in the embeddings,
which is **independent of the gender direction**.


> The implications are alarming: while suggested
debiasing methods work well at removing the gender direction, the **debiasing is mostly superficial**.
The bias stemming from world stereotypes and
learned from the corpus is **ingrained much more
deeply** in the embeddings space.


> .. real concern from biased representations is **not the association** of a concept with
words such as “he”, “she”, “boy”, “girl” **nor** being
able to perform **gender-stereotypical word analogies**... algorithmic discrimination is more likely to happen by associating one **implicitly gendered** term with
other implicitly gendered terms, or picking up on
**gender-specific regularities** in the corpus by learning to condition on gender-biased words, and generalizing to other gender-biased words.


![](images/banner.png)

# Your Turn!

## Explore bias in word embeddings with respect to other groups (such as race and religion)

**Task 1.** Use the direct bias measure of Bolukbasi et al. with the [`ethically.we.BiasWordsEmbedding`](http://docs.ethically.ai/words-embedding-bias.html#ethically.we.bias.BiasWordsEmbedding) class. We used `GenderBiasWE`, which builds on `BiasWordsEmbedding` for gender bias.

For example, this is how we would use `BiasWordsEmbedding` to analyze gender bias:

In [None]:
from ethically.we import BiasWordsEmbedding

gender_bias_we = BiasWordsEmbedding(w2v_small, only_lower=True)

In [None]:
BOLUKBASI_DATA['gender']['definitional_pairs']

In [None]:
# 💎💎💎 identify the direction
gender_bias_we._identify_direction(positive_end='she',
                                   negative_end='he',
                                   definitional=BOLUKBASI_DATA['gender']['definitional_pairs'])

In [None]:
BOLUKBASI_DATA['gender']['neutral_profession_names'][:10]

In [None]:
gender_bias_we.plot_projection_scores(BOLUKBASI_DATA['gender']['neutral_profession_names']);

In [None]:
gender_bias_we.calc_direct_bias(BOLUKBASI_DATA['gender']['neutral_profession_names'])

In [None]:
# Your Code Here...

**Task 2.** Open the [words embedding demo page in `ethically` documentation](http://docs.ethically.ai/notebooks/demo-words-embedding-bias.html#it-is-possible-also-to-expirements-with-new-target-word-sets-as-in-this-example-citizen-immigrant), and look at how the function [`calc_weat_pleasant_unpleasant_attribute`]() is used. What was that experiment trying to test? What was the result? Can you come up with other experiments?

In [None]:
from ethically.we import calc_weat_pleasant_unpleasant_attribute

In [None]:
# Your Code Here...

![](images/banner.png)

# More Related Work

- Brunet, M. E., Alkalay-Houlihan, C., Anderson, A., & Zemel, R. (2018). [Understanding the Origins of Bias in Word Embeddings](https://arxiv.org/pdf/1810.03611.pdf). arXiv preprint arXiv:1810.03611.

- Zhao, J., Wang, T., Yatskar, M., Cotterell, R., Ordonez, V., & Chang, K. W. (2019). [Gender Bias in Contextualized Word Embeddings](https://arxiv.org/pdf/1904.03310.pdf). arXiv preprint arXiv:1904.03310.


- Complete example of using `ethically` with Word2Vec, GloVe and fastText: http://docs.ethically.ai/notebooks/demo-gender-bias-words-embedding.html


# The Bigger Picture

1. FAT community - Fairness, Accountability, and Transparency
   - [ACM FAT*](https://fatconference.org)
   - [FATML](http://www.fatml.org)
   - [ML Fairness Book](https://fairmlbook.org)
   
2. NLP - around a dozen papers in this field (in the narrow sense)

3. [`ethically`](https://docs.ethically.ai)


# 💎 Takeaways - Be Responsible

1. Think about your **downstream app**

2. Think about your **measurements** (aka "what is a good system?")

3. Think about your **data** (corpus building, selection bias, train vs. validation vs. test datasets)

4. Think about your impact on individuals, groups, society, and humanity

![](images/banner.png)

<center><h1>THE END!</h1></center>