# FACT-UVA: Man is to Programmer as Woman is to Homemaker?

## Links

Debiaswe: https://github.com/tolga-b/debiaswe
Lipstick: https://github.com/gonenhila/gender_bias_lipstick

### How to get the GoogleNews word2vec embeddings:
Download it from the official [website](https://code.google.com/archive/p/word2vec/) or clone [this GitHub repo](https://github.com/mmihaltz/word2vec-GoogleNews-vectors). Place the downloaded **.bin** file (decompressed, if you downloaded the **.bin.gz** archive) in the embeddings folder.
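To sanity-check the downloaded file before running the pipeline, it can help to read the word2vec binary layout directly: a text header with the number of words and the vector size, then, for every word, the token, a single space and the vector as raw 32-bit floats. The reader below is a minimal standard-library sketch for illustration only; the name `read_word2vec_bin` is hypothetical (not part of this repository), and it assumes little-endian float32 values, which is how the file is normally read on x86 machines.

```python
import struct


def read_word2vec_bin(path, limit=None):
    """Hypothetical minimal reader for the word2vec binary format.

    Assumed layout: b"<n_words> <dim>\n", then per word the UTF-8 token,
    one space, `dim` little-endian float32 values and an optional newline.
    """
    vectors = {}
    with open(path, 'rb') as f:
        n_words, dim = map(int, f.readline().split())
        if limit is not None:
            n_words = min(n_words, limit)
        for _ in range(n_words):
            token = bytearray()
            while True:
                ch = f.read(1)
                if ch == b' ':            # a space ends the token
                    break
                if ch == b'':
                    raise EOFError('unexpected end of file')
                if ch != b'\n':           # skip the newline left after the previous vector
                    token += ch
            values = struct.unpack('<%df' % dim, f.read(4 * dim))
            vectors[token.decode('utf-8', errors='replace')] = list(values)
    return vectors
```

For example, `read_word2vec_bin('embeddings/GoogleNews-vectors-negative300.bin', limit=5)` would load only the first five vectors instead of all three million.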

### How to get the Glove embeddings:
Go to the official [website](https://nlp.stanford.edu/projects/glove/) and download **glove.840B.300d.zip**. Unzip it and place the extracted **.txt** file in the embeddings folder.

## Debiasing Word Embeddings

### Word2vec

The code block below runs the main debiasing function on the word2vec Google News embeddings. In addition, the function takes as arguments several JSON files with definitional pairs, equalize pairs and gender-specific words, as described in the original paper. It outputs two files, **bias_word2vec.bin** and **debiased_word2vec.bin**, which contain the embeddings before and after debiasing, respectively.
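For readers unfamiliar with the method, the sketch below illustrates the hard-debiasing steps described by Bolukbasi et al.: the gender direction is the first principal component of the definitional pairs, gender-neutral words are *neutralized* by removing their component along that direction, and each equalize pair is made symmetric around it. This is a NumPy sketch modelled on the paper, not the code of `main.py`; the function names are hypothetical, and it assumes `emb` is a dict of unit-normalized NumPy vectors.

```python
import numpy as np

# Sketch of hard debiasing (Bolukbasi et al.); function names are hypothetical
# and `emb` is assumed to map words to unit-normalised NumPy vectors.


def gender_direction(pairs, emb):
    """First principal component of the definitional pairs, each centred on its own mean."""
    rows = []
    for a, b in pairs:
        center = (emb[a] + emb[b]) / 2
        rows.append(emb[a] - center)
        rows.append(emb[b] - center)
    # The rows already have zero mean, so the top right-singular vector is the first PC.
    _, _, vt = np.linalg.svd(np.array(rows), full_matrices=False)
    return vt[0] / np.linalg.norm(vt[0])


def drop(u, g):
    """Remove the component of u along the unit vector g."""
    return u - g * u.dot(g)


def neutralize(emb, words, g):
    """Make gender-neutral words orthogonal to g, then re-normalise them."""
    for w in words:
        v = drop(emb[w], g)
        emb[w] = v / np.linalg.norm(v)


def equalize(emb, pairs, g):
    """Place each pair symmetrically around g, keeping both vectors unit length."""
    for a, b in pairs:
        y = drop((emb[a] + emb[b]) / 2, g)          # shared, gender-neutral part
        z = np.sqrt(max(0.0, 1 - np.linalg.norm(y) ** 2))
        if (emb[a] - emb[b]).dot(g) < 0:             # keep each word on its original side
            z = -z
        emb[a] = z * g + y
        emb[b] = -z * g + y
```

With the notebook's inputs, the pairs would presumably come from `definitional_pairs.json` and `equalize_pairs.json`, and `neutralize` would be applied to the words not listed in `gender_specific_full.json`.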

In [10]:
# Debias word2vec embeddings
!cd code && python3 main.py --debias_o_em=../embeddings/debiased_word2vec.bin --bias_o_em=../embeddings/bias_word2vec.bin

Namespace(bias_o_em='../embeddings/bias_word2vec.bin', debias_o_em='../embeddings/debiased_word2vec.bin', def_fn='../data/definitional_pairs.json', em_limit=50000, eq_fn='../data/equalize_pairs.json', g_words_fn='../data/gender_specific_full.json', i_em='../embeddings/GoogleNews-vectors-negative300.bin', o_ext='bin')
*** Reading data from ../embeddings/GoogleNews-vectors-negative300.bin
Number of words:  26391
Saving biased vectors to file...
Debiasing...
Saving to file...


Done!



### Glove

Apart from word2vec's binary variant, the only difference between the word2vec and GloVe text formats is that the first line of a word2vec file contains the number of words and the vector size, while a GloVe file has no such header. To keep the code simple and short, we convert the GloVe embeddings to the word2vec format, so that the code only has to support one format. The code block below performs this conversion and needs to be executed only once.
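The conversion amounts to counting the lines, reading the vector size from the first line and prepending both as a header. A standard-library sketch of the same idea follows; the function name `glove_to_word2vec` is hypothetical, and the notebook itself uses the shell script `gloveToW2V.sh`, whose implementation may differ.

```python
def glove_to_word2vec(glove_path, out_path):
    """Hypothetical sketch: prepend the "<n_words> <dim>" header word2vec expects.

    Assumes every line is "<token> <v1> ... <v_dim>"; the dimension is taken
    from the first line only. Streams the file twice instead of loading it.
    """
    n_words = 0
    dim = None
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            if dim is None:
                dim = len(line.rstrip().split(' ')) - 1
            n_words += 1
    with open(glove_path, encoding='utf-8') as src, \
            open(out_path, 'w', encoding='utf-8') as dst:
        dst.write(f'{n_words} {dim}\n')
        for line in src:
            dst.write(line)
    return n_words, dim
```

For glove.840B.300d.txt this should produce the header `2196017 300`, matching the line count and vector size printed by the script.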

In [11]:
# convert glove to word2vec format
!cd code/scripts && ./gloveToW2V.sh ../../embeddings/glove.840B.300d.txt ../../embeddings/glove.formatted.txt

extracting number of vectors
there are 2196017 lines
extracting vector dimension
cat: write error: Broken pipe
vectors have size 300
creating word2vec format file
done


After converting the GloVe embeddings to the word2vec format, we can rerun the previous experiment, this time with the GloVe embeddings. The function again generates two files, **bias_glove.bin** and **debiased_glove.bin**, which contain the embeddings before and after debiasing, respectively.

In [12]:
# Debias glove embeddings
!cd code && python3 main.py --i_em=../embeddings/glove.formatted.txt --debias_o_em=../embeddings/debiased_glove.bin --bias_o_em=../embeddings/bias_glove.bin

*** Reading data from ../embeddings/glove.formatted.txt
Number of words:  23177
Saving biased vectors to file...
Debiasing...
Saving to file...


Done!



## Benchmark debiased embeddings

After generating the four embedding files (biased and debiased, for both word2vec and GloVe), we can run the benchmark tests on them to determine whether removing the bias degraded the embeddings. The GloVe results also show whether the findings replicate on a second set of embeddings. The code block below evaluates each of the four embeddings on all of the benchmark tests.
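The similarity part of the benchmark (the `Spearman correlation of scores on ...` lines in the log of the next cell) compares the embeddings' cosine similarities with human similarity ratings. A minimal standard-library sketch of that computation follows; the function names are hypothetical, and this sketch simply skips word pairs missing from the vocabulary, which the actual benchmark scripts may handle differently.

```python
import math


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_benchmark(emb, triples):
    """Spearman correlation (Pearson on ranks) of model vs. human similarity.

    triples: (word1, word2, human_score); out-of-vocabulary pairs are skipped.
    """
    model, human = [], []
    for w1, w2, score in triples:
        if w1 in emb and w2 in emb:
            model.append(cosine(emb[w1], emb[w2]))
            human.append(score)
    return pearson(rank(model), rank(human))
```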

In [9]:
!cd code/benchmark/scripts/ && ./run_test.sh

05:00:29 INFO:loading projection weights from /mnt/windows_drive_d/Amsterdam University/Year_1/FACT/embeddings/bias_word2vec.bin
05:00:29 INFO:Loading #26391 words with 300 dim
05:00:30 INFO:Transformed 26391 into 26391 words
05:00:30 INFO:Calculating similarity benchmarks
05:00:30 INFO:Spearman correlation of scores on WS353 0.6488884094043214
05:00:30 INFO:Spearman correlation of scores on MTurk 0.513563416944097
05:00:30 INFO:Spearman correlation of scores on WS353S 0.7197011080164891
05:00:31 INFO:Spearman correlation of scores on SimLex999 0.43510114901981567
05:00:31 INFO:Spearman correlation of scores on WS353R 0.5805558093795756
05:00:31 INFO:Spearman correlation of scores on MEN 0.7042259430744064
05:00:31 INFO:Spearman correlation of scores on RG65 0.6937745222655595
05:00:31 INFO:Spearman correlation of scores on RW 0.27659989138184415
05:00:31 INFO:Calculating analogy benchmarks
05:00:31 INFO:Processing 1/196 batch
05:00:32 INFO:Processing 20/196 batch
05:00:33 INFO:Process

In [17]:
# show results
import csv
from statistics import variance
from tabulate import tabulate

def show_benchmarks(path):
    with open(path, 'r', newline='') as f:
        rows = list(csv.reader(f))[:-1]        # the final CSV row is not used
    rows = [list(col) for col in zip(*rows)]   # transpose: one row per benchmark

    # add the sample variance of each benchmark's scores across the four embeddings
    rows[0].append('VARIANCE')
    for row in rows[1:]:
        row.append(variance([float(x) for x in row[1:]]))

    print(tabulate(rows[1:], headers=rows[0]))

show_benchmarks('./code/benchmark/scripts/result.csv')

Name             bias_word2vec    debiased_word2vec    bias_glove    debiased_glove     VARIANCE
-------------  ---------------  -------------------  ------------  ----------------  -----------
AP                    0.557214             0.557214      0.532338          0.542289  0.000148511
BLESS                 0.67                 0.68          0.755             0.76      0.00228958
Battig                0.235519             0.233034      0.272032          0.267826  0.000427686
ESSLI_1a              0.727273             0.727273      0.772727          0.727273  0.000516529
ESSLI_2b              0.8                  0.8           0.75              0.75      0.000833333
ESSLI_2c              0.644444             0.644444      0.622222          0.622222  0.000164609
MEN                   0.704226             0.703828      0.76402           0.763982  0.00119901
MTurk                 0.513563             0.51474       0.640383          0.634242  0.00506272
RG65                  0.693775   

## Generating analogies

Once we have verified that the debiased embeddings retain the main properties of the original ones, we can generate `he:she = x:y` analogies and inspect the resulting `x:y` pairs.
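In the original paper, a candidate pair `x:y` is scored by the cosine between `he - she` and `x - y`, and only pairs whose words are close to each other (`||x - y|| <= delta`, with delta = 1 for unit vectors) are kept. The NumPy sketch below implements that scoring for a small vocabulary; the function name and the choice to skip pairs containing the seed words are illustrative assumptions, and `analogies.py` may differ in details such as candidate filtering.

```python
import numpy as np


def he_she_analogies(words, vecs, seed=('he', 'she'), n=10, delta=1.0):
    """Hypothetical sketch: rank pairs (x, y) by cos(he - she, x - y), ||x - y|| <= delta.

    words: vocabulary list; vecs: (len(words), d) array of unit vectors.
    Builds a V x V x d array, so it only suits small vocabularies.
    """
    idx = {w: i for i, w in enumerate(words)}
    direction = vecs[idx[seed[0]]] - vecs[idx[seed[1]]]
    direction = direction / np.linalg.norm(direction)
    diffs = vecs[:, None, :] - vecs[None, :, :]          # diffs[i, j] = x_i - y_j
    norms = np.linalg.norm(diffs, axis=-1)
    with np.errstate(invalid='ignore', divide='ignore'):
        scores = (diffs @ direction) / norms             # cosine with the seed direction
    scores[(norms > delta) | (norms == 0)] = -np.inf     # distance constraint, no self-pairs
    for w in seed:                                       # skip pairs involving the seed words
        scores[idx[w], :] = -np.inf
        scores[:, idx[w]] = -np.inf
    best = np.argsort(scores, axis=None)[::-1][:n]
    result = []
    for k in best:
        i, j = np.unravel_index(k, scores.shape)
        if np.isfinite(scores[i, j]):
            result.append((words[i], words[j], float(scores[i, j])))
    return result
```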

### Analogies for original word2vec

In [29]:
!cd ./code/analogies/ && python3 analogies.py --pairs_fname pairs_bias_word2vec.txt --i_em ../../embeddings/bias_word2vec.bin --pair_seed he-she --n 100

he is to she like pardon is to pardons
he is to she like grounds is to linking
he is to she like violin is to soprano
he is to she like peanut is to spinach
he is to she like storyteller is to educator
he is to she like indication is to indicates
he is to she like fastball is to hitter
he is to she like workspace is to workstation
he is to she like health_care is to care
he is to she

### Analogies for debiased word2vec

In [30]:
!cd ./code/analogies/ && python3 analogies.py --pairs_fname pairs_debiased_word2vec.txt --i_em ../../embeddings/debiased_word2vec.bin --pair_seed he-she --n 100

he is to she like subscriber is to subscription
he is to she like backfield is to rusher
he is to she like compound is to compounds
he is to she like sidestep is to avoid
he is to she like grade is to grades
he is to she like swipe is to swiping
he is to she like motherhood is to mothers
he is to she like sexual_harassment is to hazing
he is to she like sentence is to maximum_sentence
he is to

### Analogies for original GloVe

In [31]:
!cd ./code/analogies/ && python3 analogies.py --pairs_fname pairs_bias_glove.txt --i_em ../../embeddings/bias_glove.bin --pair_seed he-she --n 100

he is to she like tuning is to tuned
he is to she like raise is to donate
he is to she like lunchtime is to brunch
he is to she like analogy is to cliche
he is to she like joke is to bitch
he is to she like chapel is to convent
he is to she like sustainability is to empowerment
he is to she like bunny is to girl
he is to she like emphasizing is to curricula
he is to she like

### Analogies for debiased GloVe

In [32]:
!cd ./code/analogies/ && python3 analogies.py --pairs_fname pairs_debiased_glove.txt --i_em ../../embeddings/debiased_glove.bin --pair_seed he-she --n 100

he is to she like decal is to bumper
he is to she like peninsula is to harbour
he is to she like lotus is to lily
he is to she like refining is to perfecting
he is to she like sax is to soprano
he is to she like settling is to settles
he is to she like panel is to panels
he is to she like novel is to authors
he is to she like hoof is to horse
he is to she like officer

## Testing Debiasing (Lipstick on a Pig)
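Besides generating analogies, we can also test the effect of debiasing quantitatively, following the "Lipstick on a Pig" experiments: the script below trains a classifier to separate words that are male- or female-biased in the original embeddings, once using the embeddings before debiasing (`bef`) and once using the embeddings after debiasing (`aft`), and prints the accuracy of each run.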
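The idea behind this test (Gonen and Goldberg) is to label the most male- and female-leaning words by their projection on the original gender direction, and then check whether a classifier can still recover those labels from the debiased vectors; with balanced classes, accuracy near 50% means that classifier can no longer recover them. The paper uses an SVM; to stay dependency-light, the sketch below swaps in a nearest-centroid classifier, and the helper names are hypothetical rather than taken from `classify_debiased.py`.

```python
import numpy as np


def bias_labels(words, emb, male='he', female='she', k=100):
    """Return the k most male- and k most female-leaning words by their
    projection on the (original) he - she direction."""
    g = emb[male] - emb[female]
    g = g / np.linalg.norm(g)
    candidates = [w for w in words if w not in (male, female)]
    proj = np.array([emb[w].dot(g) for w in candidates])
    order = np.argsort(proj)
    female_words = [candidates[i] for i in order[:k]]
    male_words = [candidates[i] for i in order[-k:]]
    return male_words, female_words


def centroid_accuracy(X, y, train_frac=0.8, seed=0):
    """Held-out accuracy of a nearest-centroid classifier (labels: 1 = male, 0 = female)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = perm[:cut], perm[cut:]
    c1 = X[tr][y[tr] == 1].mean(axis=0)
    c0 = X[tr][y[tr] == 0].mean(axis=0)
    d1 = np.linalg.norm(X[te] - c1, axis=1)
    d0 = np.linalg.norm(X[te] - c0, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y[te]).mean())
```

The labels always come from the original embeddings; only the matrix `X` passed to `centroid_accuracy` changes between the "before" and "after" runs.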

### Word2Vec

Generate data

In [8]:
!cd ./code/lipstick/ && python3 classify_debiased.py --embedding=w2v --fname=w2v

Number of words:  26391
Number of words:  26391
loading done
0.04705043832523501
0.001965401560908159

Data used (portion): 0.1

	train with bef
	test with bef
	accuracy: 0.8355
	train with aft
	test with aft
	accuracy: 0.512
	train with bef
	test with bef
	accuracy: 0.87275
	train with aft
	test with aft
	accuracy: 0.5405
	train with bef
	test with bef
	accuracy: 0.9195
	train with aft
	test with aft
	accuracy: 0.50775
	train with bef
	test with bef
	accuracy: 0.69325
	train with aft
	test with aft
	accuracy: 0.50125
	train with bef
	test with bef
	accuracy: 0.7585
	train with aft
	test with aft
	accuracy: 0.50825
	train with bef
	test with bef
	accuracy: 0.9275
	train with aft
	test with aft
	accuracy: 0.573
	train with bef
	test with bef
	accuracy: 0.84075
	train with aft
	test with aft
	accuracy: 0.60175
	train with bef
	test with bef
	accuracy: 0.866
	train with aft
	test with aft
	accuracy: 0.5
	train with bef
	te

Show results

In [9]:
import pickle

def show_lipstick(path):
    # The .data files are pickled Python lists of words, so load them with
    # pickle instead of printing the raw bytes.
    with open(path, 'rb') as f:
        words = pickle.load(f)
    print(words)

In [10]:
show_lipstick('./code/lipstick/w2v_males.data')
show_lipstick('./code/lipstick/w2v_females.data')

['disciplinary_action', 'unbelievable', 'maneuver', 'penalties', 'wise', 'coach', 'businessmen', 'nevertheless', 'had', 'converting', 'suspend', 'approach', 'recharging', 'intercepted', 'glove', 'infantry', 'interference', 'apologized', 'outplayed', 'indicated', 'yard_dash', 'suspended', 'disciples', 'officially', 'aberration', 'culpable', 'funk', 'certain', 'meeting', 'bailed', 'criticizing', 'compatriot',

### Glove

Generate data

In [11]:
!cd ./code/lipstick/ && python3 classify_debiased.py --embedding=glove --fname=glove

Number of words:  23177
Number of words:  23177
loading done
0.03978345669192328
0.0013121980356330023

Data used (portion): 0.1

	train with bef
	test with bef
	accuracy: 0.9505
	train with aft
	test with aft
	accuracy: 0.7535
	train with bef
	test with bef
	accuracy: 0.9525
	train with aft
	test with aft
	accuracy: 0.84125
	train with bef
	test with bef
	accuracy: 0.94575
	train with aft
	test with aft
	accuracy: 0.8735
	train with bef
	test with bef
	accuracy: 0.95725
	train with aft
	test with aft
	accuracy: 0.86225
	train with bef
	test with bef
	accuracy: 0.95725
	train with aft
	test with aft
	accuracy: 0.722
	train with bef
	test with bef
	accuracy: 0.94525
	train with aft
	test with aft
	accuracy: 0.87675
	train with bef
	test with bef
	accuracy: 0.95475
	train with aft
	test with aft
	accuracy: 0.88375
	train with bef
	test with bef
	accuracy: 0.9395
	train with aft
	test with aft
	accuracy: 0.8205
	train with

Show results

In [12]:
show_lipstick('./code/lipstick/glove_males.data')
show_lipstick('./code/lipstick/glove_females.data')

['paid', 'inferior', 'say', 'imposing', 'records', 'doth', 'tuning', 'owed', 'downfall', 'pace', 'bowled', 'conquered', 'attempted', 'greats', 'merchant', 'conservatism', 'profitable', 'impose', 'grenade', 'nationalist', 'thee', 'alumnus', 'chief', 'henry', 'fisherman', 'undefeated', 'pts', 'repentance', 'primarily', 'rebounding', 'deity', 'kevin', 'axe', 'futures',