## Evaluation with Standard Benchmarks: Coherence
### Using an evaluation tool for word embeddings

Here, we apply standard coherence benchmarks to word embeddings before and after debiasing.

Sources:

#### RG: H. Rubenstein and J. B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.

#### WS: L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In WWW. ACM, 2001.

#### Wordsim benchmarks: code adapted from embedding-evaluation: https://github.com/k-kawakami/embedding-evaluation
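
Word-similarity benchmarks such as RG and WS are commonly scored by rank-correlating the model's cosine similarities with human ratings. The sketch below illustrates that idea in plain Python. It assumes Spearman correlation scaled to 0-100; the toy vectors, the ratings, and the helper names are hypothetical, and the `Benchmark` class used in this notebook may compute its scores differently.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy embeddings and human ratings (not taken from RG or WS).
vectors = {
    "car": [1.0, 0.1, 0.0],
    "automobile": [0.9, 0.2, 0.0],
    "bird": [0.0, 1.0, 0.2],
    "crane": [0.1, 0.8, 0.6],
    "noon": [0.0, 0.1, 1.0],
}
rated_pairs = [
    ("car", "automobile", 3.9),
    ("bird", "crane", 3.0),
    ("car", "noon", 0.1),
]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in rated_pairs]
human_scores = [rating for _, _, rating in rated_pairs]
score = 100 * spearman(model_scores, human_scores)
print("Similarity score:", score)
```

A higher score means the embedding orders word pairs more like the human annotators do, which is why the same benchmark can be rerun after debiasing to check that coherence is preserved.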


In [1]:
# Subset of GoogleNews-vectors:
# https://drive.google.com/file/d/1NH6jcrg8SXbnhpIXRIXF_-KUE7wGxGaG/view?usp=sharing

# For full embeddings:
# Download embeddings at https://github.com/tolga-b/debiaswe and put them in the following directory
# embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
# embeddings/GoogleNews-vectors-negative300.bin

In [2]:
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np
import os
import copy

import debiaswe as dwe
import debiaswe.we as we
from debiaswe.we import WordEmbedding
from debiaswe.data import load_professions

from debiaswe.benchmarks import Benchmark

# FastText

## 1: Original word embeddings on RG & WS

In [3]:
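# Obtain a small FastText embedding set (at most 50,000 vectors are loaded)
E = WordEmbedding('./embeddings/fasttext_wiki-news-300d-1M.vec', limit=50000)
# Keep only words that contain no uppercase characters
E.filter_words(lambda x: x.lower()==x)
# Print the words that the length filter below will drop (20 or more characters)
for word in E.words:
    if len(word)>19:
        print(word)
E.filter_words(lambda x: len(x)<20)
# Keep only purely alphabetic words (drops punctuation, digits, hyphenated tokens)
E.filter_words(lambda x: x.isalpha())
E.save('./embeddings/fasttext_wiki-news-300d_small.txt')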

*** Reading data from ./embeddings/fasttext_wiki-news-300d-1M.vec
Got weird line 999994 300

(50000, 300)
50000 words of dimension 300 : ,, the, ., and, ..., Winston-Salem, anion, JSP, Postecoglou
50000 words of dimension 300 : ,, the, ., and, ..., Winston-Salem, anion, JSP, Postecoglou
29950 words of dimension 300 : ,, the, ., and, ..., circumscribed, whos, salvaging, anion
article-for-deletion
government-sponsored
internationalization
29947 words of dimension 300 : ,, the, ., and, ..., circumscribed, whos, salvaging, anion
26894 words of dimension 300 : the, and, of, to, ..., circumscribed, whos, salvaging, anion
Wrote 26894 words to ./embeddings/fasttext_wiki-news-300d_small.txt
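
To make the shrinking vocabulary counts above easier to follow, here is a small stand-alone sketch of the same three filters applied to a toy word list. The `filter_words` helper below is a hypothetical stand-in that returns a new list; it is not the `WordEmbedding.filter_words` method, which operates on the embedding object itself.

```python
def filter_words(words, keep):
    """Return only the words for which keep(word) is true (hypothetical helper)."""
    return [w for w in words if keep(w)]

# Toy vocabulary built from tokens that appear in the output above, plus "e-mail".
vocab = [",", "the", "Winston-Salem", "JSP",
         "internationalization", "whos", "anion", "e-mail"]

# 1. Drop anything containing uppercase characters.
step1 = filter_words(vocab, lambda x: x.lower() == x)
# 2. Drop words of 20 or more characters.
step2 = filter_words(step1, lambda x: len(x) < 20)
# 3. Drop tokens that are not purely alphabetic (punctuation, hyphens).
step3 = filter_words(step2, lambda x: x.isalpha())

for name, words in [("lowercase", step1), ("short", step2), ("alphabetic", step3)]:
    print(name, len(words), words)
```

Note that punctuation such as `,` survives the lowercase filter and is only removed by the final `isalpha` filter, which matches the order in which tokens disappear in the output above.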


In [4]:
# Evaluate
benchmark = Benchmark()
result_original = benchmark.evaluate(E, "'Before', FastText")

Processing batch 1 of 40
Processing batch 2 of 40
Processing batch 3 of 40
Processing batch 4 of 40
Processing batch 5 of 40
Processing batch 6 of 40
Processing batch 7 of 40
Processing batch 8 of 40
Processing batch 9 of 40
Processing batch 10 of 40
Processing batch 11 of 40
Processing batch 12 of 40
Processing batch 13 of 40
Processing batch 14 of 40
Processing batch 15 of 40
Processing batch 16 of 40
Processing batch 17 of 40
Processing batch 18 of 40
Processing batch 19 of 40
Processing batch 20 of 40
Processing batch 21 of 40
Processing batch 22 of 40
Processing batch 23 of 40
Processing batch 24 of 40
Processing batch 25 of 40
Processing batch 26 of 40
Processing batch 27 of 40
Processing batch 28 of 40
Processing batch 29 of 40
Processing batch 30 of 40
Processing batch 31 of 40
Processing batch 32 of 40
Processing batch 33 of 40
Processing batch 34 of 40
Processing batch 35 of 40
Processing batch 36 of 40
Processing batch 37 of 40
Processing batch 38 of 40
Processing batch 39 o

## 2: Debiased word embeddings on RG & WS


### Step 2a: Hard debiased
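
As a rough illustration of the core idea behind hard debiasing, the sketch below shows the "neutralize" step for a single bias direction: the component of a word vector along the gender direction is removed and the result is renormalized. This is a minimal sketch under stated assumptions: the direction and the word vector are made-up toy values, and the helper names are hypothetical. The `hard_debias` function used in this notebook also takes definitional and equalize pairs, and its details may differ.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def neutralize(w, g):
    """Remove the component of w along direction g, then renormalize."""
    g = normalize(g)
    proj = dot(w, g)
    return normalize([wi - proj * gi for wi, gi in zip(w, g)])

# Hypothetical 3-d example: the first axis plays the role of the gender direction.
gender_direction = [1.0, 0.0, 0.0]
word = [0.6, 0.8, 0.0]  # toy vector for a gender-neutral word

debiased = neutralize(word, gender_direction)
print("before:", dot(word, gender_direction))
print("after: ", dot(debiased, gender_direction))
```

After neutralizing, the toy vector has no component left along the chosen direction, so its similarity to words that differ only along that direction becomes symmetric.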

In [5]:
from debiaswe.debias import hard_debias

# Path for hard_debiased embedding file 
hard_embedding_file = './embeddings/fasttext_wiki-news-300d-1M_hard_debiased.vec'

In [6]:
if os.path.exists(hard_embedding_file):
    E_hard = WordEmbedding(hard_embedding_file)

else:
    with open('./data/definitional_pairs.json', "r") as f:
        defs = json.load(f)
        print("def pairs size:", len(defs))

    with open('./data/equalize_pairs.json', "r") as f:
        equalize_pairs = json.load(f)

    with open('./data/gender_specific_seed.json', "r") as f:
        gender_specific_words = json.load(f)
        
    E_hard = copy.deepcopy(E)    
    
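    # Keep only pairs whose words are both in the vocabulary. Build a new list
    # instead of removing from `defs` while iterating over it, which skips items.
    defs = [def_pair for def_pair in defs
            if def_pair[0] in E_hard.words and def_pair[1] in E_hard.words]
    print("defs size:", len(defs))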
    
    hard_debias(E_hard, gender_specific_words, defs, equalize_pairs)

def pairs size: 10
defs size: 9
26894 words of dimension 300 : the, and, of, to, ..., circumscribed, whos, salvaging, anion
{('Spokesman', 'Spokeswoman'), ('monastery', 'convent'), ('GRANDSONS', 'GRANDDAUGHTERS'), ('Gentleman', 'Lady'), ('HIMSELF', 'HERSELF'), ('GRANDFATHER', 'GRANDMOTHER'), ('prince', 'princess'), ('CATHOLIC_PRIEST', 'NUN'), ('Dad', 'Mom'), ('grandpa', 'grandma'), ('Fraternity', 'Sorority'), ('PRINCE', 'PRINCESS'), ('He', 'She'), ('HE', 'SHE'), ('father', 'mother'), ('SONS', 'DAUGHTERS'), ('COUNCILMAN', 'COUNCILWOMAN'), ('DADS', 'MOMS'), ('King', 'Queen'), ('KING', 'QUEEN'), ('Kings', 'Queens'), ('FATHERS', 'MOTHERS'), ('spokesman', 'spokeswoman'), ('SCHOOLBOY', 'SCHOOLGIRL'), ('boy', 'girl'), ('businessman', 'businesswoman'), ('HIS', 'HER'), ('DAD', 'MOM'), ('grandson', 'granddaughter'), ('Males', 'Females'), ('ex_girlfriend', 'ex_boyfriend'), ('councilman', 'councilwoman'), ('Dads', 'Moms'), ('fatherhood', 'motherhood'), ('gentlemen', 'ladies'), ('schoolboy', 'schoo

In [7]:
# Evaluate for hard-debiased
result_hard_debiased = benchmark.evaluate(E_hard, "'Hard-debiased', FastText")

Processing batch 1 of 40
Processing batch 2 of 40
Processing batch 3 of 40
Processing batch 4 of 40
Processing batch 5 of 40
Processing batch 6 of 40
Processing batch 7 of 40
Processing batch 8 of 40
Processing batch 9 of 40
Processing batch 10 of 40
Processing batch 11 of 40
Processing batch 12 of 40
Processing batch 13 of 40
Processing batch 14 of 40
Processing batch 15 of 40
Processing batch 16 of 40
Processing batch 17 of 40
Processing batch 18 of 40
Processing batch 19 of 40
Processing batch 20 of 40
Processing batch 21 of 40
Processing batch 22 of 40
Processing batch 23 of 40
Processing batch 24 of 40
Processing batch 25 of 40
Processing batch 26 of 40
Processing batch 27 of 40
Processing batch 28 of 40
Processing batch 29 of 40
Processing batch 30 of 40
Processing batch 31 of 40
Processing batch 32 of 40
Processing batch 33 of 40
Processing batch 34 of 40
Processing batch 35 of 40
Processing batch 36 of 40
Processing batch 37 of 40
Processing batch 38 of 40
Processing batch 39 o

### Step 2b: Soft debiased
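
For orientation, soft debiasing as formulated by Bolukbasi et al. (the authors of the debiaswe repository linked above) learns a linear transformation $T$ of the embedding that trades off preserving pairwise inner products against removing the bias component from gender-neutral words. The objective below restates that formulation; the symbols are the paper's, and the `soft_debias` function used in this notebook may implement or parameterize it differently.

$$
\min_{T}\; \left\lVert (TW)^{\top}(TW) - W^{\top}W \right\rVert_F^2 \;+\; \lambda \left\lVert (TN)^{\top}(TB) \right\rVert_F^2
$$

Here $W$ is the matrix of all word vectors, $N$ the matrix of gender-neutral word vectors, $B$ the bias subspace, and $\lambda$ the weight of the debiasing term. A larger $\lambda$ pushes harder toward removing bias at the cost of changing the original similarities.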


In [8]:
from debiaswe.debias import soft_debias

# Path for soft_debiased embedding file 
soft_embedding_file = './embeddings/fasttext_wiki-news-300d-1M_soft_debiased.vec'

In [9]:
# Check for the soft-debiased file (not the hard-debiased one) before loading it
if os.path.exists(soft_embedding_file):
    E_soft = WordEmbedding(soft_embedding_file)
else:
    E_soft = copy.deepcopy(E)
    soft_debias(E_soft, gender_specific_words, defs, log=True, epochs=3000, lr=0.01, decrease_times=[])

Loss @ Epoch #0: 1975238.0
Loss @ Epoch #100: 212619.28125
Loss @ Epoch #200: 106259.140625
Loss @ Epoch #300: 61320.578125
Loss @ Epoch #400: 38644.35546875
Loss @ Epoch #500: 26022.271484375
Loss @ Epoch #600: 18496.947265625
Loss @ Epoch #700: 13752.0087890625
Loss @ Epoch #800: 10606.4404296875
Loss @ Epoch #900: 8426.341796875
Loss @ Epoch #1000: 6855.7451171875
Loss @ Epoch #1100: 5685.345703125
Loss @ Epoch #1200: 4786.84130859375
Loss @ Epoch #1300: 4078.810546875
Loss @ Epoch #1400: 3507.98291015625
Loss @ Epoch #1500: 3038.87646484375
Loss @ Epoch #1600: 2646.6923828125
Loss @ Epoch #1700: 2315.631591796875
Loss @ Epoch #1800: 2030.019775390625
Loss @ Epoch #1900: 1788.849609375
Loss @ Epoch #2000: 1569.93017578125
Loss @ Epoch #2100: 1382.8094482421875
Loss @ Epoch #2200: 1251.527099609375
Loss @ Epoch #2300: 1097.325927734375
Loss @ Epoch #2400: 984.4572143554688
Loss @ Epoch #2500: 862.98583984375
Loss @ Epoch #2600: 758.3908081054688
Loss @ Epoch #2700: 665.1876831054688


In [10]:
# Evaluate for soft-debiased
result_soft_debiased = benchmark.evaluate(E_soft, "'Soft-debiased', FastText")

Processing batch 1 of 40
Processing batch 2 of 40
Processing batch 3 of 40
Processing batch 4 of 40
Processing batch 5 of 40
Processing batch 6 of 40
Processing batch 7 of 40
Processing batch 8 of 40
Processing batch 9 of 40
Processing batch 10 of 40
Processing batch 11 of 40
Processing batch 12 of 40
Processing batch 13 of 40
Processing batch 14 of 40
Processing batch 15 of 40
Processing batch 16 of 40
Processing batch 17 of 40
Processing batch 18 of 40
Processing batch 19 of 40
Processing batch 20 of 40
Processing batch 21 of 40
Processing batch 22 of 40
Processing batch 23 of 40
Processing batch 24 of 40
Processing batch 25 of 40
Processing batch 26 of 40
Processing batch 27 of 40
Processing batch 28 of 40
Processing batch 29 of 40
Processing batch 30 of 40
Processing batch 31 of 40
Processing batch 32 of 40
Processing batch 33 of 40
Processing batch 34 of 40
Processing batch 35 of 40
Processing batch 36 of 40
Processing batch 37 of 40
Processing batch 38 of 40
Processing batch 39 o

In [11]:
benchmark.pprint_compare([result_original, result_hard_debiased, result_soft_debiased], ["Before", "Hard-debiased", "Soft-debiased"], "FastText")

+---------------------------------------------------------------------------+
|                        Results for FastText dataset                       |
+---------------+-------------------+-------------------+-------------------+
|     Score     |      EN-RG-65     |   EN-WS-353-ALL   |    MSR-analogy    |
+---------------+-------------------+-------------------+-------------------+
|     Before    | 83.86622701863348 | 74.10786418997199 | 55.87019429516329 |
| Hard-debiased | 83.50735694621694 | 74.17924052453014 | 55.99421248449773 |
| Soft-debiased | 85.07613183420915 | 73.73702832028661 | 46.01074824307565 |
+---------------+-------------------+-------------------+-------------------+
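
To read the table more easily, the following sketch computes each debiased model's change relative to the "Before" row. The scores are copied from the table above; the variable names are only illustrative.

```python
# Scores copied from the comparison table above.
results = {
    "Before": {
        "EN-RG-65": 83.86622701863348,
        "EN-WS-353-ALL": 74.10786418997199,
        "MSR-analogy": 55.87019429516329,
    },
    "Hard-debiased": {
        "EN-RG-65": 83.50735694621694,
        "EN-WS-353-ALL": 74.17924052453014,
        "MSR-analogy": 55.99421248449773,
    },
    "Soft-debiased": {
        "EN-RG-65": 85.07613183420915,
        "EN-WS-353-ALL": 73.73702832028661,
        "MSR-analogy": 46.01074824307565,
    },
}

baseline = results["Before"]

# Difference to the original embedding, per benchmark.
deltas = {
    name: {bench: scores[bench] - baseline[bench] for bench in scores}
    for name, scores in results.items()
    if name != "Before"
}

for name, row in deltas.items():
    print(name, {bench: round(delta, 2) for bench, delta in row.items()})
```

On this run, hard debiasing moves every benchmark by less than 0.4 points, while soft debiasing gains about 1.2 points on EN-RG-65 but loses about 9.9 points on MSR-analogy.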
