## Evaluation with Standard Benchmarks: Coherence
### Using an evaluation tool for word embeddings

Here, we apply standard coherence benchmarks to the original and debiased word embeddings (w2v and, in this section, FastText).

Sources:

- RG: H. Rubenstein and J. B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.
- WS: L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In WWW. ACM, 2001.
- Word-similarity benchmark code: adapted from embedding-evaluation, https://github.com/k-kawakami/embedding-evaluation
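RG-65 and WS-353 are lists of word pairs with human similarity ratings. A common way to score an embedding on them is to compute the cosine similarity of each pair and report the Spearman rank correlation with the human ratings; the scores in the comparison table at the end look like this correlation multiplied by 100, but we have not verified that the adapted embedding-evaluation code does exactly this. The sketch below is a minimal, dependency-free illustration under that assumption: `wordsim_score` and its helpers are hypothetical names, not part of `Benchmark`, and pairs containing out-of-vocabulary words are simply skipped.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def wordsim_score(vectors, dataset):
    """vectors: dict word -> list of floats; dataset: list of (w1, w2, human_score).

    Returns 100 * Spearman rho over the pairs whose words are both in the vocabulary.
    """
    model, human = [], []
    for w1, w2, score in dataset:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return 100 * spearman(model, human)
```

Because only ranks matter, an embedding whose cosine similarities order the pairs exactly like the human ratings scores 100, and one that orders them in reverse scores -100, regardless of the raw similarity values.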


```python
# Subset of GoogleNews-vectors:
# https://drive.google.com/file/d/1NH6jcrg8SXbnhpIXRIXF_-KUE7wGxGaG/view?usp=sharing

# For the full embeddings, download them from https://github.com/tolga-b/debiaswe
# and save them at the following paths:
# embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
# embeddings/GoogleNews-vectors-negative300.bin
```
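The GoogleNews files above are binary word2vec files, whereas the FastText file used below (`fasttext_wiki-news-300d-1M.vec`) is plain text: the first line is a header with the vocabulary size and the dimension, and each following line is a word followed by its vector components. That header is presumably why the loader below reports `Got weird line 999994 300`. The sketch shows one way to parse this format while keeping only the first `limit` words, which is what we assume `limit=26423` does; `parse_vec` and `load_vec` are hypothetical helpers, not the `WordEmbedding` constructor.

```python
def parse_vec(lines, limit=None):
    """Parse text .vec lines into (words, vectors), keeping at most `limit` words.

    The first line is a "<vocab_size> <dim>" header; rows whose length does not
    match the header dimension are skipped instead of being mis-parsed.
    """
    it = iter(lines)
    _n_total, dim = (int(x) for x in next(it).split())  # header line
    words, vectors = [], []
    for line in it:
        if limit is not None and len(words) >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    return words, vectors


def load_vec(path, limit=None):
    """Read a .vec file from disk (hypothetical helper, for illustration only)."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_vec(f, limit)
```

Reading line by line and stopping at `limit` avoids loading the roughly one million vectors in the full file when only a small vocabulary is needed.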

```python
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np
import os
import copy

import debiaswe as dwe
import debiaswe.we as we
from debiaswe.we import WordEmbedding
from debiaswe.data import load_professions

from debiaswe.benchmarks import Benchmark
```

# FastText

## 1: Original word embeddings on RG & WS

```python
# Load FastText, keeping the first 26423 words (the size of the small w2v subset)
E = WordEmbedding('./embeddings/fasttext_wiki-news-300d-1M.vec', limit=26423)

# Evaluate on the standard benchmarks
benchmark = Benchmark()
result_original = benchmark.evaluate(E, "'Before', FastText")
```

```
*** Reading data from ./embeddings/fasttext_wiki-news-300d-1M.vec
Got weird line 999994 300

(26423, 300)
26423 words of dimension 300 : ,, the, ., and, ..., Exclusive, Dolan, hemp, solicit
26423 words of dimension 300 : ,, the, ., and, ..., Exclusive, Dolan, hemp, solicit
Processing batch 1 of 40
Processing batch 2 of 40
...
Processing batch 28 of 40
...
```
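Besides RG and WS, `benchmark.evaluate` also reports an MSR-analogy score (the last column of the comparison table at the end). Analogy questions of the form a : b :: c : ? are commonly answered with 3CosAdd: return the vocabulary word, other than a, b and c, whose vector is closest by cosine to b - a + c; accuracy is the percentage of questions answered correctly. We have not checked that `Benchmark` uses exactly this rule, so treat the sketch as an illustration; `answer_analogy` and `analogy_accuracy` are hypothetical names.

```python
import math


def normalize(v):
    """Scale v to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def answer_analogy(vectors, a, b, c):
    """3CosAdd: the word w (not a, b or c) maximizing cos(w, b - a + c).

    vectors: dict word -> list of floats. Returns None if a, b or c is unknown.
    """
    if not all(w in vectors for w in (a, b, c)):
        return None
    va, vb, vc = (normalize(vectors[w]) for w in (a, b, c))
    target = normalize([y - x + z for x, y, z in zip(va, vb, vc)])
    best, best_sim = None, -math.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # the question words themselves are excluded
        sim = sum(t * u for t, u in zip(target, normalize(vec)))
        if sim > best_sim:
            best, best_sim = word, sim
    return best


def analogy_accuracy(vectors, questions):
    """questions: list of (a, b, c, expected). Returns accuracy in percent."""
    correct = sum(answer_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return 100 * correct / len(questions)
```

The linear scan over the whole vocabulary is what makes analogy evaluation slow on large embeddings, which is one reason real implementations process the questions in matrix batches.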

## 2: Debiased word embeddings on RG & WS


### Step 2a: Hard debiased

```python
from debiaswe.debias import hard_debias

# Path of the hard-debiased embedding file (loaded if it already exists)
hard_embedding_file = './embeddings/fasttext_wiki-news-300d-1M_hard_debiased.vec'
```

```python
# Load the word lists used by the debiasing methods. They are loaded even when the
# hard-debiased file already exists, because the soft-debiasing step needs them too.
with open('./data/definitional_pairs.json', "r") as f:
    print("def pairs")
    defs = json.load(f)
    print("defs size:", len(defs))

with open('./data/equalize_pairs.json', "r") as f:
    print("equalize pairs")
    equalize_pairs = json.load(f)

with open('./data/gender_specific_seed.json', "r") as f:
    print("gender specific")
    gender_specific_words = json.load(f)

# Keep only definitional pairs whose two words are both in the vocabulary.
# Building a new list avoids calling defs.remove() while iterating over defs,
# which silently skips the element after each removed pair.
vocab = set(E.words)
defs = [pair for pair in defs if pair[0] in vocab and pair[1] in vocab]
print("defs size:", len(defs))

if os.path.exists(hard_embedding_file):
    E_hard = WordEmbedding(hard_embedding_file)
else:
    # Debias a copy so that the original embedding E stays unchanged
    E_hard = copy.deepcopy(E)
    hard_debias(E_hard, gender_specific_words, defs, equalize_pairs)
```

```
def pairs
defs size: 10
equalize pairs
gender specific
defs size: 9
26423 words of dimension 300 : ,, the, ., and, ..., Exclusive, Dolan, hemp, solicit
{('Councilman', 'Councilwoman'), ('Fraternity', 'Sorority'), ('Father', 'Mother'), ('BOYS', 'GIRLS'), ('MALES', 'FEMALES'), ('brothers', 'sisters'), ('men', 'women'), ('GRANDPA', 'GRANDMA'), ('fatherhood', 'motherhood'), ('King', 'Queen'), ('Prince', 'Princess'), ('GENTLEMAN', 'LADY'), ('CATHOLIC_PRIEST', 'NUN'), ('Colt', 'Filly'), ('FRATERNITY', 'SORORITY'), ('FATHERHOOD', 'MOTHERHOOD'), ('Sons', 'Daughters'), ('fraternity', 'sorority'), ('uncle', 'aunt'), ('he', 'she'), ('MONASTERY', 'CONVENT'), ('Uncle', 'Aunt'), ('boys', 'girls'), ('DUDES', 'GALS'), ('PRINCE', 'PRINCESS'), ('MAN', 'WOMAN'), ('Fella', 'Granny'), ('Twin_Brother', 'Twin_Sister'), ('SCHOOLBOY', 'SCHOOLGIRL'), ('Fatherhood', 'Motherhood'), ('Congressman', 'Congresswoman'), ('dads', 'moms'), ('Wives', 'Husbands'), ('gentleman', 'lady'), ('Dudes', 'Gals'), ('FATHER', 'MOTH
```
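Hard debiasing, as implemented in debiaswe following Bolukbasi et al., has two vector operations once a gender direction g is known: neutralize every word that is not in the gender-specific list by removing its component along g, and equalize each equalize pair (the set printed above) so that both words differ only along g and are therefore equidistant from every neutralized word. The sketch below shows both operations in plain Python for a given direction; it assumes unit-length input vectors and omits how g is obtained (debiaswe derives it from the definitional pairs with PCA). `neutralize` and `equalize` are illustrative helpers, not the library's functions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale v to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def drop(u, g):
    """Remove the component of u along direction g."""
    scale = dot(u, g) / dot(g, g)
    return [a - scale * b for a, b in zip(u, g)]


def neutralize(vec, g):
    """Project out g and renormalize (applied to gender-neutral words)."""
    return unit(drop(vec, g))


def equalize(va, vb, g):
    """Re-center a pair of unit vectors so they differ only along g.

    Both outputs share the g-free part y of their midpoint and get opposite
    components +/-z along g, with z chosen so that each output has unit length.
    """
    g = unit(g)
    y = drop([(a + b) / 2 for a, b in zip(va, vb)], g)
    z = math.sqrt(max(0.0, 1 - dot(y, y)))
    if dot([a - b for a, b in zip(va, vb)], g) < 0:
        z = -z  # keep each word on its original side of the gender direction
    new_a = [z * gi + yi for gi, yi in zip(g, y)]
    new_b = [-z * gi + yi for gi, yi in zip(g, y)]
    return new_a, new_b
```

Since a neutralized word has no component along g, its dot product with the two equalized words is identical, which is the property that the equalize step is meant to guarantee.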

```python
# Evaluate the hard-debiased embedding
result_hard_debiased = benchmark.evaluate(E_hard, "'Hard-debiased', FastText")
```

```
Processing batch 1 of 40
Processing batch 2 of 40
...
Processing batch 38 of 40
...
```

### Step 2b: Soft debiased


```python
from debiaswe.debias import soft_debias

# Path of the soft-debiased embedding file (loaded if it already exists)
soft_embedding_file = './embeddings/fasttext_wiki-news-300d-1M_soft_debiased.vec'
```

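```python
# Check for the soft-debiased file (the original checked hard_embedding_file here)
if os.path.exists(soft_embedding_file):
    E_soft = WordEmbedding(soft_embedding_file)
else:
    # Debias a copy of the original embedding.
    # Requires gender_specific_words and defs from the hard-debiasing step.
    E_soft = copy.deepcopy(E)
    soft_debias(E_soft, gender_specific_words, defs, log=False)
```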

```python
# Evaluate the soft-debiased embedding
result_soft_debiased = benchmark.evaluate(E_soft, "'Soft-debiased', FastText")
```

```
Processing batch 1 of 40
Processing batch 2 of 40
...
Processing batch 38 of 40
...
```

```python
benchmark.pprint_compare([result_original, result_hard_debiased, result_soft_debiased], ["Before", "Hard-debiased", "Soft-debiased"], "FastText")
```

```
+----------------------------------------------------------------------------+
|                        Results for FastText dataset                        |
+---------------+-------------------+-------------------+--------------------+
|     Score     |      EN-RG-65     |   EN-WS-353-ALL   |    MSR-analogy     |
+---------------+-------------------+-------------------+--------------------+
|     Before    |  81.9172617497061 | 73.31884794383402 | 54.929256047466914 |
| Hard-debiased | 81.50096849406106 | 73.33134048563385 | 55.04335919671382  |
| Soft-debiased | 73.67812439839813 | 59.63640869764809 | 18.142400730260157 |
+---------------+-------------------+-------------------+--------------------+
```
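The comparison table is easier to read as differences from the Before row. The snippet below only subtracts the scores copied from the table and rounds them to two decimals; it adds no new measurements, and `deltas_vs_before` is a hypothetical helper.

```python
# Scores copied from the comparison table above.
BENCHMARKS = ["EN-RG-65", "EN-WS-353-ALL", "MSR-analogy"]
SCORES = {
    "Before": [81.9172617497061, 73.31884794383402, 54.929256047466914],
    "Hard-debiased": [81.50096849406106, 73.33134048563385, 55.04335919671382],
    "Soft-debiased": [73.67812439839813, 59.63640869764809, 18.142400730260157],
}


def deltas_vs_before(scores):
    """Change of each method's scores relative to the 'Before' row, rounded to 2 decimals."""
    base = scores["Before"]
    return {
        method: [round(s - b, 2) for s, b in zip(row, base)]
        for method, row in scores.items()
        if method != "Before"
    }


for method, diffs in deltas_vs_before(SCORES).items():
    print(method, dict(zip(BENCHMARKS, diffs)))
```

Hard debiasing moves every score by less than half a point (-0.42 on RG, +0.01 on WS, +0.11 on MSR-analogy), whereas soft debiasing lowers RG by 8.24 points, WS by 13.68 and MSR-analogy by 36.79.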
