## Evaluation with Standard Benchmarks: Coherence
### Using an evaluation tool for word embeddings

Here, we apply standard coherence benchmarks to the original FastText embeddings and to their hard-debiased and soft-debiased versions.

Sources:

#### RG: H. Rubenstein and J. B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.

#### WS: L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In WWW. ACM, 2001.

#### Wordsim benchmarks: code adapted from embedding-evaluation: https://github.com/k-kawakami/embedding-evaluation
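
Word-similarity benchmarks such as RG and WS are customarily scored by ranking word pairs by the cosine similarity of their vectors and correlating that ranking with human ratings. The following is a minimal sketch of that idea, assuming Spearman rank correlation scaled to 0-100. The names `wordsim_score`, `toy_vectors` and `toy_pairs` are hypothetical, the ratings are made up for illustration, and this is not the `debiaswe.benchmarks.Benchmark` implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def wordsim_score(vectors, rated_pairs):
    """Score (0-100 scale) over the pairs whose words are both in `vectors`."""
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(rating)
    return 100.0 * spearman(model_scores, human_scores)


# Hypothetical toy data, not taken from RG or WS
toy_vectors = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "journey": [0.5, 0.5],
    "fruit": [0.0, 1.0],
}
toy_pairs = [
    ("car", "automobile", 3.9),
    ("car", "journey", 1.5),
    ("car", "fruit", 0.2),
    ("car", "missing", 2.0),  # skipped: "missing" is out of vocabulary
]
toy_score = wordsim_score(toy_vectors, toy_pairs)
```

Pairs with an out-of-vocabulary word are skipped in this sketch, so a score computed on a filtered vocabulary may cover fewer pairs than the full benchmark.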


In [1]:
# Subset of GoogleNews-vectors:
# https://drive.google.com/file/d/1NH6jcrg8SXbnhpIXRIXF_-KUE7wGxGaG/view?usp=sharing

# For full embeddings:
# Download the embeddings from https://github.com/tolga-b/debiaswe and put them in the following directory:
# embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
# embeddings/GoogleNews-vectors-negative300.bin

In [2]:
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np
import os
import copy

import debiaswe as dwe
import debiaswe.we as we
from debiaswe.we import WordEmbedding
from debiaswe.data import load_professions

from debiaswe.benchmarks import Benchmark

# FastText

## 1: Original word embeddings on RG & WS

In [3]:
import string
# Obtain small FastText embeddings set
E = WordEmbedding('./embeddings/fasttext_wiki-news-300d-1M.vec', limit=50000)
# Keep only all-lowercase words
E.filter_words(lambda x: x.lower() == x)

# Drop words longer than 19 characters
count_long = sum(1 for word in E.words if len(word) > 19)
print("longer than 19:", count_long, "next:", len(E.words) - count_long)
E.filter_words(lambda x: len(x) < 20)

# Drop words containing digits
count_dig_words = sum(1 for word in E.words if any(c.isdigit() for c in word))
print("count dig words:", count_dig_words, "so next:", len(E.words) - count_dig_words)
E.filter_words(lambda x: not any(c.isdigit() for c in x))

# Drop words containing punctuation
punctuation = set(string.punctuation)
count = sum(1 for word in E.words if any(c in punctuation for c in word))
print("count punctuation words:", count, "so next:", len(E.words) - count)
E.filter_words(lambda x: not any(c in punctuation for c in x))
E.save('./embeddings/fasttext_wiki-news-300d_small.txt')

*** Reading data from ./embeddings/fasttext_wiki-news-300d-1M.vec
Got weird line 999994 300

(50000, 300)
50000 words of dimension 300 : ,, the, ., and, ..., Winston-Salem, anion, JSP, Postecoglou
50000 words of dimension 300 : ,, the, ., and, ..., Winston-Salem, anion, JSP, Postecoglou
29950 words of dimension 300 : ,, the, ., and, ..., circumscribed, whos, salvaging, anion
longer than 19: 3 next: 29947
29947 words of dimension 300 : ,, the, ., and, ..., circumscribed, whos, salvaging, anion
count dig words: 1780 so next: 28167
28167 words of dimension 300 : ,, the, ., and, ..., circumscribed, whos, salvaging, anion
count punctuation words: 1153 so next: 27014
27014 words of dimension 300 : the, and, of, to, ..., circumscribed, whos, salvaging, anion
Wrote 27014 words to ./embeddings/fasttext_wiki-news-300d_small.txt
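
The vocabulary filters used in this cell (lowercase only, fewer than 20 characters, no digits, no punctuation) can also be expressed as one predicate. A minimal sketch, where `keep_word` is a hypothetical helper and not part of `debiaswe`:

```python
import string

PUNCTUATION = set(string.punctuation)


def keep_word(word, max_len=19):
    """Return True if `word` passes all four vocabulary filters."""
    return (
        word == word.lower()                          # all lowercase
        and len(word) <= max_len                      # at most 19 characters
        and not any(c.isdigit() for c in word)        # no digits
        and not any(c in PUNCTUATION for c in word)   # no punctuation
    )


# A few words taken from the output above, plus a made-up one with a digit
examples = ["anion", "whos", "Winston-Salem", "JSP", ",", "mp3"]
kept = [w for w in examples if keep_word(w)]
```

Such a predicate could be passed to `E.filter_words(keep_word)` in a single call. The final vocabulary would be the same, but the intermediate counts printed above would no longer be available.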


In [4]:
# Evaluate
benchmark = Benchmark()
result_original = benchmark.evaluate(E, "'Before', FastText")

Processing batch 1 of 40
Processing batch 2 of 40
Processing batch 3 of 40
Processing batch 4 of 40
Processing batch 5 of 40
Processing batch 6 of 40
Processing batch 7 of 40
Processing batch 8 of 40
Processing batch 9 of 40
Processing batch 10 of 40
Processing batch 11 of 40
Processing batch 12 of 40
Processing batch 13 of 40
Processing batch 14 of 40
Processing batch 15 of 40
Processing batch 16 of 40
Processing batch 17 of 40
Processing batch 18 of 40
Processing batch 19 of 40
Processing batch 20 of 40
Processing batch 21 of 40
Processing batch 22 of 40
Processing batch 23 of 40
Processing batch 24 of 40
Processing batch 25 of 40
Processing batch 26 of 40
Processing batch 27 of 40
Processing batch 28 of 40
Processing batch 29 of 40
Processing batch 30 of 40
Processing batch 31 of 40
Processing batch 32 of 40
Processing batch 33 of 40
Processing batch 34 of 40
Processing batch 35 of 40
Processing batch 36 of 40
Processing batch 37 of 40
Processing batch 38 of 40
Processing batch 39 o

## 2: Debiased word embeddings on RG & WS


### Step 2a: Hard debiased

In [5]:
from debiaswe.debias import hard_debias

# Path for hard_debiased embedding file 
hard_embedding_file = './embeddings/fasttext_wiki-news-300d-1M_hard_debiased.vec'

In [6]:
# Load the word lists up front so they are also available for soft debiasing below
with open('./data/definitional_pairs.json', "r") as f:
    defs = json.load(f)
    print("def pairs size:", len(defs))

with open('./data/equalize_pairs.json', "r") as f:
    equalize_pairs = json.load(f)

with open('./data/gender_specific_seed.json', "r") as f:
    gender_specific_words = json.load(f)

# Keep only definitional pairs with both words in the vocabulary
# (build a new list rather than removing items while iterating)
defs = [pair for pair in defs if pair[0] in E.words and pair[1] in E.words]
print("defs size:", len(defs))

if os.path.exists(hard_embedding_file):
    E_hard = WordEmbedding(hard_embedding_file)
else:
    E_hard = copy.deepcopy(E)
    hard_debias(E_hard, gender_specific_words, defs, equalize_pairs)

def pairs size: 10
defs size: 9
27014 words of dimension 300 : the, and, of, to, ..., circumscribed, whos, salvaging, anion
{('FRATERNITY', 'SORORITY'), ('fella', 'granny'), ('Grandfather', 'Grandmother'), ('Gelding', 'Mare'), ('COLT', 'FILLY'), ('Schoolboy', 'Schoolgirl'), ('CHAIRMAN', 'CHAIRWOMAN'), ('grandfather', 'grandmother'), ('SPOKESMAN', 'SPOKESWOMAN'), ('male', 'female'), ('MALES', 'FEMALES'), ('SCHOOLBOY', 'SCHOOLGIRL'), ('Grandson', 'Granddaughter'), ('KINGS', 'QUEENS'), ('brothers', 'sisters'), ('BOYS', 'GIRLS'), ('fraternity', 'sorority'), ('prostate_cancer', 'ovarian_cancer'), ('Father', 'Mother'), ('BOY', 'GIRL'), ('PRINCE', 'PRINCESS'), ('MAN', 'WOMAN'), ('testosterone', 'estrogen'), ('GRANDPA', 'GRANDMA'), ('Uncle', 'Aunt'), ('men', 'women'), ('Nephew', 'Niece'), ('Gentleman', 'Lady'), ('twin_brother', 'twin_sister'), ('Prince', 'Princess'), ('Prostate_Cancer', 'Ovarian_Cancer'), ('Sons', 'Daughters'), ('councilman', 'councilwoman'), ('gentlemen', 'ladies'), ('congres
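
For orientation, the "neutralize" step commonly described for hard debiasing removes a word vector's component along the gender direction and then renormalizes the result. The sketch below illustrates only that step on a made-up 3-dimensional vector. It is not the `debiaswe.debias.hard_debias` implementation, it omits the equalize step, and the names `neutralize`, `toy_direction` and `toy_vector` are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def neutralize(vec, direction):
    """Remove the component of `vec` along `direction`, then renormalize."""
    g = normalize(direction)
    projection = dot(vec, g)
    return normalize([x - projection * gi for x, gi in zip(vec, g)])


# Made-up values for illustration only
toy_direction = [1.0, 0.0, 0.0]
toy_vector = [0.6, 0.8, 0.0]
neutral = neutralize(toy_vector, toy_direction)
```

After this step the vector is orthogonal to the chosen direction, which is why similarity scores involving neutralized words can shift slightly in the benchmarks.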

In [7]:
# Evaluate for hard-debiased
result_hard_debiased = benchmark.evaluate(E_hard, "'Hard-debiased', FastText")

Processing batch 1 of 40
Processing batch 2 of 40
Processing batch 3 of 40
Processing batch 4 of 40
Processing batch 5 of 40
Processing batch 6 of 40
Processing batch 7 of 40
Processing batch 8 of 40
Processing batch 9 of 40
Processing batch 10 of 40
Processing batch 11 of 40
Processing batch 12 of 40
Processing batch 13 of 40
Processing batch 14 of 40
Processing batch 15 of 40
Processing batch 16 of 40
Processing batch 17 of 40
Processing batch 18 of 40
Processing batch 19 of 40
Processing batch 20 of 40
Processing batch 21 of 40
Processing batch 22 of 40
Processing batch 23 of 40
Processing batch 24 of 40
Processing batch 25 of 40
Processing batch 26 of 40
Processing batch 27 of 40
Processing batch 28 of 40
Processing batch 29 of 40
Processing batch 30 of 40
Processing batch 31 of 40
Processing batch 32 of 40
Processing batch 33 of 40
Processing batch 34 of 40
Processing batch 35 of 40
Processing batch 36 of 40
Processing batch 37 of 40
Processing batch 38 of 40
Processing batch 39 o

### Step 2b: Soft debiased


In [8]:
from debiaswe.debias import soft_debias

# Path for soft_debiased embedding file 
soft_embedding_file = './embeddings/fasttext_wiki-news-300d-1M_soft_debiased.vec'

In [9]:
# Check for the soft-debiased file (not the hard-debiased one) before loading it
if os.path.exists(soft_embedding_file):
    E_soft = WordEmbedding(soft_embedding_file)
else:
    E_soft = copy.deepcopy(E)
    soft_debias(E_soft, gender_specific_words, defs, log=True, epochs=5000, lr=0.01, decrease_times=[])

Loss @ Epoch #0: 2370557.5
Loss @ Epoch #100: 230118.8125
Loss @ Epoch #200: 118769.8828125
Loss @ Epoch #300: 70960.890625
Loss @ Epoch #400: 46132.4375
Loss @ Epoch #500: 31855.505859375
Loss @ Epoch #600: 23072.251953125
Loss @ Epoch #700: 17380.1953125
Loss @ Epoch #800: 13527.67578125
Loss @ Epoch #900: 10818.865234375
Loss @ Epoch #1000: 8847.076171875
Loss @ Epoch #1100: 7365.91943359375
Loss @ Epoch #1200: 6221.6064453125
Loss @ Epoch #1300: 5315.6474609375
Loss @ Epoch #1400: 4583.33251953125
Loss @ Epoch #1500: 3980.914306640625
Loss @ Epoch #1600: 3477.994140625
Loss @ Epoch #1700: 3052.818115234375
Loss @ Epoch #1800: 2689.550537109375
Loss @ Epoch #1900: 2376.046142578125
Loss @ Epoch #2000: 2103.907958984375
Loss @ Epoch #2100: 1865.5926513671875
Loss @ Epoch #2200: 1656.3790283203125
Loss @ Epoch #2300: 1471.3201904296875
Loss @ Epoch #2400: 1318.3330078125
Loss @ Epoch #2500: 1222.083251953125
Loss @ Epoch #2600: 1038.669921875
Loss @ Epoch #2700: 921.4835815429688
Loss

In [10]:
# Evaluate for soft-debiased
result_soft_debiased = benchmark.evaluate(E_soft, "'Soft-debiased', FastText")

Processing batch 1 of 40
Processing batch 2 of 40
Processing batch 3 of 40
Processing batch 4 of 40
Processing batch 5 of 40
Processing batch 6 of 40
Processing batch 7 of 40
Processing batch 8 of 40
Processing batch 9 of 40
Processing batch 10 of 40
Processing batch 11 of 40
Processing batch 12 of 40
Processing batch 13 of 40
Processing batch 14 of 40
Processing batch 15 of 40
Processing batch 16 of 40
Processing batch 17 of 40
Processing batch 18 of 40
Processing batch 19 of 40
Processing batch 20 of 40
Processing batch 21 of 40
Processing batch 22 of 40
Processing batch 23 of 40
Processing batch 24 of 40
Processing batch 25 of 40
Processing batch 26 of 40
Processing batch 27 of 40
Processing batch 28 of 40
Processing batch 29 of 40
Processing batch 30 of 40
Processing batch 31 of 40
Processing batch 32 of 40
Processing batch 33 of 40
Processing batch 34 of 40
Processing batch 35 of 40
Processing batch 36 of 40
Processing batch 37 of 40
Processing batch 38 of 40
Processing batch 39 o

In [11]:
benchmark.pprint_compare([result_original, result_hard_debiased, result_soft_debiased], ["Before", "Hard-debiased", "Soft-debiased"], "FastText")

+---------------------------------------------------------------------------+
|                        Results for FastText dataset                       |
+---------------+-------------------+-------------------+-------------------+
|     Score     |      EN-RG-65     |   EN-WS-353-ALL   |    MSR-analogy    |
+---------------+-------------------+-------------------+-------------------+
|     Before    | 83.86622701863348 | 74.10786418997199 | 55.87019429516329 |
| Hard-debiased | 83.50735694621694 | 74.17924052453014 | 55.99421248449773 |
| Soft-debiased | 84.49852190812925 | 73.82362901883974 | 54.42331541959488 |
+---------------+-------------------+-------------------+-------------------+
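
To read the table more easily, the change of each debiased variant relative to the "Before" row can be computed directly. A small sketch using the scores copied from the table above; the helper `deltas_vs_baseline` is hypothetical and not part of `debiaswe`:

```python
# Scores copied from the results table above
results = {
    "Before": {
        "EN-RG-65": 83.86622701863348,
        "EN-WS-353-ALL": 74.10786418997199,
        "MSR-analogy": 55.87019429516329,
    },
    "Hard-debiased": {
        "EN-RG-65": 83.50735694621694,
        "EN-WS-353-ALL": 74.17924052453014,
        "MSR-analogy": 55.99421248449773,
    },
    "Soft-debiased": {
        "EN-RG-65": 84.49852190812925,
        "EN-WS-353-ALL": 73.82362901883974,
        "MSR-analogy": 54.42331541959488,
    },
}


def deltas_vs_baseline(results, baseline="Before"):
    """Return score differences (variant minus baseline) per benchmark."""
    base = results[baseline]
    return {
        name: {bench: scores[bench] - base[bench] for bench in base}
        for name, scores in results.items()
        if name != baseline
    }


deltas = deltas_vs_baseline(results)
for name in sorted(deltas):
    print(name, {bench: round(d, 2) for bench, d in sorted(deltas[name].items())})
```

On these numbers every change stays within about 1.5 points of the baseline, with the largest drop on MSR-analogy for the soft-debiased embeddings.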
