## Evaluation: Standard Benchmarks: Coherence
### Using an evaluation tool for word embeddings

Here, we apply standard coherence (word-similarity) benchmarks to the original word2vec (w2v) embedding and to its debiased version.

In [9]:
# Setup:
# Clone the code repository from https://github.com/tolga-b/debiaswe.git
# mkdir debiaswe_tutorial
# cd debiaswe_tutorial
# git clone https://github.com/tolga-b/debiaswe.git

# To reduce download time, we provide a subset of the GoogleNews vectors at the following location:
# https://drive.google.com/file/d/1NH6jcrg8SXbnhpIXRIXF_-KUE7wGxGaG/view?usp=sharing

# For full embeddings:
# Download the embeddings linked from https://github.com/tolga-b/debiaswe and place them at the following paths:
# embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
# embeddings/GoogleNews-vectors-negative300.bin

In [10]:
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np

import debiaswe as dwe
import debiaswe.we as we
from debiaswe.we import WordEmbedding
from debiaswe.data import load_professions

import benchmarks
from benchmarks.wordsim.wordsim import Wordsim

## Part 1 & 2: W2vNEWS RG & WS

Sources:


RG: H. Rubenstein and J. B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.

WS: L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In WWW. ACM, 2001.

## Step 1: "Before"

We first load the word2vec embedding trained on Google News text. The full embedding has a vocabulary of 3 million English words and phrases; here we load the smaller subset provided above (26,423 words). The embedding maps each word to a 300-dimensional vector.
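To make concrete what loading a text-format embedding involves, here is a minimal sketch, assuming each line holds a word followed by its whitespace-separated vector components, with an optional `<count> <dim>` header on the first line. The helper `load_text_embedding` is hypothetical and is not part of debiaswe; it only illustrates mapping each word to a row vector and scaling rows to unit length so that a dot product is a cosine similarity.

```python
import io
import numpy as np

def load_text_embedding(lines):
    """Read word2vec-style text: one word per line followed by its vector.

    Hypothetical helper (not the debiaswe API). Assumes whitespace-separated
    values and an optional "<count> <dim>" header on the first line.
    """
    words, rows = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if not words and len(parts) == 2 and parts[0].isdigit():
            continue  # skip a header such as "26423 300"
        words.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    vecs = np.array(rows, dtype=np.float64)
    # Scale every row to unit length so that a dot product is a cosine similarity.
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    index = {w: i for i, w in enumerate(words)}
    return words, vecs, index

# Toy example: a header line followed by two 3-dimensional vectors.
sample = io.StringIO("2 3\nking 1 0 0\nqueen 0.8 0.6 0\n")
words, vecs, index = load_text_embedding(sample)
print(vecs.shape)  # (2, 3)
print(round(float(vecs[index["king"]] @ vecs[index["queen"]]), 3))  # 0.8
```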


In [11]:
# load google news word2vec
E = WordEmbedding('./embeddings/w2v_gnews_small.txt')
print(E)
words = E.words
print("Words:", len(words))

*** Reading data from ./embeddings/w2v_gnews_small.txt
(26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
<debiaswe.we.WordEmbedding object at 0x7f58ae1a9210>
Words: 26423


### Wordsim benchmarks
Source: embedding-evaluation (https://github.com/k-kawakami/embedding-evaluation)

In [12]:
wordsim = Wordsim("en")
word2vec = wordsim.load_vector('./embeddings/w2v_gnews_small.txt')
result = wordsim.evaluate(word2vec)
wordsim.pprint(result)

load_vector
loading vector...
loaded vector 26423 words found ..
+----------------+-------+-----------+-------------------+
| Dataset        | Found | Not Found |    Score (rho)    |
+----------------+-------+-----------+-------------------+
| EN-MTurk-771   |  759  |     12    | 67.38962809521252 |
| EN-WS-353-REL  |  231  |     21    | 59.83270678776218 |
| EN-MC-30       |   26  |     4     | 81.66895142451415 |
| EN-WS-353-ALL  |  318  |     35    | 68.82719646959825 |
| EN-MEN-TR-3k   |  2510 |    490    | 77.43180975336674 |
| EN-MTurk-287   |  199  |     88    | 69.62042203592812 |
| EN-WS-353-SIM  |  182  |     21    | 76.86014910176465 |
| EN-YP-130      |  116  |     14    | 52.02603844739614 |
| EN-RG-65       |   53  |     12    | 77.66555804950227 |
| EN-RW-STANFORD |  460  |    1574   | 65.46249805462735 |
+----------------+-------+-----------+-------------------+
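Our reading of this table, stated as an assumption rather than taken from the tool's documentation: Score (rho) is Spearman's rank correlation between the embedding's cosine similarities and the human similarity ratings, multiplied by 100, and Found / Not Found count the dataset pairs whose two words are (or are not) both in the vocabulary (e.g. 759 + 12 = 771 pairs for EN-MTurk-771). The sketch below reproduces that computation under those assumptions; `wordsim_score` and `average_ranks` are hypothetical helpers, not the embedding-evaluation API. Tied values receive averaged ranks, which is how `scipy.stats.spearmanr` treats ties.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def wordsim_score(pairs, vecs, index):
    """Spearman's rho (x100) between cosine similarities and human ratings.

    pairs: (word1, word2, human_rating) triples; vecs: unit-normalized rows;
    index: word -> row. Pairs with an out-of-vocabulary word are counted
    as not found and skipped.
    """
    model, human = [], []
    not_found = 0
    for w1, w2, rating in pairs:
        if w1 in index and w2 in index:
            model.append(float(vecs[index[w1]] @ vecs[index[w2]]))
            human.append(rating)
        else:
            not_found += 1
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    rho = np.corrcoef(average_ranks(model), average_ranks(human))[0, 1]
    return 100 * rho, len(model), not_found

# Toy vocabulary of three unit vectors and four rated pairs ("z" is unknown).
index = {"a": 0, "b": 1, "c": 2}
vecs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
pairs = [("a", "b", 7), ("a", "c", 1), ("b", "c", 9), ("a", "z", 5)]
score, found, not_found = wordsim_score(pairs, vecs, index)
print(round(float(score), 6), found, not_found)  # 100.0 3 1
```

Because only ranks matter, a benchmark score rewards getting the ordering of pairs right, not matching the human rating scale.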


## Step 2: Debiased 

You can use the debias function from debiaswe to debias an embedding with word sets of your choosing.

When defining your own groups, you can leave equalize_pairs and gender_specific_words empty. Below, we give a warm-up example for the case of gender.

### Step 2a: Hard debiasing

In [13]:
from debiaswe.debias import debias

In [14]:
# Let's load some gender-related word lists to help with debiasing
with open('./data/definitional_pairs.json', "r") as f:
    defs = json.load(f)
print("definitional", defs)

with open('./data/equalize_pairs.json', "r") as f:
    equalize_pairs = json.load(f)

with open('./data/gender_specific_seed.json', "r") as f:
    gender_specific_words = json.load(f)
print("gender specific", len(gender_specific_words), gender_specific_words[:10])

definitional [['woman', 'man'], ['girl', 'boy'], ['she', 'he'], ['mother', 'father'], ['daughter', 'son'], ['gal', 'guy'], ['female', 'male'], ['her', 'his'], ['herself', 'himself'], ['Mary', 'John']]
gender specific 218 ['actress', 'actresses', 'aunt', 'aunts', 'bachelor', 'ballerina', 'barbershop', 'baritone', 'beard', 'beards']
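The definitional pairs are what hard debiasing uses to find a gender direction. A minimal sketch of the idea, as a simplified numpy re-implementation rather than the debiaswe code: center each pair on its mean, stack the offsets, and take the top principal direction. `gender_direction` is a hypothetical name. Since each pair contributes two offsets of equal size and opposite sign, the stacked matrix already has zero column means, so a plain SVD gives the same first component as PCA.

```python
import numpy as np

def gender_direction(pairs, vecs, index):
    """Top principal direction of centered definitional pairs (simplified sketch)."""
    offsets = []
    for a, b in pairs:
        if a in index and b in index:
            va, vb = vecs[index[a]], vecs[index[b]]
            center = (va + vb) / 2
            offsets.extend([va - center, vb - center])
    X = np.array(offsets)
    # Offsets come in +/- pairs, so X has zero column means and the first
    # right-singular vector equals the first principal component.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0] / np.linalg.norm(vt[0])

# Toy 3-d embedding in which she/he and woman/man differ only along the x axis.
s = 1 / np.sqrt(2)
index = {"she": 0, "he": 1, "woman": 2, "man": 3}
vecs = np.array([[s, s, 0], [-s, s, 0], [s, 0, s], [-s, 0, s]])
g = gender_direction([["she", "he"], ["woman", "man"]], vecs, index)
print(np.round(np.abs(g), 6))  # [1. 0. 0.]
```

The sign of a principal direction is arbitrary, which is why the toy check looks at absolute values.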


In [15]:
debias(E, gender_specific_words, defs, equalize_pairs)
print("Saving to file...")
E.save('./embeddings/w2v_gnews_debiased_small.txt')
print("\n\nDone!\n")

26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
{('boys', 'girls'), ('BROTHERS', 'SISTERS'), ('king', 'queen'), ('Colt', 'Filly'), ('FELLA', 'GRANNY'), ('King', 'Queen'), ('gentleman', 'lady'), ('SONS', 'DAUGHTERS'), ('wives', 'husbands'), ('Catholic_Priest', 'Nun'), ('MAN', 'WOMAN'), ('Gelding', 'Mare'), ('male', 'female'), ('HIMSELF', 'HERSELF'), ('TWIN_BROTHER', 'TWIN_SISTER'), ('Businessman', 'Businesswoman'), ('Fathers', 'Mothers'), ('fraternity', 'sorority'), ('men', 'women'), ('spokesman', 'spokeswoman'), ('BROTHER', 'SISTER'), ('Boys', 'Girls'), ('Wives', 'Husbands'), ('Schoolboy', 'Schoolgirl'), ('KINGS', 'QUEENS'), ('Grandsons', 'Granddaughters'), ('Gentleman', 'Lady'), ('Fatherhood', 'Motherhood'), ('Nephew', 'Niece'), ('Ex_Girlfriend', 'Ex_Boyfriend'), ('WIVES', 'HUSBANDS'), ('FATHER', 'MOTHER'), ('testosterone', 'estrogen'), ('PROSTATE_CANCER', 'OVARIAN_CANCER'), ('uncle', 'aunt'), ('GENTLEMAN', 'LADY'), ('FATHERHOOD', 'MOTHERHOOD'), ('SON
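Conceptually, hard debiasing has two steps: neutralize (words outside the gender-specific list lose their component along the gender direction) and equalize (each equalize pair is placed symmetrically on either side of the gender-neutral subspace). The printed set above appears to hold the equalize pairs expanded with case variants (lower, Title and UPPER case). Below is a simplified numpy sketch of the two steps, assuming unit-normalized rows and a unit direction `g`; `hard_debias` and `drop` are hypothetical names, and the library's actual implementation may differ in details.

```python
import numpy as np

def drop(u, g):
    """Remove the component of u along the unit direction g."""
    return u - g * (u @ g)

def hard_debias(vecs, index, g, specific_words, equalize_pairs):
    """Simplified neutralize-and-equalize sketch (not the debiaswe code).

    vecs: unit-normalized rows; g: unit bias direction. Returns a new array.
    """
    vecs = vecs.astype(float)  # work on a float copy
    specific = set(specific_words)
    # Neutralize: words not marked as gender specific lose their g component.
    # (A word lying exactly along g would become a zero vector; ignored here.)
    for w, i in index.items():
        if w not in specific:
            vecs[i] = drop(vecs[i], g)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    # Equalize: place each pair symmetrically about the g-free subspace,
    # keeping both words at unit length and on their original side of g.
    for a, b in equalize_pairs:
        if a in index and b in index:
            va, vb = vecs[index[a]], vecs[index[b]]
            y = drop((va + vb) / 2, g)
            z = np.sqrt(max(0.0, 1.0 - y @ y))
            if (va - vb) @ g < 0:
                z = -z
            vecs[index[a]] = z * g + y
            vecs[index[b]] = -z * g + y
    return vecs

# Toy example: "nurse" is neutralized, and she/he are equalized.
g = np.array([1.0, 0.0, 0.0])
index = {"nurse": 0, "she": 1, "he": 2}
vecs = np.array([[0.6, 0.8, 0.0], [0.8, 0.6, 0.0], [-0.6, 0.8, 0.0]])
out = hard_debias(vecs, index, g, ["she", "he"], [("she", "he")])
# After debiasing, nurse has no g component and is equally similar to she and he.
print(out[0] @ g, out[0] @ out[1], out[0] @ out[2])
```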

In [16]:
# Run the same word-similarity benchmarks on the debiased embedding
wordsim = Wordsim("en")
word2vec = wordsim.load_vector('./embeddings/w2v_gnews_debiased_small.txt')
result = wordsim.evaluate(word2vec)
wordsim.pprint(result)

load_vector
loading vector...
loaded vector 26423 words found ..
+----------------+-------+-----------+--------------------+
| Dataset        | Found | Not Found |    Score (rho)     |
+----------------+-------+-----------+--------------------+
| EN-MTurk-771   |  759  |     12    | 67.51685855110493  |
| EN-WS-353-REL  |  231  |     21    | 59.528904562337026 |
| EN-MC-30       |   26  |     4     | 81.63475169611193  |
| EN-WS-353-ALL  |  318  |     35    | 68.52623098234018  |
| EN-MEN-TR-3k   |  2510 |    490    | 77.31260859852208  |
| EN-MTurk-287   |  199  |     88    | 69.61341678196776  |
| EN-WS-353-SIM  |  182  |     21    | 76.57438959936287  |
| EN-YP-130      |  116  |     14    | 51.988688039001694 |
| EN-RG-65       |   53  |     12    | 77.49622028082247  |
| EN-RW-STANFORD |  460  |    1574   |  65.4806097177497  |
+----------------+-------+-----------+--------------------+
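Comparing the two tables dataset by dataset makes the effect of debiasing on coherence easier to read. The snippet below only copies the scores printed above and reports the change for each dataset; it does not recompute anything.

```python
# Scores (Score (rho) column) copied from the two tables above.
before = {
    "EN-MTurk-771": 67.38962809521252,
    "EN-WS-353-REL": 59.83270678776218,
    "EN-MC-30": 81.66895142451415,
    "EN-WS-353-ALL": 68.82719646959825,
    "EN-MEN-TR-3k": 77.43180975336674,
    "EN-MTurk-287": 69.62042203592812,
    "EN-WS-353-SIM": 76.86014910176465,
    "EN-YP-130": 52.02603844739614,
    "EN-RG-65": 77.66555804950227,
    "EN-RW-STANFORD": 65.46249805462735,
}
after = {
    "EN-MTurk-771": 67.51685855110493,
    "EN-WS-353-REL": 59.528904562337026,
    "EN-MC-30": 81.63475169611193,
    "EN-WS-353-ALL": 68.52623098234018,
    "EN-MEN-TR-3k": 77.31260859852208,
    "EN-MTurk-287": 69.61341678196776,
    "EN-WS-353-SIM": 76.57438959936287,
    "EN-YP-130": 51.988688039001694,
    "EN-RG-65": 77.49622028082247,
    "EN-RW-STANFORD": 65.4806097177497,
}

# Change in score per dataset (positive means debiasing helped).
deltas = {name: after[name] - before[name] for name in before}
for name, delta in deltas.items():
    print("{:<15} {:6.2f} -> {:6.2f} ({:+.2f})".format(name, before[name], after[name], delta))
largest = max(deltas, key=lambda name: abs(deltas[name]))
print("Largest change: {} ({:+.2f})".format(largest, deltas[largest]))
```

On these numbers every dataset moves by less than a third of a point, the largest change being on EN-WS-353-REL (about -0.30), so on these word-similarity benchmarks the hard-debiased embedding performs almost identically to the original.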
