## Evaluation with Standard Benchmarks: Coherence
### Using an evaluation tool for word embeddings

Here we evaluate the original and the debiased w2v embeddings on standard word-similarity benchmarks (RG and WS) to measure how debiasing affects coherence.

Sources:

#### RG: H. Rubenstein and J. B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.

#### WS: L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In WWW. ACM, 2001.

#### Wordsim benchmarks: code adapted from embedding-evaluation, https://github.com/k-kawakami/embedding-evaluation
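
Word-similarity benchmarks such as RG and WS are conventionally scored by ranking word pairs by the cosine similarity of their vectors and correlating that ranking with human ratings (Spearman's rho). The sketch below illustrates the idea with NumPy only. The helper names, the toy vectors, and the toy ratings are all invented for illustration. It is not the `Benchmark` implementation, and skipping out-of-vocabulary pairs is an assumption about how such cases might be handled.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def word_similarity_score(vectors, rated_pairs):
    """Return (rho, number of pairs used); pairs with unknown words are skipped."""
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(human_scores)

# Toy 3-dimensional vectors and made-up ratings (not real benchmark data).
toy_vectors = {
    "car": np.array([1.0, 0.1, 0.0]),
    "automobile": np.array([0.9, 0.2, 0.0]),
    "coast": np.array([0.0, 1.0, 0.2]),
    "shore": np.array([0.1, 0.9, 0.3]),
    "noon": np.array([0.0, 0.1, 1.0]),
    "string": np.array([1.0, 0.0, 0.1]),
}
toy_pairs = [
    ("car", "automobile", 3.9),
    ("coast", "shore", 3.6),
    ("car", "coast", 1.5),
    ("noon", "string", 0.1),
    ("cord", "smile", 0.0),  # skipped: neither word is in toy_vectors
]

rho, n_used = word_similarity_score(toy_vectors, toy_pairs)
print("Spearman rho = %.3f over %d pairs" % (rho, n_used))
```

A higher rho means the embedding orders word pairs more like the human annotators do, which is why comparing the score before and after debiasing indicates whether coherence was preserved.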


In [None]:
# Subset of GoogleNews-vectors:
# https://drive.google.com/file/d/1NH6jcrg8SXbnhpIXRIXF_-KUE7wGxGaG/view?usp=sharing

# For full embeddings:
# Download the embeddings from https://github.com/tolga-b/debiaswe and put them in the following directory:
# embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
# embeddings/GoogleNews-vectors-negative300.bin

In [None]:
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np

import debiaswe as dwe
import debiaswe.we as we
from debiaswe.we import WordEmbedding
from debiaswe.data import load_professions

from debiaswe.benchmarks import Benchmark

# Small w2vNEWS set

## 1: Original word embeddings on RG & WS

In [None]:
# Load google news word2vec
E = WordEmbedding('./embeddings/w2v_gnews_small.txt')
# Evaluate
benchmark = Benchmark()
result_original = benchmark.evaluate(E, "'Before'")

## 2: Debiased word embeddings on RG & WS


### Step 2a: Hard debiased

In [None]:
from debiaswe.debias import hard_debias

In [None]:
with open('./data/definitional_pairs.json', "r") as f:
    defs = json.load(f)

with open('./data/equalize_pairs.json', "r") as f:
    equalize_pairs = json.load(f)

with open('./data/gender_specific_seed.json', "r") as f:
    gender_specific_words = json.load(f)

In [None]:
%%capture
hard_debias(E, gender_specific_words, defs, equalize_pairs)

In [None]:
# Evaluate for hard-debiased
result_hard_debiased = benchmark.evaluate(E, "'Hard-debiased'")
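
To give a feel for what hard debiasing does to a single vector, here is a minimal sketch of a "neutralize" step: remove a word vector's component along a gender direction, then renormalize to unit length. It is a toy illustration, not the `hard_debias` code. The function names and the 3-dimensional vectors are invented, and a single `he - she` difference stands in for the direction, which the library estimates from the definitional pairs. The equalize step applied to `equalize_pairs` is not shown.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def neutralize(w, direction):
    """Remove the component of w along `direction`, then renormalize."""
    g = unit(direction)
    w_orth = w - np.dot(w, g) * g  # projection onto g subtracted out
    return unit(w_orth)

# Toy vectors (assumptions for illustration only).
he = unit(np.array([1.0, 0.2, 0.1]))
she = unit(np.array([-1.0, 0.2, 0.1]))
gender_direction = he - she  # stand-in for the estimated gender direction

nurse = unit(np.array([-0.4, 0.8, 0.3]))
nurse_debiased = neutralize(nurse, gender_direction)

print("component along direction before:", np.dot(nurse, unit(gender_direction)))
print("component along direction after: ", np.dot(nurse_debiased, unit(gender_direction)))
```

After neutralizing, the vector has no component along the chosen direction, so it is equally similar to both ends of that direction. The benchmark comparison then checks how much this projection changes the word-similarity scores.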

### Step 2b: Soft debiased


In [None]:
from debiaswe.debias import soft_debias
# Reload the original embeddings: hard_debias modified E in place
E = WordEmbedding('./embeddings/w2v_gnews_small.txt')

In [None]:
%%capture
soft_debias(E, gender_specific_words, defs, log=False)

In [None]:
# Evaluate for soft-debiased
result_soft_debiased = benchmark.evaluate(E, "'Soft-debiased'")

In [None]:
benchmark.pprint_compare([result_original, result_hard_debiased, result_soft_debiased], ["Before", "Hard-debiased", "Soft-debiased"], "small")

# Full w2vNEWS set

## 1: Original word embeddings on RG & WS


In [None]:
# Load google news word2vec
E = WordEmbedding('./embeddings/GoogleNews-vectors-negative300.bin')
# Evaluate
benchmark = Benchmark()
result_original = benchmark.evaluate(E, "'Before'")

## 2: Debiased word embeddings on RG & WS


### Step 2a: Hard debiased

In [None]:
from debiaswe.debias import hard_debias

In [None]:
with open('./data/definitional_pairs.json', "r") as f:
    defs = json.load(f)

with open('./data/equalize_pairs.json', "r") as f:
    equalize_pairs = json.load(f)

with open('./data/gender_specific_seed.json', "r") as f:
    gender_specific_words = json.load(f)

In [None]:
%%capture
hard_debias(E, gender_specific_words, defs, equalize_pairs)

In [None]:
# Evaluate for hard-debiased
result_hard_debiased = benchmark.evaluate(E, "'Hard-debiased'")

### Step 2b: Soft debiased

In [None]:
from debiaswe.debias import soft_debias
# Reload the original embeddings: hard_debias modified E in place
E = WordEmbedding('./embeddings/GoogleNews-vectors-negative300.bin')

In [None]:
%%capture
soft_debias(E, gender_specific_words, defs, log=False)

In [None]:
# Evaluate for soft-debiased
result_soft_debiased = benchmark.evaluate(E, "'Soft-debiased'")

In [None]:
benchmark.pprint_compare([result_original, result_hard_debiased, result_soft_debiased], ["Before", "Hard-debiased", "Soft-debiased"], "full")