# Unsupervised Word Sense Disambiguation on top of word2vec

This projet is based on the paper "A Simple Approach to Learn Polysemous Word Embeddings" : https://arxiv.org/abs/1707.01793v1

The main objective of this project is to disambiguate polysemous words using a very simple approach. The method we use is built on top of word2vec model that is trained on wikicorp corpus (http://nlp.stanford.edu/data/WestburyLab.wikicorp.201004.txt.bz2) . The vectors generated by this word2vec model are considered as **base vector embedding**, because every word has only one vector corresponding to it, as there is no consideration for poysemous words in this kind of model. 

Given a target **word** and its **context** we can construct a **contextual embedding** vector as a linear combination of weighted sum of its corresponding context words' **base embedding** vectors. 

To achieve the above mentioned, the method is to first build a matrix W of V*V dimensions where V is the total vocabulary of the selected corpus. In the paper above, W[i,j] is defined as,

W[i,j]=co-occurence of words i and j /(freq._of_word_i * freq_of_word_j)

This matrix gives us a score of what words occur together frequently, meaning it tells us whether word j is a relevant context of the word i.

I have been able to construct this **Matrix W using brown corpus**, even though I'm using a wiki corpus to train the gensim word2vec, Im using brown corpus to construct W because W takes up a huge amout of memory to store and brown is a smaller corpus suitable for this.

According to the method if we have two usages of the word bank,
example: (1)I was sitting by the side of the River bank.
         (2)I need to withdraw my savings from the bank.

then vector,
**bank1 =W[sitting,bank]*vector(sitting) + W[side,bank]*vector(side) + W[River,bank]*vector(river)**

and,

**bank2 = W[need,bank]*vector(need) + W[withdraw,bank]*vector(withdraw) + W[savings,bank]*vector(savings)**

The above representations mean that the target words can be represented as the sum of the vectors of their context. Here the vectors are directly taken from the word embedding (word2vec) trained on the Google news model. Also note that W[i,j] acts as a weight to each vector in the context to weigh the context word according to its relevance.



# Steps 

Lets import the wiki corpus that is in the same directory as this ipynb file

In [1]:
from model_utilities import remove_stopword,create_V,create_W2,Normalise_W,read_scws
import bz2

wiki = bz2.open('WestburyLab.wikicorp.201004.txt.bz2','rt',encoding = 'utf-8')


We read the first 10,00,000 sentences from the corpus

In [2]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

lines = []
count = 0
while(count<1000000):
    line = wiki.readline()
    
    if not line:
        break
    line = line.strip('\n')
    line = word_tokenize(line)
    
    
    if line:
        lines.append(line)
        count+=1
    
    
        
        

## Removing stopwords from wiki corpus

In [3]:



stopwordfree_wiki = []

for sentence in lines:
    stopword_free = remove_stopword(sentence)
    stopwordfree_wiki.append(stopword_free)

## gensim word2vec

In [5]:
from gensim.models.word2vec import Word2Vec
from gensim.models import KeyedVectors
from gensim.test.utils import datapath
import gensim.downloader as api

In [6]:
import numpy as np

In [7]:
import logging
logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s',level=logging.INFO)

Train the model on the corpus which is free of stopwords using word2vec , we use 5 epochs, and window size=5 with negative sampling of 5. For more information on this model refer to this paper : word2vec Parameter Learning Explained
by Xin Rong , https://arxiv.org/pdf/1411.2738v3.pdf


In [8]:
model= Word2Vec(stopwordfree_wiki, size=100, window=5, min_count=5, hs=0, negative=5, sg=1 ,iter=5, workers=4  )

2021-05-20 19:11:11,684: INFO: collecting all words and their counts
2021-05-20 19:11:11,685: INFO: PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2021-05-20 19:11:11,750: INFO: PROGRESS: at sentence #10000, processed 378255 words, keeping 47470 word types
2021-05-20 19:11:11,810: INFO: PROGRESS: at sentence #20000, processed 738718 words, keeping 73214 word types
2021-05-20 19:11:11,871: INFO: PROGRESS: at sentence #30000, processed 1111881 words, keeping 94384 word types
2021-05-20 19:11:11,930: INFO: PROGRESS: at sentence #40000, processed 1461182 words, keeping 113740 word types
2021-05-20 19:11:11,991: INFO: PROGRESS: at sentence #50000, processed 1824096 words, keeping 131348 word types
2021-05-20 19:11:12,053: INFO: PROGRESS: at sentence #60000, processed 2204094 words, keeping 146853 word types
2021-05-20 19:11:12,118: INFO: PROGRESS: at sentence #70000, processed 2597150 words, keeping 161212 word types
2021-05-20 19:11:12,179: INFO: PROGRESS: at sentence #8

2021-05-20 19:11:16,295: INFO: PROGRESS: at sentence #720000, processed 26266966 words, keeping 673269 word types
2021-05-20 19:11:16,360: INFO: PROGRESS: at sentence #730000, processed 26629252 words, keeping 679687 word types
2021-05-20 19:11:16,427: INFO: PROGRESS: at sentence #740000, processed 26999138 words, keeping 685615 word types
2021-05-20 19:11:16,493: INFO: PROGRESS: at sentence #750000, processed 27350791 words, keeping 691968 word types
2021-05-20 19:11:16,554: INFO: PROGRESS: at sentence #760000, processed 27697188 words, keeping 697830 word types
2021-05-20 19:11:16,632: INFO: PROGRESS: at sentence #770000, processed 28047496 words, keeping 703453 word types
2021-05-20 19:11:16,698: INFO: PROGRESS: at sentence #780000, processed 28413802 words, keeping 709133 word types
2021-05-20 19:11:16,755: INFO: PROGRESS: at sentence #790000, processed 28758555 words, keeping 715558 word types
2021-05-20 19:11:16,824: INFO: PROGRESS: at sentence #800000, processed 29113014 words, 

2021-05-20 19:12:21,997: INFO: EPOCH 1 - PROGRESS: at 61.38% examples, 532904 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:12:23,033: INFO: EPOCH 1 - PROGRESS: at 63.12% examples, 533274 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:12:24,060: INFO: EPOCH 1 - PROGRESS: at 64.92% examples, 534050 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:12:25,064: INFO: EPOCH 1 - PROGRESS: at 66.86% examples, 534580 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:12:26,068: INFO: EPOCH 1 - PROGRESS: at 68.60% examples, 535196 words/s, in_qsize 6, out_qsize 1
2021-05-20 19:12:27,085: INFO: EPOCH 1 - PROGRESS: at 70.22% examples, 535612 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:12:28,123: INFO: EPOCH 1 - PROGRESS: at 71.91% examples, 535458 words/s, in_qsize 8, out_qsize 0
2021-05-20 19:12:29,130: INFO: EPOCH 1 - PROGRESS: at 73.58% examples, 535758 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:12:30,149: INFO: EPOCH 1 - PROGRESS: at 75.35% examples, 536273 words/s, in_qsize 7, out_qsize 0
2

2021-05-20 19:13:33,934: INFO: EPOCH 2 - PROGRESS: at 79.98% examples, 543057 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:13:34,937: INFO: EPOCH 2 - PROGRESS: at 81.56% examples, 542639 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:13:35,943: INFO: EPOCH 2 - PROGRESS: at 83.22% examples, 542558 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:13:36,961: INFO: EPOCH 2 - PROGRESS: at 85.02% examples, 542915 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:13:37,964: INFO: EPOCH 2 - PROGRESS: at 86.88% examples, 543279 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:13:38,969: INFO: EPOCH 2 - PROGRESS: at 88.63% examples, 543576 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:13:39,990: INFO: EPOCH 2 - PROGRESS: at 90.48% examples, 543827 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:13:40,992: INFO: EPOCH 2 - PROGRESS: at 92.20% examples, 543994 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:13:42,001: INFO: EPOCH 2 - PROGRESS: at 94.05% examples, 544333 words/s, in_qsize 7, out_qsize 0
2

2021-05-20 19:14:45,234: INFO: EPOCH 3 - PROGRESS: at 98.32% examples, 538991 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:14:46,236: INFO: EPOCH 3 - PROGRESS: at 99.90% examples, 538003 words/s, in_qsize 4, out_qsize 0
2021-05-20 19:14:46,242: INFO: worker thread finished; awaiting finish of 3 more threads
2021-05-20 19:14:46,250: INFO: worker thread finished; awaiting finish of 2 more threads
2021-05-20 19:14:46,252: INFO: worker thread finished; awaiting finish of 1 more threads
2021-05-20 19:14:46,284: INFO: worker thread finished; awaiting finish of 0 more threads
2021-05-20 19:14:46,285: INFO: EPOCH - 3 : training on 36072662 raw words (32794062 effective words) took 61.0s, 538024 effective words/s
2021-05-20 19:14:47,290: INFO: EPOCH 4 - PROGRESS: at 1.49% examples, 505295 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:14:48,301: INFO: EPOCH 4 - PROGRESS: at 3.18% examples, 529644 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:14:49,316: INFO: EPOCH 4 - PROGRESS: at 4.95% exam

2021-05-20 19:15:51,878: INFO: EPOCH 5 - PROGRESS: at 6.12% examples, 503180 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:15:52,888: INFO: EPOCH 5 - PROGRESS: at 7.76% examples, 515536 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:15:53,893: INFO: EPOCH 5 - PROGRESS: at 9.38% examples, 518709 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:15:54,907: INFO: EPOCH 5 - PROGRESS: at 10.84% examples, 512418 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:15:55,916: INFO: EPOCH 5 - PROGRESS: at 12.42% examples, 512672 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:15:56,934: INFO: EPOCH 5 - PROGRESS: at 14.09% examples, 514963 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:15:57,946: INFO: EPOCH 5 - PROGRESS: at 15.63% examples, 515186 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:15:58,954: INFO: EPOCH 5 - PROGRESS: at 17.03% examples, 509208 words/s, in_qsize 7, out_qsize 0
2021-05-20 19:15:59,970: INFO: EPOCH 5 - PROGRESS: at 18.40% examples, 503311 words/s, in_qsize 7, out_qsize 0
2021

In [112]:
model.save("word2vec.model")

2021-05-20 20:32:13,387: INFO: saving Word2Vec object under word2vec.model, separately None
2021-05-20 20:32:13,389: INFO: storing np array 'vectors' to word2vec.model.wv.vectors.npy
2021-05-20 20:32:13,463: INFO: not storing attribute vectors_norm
2021-05-20 20:32:13,464: INFO: storing np array 'syn1neg' to word2vec.model.trainables.syn1neg.npy
2021-05-20 20:32:13,540: INFO: not storing attribute cum_table
2021-05-20 20:32:15,393: INFO: saved word2vec.model


In [None]:
model = Word2Vec.load("word2vec.model")

### Create V_wiki and V_brown 
We create the volcabulary dictionary V_wiki, and V_brown for wiki and brown corpus trespectively.Each dictionary has keys ass words in the corpus and the frequency as values. 

In [10]:
V_wiki=create_V(stopwordfree_wiki)

In [11]:
brown1 = nltk.corpus.brown.sents()

In [12]:
stopwordfree_brown = []

for sentence in brown1:
    stopword_free = remove_stopword(sentence)
    stopwordfree_brown.append(stopword_free)
    

In [14]:
V_brown=create_V(stopwordfree_brown)

### Read scws from scws text file
SCWS stands for Stanfordâ€™s Contextual Word Similarities, this is a test set that has two (word,coxtext) pairs and have a similarity measure(ground-truth ranking) between two words annotated by human readers based on the context. 

Description:

Each line in ratings.txt consists of a pair of words, their respective contexts, the 10 individual human ratings, as well as their averages. The target word is surrounded by <b>...</b> in its context. Each line is tab-
delimited with the following format:


< id > < word1 > < POS of word1> < word2 > <POS of word2> < word1 in context> < word2 in context> < average human rating> < 10 individual human ratings>
    
source: https://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes

In [117]:
scws_data = read_scws(V_brown,V_wiki)

In [18]:
scws_data
    

{'1': {'target1': 'Brazil',
  'target2': 'nut',
  'context1': 'gap in income between blacks and other non-whites is relatively small compared to the large gap between whites and non-whites . Other factors such as illiteracy and education level show the same patterns . Unlike in the US where African Americans were united in the civil rights struggle , in  Brazil  the philosophy of whitening has helped divide blacks from other non-whites and prevented a more active civil rights movement . Though Afro-Brazilians make up half the population there are very few black politicians . The city of Salvador , Bahia for instance is 80 % Afro-Brazilian but has never',
  'context2': 'of the neck , bridge , and pickups , there are features which are found in almost every guitar . The photo below shows the different parts of an electric guitar . The headstock ( 1 ) contains the metal machine heads , which are used for tuning ; the  nut  ( 1.4 ) , a thin fret-like strip of metal , plastic , graphite or 

### Read the W_norm matrix
This matrix as discussed will help us find the relevance of a context word with its target word.

In [31]:
fileW = open("W_norm_final","rb")
W_norm_array = np.load(fileW)
fileW.close()


## Contextual vector (compose_vector())
This is the heart of the algorithm where we compose the contextual embedding vector, as a linear sum of base vectors of context words. Note that this sum is weighted i.e., each base vector is multiplied by W[i][j] where W is a normalised co-occurence matrix anf i are j are indexes of target word and context word in the matrix

In [32]:
def compose_vector(target,context,V_brown):
    V_brown_list = list(V_brown.keys())
    i = V_brown_list.index(target)
    count = 0
    for word in context:
        if word not in V_brown_list:
            
            context.remove(word)
            
        if word in V_brown_list:
            j = V_brown_list.index(word)
            
            weight = W_norm_array[i][j]
            
            try:

                if count == 0:
                    vector = model.wv[word]*weight
                    count+=1
                else:
                    vector += model.wv[word]*weight
                    
            except KeyError:
                continue
                
                
            
    return vector
            
        
            
            
    
    
            
            
    
            
            
            
    
    

Here we compute the contextual embedding veector for all the both the target words in each instance of SCWS test set. 

In [33]:
target_list1 = []
target_list2 = []
for key,data in scws_data.items():
    context1 = remove_stopword(data['context1'].split())
    context2 = remove_stopword(data['context2'].split())
    target1 = data['target1']
    target2 = data['target2']
    vector1 = compose_vector(target1,context1,V_brown)
    vector2 = compose_vector(target2,context2,V_brown)
    
    
    target_list1.append(vector1)
    target_list2.append(vector2)                           
    
    

non-whites
non-whites
illiteracy
non-whites
Afro-Brazilians
%
pickups
headstock
1.4
fret-like
fingerboard
genocide
2009
Armenia
Andorra
Belarus
Benin
Chad
's
Cote
Croatia
Cyprus
Timor
Estonia
Pythagorean
Pythagorean
"
"
"
c
Pythagorean
"
"
doctorates
Universiteit
Stockholm
2002
2005
Beijing
2007
2007
Institutet
2008
2009
2007
Lone
1977
's
Wald
Retardation
Career
Psychological
1974-1990
1978
1978
Retarded
1985
Excellence
1985
's
trilobite
Blackmore
metres
Fossil
Whittington
Derek
reassessment
Shale
sketchiest
vanquish
Moabites
Chemosh
Jehoram
Ahaziah
Aram
Ahaziah
Shunem
Empress
Eulsa
Japan-Korea
annexed
seizing
Kiaochow
Twenty-One
pathogens
nanowires
TNO
Bioaerosol
Particle
eQuipment
BiosparQ
bioweapons
Researchers
BioPen
"
"
ELISA
immunological
1995
Miri
"
Lashalom
1969
uprising
Intifada
Heyman
"
VeBokhim
Shoot
Hanoch
II
731
Ningbo
bubonic
Tsuneyoshi
Takahito
mastermind
Overall
$
Hispanic
2006
post-secondary
Hispanic
underrepresented
1999
"
"
healings
al-Khader
's
700,000
Tourism
's
Pa

"
squamosal
frill
paleontologist
fibula
Internet
non-commercial
DNS
ICANN
identifiers
Internet
2005
Internet
IGF
Internet-related
Zones
Holocaust
atrocity
reclassify
POWs
Disarmed
DEFs
DEFs
expedite
terminations
terminations
Ponca
"
"
"
"
Ideology
theorist
1970s-1980s
12-story
Rockwell
tallest
Dedicated
Libraries
Youngstown
1973
's
consortium
Academical
Stylistically
quadrangle
quad
"
"
Autoplan/Autotab
APLDOT
"
"
APLDOT
360/91
Applied
MD.
Conrail
"
Yu
Yu
licence
Faye
"
"
Bayshore
Yu
Yu
Zalm
hydroxide
hydroxide
impermeable
carbonate
aluminium
40.002
"
"
reentering
's
km
Satellite
accession
unfavourable
rife
offence
"
"
Screenplay
Depictions
Coen
"
"
's
's
Arizonans
"
"
"
Lebowski
conferreth
MacDougall
"
"
MacDougall
MacDougall
"
"
Gorbachev
SFSR
Yeltsin
Ivashko
's
Archive
Archive
Archive
de-Africanize
chattel
1977
"
"
"
"
"
relocate
Kibera
1990s
"
"
2001
mistreated
reconquered
statehood
%
Radicals
Wade-Davis
Radicals
Therapeutics
PG-paclitaxel
polyglutamate
polyglutamate
membranes
PG-p

Cyan
Spawn
mystically
symbiotic
delusional
Wanda
Astros
Boxes
downside
predated
Thucydides
Milesias
aristocrat
ostracised
fiberglass
competitions
incarceration
deity
Maasai
hoe
Kamba
Maasai
hoe
Kamba
"
"
kicker
"
"
docking
"
warps
dinghies
"
"
unravels
worsen
Jains
"
"
pre-requisite
Jains
"
Mantra
a.k.a.
Mantra
Mantra
Namaskar
Jains
"
"
Derasar
tirthankaras
Rituals
hierarchical
aggregated
"
"
"
"
competitions
roller-ski
"
bike
"
orienteering
"
Primitive
"
snowshoes
Scouts
"
"
jamboree
BMX
biathlon
Cadets
biathlon
competitions
Bonds
Giant
2007
left-center
2008
Bonds
relievers
Jeremy
infielders
phylogeny
Haeckel
synthesise
Lamarckism
's
Naturphilosophie
's
Haeckel
's
Lamarckian
biogenic
"
Haeckel
Lamarckian
ontogenic
Henrietta-Marie
1618
"
"
's
"
"
approximating
Masala
ghee
turmeric
Lesser
"
Invocation
"
"
Meal
"
consecrated
CLEP
DANTES
DSST
CLEP
CLEP
online
DSST
exams
Crops
millet
plethora
pancakes
kvass
vodka
Flavourful
shchi
ukha
okroshka
soups
Pirozhki
"
"
grossing
$
Scorsese
"
"
Act

's
's
Bevin
anti-communist
"
"
simplification
multipath
real-world
pathloss
Polarization
Radars
Hadl
AFL
AFL
AFL
Championship
AFL
"
"
"
AFL
foothold
roadblocks
CBS-TV
Palestinian
anthem
"
"
Sharif
Amar
2003
Gaza
hip-hop
Palestinians
Kamilya
's
"
"
GT40
Matech
GT
2008
GT3
Championship
arguably
's
SCCA
Championship
SCCA
Championship
's
Interscholastic
Alonzo
Autarchy
surnames
"
"
Sardinian
's
Sardinians
's
diverged
anti-war
protesters
protesters
"
Ho
NLF
"
"
"
"
"
multi-building
Grange
Safe
Banking
Bancshares
headquartered
1998
2004
19-17
1,001
Hostetler
Bavaro
7:32
Merlin
"
"
dismantled
"
"
Cunnington
excavated
barrows
1983
Quiwonkpa
Quiwonkpa
secretary-general
's
Unhappy
Quiwonkpa
1983
1985
solidify
Quiwonkpa
Manx
Manx
stereotypical
Manx
Symbols
's
's
Whithersoever
1666
13,000
"
Wren
1667
1681
Barbon
"
"
"
"
unwritten
Parliaments
's
"
...
taxonomy
Infinity
Metaphilosophy
formalization
formula_25
formula_26
formula_16
formula_29
formula_30
semantics
3-tuple
ID
formula_11
Signaling
931
X

2004
WV
.005
.004
neap
"
"
"
"
DR
Turing
Turing
desktop
Computer
Turing
"
"
macros
TOS
MMORPGs
MMORPGs
macro
macros
Keyboard
macros
's
macro
macros
macro
Scripting
megabytes
3D
LightWave
Blender
Amiga
software
Amiga
Eyetech
Amiga
AmigaOne
2002
A-Cube
Sam440
AmigaOS
"
"
"
Adding
commutative
ISP
's
's
content-control
desktop
desktop
chipsets
"
ATI
SiS
chipset
Athlon
X2
FX
Quad
chipset
Opteron
AMD
server
2004
AMD-8111
server
chipsets
Ochs
's
hootenannies
Ochs
venues
anti-war
Ochs
"
"
Elektra
"
's
"
"
n't
Anymore
FTA
2008
P4
Singapore
Brunei
2008
OECD
2009
Sax
impractically
sopranino
saxophonists
utilised
Haley
"
Clock
"
"
rockabilly
"
"
"
's
"
"
's
2001
2002
uncompleted
Lynne
"
"
's
2002
Surround
"
"
2003
DVD
Grammy
2006
didgeridoo
"
timbres
"
"
didgeridoos
carnage
's
Aldis
heliograph
dits
dits
wpm
's
Osiris
Wheat
devourer
annihilated
depictions
Coptic
Innocent
Innocent
convocation
mediation
Mediation
Mediation
mediation/arbitration
mediation
arbitrator
's
"
"
routing
Structure
Internet
I

"
"
"
"
"
"
"
"
"
"
"
"
onomatopoeia
"
"
queue
"
"
miaou
"
"
gooier
"
"
archaeoastronomy
"
"
community-based
Consequences
Bracks
Bracks
Bracks
44th
1999
Bracks
Bracks
Ballarat
's
X-Men
Iceman
Rogue
Magneto
demolishing
Rogue
burying
X-Men
Ultimate
Spider-Man
Ultimate
predators
reeds
Platypus
ovaries
granulation
Treskow
re-invention
1990s
jewellery
jewellery
jewellery
talismans
jewellery
Hamsa
Mason-Dixon
Linguist
connecticutie
's
"
"
CT
Mystic
decoy
Demographics
2005
abrasive
jewellery
Laces
jewellery
foils
dyes
sub-department
propmaking
stylists
Angkorean
passageways
lintels
pediment
lintel
tympanum
pediment
Angkorean
lintels
lintels
periodization
Angkorean
's
's
's
posthumously
semi-finals
Kangaroos
Adelaide
Kangaroos
Tens-of-thousands
homeless
congregating
Hoovervilles
Hooverville
foreclosures
foreclosures
%
XVI
polarised
gallicised
Defeated
1795
abdication
yellow-orange
munition
"
"
"
"
immune
trinitrotoluene
USA
Willkie
Willkie
Willkie
Willkie
's
1617
1618
"
"
Bohemia
Hardie
1989
B

H2O2
High-concentration
%
D001
's
DOT
EPA
Quantity
RQ
D001
well-ventilated
's
Seaway
DOT
FAA
DOT
bike
Historic
Croatan
Raleigh
Historic
Manteo
Courthouse
Greensboro
Battlefield
Currie
2001
Ethereal
Laxon
Caulfield
2,400
"
"
1973
Aboriginal
Reys
's
.300
's
95-59
Infrastructure
2005
%
's
limited-access
"
"
"
"
fretboard
gasolines
engine-fouling
oxidative
rubbers
Polymers
ozonolysis
degrades
unzip
"
one-flat
sharps
1400s
1500s
1649
"
"
"
"
"
"
outbound
"
"
1st
mainframes
high-performance
backup
Bahamas
Triangle
Oceans
Sundaland
"
Kandam
Yonaguni
"
Benci
Leonardo
Vinci
Liechtenstein
Liechtenstein
's
USD
's
's
Liechtenstein
2006
sci-fi
Sequences
Mumbai
big-budget
's
Grammy
1973
"
Sting
ragtime
soundtrack
's
's
"
Entertainer
1974
's
forefront
minuets
mazurkas
Waltzes
Ragtime
Debussy
's
1/115
Benaud
4/95
5/114
match-winning
8-12
floodplain
Gully
Goulburn
Cadell
"
's
's
Discernment
's
"
"
"
"
's
's
self-knowledge
jolts
criminology
epistemologies
enquiry
diverged
Intentional
Generalities
reason

Distinguishing
"
"
propositional
"
"
"
"
"
=
"
"
"
Choctawhatchee
's
2007
Ornithological
2005-06
Ivory-billed
Choctawhatchee
1968
Ivory-billed
Wetmore
tourism
Genetics
2007
"
mitochondrial
Indigenous
haplogroups
Haplogroup
mtDNA
"
Bering
DNA
DNA
Amerindian
Geneticists
Postage
postage
Postage
teiids
"
"
lacertids
"
"
Komodo
Varanidae
parthenogenesis
chameleons
xantusiids
reptiles
temperature-dependent
TDSD
hatches
TDSD
crocodiles
tuataras
"
...
reducible
reducible
"
"
Particles
bosons
bosons
Bose-Einstein
Particles
antisymmetric
fermions
antisymmetry
Pauli
fermions
fermions
Fermi-Dirac
Adeimantus
Potone
Speusippus
"
"
Glaucon
Memorabilia
Glaucon
Ariston
Perictione
Ariston
Perictione
LC
LC
thermotropic
metallotropic
Thermotropic
LCs
Thermotropic
LC
Lyotropic
LC
Metallotropic
Centre
Arena
Nelly
Moffatts
Vallis
Overhead
Soderberg
Armchair
Racoons
Atom
Actors
Smoking
Stargate
Battlestar
starburst
Starbursts
starburst-forming
M82
M81
starburst
Innocent
401-417
Zosimus
417-418
Anselm
"
"
Scho

2006
Intel
XScale
Intel
XScale
Marvell
$
Intel
x86
2006
microprocessor
's
random-access
Intel
's
now-familiar
"
"
archaeoastronomy
archaeoastronomical
archaeoastronomers
archaeoastronomy
dissertations
polarise
insurer
Liability
providers
torts
Mirrors
ray-tracing
parabolic
Curved
etymology
Theophrastus
"
Forefather
"
Calendar
Gregorian
"
"
"
Forefathers
Kates
Bayliss
Coat
Whale
cobbler
differentiates
prehistory
awakes
"
"
"
"
"
"
EC
EC
EC
SCF
EC
seasonings
formula_23
formula_24
formula_26
formula_27
formula_28
formula_23
formula_28
formula_32
formula_28
Malthus
Population
arithmetically
genetically
chromosome
interphase
mitotic
mitosis
chromosome
"
chromatids
chromosome
"
"
chromatid
chromosome
Luba
Flores
's
tourism
Flores
Flores
maize
cassava
cashew
Arabica
Coffea
glucose
triglycerides
risperidone
ziprasidone
perphenazine
antipsychotic
meta-analysis
Leucht
Lancet
perphenazine
extrapyramidal
=
trophic
aquaculture
filter-feeding
mollusks
environmentally
Filter-feeders
pollutants
Rooste

UniverCity
Burnaby
Fraser
Undergraduate
tri-semester
Fraser
"
"
J.B.
Nicene
787
tonsure
abbots
1489
Innocent
subdiaconate
vesting
paternal
UniverCity
Burnaby
Fraser
Undergraduate
tri-semester
Fraser
"
"
J.B.
Tempe
ASU
Tempe
14,000
Pac-10
Centers
ASU
Residence
RHA
University-Tempe
ASU
on-campus
1894
coursework
A&M
%
coursework
dissertation
unsanctioned
"
"
Rings
chug
"
"
hailstone
"
Formation
precipitation
cumulonimbus
supercooled
's
upwardly-directed
hailstones
updraft
hailstones
updraft
updrafts
thunderstorm
"
"
graduate-level
"
clausus
II
"
"
"
"
"
"
"
"
"
highly-qualified
post-secondary
"
"
"
1775
Petition
III
Rebellion
"
"
1775
Benedict
Specialized
Degree
Computer
Applied
J.D.
MBA
interdisciplinary
"
"
"
"
"
's
"
Asimov
pm
O'Toole
Request
Triple
's
weeknights
9pm
hosted
Beaton
Midnight
"
Mid-dawn
Graveyard
1am
timeslot
hosted
DJs
mid-dawn
Zan
mid-dawn
timeout
Theismann
tackled
Blackwood
Redskins
Theismann
linebacker
Duhe
47-yard
37-yard
SAT
Psychological
's
"
Knowns
"
Atomists
"
"


DNA
DNA-dependent
polymerases
polymerase
transcribes
genome
subunits
recombination
Holliday
recombination
DNA
DNA
DNA
chromosomes
"
1995
"
"
lysergic
LSD
Schedule
Controlled
Ergotamine
precursor
Diversion
LSD
LSD
LSD
basswood
buckeye
Earlywood
softwoods
latewood
latewood
latewood
1990
's
Bishopric
1992
Lavina
"
"
BYU
"
"
"
"
1993
"
"
breadwinner
McNeilly
"
"
"
"
mid-1960s
"
"
"
"
Panel
"
"
"
"
switchboards
Hayflick
Witkowski
"
"
mutated
's
1979
telecom
telecommunication
telecom
1880
Anglo-Indian
LaserWriter
AppleTalk
AppleTalk
AppleShare
AppleTalk
"
"
OS
OS
AppleTalk
Internet-based
defaults
AppleTalk
compatibility
OS
v10
OS
AppleTalk
AppleTalk
Bont
2003
favourable
"
"
debuted
$
%
's
$
$
negatives
non-digital
photoprinting
Full-spectrum
summoning
symbolising
Usher
MPs
's
BMA
hypnotherapy
hypnotism
genuineness
"
"
hypnotism
Supporters
...
"
CIIR
NGO
non-battle
disappearances
"
"
CIIR
Amnesty
NTSB
terrorist
NTSB
's
variolation
vaccine-preventable
targeted
WHO
Somalia
1988
WHO
TWA
's
summa

Mamluks
reconquest
's
"
Melodie
voci
instrumenti
tre
cinque
sei
1621
cornetts
"
Bavara
trombones
trombone
"
trombone
"
Weiner
landfall
Ignatiev
Storm
"
Ignatiev
Agony
"
1979
Covers
Somoza
"
"
1969
$
"
"
Owl
"
"
Pitkin
"
"
Calvinism
lawgiver
discordant
Calvinism
deists
slew
"
"
Fleetwood
teleconference
2009
've
Fleetwood
"
Nicks
Fleetwood
"
n't
...
non-invasion
"
"
trodden
"
"
Postscript
sinks
Anarchy
Quran
16-The
Ramadhan
favoured
Jellicoe
Beatty
Leicestershire
Brooksby
Dingley
Leicestershire
VAD
's
"
Sold
"
"
"
"
Oddity
"
Dorys
"
1973
's
"
Gnome
drew-up
regimenting
flogging
mathematicians
abacus
"
"
abacus
Abacus
"
"
abacus
abacus
abacus
"
"
"
"
abacus
USMC
F-35B
's
Prowlers
F-35B
Ship-borne
Vertical
SRVL
SRVL
F-35Bs
's
conscripts
Cannae
Hispania
free-stream
airfoil
simplifications
purine
pyrimidine
transversion
purine
vice-versa
purines
degenerate
fault-tolerance
mutation
hydrophilicity
codon
=
recusants
XIII
oleic
stearic
LDL
Consuming
LDL
Karolinska
Stockholm
"
"
2009
RF
Amplifiers

top-ranked
Sumerian
"
"
"
Sumerian
Enki
riverbeds
"
Enki
hieros
Ki/Ninhursag
constellation
Restorer
Enki
alaph
"
"
"
"
"
"
"
Kabbalistically
's
's
's
's
's
's
"
"
footpaths
Iulia
night-time
"
publicae
praetoriae
"
"
transgender
transvestite
cross-dress
's
cross-dressing
transsexual
sprint
wrestler
's
disqualification
restart
2002
2006
2002
knockout
scanner
telecine
's
post-production
Digital
"
"
n't
kart
kart
kart
themed
"
Kart
Mode
graphics
isolates
Audio
balun
referenced
Blantyre
"
"
pollinated
agronomic
Over-pollination
Pest
uncharacteristic
"
producible
magnetron
magnetron
centimetric
radars
Telecommunications
Establishment
magnetron
ground-mapping
codenamed
redeveloped
Ballarat
surges
whilst
Ballarat
Centre
Ballarat
hawthorns
hawthorn
allozyme
hawthorn
interbreeding
4-6
hybridization
sympatric
hawthorn
"
Tans
"
"
Black-and-Tan
"
Anti-Treaty
"
na
"
IRA
Listowel
Attributed
Smyth
somatotopic
octopuses
octopuses
Octopuses
condo
Centre
Centre
CCMA
CCMA
marginalized
website
predation
fo

Caeiro
Reis
impermanence
lateral
lateral
fricatives
IPA
fricatives
approximants
diacritic
"
"
approximants
wormhole
's
2005
's
2010
2005
Backness
backness
formant
F2
hyena
's
's
"
THX
1138
"
"
's
"
"
1138
2009
"
"
Skywalker
THX
1979
Toto
"
"
"
axiomatized
"
"
neo-logicism
"
"
$
"
's
's
$
Potential
"
"
Bros.
Lin
Eidos
2008
reboot
1856
"
"
"
"
"
"
"
"
"
Architecture
massively
Deco
Nouveau
's
Cyberpunk
Schumacher
"
"
pre-American
"
"
manatees
conservationists
Manatees
"
"
MV
MV
NASA
Shuttle
Rocket
manatee
inhabits
Banana
1973
Remarkably
"
"
sought-after
Toy
Bandai
2004
2007
Aoshima
Takara
"
"
Takara
Thunderbird
motorised
"
"
retooled
's
riffs
scripted
"
"
riffs
riffs
Morrill
"
"
Edmunds-Tucker
Persecution
secularising
1989
1992
2004/2005
felonies
Penal
"
"
HM
II
Monetary
banknotes
Dockyard
's
%
's
's
$
2007
91,477
GDP
Juans
"
"
soundtracks
Vitaphone
Bros.
Vitaphone
"
"
J.C.
Mannheim
Mannheim
buffa
impounded
's
II
Site
"
"
"
"
Cage
Ono
Marclay
Winant
Tiers
Kosugi
's
popularizing
DGC
2000-2

1973
spearheaded
Gandhi
well-monitored
tripling
tigers
1,200
1990s
Sandusky
Rockne
Dorais
Rockne-coached
passer
in-stride
35-13
Dorais-to-Rockne
downfield
FARC-EP
Rios
FARC-EP
seven-man
$
backup
backup
Capsizing
Reciprocity
"
"
"
"
U-571
radioman
SOS
S-33
crewman
Cornish
's
2008
2009
Cornish
Cornish
Bard
Beeman
Englysshe
speche
speake
worde
Cornyshe
1549
Uniformity
Paddy
Bloodgood
drop-kickers
Bloodgood
NFL
drop-kicked
offs
flyball
Barmes
Pujols
2007
Holliday
dominoes
signalled
"
"
"
dominoes
"
"
"
domino
"
Freestyle
freestyle
's
freestyle
1980s
1990s
songstress
1990
"
"
Caleb-B
Enriquez
Factor
Buffy
Daize
DJ
"
Campain
Campain
Santos
Romo
IV
Alternative
oxidases
eukaryotic
much-studied
NADH
oxidize
cytosol
mitochondrial
ubiquinone
ubiquinone
electrochemical
"
oxidase
BC
Crowsnest
Vermilion
Kicking
Trans-Canada
Alberta
Yellowhead
Khusraw
Jahangir
amnesty
Mughal
Jahangir
Jahangir
"
"
lesbian
's
Officially
clergy-performed
Network
Covenant
Presbyterians
1997
"
"
Altarpiece
altarpieces
pre

972
papacy
Benedict
interventions
papacy
962
Charlemagne
Frankish
Papacy
Kingdoms
Trans-Jurane
Eucalyptus
Silent
Hawkinson
Calit2
Mandeville
"
"
"
"
"
"
"
magnesium
strontium
"
"
's
's
academia
precludes
's
1994
1994
demos
A&R
1994
Studios
"
"
Herberts
IV
III
Antwerp
III
II
's
Mortimer
presumptive
disguising
Suspicious
Blight
seizes
Lab
laser
Blight
inquisitorial
software
software
non-free
software
software
lobbying
software
copyright
software
2006
3,390
software
Dr
Villanueva
Rodrich
deploy
antisemitism
Usage
prefix
"
"
"
"
"
"
Semitic
Assyrians
Semitic
"
"
low-quality
providers
Ansolabehere
Lessem
endorsements
propensity
USA
Psychiatrists
Psychological
Psychological
Psychiatrists
NARTH
quantify
self-labeling
Baxter
Chitto
BIA
Vitamin
's
's
pasteurized
Vitamin
Vitamin
Vitamin
"
"
ppb
ppb
"
photocopiers
"
"
"
"
dimethyl
phytoplankton
eradicated
Licinius
"
"
"
's
"
commemorative
relics
MP
832
Kew
Marianne
2009
"
"
saponify
Lutefisk
lye
lye
harina
lye
Appert
"
"
Free-State
Lemen
"
"
Puce

2005
2011
Sudan
's
>
>
>
>
>
>
Wood-inspired
preassigned
"
"
2006
"
's
"
2005
"
Bra
2004
2007
"
"
's
"
"
bike
http://www.cedartrailspartnership.org/metro_trail_guide.pdf
Blackhawks
Hockey
Arena
Northwoods
Riverfront
Visitor
website
's
1995
"
"
HTML
implementations
Request
HTML
HTML
IETF
1996
software
Web
"
"
's
detaining
Preventive
's
"
"
Muresk
PhD
AINSE
PhD
Murdoch
's
2.27
kilometres
's
Cowan
skunk
"
"
BBC
's
programme
Nutt
"
"
's
ACMD
"
...
"
Elephant
Southwold
Etonian
Gow
Pitter
Portobello
mediator
's
assistive
"
"
buzzer
"
"
"
Ming
...
"
Pei
Ming
Anhui
Suzhou
Pei
Pei
Tsuyee
2002
Anaheim
Sharks
Gretzky
L.A.
Gretzky
Gordie
's
goal-scoring
playoffs
Refsnes
2005
2006
woodcut
"
"
mennesker
ensomme
8.1
NOK
1.27
USD
2008
"
"
Theodotion
Latin-speaking
Vulgate
"
"
canonical
Septuagint
Esdras
"
Esdras
Esdras
kickoffs
Bowls
TD
4-straight
XXV-XXVIII
rusher
31-of-50
interception
Nesmith
's
non-studio
"
Monkees
"
"
singles
"
"
's
Monkees
's
alleviates
level-triggered
routing
in-band
synchroniza

telecommunications
mid-1987
's
transmissions
direct-dial
Beijing
Hohhot
Guangzhou
Intelsat
2008
384
Luna
regolith
2008
electromagnetic
Raven
raven
Raven
Raven
Maya
Maya
12.25
Semi-final
Edgbaston
Cronje
earpiece
Woolmer
Talat
1999
's
scorer
Zimbabwe
2-0
Rayleigh
"
"
arithmetically
arithmetically
Rayleigh
10-5
$
Enraged
Peckinpah
"
"
Critics
soured
1988
Peckinpah
's
hailing
's
Filmmakers
's
Libraries
's
"
"
"
"
Idealism
fortuitous
rule-based
Bravanese
Bravanese
Somalis
Sunni
Shafi
Shia
Somalia
Somali
sharia
aqaurist
ammonia/nitrite
cardinalfish
Chromis
reef
anthias
reef
Damsels
Chromis
Havok
Northstar
Iceman
Iceman
Northstar
Iceman
's
Havok
Iceman
rekindle
Havok
's
three-layered
membranes
dura
spinal
PNS
CNS
axon
PNS
neurons
spinal
PNS
god-son
Yat-sen
II
Linebarger
Instrumentality
Seele
"
"
Anno
"
Leonardo
Vinci
's
's
peripatetic
basil
nationalities
flavours
"
debuted
Piece
Lennon
Lene
's
"
Lobster
Ono
Ono
garnish
bok
kai-lan
Stir-frying
"
"
southwestern
Caspian
Iranian
Aden
tectonic
"


cosmicism
atheism
Machen
2009
stand-alone
stand-alone
DVD
50-page
Blu-ray
Bros.
Digibook
Digital
1985
2004
"
Silent
"
speechwriter
b.
b.
"
"
Climate
Championships
Henie
hinder
orchestrated
Torpedo
spar
self-propelled
battleships
Luppis
Blaz
Rijeka
Austria-Hungary
"
pathway
pesticide
evolvability
nonphysical
materialist
mind-brain
Smart
Churchland
Occam
minotaur
ogre
Sadly
"
"
forehand
forehand
cross-court
left-handers
smashes
left-hander
angled
Skrull
Galactus
Kree
Surfer
Surfer
Cosmic
unconsciousness
Elders
restart
Galactus
Mantis
Surfer
battled
singles
singles
Smashing
singles
smasher
singles
forehand
full-length
Vertical
shuttlecock
shuttlecock
"
"
shuttlecock
forecourt
Linux
Stef
Puppy
server
Puppy
toothy
server
Puppy
unworldly
Quake
's
fades
lovingly
Filming
USC
UC
fumigation
smoke-screen
preservative
slows
back-reaction
's
electromagnetic
mediated
photon
photons
photons
cannabis
hashish
cannabis
vaporised
amphoteric
Drama
Awards
d'Or
's
Quick/TV
Awards
TRIC
Awards
Drama
Fans
Jowe

memory-to-memory
register-to-register
variants
ADD
variants
18-bit
"
ADDM
"
"
ADDB
"
"
's
fletching
's
fletching
buildups
fletching
headstock
Sustainiac
handheld
EBow
inductor
vibrate
bow-like
footswitch-controlled
fifths
highest-pitched
viola
luthier
IV
clericalism
monarchy
's
disadvantaged
's
Campuses
shuttle
Campuses
northeastern
UM
Edsel
ceded
II
's
trident
flanks
chariots
pediment
pediments
438
steersman
straighter
steersmen
Oars
oar
"
Hadassah
CAA
's
Mor-Yosef
"
"
thinners
's
demolish
$
Pataki
Wye
Elgar
USA
Elgar
Peyton
Bantock
"
Togo
Mosquito
unaffordable
Nets
voucher
Edgeworth
2007
Eveleigh
Darlington
Redfern
neighbouring
1998-99
55,200
literacy
Academics
's
Sylvanus
homework
homework
collaboratively
cadets
commissioning
Lieutenants
purview
scrutinise
lama
Vajrayana
Buddhahood
Vajrayana
ACLU
Wiener
1992
1997
's
Population
Tourism
tourism
place-based
colonization
1998-99
55,200
literacy
resupplying
ARA
"
Westland
HAS
HMS
Antrim
Argentine
HMS
"
Westland
HAS
.1
HMS
Brilliant
Westl

's
Rolla
Militarily
Affair
blockaded
sleight-of-hand
casinos
"
"
casinos
well-done
Healing
induct
Absent
quasi-physical
ectoplasm
...
"
"
ff
themed
touchscreen
Deception
Expert
slicing
shuttlecock
downwards
episcopacy
Parliamentary
's
"
Discipline
"
1641
"
Prelatical
"
defences
presbyterian
"
"
's
"
Deception
Expert
slicing
shuttlecock
downwards
Interzonal
Candidates
Candidates
Candidates
Candidates
multi-round
Fischer
1969
Championship
's
Wii
Xbox
PC
console
Xbox
online
downloadable
Xbox
graphics
Nintendo
"
Hottest
"
Wii
Wii
Citizenship
agoge
agoge
Trophimoi
"
Xenophon
trophimoi
electromagnets
Mapping
visualizing
stemming
technologies
psychoacoustics
"
XEmacs
GNU
Lisp
"
Unicode
XEmacs
2005
unmaintained
Mule-UCS
Unicode
XEmacs
Unicode
2002
Mule
2005
bookmakers
non-sports
competitions
"
Interactive
"
"
Parimutuel
greyhound
Wagering
parimutuel
bookmakers
prosody
moras
diphthong
elision
%
expatriate
remittance
expatriates
US$
2007
CCASG
Dubai
US$
2007
's
Kuwait
%
expatriate
remittance
exp

ascertaining
finalised
's
immersing
Galileo
"
Nirvana
"
"
dream-like
sleepiness
self-report
"
"
Crichton
"
"
10pm
2am
Crichton
Crichton
2008
chemotherapy
CAT
's
"
Sleepy
Ingmar
"
"
Pratchett
"
"
"
Discworld
"
"
"
"
Discworld
"
2003
left-arm
right-arm
clasp
retched
obeisance
Chimpanzees
gibbons
bipedalism
chimpanzees
bipedalism
bipedal
Bonobos
baboons
wading
macaque
chimps
Poko
bipedally
bipedalism
10:9
scrip
remission
relapse
's
"
Grief
"
N.W.
Mordecai
terminally
plantations
's
cowtowns
Bronte
Poteet
Oran
Bagwell
shoots
snakebite
"
"
Dann
"
"
mitzvah
n't
"
Asimov
"
Macau
casinos
Cotai
Ponte
casinos
's
9.1
Macau
18.7
2005
2006
%
Dragoon
Lappeenranta
Uusimaa
Dragoons
bacteriologic
special-ops
dragoons
Namakura
's
sieges
sieges
Liege
Namur
Maubeuge
Antwerp
Liege
Maubeuge
Antwerp
Anglo-Belgian
Samu
's
Hussein
Jordanian
Hussein
north-eastern
Golan
legalism
unhistoric
"
"
redaction
edict
deuterosis
machined
petrol
petrol
Nader
spoiler
"
"
Nader
's
Nader
Nader
"
"
FAO
Framework
FAO
inter-agen

Utilities
pronominal
Subordinate
Maharashtra
MSRTC
intercity
Mumbai
Maharashtra
Mumbai
Mumbai
BRTS
Bus
Mumbai
2009
%
's
Mumbai
Mumbai
Vector
"
"
parametrized
topological
"
"
"
"
"
"
cotangent
cotangent
manifolds
"
"
monarchy
consuls
Tarquinius
monarchy
hoax
ca
monarchy
Ilex
"
"
"
"
"
sweetened
flavour
near-boiling
flavour
2007
"
Scorsese
's
2009
Beatles
Lynne
McCartney
's
Hanks
womanizer
chauvinist
diatribes
's
stereotypical
businesswoman
promotions
Maximiliano
Rochac
Directorate
Matanza
Rodolfo
1856
"
"
"
"
Eduardo
's
Claes
57th
Wesselmann
Rosenquist
Castelli
Warhol
"
"
Actor
"
Producers
"
's
"
Kelley
1989
Reagans
high-fashion
benefiting
1992
Reagans
$
Milgram
"
Holocaust
Holocaust
"
"
"
's
"
"
learner
Palestinians
"
Zionist
Palestinian
"
Intifada
1987-1993
uprising
Followed
"
"
hornbook
well-accepted
hornbook
2007
software
spammers
spamming
ISPs
acceptable-use
spammers
spam
Usenet
spam
servers
Spammers
"
"
BCG
pox
Adverse
BCG
immunocompromised
SCID
life-threatening
immunizations
2007

neodymium
reddish-purple
trichromatic
color-change
selenium
neodymium
"
"
fractioned
's
EU
EU
Bundesrat
Seat
"
mac
"
"
mac
"
Macpherson
"
"
Gaelic
's
Horner
=
IEEE
system-dependent
floating-point
galleon
II
's
VI
XXXIII
Substantively
's
overseen
videos
Moonraker
"
"
's
"
"
Romance
"
"
's
"
"
Raider
"
Ninja
falsified
Hellenistic
"
"
foretelling
Papias
martyrologies
Cc
SMTP
codice_4
server
Ok
codice_5
<CR>
Hizb-ut-Tahrir
aggravating
offence
racially-aggravated
2004
manifestos
2007
%
elects
Riigikogu
Estonian
Estonian
"
peepers
ladybugs
houseflies
"
"
Cityscape
's
Widely
icons
laminar
time-periodic
Synagogue
Revival
Tasmanian
's
's
's
in-home
tenet
"
"
plural
polytheism
1:26
Elohim
creepeth
Khazar
Jewry
Koestler
monarchy
cannons
cannons
flintlock
cannons
linstock
touchhole
breech
cavitation
"
nucleate
undissolved
pre-existing
's
Stza
Lyrics
's
Linx
"
Kool
Rap
popularize
touted
partying
popularizing
Nas
Notorious
Jay-Z
Mus
credential
1980s
diplomas
glucose
commercialized
Komppa
Pharmaceuti

## Evaluate results
In this section we take the cosine similarity of the both target vectors generated in above step for each instance of SCWS test. And compare the results with human scores in the SCWS test. 

In [113]:
from sklearn.metrics.pairwise import cosine_similarity

def evaluate_results():
    '''This function is used to evaluate the results, i.e., first take the cosine similarity of both target words 
    in each instance of SCWS test set and compare the predicted similarity with human scores in the test set. This 
    function returns the average difference between the predictited similaty and human scores. It also returns the lists 
    of predicted similarity and human score'''
    predicted_similarity  = []
    for i in range(len(scws_data.keys())):
        score = cosine_similarity(target_list1[i].reshape(1,-1),target_list2[i].reshape(1,-1))
        predicted_similarity.append(score[0][0])
        
    human_scores = []
    for data in scws_data.values():
        score = data['human_score']

        human_scores.append(float(score))
        
    avg_diff=0

    for i in range(len(human_scores)):
        d = abs(predicted_similarity[i]-human_scores[i]/10)


        avg_diff+= d
    avg_diff = avg_diff/len(human_scores)
    
    return avg_diff,human_scores,predicted_similarity


In [114]:
avg_diff,human_scores,predicted_similarity = evaluate_results()

### The average difference
Note that the everage difference here is 0.27, which means our predicted similarities are closer to the human scores. Also note that while taking avg difference all the difference values were positive (i.e., absolute values were taken) therefore they do not cancel each other out

In [116]:
avg_diff

0.276063709477444