 <div style="background-color: #99CD4E; text-align:center; vertical-align: middle; padding:40px 0;"> 
  <h1 style="color: white;"> *Incorporating semantics of words* </h1>
 </div>
 
 # References
 
[NLTK Wordnet Interface documentation](http://www.nltk.org/howto/wordnet.html)
 
 # Why is this required?
 
 - A vector space representation does not incorporate the semantic similarity of words: each distinct word gets its own dimension, so synonyms look unrelated
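
To make the point concrete, here is a minimal sketch that assumes a plain bag-of-words encoding. The helper names `bag_of_words` and `cosine` and the example sentences are illustrative only, not part of any library. Two sentences with close meanings but no shared words score zero, while two sentences with different meanings but shared words score high.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count each lower-cased token; one dimension per distinct word."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

a = bag_of_words("the movie was good")
b = bag_of_words("a fine film")
c = bag_of_words("the movie was long")

print(cosine(a, b))  # 0.0: no shared tokens, although the meanings are close
print(cosine(a, c))  # 0.75: three shared tokens, although the meanings differ
```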

<div style="background-color: #99CD4E; padding:5px 0;"> 
  <h2 style="color: white;"> *Meaning, Synonym and Hyponymy* </h2>
 </div>

- *Meaning* is the idea that a word, phrase, etc. represents.
- A *synonym* is a word or phrase that means exactly or nearly the same as another word or phrase in the same language. A set of synonyms is called a synset.
- *Hyponymy* is the relationship between a generic term (hypernym, e.g. colour) and a specific instance of it (hyponym, e.g. red).

# Find meaning, synonymy and hyponymy 

>WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus.

* Synonyms are grouped into sets called synsets
* A synset contains lemmas, which are the base forms of words
* Synsets are connected by hierarchical links (is-a relations, i.e. hypernym/hyponym relations)
* Each lemma in a synset also carries other properties, such as antonyms and related words



# Meaning

In [2]:
from nltk.corpus import wordnet as wn
syns = wn.synsets("goat")
print(syns[0].definition())

any of numerous agile ruminants related to sheep but having a beard and straight horns


# Synonymy

In [14]:
for synset in wn.synsets('good'):
    print(synset.definition())
    print(synset.pos(), " [", ", ".join([l.name() for l in synset.lemmas()]), "]" )

benefit
n  [ good ]
moral excellence or admirableness
n  [ good, goodness ]
that which is pleasing or valuable or useful
n  [ good, goodness ]
articles of commerce
n  [ commodity, trade_good, good ]
having desirable or positive qualities especially those suitable for a thing specified
a  [ good ]
having the normally expected amount
s  [ full, good ]
morally admirable
a  [ good ]
deserving of esteem and respect
s  [ estimable, good, honorable, respectable ]
promoting or enhancing well-being
s  [ beneficial, good ]
agreeable or pleasing
s  [ good ]
of moral excellence
s  [ good, just, upright ]
having or showing knowledge and skill and aptitude
s  [ adept, expert, good, practiced, proficient, skillful, skilful ]
thorough
s  [ good ]
with or in a close or intimate relationship
s  [ dear, good, near ]
financially sound
s  [ dependable, good, safe, secure ]
most suitable or right for a particular purpose
s  [ good, right, ripe ]
resulting favorably
s  [ good, well ]
exerting force or influe

# Hyponymy

In [20]:
panda = wn.synset("panda.n.01")
hyper = lambda s: s.hypernyms()
list(panda.closure(hyper))

[Synset('procyonid.n.01'),
 Synset('carnivore.n.01'),
 Synset('placental.n.01'),
 Synset('mammal.n.01'),
 Synset('vertebrate.n.01'),
 Synset('chordate.n.01'),
 Synset('animal.n.01'),
 Synset('organism.n.01'),
 Synset('living_thing.n.01'),
 Synset('whole.n.02'),
 Synset('object.n.01'),
 Synset('physical_entity.n.01'),
 Synset('entity.n.01')]
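
The `closure` call keeps applying the supplied function to each result until no new synsets appear. The sketch below imitates that idea on a hand-written dictionary. `TOY_HYPERNYMS` and `hypernym_closure` are hypothetical names, and the dictionary only copies the first few links of the panda chain printed above instead of querying WordNet.

```python
from collections import deque

# Hand-written toy taxonomy (not WordNet): child -> list of direct hypernyms.
TOY_HYPERNYMS = {
    "panda": ["procyonid"],
    "procyonid": ["carnivore"],
    "carnivore": ["placental"],
    "placental": ["mammal"],
    "mammal": [],
}

def hypernym_closure(start, parents):
    """Breadth-first walk that collects every ancestor of `start` once."""
    seen, order = set(), []
    queue = deque(parents.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(parents.get(node, []))
    return order

print(hypernym_closure("panda", TOY_HYPERNYMS))
# ['procyonid', 'carnivore', 'placental', 'mammal']
```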

# Similarity between words

Using WordNet, we can measure the similarity between word senses.

> synset1.path_similarity(synset2): Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1.

In [23]:
man = wn.synset('man.n.01')
woman = wn.synset('woman.n.01')
king = wn.synset('king.n.01')
queen = wn.synset('queen.n.01')

In [27]:
my_items = [man, woman, king, queen]

In [28]:
list_of_pairs = [(p1, p2) for p1 in my_items for p2 in my_items if p1 != p2]

In [32]:
for (a, b) in list_of_pairs:
    print(a, ' ', b)
    print(a.path_similarity(b))

Synset('man.n.01')   Synset('woman.n.01')
0.3333333333333333
Synset('man.n.01')   Synset('king.n.01')
0.16666666666666666
Synset('man.n.01')   Synset('queen.n.01')
0.1111111111111111
Synset('woman.n.01')   Synset('man.n.01')
0.3333333333333333
Synset('woman.n.01')   Synset('king.n.01')
0.16666666666666666
Synset('woman.n.01')   Synset('queen.n.01')
0.1111111111111111
Synset('king.n.01')   Synset('man.n.01')
0.16666666666666666
Synset('king.n.01')   Synset('woman.n.01')
0.16666666666666666
Synset('king.n.01')   Synset('queen.n.01')
0.1
Synset('queen.n.01')   Synset('man.n.01')
0.1111111111111111
Synset('queen.n.01')   Synset('woman.n.01')
0.1111111111111111
Synset('queen.n.01')   Synset('king.n.01')
0.1
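
For intuition about these scores: path similarity is 1 / (d + 1), where d is the number of edges on the shortest path between the two senses, so a score of 0.333 corresponds to senses two edges apart. The sketch below reproduces that arithmetic on a tiny hand-written taxonomy. `TOY_IS_A`, `shortest_path_length` and `toy_path_similarity` are hypothetical names, and the links are an assumption made for illustration, not data read from WordNet.

```python
from collections import deque

# Hand-written toy is-a links (child -> parents); an assumption, not WordNet data.
TOY_IS_A = {
    "man": ["adult"],
    "woman": ["adult"],
    "adult": ["person"],
    "person": [],
}

def shortest_path_length(a, b, is_a):
    """Number of edges between a and b, treating is-a links as undirected."""
    neighbours = {node: set(parents) for node, parents in is_a.items()}
    for child, parents in is_a.items():
        for parent in parents:
            neighbours.setdefault(parent, set()).add(child)
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected

def toy_path_similarity(a, b, is_a=TOY_IS_A):
    """1 / (d + 1), where d is the shortest path length; None if unconnected."""
    d = shortest_path_length(a, b, is_a)
    return None if d is None else 1 / (d + 1)

print(toy_path_similarity("man", "woman"))   # 0.3333333333333333
print(toy_path_similarity("man", "person"))  # 0.3333333333333333
print(toy_path_similarity("man", "man"))     # 1.0
```

In this tree-shaped toy the undirected shortest path always passes through a shared ancestor, which is why a plain breadth-first search is enough here.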


 <div style="background-color: #99CD4E; text-align:center; vertical-align: middle; padding:40px 0;"> 
  <h1 style="color: white;"> *The End* </h1>
 </div>