# WordNet

NLTK provides a Python interface to WordNet.

WordNet is a lexical database for English. As described on [the home page](https://wordnet.princeton.edu/): Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms called synsets that each express a distinct concept. Synsets are interlinked by semantic and lexical relations. 

Synsets and words are linked by various relations, including:
* hypernym (higher, more general) -- canine is a hypernym of dog
* hyponym (lower, more specific) -- dog is a hyponym of canine
* meronym (part of) -- wheel is a meronym of car
* holonym (whole) -- car is a holonym of wheel
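
The relations in the list above come in inverse pairs: if canine is a hypernym of dog, then dog is a hyponym of canine. The following is a minimal sketch of that idea on a hand-written toy mapping, not on WordNet itself. The dictionary contents and the helper name invert_relation are made up for illustration.

```python
from collections import defaultdict

# Toy data (an assumption for illustration, not read from WordNet):
# each word maps to the words one level above it.
hypernyms = {
    'dog': ['canine'],
    'wolf': ['canine'],
    'canine': ['carnivore'],
}

def invert_relation(relation):
    """Build the inverse mapping: for hypernyms, this gives hyponyms."""
    inverse = defaultdict(list)
    for source, targets in relation.items():
        for target in targets:
            inverse[target].append(source)
    return dict(inverse)

hyponyms = invert_relation(hypernyms)
print(hyponyms['canine'])
```

The same helper would turn a toy meronym mapping into a holonym mapping, since those two relations are inverses as well.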


This notebook looks at some ways to use WordNet. First, we need to import it. 

In [1]:
from nltk.corpus import wordnet as wn

## synsets

We can look up the synsets for a word. There is an optional pos parameter: wn.VERB, wn.NOUN, wn.ADJ, or wn.ADV. A synset has a 3-part name, word.pos.nn, where nn is the sense number. As we see below, 'dog' has 7 noun synsets and one verb synset.

In [2]:
wn.synsets('dog')

[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01'),
 Synset('chase.v.01')]

In [3]:
# narrow it down to verb
wn.synsets('dog', pos=wn.VERB)

[Synset('chase.v.01')]
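
The 3-part synset name can be taken apart with plain string handling. This is a small sketch that needs no WordNet data; the helper name parse_synset_name is hypothetical and not part of NLTK.

```python
def parse_synset_name(name):
    """Split a name like 'dog.n.01' into (word, pos, sense number)."""
    # Split from the right so that a word containing '.' stays intact.
    word, pos, sense = name.rsplit('.', 2)
    return word, pos, int(sense)

print(parse_synset_name('dog.n.01'))
print(parse_synset_name('chase.v.01'))
```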

### synset information

We can extract definitions, usage examples, and lemmas from a synset.

In [4]:
# get a definition
wn.synset('dog.n.01').definition()

'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'

In [5]:
# get usage examples
wn.synset('dog.n.01').examples()

['the dog barked all night']

In [6]:
# get lemmas for each sense in the synsets
# a sense in WordNet is one meaning of the word
# a lemma in WordNet is one dictionary entry
# the lemmas within one synset are synonyms
for sense in wn.synsets('dog'):
    lemmas = [l.name() for l in sense.lemmas()]
    print("Sense " + sense.name() + " lemmas: " + str(lemmas))

Sense dog.n.01 lemmas: ['dog', 'domestic_dog', 'Canis_familiaris']
Sense frump.n.01 lemmas: ['frump', 'dog']
Sense dog.n.03 lemmas: ['dog']
Sense cad.n.01 lemmas: ['cad', 'bounder', 'blackguard', 'dog', 'hound', 'heel']
Sense frank.n.02 lemmas: ['frank', 'frankfurter', 'hotdog', 'hot_dog', 'dog', 'wiener', 'wienerwurst', 'weenie']
Sense pawl.n.01 lemmas: ['pawl', 'detent', 'click', 'dog']
Sense andiron.n.01 lemmas: ['andiron', 'firedog', 'dog', 'dog-iron']
Sense chase.v.01 lemmas: ['chase', 'chase_after', 'trail', 'tail', 'tag', 'give_chase', 'dog', 'go_after', 'track']


In [7]:
# we can also find the synset of a lemma
wn.lemma('dog.n.01.dog').synset()

Synset('dog.n.01')

### hierarchy

WordNet organizes synsets in an "is-a" hierarchy that can be traversed.

In [8]:
dog = wn.synset('dog.n.01')
# hypernyms() are superclasses
print('hypernyms: ', dog.hypernyms())
# member_holonyms() are the groups that dog is a member of
print('holonyms: ', dog.member_holonyms())
# find the root hypernym
print('root hypernyms: ', dog.root_hypernyms())
# find the lowest common hypernym of dog and cat
print('dog and cat hypernym: ', dog.lowest_common_hypernyms(wn.synset('cat.n.01')))

hypernyms:  [Synset('canine.n.02'), Synset('domestic_animal.n.01')]
holonyms:  [Synset('canis.n.01'), Synset('pack.n.06')]
root hypernyms:  [Synset('entity.n.01')]
dog and cat hypernym:  [Synset('carnivore.n.01')]


In [9]:
# look at the first few noun synsets
i = 1
for synset in wn.all_synsets('n'):
    print(synset)
    # comment out the lines below to see all of them instead of the first 11
    if i > 10:
        break
    i += 1

Synset('entity.n.01')
Synset('physical_entity.n.01')
Synset('abstraction.n.06')
Synset('thing.n.12')
Synset('object.n.01')
Synset('whole.n.02')
Synset('congener.n.03')
Synset('living_thing.n.01')
Synset('organism.n.01')
Synset('benthos.n.02')
Synset('dwarf.n.03')


In [10]:
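# walk up the hierarchy for dog, following the first hypernym at each step
hyp = dog.hypernyms()[0]
top = wn.synset('entity.n.01')
while hyp:
    print(hyp)
    # stop at the root, or at any synset that has no hypernyms
    if hyp == top or not hyp.hypernyms():
        break
    hyp = hyp.hypernyms()[0]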

Synset('canine.n.02')
Synset('carnivore.n.01')
Synset('placental.n.01')
Synset('mammal.n.01')
Synset('vertebrate.n.01')
Synset('chordate.n.01')
Synset('animal.n.01')
Synset('organism.n.01')
Synset('living_thing.n.01')
Synset('whole.n.02')
Synset('object.n.01')
Synset('physical_entity.n.01')
Synset('entity.n.01')


In [11]:
# walk up the hierarchy for dog using a closure
hyper = lambda s: s.hypernyms()
list(dog.closure(hyper))

[Synset('canine.n.02'),
 Synset('domestic_animal.n.01'),
 Synset('carnivore.n.01'),
 Synset('animal.n.01'),
 Synset('placental.n.01'),
 Synset('organism.n.01'),
 Synset('mammal.n.01'),
 Synset('living_thing.n.01'),
 Synset('vertebrate.n.01'),
 Synset('whole.n.02'),
 Synset('chordate.n.01'),
 Synset('object.n.01'),
 Synset('physical_entity.n.01'),
 Synset('entity.n.01')]

#### What is a closure?

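Speaking generally, a closure in Python is a function object that remembers variables from the scope in which it was created, which lets the function use information that would normally be out of scope. That is not what the method above refers to. The "closure" method is defined in NLTK and computes the transitive closure of a relation: it applies the given function to the starting synset, then to each result, and so on, yielding every synset that can be reached that way.

The first line above creates a lambda function, "hyper", that returns the list of direct hypernyms of a synset. The second line passes it to the "closure" method. Because every hypernym is followed, the result also includes domestic_animal.n.01, which the loop in the previous cell skipped.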
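
To make the idea concrete, here is a minimal sketch of a breadth-first transitive closure over a toy graph. It does not use NLTK; the function name transitive_closure and the dictionary toy_hypernyms are made up for illustration, with links chosen to be consistent with the output above.

```python
from collections import deque

# Toy hypernym links (an assumption), consistent with the synsets printed above.
toy_hypernyms = {
    'dog': ['canine', 'domestic_animal'],
    'canine': ['carnivore'],
    'domestic_animal': ['animal'],
    'carnivore': ['placental'],
}

def transitive_closure(start, relation):
    """Return every node reachable from start, breadth-first, each one once."""
    seen = set()
    order = []
    queue = deque(relation(start))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(relation(node))
    return order

hyper = lambda s: toy_hypernyms.get(s, [])
print(transitive_closure('dog', hyper))
```

On this toy graph the order is canine, domestic_animal, carnivore, animal, placental, which matches the first five entries of the NLTK output for dog.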


In [13]:
# another closure example
cat = wn.synset('cat.n.01')
list(cat.closure(lambda s: s.hypernyms()))

[Synset('feline.n.01'),
 Synset('carnivore.n.01'),
 Synset('placental.n.01'),
 Synset('mammal.n.01'),
 Synset('vertebrate.n.01'),
 Synset('chordate.n.01'),
 Synset('animal.n.01'),
 Synset('organism.n.01'),
 Synset('living_thing.n.01'),
 Synset('whole.n.02'),
 Synset('object.n.01'),
 Synset('physical_entity.n.01'),
 Synset('entity.n.01')]

### similarity

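The path and Wu-Palmer similarity measures shown below return scores between 0 and 1, where 1 means the two synsets are identical and values near 0 mean they have little in common.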

In [14]:
# similarity of 'dog' and 'cat'
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
dog.path_similarity(cat)

0.2

In [15]:
# similarity of 'laugh' and 'giggle'
laugh = wn.synset('laugh.v.01')
giggle = wn.synset('giggle.v.01')
laugh.path_similarity(giggle)

0.5
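
The two scores above are consistent with path similarity being 1 / (1 + d), where d is the length of the shortest path between the two synsets through a common ancestor: d is 4 for dog and cat, and 1 for laugh and giggle. The following sketch reproduces that arithmetic on a toy graph without NLTK. The names toy_hypernyms, ancestor_distances, and toy_path_similarity are hypothetical, and the giggle-to-laugh link is an assumption inferred from the 0.5 score.

```python
from collections import deque

# Toy is-a links: the dog and cat chains follow the hierarchy output shown
# earlier; the giggle -> laugh link is assumed from the 0.5 score.
toy_hypernyms = {
    'dog': ['canine'],
    'canine': ['carnivore'],
    'cat': ['feline'],
    'feline': ['carnivore'],
    'giggle': ['laugh'],
}

def ancestor_distances(node, hypernyms):
    """Map node and each ancestor to the fewest is-a steps needed to reach it."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in hypernyms.get(current, []):
            if parent not in distances:
                distances[parent] = distances[current] + 1
                queue.append(parent)
    return distances

def toy_path_similarity(a, b, hypernyms):
    """1 / (1 + shortest path through a common ancestor), or None if no path."""
    up_a = ancestor_distances(a, hypernyms)
    up_b = ancestor_distances(b, hypernyms)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    shortest = min(up_a[n] + up_b[n] for n in common)
    return 1 / (shortest + 1)

print(toy_path_similarity('dog', 'cat', toy_hypernyms))
print(toy_path_similarity('laugh', 'giggle', toy_hypernyms))
```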

In [16]:
# another similarity measure is the Wu-Palmer metric,
#    which is based on the depth of the two senses in the taxonomy
#    and that of their most specific common ancestor node.
wn.wup_similarity(dog, cat)

0.8571428571428571
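
The Wu-Palmer score is commonly written as 2 * depth(lcs) / (depth(s1) + depth(s2)), where lcs is the most specific common ancestor. The sketch below checks that formula against the hypernym chains printed earlier, using plain lists instead of WordNet. It assumes a single chain per synset and counts depth in nodes from the root; NLTK's own implementation has to deal with synsets that have several hypernym paths, which this sketch ignores. All names in it are hypothetical.

```python
# Hypernym chain for dog, copied from the walk up the hierarchy above,
# listed from the most specific synset to the root.
dog_chain = ['dog', 'canine', 'carnivore', 'placental', 'mammal', 'vertebrate',
             'chordate', 'animal', 'organism', 'living_thing', 'whole',
             'object', 'physical_entity', 'entity']
# cat reaches carnivore via feline and shares the rest of the chain.
cat_chain = ['cat', 'feline'] + dog_chain[2:]

def depth(chain, node):
    """Number of nodes from node up to the root, counting both ends."""
    return len(chain) - chain.index(node)

def toy_wup_similarity(chain_a, chain_b):
    # Most specific common ancestor: first node of chain_a also in chain_b.
    in_b = set(chain_b)
    common = next(node for node in chain_a if node in in_b)
    return 2 * depth(chain_a, common) / (len(chain_a) + len(chain_b))

print(toy_wup_similarity(dog_chain, cat_chain))
```

With carnivore at depth 12 and both dog and cat at depth 14, the formula gives 24 / 28, which agrees with the value returned by wup_similarity above.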

### morphology

Morphy lets you look up the base form of an inflected word.

In [17]:
wn.morphy('laughed', wn.VERB)

'laugh'
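
One way to picture this kind of lookup is as a series of suffix rewrites that stops when a known word appears. The sketch below illustrates that idea only; it is not NLTK's implementation. The vocabulary, the rule list, and the name toy_morphy are all assumptions made for the example, and irregular forms are not handled.

```python
# Toy vocabulary standing in for WordNet's verb list (an assumption).
toy_verbs = {'laugh', 'chase', 'bank'}

# A few (suffix, replacement) rules for verbs, chosen for illustration.
toy_verb_rules = [('ies', 'y'), ('es', 'e'), ('es', ''), ('ed', 'e'),
                  ('ed', ''), ('ing', 'e'), ('ing', ''), ('s', '')]

def toy_morphy(form, vocabulary, rules):
    """Return the first candidate base form found in vocabulary, else None."""
    if form in vocabulary:
        return form
    for suffix, replacement in rules:
        if form.endswith(suffix):
            candidate = form[:len(form) - len(suffix)] + replacement
            if candidate in vocabulary:
                return candidate
    return None

print(toy_morphy('laughed', toy_verbs, toy_verb_rules))
print(toy_morphy('chasing', toy_verbs, toy_verb_rules))
```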

In [18]:
# find derivations of laugh
laugh = wn.lemma('laugh.v.01.laugh')
laugh.derivationally_related_forms()

[Lemma('laugh.n.02.laugh'),
 Lemma('laugh.n.01.laugh'),
 Lemma('joke.n.01.laugh'),
 Lemma('amusing.s.02.laughable'),
 Lemma('laugher.n.01.laugher')]

In [19]:
laugh.antonyms()

[Lemma('cry.v.02.cry')]

### finding lemmas

The WordNetLemmatizer is another way to find the lemma of a word. It treats the word as a noun unless a pos argument is given.

In [20]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
lemmatizer.lemmatize("better")

'better'

In [21]:
# add pos to get better results
lemmatizer.lemmatize("better", pos="a")

'good'

### Lesk algorithm

NLTK also has an implementation of the Lesk algorithm for word sense disambiguation (WSD). The function returns the synset whose definition has the highest number of words in common with the context sentence, chosen from the synsets of the target word. You can provide a pos argument as well as the target word.

In [22]:
from nltk.wsd import lesk

In [23]:
# look at the definitions for 'bank'
for ss in wn.synsets('bank'):
    print(ss, ss.definition())

Synset('bank.n.01') sloping land (especially the slope beside a body of water)
Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
Synset('bank.n.03') a long ridge or pile
Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
Synset('savings_bank.n.02') a container (usually with a slot in the top) for keeping money at home
Synset('bank.n.09') a building in which the business of banking transacted
Synset('bank.n.10') a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
Synset('bank.v.01') tip laterally
Sy

In [24]:
sent1 = 'I went to the bank to deposit money'.split()
s = lesk(sent1, 'bank', 'n')
print(s)
print(s.definition())

Synset('savings_bank.n.02')
a container (usually with a slot in the top) for keeping money at home


In [25]:
sent2 = 'I went to the river bank to see the flood'.split()
s = lesk(sent2, 'bank', 'n')
print(s)
print(s.definition())

Synset('bank.n.07')
a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
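
The surprising result for the river sentence is easier to understand by counting the overlaps by hand. The sketch below is a simplified overlap count in plain Python, not NLTK's code; it uses three of the definitions printed above and whitespace tokenization, which is an assumption about how words are compared. The names glosses and overlap_scores are hypothetical.

```python
# Three of the noun definitions printed above, keyed by synset name.
glosses = {
    'bank.n.01': 'sloping land (especially the slope beside a body of water)',
    'savings_bank.n.02': 'a container (usually with a slot in the top) '
                         'for keeping money at home',
    'bank.n.07': 'a slope in the turn of a road or track; the outside is '
                 'higher than the inside in order to reduce the effects of '
                 'centrifugal force',
}

def overlap_scores(context_words, glosses):
    """Count words shared between the context and each definition."""
    context = set(context_words)
    return {name: len(context & set(gloss.split()))
            for name, gloss in glosses.items()}

sent2 = 'I went to the river bank to see the flood'.split()
scores = overlap_scores(sent2, glosses)
print(scores)
print(max(scores, key=scores.get))
```

In this count the only shared words are function words: 'the' appears in all three definitions, and 'to' appears only in the definition of bank.n.07. That may be why a simple overlap measure prefers the road sense here, and it suggests that removing stop words or using richer context could change the outcome.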


In [26]:
# senses of 'able'
for ss in wn.synsets('able'):
    print(ss, ss.definition())

Synset('able.a.01') (usually followed by `to') having the necessary means or skill or know-how or authority to do something
Synset('able.s.02') have the skills and qualifications to do things well
Synset('able.s.03') having inherent physical or mental ability or capacity
Synset('able.s.04') having a strong healthy body


In [27]:
sent3 = 'People should be able to think independently'.split()
print(lesk(sent3, 'able'))

Synset('able.s.02')


In [28]:
# now try with a specific pos
print(lesk(sent3, 'able', pos='a'))

Synset('able.a.01')
