In [7]:
%doctest_mode

from IPython.display import Image


Exception reporting mode: Plain
Doctest mode is: ON


# WordNet

WordNet is a lexical database for the English language. It:
- groups English words into sets of synonyms called *synsets*;
- provides short, general definitions (glosses);
- records the semantic relations between these synsets (antonymy, meronymy, etc.), which depend on the part of speech (POS).

# Structure of WordNet

Relations between synsets are POS-specific: nouns, verbs, adjectives and adverbs each have their own set of relations.

For nouns we have:
- hypernyms: canine is a hypernym of dog, because every dog is a member of the larger category of canines
- hyponyms: dog is a hyponym of canine
- coordinate terms: wolf is a coordinate term of dog, and dog is a coordinate term of wolf, because they share a hypernym
- holonyms: building is a holonym of window, because a window is part of a building
- meronyms: window is a meronym of building
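The noun relations above can be sketched with a toy taxonomy in plain Python. This is only an illustration: the `HYPERNYM` and `HOLONYM` dictionaries and the helper names are hypothetical, the data is not taken from WordNet, and each word is given a single parent for simplicity.

```python
# Toy data for illustration only (hypothetical, not taken from WordNet).
# Each word maps to its single direct hypernym (its more general category).
HYPERNYM = {"dog": "canine", "wolf": "canine", "canine": "carnivore"}

# Each part maps to the whole it belongs to (its holonym).
HOLONYM = {"window": "building", "door": "building"}


def hyponyms(word):
    """Words whose direct hypernym is `word`."""
    return sorted(w for w, parent in HYPERNYM.items() if parent == word)


def coordinate_terms(word):
    """Words that share a direct hypernym with `word` (excluding itself)."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return []
    return [w for w in hyponyms(parent) if w != word]


def meronyms(whole):
    """Parts whose holonym is `whole`."""
    return sorted(part for part, w in HOLONYM.items() if w == whole)


print(hyponyms("canine"))        # ['dog', 'wolf']
print(coordinate_terms("dog"))   # ['wolf']
print(meronyms("building"))      # ['door', 'window']
```

Hyponymy is simply the inverse of hypernymy, and meronymy the inverse of holonymy, which is why one dictionary per pair is enough here.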

In [3]:
Image(url="https://courses.washington.edu/hypertxt/cgi-bin/book/maps/imaginediag.gif")

# NLTK and WordNet

- NLTK provides a corpus reader, `nltk.corpus.wordnet`, for accessing the information in WordNet

In [4]:
# we need to import wordnet first
from nltk.corpus import wordnet as wn

# Working with synsets

In [5]:
# retrieve all the synsets for a word
wn.synsets('motorcar')

[Synset('car.n.01')]

- the name of a synset is structured as follows: **word**.**POS**.**sense number** (for example, `car.n.01` is the first noun sense of *car*)
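As a small sketch of this naming scheme, a synset name can be split back into its three parts with plain string handling. The helper `parse_synset_name` is hypothetical (it is not part of NLTK) and works on the name string, such as the one returned by `Synset.name()`.

```python
def parse_synset_name(name):
    """Split a synset name such as 'car.n.01' into (word, pos, sense_number)."""
    # Split from the right so that a dot inside the word part is preserved.
    word, pos, sense = name.rsplit(".", 2)
    return word, pos, int(sense)


print(parse_synset_name("car.n.01"))        # ('car', 'n', 1)
print(parse_synset_name("cable_car.n.01"))  # ('cable_car', 'n', 1)
```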

In [8]:
# retrieve all the synsets for a word and a POS
wn.synsets('suppose', wn.VERB)

[Synset('suppose.v.01'), Synset('think.v.02'), Synset('speculate.v.01'), Synset('presuppose.v.01'), Synset('presuppose.v.02')]

In [9]:
# access information about a synset directly, using its name
wn.synset('car.n.01').lemma_names()

['car', 'auto', 'automobile', 'machine', 'motorcar']

In [10]:
wn.synsets('car')

[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]

In [11]:
wn.lemmas('car')

[Lemma('car.n.01.car'), Lemma('car.n.02.car'), Lemma('car.n.03.car'), Lemma('car.n.04.car'), Lemma('cable_car.n.01.car')]

In [12]:
l = wn.synsets("banks")
l

[Synset('banks.n.01'), Synset('bank.n.01'), Synset('depository_financial_institution.n.01'), Synset('bank.n.03'), Synset('bank.n.04'), Synset('bank.n.05'), Synset('bank.n.06'), Synset('bank.n.07'), Synset('savings_bank.n.02'), Synset('bank.n.09'), Synset('bank.n.10'), Synset('bank.v.01'), Synset('bank.v.02'), Synset('bank.v.03'), Synset('bank.v.04'), Synset('bank.v.05'), Synset('deposit.v.02'), Synset('bank.v.07'), Synset('trust.v.01')]

In [13]:
l[0].definition()

'English botanist who accompanied Captain Cook on his first voyage to the Pacific Ocean (1743-1820)'

In [14]:
l[0].examples()

[]

In [15]:
# display the definitions of all the synsets of a word
for i, synset in enumerate(wn.synsets("banks")):
    print("Definition for sense", i + 1, "is", synset.definition())

Definition for sense 1 is English botanist who accompanied Captain Cook on his first voyage to the Pacific Ocean (1743-1820)
Definition for sense 2 is sloping land (especially the slope beside a body of water)
Definition for sense 3 is a financial institution that accepts deposits and channels the money into lending activities
Definition for sense 4 is a long ridge or pile
Definition for sense 5 is an arrangement of similar objects in a row or in tiers
Definition for sense 6 is a supply or stock held in reserve for future use (especially in emergencies)
Definition for sense 7 is the funds held by a gambling house or the dealer in some gambling games
Definition for sense 8 is a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
Definition for sense 9 is a container (usually with a slot in the top) for keeping money at home
Definition for sense 10 is a building in which the business of banking transacted
Definit

# Hyponyms

In [16]:
motorcar = wn.synset('car.n.01')
types_of_motorcar = motorcar.hyponyms()
print(types_of_motorcar)

[Synset('ambulance.n.01'), Synset('beach_wagon.n.01'), Synset('bus.n.04'), Synset('cab.n.03'), Synset('compact.n.03'), Synset('convertible.n.01'), Synset('coupe.n.01'), Synset('cruiser.n.01'), Synset('electric.n.01'), Synset('gas_guzzler.n.01'), Synset('hardtop.n.01'), Synset('hatchback.n.01'), Synset('horseless_carriage.n.01'), Synset('hot_rod.n.01'), Synset('jeep.n.01'), Synset('limousine.n.01'), Synset('loaner.n.02'), Synset('minicar.n.01'), Synset('minivan.n.01'), Synset('model_t.n.01'), Synset('pace_car.n.01'), Synset('racer.n.02'), Synset('roadster.n.01'), Synset('sedan.n.01'), Synset('sport_utility.n.01'), Synset('sports_car.n.01'), Synset('stanley_steamer.n.01'), Synset('stock_car.n.01'), Synset('subcompact.n.01'), Synset('touring_car.n.01'), Synset('used-car.n.01')]


# Hypernyms

In [17]:
motorcar = wn.synset('car.n.01')
motorcar.hypernyms()

[Synset('motor_vehicle.n.01')]

In [18]:
# most general hypernym
motorcar.root_hypernyms()

[Synset('entity.n.01')]

In [19]:
wn.synset('bank.v.01').root_hypernyms()

[Synset('move.v.02')]
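To see how a root hypernym is reached, one can follow `hypernyms()` upwards one step at a time. The sketch below assumes only that each object has a `hypernyms()` method returning a list, as NLTK synsets do. `hypernym_chain` and `FakeSynset` are hypothetical names, and the toy chain is much shorter than the real WordNet hierarchy.

```python
def hypernym_chain(synset):
    """Return [synset, its first hypernym, that one's first hypernym, ...]."""
    chain = [synset]
    while synset.hypernyms():
        # A synset can have several hypernyms; this sketch follows only the first.
        synset = synset.hypernyms()[0]
        chain.append(synset)
    return chain


class FakeSynset:
    """Hypothetical stand-in so the sketch runs without WordNet installed."""

    def __init__(self, name, parent=None):
        self._name = name
        self._parent = parent

    def name(self):
        return self._name

    def hypernyms(self):
        return [self._parent] if self._parent is not None else []


# Toy chain, much shorter than the real WordNet hierarchy.
entity = FakeSynset("entity.n.01")
vehicle = FakeSynset("motor_vehicle.n.01", parent=entity)
car = FakeSynset("car.n.01", parent=vehicle)

print([s.name() for s in hypernym_chain(car)])
# ['car.n.01', 'motor_vehicle.n.01', 'entity.n.01']
```

With WordNet loaded, the same function should accept a real synset, for example `hypernym_chain(wn.synset('car.n.01'))`. NLTK also offers `hypernym_paths()`, which returns every path to the root rather than just one.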

# Meronyms and holonyms

In [20]:
# we can explore the meronyms and holonyms of a noun synset, if it has any
wn.synset('tree.n.01').part_meronyms()

[Synset('burl.n.02'), Synset('crown.n.07'), Synset('limb.n.02'), Synset('stump.n.01'), Synset('trunk.n.01')]

In [21]:
wn.synset('crown.n.07').part_holonyms()

[Synset('tree.n.01')]

In [22]:
wn.synset('tree.n.01').substance_meronyms()

[Synset('heartwood.n.01'), Synset('sapwood.n.01')]

In [23]:
wn.synset('tree.n.01').member_holonyms()

[Synset('forest.n.01')]

# Antonyms

In [24]:
# we want to find the antonyms of the word bright
bright = wn.synsets("bright")
bright

[Synset('bright.a.01'), Synset('bright.s.02'), Synset('bright.s.03'), Synset('bright.s.04'), Synset('bright.s.05'), Synset('bright.s.06'), Synset('undimmed.a.01'), Synset('bright.s.08'), Synset('bright.s.09'), Synset('bright.s.10'), Synset('brilliantly.r.01')]

In [25]:
print(bright[0].definition())

emitting or reflecting light readily or in large amounts


In [26]:
# some relations hold between lemmas rather than synsets, so this call fails
bright[0].antonyms()

AttributeError: 'Synset' object has no attribute 'antonyms'

In [27]:
bright[0].lemmas()

[Lemma('bright.a.01.bright')]

In [28]:
bright[0].lemmas()[0].antonyms()

[Lemma('dull.a.02.dull')]

In [29]:
bright[0].lemmas()[0].antonyms()[0].synset()

Synset('dull.a.02')

In [30]:
bright[0].lemmas()[0].antonyms()[0].synset().definition()

'emitting or reflecting very little light'
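Because antonymy is defined on lemmas, collecting the antonyms of a word means visiting every lemma of every synset. The sketch below assumes objects with the `lemmas()`, `antonyms()` and `name()` methods used above. `antonym_names`, `FakeLemma` and `FakeSynset` are hypothetical names, and the stand-in data only mimics the shape of the WordNet objects.

```python
def antonym_names(synsets):
    """Collect antonym lemma names across every lemma of the given synsets."""
    names = set()
    for synset in synsets:
        for lemma in synset.lemmas():
            for antonym in lemma.antonyms():
                names.add(antonym.name())
    return sorted(names)


class FakeLemma:
    """Hypothetical stand-in for a lemma, so the sketch runs without WordNet."""

    def __init__(self, name, antonyms=()):
        self._name = name
        self._antonyms = list(antonyms)

    def name(self):
        return self._name

    def antonyms(self):
        return self._antonyms


class FakeSynset:
    """Hypothetical stand-in for a synset: just a container of lemmas."""

    def __init__(self, lemmas):
        self._lemmas = list(lemmas)

    def lemmas(self):
        return self._lemmas


dull = FakeLemma("dull")
with_antonym = FakeSynset([FakeLemma("bright", antonyms=[dull])])
without_antonym = FakeSynset([FakeLemma("undimmed")])

print(antonym_names([with_antonym, without_antonym]))  # ['dull']
```

With WordNet loaded, `antonym_names(wn.synsets("bright"))` should work the same way on real synsets.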

# Semantic similarity

In [31]:
right = wn.synset('right_whale.n.01')
orca = wn.synset('orca.n.01')
minke = wn.synset('minke_whale.n.01')
tortoise = wn.synset('tortoise.n.01')
novel = wn.synset('novel.n.01')
right.lowest_common_hypernyms(minke)

[Synset('baleen_whale.n.01')]

In [32]:
right.lowest_common_hypernyms(orca)

[Synset('whale.n.02')]

In [33]:
right.lowest_common_hypernyms(tortoise)

[Synset('vertebrate.n.01')]

In [34]:
right.lowest_common_hypernyms(novel)

[Synset('entity.n.01')]

In [35]:
# path similarity is based on the shortest path between two synsets in the hypernym hierarchy
right.path_similarity(minke)

0.25

In [36]:
right.path_similarity(orca)

0.16666666666666666
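Path similarity is computed as 1 / (d + 1), where d is the number of edges on the shortest path between the two synsets. The sketch below reproduces the two scores above; the distances 3 and 5 are inferred from those scores rather than read from WordNet, and the function name is hypothetical.

```python
def path_similarity_from_distance(distance):
    """Path similarity for two synsets that are `distance` edges apart."""
    return 1 / (distance + 1)


# Distances inferred from the scores above (assumption, not WordNet output).
print(path_similarity_from_distance(3))  # 0.25
print(path_similarity_from_distance(5))  # 0.16666666666666666
print(path_similarity_from_distance(0))  # 1.0, a synset compared with itself
```

With WordNet loaded, the distance itself can be obtained with `right.shortest_path_distance(minke)`.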

In [37]:
# Leacock-Chodorow similarity takes into account the shortest path length and
# the maximum depth of the taxonomy
right.lch_similarity(minke)

2.2512917986064953

In [38]:
right.lch_similarity(orca)

1.845826690498331
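The Leacock-Chodorow score is -log(p / (2 * D)), where p is the shortest path length counted in nodes (edge distance + 1) and D is the maximum depth of the taxonomy. As a sketch, the two scores above are reproduced by edge distances of 3 and 5 and a depth of 19. These three numbers are inferred from the outputs, not read from WordNet, and the function name is hypothetical.

```python
import math


def lch_from_distance(distance, max_depth):
    """Leacock-Chodorow similarity from an edge distance and a taxonomy depth."""
    return -math.log((distance + 1) / (2.0 * max_depth))


# Distances and depth are assumptions inferred from the scores above.
print(lch_from_distance(3, 19))  # about 2.2513
print(lch_from_distance(5, 19))  # about 1.8458
```

Unlike path similarity, the score is not bounded by 1, so values from the two measures are not directly comparable.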

Have a look at the WordNet HOWTO for more similarity metrics: <a href="http://www.nltk.org/howto/wordnet.html" target="_blank">http://www.nltk.org/howto/wordnet.html</a>

# Exercises

- Ask the user to enter two nouns. Determine whether the two nouns are synonyms (that is, whether they share a synset) and display the meaning they share.
- Part two of the above: if the two nouns are not synonyms, find which senses of the two nouns are the closest (choose whatever measure of closeness you want).
- Take a text from the Brown corpus. For each word in the text, find its synsets. Then find out which synsets occur most often in the text.

# Read more

- More examples of how to access information in WordNet: <a href="http://www.nltk.org/howto/wordnet.html" target="_blank">http://www.nltk.org/howto/wordnet.html</a>
- A tutorial on how to use NLTK and WordNet: <a href="https://pythonprogramming.net/wordnet-nltk-tutorial/" target="_blank">https://pythonprogramming.net/wordnet-nltk-tutorial/</a> (includes a video)
- Section 5 of chapter 2 of the NLTK book: <a href="http://www.nltk.org/book/ch02.html" target="_blank">http://www.nltk.org/book/ch02.html</a>
- Online interface to WordNet: <a href="http://wordnetweb.princeton.edu/perl/webwn" target="_blank">http://wordnetweb.princeton.edu/perl/webwn</a>
- For those of you who want to learn more about WordNet: <a href="http://wordnetcode.princeton.edu/5papers.pdf" target="_blank">http://wordnetcode.princeton.edu/5papers.pdf</a>