# Exercise 04: WordNet

### Task 1: Word relations

Find the relationship between the following words:

__a)__ ``dog`` is a/an ___ of ``animal``

__b)__ ``good`` is a/an ___ of ``bad``

__c)__ ``wheel`` is a/an ___ of ``car``

__d)__ ``building`` is a/an ___ of ``skyscraper`` 

a) Hyponym (is-a relation, subordinate term)

b) Antonym (opposite)

c) Meronym (part-of relation) (opposite is holonym)

d) Hypernym (superordinate term)
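
The four relations come in inverse pairs, which a few lines of Python can make explicit. The sketch below is a hand-written toy: the `INVERSE` table, the `facts` list and the `invert` helper are illustrative names and are not part of WordNet or NLTK. Antonymy is treated as its own inverse.

```python
# Toy illustration: every relation has an inverse; antonymy is symmetric.
INVERSE = {
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "meronym": "holonym",
    "holonym": "meronym",
    "antonym": "antonym",
}

# (word, relation, other), read as "word is a/an <relation> of other"
facts = [
    ("dog", "hyponym", "animal"),
    ("good", "antonym", "bad"),
    ("wheel", "meronym", "car"),
    ("building", "hypernym", "skyscraper"),
]

def invert(fact):
    """Return the same fact stated from the other word's point of view."""
    word, relation, other = fact
    return (other, INVERSE[relation], word)

for fact in facts:
    word, relation, other = invert(fact)
    print(f"{word} is a/an {relation} of {other}")
```

Reading the answers backwards this way is a quick check: if ``wheel`` is a meronym of ``car``, then ``car`` must be a holonym of ``wheel``.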

### Task 2: WordNet 
In this task we will work with the WordNet module of NLTK (https://www.nltk.org/api/nltk.corpus.reader.wordnet.html). WordNet is a large lexical database for English that groups words in so-called synsets, which are sets of cognitive synonyms that express distinct concepts (https://wordnet.princeton.edu/).

In [1]:
import nltk

nltk.download('wordnet')
nltk.download('omw-1.4')

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/schloett/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/schloett/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

In [2]:
from nltk.corpus import wordnet as wn

__a)__ How many synsets do the words ``monitor`` and ``living`` have? For each synset, print the names of the lemmas associated with the synset, their definition and the assigned examples (if present). 

In [3]:
print(wn.synsets('monitor'))

[Synset('proctor.n.01'), Synset('admonisher.n.01'), Synset('monitor.n.03'), Synset('monitor.n.04'), Synset('monitor.n.05'), Synset('monitor.n.06'), Synset('monitor.n.07'), Synset('monitor.v.01'), Synset('monitor.v.02')]


In [9]:
print(wn.synsets('monitor')) # all synsets

print(len(wn.synsets('monitor')))  # how many synsets

def print_synset_info(synsets):
    for synset in synsets:
        print()
        print(synset)
        # print(synset.lemmas())
        print('lemma names:', synset.lemma_names())  # words with synonym senses
        print('definition:', synset.definition())  # definition
        examples = synset.examples()
        if examples:
            print('examples:', examples)  # example sentence(s)

print_synset_info(wn.synsets('monitor'))

[Synset('proctor.n.01'), Synset('admonisher.n.01'), Synset('monitor.n.03'), Synset('monitor.n.04'), Synset('monitor.n.05'), Synset('monitor.n.06'), Synset('monitor.n.07'), Synset('monitor.v.01'), Synset('monitor.v.02')]
9

Synset('proctor.n.01')
lemma names: ['proctor', 'monitor']
definition: someone who supervises (an examination)

Synset('admonisher.n.01')
lemma names: ['admonisher', 'monitor', 'reminder']

Synset('monitor.n.03')
lemma names: ['Monitor']
definition: an ironclad vessel built by Federal forces to do battle with the Merrimac

Synset('monitor.n.04')
lemma names: ['monitor', 'monitoring_device']
definition: display produced by a device that takes signals and displays them on a television screen or a computer monitor

Synset('monitor.n.05')
lemma names: ['monitor']
definition: electronic equipment that is used to check the quality or content of electronic transmissions

Synset('monitor.n.06')
lemma names: ['monitor']
definition: a piece of electronic equipment that keeps t

In [10]:
# solution
print(wn.synsets('living'))
print(len(wn.synsets('living')))  # how many synsets

print_synset_info(wn.synsets('living'))

[Synset('life.n.02'), Synset('living.n.02'), Synset('animation.n.01'), Synset('support.n.06'), Synset('populate.v.01'), Synset('live.v.02'), Synset('survive.v.01'), Synset('exist.v.02'), Synset('be.v.11'), Synset('know.v.05'), Synset('live.v.07'), Synset('living.a.01'), Synset('living.s.02'), Synset('living.s.03'), Synset('surviving.s.01'), Synset('living.s.05'), Synset('living.s.06')]
17

Synset('life.n.02')
lemma names: ['life', 'living']
definition: the experience of being alive; the course of human events and activities
examples: ['he could no longer cope with the complexities of life']

Synset('living.n.02')
lemma names: ['living']
definition: people who are still living
examples: ['save your pity for the living']

Synset('animation.n.01')
lemma names: ['animation', 'life', 'living', 'aliveness']
definition: the condition of living or the state of being alive
examples: ["while there's life there's hope", 'life depends on many chemical and physical processes']

Synset('support.n.06

__b)__ Take a look at the lemma `car` as in "a motor vehicle with four wheels; usually propelled by an internal combustion engine". What other lemmas have the same definition in WordNet?

In [11]:
# solution
for synset in wn.synsets("car"):
    print(synset)
    print(synset.definition())  # find the right sense of car -> index 0
    print()

Synset('car.n.01')
a motor vehicle with four wheels; usually propelled by an internal combustion engine

Synset('car.n.02')
a wheeled vehicle adapted to the rails of railroad

Synset('car.n.03')
the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant

Synset('car.n.04')
where passengers ride up and down

Synset('cable_car.n.01')
a conveyance for passengers or freight on a cable railway



In [12]:
# solution
synset = wn.synsets("car")[0]  # first synset of car has the right description
print(synset)
print(synset.definition())
print()

for lemma in synset.lemmas():  # all lemmas in synset have the same definition
    print(lemma)

Synset('car.n.01')
a motor vehicle with four wheels; usually propelled by an internal combustion engine

Lemma('car.n.01.car')
Lemma('car.n.01.auto')
Lemma('car.n.01.automobile')
Lemma('car.n.01.machine')
Lemma('car.n.01.motorcar')


__c)__ Find the lowest common hypernym of ``tea`` and ``coffee`` in the sense of tree/shrub. 

In [13]:
# solution
for synset in wn.synsets("tea"):  # find the right sense of tea
    print(synset.name())
    print(synset.definition())

tea_tree = wn.synsets("tea")[2]  # tea in sense of tree

for synset in wn.synsets("coffee"):  # find the right sense of coffee
    print(synset.name())
    print(synset.definition())
    
coffee_tree = wn.synsets("coffee")[1]  # coffee in sense of tree

tea.n.01
a beverage made by steeping tea leaves in water
tea.n.02
a light midafternoon meal of tea and sandwiches or cakes
tea.n.03
a tropical evergreen shrub or small tree extensively cultivated in e.g. China and Japan and India; source of tea leaves
tea.n.04
a reception or party at which tea is served
tea.n.05
dried leaves of the tea shrub; used to make tea
coffee.n.01
a beverage consisting of an infusion of ground coffee beans
coffee.n.02
any of several small trees and shrubs native to the tropical Old World yielding coffee beans
coffee_bean.n.01
a seed of the coffee tree; ground to make coffee
chocolate.n.03
a medium brown to dark-brown color


In [14]:
print(tea_tree.lowest_common_hypernyms(coffee_tree))  # superordinate term

[Synset('woody_plant.n.01')]
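
To see what a lowest common hypernym is without relying on NLTK, the idea can be sketched on a small hand-written hierarchy. Everything here is an assumption made for illustration: the node names, the `HYPERNYMS` dict and both helper functions are hypothetical, and the toy ranks common hypernyms by summed distance to the two inputs, which is not necessarily how NLTK ranks them (on a simple tree like this one the choice makes no difference).

```python
# Hand-written toy hierarchy (node -> direct hypernyms); not read from WordNet.
HYPERNYMS = {
    "tea_shrub": ["shrub"],
    "coffee_tree": ["tree"],
    "shrub": ["woody_plant"],
    "tree": ["woody_plant"],
    "woody_plant": ["plant"],
    "plant": ["entity"],
    "entity": [],
}

def ancestors_with_distance(node):
    """Map node and every hypernym above it to its minimal distance from node."""
    dist = {node: 0}
    frontier = [node]
    while frontier:
        next_frontier = []
        for current in frontier:
            for parent in HYPERNYMS[current]:
                if parent not in dist:  # first visit is the shortest (level by level)
                    dist[parent] = dist[current] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist

def lowest_common_hypernym(a, b):
    """Return the shared hypernym with the smallest summed distance to a and b."""
    dist_a, dist_b = ancestors_with_distance(a), ancestors_with_distance(b)
    common = set(dist_a) & set(dist_b)  # never empty here: both reach 'entity'
    return min(common, key=lambda n: dist_a[n] + dist_b[n])

print(lowest_common_hypernym("tea_shrub", "coffee_tree"))
```

``plant`` and ``entity`` are also common hypernyms in this toy, but they sit further up, so ``woody_plant`` is the lowest one.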


In [15]:
# word sense disambiguation of 'tea' with Lesk algorithm

from nltk.wsd import lesk
context = ["tea", "can", "be", "a", "tree", "with", "leaves"]
print(lesk(context, 'tea'))

Synset('tea.n.03')
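
The intuition behind Lesk is word overlap between the context and each sense's gloss. The following is a simplified sketch of that idea, not NLTK's implementation: the glosses are copied from the definitions printed above, while `GLOSSES`, `overlap` and `simple_lesk` are hypothetical names, and tokenization is a plain lowercase whitespace split.

```python
# Glosses copied from the definitions printed above; everything else is a toy.
GLOSSES = {
    "tea.n.01": "a beverage made by steeping tea leaves in water",
    "tea.n.02": "a light midafternoon meal of tea and sandwiches or cakes",
    "tea.n.03": "a tropical evergreen shrub or small tree extensively cultivated "
                "in e.g. China and Japan and India; source of tea leaves",
    "tea.n.04": "a reception or party at which tea is served",
    "tea.n.05": "dried leaves of the tea shrub; used to make tea",
}

def overlap(context, gloss):
    """Count distinct context words that also occur in the gloss."""
    return len(set(context) & set(gloss.lower().split()))

def simple_lesk(context, glosses):
    """Pick the sense whose gloss shares the most words with the context."""
    return max(glosses, key=lambda sense: overlap(context, glosses[sense]))

context = ["tea", "can", "be", "a", "tree", "with", "leaves"]
for sense, gloss in GLOSSES.items():
    print(sense, overlap(context, gloss))
print(simple_lesk(context, GLOSSES))
```

The word ``tree`` in the context is what tips the balance towards the shrub sense; without it, the beverage and shrub glosses would tie, which shows how sensitive this kind of overlap counting is to short contexts.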


__d)__ Print the path(s) of ``coffee`` in the sense of beverage to the root of the hypernym/hyponym tree. Then compute the distance of the shortest path linking ``coffee`` and ``tea`` in the sense of beverages. Compare it to the shortest path connecting ``coffee`` and ``tea`` in the sense of trees/shrubs.

In [16]:
# solution
tea_bev = wn.synsets("tea")[0]  # tea in sense of beverage
coffee_bev = wn.synsets("coffee")[0]  # coffee in sense of beverage

coffee_paths = coffee_bev.hypernym_paths()
print(len(coffee_paths))  # 3 different paths up to the root
coffee_paths
# coffee_tree.hypernym_paths()

3


[[Synset('entity.n.01'),
  Synset('abstraction.n.06'),
  Synset('relation.n.01'),
  Synset('part.n.01'),
  Synset('substance.n.01'),
  Synset('fluid.n.01'),
  Synset('liquid.n.01'),
  Synset('beverage.n.01'),
  Synset('coffee.n.01')],
 [Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('matter.n.03'),
  Synset('substance.n.01'),
  Synset('fluid.n.01'),
  Synset('liquid.n.01'),
  Synset('beverage.n.01'),
  Synset('coffee.n.01')],
 [Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('matter.n.03'),
  Synset('substance.n.07'),
  Synset('food.n.01'),
  Synset('beverage.n.01'),
  Synset('coffee.n.01')]]

In [17]:
# solution

# distance of shortest path 
print(tea_bev.shortest_path_distance(coffee_bev))  # beverages
print(tea_tree.shortest_path_distance(coffee_tree))  # trees

2
4
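
The two distances can be reproduced on a toy graph with a breadth-first search that walks hypernym links in both directions. This is a sketch under assumptions: the `HYPERNYMS` dict is hand-written (the beverage links follow the paths printed above, while the shrub/tree links are assumed for illustration), and `neighbours` and `shortest_path_distance` are hypothetical helpers, not NLTK functions.

```python
from collections import deque

# Hand-written toy edges (node -> direct hypernyms); not read from WordNet.
HYPERNYMS = {
    "tea_bev": ["beverage"],
    "coffee_bev": ["beverage"],
    "tea_tree": ["shrub"],
    "coffee_tree": ["tree"],
    "shrub": ["woody_plant"],
    "tree": ["woody_plant"],
}

def neighbours(node):
    """Hypernyms and hyponyms of node: edges are walked in both directions."""
    up = HYPERNYMS.get(node, [])
    down = [child for child, parents in HYPERNYMS.items() if node in parents]
    return up + down

def shortest_path_distance(start, goal):
    """Breadth-first search; returns the number of edges, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(shortest_path_distance("tea_bev", "coffee_bev"))    # siblings under 'beverage'
print(shortest_path_distance("tea_tree", "coffee_tree"))  # meet one level higher
```

In this toy the beverages are siblings under a shared direct hypernym, whereas the plants only meet one level further up, which is why their path is two edges longer.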


__e)__ Implement a function that finds __all__ hyponyms (on all deeper levels) for a given synset. With the parameter ``instance`` the user should control if instance hyponyms should also be included.

In [18]:
# "WordNet distinguishes among Types (common nouns/concepts) and Instances (specific persons, countries and geographic entities). 
# Thus, armchair is a type of chair, Joe Biden is an instance of a president. Instances are always leaf (terminal) nodes 
# in their hierarchies." (https://wordnet.princeton.edu/)

s = wn.synset('american_state.n.01')
print(s.hyponyms())
print(s.instance_hyponyms())  # instances are always leaves

[Synset('slave_state.n.01'), Synset('free_state.n.02')]
[Synset('north_dakota.n.01'), Synset('new_york.n.02'), Synset('texas.n.01'), Synset('kansas.n.01'), Synset('rhode_island.n.01'), Synset('virginia.n.01'), Synset('ohio.n.01'), Synset('north_carolina.n.01'), Synset('colorado.n.01'), Synset('alabama.n.01'), Synset('nevada.n.01'), Synset('idaho.n.01'), Synset('new_mexico.n.01'), Synset('missouri.n.01'), Synset('illinois.n.01'), Synset('kentucky.n.01'), Synset('west_virginia.n.01'), Synset('montana.n.01'), Synset('south_dakota.n.01'), Synset('alaska.n.01'), Synset('arkansas.n.01'), Synset('pennsylvania.n.01'), Synset('oregon.n.01'), Synset('maryland.n.01'), Synset('iowa.n.02'), Synset('washington.n.02'), Synset('wyoming.n.01'), Synset('minnesota.n.01'), Synset('new_jersey.n.01'), Synset('maine.n.01'), Synset('connecticut.n.01'), Synset('massachusetts.n.01'), Synset('california.n.01'), Synset('wisconsin.n.02'), Synset('michigan.n.01'), Synset('louisiana.n.01'), Synset('hawaii.n.01'), Sy

In [19]:
# solution
def get_hyponyms(synset, instance=False):
    '''
    :param synset: synset for which we want to find all hyponyms
    :param instance: indicates if instance hyponyms should be included, False by default
    :return: a list of all hyponyms for the given synset
    '''
    queue = [synset]  # stores all synsets for which we need to find hyponyms
    res = set() 
    
    while len(queue) > 0:
        syn = queue.pop()  # take out the last element (the list is used as a stack)
        if syn in res:  # ignore if we have found the synset already
            continue
        res.add(syn)  # add current synset to hyponyms
        
        queue.extend([hypo for hypo in syn.hyponyms()])  # find hyponyms and add them to list
        if instance:
            queue.extend([hypo for hypo in syn.instance_hyponyms()])  # include instance hyponyms
    
    res.remove(synset)  # remove original synset
    return list(res)
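
The same traversal pattern works for any relation, so it can be written once and parameterized by a "neighbours" function. The sketch below tries this on hand-written toy data shaped like the `american_state.n.01` example; the dicts, `toy_hyponyms` and `closure` are hypothetical names introduced for illustration. NLTK synsets also provide a `closure` method for a comparable purpose, which is not used here.

```python
# Hand-written toy data shaped like the american_state example; not read from WordNet.
HYPONYMS = {
    "region": ["state"],
    "state": ["american_state"],
    "american_state": ["slave_state", "free_state"],
}
INSTANCE_HYPONYMS = {"american_state": ["texas", "ohio"]}

def toy_hyponyms(node, instance=False):
    """Direct hyponyms of node, optionally with its instance hyponyms."""
    found = list(HYPONYMS.get(node, []))
    if instance:
        found += INSTANCE_HYPONYMS.get(node, [])
    return found

def closure(start, neighbours):
    """Collect everything reachable from start via neighbours(node), excluding start."""
    stack = list(neighbours(start))
    found = set()
    while stack:
        node = stack.pop()
        if node in found:  # already handled via another path
            continue
        found.add(node)
        stack.extend(neighbours(node))
    found.discard(start)
    return found

types_only = closure("region", toy_hyponyms)
with_instances = closure("region", lambda n: toy_hyponyms(n, instance=True))
print(sorted(types_only))
print(sorted(with_instances - types_only))
```

Passing a different `neighbours` function (for example one that returns hypernyms) turns the same helper into an "all hypernyms" search.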

In [20]:
wn.synsets("bank")

[Synset('bank.n.01'),
 Synset('depository_financial_institution.n.01'),
 Synset('bank.n.03'),
 Synset('bank.n.04'),
 Synset('bank.n.05'),
 Synset('bank.n.06'),
 Synset('bank.n.07'),
 Synset('savings_bank.n.02'),
 Synset('bank.n.09'),
 Synset('bank.n.10'),
 Synset('bank.v.01'),
 Synset('bank.v.02'),
 Synset('bank.v.03'),
 Synset('bank.v.04'),
 Synset('bank.v.05'),
 Synset('deposit.v.02'),
 Synset('bank.v.07'),
 Synset('trust.v.01')]

In [21]:
# test
b = wn.synsets("bank")[1]
s = wn.synset('american_state.n.01')

get_hyponyms(b)  
get_hyponyms(s)
get_hyponyms(s, instance=True)  

[Synset('south_dakota.n.01'),
 Synset('free_state.n.02'),
 Synset('missouri.n.01'),
 Synset('new_hampshire.n.01'),
 Synset('maryland.n.01'),
 Synset('west_virginia.n.01'),
 Synset('arkansas.n.01'),
 Synset('massachusetts.n.01'),
 Synset('florida.n.01'),
 Synset('north_dakota.n.01'),
 Synset('new_york.n.02'),
 Synset('wyoming.n.01'),
 Synset('slave_state.n.01'),
 Synset('hawaii.n.01'),
 Synset('mississippi.n.02'),
 Synset('michigan.n.01'),
 Synset('nevada.n.01'),
 Synset('washington.n.02'),
 Synset('maine.n.01'),
 Synset('virginia.n.01'),
 Synset('georgia.n.01'),
 Synset('louisiana.n.01'),
 Synset('alabama.n.01'),
 Synset('pennsylvania.n.01'),
 Synset('colorado.n.01'),
 Synset('new_mexico.n.01'),
 Synset('nebraska.n.01'),
 Synset('ohio.n.01'),
 Synset('kentucky.n.01'),
 Synset('vermont.n.01'),
 Synset('texas.n.01'),
 Synset('kansas.n.01'),
 Synset('arizona.n.01'),
 Synset('montana.n.01'),
 Synset('alaska.n.01'),
 Synset('california.n.01'),
 Synset('indiana.n.01'),
 Synset('illinois.n.01

__*f)__ Implement a function that __recursively__ finds all hypernyms up to the root for a given synset. Keep in mind that the input synset might be an instance.

_Note: A recursive function is a function that calls itself._

_*this task is optional_

In [28]:
# solution
def get_hypernyms(synset, level=0):
    '''
    :param synset: synset for which we want to find all hypernyms
    :param level: recursion depth of the current call (0 for the initial call)
    :return: a dict mapping each hypernym to the smallest level at which it was found
             (direct hypernyms of the input synset have level 1)
    '''
    res = {}  # result dictionary: hypernym -> level

    for hyp in synset.hypernyms() + synset.instance_hypernyms():  # all direct hypernyms in list
        found = {hyp: level + 1}  # store recursion level of the direct hypernym
        found.update(get_hypernyms(hyp, level + 1))  # everything above it has a higher level
        for h, lvl in found.items():
            res[h] = min(lvl, res.get(h, lvl))  # keep the smallest level if reached via several paths

    return res  # the recursion ends at the root, which has no hypernyms

In [30]:
# alternative solution
def get_hypernyms(synset):
    '''
    :param synset: synset for which we want to find all hypernyms
    :return: a list of all hypernyms for the given synset
    '''
    res = []  # result list (keeps the order of discovery, no duplicates)
    
    for hyp in synset.hypernyms() + synset.instance_hypernyms():  # all direct hypernyms in list
        tmp_res = [hyp] + get_hypernyms(hyp)  # the hypernym itself plus everything above it
        for h in tmp_res:  # separate name so the outer loop variable is not shadowed
            if h not in res:
                res.append(h)
    return res

In [31]:
# test
b = wn.synsets("tea")[0]
get_hypernyms(b)

#get_hypernyms(wn.synsets('jersey')[0])  # works for instances as well

[Synset('beverage.n.01'),
 Synset('liquid.n.01'),
 Synset('fluid.n.01'),
 Synset('substance.n.01'),
 Synset('part.n.01'),
 Synset('relation.n.01'),
 Synset('abstraction.n.06'),
 Synset('entity.n.01'),
 Synset('matter.n.03'),
 Synset('physical_entity.n.01'),
 Synset('food.n.01'),
 Synset('substance.n.07')]

In [32]:
# additional iterative solution

def get_hypernyms_it(synset):
    queue = [synset]
    res = set()
    
    while len(queue) > 0:
        syn = queue.pop()
        if syn in res:
            continue
        res.add(syn)
        # find all hypernyms and add them to the list
        queue.extend(syn.hypernyms())
        queue.extend(syn.instance_hypernyms())  # also include instance hypernyms
    return list(res)  # note: the result also contains the input synset itself

In [33]:
get_hypernyms_it(b)

[Synset('abstraction.n.06'),
 Synset('liquid.n.01'),
 Synset('physical_entity.n.01'),
 Synset('tea.n.01'),
 Synset('part.n.01'),
 Synset('fluid.n.01'),
 Synset('substance.n.01'),
 Synset('food.n.01'),
 Synset('relation.n.01'),
 Synset('substance.n.07'),
 Synset('entity.n.01'),
 Synset('matter.n.03'),
 Synset('beverage.n.01')]

In [65]:
b.hypernym_paths()

[[Synset('entity.n.01'),
  Synset('abstraction.n.06'),
  Synset('relation.n.01'),
  Synset('part.n.01'),
  Synset('substance.n.01'),
  Synset('fluid.n.01'),
  Synset('liquid.n.01'),
  Synset('beverage.n.01'),
  Synset('tea.n.01')],
 [Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('matter.n.03'),
  Synset('substance.n.01'),
  Synset('fluid.n.01'),
  Synset('liquid.n.01'),
  Synset('beverage.n.01'),
  Synset('tea.n.01')],
 [Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('matter.n.03'),
  Synset('substance.n.07'),
  Synset('food.n.01'),
  Synset('beverage.n.01'),
  Synset('tea.n.01')]]
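
The three paths can be reproduced recursively on a plain dictionary, which also shows why the set of all hypernyms is smaller than the sum of the path lengths: the paths share nodes. This is a sketch with assumptions: the direct links in `HYPERNYMS` are transcribed by hand from the paths printed above, and `paths_to_root` is a hypothetical helper, not NLTK's `hypernym_paths`.

```python
# Direct hypernym links transcribed from the paths printed above.
HYPERNYMS = {
    "tea.n.01": ["beverage.n.01"],
    "beverage.n.01": ["liquid.n.01", "food.n.01"],
    "liquid.n.01": ["fluid.n.01"],
    "fluid.n.01": ["substance.n.01"],
    "substance.n.01": ["part.n.01", "matter.n.03"],
    "part.n.01": ["relation.n.01"],
    "relation.n.01": ["abstraction.n.06"],
    "abstraction.n.06": ["entity.n.01"],
    "food.n.01": ["substance.n.07"],
    "substance.n.07": ["matter.n.03"],
    "matter.n.03": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],
}

def paths_to_root(node):
    """Recursively list every path from the root down to node."""
    parents = HYPERNYMS[node]
    if not parents:  # reached the root: a single path consisting of the root itself
        return [[node]]
    return [path + [node] for parent in parents for path in paths_to_root(parent)]

paths = paths_to_root("tea.n.01")
hypernyms = {name for path in paths for name in path} - {"tea.n.01"}
print(len(paths), len(hypernyms))
```

The paths branch wherever a node has more than one direct hypernym (here ``beverage.n.01`` and ``substance.n.01``), and the union of all path members minus the start node gives the same set that the hypernym functions collect.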