# WordNet

WordNet is a semantic dictionary organized into synsets. A **synset** is a group of synonymous words (or collocations) that share one sense; it has an identifier and is described by a brief gloss (definition). A polysemous word appears in several synsets, one per sense.

To visualize the links between words, you can use this [web-based GUI](https://wordvis.com/).

In [6]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [None]:
from nltk.corpus import wordnet

In [None]:
wordnet.synsets('school')

[Synset('school.n.01'),
 Synset('school.n.02'),
 Synset('school.n.03'),
 Synset('school.n.04'),
 Synset('school.n.05'),
 Synset('school.n.06'),
 Synset('school.n.07'),
 Synset('school.v.01'),
 Synset('educate.v.03'),
 Synset('school.v.03')]

In [None]:
school_synset = wordnet.synsets('school')[0]
print(school_synset.definition())
print(school_synset.examples())
print(school_synset.lemma_names()) # get synonyms

an educational institution
['the school was founded in 1900']
['school']


In [None]:
wordnet.synsets('trains') # get the list of synsets of a word; inflected forms are reduced to their base form ('train') before lookup

[Synset('train.n.01'),
 Synset('string.n.04'),
 Synset('caravan.n.01'),
 Synset('train.n.04'),
 Synset('train.n.05'),
 Synset('gearing.n.01'),
 Synset('train.v.01'),
 Synset('train.v.02'),
 Synset('discipline.v.01'),
 Synset('prepare.v.05'),
 Synset('educate.v.03'),
 Synset('aim.v.01'),
 Synset('coach.v.01'),
 Synset('train.v.08'),
 Synset('train.v.09'),
 Synset('train.v.10'),
 Synset('trail.v.05')]

### Types of relations for nouns

**hypernyms/hyponyms**: A hypernym is a more general term for a concept. A synset s1 is a hypernym of s2 if s2 is a type of s1; in that case, s2 is a hyponym of s1.

In [None]:
school_synset = wordnet.synset('school.n.01')
print(school_synset.hypernyms()) # hypernymy -> is-a relation
print(school_synset.hyponyms()) # types of schools

[Synset('educational_institution.n.01')]
[Synset('dancing_school.n.01'), Synset('flying_school.n.01'), Synset('religious_school.n.01'), Synset('secondary_school.n.01'), Synset('sunday_school.n.01'), Synset('secretarial_school.n.01'), Synset('night_school.n.01'), Synset('graduate_school.n.01'), Synset('technical_school.n.01'), Synset('academy.n.03'), Synset('driving_school.n.01'), Synset('correspondence_school.n.01'), Synset('private_school.n.01'), Synset('nursing_school.n.01'), Synset('training_school.n.01'), Synset('riding_school.n.01'), Synset('public_school.n.01'), Synset('conservatory.n.01'), Synset('direct-grant_school.n.01'), Synset('finishing_school.n.01'), Synset('veterinary_school.n.01'), Synset('crammer.n.03'), Synset('dance_school.n.01'), Synset('day_school.n.02'), Synset('grade_school.n.01'), Synset('language_school.n.01'), Synset('alma_mater.n.01')]


In [None]:
synset = wordnet.synset('oak.n.02')
print(synset.definition())
print(synset.hypernyms()) # hypernymy -> is-a relation
print(synset.hyponyms()) # types of oaks

a deciduous tree of the genus Quercus; has acorns and lobed leaves
[Synset('tree.n.01')]
[Synset('jack_oak.n.02'), Synset('overcup_oak.n.01'), Synset('scrub_oak.n.01'), Synset('willow_oak.n.01'), Synset('nuttall_oak.n.01'), Synset('red_oak.n.01'), Synset('pin_oak.n.01'), Synset('cork_oak.n.01'), Synset('live_oak.n.01'), Synset('chestnut_oak.n.01'), Synset('holm_oak.n.02'), Synset('black_oak.n.01'), Synset('american_turkey_oak.n.01'), Synset('scarlet_oak.n.01'), Synset('california_black_oak.n.01'), Synset('laurel_oak.n.01'), Synset('chinese_cork_oak.n.01'), Synset('shingle_oak.n.01'), Synset('white_oak.n.01'), Synset('european_turkey_oak.n.01'), Synset('post_oak.n.01'), Synset('japanese_oak.n.01'), Synset('spanish_oak.n.01'), Synset('water_oak.n.01'), Synset('bluejack_oak.n.01')]


**meronyms/holonyms**: A synset s1 is a holonym of synset s2 (and, conversely, s2 is a meronym of s1) if s2 is contained in s1. Meronyms and holonyms come in three types: part, substance, and member.

In [None]:
room = wordnet.synset('room.n.01')
print(f"{room.name()}:, {room.definition()}")
print("part holonyms:", room.part_holonyms())
print("part meronyms:", room.part_meronyms())

room.n.01:, an area within a building enclosed by walls and floor and ceiling
part holonyms: [Synset('building.n.01')]
part meronyms: [Synset('room_light.n.01'), Synset('ceiling.n.01'), Synset('wall.n.01'), Synset('floor.n.01')]


In [None]:
air = wordnet.synset('air.n.01')
print(f"{air.name()}:, {air.definition()}")
print("substance holonyms:", air.substance_holonyms())
print("substance meronyms:", air.substance_meronyms())

air.n.01:, a mixture of gases (especially oxygen) required for breathing; the stuff that the wind consists of
substance holonyms: [Synset('wind.n.01')]
substance meronyms: [Synset('nitrogen.n.01'), Synset('xenon.n.01'), Synset('krypton.n.01'), Synset('neon.n.01'), Synset('oxygen.n.01'), Synset('argon.n.01')]


In [None]:
tree = wordnet.synset('tree.n.01')
print(f"{tree.name()}:, {tree.definition()}")
print("member holonyms:", tree.member_holonyms())
print("member meronyms:", tree.member_meronyms())

tree.n.01:, a tall perennial woody plant having a main trunk and branches forming a distinct elevated crown; includes both gymnosperms and angiosperms
member holonyms: [Synset('forest.n.01')]
member meronyms: []


**attributes** (relation to adjective synsets): An adjective synset s1 is an attribute of noun synset s2 if s1 can be a value of s2. For example, *slow* and *fast* are possible values of *speed*.

In [None]:
speed = wordnet.synset('speed.n.02')
print(speed.definition())
speed.attributes()

a rate (usually rapid) at which something happens


[Synset('slow.a.01'), Synset('fast.a.01')]

### Types of relations for verbs

**hypernyms/troponyms**: A troponym is a verb that denotes a more specific way of performing another verb's action. For example, diving is a type of swimming, so the verb dive is a troponym of the verb swim. Troponyms are the verb counterpart of hyponyms, which is why they are retrieved with `hyponyms()`.

A hypernym of a verb indicates a broader category of actions the verb belongs to.

In [None]:
swim = wordnet.synset('swim.v.01')
print(swim.hypernyms())
print(swim.hyponyms())

[Synset('travel.v.01')]
[Synset('paddle.v.03'), Synset('skinny-dip.v.01'), Synset('fin.v.02'), Synset('crawl.v.05'), Synset('dive.v.03'), Synset('school.v.03'), Synset('backstroke.v.01'), Synset('fin.v.03'), Synset('breaststroke.v.01')]


In [None]:
run = wordnet.synset('run.v.01')
print(run.hypernyms())
print(run.hyponyms())

[Synset('travel_rapidly.v.01')]
[Synset('streak.v.02'), Synset('rush.v.05'), Synset('run_bases.v.01'), Synset('jog.v.03'), Synset('trot.v.01'), Synset('scurry.v.01'), Synset('sprint.v.01'), Synset('lope.v.01'), Synset('outrun.v.01'), Synset('hare.v.01'), Synset('romp.v.02'), Synset('run.v.33')]


**entailments**: A verb entails another verb if the first action cannot take place without the second.

In [None]:
wordnet.synset('look.v.01').entailments()

[Synset('see.v.01')]

### Types of relations for adjectives

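**antonyms**: words with opposite meanings. In WordNet, antonymy links lemmas (word forms) rather than whole synsets, so it is queried on a `Lemma` object.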

In [None]:
lemma = wordnet.synset('good.a.01').lemmas()[0] # same for adverbs
print(lemma.antonyms())
lemma.synset()

[Lemma('bad.a.01.bad')]


Synset('good.a.01')

**similar to** (also for adjective satellites)

In [None]:
wordnet.synset('smart.a.01').similar_tos()

[Synset('astute.s.01'), Synset('streetwise.s.01'), Synset('cagey.s.01')]

**attributes** (relation to noun synsets)

In [None]:
wordnet.synset('slow.a.01').attributes()

[Synset('speed.n.02')]

### The similarity of two synsets

In [None]:
wordnet.synset('plant.n.02').definition()

'(botany) a living organism lacking the power of locomotion'

In [None]:
tree_syn = wordnet.synset('tree.n.01')
tree_syn.path_similarity(wordnet.synset('plant.n.02'))

0.25

In [None]:
wordnet.synset('plant.n.01').definition()

'buildings for carrying on industrial labor'

In [None]:
tree_syn.path_similarity(wordnet.synset('plant.n.01'))

0.09090909090909091
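`path_similarity` scores two synsets as 1 / (1 + d), where d is the length of the shortest path that connects them through a shared hypernym. The value 0.25 above therefore corresponds to d = 3, and 0.0909... (1/11) to d = 10. The sketch below reproduces the computation on a small hand-written taxonomy. `TOY_HYPERNYMS`, `ancestor_distances` and `toy_path_similarity` are hypothetical names introduced for illustration; they are not WordNet data or NLTK functions.

```python
from collections import deque

# Hypothetical toy taxonomy: node -> direct hypernyms (hand-written, not WordNet data).
TOY_HYPERNYMS = {
    "oak": ["tree"],
    "tree": ["woody_plant"],
    "woody_plant": ["vascular_plant"],
    "vascular_plant": ["plant"],
    "plant": ["organism"],
    "dog": ["animal"],
    "animal": ["organism"],
    "organism": [],
}


def ancestor_distances(node, hypernyms):
    """Map `node` and each of its ancestors to the minimum number of is-a steps."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in hypernyms.get(current, []):
            if parent not in dist:  # breadth-first, so the first visit is the shortest
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def toy_path_similarity(a, b, hypernyms=TOY_HYPERNYMS):
    """Return 1 / (1 + d), with d the shortest path through a common ancestor."""
    da = ancestor_distances(a, hypernyms)
    db = ancestor_distances(b, hypernyms)
    common = da.keys() & db.keys()
    if not common:
        return None  # no shared ancestor, so no path
    shortest = min(da[k] + db[k] for k in common)
    return 1 / (1 + shortest)


print(toy_path_similarity("tree", "plant"))  # three is-a steps apart
print(toy_path_similarity("oak", "dog"))     # connected only through "organism"
```

In this toy taxonomy, tree and plant are three steps apart, which gives the same 0.25 as the WordNet example; senses that meet only near the root score much lower.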

# Lesk for WSD

The Lesk algorithm is used for word sense disambiguation (WSD). It assigns a sense to a given word based on how related each candidate sense is to the context (the other words in the text). Relatedness is quantified with the **Lesk measure**.

The Lesk measure of two senses is the number of words that their definitions (glosses) have in common.
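As a minimal illustration of the measure, the sketch below counts the words shared by two glosses. The glosses are hand-written for the example (not taken from WordNet), and the function name `lesk_measure` is hypothetical. Filtering a small stopword list and splitting on whitespace are assumptions made here so that function words do not dominate the count.

```python
# Assumption: a tiny stopword list, chosen only for this example.
STOPWORDS = frozenset({"a", "an", "the", "of", "for", "and", "or", "in", "to"})


def lesk_measure(gloss_a, gloss_b, stopwords=STOPWORDS):
    """Count the distinct non-stopword tokens that two glosses share."""
    tokens_a = set(gloss_a.lower().split()) - stopwords
    tokens_b = set(gloss_b.lower().split()) - stopwords
    return len(tokens_a & tokens_b)


# Hypothetical glosses for two senses of "bank" and one sense of "deposit".
bank_river = "sloping land beside a body of water"
bank_money = "a financial institution that accepts deposits of money"
deposit = "money placed in a financial institution"

print(lesk_measure(bank_money, deposit))  # shares: financial, institution, money
print(lesk_measure(bank_river, deposit))  # shares nothing once stopwords are removed
```

The financial sense of "bank" is the one more related to "deposit" under this measure, which is the signal the Lesk algorithm uses to choose a sense.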

NLTK already implements simplified Lesk, which compares the glosses of the target word with the context words themselves (not with their definitions):

In [7]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [9]:
from nltk.wsd import lesk

ambiguous_word = 'school'
pos = 'n'
sentence_context = nltk.word_tokenize('Students enjoy going to school, studying and reading books')

synset = lesk(sentence_context, ambiguous_word, pos)
print(synset.name())
print(synset.definition())

school.n.06
an educational institution's faculty and students
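The selection step of simplified Lesk can be sketched without WordNet: each candidate gloss is compared with the context tokens, and the sense with the largest overlap wins. The sense inventory `TOY_SENSES` and the function `simplified_lesk` are hypothetical and hand-written for illustration; this is not NLTK's implementation.

```python
# Hypothetical sense inventory: sense id -> gloss (hand-written, not WordNet data).
TOY_SENSES = {
    "bass.fish": "a freshwater fish caught by anglers in a lake or river",
    "bass.music": "the lowest part in a piece of music played on an instrument",
}


def simplified_lesk(context_tokens, senses):
    """Return (sense id, overlap) for the gloss sharing the most tokens with the context."""
    context = {token.lower() for token in context_tokens}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense, best_overlap


print(simplified_lesk("He caught a huge bass in the lake".split(), TOY_SENSES))
print(simplified_lesk("She played the bass in a band".split(), TOY_SENSES))
```

Function words such as "a" and "in" count toward both senses here, so the margin between senses is small. This sensitivity to incidental overlap is one reason a simplified Lesk result, such as `school.n.06` above, can look surprising.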


In [None]:
import nltk
nltk.download('punkt_tab')
nltk.download('wordnet')
nltk.download('omw-1.4')

from nltk.wsd import lesk
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize


### Exercises

Exercises 1 to 9 are worth 0.1 points each (not doubled). Exercise 10 is worth 0.5 points.

1. Create a function that receives a word and prints the associated glosses for all the possible senses of that word (you must find all its corresponding synsets and print the gloss for each).

2. Create a function that receives two words as parameters. The function will check, using WordNet, if the two words can be synonyms, i.e. there is at least one synset that contains the two words. Print the name and gloss for all such synsets.

3. Create a function that receives a synset object and returns a tuple with 2 lists. The first list contains the holonyms (all types of holonyms) and the second one the meronyms (all types).

4. Find a word that has either holonyms or meronyms of different types. Print them separately (grouped by holonym/meronym type) and then all together using the function from exercise 3 (to check that it returns them all).

5. Create a function that for a given synset, prints the path of hypernyms (going to the next hypernym, and from that hypernym to the next one and so on, until it reaches the root). You can use the function hypernym_paths() of a synset object.

6. Create a function that receives two synsets as parameters. We consider d1(k) the length of the path from the first word to the hypernym k (the length of the path is the number of hypernyms it goes through, to reach k) and d2(k) the length of the path from the second word to the hypernym k. The function will return the list of hypernyms having the property that d1(k)+d2(k) is minimum (i.e. lowest common hypernyms).

7. Create a function that receives a synset object and a list of synsets (the list must contain at least 5 elements). The function will return the list sorted by the similarity between the first synset and each synset in the list. For example (taking the first synset of each word), we can test with the word cat and the list: animal, tree, house, object, public_school, mouse.

8. Create a function that checks if two synsets can be indirect meronyms for the same synset. An indirect meronym is either a part of the given element or a part of a part of the given element (and we can extend this relation as being part of part of part of etc...). This applies to any type of meronym.

9. Print the synonyms and antonyms of an adjective (for example, "beautiful"). If it is polysemous, print them for each sense, together with the gloss for that sense (synset).

10. Implement the Lesk algorithm with the help of a function that computes the score for two given glosses. For a given text and a given word, try to find the sense of that word using the Lesk measure. Print the definition for that sense (synset). Compare your result with the Lesk algorithm implemented in nltk.

# Each cell below solves one of the exercises above


In [None]:
def get_glosses(word):
    synsets = wn.synsets(word)
    if not synsets:
        print(f"No synsets found for '{word}'.")
        return

    print(f"Glosses for '{word}':\n")
    for i, synset in enumerate(synsets, 1):
        print(f"{i}. {synset.name()}: {synset.definition()}")

get_glosses('car')

In [None]:
def check_syn(word1, word2):
    synsets1 = wn.synsets(word1)
    common_synsets = []
  
    for syn1 in synsets1:
        for lemma in syn1.lemmas():
            if lemma.name() == word2:
                common_synsets.append(syn1)
                break

    if common_synsets:
        print(f"The words '{word1}' and '{word2}' share the following synsets (they may be synonyms):\n")
        for syn in common_synsets:
            print(f"{syn.name()}: {syn.definition()}")
    else:
        print(f"No shared synsets found for '{word1}' and '{word2}'. They do not appear as synonyms in WordNet.")

check_syn("car", "automobile")

In [None]:
def get_holonyms(synset):
    # Returns (holonyms, meronyms), each combining the member, part and substance types.
    holonyms = (
        synset.member_holonyms() +
        synset.part_holonyms() +
        synset.substance_holonyms()
    )
    meronyms = (
        synset.member_meronyms() +
        synset.part_meronyms() +
        synset.substance_meronyms()
    )

    return holonyms, meronyms

h, m = get_holonyms(wn.synset('car.n.01'))
print(m)

In [None]:
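# tree.n.01 is used here because it should have meronyms of more than one type.
tree = wn.synset('tree.n.01')
for relation in ('member_holonyms', 'part_holonyms', 'substance_holonyms',
                 'member_meronyms', 'part_meronyms', 'substance_meronyms'):
    print(f"{relation}: {[s.name() for s in getattr(tree, relation)()]}")
print()
# All together, to check that get_holonyms() returns every category.
h, m = get_holonyms(tree)
print("All Holonyms:", [holo.name() for holo in h])
print("All Meronyms:", [mero.name() for mero in m])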

In [None]:
def hyper_path(synset):
    paths = synset.hypernym_paths()
    
    if not paths:
        print(f"No hypernym paths found for {synset.name()}")
        return

    print(f"Hypernym paths for synset '{synset.name()}':\n")

    for path_idx, path in enumerate(paths, 1):
        print(f"Path {path_idx}:")
        for level, s in enumerate(path):
            print(f"{'  ' * level}-> {s.name()}: {s.definition()}")
        print()

hyper_path(wn.synset('car.n.01'))


In [None]:
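def min_common_hypernyms(synset1, synset2):
    """Return the common hypernyms k for which d1(k) + d2(k) is minimal."""
    def distances(synset):
        # Shortest distance from `synset` to each of its hypernyms.
        dist = {}
        for path in synset.hypernym_paths():  # each path runs root -> synset
            for d, hyper in enumerate(reversed(path)):
                dist[hyper] = min(d, dist.get(hyper, d))
        return dist

    d1, d2 = distances(synset1), distances(synset2)
    common = d1.keys() & d2.keys()
    if not common:
        return []
    best = min(d1[k] + d2[k] for k in common)
    return [k for k in common if d1[k] + d2[k] == best]


print(min_common_hypernyms(wn.synset('dog.n.01'), wn.synset('domestic_cat.n.01')))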

In [None]:
def similarity(synset, synsets):
    # Pairs (synset, Wu-Palmer similarity), sorted from least to most similar.
    similarities = [(syn, synset.wup_similarity(syn)) for syn in synsets]
    similarities.sort(key=lambda pair: pair[1])
    return similarities

candidates = [wn.synset(name) for name in (
    'animal.n.01', 'tree.n.01', 'house.n.01',
    'object.n.01', 'public_school.n.01', 'mouse.n.01')]
print(similarity(wn.synset('cat.n.01'), candidates))

In [None]:
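def check_indirect_meronyms(synset1, synset2):
    """True if both synsets are direct or indirect meronyms of a common synset."""
    def holonyms(s):  # all three holonym types
        return s.member_holonyms() + s.part_holonyms() + s.substance_holonyms()
    # closure() follows the relation transitively (part of a part of ...).
    return bool(set(synset1.closure(holonyms)) & set(synset2.closure(holonyms)))

# wall.n.01 and ceiling.n.01 are both listed earlier as part meronyms of room.n.01.
print(check_indirect_meronyms(wn.synset('wall.n.01'), wn.synset('ceiling.n.01')))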
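
Exercise 9 can be approached along the following lines. This is a sketch of one possible solution, not a reference answer: the function names are hypothetical, and the code assumes only that each synset offers `pos()`, `name()`, `definition()` and `lemmas()`, and that each lemma offers `name()` and `antonyms()`, as NLTK's WordNet objects do.

```python
def synonyms_and_antonyms(synsets):
    """For each adjective synset, collect its name, gloss, synonyms and antonyms.

    `synsets` is any iterable of synset objects, for example the result of
    wn.synsets('beautiful'). Only adjectives ('a') and adjective
    satellites ('s') are kept.
    """
    report = []
    for synset in synsets:
        if synset.pos() not in ("a", "s"):
            continue
        synonyms = [lemma.name() for lemma in synset.lemmas()]
        # Antonymy is defined on lemmas, so it is gathered lemma by lemma.
        antonyms = sorted({ant.name()
                           for lemma in synset.lemmas()
                           for ant in lemma.antonyms()})
        report.append((synset.name(), synset.definition(), synonyms, antonyms))
    return report


def print_synonyms_and_antonyms(synsets):
    """Print one block per adjective sense: gloss, synonyms, antonyms."""
    for name, gloss, synonyms, antonyms in synonyms_and_antonyms(synsets):
        print(f"{name}: {gloss}")
        print(f"  synonyms: {synonyms}")
        print(f"  antonyms: {antonyms}")
```

With WordNet loaded as in the setup cell, calling `print_synonyms_and_antonyms(wn.synsets('beautiful'))` prints one block per adjective sense.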

In [None]:
def gloss_overlap(g1, g2):
    # Lesk measure: number of distinct tokens shared by the two texts.
    words1 = set(word_tokenize(g1.lower()))
    words2 = set(word_tokenize(g2.lower()))
    return len(words1 & words2)

def custom_lesk(context_sentence, ambiguous_word):
    # Start below zero so that a sense is returned even when nothing overlaps.
    max_overlap = -1
    best_sense = None

    for sense in wn.synsets(ambiguous_word):
        overlap = gloss_overlap(context_sentence, sense.definition())
        if overlap > max_overlap:
            max_overlap = overlap
            best_sense = sense

    return best_sense  # None only if the word has no synsets

sentence = "Students enjoy going to school, studying and reading books."
target_word = "school"


custom_result = custom_lesk(sentence, target_word)
print("Custom Lesk result:")
print(f"Sense: {custom_result.name()}")
print(f"Definition: {custom_result.definition()}")
print()

nltk_result = lesk(word_tokenize(sentence), target_word)
print("NLTK Lesk result:")
print(f"Sense: {nltk_result.name()}")
print(f"Definition: {nltk_result.definition()}")
