<a href="https://colab.research.google.com/github/dgromann/MCMLR/blob/main/Word_sense_disambiguation_WordNet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Word Sense Disambiguation with WordNet

This notebook shows how to access WordNet from Python with NLTK and how to use it for a simple form of word sense disambiguation.

First, NLTK needs to be installed on Google Colab, and then the required WordNet corpora need to be downloaded.

In [None]:
!pip install nltk

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')

As a next step, we can query WordNet for all synsets of a specific word.

In [None]:
from nltk.corpus import wordnet as wn
wn.synsets("bank")

[Synset('bank.n.01'),
 Synset('depository_financial_institution.n.01'),
 Synset('bank.n.03'),
 Synset('bank.n.04'),
 Synset('bank.n.05'),
 Synset('bank.n.06'),
 Synset('bank.n.07'),
 Synset('savings_bank.n.02'),
 Synset('bank.n.09'),
 Synset('bank.n.10'),
 Synset('bank.v.01'),
 Synset('bank.v.02'),
 Synset('bank.v.03'),
 Synset('bank.v.04'),
 Synset('bank.v.05'),
 Synset('deposit.v.02'),
 Synset('bank.v.07'),
 Synset('trust.v.01')]

Then we can choose one synset and obtain its definition and its hypernyms.

In [None]:
bank = wn.synset('bank.n.01')
print("Definition: ", bank.definition())
print("Hypernyms: ", bank.hypernyms())

Definition:  sloping land (especially the slope beside a body of water)
Hypernyms:  [Synset('slope.n.01')]


We may also want to collect all synonyms and antonyms of a specific word in WordNet.

In [None]:
synonyms = []
antonyms = []

for syn in wn.synsets("bank"):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))

{'bank', 'swear', 'cant', 'savings_bank', 'trust', 'bank_building', 'banking_company', 'coin_bank', 'rely', 'banking_concern', 'deposit', 'camber', 'depository_financial_institution', 'money_box'}
{'mistrust', 'withdraw'}


WordNet can then be combined with simple algorithms to disambiguate a word occurring in a specific sentence. One example is the Lesk algorithm. For each synset of the word, it counts the overlap between the words in the sentence and the synset's synonyms (lemma names) together with the words of its definition, and it selects the synset with the largest overlap.
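
Before turning to WordNet, the overlap count can be traced by hand on a toy example. The two glosses and the sentence below are hand-written stand-ins, not WordNet data, and the names `toy_senses`, `overlap_count` and `toy_sentence` are made up for this illustration.

```python
# Hand-written glosses standing in for WordNet entries (illustrative data only).
toy_senses = {
    "bank (river)": "sloping land beside a body of water such as a river",
    "bank (finance)": "a financial institution that accepts deposits of money",
}

def overlap_count(sentence, gloss):
    """Count the distinct lowercase words shared by the sentence and the gloss."""
    context = set(sentence.lower().split())
    return len(context & set(gloss.lower().split()))

toy_sentence = "she sat beside the river and watched the water"

# Score every sense and pick the one with the largest overlap.
scores = {name: overlap_count(toy_sentence, gloss) for name, gloss in toy_senses.items()}
best = max(scores, key=scores.get)

print(scores)  # {'bank (river)': 3, 'bank (finance)': 0}
print(best)    # bank (river)
```

The river sense wins because it shares "beside", "river" and "water" with the sentence, while the finance gloss shares no word with it.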

In [None]:
from nltk.corpus import wordnet as wn

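def lesk(sentence, ambiguous_word):
    """Return the synset of ambiguous_word that overlaps most with the sentence."""
    max_overlaps = 0
    lesk_sense = ""
    # Compare lowercase tokens; punctuation stays attached to the tokens.
    context = set(sentence.lower().split())

    for sense in wn.synsets(ambiguous_word):
        # Rebuild the word list for every sense so that words from
        # earlier senses do not inflate the overlap of later ones.
        lesk_dictionary = set(sense.definition().lower().split())
        lesk_dictionary.update(name.lower() for name in sense.lemma_names())

        overlaps = lesk_dictionary.intersection(context)

        # Keep the first sense that reaches the highest overlap.
        if len(overlaps) > max_overlaps:
            lesk_sense = sense
            max_overlaps = len(overlaps)
    # An empty string is returned if no sense shares a word with the sentence.
    return str(lesk_sense)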

sentence = "The bank on the riverside was full of dead fishes."
sense = lesk(sentence, "bank")
print(sense)