# **Word Sense Disambiguation (WSD)**



## **1- Introduction**
Words can have different meanings in different contexts. When the intended
meaning of a word is unclear, miscommunication can follow. How does NLP
approach this challenge? A word with multiple possible meanings exhibits word
sense ambiguity. Syntactic ambiguity is addressed with part-of-speech (POS)
tagging, whereas semantic ambiguity is addressed with word sense disambiguation (WSD).
The task is to determine which meaning of a word is intended in a given context.

### **Content**
This notebook demonstrates word sense disambiguation with the **N**atural **L**anguage **T**ool**k**it (``nltk``).<br>
WSD is demonstrated with three example words:
- ``bank``
- ``plant``
- ``fair``

All examples use the ``Lesk`` algorithm.

For details on the ``lesk`` function, see [[1]](#scrollTo=fPge5oRLQwid).

For sample usage of the ``nltk.wsd`` module, see [[2]](#scrollTo=fPge5oRLQwid).


## **2- Code Examples**

### **1- Import necessary datasets and libraries**
In this example, we will import WordNet and Lesk. 

WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations (description based on [[3]](#scrollTo=fPge5oRLQwid)).

The Lesk algorithm is a knowledge-based method that relies on the overlap between dictionary definitions and context. For each sense of the ambiguous word, the definition of that sense is compared with the words surrounding the ambiguous word, and the sense whose definition shares the most words with the context is selected. In ``nltk``, the definitions come from the WordNet synsets. The approach rests on the assumption that words used together are also related to each other.
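
The overlap idea described above can be sketched in a few lines of plain Python. This is a minimal illustration of a simplified Lesk variant, not the ``nltk`` implementation: the sense labels, the glosses in ``TOY_SENSES`` and the helper ``simplified_lesk`` are hypothetical and were made up for this example.

```python
# Toy sense inventory: sense label -> definition (gloss).
# Assumption: these glosses are illustrative and are NOT taken from WordNet.
TOY_SENSES = {
    "bank_river": "sloping land beside a river or lake",
    "bank_money": "an institution where people deposit and borrow money",
}


def simplified_lesk(context_tokens, senses):
    """Return (sense, overlap) for the gloss sharing the most words with the context."""
    context = {token.lower() for token in context_tokens}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        # Count the distinct words that occur in both the context and the gloss.
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:  # on a tie, the first sense seen is kept
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


print(simplified_lesk("I want to deposit money at the bank".split(), TOY_SENSES))
print(simplified_lesk("We sat on the bank of the river".split(), TOY_SENSES))
```

In the first sentence the words "deposit" and "money" also occur in the financial gloss, so that sense is chosen; in the second sentence only "river" matches, which selects the river sense.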

In [1]:
# Import nltk module
# Download 'wordnet' package by using the nltk module
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


True

In [2]:
# Import the lesk function from the nltk.wsd module
# Import WordNet, which is an NLTK corpus reader.
## The nltk.corpus package defines a collection of corpus reader classes, which can be used to access the contents of a diverse set of corpora (see [4]).
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn

### **2- Disambiguation of the word _bank_**

In [6]:
# Create sample sentences 
bank_sents = ['I went to the bank to deposit my money.',
              'The river bank was full of dead fishes.']

# Use the Lesk algorithm to analyze the first sentence and print the chosen sense of "bank"
# Note: the raw string is passed as context here, so lesk() sees single characters rather than words
print( "=============== analyse sentence 1 =================\n")
print( "Context:", bank_sents[0])
answer = lesk(bank_sents[0], 'bank') 
print( "Sense:", answer)
print( "Definition:",answer.definition())

# Analyze the second sentence (split into tokens, nouns only) and print the definition of the word "bank"
print( "\n=============== analyse sentence 2 =================\n")
print( "Context:", bank_sents[1])
answer = lesk(bank_sents[1].split(), 'bank', 'n')
print( "Sense:", answer)
print( "Definition:", answer.definition())

# For a general overview, print all definitions of the word "bank"
print( "\n=============== all definitions of \'bank\'===============\n")
for s in wn.synsets('bank'):
    print('\t', s, s.definition())



Context: I went to the bank to deposit my money.
Sense: Synset('savings_bank.n.02')
Definition: a container (usually with a slot in the top) for keeping money at home


Context: The river bank was full of dead fishes.
Sense: Synset('bank.n.09')
Definition: a building in which the business of banking transacted


	 Synset('bank.n.01') sloping land (especially the slope beside a body of water)
	 Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
	 Synset('bank.n.03') a long ridge or pile
	 Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
	 Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
	 Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
	 Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal fo
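
The two calls above pass the context in different forms: the first passes the raw sentence string, the second passes a list of tokens produced by ``split()``. The following sketch shows why this matters, under the assumption that the context is turned into a set before it is compared with the words of a definition. The definition string is copied from the listing above; everything else is plain Python and does not call ``nltk``.

```python
sentence = "I went to the bank to deposit my money."

# Iterating over a string yields single characters.
as_characters = set(sentence)
# Splitting on whitespace yields word tokens (punctuation stays attached).
as_tokens = set(sentence.split())

# Definition of Synset('depository_financial_institution.n.01') from the listing above.
definition = ("a financial institution that accepts deposits and channels "
              "the money into lending activities")
definition_words = set(definition.split())

print(sorted(as_characters & definition_words))  # ['a']
print(sorted(as_tokens & definition_words))      # ['the']
```

With the raw string, only the one-letter word "a" can match. With tokens, "the" matches, but "money" does not, because the token is "money." with the trailing period. Tokenizing the sentence and handling punctuation therefore both affect which sense receives the highest overlap.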

### **3- Disambiguation of the word *plant***

In [7]:
# Create sample sentences
plant_sents = ['The workers at the industrial plant were overworked.', 'The plant was no longer bearing flowers.']

# Use the Lesk algorithm to analyze the first sentence (split into tokens, nouns only) and print the definition of the word "plant"
print( "=============== analyse sentence 1 =================\n")
print( "Context:", plant_sents[0])
answer_1 = lesk(plant_sents[0].split(),'plant','n')
print( "Sense:", answer_1)
print( "Definition:",answer_1.definition())

# Analyze the second sentence and print the definition of the word "plant"
# Note: the raw string is passed as context here, so lesk() sees single characters rather than words
print( "\n=============== analyse sentence 2 =================\n")
print( "Context:", plant_sents[1])
answer_2 = lesk(plant_sents[1],'plant','n')
print( "Sense:", answer_2)
print( "Definition:",answer_2.definition())

# For a general overview, print all definitions of the word "plant"
print( "\n=============== all definitions of \'plant\'===============\n")
for s in wn.synsets('plant'):
    print('\t', s, s.definition())


Context: The workers at the industrial plant were overworked.
Sense: Synset('plant.n.03')
Definition: an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience


Context: The plant was no longer bearing flowers.
Sense: Synset('plant.n.02')
Definition: (botany) a living organism lacking the power of locomotion


	 Synset('plant.n.01') buildings for carrying on industrial labor
	 Synset('plant.n.02') (botany) a living organism lacking the power of locomotion
	 Synset('plant.n.03') an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience
	 Synset('plant.n.04') something planted secretly for discovery by another
	 Synset('plant.v.01') put or set (seeds, seedlings, or plants) into the ground
	 Synset('implant.v.01') fix or set securely or deeply
	 Synset('establish.v.02') set up or lay the groundwork for
	 Synset('plant.v.04') place into a river
	 Synset('plant.v.05') place something or someone in a certain po
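
The result for the first sentence (an actor in the audience rather than an industrial building) can be examined by counting the shared words by hand. The sketch below does this for the four noun definitions copied from the listing above; it is an illustration in plain Python, not a call into ``nltk``, and the variable names are chosen for this example only.

```python
# Noun definitions of "plant", copied from the listing above.
plant_definitions = {
    "plant.n.01": "buildings for carrying on industrial labor",
    "plant.n.02": "(botany) a living organism lacking the power of locomotion",
    "plant.n.03": ("an actor situated in the audience whose acting is rehearsed "
                   "but seems spontaneous to the audience"),
    "plant.n.04": "something planted secretly for discovery by another",
}

context = set("The workers at the industrial plant were overworked.".split())

# For every sense, list the words shared between the context and the definition.
overlaps = {
    name: sorted(context & set(definition.split()))
    for name, definition in plant_definitions.items()
}

for name, shared in overlaps.items():
    print(name, shared)
```

Three senses tie with exactly one shared word: ``plant.n.01`` matches on "industrial", while ``plant.n.02`` and ``plant.n.03`` match only on the function word "the". Plain word overlap cannot separate these candidates, so the outcome depends on how the implementation breaks ties, and the recorded output shows ``plant.n.03`` being selected.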

### **4- Disambiguation of the word *fair***

In [8]:
# Create sample sentences
sntc_1 = "Everyone needs to be given a fair chance in the competition."
sntc_2 = "The annual fair in our city is next weekend."

# Use the Lesk algorithm to analyze the first sentence and print the definition of the word "fair"
# Note: the raw string is passed as context here, so lesk() sees single characters rather than words
print( "=============== analyse sentence 1 =================\n")
print( "Context:", sntc_1)
answer_1 = lesk(sntc_1,'fair')
print( "Sense:", answer_1)
print( "Definition:",answer_1.definition())

# Analyze the second sentence (nouns only) and print the definition of the word "fair"
# Note: the raw string is passed as context here as well
print( "\n=============== analyse sentence 2 =================\n")
print( "Context:", sntc_2)
answer_2 = lesk(sntc_2,'fair', 'n')
print( "Sense:", answer_2)
print( "Definition:",answer_2.definition())

# For a general overview, print all definitions of the word "fair"
print( "\n=============== all definitions of \'fair\'===============\n")
for s in wn.synsets('fair'):
    print('\t', s, s.definition())


Context: Everyone needs to be given a fair chance in the competition.
Sense: Synset('fairly.r.02')
Definition: without favoring one party, in a fair evenhanded manner


Context: The annual fair in our city is next weekend.
Sense: Synset('fair.n.03')
Definition: a competitive exhibition of farm products


	 Synset('carnival.n.03') a traveling show; having sideshows and rides and games of skill etc.
	 Synset('fair.n.02') gathering of producers to promote business
	 Synset('fair.n.03') a competitive exhibition of farm products
	 Synset('bazaar.n.03') a sale of miscellany; often for charity
	 Synset('fair.v.01') join so that the external surfaces blend smoothly
	 Synset('fair.a.01') free from favoritism or self-interest or bias or deception; conforming with established standards or rules
	 Synset('fair.s.02') not excessive or extreme
	 Synset('bonny.s.01') very pleasing to the eye
	 Synset('fair.a.04') (of a baseball) hit between the foul lines
	 Synset('average.s.03') lacking exceptional
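
The third argument ``'n'`` in the second call restricts the candidate senses to nouns. The effect of such a restriction can be sketched by filtering the synset names from the listing above on their part-of-speech field. The helper ``filter_by_pos`` is hypothetical and only parses the ``lemma.pos.number`` naming pattern; it is not how ``nltk`` selects candidates internally.

```python
# Synset names of "fair", copied from the listing above.
fair_senses = [
    "carnival.n.03", "fair.n.02", "fair.n.03", "bazaar.n.03",
    "fair.v.01", "fair.a.01", "fair.s.02", "bonny.s.01",
    "fair.a.04", "average.s.03",
]


def filter_by_pos(sense_names, pos):
    """Keep the names whose part-of-speech field (lemma.pos.number) equals pos."""
    return [name for name in sense_names if name.rsplit(".", 2)[1] == pos]


print(filter_by_pos(fair_senses, "n"))
print(filter_by_pos(fair_senses, "a"))
```

With the noun restriction only four of the listed candidates remain, which rules out the verb and adjective senses before any overlap is counted. Without a part-of-speech argument every sense is a candidate, which is how an adverb sense such as ``fairly.r.02`` can be returned for the first sentence. The tag ``s`` marks adjective satellites in WordNet.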

## **3- References**

- [1] https://www.nltk.org/api/nltk.html#nltk.wsd.lesk
- [2] http://www.nltk.org/howto/wsd.html
- [3] https://wordnet.princeton.edu/
- [4] https://www.nltk.org/howto/corpus.html



Copyright © 2021 IU International University of Applied Sciences