# **Word Sense Disambiguation**

Words can have different meanings in different contexts. When the intended
meaning of a word is unclear, miscommunication can result. A word that has multiple meanings exhibits word
sense ambiguity. Syntactic ambiguity is typically resolved with part-of-speech (POS)
tagging, whereas semantic ambiguity is resolved with word sense disambiguation (WSD).
The challenge is to distinguish the senses of a word by the context in which it appears [[1]](#scrollTo=fPge5oRLQwid).

This notebook shows some basic WSD examples by using the following libraries:
* ``nltk``
* ``pywsd``

## **``nltk``**
``nltk`` (Natural Language Toolkit) is an open-source Python library for natural language processing. For more detail about ``nltk``, please refer to [[2]](https://www.nltk.org/api/nltk.html#nltk.wsd.lesk).

### Import libraries

#### Import ``wordnet``

WordNet is a large lexical database of English, available in ``nltk`` through the ``wordnet`` corpus reader. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations [[3]](http://www.nltk.org/howto/wsd.html).

In [None]:
# Import nltk module
import nltk

# Download 'wordnet' package by using the nltk module
nltk.download('wordnet')

# Import the WordNet corpus reader from the nltk.corpus package
from nltk.corpus import wordnet as wn

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


#### Import ``lesk``
The Lesk algorithm is a knowledge-based method that relies on the contextual overlap of dictionary definitions. For each candidate sense of an ambiguous word, it counts the words that the sense's definition shares with the surrounding context and selects the sense with the largest overlap. In ``nltk``, the definitions come from WordNet. The approach is based on the assumption that words used together are also related to each other [[1]](#scrollTo=fPge5oRLQwid).

In [None]:
# Import lesk library
from nltk.wsd import lesk
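
The overlap idea described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the ``nltk`` implementation: the function name `toy_lesk`, the sense labels, and the two glosses in `TOY_SENSES` are hypothetical stand-ins for real dictionary entries.

```python
# Toy sense inventory: hypothetical glosses, not WordNet entries.
TOY_SENSES = {
    "bank": {
        "bank_finance": "an institution that accepts money deposits and lends money",
        "bank_river": "sloping land beside a river or other body of water",
    }
}


def toy_lesk(context_tokens, word, senses=TOY_SENSES):
    """Return the sense whose gloss shares the most words with the context."""
    context = {token.lower() for token in context_tokens}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        # Overlap = number of distinct words shared by context and gloss
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(toy_lesk("I went to the bank to deposit my money".split(), "bank"))
print(toy_lesk("The river bank was full of dead fish".split(), "bank"))
```

With these toy glosses, the first sentence overlaps with the financial gloss through "money" and the second with the river gloss through "river" and "of". The match on "of" shows how function words can sway plain overlap counting.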

### Apply WSD for some words

#### Bank

In [None]:
# Create sample text
text1 = ['I went to the bank to deposit my money.',
         'The river bank was full of dead fishes.']

# Using the Lesk algorithm, analyze the first sentence and print the definition of the word "bank"
print("=============== analyse sentence 1 =================\n")
print("Context:", text1[0])
# Note: lesk expects a list of tokens; the raw string passed here is treated as a sequence of characters
answer1 = lesk(text1[0], 'bank')
print("Sense:", answer1)
print("Definition:", answer1.definition())

# Analyze the second sentence (tokenized, restricted to noun senses) and print the definition of the word "bank"
print("\n\n=============== analyse sentence 2 =================\n")
print("Context:", text1[1])
answer2 = lesk(text1[1].split(), 'bank', 'n')
print("Sense:", answer2)
print("Definition:", answer2.definition())

# For a general overview, print all definitions of the word "bank"
print("\n\n=============== all definitions of 'bank' ===============\n")
for s in wn.synsets('bank'):
    print('\t', s, s.definition())



Context: I went to the bank to deposit my money.
Sense: Synset('savings_bank.n.02')
Definition: a container (usually with a slot in the top) for keeping money at home



Context: The river bank was full of dead fishes.
Sense: Synset('bank.n.09')
Definition: a building in which the business of banking transacted



	 Synset('bank.n.01') sloping land (especially the slope beside a body of water)
	 Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
	 Synset('bank.n.03') a long ridge or pile
	 Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
	 Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
	 Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
	 Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal 
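
The first call above passes the sentence as a raw string, whereas the second passes a token list. The sketch below illustrates why this matters, under the assumption that the overlap is computed by turning the context into a set and intersecting it with the whitespace-split definition. The helper `matched_words` is hypothetical and only mirrors that assumed computation.

```python
def matched_words(context, gloss):
    """Return the gloss words that also occur in set(context)."""
    return set(context) & set(gloss.split())


sentence = "The river bank was full of dead fishes."
gloss = "sloping land (especially the slope beside a body of water)"

# Raw string: set(sentence) contains single characters, so only
# one-letter gloss words can match.
char_matches = matched_words(sentence, gloss)

# Token list: whole words are compared.
token_matches = matched_words(sentence.split(), gloss)

print("raw string :", char_matches)
print("token list :", token_matches)
```

Tokenizing the sentence before the call (for example with `str.split()` or a proper tokenizer) keeps the comparison at word level.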

#### Plant

In [None]:
# Create sample text
text2 = ['The workers at the industrial plant were overworked.',
         'The plant was no longer bearing flowers.']

# Using the Lesk algorithm, analyze the first sentence and print the definition of the word "plant"
print("=============== analyse sentence 1 =================\n")
print("Context:", text2[0])
answer1 = lesk(text2[0].split(), 'plant', 'n')
print("Sense:", answer1)
print("Definition:", answer1.definition())

# Analyze the second sentence and print the definition of the word "plant"
print("\n\n=============== analyse sentence 2 =================\n")
print("Context:", text2[1])
# Note: the raw string passed here is treated as a sequence of characters, not as tokens
answer2 = lesk(text2[1], 'plant', 'n')
print("Sense:", answer2)
print("Definition:", answer2.definition())

# For a general overview, print all definitions of the word "plant"
print("\n\n=============== all definitions of 'plant' ===============\n")
for s in wn.synsets('plant'):
    print('\t', s, s.definition())


Context: The workers at the industrial plant were overworked.
Sense: Synset('plant.n.03')
Definition: an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience



Context: The plant was no longer bearing flowers.
Sense: Synset('plant.n.02')
Definition: (botany) a living organism lacking the power of locomotion



	 Synset('plant.n.01') buildings for carrying on industrial labor
	 Synset('plant.n.02') (botany) a living organism lacking the power of locomotion
	 Synset('plant.n.03') an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience
	 Synset('plant.n.04') something planted secretly for discovery by another
	 Synset('plant.v.01') put or set (seeds, seedlings, or plants) into the ground
	 Synset('implant.v.01') fix or set securely or deeply
	 Synset('establish.v.02') set up or lay the groundwork for
	 Synset('plant.v.04') place into a river
	 Synset('plant.v.05') place something or someone in a certain 

#### Fair

In [None]:
# Create sample text
text3 = ['Everyone needs to be given a fair chance in the competition.',
         'The annual fair in our city is next weekend.']

# Using the Lesk algorithm, analyze the first sentence and print the definition of the word "fair"
print("=============== analyse sentence 1 =================\n")
print("Context:", text3[0])
# Note: 'n' restricts the candidates to noun senses, although "fair" is an adjective in this sentence
answer1 = lesk(text3[0].split(), 'fair', 'n')
print("Sense:", answer1)
print("Definition:", answer1.definition())

# Analyze the second sentence and print the definition of the word "fair"
print("\n\n=============== analyse sentence 2 =================\n")
print("Context:", text3[1])
# Note: the raw string passed here is treated as a sequence of characters, not as tokens
answer2 = lesk(text3[1], 'fair', 'n')
print("Sense:", answer2)
print("Definition:", answer2.definition())

# For a general overview, print all definitions of the word "fair"
print("\n\n=============== all definitions of 'fair' ===============\n")
for s in wn.synsets('fair'):
    print('\t', s, s.definition())


Context: Everyone needs to be given a fair chance in the competition.
Sense: Synset('fair.n.03')
Definition: a competitive exhibition of farm products



Context: The annual fair in our city is next weekend.
Sense: Synset('fair.n.03')
Definition: a competitive exhibition of farm products



	 Synset('carnival.n.03') a traveling show; having sideshows and rides and games of skill etc.
	 Synset('fair.n.02') gathering of producers to promote business
	 Synset('fair.n.03') a competitive exhibition of farm products
	 Synset('bazaar.n.03') a sale of miscellany; often for charity
	 Synset('fair.v.01') join so that the external surfaces blend smoothly
	 Synset('fair.a.01') free from favoritism or self-interest or bias or deception; conforming with established standards or rules
	 Synset('fair.s.02') not excessive or extreme
	 Synset('bonny.s.01') very pleasing to the eye
	 Synset('fair.a.04') (of a baseball) hit between the foul lines
	 Synset('average.s.03') lacking exceptional quality or ab
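
Both calls above pass `'n'` as the part of speech, which limits the candidate senses to nouns. That explains why the adjective reading of "fair" in the first sentence cannot be returned. The sketch below illustrates this filtering on the synset names from the listing above. The helper `filter_by_pos` is hypothetical and simply reads the POS tag from the middle field of each name.

```python
# Synset names copied from the listing above.
FAIR_SYNSETS = [
    "carnival.n.03", "fair.n.02", "fair.n.03", "bazaar.n.03",
    "fair.v.01", "fair.a.01", "fair.s.02", "bonny.s.01",
    "fair.a.04", "average.s.03",
]


def filter_by_pos(names, pos):
    """Keep the synset names whose POS field (lemma.POS.number) equals pos."""
    return [name for name in names if name.rsplit(".", 2)[1] == pos]


noun_senses = filter_by_pos(FAIR_SYNSETS, "n")
adjective_senses = filter_by_pos(FAIR_SYNSETS, "a")

print("noun candidates     :", noun_senses)
print("adjective candidates:", adjective_senses)
```

WordNet tags satellite adjectives with `s`, so a filter on `a` alone does not cover every adjectival sense in the listing.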

## **``pywsd``**
``pywsd`` is a Python library that provides WSD functions as well as several variations of the Lesk algorithm [[1]](#scrollTo=fPge5oRLQwid).

For more detail about ``pywsd``, please refer to [[4]](https://pypi.org/project/pywsd/).

### Import libraries

#### Install ``pywsd``

In [None]:
pip install pywsd  

Collecting pywsd
  Downloading pywsd-1.2.4.tar.gz (26.8 MB)
Collecting wn
  Downloading wn-0.9.1-py3-none-any.whl (75 kB)
Building wheels for collected packages: pywsd
  Building wheel for pywsd (setup.py) ... done
  Created wheel for pywsd: filename=pywsd-1.2.4-py3-none-any.whl size=26940436 sha256=e833ed6fc8f5eacbc2593128bcefb971428d43ae403fc7b24098478c4b1807ee
  Stored in directory: /root/.cache/pip/wheels/56/67/c0/6e6fa8456d1374b393328368316c3b33844cb4043bd225bc66
Successfully built pywsd
Installing collected packages: wn, pywsd
Successfully installed pywsd-1.2.4 wn-0.9.1


#### Install ``wn``
``wn`` is a Python library for working with wordnets. Unlike previous libraries, ``wn`` is built from the beginning to accommodate multiple wordnets (for multiple languages or multiple versions of the same wordnet) while retaining the ability to query and traverse them independently. For more detail about the ``wn`` library, please refer to [[5]](https://pypi.org/project/wn/) and [[6]](https://aclanthology.org/2021.gwc-1.12/).

The cell below pins ``wn`` to version 0.0.22, which replaces the 0.9.1 release that was installed together with ``pywsd``.



In [None]:
pip install wn==0.0.22

Collecting wn==0.0.22
  Downloading wn-0.0.22.tar.gz (31.5 MB)
Building wheels for collected packages: wn
  Building wheel for wn (setup.py) ... done
  Created wheel for wn: filename=wn-0.0.22-py3-none-any.whl size=31618484 sha256=1a21ed871c9266ec13dc810ed6f88051ba58e6063b31d97fd30e0c7ece5c2c8d
  Stored in directory: /root/.cache/pip/wheels/3d/0d/59/4b7902879d8cbad9bb73aaf0cc0a051edc1b18da983889c412
Successfully built wn
Installing collected packages: wn
  Attempting uninstall: wn
    Found existing installation: wn 0.9.1
    Uninstalling wn-0.9.1:
      Successfully uninstalled wn-0.9.1
Successfully installed wn-0.0.22


#### Import ``nltk``

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

#### Import ``simple_lesk``

In [None]:
# simple_lesk returns the sense best suited to the given word according to the Simple Lesk algorithm
from pywsd.lesk import simple_lesk

Warming up PyWSD (takes ~10 secs)... took 3.9495387077331543 secs.
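
Lesk variants typically differ in how they build the "signature" of each sense, that is, the bag of words compared against the context. The sketch below is not the ``pywsd`` implementation. It only illustrates the general idea of enriching a signature with example sentences and dropping stop words. The names `STOPWORDS`, `TOY_ENTRIES`, `signature` and `enriched_lesk` are hypothetical, as are the example sentences. The two definitions are the WordNet glosses for "plant" shown earlier in this notebook.

```python
# Hypothetical, deliberately tiny stop word list.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "or", "and", "is", "was"}

# Toy entries: WordNet glosses shown earlier plus hypothetical example sentences.
TOY_ENTRIES = {
    "plant_factory": {
        "definition": "buildings for carrying on industrial labor",
        "examples": ["they built a large plant to manufacture automobiles"],
    },
    "plant_flora": {
        "definition": "a living organism lacking the power of locomotion",
        "examples": ["the plant in the garden is bearing flowers"],
    },
}


def signature(entry):
    """Collect the content words of a sense's definition and examples."""
    words = entry["definition"].lower().split()
    for example in entry["examples"]:
        words += example.lower().split()
    return {word for word in words if word not in STOPWORDS}


def enriched_lesk(sentence, entries=TOY_ENTRIES):
    """Return the sense whose signature overlaps most with the sentence."""
    context = {word.strip(".,").lower() for word in sentence.split()} - STOPWORDS
    return max(entries, key=lambda name: len(context & signature(entries[name])))


print(enriched_lesk("The workers at the industrial plant were overworked."))
print(enriched_lesk("The plant was no longer bearing flowers."))
```

Removing stop words keeps function words such as "of" or "the" from contributing to the overlap, and the example sentences give each sense more content words to match against.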


### Apply WSD for some words

#### Bank

In [None]:
# Create sample text
text1 = ['I went to the bank to deposit my money', 'The river bank was full of dead fishes']

# Analyze the first sentence and print the definition of the word "bank"
print("=============== analyse sentence 1 =================\n")
print("Context-1:", text1[0])
answer1 = simple_lesk(text1[0], 'bank')
print("Sense:", answer1)
print("Definition : ", answer1.definition())

# Analyze the second sentence and print the definition of the word "bank"
print("\n\n=============== analyse sentence 2 =================\n")
print("Context-2:", text1[1])
answer2 = simple_lesk(text1[1], 'bank')
print("Sense:", answer2)
print("Definition : ", answer2.definition())


Context-1: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition :  a financial institution that accepts deposits and channels the money into lending activities



Context-2: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition :  sloping land (especially the slope beside a body of water)


#### Plant

In [None]:
# Create sample text
text2 = ['The workers at the industrial plant were overworked.', 'The plant was no longer bearing flowers.']

# Analyze the first sentence and print the definition of the word "plant"
print("=============== analyse sentence 1 =================\n")
print("Context-1:", text2[0])
answer1 = simple_lesk(text2[0], 'plant')
print("Sense:", answer1)
print("Definition : ", answer1.definition())

# Analyze the second sentence and print the definition of the word "plant"
print("\n\n=============== analyse sentence 2 =================\n")
print("Context-2:", text2[1])
answer2 = simple_lesk(text2[1], 'plant')
print("Sense:", answer2)
print("Definition : ", answer2.definition())


Context-1: The workers at the industrial plant were overworked.
Sense: Synset('plant.n.01')
Definition :  buildings for carrying on industrial labor



Context-2: The plant was no longer bearing flowers.
Sense: Synset('plant.v.01')
Definition :  put or set (seeds, seedlings, or plants) into the ground


#### Fair

In [None]:
# Create sample text
text3 = ['Everyone needs to be given a fair chance in the competition.', 'The annual fair in our city is next weekend.']

# Analyze the first sentence and print the definition of the word "fair"
print("=============== analyse sentence 1 =================\n")
print("Context-1:", text3[0])
answer1 = simple_lesk(text3[0], 'fair')
print("Sense:", answer1)
print("Definition : ", answer1.definition())

# Analyze the second sentence and print the definition of the word "fair"
print("\n\n=============== analyse sentence 2 =================\n")
print("Context-2:", text3[1])
answer2 = simple_lesk(text3[1], 'fair')
print("Sense:", answer2)
print("Definition : ", answer2.definition())


Context-1: Everyone needs to be given a fair chance in the competition.
Sense: Synset('honest.s.07')
Definition :  gained or earned without cheating or stealing



Context-2: The annual fair in our city is next weekend.
Sense: Synset('honest.s.07')
Definition :  gained or earned without cheating or stealing


# **References**

- [1] NLP and Computer Vision_DLMAINLPCV01 Lecture Book
- [2] https://www.nltk.org/api/nltk.html#nltk.wsd.lesk
- [3] http://www.nltk.org/howto/wsd.html
- [4] https://pypi.org/project/pywsd/
- [5] https://pypi.org/project/wn/
- [6] https://aclanthology.org/2021.gwc-1.12/


Copyright © 2022 IU International University of Applied Sciences